2.6.21-rc7-mm2

2007-04-25 Thread Andrew Morton

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/


- this has everything which is in 2.6.21.  Plus more!

- a number of nasty bugs were fixed.  This should be (a lot) more stable
  than 2.6.21-rc7-mm1.

  Some sysfs-related problems are still expected.  Fiddling with the
  setting of CONFIG_SYSFS_DEPRECATED might help avoid them.

- the 64-bit futex patches and (consequently) the private-futex patches were
  dropped, because the 64-bit futex patches need to be reconstituted.

- the unprivileged mounts code was dropped, pending an updated patch series

- lots of minor fbdev bugs were fixed


Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the
  mm-commits mailing list.

echo "subscribe mm-commits" | mail [EMAIL PROTECTED]

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug.  Instructions for this process are at

http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the
  email Subject: in some manner to reflect the nature of the bug.  Some
  developers filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits list.
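
The "around ten rebuilds and reboots" figure for a bisection search follows
from halving the candidate patch list on every round.  A minimal sketch of
that arithmetic, assuming a series of roughly 1000 patches (the announcement
does not state the actual count) and using the hypothetical helper name
bisect_steps():

```c
/*
 * Worst-case number of build-and-boot rounds needed to isolate one bad
 * patch out of npatches by bisection.  Illustrative only.
 */
unsigned int bisect_steps(unsigned int npatches)
{
	unsigned int steps = 0;

	while (npatches > 1) {
		/* Worst case: the bug is in the larger half. */
		npatches = (npatches + 1) / 2;
		steps++;
	}
	return steps;
}
```

Under that assumption, a series of 1000 patches needs ten rounds, and doubling
the series adds only one more.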



Changes since 2.6.21-rc7-mm1:

 origin.patch
 git-acpi.patch
 git-alsa.patch
 git-agpgart.patch
 git-arm.patch
 git-avr32.patch
 git-cifs.patch
 git-cpufreq.patch
 git-powerpc.patch
 git-drm.patch
 git-dvb.patch
 git-gfs2-nmw.patch
 git-hid.patch
 git-ia64.patch
 git-ieee1394.patch
 git-infiniband.patch
 git-input.patch
 git-jfs.patch
 git-kbuild.patch
 git-kvm.patch
 git-leds.patch
 git-libata-all.patch
 git-md-accel.patch
 git-mips.patch
 git-mmc.patch
 git-mtd.patch
 git-ubi.patch
 git-netdev-all.patch
 git-e1000.patch
 git-net.patch
 git-ioat.patch
 git-nfs-server-cluster-locking-api.patch
 git-ocfs2.patch
 git-parisc.patch
 git-r8169.patch
 git-selinux.patch
 git-pciseg.patch
 git-s390.patch
 git-s390-fixup.patch
 git-sh.patch
 git-scsi-misc.patch
 git-block.patch
 git-watchdog.patch
 git-ipwireless_cs.patch
 git-cryptodev.patch
 git-gccbug.patch

 git trees

-fix-possible-null-pointer-access-in-8250-serial-driver.patch
-fix-oom-killing-processes-wrongly-thought-mpol_bind.patch
-char-mxser_new-fix-recursive-locking.patch
-char-mxser_new-fix-tiocmiwait.patch
-char-mxser-fix-tiocmiwait.patch
-taskstats-fix-the-structure-members-alignment-issue.patch
-maintainers-use-listslinux-foundationorg.patch
-paride-drivers-initialize-spinlocks.patch
-add-mbuesch-to-mailmap.patch
-fix-spelling-in-drivers-video-kconfig.patch
-page-migration-fix-nr_file_pages-accounting.patch
-ieee1394-update-maintainers-database.patch
-v9fs-dont-use-primary-fid-when-removing-file.patch
-acpi-thermal-fix-mod_timer-interval.patch
-allow-reading-tainted-flag-as-user.patch
-do-not-truncate-irq-number-for-icom-adapter.patch
-hwmon-w83627ehf-dont-redefine-region_offset.patch
-reiserfs-fix-xattr-root-locking-refcount-bug.patch
-char-icom-mark-__init-as-__devinit.patch
-fault-injection-add-entry-to-maintainers.patch
-8250-fix-possible-deadlock-between-serial8250_handle_port-and-serial8250_interrupt.patch
-oom-kill-all-threads-that-share-mm-with-killed-task.patch
-fix-x86-fix-potential-overflow-in-perfctr-reservation.patch
-cleanup-cpufreq-kconfig-options.patch
-ppc-pci_32-stop-using-old-hotplug-unsafe-apis.patch
-jdelvare-i2c-i2c-delete-scx200_i2c.patch
-jdelvare-i2c-i2c-obsolete-ixp2000-and-ixp4xx.patch
-jdelvare-hwmon-hwmon-smsc47m1-use-dynamic-attributes.patch
-ide-cmd64x-remove-broken-sw-mw-dma-support.patch
-ide-cmd64x-interrupt-status-fixes-resend.patch
-ide-cmd64x-add-fix-enablebits.patch
-ide-cmd64x-procfs-code-fixes-cleanups.patch
-ide-cmd64x-use-interrupt-status-from-mrdmode-register.patch
-ide-cmd64x-add-back-mwdma-support.patch
-git-netdev-all-baycom_ser_fdx-fix.patch
-fix-sparse-errors-in-drivers-net-ibmvethc.patch
-netdrv-perform-missing-csum_offset-conversions.patch
-x86_64-mm-remove-noreplacement.patch
-fix-x86_64-mm-fam10-mwait-idle.patch
-more-fix-x86_64-mm-fam10-mwait-idle.patch
-fix-x86_64-mm-sched-clock-share.patch

Re: [PATCH -mm] x86_64: kill 19000+ sparse warnings

2007-04-25 Thread Andrew Morton
On Wed, 25 Apr 2007 22:45:09 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> From: Randy Dunlap <[EMAIL PROTECTED]>
> 
> Eliminate 19439 (!!) sparse warnings like:
> include/linux/mm.h:321:22: warning: constant 0x8100 is so big it is unsigned long
> 
> Eliminate 56 sparse warnings like:
> arch/x86_64/kernel/setup.c:248:16: warning: constant 0x8000 is so big it is unsigned long
> 
> Eliminate 5 sparse warnings like:
> arch/x86_64/kernel/module.c:49:13: warning: constant 0xfff0 is so big it is unsigned long
> 
> Eliminate 23 sparse warnings like:
> arch/x86_64/mm/init.c:551:37: warning: constant 0xc200 is so big it is unsigned long
> 
> Eliminate 6 sparse warnings like:
> arch/x86_64/kernel/module.c:49:13: warning: constant 0x8800 is so big it is unsigned long
> 
> Eliminate 23 sparse warnings like:
> arch/x86_64/mm/init.c:552:6: warning: constant 0xe1ff is so big it is unsigned long
> 
> Eliminate 3 sparse warnings like:
> arch/x86_64/kernel/e820.c:186:17: warning: constant 0x3fff is so big it is long
> 
> ...
>
> +#ifdef __ASSEMBLY__
>  #define MAXMEM          0x3fff
>  #define VMALLOC_START   0xc200
>  #define VMALLOC_END     0xe1ff
>  #define MODULES_VADDR   0x8800
>  #define MODULES_END     0xfff0
>  #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
> +#else
> +#define MAXMEM          0x3fffUL
> +#define VMALLOC_START   0xc200UL
> +#define VMALLOC_END     0xe1ffUL
> +#define MODULES_VADDR   0x8800UL
> +#define MODULES_END     0xfff0UL
> +#define MODULES_LEN   (MODULES_END - MODULES_VADDR)
> +#endif
>  

hm, the duplication is unfortunate.

I wonder if it's worth doing a cpp token-pasting trick to avoid having to
do that.
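
One way the token-pasting idea could look, as a minimal sketch and not
necessarily what would be merged: a helper macro pastes the UL suffix onto
the constant when compiling C and leaves the bare number for the assembler,
so each constant is defined once.  The names ADDR_CONST, ADDR_CONST_PASTE and
EXAMPLE_* and the sample values are hypothetical; the _AC(1,UL) usage already
visible in page.h has the same shape.

```c
/*
 * Sketch of a cpp token-pasting helper.  All names and values here are
 * made up for illustration.
 */
#ifdef __ASSEMBLY__
/* The assembler does not understand C integer suffixes: drop them. */
#define ADDR_CONST(x, suffix)		x
#else
/* Two levels, so macro arguments are expanded before being pasted. */
#define ADDR_CONST_PASTE(x, suffix)	(x##suffix)
#define ADDR_CONST(x, suffix)		ADDR_CONST_PASTE(x, suffix)
#endif

/* A single definition now serves both C and assembly users. */
#define EXAMPLE_START	ADDR_CONST(0x80000000, UL)
#define EXAMPLE_END	ADDR_CONST(0x80100000, UL)
#define EXAMPLE_LEN	(EXAMPLE_END - EXAMPLE_START)

unsigned long example_len(void)
{
	return EXAMPLE_LEN;
}
```

With a helper like this the #ifdef lives in one place instead of around every
block of constants.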

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux 2.6.21

2007-04-25 Thread Nick Piggin

Greg KH wrote:
> On Thu, Apr 26, 2007 at 06:08:06AM +0200, Adrian Bunk wrote:
>
>> What I will NOT do:
>> Waste my time with tracking 2.6.22-rc regressions.
>
> I sure hope you don't do this.
>
> Tracking these is tough, and I think you are doing a great job with it.
>
> No release will have no regressions, there's just too many different
> combinations of hardware and sometimes people don't have the time to
> test to see if their original report is even fixed or not.
>
> And some of them will get fixed with patches coming in the next kernel
> release, which will then be tracked down and added to the -stable
> releases.
>
> So if you can, please keep it up, if you think it's a thankless job,
> here's my hearty thanks for doing this work.  It's really needed and I
> really appreciate it.


Fifthed here, Adrian. It could potentially become one of the best things
to happen to the mainline release process (and I believe has already been
worthwhile). Even if it takes a while for people to get on board, or some
regressions slip through. And note, a release with regressions doesn't
make your hard work useless -- you've still got the important who, when,
how, etc. info that can be used in future, and it could serve as a "known
issues for upgraders" document as well.

--
SUSE Labs, Novell Inc.


Re: [00/17] Large Blocksize Support V3

2007-04-25 Thread Eric W. Biederman
Christoph Lameter <[EMAIL PROTECTED]> writes:

> On Wed, 25 Apr 2007, Eric W. Biederman wrote:
>
>> The page cache has no problems supporting things with a block
>> size larger then page size.  Now the block device layer may not
>> have the code to do the scatter gather into small pages and it
>> may not handle buffer heads whose data is split between multiple
>> pages. 
>
> It does have that problem. If a system is in use then memory is fragmented
> and requests to the devices are in 4k sizes. The kernel has to manage the 
> 4k size. The number of requests that the driver can take is limited. 
> Larger blocks allow shuffling more data to the device.

I have a hard time believing that device hardware limits don't allow them
to have enough space to handle larger requests.  If so it was a poor
design by the hardware manufacturers.

>> And generally larger physical pages are a mistake to use.
>> Especially as it looks from some of the later comment you don't
>> date test on 32bit because the memory fragments faster.
>
> Ummm.. Dont get me to comment on i386. I never said that memory fragments 
> faster on i386. i386 has multiple issues with memory management that 
> require a lot of work and that will cause difficulty. If you have these
> fun systems with 512k ZONE_NORMAL and 63GB HIGHMEM then good luck...
>
>> Is it common for hardware that supports large block sizes to not
>> support splitting those blocks apart during DMA?  Unless it is common
>> the whole premise of this patchset seems broken.
>
> Huh? Splitting the blocks requires hardware effort -> Reduction in 
> transfer rate.

Splitting the blocks doesn't change the transfer effort one iota.
The bus pci/pcie/hypertransport already have block sizes below 4KB.
Reading a longer list of descriptors might slow things down, but I would
be surprised.

The physical medium is the primary disk bottleneck.

Thinking about it the fastest thing I can do with a filesystem or disk
is to not use it.  That is to cache it efficiently. Having page sized
chunks in my cache increases my caching efficiency.   Large order
pages work directly against my caching efficiency.

>> I suspect what needs to be fixed is the page cache block device
>> interface so that we have helper functions that know how to stuff
>> a single block into several pages.
>
> Oh we have scores of these hacks around. Look at the dvd/cd layer. The 
> point is to get rid of those.

Perhaps this is just a matter of cleaning them up so they are no
longer hacks?

You are trying to couple something that has no business being coupled
as it reduces the system usability when you couple them.

>> Right now I don't even want to think about trying to use a swap device
>> with a large block size when we are low on memory.
>
> But that is due to the VM (at least Linus tree) having no defrag methods.
> mm has Mel's antifrag methods and can do it.

This is fundamental.  Fragmentation when you have multiple chunk sizes
cannot be solved without the ability to move things in memory,
whereas it doesn't exist when you only have a single chunk size.

>> > 2. 32/64k blocksize is also used in flash devices. Same issues.
>> 
>> flash devices are not block devices so I strongly doubt it is
>> the same issue.
>
> But they could be treated as such. Right now these poor guys have to 
> improvise around the page size limit.

The reason they are different is that they have very different
fundamental properties.  Flash devices have essentially no seek time
so random access is fast.  However they have a maximum number of erases
per sector so you have to be careful to do wear leveling.  Flash
devices are distinctly different, and using the block layer for them
while they do not behave like block devices is the wrong thing to do.

>> > 4. Reduce fsck times. Larger block sizes mean faster file system checking.
>> 
>> Fewer seeks and less meta-data means faster fsck times.  Larger block
>> sizes get us there only tangentially.  
>
> Less meta data to manage does not reduce fsck times? Going from order 0 to 
> order 2 blocks cuts the metadata to a fourth.

I agree that less meta data helps.  But switching to extents can reduce the
meta data much more, and still doesn't penalize you for small files if
you have them.

>> > 5. Performance. If we look at IA64 vs. x86_64 then it seems that the
>> >faster interrupt handling on x86_64 compensate for the speed loss due to
>> >a smaller page size (4k vs 16k on IA64). Supporting larger block sizes
>> > sizes on all allows a significant reduction in I/O overhead and increases
>> >the size of I/O that can be performed by hardware in a single request
>> >since the number of scatter gather entries are typically limited for
>> >one request. This is going to become increasingly important to support
>> >the ever growing memory sizes since we may have to handle excessively
>> >large amounts of 4k requests for data sizes that may become common
>> >soon. For example to write a 1 

Re: pgprot_writecombine() and PATs on x86

2007-04-25 Thread Michael S. Tsirkin
> So in general the pci prefetchable attribute means write-combining as
> well as prefetching is safe.  A sane BIOS will allocate prefetchable
> BARS contiguously in the address space.  So on a good day you
> can just use one MTRR to map all of the prefetchable BARs as write-combining.

Good point, and sounds easy enough.
So why does not linux do it automatically then where possible?

There are sure to be some broken devices, but if some device
can't live with WC, we can always disable WC system-wide.
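
To make the "one MTRR" idea concrete: a variable-range MTRR describes a
power-of-two sized, naturally aligned window, so the question is whether one
such window can enclose all the prefetchable BARs.  The following is a rough
sketch of that calculation only; struct bar_range and covering_window() are
hypothetical names, not kernel interfaces.

```c
#include <stddef.h>
#include <stdint.h>

struct bar_range {		/* hypothetical: one prefetchable BAR */
	uint64_t base;
	uint64_t size;
};

/*
 * Find the smallest naturally aligned power-of-two window (minimum 4k)
 * that encloses every range.  Returns 0 and fills *wbase / *wsize on
 * success, -1 if there are no ranges or no window fits.
 */
int covering_window(const struct bar_range *bars, size_t n,
		    uint64_t *wbase, uint64_t *wsize)
{
	uint64_t lo, hi, size;
	size_t i;

	if (n == 0)
		return -1;

	/* Lowest start and highest end over all ranges. */
	lo = bars[0].base;
	hi = bars[0].base + bars[0].size;
	for (i = 1; i < n; i++) {
		if (bars[i].base < lo)
			lo = bars[i].base;
		if (bars[i].base + bars[i].size > hi)
			hi = bars[i].base + bars[i].size;
	}

	/* Double the window until its aligned start reaches past hi. */
	for (size = 4096; size != 0; size <<= 1) {
		uint64_t start = lo & ~(size - 1);

		if (hi - start <= size) {
			*wbase = start;
			*wsize = size;
			return 0;
		}
	}
	return -1;
}
```

Note that the window may be larger than the BARs it covers, so real code
would also have to check that nothing non-prefetchable sits inside it.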

-- 
MST


[PATCH -mm] x86_64: kill 19000+ sparse warnings

2007-04-25 Thread Randy Dunlap
From: Randy Dunlap <[EMAIL PROTECTED]>

Eliminate 19439 (!!) sparse warnings like:
include/linux/mm.h:321:22: warning: constant 0x8100 is so big it is unsigned long

Eliminate 56 sparse warnings like:
arch/x86_64/kernel/setup.c:248:16: warning: constant 0x8000 is so big it is unsigned long

Eliminate 5 sparse warnings like:
arch/x86_64/kernel/module.c:49:13: warning: constant 0xfff0 is so big it is unsigned long

Eliminate 23 sparse warnings like:
arch/x86_64/mm/init.c:551:37: warning: constant 0xc200 is so big it is unsigned long

Eliminate 6 sparse warnings like:
arch/x86_64/kernel/module.c:49:13: warning: constant 0x8800 is so big it is unsigned long

Eliminate 23 sparse warnings like:
arch/x86_64/mm/init.c:552:6: warning: constant 0xe1ff is so big it is unsigned long

Eliminate 3 sparse warnings like:
arch/x86_64/kernel/e820.c:186:17: warning: constant 0x3fff is so big it is long

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 include/asm-x86_64/page.h|   11 +++
 include/asm-x86_64/pgtable.h |9 +
 2 files changed, 20 insertions(+)

--- linux-2.6.21-rc7-mm1.orig/include/asm-x86_64/page.h
+++ linux-2.6.21-rc7-mm1/include/asm-x86_64/page.h
@@ -80,9 +80,16 @@ extern unsigned long phys_base;
 
 #define __PHYSICAL_START   CONFIG_PHYSICAL_START
 #define __KERNEL_ALIGN 0x20
+
+#ifdef __ASSEMBLY__
 #define __START_KERNEL (__START_KERNEL_map + __PHYSICAL_START)
 #define __START_KERNEL_map 0x8000
 #define __PAGE_OFFSET   0x8100
+#else
+#define __START_KERNEL (__START_KERNEL_map + __PHYSICAL_START)
+#define __START_KERNEL_map 0x8000UL
+#define __PAGE_OFFSET   0x8100UL
+#endif
 
 /* to align the pointer to the (next) page boundary */
 #define PAGE_ALIGN(addr)   (((addr)+PAGE_SIZE-1)&PAGE_MASK)
@@ -94,7 +101,11 @@ extern unsigned long phys_base;
 #define __VIRTUAL_MASK ((_AC(1,UL) << __VIRTUAL_MASK_SHIFT) - 1)
 
 #define KERNEL_TEXT_SIZE  (40*1024*1024)
+#ifdef __ASSEMBLY__
 #define KERNEL_TEXT_START 0x8000
+#else
+#define KERNEL_TEXT_START 0x8000UL
+#endif
 
 #ifndef __ASSEMBLY__
 
--- linux-2.6.21-rc7-mm1.orig/include/asm-x86_64/pgtable.h
+++ linux-2.6.21-rc7-mm1/include/asm-x86_64/pgtable.h
@@ -134,12 +134,21 @@ static inline pte_t ptep_get_and_clear_f
 #define USER_PTRS_PER_PGD  ((TASK_SIZE-1)/PGDIR_SIZE+1)
 #define FIRST_USER_ADDRESS 0
 
+#ifdef __ASSEMBLY__
 #define MAXMEM          0x3fff
 #define VMALLOC_START   0xc200
 #define VMALLOC_END     0xe1ff
 #define MODULES_VADDR   0x8800
 #define MODULES_END     0xfff0
 #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
+#else
+#define MAXMEM          0x3fffUL
+#define VMALLOC_START   0xc200UL
+#define VMALLOC_END     0xe1ffUL
+#define MODULES_VADDR   0x8800UL
+#define MODULES_END     0xfff0UL
+#define MODULES_LEN   (MODULES_END - MODULES_VADDR)
+#endif
 
 #define _PAGE_BIT_PRESENT  0
 #define _PAGE_BIT_RW   1


Re: [00/17] Large Blocksize Support V3

2007-04-25 Thread Nick Piggin

Eric W. Biederman wrote:
> [EMAIL PROTECTED] writes:
>
>> V2->V3
>> - More restructuring
>> - It actually works!
>> - Add XFS support
>> - Fix up UP support
>> - Work out the direct I/O issues
>> - Add CONFIG_LARGE_BLOCKSIZE. Off by default which makes the inlines revert
>>  back to constants. Disabled for 32bit and HIGHMEM configurations.
>>  This also allows a gradual migration to the new page cache
>>  inline functions. LARGE_BLOCKSIZE capabilities can be
>>  added gradually and if there is a problem then we can disable
>>  a subsystem.
>>
>> V1->V2
>> - Some ext2 support
>> - Some block layer, fs layer support etc.
>> - Better page cache macros
>> - Use macros to clean up code.
>>
>> This patchset modifies the Linux kernel so that larger block sizes than
>> page size can be supported. Larger block sizes are handled by using
>> compound pages of an arbitrary order for the page cache instead of
>> single pages with order 0.
>
> Huh?
>
> You seem to be mixing two very different concepts.
>
> The page cache has no problems supporting things with a block
> size larger then page size.  Now the block device layer may not
> have the code to do the scatter gather into small pages and it
> may not handle buffer heads whose data is split between multiple
> pages.

Yeah, this patch is not really large blocksize support (which we normally
think of as block size > page cache size).

> But this is not a page cache issue.
>
> And generally larger physical pages are a mistake to use.
> Especially as it looks from some of the later comment you don't
> date test on 32bit because the memory fragments faster.

I actually completely agree with this, and I'm concerned in general about
using higher order pages. I think it is fundamentally the wrong approach
because of fragmentation and defragmentation costs (similarly to Linus's
take on page colouring).

I think starting with the assumption that we _want_ to use higher order
allocations, and then creating all this complexity around that is not a
good one, and if we start introducing things that _require_ significant
higher order allocations to function then it is a nasty thing for
robustness.

> Is it common for hardware that supports large block sizes to not
> support splitting those blocks apart during DMA?  Unless it is common
> the whole premise of this patchset seems broken.
>
> I suspect what needs to be fixed is the page cache block device
> interface so that we have helper functions that know how to stuff
> a single block into several pages.

I am working now and again on some code to do this, it is a big job but
I think it is the right way to do it. But it would take a long time to
get stable and supported by filesystems...

> That would make the choice of using larger order pages (essentially
> increasing PAGE_SIZE) something that can be investigated in parallel.


I agree that hardware inefficiencies should be handled by increasing
PAGE_SIZE (not making PAGE_CACHE_SIZE > PAGE_SIZE) at the arch level.
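
As an illustration of the "helper functions that know how to stuff a single
block into several pages" idea discussed above, here is a small userspace
sketch.  It only computes which piece of a block lands in which page; the
struct sg_entry type, the block_to_sg() name and the 4k page size are
assumptions made for the example, not existing kernel interfaces.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4k pages */

struct sg_entry {		/* hypothetical scatter-gather element */
	void *page;		/* page that holds this piece */
	size_t offset;		/* where the piece starts in the page */
	size_t len;		/* length of the piece in bytes */
};

/*
 * Describe a block of block_size bytes, starting block_off bytes into
 * the (not necessarily contiguous) pages[] array, as per-page pieces.
 * Returns the number of entries written, or 0 if sg[] is too small.
 */
size_t block_to_sg(void **pages, size_t block_off, size_t block_size,
		   struct sg_entry *sg, size_t max_entries)
{
	size_t idx = block_off / SKETCH_PAGE_SIZE;
	size_t off = block_off % SKETCH_PAGE_SIZE;
	size_t n = 0;

	while (block_size > 0) {
		size_t len = SKETCH_PAGE_SIZE - off;

		if (len > block_size)
			len = block_size;
		if (n == max_entries)
			return 0;

		sg[n].page = pages[idx];
		sg[n].offset = off;
		sg[n].len = len;
		n++;

		block_size -= len;
		idx++;
		off = 0;	/* later pieces start at a page boundary */
	}
	return n;
}
```

A 16k block then becomes four order-0 entries, with no higher order
allocation involved.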

--
SUSE Labs, Novell Inc.


[patch 7/7] SLUB: Major slabinfo update

2007-04-25 Thread clameter
Enhancement to slabinfo
- Support for slab shrinking (-r option)
- Slab summary showing system totals
- Sync with new form of alias handling
- Sort by size, reverse sorting etc
- Alias lookups
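
For readers unfamiliar with the pattern, sorting by name or by size with an
optional inverted order can be done with a single qsort() comparator.  This
is a generic sketch, not the code from the patch (the comparator is outside
the part of the diff shown here); struct slab_entry and the function names
are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct slab_entry {		/* hypothetical, much reduced slab record */
	const char *name;
	unsigned long size;
};

/* qsort() has no context argument, so the options live in globals. */
static int sort_by_size;
static int inverted;

static int compare_slabs(const void *a, const void *b)
{
	const struct slab_entry *x = a, *y = b;
	int result;

	if (sort_by_size)
		result = (x->size > y->size) - (x->size < y->size);
	else
		result = strcmp(x->name, y->name);

	return inverted ? -result : result;
}

void sort_slab_entries(struct slab_entry *s, size_t n, int by_size, int invert)
{
	sort_by_size = by_size;
	inverted = invert;
	qsort(s, n, sizeof(*s), compare_slabs);
}
```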

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc7-mm1/Documentation/vm/slabinfo.c
===
--- linux-2.6.21-rc7-mm1.orig/Documentation/vm/slabinfo.c   2007-04-25 21:20:24.0 -0700
+++ linux-2.6.21-rc7-mm1/Documentation/vm/slabinfo.c    2007-04-25 21:46:40.0 -0700
@@ -3,7 +3,7 @@
  *
  * (C) 2007 sgi, Christoph Lameter <[EMAIL PROTECTED]>
  *
- * Compile by doing:
+ * Compile by:
  *
  * gcc -o slabinfo slabinfo.c
  */
@@ -17,15 +17,47 @@
 #include 
 #include 
 
+#define MAX_SLABS 500
+#define MAX_ALIASES 500
+#define MAX_NODES 1024
+
+struct slabinfo {
+   char *name;
+   int alias;
+   int refs;
+   int aliases, align, cache_dma, cpu_slabs, destroy_by_rcu;
+   int hwcache_align, object_size, objs_per_slab;
+   int sanity_checks, slab_size, store_user, trace;
+   int order, poison, reclaim_account, red_zone;
+   unsigned long partial, objects, slabs;
+   int numa[MAX_NODES];
+   int numa_partial[MAX_NODES];
+} slabinfo[MAX_SLABS];
+
+struct aliasinfo {
+   char *name;
+   char *ref;
+   struct slabinfo *slab;
+} aliasinfo[MAX_ALIASES];
+
+int slabs = 0;
+int aliases = 0;
+int highest_node = 0;
+
 char buffer[4096];
 
 int show_alias = 0;
 int show_slab = 0;
-int show_parameters = 0;
 int skip_zero = 1;
 int show_numa = 0;
 int show_track = 0;
+int show_first_alias = 0;
 int validate = 0;
+int shrink = 0;
+int show_inverted = 0;
+int show_single_ref = 0;
+int show_totals = 0;
+int sort_size = 0;
 
 int page_size;
 
@@ -47,11 +79,16 @@ void usage(void)
"-a|--aliases   Show aliases\n"
"-h|--help  Show usage information\n"
"-n|--numa  Show NUMA information\n"
-   "-p|--parametersShow global parameters\n"
+   "-r|--reduceShrink slabs\n"
"-v|--validate  Validate slabs\n"
"-t|--tracking  Show alloc/free information\n"
+   "-T|--TotalsShow summary information\n"
"-s|--slabs Show slabs\n"
+   "-S|--Size  Sort by size\n"
"-z|--zero  Include empty slabs\n"
+   "-f|--first-alias   Show first alias\n"
+   "-i|--inverted  Inverted list\n"
+   "-1|--1ref  Single reference\n"
);
 }
 
@@ -86,23 +123,32 @@ unsigned long get_obj(char *name)
 unsigned long get_obj_and_str(char *name, char **x)
 {
unsigned long result = 0;
+   char *p;
+
+   *x = NULL;
 
if (!read_obj(name)) {
x = NULL;
return 0;
}
-   result = strtoul(buffer, x, 10);
-   while (**x == ' ')
-   (*x)++;
+   result = strtoul(buffer, &p, 10);
+   while (*p == ' ')
+   p++;
+   if (*p)
+   *x = strdup(p);
return result;
 }
 
-void set_obj(char *name, int n)
+void set_obj(struct slabinfo *s, char *name, int n)
 {
-   FILE *f = fopen(name, "w");
+   char x[100];
+
+   sprintf(x, "%s/%s", s->name, name);
+
+   FILE *f = fopen(x, "w");
 
if (!f)
-   fatal("Cannot write to %s\n", name);
+   fatal("Cannot write to %s\n", x);
 
fprintf(f, "%d\n", n);
fclose(f);
@@ -143,167 +189,613 @@ int store_size(char *buffer, unsigned lo
return n;
 }
 
-void alias(const char *name)
+void decode_numa_list(int *numa, char *t)
 {
-   int count;
-   char *p;
-
-   if (!show_alias)
-   return;
+   int node;
+   int nr;
 
-   count = readlink(name, buffer, sizeof(buffer));
+   memset(numa, 0, MAX_NODES * sizeof(int));
 
-   if (count < 0)
-   return;
+   while (*t == 'N') {
+   t++;
+   node = strtoul(t, &t, 10);
+   if (*t == '=') {
+   t++;
+   nr = strtoul(t, &t, 10);
+   numa[node] = nr;
+   if (node > highest_node)
+   highest_node = node;
+   }
+   while (*t == ' ')
+   t++;
+   }
+}
 
-   buffer[count] = 0;
+char *hackname(struct slabinfo *s)
+{
+   char *n = s->name;
 
-   p = buffer + count;
+   if (n[0] == ':') {
+   char *nn = malloc(20);
+   char *p;
+
+   strncpy(nn, n, 20);
+   n = nn;
+   p = n + 4;
+   while (*p && *p !=':')
+   p++;
+   *p = 0;
+   }
+   return n;
+}
 
-   while (p > buffer && p[-1] != '/')
- 

[patch 6/7] SLUB: Free slabs and sort partial slab lists in kmem_cache_shrink

2007-04-25 Thread clameter
At kmem_cache_shrink, check whether there are any empty slabs on the partial
lists; if so, remove them.

Also--as an anti-fragmentation measure--sort the partial slabs so that
the most fully allocated ones come first and the least allocated last.

The next allocations may fill up the nearly full slabs. Having the
least allocated slabs last gives them the maximum chance that their
remaining objects may be freed. Thus we can hopefully minimize the
partial slabs.
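
The ordering described above is essentially a bucket sort keyed on the
number of objects in use.  A userspace sketch of that idea, working on plain
arrays instead of list_heads; shrink_order() and OBJS_PER_SLAB are
hypothetical and only illustrate the ordering, not the locking or the list
handling:

```c
#include <stddef.h>

#define OBJS_PER_SLAB 8		/* assumption: objects per slab */

/*
 * inuse[i] is the number of allocated objects in partial slab i
 * (0 <= inuse[i] <= OBJS_PER_SLAB).  Writes the indexes of the slabs
 * that are kept into out[], fullest first, and drops empty slabs.
 * Returns the number of slabs kept.
 */
size_t shrink_order(const int *inuse, size_t n, int *out)
{
	size_t count[OBJS_PER_SLAB + 1] = { 0 };
	size_t start[OBJS_PER_SLAB + 1];
	size_t i, pos = 0;
	int k;

	/* One bucket per possible in-use count. */
	for (i = 0; i < n; i++)
		count[inuse[i]]++;

	/* Lay the buckets out fullest first; bucket 0 is discarded. */
	for (k = OBJS_PER_SLAB; k > 0; k--) {
		start[k] = pos;
		pos += count[k];
	}

	for (i = 0; i < n; i++) {
		k = inuse[i];
		if (k > 0)
			out[start[k]++] = (int)i;
	}
	return pos;
}
```

Because the key range is bounded by the number of objects per slab, the
reordering is linear in the number of partial slabs.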

I think this is the best one can do in terms of antifragmentation
measures. Real defragmentation (meaning moving objects out of slabs with
the least free objects to those that are almost full) can be implemented
by reverse scanning through the list produced here but that would mean
that we need to provide a callback at slab cache creation that allows
the deletion or moving of an object. This will involve slab API
changes so defer for now.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/slub.c |  118 ++
 1 file changed, 104 insertions(+), 14 deletions(-)

Index: linux-2.6.21-rc7-mm1/mm/slub.c
===
--- linux-2.6.21-rc7-mm1.orig/mm/slub.c 2007-04-25 21:25:48.0 -0700
+++ linux-2.6.21-rc7-mm1/mm/slub.c  2007-04-25 21:27:07.0 -0700
@@ -109,9 +109,19 @@
 /* Enable to test recovery from slab corruption on boot */
 #undef SLUB_RESILIENCY_TEST
 
-/* Mininum number of partial slabs */
+/*
+ * Minimum number of partial slabs. These will be left on the partial
+ * lists even if they are empty. kmem_cache_shrink may reclaim them.
+ */
 #define MIN_PARTIAL 2
 
+/*
+ * Maximum number of desirable partial slabs.
+ * The existence of more partial slabs makes kmem_cache_shrink
+ * sort the partial list by the number of objects in use.
+ */
+#define MAX_PARTIAL 10
+
 #define DEBUG_DEFAULT_FLAGS (SLAB_DEBUG_FREE | SLAB_RED_ZONE | \
SLAB_POISON | SLAB_STORE_USER)
 /*
@@ -2163,6 +2173,78 @@ void kfree(const void *x)
 }
 EXPORT_SYMBOL(kfree);
 
+/*
+ *  kmem_cache_shrink removes empty slabs from the partial lists
+ *  and then sorts the partially allocated slabs by the number
+ *  of items in use. The slabs with the most items in use
+ *  come first. New allocations will remove these from the
+ *  partial list because they are full. The slabs with the
+ *  least items are placed last. If it happens that the objects
+ *  are freed then the page can be returned to the page allocator.
+ */
+int kmem_cache_shrink(struct kmem_cache *s)
+{
+   int node;
+   int i;
+   struct kmem_cache_node *n;
+   struct page *page;
+   struct page *t;
+   struct list_head *slabs_by_inuse =
+   kmalloc(sizeof(struct list_head) * s->objects, GFP_KERNEL);
+   unsigned long flags;
+
+   if (!slabs_by_inuse)
+   return -ENOMEM;
+
+   flush_all(s);
+   for_each_online_node(node) {
+   n = get_node(s, node);
+
+   if (n->nr_partial <= MIN_PARTIAL)
+   continue;
+
+   for (i = 0; i < s->objects; i++)
+   INIT_LIST_HEAD(slabs_by_inuse + i);
+
+   spin_lock_irqsave(&n->list_lock, flags);
+
+   /*
+* Build lists indexed by the items in use in
+* each slab or free slabs if empty.
+*
+* Note that concurrent frees may occur while
+* we hold the list_lock. page->inuse here is
+* the upper limit.
+*/
+   list_for_each_entry_safe(page, t, &n->partial, lru) {
+   if (!page->inuse) {
+   list_del(&page->lru);
+   discard_slab(s, page);
+   } else
+   if (n->nr_partial > MAX_PARTIAL)
+   list_move(&page->lru,
+   slabs_by_inuse + page->inuse);
+   }
+
+   if (n->nr_partial <= MAX_PARTIAL)
+   goto out;
+
+   /*
+* Rebuild the partial list with the slabs filled up
+* most first and the least used slabs at the end.
+*/
+   for (i = s->objects - 1; i > 0; i--)
+   list_splice(slabs_by_inuse + i, n->partial.prev);
+
+   out:
+   spin_unlock_irqrestore(&n->list_lock, flags);
+   }
+
+   kfree(slabs_by_inuse);
+   return 0;
+}
+EXPORT_SYMBOL(kmem_cache_shrink);
+
 /**
  * krealloc - reallocate memory. The contents will remain unchanged.
  *
@@ -2408,17 +2490,6 @@ static struct notifier_block __cpuinitda
 
 #endif
 
-/***
- * Compatiblility definitions
- **/
-
-int kmem_cache_shrink(struct kmem_cache *s)
-{
-   

Re: Question about Reiser4

2007-04-25 Thread lkml777

On Wed, 25 Apr 2007 22:49:11 +0800, "Jeff Chua"
<[EMAIL PROTECTED]> said:
> 
> Reiser4 has great potential and I'll be more than happy to test it.
> 
Yeah,... let us know the details of your testing.
-- 
  
  [EMAIL PROTECTED]



[patch 4/7] SLUB: Conform more to SLABs SLAB_HWCACHE_ALIGN behavior

2007-04-25 Thread clameter
Currently SLUB is using a strict L1_CACHE_BYTES alignment if
SLAB_HWCACHE_ALIGN is specified. SLAB does not align to a cacheline if the
object is smaller than half of a cacheline. Small objects are then aligned
by SLAB to a fraction of a cacheline.

Make SLUB just forget about the alignment requirement if the object size
is less than L1_CACHE_BYTES. It seems that fractional alignments are no
good because they grow the object and reduce the object density in a cache
line needlessly causing additional cache line fetches.

If we are already throwing the user suggestion of a cache line alignment
away then lets do the best we can. Maybe SLAB_HWCACHE_ALIGN also needs
to be tossed given its wishy-washy handling but doing so would require
an audit of all kmem_cache_allocs throughout the kernel source.

In any case one needs to explicitly specify an alignment during
kmem_cache_create to either slab allocator in order to ensure that the
objects are cacheline aligned.
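
The rule implemented by the hunk below can be tried out in userspace with a
sketch like the following.  It mirrors the check in the diff (cache line
alignment only when the object is larger than half a cache line); the
constant values and the sketch_ names are assumptions, and since the hunk
does not show what follows the minimum-alignment check, the sketch simply
returns the minimum there.

```c
#define SKETCH_HWCACHE_ALIGN	0x1UL	/* hypothetical flag bit */
#define SKETCH_CACHE_BYTES	64UL	/* assumption: 64-byte cache line */
#define SKETCH_MINALIGN		8UL	/* assumption: minimum alignment */

unsigned long sketch_alignment(unsigned long flags, unsigned long align,
			       unsigned long size)
{
	/*
	 * Honour the cache line request only for sufficiently large
	 * objects, and never reduce an explicitly requested alignment.
	 */
	if ((flags & SKETCH_HWCACHE_ALIGN) && size > SKETCH_CACHE_BYTES / 2)
		return align > SKETCH_CACHE_BYTES ? align : SKETCH_CACHE_BYTES;

	if (align < SKETCH_MINALIGN)
		return SKETCH_MINALIGN;

	return align;
}
```

With these numbers a 24-byte object stays at the minimum alignment even with
the flag set, while a 40-byte object is placed on a cache line boundary.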

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc7-mm1/mm/slub.c
===
--- linux-2.6.21-rc7-mm1.orig/mm/slub.c 2007-04-25 21:23:56.0 -0700
+++ linux-2.6.21-rc7-mm1/mm/slub.c  2007-04-25 21:23:59.0 -0700
@@ -1482,9 +1482,19 @@ static int calculate_order(int size)
  * various ways of specifying it.
  */
 static unsigned long calculate_alignment(unsigned long flags,
-   unsigned long align)
+   unsigned long align, unsigned long size)
 {
-   if (flags & SLAB_HWCACHE_ALIGN)
+   /*
+* If the user wants hardware cache aligned objects then
+* follow that suggestion if the object is sufficiently
+* large.
+*
+* The hardware cache alignment cannot override the
+* specified alignment though. If that is greater
+* then use it.
+*/
+   if ((flags & SLAB_HWCACHE_ALIGN) &&
+   size > L1_CACHE_BYTES / 2)
return max_t(unsigned long, align, L1_CACHE_BYTES);
 
if (align < ARCH_SLAB_MINALIGN)
@@ -1673,7 +1683,7 @@ static int calculate_sizes(struct kmem_c
 * user specified (this is unecessarily complex due to the attempt
 * to be compatible with SLAB. Should be cleaned up some day).
 */
-   align = calculate_alignment(flags, align);
+   align = calculate_alignment(flags, align, s->objsize);
 
/*
 * SLUB stores one object immediately after another beginning from
@@ -2250,7 +2260,7 @@ static struct kmem_cache *find_mergeable
return NULL;
 
size = ALIGN(size, sizeof(void *));
-   align = calculate_alignment(flags, align);
+   align = calculate_alignment(flags, align, size);
size = ALIGN(size, align);
 
list_for_each(h, &slab_caches) {

--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
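
The alignment rule from the patch above can be tried out in user space.
What follows is only a sketch under stated assumptions: the values of
L1_CACHE_BYTES, ARCH_SLAB_MINALIGN and the SLAB_HWCACHE_ALIGN bit are made
up for illustration, and the tail of the function (everything after the
ARCH_SLAB_MINALIGN test) is not visible in the hunk, so it is an assumption
about how the function ends.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the kernel takes these from the architecture. */
#define L1_CACHE_BYTES      64UL
#define ARCH_SLAB_MINALIGN  8UL
#define SLAB_HWCACHE_ALIGN  0x2000UL    /* hypothetical flag bit */

static unsigned long max_ul(unsigned long a, unsigned long b)
{
    return a > b ? a : b;
}

/* Round x up to a multiple of a; a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

static unsigned long calculate_alignment(unsigned long flags,
                                         unsigned long align,
                                         unsigned long size)
{
    /*
     * Honor the cache line hint only for objects larger than half a
     * cache line. An explicit, larger alignment still wins.
     */
    if ((flags & SLAB_HWCACHE_ALIGN) && size > L1_CACHE_BYTES / 2)
        return max_ul(align, L1_CACHE_BYTES);

    /* Assumed tail: fall back to the architecture minimum ... */
    if (align < ARCH_SLAB_MINALIGN)
        return ARCH_SLAB_MINALIGN;

    /* ... or round the requested alignment up to pointer size. */
    return align_up(align, sizeof(void *));
}
```

With these assumed constants a 40 byte object with the hint set gets 64
byte alignment, while a 32 byte object with the same hint falls through to
the minimum alignment, which is the behavior the changelog describes for
small objects.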


[patch 3/7] SLUB: debug printk cleanup

2007-04-25 Thread clameter
Set up a new function slab_err in order to report errors consistently.

Consistently report corrective actions taken by SLUB by a printk starting
with @@@.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc7-mm1/mm/slub.c
===
--- linux-2.6.21-rc7-mm1.orig/mm/slub.c 2007-04-25 21:20:36.0 -0700
+++ linux-2.6.21-rc7-mm1/mm/slub.c  2007-04-25 21:22:50.0 -0700
@@ -324,8 +324,8 @@ static void object_err(struct kmem_cache
 {
u8 *addr = page_address(page);
 
-   printk(KERN_ERR "*** SLUB: %s in [EMAIL PROTECTED] slab 0x%p\n",
-   reason, s->name, object, page);
+   printk(KERN_ERR "*** SLUB %s: [EMAIL PROTECTED] slab 0x%p\n",
+   s->name, reason, object, page);
printk(KERN_ERR "offset=%tu flags=0x%04lx inuse=%u freelist=0x%p\n",
object - addr, page->flags, page->inuse, page->freelist);
if (object > addr + 16)
@@ -335,6 +335,19 @@ static void object_err(struct kmem_cache
dump_stack();
 }
 
+static void slab_err(struct kmem_cache *s, struct page *page, char *reason, ...)
+{
+   va_list args;
+   char buf[100];
+
+   va_start(args, reason);
+   vsnprintf(buf, sizeof(buf), reason, args);
+   va_end(args);
+   printk(KERN_ERR "*** SLUB %s: %s in slab @0x%p\n", s->name, buf,
+   page);
+   dump_stack();
+}
+
 static void init_object(struct kmem_cache *s, void *object, int active)
 {
u8 *p = object;
@@ -412,7 +425,7 @@ static int check_valid_pointer(struct km
 static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
void *from, void *to)
 {
-   printk(KERN_ERR "@@@ SLUB: %s Restoring %s (0x%x) from 0x%p-0x%p\n",
+   printk(KERN_ERR "@@@ SLUB %s: Restoring %s (0x%x) from 0x%p-0x%p\n",
s->name, message, data, from, to - 1);
memset(from, data, to - from);
 }
@@ -459,9 +472,7 @@ static int slab_pad_check(struct kmem_ca
return 1;
 
if (!check_bytes(p + length, POISON_INUSE, remainder)) {
-   printk(KERN_ERR "SLUB: %s slab 0x%p: Padding fails check\n",
-   s->name, p);
-   dump_stack();
+   slab_err(s, page, "Padding check failed");
restore_bytes(s, "slab padding", POISON_INUSE, p + length,
p + length + remainder);
return 0;
@@ -547,30 +558,25 @@ static int check_slab(struct kmem_cache 
VM_BUG_ON(!irqs_disabled());
 
if (!PageSlab(page)) {
-   printk(KERN_ERR "SLUB: %s Not a valid slab page @0x%p "
-   "flags=%lx mapping=0x%p count=%d \n",
-   s->name, page, page->flags, page->mapping,
+   slab_err(s, page, "Not a valid slab page flags=%lx "
+   "mapping=0x%p count=%d", page->flags, page->mapping,
page_count(page));
return 0;
}
if (page->offset * sizeof(void *) != s->offset) {
-   printk(KERN_ERR "SLUB: %s Corrupted offset %lu in slab @0x%p"
-   " flags=0x%lx mapping=0x%p count=%d\n",
-   s->name,
+   slab_err(s, page, "Corrupted offset %lu flags=0x%lx "
+   "mapping=0x%p count=%d",
(unsigned long)(page->offset * sizeof(void *)),
-   page,
page->flags,
page->mapping,
page_count(page));
-   dump_stack();
return 0;
}
if (page->inuse > s->objects) {
-   printk(KERN_ERR "SLUB: %s inuse %u > max %u in slab "
-   "page @0x%p flags=%lx mapping=0x%p count=%d\n",
-   s->name, page->inuse, s->objects, page, page->flags,
+   slab_err(s, page, "inuse %u > max %u @0x%p flags=%lx "
+   "mapping=0x%p count=%d",
+   s->name, page->inuse, s->objects, page->flags,
page->mapping, page_count(page));
-   dump_stack();
return 0;
}
/* Slab_pad_check fixes things up after itself */
@@ -599,12 +605,13 @@ static int on_freelist(struct kmem_cache
set_freepointer(s, object, NULL);
break;
} else {
-   printk(KERN_ERR "SLUB: %s slab 0x%p "
-   "freepointer 0x%p corrupted.\n",
-   s->name, page, fp);
-   dump_stack();
+   slab_err(s, page, "Freepointer 0x%p corrupt",
+   fp);
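
The reporting pattern that slab_err introduces (format the variable part
into a fixed buffer, then emit one line with a common prefix) can be
sketched in user space as follows. This is not the kernel function: struct
cache and format_slab_err are hypothetical names, and the result is written
to a caller-supplied buffer instead of going through printk and
dump_stack.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct kmem_cache; only the name is used. */
struct cache {
    const char *name;
};

/*
 * Build "*** SLUB <cache>: <reason> in slab @<page>" in out.
 * Returns what snprintf returns: the length the full line would have.
 */
static int format_slab_err(char *out, size_t outlen, const struct cache *s,
                           void *page, const char *reason, ...)
{
    va_list args;
    char buf[100];    /* same fixed size as in the patch; longer text is cut */

    assert(out != NULL && s != NULL && reason != NULL);

    va_start(args, reason);
    vsnprintf(buf, sizeof(buf), reason, args);
    va_end(args);

    return snprintf(out, outlen, "*** SLUB %s: %s in slab @%p",
                    s->name, buf, page);
}
```

Because the reason is an ordinary printf-style format, every caller has to
keep its argument list in step with its conversion specifiers; a mismatch
is not caught by a helper like this one.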

[patch 2/7] SLAB: Fix sysfs directory handling

2007-04-25 Thread clameter
This fixes the problem that SLUB does not track the names of aliased
slabs by changing the way that SLUB manages the files in /sys/slab.

If the slab that is being operated on is not mergeable (usually the
case if we are debugging) then do not create any aliases. If an alias
exists that we conflict with then remove it before creating the
directory for the unmergeable slab. If there is a true slab cache there
and not an alias then we fail since there is a true duplication of
slab cache names. So debugging allows the detection of slab name
duplication as usual.

If the slab is mergeable then we create a directory with a unique name
created from the slab size, slab options and the pointer to the kmem_cache
structure (disambiguation). All names referring to the slabs will
then be created as symlinks to that unique name. These symlinks are
not going to be removed on kmem_cache_destroy() since we only carry
a counter for the number of aliases. If a new symlink is created
then it may just replace an existing one. This means that one can create
a gazillion slabs with the same name (if they all refer to mergeable
caches). It will only increase the alias count. So we have the potential
of not detecting duplicate slab names (there is actually no harm
done by doing that). We will detect the duplications as
soon as debugging is enabled because we will then no longer
generate symlinks and special unique names.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc7-mm1/mm/slub.c
===
--- linux-2.6.21-rc7-mm1.orig/mm/slub.c 2007-04-25 19:41:23.0 -0700
+++ linux-2.6.21-rc7-mm1/mm/slub.c  2007-04-25 19:41:23.0 -0700
@@ -3297,16 +3297,68 @@ static struct kset_uevent_ops slab_ueven
 
 decl_subsys(slab, &slab_ktype, &slab_uevent_ops);
 
+#define ID_STR_LENGTH 64
+
+/* Create a unique string id for a slab cache:
+ * format
+ * :[flags-]size:[memory address of kmemcache]
+ */
+static char *create_unique_id(struct kmem_cache *s)
+{
+   char *name = kmalloc(ID_STR_LENGTH, GFP_KERNEL);
+   char *p = name;
+
+   BUG_ON(!name);
+
+   *p++ = ':';
+   /*
+* First flags affecting slabcache operations */
+   if (s->flags & SLAB_CACHE_DMA)
+   *p++ = 'd';
+   if (s->flags & SLAB_RECLAIM_ACCOUNT)
+   *p++ = 'a';
+   if (s->flags & SLAB_DESTROY_BY_RCU)
+   *p++ = 'r';\
+   /* Debug flags */
+   if (s->flags & SLAB_RED_ZONE)
+   *p++ = 'Z';
+   if (s->flags & SLAB_POISON)
+   *p++ = 'P';
+   if (s->flags & SLAB_STORE_USER)
+   *p++ = 'U';
+   if (p != name + 1)
+   *p++ = '-';
+   p += sprintf(p,"%07d:0x%p" ,s->size, s);
+   BUG_ON(p > name + ID_STR_LENGTH - 1);
+   return name;
+}
+
 static int sysfs_slab_add(struct kmem_cache *s)
 {
int err;
+   const char *name;
 
if (slab_state < SYSFS)
/* Defer until later */
return 0;
 
+   if (s->flags & SLUB_NEVER_MERGE) {
+   /*
+* Slabcache can never be merged so we can use the name proper.
+* This is typically the case for debug situations. In that
+* case we can catch duplicate names easily.
+*/
+   sysfs_remove_link(&slab_subsys.kset.kobj, s->name);
+   name = s->name;
+   } else
+   /*
+* Create a unique name for the slab as a target
+* for the symlinks.
+*/
+   name = create_unique_id(s);
+
kobj_set_kset_s(s, slab_subsys);
-   kobject_set_name(&s->kobj, s->name);
+   kobject_set_name(&s->kobj, name);
kobject_init(&s->kobj);
err = kobject_add(&s->kobj);
if (err)
@@ -3316,6 +3368,10 @@ static int sysfs_slab_add(struct kmem_ca
if (err)
return err;
kobject_uevent(&s->kobj, KOBJ_ADD);
+   if (!(s->flags & SLUB_NEVER_MERGE)) {
+   sysfs_slab_alias(s, s->name);
+   kfree(name);
+   }
return 0;
 }
 
@@ -3341,9 +3397,14 @@ static int sysfs_slab_alias(struct kmem_
 {
struct saved_alias *al;
 
-   if (slab_state == SYSFS)
+   if (slab_state == SYSFS) {
+   /*
+* If we have a leftover link then remove it.
+*/
+   sysfs_remove_link(&slab_subsys.kset.kobj, name);
return sysfs_create_link(&slab_subsys.kset.kobj,
&s->kobj, name);
+   }
 
al = kmalloc(sizeof(struct saved_alias), GFP_KERNEL);
if (!al)

--
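
The naming scheme for mergeable caches described above (":", optional flag
letters and "-", the zero-padded size, ":" and the address of the cache)
can be sketched in user space like this. The flag bits and struct cache are
hypothetical stand-ins, malloc and snprintf replace kmalloc and sprintf,
and the pointer is printed with plain %p because the C library already
chooses the pointer format.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define ID_STR_LENGTH 64

/* Hypothetical flag bits, for illustration only. */
enum {
    F_CACHE_DMA       = 1 << 0,
    F_RECLAIM_ACCOUNT = 1 << 1,
    F_DESTROY_BY_RCU  = 1 << 2,
    F_RED_ZONE        = 1 << 3,
    F_POISON          = 1 << 4,
    F_STORE_USER      = 1 << 5
};

/* Hypothetical stand-in for struct kmem_cache. */
struct cache {
    unsigned int flags;
    int size;
};

/* Returns a malloc'ed id string that the caller must free, or NULL. */
static char *create_unique_id(struct cache *s)
{
    char *name = malloc(ID_STR_LENGTH);
    char *p = name;
    int n;

    if (!name)
        return NULL;

    *p++ = ':';
    /* Flags affecting cache operation come first ... */
    if (s->flags & F_CACHE_DMA)
        *p++ = 'd';
    if (s->flags & F_RECLAIM_ACCOUNT)
        *p++ = 'a';
    if (s->flags & F_DESTROY_BY_RCU)
        *p++ = 'r';
    /* ... followed by the debug flags. */
    if (s->flags & F_RED_ZONE)
        *p++ = 'Z';
    if (s->flags & F_POISON)
        *p++ = 'P';
    if (s->flags & F_STORE_USER)
        *p++ = 'U';
    /* Separate the flag letters from the size only if any were written. */
    if (p != name + 1)
        *p++ = '-';

    n = snprintf(p, (size_t)(ID_STR_LENGTH - (p - name)), "%07d:%p",
                 s->size, (void *)s);
    assert(n > 0 && n < ID_STR_LENGTH - (p - name));
    return name;
}
```

Two caches with the same size and flags still get different ids because
the address differs, which is what lets several user-visible names point
at distinct directories via symlinks.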


[patch 0/7] SLUB updates

2007-04-25 Thread clameter
A series of updates to slub to make error reporting and recovery
more consistent. Rework sysfs behavior, make kmem_cache_shrink
perform fragmentation avoidance and update the slabinfo tool.

--


[patch 1/7] SLUB: Remove duplicate VM_BUG_ON

2007-04-25 Thread clameter
Somehow this artifact got in during merge with mm.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc7-mm1/mm/slub.c
===
--- linux-2.6.21-rc7-mm1.orig/mm/slub.c 2007-04-25 09:48:40.0 -0700
+++ linux-2.6.21-rc7-mm1/mm/slub.c  2007-04-25 09:48:47.0 -0700
@@ -633,8 +633,6 @@ static void add_full(struct kmem_cache *
 
VM_BUG_ON(!irqs_disabled());
 
-   VM_BUG_ON(!irqs_disabled());
-
if (!(s->flags & SLAB_STORE_USER))
return;
 

--


[patch 5/7] SLUB: Add MIN_PARTIAL

2007-04-25 Thread clameter
We leave a minimum of partial slabs on nodes when we search for
partial slabs on other nodes. Define a constant for that value.

Then modify slub to keep MIN_PARTIAL slabs around.

This avoids bad situations where a function frees the last object
in a slab (which results in the page being returned to the page
allocator) only to then allocate one again (which requires getting
a page back from the page allocator if the partial list was empty).
Keeping a couple of slabs on the partial list reduces overhead.

Empty slabs are added to the end of the partial list to ensure that
partially allocated slabs are consumed first (defragmentation).

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc7-mm1/mm/slub.c
===
--- linux-2.6.21-rc7-mm1.orig/mm/slub.c 2007-04-25 21:23:59.0 -0700
+++ linux-2.6.21-rc7-mm1/mm/slub.c  2007-04-25 21:25:48.0 -0700
@@ -109,6 +109,9 @@
 /* Enable to test recovery from slab corruption on boot */
 #undef SLUB_RESILIENCY_TEST
 
+/* Mininum number of partial slabs */
+#define MIN_PARTIAL 2
+
 #define DEBUG_DEFAULT_FLAGS (SLAB_DEBUG_FREE | SLAB_RED_ZONE | \
SLAB_POISON | SLAB_STORE_USER)
 /*
@@ -635,16 +638,8 @@ static int on_freelist(struct kmem_cache
 /*
  * Tracking of fully allocated slabs for debugging
  */
-static void add_full(struct kmem_cache *s, struct page *page)
+static void add_full(struct kmem_cache_node *n, struct page *page)
 {
-   struct kmem_cache_node *n;
-
-   VM_BUG_ON(!irqs_disabled());
-
-   if (!(s->flags & SLAB_STORE_USER))
-   return;
-
-   n = get_node(s, page_to_nid(page));
spin_lock(&n->list_lock);
list_add(&page->lru, &n->full);
spin_unlock(&n->list_lock);
@@ -923,10 +918,16 @@ static __always_inline int slab_trylock(
 /*
  * Management of partially allocated slabs
  */
-static void add_partial(struct kmem_cache *s, struct page *page)
+static void add_partial_tail(struct kmem_cache_node *n, struct page *page)
 {
-   struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+   spin_lock(&n->list_lock);
+   n->nr_partial++;
+   list_add_tail(&page->lru, &n->partial);
+   spin_unlock(&n->list_lock);
+}
 
+static void add_partial(struct kmem_cache_node *n, struct page *page)
+{
spin_lock(&n->list_lock);
n->nr_partial++;
list_add(&page->lru, &n->partial);
@@ -1026,7 +1027,7 @@ static struct page *get_any_partial(stru
n = get_node(s, zone_to_nid(*z));
 
if (n && cpuset_zone_allowed_hardwall(*z, flags) &&
-   n->nr_partial > 2) {
+   n->nr_partial > MIN_PARTIAL) {
page = get_partial_node(n);
if (page)
return page;
@@ -1060,15 +1061,31 @@ static struct page *get_partial(struct k
  */
 static void putback_slab(struct kmem_cache *s, struct page *page)
 {
+   struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+
if (page->inuse) {
+
if (page->freelist)
-   add_partial(s, page);
-   else if (PageError(page))
-   add_full(s, page);
+   add_partial(n, page);
+   else if (PageError(page) && (s->flags & SLAB_STORE_USER))
+   add_full(n, page);
slab_unlock(page);
+
} else {
-   slab_unlock(page);
-   discard_slab(s, page);
+   if (n->nr_partial < MIN_PARTIAL) {
+   /*
+* Adding an empty page to the partial slabs in order
+* to avoid page allocator overhead. This page needs to
+* come after all the others that are not fully empty
+* in order to make sure that we do maximum
+* defragmentation.
+*/
+   add_partial_tail(n, page);
+   slab_unlock(page);
+   } else {
+   slab_unlock(page);
+   discard_slab(s, page);
+   }
}
 }
 
@@ -1325,7 +1342,7 @@ checks_ok:
 * then add it.
 */
if (unlikely(!prior))
-   add_partial(s, page);
+   add_partial(get_node(s, page_to_nid(page)), page);
 
 out_unlock:
slab_unlock(page);
@@ -1541,7 +1558,7 @@ static struct kmem_cache_node * __init e
kmalloc_caches->node[node] = n;
init_kmem_cache_node(n);
atomic_long_inc(&n->nr_slabs);
-   add_partial(kmalloc_caches, page);
+   add_partial(n, page);
return n;
 }
 

--
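
The put-back policy of this patch can be modelled in user space with a
small circular list. This is a simplified sketch, not the kernel code:
struct slab, struct node and the result enum are hypothetical, locking is
left out, and a full slab is simply left off the list rather than tracked
on a separate full list.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_PARTIAL 2

struct slab {
    int inuse;       /* objects currently allocated from this slab */
    int has_free;    /* nonzero if the slab still has free objects */
    struct slab *prev, *next;
};

struct node {
    struct slab head;    /* sentinel of a circular doubly linked list */
    int nr_partial;
};

enum putback_result { PUT_PARTIAL, PUT_PARTIAL_TAIL, PUT_FULL, PUT_DISCARD };

static void node_init(struct node *n)
{
    n->head.prev = n->head.next = &n->head;
    n->nr_partial = 0;
}

static void list_insert(struct slab *s, struct slab *prev, struct slab *next)
{
    s->prev = prev;
    s->next = next;
    prev->next = s;
    next->prev = s;
}

/* Partially used slabs go to the front so they are consumed first. */
static void add_partial(struct node *n, struct slab *s)
{
    n->nr_partial++;
    list_insert(s, &n->head, n->head.next);
}

/* Empty slabs kept in reserve go to the back. */
static void add_partial_tail(struct node *n, struct slab *s)
{
    n->nr_partial++;
    list_insert(s, n->head.prev, &n->head);
}

static enum putback_result putback_slab(struct node *n, struct slab *s)
{
    if (s->inuse) {
        if (s->has_free) {
            add_partial(n, s);
            return PUT_PARTIAL;
        }
        return PUT_FULL;
    }
    /* Empty slab: keep it only while the node is short on partial slabs. */
    if (n->nr_partial < MIN_PARTIAL) {
        add_partial_tail(n, s);
        return PUT_PARTIAL_TAIL;
    }
    return PUT_DISCARD;    /* caller would return the page to the allocator */
}
```

In this model an empty slab is retained only until the node holds
MIN_PARTIAL slabs, and because retained empty slabs sit at the tail, an
allocation that walks the list from the head reaches them last.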

Re: Linux 2.6.21

2007-04-25 Thread Willy Tarreau
On Thu, Apr 26, 2007 at 06:08:06AM +0200, Adrian Bunk wrote:
> On Wed, Apr 25, 2007 at 08:29:28PM -0700, Linus Torvalds wrote:
> >...
> > So it's been over two and a half months, and while it's certainly not the 
> > longest release cycle ever, it still dragged out a bit longer than I'd 
> > have hoped for and it should have. As usual, I'd like to thank Adrian (and 
> > the people who jumped on the entries Adrian had) for keeping everybody on 
> > their toes with the regression list - there's a few entries there still, 
> > but it got to the point where we didn't even know if they were real 
> > regressions, and delaying things further just wasn't going to help.
> >...
> 
> 
> Number of different known regressions compared to 2.6.20 at the time
> of the 2.6.21 release:
> 14
> 
> Number of different known regressions compared to 2.6.20 at the time
> of the 2.6.21 release that were first reported in March or earlier:
> 8
> 
> Number of different known regressions compared to 2.6.20 at the time
> of the 2.6.21 release with patches available at the time of the 2.6.21 
> release [1]:
> 3
> 
> What I will NOT do:
> Waste my time with tracking 2.6.22-rc regressions.
> 
> 
> We have an astonishing amount of -rc testers, but obviously not the 
> developer manpower for handling them.
> 
> If we would take "no regressions" seriously, it might take 4 or 5 months 
> between releases due to the lack of developer manpower for handling 
> regressions. But that should be considered OK if avoiding regressions 
> was considered more important than getting as quick as possible to the 
> next two week regression-merge window.
> 
> But releasing with so many known regressions is insulting for the many 
> people who spent their time testing -rc kernels.

Adrian,

I understand your concerns, it's more and more common to see developers
considering their work is worthless. But it's not. You should see the
current development model as a pipeline. What you feed at the input can
take some time to reach the output, and if we wait for the whole pipeline
to flush, more crap gets released.

What is needed is a higher priority on fixes for known regressions. I
find your summary above more readable than the large lists of regressions.
I think that you should reply to Linus' announces with something that
short, starting from the known-with-patch, known-for-more-than-1-month,
and all-known-regressions. It may help Linus focus even more on those.
Also, while it will not prevent any release with regressions, at least
it will prevent such a stupid case of known regressions with patch
available.

Also, check how many regressions you have reported and which have been
fixed during the -rc stage. You'll see your work really was useful.

Maybe Linus should accept to dedicate -final to known regressions only,
to force a check in this area ? Whether or not all of them get fixed is
not the real problem, but at least we will not have any regressions with
pending patch unapplied !

Please do continue that task if you have the time to do so !

Thanks,
Willy



[PATCH] rename TANBAC TB0219 config

2007-04-25 Thread Yoichi Yuasa
Hi

This patch renames the config option for TANBAC TB0219 GPIO support
to a more appropriate name.

Yoichi

Signed-off-by: Yoichi Yuasa <[EMAIL PROTECTED]>

diff -pruN -X generic/Documentation/dontdiff generic-orig/drivers/char/Kconfig generic/drivers/char/Kconfig
--- generic-orig/drivers/char/Kconfig   2007-04-26 13:45:27.225157000 +0900
+++ generic/drivers/char/Kconfig2007-04-26 13:58:56.663743750 +0900
@@ -905,8 +905,8 @@ config SONYPI
  To compile this driver as a module, choose M here: the
  module will be called sonypi.
 
-config TANBAC_TB0219
-   tristate "TANBAC TB0219 base board support"
+config GPIO_TB0219
+   tristate "TANBAC TB0219 GPIO support"
depends on TANBAC_TB022X
select GPIO_VR41XX
 
diff -pruN -X generic/Documentation/dontdiff generic-orig/drivers/char/Makefile generic/drivers/char/Makefile
--- generic-orig/drivers/char/Makefile  2007-04-26 13:45:27.345164500 +0900
+++ generic/drivers/char/Makefile   2007-04-26 13:43:30.361853500 +0900
@@ -91,7 +91,7 @@ obj-$(CONFIG_PC8736x_GPIO)+= pc8736x_gp
 obj-$(CONFIG_NSC_GPIO) += nsc_gpio.o
 obj-$(CONFIG_CS5535_GPIO)  += cs5535_gpio.o
 obj-$(CONFIG_GPIO_VR41XX)  += vr41xx_giu.o
-obj-$(CONFIG_TANBAC_TB0219)+= tb0219.o
+obj-$(CONFIG_GPIO_TB0219)  += tb0219.o
 obj-$(CONFIG_TELCLOCK) += tlclk.o
 
 obj-$(CONFIG_WATCHDOG) += watchdog/


Re: Linux 2.6.21

2007-04-25 Thread Greg KH
On Thu, Apr 26, 2007 at 06:08:06AM +0200, Adrian Bunk wrote:
> What I will NOT do:
> Waste my time with tracking 2.6.22-rc regressions.

I sure hope you don't do this.

Tracking these is tough, and I think you are doing a great job with it.

No release will have no regressions, there's just too many different
combinations of hardware and sometimes people don't have the time to
test to see if their original report is even fixed or not.

And some of them will get fixed with patches coming in the next kernel
release, which will then be tracked down and added to the -stable
releases.

So if you can, please keep it up, if you think it's a thankless job,
here's my hearty thanks for doing this work.  It's really needed and I
really appreciate it.

thanks,

greg k-h


Re: Question about Reiser4

2007-04-25 Thread lkml777

On Wed, 25 Apr 2007 23:50:22 +0800, "Jeff Chua"
<[EMAIL PROTECTED]> said:
> On 4/25/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> 
> > Laurent Riffard's Reiser4 patch to the default linux-2.6.20 kernel and a
> > couple of others.
> 
> Thank you. Got it. Testing it now.
> 
> Jeff.

What plugins etc are you looking at?
-- 
  
  [EMAIL PROTECTED]



Re: [00/17] Large Blocksize Support V3

2007-04-25 Thread Christoph Lameter
On Wed, 25 Apr 2007, Eric W. Biederman wrote:

> The page cache has no problems supporting things with a block
> size larger then page size.  Now the block device layer may not
> have the code to do the scatter gather into small pages and it
> may not handle buffer heads whose data is split between multiple
> pages. 

It does have that problem. If a system is in use then memory is fragmented
and requests to the devices are in 4k sizes. The kernel has to manage the 
4k size. The number of requests that the driver can take is limited. 
Larger blocks allow shuffling more data to the device.

> And generally larger physical pages are a mistake to use.
> Especially as it looks from some of the later comment you don't
> date test on 32bit because the memory fragments faster.

Ummm.. Dont get me to comment on i386. I never said that memory fragments 
faster on i386. i386 has multiple issues with memory management that 
require a lot of work and that will cause difficulty. If you have these
fun systems with 512k ZONE_NORMAL and 63GB HIGHMEM then good luck...

> Is it common for hardware that supports large block sizes to not
> support splitting those blocks apart during DMA?  Unless it is common
> the whole premise of this patchset seems broken.

Huh? Splitting the blocks requires hardware effort -> Reduction in 
transfer rate.
 
> I suspect what needs to be fixed is the page cache block device
> interface so that we have helper functions that know how to stuff
> a single block into several pages.

Oh we have scores of these hacks around. Look at the dvd/cd layer. The 
point is to get rid of those.

> Right now I don't even want to think about trying to use a swap device
> with a large block size when we are low on memory.

But that is due to the VM (at least Linus tree) having no defrag methods.
mm has Mel's antifrag methods and can do it.

> > 2. 32/64k blocksize is also used in flash devices. Same issues.
> 
> flash devices are not block devices so I strongly doubt it is
> the same issue.

But they could be treated as such. Right now these poor guys have to 
improvise around the page size limit.

> > 4. Reduce fsck times. Larger block sizes mean faster file system checking.
> 
> Fewer seeks and less meta-data means faster fsck times.  Larger block
> sizes get us there only tangentially.  

Less meta data to manage does not reduce fsck times? Going from order 0 to 
order 2 blocks cuts the metadata to a fourth.

> > 5. Performance. If we look at IA64 vs. x86_64 then it seems that the
> >faster interrupt handling on x86_64 compensate for the speed loss due to
> >a smaller page size (4k vs 16k on IA64). Supporting larger block sizes
> >sizes on all allows a significant reduction in I/O overhead and increases
> >the size of I/O that can be performed by hardware in a single request
> >since the number of scatter gather entries are typically limited for
> >one request. This is going to become increasingly important to support
> >the ever growing memory sizes since we may have to handle excessively
> >large amounts of 4k requests for data sizes that may become common
> >soon. For example to write a 1 terabyte file the kernel would have to
> >handle 256 million 4k chunks.
> 
> This assumes you get the option of large files and batching things as
> the systems scale.  At SGI maybe that is true.  However in general
> you gets lots of small requests as systems scale up.

Yes you get lots of small requests *because* we do not support defrag and 
cannot do large contiguous allocations.

> > 6. Cross arch compatibility: It is currently not possible to mount
> >an 16k blocksize ext2 filesystem created on IA64 on an x86_64 system.
> >With this patch this becoems possible.
> 
> Again this is a problem with the page cache block device interface not
> a page cache problem.

Ummm the other arches read 16k blocks of contiguous memory. That is not 
supported on 4k platforms right now. I guess you move those to vmalloc 
areas? Want to hack the filesystems for this?
 
> I think supporting larger block sizes is a nice goal.  However unless
> we are bumping up against hardware limitations let's see how far
> we can go with batching and fixing the block layer/page cache interface
> instead of assuming that larger page sizes are the answer.

There are multiple scaling issues in the kernel. What you propose is to 
add hack over hack into the VM to avoid having to deal with 
defragmentation. That in turn will cause churn with hardware etc etc.
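
The two figures used in this exchange (256 million 4k chunks for a 1
terabyte file, and a fourfold reduction when going from order 0 to order 2
blocks) can be checked with a few lines of C. The sketch assumes a binary
terabyte of 2^40 bytes and a 4096 byte base block, and it assumes that the
amount of per-block metadata scales with the number of blocks;
blocks_needed is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks of size (base_block << order) needed to cover bytes. */
static uint64_t blocks_needed(uint64_t bytes, uint64_t base_block,
                              unsigned int order)
{
    uint64_t block = base_block << order;

    assert(block != 0);
    return (bytes + block - 1) / block;    /* round up to whole blocks */
}
```

For 2^40 bytes this gives 2^28 = 268435456 blocks at order 0, which is 256
times 2^20, and 2^26 = 67108864 blocks at order 2, a quarter of the order 0
count.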




Re: [00/17] Large Blocksize Support V3

2007-04-25 Thread Eric W. Biederman
[EMAIL PROTECTED] writes:

> V2->V3
> - More restructuring
> - It actually works!
> - Add XFS support
> - Fix up UP support
> - Work out the direct I/O issues
> - Add CONFIG_LARGE_BLOCKSIZE. Off by default which makes the inlines revert
>   back to constants. Disabled for 32bit and HIGHMEM configurations.
>   This also allows a gradual migration to the new page cache
>   inline functions. LARGE_BLOCKSIZE capabilities can be
>   added gradually and if there is a problem then we can disable
>   a subsystem.
>
> V1->V2
> - Some ext2 support
> - Some block layer, fs layer support etc.
> - Better page cache macros
> - Use macros to clean up code.
>
> This patchset modifies the Linux kernel so that larger block sizes than
> page size can be supported. Larger block sizes are handled by using
> compound pages of an arbitrary order for the page cache instead of
> single pages with order 0.

Huh?

You seem to be mixing two very different concepts.

The page cache has no problems supporting things with a block
size larger then page size.  Now the block device layer may not
have the code to do the scatter gather into small pages and it
may not handle buffer heads whose data is split between multiple
pages. 

But this is not a page cache issue.

And generally larger physical pages are a mistake to use.
Especially as it looks from some of the later comment you don't
date test on 32bit because the memory fragments faster.

Is it common for hardware that supports large block sizes to not
support splitting those blocks apart during DMA?  Unless it is common
the whole premise of this patchset seems broken.

I suspect what needs to be fixed is the page cache block device
interface so that we have helper functions that know how to stuff
a single block into several pages.

That would make the choice of using larger order pages (essentially
increasing PAGE_SIZE) something that can be investigated in parallel.

Right now I don't even want to think about trying to use a swap device
with a large block size when we are low on memory.

>
> Rationales:
>
> 1. We have problems supporting devices with a higher blocksize than
>page size. This is for example important to support CD and DVDs that
>can only read and write 32k or 64k blocks. We currently have a shim
>layer in there to deal with this situation which limits the speed
>of I/O. The developers are currently looking for ways to completely
>bypass the page cache because of this deficiency.

block device /page cache interface issue.

> 2. 32/64k blocksize is also used in flash devices. Same issues.

flash devices are not block devices so I strongly doubt it is
the same issue.

> 3. Future harddisks will support bigger block sizes that Linux cannot
>support since we are limited to PAGE_SIZE. Ok the on board cache
>may buffer this for us but what is the point of handling smaller
>page sizes than what the drive supports?

No fragmenting memory and keeping the system running. 

> 4. Reduce fsck times. Larger block sizes mean faster file system checking.

Fewer seeks and less meta-data means faster fsck times.  Larger block
sizes get us there only tangentially.  

> 5. Performance. If we look at IA64 vs. x86_64 then it seems that the
>faster interrupt handling on x86_64 compensate for the speed loss due to
>a smaller page size (4k vs 16k on IA64). Supporting larger block sizes
>sizes on all allows a significant reduction in I/O overhead and increases
>the size of I/O that can be performed by hardware in a single request
>since the number of scatter gather entries are typically limited for
>one request. This is going to become increasingly important to support
>the ever growing memory sizes since we may have to handle excessively
>large amounts of 4k requests for data sizes that may become common
>soon. For example to write a 1 terabyte file the kernel would have to
>handle 256 million 4k chunks.

This assumes you get the option of large files and batching things as
the systems scale.  At SGI maybe that is true.  However in general
you gets lots of small requests as systems scale up.

For example I have gigabytes of kernel trees.  How are larger requests
going to speed up my reading and writing of those?  And yes even with
8G of ram I have enough kernel trees that they fall out of memory.
So cache is not the only answer.

> 6. Cross arch compatibility: It is currently not possible to mount
>an 16k blocksize ext2 filesystem created on IA64 on an x86_64 system.
>With this patch this becoems possible.

Again this is a problem with the page cache block device interface not
a page cache problem.

I think supporting larger block sizes is a nice goal.  However unless
we are bumping up against hardware limitations let's see how far
we can go with batching and fixing the block layer/page cache interface
instead of assuming that larger page sizes are the answer.

Eric

Re: [PATCH 2.4.35-pre4] fix 'pc_keyb: controller jammed (0xA7)' error on systems with KVM

2007-04-25 Thread Willy Tarreau
Hi Brian,

On Wed, Apr 25, 2007 at 03:13:13PM -0400, Brian Maly wrote:
> Ive had a few requests for this patch, so Im posting it against 
> linux-2.4.35-pre4 kernel.

OK, does not look too intrusive, and seems fair enough. Will merge it.
Thanks !

Willy



Re: [RFC][PATCH -mm take4 2/6] support multiple logging

2007-04-25 Thread David Miller
From: Keiichi KII <[EMAIL PROTECTED]>
Date: Thu, 26 Apr 2007 13:02:04 +0900

> Stephen Hemminger said "The configuration of netconsole's looks like the 
> configuration of routes".
> I think so too.
> So I think ioctl commands for adding/removing port and the following userland 
> application like route(8) command by using the ioctl.

Like the route command itself, the route changing ioctl()s are
old deprecated BSD compatible functionality.

All current routing configuration is done using netlink and the 'ip'
utility.



Re: Linux 2.6.21

2007-04-25 Thread Dave Jones
On Thu, Apr 26, 2007 at 06:08:06AM +0200, Adrian Bunk wrote:
 
 > What I will NOT do:
 > Waste my time with tracking 2.6.22-rc regressions.

I seriously hope you'll reconsider.  If you hadn't done this,
things would have been a *lot* worse imo.

But either way, thanks for doing what remains a really grotty
job that may not get you as many kernel groupies as rewriting the
process scheduler, but is equally (if not more) important.

Dave

-- 
http://www.codemonkey.org.uk


Re: [PATCH] sworks-agp: Switch to PCI ref counting APIs

2007-04-25 Thread Dave Jones
On Wed, Apr 25, 2007 at 09:23:39PM -0700, Andrew Morton wrote:
 > On Thu, 26 Apr 2007 00:20:19 -0400 Dave Jones <[EMAIL PROTECTED]> wrote:
 > 
 > > On Wed, Apr 25, 2007 at 07:21:58PM -0700, Andrew Morton wrote:
 > >  > On Mon, 23 Apr 2007 14:51:29 +0100 Alan Cox <[EMAIL PROTECTED]> wrote:
 > >  > 
 > >  > >  {
 > >  > >   struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
 > >  > >  
 > >  > > + pci_dev_put(bridge->dev);
 > >  > >   agp_remove_bridge(bridge);
 > >  > >   agp_put_bridge(bridge);
 > >  > > + pci_dev_put(serverworks_private.svrwrks_dev)
 > >  > > + serverworks_private.svrwrks_dev = NULL;
 > >  > 
 > >  > err, guys?
 > > 
 > > ? One put for the agp bridge, one for the host bridge.
 > > What am I missing?
 > > 
 > 
 > a semicolon.

Yow. I thought I had build-tested that.
I'll regenerate the git tree tomorrow. Same goes for the cpufreq
tree with the ACPI fixup.

Thanks.

Dave

-- 
http://www.codemonkey.org.uk
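
For reference, the "one put per get" reasoning in this thread can be sketched
outside the kernel. This is only an illustration: struct demo_dev and the
demo_* functions are hypothetical stand-ins, not the PCI or AGP API, and the
remove path simply mirrors the quoted hunk with the missing semicolon added.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted device. */
struct demo_dev {
    int refcount;
};

/* Hypothetical driver-private data holding the host bridge reference. */
struct demo_private {
    struct demo_dev *host_bridge;
};

/* Hypothetical bridge object holding the AGP bridge reference. */
struct demo_bridge {
    struct demo_dev *dev;
};

/* Take a reference and return the device for convenient assignment. */
struct demo_dev *demo_dev_get(struct demo_dev *dev)
{
    if (dev)
        dev->refcount++;
    return dev;
}

/* Drop a reference; NULL is tolerated, an unbalanced put is a bug. */
void demo_dev_put(struct demo_dev *dev)
{
    if (dev) {
        assert(dev->refcount > 0);
        dev->refcount--;
    }
}

/* Probe takes two references: one per device it keeps a pointer to. */
void demo_probe(struct demo_bridge *bridge, struct demo_private *priv,
                struct demo_dev *agp, struct demo_dev *host)
{
    bridge->dev = demo_dev_get(agp);
    priv->host_bridge = demo_dev_get(host);
}

/* Remove drops both references, one put for each get in probe. */
void demo_remove(struct demo_bridge *bridge, struct demo_private *priv)
{
    demo_dev_put(bridge->dev);
    demo_dev_put(priv->host_bridge);    /* each statement ends with ';' */
    priv->host_bridge = NULL;
}
```

After a probe followed by a remove, both counts in this sketch return to
their starting values, which is the balance the two puts are meant to give.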


Re: [PATCH 0/9] Kconfig: cleanup s390 v2.

2007-04-25 Thread Dave Jones
On Wed, Apr 25, 2007 at 08:02:07PM -0700, Andrew Morton wrote:
 > On Wed, 25 Apr 2007 19:38:23 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
 > > > In fact, I should probably munge it together with a similar thing
 > > > I wrote at http://www.codemonkey.org.uk/projects/findbugs/
 > > > (Warning: scary regexps)
 > > I'll be glad to help maintain such animals if wanted.
 > 
 > wanted ;)
 > 
 > At least, it would be interesting to investigate the usefulness.  I suspect
 > it will prove to be very useful for the little things.

Yeah, the original script tried to do things like spinlock balancing checks
(badly). This was long before we had sparse, and it was partly a "let's learn
some perl" experience for myself. I'll toss that idea out now that we have
better tools for that, and keep it to simple checks.

 > Heck, someone could subscribe a robot to all the mailing lists which sends
 > nastygrams straight back at people who submit broken patches.  We already
 > need that for tab-replaced and word-wrapped patches.  (ok, we have it -
 > it's called akpm, but being robotic wearies one)

Ok, I've got a few different flavours of that script. I'll roll them
all into one tomorrow and throw out some of the noisy silly ones
(I don't think warning about strcpy->strncpy is really worthwhile, for example).

Additional regexps gratefully received.

Dave

-- 
http://www.codemonkey.org.uk


Re: [PATCH] sworks-agp: Switch to PCI ref counting APIs

2007-04-25 Thread Andrew Morton
On Thu, 26 Apr 2007 00:20:19 -0400 Dave Jones <[EMAIL PROTECTED]> wrote:

> On Wed, Apr 25, 2007 at 07:21:58PM -0700, Andrew Morton wrote:
>  > On Mon, 23 Apr 2007 14:51:29 +0100 Alan Cox <[EMAIL PROTECTED]> wrote:
>  > 
>  > >  {
>  > >   struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
>  > >  
>  > > + pci_dev_put(bridge->dev);
>  > >   agp_remove_bridge(bridge);
>  > >   agp_put_bridge(bridge);
>  > > + pci_dev_put(serverworks_private.svrwrks_dev)
>  > > + serverworks_private.svrwrks_dev = NULL;
>  > 
>  > err, guys?
> 
> ? One put for the agp bridge, one for the host bridge.
> What am I missing?
> 

a semicolon.


Re: [PATCH] sworks-agp: Switch to PCI ref counting APIs

2007-04-25 Thread Dave Jones
On Wed, Apr 25, 2007 at 07:21:58PM -0700, Andrew Morton wrote:
 > On Mon, 23 Apr 2007 14:51:29 +0100 Alan Cox <[EMAIL PROTECTED]> wrote:
 > 
 > >  {
 > >   struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
 > >  
 > > + pci_dev_put(bridge->dev);
 > >   agp_remove_bridge(bridge);
 > >   agp_put_bridge(bridge);
 > > + pci_dev_put(serverworks_private.svrwrks_dev)
 > > + serverworks_private.svrwrks_dev = NULL;
 > 
 > err, guys?

? One put for the agp bridge, one for the host bridge.
What am I missing?

Dave

-- 
http://www.codemonkey.org.uk


Re: [PATCH] pcmcia: irq probe can be done without risking an IRQ storm

2007-04-25 Thread Andrew Morton
On Thu, 5 Apr 2007 14:09:36 +0100 Alan Cox <[EMAIL PROTECTED]> wrote:

> Nowadays you can ask for an IRQ to be allocated but not enabled. When
> PCMCIA was written this was not true, and this feature is thus not used.
> 
> Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
> 
> diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.21-rc5-mm4/drivers/pcmcia/pcmcia_resource.c linux-2.6.21-rc5-mm4/drivers/pcmcia/pcmcia_resource.c
> --- linux.vanilla-2.6.21-rc5-mm4/drivers/pcmcia/pcmcia_resource.c 2007-04-03 16:52:14.0 +0100
> +++ linux-2.6.21-rc5-mm4/drivers/pcmcia/pcmcia_resource.c 2007-04-03 17:10:42.0 +0100
> @@ -810,8 +810,11 @@
>   type = IRQF_SHARED;
>   if (req->Attributes & IRQ_TYPE_DYNAMIC_SHARING)
>   type = IRQF_SHARED; 
>  #ifdef CONFIG_PCMCIA_PROBE
> + if (!(req->Attributes & IRQ_HANDLE_PRESENT))
> + type |= IRQ_NOAUTOEN;
> +
>   if (s->irq.AssignedIRQ != 0) {
>   /* If the interrupt is already assigned, it must be the same */
>   irq = s->irq.AssignedIRQ;

alpha:

drivers/pcmcia/pcmcia_resource.c: In function 'pcmcia_request_irq':
drivers/pcmcia/pcmcia_resource.c:816: error: 'IRQ_NOAUTOEN' undeclared (first use in this function)
drivers/pcmcia/pcmcia_resource.c:816: error: (Each undeclared identifier is reported only once
drivers/pcmcia/pcmcia_resource.c:816: error: for each function it appears in.)

Problem is, IRQ_NOAUTOEN is a generic-irq thing, so architectures which
don't use generic-irqs break.  And it's defined in linux/irq.h which
(stupidly) cannot be included in generic code.
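
The failure mode described here, generic code using a flag that only some
architectures define, can be sketched in isolation. The following is a
minimal illustration under stated assumptions, not the fix for this patch:
every DEMO_* name and value is hypothetical, and DEMO_HAVE_GENERIC_IRQ stands
in for whatever compile-time switch would tell generic code that the flag
exists.

```c
#include <assert.h>

/* Hypothetical flag values; not the kernel's real definitions. */
#define DEMO_IRQF_SHARED        0x080u
#define DEMO_IRQ_HANDLE_PRESENT 0x400u

/*
 * Assumption: only builds with generic-irq support provide the
 * "allocate but do not enable" flag. Elsewhere it degrades to a no-op
 * so that generic code still compiles.
 */
#ifdef DEMO_HAVE_GENERIC_IRQ
#define DEMO_IRQ_NOAUTOEN 0x800u
#else
#define DEMO_IRQ_NOAUTOEN 0u
#endif

/* Compute the request flags the way the quoted hunk does. */
unsigned int demo_irq_type(unsigned int attributes)
{
    unsigned int type = DEMO_IRQF_SHARED;

    /* No handler supplied yet: ask for the IRQ to stay disabled. */
    if (!(attributes & DEMO_IRQ_HANDLE_PRESENT))
        type |= DEMO_IRQ_NOAUTOEN;

    assert(type & DEMO_IRQF_SHARED);
    return type;
}
```

The trade-off in this sketch is that architectures without the flag silently
lose the "not enabled" behaviour, which may or may not be acceptable for the
probe path discussed above.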



Re: [patch] CFS scheduler, -v6

2007-04-25 Thread William Lee Irwin III
On Wed, Apr 25, 2007 at 11:47:04PM +0200, Ingo Molnar wrote:
>>  - upstream fix: SysRq-T should show runnable tasks

On Thu, Apr 26, 2007 at 05:29:27AM +0200, Nick Piggin wrote:
> BTW. can you send this upstream? It is very annoying how it currently works,
> and I've had more than one bug that required seeing runnable tasks in order
> to diagnose and fix...

There are other things that should go upstream separately. The
init/main.c comment fix for one. I'd even argue that scheduler classes
should be done separately from and prior to the specific cfs policy.


-- wli


Re: Linux 2.6.21

2007-04-25 Thread Adrian Bunk
On Wed, Apr 25, 2007 at 08:29:28PM -0700, Linus Torvalds wrote:
>...
> So it's been over two and a half months, and while it's certainly not the 
> longest release cycle ever, it still dragged out a bit longer than I'd 
> have hoped for and it should have. As usual, I'd like to thank Adrian (and 
> the people who jumped on the entries Adrian had) for keeping everybody on 
> their toes with the regression list - there's a few entries there still, 
> but it got to the point where we didn't even know if they were real 
> regressions, and delaying things further just wasn't going to help.
>...


Number of different known regressions compared to 2.6.20 at the time
of the 2.6.21 release:
14

Number of different known regressions compared to 2.6.20 at the time
of the 2.6.21 release that were first reported in March or earlier:
8

Number of different known regressions compared to 2.6.20 at the time
of the 2.6.21 release with patches available at the time of the 2.6.21 
release [1]:
3

What I will NOT do:
Waste my time with tracking 2.6.22-rc regressions.


We have an astonishing number of -rc testers, but obviously not the 
developer manpower for handling them.

If we took "no regressions" seriously, it might take 4 or 5 months 
between releases due to the lack of developer manpower for handling 
regressions. But that should be considered OK if avoiding regressions 
was considered more important than getting as quickly as possible to the 
next two week regression-merge window.

But releasing with so many known regressions is insulting for the many 
people who spent their time testing -rc kernels.


cu
Adrian

[1] http://lkml.org/lkml/2007/4/25/496

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [RFC][PATCH -mm take4 2/6] support multiple logging

2007-04-25 Thread Keiichi KII

Well..  before you can finish this work we need to decide upon what the
interface to userspace will be.

- The miscdev isn't appropriate

Why isn't miscdev appropriate?
Is it just that miscdev is conventionally not used for networking?



Yes it's rather odd, especially for networking.

What does the miscdev _do_ anyway?  Is it purely a target for the ioctls?


Yes, I purely use miscdev for the ioctls.

I want to use sysfs and ioctl to implement the dynamic configurability.
Sysfs shows/changes the netconsole configuration (IP address, port and so on).
A userland application using the ioctl adds/removes netconsole ports.

I thought that the dynamic configurability could be realized in the kernel
only, without a userland application

(e.g. only sysfs, no userland application).
But I think we need a function that automatically resolves the destination
MAC address from the IP address, and because of the cost of resolving it,
I should implement that in a userland application, not in the netconsole
kernel module.

Netconsole will become more useful by implementing the above function.


Some other speculations:
1. Would it be possible to add ioctl's to /dev/console? This would be more in
keeping with older Unix style model.

2. Using sysfs makes sense if there is a device object that exists to
   add the sysfs attributes to.

3. Procfs is handy for summary type tables.

4. Netlink does feel like overkill for this. Although newer generic netlink
   makes it easier.


If I use sysfs, is "/sys/class/misc/netconsole/port[0-9]*" the proper location
for the attributes of each netconsole port, or should they go somewhere else
in /sys/?


Stephen Hemminger said "The configuration of netconsole's looks like the 
configuration of routes".

I think so too.
So I am thinking of ioctl commands for adding/removing ports, plus a userland
application, similar to the route(8) command, that uses those ioctls.


e.g.
1. add port
# netconfig add 192.168.0.10 

2. remove port
# netconfig remove 1

3. show port info
# netconfig
id status  Source IP   Source Port Destination IP Destination Port Destination MAC
1  enable  192.168.0.1 6665        192.168.0.10                    00:11:22:33:44:55
2  disable 192.168.0.1 6665        192.168.0.20                    00:11:22:33:44:66

route(8) command uses ioctl for Netlink.
But, I'm going to implement ioctl's to /dev/console because of the above 
comments.

Thank you for your comments.
Any comments very welcome.
--
Keiichi KII
NEC Corporation OSS Promotion Center
E-mail: [EMAIL PROTECTED]



Re: [patch] CFS scheduler, -v6

2007-04-25 Thread Andrew Morton
On Thu, 26 Apr 2007 05:29:27 +0200 Nick Piggin <[EMAIL PROTECTED]> wrote:

> >  - upstream fix: SysRq-T should show runnable tasks
> 
> BTW. can you send this upstream? It is very annoying how it currently works,
> and I've had more than one bug that required seeing runnable tasks in order
> to diagnose and fix...

I have it.  I'm just waiting to see if Linus took it.  Seems not.


Re: 2.6.21-rc7-mm1: Oops and Gnome desktop freezes

2007-04-25 Thread Dan Kruchinin

Hi.

On 4/25/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote:

The Gnome desktop does not finish launching.  And I get this tracing,
all coming from Gnome apps.

Tony

BUG: unable to handle kernel paging request at virtual address c0a74000
 printing eip:
c014c469
*pde = 005f3027
*pte = 
Oops: 0002 [#1]
Modules linked in: xt_pkttype xt_tcpudp ipt_LOG xt_limit nfsd exportfs
lockd nfs_acl sunrpc snd_pcm_oss snd_mixer_oss snd_seq button battery ac
ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter
ip6table_filter nf_conntrack_ipv4 nf_conntrack ip_tables ip6_tables
x_tables nls_iso8859_1 nls_cp437 vfat fat nls_utf8 ntfs reiserfs loop
usblp snd_via82xx snd_ac97_codec ac97_bus snd_pcm ide_cd cdrom snd_timer
snd_page_alloc snd_mpu401_uart rtc_cmos snd_rawmidi rtc_core
snd_seq_device rtc_lib snd soundcore via_rhine ehci_hcd uhci_hcd usbcore
sc92031 via_agp 8139too i2c_viapro ext3 mbcache jbd edd fan thermal
processor via82cxxx ide_disk ide_core
CPU:0
EIP:0060:[]Tainted: G  D VLI
EFLAGS: 00210246   (2.6.21-rc7-mm1-default #74)
EIP is at get_page_from_freelist+0x2b5/0x359
eax:    ebx: c1014e80   ecx: 0400   edx: c0002fe8
esi: c1014e80   edi: c0a74000   ebp: d27c9eb0   esp: d27c9e58
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process evolution-data- (pid: 4745, ti=d27c8000 task=d32a8aa0
task.ti=d27c8000)
Stack: c0363628 0002  c0113c58 0805c000 c1fccd40 c0a74000
0001
    001280d2 c03640f0 c0363600 00200246 0002 
0001
   0001  00200246 c03640f4 001280d2  d27c9f00
c014c5f5
Call Trace:
 [] show_trace_log_lvl+0x1a/0x30
 [] show_stack_log_lvl+0x9b/0xaa
 [] show_registers+0x1b6/0x288
 [] die+0xe7/0x1fc
 [] do_page_fault+0x429/0x4f8
 [] error_code+0x71/0x78
 [] __alloc_pages+0xe8/0x29e
 [] __handle_mm_fault+0x16d/0x5fc
 [] do_page_fault+0x1fe/0x4f8
 [] error_code+0x71/0x78
 ===
INFO: lockdep is turned off.
Code: 00 00 66 83 7d cc 00 c7 45 ec 00 00 00 00 78 30 eb 36 ba 03 00 00
00 89 d8 e8 8d 63 fc ff b9 00 04 00 00 89 45 c0 31 c0 8b 7d c0  ab
8b 45 c0 ba 03 00 00 00 83 c3 20 e8 d8 63 fc ff ff 45 ec
EIP: [] get_page_from_freelist+0x2b5/0x359 SS:ESP
0068:d27c9e58
note: evolution-data-[4745] exited with preempt_count 1
BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():1, irqs_disabled():0
INFO: lockdep is turned off.
 [] show_trace_log_lvl+0x1a/0x30
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] __might_sleep+0xc9/0xcf
 [] down_read+0x18/0x50
 [] futex_wake+0x35/0xcd
 [] do_futex+0x91/0x104d
 [] sys_futex+0xc1/0xd4
 [] mm_release+0x84/0x8b
 [] exit_mm+0x19/0xc3
 [] do_exit+0x1f8/0x744
 [] die+0x1d6/0x1fc
 [] do_page_fault+0x429/0x4f8
 [] error_code+0x71/0x78
 [] __alloc_pages+0xe8/0x29e
 [] __handle_mm_fault+0x16d/0x5fc
 [] do_page_fault+0x1fe/0x4f8
 [] error_code+0x71/0x78
 ===
BUG: scheduling while atomic: evolution-data-/0x1001/4745
INFO: lockdep is turned off.
 [] show_trace_log_lvl+0x1a/0x30
 [] show_trace+0x12/0x14
 [] dump_stack+0x16/0x18
 [] __sched_text_start+0x71/0x553
 [] __cond_resched+0x28/0x3f
 [] cond_resched+0x29/0x34
 [] down_read+0x1d/0x50
 [] futex_wake+0x35/0xcd
 [] do_futex+0x91/0x104d
 [] sys_futex+0xc1/0xd4
 [] mm_release+0x84/0x8b
 [] exit_mm+0x19/0xc3
 [] do_exit+0x1f8/0x744
 [] die+0x1d6/0x1fc
 [] do_page_fault+0x429/0x4f8
 [] error_code+0x71/0x78
 [] __alloc_pages+0xe8/0x29e
 [] __handle_mm_fault+0x16d/0x5fc
 [] do_page_fault+0x1fe/0x4f8
 [] error_code+0x71/0x78
 ===
BUG: unable to handle kernel paging request at virtual address c0a75000
 printing eip:
c0152d68
*pde = 005f3027
*pte = 
Oops: 0002 [#2]
Modules linked in: xt_pkttype xt_tcpudp ipt_LOG xt_limit nfsd exportfs
lockd nfs_acl sunrpc snd_pcm_oss snd_mixer_oss snd_seq button battery ac
ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter
ip6table_filter nf_conntrack_ipv4 nf_conntrack ip_tables ip6_tables
x_tables nls_iso8859_1 nls_cp437 vfat fat nls_utf8 ntfs reiserfs loop
usblp snd_via82xx snd_ac97_codec ac97_bus snd_pcm ide_cd cdrom snd_timer
snd_page_alloc snd_mpu401_uart rtc_cmos snd_rawmidi rtc_core
snd_seq_device rtc_lib snd soundcore via_rhine ehci_hcd uhci_hcd usbcore
sc92031 via_agp 8139too i2c_viapro ext3 mbcache jbd edd fan thermal
processor via82cxxx ide_disk ide_core
CPU:0
EIP:0060:[]Tainted: G  D VLI
EFLAGS: 00010296   (2.6.21-rc7-mm1-default #74)
EIP is at __do_fault+0x17a/0x301
eax: c0a75000   ebx: c1014ea0   ecx: 0400   edx: c0002fe4
esi: d2207000   edi: c0a75000   ebp: d7ad9f00   esp: d7ad9ea8
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process smart (pid: 4687, ti=d7ad8000 task=d6c99510 task.ti=d7ad8000)
Stack: d28766e8 d2876730 d7ad9ec0 0007 c0a75000 d2207000 b7a3d650
d291b668
   c1fcc0c0 0101 c12440e0 c1fcc128 d6c99510 18100073 d7ad9f40
b7a3d000
   0003 0001 0002 d291b668 b7a3d650 c1fb38f4 d7ad9f50
c015419f
Call 

Re: MMCv4 support (8-bit support missing)

2007-04-25 Thread Madhusudhan c

Hi Pierre/Philip,


I've looked through the MMC 4.2 spec and I see nothing in it that even hints
that 8-bit support might be optional. So as it stands, the bus testing is still 
out.

Okay. It's possible that my understanding was wrong, in the sense that I
thought the bus testing procedure was mandatory to support 8-bit cards. If
8-bit is mandatory for MMC4 cards, then the changes required in the
MMC core to support 8-bit might be simple. Based on the host controller
cap this can be handled.
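
A rough sketch of what "handled based on the host controller cap" could look
like, under the assumption discussed in this thread that every MMC v4 card
supports 8-bit, so no card-side probing is needed. The DEMO_* flags and
demo_pick_bus_width() are hypothetical names, not the MMC core's interface.

```c
#include <assert.h>

/* Hypothetical host capability bits; not the MMC core's real flags. */
#define DEMO_HOST_CAP_4_BIT 0x1u
#define DEMO_HOST_CAP_8_BIT 0x2u

/*
 * Choose a data bus width from the host capabilities alone.
 * Assumption: an MMC v4 card accepts any width the host offers,
 * while older cards are left at the 1-bit default.
 */
unsigned int demo_pick_bus_width(unsigned int host_caps, int card_is_mmc4)
{
    unsigned int width = 1;

    if (card_is_mmc4) {
        if (host_caps & DEMO_HOST_CAP_8_BIT)
            width = 8;
        else if (host_caps & DEMO_HOST_CAP_4_BIT)
            width = 4;
    }

    assert(width == 1 || width == 4 || width == 8);
    return width;
}
```

If 8-bit turned out to be optional on the card side after all, this selection
would need a bus test or a card-reported width as an extra input.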

Philip asked me about access to the 8-bit controller. We might not
be able to provide you direct access to the hardware platform, as that
requires the involvement of business managers and so on. But can I be of
help by testing your code on our platform and letting you know the
results?

Regards,
Madhu



On 4/24/07, Pierre Ossman <[EMAIL PROTECTED]> wrote:

Madhusudhan c wrote:
>
> Suppose a host controller is capable of supporting 8-bit and it tells
> the core that it can support 8-bit. Now the card that is plugged in
> might or might not support 8-bit based on the type of the card. There
> is no field in the ext_csd which will tell you what bus width the card
> can support.
>

I've looked through the MMC 4.2 spec and I see nothing in it that even hints
that 8-bit support might be optional. So as it stands, the bus testing is still 
out.

Rgds
Pierre







Re: PROBLEM: Oops: 0002 [1] SMP

2007-04-25 Thread Thiago M.
I have also suspected this. memtest86, tests #1 to #10, showed an
error on test #3 once, so I removed the DIMM, cleaned it, refitted it
and ran the tests two more times without any error. Is there any
other tool I could use to test the memory?

Thanks.

Thiago.

On Wed, 2007-04-25 at 18:24 -0400, Chuck Ebbert wrote:
> Thiago M. Sayão wrote:
> > I also got this error yesterday which seems related:
> > 
> > Bad pagetable: 001d [1] SMP 
> > Bad pagetable: 0009 [2] SMP 
> > 
> 
> You may have a hardware problem. Did you test the memory?
> 



Re: [patch] CFS scheduler, -v6

2007-04-25 Thread Nick Piggin
On Wed, Apr 25, 2007 at 11:47:04PM +0200, Ingo Molnar wrote:
> 
> i'm pleased to announce release -v6 of the CFS scheduler patchset. The 
> main goal of CFS is to implement "high quality desktop scheduling" as 
> well as technically possible.
> 
> The CFS patch against v2.6.21-rc7 or against v2.6.20.7 can be downloaded 
> from the usual place:
> 
> http://redhat.com/~mingo/cfs-scheduler/
> 
> i got lots of -v5 feedback (thanks and please keep the reports coming!) 
> so the -v6 release includes many bugfixes and improvements:
> 
> 19 files changed, 317 insertions(+), 744 deletions(-)
> 
> the biggest user-visible changes in -v6 are various refinements to the 
> precise-scheduling infrastructure that should result in generally better 
> interactivity and a smoother desktop. In particular a number of "movie 
> playback lags/stutters" and "firefox lags under load" type of 
> regressions have been resolved. (Please re-report any regression that 
> might not be fixed yet.)
> 
> Changes since -v5:
> 
>  - feature: increase the preemption granularity value on SMP systems. 
>Idea and code comes from the SD scheduler of Con Kolivas, with Con's
>kind permission. (thanks Con!)
> 
>  - fix: the "privileged_nice_level=X" boot option should convert signed
>integers. (Mike Galbraith)
> 
>  - build fix: yield_to unistd.h fix (Srivatsa Vaddagiri)
> 
>  - build fix: CONFIG_HEADERS_CHECK complained about sched.h.
>(reported by Zach Carter)
> 
>  - build fix: normalize_rt_tasks() UP build fix. (Mike Galbraith)
> 
>  - interactivity fix: sched_clock() accuracy fixes. This should resolve 
>certain types of interactivity regressions reported on systems that
>change their CPU frequencies. (mainly laptops)
> 
>  - default settings tweak: changed the X renicing default from -19 to 
>-10, based on tester feedback. (Might still be too much - more 
>feedback is needed.)
> 
>  - feature: introduced "wakeup granularity" and added the 
>/proc/sys/kernel/sched_wakeup_granularity_ns tunable, set to 0 by 
>default for now. This is now distinct from the sched_granularity_ns
>'preemption granularity' property of the scheduler - allowing a
>more aggressive increase in the preemption granularity without
>jeopardizing interactivity.
> 
>  - debugging feature: SysRq-T now also shows the /proc/sched_debug 
>output - useful to generate a dump of all relevant scheduler state in 
>one easy step.
> 
>  - debugging feature: make SysRq-Nice normalize negative nice level 
>tasks too and reset the CFS state.
> 
>  - debugging: extend /proc/sched_debug with a few more clock related 
>fields, to be able to better debug problems caused by unstable 
>clocks.
> 
>  - upstream fix: SysRq-T should show runnable tasks

BTW. can you send this upstream? It is very annoying how it currently works,
and I've had more than one bug that required seeing runnable tasks in order
to diagnose and fix...


Re: [PATCH] i386: For debugging, make the initial page table setup less forgiving.

2007-04-25 Thread Zachary Amsden

Eric W. Biederman wrote:

I suspect what we want to do is come up with a function to call
to test to see if a page should be read-only and map such pages
_PAGE_KERNEL_RO, or _PAGE_KERNEL_RO_EXEC if it's code.

Speaking of which, what are paravirt_alloc_pd and paravirt_alloc_pt 
supposed to do?
  


For hypervisors which shadow kernel page tables, none of these concerns 
with keeping page tables read-only arise.  However, another set of 
concerns does arise with maintaining shadow synchronization.  One of 
those problems is keeping the hypervisor aware of when pages are being 
used as page tables.


However, it turns out both direct page table and shadow page table 
implementations can be made to use one page table allocation function; 
in the direct page table case (as for Xen), this is the point where page 
tables can be recognized and made read-only.  So this is the dual 
purpose of the paravirt_alloc_p[dt] functions.


Zach


Linux 2.6.21

2007-04-25 Thread Linus Torvalds

If the goal for 2.6.20 was to be a stable release (and it was), the goal 
for 2.6.21 is to have just survived the big timer-related changes and some 
of the other surprises (just as an example: we were apparently unlucky 
enough to hit what looks like a previously unknown hardware errata in one 
of the ethernet drivers that got updated etc).

So it's been over two and a half months, and while it's certainly not the 
longest release cycle ever, it still dragged out a bit longer than I'd 
have hoped for and it should have. As usual, I'd like to thank Adrian (and 
the people who jumped on the entries Adrian had) for keeping everybody on 
their toes with the regression list - there's a few entries there still, 
but it got to the point where we didn't even know if they were real 
regressions, and delaying things further just wasn't going to help.

So the big change during 2.6.21 is all the timer changes to support a 
tickless system (and even with ticks, more varied time sources). Thanks 
(when it no longer broke for lots of people ;) go to Thomas Gleixner and 
Ingo Molnar and a cadre of testers and coders.

Of course, the timer stuff was just the most painful and core part (and 
thus the one that I remember most): there's a lot of changes all over. The 
appended changelog is just for the fixes since -rc7, so that doesn't look 
very impressive, the full changes since 2.6.20 are obviously a *lot* 
bigger (and you're better off reading the individual -rc changelogs).

We now return you to your regular scheduler discussions,

Linus

---
Akinobu Mita (1):
  fault injection: add entry to MAINTAINERS

Alan Cox (3):
  exec.c: fix coredump to pipe problem and obscure "security hole"
  pata_sis: Fix oops on boot
  [SPARC] openprom: Switch to ref counting PCI API

Alexey Dobriyan (1):
  paride drivers: initialize spinlocks

Alexey Kuznetsov (1):
  [NETLINK]: Infinite recursion in netlink.

Andi Kleen (5):
  x86: Fix gcc 4.2 _proxy_pda workaround
  x86: Fix potential overflow in perfctr reservation
  x86: Remove noreplacement option
  x86-64: Always flush all pages in change_page_attr
  i386: Fix some warnings added by earlier patch

Andrea Righi (1):
  [netdrvr] depca: handle platform_device_add() failure

Andrew Morton (4):
  drivers/macintosh/smu.c: fix locking snafu
  acpi-thermal: fix mod_timer() interval
  drivers/net/hamradio/baycom_ser_fdx build fix
  packet: fix error handling

Atsushi Nemoto (3):
  [MIPS] Disallow CpU exception in kernel again.
  [MIPS] Retry {save,restore}_fp_context if failed in atomic context.
  [MIPS] Fix BUG(), BUG_ON() handling

Aubrey.Li (1):
  [NET]: Fix UDP checksum issue in net poll mode.

Avi Kivity (1):
  KVM: Fix off-by-one when writing to a nonpae guest pde

Badari Pulavarty (1):
  cache_k8_northbridges() overflows beyond allocation

Balbir Singh (1):
  Taskstats fix the structure members alignment issue

Bartlomiej Zolnierkiewicz (2):
  ide/Kconfig: add missing range check for IDE_MAX_HWIFS
  Revert "adjust legacy IDE resource setting (v2)"

Bastian Blank (1):
  Allow reading tainted flag as user

Ben Dooks (2):
  [ARM] 4313/1: S3C24XX: Update s3c2410 defconfig to 2.6.21-rc6
  spi: fix use of set_cs in spi_s3c24xx driver

Benjamin Herrenschmidt (1):
  fix bogon in /dev/mem mmap'ing on nommu

Christoph Lameter (1):
  page migration: fix NR_FILE_PAGES accounting

Dan Williams (1):
  usb-net/pegasus: fix pegasus carrier detection

Dave Jiang (1):
  gianfar needs crc32 lib dependency

Dave Johnson (1):
  [MIPS] Fix wrong checksum for split TCP packets on 64-bit MIPS

Dave Jones (1):
  Longhaul - Revert ACPI C3 on Longhaul ver. 2

David Brownell (1):
  MAINTAINERS: use lists.linux-foundation.org

David Rientjes (1):
  oom: kill all threads that share mm with killed task

David S. Miller (2):
  [IPSEC] af_key: Fix thinko in pfkey_xfrm_policy2msg()
  [PARPORT] SUNBPP: Fix OOPS when debugging is enabled.

Denis Lunev (1):
  [NETLINK]: Don't attach callback to a going-away netlink socket

Divy Le Ray (2):
  cxgb3 - Fix low memory conditions
  cxgb3 - PHY interrupts and GPIO pins.

Don Zickus (1):
  allow vmsplice to work in 32-bit mode on ppc64

Evgeniy Dushistov (1):
  ufs proper handling of zero link case

Evgeny Kravtsunov (1):
  [BRIDGE]: Unaligned access when comparing ethernet addresses

Herbert Xu (1):
  [NET]: Get rid of alloc_skb_from_cache

Hugh Dickins (1):
  fix OOM killing processes wrongly thought MPOL_BIND

Ivan Kokshaysky (3):
  alpha: fixes for specific machine types
  alpha: more fixes for specific machine types
  alpha: build fixes - force architecture

Jan Yenya Kasprzak (1):
  Char: mxser_new, fix recursive locking

Jean Delvare (3):
  hwmon/w83627ehf: Fix the fan5 clock divider write
  i2c-pasemi: Depend on PPC_PASEMI again
  hwmon/w83627ehf: 

Re: menuconfig issue (checklist) in 2.6.20.7 & 2.6.21-rc7 ?

2007-04-25 Thread Mike Galbraith
On Wed, 2007-04-25 at 22:30 +0200, Sam Ravnborg wrote:

> > There are general funnies in the menuconfig world (my preference) here.
> > For instance, I recently had reason to change/test different default IO
> > schedulers, and found that no matter what I did, I couldn't select a
> > default IO scheduler any more, though I used to be able to do so.

> Tried it now with latest -git from Linus and here it works.
> Notice that you need to make the scheduler a built-in <*>
> before you can select it as default.
> A scheduler selected as a module <M> cannot be made default.

Ok, I guess my ncurses is ill.  (all built in)  Thanks.

-Mike



Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Linus Torvalds


On Thu, 26 Apr 2007, Nigel Cunningham wrote:
> 
> Sorry. I wasn't clear. I wasn't saying that suspend to ram has a
> snapshot point. I was trying to say it has a point where you're seeking
> to save information (PCI state / SCSI transaction number or whatever)
> that you'll need to get the hardware into the same state at a later
> stage. That (saving information) is the point of similarity.

Yes, they do both save information, but I'm not actually convinced they 
would necessarily even save the *same* information.

Let's just take an example of USB, and to make things more interesting, 
say that the disk you want to suspend to is itself over USB (not 
necessarily something you _want_ to do, but I think we can all agree that 
it's something that should potentially work, no?)

Now, USB devices actually have per-connection state (at a minimum, the 
"toggle" bit or whatever), and that's obviously something that will 
inevitably *change* as a result of the device being used after 
snapshotting (and even if not used, by the rediscovery by the first kernel 
to boot), and we fundamentally cannot put the final toggle state in the 
snapshot.

So in the snapshot-to-disk scenario, there are some pieces of data that 
simply fundamentally *cannot* be snapshotted, because they are not 
controller state, they are "connection" state.

So in that case, you basically know that you *have* to rebuild the 
connection when you do the "snapshot_resume()" thing. So there's no point 
in even keeping these kinds of connection states (the same is true of 
keyboards, mice, anything else - it's how USB works).

In contrast, in suspend-to-RAM, USB connections might just be things you 
actually want to keep open and active, and you *can* do so, in ways you 
simply cannot do with "snapshot to disk". In fact, if you are something 
like an OLPC and actually go to s2ram very aggressively, you might well 
want to keep the connection established, because it's conceivable that you 
might otherwise lose keypresses and the like.

See? There are real *technical* reasons to believe that the two "save 
state" operations are really fundamentally different. There are reasons to 
believe that a s2ram can actually happen while keeping some connections 
open that cannot be kept open over a disk snapshot.

Do they *have* to be different? Of course not. For many devices the "save" 
and "freeze" operations will likely all be no-ops, and there would be 
absolutely no difference between suspending and snapshotting, because the 
driver state already natively contains all the information needed to get 
the device going again.

Equally, I don't doubt that in many drivers you'll have very similar "save 
state" logic, but in fact I believe that in many cases that "save state" 
logic will often just be a simple

pci_save_state(dev);

call, so it's literally the case that they will not be just shared between 
the "suspend" and "snapshot" case, they'll be shared across all simple PCI 
devices too!

But that doesn't mean that the functions to do so should be the same. You 
might have

	static int mypcidevice_suspend(struct pci_dev *dev)
	{
		pci_save_state(dev);
		pci_set_power_state(dev, PCI_D3hot);
		return 0;
	}

	static int mypcidevice_snapshot(struct pci_dev *dev)
	{
		pci_save_state(dev);
		return 0;
	}

and who cares if they both have that same call to a shared "save state" 
function? They're still totally different operations, and the fact that 
*some* devices may save the same things doesn't make them any more 
similar! See above why some devices might save totally *different* things 
for a "snapshot" vs a "suspend" event.

> I suppose that's another point of similarity - for snapshotting, the
> same ordering is probably needed?

I agree that you're likely to walk the device list in the same order. The 
whole "shut down leaf devices first", "start up root devices first" is 
pretty fundamental.
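
The ordering rule can be illustrated with a small userspace sketch. It is only a model under stated assumptions: the device array is assumed to be sorted parents-first, and every name in it (demo_dev, demo_suspend_all, demo_resume_all) is hypothetical, not a kernel interface.

```c
#include <stddef.h>

/* Hypothetical device record for illustration; not a kernel structure. */
struct demo_dev {
	const char *name;
	int parent;     /* index of the parent in the array, -1 for a root */
	int suspended;  /* 1 once the device has been suspended */
};

/*
 * Assumption: the array is sorted parents-first, so every device
 * appears after its parent (dev[i].parent < i).
 */

/* Suspend leaves first by walking the parents-first array backwards. */
int demo_suspend_all(struct demo_dev *dev, size_t n)
{
	for (size_t i = n; i-- > 0; ) {
		int p = dev[i].parent;

		/* A parent must still be awake while its child suspends. */
		if (p >= 0 && dev[p].suspended)
			return -1;
		dev[i].suspended = 1;
	}
	return 0;
}

/* Resume roots first by walking the same array forwards. */
int demo_resume_all(struct demo_dev *dev, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int p = dev[i].parent;

		/* A child must not wake up before its parent. */
		if (p >= 0 && dev[p].suspended)
			return -1;
		dev[i].suspended = 0;
	}
	return 0;
}
```

With a three-entry array such as bus, host controller, disk, both walks succeed; if the array is not parents-first, the suspend walk returns -1 instead of silently suspending a parent before its child.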

But that's true of reboot and device discovery too. Should that ordering 
mean that we should use the "discovery()" function and pass it a flag and 
say "you shouldn't discover, you should snapshot or suspend now"? No. 
Everybody agrees that device discovery is something different from device 
suspend. The fact that it's done in a topological order and thus they bear 
some kind of inverse relationship to each other doesn't make them "the 
same".

> > And yes, the _individual_ "save-and-suspend" events obviously need to be 
> > "atomic", but it's purely about that particular individual device, so 
> > there's never any cross-device issues about that.
> 
> No interdependencies? I'm not sure.

Well, we pretty much count on it, since we will *suspend* the devices at 
the same time. So if they had interdependencies that aren't described by 
the ordering we enforce, they are pretty much screwed anyway ;)

So yes, the device list needs to be topologically 

Re: [PATCH 0/9] Kconfig: cleanup s390 v2.

2007-04-25 Thread Andrew Morton
On Wed, 25 Apr 2007 19:38:23 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> Dave Jones wrote:
> > On Wed, Apr 25, 2007 at 05:24:47PM -0700, Andrew Morton wrote:
> > 
> >  > It would be neat if someone could create and maintain a new
> >  > scripts/spot-common-mistakes.  Feed it a unified diff and it would complain
> >  > about newly-added code (and only newly-added code) which has busted
> >  > whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc.
> > 
> > years and years ago, when the dinosaurs roamed the land, I hacked up..
> > http://janitor.kernelnewbies.org/scripts/  and then left it by the wayside.
> > Some of the checks it did are actually bogus, but I'm happy to pick that
> > up again if there's interest in it being a useful tool.
> > 
> > In fact, I should probably munge it together with a similar thing
> > I wrote at http://www.codemonkey.org.uk/projects/findbugs/
> > (Warning: scary regexps)
> > 
> >  > It would need to be fairly simple and easily-extensible, as I can
> >  > imagine quite a few things getting added to it.
> >  > 
> >  > (Imagines a procmail rule which just bounces the email if
> >  > spot-common-mistakes failed)
> > 
> > or a git checkin rule that refuses to commit if it fails ;-)
> 
> Yep, I was going to mention your scripts but you beat me to it.
> 
> I'll be glad to help maintain such animals if wanted.
> 

wanted ;)

At least, it would be interesting to investigate the usefulness.  I suspect
it will prove to be very useful for the little things.

Heck, someone could subscribe a robot to all the mailing lists which sends
nastygrams straight back at people who submit broken patches.  We already
need that for tab-replaced and word-wrapped patches.  (ok, we have it -
it's called akpm, but being robotic wearies one)
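
As a rough idea of what one such check might look like, here is a minimal sketch that scans a unified diff and counts only newly-added lines ending in whitespace. It is not the proposed scripts/spot-common-mistakes; the function name demo_count_trailing_ws is made up for illustration, and it treats any line starting with "+++" as a file header, which is a known simplification.

```c
#include <string.h>

/*
 * Count added lines ('+' but not the "+++" file header) that end in a
 * space or tab. Removed and context lines are ignored on purpose, so
 * only newly-added code is complained about.
 *
 * Simplification: an added line whose content itself starts with "++"
 * is mistaken for a file header and skipped.
 */
int demo_count_trailing_ws(const char *diff)
{
	int hits = 0;

	while (*diff) {
		size_t len = strcspn(diff, "\n");   /* length of this line */

		if (len > 1 && diff[0] == '+' && strncmp(diff, "+++", 3) != 0) {
			char last = diff[len - 1];

			if (last == ' ' || last == '\t')
				hits++;
		}
		diff += len;
		if (*diff == '\n')
			diff++;                     /* step over the newline */
	}
	return hits;
}
```

Further checks (new semaphores, new kernel_thread calls) could be added as more per-line tests on the same '+' lines, which keeps the tool easy to extend.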


Re: Nonfunctional ethernet (was Re: 2.6.21-rc7-mm1 + sysfs-oops-workaround.patch -- INFO: possible recursive locking detected)

2007-04-25 Thread Andrew Morton
On Thu, 26 Apr 2007 11:26:36 +0900 Tejun Heo <[EMAIL PROTECTED]> wrote:

> Hello, Antonino, Andrew.
> 
> Andrew Morton wrote:
> > On Thu, 26 Apr 2007 09:02:02 +0800 "Antonino A. Daplas" <[EMAIL PROTECTED]> wrote:
> > 
> >> I can bring up the network manually using ifconfig.  It's opensuse's
> >> rcnetwork script that fails to bring the network up. Entries
> >> in /sys/class/net are still bogus.
> >>
> >> This kernel is now usable to me, I'll start bisection later today if
> >> nobody has an answer.
> > 
> > rc7-mm1 is hardly worth bothering with.  Quite a few really bad ones have
> > now been fixed and I'll try to get rc7-mm2 out within the next 12 hours (I
> > assume a 76-hour debug session won't be needed this time).
> > 
> > But I don't think the sysfs changes in Greg's tree have been updated, so
> > things will probably still fail in that area.  A suitable bisection
> > starting pair would be around gregkh-driver-*
> 
> This is the rename bug I wrote about in the other thread.

ok.

>  Can you hold -mm2 off a bit?  I'm almost done here.

sure.  I'm having much fun with all the obviously-won't-compile patches
which have been checked into various subsystem trees in the past 24 hours.

Please include simple instructions about which gregkh patches I should drop
when this new set comes in.


Re: [PATCH 0/9] Kconfig: cleanup s390 v2.

2007-04-25 Thread Randy Dunlap

Dave Jones wrote:
> On Wed, Apr 25, 2007 at 05:24:47PM -0700, Andrew Morton wrote:
> 
>  > It would be neat if someone could create and maintain a new
>  > scripts/spot-common-mistakes.  Feed it a unified diff and it would complain
>  > about newly-added code (and only newly-added code) which has busted
>  > whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc.
> 
> years and years ago, when the dinosaurs roamed the land, I hacked up..
> http://janitor.kernelnewbies.org/scripts/  and then left it by the wayside.
> Some of the checks it did are actually bogus, but I'm happy to pick that
> up again if there's interest in it being a useful tool.
> 
> In fact, I should probably munge it together with a similar thing
> I wrote at http://www.codemonkey.org.uk/projects/findbugs/
> (Warning: scary regexps)
> 
>  > It would need to be fairly simple and easily-extensible, as I can
>  > imagine quite a few things getting added to it.
>  > 
>  > (Imagines a procmail rule which just bounces the email if
>  > spot-common-mistakes failed)
> 
> or a git checkin rule that refuses to commit if it fails ;-)


Yep, I was going to mention your scripts but you beat me to it.

I'll be glad to help maintain such animals if wanted.

--
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Linus Torvalds


On Wed, 25 Apr 2007, H. Peter Anvin wrote:
> 
> That was the 1990s.  On a brand new server system:
> 
> 00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA
> Engine (rev b1)
> 
> For better or worse, slave DMA seems to be making a comeback of sorts.
> Not to mention all kinds of embedded crap^Whardware with optimized DMA
> engines which look nothing like PCI at all.

Well, the solution to that tends to be to just leave them be, and hold 
them on until the very end - and just ignore them (and just make-believe 
that it's actually the device itself that does the DMA transfer).

The PCI spec for controlling DMA is really pretty nasty. You can disable 
it in the PCI config word, of course, but that usually just messes up the 
device entirely.

So in practice, the way to shut up DMA (regardless of whether it's an 
internal DMA engine or an external one) is that you just tell the device 
not to listen any more (for example, for a network controller, the way to 
make sure it doesn't do DMA is just to make sure that you're not sending 
any frames, but also that it's not listening to any either)!

So whether it's internal to the device, or some "system DMA controller", 
the sequence for shutting down DMA always ends up being the same:

 - make sure the host itself doesn't generate any new traffic (eg shut 
   down the send-queue). This is generally a higher-level thing anyway, ie 
   not really a driver decision.
 - the driver needs to tell the hardware to stop listening (ie "stop 
   scanning the command mailboxes" or "stop walking USB command 
   structures" or "stop receiving data")
 - the driver then needs to wait for the controller to say "ok, I'm idle".

because regardless of whether it's the system DMA controller or some 
on-chip DMA controller, you generally can *not* just say "stop 
transferring DMA data", because that will generally just lock the chip up 
or cause other major unhappiness.
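
The three-step sequence above can be sketched as a small userspace model. This is an illustration under assumptions, not driver code: demo_ctrl, demo_poll_idle and demo_quiesce_dma are hypothetical names, and the "idle" poll simply retires one pending transfer per call in place of reading a real status register.

```c
#include <stdbool.h>

/* Hypothetical model of a DMA-capable controller; not a real driver API. */
struct demo_ctrl {
	bool tx_queue_open;   /* host may still queue new work */
	bool listening;       /* hardware still accepts commands/frames */
	int  inflight;        /* transfers the hardware has not finished yet */
};

/* Stand-in for polling a hardware "idle" bit: one transfer retires per poll. */
bool demo_poll_idle(struct demo_ctrl *c)
{
	if (c->inflight > 0)
		c->inflight--;
	return c->inflight == 0;
}

/*
 * Quiesce in the order described above. In-flight DMA is never aborted;
 * it is allowed to drain. Returns 0 once idle, -1 if max_polls runs out.
 */
int demo_quiesce_dma(struct demo_ctrl *c, int max_polls)
{
	c->tx_queue_open = false;        /* 1. no new traffic from the host */
	c->listening = false;            /* 2. hardware stops listening */
	while (max_polls-- > 0)          /* 3. wait for "ok, I'm idle" */
		if (demo_poll_idle(c))
			return 0;
	return -1;
}
```

The timeout return is a design choice of the sketch: a caller that gets -1 knows the controller never reported idle and can decide how to handle that, rather than cutting DMA off mid-transfer.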

So I don't think an external DMA controller (like the i8237, ugh!) really 
_changes_ anything, except for the horrible pain of serializing access to 
them for programming, and similar horrible resource-handling issues, of 
course (but that's not specific to suspend/resume).

Linus


Re: Nonfunctional ethernet (was Re: 2.6.21-rc7-mm1 + sysfs-oops-workaround.patch -- INFO: possible recursive locking detected)

2007-04-25 Thread Tejun Heo
Hello, Antonino, Andrew.

Andrew Morton wrote:
> On Thu, 26 Apr 2007 09:02:02 +0800 "Antonino A. Daplas" <[EMAIL PROTECTED]> wrote:
> 
>> I can bring up the network manually using ifconfig.  It's opensuse's
>> rcnetwork script that fails to bring the network up. Entries
>> in /sys/class/net are still bogus.
>>
>> This kernel is now usable to me, I'll start bisection later today if
>> nobody has an answer.
> 
> rc7-mm1 is hardly worth bothering with.  Quite a few really bad ones have
> now been fixed and I'll try to get rc7-mm2 out within the next 12 hours (I
> assume a 76-hour debug session won't be needed this time).
> 
> But I don't think the sysfs changes in Greg's tree have been updated, so
> things will probably still fail in that area.  A suitable bisection
> starting pair would be around gregkh-driver-*

This is the rename bug I wrote about in the other thread.  Can you hold
-mm2 off a bit?  I'm almost done here.

Thanks.

-- 
tejun


Re: [PATCH] sworks-agp: Switch to PCI ref counting APIs

2007-04-25 Thread Andrew Morton
On Mon, 23 Apr 2007 14:51:29 +0100 Alan Cox <[EMAIL PROTECTED]> wrote:

>  {
>   struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
>  
> + pci_dev_put(bridge->dev);
>   agp_remove_bridge(bridge);
>   agp_put_bridge(bridge);
> + pci_dev_put(serverworks_private.svrwrks_dev)
> + serverworks_private.svrwrks_dev = NULL;

err, guys?


Re: [PATCH] use mutex instead of semaphore in tty_io.c

2007-04-25 Thread Andrew Morton
On Wed, 25 Apr 2007 20:13:59 +0100 Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Wed, Apr 25, 2007 at 05:49:34PM +0200, Matthias Kaehlcke wrote:
> > drivers/char/tty_io.c uses a semaphore as mutex. use the mutex API
> > instead of the (binary) semaphore
> 
> This looks like it should be a spinlock:
> 
> > -   down(&allocated_ptys_lock);
> > +   mutex_lock(&allocated_ptys_lock);
> >     idr_remove(&allocated_ptys, idx);
> > -   up(&allocated_ptys_lock);
> > +   mutex_unlock(&allocated_ptys_lock);
> 
> idr_remove is a quick operation that doesn't sleep.
> 
> > @@ -2639,24 +2639,24 @@ static int ptmx_open(struct inode * inode, struct file * filp)
> >     nonseekable_open(inode, filp);
> >  
> >     /* find a device that is not in use. */
> > -   down(&allocated_ptys_lock);
> > +   mutex_lock(&allocated_ptys_lock);
> >     if (!idr_pre_get(&allocated_ptys, GFP_KERNEL)) {
> > -           up(&allocated_ptys_lock);
> 
> The idr_pre_get should be moved out of the lock, that's the whole
> point of its existence..
> 

I think having it inside the lock makes sense:

	mutex_lock()
	idr_pre_get()
	idr_get_new()
	mutex_unlock()

here, if idr_pre_get() succeeded, we know that idr_get_new() will succeed.

otoh:

try_again:
	idr_pre_get()
	mutex_lock()
	if (idr_get_new() == failed) {
		mutex_unlock()
		goto try_again;
	}
	mutex_unlock()

is not nice.


the IDR api is awful.  A little project is to rip out all its internal
locking and to implement caller-provided locking.

Unfortunately the fact that the library allocates memory means that we
might need to do awkward things like radix_tree_preload() to make it
reliable for callers who use spinlocking.



Re: [patch] CFS scheduler, -v6

2007-04-25 Thread Gene Heskett
On Wednesday 25 April 2007, Ingo Molnar wrote:
>i'm pleased to announce release -v6 of the CFS scheduler patchset. The
>main goal of CFS is to implement "high quality desktop scheduling" as
>well as technically possible.
>
>The CFS patch against v2.6.21-rc7 or against v2.6.20.7 can be downloaded
>from the usual place:
>
>http://redhat.com/~mingo/cfs-scheduler/
>
It hasn't made it to this server yet, and it's 22:14 EDT here.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Doing gets it done.


Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread H. Peter Anvin
Linus Torvalds wrote:
> 
> On Thu, 26 Apr 2007, Pavel Machek wrote:
>> Ok, I guess I'll have nightmares of DMA controllers doing DMAs from
>> chips that are no longer there tonight.
> 
> Umm. Welcome to the 21st century: we don't do that "separate DMA 
> controller" thing any more. All devices do their own DMA.
> 

That was the 1990s.  On a brand new server system:

00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA
Engine (rev b1)

For better or worse, slave DMA seems to be making a comeback of sorts.
Not to mention all kinds of embedded crap^Whardware with optimized DMA
engines which look nothing like PCI at all.

-hpa


Re: pcmcia - failed to initialize IDE interface

2007-04-25 Thread Andrew Morton
On Wed, 25 Apr 2007 15:27:26 +0200 "Aeschbacher, Fabrice" <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> [kernel 2.6.20.7, arch=mips, processor=amd au1550]
> 
> I'm trying to install a 2.6 kernel on an Alchemy au1550, and having
> problem with the pcmcia socket, where I plugged a CompactFlash card. The
> card seems to be recognized by the kernel, appears in
> /sys/bus/pcmcia/devices, but not in /proc/bus/pccard, and I can't access
> the device (/dev/hda).
> 
> The relevant console messages:
> 
> pccard: PCMCIA card inserted into slot 0
> pcmcia: registering new device pcmcia0.0
> hda: SanDisk SDCFB-64, CFA DISK drive
> ide0: Disabled unable to get IRQ 35.
> ide0: failed to initialize IDE interface
> ide0: I/O resource 0x10200E-0x10200E not free.
> ide0: ports already in use, skipping probe
> ide0: I/O resource 0x10200E-0x10200E not free.
> ide0: ports already in use, skipping probe
> ide0: I/O resource 0x10200E-0x10200E not free.
> ide0: ports already in use, skipping probe
> ide0: I/O resource 0x10200E-0x10200E not free.
> ide0: ports already in use, skipping probe
> ide0: I/O resource 0x10200E-0x10200E not free.
> ide0: ports already in use, skipping probe
> ide0: I/O resource 0x10200E-0x10200E not free.
> ide0: ports already in use, skipping probe
> ide0: I/O resource 0x10200E-0x10200E not free.
> ide0: ports already in use, skipping probe
> ide0: I/O resource 0x10200E-0x10200E not free.
> ide0: ports already in use, skipping probe
> ide0: I/O resource 0x10200E-0x10200E not free.
> ide0: ports already in use, skipping probe
> ide-cs: ide_register() at 0x102000 & 0x10200e, irq 35 failed
> 
> 
> Here is the relevant part of the kernel config:
> CONFIG_IDE=y
> CONFIG_IDE_GENERIC=y
> CONFIG_BLK_DEV_IDE=y
> CONFIG_BLK_DEV_IDECS=y
> CONFIG_PCCARD=y
> CONFIG_PCMCIA_DEBUG=y
> CONFIG_PCMCIA=y
> CONFIG_PCMCIA_AU1X00=y
> 

(cc'ed linux-mips)

Perhaps /proc/ioports will tell us where the conflict lies.

The output of `dmesg -s 100' might also be needed.


Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Linus Torvalds


On Thu, 26 Apr 2007, Nigel Cunningham wrote:
>
> That's where I think you're overstretching the argument. Like suspend 
> (to ram), we're concerned at the snapshot point with getting the hardware 
> in the same state at a later stage.

Really, no.

"suspend to ram" doesn't _have_ a "snapshot point".

I've tried to explain this multiple times, I don't know why it's not 
apparently sinking in. This is much more fundamental than the fact that 
you don't want to stop disks for snapshotting, although it really boils 
down to all the same issues: the operations are simply not at all the 
same!

I agree 100% that "snapshot to disk" is a "snapshot event". You have to 
create a single point in time when everything is stable. And I'd much 
rather call it "snapshot to disk" than "suspend to disk" to make it clear 
that it's something _totally_ different from "suspend".

Because the thing is, "suspend to ram" is *not* a snapshot event. At no 
point do you actually need to "snapshot" the system at all. You can just 
gradually shut more and more things down, and equally gradually bring them 
back up. There simply is *never* any "snapshot" time from a device 
standpoint, because you can just shut down devices in the right order AND 
YOU ARE DONE.

Really. 

[ Obviously s2ram does have one "magic moment", namely the time when the 
  CPU does the magic read from the northbridge that actually turns off 
  power for the CPU. But that's really a total non-event from a device 
  standpoint, so while it's undoubtedly a very interesting moment in the 
  suspend sequence, it's not really relevant in any way for device 
  drivers in general. Not at all like the "snapshot moment" that requires 
  the whole system to be totally quiescent in a "snapshot to disk"! ]

And the reason s2ram doesn't have that "snapshot" moment is exactly that 
the RAM contents are just always there, so there's no need to have a 
"synchronization event" when ram and devices match. The RAM will *always* 
match whatever any particular device has done to it, and the proper way to 
handle things is to just do a simple per-device "save-and-suspend" event.

And yes, the _individual_ "save-and-suspend" events obviously need to be 
"atomic", but it's purely about that particular individual device, so 
there's never any cross-device issues about that.

For example, if you're a USB hub controller, which is just about the most 
complex issue you can have, you obviously want to "save the state" with 
the controller in a STOPPED state, but that should just go without saying: 
if the controller isn't stopped, you simply *cannot* save the state, since 
the state is changing under you. 

The difference is, that the USB driver needs to just "stop, save, and 
suspend" as one simple operation for s2ram. In contrast, when doing 
snapshot to disk, it cannot do that, because while it does want to do the 
"stop" part, it needs to do so _separately_ from the "save" part because 
you need to stop everything else *too* before you "save" anything at all.
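
To make the difference concrete, here is a small userspace sketch of the two sequences. It is a model only: the pm_demo_* names and the single "register" per device are assumptions for illustration and do not correspond to any kernel API.

```c
#include <stddef.h>

/* Hypothetical per-device model; none of these names are kernel APIs. */
enum pm_demo_state { PM_DEMO_RUNNING, PM_DEMO_STOPPED, PM_DEMO_SUSPENDED };

struct pm_demo_dev {
	enum pm_demo_state state;
	int hw_reg;       /* stands in for live hardware state */
	int saved_reg;    /* copy taken by the "save" step */
};

void pm_demo_stop(struct pm_demo_dev *d)
{
	d->state = PM_DEMO_STOPPED;
}

/* Saving a running device is refused: its state would change under us. */
int pm_demo_save(struct pm_demo_dev *d)
{
	if (d->state == PM_DEMO_RUNNING)
		return -1;
	d->saved_reg = d->hw_reg;
	return 0;
}

void pm_demo_power_down(struct pm_demo_dev *d)
{
	d->state = PM_DEMO_SUSPENDED;
}

/* s2ram: each device does stop + save + suspend as one step, in turn. */
int pm_demo_s2ram(struct pm_demo_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		pm_demo_stop(&devs[i]);
		if (pm_demo_save(&devs[i]))
			return -1;
		pm_demo_power_down(&devs[i]);
	}
	return 0;
}

/* Snapshot: stop *everything* first, then save; devices stay powered. */
int pm_demo_snapshot(struct pm_demo_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		pm_demo_stop(&devs[i]);
	for (size_t i = 0; i < n; i++)
		if (pm_demo_save(&devs[i]))
			return -1;
	return 0;
}
```

In the s2ram loop other devices may still be running while one device saves its state; in the snapshot path no save happens until every device has been stopped, which is the separation the paragraph above describes.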

Linus


[BUG REPORT] 2.6.21-rc7 - Yukon-EC Ultra <-> sky2 driver bug(s)

2007-04-25 Thread speedy
Kernel: 2.6.21-rc7
Device: Yukon-EC Ultra (0xb4) rev 2 [integrated on Gigabyte GA-965P-DQ6]
OS: Ubuntu 7.04 (Feisty Fawn)

Description: 

The driver reports rx errors and drops carrier due to a HW error; an
rmmod/modprobe combo returns the carrier to a sane state. After that it works
(with rx errors) for a while, then OOPSes the kernel in a different way each
time, i.e. in ext3 routines, vma (traversal(?)) or, this time, in the
workqueue. Seemingly random memory corruption takes place.

I assume a kernel bug because of the recent git commits regarding sky2, the
fact that Windows XP works flawlessly, and the OOPS itself. Also, that box
recently compiled the kernel, so I regard it as stable.

The bug is easily reproducible (*sigh* too easy) and also occurs with the
Ubuntu default kernel, 2.6.20 with Ubuntu patches. After Linux boots and
crashes, the network card malfunctions even when dual-booted to Windows
(causing slowness and reboots). It takes a power-off/power-on cycle to bring
it back to a stable state.

Thanks in advance for all help; if you need more info, the .config, or
testing of any patches, let me know.

Cheers,
speedy over

ps. not subscribed to LKML, plz. keep me in CC:


DMESG output:

[   49.876701] ACPI: PCI Interrupt :03:00.0[A] -> GSI 16 (level, low) -> IRQ 16
[   49.876713] PCI: Setting latency timer of device :03:00.0 to 64
[   49.876735] sky2 :03:00.0: v1.13 addr 0xf900 irq 16 Yukon-EC Ultra (0xb4) rev 2
[   49.876887] PM: Adding info for No Bus:eth0
[   49.876939] sky2 eth0: addr 00:16:e6:d7:a6:ea
[   49.14] sky2 eth0: enabling interface
[   49.891741] sky2 eth0: ram buffer 0K

...

[   52.326154] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[   52.328083] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   53.394751] NET: Registered protocol family 17

...

[   94.584404] sky2 eth0: rx error, status 0x5ca0002 length 1482
[  101.984227] sky2 eth0: rx error, status 0x5ac0002 length 1452
[  102.216648] sky2 eth0: rx error, status 0x5cc0002 length 1484
[  103.182574] sky2 eth0: rx error, status 0x5ac0002 length 1452
[  103.604065] sky2 eth0: rx error, status 0x5ac0002 length 1452
[  103.697021] sky2 eth0: rx error, status 0x5ac0002 length 1452
[  104.244439] sky2 eth0: rx error, status 0x5ac0002 length 1452
[  105.038951] sky2 eth0: rx error, status 0x5ac0002 length 1452
[  105.374538] sky2 eth0: rx error, status 0x5ac0002 length 1452
[  106.878209] sky2 eth0: rx error, status 0x5ac0002 length 1452
[  107.328009] sky2 eth0: rx error, status 0x5ac0002 length 1452
[  107.381861] sky2 eth0: rx error, status 0x5ac0002 length 1452
[  111.996276] printk: 10 messages suppressed.
[  111.996282] sky2 eth0: rx error, status 0x5ac0002 length 1452
[  118.404802] printk: 12 messages suppressed.
[  118.404808] sky2 eth0: rx error, status 0x5ac0002 length 1452
[  174.080264] sky2 eth0: rx error, status 0x5ca0002 length 1482
[  174.095495] sky2 eth0: rx error, status 0x5ca0002 length 1482
[  174.102641] sky2 eth0: hw error interrupt status 0x8
[  174.102645] sky2 eth0: MAC parity error
[  174.181775] sky2 eth0: rx error, status 0x5ca0002 length 1482
[  176.979478] sky2 eth0: rx error, status 0x5ca0002 length 1482
[  177.244215] sky2 eth0: rx error, status 0x5ca0002 length 1482
[  177.617673] sky2 eth0: rx error, status 0x5ca0002 length 1482
[  177.692007] sky2 eth0: rx error, status 0x5ca0002 length 1482
[  178.214524] sky2 eth0: rx error, status 0x5ca0002 length 1482
[  179.230857] printk: 2 messages suppressed.
[  179.230863] sky2 eth0: rx error, status 0x5ca0002 length 1482
[  184.548409] printk: 9 messages suppressed.
[  184.548415] sky2 eth0: rx error, status 0x5ca0002 length 1482
[  189.247824] printk: 5 messages suppressed.
[  189.247830] sky2 eth0: rx error, status 0x5ca0002 length 1482
[  194.293119] printk: 9 messages suppressed.
[  194.293125] sky2 eth0: rx error, status 0x5ca0002 length 1482
[  196.015470] sky2 eth0: transmit descriptor error (hardware problem)
[  196.015561] sky2 eth0: Link is down.
[  199.212348] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[  199.212354] sky2 eth0: transmit descriptor error (hardware problem)
[  199.212485] sky2 eth0: Link is down.
[  201.858518] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[  201.858525] sky2 eth0: transmit descriptor error (hardware problem)
[  201.858657] sky2 eth0: Link is down.
[  204.644601] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[  204.644608] sky2 eth0: transmit descriptor error (hardware problem)
[  204.644739] sky2 eth0: Link is down.
[  207.396671] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[  207.396679] sky2 eth0: transmit descriptor error (hardware problem)
[  207.396811] sky2 eth0: Link is down.
[  210.131335] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[  210.131342] sky2 eth0: transmit descriptor error (hardware problem)
[  210.131472] sky2 eth0: Link is down.

# rmmod sky2
[  

Re: [PATCH 0/9] Kconfig: cleanup s390 v2.

2007-04-25 Thread Anton Vorontsov
On Thu, Apr 26, 2007 at 02:32:06AM +0200, Arnd Bergmann wrote:
> On Thursday 26 April 2007, Andrew Morton wrote:
> > It would be neat if someone could create and maintain a new
> > scripts/spot-common-mistakes.  Feed it a unified diff and it would complain
> > about newly-added code (and only newly-added code) which has busted
> > whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc.
> 
> http://patchstylecheck.googlecode.com/svn/trunk/patchstylecheckemail.pl
> Might serve as a starting point for this. It doesn't have any semantic
> checks right now, but I guess they can be added.

I ran this utility against my battery patches and caught a
bunch of false positives (I believe).


+#define BATTERY_PROP(bat, prop) ({ \
+   void *value = bat->get_property(bat, BATTERY_PROP_##prop); \
+   value ? *(int*)value : 0;  \
+})

Got: "Macros with multiple statements should be enclosed in a do - while
loop"

I believe ({}) serves the same purpose as "do - while" here, and it's
widely used in the kernel.
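
For comparison, a minimal sketch of the two macro styles, assuming a compiler that supports the GNU C statement-expression extension (gcc or clang). DEMO_PROP, DEMO_PROP_STMT and the two helper functions are hypothetical names and are not taken from the battery patches.

```c
#include <stddef.h>

/* GNU C statement expression: the last expression is the macro's value. */
#define DEMO_PROP(ptr) ({                \
	const int *value_ = (ptr);       \
	value_ ? *value_ : 0;            \
})

/*
 * do { } while (0) also groups several statements safely, but it yields
 * no value, so the result has to come back through an extra argument.
 */
#define DEMO_PROP_STMT(ptr, out) do {    \
	const int *value_ = (ptr);       \
	(out) = value_ ? *value_ : 0;    \
} while (0)

/* Uses the macro as an expression, which only the ({ }) form allows. */
int demo_read(const int *p)
{
	return DEMO_PROP(p);
}

/* Uses the statement form and a temporary to carry the result. */
int demo_read_stmt(const int *p)
{
	int v;

	DEMO_PROP_STMT(p, v);
	return v;
}
```

Both forms evaluate the pointer argument once; the practical difference is that the ({ }) form can be used where a value is needed, which a style checker that only knows about do - while would not recognize.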


+   switch (bp) {
+   default: break;
+   };

Got "Gotos should not be indented", at "default: break;"


+static int bind_pst_to_psy(struct power_supplicant *pst,
+   struct power_supply *psy)
+{

Got "use tabs not spaces". Here the spaces are used intentionally for
alignment, not for indentation.


Re: Sleep during spinlock in TPM driver

2007-04-25 Thread Andrew Morton
On Mon, 23 Apr 2007 08:14:03 -0400 (EDT) Parag Warudkar <[EMAIL PROTECTED]> wrote:

> --- linux-2.6-us/drivers/char/tpm/tpm.c   2007-04-21 14:55:03.134975360 -0400
> +++ linux-2.6-wk/drivers/char/tpm/tpm.c   2007-04-22 14:58:51.95763 -0400
> @@ -942,12 +942,12 @@
>  {
>   struct tpm_chip *chip = file->private_data;
>  
> + flush_scheduled_work();
>   spin_lock(&driver_lock);
>   file->private_data = NULL;
> - chip->num_opens--;
>   del_singleshot_timer_sync(&chip->user_read_timer);
> - flush_scheduled_work();
>   atomic_set(&chip->data_pending, 0);

btw, this driver has a timer handler which does:

static void user_reader_timeout(unsigned long ptr)
{
	struct tpm_chip *chip = (struct tpm_chip *) ptr;

	schedule_work(&chip->work);
}

which appears to duplicate schedule_delayed_work()'s functionality.


Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Antonino A. Daplas
On Wed, 2007-04-25 at 21:25 +0200, Adrian Bunk wrote:
> On Wed, Apr 25, 2007 at 11:50:45AM -0700, Linus Torvalds wrote:
> > 
> > 
> > On Wed, 25 Apr 2007, Adrian Bunk wrote:
> > > 
> > > 3W for the complete system? In CPU state S1? [1]
> > 
> > In STR, 3W is quite realistic. The CPU is off, all (or most - up to you) 
> > the devices are off, but the motherboard and memory is powered.
> 
> As far as I understand it, the CPU isn't off in S1.
> 
> > > And even 3W would still be a waste of energy.

It is, especially if you're living in a place where the power infrastructure
is unreliable (such as where I live). Currently, because of the summer
heat, power demand exceeds supply, so we experience rotating 4-hour power
interruptions practically every day.

That 3W saved, multiplied by the total number of computers, is a lot.
From this perspective, S2D (or shutdown) is preferred over S2RAM.

Tony




Re: MODULE_MAINTAINER

2007-04-25 Thread Andrew Morton
On Mon, 23 Apr 2007 14:32:36 +0200 Rene Herman <[EMAIL PROTECTED]> wrote:

> Provide MODULE_MAINTAINER() as a convenient place to stick a name and email 
> address both for drivers having multiple (current and non-current) authors 
> and for when someone who wants to maintain a driver isn't so much an author.
> 
> Signed-off-by: Rene Herman <[EMAIL PROTECTED]>
> ===
> 
> Rene.
> 
> 
> 
> [module_maintainer2.diff  text/plain (604B)]
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 10f771a..3c54774 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -128,6 +128,10 @@ extern struct module __this_module;
>  /* Author, ideally of form NAME [, NAME ]*[ and NAME ] 
> */
>  #define MODULE_AUTHOR(_author) MODULE_INFO(author, _author)
>
> +/* Maintainer, ideally of form NAME  */
> +#define MODULE_MAINTAINER(_maintainer) \
> + MODULE_AUTHOR("(Maintained by) "_maintainer)
> +

I'm not sure we want to do this - that's what ./MAINTAINERS is for and we
end up having to maintain the same info in two places.

I actually use git-whatchanged if I'm unsure who to blame^Wask for help on
a particular piece of code.

An easy way of doing this is to go to
http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=tree
then drill down to the file and hit the "history" link.  That will tell you who
is *really* doing work on the particular code.


Re: [00/17] Large Blocksize Support V3

2007-04-25 Thread David Chinner
On Thu, Apr 26, 2007 at 11:14:49AM +1000, David Chinner wrote:
> On Wed, Apr 25, 2007 at 03:46:19PM -0700, Badari Pulavarty wrote:
> > On Tue, 2007-04-24 at 15:21 -0700, [EMAIL PROTECTED] wrote:
> > > V2->V3
> > 
> > Hmm.. It broke ext2 :(
> > 
> > V2 worked fine with the small fix I sent you earlier.
> > But on V3, I can't run fsx. I see random data showing up.
> > I will debug, when I get a chance.
> 
> Same thing on XFS - 'fsx -d -S 42 -R -W foobar' fails on
> the tenth operation

Hmm - even normal block size filesystems (ext3) are reading bogus
data (e.g. /etc/motd).

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFC][PATCH] on-demand readahead

2007-04-25 Thread Fengguang Wu
On Wed, Apr 25, 2007 at 06:08:44PM +0200, Andi Kleen wrote:
> > Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache 
> > files.
> 
> How?

In filemap.c:
	if (!page) {
		page_cache_readahead_adaptive(mapping,
				&ra, filp, page,
				index, last_index - index);
		page = find_get_page(mapping, index);
	}
	if (page && PageReadahead(page)) {
		page_cache_readahead_adaptive(mapping,
				&ra, filp, page,
				index, last_index - index);
	}

Cache-hot files have neither missing pages (!page) nor lookahead
pages (PageReadahead(page)), so the readahead routine is not even called.

> > > You seem to have a lot of magic numbers. They probably all need symbols 
> > > and 
> > > explanations.
> > 
> > The magic numbers are for easier testings, and will be removed in
> > future.  For now, they enables convenient comparing of the two
> > algorithms in one kernel.
> 
> I mean the 16 and 4 not the sysctl

The numbers and the code in get_next_ra_size2() are simply copied from
get_next_ra_size():

	if (cur < max / 16) {
		newsize = 4 * cur;
	} else {
		newsize = 2 * cur;
	}

It's a trick to ramp up small sizes more quickly.
That trick is documented in the related get_init_ra_size().
So, it would be better to put the two routines together to make it clear.
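
For anyone following along, here is a rough userspace sketch of that ramp-up
rule. It is only an illustration: the name next_ra_size() and the final clamp
to max are assumptions made for this sketch, not a copy of the kernel routine.
Sizes are in pages.

```c
#include <assert.h>

/* Hypothetical userspace model of the ramp-up rule quoted above.
 * cur and max are window sizes in pages. The clamp to max is an
 * assumption added so the sketch never exceeds the limit. */
static unsigned long next_ra_size(unsigned long cur, unsigned long max)
{
	unsigned long newsize;

	if (cur < max / 16)
		newsize = 4 * cur;	/* small window: quadruple */
	else
		newsize = 2 * cur;	/* otherwise: double */

	return newsize < max ? newsize : max;
}
```

With max = 32 pages, a 1-page window would grow 1, 4, 8, 16, 32 and then
stay at 32.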

> > 
> > If this new algorithm has been further tested and approved, I'll
> > re-submit the patch in a cleaner, standalone form. The adaptive
> > readahead patches can be dropped then. They may better be reworked as
> > a kernel module.
> 
> If they actually help and don't cause regressions they shouldn't be a module, 
> but integrated eventually Just it has to be all step by step.

Yeah, adaptive readahead is complex and the possible workloads are diverse.
It has become obvious that there is a long way to go, and a kernel module makes
life easier.

> > > Your white space also needs some work.
> > 
> > White space in patch description?
> 
> In the code indentation.

Ah, got it: a silly copy/paste mistake.

Thank you,
Wu



Re: [00/17] Large Blocksize Support V3

2007-04-25 Thread David Chinner
On Wed, Apr 25, 2007 at 03:46:19PM -0700, Badari Pulavarty wrote:
> On Tue, 2007-04-24 at 15:21 -0700, [EMAIL PROTECTED] wrote:
> > V2->V3
> 
> Hmm.. It broke ext2 :(
> 
> V2 worked fine with the small fix I sent you earlier.
> But on V3, I can't run fsx. I see random data showing up.
> I will debug, when I get a chance.

Same thing on XFS - 'fsx -d -S 42 -R -W foobar' fails on
the tenth operation

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: rsdl v46 report,numbers,comments

2007-04-25 Thread Con Kolivas
On Wednesday 25 April 2007 04:26, Mike Mattie wrote:
> Hello,
>
> 0. intro
>
> I am very happy to report that v46 of RSDL subjectively is much better than
> v42. As you (Con Kolivas) might remember from a previous mail I was
> experimenting with using nice levels effectively. I have refined these
> levels to this layout:
>
> -2  : clock (ntpd)
> -1  : syslog,sshd,X
> 0   : command; default for shells
> 1   : audacious (audio), xfce window manager (with compositor on )
> 2   : emacs (SCHED_OTHER), desktop/window manager infrastructure (dbus),
>       ssh-agent, bind (batch scheduled)
> 3   : desktop applications (mail, xchat, openoffice)
> 5   : spamd, batch scheduled compiles/test-suites.
> 10  : cron jobs
>
> 1. Some numbers
>
> My machine is a particularly tough case I think. A uni-processor Athlon XP
> 3000+ (involuntary pre-empt) with a software RAID5 on PATA drives. I load
> it heavily with compiles/test-suites, and I am very sensitive to audio
> glitches.
>
> here are some stats for idle:
>
> ---load-avg--- --memory-usage- total-cpu-usage interrupts--- ---system--
> _1m_ _5m_ 15m_|_used _buff _cach _free|usr sys idl wai hiq siq|__17_ __18_ __20_|_int_ _csw_
>  0.2  0.2  0.2| 170M   15M  309M 6560k|  2   1  94   4   0   0|    1     7   150 | 238   208
>  0.2  0.2  0.2| 170M   15M  309M 6568k|  1   0  99   0   0   0|    0     0     0 |  76    55
>  0.2  0.2  0.2| 170M   15M  309M 6568k|  0   1  99   0   0   0|    0     0     0 |  75    47
>  0.2  0.2  0.2| 170M   15M  309M 6624k|  4   0  96   0   0   0|    0     0     0 |  75    37
>  0.2  0.2  0.2| 170M   15M  309M 6624k|  1   0  99   0   0   0|    0     0     0 |  75    36
>
> here are some stats for music playing:
>
> ---load-avg--- --memory-usage- total-cpu-usage interrupts--- ---system--
> _1m_ _5m_ 15m_|_used _buff _cach _free|usr sys idl wai hiq siq|__17_ __18_ __20_|_int_ _csw_
>  0.9  0.4  0.2| 175M   15M  305M 5652k|  2   1  94   4   0   0|    1     7   150 | 238   210
>  0.9  0.4  0.2| 175M   15M  305M 5652k| 10   1  89   0   0   0|    0     3   989 |1068  1510
>  0.9  0.4  0.2| 175M   15M  305M 5592k| 13   0  87   0   0   0|    0     3  1013 |1093  1565
>  0.9  0.4  0.2| 175M   15M  304M 6300k| 11   1  88   0   0   0|    0     3  1000 |1078  1496
>  0.9  0.4  0.2| 175M   15M  305M 6300k| 13   0  87   0   0   0|    0     3  1006 |1084  1509
>  0.8  0.4  0.2| 175M   15M  305M 6180k| 13   1  86   0   0   0|    0     3  1000 |1078  1524
>  0.8  0.4  0.2| 175M   15M  305M 6060k| 12   1  87   0   0   0|    0     3  1000 |1078  1564
>
> The context switches are high, but so are the interrupts (USB 2.0 Audigy
> NX)
>
> To see how effective using these nice levels were I decided to play with
> rr_interval, on the theory that with priorities strictly enforced and used
> aggressively that a longer time-slice would not cause audio delay. So far
> that theory is holding. All of these numbers are with rr_interval = 20, and
> I have fewer audio problems than with any previous kernel/tuning setup.
>
> That is very impressive.
>
> as far as batch loading goes I tried a kernel compile. These numbers look
> nice for RSDL but there are some caveats:
>
> kernel compile, CFS v3                    : make  756.83s user 89.37s system 58% cpu 24:08.21 total
> kernel compile, v46 rr_interval = default : make  754.66s user 89.74s system 59% cpu 23:35.38 total
> kernel compile, v46 rr_interval = 20      : make  682.83s user 84.34s system 73% cpu 17:29.57 total
>
> 1. The system was noisy. I did this intentionally. My typical load is a
> mixture of desktop/compile. All three numbers were generated while
> listening to music, reading docs/web/news, using emacs etc. with each of
> the compiles I tried running a visualization plugin (ProjectM inside
> audacious ) for a minute or so.
>
>This skews the numbers for comparison , but I was looking for an
> impression that was based off a *real* work-load.
>
>I would like to add as well that before RSDL the mainline scheduler
> failed completely at running ProjectM even when it was the only application
> on the desktop. ( It stalled for seconds with a rock steady period ).
>
> 2. All of these ran nice 5 sched: BATCH
>
> 3. I have the xfce compositor turned on, using the transparency.
>
> 4. compiled on software RAID 5 (md) -> dev mapper -> lvm2 -> ext3 , 4
> drives, write-cache disabled, external 512 mg flash drive for a external
> journal , commit=15, journal=data
>
> From the caveats above , especially the deep stack for the block layer,
> plus meeting audio deadlines while sharing a interrupt with the journal
> drive (arghh) this is very impressive system behavior for me.
>
> Here is the stats for doing a kernel compile with audacious running, plus
> mail,editor etc.
>
> ---load-avg--- --memory-usage- total-cpu-usage interrupts--- ---system--
> _1m_ _5m_ 15m_|_used _buff _cach _free|usr sys idl wai hiq siq|__17_ __18_ __20_|_int_ _csw_
> 1.31  0.8| 198M   22M
>  

Re: Nonfunctional ethernet (was Re: 2.6.21-rc7-mm1 + sysfs-oops-workaround.patch -- INFO: possible recursive locking detected)

2007-04-25 Thread Andrew Morton
On Thu, 26 Apr 2007 09:02:02 +0800 "Antonino A. Daplas" <[EMAIL PROTECTED]> 
wrote:

> I can bring up the network manually using ifconfig.  It's opensuse's
> rcnetwork script that fails to bring the network up. Entries
> in /sys/class/net are still bogus.
> 
> This kernel is now usable to me, I'll start bisection later today if
> nobody has an answer.

rc7-mm1 is hardly worth bothering with.  Quite a few really bad ones have
now been fixed and I'll try to get rc7-mm2 out within the next 12 hours (I
assume a 76-hour debug session won't be needed this time).

But I don't think the sysfs changes in Greg's tree have been updated, so
things will probably still fail in that area.  A suitable bisection
starting pair would be around gregkh-driver-*




Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Linus Torvalds


On Thu, 26 Apr 2007, Alan Cox wrote:
> 
> You bet there is. We need to know if data arrived or not, because there
> is no guarantee that the data retrieved if we inadvertently re-execute a
> command will be the same. The hardware state itself isn't the problem,
> its the combination of hardware state and internal state which need to
> match in some cases.

... which is why "suspend()" suspends the hardware.

Is that so hard to understand?

Once the hardware is suspended, it's not doing anything.

But STR doesn't have any need for atomicity guarantees _between_devices_.

That's a really *fundamental* difference. 

The reason s2ram is *so* different from snapshot-to-disk is exactly the 
fact that s2ram can (and does) work on one device at a time. 

In contrast, snapshot-to-disk needs to snapshot all the devices 
*together*, since it has a separate disk image.

See? Two *totally* different cases. They have *nothing* in common. Not the 
call sequence, not the logic, not *anything*.

Linus


Re: [ck] [REPORT] cfs-v5 vs sd-0.46

2007-04-25 Thread Con Kolivas
On Tuesday 24 April 2007 17:37, Michael Gerdau wrote:
> Hi list,
>
> with cfs-v5 finally booting on my machine I have run my daily
> numbercrunching jobs on both cfs-v5 and sd-0.46, 2.6.21-v7 on
> top of a stock openSUSE 10.2 (X86_64).

Thanks for testing.

> Both cfs and sd showed very similar behavior when monitored in top.
> I'll show more or less representative excerpt from a 10 minutes
> log, delay 3sec.
>
> sd-0.46
> top - 00:14:24 up  1:17,  9 users,  load average: 4.79, 4.95, 4.80
> Tasks:   3 total,   3 running,   0 sleeping,   0 stopped,   0 zombie
> Cpu(s): 99.8%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.2%hi,  0.0%si,  0.0%st
> Mem:   3348628k total,  1648560k used,  1700068k free,    64392k buffers
> Swap:  2097144k total,        0k used,  2097144k free,   828204k cached
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  6671 mgd   33   0 95508  22m 3652 R  100  0.7  44:28.11 perl
>  6669 mgd   31   0 95176  22m 3652 R   50  0.7  43:50.02 perl
>  6674 mgd   31   0 95368  22m 3652 R   50  0.7  47:55.29 perl
>
> cfs-v5
> top - 08:07:50 up 21 min,  9 users,  load average: 4.13, 4.16, 3.23
> Tasks:   3 total,   3 running,   0 sleeping,   0 stopped,   0 zombie
> Cpu(s): 99.5%us,  0.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Mem:   3348624k total,  1193500k used,  2155124k free,    32516k buffers
> Swap:  2097144k total,        0k used,  2097144k free,   545568k cached
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  6357 mgd   20   0 92024  19m 3652 R  100  0.6   8:54.21 perl
>  6356 mgd   20   0 91652  18m 3652 R   50  0.6  10:35.52 perl
>  6359 mgd   20   0 91700  18m 3652 R   50  0.6   8:47.32 perl
>
> What did surprise me is that cpu utilization had been spread 100/50/50
> (round robin) most of the time. I did expect 66/66/66 or so.

You have 3 tasks and only 2 cpus. The %cpu figure is the share of the one cpu 
the task is currently running on; it is not the percentage of 
the "overall cpu available on the machine". Since you have 3 tasks and 2 
cpus, the extra task will always be on one or the other cpu, taking half of 
that cpu, but never on both cpus.
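
To put numbers on that, here is a small sketch of the arithmetic. It assumes 
fully cpu bound tasks that do not migrate within one sampling interval; the 
helper names are made up for illustration.

```c
#include <assert.h>

/* Per-task %CPU as top would show it for one sample, assuming
 * tasks_on_cpu cpu-bound tasks share a single cpu equally. */
static double pct_on_one_cpu(int tasks_on_cpu)
{
	return 100.0 / tasks_on_cpu;
}

/* Long-run share per task if the scheduler is fair overall,
 * capped at 100 because one task can use at most one cpu. */
static double fair_share_pct(int ntasks, int ncpus)
{
	double share = 100.0 * ncpus / ntasks;

	return share > 100.0 ? 100.0 : share;
}
```

So a 100/50/50 snapshot and roughly 66.7% each over the long run are 
compatible, provided the odd task moves between cpus over time.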

> What I also don't understand is the difference in load average, sd
> constantly had higher values, the above figures are representative
> for the whole log. I don't know which is better though.

There isn't much useful to say about the load average in isolation. It may be 
meaningful or not depending on whether it just shows the timing of when the 
cpu load is determined, or whether there is more time waiting in runqueues. 
Only throughput measurements can really tell them apart.

What is important is that if all three tasks are fully cpu bound and started 
at the same time at the same nice level, that they all receive close to the 
same total cpu time overall showing some fairness is working as well. This 
should be the case no matter how many cpus you have.

-- 
-ck


Re: [PATCH 0/9] Kconfig: cleanup s390 v2.

2007-04-25 Thread Andrew Morton
On Thu, 26 Apr 2007 02:32:06 +0200 Arnd Bergmann <[EMAIL PROTECTED]> wrote:

> On Thursday 26 April 2007, Andrew Morton wrote:
> > It would be neat if someone could create and maintain a new
> > scripts/spot-common-mistakes.  Feed it a unified diff and it would complain
> > about newly-added code (and only newly-added code) which has busted
> > whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc.
> 
> http://patchstylecheck.googlecode.com/svn/trunk/patchstylecheckemail.pl
> Might serve as a starting point for this. It doesn't have any semantic
> checks right now, but I guess they can be added.
> 

print "Your patch is now worthy to be reviewed by a real person\n";

heh.  Yes, that looks like an ideal starting point.

Methinks it should do `exit 1' if anything was detected.
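
As a rough illustration of the "only newly-added code" idea plus a nonzero
exit status, something along these lines might do. This is a hypothetical
sketch in C, not the Perl script linked above; both function names are
invented here, and it only checks for trailing whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count newly added lines ("+" prefix, skipping the "+++" file header)
 * that end in a space or tab. Hypothetical helper, not part of any
 * existing script. An added line whose own text starts with "++" is
 * skipped too, which a real tool would need to handle. */
static int count_trailing_ws(const char *diff)
{
	int bad = 0;

	while (*diff) {
		const char *eol = strchr(diff, '\n');
		size_t len = eol ? (size_t)(eol - diff) : strlen(diff);

		if (len > 1 && diff[0] == '+' && strncmp(diff, "+++", 3) != 0 &&
		    (diff[len - 1] == ' ' || diff[len - 1] == '\t'))
			bad++;

		diff += len;
		if (*diff == '\n')
			diff++;
	}
	return bad;
}

/* Exit status in the spirit of the suggestion: nonzero if anything was found. */
static int checker_exit_status(const char *diff)
{
	return count_trailing_ws(diff) ? 1 : 0;
}
```

Removed and context lines are ignored on purpose, so existing whitespace
damage in a file does not get blamed on the patch.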


Re: Nonfunctional ethernet (was Re: 2.6.21-rc7-mm1 + sysfs-oops-workaround.patch -- INFO: possible recursive locking detected)

2007-04-25 Thread Antonino A. Daplas
On Thu, 2007-04-26 at 07:45 +0800, Antonino A. Daplas wrote:
> On Wed, 2007-04-25 at 22:48 +0800, Antonino A. Daplas wrote:
> > On Wed, 2007-04-25 at 14:18 +0900, Tejun Heo wrote:
> > > Miles Lane wrote:
> 
> > eth0 renamed to eth54
> > BUG: atomic counter underflow at:
> >  [] show_trace_log_lvl+0x1a/0x30
> >  [] show_trace+0x12/0x14
> >  [] dump_stack+0x16/0x18
> >  [] _atomic_dec_and_lock+0x29/0x4c
> >  [] dput+0x34/0x103
> >  [] sysfs_drop_dentry+0x141/0x149
> >  [] sysfs_hash_and_remove+0x89/0x10e
> >  [] sysfs_remove_link+0xe/0x10
> >  [] device_rename+0x110/0x181
> >  [] dev_change_name+0x11e/0x1ca
> >  [] dev_ifsioc+0x330/0x3d7
> >  [] dev_ioctl+0x350/0x46e
> >  [] sock_ioctl+0x1be/0x1ca
> >  [] do_ioctl+0x1c/0x53
> >  [] vfs_ioctl+0x1ec/0x203
> >  [] sys_ioctl+0x49/0x62
> >  [] sysenter_past_esp+0x5f/0x99
> >  ===
> 
> The above tracing was caused by CONFIG_SYSFS_DEPRECATED=y and by setting
> this to n, the tracing disappeared..  Still, all my network cards are
> non-functional.  Entries in /sys/class/net are bogus:
> 
> / # cd /sys/class/net/
> /sys/class/net # ls
> eth1  eth44  eth54  lo
> 
> /sys/class/net # cd eth1
> -bash: cd: eth1: No such file or directory
> 
> /sys/class/net # ls -l eth1
> lrwxrwxrwx 1 root root 0 Apr 26 07:15 eth1 ->
> ../../devices/pci:00/:00:12.0/net/eth0
> 
> /sys/class/net # cd ../../devices/pci\:00/\:00\:12.0/net/eth0
> -bash: cd: ../../devices/pci:00/:00:12.0/net/eth0: No such file
> or directory
> 
> Do you know of any patches I need to revert/apply?  Anyway, I have to
> boot back to this kernel and find out more what's going on.
> 

More info.

I can bring up the network manually using ifconfig.  It's opensuse's
rcnetwork script that fails to bring the network up. Entries
in /sys/class/net are still bogus.

This kernel is now usable to me, I'll start bisection later today if
nobody has an answer.

Tony 



SD renice recommendation was: Re: [REPORT] cfs-v4 vs sd-0.44

2007-04-25 Thread Con Kolivas
On Tuesday 24 April 2007 16:36, Ingo Molnar wrote:

> So, my point is, the nice level of X for desktop users should not be set
> lower than a low limit suggested by that particular scheduler's author.
> That limit is scheduler-specific. Con i think recommends a nice level of
> -1 for X when using SD [Con, can you confirm?], while my tests show that
> if you want you can go as low as -10 under CFS, without any bad
> side-effects. (-19 was a bit too much)

Nice 0 as a default for X, but if renicing, nice -10 as the lower limit for X 
on SD. The reason for that on SD is that the priority of freshly woken up 
tasks (ie not fully cpu bound) for both nice 0 and nice -10 will still be the 
same at PRIO 1 (see the prio_matrix). Therefore, there will _not_ be 
preemption of the nice 0 task and a context switch _unless_ it is already cpu 
bound and has consumed a certain number of cycles and has been demoted. 
Contrary to popular belief, it is not universal that a less niced task will 
preempt its more niced counterpart and depends entirely on implementation of 
nice. Yes it is true that context switch rate will go up with a reniced X 
because the conditions that lead to preemption are more likely to be met, but 
it is definitely not every single wakeup of the reniced X.

Alas, again, I am forced to spend as little time as possible at the pc for my 
health, so expect _very few_ responses via email from me. Luckily SD is in 
pretty fine shape with version 0.46.

-- 
-ck


Re: [PATCH 10/28] i386: map enough initial memory to create lowmem mappings

2007-04-25 Thread Jeremy Fitzhardinge
Chris Wright wrote:
> I was using real hardware with your .config when I reproduced it.
>   

Yes, I first found it on real hardware. I haven't tested my fix on real
hardware yet, but it seems OK on kvm.

J


Re: [2/2] Driver for the Maxim DS1WM, a 1-wire bus master ASIC core.

2007-04-25 Thread Andrew Morton
On Tue, 24 Apr 2007 14:02:03 +0400 Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> +#define DS1WM_CMD_1W_RESET  1 << 0   /* force reset on 1-wire bus */
> +#define DS1WM_CMD_SRA       1 << 1   /* enable Search ROM accelerator mode */
> +#define DS1WM_CMD_DQ_OUTPUT 1 << 2   /* write only - forces bus low */
> +#define DS1WM_CMD_DQ_INPUT  1 << 3   /* read only - reflects state of bus */
> +
> +#define DS1WM_INT_PD        1 << 0   /* presence detect */
> +#define DS1WM_INT_PDR       1 << 1   /* presence detect result */
> +#define DS1WM_INT_TBE       1 << 2   /* tx buffer empty */
> +#define DS1WM_INT_TSRE      1 << 3   /* tx shift register empty */
> +#define DS1WM_INT_RBF       1 << 4   /* rx buffer full */
> +#define DS1WM_INT_RSRF      1 << 5   /* rx shift register full */
> +
> +#define DS1WM_INTEN_EPD     1 << 0   /* enable presence detect int */
> +#define DS1WM_INTEN_IAS     1 << 1   /* INTR active state */
> +#define DS1WM_INTEN_ETBE    1 << 2   /* enable tx buffer empty int */
> +#define DS1WM_INTEN_ETMT    1 << 3   /* enable tx shift register empty int */
> +#define DS1WM_INTEN_ERBF    1 << 4   /* enable rx buffer full int */
> +#define DS1WM_INTEN_ERSRF   1 << 5   /* enable rx shift register full int */
> +#define DS1WM_INTEN_DQO     1 << 6   /* enable direct bus driving ops
> +                                        (undocumented), Szabolcs Gyurko */

These macros are very dangerous - please parenthesise them all.
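
A small userspace sketch of what can go wrong, in case it helps. FLAG_BAD and
FLAG_GOOD are made-up names for this example, not the driver's macros; the
point is only that an unparenthesised `1 << n` gets regrouped by whatever
operator sits next to it.

```c
#include <assert.h>

#define FLAG_BAD   1 << 2	/* unparenthesised, same shape as in the patch */
#define FLAG_GOOD  (1 << 2)	/* parenthesised */

/* x / 1 << 2 parses as (x / 1) << 2, because '/' binds tighter than '<<'. */
static int divide_by_bad(int x)
{
	return x / FLAG_BAD;
}

/* x / (1 << 2) is the intended x / 4. */
static int divide_by_good(int x)
{
	return x / FLAG_GOOD;
}
```

With x = 16 the first version yields 64 and the second yields 4. Plain
`a & FLAG` and `a | FLAG` happen to work either way, which is why this sort
of thing can sit unnoticed for a long time.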

> +
> +struct ds1wm_data {
> +	void		*map;
> +	int		bus_shift; /* # of shifts to calc register offsets */
> +	struct platform_device *pdev;
> +	struct ds1wm_platform_data *pdata;
> +	int		irq;
> +	struct clk	*clk;
> +	int		slave_present;
> +	void		*reset_complete;
> +	void		*read_complete;
> +	void		*write_complete;
> +	u8		read_byte; /* last byte received */
> +};
> +
> +static inline void ds1wm_write_register(struct ds1wm_data *ds1wm_data, u32 reg,
> +					u8 val)
> +{
> +	__raw_writeb(val, ds1wm_data->map + (reg << ds1wm_data->bus_shift));
> +}
> +
> +static inline u8 ds1wm_read_register(struct ds1wm_data *ds1wm_data, u32 reg)
> +{
> +	return __raw_readb(ds1wm_data->map + (reg << ds1wm_data->bus_shift));
> +}
> +
> +
> +static irqreturn_t ds1wm_isr(int isr, void *data)
> +{
> + struct ds1wm_data *ds1wm_data = data;
> + u8 intr = ds1wm_read_register(ds1wm_data, DS1WM_INT);
> +
> + ds1wm_data->slave_present = intr & DS1WM_INT_PDR ? 0 : 1;

Normally we'd parenthesise an expression like this so people don't have to
go scrambling for the C precedence table.
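
For illustration, the two spellings side by side. INT_PDR and the function
names are stand-ins invented for this sketch; it assumes a parenthesised
flag macro.

```c
#include <assert.h>

#define INT_PDR (1 << 1)	/* hypothetical flag bit for this sketch */

/* As written in the patch: relies on '&' binding tighter than '?:'. */
static int slave_present_terse(unsigned char intr)
{
	return intr & INT_PDR ? 0 : 1;
}

/* Same logic with the grouping spelled out. */
static int slave_present_explicit(unsigned char intr)
{
	return (intr & INT_PDR) ? 0 : 1;
}
```

Both compile to the same thing; the second one just doesn't make the reader
stop and check.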


> + if (intr & DS1WM_INT_PD && ds1wm_data->reset_complete)
> + complete(ds1wm_data->reset_complete);

Ditto (lots of instances of this in this patch)

> + if (intr & DS1WM_INT_RBF) {
> + ds1wm_data->read_byte = ds1wm_read_register(ds1wm_data,
> + DS1WM_DATA);
> + if (ds1wm_data->read_complete)
> + complete(ds1wm_data->read_complete);
> + }
> +
> + if (intr & DS1WM_INT_TSRE && ds1wm_data->write_complete)
> + complete(ds1wm_data->write_complete);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static int ds1wm_reset(struct ds1wm_data *ds1wm_data)
> +{
> + unsigned long timeleft;
> + DECLARE_COMPLETION(reset_done);

This will cause lockdep warnings.

- Convert to DECLARE_COMPLETION_ONSTACK

- Test the code using lockdep!  This is covered in
  Documentation/SubmitChecklist, which has many other useful tips.

> +	ds1wm_data->reset_complete = &reset_done;
> +
> +	ds1wm_write_register(ds1wm_data, DS1WM_INT_EN, DS1WM_INTEN_EPD |
> +		(ds1wm_data->pdata->active_high ? DS1WM_INTEN_IAS : 0));
> +
> +	ds1wm_write_register(ds1wm_data, DS1WM_CMD, DS1WM_CMD_1W_RESET);
> +
> +	timeleft = wait_for_completion_timeout(&reset_done, DS1WM_TIMEOUT);
> +	ds1wm_data->reset_complete = NULL;
> +	if (!timeleft) {
> +		dev_dbg(&ds1wm_data->pdev->dev, "reset failed\n");
> +		return 1;
> +	}
> +
> + /* Wait for the end of the reset. According to the specs, the time
> +  * from when the interrupt is asserted to the end of the reset is:
> +  * tRSTH  - tPDH  - tPDL - tPDI
> +  * 625 us - 60 us - 240 us - 100 ns = 324.9 us
> +  *
> +  * We'll wait a bit longer just to be sure.
> +  */
> + udelay(500);
> +
> + ds1wm_write_register(ds1wm_data, DS1WM_INT_EN,
> + DS1WM_INTEN_ERBF | DS1WM_INTEN_ETMT | DS1WM_INTEN_EPD |
> + (ds1wm_data->pdata->active_high ? DS1WM_INTEN_IAS : 0));
> +
> +	if (!ds1wm_data->slave_present) {
> +		dev_dbg(&ds1wm_data->pdev->dev, "reset: no devices found\n");
> +		return 1;
> +	}


Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Thomas Orgis
Sort of my 2-many-cents story on why I need "snapshot/restore"...

On Wed, 25 Apr 2007 13:08:09 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Wed, 25 Apr 2007, Kenneth Crudup wrote:
> > 
> > Any working suspend-to-disk method takes care of that for me.  (I'm
> > really not sure why Linus hates S2D so much, though. Back in the day
> > there was a lot more BIOS support, but that's been years now.)
> 
> The really sad part is that APM actually did this better.. 

This really triggers a nerve in me. My laptops (always used models from
some years ago, even) didn't necessarily get easier with respect to power
management (suspend) over time.

My first laptop (Siemens Scenic Mobile 710, 200MHz Pentium, maxed to
192MB RAM) worked just fine with APM, be it s2ram or s2disk.
Everything handled by the BIOS.
Admittedly, S2disk was quite slow as it stored all ram and didn't write
to the disk as fast as possible, but it worked.
S2ram was also a viable option because I was even able to easily swap
batteries because the thing had two bays to put batteries in.

The next one was a Toshiba Portege 7020 CT (366MHz Pentium2 with dynamic
clock, 192MB), supporting both APM and ACPI.
Installing Linux was not that easy, I think I remember that APM in kernel
froze the box (early 2.6 kernel), while ACPI needed some headache to set
up (compiling a fixed DSDT into the kernel, for example)... I needed
experimental toshiba_acpi to get functions and the acpi_pm_timer to
get something like continuous system clock (special cpu throttling has
funny effects).
Well, I got it together after some time.
Used suspend2 for "snapshot/restore" and actually was able to use ACPI
S3 with the glitch of having to unload/load psmouse driver ... until I
realized that it only resumed in about 80% of cases (BIOS ).
So suspend2 was a badly needed "hack" around the hardware/BIOS to get
some sane workflow.
I remember dealing with swsusp / pmdisk before... but I really ended
up with suspend2 as the thing that works (and I wouldn't have bothered
finding this patch if the in-kernel stuff worked for me).
Of course this was a long time ago and recently I have seen that
in-kernel swsusp works ok, just this unresponsiveness after "restore"
due to missing page cache...

Now I have an IBM ThinkPad X31 (600-1.4GHz Pentium M, 512MB).
ACPI. SpeedStep.
The machine generally works fine, hardware config via ACPI seems to
be fine.
But doing S3/STR? Well... this machine has the odd idea that turning the
system off but the screen backlight back on after a second is a good idea.
Of course just now S3 worked fine... you cannot even depend on the
malfunction -- could have something to do with changing bootup video
from LCD to VGA output for some other reason recently.
Hm. Perhaps it even may work (after tricking the BIOS!?). But I doubt
I'll suddenly develop trust in that.
I _had_ trust in APM STR and STD.
I am quite confident in suspend2 being able to correctly resume (restore)
after a successful suspend (snapshot/restore).

And then, STR doesn't help me on the road when I need to exchange the
battery (I'd need this special extra battery to put under the ThinkPad
for that).
Another thing is that the old Siemens has a nice auxiliary monochrome
LCD that shows the charge status of the batteries in 5 levels, so you
have some means to predict the time you have in STR. The ThinkPad has a
green LED for "battery level OK" and a red one for "battery level low".
Well, but the Linux kernel won't change that...

Perhaps at some time ACPI implementations in BIOS get to something
reliable (hm, should I get a PowerBook instead?) and can be a good partner
for Linux which struggles for many years now to get into the post-APM era.
Remember reading desktop PC test reports in the c't magazine in the last
years, S3 usually did _not_ work; with Windows, even.
Well, there must be a reason Microsoft chose to implement the "hibernate"
(it _is_ in software, right?).

The APM->ACPI transition made me use the software STD
(snapshot/restore...;-) and I think I will stay with it for the
foreseeable future, if only because I can do fancy things like image
encryption.
ACPI S3 / STR is a nice addition when it works, for the smaller pauses
(changing a train at the station, leaving office for half an hour...),
but I consider STD really to be the more important feature that enables
me to _never_ close my applications unless I want to do a kernel update.

I really must say that some sort of STD is a total must for a laptop for me.
On the other hand I once had a Psion 5MX, which basically was on STR all
the (non-working) time -- and enabled well over 20h of working time on two AAs.
When laptops enter that range of battery life, I guess I could make do with
just doing STR and wouldn't have to worry about changing batteries without an AC
connection;-)


Alrighty then,

Thomas.




Re: [3/3] 2.6.21-rc7: known regressions (v2)

2007-04-25 Thread john stultz
On Wed, 2007-04-25 at 20:33 -0400, Len Brown wrote:
> On Wednesday 25 April 2007 14:08, john stultz wrote:
> > On Wed, 2007-04-25 at 04:06 -0700, Andrew Morton wrote:
> > > On Mon, 23 Apr 2007 23:49:09 +0200 Adrian Bunk <[EMAIL PROTECTED]> wrote:
> > > > Subject: acpi_pm clocksource loses time on x86-64
> > > > References : http://lkml.org/lkml/2007/4/17/143
> > > > Submitter  : Mikael Pettersson <[EMAIL PROTECTED]>
> > > > Handled-By : John Stultz <[EMAIL PROTECTED]>
> > > > Status : problem is being debugged
> > 
> > 
> > The ACPI PM one is *really* odd as it's the same clocksource driver on
> > both arches. I had Mikael cut out the clocksource frequency adjustments,
> > and confirmed both i386 and x86_64 are using the same base freq
> > (confirmed via printks).
> 
> If this chipset's PM-timer loses "several minutes per hour" on x86_64,
> I would expect it to do the same on i386.  I can't imagine what the
> difference could be.  Any possibility it is the 24-bit version
> and we do something funky on wraparound?

No, we assume the PM timer wraps at 24 bits and mask it as such on all
systems.
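
For illustration, the masking described above can be sketched as follows.
This is a minimal Python model, not the kernel's acpi_pm driver: the names
ACPI_PM_MASK, PMTMR_HZ, pm_delta and ticks_to_ns are made up for the example,
and 3.579545 MHz is the nominal ACPI PM timer rate. Masking the difference to
24 bits gives the right elapsed tick count across a wraparound, provided the
counter is sampled at least once per wrap period (roughly 4.7 seconds at that
rate).

```python
ACPI_PM_MASK = (1 << 24) - 1   # treat the counter as 24 bits wide
PMTMR_HZ = 3_579_545           # nominal ACPI PM timer frequency in Hz


def pm_delta(now, last):
    """Ticks elapsed between two raw counter reads, modulo 2**24."""
    return (now - last) & ACPI_PM_MASK


def ticks_to_ns(ticks):
    """Convert PM timer ticks to nanoseconds (integer arithmetic)."""
    return ticks * 1_000_000_000 // PMTMR_HZ
```

Because the mask is applied to the difference, a 32-bit counter read through
the same code still yields correct deltas, which is why masking on all systems
is safe in this model.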

-john

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck

2007-04-25 Thread David Chinner
On Wed, Apr 25, 2007 at 04:03:44PM -0700, Valerie Henson wrote:
> On Wed, Apr 25, 2007 at 08:54:34PM +1000, David Chinner wrote:
> > On Tue, Apr 24, 2007 at 04:53:11PM -0500, Amit Gud wrote:
> > > 
> > > The structure looks like this:
> > > 
> > >  ----------           ----------
> > > | cnode 0  |-------->| cnode 0  |--> to another cnode or NULL
> > >  ----------           ----------
> > > | cnode 1  |-----    | cnode 1  |-----
> > >  ----------     |     ----------     |
> > > | cnode 2  |--  |    | cnode 2  |--  |
> > >  ----------  |  |     ----------  |  |
> > > | cnode 3  | |  |    | cnode 3  | |  |
> > >  ----------  |  |     ----------  |  |
> > >         |    |  |             |    |  |
> > > 
> > >          inodes             inodes or NULL
> > 
> > How do you recover if fsfuzzer takes out a cnode in the chain? The
> > chunk is marked clean, but clearly corrupted and needs fixing and
> > you don't know what it was pointing at.  Hence you have a pointer to
> > a trashed cnode *somewhere* that you need to find and fix, and a
> > bunch of orphaned cnodes that nobody points to *somewhere else* in
> > the filesystem that you have to find. That's a full scan fsck case,
> > isn't it?
> 
> Excellent question.  This is one of the trickier aspects of chunkfs -
> the orphan inode problem (tricky, but solvable).  The problem is what
> if you smash/lose/corrupt an inode in one chunk that has a
> continuation inode in another chunk?  A back pointer does you no good
> if the back pointer is corrupted.

*nod*

> What you do is keep tabs on whether you see damage that looks like
> this has occurred - e.g., inode use/free counts wrong, you had to zero
> a corrupted inode - and when this happens, you do a scan of all
> continuation inodes in chunks that have links to the corrupted chunk.

This assumes that you know a chunk has been corrupted, though.
How do you find that out?

> What you need to make this go fast is (1) a pre-made list of which
> chunks have links with which other chunks,

So you add a new on-disk structure that needs to be kept up to
date? How do you trust that structure to be correct if you are
not journalling it? What happens if fsfuzzer trashes part
of this table as well and you can't trust it?

> (2) a fast way to read all
> of the continuation inodes in a chunk (ignoring chunk-local inodes).
> This stage is O(fs size) approximately, but it should be quite swift.

Assuming you can trust this list. If not, finding cnodes is going
to be rather slow.
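
To make the trade-off concrete, here is a minimal Python sketch of the lookup
being discussed. It is not chunkfs code: the function name, the dictionary
layout of the link table and the table_trusted flag are all assumptions made
for illustration. With a trusted table only the linked chunks are scanned;
without one, every other chunk becomes a suspect.

```python
def chunks_to_scan(corrupted, links, all_chunks, table_trusted=True):
    """Return the chunks whose continuation inodes need checking.

    corrupted     -- id of the chunk found to be damaged
    links         -- dict: chunk id -> set of chunk ids it shares
                     continuation inodes with (hypothetical layout)
    all_chunks    -- every chunk id in the file system
    table_trusted -- False if the link table itself failed validation
    """
    if not table_trusted:
        # Without a trustworthy table, every other chunk is a suspect.
        return sorted(c for c in all_chunks if c != corrupted)
    return sorted(links.get(corrupted, set()))
```

The fallback branch is the slow case: the work grows with the number of
chunks rather than with the number of links.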

> > It seems that any sort of damage to the underlying storage (e.g.
> > media error, I/O error or user brain explosion) results in the need
> > to do a full fsck and hence chunkfs gives you no benefit in this
> > case.
> 
> I worry about this but so far haven't found something which couldn't
> be cut down significantly with just a little extra work.  It might be
> helpful to look at an extreme case.
> 
> Let's say we're incredibly paranoid.  We could be justified in running
> a full fsck on the entire file system in between every single I/O.
> After all, something *might* have been silently corrupted.  But this
> would be ridiculously slow.  We could instead never check the file
> system.  But then we would end up panicking and corrupting the file
> system a lot.  So what's a good compromise?
> 
> In the chunkfs case, here's my rules of thumb so far:
> 
> 1. Detection: All metadata has magic numbers and checksums.
> 2. Scrubbing: Random check of chunks when possible.
> 3. Repair: When we detect corruption, either by checksum error, file
>    system code assertion failure, or hardware tells us we have a bug,
>    check the chunk containing the error and any outside-chunk
>    information that could be affected by it.
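
Rule 1 above (magic numbers plus checksums) can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the MAGIC
value, the 8-byte header layout and the seal/check names are invented here,
and CRC32 stands in for whatever checksum chunkfs would actually use.

```python
import struct
import zlib

MAGIC = 0x43464E4B              # hypothetical magic number for a metadata block
HEADER = struct.Struct("<II")   # little-endian: magic, CRC32 of the payload


def seal(payload: bytes) -> bytes:
    """Prefix a metadata payload with its magic number and checksum."""
    return HEADER.pack(MAGIC, zlib.crc32(payload)) + payload


def check(block: bytes) -> bool:
    """Return True only if the magic number and checksum both match."""
    if len(block) < HEADER.size:
        return False
    magic, crc = HEADER.unpack_from(block)
    return magic == MAGIC and crc == zlib.crc32(block[HEADER.size:])
```

A check like this only fires when the block is actually read, so damage in a
rarely touched region stays unnoticed until something reads or scrubs it.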

So if you end up with a corruption in a "clean" part of the
filesystem, you may not find out about the corruption on reboot and
fsck?  You need to trip over the corruption first before fsck can be
told it needs to check/repair a given chunk? Or do you need to force
a "check everything" fsck in this case?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: BUG: Null pointer dereference in fs/open.c

2007-04-25 Thread William Heimbigner

On Wed, 25 Apr 2007, Andrew Morton wrote:

> On Wed, 25 Apr 2007 22:53:00 +0000 (GMT) William Heimbigner <[EMAIL PROTECTED]> wrote:
>
> > On Wed, 25 Apr 2007, Andrew Morton wrote:
> >
> > > OK.  I am able to use the pktcdvd driver OK in mainline with a piix/sata
> > > drive.  It could be that something is going wrong at the IDE level for you.
> >
> > Perhaps; I'll try an external usb cd burner, and see where that goes.
> >
> > > Are you able to identify the most recent kernel which actually worked?
> >
> > No, because I haven't set packet writing up in Linux before - however, I do know
> > that I've successfully set up packet writing (using 2 of the 3 cd burners I
> > have) in another operating system before. I'll try 2.6.18 and see if that gets
> > me anywhere different, though.
>
> OK.
>
> A quick summary: mainline's pktcdvd isn't working for William using IDE.
> It is working for me using sata.
>
> So what has happened here is that this code, in ide-cd.c's
> cdrom_decode_status() is now triggering:
>
>	} else if (blk_pc_request(rq) || rq->cmd_type == REQ_TYPE_ATA_PC) {
>		/* All other functions, except for READ. */
>		unsigned long flags;
>
>		/*
>		 * if we have an error, pass back CHECK_CONDITION as the
>		 * scsi status byte
>		 */
>		if (blk_pc_request(rq) && !rq->errors)
>			rq->errors = SAM_STAT_CHECK_CONDITION;
>
> I suspect this is a bug introduced by
> 406c9b605cbc45151c03ac9a3f95e9acf050808c (in which case it'll be the third
> bug so far).
>
> Perhaps the IDE driver was previously not considering these requests to be
> of type blk_pc_request(), and after
> 406c9b605cbc45151c03ac9a3f95e9acf050808c it _is_ treating them as
> blk_pc_request() and is incorrectly reporting an error.  Or something like
> that.
>
> Guys: help!

A follow-up: after looking around a bit, I have managed to get packet writing to 
work properly on /dev/hdc (before, it was reporting only 1.8 MB available or so; 
this was a formatting issue).
I've also gotten the external cd-rw drive to work. However, I'm still at a loss 
as to why /dev/hdd won't work. I tried formatting a dvd-rw for this drive, 
however, it consistently gives me:

[27342.503933] drivers/ide/ide-cd.c:729: setting error to 2
[27342.509251]  [] show_trace_log_lvl+0x1a/0x30
[27342.514411]  [] show_trace+0x12/0x20
[27342.518864]  [] dump_stack+0x16/0x20
[27342.523317]  [] cdrom_decode_status+0x1f4/0x3b0
[27342.528732]  [] cdrom_newpc_intr+0x38/0x320
[27342.533791]  [] ide_intr+0x96/0x200
[27342.538157]  [] handle_IRQ_event+0x28/0x60
[27342.543139]  [] handle_edge_irq+0xa6/0x130
[27342.548121]  [] do_IRQ+0x49/0xa0
[27342.552228]  [] common_interrupt+0x2e/0x34
[27342.557200]  [] mwait_idle+0x12/0x20
[27342.561653]  [] cpu_idle+0x4a/0x80
[27342.565934]  [] rest_init+0x37/0x40
[27342.570300]  [] start_kernel+0x34b/0x420
[27342.575109]  [<>] 0x0
[27342.578089]  ===
and doesn't work (the above output was generated by Andrew's patch to log 
certain areas).


# dvd+rw-format /dev/hdd -force
* BD/DVDRW/-RAM format utility by <[EMAIL PROTECTED]>, version 7.0.
:-( failed to locate "Quick Format" descriptor.
* 4.7GB DVD-RW media in Sequential mode detected.
* formatting 0.0\:-[ READ TRACK INFORMATION failed with SK=3h/ASC=11h/ACQ=05h]: 
Input/output error

I tried putting in a different dvd-rw, and this time I get:
# dvd+rw-format /dev/hdd -force
* BD/DVDRW/-RAM format utility by <[EMAIL PROTECTED]>, version 7.0.
* 4.7GB DVD-RW media in Sequential mode detected.
* formatting 0.0|:-[ FORMAT UNIT failed with SK=5h/ASC=26h/ACQ=00h]: 
Input/output error

William Heimbigner
[EMAIL PROTECTED]


Re: Question about Reiser4

2007-04-25 Thread lkml777

On Wed, 25 Apr 2007 19:03:12 +0400, "Edward Shishkin"
<[EMAIL PROTECTED]> said:
> [EMAIL PROTECTED] wrote:
> 
> >
> >As I understand it, the default Reiser4 DOES NOT USE any compression at
> >all, not even tail compression,
> >
> 
> ^tail compression^tail conversion
> Reiser4 does use tail conversion by default.
> 
> > but saves space by eliminating block
> >alignment wastage (tail compression is an option).
> >
> >So lets LOSE the statistics that involve compression. The results now
> >look like this:
> >
> >.--------------------------.
> >| FILESYSTEM | TIME  |DISK |
> >| TYPE       |(secs) |USAGE|
> >.--------------------------.
> >|REISER4     |  3462 | 692 |
> >|EXT2        |  4092 | 816 |
> >|JFS         |  4225 | 806 |
> >|EXT4        |  4408 | 816 |
> >|EXT3        |  4421 | 816 |
> >|XFS         |  4625 | 779 |
> >|REISER3     |  6178 | 793 |
> >|FAT32       | 12342 | 988 |
> >|NTFS-3g     | 10414 | 772 |
> >.--------------------------.
> >
> >These results are still EXTREMELY GOOD for REISER4.
> >  
> >
> 
> Everything is not so simple in the science of testing..
> Would you please change direction of your activity to stressing
> instead of benchmarking? Caught oopses would have great value..
> OK?
> 
> Regards,
> Edward.
> 

Tail conversion is NOT compression,

So what exactly is your point?

By "tail compression" I mean plugin ctail40, but since I was never able
to get it to work, maybe it's not tail compression at all.
-- 
  
  [EMAIL PROTECTED]




Re: [PATCH 0/9] Kconfig: cleanup s390 v2.

2007-04-25 Thread Dave Jones
On Wed, Apr 25, 2007 at 05:24:47PM -0700, Andrew Morton wrote:

 > It would be neat if someone could create and maintain a new
 > scripts/spot-common-mistakes.  Feed it a unified diff and it would complain
 > about newly-added code (and only newly-added code) which has busted
 > whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc.

years and years ago, when the dinosaurs roamed the land, I hacked up..
http://janitor.kernelnewbies.org/scripts/  and then left it by the wayside.
Some of the checks it did are actually bogus, but I'm happy to pick that
up again if there's interest in it being a useful tool.

In fact, I should probably munge it together with a similar thing
I wrote at http://www.codemonkey.org.uk/projects/findbugs/
(Warning: scary regexps)

 > It would need to be fairly simple and easily-extensible, as I can
 > imagine quite a few things getting added to it.
 > 
 > (Imagines a procmail rule which just bounces the email if
 > spot-common-mistakes failed)

or a git checkin rule that refuses to commit if it fails ;-)

Dave

-- 
http://www.codemonkey.org.uk


Re: [RFC][PATCH] syctl for selecting global zonelist[] order

2007-04-25 Thread KAMEZAWA Hiroyuki
On Thu, 26 Apr 2007 09:31:12 +0900
KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:
> > 
> > So a IA64 platform with i386 sicknesses? And pretty bad case of it since I 
> > assume that the memory sizes per node are equal. Your solution of taking 
> > 4G off node 0 and then going to node 1 first must hurt some 
> > processes running on node 0. 
> I think so, too. It is because I made this as selectable option.
^
 why...

sorry.
-Kame  



Re: [3/3] 2.6.21-rc7: known regressions (v2)

2007-04-25 Thread Len Brown
On Wednesday 25 April 2007 14:08, john stultz wrote:
> On Wed, 2007-04-25 at 04:06 -0700, Andrew Morton wrote:
> > On Mon, 23 Apr 2007 23:49:09 +0200 Adrian Bunk <[EMAIL PROTECTED]> wrote:
> > 
> > > This email lists some known regressions in Linus' tree compared to 2.6.20.
> > > 
> > > > If you find your name in the Cc header, you are either submitter of one
> > > > of the bugs, maintainer of an affected subsystem or driver, a patch
> > > > of yours caused a breakage or I'm considering you in any other way
> > > > possibly involved with one or more of these issues.
> > > 
> > > Due to the huge amount of recipients, please trim the Cc when answering.
> > > 
> > > 
> > > Subject: HPET enabled freeze my machine at boot
> > >  workaround: clocksource=acpi_pm
> > > References : http://lkml.org/lkml/2007/4/19/370
> > > Submitter  : Guilherme Schroeder <[EMAIL PROTECTED]>
> > > Caused-By  : Thomas Gleixner <[EMAIL PROTECTED]>
> > >  commit 5d8b34fdcb384161552d01ee8f34af5ff11f9684
> > > Handled-By : John Stultz <[EMAIL PROTECTED]>
> > > Status : problem is being debugged
> > > 
> > > 
> > > Subject: acpi_pm clocksource loses time on x86-64
> > > References : http://lkml.org/lkml/2007/4/17/143
> > > Submitter  : Mikael Pettersson <[EMAIL PROTECTED]>
> > > Handled-By : John Stultz <[EMAIL PROTECTED]>
> > > Status : problem is being debugged
> > > 
> > > 
> > > Subject: suspend to disk hangs  (CONFIG_NO_HZ)
> > > References : http://lkml.org/lkml/2007/3/25/217
> > > Submitter  : Jeff Chua <[EMAIL PROTECTED]>
> > > Status : unknown
> > 
> > That's still rather a lot of bustage from the timekeeping changes.  Is
> > anything really happening here or have we all given up?
> 
> 
> The ACPI PM one is *really* odd as it's the same clocksource driver on
> both arches. I had Mikael cut out the clocksource frequency adjustments,
> and confirmed both i386 and x86_64 are using the same base freq
> (confirmed via printks).

If this chipset's PM-timer loses "several minutes per hour" on x86_64,
I would expect it to do the same on i386.  I can't imagine what the
difference could be.  Any possibility it is the 24-bit version
and we do something funky on wraparound?

-Len


> It almost seems like when booting x86_64 the ACPI PM counter is running
> slowly! 
> 
> Len: Have you ever heard of such a thing? It seems quite unlikely...
> 
> 
> WRT the HPET freeze issue, I'm still digging there. In that case it
> appears the HPET isn't counting, so timekeeping just stops. I was
> thinking it might be HRT messing w/ the wrong HPET registers, but so far
> that hasn't shaken out.
> 
> I'll spend some more time on these today and see if we get any further.
> 
> thanks
> -john
> 
> 


Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Linus Torvalds


On Thu, 26 Apr 2007, Pavel Machek wrote:
> 
> Ok, I guess I'll have nightmares of DMA controllers doing DMAs from
> chips that are no longer there tonight.

Umm. Welcome to the 21st century: we don't do that "separate DMA 
controller" thing any more. All devices do their own DMA.

> Only the fact that we are currently using same device call during
> snapshot() and during restore(). We obviously could do _5_ device
> calls
> 
> (suspend/resume/freeze/quiesce_disable_dma/thaw)
> 
> ...but that looks like too many calls to me.

I'd much rather have five or even more functions that each do *one* 
obvious thing. 

Think like a device driver writer: would you prefer to just implement five
functions that do something very specific that you know trivially how to
do ("I know how to disable interrupts and DMA") or would you want to do
some high-level operation when you don't even know why the caller asks you
to suspend? What does "suspend()" even mean when the caller is just going
to wake you up immediately again? Is it performance-critical? Should I tear
down all my DMAs? I dunno!

In other words, splitting things up actually makes things simpler. That's 
*doubly* true if you can then give each specific function some really 
clear goals.

Linus


Re: [PATCH 10/28] i386: map enough initial memory to create lowmem mappings

2007-04-25 Thread Chris Wright
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Eric W. Biederman wrote:
> > Then why you had to allocate enough pages to cause a failure has me stumped.
> > Perhaps there is some other bug?
> 
> Perhaps, but nothing comes to mind. I'll see what happens when I boot
> this kernel on real hardware (rather than kvm).

I was using real hardware with your .config when I reproduced it.


Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Linus Torvalds


On Wed, 25 Apr 2007, Linus Torvalds wrote:
> 
> The *thaw* needs to happen with devices quiescent. 

Btw, I sure as hell hope you didn't use "suspend()" for that. You're 
(again) much better off having a totally separate function that just 
freezes stuff.

So in the "snapshot+shutdown" path, you should have:

 - prepare_to_snapshot() - allocate memory, and possibly return errors

   We can skip this, if we just make the rule be that any devices that 
   want to support snapshotting must always have the memory required for 
   snapshotting pre-allocated. Most devices really do allocate memory for 
   their state anyway, and the only real reason for the "prepare" stage 
   here is because the final snapshot has to happen with interrupts off,
   obviously. So *if* we don't need to allocate any memory, and if we 
   don't expect to want to accept some early error case, this is likely 
   useless.

 - snapshot() - actually save device state that is consistent with the 
   memory image at the time. Called with interrupts off, but the device 
   has to be usable both before and afterwards!

And I would seriously suggest that "snapshot()" be documented to not rely
on any DMA memory, exactly because the device has to be accessible both
before and after (before - because we're running and allocating memory,
and after - because we'll be writing things out). But see later:

For the "resume snapshot" path, I would suggest having 

 - freeze(): quiesce the device. This literally just does the absolute 
   minimum to make sure that the device doesn't do anything surprising (no 
   interrupts, no DMA, no nothing). For many devices, it's a no-op, even 
   if they can do DMA (eg most disk controllers will do DMA, but only as 
   an actual result of a request, and upper layers will be quiescent 
   anyway, so they do *not* need to disable DMA)

   NOTE! The "freeze()" gets called from the *old* kernel just _before_ a
   snapshot unpacking!!

 - restart_snapshot() - actually restart the snapshot (and usually this 
   would involve re-setting the device, not so much trying to restore all 
   the saved state. IOW, it's easier to just re-initialize the DMA command 
   queues than to try to make them "atomic" in the snapshot).

   NOTE! This gets called by the *new* kernel _after_ the snapshot resume!

And if you *want* to, I can see that you might want to actually do an
"unfreeze()" thing too, and make the actual snapshotting be:

	/* We may not even need this.. */
	for_each_device() {
		err = prepare_to_snapshot();
		if (err)
			return err;
	}

	/* This is the real work for snapshotting */
	cli();
	for_each_device()
		freeze(dev);
	for_each_device()
		snapshot(dev);
	.. snapshot current memory image ..
	for_each_device_depth_first()
		unfreeze(dev);
	sti();

and maybe it's worth it, but I would almost suggest that you just make the 
rule be that any DMA etc just *has* to be re-initialized by 
"restart_snapshot()", in which case it's not even necessary to 
freeze/unfreeze over the device, and "snapshot()" itself only needs to 
make sure any non-DMA data is safe.

But adding the freeze/unfreeze (which is a no-op for most hardware anyway) 
might make things easier to think about, so I would certainly not *object* 
to it, even if I suspect it's not necessary.

Anyway, the restore_snapshot() sequence should be:

	/* Old kernel.. Normal boot, load snapshot image */
	cli();
	for_each_device()
		freeze(dev);
	restore_snapshot_image();
	restore_regs_and_jump_to_image();
	/* noreturn */


	/* New kernel, gets called at the snapshot restore address
	 * with interrupts off and devices frozen, and memory image
	 * consistent with what it was at "snapshot()" time
	 */
	for_each_dev_depth_first()
		restore_snapshot(dev);
	/* And if you want to, just to be "symmetric"

		for_each_dev_depth_first()
			unfreeze(dev)

	   although I think you could just make "restore_snapshot()"
	   implicitly unfreeze it too..
	 */
	sti();
	/* We're up */

and notice how *different* this is from what happens for s2ram. There 
really isn't anything in common here. Exactly because s2ram simply doesn't 
_have_ any of the issues with atomic memory images.

So s2ram is just

	for_each_dev()
		suspend(dev);
	cli();
	for_each_dev()
		late_suspend(dev);
	.. go to sleep ..
	for_each_dev_depth_first()
		early_resume(dev);
	sti();
	for_each_dev_depth_first()
		resume(dev);

and has none of the "freeze" issues at all.
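
As a way of comparing the sequences above side by side, here is a small
Python model that only records the order in which per-device callbacks would
run. It is a toy, not kernel code: the Device class, its call() helper and
the two sequence functions are hypothetical, devices are kept in a flat list,
and the depth-first ordering of the device tree is deliberately not modelled.

```python
class Device:
    """Toy device that logs each callback invoked on it."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def call(self, op):
        # A real driver would provide one small function per operation.
        self.log.append(f"{op}({self.name})")


def snapshot_sequence(devices, log):
    """Snapshot path: freeze, snapshot, copy memory, unfreeze."""
    log.append("cli")
    for dev in devices:
        dev.call("freeze")
    for dev in devices:
        dev.call("snapshot")
    log.append("copy memory image")
    for dev in devices:
        dev.call("unfreeze")
    log.append("sti")


def s2ram_sequence(devices, log):
    """Suspend-to-RAM path: no freeze step and no memory image."""
    for dev in devices:
        dev.call("suspend")
    log.append("cli")
    for dev in devices:
        dev.call("late_suspend")
    log.append("sleep")
    for dev in devices:
        dev.call("early_resume")
    log.append("sti")
    for dev in devices:
        dev.call("resume")
```

Running both functions over the same device list makes the point visible in
the logs: the two paths share no callback names at all.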

Doesn't that seem a lot more straightforward? Yes, it's more functions, 
but each function is a lot more 

Re: [PATCH 0/9] Kconfig: cleanup s390 v2.

2007-04-25 Thread Arnd Bergmann
On Thursday 26 April 2007, Andrew Morton wrote:
> It would be neat if someone could create and maintain a new
> scripts/spot-common-mistakes.  Feed it a unified diff and it would complain
> about newly-added code (and only newly-added code) which has busted
> whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc.

http://patchstylecheck.googlecode.com/svn/trunk/patchstylecheckemail.pl
Might serve as a starting point for this. It doesn't have any semantic
checks right now, but I guess they can be added.

Arnd <><


Re: [RFC][PATCH] syctl for selecting global zonelist[] order

2007-04-25 Thread KAMEZAWA Hiroyuki
On Wed, 25 Apr 2007 12:17:15 -0700 (PDT)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Wed, 25 Apr 2007, KAMEZAWA Hiroyuki wrote:
> 
> > Make zonelist policy selectable from sysctl.
> > 
> > Assume 2 node NUMA, only node(0) has ZONE_DMA (ZONE_DMA32).
> > 
> > In this case, default (node0's) zonelist order is
> > 
> > Node(0)'s NORMAL -> Node(0)'s DMA -> Node(1)"s NORMAL.
> > 
> > This means Node(0)'s DMA is used before Node(1)'s NORMAL.
> 
> So a IA64 platform with i386 sicknesses? And pretty bad case of it since I 
> assume that the memory sizes per node are equal. Your solution of taking 
> 4G off node 0 and then going to node 1 first must hurt some 
> processes running on node 0. 
I think so, too. That is why I made this a selectable option.

> Whatever you do the  memory balance between the two nodes is making
> the system behave in an unsymmetric way.


> > On some servers, some applications use large memory allocations.
> > This exhausts memory in the above order.
> 
> Could we add a boot time option instead that changes the zonelist build 
> behavior? Maybe an arch hook that can deal with it?
> 
Yes, it's in my plan. I'll add boot option support.
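
The two orderings under discussion can be written down as a short Python
sketch. This is not the patch itself: build_zonelist, the "node"/"zone" order
names and the dictionary describing which zones each node has are assumptions
for illustration, and remote nodes are ordered by id here rather than by NUMA
distance.

```python
ZONES_HIGH_TO_LOW = ["NORMAL", "DMA"]   # preferred zone first


def build_zonelist(local, node_zones, order="node"):
    """Return a fallback list of (node, zone) pairs for allocations on `local`.

    node_zones -- dict: node id -> set of zone names present on that node
    order      -- "node": exhaust each node before moving to the next
                  "zone": exhaust each zone type across nodes first
    """
    # Local node first, then the remaining nodes by id (a simplification).
    nodes = sorted(node_zones, key=lambda n: (n != local, n))
    if order == "node":
        return [(n, z) for n in nodes
                for z in ZONES_HIGH_TO_LOW if z in node_zones[n]]
    if order == "zone":
        return [(n, z) for z in ZONES_HIGH_TO_LOW
                for n in nodes if z in node_zones[n]]
    raise ValueError(f"unknown order: {order}")
```

With node 0 holding NORMAL and DMA and node 1 holding only NORMAL, "node"
order reproduces the default list quoted above, while "zone" order keeps
node 0's DMA zone as the last resort.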

Thanks,
-Kame



Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread David Lang

On Thu, 26 Apr 2007, Pavel Machek wrote:

> > > Now, if the old kernel left DMAs running, it could be overwriting
> > > the data we are copying in.
> >
> > The *thaw* needs to happen with devices quiescent.
> >
> > But that sure doesn't have anything to do with the "snapshot()" path. In
> > fact, you'll have rebooted the machine in between.
>
> Only the fact that we are currently using same device call during
> snapshot() and during restore(). We obviously could do _5_ device
> calls
>
> (suspend/resume/freeze/quiesce_disable_dma/thaw)
>
> ...but that looks like too many calls to me.
>
> > So what does that have to do with "snapshotting"?
>
> I'm not comfortable with memory I'm copying changing under my hands
> because of some DMA. It just looks like asking for trouble. I _think_
> we can get away with DMA running during snapshot, because driver may
> not assume anything about the DMA result before it got completion
> interrupt, but...


The key is that with STR you don't need to copy the memory (it stays where
it is).

For STD you need to copy the memory, and there you halt DMA because you need to
make an atomic snapshot.


David Lang


Re: [PATCH 0/9] Kconfig: cleanup s390 v2.

2007-04-25 Thread Andrew Morton
On Wed, 25 Apr 2007 14:30:11 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote:

> But that only applies to things which I merge.  There's heaps of stuff
> coming in via the git trees which is obviously inadequately reviewed - look
> at all the instances of open-coded kernel_thread() which were merged after
> the kthread() API was introduced, for example.
> 
> 
> And other basic stuff like "use mutexes, not semaphores":
> 
> box:/usr/src/25> grep '^+.*[[:space:]]down[[:space:]]*[(]' patches/git-*.patch | wc -l
> 32
> 
> 
> 
> Ever wonder where all those whitespace bugs are coming from?
> 
> box:/usr/src/25> grep '^+.*[[:space:]]if[(]' patches/git-*.patch | wc -l
> 265
> box:/usr/src/25> grep '^+.*[[:space:]]while[(]' patches/git-*.patch | wc -l
> 35
> 
> 
> Code which uses spaces where it should be using tabs?
> 
> box:/usr/src/25> grep '^+' patches/git-*.patch | wc -l
> 1346
> 

It would be neat if someone could create and maintain a new
scripts/spot-common-mistakes.  Feed it a unified diff and it would complain
about newly-added code (and only newly-added code) which has busted
whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc.

It would need to be fairly simple and easily-extensible, as I can
imagine quite a few things getting added to it.

(Imagines a procmail rule which just bounces the email if
spot-common-mistakes failed)
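
As a rough idea of what such a tool could look like, here is a minimal Python
sketch. It is not an existing script: the function name, the CHECKS table and
every regular expression in it are assumptions chosen to mirror the greps
quoted above, and real checks would need far more care to avoid false
positives. It reads a unified diff and reports only on added lines.

```python
import re

# (pattern, message) pairs; adding a check means adding one entry here.
CHECKS = [
    (re.compile(r"\b(?:if|while|for|switch)\("), "missing space before '('"),
    (re.compile(r"[ \t]+$"), "trailing whitespace"),
    (re.compile(r"^ {8}"), "spaces used where a tab is expected"),
    (re.compile(r"\bkernel_thread\s*\("),
     "new kernel_thread() call; consider the kthread API"),
    (re.compile(r"\b(?:down|up)\s*\("),
     "semaphore operation; consider a mutex"),
]


def spot_common_mistakes(diff_text):
    """Return (file, line, message) tuples for suspicious added lines."""
    findings = []
    path = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:].strip()      # remember which file we are in
            continue
        if not line.startswith("+"):
            continue                     # only newly-added code is checked
        added = line[1:]
        for pattern, message in CHECKS:
            if pattern.search(added):
                findings.append((path, added.strip(), message))
    return findings
```

Keeping the checks in one table is what makes the sketch easy to extend; a
mail filter or commit hook would simply reject the patch when the returned
list is not empty.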


> 
> Heaven knows how many more serious problems are being snuck into the tree
> via this route.

But it won't solve this problem.


Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Alan Cox
> STR does not need to "ensure that you have a consistent snapshot".

Linus, I think someone's been spiking your Guinness again...

> Why? Because there is no _room_ for inconsistency. There's nothing to be
> "inconsistent with", since any changes to memory (by things like DMA or
> other setup that happens while the suspend process is going on) is by
> _definition_ consistent with the resume image (because there is no
> separate image).

You bet there is. We need to know if data arrived or not, because there
is no guarantee that the data retrieved if we inadvertently re-execute a
command will be the same. The hardware state itself isn't the problem,
it's the combination of hardware state and internal state which need to
match in some cases.

> off DMA and try to make the hardware be wevy wevy quiet while it's hunting 
> wabbits, it's a lot easier to just do nothing at all on "freeze", and just 
> make sure that "thaw" will re-initialize the DMA tables entirely! All

Who cares about DMA mapping tables? Those are easy to deal with, and not even
that bad to restore with an IOMMU. More problematic is the user's data,
because if we have a device where re-executing a command is not
repeatable (eg O_DIRECT SCSI on a shared bus) then we risk being
inconsistent in our S2RAM.  If we re-run the command on resume having
allowed it to prattle on while doing S2anything then we'll get the wrong
answer.

Now there are lots of devices we don't care about as they don't have
state in the form that causes problems - network cards, TV capture etc,
but there are cases where it matters that every operation is either
finished or not started and there is no doubt about them getting done
during the S2RAM/S2DISK.

S2DISK/S2RAM both need to synchronize the state of a device so they can
build a valid snapshot. That bit is a shared requirement, just like the
one you said didn't exist. It doesn't even need to involve turning DMA off,
just getting a consistent state.

The rest can be quite different.

Mind you, some laptops think S2RAM is just a transition state on the way
to disk: leave them in ACPI S2RAM and the firmware will magically turn it
into a save to disk and back to RAM if the battery runs low or you leave
it idle too long.


[PATCH] ia64: race flushing icache in do_no_page path

2007-04-25 Thread Mike Stroyan
  This is a very similar problem to a copy-on-write cache flushing problem
that Tony Luck fixed in July 2006.  In this case the do_no_page function
handles a fault in an executable or library that is mmapped from an
NFS file system.  The code is copied into a newly reallocated page.
The lazy_mmu_prot_update() function should be used to flush old entries
from the icache for that page on ia64 processors.  But that call is made
after a set_pte_at call that makes the page accessible to other threads
executing the same code.  This was seen to cause application crashes
when an OpenMP application ran many threads calling the same functions at
the same time.  The first thread to reach a page starts to fault in the
new code.  One of the other threads overtakes the first and executes old
data from the icache.  That could result in bad instructions.  It is more
obvious when an old cache line contains prefetched non-instruction bits
that result in an illegal instruction trap.

  The problem has only been seen on Montecito processors, which have
separate level 2 icache and dcache.  This dcache to icache coherency
problem is more likely to occur there because of the much larger level
2 icache.  I suspect that the non-NFS case is working because direct
DMA into the new page is making the instruction cache coherent.  Any
file system that uses a non-DMA copy into the text page could show the
same problem.

Signed-off-by: Mike Stroyan <[EMAIL PROTECTED]>

diff --git a/mm/memory.c b/mm/memory.c
index e7066e7..50c8848 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2291,6 +2291,7 @@ retry:
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		if (write_access)
 			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		lazy_mmu_prot_update(entry);
 		set_pte_at(mm, address, page_table, entry);
 		if (anon) {
 			inc_mm_counter(mm, anon_rss);
@@ -2312,7 +2313,6 @@ retry:
 
 	/* no need to invalidate: a not-present page shouldn't be cached */
 	update_mmu_cache(vma, address, entry);
-	lazy_mmu_prot_update(entry);
 unlock:
 	pte_unmap_unlock(page_table, ptl);
 	if (dirty_page) {
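
The ordering problem described above can be modelled in a few lines of
user-space C. This is only a toy sketch, not kernel code: the struct and
every toy_* name are invented for illustration, toy_flush_icache() stands
in for lazy_mmu_prot_update() and toy_publish() stands in for set_pte_at().
It shows what a racing thread could fetch in the window between the two
calls, under the assumption that the icache still holds a stale line.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one text page (all names here are hypothetical). */
struct toy_page {
    int dcache_word;   /* freshly copied code, visible on the data side */
    int icache_word;   /* what an instruction fetch would return */
    bool mapped;       /* stands in for the PTE being present */
};

/* Stand-in for lazy_mmu_prot_update(): make the icache match the new data. */
static void toy_flush_icache(struct toy_page *p)
{
    p->icache_word = p->dcache_word;
}

/* Stand-in for set_pte_at(): make the page reachable by other threads. */
static void toy_publish(struct toy_page *p)
{
    p->mapped = true;
}

/* What another thread would execute right now; -1 means it would fault. */
static int toy_fetch(const struct toy_page *p)
{
    return p->mapped ? p->icache_word : -1;
}

/* Old order: publish first, flush later. Returns what a racer sees. */
static int toy_fault_old_order(struct toy_page *p, int new_code)
{
    p->dcache_word = new_code;
    toy_publish(p);
    int seen = toy_fetch(p);   /* window in which a racing thread runs */
    toy_flush_icache(p);
    return seen;
}

/* Patched order: flush first, then publish. */
static int toy_fault_new_order(struct toy_page *p, int new_code)
{
    p->dcache_word = new_code;
    toy_flush_icache(p);
    toy_publish(p);
    return toy_fetch(p);
}
```

With a stale icache word, the old order lets the racing fetch return the
stale value, while the patched order can only return the new code.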

-- 
Mike Stroyan, [EMAIL PROTECTED]


Re: [PATCH] ide-cs: recognize 2GB CompactFlash from Transcend

2007-04-25 Thread Andrew Morton
On Wed, 25 Apr 2007 11:27:09 +0200 "Aeschbacher, Fabrice" <[EMAIL PROTECTED]> 
wrote:

> Without the following patch, the kernel does not automatically detect
> 2GB CompactFlash cards from Transcend.
> 
> I'm not sure which values should be assigned to the 3rd and 4th
> parameters (here: 0x709b1bf1, 0xf54a91c8). Anyway, the patch is working
> with these values. Tested on arch=mips.
> 

Thanks.  Your patch was wordwrapped and had tabs replaced by spaces, btw.

> 
> ===
> --- linux-2.6.20.7-orig/drivers/ide/legacy/ide-cs.c 2007-04-15
> 21:08:02.0 +0200
> +++ linux-2.6.20.7/drivers/ide/legacy/ide-cs.c  2007-04-25
> 10:53:53.0 +0200
> @@ -64,6 +64,7 @@
>  
>  #define INT_MODULE_PARM(n, v) static int n = v; module_param(n, int, 0)
>  
> +#define PCMCIA_DEBUG 1
>  #ifdef PCMCIA_DEBUG
>  INT_MODULE_PARM(pc_debug, PCMCIA_DEBUG);
>  #define DEBUG(n, args...) if (pc_debug>(n)) printk(KERN_DEBUG args)

I removed the above change

> @@ -399,6 +400,7 @@
> PCMCIA_DEVICE_PROD_ID12("TOSHIBA", "MK2001MPL", 0xb4585a1a,
> 0x3489e003),
> PCMCIA_DEVICE_PROD_ID1("TRANSCEND512M   ", 0xd0909443),
> PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS1GCF80", 0x709b1bf1,
> 0x2a54d4b1),
> +   PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS2GCF120", 0x709b1bf1,
> 0xf54a91c8),
> PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS4GCF120", 0x709b1bf1,
> 0xf54a91c8),
> PCMCIA_DEVICE_PROD_ID12("WIT", "IDE16", 0x244e5994, 0x3e232852),
> PCMCIA_DEVICE_PROD_ID12("WEIDA", "TWTTI", 0xcc7cf69c,
> 0x212bb918),

I'm never sure whether it's Bart or Dominik who handles pcmcia-cs patches.


Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Pavel Machek
Hi!

> > > Why? Because there is no _room_ for inconsistency. There's nothing to be 
> > > "inconsistent with", since any changes to memory (by things like DMA or 
> > > other setup that happens while the suspend process is going on) is by 
> > > _definition_ consistent with the resume image (because there is no 
> > > separate image).
> > 
> > Do you propose to keep DMAs running while suspending-to-RAM?
> 
> What part of "suspend a chip" do you have trouble with?
> 
> DMA obviously does *not* happen with a suspended device. There's no need 
> to turn DMA even off - it just doesn't happen!

Ok, I guess I'll have nightmares of DMA controllers doing DMAs from
chips that are no longer there tonight.

> > > For example, the whole myth that "freeze" needs to shut off DMA is a 
> > > total 
> > > and utter *myth*. It needs nothing of the sort at all. Rather than shut 
> > > off DMA and try to make the hardware be wevy wevy quiet while it's 
> > > hunting 
> > > wabbits, it's a lot easier to just do nothing at all on "freeze",
> > 
> > No. Sorry, you are wrong here. 
> > 
> > Remember that during resume we run
> > 
> > freeze()
> > copy old data into memory
> > thaw()
> > 
> > Now, if the old kernel left DMAs running, it could be overwriting
> > the data we are copying in.
> 
> The *thaw* needs to happen with devices quiescent. 
> 
> But that sure doesn't have anything to do with the "snapshot()" path. In 
> fact, you'll have rebooted the machine in between.

Only the fact that we are currently using the same device call during
snapshot() and during restore(). We obviously could do _5_ device
calls

(suspend/resume/freeze/quiesce_disable_dma/thaw)

...but that looks like too many calls to me.

> So what does that have to do with "snapshotting"?

I'm not comfortable with memory I'm copying changing under my hands
because of some DMA. It just looks like asking for trouble. I _think_
we can get away with DMA running during snapshot, because a driver may
not assume anything about the DMA result before it gets the completion
interrupt, but... 

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH] i386: For debugging, make the initial page table setup less forgiving.

2007-04-25 Thread Jeremy Fitzhardinge
Eric W. Biederman wrote:
>> The issue is not a matter of avoiding duplicate work, but making sure
>> all the pagetables are consistent from Xen's perspective.
>>
>> Specifically, you may not ever, at any time, create a writable mapping
>> of a page which is currently part of an active pagetable.  This means
>> that when we're creating mappings of physical memory, the pages which
>> are part of the current pagetable must be mapped RO.  The easiest way I
>> found to guarantee that is to copy the Xen-provided pagetable as a
>> template, and only update pages which are missing.
>> 
>
> Hmm.  I now see your problem.
>
>   
>> The other way I could do this is to have special-purpose init-time
>> version of xen_set_pte which checks to see if it's making a RO mapping
>> RW, and refuse to do it.  That would minimize the changes to mm/init.c,
>> but give init-time set_pte rather unexpected hidden semantics.
>> 
>
> Yes.  However how do we handle attempting to create this kind
> of mapping when mmap /dev/mem?  or /dev/kmem?
>   

Hm, I hadn't thought about that. I'm not sure that /dev/k?mem is very
useful in an unprivileged guest, but I guess it's useful for debugging or
stats or something. It's tricky to tell whether an arbitrary pfn is part
of a pagetable or not; there's a PG_PINNED page flag to tell you if it's
active, but only if you've already determined it's a pagetable page.

> I'm pretty certain there are other paths through the kernel where
> we can get page table mapping.
>
> Right now by leaving things read-only you are hiding from the kernel 
> what you are really trying to do.  That makes me distinctly
> uncomfortable.  In general when things get swept under the rug
> we can never handle them properly.  Although this issue may be small
> enough it doesn't matter.
>   

Well, the general idea is that in a paravirtualized environment
pagetable pages need special handling. Different hypervisors need
different handling, but they all need something special. The paravirt
hooks are intended to capture all the interesting events, without
over-constraining what special thing the hypervisor wants to do at that
point.

That's why I went for the "allow the hypervisor to provide a prototype
pagetable, and avoid the bits it has already set up"; it allows it to do
whatever it wants, without getting too specific about what that is, and
retains a fairly straightforward interface.
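
To make the "prototype pagetable" idea concrete, here is a toy user-space
sketch in C. It is not the actual mm/init.c or Xen code: toy_pte_t, the
TOY_* flag values and toy_fill_missing() are all made up for illustration.
It only shows the rule that entries the hypervisor already set up (for
example read-only mappings of pagetable pages) are left alone, and only
missing entries are filled in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits and PTE type, invented for this sketch. */
#define TOY_PRESENT 0x1u
#define TOY_RW      0x2u

typedef unsigned int toy_pte_t;

/*
 * Fill only the entries the template left empty and never overwrite an
 * existing one, so a read-only entry can never be turned writable here.
 * Returns the number of entries that were filled.
 */
static size_t toy_fill_missing(toy_pte_t *table, size_t n, toy_pte_t fresh)
{
    size_t filled = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(table[i] & TOY_PRESENT)) {
            table[i] = fresh;
            filled++;
        }
    }
    return filled;
}
```

The point of the design is that the init code does not need to know why
an entry is read-only; it only needs to leave present entries untouched.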

> I suspect what we want to do is come up with a function to call
> to test to see if a page should be read-only and map such pages
> _PAGE_KERNEL_RO, or _PAGE_KERNEL_RO_EXEC if it's code.
>   

Hm, I think that's a hard function to write in general. For the special
case of pagetable_init it wouldn't be too hard, but it doesn't seem like
a big improvement over the current state of affairs.

> Speaking of things what are paravirt_alloc_pd and parafirt_alloc_pd 
> supposed to do?
>   
(alloc_pd and alloc_pt)

Broadly speaking, they tell the hypervisor that there's a new page about
to be attached to the pagetable. Xen uses it as the hook to map those
pages RO if the pagetable is active. VMI (and lguest?) use it to tell
the hypervisor's shadow pagetable machinery that there's something new
to track.

J


Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Linus Torvalds


On Thu, 26 Apr 2007, Pavel Machek wrote:
> > 
> > Why? Because there is no _room_ for inconsistency. There's nothing to be 
> > "inconsistent with", since any changes to memory (by things like DMA or 
> > other setup that happens while the suspend process is going on) is by 
> > _definition_ consistent with the resume image (because there is no 
> > separate image).
> 
> Do you propose to keep DMAs running while suspending-to-RAM?

What part of "suspend a chip" do you have trouble with?

DMA obviously does *not* happen with a suspended device. There's no need 
to turn DMA even off - it just doesn't happen!

> > For example, the whole myth that "freeze" needs to shut off DMA is a total 
> > and utter *myth*. It needs nothing of the sort at all. Rather than shut 
> > off DMA and try to make the hardware be wevy wevy quiet while it's hunting 
> > wabbits, it's a lot easier to just do nothing at all on "freeze",
> 
> No. Sorry, you are wrong here. 
> 
> Remember that during resume we run
> 
> freeze()
> copy old data into memory
> thaw()
> 
> Now, if the old kernel left DMAs running, it could be overwriting
> the data we are copying in.

The *thaw* needs to happen with devices quiescent. 

But that sure doesn't have anything to do with the "snapshot()" path. In 
fact, you'll have rebooted the machine in between.

So what does that have to do with "snapshotting"?

Linus


Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Linus Torvalds


On Thu, 26 Apr 2007, Pavel Machek wrote:
> 
> > For suspend to ram, in contrast, since you *know* that nobody will be 
> > touching the hardware, and since the timings are very different anyway 
> > (you'd hope that you can resume in a second or two), you'd generally want 
> > to keep the DMA engine tables right where they are, and just literally 
> > suspend the PCI chip itself.
> 
> I'd actually prefer resume to be similar to module insert, too... Do
> you think that resume is _that_ time critical?

I think it probably depends on the device, and it should depend on the 
driver writer how he wants to do it.

My _point_ is that there is absolutely zero reason to think that the two 
events are the same. We *know* that for snapshot+shutdown, we need to 
actually keep the DMA tables intact *over* the snapshot (because writing 
out the snapshot may _need_ them). But exactly because we keep them 
intact, a driver writer may sanely say "I didn't even bother shutting them 
down, so on thaw, I cannot trust them, so I'll just re-initialize them 
entirely".

In contrast, over suspend-to-ram, it's entirely reasonable to just leave 
them in memory, and just keep them. There's no *reason* not to.

And that's my whole point in this argument: the two paths are 
fundamentally totally different. You *claim* that "snapshot()" needs to 
stop DMA etc, but that's simply not true.

So I claim:
 - for a lot of devices, it's actually a *lot* easier to just have 
   snapshot not do anything at all, and re-initialize on thaw
 - for those same devices, for s2ram, since the tables are *safe* and 
   don't get touched by anything else, it's probably easier to just let 
   them be.

See? The "it's easier to do X" is a _different_ X for the two cases. 

So the whole "suspend is a superset of freeze" is simply not true.

> [I'd like you to drop me a line saying you understand current design
> and that it works -- even if it is very inelegant]

I _do_ understand the current design. I just think that it's totally 
and seriously broken. I've ranted against it before. I think it's stupid 
to play like you're "suspending" something just to save some state, 
especially since most users probably don't even *want* to suspend the 
state, and would quite happily re-initialize the chip instead.

And I think it's horrible to have a dynamic flag to tell the difference 
between two or more state changes that the devices should statically be 
able to determine. _If_ some driver really does have the same routine, 
just use the same routine. There are no downsides to splitting them up.
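
As a rough illustration of what "splitting them up" could look like, here
is a user-space C sketch with one entry point per transition instead of a
single routine plus a dynamic flag. Everything in it is hypothetical:
struct toy_dev, struct toy_pm_ops and the toy_* functions are invented for
this example and are not an existing kernel interface.

```c
#include <assert.h>

/* Hypothetical device state, invented for this sketch. */
struct toy_dev {
    int tables_trusted;  /* 1 if the in-memory DMA tables can be relied on */
    int powered;         /* 1 while the chip is powered */
    int reinit_count;    /* how often the tables were rebuilt from scratch */
};

/* Hypothetical ops table: one entry point per transition, no mode flag. */
struct toy_pm_ops {
    void (*freeze)(struct toy_dev *);
    void (*thaw)(struct toy_dev *);
    void (*suspend)(struct toy_dev *);
    void (*resume)(struct toy_dev *);
};

/* Snapshot path: do nothing at all on freeze ... */
static void toy_freeze(struct toy_dev *d) { (void)d; }

/* ... and rebuild everything on thaw, since nothing was shut down. */
static void toy_thaw(struct toy_dev *d)
{
    d->tables_trusted = 1;
    d->reinit_count++;
}

/* s2ram path: power the chip down and leave the tables in memory. */
static void toy_suspend(struct toy_dev *d) { d->powered = 0; }
static void toy_resume(struct toy_dev *d)  { d->powered = 1; }

static const struct toy_pm_ops toy_ops = {
    .freeze = toy_freeze, .thaw = toy_thaw,
    .suspend = toy_suspend, .resume = toy_resume,
};

/* Snapshot cycle: the device keeps running, so its tables go stale. */
static void toy_snapshot_cycle(struct toy_dev *d)
{
    toy_ops.freeze(d);
    d->tables_trusted = 0;   /* device activity while the image is written */
    toy_ops.thaw(d);
}

/* Suspend-to-RAM cycle: nothing touches the tables while powered down. */
static void toy_s2ram_cycle(struct toy_dev *d)
{
    toy_ops.suspend(d);
    toy_ops.resume(d);
}
```

A driver whose freeze and suspend really are identical could simply point
both slots at the same function, so the split costs it nothing.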

> Now, we can separate suspend/freeze and resume/thaw (with some common
> helpers). It will speed the code up by avoiding unnecessary
> operations. It also needs attention from driver writers (ouch).
> 
> Do we want to do that?

I'd personally certainly want to do that. But I want to split up the 
callers too. Right now we mix those a lot as well. I suspect that would 
automatically be fixed by just forcing them to separate out (since they 
now call different functions of the devices), but I'm not 100% sure. There 
might be other issues.

Just as an example: one of the most painful things there is in the suspend 
sequence is that we shut off the console (because the console device will 
be suspended in hw, and it's thus not safe to use it over a suspend/resume 
sequence). That should just go away entirely for "snapshot()", because 
there is *never* any excuse for actually turning off the console during a 
snapshot: even a network console should be entirely functional.  Things 
like that - things that sound like small issues, but that really aren't.

(Right now you can enable the "don't disable the console" config option, 
but since network drivers will actually shut down etc, it just means that 
you'll have oopses etc if you do, and you have netconsole enabled)

Linus


Re: Linux 2.6.20.8

2007-04-25 Thread David Miller
From: Greg KH <[EMAIL PROTECTED]>
Date: Wed, 25 Apr 2007 16:52:10 -0700

> Because I haven't been applying any network-related patches unless you
> forward them to me, based on what happened the last time I did that
> without asking :)

:-)  I'm trying not to be too controlling and stay out of the way
every once in a while :)

> So, sorry, I didn't realize this was a big issue, can you forward the
> needed patches to me?  I'll do a new release with them in it after I get
> back from dinner.

I'll send it to you under separate cover, thanks a lot Greg.


Re: Linux 2.6.20.8

2007-04-25 Thread Greg KH
On Wed, Apr 25, 2007 at 04:29:44PM -0700, David Miller wrote:
> From: Greg KH <[EMAIL PROTECTED]>
> Date: Wed, 25 Apr 2007 14:22:25 -0700
> 
> > We (the -stable team) are announcing the release of the 2.6.20.8 kernel.
> > This release has a security bugfix so any users of kernels older than
> > 2.6.20.7 are highly encouraged to upgrade as soon as possible.
> 
> Greg, Yoshifuji sent you an ipv6 security fix of nearly
> equal severity yesterday.
> 
> Why did you leave that out?

Because I haven't been applying any network-related patches unless you
forward them to me, based on what happened the last time I did that
without asking :)

So, sorry, I didn't realize this was a big issue, can you forward the
needed patches to me?  I'll do a new release with them in it after I get
back from dinner.

thanks,

greg k-h



Re: 2.6.21-rc7-mm1: BUG_ON in kthread_bind during _cpu_down

2007-04-25 Thread Andrew Morton
On Thu, 26 Apr 2007 01:10:21 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> 
wrote:

> Hi,
> 
> The BUG_ON in kthread_bind (line 165 in kthread.c) triggers for me during
> attempted suspend to disk, when disable_nonboot_cpus() calls _cpu_down()
> (on x86_64).

I guess the backtrace would be pretty important here.

Guys, please don't add BUG_ONs unless there is simply no sane way to recover.

Because when someone goofs up, the BUG_ON will kill the whole machine and
everyone else who has code being tested in -mm loses a tester.

Plus a BUG_ON *greatly* decreases our chances of getting a trace from the
tester: dead box, nothing in the logs.


--- a/kernel/kthread.c~fix-kthread_create-vs-freezer-theoretical-race-dont-be-obnoxious
+++ a/kernel/kthread.c
@@ -162,7 +162,10 @@ EXPORT_SYMBOL(kthread_create);
  */
 void kthread_bind(struct task_struct *k, unsigned int cpu)
 {
-	BUG_ON(k->state != TASK_UNINTERRUPTIBLE);
+	if (k->state != TASK_UNINTERRUPTIBLE) {
+		WARN_ON(1);
+		return;
+	}
 	/* Must have done schedule() in kthread() before we set_task_cpu */
 	wait_task_inactive(k);
 	set_task_cpu(k, cpu);
_



Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Pavel Machek
Hi!

> > Both of them have to ensure you can make a consistent snapshot.
> 
> Bzzt. Wrong again. Very much so.
> 
> STR does not need to "ensure that you have a consistent snapshot".
> 
> Why? Because there is no _room_ for inconsistency. There's nothing to be 
> "inconsistent with", since any changes to memory (by things like DMA or 
> other setup that happens while the suspend process is going on) is by 
> _definition_ consistent with the resume image (because there is no 
> separate image).

Do you propose to keep DMAs running while suspending-to-RAM? That
sounds really unsafe; we are shutting down our PCI controllers at that
time; doing that while DMAs are running sounds bad.

> That's TOTALLY DIFFERENT from "suspend to disk". In suspend to disk, you 
> need a completely different kind of mindset, namely you need a single 
> consistent image, where the image is consistent not only with memory, but 
> with all the devices.
> 
> For example, the whole myth that "freeze" needs to shut off DMA is a total 
> and utter *myth*. It needs nothing of the sort at all. Rather than shut 
> off DMA and try to make the hardware be wevy wevy quiet while it's hunting 
> wabbits, it's a lot easier to just do nothing at all on "freeze",

No. Sorry, you are wrong here. 

Remember that during resume we run

freeze()
copy old data into memory
thaw()

Now, if the old kernel left DMAs running, it could be overwriting
the data we are copying in. It is not about DMA tables. While
resuming, the CPU needs to be alone, without interference from DMA engines
(or other CPUs), because copying back the old image means writing to
memory that was not properly allocated.

(Now, we could add one more hook, turn_off_dmas_for_copyback(), but
that looks like way too many hooks to me. And I'm not comfortable with
DMA engines running while I'm trying to copy the image. They may be
overwriting data I'm trying to copy...) 
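
The copy-back hazard can be shown with a toy user-space model in C. This
is only a sketch under invented assumptions: struct toy_machine and the
toy_* functions are hypothetical, and the "DMA engine" is just a function
that scribbles on one memory cell each time it gets a chance to run.

```c
#include <assert.h>
#include <string.h>

#define TOY_MEM 8

/* Hypothetical machine state, invented for this sketch. */
struct toy_machine {
    unsigned char mem[TOY_MEM];
    int dma_running;   /* 1 if a device may still write to memory */
    int dma_target;    /* cell that the leftover DMA keeps writing to */
};

/* One chance for the device to write; a no-op once DMA is stopped. */
static void toy_dma_tick(struct toy_machine *m)
{
    if (m->dma_running)
        m->mem[m->dma_target] = 0xEE;
}

/* freeze() before copy-back: optionally quiesce the device. */
static void toy_freeze(struct toy_machine *m, int quiesce)
{
    if (quiesce)
        m->dma_running = 0;
}

/* Copy the saved image back; the device may write at any point. */
static void toy_copy_back(struct toy_machine *m, const unsigned char *image)
{
    for (int i = 0; i < TOY_MEM; i++) {
        m->mem[i] = image[i];
        toy_dma_tick(m);
    }
}

/* Returns 1 if memory matches the image after the resume sequence. */
static int toy_resume_matches(int quiesce)
{
    const unsigned char image[TOY_MEM] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    struct toy_machine m = { { 0 }, 1, 3 };

    toy_freeze(&m, quiesce);
    toy_copy_back(&m, image);
    return memcmp(m.mem, image, TOY_MEM) == 0;
}
```

In this model the restored memory only matches the image when freeze()
stops the device before the copy starts.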

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Nonfunctional ethernet (was Re: 2.6.21-rc7-mm1 + sysfs-oops-workaround.patch -- INFO: possible recursive locking detected)

2007-04-25 Thread Antonino A. Daplas
On Wed, 2007-04-25 at 22:48 +0800, Antonino A. Daplas wrote:
> On Wed, 2007-04-25 at 14:18 +0900, Tejun Heo wrote:
> > Miles Lane wrote:

> eth0 renamed to eth54
> BUG: atomic counter underflow at:
>  [] show_trace_log_lvl+0x1a/0x30
>  [] show_trace+0x12/0x14
>  [] dump_stack+0x16/0x18
>  [] _atomic_dec_and_lock+0x29/0x4c
>  [] dput+0x34/0x103
>  [] sysfs_drop_dentry+0x141/0x149
>  [] sysfs_hash_and_remove+0x89/0x10e
>  [] sysfs_remove_link+0xe/0x10
>  [] device_rename+0x110/0x181
>  [] dev_change_name+0x11e/0x1ca
>  [] dev_ifsioc+0x330/0x3d7
>  [] dev_ioctl+0x350/0x46e
>  [] sock_ioctl+0x1be/0x1ca
>  [] do_ioctl+0x1c/0x53
>  [] vfs_ioctl+0x1ec/0x203
>  [] sys_ioctl+0x49/0x62
>  [] sysenter_past_esp+0x5f/0x99
>  ===

The above trace was caused by CONFIG_SYSFS_DEPRECATED=y; after setting
this to n, the trace disappeared.  Still, all my network cards are
non-functional.  Entries in /sys/class/net are bogus:

/ # cd /sys/class/net/
/sys/class/net # ls
eth1  eth44  eth54  lo

/sys/class/net # cd eth1
-bash: cd: eth1: No such file or directory

/sys/class/net # ls -l eth1
lrwxrwxrwx 1 root root 0 Apr 26 07:15 eth1 ->
../../devices/pci:00/:00:12.0/net/eth0

/sys/class/net # cd ../../devices/pci\:00/\:00\:12.0/net/eth0
-bash: cd: ../../devices/pci:00/:00:12.0/net/eth0: No such file
or directory

Do you know of any patches I need to revert/apply?  Anyway, I have to
boot back to this kernel and find out more about what's going on.

Tony




Re: W1 printk format warning

2007-04-25 Thread Andrew Morton
On Wed, 25 Apr 2007 16:21:04 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> in 2.6.21-rc7-mm1.  Are you aware of this?
> 
> drivers/w1/w1.c:460: warning: too few arguments for format
> 
>   dev_dbg(>dev, "%s: registering %s as %p.\n", __func__,
>   >dev.bus_id[0]);
> 

Yeah, that's because Dan's dev_dbg-check-dev_dbg-arguments.patch added
printk arg-checking to dev_dbg() and a bunch of bugs got exposed.  I fixed
a few of them.

Incidentally, there are at least four different drivers which privately do
things like:

#if !defined(DEBUG)
#undef dev_dbg
static inline int __attribute__ ((format (printf, 2, 3))) dev_dbg(
const struct device *_dev, const char *fmt, ...) {return 0;}
#endif

which can all be removed with Dan's (good) patch in place.
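
For anyone unfamiliar with the attribute, here is a small user-space
sketch of printf-style argument checking. It assumes GCC or clang; struct
toy_device, toy_dev_dbg() and toy_register() are made-up names for this
example, not the kernel's definitions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical device type, invented for this sketch. */
struct toy_device {
    const char *name;
};

/*
 * format(printf, 2, 3): parameter 2 is the format string and the
 * variadic arguments to check against it start at parameter 3.
 */
static int toy_dev_dbg(const struct toy_device *dev, const char *fmt, ...)
    __attribute__((format(printf, 2, 3)));

static int toy_dev_dbg(const struct toy_device *dev, const char *fmt, ...)
{
    va_list ap;
    int n = fprintf(stderr, "%s: ", dev->name);

    va_start(ap, fmt);
    n += vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}

static int toy_register(const struct toy_device *dev, void *cookie)
{
    /*
     * Three conversions, three arguments. Dropping any one of them
     * would draw a "too few arguments for format" warning with -Wformat.
     */
    return toy_dev_dbg(dev, "%s: registering %s as %p.\n",
                       __func__, dev->name, cookie);
}
```

The same attribute on a no-op stub keeps the checking alive even when
debugging output is compiled out.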


Re: [PATCH] i386: For debugging, make the initial page table setup less forgiving.

2007-04-25 Thread Eric W. Biederman
Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> No.  Please just remove the conditionals on the leaf pages.
>>   
>
> So, to be specific, you mean make updating the pte_t entries (and pmd_t
> entries which refer to hugepages) entries unconditional?

I mean make updating pte_t and pmd_t entries that refer to identity
mapped physical pages unconditional.

>> We know exactly what we require them to be, there is minimal
>> cost and no downside to just setting the pte entries to
>> what we want them to be for the identity mapping.
>>
>> It doesn't make sense for paravirtualization or anything else to 
>> influence that.
>>
>> This may be redoing work that has been done before but it is
>> doing it all one common place.
>>   
>
> The issue is not a matter of avoiding duplicate work, but making sure
> all the pagetables are consistent from Xen's perspective.
>
> Specifically, you may not ever, at any time, create a writable mapping
> of a page which is currently part of an active pagetable.  This means
> that when we're creating mappings of physical memory, the pages which
> are part of the current pagetable must be mapped RO.  The easiest way I
> found to guarantee that is to copy the Xen-provided pagetable as a
> template, and only update pages which are missing.

Hmm.  I now see your problem.

> The other way I could do this is to have special-purpose init-time
> version of xen_set_pte which checks to see if it's making a RO mapping
> RW, and refuse to do it.  That would minimize the changes to mm/init.c,
> but give init-time set_pte rather unexpected hidden semantics.

Yes.  However how do we handle attempting to create this kind
of mapping when mmap /dev/mem?  or /dev/kmem?

I'm pretty certain there are other paths through the kernel where
we can get page table mapping.

Right now by leaving things read-only you are hiding from the kernel 
what you are really trying to do.  That makes me distinctly
uncomfortable.  In general when things get swept under the rug
we can never handle them properly.  Although this issue may be small
enough it doesn't matter.

I suspect what we want to do is come up with a function to call
to test to see if a page should be read-only and map such pages
_PAGE_KERNEL_RO, or _PAGE_KERNEL_RO_EXEC if it's code.

Speaking of things what are paravirt_alloc_pd and parafirt_alloc_pd 
supposed to do?

Eric


Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)

2007-04-25 Thread Pavel Machek
Hi!

> > Current design is:
> 
> Broken. Yes. I've tried to tell you.

Ok.

...

> It's worse than just confusing, it's *idiotic*.
> 
> It _can_ work in practice, but
>  - we have pretty damn solid evidence that it doesn't work all that often 
>in practice
>  - the fact that something *can* be done the stupid way is in no way an 
>argument that it *should* be done the stupid way.
> 
> I claim that the current STD is *stupid*. Yes, it can work. But that 
> doesn't make it less stupid.

Good. So you understand how it works.

> What's your argument? Your argument seems to be that it's not stupid, 
> because it can work. Can't you see that that simply isn't an
> argument at 

I tried keeping the module_init/thaw/resume code similar, so that driver
authors can debug suspend-to-disk, cross their fingers, and have
suspend-to-ram work, too.

Now, perhaps enough people do std/str these days so this is not
important any longer... let's hope so.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

