2.6.21-rc7-mm2
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc7/2.6.21-rc7-mm2/

- this has everything which is in 2.6.21. Plus more!

- a number of nasty bugs were fixed. This should be (a lot) more stable than
  2.6.21-rc7-mm1.

  Some sysfs-related problems are still expected. Fiddling with the setting
  of CONFIG_SYSFS_DEPRECATED might help avoid them.

- the 64-bit futex patches and (consequently) the private-futex patches were
  dropped. Because the 64-bit futex patches need to be reconstituted.

- the unprivileged mounts code was dropped, pending an updated patch series

- lots of minor fbdev bugs were fixed

Boilerplate:

- See the `hot-fixes' directory for any important updates to this patchset.

- To fetch an -mm tree using git, use (for example)

  git-fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git tag v2.6.16-rc2-mm1
  git-checkout -b local-v2.6.16-rc2-mm1 v2.6.16-rc2-mm1

- -mm kernel commit activity can be reviewed by subscribing to the mm-commits
  mailing list.

  echo "subscribe mm-commits" | mail [EMAIL PROTECTED]

- If you hit a bug in -mm and it is not obvious which patch caused it, it is
  most valuable if you can perform a bisection search to identify which patch
  introduced the bug. Instructions for this process are at

  http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt

  But beware that this process takes some time (around ten rebuilds and
  reboots), so consider reporting the bug first and if we cannot immediately
  identify the faulty patch, then perform the bisection search.

- When reporting bugs, please try to Cc: the relevant maintainer and mailing
  list on any email.

- When reporting bugs in this kernel via email, please also rewrite the email
  Subject: in some manner to reflect the nature of the bug. Some developers
  filter by Subject: when looking for messages to read.

- Occasional snapshots of the -mm lineup are uploaded to
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/ and are announced on
  the mm-commits list.

Changes since 2.6.21-rc7-mm1: origin.patch git-acpi.patch git-alsa.patch git-agpgart.patch git-arm.patch git-avr32.patch git-cifs.patch git-cpufreq.patch git-powerpc.patch git-drm.patch git-dvb.patch git-gfs2-nmw.patch git-hid.patch git-ia64.patch git-ieee1394.patch git-infiniband.patch git-input.patch git-jfs.patch git-kbuild.patch git-kvm.patch git-leds.patch git-libata-all.patch git-md-accel.patch git-mips.patch git-mmc.patch git-mtd.patch git-ubi.patch git-netdev-all.patch git-e1000.patch git-net.patch git-ioat.patch git-nfs-server-cluster-locking-api.patch git-ocfs2.patch git-parisc.patch git-r8169.patch git-selinux.patch git-pciseg.patch git-s390.patch git-s390-fixup.patch git-sh.patch git-scsi-misc.patch git-block.patch git-watchdog.patch git-ipwireless_cs.patch git-cryptodev.patch git-gccbug.patch git trees -fix-possible-null-pointer-access-in-8250-serial-driver.patch -fix-oom-killing-processes-wrongly-thought-mpol_bind.patch -char-mxser_new-fix-recursive-locking.patch -char-mxser_new-fix-tiocmiwait.patch -char-mxser-fix-tiocmiwait.patch -taskstats-fix-the-structure-members-alignment-issue.patch -maintainers-use-listslinux-foundationorg.patch -paride-drivers-initialize-spinlocks.patch -add-mbuesch-to-mailmap.patch -fix-spelling-in-drivers-video-kconfig.patch -page-migration-fix-nr_file_pages-accounting.patch -ieee1394-update-maintainers-database.patch -v9fs-dont-use-primary-fid-when-removing-file.patch -acpi-thermal-fix-mod_timer-interval.patch -allow-reading-tainted-flag-as-user.patch -do-not-truncate-irq-number-for-icom-adapter.patch -hwmon-w83627ehf-dont-redefine-region_offset.patch -reiserfs-fix-xattr-root-locking-refcount-bug.patch -char-icom-mark-__init-as-__devinit.patch -fault-injection-add-entry-to-maintainers.patch -8250-fix-possible-deadlock-between-serial8250_handle_port-and-serial8250_interrupt.patch -oom-kill-all-threads-that-share-mm-with-killed-task.patch -fix-x86-fix-potential-overflow-in-perfctr-reservation.patch 
-cleanup-cpufreq-kconfig-options.patch -ppc-pci_32-stop-using-old-hotplug-unsafe-apis.patch -jdelvare-i2c-i2c-delete-scx200_i2c.patch -jdelvare-i2c-i2c-obsolete-ixp2000-and-ixp4xx.patch -jdelvare-hwmon-hwmon-smsc47m1-use-dynamic-attributes.patch -ide-cmd64x-remove-broken-sw-mw-dma-support.patch -ide-cmd64x-interrupt-status-fixes-resend.patch -ide-cmd64x-add-fix-enablebits.patch -ide-cmd64x-procfs-code-fixes-cleanups.patch -ide-cmd64x-use-interrupt-status-from-mrdmode-register.patch -ide-cmd64x-add-back-mwdma-support.patch -git-netdev-all-baycom_ser_fdx-fix.patch -fix-sparse-errors-in-drivers-net-ibmvethc.patch -netdrv-perform-missing-csum_offset-conversions.patch -x86_64-mm-remove-noreplacement.patch -fix-x86_64-mm-fam10-mwait-idle.patch -more-fix-x86_64-mm-fam10-mwait-idle.patch -fix-x86_64-mm-sched-clock-share.patch
Re: [PATCH -mm] x86_64: kill 19000+ sparse warnings
On Wed, 25 Apr 2007 22:45:09 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> From: Randy Dunlap <[EMAIL PROTECTED]>
>
> Eliminate 19439 (!!) sparse warnings like:
> include/linux/mm.h:321:22: warning: constant 0x8100 is so big it
> is unsigned long
>
> Eliminate 56 sparse warnings like:
> arch/x86_64/kernel/setup.c:248:16: warning: constant 0x8000 is so
> big it is unsigned long
>
> Eliminate 5 sparse warnings like:
> arch/x86_64/kernel/module.c:49:13: warning: constant 0xfff0 is so
> big it is unsigned long
>
> Eliminate 23 sparse warnings like:
> arch/x86_64/mm/init.c:551:37: warning: constant 0xc200 is so big
> it is unsigned long
>
> Eliminate 6 sparse warnings like:
> arch/x86_64/kernel/module.c:49:13: warning: constant 0x8800 is so
> big it is unsigned long
>
> Eliminate 23 sparse warnings like:
> arch/x86_64/mm/init.c:552:6: warning: constant 0xe1ff is so big
> it is unsigned long
>
> Eliminate 3 sparse warnings like:
> arch/x86_64/kernel/e820.c:186:17: warning: constant 0x3fff is so big
> it is long
>
> ...
>
> +#ifdef __ASSEMBLY__
>  #define MAXMEM 0x3fff
>  #define VMALLOC_START 0xc200
>  #define VMALLOC_END 0xe1ff
>  #define MODULES_VADDR 0x8800
>  #define MODULES_END 0xfff0
>  #define MODULES_LEN (MODULES_END - MODULES_VADDR)
> +#else
> +#define MAXMEM 0x3fffUL
> +#define VMALLOC_START 0xc200UL
> +#define VMALLOC_END 0xe1ffUL
> +#define MODULES_VADDR 0x8800UL
> +#define MODULES_END 0xfff0UL
> +#define MODULES_LEN (MODULES_END - MODULES_VADDR)
> +#endif
>

hm, the duplication is unfortunate. I wonder if it's worth doing a cpp
token-pasting trick to avoid having to do that.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

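The token-pasting idea raised in the reply above could look roughly like the following. This is only a sketch under assumptions, not the change that was eventually made: ADDR_CONST and the EXAMPLE_* names are hypothetical, and the constant values are placeholders rather than the real x86_64 layout. The `_AC(1,UL)` already visible in the `__VIRTUAL_MASK` context line of the patch appears to follow the same pattern: one definition per constant, with the suffix pasted on only when compiling C.

```c
/*
 * Hypothetical helper: append a type suffix to a constant only for C.
 * Assembler sources define __ASSEMBLY__ and get the bare number,
 * because the assembler does not understand a "UL" suffix.
 */
#ifdef __ASSEMBLY__
#define ADDR_CONST(x, suffix)		x
#else
#define ADDR_CONST_PASTE(x, suffix)	x##suffix
#define ADDR_CONST(x, suffix)		ADDR_CONST_PASTE(x, suffix)
#endif

/* Placeholder values for illustration only. */
#define EXAMPLE_VMALLOC_START	ADDR_CONST(0xc2000000, UL)
#define EXAMPLE_VMALLOC_END	ADDR_CONST(0xe1ffffff, UL)
#define EXAMPLE_VMALLOC_LEN	(EXAMPLE_VMALLOC_END - EXAMPLE_VMALLOC_START)

/* In C the constants are unsigned long, so arithmetic on them is too. */
unsigned long example_vmalloc_len(void)
{
	return EXAMPLE_VMALLOC_LEN;
}
```

With a helper of this kind each constant is written once, so the #ifdef __ASSEMBLY__ / #else duplication in the quoted hunk would not be needed.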
Re: Linux 2.6.21
Greg KH wrote:
> On Thu, Apr 26, 2007 at 06:08:06AM +0200, Adrian Bunk wrote:
>> What I will NOT do:
>> Waste my time with tracking 2.6.22-rc regressions.
>
> I sure hope you don't do this. Tracking these is tough, and I think you
> are doing a great job with it.
>
> No release will have no regressions, there's just too many different
> combinations of hardware and sometimes people don't have the time to
> test to see if their original report is even fixed or not. And some of
> them will get fixed with patches coming in the next kernel release,
> which will then be tracked down and added to the -stable releases.
>
> So if you can, please keep it up, if you think it's a thankless job,
> here's my hearty thanks for doing this work. It's really needed and I
> really appreciate it.

Fifthed here, Adrian. It could potentially become one of the best things
to happen to the mainline release process (and I believe has already been
worthwhile). Even if it takes a while for people to get on board, or some
regressions slip through.

And note, a release with regressions doesn't make your hard work useless
-- you've still got the important who, when, how, etc. info that can be
used in future, and it could serve as a "known issues for upgraders"
document as well.

--
SUSE Labs, Novell Inc.

Re: [00/17] Large Blocksize Support V3
Christoph Lameter <[EMAIL PROTECTED]> writes:

> On Wed, 25 Apr 2007, Eric W. Biederman wrote:
>
>> The page cache has no problems supporting things with a block
>> size larger then page size. Now the block device layer may not
>> have the code to do the scatter gather into small pages and it
>> may not handle buffer heads whose data is split between multiple
>> pages.
>
> It does have that problem. If a system is in use then memory is fragmented
> and requests to the devices are in 4k sizes. The kernel has to manage the
> 4k size. The number of requests that the driver can take is limited.
> Larger blocks allow shuffling more data to the device.

I have a hard time believing that device hardware limits don't allow them
to have enough space to handle larger requests. If so it was a poor
design by the hardware manufacturers.

>> And generally larger physical pages are a mistake to use.
>> Especially as it looks from some of the later comment you don't
>> date test on 32bit because the memory fragments faster.
>
> Ummm.. Dont get me to comment on i386. I never said that memory fragments
> faster on i386. i386 has multiple issues with memory management that
> require a lot of work and that will cause difficulty. If you have these
> fun systems with 512k ZONE_NORMAL and 63GB HIGHMEM then good luck...
>
>> Is it common for hardware that supports large block sizes to not
>> support splitting those blocks apart during DMA? Unless it is common
>> the whole premise of this patchset seems broken.
>
> Huh? Splitting the blocks requires hardware effort -> Reduction in
> transfer rate.

Splitting the blocks doesn't change the transfer effort one iota. The
buses (pci/pcie/hypertransport) already have block sizes below 4KB.
Reading a longer list of descriptors might slow things down, but I would
be surprised. The physical medium is the primary disk bottleneck.

Thinking about it, the fastest thing I can do with a filesystem or disk is
to not use it. That is, to cache it efficiently. Having page sized chunks
in my cache increases my caching efficiency. Large order pages work
directly against my caching efficiency.

>> I suspect what needs to be fixed is the page cache block device
>> interface so that we have helper functions that know how to stuff
>> a single block into several pages.
>
> Oh we have scores of these hacks around. Look at the dvd/cd layer. The
> point is to get rid of those.

Perhaps this is just a matter of cleaning them up so they are no longer
hacks? You are trying to couple something that has no business being
coupled as it reduces the system usability when you couple them.

>> Right now I don't even want to think about trying to use a swap device
>> with a large block size when we are low on memory.
>
> But that is due to the VM (at least Linus tree) having no defrag methods.
> mm has Mel's antifrag methods and can do it.

This is fundamental. Fragmentation when you have multiple chunk sizes
cannot be solved without the ability to move things in memory, whereas
it doesn't exist when you only have a single chunk size.

>> > 2. 32/64k blocksize is also used in flash devices. Same issues.
>>
>> flash devices are not block devices so I strongly doubt it is
>> the same issue.
>
> But they could be treated as such. Right now these poor guys have to
> improvise around the page size limit.

The reason they are different is that they have very different
fundamental properties. Flash devices have essentially no seek time so
random access is fast. However they have a maximum number of erases per
sector so you have to be careful to do wear leveling.

Flash devices are distinctly different, and using the block layer for
them while they do not behave like block devices is the wrong thing to do.

>> > 4. Reduce fsck times. Larger block sizes mean faster file system checking.
>>
>> Fewer seeks and less meta-data means faster fsck times. Larger block
>> sizes get us there only tangentially.
>
> Less meta data to manage does not reduce fsck times? Going from order 0 to
> order 2 blocks cuts the metadata to a fourth.

I agree that less meta data helps. But switching to extents can reduce
the meta data much more, and still doesn't penalize you for small files
if you have them.

>> > 5. Performance. If we look at IA64 vs. x86_64 then it seems that the
>> >faster interrupt handling on x86_64 compensate for the speed loss due to
>> >a smaller page size (4k vs 16k on IA64). Supporting larger block sizes
>> > sizes on all allows a significant reduction in I/O overhead and increases
>> >the size of I/O that can be performed by hardware in a single request
>> >since the number of scatter gather entries are typically limited for
>> >one request. This is going to become increasingly important to support
>> >the ever growing memory sizes since we may have to handle excessively
>> >large amounts of 4k requests for data sizes that may become common
>> >soon. For example to write a 1
Re: pgprot_writecombine() and PATs on x86
> So in general the pci prefetchable attribute means write-combining as > well as prefetching is safe. A sane BIOS will allocate prefetchable > BARS contiguously in the address space. So on a good day you > can just use one MTRR to map all of the prefetchable BARs as write-combining. Good point, and sounds easy enough. So why does not linux do it automatically then where possible? There are sure to be some broken devices, but if some device can't live with WC, we can always disable WC system-wide. -- MST - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH -mm] x86_64: kill 19000+ sparse warnings
From: Randy Dunlap <[EMAIL PROTECTED]>

Eliminate 19439 (!!) sparse warnings like:
include/linux/mm.h:321:22: warning: constant 0x8100 is so big it is unsigned long

Eliminate 56 sparse warnings like:
arch/x86_64/kernel/setup.c:248:16: warning: constant 0x8000 is so big it is unsigned long

Eliminate 5 sparse warnings like:
arch/x86_64/kernel/module.c:49:13: warning: constant 0xfff0 is so big it is unsigned long

Eliminate 23 sparse warnings like:
arch/x86_64/mm/init.c:551:37: warning: constant 0xc200 is so big it is unsigned long

Eliminate 6 sparse warnings like:
arch/x86_64/kernel/module.c:49:13: warning: constant 0x8800 is so big it is unsigned long

Eliminate 23 sparse warnings like:
arch/x86_64/mm/init.c:552:6: warning: constant 0xe1ff is so big it is unsigned long

Eliminate 3 sparse warnings like:
arch/x86_64/kernel/e820.c:186:17: warning: constant 0x3fff is so big it is long

Signed-off-by: Randy Dunlap <[EMAIL PROTECTED]>
---
 include/asm-x86_64/page.h    | 11 +++
 include/asm-x86_64/pgtable.h |  9 +
 2 files changed, 20 insertions(+)

--- linux-2.6.21-rc7-mm1.orig/include/asm-x86_64/page.h
+++ linux-2.6.21-rc7-mm1/include/asm-x86_64/page.h
@@ -80,9 +80,16 @@ extern unsigned long phys_base;
 #define __PHYSICAL_START	CONFIG_PHYSICAL_START
 #define __KERNEL_ALIGN		0x20
+
+#ifdef __ASSEMBLY__
 #define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
 #define __START_KERNEL_map	0x8000
 #define __PAGE_OFFSET		0x8100
+#else
+#define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)
+#define __START_KERNEL_map	0x8000UL
+#define __PAGE_OFFSET		0x8100UL
+#endif
 
 /* to align the pointer to the (next) page boundary */
 #define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
@@ -94,7 +101,11 @@ extern unsigned long phys_base;
 #define __VIRTUAL_MASK		((_AC(1,UL) << __VIRTUAL_MASK_SHIFT) - 1)
 
 #define KERNEL_TEXT_SIZE	(40*1024*1024)
+#ifdef __ASSEMBLY__
 #define KERNEL_TEXT_START	0x8000
+#else
+#define KERNEL_TEXT_START	0x8000UL
+#endif
 
 #ifndef __ASSEMBLY__
 
--- linux-2.6.21-rc7-mm1.orig/include/asm-x86_64/pgtable.h
+++ linux-2.6.21-rc7-mm1/include/asm-x86_64/pgtable.h
@@ -134,12 +134,21 @@ static inline pte_t ptep_get_and_clear_f
 #define USER_PTRS_PER_PGD	((TASK_SIZE-1)/PGDIR_SIZE+1)
 #define FIRST_USER_ADDRESS	0
 
+#ifdef __ASSEMBLY__
 #define MAXMEM		0x3fff
 #define VMALLOC_START	0xc200
 #define VMALLOC_END	0xe1ff
 #define MODULES_VADDR	0x8800
 #define MODULES_END	0xfff0
 #define MODULES_LEN	(MODULES_END - MODULES_VADDR)
+#else
+#define MAXMEM		0x3fffUL
+#define VMALLOC_START	0xc200UL
+#define VMALLOC_END	0xe1ffUL
+#define MODULES_VADDR	0x8800UL
+#define MODULES_END	0xfff0UL
+#define MODULES_LEN	(MODULES_END - MODULES_VADDR)
+#endif
 
 #define _PAGE_BIT_PRESENT	0
 #define _PAGE_BIT_RW		1

Re: [00/17] Large Blocksize Support V3
Eric W. Biederman wrote:

[EMAIL PROTECTED] writes:

V2->V3
- More restructuring
- It actually works!
- Add XFS support
- Fix up UP support
- Work out the direct I/O issues
- Add CONFIG_LARGE_BLOCKSIZE. Off by default which makes the inlines revert
  back to constants. Disabled for 32bit and HIGHMEM configurations.
  This also allows a gradual migration to the new page cache
  inline functions. LARGE_BLOCKSIZE capabilities can be
  added gradually and if there is a problem then we can disable
  a subsystem.

V1->V2
- Some ext2 support
- Some block layer, fs layer support etc.
- Better page cache macros
- Use macros to clean up code.

This patchset modifies the Linux kernel so that larger block sizes than
page size can be supported. Larger block sizes are handled by using
compound pages of an arbitrary order for the page cache instead of
single pages with order 0.

Huh? You seem to be mixing two very different concepts.

The page cache has no problems supporting things with a block
size larger then page size. Now the block device layer may not
have the code to do the scatter gather into small pages and it
may not handle buffer heads whose data is split between multiple
pages.

Yeah, this patch is not really large blocksize support (which we normally
think of as block size > page cache size).

But this is not a page cache issue.

And generally larger physical pages are a mistake to use.
Especially as it looks from some of the later comment you don't
date test on 32bit because the memory fragments faster.

I actually completely agree with this, and I'm concerned in general about
using higher order pages. I think it is fundamentally the wrong approach
because of fragmentation and defragmentation costs (similarly to Linus's
take on page colouring).

I think starting with the assumption that we _want_ to use higher order
allocations, and then creating all this complexity around that is not a
good one, and if we start introducing things that _require_ significant
higher order allocations to function then it is a nasty thing for
robustness.

Is it common for hardware that supports large block sizes to not
support splitting those blocks apart during DMA? Unless it is common
the whole premise of this patchset seems broken.

I suspect what needs to be fixed is the page cache block device
interface so that we have helper functions that know how to stuff
a single block into several pages.

I am working now and again on some code to do this, it is a big job but
I think it is the right way to do it. But it would take a long time to
get stable and supported by filesystems...

That would make the choice of using larger order pages (essentially
increasing PAGE_SIZE) something that can be investigated in parallel.

I agree that hardware inefficiencies should be handled by increasing
PAGE_SIZE (not making PAGE_CACHE_SIZE > PAGE_SIZE) at the arch level.

--
SUSE Labs, Novell Inc.

[patch 7/7] SLUB: Major slabinfo update
Enhancement to slabinfo - Support for slab shrinking (-r option) - Slab summary showing system totals - Sync with new form of alias handling - Sort by size, reverse sorting etc - Alias lookups Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> Index: linux-2.6.21-rc7-mm1/Documentation/vm/slabinfo.c === --- linux-2.6.21-rc7-mm1.orig/Documentation/vm/slabinfo.c 2007-04-25 21:20:24.0 -0700 +++ linux-2.6.21-rc7-mm1/Documentation/vm/slabinfo.c2007-04-25 21:46:40.0 -0700 @@ -3,7 +3,7 @@ * * (C) 2007 sgi, Christoph Lameter <[EMAIL PROTECTED]> * - * Compile by doing: + * Compile by: * * gcc -o slabinfo slabinfo.c */ @@ -17,15 +17,47 @@ #include #include +#define MAX_SLABS 500 +#define MAX_ALIASES 500 +#define MAX_NODES 1024 + +struct slabinfo { + char *name; + int alias; + int refs; + int aliases, align, cache_dma, cpu_slabs, destroy_by_rcu; + int hwcache_align, object_size, objs_per_slab; + int sanity_checks, slab_size, store_user, trace; + int order, poison, reclaim_account, red_zone; + unsigned long partial, objects, slabs; + int numa[MAX_NODES]; + int numa_partial[MAX_NODES]; +} slabinfo[MAX_SLABS]; + +struct aliasinfo { + char *name; + char *ref; + struct slabinfo *slab; +} aliasinfo[MAX_ALIASES]; + +int slabs = 0; +int aliases = 0; +int highest_node = 0; + char buffer[4096]; int show_alias = 0; int show_slab = 0; -int show_parameters = 0; int skip_zero = 1; int show_numa = 0; int show_track = 0; +int show_first_alias = 0; int validate = 0; +int shrink = 0; +int show_inverted = 0; +int show_single_ref = 0; +int show_totals = 0; +int sort_size = 0; int page_size; @@ -47,11 +79,16 @@ void usage(void) "-a|--aliases Show aliases\n" "-h|--help Show usage information\n" "-n|--numa Show NUMA information\n" - "-p|--parametersShow global parameters\n" + "-r|--reduceShrink slabs\n" "-v|--validate Validate slabs\n" "-t|--tracking Show alloc/free information\n" + "-T|--TotalsShow summary information\n" "-s|--slabs Show slabs\n" + "-S|--Size Sort by size\n" "-z|--zero Include 
empty slabs\n" + "-f|--first-alias Show first alias\n" + "-i|--inverted Inverted list\n" + "-1|--1ref Single reference\n" ); } @@ -86,23 +123,32 @@ unsigned long get_obj(char *name) unsigned long get_obj_and_str(char *name, char **x) { unsigned long result = 0; + char *p; + + *x = NULL; if (!read_obj(name)) { x = NULL; return 0; } - result = strtoul(buffer, x, 10); - while (**x == ' ') - (*x)++; + result = strtoul(buffer, , 10); + while (*p == ' ') + p++; + if (*p) + *x = strdup(p); return result; } -void set_obj(char *name, int n) +void set_obj(struct slabinfo *s, char *name, int n) { - FILE *f = fopen(name, "w"); + char x[100]; + + sprintf(x, "%s/%s", s->name, name); + + FILE *f = fopen(x, "w"); if (!f) - fatal("Cannot write to %s\n", name); + fatal("Cannot write to %s\n", x); fprintf(f, "%d\n", n); fclose(f); @@ -143,167 +189,613 @@ int store_size(char *buffer, unsigned lo return n; } -void alias(const char *name) +void decode_numa_list(int *numa, char *t) { - int count; - char *p; - - if (!show_alias) - return; + int node; + int nr; - count = readlink(name, buffer, sizeof(buffer)); + memset(numa, 0, MAX_NODES * sizeof(int)); - if (count < 0) - return; + while (*t == 'N') { + t++; + node = strtoul(t, , 10); + if (*t == '=') { + t++; + nr = strtoul(t, , 10); + numa[node] = nr; + if (node > highest_node) + highest_node = node; + } + while (*t == ' ') + t++; + } +} - buffer[count] = 0; +char *hackname(struct slabinfo *s) +{ + char *n = s->name; - p = buffer + count; + if (n[0] == ':') { + char *nn = malloc(20); + char *p; + + strncpy(nn, n, 20); + n = nn; + p = n + 4; + while (*p && *p !=':') + p++; + *p = 0; + } + return n; +} - while (p > buffer && p[-1] != '/') -
[patch 6/7] SLUB: Free slabs and sort partial slab lists in kmem_cache_shrink
At kmem_cache_shrink check if we have any empty slabs on the partial
lists; if so, remove them.

Also--as an anti-fragmentation measure--sort the partial slabs so that
the most fully allocated ones come first and the least allocated last.

The next allocations may fill up the nearly full slabs. Having the
least allocated slabs last gives them the maximum chance that their
remaining objects may be freed. Thus we can hopefully minimize the
partial slabs.

I think this is the best one can do in terms of antifragmentation
measures. Real defragmentation (meaning moving objects out of slabs with
the least free objects to those that are almost full) can be implemented
by reverse scanning through the list produced here but that would mean
that we need to provide a callback at slab cache creation that allows
the deletion or moving of an object. This will involve slab API
changes so defer for now.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/slub.c | 118 ++
 1 file changed, 104 insertions(+), 14 deletions(-)

Index: linux-2.6.21-rc7-mm1/mm/slub.c
===
--- linux-2.6.21-rc7-mm1.orig/mm/slub.c 2007-04-25 21:25:48.0 -0700
+++ linux-2.6.21-rc7-mm1/mm/slub.c 2007-04-25 21:27:07.0 -0700
@@ -109,9 +109,19 @@
 /* Enable to test recovery from slab corruption on boot */
 #undef SLUB_RESILIENCY_TEST
 
-/* Mininum number of partial slabs */
+/*
+ * Mininum number of partial slabs. These will be left on the partial
+ * lists even if they are empty. kmem_cache_shrink may reclaim them.
+ */
 #define MIN_PARTIAL 2
 
+/*
+ * Maximum number of desirable partial slabs.
+ * The existence of more partial slabs makes kmem_cache_shrink
+ * sort the partial list by the number of objects in the.
+ */
+#define MAX_PARTIAL 10
+
 #define DEBUG_DEFAULT_FLAGS (SLAB_DEBUG_FREE | SLAB_RED_ZONE | \
 				SLAB_POISON | SLAB_STORE_USER)
 /*
@@ -2163,6 +2173,78 @@ void kfree(const void *x)
 }
 EXPORT_SYMBOL(kfree);
 
+/*
+ * kmem_cache_shrink removes empty slabs from the partial lists
+ * and then sorts the partially allocated slabs by the number
+ * of items in use. The slabs with the most items in use
+ * come first. New allocations will remove these from the
+ * partial list because they are full. The slabs with the
+ * least items are placed last. If it happens that the objects
+ * are freed then the page can be returned to the page allocator.
+ */
+int kmem_cache_shrink(struct kmem_cache *s)
+{
+	int node;
+	int i;
+	struct kmem_cache_node *n;
+	struct page *page;
+	struct page *t;
+	struct list_head *slabs_by_inuse =
+		kmalloc(sizeof(struct list_head) * s->objects, GFP_KERNEL);
+	unsigned long flags;
+
+	if (!slabs_by_inuse)
+		return -ENOMEM;
+
+	flush_all(s);
+	for_each_online_node(node) {
+		n = get_node(s, node);
+
+		if (n->nr_partial <= MIN_PARTIAL)
+			continue;
+
+		for (i = 0; i < s->objects; i++)
+			INIT_LIST_HEAD(slabs_by_inuse + i);
+
+		spin_lock_irqsave(&n->list_lock, flags);
+
+		/*
+		 * Build lists indexed by the items in use in
+		 * each slab or free slabs if empty.
+		 *
+		 * Note that concurrent frees may occur while
+		 * we hold the list_lock. page->inuse here is
+		 * the upper limit.
+		 */
+		list_for_each_entry_safe(page, t, &n->partial, lru) {
+			if (!page->inuse) {
+				list_del(&page->lru);
+				discard_slab(s, page);
+			} else
+			if (n->nr_partial > MAX_PARTIAL)
+				list_move(&page->lru,
+					slabs_by_inuse + page->inuse);
+		}
+
+		if (n->nr_partial <= MAX_PARTIAL)
+			goto out;
+
+		/*
+		 * Rebuild the partial list with the slabs filled up
+		 * most first and the least used slabs at the end.
+		 */
+		for (i = s->objects - 1; i > 0; i--)
+			list_splice(slabs_by_inuse + i, n->partial.prev);
+
+	out:
+		spin_unlock_irqrestore(&n->list_lock, flags);
+	}
+
+	kfree(slabs_by_inuse);
+	return 0;
+}
+EXPORT_SYMBOL(kmem_cache_shrink);
+
 /**
  * krealloc - reallocate memory. The contents will remain unchanged.
  *
@@ -2408,17 +2490,6 @@ static struct notifier_block __cpuinitda
 
 #endif
 
-/***
- * Compatiblility definitions
- **/
-
-int kmem_cache_shrink(struct kmem_cache *s)
-{
-
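The sorting step described in this patch is essentially a bucket sort keyed on the number of objects in use. A small userspace sketch of that idea follows; it is not the kernel code. struct toy_slab and toy_shrink() are hypothetical names, a plain singly linked list stands in for the kernel's list_head, and empty slabs are only counted rather than returned to a page allocator.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a slab page: only the in-use count matters. */
struct toy_slab {
	unsigned int inuse;
	struct toy_slab *next;
};

/*
 * Reorder *list so that the fullest slabs come first and drop empty
 * slabs. "objects" is the number of objects per slab, so a valid inuse
 * value lies in 0..objects. Returns 0 on success, -1 if the bucket
 * arrays cannot be allocated. *freed counts the empty slabs unlinked.
 */
int toy_shrink(struct toy_slab **list, unsigned int objects,
	       unsigned int *freed)
{
	struct toy_slab **heads, **tails;
	struct toy_slab *s, *next, *out = NULL, *out_tail = NULL;
	unsigned int i, b;

	/* One bucket per possible in-use count. */
	heads = calloc(objects + 1, sizeof(*heads));
	tails = calloc(objects + 1, sizeof(*tails));
	if (!heads || !tails) {
		free(heads);
		free(tails);
		return -1;
	}

	*freed = 0;
	for (s = *list; s; s = next) {
		next = s->next;
		s->next = NULL;
		if (s->inuse == 0) {
			(*freed)++;	/* the kernel would discard the slab here */
			continue;
		}
		/* Clamp defensively so a bad count cannot index out of range. */
		b = s->inuse > objects ? objects : s->inuse;
		if (tails[b])
			tails[b]->next = s;
		else
			heads[b] = s;
		tails[b] = s;
	}

	/* Concatenate the buckets from fullest to least used. */
	for (i = objects; i > 0; i--) {
		if (!heads[i])
			continue;
		if (out_tail)
			out_tail->next = heads[i];
		else
			out = heads[i];
		out_tail = tails[i];
	}

	free(heads);
	free(tails);
	*list = out;
	return 0;
}
```

Because the key range is small (at most the number of objects per slab), the reordering takes one pass over the list plus one pass over the buckets, which is presumably why a bucket array is used instead of a comparison sort.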
Re: Question about Reiser4
On Wed, 25 Apr 2007 22:49:11 +0800, "Jeff Chua" <[EMAIL PROTECTED]> said:
>
> Reiser4 has great potential and I'll be more than happy to test it.
>

Yeah,... let us know the details of your testing.
--
[EMAIL PROTECTED]

[patch 4/7] SLUB: Conform more to SLABs SLAB_HWCACHE_ALIGN behavior
Currently SLUB is using a strict L1_CACHE_BYTES alignment if
SLAB_HWCACHE_ALIGN is specified. SLAB does not align to a cacheline if the
object is smaller than half of a cacheline. Small objects are then aligned
by SLAB to a fraction of a cacheline.

Make SLUB just forget about the alignment requirement if the object size
is less than L1_CACHE_BYTES. It seems that fractional alignments are no
good because they grow the object and reduce the object density in a cache
line needlessly causing additional cache line fetches.

If we are already throwing the user suggestion of a cache line alignment
away then let's do the best we can. Maybe SLAB_HWCACHE_ALIGN also needs
to be tossed given its wishy-washy handling but doing so would require
an audit of all kmem_cache_allocs throughout the kernel source.

In any case one needs to explicitly specify an alignment during
kmem_cache_create to either slab allocator in order to ensure that the
objects are cacheline aligned.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc7-mm1/mm/slub.c
===
--- linux-2.6.21-rc7-mm1.orig/mm/slub.c 2007-04-25 21:23:56.0 -0700
+++ linux-2.6.21-rc7-mm1/mm/slub.c 2007-04-25 21:23:59.0 -0700
@@ -1482,9 +1482,19 @@ static int calculate_order(int size)
  * various ways of specifying it.
  */
 static unsigned long calculate_alignment(unsigned long flags,
-		unsigned long align)
+		unsigned long align, unsigned long size)
 {
-	if (flags & SLAB_HWCACHE_ALIGN)
+	/*
+	 * If the user wants hardware cache aligned objects then
+	 * follow that suggestion if the object is sufficiently
+	 * large.
+	 *
+	 * The hardware cache alignment cannot override the
+	 * specified alignment though. If that is greater
+	 * then use it.
+	 */
+	if ((flags & SLAB_HWCACHE_ALIGN) &&
+			size > L1_CACHE_BYTES / 2)
 		return max_t(unsigned long, align, L1_CACHE_BYTES);
 
 	if (align < ARCH_SLAB_MINALIGN)
@@ -1673,7 +1683,7 @@ static int calculate_sizes(struct kmem_c
 	 * user specified (this is unecessarily complex due to the attempt
 	 * to be compatible with SLAB. Should be cleaned up some day).
 	 */
-	align = calculate_alignment(flags, align);
+	align = calculate_alignment(flags, align, s->objsize);
 
 	/*
 	 * SLUB stores one object immediately after another beginning from
@@ -2250,7 +2260,7 @@ static struct kmem_cache *find_mergeable
 		return NULL;
 
 	size = ALIGN(size, sizeof(void *));
-	align = calculate_alignment(flags, align);
+	align = calculate_alignment(flags, align, size);
 	size = ALIGN(size, align);
 
 	list_for_each(h, &slab_caches) {

--

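The alignment rule in the hunk above can be tried out in userspace with a sketch like the one below. It follows the diff, which uses half a cache line as the size threshold. The TOY_* values are assumptions (the kernel takes L1_CACHE_BYTES and ARCH_SLAB_MINALIGN from the architecture headers), and the tail of the function after the minimum-alignment check is not visible in the hunk, so returning the requested alignment unchanged there is a guess.

```c
/* Assumed values for illustration; real ones come from arch headers. */
#define TOY_L1_CACHE_BYTES	64UL
#define TOY_MIN_ALIGN		8UL
#define TOY_HWCACHE_ALIGN	0x1UL	/* hypothetical flag bit */

/*
 * Honour a cache-line alignment request only when the object is larger
 * than half a cache line; an explicitly requested alignment that is
 * larger than a cache line still wins.
 */
unsigned long toy_calculate_alignment(unsigned long flags,
				      unsigned long align,
				      unsigned long size)
{
	if ((flags & TOY_HWCACHE_ALIGN) && size > TOY_L1_CACHE_BYTES / 2)
		return align > TOY_L1_CACHE_BYTES ? align : TOY_L1_CACHE_BYTES;

	if (align < TOY_MIN_ALIGN)
		return TOY_MIN_ALIGN;

	/* Assumption: the part of the function not shown keeps "align". */
	return align;
}
```

Under these assumed values a 24-byte object with the flag set falls back to the 8-byte minimum, while a 40-byte object is aligned to the full 64-byte line.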
[patch 3/7] SLUB: debug printk cleanup
Set up a new function slab_err in order to report errors consistently.

Consistently report corrective actions taken by SLUB by a printk starting
with @@@.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc7-mm1/mm/slub.c
===================================================================
--- linux-2.6.21-rc7-mm1.orig/mm/slub.c	2007-04-25 21:20:36.0 -0700
+++ linux-2.6.21-rc7-mm1/mm/slub.c	2007-04-25 21:22:50.0 -0700
@@ -324,8 +324,8 @@ static void object_err(struct kmem_cache
 {
 	u8 *addr = page_address(page);
 
-	printk(KERN_ERR "*** SLUB: %s in %s@0x%p slab 0x%p\n",
-			reason, s->name, object, page);
+	printk(KERN_ERR "*** SLUB %s: %s@0x%p slab 0x%p\n",
+			s->name, reason, object, page);
 	printk(KERN_ERR "offset=%tu flags=0x%04lx inuse=%u freelist=0x%p\n",
 		object - addr, page->flags, page->inuse, page->freelist);
 	if (object > addr + 16)
@@ -335,6 +335,19 @@ static void object_err(struct kmem_cache
 	dump_stack();
 }
 
+static void slab_err(struct kmem_cache *s, struct page *page, char *reason, ...)
+{
+	va_list args;
+	char buf[100];
+
+	va_start(args, reason);
+	vsnprintf(buf, sizeof(buf), reason, args);
+	va_end(args);
+	printk(KERN_ERR "*** SLUB %s: %s in slab @0x%p\n", s->name, buf,
+		page);
+	dump_stack();
+}
+
 static void init_object(struct kmem_cache *s, void *object, int active)
 {
 	u8 *p = object;
@@ -412,7 +425,7 @@ static int check_valid_pointer(struct km
 static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
 						void *from, void *to)
 {
-	printk(KERN_ERR "@@@ SLUB: %s Restoring %s (0x%x) from 0x%p-0x%p\n",
+	printk(KERN_ERR "@@@ SLUB %s: Restoring %s (0x%x) from 0x%p-0x%p\n",
 		s->name, message, data, from, to - 1);
 	memset(from, data, to - from);
 }
@@ -459,9 +472,7 @@ static int slab_pad_check(struct kmem_ca
 		return 1;
 
 	if (!check_bytes(p + length, POISON_INUSE, remainder)) {
-		printk(KERN_ERR "SLUB: %s slab 0x%p: Padding fails check\n",
-			s->name, p);
-		dump_stack();
+		slab_err(s, page, "Padding check failed");
 		restore_bytes(s, "slab padding", POISON_INUSE, p + length,
 			p + length + remainder);
 		return 0;
@@ -547,30 +558,25 @@ static int check_slab(struct kmem_cache
 	VM_BUG_ON(!irqs_disabled());
 
 	if (!PageSlab(page)) {
-		printk(KERN_ERR "SLUB: %s Not a valid slab page @0x%p "
-			"flags=%lx mapping=0x%p count=%d \n",
-			s->name, page, page->flags, page->mapping,
+		slab_err(s, page, "Not a valid slab page flags=%lx "
+			"mapping=0x%p count=%d", page->flags, page->mapping,
 			page_count(page));
 		return 0;
 	}
 	if (page->offset * sizeof(void *) != s->offset) {
-		printk(KERN_ERR "SLUB: %s Corrupted offset %lu in slab @0x%p"
-			" flags=0x%lx mapping=0x%p count=%d\n",
-			s->name,
+		slab_err(s, page, "Corrupted offset %lu flags=0x%lx "
+			"mapping=0x%p count=%d",
 			(unsigned long)(page->offset * sizeof(void *)),
-			page,
 			page->flags,
 			page->mapping,
 			page_count(page));
-		dump_stack();
 		return 0;
 	}
 	if (page->inuse > s->objects) {
-		printk(KERN_ERR "SLUB: %s inuse %u > max %u in slab "
-			"page @0x%p flags=%lx mapping=0x%p count=%d\n",
-			s->name, page->inuse, s->objects, page, page->flags,
+		slab_err(s, page, "inuse %u > max %u @0x%p flags=%lx "
+			"mapping=0x%p count=%d",
+			s->name, page->inuse, s->objects, page->flags,
 			page->mapping, page_count(page));
-		dump_stack();
 		return 0;
 	}
 	/* Slab_pad_check fixes things up after itself */
@@ -599,12 +605,13 @@ static int on_freelist(struct kmem_cache
 				set_freepointer(s, object, NULL);
 				break;
 			} else {
-				printk(KERN_ERR "SLUB: %s slab 0x%p "
-					"freepointer 0x%p corrupted.\n",
-					s->name, page, fp);
-				dump_stack();
+				slab_err(s, page, "Freepointer 0x%p corrupt",
+					fp);
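The reporting pattern used by slab_err (format the caller's message into a bounded buffer, then wrap it in one fixed prefix) can be tried outside the kernel. What follows is only a userspace sketch under stated assumptions: report_err, its output buffer and the page_addr parameter are hypothetical stand-ins, not part of the patch, and snprintf replaces printk.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical userspace stand-in for slab_err(): the variadic message is
 * formatted into a fixed 100-byte buffer first, so every report ends up
 * with the same "*** SLUB <cache>: <message> in slab @<addr>" layout.
 * Returns the length of the formatted report, as snprintf does.
 */
int report_err(char *out, size_t outlen, const char *cache,
	       unsigned long page_addr, const char *fmt, ...)
{
	va_list args;
	char buf[100];

	assert(out != NULL && cache != NULL && fmt != NULL);

	va_start(args, fmt);
	vsnprintf(buf, sizeof(buf), fmt, args);	/* truncates, never overflows */
	va_end(args);

	return snprintf(out, outlen, "*** SLUB %s: %s in slab @0x%lx",
			cache, buf, page_addr);
}
```

Calling report_err(out, sizeof(out), "kmalloc-64", 0x1000UL, "inuse %u > max %u", 5u, 4u) fills out with a single line carrying the common prefix. The design point is that callers supply only the message, so the prefix cannot drift between call sites.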
[patch 2/7] SLAB: Fix sysfs directory handling
This fixes the problem that SLUB does not track the names of aliased
slabs by changing the way that SLUB manages the files in /sys/slab.

If the slab that is being operated on is not mergeable (usually the
case if we are debugging) then do not create any aliases. If an alias
exists that we conflict with then remove it before creating the
directory for the unmergeable slab. If there is a true slab cache there
and not an alias then we fail since there is a true duplication of
slab cache names. So debugging allows the detection of slab name
duplication as usual.

If the slab is mergeable then we create a directory with a unique name
created from the slab size, slab options and the pointer to the kmem_cache
structure (disambiguation). All names referring to the slabs will
then be created as symlinks to that unique name. These symlinks are
not going to be removed on kmem_cache_destroy() since we only carry
a counter for the number of aliases. If a new symlink is created
then it may just replace an existing one. This means that one can create
a gazillion slabs with the same name (if they all refer to mergeable
caches). It will only increase the alias count. So we have the potential
of not detecting duplicate slab names (there is actually no harm
done by doing that). We will detect the duplications as soon as
debugging is enabled because we will then no longer generate symlinks and
special unique names.
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc7-mm1/mm/slub.c
===================================================================
--- linux-2.6.21-rc7-mm1.orig/mm/slub.c	2007-04-25 19:41:23.0 -0700
+++ linux-2.6.21-rc7-mm1/mm/slub.c	2007-04-25 19:41:23.0 -0700
@@ -3297,16 +3297,68 @@ static struct kset_uevent_ops slab_ueven
 
 decl_subsys(slab, &slab_ktype, &slab_uevent_ops);
 
+#define ID_STR_LENGTH 64
+
+/* Create a unique string id for a slab cache:
+ * format
+ * :[flags-]size:[memory address of kmemcache]
+ */
+static char *create_unique_id(struct kmem_cache *s)
+{
+	char *name = kmalloc(ID_STR_LENGTH, GFP_KERNEL);
+	char *p = name;
+
+	BUG_ON(!name);
+
+	*p++ = ':';
+	/*
+	 * First flags affecting slabcache operations */
+	if (s->flags & SLAB_CACHE_DMA)
+		*p++ = 'd';
+	if (s->flags & SLAB_RECLAIM_ACCOUNT)
+		*p++ = 'a';
+	if (s->flags & SLAB_DESTROY_BY_RCU)
+		*p++ = 'r';\
+	/* Debug flags */
+	if (s->flags & SLAB_RED_ZONE)
+		*p++ = 'Z';
+	if (s->flags & SLAB_POISON)
+		*p++ = 'P';
+	if (s->flags & SLAB_STORE_USER)
+		*p++ = 'U';
+	if (p != name + 1)
+		*p++ = '-';
+	p += sprintf(p,"%07d:0x%p" ,s->size, s);
+	BUG_ON(p > name + ID_STR_LENGTH - 1);
+	return name;
+}
+
 static int sysfs_slab_add(struct kmem_cache *s)
 {
 	int err;
+	const char *name;
 
 	if (slab_state < SYSFS)
 		/* Defer until later */
 		return 0;
 
+	if (s->flags & SLUB_NEVER_MERGE) {
+		/*
+		 * Slabcache can never be merged so we can use the name proper.
+		 * This is typically the case for debug situations. In that
+		 * case we can catch duplicate names easily.
+		 */
+		sysfs_remove_link(&slab_subsys.kset.kobj, s->name);
+		name = s->name;
+	} else
+		/*
+		 * Create a unique name for the slab as a target
+		 * for the symlinks.
+		 */
+		name = create_unique_id(s);
+
 	kobj_set_kset_s(s, slab_subsys);
-	kobject_set_name(&s->kobj, s->name);
+	kobject_set_name(&s->kobj, name);
 	kobject_init(&s->kobj);
 	err = kobject_add(&s->kobj);
 	if (err)
@@ -3316,6 +3368,10 @@ static int sysfs_slab_add(struct kmem_ca
 	if (err)
 		return err;
 	kobject_uevent(&s->kobj, KOBJ_ADD);
+	if (!(s->flags & SLUB_NEVER_MERGE)) {
+		sysfs_slab_alias(s, s->name);
+		kfree(name);
+	}
 	return 0;
 }
 
@@ -3341,9 +3397,14 @@ static int sysfs_slab_alias(struct kmem_
 {
 	struct saved_alias *al;
 
-	if (slab_state == SYSFS)
+	if (slab_state == SYSFS) {
+		/*
+		 * If we have a leftover link then remove it.
+		 */
+		sysfs_remove_link(&slab_subsys.kset.kobj, name);
 		return sysfs_create_link(&slab_subsys.kset.kobj,
 						&s->kobj, name);
+	}
 
 	al = kmalloc(sizeof(struct saved_alias), GFP_KERNEL);
 	if (!al)

--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
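To see what the ":[flags-]size:[address]" directory names described in this patch look like, the naming scheme can be reproduced in userspace. This is a minimal sketch, not the kernel function: make_unique_id, the F_* flag bits and the plain unsigned long address are assumptions made for illustration, and the kernel's SLAB_* values and %p formatting differ.

```c
#include <assert.h>
#include <stdio.h>

#define ID_STR_LENGTH 64

/* Hypothetical flag bits for this sketch only; not the kernel's SLAB_* values. */
enum {
	F_DMA = 1, F_RECLAIM = 2, F_RCU = 4,		/* operational flags */
	F_RED_ZONE = 8, F_POISON = 16, F_STORE_USER = 32	/* debug flags */
};

/*
 * Build ":[flags-]size:address" into buf, one letter per set flag.
 * Returns the length of the id, as snprintf does.
 */
int make_unique_id(char *buf, size_t len, unsigned flags, int size,
		   unsigned long addr)
{
	char letters[7];	/* up to six flag letters plus the terminator */
	int n = 0;

	assert(buf != NULL && len >= ID_STR_LENGTH);

	if (flags & F_DMA)
		letters[n++] = 'd';
	if (flags & F_RECLAIM)
		letters[n++] = 'a';
	if (flags & F_RCU)
		letters[n++] = 'r';
	if (flags & F_RED_ZONE)
		letters[n++] = 'Z';
	if (flags & F_POISON)
		letters[n++] = 'P';
	if (flags & F_STORE_USER)
		letters[n++] = 'U';
	letters[n] = '\0';

	/* The '-' separator appears only when at least one flag letter was written. */
	return snprintf(buf, len, ":%s%s%07d:0x%lx",
			letters, n ? "-" : "", size, addr);
}
```

With F_DMA | F_POISON, size 192 and address 0xabc the sketch yields ":dP-0000192:0xabc"; with no flags and size 64 the id starts directly with the zero-padded size. Because the address is part of the name, two mergeable caches with identical size and flags still get distinct directories.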
[patch 0/7] SLUB updates
A series of updates to slub to make error reporting and recovery more
consistent. Rework sysfs behavior, make kmem_cache_shrink perform
fragmentation avoidance and update the slabinfo tool.
[patch 1/7] SLUB: Remove duplicate VM_BUG_ON
Somehow this artifact got in during merge with mm.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc7-mm1/mm/slub.c
===================================================================
--- linux-2.6.21-rc7-mm1.orig/mm/slub.c	2007-04-25 09:48:40.0 -0700
+++ linux-2.6.21-rc7-mm1/mm/slub.c	2007-04-25 09:48:47.0 -0700
@@ -633,8 +633,6 @@ static void add_full(struct kmem_cache *
 
 	VM_BUG_ON(!irqs_disabled());
 
-	VM_BUG_ON(!irqs_disabled());
-
 	if (!(s->flags & SLAB_STORE_USER))
 		return;
 
[patch 5/7] SLUB: Add MIN_PARTIAL
We leave a minimum number of partial slabs on a node when we search for
partial slabs on other nodes. Define a constant for that value.

Then modify slub to keep MIN_PARTIAL slabs around.

This avoids bad situations where a function frees the last object
in a slab (which results in the page being returned to the page
allocator) only to then allocate one again (which requires getting
a page back from the page allocator if the partial list was empty).
Keeping a couple of slabs on the partial list reduces overhead.

Empty slabs are added to the end of the partial list to ensure that
partially allocated slabs are consumed first (defragmentation).

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc7-mm1/mm/slub.c
===================================================================
--- linux-2.6.21-rc7-mm1.orig/mm/slub.c	2007-04-25 21:23:59.0 -0700
+++ linux-2.6.21-rc7-mm1/mm/slub.c	2007-04-25 21:25:48.0 -0700
@@ -109,6 +109,9 @@
 /* Enable to test recovery from slab corruption on boot */
 #undef SLUB_RESILIENCY_TEST
 
+/* Mininum number of partial slabs */
+#define MIN_PARTIAL 2
+
 #define DEBUG_DEFAULT_FLAGS (SLAB_DEBUG_FREE | SLAB_RED_ZONE | \
 				SLAB_POISON | SLAB_STORE_USER)
 /*
@@ -635,16 +638,8 @@ static int on_freelist(struct kmem_cache
 /*
  * Tracking of fully allocated slabs for debugging
  */
-static void add_full(struct kmem_cache *s, struct page *page)
+static void add_full(struct kmem_cache_node *n, struct page *page)
 {
-	struct kmem_cache_node *n;
-
-	VM_BUG_ON(!irqs_disabled());
-
-	if (!(s->flags & SLAB_STORE_USER))
-		return;
-
-	n = get_node(s, page_to_nid(page));
 	spin_lock(&n->list_lock);
 	list_add(&page->lru, &n->full);
 	spin_unlock(&n->list_lock);
@@ -923,10 +918,16 @@ static __always_inline int slab_trylock(
 /*
  * Management of partially allocated slabs
  */
-static void add_partial(struct kmem_cache *s, struct page *page)
+static void add_partial_tail(struct kmem_cache_node *n, struct page *page)
 {
-	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+	spin_lock(&n->list_lock);
+	n->nr_partial++;
+	list_add_tail(&page->lru, &n->partial);
+	spin_unlock(&n->list_lock);
+}
 
+static void add_partial(struct kmem_cache_node *n, struct page *page)
+{
 	spin_lock(&n->list_lock);
 	n->nr_partial++;
 	list_add(&page->lru, &n->partial);
@@ -1026,7 +1027,7 @@ static struct page *get_any_partial(stru
 		n = get_node(s, zone_to_nid(*z));
 
 		if (n && cpuset_zone_allowed_hardwall(*z, flags) &&
-				n->nr_partial > 2) {
+				n->nr_partial > MIN_PARTIAL) {
 			page = get_partial_node(n);
 			if (page)
 				return page;
@@ -1060,15 +1061,31 @@ static struct page *get_partial(struct k
  */
 static void putback_slab(struct kmem_cache *s, struct page *page)
 {
+	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
+
 	if (page->inuse) {
+
 		if (page->freelist)
-			add_partial(s, page);
-		else if (PageError(page))
-			add_full(s, page);
+			add_partial(n, page);
+		else if (PageError(page) && (s->flags & SLAB_STORE_USER))
+			add_full(n, page);
 		slab_unlock(page);
+
 	} else {
-		slab_unlock(page);
-		discard_slab(s, page);
+		if (n->nr_partial < MIN_PARTIAL) {
+			/*
+			 * Adding an empty page to the partial slabs in order
+			 * to avoid page allocator overhead. This page needs to
+			 * come after all the others that are not fully empty
+			 * in order to make sure that we do maximum
+			 * defragmentation.
+			 */
+			add_partial_tail(n, page);
+			slab_unlock(page);
+		} else {
+			slab_unlock(page);
+			discard_slab(s, page);
+		}
 	}
 }
 
@@ -1325,7 +1342,7 @@ checks_ok:
 	 * then add it.
 	 */
 	if (unlikely(!prior))
-		add_partial(s, page);
+		add_partial(get_node(s, page_to_nid(page)), page);
 
 out_unlock:
 	slab_unlock(page);
@@ -1541,7 +1558,7 @@ static struct kmem_cache_node * __init e
 	kmalloc_caches->node[node] = n;
 	init_kmem_cache_node(n);
 	atomic_long_inc(&n->nr_slabs);
-	add_partial(kmalloc_caches, page);
+	add_partial(n, page);
 	return n;
 }
 
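The decision putback_slab makes after this patch can be written down as a pure function, which makes the MIN_PARTIAL threshold easy to check in isolation. This is a sketch under assumptions: putback_decision and the putback_action enum are hypothetical names, and the inputs are plain values standing in for page->inuse, page->freelist, the PageError/SLAB_STORE_USER check and n->nr_partial.

```c
#include <assert.h>

#define MIN_PARTIAL 2

/* Hypothetical labels for the outcomes of putback_slab() in the patch. */
enum putback_action {
	PUT_PARTIAL_HEAD,	/* add_partial(): slab still has live and free objects */
	PUT_FULL,		/* add_full(): full slab tracked for debugging */
	PUT_NOTHING,		/* full slab, no tracking: just unlock */
	PUT_PARTIAL_TAIL,	/* add_partial_tail(): keep an empty slab in reserve */
	PUT_DISCARD		/* discard_slab(): return the page to the allocator */
};

enum putback_action putback_decision(unsigned inuse, int has_free,
				     int track_full, unsigned long nr_partial)
{
	/* An empty slab necessarily has free objects. */
	assert(inuse || has_free);

	if (inuse) {
		if (has_free)
			return PUT_PARTIAL_HEAD;
		return track_full ? PUT_FULL : PUT_NOTHING;
	}
	/* Empty slab: keep it only while the node is short on partial slabs. */
	return nr_partial < MIN_PARTIAL ? PUT_PARTIAL_TAIL : PUT_DISCARD;
}
```

For an empty slab the result flips from PUT_PARTIAL_TAIL to PUT_DISCARD once the node already holds MIN_PARTIAL partial slabs, which is the free-then-allocate case the description is concerned with.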
Re: Linux 2.6.21
On Thu, Apr 26, 2007 at 06:08:06AM +0200, Adrian Bunk wrote:
> On Wed, Apr 25, 2007 at 08:29:28PM -0700, Linus Torvalds wrote:
> >...
> > So it's been over two and a half months, and while it's certainly not the
> > longest release cycle ever, it still dragged out a bit longer than I'd
> > have hoped for and it should have. As usual, I'd like to thank Adrian (and
> > the people who jumped on the entries Adrian had) for keeping everybody on
> > their toes with the regression list - there's a few entries there still,
> > but it got to the point where we didn't even know if they were real
> > regressions, and delaying things further just wasn't going to help.
> >...
>
>
> Number of different known regressions compared to 2.6.20 at the time
> of the 2.6.21 release:
> 14
>
> Number of different known regressions compared to 2.6.20 at the time
> of the 2.6.21 release that were first reported in March or earlier:
> 8
>
> Number of different known regressions compared to 2.6.20 at the time
> of the 2.6.21 release with patches available at the time of the 2.6.21
> release [1]:
> 3
>
> What I will NOT do:
> Waste my time with tracking 2.6.22-rc regressions.
>
>
> We have an astonishing amount of -rc testers, but obviously not the
> developer manpower for handling them.
>
> If we would take "no regressions" seriously, it might take 4 or 5 months
> between releases due to the lack of developer manpower for handling
> regressions. But that should be considered OK if avoiding regressions
> was considered more important than getting as quick as possible to the
> next two week regression-merge window.
>
> But releasing with so many known regressions is insulting for the many
> people who spent their time testing -rc kernels.

Adrian,

I understand your concerns, it's more and more common to see developers
considering their work worthless. But it's not. You should see the
current development model as a pipeline. What you feed at the input can
take some time to reach the output, and if we wait for the whole pipeline
to flush, more crap gets released. What is needed is a higher priority on
fixes for known regressions.

I find your summary above more readable than the large lists of regressions.
I think that you should reply to Linus' announcements with something that
short, starting from the known-with-patch, known-for-more-than-1-month, and
all-known-regressions. It may help Linus focus even more on those. Also,
while it will not prevent any release with regressions, at least it will
prevent such a stupid case as known regressions with a patch available.

Also, check how many regressions you have reported and which have been fixed
during the -rc stage. You'll see your work really was useful.

Maybe Linus should accept to dedicate -final to known regressions only, to
force a check in this area? Whether or not all of them get fixed is not the
real problem, but at least we will not have any regressions with a pending
patch unapplied!

Please do continue that task if you have the time to do so!

Thanks,
Willy
[PATCH] rename TANBAC TB0219 config
Hi,

This patch renames the config option for TANBAC TB0219 GPIO support to a
more appropriate name.

Yoichi

Signed-off-by: Yoichi Yuasa <[EMAIL PROTECTED]>

diff -pruN -X generic/Documentation/dontdiff generic-orig/drivers/char/Kconfig generic/drivers/char/Kconfig
--- generic-orig/drivers/char/Kconfig	2007-04-26 13:45:27.225157000 +0900
+++ generic/drivers/char/Kconfig	2007-04-26 13:58:56.663743750 +0900
@@ -905,8 +905,8 @@ config SONYPI
 	  To compile this driver as a module, choose M here: the
 	  module will be called sonypi.
 
-config TANBAC_TB0219
-	tristate "TANBAC TB0219 base board support"
+config GPIO_TB0219
+	tristate "TANBAC TB0219 GPIO support"
 	depends on TANBAC_TB022X
 	select GPIO_VR41XX
 
diff -pruN -X generic/Documentation/dontdiff generic-orig/drivers/char/Makefile generic/drivers/char/Makefile
--- generic-orig/drivers/char/Makefile	2007-04-26 13:45:27.345164500 +0900
+++ generic/drivers/char/Makefile	2007-04-26 13:43:30.361853500 +0900
@@ -91,7 +91,7 @@ obj-$(CONFIG_PC8736x_GPIO)	+= pc8736x_gp
 obj-$(CONFIG_NSC_GPIO)		+= nsc_gpio.o
 obj-$(CONFIG_CS5535_GPIO)	+= cs5535_gpio.o
 obj-$(CONFIG_GPIO_VR41XX)	+= vr41xx_giu.o
-obj-$(CONFIG_TANBAC_TB0219)	+= tb0219.o
+obj-$(CONFIG_GPIO_TB0219)	+= tb0219.o
 obj-$(CONFIG_TELCLOCK)		+= tlclk.o
 
 obj-$(CONFIG_WATCHDOG)		+= watchdog/
Re: Linux 2.6.21
On Thu, Apr 26, 2007 at 06:08:06AM +0200, Adrian Bunk wrote:
> What I will NOT do:
> Waste my time with tracking 2.6.22-rc regressions.

I sure hope you don't do this. Tracking these is tough, and I think you
are doing a great job with it.

No release will have no regressions, there's just too many different
combinations of hardware and sometimes people don't have the time to
test to see if their original report is even fixed or not. And some of
them will get fixed with patches coming in the next kernel release,
which will then be tracked down and added to the -stable releases.

So if you can, please keep it up, if you think it's a thankless job,
here's my hearty thanks for doing this work. It's really needed and I
really appreciate it.

thanks,

greg k-h
Re: Question about Reiser4
On Wed, 25 Apr 2007 23:50:22 +0800, "Jeff Chua" <[EMAIL PROTECTED]> said:
> On 4/25/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> > Laurent Riffard's Reiser4 patch to the default linux-2.6.20 kernel and a
> > couple of others.
>
> Thank you. Got it. Testing it now.
>
> Jeff.

What plugins etc are you looking at?
--
[EMAIL PROTECTED]
Re: [00/17] Large Blocksize Support V3
On Wed, 25 Apr 2007, Eric W. Biederman wrote:

> The page cache has no problems supporting things with a block
> size larger than page size. Now the block device layer may not
> have the code to do the scatter gather into small pages and it
> may not handle buffer heads whose data is split between multiple
> pages.

It does have that problem. If a system is in use then memory is fragmented
and requests to the devices are in 4k sizes. The kernel has to manage the
4k size. The number of requests that the driver can take is limited. Larger
blocks allow shuffling more data to the device.

> And generally larger physical pages are a mistake to use.
> Especially as it looks from some of the later comments you don't
> dare test on 32bit because the memory fragments faster.

Ummm.. Don't get me to comment on i386. I never said that memory fragments
faster on i386. i386 has multiple issues with memory management that
require a lot of work and that will cause difficulty. If you have these
fun systems with 512k ZONE_NORMAL and 63GB HIGHMEM then good luck...

> Is it common for hardware that supports large block sizes to not
> support splitting those blocks apart during DMA? Unless it is common
> the whole premise of this patchset seems broken.

Huh? Splitting the blocks requires hardware effort -> Reduction in
transfer rate.

> I suspect what needs to be fixed is the page cache block device
> interface so that we have helper functions that know how to stuff
> a single block into several pages.

Oh we have scores of these hacks around. Look at the dvd/cd layer. The
point is to get rid of those.

> Right now I don't even want to think about trying to use a swap device
> with a large block size when we are low on memory.

But that is due to the VM (at least Linus' tree) having no defrag methods.
mm has Mel's antifrag methods and can do it.

> > 2. 32/64k blocksize is also used in flash devices. Same issues.
>
> flash devices are not block devices so I strongly doubt it is
> the same issue.

But they could be treated as such. Right now these poor guys have to
improvise around the page size limit.

> > 4. Reduce fsck times. Larger block sizes mean faster file system checking.
>
> Fewer seeks and less meta-data means faster fsck times. Larger block
> sizes get us there only tangentially.

Less meta data to manage does not reduce fsck times? Going from order 0 to
order 2 blocks cuts the metadata to a fourth.

> > 5. Performance. If we look at IA64 vs. x86_64 then it seems that the
> >    faster interrupt handling on x86_64 compensate for the speed loss due to
> >    a smaller page size (4k vs 16k on IA64). Supporting larger block sizes
> >    on all allows a significant reduction in I/O overhead and increases
> >    the size of I/O that can be performed by hardware in a single request
> >    since the number of scatter gather entries are typically limited for
> >    one request. This is going to become increasingly important to support
> >    the ever growing memory sizes since we may have to handle excessively
> >    large amounts of 4k requests for data sizes that may become common
> >    soon. For example to write a 1 terabyte file the kernel would have to
> >    handle 256 million 4k chunks.
>
> This assumes you get the option of large files and batching things as
> the systems scale. At SGI maybe that is true. However in general
> you get lots of small requests as systems scale up.

Yes, you get lots of small requests *because* we do not support defrag and
cannot do large contiguous allocations.

> > 6. Cross arch compatibility: It is currently not possible to mount
> >    a 16k blocksize ext2 filesystem created on IA64 on an x86_64 system.
> >    With this patch this becomes possible.
>
> Again this is a problem with the page cache block device interface not
> a page cache problem.

Ummm the other arches read 16k blocks of contiguous memory. That is not
supported on 4k platforms right now. I guess you move those to vmalloc
areas? Want to hack the filesystems for this?

> I think supporting larger block sizes is a nice goal. However unless
> we are bumping up against hardware limitations let's see how far
> we can go with batching and fixing the block layer/page cache interface
> instead of assuming that larger page sizes are the answer.

There are multiple scaling issues in the kernel. What you propose is to
add hack over hack into the VM to avoid having to deal with
defragmentation. That in turn will cause churn with hardware etc etc.
Re: [00/17] Large Blocksize Support V3
[EMAIL PROTECTED] writes:

> V2->V3
> - More restructuring
> - It actually works!
> - Add XFS support
> - Fix up UP support
> - Work out the direct I/O issues
> - Add CONFIG_LARGE_BLOCKSIZE. Off by default which makes the inlines revert
>   back to constants. Disabled for 32bit and HIGHMEM configurations.
>   This also allows a gradual migration to the new page cache
>   inline functions. LARGE_BLOCKSIZE capabilities can be
>   added gradually and if there is a problem then we can disable
>   a subsystem.
>
> V1->V2
> - Some ext2 support
> - Some block layer, fs layer support etc.
> - Better page cache macros
> - Use macros to clean up code.
>
> This patchset modifies the Linux kernel so that larger block sizes than
> page size can be supported. Larger block sizes are handled by using
> compound pages of an arbitrary order for the page cache instead of
> single pages with order 0.

Huh? You seem to be mixing two very different concepts.

The page cache has no problems supporting things with a block
size larger than page size. Now the block device layer may not
have the code to do the scatter gather into small pages and it
may not handle buffer heads whose data is split between multiple
pages. But this is not a page cache issue.

And generally larger physical pages are a mistake to use.
Especially as it looks from some of the later comments you don't
dare test on 32bit because the memory fragments faster.

Is it common for hardware that supports large block sizes to not
support splitting those blocks apart during DMA? Unless it is common
the whole premise of this patchset seems broken.

I suspect what needs to be fixed is the page cache block device
interface so that we have helper functions that know how to stuff
a single block into several pages.

That would make the choice of using larger order pages (essentially
increasing PAGE_SIZE) something that can be investigated in parallel.

Right now I don't even want to think about trying to use a swap device
with a large block size when we are low on memory.

>
> Rationales:
>
> 1. We have problems supporting devices with a higher blocksize than
>    page size. This is for example important to support CD and DVDs that
>    can only read and write 32k or 64k blocks. We currently have a shim
>    layer in there to deal with this situation which limits the speed
>    of I/O. The developers are currently looking for ways to completely
>    bypass the page cache because of this deficiency.

block device / page cache interface issue.

> 2. 32/64k blocksize is also used in flash devices. Same issues.

flash devices are not block devices so I strongly doubt it is
the same issue.

> 3. Future harddisks will support bigger block sizes that Linux cannot
>    support since we are limited to PAGE_SIZE. Ok the on board cache
>    may buffer this for us but what is the point of handling smaller
>    page sizes than what the drive supports?

Not fragmenting memory and keeping the system running.

> 4. Reduce fsck times. Larger block sizes mean faster file system checking.

Fewer seeks and less meta-data means faster fsck times. Larger block
sizes get us there only tangentially.

> 5. Performance. If we look at IA64 vs. x86_64 then it seems that the
>    faster interrupt handling on x86_64 compensate for the speed loss due to
>    a smaller page size (4k vs 16k on IA64). Supporting larger block sizes
>    on all allows a significant reduction in I/O overhead and increases
>    the size of I/O that can be performed by hardware in a single request
>    since the number of scatter gather entries are typically limited for
>    one request. This is going to become increasingly important to support
>    the ever growing memory sizes since we may have to handle excessively
>    large amounts of 4k requests for data sizes that may become common
>    soon. For example to write a 1 terabyte file the kernel would have to
>    handle 256 million 4k chunks.

This assumes you get the option of large files and batching things as
the systems scale. At SGI maybe that is true. However in general
you get lots of small requests as systems scale up.

For example I have gigabytes of kernel trees. How are larger requests
going to speed up my reading and writing of those? And yes even with
8G of ram I have enough kernel trees that they fall out of memory. So
cache is not the only answer.

> 6. Cross arch compatibility: It is currently not possible to mount
>    a 16k blocksize ext2 filesystem created on IA64 on an x86_64 system.
>    With this patch this becomes possible.

Again this is a problem with the page cache block device interface not
a page cache problem.

I think supporting larger block sizes is a nice goal. However unless
we are bumping up against hardware limitations let's see how far
we can go with batching and fixing the block layer/page cache interface
instead of assuming that larger page sizes are the answer.

Eric
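The "256 million 4k chunks" figure in rationale 5 quoted above is easy to check, along with how the count shrinks for higher-order blocks. A minimal sketch, assuming a 4 KiB base page and "1 terabyte" meaning 2^40 bytes; blocks_needed is a hypothetical helper, not something from the patchset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of blocks of size (4 KiB << order) needed to cover `bytes`,
 * rounding up. Order 0 is a single 4 KiB page, order 2 is 16 KiB.
 */
uint64_t blocks_needed(uint64_t bytes, unsigned order)
{
	uint64_t block;

	assert(order < 20);	/* keep the shift well inside 64 bits */
	block = UINT64_C(4096) << order;
	return (bytes + block - 1) / block;
}
```

For 2^40 bytes this gives 268435456 blocks at order 0, which is 256 x 2^20 and so matches the quoted "256 million". At order 2 (16k) the count drops to 67108864, a factor of four, and at order 4 (64k) to 16777216.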
Re: [PATCH 2.4.35-pre4] fix 'pc_keyb: controller jammed (0xA7)' error on systems with KVM
Hi Brian,

On Wed, Apr 25, 2007 at 03:13:13PM -0400, Brian Maly wrote:
> I've had a few requests for this patch, so I'm posting it against
> linux-2.4.35-pre4 kernel.

OK, does not look too intrusive, and seems fair enough. Will merge it.

Thanks!
Willy
Re: [RFC][PATCH -mm take4 2/6] support multiple logging
From: Keiichi KII <[EMAIL PROTECTED]>
Date: Thu, 26 Apr 2007 13:02:04 +0900

> Stephen Hemminger said "The configuration of netconsole's looks like the
> configuration of routes".
> I think so too.
> So I think ioctl commands for adding/removing port and the following userland
> application like route(8) command by using the ioctl.

Like the route command itself, the route changing ioctl()s are
old deprecated BSD compatible functionality.

All current routing configuration is done using netlink
and the 'ip' utility.
Re: Linux 2.6.21
On Thu, Apr 26, 2007 at 06:08:06AM +0200, Adrian Bunk wrote:

 > What I will NOT do:
 > Waste my time with tracking 2.6.22-rc regressions.

I seriously hope you'll reconsider. If you hadn't done this,
things would have been a *lot* worse imo.

But either way, thanks for doing what remains a really grotty job
that may not get you as many kernel groupies as rewriting the
process scheduler, but is just as important (if not more so).

	Dave

--
http://www.codemonkey.org.uk
Re: [PATCH] sworks-agp: Switch to PCI ref counting APIs
On Wed, Apr 25, 2007 at 09:23:39PM -0700, Andrew Morton wrote:
 > On Thu, 26 Apr 2007 00:20:19 -0400 Dave Jones <[EMAIL PROTECTED]> wrote:
 >
 > > On Wed, Apr 25, 2007 at 07:21:58PM -0700, Andrew Morton wrote:
 > > > On Mon, 23 Apr 2007 14:51:29 +0100 Alan Cox <[EMAIL PROTECTED]> wrote:
 > > >
 > > > > {
 > > > > 	struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
 > > > >
 > > > > +	pci_dev_put(bridge->dev);
 > > > > 	agp_remove_bridge(bridge);
 > > > > 	agp_put_bridge(bridge);
 > > > > +	pci_dev_put(serverworks_private.svrwrks_dev)
 > > > > +	serverworks_private.svrwrks_dev = NULL;
 > > >
 > > > err, guys?
 > >
 > > ? One put for the agp bridge, one for the host bridge.
 > > What am I missing?
 > >
 >
 > a semicolon.

Yow. I thought I build tested that.
I'll regenerate the git tree tomorrow. Same goes for the cpufreq
tree with the acpi fixup.

Thanks.

	Dave

--
http://www.codemonkey.org.uk
Re: [PATCH 0/9] Kconfig: cleanup s390 v2.
On Wed, Apr 25, 2007 at 08:02:07PM -0700, Andrew Morton wrote:
 > On Wed, 25 Apr 2007 19:38:23 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:
 > > > In fact, I should probably munge it together with a similar thing
 > > > I wrote at http://www.codemonkey.org.uk/projects/findbugs/
 > > > (Warning: scary regexps)
 > > I'll be glad to help maintain such animals if wanted.
 >
 > wanted ;)
 >
 > At least, it would be interesting to investigate the usefulness. I suspect
 > it will prove to be very useful for the little things.

Yeah, the original script tried to do things like spinlock balancing
checks, (badly). This was long before we had sparse, and it was partly
a "lets learn some perl" experience for myself. I'll toss that idea
out now that we have better tools for that, and keep it to simple checks.

 > Heck, someone could subscribe a robot to all the mailing lists which sends
 > nastygrams straight back at people who submit broken patches. We already
 > need that for tab-replaced and word-wrapped patches. (ok, we have it -
 > it's called akpm, but being robotic wearies one)

Ok, I've got a few different flavours of that script. I'll roll them
all into one tomorrow and throw out some of the noisy silly ones
(I don't think warning about strcpy->strncpy is really worthwhile for eg).

Additional regexps gratefully received.

	Dave

--
http://www.codemonkey.org.uk
Re: [PATCH] sworks-agp: Switch to PCI ref counting APIs
On Thu, 26 Apr 2007 00:20:19 -0400 Dave Jones <[EMAIL PROTECTED]> wrote:

> On Wed, Apr 25, 2007 at 07:21:58PM -0700, Andrew Morton wrote:
> > On Mon, 23 Apr 2007 14:51:29 +0100 Alan Cox <[EMAIL PROTECTED]> wrote:
> >
> > > {
> > > 	struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
> > >
> > > +	pci_dev_put(bridge->dev);
> > > 	agp_remove_bridge(bridge);
> > > 	agp_put_bridge(bridge);
> > > +	pci_dev_put(serverworks_private.svrwrks_dev)
> > > +	serverworks_private.svrwrks_dev = NULL;
> >
> > err, guys?
>
> ? One put for the agp bridge, one for the host bridge.
> What am I missing?
>

a semicolon.
Re: [PATCH] sworks-agp: Switch to PCI ref counting APIs
On Wed, Apr 25, 2007 at 07:21:58PM -0700, Andrew Morton wrote:
> On Mon, 23 Apr 2007 14:51:29 +0100 Alan Cox <[EMAIL PROTECTED]> wrote:
>
> > {
> > 	struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
> >
> > +	pci_dev_put(bridge->dev);
> > 	agp_remove_bridge(bridge);
> > 	agp_put_bridge(bridge);
> > +	pci_dev_put(serverworks_private.svrwrks_dev)
> > +	serverworks_private.svrwrks_dev = NULL;
>
> err, guys?

? One put for the agp bridge, one for the host bridge.
What am I missing?

Dave

--
http://www.codemonkey.org.uk
Re: [PATCH] pcmcia: irq probe can be done without risking an IRQ storm
On Thu, 5 Apr 2007 14:09:36 +0100 Alan Cox <[EMAIL PROTECTED]> wrote:

> Nowadays you can ask for an IRQ to be allocated but not enabled, when
> PCMCIA was written this was not true and this feature is thus not used
>
> Signed-off-by: Alan Cox <[EMAIL PROTECTED]>
>
> diff -u --new-file --recursive --exclude-from /usr/src/exclude
> linux.vanilla-2.6.21-rc5-mm4/drivers/pcmcia/pcmcia_resource.c
> linux-2.6.21-rc5-mm4/drivers/pcmcia/pcmcia_resource.c
> --- linux.vanilla-2.6.21-rc5-mm4/drivers/pcmcia/pcmcia_resource.c
> 2007-04-03 16:52:14.0 +0100
> +++ linux-2.6.21-rc5-mm4/drivers/pcmcia/pcmcia_resource.c 2007-04-03
> 17:10:42.0 +0100
> @@ -810,8 +810,11 @@
> 		type = IRQF_SHARED;
> 	if (req->Attributes & IRQ_TYPE_DYNAMIC_SHARING)
> 		type = IRQF_SHARED;
>  #ifdef CONFIG_PCMCIA_PROBE
> +	if (!(req->Attributes & IRQ_HANDLE_PRESENT))
> +		type |= IRQ_NOAUTOEN;
> +
> 	if (s->irq.AssignedIRQ != 0) {
> 		/* If the interrupt is already assigned, it must be the same */
> 		irq = s->irq.AssignedIRQ;

alpha:

drivers/pcmcia/pcmcia_resource.c: In function 'pcmcia_request_irq':
drivers/pcmcia/pcmcia_resource.c:816: error: 'IRQ_NOAUTOEN' undeclared (first use in this function)
drivers/pcmcia/pcmcia_resource.c:816: error: (Each undeclared identifier is reported only once
drivers/pcmcia/pcmcia_resource.c:816: error: for each function it appears in.)

Problem is, IRQ_NOAUTOEN is a generic-irq thing, so architectures which
don't use generic-irqs break. And it's defined in linux/irq.h which
(stupidly) cannot be included in generic code.
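The build failure above comes from using a symbol that only some architectures define. The following is a minimal userspace sketch of one possible way to keep such code compiling, assuming it is acceptable for the flag to silently do nothing where it is unavailable. It is not the fix that was proposed or applied in this thread. The numeric values are placeholders, and pcmcia_irq_type() is a hypothetical helper that only mirrors the flag logic of the quoted hunk.

```c
/* Placeholder values for illustration only; the real definitions live in
 * kernel headers and are not reproduced here. */
#define IRQF_SHARED			0x0080u
#define IRQ_TYPE_DYNAMIC_SHARING	0x0002u
#define IRQ_HANDLE_PRESENT		0x0400u

/* Pretend this build targets an architecture without generic-irq, so
 * nothing has defined IRQ_NOAUTOEN. Falling back to 0 turns the
 * "type |= IRQ_NOAUTOEN" below into a no-op instead of a build error. */
#ifndef IRQ_NOAUTOEN
#define IRQ_NOAUTOEN			0u
#endif

/* Hypothetical helper mirroring the flag logic in the quoted hunk. */
static unsigned int pcmcia_irq_type(unsigned int attributes)
{
	unsigned int type = 0;

	if (attributes & IRQ_TYPE_DYNAMIC_SHARING)
		type = IRQF_SHARED;
	/* No handler registered yet: ask for the IRQ without enabling it. */
	if (!(attributes & IRQ_HANDLE_PRESENT))
		type |= IRQ_NOAUTOEN;
	return type;
}
```

The obvious cost of this approach is that architectures taking the fallback keep the old behaviour, so the IRQ-storm protection the patch is after would only exist where generic-irq is in use.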
Re: [patch] CFS scheduler, -v6
On Wed, Apr 25, 2007 at 11:47:04PM +0200, Ingo Molnar wrote:
>> - upstream fix: SysRq-T should show runnable tasks

On Thu, Apr 26, 2007 at 05:29:27AM +0200, Nick Piggin wrote:
> BTW. can you send this upstream? It is very annoying how it currently works,
> and I've had more than one bug that required seeing runnable tasks in order
> to diagnose and fix...

There are other things that should go upstream separately. The
init/main.c comment fix for one. I'd even argue that scheduler classes
should be done separately from and prior to the specific cfs policy.

-- wli
Re: Linux 2.6.21
On Wed, Apr 25, 2007 at 08:29:28PM -0700, Linus Torvalds wrote:
>...
> So it's been over two and a half months, and while it's certainly not the
> longest release cycle ever, it still dragged out a bit longer than I'd
> have hoped for and it should have. As usual, I'd like to thank Adrian (and
> the people who jumped on the entries Adrian had) for keeping everybody on
> their toes with the regression list - there's a few entries there still,
> but it got to the point where we didn't even know if they were real
> regressions, and delaying things further just wasn't going to help.
>...

Number of different known regressions compared to 2.6.20 at the time
of the 2.6.21 release:
14

Number of different known regressions compared to 2.6.20 at the time
of the 2.6.21 release that were first reported in March or earlier:
8

Number of different known regressions compared to 2.6.20 at the time
of the 2.6.21 release with patches available at the time of the 2.6.21
release [1]:
3

What I will NOT do:
Waste my time with tracking 2.6.22-rc regressions.

We have an astonishing number of -rc testers, but obviously not the
developer manpower for handling them.

If we would take "no regressions" seriously, it might take 4 or 5 months
between releases due to the lack of developer manpower for handling
regressions. But that should be considered OK if avoiding regressions
was considered more important than getting as quickly as possible to the
next two week regression-merge window.

But releasing with so many known regressions is insulting for the many
people who spent their time testing -rc kernels.

cu
Adrian

[1] http://lkml.org/lkml/2007/4/25/496

--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
Re: [RFC][PATCH -mm take4 2/6] support multiple logging
Well.. before you can finish this work we need to decide upon what the
interface to userspace will be.

- The miscdev isn't appropriate

Why isn't miscdev appropriate? Is it just that, by convention, we
shouldn't use miscdev for networking?

Yes it's rather odd, especially for networking.

What does the miscdev _do_ anyway? Is it purely a target for the ioctls?

Yes, I use miscdev purely for the ioctls.

I want to use sysfs and ioctl to implement the dynamic configurability.
The sysfs interface shows/changes the netconsole configuration (IP
address, port and so on). A userland application using the ioctl
adds/removes netconsole ports.

I thought that the dynamic configurability could be realized in the
kernel only, without a userland application (e.g. only sysfs, no
userland application). But I think we need a function to automatically
resolve the destination MAC address from the IP address. Because of the
cost of resolving, I should implement it in a userland application, not
in the netconsole kernel module. Netconsole will become more useful once
that function is implemented.

Some other speculations:
1. Would it be possible to add ioctl's to /dev/console? This would be
   more in keeping with older Unix style model.
2. Using sysfs makes sense if there is a device object that exists to
   add the sysfs attributes to.
3. Procfs is handy for summary type tables.
4. Netlink does feel like overkill for this. Although newer generic
   netlink makes it easier.

If I use sysfs, is "/sys/class/misc/netconsole/port[0-9]*" the proper
location for the attributes of each netconsole port, or should they go
somewhere else in /sys/?

Stephen Hemminger said "The configuration of netconsole's looks like the
configuration of routes". I think so too. So I am thinking of ioctl
commands for adding/removing ports, plus a userland application like the
route(8) command that uses those ioctls.

e.g.
1. add port
   # netconfig add 192.168.0.10
2. remove port
   # netconfig remove 1
3. show port info
   # netconfig
   id status  Source IP   Source Port Destination IP Destination Port Destination MAC
   1  enable  192.168.0.1 6665        192.168.0.10                    00:11:22:33:44:55
   2  disable 192.168.0.1 6665        192.168.0.20                    00:11:22:33:44:66

route(8) command uses ioctl for Netlink.
But, I'm going to implement ioctl's to /dev/console because of the
above comments.

Thank you for your comments.
Any comments very welcome.

--
Keiichi KII
NEC Corporation OSS Promotion Center
E-mail: [EMAIL PROTECTED]
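The proposed "netconfig add" / "netconfig remove <id>" commands imply a small table of ports addressed by id. A rough sketch of that bookkeeping follows, purely as an illustration: struct netcon_port, struct netcon_table, netcon_add() and netcon_remove() are hypothetical names, the table size is arbitrary, and none of this is the netconsole implementation or its ioctl interface.

```c
#include <assert.h>
#include <stdio.h>

#define NETCON_MAX_PORTS 4	/* arbitrary limit for the sketch */

/* Hypothetical record for one configured netconsole port. */
struct netcon_port {
	int in_use;
	char dst_ip[16];	/* dotted-quad IPv4 address plus NUL */
};

struct netcon_table {
	struct netcon_port ports[NETCON_MAX_PORTS];
};

/* "netconfig add <ip>": claim the first free slot.
 * Returns the 1-based port id, or -1 if the table is full. */
static int netcon_add(struct netcon_table *t, const char *dst_ip)
{
	int i;

	assert(t != NULL && dst_ip != NULL);
	for (i = 0; i < NETCON_MAX_PORTS; i++) {
		if (!t->ports[i].in_use) {
			snprintf(t->ports[i].dst_ip,
				 sizeof(t->ports[i].dst_ip), "%s", dst_ip);
			t->ports[i].in_use = 1;
			return i + 1;
		}
	}
	return -1;
}

/* "netconfig remove <id>": free a slot.
 * Returns 0 on success, -1 if the id is invalid or not in use. */
static int netcon_remove(struct netcon_table *t, int id)
{
	assert(t != NULL);
	if (id < 1 || id > NETCON_MAX_PORTS || !t->ports[id - 1].in_use)
		return -1;
	t->ports[id - 1].in_use = 0;
	return 0;
}
```

In this sketch ids are reused after removal, which keeps the table dense but means an id is not a stable name for a port. Whether that is desirable is exactly the kind of interface question the thread is still debating.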
Re: [patch] CFS scheduler, -v6
On Thu, 26 Apr 2007 05:29:27 +0200 Nick Piggin <[EMAIL PROTECTED]> wrote:

> > - upstream fix: SysRq-T should show runnable tasks
>
> BTW. can you send this upstream? It is very annoying how it currently works,
> and I've had more than one bug that required seeing runnable tasks in order
> to diagnose and fix...

I have it. I'm just waiting to see if Linus took it. Seems not.
Re: 2.6.21-rc7-mm1: Oops and Gnome desktop freezes
Hi. On 4/25/07, Antonino A. Daplas <[EMAIL PROTECTED]> wrote: The Gnome desktop does not finish launching. And I get this tracing, all coming from Gnome apps. Tony BUG: unable to handle kernel paging request at virtual address c0a74000 printing eip: c014c469 *pde = 005f3027 *pte = Oops: 0002 [#1] Modules linked in: xt_pkttype xt_tcpudp ipt_LOG xt_limit nfsd exportfs lockd nfs_acl sunrpc snd_pcm_oss snd_mixer_oss snd_seq button battery ac ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter ip6table_filter nf_conntrack_ipv4 nf_conntrack ip_tables ip6_tables x_tables nls_iso8859_1 nls_cp437 vfat fat nls_utf8 ntfs reiserfs loop usblp snd_via82xx snd_ac97_codec ac97_bus snd_pcm ide_cd cdrom snd_timer snd_page_alloc snd_mpu401_uart rtc_cmos snd_rawmidi rtc_core snd_seq_device rtc_lib snd soundcore via_rhine ehci_hcd uhci_hcd usbcore sc92031 via_agp 8139too i2c_viapro ext3 mbcache jbd edd fan thermal processor via82cxxx ide_disk ide_core CPU:0 EIP:0060:[]Tainted: G D VLI EFLAGS: 00210246 (2.6.21-rc7-mm1-default #74) EIP is at get_page_from_freelist+0x2b5/0x359 eax: ebx: c1014e80 ecx: 0400 edx: c0002fe8 esi: c1014e80 edi: c0a74000 ebp: d27c9eb0 esp: d27c9e58 ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 Process evolution-data- (pid: 4745, ti=d27c8000 task=d32a8aa0 task.ti=d27c8000) Stack: c0363628 0002 c0113c58 0805c000 c1fccd40 c0a74000 0001 001280d2 c03640f0 c0363600 00200246 0002 0001 0001 00200246 c03640f4 001280d2 d27c9f00 c014c5f5 Call Trace: [] show_trace_log_lvl+0x1a/0x30 [] show_stack_log_lvl+0x9b/0xaa [] show_registers+0x1b6/0x288 [] die+0xe7/0x1fc [] do_page_fault+0x429/0x4f8 [] error_code+0x71/0x78 [] __alloc_pages+0xe8/0x29e [] __handle_mm_fault+0x16d/0x5fc [] do_page_fault+0x1fe/0x4f8 [] error_code+0x71/0x78 === INFO: lockdep is turned off. 
Code: 00 00 66 83 7d cc 00 c7 45 ec 00 00 00 00 78 30 eb 36 ba 03 00 00 00 89 d8 e8 8d 63 fc ff b9 00 04 00 00 89 45 c0 31 c0 8b 7d c0 ab 8b 45 c0 ba 03 00 00 00 83 c3 20 e8 d8 63 fc ff ff 45 ec EIP: [] get_page_from_freelist+0x2b5/0x359 SS:ESP 0068:d27c9e58 note: evolution-data-[4745] exited with preempt_count 1 BUG: sleeping function called from invalid context at kernel/rwsem.c:20 in_atomic():1, irqs_disabled():0 INFO: lockdep is turned off. [] show_trace_log_lvl+0x1a/0x30 [] show_trace+0x12/0x14 [] dump_stack+0x16/0x18 [] __might_sleep+0xc9/0xcf [] down_read+0x18/0x50 [] futex_wake+0x35/0xcd [] do_futex+0x91/0x104d [] sys_futex+0xc1/0xd4 [] mm_release+0x84/0x8b [] exit_mm+0x19/0xc3 [] do_exit+0x1f8/0x744 [] die+0x1d6/0x1fc [] do_page_fault+0x429/0x4f8 [] error_code+0x71/0x78 [] __alloc_pages+0xe8/0x29e [] __handle_mm_fault+0x16d/0x5fc [] do_page_fault+0x1fe/0x4f8 [] error_code+0x71/0x78 === BUG: scheduling while atomic: evolution-data-/0x1001/4745 INFO: lockdep is turned off. [] show_trace_log_lvl+0x1a/0x30 [] show_trace+0x12/0x14 [] dump_stack+0x16/0x18 [] __sched_text_start+0x71/0x553 [] __cond_resched+0x28/0x3f [] cond_resched+0x29/0x34 [] down_read+0x1d/0x50 [] futex_wake+0x35/0xcd [] do_futex+0x91/0x104d [] sys_futex+0xc1/0xd4 [] mm_release+0x84/0x8b [] exit_mm+0x19/0xc3 [] do_exit+0x1f8/0x744 [] die+0x1d6/0x1fc [] do_page_fault+0x429/0x4f8 [] error_code+0x71/0x78 [] __alloc_pages+0xe8/0x29e [] __handle_mm_fault+0x16d/0x5fc [] do_page_fault+0x1fe/0x4f8 [] error_code+0x71/0x78 === BUG: unable to handle kernel paging request at virtual address c0a75000 printing eip: c0152d68 *pde = 005f3027 *pte = Oops: 0002 [#2] Modules linked in: xt_pkttype xt_tcpudp ipt_LOG xt_limit nfsd exportfs lockd nfs_acl sunrpc snd_pcm_oss snd_mixer_oss snd_seq button battery ac ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter ip6table_filter nf_conntrack_ipv4 nf_conntrack ip_tables ip6_tables x_tables nls_iso8859_1 nls_cp437 vfat fat nls_utf8 ntfs reiserfs loop 
usblp snd_via82xx snd_ac97_codec ac97_bus snd_pcm ide_cd cdrom snd_timer snd_page_alloc snd_mpu401_uart rtc_cmos snd_rawmidi rtc_core snd_seq_device rtc_lib snd soundcore via_rhine ehci_hcd uhci_hcd usbcore sc92031 via_agp 8139too i2c_viapro ext3 mbcache jbd edd fan thermal processor via82cxxx ide_disk ide_core CPU:0 EIP:0060:[]Tainted: G D VLI EFLAGS: 00010296 (2.6.21-rc7-mm1-default #74) EIP is at __do_fault+0x17a/0x301 eax: c0a75000 ebx: c1014ea0 ecx: 0400 edx: c0002fe4 esi: d2207000 edi: c0a75000 ebp: d7ad9f00 esp: d7ad9ea8 ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 Process smart (pid: 4687, ti=d7ad8000 task=d6c99510 task.ti=d7ad8000) Stack: d28766e8 d2876730 d7ad9ec0 0007 c0a75000 d2207000 b7a3d650 d291b668 c1fcc0c0 0101 c12440e0 c1fcc128 d6c99510 18100073 d7ad9f40 b7a3d000 0003 0001 0002 d291b668 b7a3d650 c1fb38f4 d7ad9f50 c015419f Call
Re: MMCv4 support (8-bit support missing)
Hi Pierre/Philip,

> I've looked through the MMC 4.2 spec and I see nothing in it that even
> hints that 8-bit support might be optional. So as it stands, the bus
> testing is still out.

Okay. It's possible that my understanding was wrong, in the sense that I
thought the bus testing procedure is mandatory to support 8-bit cards.
If 8-bit is mandatory for MMC4 cards, then the changes required in the
MMC core to support 8-bit might be simple. This can be handled based on
the host controller capabilities.

Philip asked me about access to the 8-bit controller. We might not be
able to provide you direct access to the hardware platform, as it
requires involvement of business managers and so on. But can I be of
help by testing your code on our platform and letting you know the
results?

Regards,
Madhu

On 4/24/07, Pierre Ossman <[EMAIL PROTECTED]> wrote:
> Madhusudhan c wrote:
> >
> > Suppose a host controller is capable of supporting 8-bit and it tells
> > the core that it can support 8-bit. Now the card that is plugged in
> > might or might not support 8-bit based on the type of the card. There
> > is no field in the ext_csd which will tell you what bus width the card
> > can support.
> >
>
> I've looked through the MMC 4.2 spec and I see nothing in it that even
> hints that 8-bit support might be optional. So as it stands, the bus
> testing is still out.
>
> Rgds
> Pierre
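If 8-bit support really is mandatory for MMCv4 cards, as Pierre reads the spec, then choosing the bus width reduces to looking at what the host advertises. A minimal sketch of that decision follows, under that assumption. The capability bits and toy_pick_bus_width() are hypothetical names invented for this illustration, not the MMC core's flags or functions.

```c
/* Hypothetical capability bits; not the flags used by the MMC core. */
#define TOY_HOST_CAP_4_BIT	0x1u
#define TOY_HOST_CAP_8_BIT	0x2u

/* Pick the widest bus the host advertises, assuming every MMCv4 card
 * supports 8-bit. Older cards are treated as 1-bit only, which is a
 * simplification made for this sketch. Returns 8, 4 or 1. */
static int toy_pick_bus_width(unsigned int host_caps, int card_is_mmc4)
{
	if (!card_is_mmc4)
		return 1;
	if (host_caps & TOY_HOST_CAP_8_BIT)
		return 8;
	if (host_caps & TOY_HOST_CAP_4_BIT)
		return 4;
	return 1;
}
```

If it turned out that some v4 cards are wired for fewer data lines, this host-only decision would not be enough and something like the bus testing procedure Madhu mentions would be needed again.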
Re: PROBLEM: Oops: 0002 [1] SMP
I have also suspected this. Running memtest86 tests #1 to #10 showed an
error on test #3 once, so I removed the DIMM, cleaned it, put it back and
ran the tests two more times without any error.

Is there any other tool I could use to test the memory?

Thanks.
Thiago.

On Wed, 2007-04-25 at 18:24 -0400, Chuck Ebbert wrote:
> Thiago M. Sayão wrote:
> > I also got this error yesterday which seems related:
> >
> > Bad pagetable: 001d [1] SMP
> > Bad pagetable: 0009 [2] SMP
> >
>
> You may have a hardware problem. Did you test the memory?
>
Re: [patch] CFS scheduler, -v6
On Wed, Apr 25, 2007 at 11:47:04PM +0200, Ingo Molnar wrote:
>
> i'm pleased to announce release -v6 of the CFS scheduler patchset. The
> main goal of CFS is to implement "high quality desktop scheduling" as
> well as technically possible.
>
> The CFS patch against v2.6.21-rc7 or against v2.6.20.7 can be downloaded
> from the usual place:
>
>    http://redhat.com/~mingo/cfs-scheduler/
>
> i got lots of -v5 feedback (thanks and please keep the reports coming!)
> so the -v6 release includes many bugfixes and improvements:
>
>    19 files changed, 317 insertions(+), 744 deletions(-)
>
> the biggest user-visible changes in -v6 are various refinements to the
> precise-scheduling infrastructure that should result in generally better
> interactivity and a smoother desktop. In particular a number of "movie
> playback lags/stutters" and "firefox lags under load" type of
> regressions have been resolved. (Please re-report any regression that
> might not be fixed yet.)
>
> Changes since -v5:
>
>  - feature: increase the preemption granularity value on SMP systems.
>    Idea and code comes from the SD scheduler of Con Kolivas, with Con's
>    kind permission. (thanks Con!)
>
>  - fix: the "privileged_nice_level=X" boot option should convert signed
>    integers. (Mike Galbraith)
>
>  - build fix: yield_to unistd.h fix (Srivatsa Vaddagiri)
>
>  - build fix: CONFIG_HEADERS_CHECK complained about sched.h.
>    (reported by Zach Carter)
>
>  - build fix: normalize_rt_tasks() UP build fix. (Mike Galbraith)
>
>  - interactivity fix: sched_clock() accuracy fixes. This should resolve
>    certain types of interactivity regressions reported on systems that
>    change their CPU frequencies. (mainly laptops)
>
>  - default settings tweak: changed the X renicing default from -19 to
>    -10, based on tester feedback. (Might still be too much - more
>    feedback is needed.)
>
>  - feature: introduced "wakeup granularity" and added the
>    /proc/sys/kernel/sched_wakeup_granularity_ns tunable, set to 0 by
>    default for now. This is now distinct from the sched_granularity_ns
>    'preemption granularity' property of the scheduler - allowing a
>    more aggressive increase in the preemption granularity without
>    jeopardizing interactivity.
>
>  - debugging feature: SysRq-T now also shows the /proc/sched_debug
>    output - useful to generate a dump of all relevant scheduler state in
>    one easy step.
>
>  - debugging feature: make SysRq-Nice normalize negative nice level
>    tasks too and reset the CFS state.
>
>  - debugging: extend /proc/sched_debug with a few more clock related
>    fields, to be able to better debug problems caused by unstable
>    clocks.
>
>  - upstream fix: SysRq-T should show runnable tasks

BTW. can you send this upstream? It is very annoying how it currently works,
and I've had more than one bug that required seeing runnable tasks in order
to diagnose and fix...
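For testers who want to record the value of the new tunable alongside a bug report, reading it is a plain file read. The helper below is a small sketch only: read_ns_tunable() is a hypothetical name, and the path from the announcement exists only on kernels carrying this CFS patch, so the function takes the path as a parameter and reports failure instead of assuming the file is present.

```c
#include <stdio.h>

/* Read one unsigned integer from a procfs-style file.
 * Returns 0 and stores the value on success, -1 if the file cannot be
 * opened or does not start with a number. */
static int read_ns_tunable(const char *path, unsigned long long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

On a patched kernel this would be called with "/proc/sys/kernel/sched_wakeup_granularity_ns"; on any other kernel it simply returns -1.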
Re: [PATCH] i386: For debugging, make the initial page table setup less forgiving.
Eric W. Biederman wrote:
> I suspect what we want to do is come up with a function to call to test
> to see if a page should be read-only and map such pages _PAGE_KERNEL_RO,
> or _PAGE_KERNEL_RO_EXEC if it's code.
>
> Speaking of things what are paravirt_alloc_pd and paravirt_alloc_pt
> supposed to do?

For hypervisors which shadow kernel page tables, none of these concerns
with keeping page tables read-only arise. However, another set of
concerns does arise with maintaining shadow synchronization. One of
those problems is keeping the hypervisor aware of when pages are being
used as page tables.

However, it turns out both direct page table and shadow page table
implementations can be made to use one page table allocation function;
in the direct page table case (as for Xen), this is the point where page
tables can be recognized and made read-only. So this is the dual purpose
of the paravirt_alloc_p[dt] functions.

Zach
Linux 2.6.21
If the goal for 2.6.20 was to be a stable release (and it was), the goal
for 2.6.21 is to have just survived the big timer-related changes and some
of the other surprises (just as an example: we were apparently unlucky
enough to hit what looks like a previously unknown hardware errata in one
of the ethernet drivers that got updated etc).

So it's been over two and a half months, and while it's certainly not the
longest release cycle ever, it still dragged out a bit longer than I'd
have hoped for and it should have. As usual, I'd like to thank Adrian (and
the people who jumped on the entries Adrian had) for keeping everybody on
their toes with the regression list - there's a few entries there still,
but it got to the point where we didn't even know if they were real
regressions, and delaying things further just wasn't going to help.

So the big change during 2.6.21 is all the timer changes to support a
tickless system (and even with ticks, more varied time sources). Thanks
(when it no longer broke for lots of people ;) go to Thomas Gleixner and
Ingo Molnar and a cadre of testers and coders.

Of course, the timer stuff was just the most painful and core part (and
thus the one that I remember most): there's a lot of changes all over. The
appended changelog is just for the fixes since -rc7, so that doesn't look
very impressive, the full changes since 2.6.20 are obviously a *lot*
bigger (and you're better off reading the individual -rc changelogs).

We now return you to your regular scheduler discussions,

Linus

---
Akinobu Mita (1):
      fault injection: add entry to MAINTAINERS

Alan Cox (3):
      exec.c: fix coredump to pipe problem and obscure "security hole"
      pata_sis: Fix oops on boot
      [SPARC] openprom: Switch to ref counting PCI API

Alexey Dobriyan (1):
      paride drivers: initialize spinlocks

Alexey Kuznetsov (1):
      [NETLINK]: Infinite recursion in netlink.

Andi Kleen (5):
      x86: Fix gcc 4.2 _proxy_pda workaround
      x86: Fix potential overflow in perfctr reservation
      x86: Remove noreplacement option
      x86-64: Always flush all pages in change_page_attr
      i386: Fix some warnings added by earlier patch

Andrea Righi (1):
      [netdrvr] depca: handle platform_device_add() failure

Andrew Morton (4):
      drivers/macintosh/smu.c: fix locking snafu
      acpi-thermal: fix mod_timer() interval
      drivers/net/hamradio/baycom_ser_fdx build fix
      packet: fix error handling

Atsushi Nemoto (3):
      [MIPS] Disallow CpU exception in kernel again.
      [MIPS] Retry {save,restore}_fp_context if failed in atomic context.
      [MIPS] Fix BUG(), BUG_ON() handling

Aubrey.Li (1):
      [NET]: Fix UDP checksum issue in net poll mode.

Avi Kivity (1):
      KVM: Fix off-by-one when writing to a nonpae guest pde

Badari Pulavarty (1):
      cache_k8_northbridges() overflows beyond allocation

Balbir Singh (1):
      Taskstats fix the structure members alignment issue

Bartlomiej Zolnierkiewicz (2):
      ide/Kconfig: add missing range check for IDE_MAX_HWIFS
      Revert "adjust legacy IDE resource setting (v2)"

Bastian Blank (1):
      Allow reading tainted flag as user

Ben Dooks (2):
      [ARM] 4313/1: S3C24XX: Update s3c2410 defconfig to 2.6.21-rc6
      spi: fix use of set_cs in spi_s3c24xx driver

Benjamin Herrenschmidt (1):
      fix bogon in /dev/mem mmap'ing on nommu

Christoph Lameter (1):
      page migration: fix NR_FILE_PAGES accounting

Dan Williams (1):
      usb-net/pegasus: fix pegasus carrier detection

Dave Jiang (1):
      gianfar needs crc32 lib dependency

Dave Johnson (1):
      [MIPS] Fix wrong checksum for split TCP packets on 64-bit MIPS

Dave Jones (1):
      Longhaul - Revert ACPI C3 on Longhaul ver. 2

David Brownell (1):
      MAINTAINERS: use lists.linux-foundation.org

David Rientjes (1):
      oom: kill all threads that share mm with killed task

David S. Miller (2):
      [IPSEC] af_key: Fix thinko in pfkey_xfrm_policy2msg()
      [PARPORT] SUNBPP: Fix OOPS when debugging is enabled.

Denis Lunev (1):
      [NETLINK]: Don't attach callback to a going-away netlink socket

Divy Le Ray (2):
      cxgb3 - Fix low memory conditions
      cxgb3 - PHY interrupts and GPIO pins.

Don Zickus (1):
      allow vmsplice to work in 32-bit mode on ppc64

Evgeniy Dushistov (1):
      ufs proper handling of zero link case

Evgeny Kravtsunov (1):
      [BRIDGE]: Unaligned access when comparing ethernet addresses

Herbert Xu (1):
      [NET]: Get rid of alloc_skb_from_cache

Hugh Dickins (1):
      fix OOM killing processes wrongly thought MPOL_BIND

Ivan Kokshaysky (3):
      alpha: fixes for specific machine types
      alpha: more fixes for specific machine types
      alpha: build fixes - force architecture

Jan Yenya Kasprzak (1):
      Char: mxser_new, fix recursive locking

Jean Delvare (3):
      hwmon/w83627ehf: Fix the fan5 clock divider write
      i2c-pasemi: Depend on PPC_PASEMI again
      hwmon/w83627ehf:
Re: menuconfig issue (checklist) in 2.6.20.7 & 2.6.21-rc7 ?
On Wed, 2007-04-25 at 22:30 +0200, Sam Ravnborg wrote:
> > There are general funnies in the menuconfig world (my preference) here.
> > For instance, I recently had reason to change/test different default IO
> > schedulers, and found that no matter what I did, I couldn't select a
> > default IO scheduler any more, though I used to be able to do so.
> Tried it now with latest -git from Linus and here it works.
> Notice that you need to make the scheduler a built-in <*>
> before you can select it as default.
> A scheduler selected as a module cannot be made default.

Ok, I guess my ncurses is ill. (all built in)

Thanks.

-Mike
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Nigel Cunningham wrote:
>
> Sorry. I wasn't clear. I wasn't saying that suspend to ram has a
> snapshot point. I was trying to say it has a point where you're seeking
> to save information (PCI state / SCSI transaction number or whatever)
> that you'll need to get the hardware into the same state at a later
> stage. That (saving information) is the point of similarity.

Yes, they do both save information, but I'm not actually convinced they
would necessarily even save the *same* information.

Let's just take an example of USB, and to make things more interesting,
say that the disk you want to suspend to is itself over USB (not
necessarily something you _want_ to do, but I think we can all agree that
it's something that should potentially work, no?)

Now, USB devices actually have per-connection state (at a minimum, the
"toggle" bit or whatever), and that's obviously something that will
inevitably *change* as a result of the device being used after
snapshotting (and even if not used, by the rediscovery by the first kernel
to boot), and we fundamentally cannot put the final toggle state in the
snapshot.

So in the snapshot-to-disk scenario, there are some pieces of data that
simply fundamentally *cannot* be snapshotted, because they are not
controller state, they are "connection" state.

So in that case, you basically know that you *have* to rebuild the
connection when you do the "snapshot_resume()" thing. So there's no point
in even keeping these kinds of connection states (the same is true of
keyboards, mice, anything else - it's how USB works).

In contrast, in suspend-to-RAM, USB connections might just be things you
actually want to keep open and active, and you *can* do so, in ways you
simply cannot do with "snapshot to disk". In fact, if you are something
like an OLPC and actually go to s2ram very aggressively, you might well
want to keep the connection established, because it's conceivable that
you might otherwise lose keypresses etc.

See? There are real *technical* reasons to believe that the two "save
state" operations are really fundamentally different. There are reasons to
believe that a s2ram can actually happen while keeping some connections
open that cannot be kept open over a disk snapshot.

Do they *have* to be different? Of course not. For many devices the "save"
and "freeze" operations will likely all be no-ops, and there would be
absolutely no difference between suspending and snapshotting, because the
driver state already natively contains all the information needed to get
the device going again.

Equally, I don't doubt that in many drivers you'll have very similar "save
state" logic, but in fact I believe that in many cases that "save state"
logic will often just be a simple

	pci_save_state(dev);

call, so it's literally the case that they will not be just shared between
the "suspend" and "snapshot" case, they'll be shared across all simple PCI
devices too!

But that doesn't mean that the functions to do so should be the same. You
might have

	static int mypcidevice_suspend(struct pci_dev *dev)
	{
		pci_save_state(dev);
		pci_set_power_state(dev, PCI_D3hot);
		return 0;
	}

	static int mypcidevice_snapshot(struct pci_dev *dev)
	{
		pci_save_state(dev);
		return 0;
	}

and who cares if they both have that same call to a shared "save state"
function? They're still totally different operations, and the fact that
*some* devices may save the same things doesn't make them any more
similar! See above why some devices might save totally *different* things
for a "snapshot" vs a "suspend" event.

> I suppose that's another point of similarity - for snapshotting, the
> same ordering is probably needed?

I agree that you're likely to walk the device list in the same order. The
whole "shut down leaf devices first", "start up root devices first" is
pretty fundamental.

But that's true of reboot and device discovery too. Should that ordering
mean that we should use the "discovery()" function and pass it a flag and
say "you shouldn't discover, you should snapshot or suspend now"?

No. Everybody agrees that device discovery is something different from
device suspend. The fact that it's done in a topological order and thus
they bear some kind of inverse relationship to each other doesn't make
them "the same".

> > And yes, the _individual_ "save-and-suspend" events obviously needs to be
> > "atomic", but it's purely about that particular individual device, so
> > there's never any cross-device issues about that.
>
> No interdependencies? I'm not sure.

Well, we pretty much count on it, since we will *suspend* the devices at
the same time. So if they had interdependencies that aren't described by
the ordering we enforce, they are pretty much screwed anyway ;)

So yes, the device list needs to be topologically
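The "shut down leaf devices first, start up root devices first" rule in the message above can be pictured with a toy device list. This is a sketch under stated assumptions and not the driver core's implementation: struct toy_dev, toy_suspend_all() and toy_resume_all() are hypothetical names, and the array is assumed to be already sorted so that every parent comes before its children.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record for illustration only. */
struct toy_dev {
	const char *name;
	int suspended;
};

/* devs[] is assumed to be in topological order: parents before children.
 * Suspend walks it backwards so that leaf devices go down first. The
 * order actually used is recorded in log[], which must hold n entries. */
static void toy_suspend_all(struct toy_dev *devs, size_t n, const char **log)
{
	size_t i, k = 0;

	assert(devs != NULL && log != NULL);
	for (i = n; i-- > 0; ) {
		devs[i].suspended = 1;
		log[k++] = devs[i].name;
	}
}

/* Resume walks the same list forwards so that root devices come up first. */
static void toy_resume_all(struct toy_dev *devs, size_t n, const char **log)
{
	size_t i;

	assert(devs != NULL && log != NULL);
	for (i = 0; i < n; i++) {
		devs[i].suspended = 0;
		log[i] = devs[i].name;
	}
}
```

With a list such as { pci-bus, usb-hcd, usb-disk }, suspend visits usb-disk, usb-hcd, pci-bus and resume visits them in the opposite order. The sketch only captures the ordering; it says nothing about what each device saves, which is the part the message argues may differ between suspend and snapshot.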
Re: [PATCH 0/9] Kconfig: cleanup s390 v2.
On Wed, 25 Apr 2007 19:38:23 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote: > Dave Jones wrote: > > On Wed, Apr 25, 2007 at 05:24:47PM -0700, Andrew Morton wrote: > > > > > It would be neat if someone could create and maintain a new > > > scripts/spot-common-mistakes. Feed it a unified diff and it would > > complain > > > about newly-added code (and only newly-added code) which has busted > > > whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc. > > > > years and years ago, when the dinosaurs roamed the land, I hacked up.. > > http://janitor.kernelnewbies.org/scripts/ and then left it by the wayside. > > Some of the checks it did are actually bogus, but I'm happy to pick that > > up again if there's interest in it being a useful tool. > > > > In fact, I should probably munge it together with a similar thing > > I wrote at http://www.codemonkey.org.uk/projects/findbugs/ > > (Warning: scary regexps) > > > > > It would need to be fairly simple and easily-extensible, as I can > > > imagine quite a few things getting added to it. > > > > > > (Imagines a procmail rule which just bounces the email if > > > spot-common-mistakes failed) > > > > or a git checkin rule that refuses to commit if it fails ;-) > > Yep, I was going to mention your scripts but you beat me to it. > > I'll be glad to help maintain such animals if wanted. > wanted ;) At least, it would be interesting to investigate the usefulness. I suspect it will prove to be very useful for the little things. Heck, someone could subscribe a robot to all the mailing lists which sends nastygrams straight back at people who submit broken patches. We already need that for tab-replaced and word-wrapped patches. 
(ok, we have it - it's called akpm, but being robotic wearies one) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
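A minimal sketch of the scripts/spot-common-mistakes idea described above, assuming a plain unified diff as input. The rule table, the function name and the rule wording are all hypothetical; nothing here is taken from the janitor or findbugs scripts mentioned in the thread. It only looks at newly-added lines, as requested.

```python
import re

# Hypothetical rule table: (description, regex applied to each added line).
RULES = [
    ("trailing whitespace", re.compile(r"[ \t]+$")),
    ("space before tab in indentation", re.compile(r"^\s* \t")),
    ("new kernel_thread() call", re.compile(r"\bkernel_thread\s*\(")),
    ("new semaphore", re.compile(r"\bstruct\s+semaphore\b|\bsema_init\s*\(")),
]

HUNK_RE = re.compile(r"@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def spot_common_mistakes(diff_text):
    """Return (file, new_line_number, description) for each added line that trips a rule.

    Simplification: any line starting with '+++ ' or '--- ' is treated as a
    file header, so an added line whose own text starts with '++ ' is skipped.
    """
    findings = []
    current_file = None
    new_lineno = 0
    for raw in diff_text.splitlines():
        if raw.startswith("+++ "):
            current_file = raw[4:].split("\t")[0]
            continue
        if raw.startswith("--- "):
            continue
        hunk = HUNK_RE.match(raw)
        if hunk:
            new_lineno = int(hunk.group(1))
            continue
        if raw.startswith("+"):
            added = raw[1:]
            for description, pattern in RULES:
                if pattern.search(added):
                    findings.append((current_file, new_lineno, description))
            new_lineno += 1
        elif raw.startswith(" ") or raw == "":
            # Context line: present in the new file, but not newly added.
            new_lineno += 1
        # Removed ('-') lines are ignored and do not advance the new-file line number.
    return findings
```

Keeping the rules in one table is meant to match the "fairly simple and easily-extensible" requirement: a new check is one more tuple. A procmail or commit hook could bounce the patch whenever the returned list is non-empty.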
Re: Nonfunctional ethernet (was Re: 2.6.21-rc7-mm1 + sysfs-oops-workaround.patch -- INFO: possible recursive locking detected)
On Thu, 26 Apr 2007 11:26:36 +0900 Tejun Heo <[EMAIL PROTECTED]> wrote: > Hello, Antonino, Andrew. > > Andrew Morton wrote: > > On Thu, 26 Apr 2007 09:02:02 +0800 "Antonino A. Daplas" <[EMAIL PROTECTED]> > > wrote: > > > >> I can bring up the network manually using ifconfig. It's opensuse's > >> rcnetwork script that fails to bring the network up. Entries > >> in /sys/class/net are still bogus. > >> > >> This kernel is now usable to me, I'll start bisection later today if > >> nobody has an answer. > > > > rc7-mm1 is hardly worth bothering with. Quite a few really bad ones have > > now been fixed and I'll try to get rc7-mm2 out within the next 12 hours (I > > assume a 76-hour debug session won't be needed this time). > > > > But I don't think the sysfs changes in Greg's tree have been updated, so > > things will probably still fail in that area. A suitable bisection > > starting pair would be around gregkh-driver-* > > This is the rename bug I wrote about in the other thread. ok. > Can you hold -mm2 off a bit? I'm almost done here. sure. I'm having much fun with all the obviously-wont-compile patches which have been checked into various subsystem trees in the past 24 hours. Please include simple instructions about which gregkh patches I should drop when this new set comes in. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/9] Kconfig: cleanup s390 v2.
Dave Jones wrote: On Wed, Apr 25, 2007 at 05:24:47PM -0700, Andrew Morton wrote: > It would be neat if someone could create and maintain a new > scripts/spot-common-mistakes. Feed it a unified diff and it would complain > about newly-added code (and only newly-added code) which has busted > whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc. years and years ago, when the dinosaurs roamed the land, I hacked up.. http://janitor.kernelnewbies.org/scripts/ and then left it by the wayside. Some of the checks it did are actually bogus, but I'm happy to pick that up again if there's interest in it being a useful tool. In fact, I should probably munge it together with a similar thing I wrote at http://www.codemonkey.org.uk/projects/findbugs/ (Warning: scary regexps) > It would need to be fairly simple and easily-extensible, as I can > imagine quite a few things getting added to it. > > (Imagines a procmail rule which just bounces the email if > spot-common-mistakes failed) or a git checkin rule that refuses to commit if it fails ;-) Yep, I was going to mention your scripts but you beat me to it. I'll be glad to help maintain such animals if wanted. -- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Wed, 25 Apr 2007, H. Peter Anvin wrote: > > That was the 1990s. On a brand new server system: > > 00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA > Engine (rev b1) > > For better or worse, slave DMA seems to be making a comeback of sorts. > Not to mention all kinds of embedded crap^Whardware with optimized DMA > engines which look nothing like PCI at all. Well, the solution to that tends to be to just leave them be, and hold them on until the very end - and just ignore them (and just make-believe that it's actually the device itself that does the DMA transfer). The PCI spec for controlling DMA is really pretty nasty. You can disable it in the PCI config word, of course, but that usually just messes up the device entirely. So in practice, the way to shut up DMA (regardless of whether it's an internal DMA engine or an external one) is that you just tell the device not to listen any more (for example, for a network controller, the way to make sure it doesn't do DMA is just to make sure that you're not sending any frames, but also that it's not listening to any either)! So whether it's internal to the device, or some "system DMA controller", the sequence for shutting down DMA always ends up being the same: - make sure the host itself doesn't generate any new traffic (eg shut down the send-queue). This is generally a higher-level thing anyway, ie not really a driver decision. - the driver needs to tell the hardware to stop listening (ie "stop scanning the command mailboxes" or "stop walking USB command structures" or "stop receiving data") - the driver then needs to wait for the controller to say "ok, I'm idle". because regardless of whether it's the system DMA controller or some on-chip DMA controller, you generally can *not* just say "stop transferring DMA data", because that will generally just lock the chip up or cause other major unhappiness. So I don't think an external DMA controller (like the i8237, ugh!) really _changes_ anything. 
Except for just the horrible pain of serializing access to them for programming, plus the horrible resource handling issues, of course (but that's not specific to suspend/resume).

Linus
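The three-step DMA shutdown sequence above can be sketched as a toy model. FakeController and quiesce_dma are hypothetical names, not a real driver API; the only point is the ordering: stop generating traffic, tell the hardware to stop listening, then wait for it to report idle rather than cutting DMA off mid-transfer.

```python
class FakeController:
    """Hypothetical stand-in for a DMA-capable device; not a real driver API."""

    def __init__(self, in_flight):
        self.tx_queue_open = True   # host may still queue new work
        self.listening = True       # hardware still accepts commands/frames
        self.in_flight = in_flight  # transfers the hardware has already started
        self.log = []

    def poll_idle(self):
        # Each poll lets one already-started transfer run to completion.
        if self.in_flight > 0:
            self.in_flight -= 1
        return self.in_flight == 0 and not self.listening


def quiesce_dma(dev, max_polls=100):
    """Return True once the device reports idle, False if it never does."""
    # Step 1: the host stops generating new traffic.
    dev.tx_queue_open = False
    dev.log.append("stop send queue")
    # Step 2: the driver tells the hardware to stop listening.
    dev.listening = False
    dev.log.append("stop listening")
    # Step 3: wait for the controller to say "ok, I'm idle".
    for _ in range(max_polls):
        if dev.poll_idle():
            dev.log.append("idle")
            return True
    return False
```

Note that the sketch never aborts a transfer that is already in flight; it only waits, which is the behaviour the email argues for.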
Re: Nonfunctional ethernet (was Re: 2.6.21-rc7-mm1 + sysfs-oops-workaround.patch -- INFO: possible recursive locking detected)
Hello, Antonino, Andrew. Andrew Morton wrote: > On Thu, 26 Apr 2007 09:02:02 +0800 "Antonino A. Daplas" <[EMAIL PROTECTED]> > wrote: > >> I can bring up the network manually using ifconfig. It's opensuse's >> rcnetwork script that fails to bring the network up. Entries >> in /sys/class/net are still bogus. >> >> This kernel is now usable to me, I'll start bisection later today if >> nobody has an answer. > > rc7-mm1 is hardly worth bothering with. Quite a few really bad ones have > now been fixed and I'll try to get rc7-mm2 out within the next 12 hours (I > assume a 76-hour debug session won't be needed this time). > > But I don't think the sysfs changes in Greg's tree have been updated, so > things will probably still fail in that area. A suitable bisection > starting pair would be around gregkh-driver-* This is the rename bug I wrote about in the other thread. Can you hold -mm2 off a bit? I'm almost done here. Thanks. -- tejun - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sworks-agp: Switch to PCI ref counting APIs
On Mon, 23 Apr 2007 14:51:29 +0100 Alan Cox <[EMAIL PROTECTED]> wrote:

>  {
>  	struct agp_bridge_data *bridge = pci_get_drvdata(pdev);
>
> +	pci_dev_put(bridge->dev);
>  	agp_remove_bridge(bridge);
>  	agp_put_bridge(bridge);
> +	pci_dev_put(serverworks_private.svrwrks_dev)
> +	serverworks_private.svrwrks_dev = NULL;

err, guys?
Re: [PATCH] use mutex instead of semaphore in tty_io.c
On Wed, 25 Apr 2007 20:13:59 +0100 Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> On Wed, Apr 25, 2007 at 05:49:34PM +0200, Matthias Kaehlcke wrote:
> > drivers/char/tty_io.c uses a semaphore as mutex. use the mutex API
> > instead of the (binary) semaphore
>
> This looks like it should be a spinlock:
>
> > -	down(_ptys_lock);
> > +	mutex_lock(_ptys_lock);
> >  	idr_remove(_ptys, idx);
> > -	up(_ptys_lock);
> > +	mutex_unlock(_ptys_lock);
>
> idr_remove is a quick operation that doesn't sleep.
>
> > @@ -2639,24 +2639,24 @@ static int ptmx_open(struct inode * inode, struct file * filp)
> >  	nonseekable_open(inode, filp);
> >
> >  	/* find a device that is not in use. */
> > -	down(_ptys_lock);
> > +	mutex_lock(_ptys_lock);
> >  	if (!idr_pre_get(_ptys, GFP_KERNEL)) {
> > -		up(_ptys_lock);
>
> The idr_pre_get should be moved out of the lock, that's the whole
> point for its existence..
>

I think having it inside the lock makes sense:

	mutex_lock()
	idr_pre_get()
	idr_get_new()
	mutex_unlock()

here, if idr_pre_get() succeeded, we know that idr_get_new() will succeed.
otoh:

try_again:
	idr_pre_get()
	mutex_lock()
	if (idr_get_new() == failed) {
		mutex_unlock()
		goto try_again;
	}
	mutex_unlock()

is not nice.

The IDR api is awful. A little project is to rip out all its internal locking and to implement caller-provided locking. Unfortunately the fact that the library allocates memory means that we might need to do awkward things like radix_tree_preload() to make it reliable for callers who use spinlocking.
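The two locking patterns compared above can be modelled with a toy allocator. This is a sketch under stated assumptions: ToyIdr, alloc_locked, alloc_retry and the steal hook are hypothetical and only imitate the "pre_get stocks a spare node, get_new consumes one" behaviour discussed in the thread; this is not the kernel's IDR code. The hook stands in for a racing caller so the example stays single-threaded and deterministic.

```python
import threading


class ToyIdr:
    """Toy two-step allocator: pre_get() stocks a spare node, get_new() uses one."""

    def __init__(self):
        self.spare = 0
        self.next_id = 0

    def pre_get(self):
        self.spare += 1
        return True

    def get_new(self):
        if self.spare == 0:
            return None  # models "no preallocated node left, try again"
        self.spare -= 1
        new_id = self.next_id
        self.next_id += 1
        return new_id


def alloc_locked(idr, lock):
    """Pattern 1: pre_get() inside the lock, so get_new() cannot lose the node."""
    with lock:
        if not idr.pre_get():
            return None
        return idr.get_new()


def alloc_retry(idr, lock, steal=None):
    """Pattern 2: pre_get() outside the lock, so a racer may take the node first."""
    attempts = 0
    while True:
        attempts += 1
        if not idr.pre_get():
            return None, attempts
        if steal is not None:
            steal()  # hypothetical hook simulating another caller racing us
        with lock:
            new_id = idr.get_new()
        if new_id is not None:
            return new_id, attempts
```

With a racer that consumes the spare node once, alloc_retry needs two passes, which is the try_again loop the email calls "not nice".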
Re: [patch] CFS scheduler, -v6
On Wednesday 25 April 2007, Ingo Molnar wrote:
>i'm pleased to announce release -v6 of the CFS scheduler patchset. The
>main goal of CFS is to implement "high quality desktop scheduling" as
>well as technically possible.
>
>The CFS patch against v2.6.21-rc7 or against v2.6.20.7 can be downloaded
>from the usual place:
>
>http://redhat.com/~mingo/cfs-scheduler/
>
It hasn't made it to this server yet, and it's 22:14 EDT here.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Doing gets it done.
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Linus Torvalds wrote:
>
> On Thu, 26 Apr 2007, Pavel Machek wrote:
>> Ok, I guess I'll have nightmares of DMA controllers doing DMAs from
>> chips that are no longer there tonight.
>
> Umm. Welcome to the 21st century: we don't do that "separate DMA
> controller" thing any more. All devices do their own DMA.
>

That was the 1990s. On a brand new server system:

00:08.0 System peripheral: Intel Corporation 5000 Series Chipset DMA Engine (rev b1)

For better or worse, slave DMA seems to be making a comeback of sorts. Not to mention all kinds of embedded crap^Whardware with optimized DMA engines which look nothing like PCI at all.

	-hpa
Re: pcmcia - failed to initialize IDE interface
On Wed, 25 Apr 2007 15:27:26 +0200 "Aeschbacher, Fabrice" <[EMAIL PROTECTED]> wrote: > Hi, > > [kernel 2.6.20.7, arch=mips, processor=amd au1550] > > I'm trying to install a 2.6 kernel on an Alchemy au1550, and having > problem with the pcmcia socket, where I plugged a CompactFlash card. The > card seems to be recognized by the kernel, appears in > /sys/bus/pcmcia/devices, but not in /proc/bus/pccard, and I can't access > the device (/dev/hda). > > The relevant console messages: > > pccard: PCMCIA card inserted into slot 0 > pcmcia: registering new device pcmcia0.0 > hda: SanDisk SDCFB-64, CFA DISK drive > ide0: Disabled unable to get IRQ 35. > ide0: failed to initialize IDE interface > ide0: I/O resource 0x10200E-0x10200E not free. > ide0: ports already in use, skipping probe > ide0: I/O resource 0x10200E-0x10200E not free. > ide0: ports already in use, skipping probe > ide0: I/O resource 0x10200E-0x10200E not free. > ide0: ports already in use, skipping probe > ide0: I/O resource 0x10200E-0x10200E not free. > ide0: ports already in use, skipping probe > ide0: I/O resource 0x10200E-0x10200E not free. > ide0: ports already in use, skipping probe > ide0: I/O resource 0x10200E-0x10200E not free. > ide0: ports already in use, skipping probe > ide0: I/O resource 0x10200E-0x10200E not free. > ide0: ports already in use, skipping probe > ide0: I/O resource 0x10200E-0x10200E not free. > ide0: ports already in use, skipping probe > ide0: I/O resource 0x10200E-0x10200E not free. > ide0: ports already in use, skipping probe > ide-cs: ide_register() at 0x102000 & 0x10200e, irq 35 failed > > > Here is the relevant part of the kernel config: > CONFIG_IDE=y > CONFIG_IDE_GENERIC=y > CONFIG_BLK_DEV_IDE=y > CONFIG_BLK_DEV_IDECS=y > CONFIG_PCCARD=y > CONFIG_PCMCIA_DEBUG=y > CONFIG_PCMCIA=y > CONFIG_PCMCIA_AU1X00=y > (cc'ed linux-mips) Perhaps /proc/ioports will tell us where the conflict lies. The output of `dmesg -s 100' might also be needed. 
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Nigel Cunningham wrote:
>
> That's where I think you're overstretching the argument. Like suspend
> (to ram), we're concerned at the snapshot point with getting the hardware
> in the same state at a later stage.

Really, no.

"suspend to ram" doesn't _have_ a "snapshot point".

I've tried to explain this multiple times, I don't know why it's not apparently sinking in. This is much more fundamental than the fact that you don't want to stop disks for snapshotting, although it really boils down to all the same issues: the operations are simply not at all the same!

I agree 100% that "snapshot to disk" is a "snapshot event". You have to create a single point in time when everything is stable. And I'd much rather call it "snapshot to disk" than "suspend to disk" to make it clear that it's something _totally_ different from "suspend".

Because the thing is, "suspend to ram" is *not* a snapshot event. At no point do you actually need to "snapshot" the system at all. You can just gradually shut more and more things down, and equally gradually bring them back up. There simply is *never* any "snapshot" time from a device standpoint, because you can just shut down devices in the right order AND YOU ARE DONE.

Really.

[ Obviously s2ram does have one "magic moment", namely the time when the CPU does the magic read from the northbridge that actually turns off power for the CPU. But that's really a total non-event from a device standpoint, so while it's undoubtedly a very interesting moment in the suspend sequence, it's not really relevant in any way for device drivers in general. Not at all like the "snapshot moment" that requires the whole system to be totally quiescent in a "snapshot to disk"! ]

And the reason s2ram doesn't have that "snapshot" moment is exactly that the RAM contents are just always there, so there's no need to have a "synchronization event" when ram and devices match. The RAM will *always* match whatever any particular device has done to it, and the proper way to handle things is to just do a simple per-device "save-and-suspend" event.

And yes, the _individual_ "save-and-suspend" events obviously need to be "atomic", but it's purely about that particular individual device, so there's never any cross-device issues about that.

For example, if you're a USB hub controller, which is just about the most complex issue you can have, you obviously want to "save the state" with the controller in a STOPPED state, but that should just go without saying: if the controller isn't stopped, you simply *cannot* save the state, since the state is changing under you.

The difference is, that the USB driver needs to just "stop, save, and suspend" as one simple operation for s2ram. In contrast, when doing snapshot to disk, it cannot do that, because while it does want to do the "stop" part, it needs to do so _separately_ from the "save" part because you need to stop everything else *too* before you "save" anything at all.

Linus
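The difference between the two orderings can be sketched as follows. This is only an illustration under assumptions: the function names, the event tuples and the device names in the usage note are hypothetical, and the device list is taken to be ordered root-first, so that shutting down walks it leaf-first as discussed earlier in the thread.

```python
def suspend_to_ram(devices):
    """Per-device stop+save+suspend, leaf devices first; no global sync point."""
    events = []
    for dev in reversed(devices):
        events.append(("stop", dev))
        events.append(("save", dev))
        events.append(("suspend", dev))
    return events


def snapshot_to_disk(devices):
    """Stop every device first; only then save anything at all."""
    events = [("stop", dev) for dev in reversed(devices)]
    events += [("save", dev) for dev in reversed(devices)]
    return events
```

For a list such as ["pci", "usb-hub", "usb-disk"], suspend_to_ram() finishes with the disk completely before it touches the hub, while snapshot_to_disk() emits all three "stop" events before the first "save".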
[BUG REPORT] 2.6.21-rc7 - Yukon-EC Ultra <-> sky2 driver bug(s)
Kernel: 2.6.21-rc7 Device: Yukon-EC Ultra (0xb4) rev 2 [integrated on Gigabyte GA-965P-DQ6] OS: Ubuntu 7.04 (Feisty Fawn) Description: The driver reports rx errors, drops carrier due to HW error, rmmod/modprobe combo returns carrier to sane state.. after that it works with rx errors for a while, then OOPSes the kernel in different ways each time ie. ext3 routines, vma (traversal(?)) or this time in the workqueue. Seemingly, random memory corruption takes place. I assume a kernel bug because of the recent git commits regarding sky2, the fact Windows XP works flawlessly and the OOPS itself. Also, that box recently compiled the kernel, so I regard it as stable. The bug is easily reproducible (*sigh* too easy) and occurs also with Ubuntu default kernel - 2.6.20 with Ubuntu patches. After linux boots and crashes, the network card malfunctions even when dual-booted to Windows (causing slowness and reboots). It takes power-off/power-on cycle to bring it back to stable state. Thanks in advance for all help, if you need more info, .config or testing any patches, let me know. Cheers, speedy over ps. not subscribed to LKML, plz. keep me in CC: DMESG output: [ 49.876701] ACPI: PCI Interrupt :03:00.0[A] -> GSI 16 (level, low) -> IRQ 16 [ 49.876713] PCI: Setting latency timer of device :03:00.0 to 64 [ 49.876735] sky2 :03:00.0: v1.13 addr 0xf900 irq 16 Yukon-EC Ultra (0xb4) rev 2 [ 49.876887] PM: Adding info for No Bus:eth0 [ 49.876939] sky2 eth0: addr 00:16:e6:d7:a6:ea [ 49.14] sky2 eth0: enabling interface [ 49.891741] sky2 eth0: ram buffer 0K ... [ 52.326154] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both [ 52.328083] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 53.394751] NET: Registered protocol family 17 ... 
[ 94.584404] sky2 eth0: rx error, status 0x5ca0002 length 1482 [ 101.984227] sky2 eth0: rx error, status 0x5ac0002 length 1452 [ 102.216648] sky2 eth0: rx error, status 0x5cc0002 length 1484 [ 103.182574] sky2 eth0: rx error, status 0x5ac0002 length 1452 [ 103.604065] sky2 eth0: rx error, status 0x5ac0002 length 1452 [ 103.697021] sky2 eth0: rx error, status 0x5ac0002 length 1452 [ 104.244439] sky2 eth0: rx error, status 0x5ac0002 length 1452 [ 105.038951] sky2 eth0: rx error, status 0x5ac0002 length 1452 [ 105.374538] sky2 eth0: rx error, status 0x5ac0002 length 1452 [ 106.878209] sky2 eth0: rx error, status 0x5ac0002 length 1452 [ 107.328009] sky2 eth0: rx error, status 0x5ac0002 length 1452 [ 107.381861] sky2 eth0: rx error, status 0x5ac0002 length 1452 [ 111.996276] printk: 10 messages suppressed. [ 111.996282] sky2 eth0: rx error, status 0x5ac0002 length 1452 [ 118.404802] printk: 12 messages suppressed. [ 118.404808] sky2 eth0: rx error, status 0x5ac0002 length 1452 [ 174.080264] sky2 eth0: rx error, status 0x5ca0002 length 1482 [ 174.095495] sky2 eth0: rx error, status 0x5ca0002 length 1482 [ 174.102641] sky2 eth0: hw error interrupt status 0x8 [ 174.102645] sky2 eth0: MAC parity error [ 174.181775] sky2 eth0: rx error, status 0x5ca0002 length 1482 [ 176.979478] sky2 eth0: rx error, status 0x5ca0002 length 1482 [ 177.244215] sky2 eth0: rx error, status 0x5ca0002 length 1482 [ 177.617673] sky2 eth0: rx error, status 0x5ca0002 length 1482 [ 177.692007] sky2 eth0: rx error, status 0x5ca0002 length 1482 [ 178.214524] sky2 eth0: rx error, status 0x5ca0002 length 1482 [ 179.230857] printk: 2 messages suppressed. [ 179.230863] sky2 eth0: rx error, status 0x5ca0002 length 1482 [ 184.548409] printk: 9 messages suppressed. [ 184.548415] sky2 eth0: rx error, status 0x5ca0002 length 1482 [ 189.247824] printk: 5 messages suppressed. [ 189.247830] sky2 eth0: rx error, status 0x5ca0002 length 1482 [ 194.293119] printk: 9 messages suppressed. 
[ 194.293125] sky2 eth0: rx error, status 0x5ca0002 length 1482 [ 196.015470] sky2 eth0: transmit descriptor error (hardware problem) [ 196.015561] sky2 eth0: Link is down. [ 199.212348] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both [ 199.212354] sky2 eth0: transmit descriptor error (hardware problem) [ 199.212485] sky2 eth0: Link is down. [ 201.858518] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both [ 201.858525] sky2 eth0: transmit descriptor error (hardware problem) [ 201.858657] sky2 eth0: Link is down. [ 204.644601] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both [ 204.644608] sky2 eth0: transmit descriptor error (hardware problem) [ 204.644739] sky2 eth0: Link is down. [ 207.396671] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both [ 207.396679] sky2 eth0: transmit descriptor error (hardware problem) [ 207.396811] sky2 eth0: Link is down. [ 210.131335] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both [ 210.131342] sky2 eth0: transmit descriptor error (hardware problem) [ 210.131472] sky2 eth0: Link is down. # rmmod sky2 [
Re: [PATCH 0/9] Kconfig: cleanup s390 v2.
On Thu, Apr 26, 2007 at 02:32:06AM +0200, Arnd Bergmann wrote:
> On Thursday 26 April 2007, Andrew Morton wrote:
> > It would be neat if someone could create and maintain a new
> > scripts/spot-common-mistakes. Feed it a unified diff and it would complain
> > about newly-added code (and only newly-added code) which has busted
> > whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc.
>
> http://patchstylecheck.googlecode.com/svn/trunk/patchstylecheckemail.pl
> Might serve as a starting point for this. It doesn't have any semantic
> checks right now, but I guess they can be added.

I ran this utility against my battery patches, and it caught a bunch of false positives (I believe).

+#define BATTERY_PROP(bat, prop) ({ \
+	void *value = bat->get_property(bat, BATTERY_PROP_##prop); \
+	value ? *(int*)value : 0; \
+})

Got: "Macros with multiple statements should be enclosed in a do - while loop"

I believe ({}) is equivalent to "do - while"; it's widely used in the kernel.

+	switch (bp) {
+	default: break;
+	};

Got "Gotos should not be indented", at "default: break;"

+static int bind_pst_to_psy(struct power_supplicant *pst,
+                           struct power_supply *psy)
+{

Got "use tabs not spaces". Here the spaces are used intentionally for alignment, not for indentation.
Re: Sleep during spinlock in TPM driver
On Mon, 23 Apr 2007 08:14:03 -0400 (EDT) Parag Warudkar <[EMAIL PROTECTED]> wrote:

> --- linux-2.6-us/drivers/char/tpm/tpm.c 2007-04-21 14:55:03.134975360 -0400
> +++ linux-2.6-wk/drivers/char/tpm/tpm.c 2007-04-22 14:58:51.95763 -0400
> @@ -942,12 +942,12 @@
>  {
>  	struct tpm_chip *chip = file->private_data;
>
> +	flush_scheduled_work();
>  	spin_lock(_lock);
>  	file->private_data = NULL;
> -	chip->num_opens--;
>  	del_singleshot_timer_sync(>user_read_timer);
> -	flush_scheduled_work();
>  	atomic_set(>data_pending, 0);

btw, this driver has a timer handler which does:

static void user_reader_timeout(unsigned long ptr)
{
	struct tpm_chip *chip = (struct tpm_chip *) ptr;

	schedule_work(>work);
}

which appears to duplicate schedule_delayed_work()'s functionality.
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Wed, 2007-04-25 at 21:25 +0200, Adrian Bunk wrote:
> On Wed, Apr 25, 2007 at 11:50:45AM -0700, Linus Torvalds wrote:
> >
> > On Wed, 25 Apr 2007, Adrian Bunk wrote:
> > >
> > > 3W for the complete system? In CPU state S1? [1]
> >
> > In STR, 3W is quite realistic. The CPU is off, all (or most - up to you)
> > the devices are off, but the motherboard and memory is powered.
>
> As far as I understand it, the CPU isn't off in S1.
>
> And even 3W would still be a waste of energy.

It is, especially if you're living in a place where the power infrastructure is unreliable (such as where I live). Currently, because of the summer heat, power demand exceeds power supply, so we experience practically daily rotating 4-hour power interruptions. That 3W saved, multiplied by the total number of computers, is a lot. From this perspective, S2D (or shutdown) is preferred over S2RAM.

Tony
Re: MODULE_MAINTAINER
On Mon, 23 Apr 2007 14:32:36 +0200 Rene Herman <[EMAIL PROTECTED]> wrote: > Provide MODULE_MAINTAINER() as a convenient place to stick a name and email > address both for drivers having multiple (current and non-current) authors > and for when someone who wants to maintain a driver isn't so much an author. > > Signed-off-by: Rene Herman <[EMAIL PROTECTED]> > === > > Rene. > > > > [module_maintainer2.diff text/plain (604B)] > diff --git a/include/linux/module.h b/include/linux/module.h > index 10f771a..3c54774 100644 > --- a/include/linux/module.h > +++ b/include/linux/module.h > @@ -128,6 +128,10 @@ extern struct module __this_module; > /* Author, ideally of form NAME [, NAME ]*[ and NAME ] > */ > #define MODULE_AUTHOR(_author) MODULE_INFO(author, _author) > > +/* Maintainer, ideally of form NAME */ > +#define MODULE_MAINTAINER(_maintainer) \ > + MODULE_AUTHOR("(Maintained by) "_maintainer) > + I'm not sure we want to do this - that's what ./MAINTAINERS is for and we end up having to maintain the same info in two places. I actually use git-whatchanged if I'm unsure who to blame^Wask for help on a particular piece of code. An easy way of doing this is to go to http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=tree then drill down to the file and hit the "history" link. That will tell you who is *really* doing work on the particular code. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [00/17] Large Blocksize Support V3
On Thu, Apr 26, 2007 at 11:14:49AM +1000, David Chinner wrote:
> On Wed, Apr 25, 2007 at 03:46:19PM -0700, Badari Pulavarty wrote:
> > On Tue, 2007-04-24 at 15:21 -0700, [EMAIL PROTECTED] wrote:
> > > V2->V3
> >
> > Hmm.. It broke ext2 :(
> >
> > V2 worked fine with the small fix I sent you earlier.
> > But on V3, I can't run fsx. I see random data showing up.
> > I will debug, when I get a chance.
>
> Same thing on XFS - 'fsx -d -S 42 -R -W foobar' fails on
> the tenth operation

Hmm - even normal block size filesystems (ext3) are reading bogus data (e.g. /etc/motd).

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: [RFC][PATCH] on-demand readahead
On Wed, Apr 25, 2007 at 06:08:44PM +0200, Andi Kleen wrote:
> > Yeah, the on-demand readahead can avoid _all_ lookups for small in-cache
> > files.
>
> How?

In filemap.c:

	if (!page) {
		page_cache_readahead_adaptive(mapping, , filp, page,
					      index, last_index - index);
		page = find_get_page(mapping, index);
	}
	if (page && PageReadahead(page)) {
		page_cache_readahead_adaptive(mapping, , filp, page,
					      index, last_index - index);
	}

Cache-hot files have neither missing pages (!page) nor lookahead pages (PageReadahead(page)), so it will not even be called.

> > > You seem to have a lot of magic numbers. They probably all need symbols and
> > > explanations.
> >
> > The magic numbers are for easier testings, and will be removed in
> > future. For now, they enables convenient comparing of the two
> > algorithms in one kernel.
>
> I mean the 16 and 4 not the sysctl

The numbers and the code in get_next_ra_size2() are simply copied from get_next_ra_size():

	if (cur < max / 16) {
		newsize = 4 * cur;
	} else {
		newsize = 2 * cur;
	}

It's a trick to ramp up small sizes more quickly. That trick is documented in the related get_init_ra_size(). So it would be better to put the two routines together to make that clear.

> > If this new algorithm has been further tested and approved, I'll
> > re-submit the patch in a cleaner, standalone form. The adaptive
> > readahead patches can be dropped then. They may better be reworked as
> > a kernel module.
>
> If they actually help and don't cause regressions they shouldn't be a module,
> but integrated eventually Just it has to be all step by step.

Yeah, the adaptive readahead is complex and the possible workloads diverse. It has become obvious that there is a long way to go, and a kernel module makes life easier.

> > > Your white space also needs some work.
> >
> > White space in patch description?
>
> In the code indentation.

Ah, got it: a silly copy/paste mistake.
Thank you,
Wu
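The ramp-up rule quoted above can be tried out with a small sketch. The "16" and "4" come straight from the quoted snippet; the final clamp to the maximum and the function and parameter names are assumptions added for the illustration, since the email shows only the if/else.

```python
def next_ra_size(cur, max_pages):
    """Next readahead window size in pages, following the quoted if/else."""
    if cur < max_pages // 16:
        newsize = 4 * cur   # small windows grow fourfold
    else:
        newsize = 2 * cur   # larger windows only double
    # Assumption: the result is capped at max_pages (not shown in the email).
    return min(newsize, max_pages)


def ramp(start, max_pages, steps):
    """List the window sizes produced by repeatedly applying next_ra_size()."""
    sizes = [start]
    for _ in range(steps):
        sizes.append(next_ra_size(sizes[-1], max_pages))
    return sizes
```

Starting from a 4-page window with a 128-page maximum, the first step quadruples and the later steps double until the cap is reached, which is the "ramp up small sizes more quickly" trick in action.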
Re: [00/17] Large Blocksize Support V3
On Wed, Apr 25, 2007 at 03:46:19PM -0700, Badari Pulavarty wrote:
> On Tue, 2007-04-24 at 15:21 -0700, [EMAIL PROTECTED] wrote:
> > V2->V3
>
> Hmm.. It broke ext2 :(
>
> V2 worked fine with the small fix I sent you earlier.
> But on V3, I can't run fsx. I see random data showing up.
> I will debug, when I get a chance.

Same thing on XFS - 'fsx -d -S 42 -R -W foobar' fails on the tenth operation

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: rsdl v46 report,numbers,comments
On Wednesday 25 April 2007 04:26, Mike Mattie wrote:
> Hello,
>
> 0. intro
>
> I am very happy to report that v46 of RSDL subjectively is much better than
> v42. As you (Con Kolivas) might remember from a previous mail I was
> experimenting with using nice levels effectively. I have refined these
> levels to this layout:
>
> -2 : clock (ntpd)
> -1 : syslog, sshd, X
>  0 : command; default for shells
>  1 : audacious (audio), xfce window manager (with compositor on)
>  2 : emacs (SCHED_OTHER), desktop/window manager infrastructure (dbus),
>      ssh-agent, bind (batch scheduled)
>  3 : desktop applications (mail, xchat, openoffice)
>  5 : spamd, batch scheduled compiles/test-suites
> 10 : cron jobs
>
> 1. Some numbers
>
> My machine is a particularly tough case I think. A uni-processor Athlon XP
> 3000+ (involuntary pre-empt) with a software RAID5 on PATA drives. I load
> it heavily with compiles/test-suites, and I am very sensitive to audio
> glitches.
>
> here are some stats for idle:
>
> ---load-avg--- --memory-usage- total-cpu-usage interrupts--- ---system--
> _1m_ _5m_ 15m_|_used _buff _cach _free|usr sys idl wai hiq siq|__17_ __18_ __20_|_int_ _csw_
>  0.2  0.2  0.2| 170M   15M  309M 6560k|  2   1  94   4   0   0|    1     7   150|  238   208
>  0.2  0.2  0.2| 170M   15M  309M 6568k|  1   0  99   0   0   0|    0     0     0|   76    55
>  0.2  0.2  0.2| 170M   15M  309M 6568k|  0   1  99   0   0   0|    0     0     0|   75    47
>  0.2  0.2  0.2| 170M   15M  309M 6624k|  4   0  96   0   0   0|    0     0     0|   75    37
>  0.2  0.2  0.2| 170M   15M  309M 6624k|  1   0  99   0   0   0|    0     0     0|   75    36
>
> here are some stats for music playing:
>
> ---load-avg--- --memory-usage- total-cpu-usage interrupts--- ---system--
> _1m_ _5m_ 15m_|_used _buff _cach _free|usr sys idl wai hiq siq|__17_ __18_ __20_|_int_ _csw_
>  0.9  0.4  0.2| 175M   15M  305M 5652k|  2   1  94   4   0   0|    1     7   150|  238   210
>  0.9  0.4  0.2| 175M   15M  305M 5652k| 10   1  89   0   0   0|    0     3   989| 1068  1510
>  0.9  0.4  0.2| 175M   15M  305M 5592k| 13   0  87   0   0   0|    0     3  1013| 1093  1565
>  0.9  0.4  0.2| 175M   15M  304M 6300k| 11   1  88   0   0   0|    0     3  1000| 1078  1496
>  0.9  0.4  0.2| 175M   15M  305M 6300k| 13   0  87   0   0   0|    0     3  1006| 1084  1509
>  0.8  0.4  0.2| 175M   15M  305M 6180k| 13   1  86   0   0   0|    0     3  1000| 1078  1524
>  0.8  0.4  0.2| 175M   15M  305M 6060k| 12   1  87   0   0   0|    0     3  1000| 1078  1564
>
> The context switches are high, but so are the interrupts (USB 2.0 Audigy
> NX)
>
> To see how effective using these nice levels were I decided to play with
> rr_interval, on the theory that with priorities strictly enforced and used
> aggressively that a longer time-slice would not cause audio delay. So far
> that theory is holding. All of these numbers are with rr_interval = 20, and
> I have fewer audio problems than with any previous kernel/tuning setup.
>
> That is very impressive.
>
> as far as batch loading goes I tried a kernel compile. These numbers look
> nice for RSDL but there are some caveats:
>
> kernel compile, CFS v3                    : make 756.83s user 89.37s system 58% cpu 24:08.21 total
> kernel compile, v46 rr_interval = default : make 754.66s user 89.74s system 59% cpu 23:35.38 total
> kernel compile, v46 rr_interval = 20      : make 682.83s user 84.34s system 73% cpu 17:29.57 total
>
> 1. The system was noisy. I did this intentionally. My typical load is a
>    mixture of desktop/compile. All three numbers were generated while
>    listening to music, reading docs/web/news, using emacs etc. with each of
>    the compiles I tried running a visualization plugin (ProjectM inside
>    audacious) for a minute or so.
>
>    This skews the numbers for comparison, but I was looking for an
>    impression that was based off a *real* work-load.
>
>    I would like to add as well that before RSDL the mainline scheduler
>    failed completely at running ProjectM even when it was the only application
>    on the desktop. (It stalled for seconds with a rock steady period.)
>
> 2. All of these ran nice 5 sched: BATCH
>
> 3. I have the xfce compositor turned on, using the transparency.
>
> 4. compiled on software RAID 5 (md) -> dev mapper -> lvm2 -> ext3, 4
>    drives, write-cache disabled, external 512 MB flash drive for an external
>    journal, commit=15, journal=data
>
> From the caveats above, especially the deep stack for the block layer,
> plus meeting audio deadlines while sharing an interrupt with the journal
> drive (arghh) this is very impressive system behavior for me.
>
> Here are the stats for doing a kernel compile with audacious running, plus
> mail, editor etc.
>
> ---load-avg--- --memory-usage- total-cpu-usage interrupts--- ---system--
> _1m_ _5m_ 15m_|_used _buff _cach _free|usr sys idl wai hiq siq|__17_ __18_ __20_|_int_ _csw_
> 1.31 0.8| 198M 22M
Re: Nonfunctional ethernet (was Re: 2.6.21-rc7-mm1 + sysfs-oops-workaround.patch -- INFO: possible recursive locking detected)
On Thu, 26 Apr 2007 09:02:02 +0800 "Antonino A. Daplas" <[EMAIL PROTECTED]> wrote: > I can bring up the network manually using ifconfig. It's opensuse's > rcnetwork script that fails to bring the network up. Entries > in /sys/class/net are still bogus. > > This kernel is now usable to me, I'll start bisection later today if > nobody has an answer. rc7-mm1 is hardly worth bothering with. Quite a few really bad ones have now been fixed and I'll try to get rc7-mm2 out within the next 12 hours (I assume a 76-hour debug session won't be needed this time). But I don't think the sysfs changes in Greg's tree have been updated, so things will probably still fail in that area. A suitable bisection starting pair would be around gregkh-driver-* - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Alan Cox wrote:
>
> You bet there is. We need to know if data arrived or not, because there
> is no guarantee that the data retrieved if we inadvertently re-execute a
> command will be the same. The hardware state itself isn't the problem,
> it's the combination of hardware state and internal state which need to
> match in some cases.

... which is why "suspend()" suspends the hardware. Is that so hard to
understand? Once the hardware is suspended, it's not doing anything.

But STR doesn't have any need for atomicity guarantees _between_devices_.
That's a really *fundamental* difference.

The reason s2ram is *so* different from snapshot-to-disk is exactly the
fact that s2ram can (and does) work on one device at a time. In contrast,
snapshot-to-disk needs to snapshot all the devices *together*, since it has
a separate disk image.

See? Two *totally* different cases. They have *nothing* in common. Not the
call sequence, not the logic, not *anything*.

		Linus
Re: [ck] [REPORT] cfs-v5 vs sd-0.46
On Tuesday 24 April 2007 17:37, Michael Gerdau wrote:
> Hi list,
>
> with cfs-v5 finally booting on my machine I have run my daily
> numbercrunching jobs on both cfs-v5 and sd-0.46, 2.6.21-v7 on
> top of a stock openSUSE 10.2 (X86_64).

Thanks for testing.

> Both cfs and sd showed very similar behavior when monitored in top.
> I'll show more or less representative excerpt from a 10 minutes
> log, delay 3sec.
>
> sd-0.46
> top - 00:14:24 up  1:17,  9 users,  load average: 4.79, 4.95, 4.80
> Tasks:   3 total,   3 running,   0 sleeping,   0 stopped,   0 zombie
> Cpu(s): 99.8%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.2%hi,  0.0%si,  0.0%st
> Mem:   3348628k total,  1648560k used,  1700068k free,    64392k buffers
> Swap:  2097144k total,        0k used,  2097144k free,   828204k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  6671 mgd       33   0 95508  22m 3652 R  100  0.7  44:28.11 perl
>  6669 mgd       31   0 95176  22m 3652 R   50  0.7  43:50.02 perl
>  6674 mgd       31   0 95368  22m 3652 R   50  0.7  47:55.29 perl
>
> cfs-v5
> top - 08:07:50 up 21 min,  9 users,  load average: 4.13, 4.16, 3.23
> Tasks:   3 total,   3 running,   0 sleeping,   0 stopped,   0 zombie
> Cpu(s): 99.5%us,  0.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Mem:   3348624k total,  1193500k used,  2155124k free,    32516k buffers
> Swap:  2097144k total,        0k used,  2097144k free,   545568k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  6357 mgd       20   0 92024  19m 3652 R  100  0.6   8:54.21 perl
>  6356 mgd       20   0 91652  18m 3652 R   50  0.6  10:35.52 perl
>  6359 mgd       20   0 91700  18m 3652 R   50  0.6   8:47.32 perl
>
> What did surprise me is that cpu utilization had been spread 100/50/50
> (round robin) most of the time. I did expect 66/66/66 or so.

You have 3 tasks and only 2 cpus. The %cpu is the percentage of the cpu the
task is currently on that it is using; it is not the percentage of the
"overall cpu available on the machine". Since you have 3 tasks and 2 cpus,
the extra task will always be on one or the other cpu taking half of the cpu
but never on both cpus.

> What I also don't understand is the difference in load average, sd
> constantly had higher values, the above figures are representative
> for the whole log. I don't know which is better though.

There isn't much useful to say about the load average in isolation. It may be
meaningful or not depending on whether it just shows the timing of when the
cpu load is determined, or whether there is more time waiting in runqueues.
Only throughput measurements can really tell them apart.

What is important is that if all three tasks are fully cpu bound and started
at the same time at the same nice level, that they all receive close to the
same total cpu time overall showing some fairness is working as well. This
should be the case no matter how many cpus you have.

--
-ck
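The arithmetic behind the reply above can be sketched in a few lines of C. This is only an illustration under stated assumptions: all tasks are fully cpu bound at the same nice level, and tasks are placed round-robin (task i on cpu i % ncpus) and split their cpu evenly. The function names are hypothetical and are not taken from SD or CFS.

```c
#include <assert.h>

/*
 * Long-run share of a single cpu, in percent, that each of ntasks
 * equal-priority cpu-bound tasks should end up with on ncpus cpus.
 */
double fair_share_percent(int ntasks, int ncpus)
{
	assert(ntasks > 0 && ncpus > 0);
	if (ntasks <= ncpus)
		return 100.0;
	return 100.0 * ncpus / ntasks;
}

/*
 * Instantaneous %CPU for one task, the way top reports it: the share of
 * the cpu the task is currently sitting on. Placement is assumed to be
 * task i on cpu (i % ncpus), which is a simplification.
 */
double snapshot_percent(int task, int ntasks, int ncpus)
{
	int cpu, i, sharing = 0;

	assert(ncpus > 0 && task >= 0 && task < ntasks);
	cpu = task % ncpus;
	for (i = 0; i < ntasks; i++)
		if (i % ncpus == cpu)
			sharing++;
	return 100.0 / sharing;
}
```

With 3 tasks on 2 cpus, a snapshot shows one task at 100 and two at 50, while the long-run share each task should converge to is about 66.7, which is why accumulated TIME+ rather than a single %CPU sample is the thing to compare.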
Re: [PATCH 0/9] Kconfig: cleanup s390 v2.
On Thu, 26 Apr 2007 02:32:06 +0200 Arnd Bergmann <[EMAIL PROTECTED]> wrote:

> On Thursday 26 April 2007, Andrew Morton wrote:
> > It would be neat if someone could create and maintain a new
> > scripts/spot-common-mistakes. Feed it a unified diff and it would complain
> > about newly-added code (and only newly-added code) which has busted
> > whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc.
>
> http://patchstylecheck.googlecode.com/svn/trunk/patchstylecheckemail.pl
> Might serve as a starting point for this. It doesn't have any semantic
> checks right now, but I guess they can be added.

> print "Your patch is now worthy to be reviewed by a real person\n";

heh.

Yes, that looks like an ideal starting point.

Methinks it should do `exit 1' if anything was detected.
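To make the "newly-added code only" idea concrete, here is a rough sketch in C of a per-line check over a unified diff. It is not the patchstylecheck script and not an existing kernel tool; `check_diff_line` and the `ISSUE_*` names are made up for illustration, and only two whitespace checks are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
	ISSUE_TRAILING_WS      = (1 << 0),	/* added line ends in space/tab */
	ISSUE_SPACE_BEFORE_TAB = (1 << 1),	/* " \t" somewhere in added line */
};

/*
 * Inspect one line of a unified diff. Returns a bitmask of ISSUE_* flags,
 * or 0 for lines that are not newly added (context, removals, headers).
 *
 * Limitation of this sketch: any line starting with "+++" is treated as a
 * file header, so added code that itself begins with "++" is skipped. A
 * real tool would track hunk boundaries instead.
 */
int check_diff_line(const char *line)
{
	size_t len, i;
	int issues = 0;

	assert(line != NULL);
	if (line[0] != '+' || strncmp(line, "+++", 3) == 0)
		return 0;

	/* Ignore the line terminator when looking for trailing whitespace. */
	len = strlen(line);
	while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
		len--;

	if (len > 1 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
		issues |= ISSUE_TRAILING_WS;

	for (i = 1; i + 1 < len; i++) {
		if (line[i] == ' ' && line[i + 1] == '\t') {
			issues |= ISSUE_SPACE_BEFORE_TAB;
			break;
		}
	}
	return issues;
}
```

A driver would read the diff line by line, OR the results together, and exit with status 1 if anything was flagged, which is what makes the tool usable from a mail filter or a commit hook.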
Re: Nonfunctional ethernet (was Re: 2.6.21-rc7-mm1 + sysfs-oops-workaround.patch -- INFO: possible recursive locking detected)
On Thu, 2007-04-26 at 07:45 +0800, Antonino A. Daplas wrote:
> On Wed, 2007-04-25 at 22:48 +0800, Antonino A. Daplas wrote:
> > On Wed, 2007-04-25 at 14:18 +0900, Tejun Heo wrote:
> > > Miles Lane wrote:
> > > eth0 renamed to eth54
> > BUG: atomic counter underflow at:
> >  [] show_trace_log_lvl+0x1a/0x30
> >  [] show_trace+0x12/0x14
> >  [] dump_stack+0x16/0x18
> >  [] _atomic_dec_and_lock+0x29/0x4c
> >  [] dput+0x34/0x103
> >  [] sysfs_drop_dentry+0x141/0x149
> >  [] sysfs_hash_and_remove+0x89/0x10e
> >  [] sysfs_remove_link+0xe/0x10
> >  [] device_rename+0x110/0x181
> >  [] dev_change_name+0x11e/0x1ca
> >  [] dev_ifsioc+0x330/0x3d7
> >  [] dev_ioctl+0x350/0x46e
> >  [] sock_ioctl+0x1be/0x1ca
> >  [] do_ioctl+0x1c/0x53
> >  [] vfs_ioctl+0x1ec/0x203
> >  [] sys_ioctl+0x49/0x62
> >  [] sysenter_past_esp+0x5f/0x99
> >  ===
>
> The above tracing was caused by CONFIG_SYSFS_DEPRECATED=y and by setting
> this to n, the tracing disappeared.. Still, all my network cards are
> non-functional. Entries in /sys/class/net are bogus:
>
> / # cd /sys/class/net/
> /sys/class/net # ls
> eth1  eth44  eth54  lo
>
> /sys/class/net # cd eth1
> -bash: cd: eth1: No such file or directory
>
> /sys/class/net # ls -l eth1
> lrwxrwxrwx 1 root root 0 Apr 26 07:15 eth1 ->
> ../../devices/pci:00/:00:12.0/net/eth0
>
> /sys/class/net # cd ../../devices/pci\:00/\:00\:12.0/net/eth0
> -bash: cd: ../../devices/pci:00/:00:12.0/net/eth0: No such file
> or directory
>
> Do you know of any patches I need to revert/apply? Anyway, I have to
> boot back to this kernel and find out more what's going on.
>

More info. I can bring up the network manually using ifconfig. It's
opensuse's rcnetwork script that fails to bring the network up. Entries
in /sys/class/net are still bogus.

This kernel is now usable to me, I'll start bisection later today if
nobody has an answer.

Tony
SD renice recommendation was: Re: [REPORT] cfs-v4 vs sd-0.44
On Tuesday 24 April 2007 16:36, Ingo Molnar wrote:
> So, my point is, the nice level of X for desktop users should not be set
> lower than a low limit suggested by that particular scheduler's author.
> That limit is scheduler-specific. Con i think recommends a nice level of
> -1 for X when using SD [Con, can you confirm?], while my tests show that
> if you want you can go as low as -10 under CFS, without any bad
> side-effects. (-19 was a bit too much)

Nice 0 as a default for X, but if renicing, nice -10 as the lower limit for X
on SD. The reason for that on SD is that the priority of freshly woken up
tasks (ie not fully cpu bound) for both nice 0 and nice -10 will still be the
same at PRIO 1 (see the prio_matrix). Therefore, there will _not_ be
preemption of the nice 0 task and a context switch _unless_ it is already cpu
bound and has consumed a certain number of cycles and has been demoted.
Contrary to popular belief, it is not universal that a less niced task will
preempt its more niced counterpart and depends entirely on implementation of
nice. Yes it is true that context switch rate will go up with a reniced X
because the conditions that lead to preemption are more likely to be met, but
it is definitely not every single wakeup of the reniced X.

Alas, again, I am forced to spend as little time as possible at the pc for my
health, so expect _very few_ responses via email from me. Luckily SD is in
pretty fine shape with version 0.46.

--
-ck
Re: [PATCH 10/28] i386: map enough initial memory to create lowmem mappings
Chris Wright wrote:
> I was using real hardware with your .config when I reproduced it.
>

Yes, I first found it on real hardware. I haven't tested my fix on real
hardware yet, but it seems OK on kvm.

    J
Re: [2/2] Driver for the Maxim DS1WM, a 1-wire bus master ASIC core.
On Tue, 24 Apr 2007 14:02:03 +0400 Evgeniy Polyakov <[EMAIL PROTECTED]> wrote:

> +#define DS1WM_CMD_1W_RESET	1 << 0	/* force reset on 1-wire bus */
> +#define DS1WM_CMD_SRA		1 << 1	/* enable Search ROM accelerator mode */
> +#define DS1WM_CMD_DQ_OUTPUT	1 << 2	/* write only - forces bus low */
> +#define DS1WM_CMD_DQ_INPUT	1 << 3	/* read only - reflects state of bus */
> +
> +#define DS1WM_INT_PD		1 << 0	/* presence detect */
> +#define DS1WM_INT_PDR		1 << 1	/* presence detect result */
> +#define DS1WM_INT_TBE		1 << 2	/* tx buffer empty */
> +#define DS1WM_INT_TSRE		1 << 3	/* tx shift register empty */
> +#define DS1WM_INT_RBF		1 << 4	/* rx buffer full */
> +#define DS1WM_INT_RSRF		1 << 5	/* rx shift register full */
> +
> +#define DS1WM_INTEN_EPD	1 << 0	/* enable presence detect int */
> +#define DS1WM_INTEN_IAS	1 << 1	/* INTR active state */
> +#define DS1WM_INTEN_ETBE	1 << 2	/* enable tx buffer empty int */
> +#define DS1WM_INTEN_ETMT	1 << 3	/* enable tx shift register empty int */
> +#define DS1WM_INTEN_ERBF	1 << 4	/* enable rx buffer full int */
> +#define DS1WM_INTEN_ERSRF	1 << 5	/* enable rx shift register full int */
> +#define DS1WM_INTEN_DQO	1 << 6	/* enable direct bus driving ops
> +					   (undocumented), Szabolcs Gyurko */

These macros are very dangerous - please parenthesise them all.
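A small stand-alone C sketch of why the unparenthesised form is risky. `FLAG_RAW` and `FLAG_SAFE` are hypothetical stand-ins for the DS1WM_* definitions, not names from the driver. Because `*` binds tighter than `<<`, the surrounding expression gets absorbed into the shift count.

```c
#include <assert.h>

#define FLAG_RAW	1 << 3		/* style used in the quoted patch */
#define FLAG_SAFE	(1 << 3)	/* parenthesised, as requested */

/* Expands to 1 << 3 * n, which parses as 1 << (3 * n). */
int raw_scaled(int n)
{
	assert(n >= 0 && n <= 9);	/* keep the shift count below 31 */
	return FLAG_RAW * n;
}

/* Expands to (1 << 3) * n, which is what the author meant. */
int safe_scaled(int n)
{
	assert(n >= 0 && n <= 9);
	return FLAG_SAFE * n;
}
```

For n = 2 the raw macro yields 64 while the parenthesised one yields 16. Expressions that only OR or AND the flags together happen to work, because `<<` binds tighter than `&` and `|`, which is exactly why the bug can stay hidden until someone uses the macro in arithmetic.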
> +
> +struct ds1wm_data {
> +	void		*map;
> +	int		bus_shift; /* # of shifts to calc register offsets */
> +	struct platform_device *pdev;
> +	struct ds1wm_platform_data *pdata;
> +	int		irq;
> +	struct clk	*clk;
> +	int		slave_present;
> +	void		*reset_complete;
> +	void		*read_complete;
> +	void		*write_complete;
> +	u8		read_byte; /* last byte received */
> +};
> +
> +static inline void ds1wm_write_register(struct ds1wm_data *ds1wm_data, u32 reg,
> +					u8 val)
> +{
> +	__raw_writeb(val, ds1wm_data->map + (reg << ds1wm_data->bus_shift));
> +}
> +
> +static inline u8 ds1wm_read_register(struct ds1wm_data *ds1wm_data, u32 reg)
> +{
> +	return __raw_readb(ds1wm_data->map + (reg << ds1wm_data->bus_shift));
> +}
> +
> +
> +static irqreturn_t ds1wm_isr(int isr, void *data)
> +{
> +	struct ds1wm_data *ds1wm_data = data;
> +	u8 intr = ds1wm_read_register(ds1wm_data, DS1WM_INT);
> +
> +	ds1wm_data->slave_present = intr & DS1WM_INT_PDR ? 0 : 1;

Normally we'd parenthesise an expression like this so people don't have to
go scrambling for the C precedence table.

> +	if (intr & DS1WM_INT_PD && ds1wm_data->reset_complete)
> +		complete(ds1wm_data->reset_complete);

Ditto (lots of instances of this in this patch)

> +	if (intr & DS1WM_INT_RBF) {
> +		ds1wm_data->read_byte = ds1wm_read_register(ds1wm_data,
> +							    DS1WM_DATA);
> +		if (ds1wm_data->read_complete)
> +			complete(ds1wm_data->read_complete);
> +	}
> +
> +	if (intr & DS1WM_INT_TSRE && ds1wm_data->write_complete)
> +		complete(ds1wm_data->write_complete);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static int ds1wm_reset(struct ds1wm_data *ds1wm_data)
> +{
> +	unsigned long timeleft;
> +	DECLARE_COMPLETION(reset_done);

This will cause lockdep warnings.

- Convert to DECLARE_COMPLETION_ONSTACK

- Test the code using lockdep! This is covered in
  Documentation/SubmitChecklist, which has many other useful tips.
> +	ds1wm_data->reset_complete = &reset_done;
> +
> +	ds1wm_write_register(ds1wm_data, DS1WM_INT_EN, DS1WM_INTEN_EPD |
> +		(ds1wm_data->pdata->active_high ? DS1WM_INTEN_IAS : 0));
> +
> +	ds1wm_write_register(ds1wm_data, DS1WM_CMD, DS1WM_CMD_1W_RESET);
> +
> +	timeleft = wait_for_completion_timeout(&reset_done, DS1WM_TIMEOUT);
> +	ds1wm_data->reset_complete = NULL;
> +	if (!timeleft) {
> +		dev_dbg(&ds1wm_data->pdev->dev, "reset failed\n");
> +		return 1;
> +	}
> +
> +	/* Wait for the end of the reset. According to the specs, the time
> +	 * from when the interrupt is asserted to the end of the reset is:
> +	 *     tRSTH  - tPDH  - tPDL - tPDI
> +	 *     625 us - 60 us - 240 us - 100 ns = 324.9 us
> +	 *
> +	 * We'll wait a bit longer just to be sure.
> +	 */
> +	udelay(500);
> +
> +	ds1wm_write_register(ds1wm_data, DS1WM_INT_EN,
> +		DS1WM_INTEN_ERBF | DS1WM_INTEN_ETMT | DS1WM_INTEN_EPD |
> +		(ds1wm_data->pdata->active_high ? DS1WM_INTEN_IAS : 0));
> +
> +	if (!ds1wm_data->slave_present) {
> +		dev_dbg(&ds1wm_data->pdev->dev, "reset: no devices found\n");
> +		return 1;
> +	}
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Sort of my 2-many-cents story on why I need "snapshot/restore"...

On Wed, 25 Apr 2007 13:08:09 -0700 (PDT), Linus Torvalds <[EMAIL PROTECTED]> wrote:

>
> On Wed, 25 Apr 2007, Kenneth Crudup wrote:
> >
> > Any working suspend-to-disk method takes care of that for me. (I'm
> > really not sure why Linus hates S2D so much, though. Back in the day
> > there was a lot more BIOS support, but that's been years now.)
>
> The really sad part is that APM actually did this better..

This really triggers a nerve in me. My laptops (always used models from some
years ago, even) didn't necessarily get easier with respect to power
management (suspend) over time.

My first laptop (Siemens Scenic Mobile 710, 200MHz Pentium, maxed to 192MB
RAM) worked just fine with APM, be it s2ram or s2disk. Everything handled by
the BIOS. Admittedly, S2disk was quite slow as it stored all ram and didn't
write to the disk as fast as possible, but it worked. S2ram was also a viable
option because I was even able to easily swap batteries because the thing had
two bays to put batteries in.

The next one was a Toshiba Portege 7020 CT (366MHz Pentium2 with dynamic
clock, 192MB), supporting both APM and ACPI. Installing Linux was not that
easy, I think I remember that APM in kernel froze the box (early 2.6 kernel),
while ACPI needed some headache to set up (compiling a fixed DSDT into the
kernel, for example)... I needed experimental toshiba_acpi to get functions
and the acpi_pm_timer to get something like continuous system clock (special
cpu throttling has funny effects). Well, I got it together after some time.
Used suspend2 for "snapshot/restore" and actually was able to use ACPI S3 with
the glitch of having to unload/load psmouse driver ... until I realized that
it only resumed in about 80% of cases (BIOS). So suspend2 was a badly needed
"hack" around the hardware/BIOS to get some sane workflow. I remember dealing
with swsusp / pmdisk before... but I really ended up with suspend2 as the
thing that works (and I wouldn't have bothered finding this patch if the
in-kernel stuff worked for me). Of course this was a long time ago and
recently I have seen that in-kernel swsusp works ok, just this
unresponsiveness after "restore" due to missing page cache...

Now I have an IBM ThinkPad X31 (600-1.4GHz Pentium M, 512MB). ACPI. SpeedStep.
The machine generally works fine, hardware config via ACPI seems to be fine.
But doing S3/STR? Well... this machine has the odd idea that turning the
system off but the screen backlight back on after a second is a good idea.
Of course just now S3 worked fine... you cannot even depend on the
malfunction -- could have something to do with changing bootup video from LCD
to VGA output for some other reason recently. Hm. Perhaps it even may work
(after tricking the BIOS!?). But I doubt I'll suddenly develop trust in that.
I _had_ trust in APM STR and STD. I am quite confident in suspend2 being able
to correctly resume (restore) after a successful suspend (snapshot/restore).

And then, STR doesn't help me on the road when I need to exchange the battery
(I'd need this special extra battery to put under the ThinkPad for that).
Another thing is that the old Siemens has a nice auxiliary monochrome LCD that
shows the charge status of the batteries in 5 levels, so you have some means
to predict the time you have in STR. The ThinkPad has a green LED for "battery
level OK" and red for "battery level low". Well, but the Linux kernel won't
change that...

Perhaps at some time ACPI implementations in BIOS get to something reliable
(hm, should I get a PowerBook instead?) and can be a good partner for Linux
which struggles for many years now to get into the post-APM era. Remember
reading desktop PC test reports in the c't magazine in the last years, S3
usually did _not_ work; with Windows, even. Well, there must be a reason
Microsoft chose to implement the "hibernate" (it _is_ in software, right?).

The APM->ACPI transition made me use the software STD (snapshot/restore...;-)
and I think I will stay with it for the foreseeable future, if only because I
can do fancy things like image encryption. ACPI S3 / STR is a nice addition
when it works, for the smaller pauses (changing a train at the station,
leaving office for half an hour...), but I consider STD really to be the more
important feature that enables me to _never_ close my applications unless I
want to do a kernel update. I really must say that some sort of STD is a total
must for a laptop for me.

On the other hand I once had a Psion 5MX, which basically was on STR all the
(non-working) time -- and enabled well over 20h of working time on two AAs.
When laptops enter that range of battery life, I guess I could make do with
just STR and won't have to worry about changing batteries without AC
connection;-)

Alrighty then,

Thomas.
Re: [3/3] 2.6.21-rc7: known regressions (v2)
On Wed, 2007-04-25 at 20:33 -0400, Len Brown wrote:
> On Wednesday 25 April 2007 14:08, john stultz wrote:
> > On Wed, 2007-04-25 at 04:06 -0700, Andrew Morton wrote:
> > > On Mon, 23 Apr 2007 23:49:09 +0200 Adrian Bunk <[EMAIL PROTECTED]> wrote:
> > > > Subject    : acpi_pm clocksource loses time on x86-64
> > > > References : http://lkml.org/lkml/2007/4/17/143
> > > > Submitter  : Mikael Pettersson <[EMAIL PROTECTED]>
> > > > Handled-By : John Stultz <[EMAIL PROTECTED]>
> > > > Status     : problem is being debugged
> > >
> >
> > The ACPI PM one is *really* odd as it's the same clocksource driver on
> > both arches. I had Mikael cut out the clocksource frequency adjustments,
> > and confirmed both i386 and x86_64 are using the same base freq
> > (confirmed via printks).
>
> If this chipset's PM-timer loses "several minutes per hour" on x86_64,
> I would expect it to do the same on i386. I can't imagine what the
> difference could be. Any possibility it is the 24-bit version
> and we do something funky on wraparound?

No, we assume the PM timer wraps at 24 bits and mask it as such on all
systems.

-john
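For readers unfamiliar with the wraparound question, this is roughly what "mask it at 24 bits" means, sketched in user-space C. It is not the kernel's acpi_pm clocksource code; `pm_delta`, `pm_ticks_to_ns` and the macro names are made up for the example. The 3.579545 MHz rate is the ACPI PM timer frequency, so a 24-bit counter wraps roughly every 4.7 seconds.

```c
#include <assert.h>
#include <stdint.h>

#define PM_TIMER_MASK	0x00FFFFFFu	/* treat the counter as 24 bits wide */
#define PM_TIMER_HZ	3579545u	/* ACPI PM timer frequency */

/*
 * Ticks elapsed between two raw readings. Unsigned subtraction followed
 * by the mask gives the right answer across a single wrap of the 24-bit
 * counter, provided readings are taken more often than the wrap period.
 */
uint32_t pm_delta(uint32_t now, uint32_t last)
{
	return (now - last) & PM_TIMER_MASK;
}

/* Convert a tick count (at most one full 24-bit period) to nanoseconds. */
uint64_t pm_ticks_to_ns(uint32_t ticks)
{
	assert(ticks <= PM_TIMER_MASK);
	return (uint64_t)ticks * 1000000000u / PM_TIMER_HZ;
}
```

Without the mask, a reading of 0x000010 taken after 0xFFFFF0 would look like an enormous 32-bit delta instead of 0x20 ticks. Applying the mask unconditionally is safe on hardware with a 32-bit counter too, as long as the timer is sampled often enough.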
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Wed, Apr 25, 2007 at 04:03:44PM -0700, Valerie Henson wrote:
> On Wed, Apr 25, 2007 at 08:54:34PM +1000, David Chinner wrote:
> > On Tue, Apr 24, 2007 at 04:53:11PM -0500, Amit Gud wrote:
> > >
> > > The structure looks like this:
> > >
> > >  ---------        ---------
> > > | cnode 0 |----->| cnode 0 |--> to another cnode or NULL
> > >  ---------        ---------
> > > | cnode 1 |---   | cnode 1 |---
> > >  ---------    |   ---------    |
> > > | cnode 2 |-  |  | cnode 2 |-  |
> > >  ---------  | |   ---------  | |
> > > | cnode 3 | | |  | cnode 3 | | |
> > >  ---------  | |   ---------  | |
> > >      |      | |       |      | |
> > >
> > >         inodes           inodes or NULL
> >
> > How do you recover if fsfuzzer takes out a cnode in the chain? The
> > chunk is marked clean, but clearly corrupted and needs fixing and
> > you don't know what it was pointing at. Hence you have a pointer to
> > a trashed cnode *somewhere* that you need to find and fix, and a
> > bunch of orphaned cnodes that nobody points to *somewhere else* in
> > the filesystem that you have to find. That's a full scan fsck case,
> > isn't it?
>
> Excellent question. This is one of the trickier aspects of chunkfs -
> the orphan inode problem (tricky, but solvable). The problem is what
> if you smash/lose/corrupt an inode in one chunk that has a
> continuation inode in another chunk? A back pointer does you no good
> if the back pointer is corrupted.

*nod*

> What you do is keep tabs on whether you see damage that looks like
> this has occurred - e.g., inode use/free counts wrong, you had to zero
> a corrupted inode - and when this happens, you do a scan of all
> continuation inodes in chunks that have links to the corrupted chunk.

This assumes that you know a chunk has been corrupted, though.
How do you find that out?

> What you need to make this go fast is (1) a pre-made list of which
> chunks have links with which other chunks,

So you add a new on-disk structure that needs to be kept up to
date? How do you trust that structure to be correct if you are
not journalling it? What happens if fsfuzzer trashes part of this
table as well and you can't trust it?

> (2) a fast way to read all
> of the continuation inodes in a chunk (ignoring chunk-local inodes).
> This stage is O(fs size) approximately, but it should be quite swift.

Assuming you can trust this list. If not, finding cnodes is going
to be rather slow.

> > It seems that any sort of damage to the underlying storage (e.g.
> > media error, I/O error or user brain explosion) results in the need
> > to do a full fsck and hence chunkfs gives you no benefit in this
> > case.
>
> I worry about this but so far haven't found something which couldn't
> be cut down significantly with just a little extra work. It might be
> helpful to look at an extreme case.
>
> Let's say we're incredibly paranoid. We could be justified in running
> a full fsck on the entire file system in between every single I/O.
> After all, something *might* have been silently corrupted. But this
> would be ridiculously slow. We could instead never check the file
> system. But then we would end up panicking and corrupting the file
> system a lot. So what's a good compromise?
>
> In the chunkfs case, here's my rules of thumb so far:
>
> 1. Detection: All metadata has magic numbers and checksums.
> 2. Scrubbing: Random check of chunks when possible.
> 3. Repair: When we detect corruption, either by checksum error, file
>    system code assertion failure, or hardware tells us we have a bug,
>    check the chunk containing the error and any outside-chunk
>    information that could be affected by it.

So if you end up with a corruption in a "clean" part of the
filesystem, you may not find out about the corruption on reboot
and fsck? You need to trip over the corruption first before fsck
can be told it needs to check/repair a given chunk? Or do you
need to force a "check everything" fsck in this case?

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: BUG: Null pointer dereference in fs/open.c
On Wed, 25 Apr 2007, Andrew Morton wrote:
On Wed, 25 Apr 2007 22:53:00 + (GMT) William Heimbigner <[EMAIL PROTECTED]> wrote:
On Wed, 25 Apr 2007, Andrew Morton wrote:

OK. I am able to use the pktcdvd driver OK in mainline with a piix/sata
drive. It could be that something is going wrong at the IDE level for you.

Perhaps; I'll try an external usb cd burner, and see where that goes.

Are you able to identify the most recent kernel which actually worked?

No, because I haven't set packet writing up in Linux before - however, I do
know that I've successfully set up packet writing (using 2 of the 3 cd
burners I have) in another operating system before. I'll try 2.6.18 and see
if that gets me anywhere different, though.

OK. A quick summary: mainline's pktcdvd isn't working for William using IDE.
It is working for me using sata.

So what has happened here is that this code, in ide-cd.c's
cdrom_decode_status() is now triggering:

	} else if (blk_pc_request(rq) || rq->cmd_type == REQ_TYPE_ATA_PC) {
		/* All other functions, except for READ. */
		unsigned long flags;

		/*
		 * if we have an error, pass back CHECK_CONDITION as the
		 * scsi status byte
		 */
		if (blk_pc_request(rq) && !rq->errors)
			rq->errors = SAM_STAT_CHECK_CONDITION;

I suspect this is a bug introduced by
406c9b605cbc45151c03ac9a3f95e9acf050808c (in which case it'll be the third
bug so far).

Perhaps the IDE driver was previously not considering these requests to be
of type blk_pc_request(), and after 406c9b605cbc45151c03ac9a3f95e9acf050808c
it _is_ treating them as blk_pc_request() and is incorrectly reporting an
error. Or something like that.

Guys: help!

A follow-up: after looking around a bit, I have managed to get packet writing
to work properly on /dev/hdc (before, it was reporting only 1.8 MB available
or so; this was a formatting issue). I've also gotten the external cd-rw drive
to work. However, I'm still at a loss as to why /dev/hdd won't work.
I tried formatting a dvd-rw for this drive, however, it consistently gives me:

[27342.503933] drivers/ide/ide-cd.c:729: setting error to 2
[27342.509251]  [] show_trace_log_lvl+0x1a/0x30
[27342.514411]  [] show_trace+0x12/0x20
[27342.518864]  [] dump_stack+0x16/0x20
[27342.523317]  [] cdrom_decode_status+0x1f4/0x3b0
[27342.528732]  [] cdrom_newpc_intr+0x38/0x320
[27342.533791]  [] ide_intr+0x96/0x200
[27342.538157]  [] handle_IRQ_event+0x28/0x60
[27342.543139]  [] handle_edge_irq+0xa6/0x130
[27342.548121]  [] do_IRQ+0x49/0xa0
[27342.552228]  [] common_interrupt+0x2e/0x34
[27342.557200]  [] mwait_idle+0x12/0x20
[27342.561653]  [] cpu_idle+0x4a/0x80
[27342.565934]  [] rest_init+0x37/0x40
[27342.570300]  [] start_kernel+0x34b/0x420
[27342.575109]  [<>] 0x0
[27342.578089]  ===

and doesn't work (the above output was generated by Andrew's patch to log
certain areas).

# dvd+rw-format /dev/hdd -force
* BD/DVDRW/-RAM format utility by <[EMAIL PROTECTED]>, version 7.0.
:-( failed to locate "Quick Format" descriptor.
* 4.7GB DVD-RW media in Sequential mode detected.
* formatting 0.0\:-[ READ TRACK INFORMATION failed with SK=3h/ASC=11h/ACQ=05h]: Input/output error

I tried putting in a different dvd-rw, and this time I get:

# dvd+rw-format /dev/hdd -force
* BD/DVDRW/-RAM format utility by <[EMAIL PROTECTED]>, version 7.0.
* 4.7GB DVD-RW media in Sequential mode detected.
* formatting 0.0|:-[ FORMAT UNIT failed with SK=5h/ASC=26h/ACQ=00h]: Input/output error

William Heimbigner
[EMAIL PROTECTED]
Re: Question about Reiser4
On Wed, 25 Apr 2007 19:03:12 +0400, "Edward Shishkin" <[EMAIL PROTECTED]> said:
> [EMAIL PROTECTED] wrote:
>
> >As I understand it, the default Reiser4 DOES NOT USE any compression at
> >all, not even tail compression,
>
> ^tail compression^tail conversion
> Reiser4 does use tail conversion by default.
>
> > but saves space by eliminating block
> >alignment wastage (tail compression is an option).
> >
> >So lets LOSE the statistics that involve compression. The results now
> >look like this:
> >
> >.-------------------------.
> >| FILESYSTEM | TIME |DISK |
> >| TYPE       |(secs)|USAGE|
> >.-------------------------.
> >|REISER4     | 3462 | 692 |
> >|EXT2        | 4092 | 816 |
> >|JFS         | 4225 | 806 |
> >|EXT4        | 4408 | 816 |
> >|EXT3        | 4421 | 816 |
> >|XFS         | 4625 | 779 |
> >|REISER3     | 6178 | 793 |
> >|FAT32       |12342 | 988 |
> >|NTFS-3g     |10414 | 772 |
> >.-------------------------.
> >
> >These results are still EXTREMELY GOOD for REISER4.
>
> Everything is not so simple in the science of testing..
> Would you please change direction of your activity to stressing
> instead of benchmarking? Caught oopses would have great value..
> OK?
>
> Regards,
> Edward.
>

Tail conversion is NOT compression, so what exactly is your point?

By "tail compression" I mean plugin ctail40, but since I was never able
to get it to work, maybe it's not tail compression at all.

--
[EMAIL PROTECTED]

--
http://www.fastmail.fm - Or how I learned to stop worrying and love email again
Re: [PATCH 0/9] Kconfig: cleanup s390 v2.
On Wed, Apr 25, 2007 at 05:24:47PM -0700, Andrew Morton wrote:

 > It would be neat if someone could create and maintain a new
 > scripts/spot-common-mistakes. Feed it a unified diff and it would complain
 > about newly-added code (and only newly-added code) which has busted
 > whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc.

years and years ago, when the dinosaurs roamed the land, I hacked up..
http://janitor.kernelnewbies.org/scripts/
and then left it by the wayside. Some of the checks it did are actually
bogus, but I'm happy to pick that up again if there's interest in it
being a useful tool.

In fact, I should probably munge it together with a similar thing
I wrote at http://www.codemonkey.org.uk/projects/findbugs/
(Warning: scary regexps)

 > It would need to be fairly simple and easily-extensible, as I can
 > imagine quite a few things getting added to it.
 >
 > (Imagines a procmail rule which just bounces the email if
 > spot-common-mistakes failed)

or a git checkin rule that refuses to commit if it fails ;-)

	Dave

--
http://www.codemonkey.org.uk
Re: [RFC][PATCH] syctl for selecting global zonelist[] order
On Thu, 26 Apr 2007 09:31:12 +0900 KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:
> >
> > So a IA64 platform with i386 sicknesses? And pretty bad case of it since I
> > assume that the memory sizes per node are equal. Your solution of taking
> > 4G off node 0 and then going to node 1 first must hurt some
> > processes running on node 0.
> I think so, too. It is because I made this as selectable option.
                         ^ why... sorry.

-Kame
Re: [3/3] 2.6.21-rc7: known regressions (v2)
On Wednesday 25 April 2007 14:08, john stultz wrote: > On Wed, 2007-04-25 at 04:06 -0700, Andrew Morton wrote: > > On Mon, 23 Apr 2007 23:49:09 +0200 Adrian Bunk <[EMAIL PROTECTED]> wrote: > > > > > This email lists some known regressions in Linus' tree compared to 2.6.20. > > > > > > If you find your name in the Cc header, you are either submitter of one > > > of the bugs, maintainer of an affectected subsystem or driver, a patch > > > of you caused a breakage or I'm considering you in any other way > > > possibly involved with one or more of these issues. > > > > > > Due to the huge amount of recipients, please trim the Cc when answering. > > > > > > > > > Subject: HPET enabled freeze my machine at boot > > > workaround: clocksource=acpi_pm > > > References : http://lkml.org/lkml/2007/4/19/370 > > > Submitter : Guilherme Schroeder <[EMAIL PROTECTED]> > > > Caused-By : Thomas Gleixner <[EMAIL PROTECTED]> > > > commit 5d8b34fdcb384161552d01ee8f34af5ff11f9684 > > > Handled-By : John Stultz <[EMAIL PROTECTED]> > > > Status : problem is being debugged > > > > > > > > > Subject: acpi_pm clocksource loses time on x86-64 > > > References : http://lkml.org/lkml/2007/4/17/143 > > > Submitter : Mikael Pettersson <[EMAIL PROTECTED]> > > > Handled-By : John Stultz <[EMAIL PROTECTED]> > > > Status : problem is being debugged > > > > > > > > > Subject: suspend to disk hangs (CONFIG_NO_HZ) > > > References : http://lkml.org/lkml/2007/3/25/217 > > > Submitter : Jeff Chua <[EMAIL PROTECTED]> > > > Status : unknown > > > > That's still rather a lot of bustage from the timekeeping changes. Is > > anything really happening here or have we all given up? > > > The ACPI PM one is *really* odd as its the same clocksource driver on > both arches. I had Mikael cut out the clocksource frequency adjustments, > and confirmed both i386 and x86_64 are using the same base freq > (confirmed via printks). 
If this chipset's PM-timer loses "several minutes per hour" on x86_64,
I would expect it to do the same on i386. I can't imagine what the
difference could be.

Any possibility it is the 24-bit version and we do something funky on
wraparound?

-Len

> It almost seems like when booting x86_64 the ACPI PM counter is running
> slowly!
>
> Len: Have you ever heard of such a thing? It seems quite unlikely...
>
>
> WRT the HPET freeze issue, I'm still digging there. In that case it
> appears the HPET isn't counting, so timekeeping just stops. I was
> thinking it might be HRT messing w/ the wrong HPET registers, but so far
> that hasn't shaken out.
>
> I'll spend some more time on these today and see if we get any further.
>
> thanks
> -john
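For reference on the wraparound question above: the ACPI PM timer ticks at 3.579545 MHz, so the 24-bit variant wraps roughly every 4.69 seconds. A minimal sketch of the masked-delta arithmetic follows, in Python for brevity. The function name and the sample counter values are illustrative assumptions, not the kernel's clocksource code.

```python
PM_TIMER_HZ = 3_579_545      # ACPI PM timer frequency in Hz
MASK_24 = (1 << 24) - 1      # mask for the 24-bit counter variant


def pm_delta(last, now, mask=MASK_24):
    """Ticks elapsed between two raw reads, tolerating a single wrap."""
    return (now - last) & mask


# Time for the 24-bit counter to wrap once, in seconds (about 4.69).
wrap_seconds = (MASK_24 + 1) / PM_TIMER_HZ

# One read just before the wrap, the next just after it.
last, now = 0xFFFFF0, 0x000010
ticks = pm_delta(last, now)  # 0x10 ticks up to the wrap plus 0x10 after it
```

If more than one wrap passes between two reads, the masked delta silently under-counts, which would show up as lost time. Whether that is what happens on the reported machine is not established in this thread.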
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Pavel Machek wrote:
>
> Ok, I guess I'll have nightmares of DMA controllers doing DMAs from
> chips that are no longer there tonight.

Umm. Welcome to the 21st century: we don't do that "separate DMA
controller" thing any more. All devices do their own DMA.

> Only the fact that we are currently using same device call during
> snapshot() and during restore(). We obviously could do _5_ device
> calls
>
> (suspend/resume/freeze/quiesce_disable_dma/thaw)
>
> ...but that looks like too many calls to me.

I'd much rather have five or even more functions that each do *one*
obvious thing.

Think like a device driver writer: would you prefer to just implement five
functions that do something very specific that you know trivially how to
do ("I know how to disable interrupts and DMA") or would you want to do
some high-level operation that you don't even know why the caller asks you
to suspend?

What does "suspend()" even mean when the caller is just going to wake you
up immediately again? Is it performance-critical? Should I tear down all
my DMA's? I dunno!

In other words, splitting things up actually makes things simpler. That's
*doubly* true if you can then give each specific function some really
clear goals.

		Linus
Re: [PATCH 10/28] i386: map enough initial memory to create lowmem mappings
* Jeremy Fitzhardinge ([EMAIL PROTECTED]) wrote:
> Eric W. Biederman wrote:
> > Then why you had to allocate enough pages to cause a failure has me stumped.
> > Perhaps there is some other bug?
>
> Perhaps, but nothing comes to mind. I'll see what happens when I boot
> this kernel on real hardware (rather than kvm).

I was using real hardware with your .config when I reproduced it.
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Wed, 25 Apr 2007, Linus Torvalds wrote:
>
> The *thaw* needs to happen with devices quiescent.

Btw, I sure as hell hope you didn't use "suspend()" for that. You're
(again) much better off having a totally separate function that just
freezes stuff.

So in the "snapshot+shutdown" path, you should have:

 - prepare_to_snapshot() - allocate memory, and possibly return errors

   We can skip this, if we just make the rule be that any devices that
   want to support snapshotting must always have the memory required for
   snapshotting pre-allocated. Most devices really do allocate memory for
   their state anyway, and the only real reason for the "prepare" stage
   here is because the final snapshot has to happen with interrupts off,
   obviously.

   So *if* we don't need to allocate any memory, and if we don't expect to
   want to accept some early error case, this is likely useless.

 - snapshot() - actually save device state that is consistent with the
   memory image at the time. Called with interrupts off, but the device
   has to be usable both before and afterwards!

And I would seriously suggest that "snapshot()" be documented to not rely
on any DMA memory, exactly because the device has to be accessible both
before and after (before - because we're running and allocating memory,
and after - because we'll be writing things out). But see later:

For the "resume snapshot" path, I would suggest having

 - freeze(): quiesce the device. This literally just does the absolute
   minimum to make sure that the device doesn't do anything surprising (no
   interrupts, no DMA, no nothing). For many devices, it's a no-op, even
   if they can do DMA (eg most disk controllers will do DMA, but only as
   an actual result of a request, and upper layers will be quiescent
   anyway, so they do *not* need to disable DMA)

   NOTE! The "freeze()" gets called from the *old* kernel just _before_ a
   snapshot unpacking!!
 - restart_snapshot() - actually restart the snapshot (and usually this
   would involve re-setting the device, not so much trying to restore all
   the saved state. IOW, it's easier to just re-initialize the DMA command
   queues than to try to make them "atomic" in the snapshot).

   NOTE! This gets called by the *new* kernel _after_ the snapshot resume!

And if you *want* to, I can see that you might want to actually do a
"unfreeze()" thing too, and make the actual snapshotting be:

	/* We may not even need this.. */
	for_each_device() {
		err = prepare_to_snapshot();
		if (err)
			return err;
	}

	/* This is the real work for snapshotting */
	cli();
	for_each_device()
		freeze(dev);
	for_each_device()
		snapshot(dev);
	.. snapshot current memory image ..
	for_each_device_depth_first()
		unfreeze(dev);
	sti();

and maybe it's worth it, but I would almost suggest that you just make the
rule be that any DMA etc just *has* to be re-initialized by
"restart_snapshot()", in which case it's not even necessary to
freeze/unfreeze over the device, and "snapshot()" itself only needs to
make sure any non-DMA data is safe.

But adding the freeze/unfreeze (which is a no-op for most hardware anyway)
might make things easier to think about, so I would certainly not *object*
to it, even if I suspect it's not necessary.

Anyway, the restore_snapshot() sequence should be:

	/* Old kernel.. Normal boot, load snapshot image */
	cli()
	for_each_device()
		freeze(dev);
	restore_snapshot_image();
	restore_regs_and_jump_to_image();

	/* noreturn */

	/* New kernel, gets called at the snapshot restore address
	 * with interrupts off and devices frozen, and memory image
	 * consistent with what it was at "snapshot()" time
	 */
	for_each_dev_depth_first()
		restore_snapshot(dev);

	/* And if you want to, just to be "symmetric"
		for_each_dev_depth_first()
			unfreeze(dev)
	   although I think you could just make "restore_snapshot()"
	   implicitly unfreeze it too..
	 */

	sti();
	/* We're up */

and notice how *different* this is from what happens for s2ram.
There really isn't anything in common here. Exactly because s2ram simply
doesn't _have_ any of the issues with atomic memory images.

So s2ram is just

	for_each_dev()
		suspend(dev);
	cli();
	for_each_dev()
		late_suspend(dev);
	.. go to sleep ..
	for_each_dev_depth_first()
		early_resume(dev);
	sti();
	for_each_dev_depth_first()
		resume(dev);

and has none of the "freeze" issues at all.

Doesn't that seem a lot more straightforward? Yes, it's more functions,
but each function is a lot more
Re: [PATCH 0/9] Kconfig: cleanup s390 v2.
On Thursday 26 April 2007, Andrew Morton wrote:
> It would be neat if someone could create and maintain a new
> scripts/spot-common-mistakes. Feed it a unified diff and it would complain
> about newly-added code (and only newly-added code) which has busted
> whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc.

http://patchstylecheck.googlecode.com/svn/trunk/patchstylecheckemail.pl

Might serve as a starting point for this. It doesn't have any semantic
checks right now, but I guess they can be added.

	Arnd <><
Re: [RFC][PATCH] syctl for selecting global zonelist[] order
On Wed, 25 Apr 2007 12:17:15 -0700 (PDT) Christoph Lameter <[EMAIL PROTECTED]> wrote:
> On Wed, 25 Apr 2007, KAMEZAWA Hiroyuki wrote:
>
> > Make zonelist policy selectable from sysctl.
> >
> > Assume 2 node NUMA, only node(0) has ZONE_DMA (ZONE_DMA32).
> >
> > In this case, default (node0's) zonelist order is
> >
> > Node(0)'s NORMAL -> Node(0)'s DMA -> Node(1)'s NORMAL.
> >
> > This means Node(0)'s DMA is used before Node(1)'s NORMAL.
>
> So a IA64 platform with i386 sicknesses? And pretty bad case of it since I
> assume that the memory sizes per node are equal. Your solution of taking
> 4G off node 0 and then going to node 1 first must hurt some
> processes running on node 0.

I think so, too. It is because I made this as selectable option.

> Whatever you do the memory balance between the two nodes is making
> the system behave in an unsymmetric way.

> > In some server, some application uses large memory allocation.
> > This exhaust memory in the above order.
>
> Could we add a boot time option instead that changes the zonelist build
> behavior? Maybe an arch hook that can deal with it?
>
Yes, it's in my plan. I'll add boot option support.

Thanks,
-Kame
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Pavel Machek wrote:

>>> Now, if the old kernel left DMAs running, it could be overwriting
>>> the data we are copying in.
>>
>> The *thaw* needs to happen with devices quiescent.
>>
>> But that sure doesn't have anything to do with the "snapshot()" path. In
>> fact, you'll have rebooted the machine in between.
>
> Only the fact that we are currently using same device call during
> snapshot() and during restore(). We obviously could do _5_ device
> calls
>
> (suspend/resume/freeze/quiesce_disable_dma/thaw)
>
> ...but that looks like too many calls to me.
>
>> So what does that have to do with "snapshotting"?
>
> I'm not comfortable with memory I'm copying changing under my hands
> because of some DMA. It just looks like asking for trouble. I _think_
> we can get away with DMA running during snapshot, because driver may
> not assume anything about the DMA result before it got completion
> interrupt, but...

the key is that with STR you don't need to copy the memory (it's staying
where it is)

for STD you need to copy the memory, and there you halt DMA because you
need to make an atomic snapshot.

David Lang
Re: [PATCH 0/9] Kconfig: cleanup s390 v2.
On Wed, 25 Apr 2007 14:30:11 -0700 Andrew Morton <[EMAIL PROTECTED]> wrote:

> But that only applies to things which I merge. There's heaps of stuff
> coming in via the git trees which is obviously inadequately reviewed - look
> at all the instances of open-coded kernel_thread() which were merged after
> the kthread() API was introduced, for example.
>
>
> And other basic stuff like "use mutexes, not semaphores":
>
> box:/usr/src/25> grep '^+.*[]down[ ]*[(]' patches/git-*.patch | wc -l
> 32
>
>
>
> Ever wonder where all those whitespace bugs are coming from?
>
> box:/usr/src/25> grep '^+.*[]if[(]' patches/git-*.patch | wc -l
> 265
> box:/usr/src/25> grep '^+.*[]while[(]' patches/git-*.patch | wc -l
> 35
>
>
> Code which use spaces where it should be using tabs?
>
> box:/usr/src/25> grep '^+' patches/git-*.patch | wc -l
> 1346
>

It would be neat if someone could create and maintain a new
scripts/spot-common-mistakes. Feed it a unified diff and it would complain
about newly-added code (and only newly-added code) which has busted
whitespace, adds new semaphores, adds new kernel_thread calls, etc, etc.

It would need to be fairly simple and easily-extensible, as I can
imagine quite a few things getting added to it.

(Imagines a procmail rule which just bounces the email if
spot-common-mistakes failed)

>
> Heaven knows how many more serious problems are being snuck into the tree
> via this route.

But it won't solve this problem.
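To make the spot-common-mistakes idea concrete, here is a minimal sketch in Python of a checker that reads a unified diff and inspects only the added lines. Everything in it is an assumption made for illustration: the function name, the check list, and the messages are hypothetical, and the regexps are loose stand-ins for the greps quoted above, not a tool anyone in this thread has written.

```python
import re

# Illustrative checks only; patterns and messages are assumptions.
CHECKS = [
    (re.compile(r"\b(if|while)\("), "missing space before '('"),
    (re.compile(r"\bdown\s*\("), "new semaphore use; consider a mutex"),
    (re.compile(r"\bkernel_thread\s*\("), "open-coded kernel_thread(); consider kthread"),
    (re.compile(r"^ {8}"), "spaces used for indentation instead of tabs"),
    (re.compile(r"[ \t]+$"), "trailing whitespace"),
]

HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def spot_common_mistakes(diff_text):
    """Return (filename, new_line_number, message) for each flagged added line."""
    findings = []
    filename = None
    old_left = new_left = 0  # lines still expected in the current hunk
    new_lineno = 0
    for line in diff_text.splitlines():
        if old_left <= 0 and new_left <= 0:
            # Outside a hunk: only file headers and hunk headers matter.
            if line.startswith("+++ "):
                filename = line[4:].split("\t")[0]
            else:
                m = HUNK_RE.match(line)
                if m:
                    old_left = int(m.group(1) or 1)
                    new_lineno = int(m.group(2))
                    new_left = int(m.group(3) or 1)
            continue
        if line.startswith("+"):
            # Newly added line: the only kind that gets checked.
            text = line[1:]
            for pattern, message in CHECKS:
                if pattern.search(text):
                    findings.append((filename, new_lineno, message))
            new_lineno += 1
            new_left -= 1
        elif line.startswith("-"):
            old_left -= 1  # removed line: never reported
        elif line.startswith("\\"):
            pass  # "\ No newline at end of file" marker
        else:
            # Context line: present in both old and new file.
            old_left -= 1
            new_left -= 1
            new_lineno += 1
    return findings


sample = "\n".join([
    "--- a/drivers/foo.c",
    "+++ b/drivers/foo.c",
    "@@ -10,4 +10,6 @@",
    " static int foo(void)",
    " {",
    "-\tif(x)",
    "+\tif(y)",
    "+\tdown(&sem);",
    "+\twhile (z)",
    " \treturn 0;",
])

for name, lineno, message in spot_common_mistakes(sample):
    print(f"{name}:{lineno}: {message}")
```

Keeping the checks in a flat list of (pattern, message) pairs is one way to meet the "fairly simple and easily-extensible" requirement: adding a check is a one-line change. A procmail rule or commit hook could then reject a patch whenever the returned list is non-empty.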
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
> STR does not need to "ensure that you have a consistent snapshot". Linus I think someone's been spiking your guinness again... > Why? Becuase there is no _room_ for inconsistency. There's nothing to be > "inconsistent with", since any changes to memory (by things like DMA or > other setup that happens while the suspend process is going on) is by > _definition_ consistent with the resume image (becasue there is no > separate image). You bet there is. We need to know if data arrived or not, because there is no guarantee that the data retrieved if we inadvertently re-execute a command will be the same. The hardware state itself isn't the problem, its the combination of hardware state and internal state which need to match in some cases. > off DMA and try to make the hardware be wevy wevy quiet while it's hunting > wabbits, it's a lot easier to just do nothing at all on "freeze", and just > make sure that "thaw" will re-initialze the DMA tables entirely! All Who cares about DMA mapping tables, those are easy to deal with, not even that bad with an IOMMU to restore. More problematic is the users data because if we have a device where re-executing a command is not repeatable (eg O_DIRECT SCSI on a shared bus) then we risk being inconsistent in our S2RAM. If we re-run the command on resume having allowed it to prattle on while doing S2anything then we'll get the wrong answer. Now there are lots of devices we don't care about as they don't have state in the form that causes problems - network cards, TV capture etc, but there are cases where it matters that every operation is either finished or not started and there is no doubt about them getting done during the S2RAM/S2DISK S2DISK/S2RAM both need to synchronize the state of a device so it can build a valid snapshot. That bit is a shared requirement just like you said didn't exist. Doesn't even need to involve turning DMA off, just getting a consistent state. The rest can be quite different. 
Mind you some laptops think S2RAM is just a transition state on the way to disk, leave them in ACPI S2RAM and the firmware will magically turn it into a save to disk and back to ram if the battery runs low or you leave it idle too long.
[PATCH] ia64: race flushing icache in do_no_page path
This is a very similar problem to a copy-on-write cache flushing problem that Tony Luck fixed in July 2006. In this case the do_no_page function handles a fault in an executable or library that is mmapped from an NFS file system. The code is copied into a newly reallocated page. The lazy_mmu_prot_update() function should be used to flush old entries from the icache for that page on ia64 processors. But that call is made after a set_pte_at call that makes the page accessible to other threads executing the same code. This was seen to cause application crashes when an OpenMP application ran many threads calling same functions at the same time. The first thread to reach a page starts to fault in the new code. One of the other threads overtakes the first and executes old data from the icache. That could result in bad instructions. It is more obvious when an old cache line contains prefetched non-instruction bits that result in an illegal instruction trap. The problem has only been seen on montecito processors which have separate level 2 icache and dcache. This dcache to icache coherency problem is more likely to occur there because of the much larger level 2 icache. I suspect that the non-NFS case is working because direct DMA into the new page is making the instruction cache coherent. Any file system that uses a non-DMA copy into the text page could show the same problem. 
Signed-off-by: Mike Stroyan <[EMAIL PROTECTED]>

diff --git a/mm/memory.c b/mm/memory.c
index e7066e7..50c8848 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2291,6 +2291,7 @@ retry:
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		if (write_access)
 			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		lazy_mmu_prot_update(entry);
 		set_pte_at(mm, address, page_table, entry);
 		if (anon) {
 			inc_mm_counter(mm, anon_rss);
@@ -2312,7 +2313,6 @@ retry:
 
 	/* no need to invalidate: a not-present page shouldn't be cached */
 	update_mmu_cache(vma, address, entry);
-	lazy_mmu_prot_update(entry);
 unlock:
 	pte_unmap_unlock(page_table, ptl);
 	if (dirty_page) {

--
Mike Stroyan, [EMAIL PROTECTED]
Re: [PATCH] ide-cs: recognize 2GB CompactFlash from Transcend
On Wed, 25 Apr 2007 11:27:09 +0200 "Aeschbacher, Fabrice" <[EMAIL PROTECTED]> wrote: > Without the following patch, the kernel does not automatically detect > 2GB CompactFlash cards from Transcend. > > I'm not sure which correct values must be assigned to the 3th and 4th > parameters (here: 0x709b1bf1, 0xf54a91c8). Anyway, the patch is working > with these values. Tested on arch=mips. > Thanks. Your patch was wordwrapped and had tabs replaced by spaces, btw. > > === > --- linux-2.6.20.7-orig/drivers/ide/legacy/ide-cs.c 2007-04-15 > 21:08:02.0 +0200 > +++ linux-2.6.20.7/drivers/ide/legacy/ide-cs.c 2007-04-25 > 10:53:53.0 +0200 > @@ -64,6 +64,7 @@ > > #define INT_MODULE_PARM(n, v) static int n = v; module_param(n, int, 0) > > +#define PCMCIA_DEBUG 1 > #ifdef PCMCIA_DEBUG > INT_MODULE_PARM(pc_debug, PCMCIA_DEBUG); > #define DEBUG(n, args...) if (pc_debug>(n)) printk(KERN_DEBUG args) I removed the above change > @@ -399,6 +400,7 @@ > PCMCIA_DEVICE_PROD_ID12("TOSHIBA", "MK2001MPL", 0xb4585a1a, > 0x3489e003), > PCMCIA_DEVICE_PROD_ID1("TRANSCEND512M ", 0xd0909443), > PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS1GCF80", 0x709b1bf1, > 0x2a54d4b1), > + PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS2GCF120", 0x709b1bf1, > 0xf54a91c8), > PCMCIA_DEVICE_PROD_ID12("TRANSCEND", "TS4GCF120", 0x709b1bf1, > 0xf54a91c8), > PCMCIA_DEVICE_PROD_ID12("WIT", "IDE16", 0x244e5994, 0x3e232852), > PCMCIA_DEVICE_PROD_ID12("WEIDA", "TWTTI", 0xcc7cf69c, > 0x212bb918), I'm never sure whether it's Bart or Dominik who handles pcmcia-cs patches. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi! > > > Why? Becuase there is no _room_ for inconsistency. There's nothing to be > > > "inconsistent with", since any changes to memory (by things like DMA or > > > other setup that happens while the suspend process is going on) is by > > > _definition_ consistent with the resume image (becasue there is no > > > separate image). > > > > Do you propose to keep DMAs running while suspending-to-RAM? > > What part of "suspend a chip" do you have trouble with? > > DMA obviously does *not* happen with a suspended device. There's no need > to turn DMA even off - it just doesn't happen! Ok, I guess I'll have nightmares of DMA controllers doing DMAs from chips that are no longer there tonight. > > > For example, the whole myth that "freeze" needs to shut off DMA is a > > > total > > > and utter *myth*. It needs nothing of the sort at all. Rather than shut > > > off DMA and try to make the hardware be wevy wevy quiet while it's > > > hunting > > > wabbits, it's a lot easier to just do nothing at all on "freeze", > > > > No. Sorry, you are wrong here. > > > > Remember that during resume we run > > > > freeze() > > copy old data into memory > > thaw() > > > > Now, if the old kernel left DMAs running, it could be overwriting > > the data we are copying in. > > The *thaw* needs to happen with devices quiescent. > > But that sure doesn't have anythign to do with the "snapshot()" path. In > fact, you'll have rebooted the machine in between. Only the fact that we are currently using same device call during snapshot() and during restore(). We obviously could do _5_ device calls (suspend/resume/freeze/quiesce_disable_dma/thaw) ...but that looks like too many calls to me. > So what does that have to do with "snapshotting"? I'm not comfortable with memory I'm copying changing under my hands because of some DMA. It just looks like asking for trouble. 
I _think_ we can get away with DMA running during snapshot, because driver may not assume anything about the DMA result before it got completion interrupt, but...

							Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH] i386: For debugging, make the initial page table setup less forgiving.
Eric W. Biederman wrote: >> The issue is not a matter of avoiding duplicate work, but making sure >> all the pagetables are consistent from Xen's perspective. >> >> Specifically, you may not ever, at any time, create a writable mapping >> of a page which is currently part of an active pagetable. This means >> that when we're creating mappings of physical memory, the pages which >> are part of the current pagetable must be mapped RO. The easiest way I >> found to guarantee that is to copy the Xen-provided pagetable as a >> template, and only update pages which are missing. >> > > Hmm. I now see your problem. > > >> The other way I could do this is to have special-purpose init-time >> version of xen_set_pte which checks to see if it's making a RO mapping >> RW, and refuse to do it. That would minimize the changes to mm/init.c, >> but give init-time set_pte rather unexpected hidden semantics. >> > > Yes. However how do we handle attempting to create this kind > of mapping when mmap /dev/mem? or /dev/kmem? > Hm, I hadn't thought about that. I'm not sure that /dev/k?mem is very useful in an unprivileged guest, but I guess its useful for debugging or stats or something. It's tricky to tell whether an arbitrary pfn is part of a pagetable or not; there's a PG_PINNED page flag to tell you if its active, but iff you've already determined its a pagetable page. > I'm pretty certain there are other paths through the kernel where > we can get page table mapping. > > Right now by leaving things read-only you are hiding from the kernel > what you are really trying to do. That makes me distinctly > uncomfortable. In general when things get swept under the rug > we can never handle the properly. Although this issue may be small > enough it doesn't matter. > Well, the general idea is that in a paravirtualized environment pagetable pages need special handling. Different hypervisors need different handling, but they all need something special. 
The paravirt hooks are intended to capture all the interesting events, without over-constraining what special thing the hypervisor wants to do at that point. That's why I went for the "allow the hypervisor to provide a prototype pagetable, and avoid the bits it has already set up"; it allows it to do whatever it wants, without getting too specific about what that is, and retains a fairly straightforward interface.

> I suspect what we want to do is come up with a function to call
> to test to see if a page should be read-only and map such pages
> _PAGE_KERNEL_RO, or _PAGE_KERNEL_RO_EXEC if it's code.
>

Hm, I think that's a hard function to write in general. For the special
case of pagetable_init it wouldn't be too hard, but it doesn't seem like
a big improvement over the current state of affairs.

> Speaking of things what are paravirt_alloc_pd and parafirt_alloc_pd
> supposed to do?
>

(alloc_pd and alloc_pt)

Broadly speaking, they tell the hypervisor that there's a new page about
to be attached to the pagetable. Xen uses it as the hook to map those
pages RO if the pagetable is active. VMI (and lguest?) use it to tell
the hypervisor's shadow pagetable machinery that there's something new
to track.

    J
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Pavel Machek wrote:
> >
> > Why? Because there is no _room_ for inconsistency. There's nothing to be
> > "inconsistent with", since any changes to memory (by things like DMA or
> > other setup that happens while the suspend process is going on) is by
> > _definition_ consistent with the resume image (because there is no
> > separate image).
>
> Do you propose to keep DMAs running while suspending-to-RAM?

What part of "suspend a chip" do you have trouble with?

DMA obviously does *not* happen with a suspended device. There's no need
to turn DMA even off - it just doesn't happen!

> > For example, the whole myth that "freeze" needs to shut off DMA is a total
> > and utter *myth*. It needs nothing of the sort at all. Rather than shut
> > off DMA and try to make the hardware be wevy wevy quiet while it's hunting
> > wabbits, it's a lot easier to just do nothing at all on "freeze",
>
> No. Sorry, you are wrong here.
>
> Remember that during resume we run
>
> freeze()
> copy old data into memory
> thaw()
>
> Now, if the old kernel left DMAs running, it could be overwriting
> the data we are copying in.

The *thaw* needs to happen with devices quiescent.

But that sure doesn't have anything to do with the "snapshot()" path. In
fact, you'll have rebooted the machine in between.

So what does that have to do with "snapshotting"?

		Linus
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
On Thu, 26 Apr 2007, Pavel Machek wrote:
>
> > For suspend to ram, in contrast, since you *know* that nobody will be
> > touching the hardware, and since the timings are very different anyway
> > (you'd hope that you can resume in a second or two), you'd generally want
> > to keep the DMA engine tables right where they are, and just literally
> > suspend the PCI chip itself.
>
> I'd actually prefer resume to be similar to module insert, too... Do
> you think that resume is _that_ time critical?

I think it probably depends on the device, and it should depend on the driver writer how he wants to do it.

My _point_ is that there is absolutely zero reason to think that the two events are the same. We *know* that for snapshot+shutdown, we need to actually keep the DMA tables intact *over* the snapshot (because writing out the snapshot may _need_ them). But exactly because we keep them intact, a driver writer may sanely say "I didn't even bother shutting them down, so on thaw, I cannot trust them, so I'll just re-initialize them entirely".

In contrast, over suspend-to-ram, it's entirely reasonable to just leave them in memory, and just keep them. There's no *reason* not to.

And that's my whole point in this argument: the two paths are fundamentally totally different. You *claim* that "snapshot()" needs to stop DMA etc, but that's simply not true. So I claim:

 - for a lot of devices, it's actually a *lot* easier to just have snapshot not do anything at all, and re-initialize on thaw

 - for those same devices, for s2ram, since the tables are *safe* and don't get touched by anything else, it's probably easier to just let them be.

See? The "it's easier to do X" is a _different_ X for the two cases. So the whole "suspend is a superset of freeze" is simply not true.

> [I'd like you to drop me a line saying you understand current design
> and that it works -- even if it is very inelegant]

I _do_ understand the current design. I just think that it's totally and seriously broken.
I've ranted against it before. I think it's stupid to play like you're "suspending" something just to save some state, especially since most users probably don't even *want* to suspend the state, and would quite happily re-initialize the chip instead.

And I think it's horrible to have a dynamic flag to tell the difference between two or more state changes that the devices should statically be able to determine.

_If_ some driver really does have the same routine, just use the same routine. There are no downsides to splitting them up.

> Now, we can separate suspend/freeze and resume/thaw (with some common
> helpers). It will speed the code up by avoiding unnecessary
> operations. It also needs attention from driver writers (ouch).
>
> Do we want to do that?

I'd personally certainly want to do that. But I want to split up the callers too. Right now we mix those a lot as well. I suspect that would automatically be fixed by just forcing them to separate out (since they now call different functions of the devices), but I'm not 100% sure. There might be other issues.

Just as an example: one of the most painful things there is in the suspend sequence is that we shut off the console (because the console device will be suspended in hw, and it's thus not safe to use it over a suspend/resume sequence). That should just go away entirely for "snapshot()", because there is *never* any excuse for actually turning off the console during a snapshot: even a network console should be entirely functional.

Things like that - things that sound like small issues, but that really aren't.
(Right now you can enable the "don't disable the console" config option, but since network drivers will actually shut down etc, it just means that you'll have oopses etc if you do, and you have netconsole enabled)

Linus
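To make the argument above concrete, here is a minimal sketch of what "separate entry points per event" could look like for a driver. It is only an illustration under stated assumptions: `struct toy_dev`, `struct toy_pm_ops` and every `toy_*` function are hypothetical names invented for this example and are not the kernel's driver-model API. The snapshot hook deliberately does nothing, thaw rebuilds the state it cannot trust, and the suspend-to-RAM pair leaves the tables in memory untouched.

```c
#include <stdbool.h>

/* Hypothetical device state, for illustration only. */
struct toy_dev {
	bool dma_tables_valid;	/* DMA tables in RAM can be trusted */
	bool powered;		/* chip is powered up */
};

/* Hypothetical callback table: one entry per distinct event,
 * instead of a single suspend() that takes a dynamic flag. */
struct toy_pm_ops {
	int (*snapshot)(struct toy_dev *dev);	/* before the image is taken */
	int (*thaw)(struct toy_dev *dev);	/* after snapshot or image restore */
	int (*suspend)(struct toy_dev *dev);	/* entering suspend-to-RAM */
	int (*resume)(struct toy_dev *dev);	/* leaving suspend-to-RAM */
};

/* Snapshot: do nothing, so the device stays usable for writing the image. */
static int toy_snapshot(struct toy_dev *dev)
{
	(void)dev;
	return 0;
}

/* Thaw: the old state cannot be trusted, so re-initialize everything. */
static int toy_thaw(struct toy_dev *dev)
{
	dev->dma_tables_valid = true;	/* stands in for rebuilding the tables */
	dev->powered = true;
	return 0;
}

/* Suspend-to-RAM: power the chip down, leave the tables where they are. */
static int toy_suspend(struct toy_dev *dev)
{
	dev->powered = false;
	return 0;
}

/* Resume from RAM: power up again; the tables were never touched. */
static int toy_resume(struct toy_dev *dev)
{
	dev->powered = true;
	return 0;
}

static const struct toy_pm_ops toy_ops = {
	.snapshot = toy_snapshot,
	.thaw     = toy_thaw,
	.suspend  = toy_suspend,
	.resume   = toy_resume,
};
```

A caller would invoke `toy_ops.snapshot()` then `toy_ops.thaw()` on the suspend-to-disk path, and `toy_ops.suspend()` then `toy_ops.resume()` on the suspend-to-RAM path; a driver whose two paths really are identical could simply point both pairs at the same functions.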
Re: Linux 2.6.20.8
From: Greg KH <[EMAIL PROTECTED]>
Date: Wed, 25 Apr 2007 16:52:10 -0700

> Because I haven't been applying any network-related patches unless you
> forward them to me, based on what happened the last time I did that
> without asking :)

:-) I'm trying not to be too controlling and stay out of the way every once in a while :)

> So, sorry, I didn't realize this was a big issue, can you forward the
> needed patches to me? I'll do a new release with them in it after I get
> back from dinner.

I'll send it to you under separate cover, thanks a lot Greg.
Re: Linux 2.6.20.8
On Wed, Apr 25, 2007 at 04:29:44PM -0700, David Miller wrote:
> From: Greg KH <[EMAIL PROTECTED]>
> Date: Wed, 25 Apr 2007 14:22:25 -0700
>
> > We (the -stable team) are announcing the release of the 2.6.20.8 kernel.
> > This release has a security bugfix so any users of kernels older than
> > 2.6.20.7 are highly encouraged to upgrade as soon as possible.
>
> Greg, Yoshifuji sent you an ipv6 security fix of nearly
> equal severity yesterday.
>
> Why did you leave that out?

Because I haven't been applying any network-related patches unless you forward them to me, based on what happened the last time I did that without asking :)

So, sorry, I didn't realize this was a big issue, can you forward the needed patches to me? I'll do a new release with them in it after I get back from dinner.

thanks,

greg k-h
Re: 2.6.21-rc7-mm1: BUG_ON in kthread_bind during _cpu_down
On Thu, 26 Apr 2007 01:10:21 +0200 "Rafael J. Wysocki" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> The BUG_ON in kthread_bind (line 165 in kthread.c) triggers for me during
> attempted suspend to disk, when disable_nonboot_cpus() calls _cpu_down()
> (on x86_64).

I guess the backtrace would be pretty important here.

Guys, please don't add BUG_ONs unless there is simply no sane way to recover. Because when someone goofs up, the BUG_ON will kill the whole machine and everyone else who has code being tested in -mm loses a tester.

Plus a BUG_ON *greatly* decreases our chances of getting a trace from the tester: dead box, nothing in the logs.

--- a/kernel/kthread.c~fix-kthread_create-vs-freezer-theoretical-race-dont-be-obnoxious
+++ a/kernel/kthread.c
@@ -162,7 +162,10 @@ EXPORT_SYMBOL(kthread_create);
  */
 void kthread_bind(struct task_struct *k, unsigned int cpu)
 {
-	BUG_ON(k->state != TASK_UNINTERRUPTIBLE);
+	if (k->state != TASK_UNINTERRUPTIBLE) {
+		WARN_ON(1);
+		return;
+	}
 	/* Must have done schedule() in kthread() before we set_task_cpu */
 	wait_task_inactive(k);
 	set_task_cpu(k, cpu);
_
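The pattern in the patch above (complain loudly, refuse the operation, keep the machine alive) can be sketched outside the kernel as follows. This is a userspace illustration only: `TOY_WARN_ON`, `struct toy_task` and `toy_bind()` are hypothetical stand-ins and not the kernel's `WARN_ON()` or `kthread_bind()`. Unlike the real `kthread_bind()`, which returns void, the sketch returns a status purely so the refusal is observable.

```c
#include <stdio.h>

/* Userspace stand-in for WARN_ON(): report the condition on stderr and
 * evaluate to 1 if it was true, without terminating the program. */
#define TOY_WARN_ON(cond) \
	((cond) ? (fprintf(stderr, "WARNING: %s at %s:%d\n", \
			   #cond, __FILE__, __LINE__), 1) : 0)

enum toy_state { TOY_RUNNING, TOY_UNINTERRUPTIBLE };

struct toy_task {
	enum toy_state state;
	int cpu;
};

/* Bind a task to a CPU. If the precondition does not hold, warn and
 * back out instead of killing everything, so the caller (and the
 * tester's logs) survive. Returns 0 on success, -1 if refused. */
static int toy_bind(struct toy_task *t, int cpu)
{
	if (TOY_WARN_ON(t->state != TOY_UNINTERRUPTIBLE))
		return -1;
	t->cpu = cpu;
	return 0;
}
```

Calling `toy_bind()` on a task that is still `TOY_RUNNING` prints a warning with file and line and leaves the task's CPU unchanged, which is the property the message argues for: the report still reaches the logs.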
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi!

> > Both of them have to ensure you can make a consistent snapshot.
>
> Bzzt. Wrong again. Very much so.
>
> STR does not need to "ensure that you have a consistent snapshot".
>
> Why? Because there is no _room_ for inconsistency. There's nothing to be
> "inconsistent with", since any changes to memory (by things like DMA or
> other setup that happens while the suspend process is going on) is by
> _definition_ consistent with the resume image (because there is no
> separate image).

Do you propose to keep DMAs running while suspending-to-RAM? That sounds really unsafe; we are shutting down our PCI controllers at that time; doing that while DMAs are running sounds bad.

> That's TOTALLY DIFFERENT from "suspend to disk". In suspend to disk, you
> need a completely different kind of mindset, namely you need a single
> consistent image, where the image is consistent not only with memory, but
> with all the devices.
>
> For example, the whole myth that "freeze" needs to shut off DMA is a total
> and utter *myth*. It needs nothing of the sort at all. Rather than shut
> off DMA and try to make the hardware be wevy wevy quiet while it's hunting
> wabbits, it's a lot easier to just do nothing at all on "freeze",

No. Sorry, you are wrong here.

Remember that during resume we run

	freeze()
	copy old data into memory
	thaw()

Now, if the old kernel left DMAs running, it could be overwriting the data we are copying in.

It is not about DMA tables. While resuming, the CPU needs to be alone, without interference from DMA engines (or other CPUs), because copying back the old image means writing to memory that was not properly allocated.

(Now, we could add one more hook, turn_off_dmas_for_copyback(), but that looks like way too many hooks to me. And I'm not comfortable with DMA engines running while I'm trying to copy the image. They may be overwriting data I'm trying to copy...)
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Nonfunctional ethernet (was Re: 2.6.21-rc7-mm1 + sysfs-oops-workaround.patch -- INFO: possible recursive locking detected)
On Wed, 2007-04-25 at 22:48 +0800, Antonino A. Daplas wrote:
> On Wed, 2007-04-25 at 14:18 +0900, Tejun Heo wrote:
> > Miles Lane wrote:
> eth0 renamed to eth54
> BUG: atomic counter underflow at:
> [] show_trace_log_lvl+0x1a/0x30
> [] show_trace+0x12/0x14
> [] dump_stack+0x16/0x18
> [] _atomic_dec_and_lock+0x29/0x4c
> [] dput+0x34/0x103
> [] sysfs_drop_dentry+0x141/0x149
> [] sysfs_hash_and_remove+0x89/0x10e
> [] sysfs_remove_link+0xe/0x10
> [] device_rename+0x110/0x181
> [] dev_change_name+0x11e/0x1ca
> [] dev_ifsioc+0x330/0x3d7
> [] dev_ioctl+0x350/0x46e
> [] sock_ioctl+0x1be/0x1ca
> [] do_ioctl+0x1c/0x53
> [] vfs_ioctl+0x1ec/0x203
> [] sys_ioctl+0x49/0x62
> [] sysenter_past_esp+0x5f/0x99
> ===

The above trace was caused by CONFIG_SYSFS_DEPRECATED=y; after setting this to n, the trace disappeared.

Still, all my network cards are non-functional. Entries in /sys/class/net are bogus:

/ # cd /sys/class/net/
/sys/class/net # ls
eth1 eth44 eth54 lo
/sys/class/net # cd eth1
-bash: cd: eth1: No such file or directory
/sys/class/net # ls -l eth1
lrwxrwxrwx 1 root root 0 Apr 26 07:15 eth1 -> ../../devices/pci:00/:00:12.0/net/eth0
/sys/class/net # cd ../../devices/pci\:00/\:00\:12.0/net/eth0
-bash: cd: ../../devices/pci:00/:00:12.0/net/eth0: No such file or directory

Do you know of any patches I need to revert/apply? Anyway, I have to boot back into this kernel and find out more about what's going on.

Tony
Re: W1 printk format warning
On Wed, 25 Apr 2007 16:21:04 -0700 Randy Dunlap <[EMAIL PROTECTED]> wrote:

> in 2.6.21-rc7-mm1. Are you aware of this?
>
> drivers/w1/w1.c:460: warning: too few arguments for format
>
> dev_dbg(>dev, "%s: registering %s as %p.\n", __func__,
> >dev.bus_id[0]);
>

Yeah, that's because Dan's dev_dbg-check-dev_dbg-arguments.patch added printk arg-checking to dev_dbg() and a bunch of bugs got exposed. I fixed a few of them.

Incidentally, there are at least four different drivers which privately do things like:

#if !defined(DEBUG)
#undef dev_dbg
static inline int __attribute__ ((format (printf, 2, 3))) dev_dbg(
	const struct device *_dev, const char *fmt, ...)
{return 0;}
#endif

which can all be removed with Dan's (good) patch in place.
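For readers unfamiliar with the mechanism behind that warning, the following is a small userspace sketch of printf-style argument checking. It assumes GCC or Clang, which both support the `format` function attribute; `struct toy_device` and `toy_dev_dbg()` are hypothetical names for this example and are not the kernel's `dev_dbg()`. In `format(printf, 2, 3)`, parameter 2 is the format string and the arguments to check start at parameter 3.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical minimal device type, for illustration only. */
struct toy_device {
	const char *name;
};

/* Ask the compiler to check the variadic arguments against the format
 * string, exactly as it does for printf() itself. */
static int toy_dev_dbg(const struct toy_device *dev, const char *fmt, ...)
	__attribute__((format(printf, 2, 3)));

/* Print "<device name>: <formatted message>" to stderr and return the
 * number of characters written. */
static int toy_dev_dbg(const struct toy_device *dev, const char *fmt, ...)
{
	va_list ap;
	int n;

	n = fprintf(stderr, "%s: ", dev->name);
	va_start(ap, fmt);
	n += vfprintf(stderr, fmt, ap);
	va_end(ap);
	return n;
}

/*
 * With the attribute in place and -Wformat (part of -Wall) enabled, a
 * call with three conversions but only two arguments, such as
 *
 *	toy_dev_dbg(&dev, "%s: registering %s as %p.\n", __func__, name);
 *
 * is diagnosed at compile time instead of reading garbage at run time.
 */
```

This also shows why the check only helps if it is applied to the stub as well: a debug macro that compiles to nothing when DEBUG is off would otherwise hide the mismatch until someone builds with debugging enabled.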
Re: [PATCH] i386: For debugging, make the initial page table setup less forgiving.
Jeremy Fitzhardinge <[EMAIL PROTECTED]> writes:

> Eric W. Biederman wrote:
>> No. Please just remove the conditionals on the leaf pages.
>>
>
> So, to be specific, you mean make updating the pte_t entries (and pmd_t
> entries which refer to hugepages) unconditional?

I mean make updating pte_t and pmd_t entries that refer to identity mapped physical pages unconditional.

>> We know exactly what we require them to be, there is minimal
>> cost and no downside to just setting the pte entries to
>> what we want them to be for the identity mapping.
>>
>> It doesn't make sense for paravirtualization or anything else to
>> influence that.
>>
>> This may be redoing work that has been done before but it is
>> doing it all in one common place.
>>
>
> The issue is not a matter of avoiding duplicate work, but making sure
> all the pagetables are consistent from Xen's perspective.
>
> Specifically, you may not ever, at any time, create a writable mapping
> of a page which is currently part of an active pagetable. This means
> that when we're creating mappings of physical memory, the pages which
> are part of the current pagetable must be mapped RO. The easiest way I
> found to guarantee that is to copy the Xen-provided pagetable as a
> template, and only update pages which are missing.

Hmm. I now see your problem.

> The other way I could do this is to have special-purpose init-time
> version of xen_set_pte which checks to see if it's making a RO mapping
> RW, and refuse to do it. That would minimize the changes to mm/init.c,
> but give init-time set_pte rather unexpected hidden semantics.

Yes. However how do we handle attempting to create this kind of mapping when mmapping /dev/mem? or /dev/kmem? I'm pretty certain there are other paths through the kernel where we can get page table mapping.

Right now by leaving things read-only you are hiding from the kernel what you are really trying to do. That makes me distinctly uncomfortable.
In general when things get swept under the rug we can never handle them properly. Although this issue may be small enough that it doesn't matter.

I suspect what we want to do is come up with a function to call to test to see if a page should be read-only and map such pages _PAGE_KERNEL_RO, or _PAGE_KERNEL_RO_EXEC if it's code.

Speaking of which, what are paravirt_alloc_pt and paravirt_alloc_pd supposed to do?

Eric
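The suggestion above, a predicate that decides whether a page must be mapped read-only, with the protection chosen from its answer, might be sketched like this. Everything here is a hypothetical model for illustration: `enum toy_prot` merely stands in for protection values such as _PAGE_KERNEL_RO and _PAGE_KERNEL_RO_EXEC, and `struct toy_page_info` assumes the caller already knows whether a page belongs to the active pagetable, which is exactly the part the reply to this message calls hard to write in general.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel page protection values. */
enum toy_prot {
	TOY_KERNEL,		/* writable data */
	TOY_KERNEL_EXEC,	/* writable, executable */
	TOY_KERNEL_RO,		/* read-only data */
	TOY_KERNEL_RO_EXEC	/* read-only code */
};

/* Hypothetical per-page facts the mapping code is assumed to know. */
struct toy_page_info {
	bool in_active_pagetable;	/* page is part of a live pagetable */
	bool is_kernel_text;		/* page holds code */
};

/* The predicate: under the rule quoted above, a page that is part of
 * an active pagetable must never get a writable mapping. */
static bool toy_page_should_be_ro(const struct toy_page_info *p)
{
	return p->in_active_pagetable;
}

/* Pick the protection for an identity mapping in one common place,
 * rather than silently leaving existing entries untouched. */
static enum toy_prot toy_identity_prot(const struct toy_page_info *p)
{
	if (toy_page_should_be_ro(p))
		return p->is_kernel_text ? TOY_KERNEL_RO_EXEC : TOY_KERNEL_RO;
	return p->is_kernel_text ? TOY_KERNEL_EXEC : TOY_KERNEL;
}
```

The design point is that the read-only decision becomes explicit and visible to the generic code, instead of being an unexpected side effect of a set_pte variant that refuses some updates.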
Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2: hang in atomic copy)
Hi!

> > Current design is:
>
> Broken. Yes. I've tried to tell you.

Ok.

...

> It's worse than just confusing, it's *idiotic*.
>
> It _can_ work in practice, but
> - we have pretty damn solid evidence that it doesn't work all that often
>   in practice
> - the fact that something *can* be done the stupid way is in no way an
>   argument that it *should* be done the stupid way.
>
> I claim that the current STD is *stupid*. Yes, it can work. But that
> doesn't make it less stupid.

Good. So you understand how it works.

> What's your argument? Your argument seems to be that it's not stupid,
> because it can work. Can't you see that that simply isn't an
> argument at

I tried keeping the module_init/thaw/resume code similar, so that driver authors can debug suspend-to-disk, cross their fingers, and have suspend-to-ram work, too.

Now, perhaps enough people do std/str these days so this is not important any longer... let's hope so.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html