where is the code for read system call?
My application reads from socket. I need to change the behavior of read system call for an experiment. Can someone point me to code? thanks - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570
On 7/21/07, Greg KH [EMAIL PROTECTED] wrote:
> On Fri, Jul 20, 2007 at 03:59:12PM -0700, Andrew Morton wrote:
> > On Fri, 20 Jul 2007 15:50:47 -0700 Greg KH [EMAIL PROTECTED] wrote:
> > > On Fri, Jul 20, 2007 at 06:32:21PM +0200, Michal Piotrowski wrote:
> > > > Hi Greg,
> > > >
> > > > This looks like a sysfs bug
> > > > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/3.jpg
> > > >
> > > > l *kernel_param_sysfs_setup+0x75
> > > > 0xc13c0894 is in kernel_param_sysfs_setup (kernel/params.c:570).
> > > > 565             mk->mod = THIS_MODULE;
> > > > 566             kobj_set_kset_s(mk, module_subsys);
> > > > 567             kobject_set_name(&mk->kobj, name);
> > > > 568             kobject_init(&mk->kobj);
> > > > 569             ret = kobject_add(&mk->kobj);
> > > > 570             BUG_ON(ret < 0);
> > > > 571             param_sysfs_setup(mk, kparam, num_params, name_skip);
> > > > 572             kobject_uevent(&mk->kobj, KOBJ_ADD);
> > > > 573     }
> > > > 574
> > > >
> > > > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/mm-config
> > >
> > > What kernel version is this happening on? The -mm tree? Can you try
> > > Linus's tree instead?
> > >
> > > It looks like there was some needed information right before the
> > > first stack dump, showing exactly what kobject was trying to be added
> > > that was already present.
> > >
> > > Odds are this is a kernel parameter with the same name as a duplicate
> > > one within the same module,

I don't think that's an -EEXIST. I think what we have here is
kobject_add() exiting with -EINVAL. (kobject attempted to be
registered with no name!)

[ The first trace on that screen shows: kobject_shadow_add+0x5b/0x189.
That's the WARN_ON(1) at lib/kobject.c:176. If it was a EEXIST case,
we would've seen an offset in kobject_shadow_add closer to 0x189,
because the dump_stack() for EEXIST is barely 4 instructions before
we return from that function. ]

> > > but the trick is going to be trying to figure out what module is
> > > causing this.

So I'd guess we want to search for a module that's passing a
kobject * to kobject_add() such that !kobj->k_name is true.

> > > So it's not a sysfs bug, but rather a driver issue that this is
> > > catching.
> >
> > In that case a BUG was way too harsh treatment, and in fact directly
> > contributed to our inability to debug the bug!
> >
> > Can we wind that back a bit? Add some useful printks and then recover
> > in some fashion?
>
> [...]
> So I'm guessing he was trying to catch something specific here.

Considering that:

(1) This isn't a bug that should bring down the kernel that hard, and,
(2) kobject_shadow_add() seems to be dumping enough stacks and
    printing printk's on errors already,

I'd suggest to just get rid of the BUG_ON() in kernel_param_sysfs_setup()

Satyam
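The suggestion in this message (drop the BUG_ON() and let the caller report and recover) can be sketched in userspace. This is not kernel code: `toy_kobject`, `toy_kobject_add()` and `toy_param_setup()` are hypothetical stand-ins, assumed here only to mirror the "-EINVAL on a missing name" behaviour that the thread describes.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a kobject: only the name matters here. */
struct toy_kobject {
	const char *name;
};

/*
 * Hypothetical analogue of kobject_add(): an object with no name
 * (NULL or empty string) is rejected with -EINVAL.
 */
int toy_kobject_add(const struct toy_kobject *kobj)
{
	if (!kobj || !kobj->name || kobj->name[0] == '\0')
		return -EINVAL;
	return 0;
}

/*
 * Caller that logs the failure and hands the error back instead of
 * halting. A BUG_ON(ret < 0) at this point would stop everything
 * before the message could be read.
 */
int toy_param_setup(const char *modname)
{
	struct toy_kobject kobj = { .name = modname };
	int ret = toy_kobject_add(&kobj);

	if (ret < 0) {
		fprintf(stderr, "toy_param_setup: cannot add '%s' (%d), skipping\n",
			modname ? modname : "(null)", ret);
		return ret;
	}
	return 0;
}
```

With this shape, an empty module name produces one diagnostic line and a negative return value, and the remaining modules can still be set up.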
Re: [PATCH] x86: Create clflush() inline, remove hardcoded wbinvd
Glauber de Oliveira Costa wrote:
> On Fri, 2007-07-20 at 14:19 -0700, H. Peter Anvin wrote:
> > Create an inline function for clflush(), with the proper arguments,
> > and use it instead of hard-coding the instruction.
> >
> > This also removes one instance of hard-coded wbinvd, based on a patch
> > by Bauder de Oliveira Costa.
>
> Hey, Who's that guy that got a name so close to mine? ;-)

That would be Mr. Typo!

> > Cc: Andi Kleen [EMAIL PROTECTED]
> > Cc: Glauber de Oliveira Costa [EMAIL PROTECTED]

I got it right here at least :-/

	-hpa
Re: [PATCH] compat_ioctl requires CONFIG_BLOCK
On Saturday 21 July 2007, Sebastian Siewior wrote:
> Got with randconfig
>
> include/linux/loop.h:66: error: expected specifier-qualifier-list before 'request_queue_t'
> make[1]: *** [fs/compat_ioctl.o] Error 1
>
> parts of compat ioctl require CONFIG_BLOCK to be set.
>
> Signed-off-by: Sebastian Siewior [EMAIL PROTECTED]
>
> Index: b/fs/compat_ioctl.c
> ===================================================================
> --- a/fs/compat_ioctl.c
> +++ b/fs/compat_ioctl.c
> @@ -63,7 +63,9 @@
>  #include <linux/wireless.h>
>  #include <linux/atalk.h>
>  #include <linux/blktrace_api.h>
> +#ifdef CONFIG_BLOCK
>  #include <linux/loop.h>
> +#endif

Adding #ifdef around an #include is considered bad style. Better just
make loop.h compile without any conditionals. Does the below patch
work for you?

	Arnd

--- a/include/linux/loop.h
+++ b/include/linux/loop.h
@@ -63,7 +63,7 @@ struct loop_device {
 	struct task_struct	*lo_thread;
 	wait_queue_head_t	lo_event;
 
-	request_queue_t		*lo_queue;
+	struct request_queue	*lo_queue;
 	struct gendisk		*lo_disk;
 	struct list_head	lo_list;
 };
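Why the loop.h change helps can be shown with a small sketch: a pointer member only needs the struct tag, so it compiles even when neither the definition nor a typedef such as request_queue_t is visible. The names `toy_loop_device` and `toy_loop_has_queue()` are hypothetical and used for illustration only.

```c
#include <stddef.h>

/*
 * Neither a definition nor a typedef for request_queue is visible here,
 * as in a build where the block-layer header provides nothing.
 * Declaring the tag is enough to form a pointer to it.
 */
struct request_queue;

/* Hypothetical, trimmed-down stand-in for struct loop_device. */
struct toy_loop_device {
	int lo_number;
	struct request_queue *lo_queue;	/* pointer to an incomplete type */
};

/* Returns 1 if a queue is attached, 0 otherwise. */
int toy_loop_has_queue(const struct toy_loop_device *lo)
{
	return lo->lo_queue != NULL;
}
```

Code that only stores or compares the pointer builds fine; only code that dereferences it needs the full definition.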
Re: [PATCH v3] pcmcia: CompactFlash driver for PA Semi Electra boards
On Thu, 5 Jul 2007 09:49:14 -0500 [EMAIL PROTECTED] (Olof Johansson) wrote:
> Driver for the CompactFlash slot on the PA Semi Electra eval board. It's
> a simple device sitting on localbus, with interrupts and detect/voltage
> control over GPIO.
>
> The driver is implemented as an of_platform driver, and adds localbus
> as a bus being probed by the of_platform framework.
>
> Signed-off-by: Olof Johansson [EMAIL PROTECTED]
>
> ---
>
> On Mon, Jun 25, 2007 at 03:43:41PM -0500, olof wrote:
> > The ifdef is needed since for CONFIG_PCMCIA=n builds, the bus notifier
> > isn't available. I wanted to do the bus notifier registration
> > explicitly before the of_platform bus probe to avoid later surprises
> > due to reordered initcalls in case it was split up in its own
> > initcall.
> >
> > I could add the code under ifdef as well, but it didn't seem too
> > critical. Once the second major board comes along I'll probably move
> > it out to a per-board file, there's no real need for it just yet.
>
> Alright, turns out I still need to declare the extern bus type, which
> would mean two #ifdefs in one function. Moving it out instead. I've
> addressed Milton's comments as well.
>
> Who's maintaining PCMCIA? MAINTAINERS only lists a mailing list, no
> person. Seems weird for a component that's marked as maintained.

Dominik Brodowski. He's having a bit of downtime at present (exams, I
think). He expects to return. Meanwhile, cc'ing me usually has some
effect.

> ...
>
> +static const char driver_name[] = "electra-cf";
>
> ...
>
> +static struct of_device_id electra_cf_match[] =
> +{
> +	{
> +		.compatible   = "electra-cf",
> +	},
> +	{},
> +};

Could have reused driver_name[] here, if that was appropriate.

> +static struct of_platform_driver electra_cf_driver =
> +{
> +	.name	   = (char *)driver_name,

ug. But it's not your fault - we should have always made it const.

> --- mainline.orig/arch/powerpc/platforms/pasemi/setup.c
> +++ mainline/arch/powerpc/platforms/pasemi/setup.c

I never know who maintains random-scruffy-ppc code like this. From a
peek in the git-whatchanged output, it appears to be yourself.

Have a few little fixies:

--- a/drivers/pcmcia/electra_cf.c~pcmcia-compactflash-driver-for-pa-semi-electra-boards-fix
+++ a/drivers/pcmcia/electra_cf.c
@@ -201,9 +201,7 @@ static int __devinit electra_cf_probe(st
 	if (!cf)
 		return -ENOMEM;
 
-	init_timer(&cf->timer);
-	cf->timer.function = electra_cf_timer;
-	cf->timer.data = (unsigned long) cf;
+	setup_timer(&cf->timer, electra_cf_timer, (unsigned long)cf);
 	cf->irq = NO_IRQ;
 
 	cf->ofdev = ofdev;
@@ -340,16 +338,14 @@ static int __devexit electra_cf_remove(s
 	return 0;
 }
 
-static struct of_device_id electra_cf_match[] =
-{
+static struct of_device_id electra_cf_match[] = {
 	{
 		.compatible   = "electra-cf",
 	},
 	{},
 };
 
-static struct of_platform_driver electra_cf_driver =
-{
+static struct of_platform_driver electra_cf_driver = {
 	.name	   = (char *)driver_name,
 	.match_table    = electra_cf_match,
 	.probe	  = electra_cf_probe,
@@ -371,4 +367,3 @@ module_exit(electra_cf_exit);
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR ("Olof Johansson [EMAIL PROTECTED]");
 MODULE_DESCRIPTION("PA Semi Electra CF driver");
-
_
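The first hunk of the fixup folds three statements into one setup_timer() call. A userspace sketch of that equivalence follows; `toy_timer`, `toy_init_timer()`, `toy_setup_timer()` and `toy_tick()` are hypothetical stand-ins, not the kernel's timer API, and the pointer-in-unsigned-long argument simply copies the idiom visible in the patch (it assumes a platform where a pointer fits in an unsigned long).

```c
#include <stddef.h>

/* Hypothetical stand-in for a timer with a callback and one argument. */
struct toy_timer {
	void (*function)(unsigned long);
	unsigned long data;
	int initialized;
};

/* Analogue of init_timer(): prepares the timer, callback still unset. */
void toy_init_timer(struct toy_timer *t)
{
	t->function = NULL;
	t->data = 0;
	t->initialized = 1;
}

/* Analogue of setup_timer(): init plus callback and argument at once. */
void toy_setup_timer(struct toy_timer *t, void (*fn)(unsigned long),
		     unsigned long data)
{
	toy_init_timer(t);
	t->function = fn;
	t->data = data;
}

/* Example callback: increments the int that its argument points to. */
void toy_tick(unsigned long data)
{
	++*(int *)data;
}
```

Both spellings leave the timer in the same state; the one-call form just removes two chances to forget an assignment.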
Re: [RFC, Announce] Unified x86 architecture, arch/x86
On Fri, 20 Jul 2007 18:38:39 -0400 Jeff Garzik [EMAIL PROTECTED] wrote:
> I agree with Andi... it's quite nice to be able to leave some arch/i386
> stuff, and not carry it over to arch/x86-64.

It's easy enough to push that stuff into arch/x86/legacy and have one
subdirectory of stuff to pull in for ancient systems.

Alan
Re: [RFC, Announce] Unified x86 architecture, arch/x86
Hi,

On 21/07/07, Thomas Gleixner [EMAIL PROTECTED] wrote:
> We are pleased to announce a project we've been working on for some
> time: the unified x86 architecture tree, or arch/x86 - and we'd like to
> solicit feedback about it.
>
> What is this about?
[..]
> As usual, comments and suggestions are welcome!

I really like this idea - code duplication is a bad thing.

BTW. I don't see any regression here :)

> Thomas, Ingo

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
Re: [RFC, Announce] Unified x86 architecture, arch/x86
* Michal Piotrowski [EMAIL PROTECTED] wrote:

> > We are pleased to announce a project we've been working on for some
> > time: the unified x86 architecture tree, or arch/x86 - and we'd like
> > to solicit feedback about it.
> >
> > What is this about?
> [..]
> > As usual, comments and suggestions are welcome!
>
> I really like this idea - code duplication is a bad thing.
>
> BTW. I don't see any regression here :)

cool - could you tell us a bit more about on what type of box you tried
it, and how wide and versatile the .config is?

	Ingo
Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570
Oh, which means ...

On 7/21/07, Satyam Sharma [EMAIL PROTECTED] wrote:
> On 7/21/07, Greg KH [EMAIL PROTECTED] wrote:
> > On Fri, Jul 20, 2007 at 03:59:12PM -0700, Andrew Morton wrote:
> > > On Fri, 20 Jul 2007 15:50:47 -0700 Greg KH [EMAIL PROTECTED] wrote:
> > > > On Fri, Jul 20, 2007 at 06:32:21PM +0200, Michal Piotrowski wrote:
> > > > > Hi Greg,
> > > > >
> > > > > This looks like a sysfs bug
> > > > > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/3.jpg
> > > > >
> > > > > l *kernel_param_sysfs_setup+0x75
> > > > > 0xc13c0894 is in kernel_param_sysfs_setup (kernel/params.c:570).
> > > > > 565             mk->mod = THIS_MODULE;
> > > > > 566             kobj_set_kset_s(mk, module_subsys);
> > > > > 567             kobject_set_name(&mk->kobj, name);

Shouldn't the return of kobject_set_name() be checked here?

[ Looking at code, and realizing that kobject_set_name() manages
to succeed even when given a null string! ]

> > > > > 568             kobject_init(&mk->kobj);
> > > > > 569             ret = kobject_add(&mk->kobj);
> > > > > 570             BUG_ON(ret < 0);
> > > > > 571             param_sysfs_setup(mk, kparam, num_params, name_skip);
> > > > > 572             kobject_uevent(&mk->kobj, KOBJ_ADD);
> > > > > 573     }
> > > > > 574
> > > > >
> > > > > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/mm-config
> > > >
> > > > What kernel version is this happening on? The -mm tree? Can you
> > > > try Linus's tree instead?
> > > >
> > > > It looks like there was some needed information right before the
> > > > first stack dump, showing exactly what kobject was trying to be
> > > > added that was already present.
> > > >
> > > > Odds are this is a kernel parameter with the same name as a
> > > > duplicate one within the same module,
>
> I don't think that's an -EEXIST. I think what we have here is
> kobject_add() exiting with -EINVAL. (kobject attempted to be
> registered with no name!)
>
> [ The first trace on that screen shows: kobject_shadow_add+0x5b/0x189.
> That's the WARN_ON(1) at lib/kobject.c:176. If it was a EEXIST case,
> we would've seen an offset in kobject_shadow_add closer to 0x189,
> because the dump_stack() for EEXIST is barely 4 instructions before
> we return from that function. ]
>
> > > > but the trick is going to be trying to figure out what module is
> > > > causing this.
>
> So I'd guess we want to search for a module that's passing a
> kobject * to kobject_add() such that !kobj->k_name is true.

Oh, that's kernel_param_sysfs_setup itself. So we actually need to
search for a built-in module in Michal's config that ... has an ...
empty modname !?

Shouldn't that turn up pretty quickly in a grep? How do I do that, btw?
Re: [RFC, Announce] Unified x86 architecture, arch/x86
Alan Cox wrote:
> On Fri, 20 Jul 2007 18:38:39 -0400 Jeff Garzik [EMAIL PROTECTED] wrote:
> > I agree with Andi... it's quite nice to be able to leave some
> > arch/i386 stuff, and not carry it over to arch/x86-64.
>
> It's easy enough to push that stuff into arch/x86/legacy and have one
> subdirectory of stuff to pull in for ancient systems.

The other thing is that legacy in this context is fungible. No IOMMU
was legacy until the Intel x86-64 chips came out, and I can promise you
that some legacy code will be necessary once we start seeing VIA and
others come out with embedded x86-64.

On the other hand, it's pretty bloody safe to assume that we'll never
see an x86-64 chip without CPUID, CMOV, FXSAVE, SSE-2, CMPXCHG, etc.

	-hpa
Re: joydev.c and saitek cyborg evo force
On 20/06/07, Jiri Kosina [EMAIL PROTECTED] wrote: Could you please send me the report descriptor of the device, so that I could debug it locally here? Hi Jiri, sorry for the delay, below the report descriptor and attached is the full report when I've connected the joystick. report descriptor (size 851, read 851) = 05 01 09 04 a1 01 09 01 a1 00 85 06 09 30 15 00 26 00 10 35 00 46 00 10 75 10 95 01 81 02 09 31 81 02 05 02 09 bb 26 ff 00 46 ff 00 75 08 81 02 05 09 19 01 29 0c 25 01 45 01 75 01 95 0c 81 02 05 01 09 39 25 07 46 3b 01 55 00 65 44 75 04 95 01 81 42 65 00 05 02 09 ba 26 ff 00 46 ff 00 75 08 81 02 c0 05 0f 09 92 a1 02 85 02 09 a6 09 a4 09 a0 09 9f 25 01 45 00 75 01 95 04 81 02 75 04 95 01 81 03 09 22 75 07 25 09 81 02 09 94 75 01 25 01 81 02 75 08 81 03 c0 09 21 a1 02 85 0b 09 22 25 09 91 02 09 25 a1 02 09 26 09 30 09 32 09 31 09 33 09 34 09 40 09 41 15 01 25 08 91 00 c0 09 53 25 0c 75 05 91 02 09 56 15 00 25 01 75 01 91 02 09 55 a1 02 05 01 09 30 09 31 95 02 91 02 c0 05 0f 09 50 27 fe ff 00 00 47 fe ff 00 00 75 10 95 01 55 fd 66 01 10 91 02 55 00 65 00 09 57 26 ff 00 46 68 01 75 08 65 44 91 02 65 00 09 54 27 fe ff 00 00 47 fe ff 00 00 75 10 55 fd 66 01 10 91 02 55 00 65 00 09 58 a1 02 05 0a 09 01 09 02 26 2b 01 45 00 95 02 91 02 c0 05 0f 09 a7 27 fe ff 00 00 47 fe ff 00 00 95 01 55 fd 66 01 10 91 02 55 00 65 00 c0 09 5a a1 02 85 0c 09 23 26 2b 01 45 00 91 02 09 5c 26 10 27 46 10 27 55 fd 66 01 10 91 02 55 00 65 00 09 5b 25 7f 75 08 91 02 09 5e 26 10 27 75 10 55 fd 66 01 10 91 02 55 00 65 00 09 5d 25 7f 75 08 91 02 c0 09 73 a1 02 85 0d 09 23 26 2b 01 45 00 75 10 91 02 09 70 15 81 25 7f 36 f0 d8 46 10 27 75 08 91 02 c0 09 6e a1 02 85 0e 09 23 15 00 26 2b 01 35 00 45 00 75 10 91 02 09 70 25 7f 46 10 27 75 08 91 02 09 6f 15 81 36 f0 d8 91 02 09 71 15 00 26 ff 00 35 00 46 68 01 91 02 09 72 26 10 27 46 10 27 75 10 55 fd 66 01 10 91 02 55 00 65 00 c0 09 5f a1 02 85 0f 09 23 26 2b 01 45 00 91 02 09 61 15 9c 25 64 36 f0 d8 46 10 27 75 08 91 02 09 62 91 02 09 60 16 
0c fe 26 f4 01 75 10 91 02 09 65 15 00 26 e8 03 35 00 91 02 09 63 25 64 75 08 91 02 09 64 91 02 c0 09 77 a1 02 85 51 09 22 25 09 45 00 91 02 09 78 a1 02 09 7b 09 79 09 7a 15 01 25 03 91 00 c0 09 7c 15 00 26 fe 00 91 02 c0 09 92 a1 02 85 52 09 96 a1 02 09 9a 09 99 09 97 09 98 09 9b 09 9c 15 01 25 06 91 00 c0 c0 05 ff 0a 01 03 a1 02 85 40 0a 02 03 a1 02 1a 11 03 2a 20 03 25 10 91 00 c0 0a 03 03 15 00 27 ff ff 00 00 75 10 91 02 c0 05 0f 09 7d a1 02 85 43 09 7e 26 80 00 46 10 27 75 08 91 02 c0 09 85 a1 02 85 44 09 86 27 ff ff 00 00 45 00 75 10 91 02 09 87 91 02 09 88 91 02 c0 05 ff 0a 00 01 a1 02 85 81 05 01 09 30 15 81 25 7f 36 f0 d8 46 10 27 75 08 91 02 09 31 91 02 c0 05 0f 09 7f a1 02 85 0b 09 80 15 00 26 ff 7f 35 00 45 00 75 0f b1 03 09 a9 25 01 75 01 b1 03 09 83 26 ff 00 75 08 b1 03 09 84 25 10 b1 03 09 a8 a1 02 09 73 09 6e 09 5a 09 5f 95 04 b1 03 c0 c0 c0 cheers, --renato Reclaim your digital rights, eliminate DRM, learn more at http://www.defectivebydesign.org/what_is_drm joy-dmesg.log.gz Description: GNU Zip compressed data
Re: [RFC, Announce] Unified x86 architecture, arch/x86
On 21/07/07, Ingo Molnar [EMAIL PROTECTED] wrote:
>
> * Michal Piotrowski [EMAIL PROTECTED] wrote:
>
> > > We are pleased to announce a project we've been working on for some
> > > time: the unified x86 architecture tree, or arch/x86 - and we'd
> > > like to solicit feedback about it.
> > >
> > > What is this about?
> > [..]
> > > As usual, comments and suggestions are welcome!
> >
> > I really like this idea - code duplication is a bad thing.
> >
> > BTW. I don't see any regression here :)
>
> cool - could you tell us a bit more about on what type of box you tried
> it,

it is an old P4 (i386)

> and how wide and versatile the .config is?

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.22-git15/config

>         Ingo

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
Re: Documentation for sysfs, hotplug, and firmware loading.
On Thursday 19 July 2007 4:16:17 am Cornelia Huck wrote:
> On Wed, 18 Jul 2007 13:39:53 -0400,
> Rob Landley [EMAIL PROTECTED] wrote:
>
> > Nope. If you recurse down under /sys/class following symlinks, you go
> > into an endless loop bouncing off of /sys/devices and getting pointed
> > back. If you don't follow symlinks, it works fine up until about
> > 2.6.20 at which point things that were previously directories BECAME
> > symlinks because the directories got moved, and it all broke.
>
> I have no idea what you're doing.

See the email to kay sievers. In 2.6.14 following symlinks hit an
endless /sys/block/hda/device/block/device/block/device/block...

This has changed since, like much of sysfs, but in the absence of either
a spec or a stable API there's no guarantee it won't reoccur.

> > Which is why I want it documented where to look for these suckers.
> > Just give me ONE STABLE WAY TO FIND THIS INFORMATION, PLEASE.
>
> See Documentation/sysfs-rules.txt.

Ok:

Paragraph 1: It's not stable.

Paragraph 2: It's not stable.

Paragraph 3: If you really really need to access it directly...

Paragraph 4: DO NOT DO $XXX.

Paragraph 5: Expect it to be mounted at /sys

Paragraph 6: DO NOT DO $XXX. (Specifically, the way you were
distinguishing between block and char devices? Don't do that. No, we
won't tell you what to replace it with, keep reading.)

So far, not exactly gripping reading.

Paragraph 7: What a devpath is. Ok, is it just me or does it say that
applications shouldn't use the symlinks in sysfs? Why are they there,
then?

Paragraph 8: The kernel has a name for the device.

Paragraph 9: Subsystem is a string. What it means, we leave for you to
guess.

Paragraph 10: Driver is the name of a driver. (Does this mean a driver
is currently loaded and handling the device, or that the kernel is
suggesting a driver based on something like PCI ID, through the kind of
mechanism that used to be used to request module loading? Experimentally,
it looks like the first, which makes sense but isn't specified. Does
something like /sys/class/mem/zero have a driver? Experimentally, no, it
hasn't got a device link.)

Paragraph 11: Attributes, and yet more DO NOT DO $XXX. It took me three
reads of that to figure out they probably meant "Attributes belong to a
device, don't confuse the attributes of another device with attributes
of this device." (Following _which_ device symlink?)

Ok, back up. /sys/devices does not contain all the information necessary
to populate /dev, because it hasn't got things like ramdisks, /dev/zero,
/dev/console which are THERE in sysfs, which may or may not be supported
by the kernel (the kernel might have ramdisk support, might not). These
things could also, in future, have their major and minor numbers
dynamically (even randomly) assigned. That's been discussed on this list.

I'm not trying to document /sys/devices. I'm trying to document hotplug,
populating /dev, and things like firmware loading that fall out of that.
This requires use of sysfs, and I'm only trying to document as much of
sysfs as you need to do that. I'm not documenting stuff like
/sys/devices/system/cpu.

The consensus so far is "the udev implementation is the spec", except I
watched the udev implementation change rather a lot before I stopped
tracking it, and saw a number of people complain on this list about
things breaking when they upgraded the kernel but not udev.

Back to reading the document:

  - Properties of parent devices never belong into a child device.

Belong into?

  Always look at the parent devices themselves for determining device
  context properties.

For determining? What was the original language of this document?

  If the device 'eth0' or 'sda' does not have a driver-link, then this
  device does not have a driver.

Again, whether they mean "the kernel was not built with a driver that
can handle this device" or "no driver is currently loaded and handling
this device". It _sounds_ like "this device is not supported by Linux",
which probably isn't what they meant.

  Never copy any property of the parent-device into a child-device.

I note that the only mention made so far of parent-child relationships
in devices is in terms of don'ts. I assume they're talking about how a
partition can be the child of a block device, and a network controller
card can be the child of a pci bus device?

Ah, I see. The next paragraph is on hierarchy, yet doesn't actually
explain anything, other than to imply that the device hierarchy being
fully represented there is a dream to be achieved sometime in the future
but not necessarily the truth with today's kernels, because stuff is
still being _moved_ into /sys/devices.

  - Classification by subsystem
    There are currently three places for classification of devices:
    /sys/block, /sys/class and /sys/bus.

So if somebody wants to write code that runs on a current kernel, they
have no alternative but to look in these three places. If future kernels
Re: v2.6.22.1-rt3
On Thu, 2007-07-19 at 20:37 -0700, Daniel Walker wrote:
> The broken out series is here,
>
> ftp://source.mvista.com/pub/dwalker/rt/patch-2.6.22.1-rt4-dw1.tar.gz

I'll pick that up soon.

Thanks,

	tglx
Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570
On 21/07/07, Satyam Sharma [EMAIL PROTECTED] wrote:
> Oh, which means ...
>
> On 7/21/07, Satyam Sharma [EMAIL PROTECTED] wrote:
> > On 7/21/07, Greg KH [EMAIL PROTECTED] wrote:
> > > On Fri, Jul 20, 2007 at 03:59:12PM -0700, Andrew Morton wrote:
> > > > On Fri, 20 Jul 2007 15:50:47 -0700 Greg KH [EMAIL PROTECTED] wrote:
> > > > > On Fri, Jul 20, 2007 at 06:32:21PM +0200, Michal Piotrowski wrote:
> > > > > > Hi Greg,
> > > > > >
> > > > > > This looks like a sysfs bug
> > > > > > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/3.jpg
> > > > > >
> > > > > > l *kernel_param_sysfs_setup+0x75
> > > > > > 0xc13c0894 is in kernel_param_sysfs_setup (kernel/params.c:570).
> > > > > > 565             mk->mod = THIS_MODULE;
> > > > > > 566             kobj_set_kset_s(mk, module_subsys);
> > > > > > 567             kobject_set_name(&mk->kobj, name);
>
> Shouldn't the return of kobject_set_name() be checked here?
>
> [ Looking at code, and realizing that kobject_set_name() manages
> to succeed even when given a null string! ]
>
> > > > > > 568             kobject_init(&mk->kobj);
> > > > > > 569             ret = kobject_add(&mk->kobj);
> > > > > > 570             BUG_ON(ret < 0);
> > > > > > 571             param_sysfs_setup(mk, kparam, num_params, name_skip);
> > > > > > 572             kobject_uevent(&mk->kobj, KOBJ_ADD);
> > > > > > 573     }
> > > > > > 574
> > > > > >
> > > > > > http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/mm-config
> > > > >
> > > > > What kernel version is this happening on? The -mm tree? Can you
> > > > > try Linus's tree instead?
> > > > >
> > > > > It looks like there was some needed information right before the
> > > > > first stack dump, showing exactly what kobject was trying to be
> > > > > added that was already present.
> > > > >
> > > > > Odds are this is a kernel parameter with the same name as a
> > > > > duplicate one within the same module,
> >
> > I don't think that's an -EEXIST. I think what we have here is
> > kobject_add() exiting with -EINVAL. (kobject attempted to be
> > registered with no name!)
> >
> > [ The first trace on that screen shows: kobject_shadow_add+0x5b/0x189.
> > That's the WARN_ON(1) at lib/kobject.c:176. If it was a EEXIST case,
> > we would've seen an offset in kobject_shadow_add closer to 0x189,
> > because the dump_stack() for EEXIST is barely 4 instructions before
> > we return from that function. ]
> >
> > > > > but the trick is going to be trying to figure out what module is
> > > > > causing this.
> >
> > So I'd guess we want to search for a module that's passing a
> > kobject * to kobject_add() such that !kobj->k_name is true.
>
> Oh, that's kernel_param_sysfs_setup itself. So we actually need to
> search for a built-in module in Michal's config that ... has an ...
> empty modname !?

I'll try to figure out this

> Shouldn't that turn up pretty quickly in a grep? How do I do that, btw?

Regards,
Michal

--
LOG
http://www.stardust.webpages.pl/log/
Re: [RFC, Announce] Unified x86 architecture, arch/x86
On Saturday 21 July 2007, Thomas Gleixner wrote:
> The topic of sharing more x86 code has been discussed on LKML a number
> of times. Various approaches were discussed and we decided to advance
> the discussion by implementing a full solution that brings the
> transition to a shared tree to completion.

Great stuff. I've worked on doing the same for s390 and powerpc in the
past, and really think it's the right thing to do. I've even started my
own x86 merge two or three times in the past but never got very far
because of the quickly moving source.

> In this initial implementation the old arch/i386 and arch/x86_64 trees
> are removed _immediately_, in the same commit, and all future x86
> development goes on in the new, shared tree. So the transition right
> now is one atomic operation.
>
> As a next step we plan to generate a gradual, fully bisectable, fully
> working switchover from the current code to the fully populated
> arch/x86 tree. It will result in about 1000-2000 commits.
>
> We are releasing our current solution because it 100% represents the
> finally resulting arch/x86 source tree already, and we first wanted to
> make sure that the new architecture layout works fine and folks are
> happy before we go and do the (even more complex) fine-grained work.

I don't think it's really good to do it this way, or maybe I'm still
misunderstanding where you're going.

If you really want to end up with the exact set of files that you have
in your tree now, I see absolutely zero point in making it bisectable.
On the contrary, there is nothing particularly complicated in it, so
once it has seen some amount of testing it can better get merged in one
big changeset.

I'm just not convinced that it actually is what we want to end up with.
In my experience, it's very helpful to have a single set of header
files, and merging the two versions of one header usually exposes bugs
that have been fixed in only one of the two, so you get to fix actual
bugs in the process.

In the s390 merge, I also started out in an attempt to guarantee
unchanged object files, much like what you describe. However, it turned
out that fixing it in the process is actually easier. Either way,
'diff -D __x86_64__' is a great tool for a start, you should try it out
to see how easy it is to merge a lot of files.

To put it into perspective, I think the s390 merge was a lot easier than
the x86 merge, because there is only a very limited set of hardware
configurations for s390 compared to others. We ended up doing the full
merge with three people within less than a week and no separate files at
all. OTOH, the powerpc merge is now going into its third year, mostly
because it was started with the intention to remove all cruft in the
process and to only allow sane code into the new architecture.

The steps that I'd suggest instead are:

* merge all exported header files of the two architectures. This alone
  is a worthy goal, because it allows us to get rid of the ugly code
  for deciding which version to use in installed headers and elsewhere.

* Merge the remaining header files, to end up with a single
  include/asm-x86 directory.

* Come up with a model that integrates the machine type selection of
  i386 with the way we build things on x86_64. One way would be to make
  X86_64 another platform next to X86_PC, X86_VOYAGER and the others.

* Create an arch/x86/Kconfig that handles the new common configuration

* Create an arch/x86/Makefile that descends into ../i386/* and
  ../x86_64/* instead of its subdirectories.

* Merge the arch/x86/* subdirectories, one at a time, starting with the
  low-hanging fruit like oprofile or pci, and do the hard ones like mm
  and kernel last.

Unfortunately, I don't think I'll spend much time on this, so I don't
get to decide on it, but you asked for feedback ;-)

	Arnd
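For readers who have not used the 'diff -D __x86_64__' approach mentioned in this message, here is a sketch of the kind of merged file GNU diff produces: shared lines appear once, and lines that differ are wrapped in a preprocessor group keyed on the macro. `TOY_BITS_PER_LONG` and `toy_bits_per_long()` are hypothetical names, not taken from any kernel header.

```c
/*
 * Shape of a file merged with 'diff -D __x86_64__ <32-bit> <64-bit>':
 * the 32-bit-only line lands in the #ifndef branch, the 64-bit-only
 * line in the #else branch.
 */
#ifndef __x86_64__
#define TOY_BITS_PER_LONG 32
#else /* __x86_64__ */
#define TOY_BITS_PER_LONG 64
#endif /* __x86_64__ */

/* Identical in both inputs, so it appears once, outside any #ifdef. */
int toy_bits_per_long(void)
{
	return TOY_BITS_PER_LONG;
}
```

The merged output is only a starting point; the conditional groups it leaves behind are the places where the two versions still have to be reconciled by hand.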
Re: [git patches] two warning fixes
On Fri, 2007-07-20 at 20:34 +0200, Krzysztof Halasa wrote:
> Linus Torvalds [EMAIL PROTECTED] writes:
> > More people *should* generally ask themselves: was the warning worth it? and then, if the answer is no, they shouldn't add code, they should remove the thing that causes the warning in the first place.
> Sure. If a routine uses must_check yet its return value may be safely ignored then that must_check is simply misplaced and should be removed. It does not mean all must_checks are bad - each of them isn't bad unless one can demonstrate it is. Back to sysfs_create_bin_file() - if one can demonstrate a caller can safely ignore the return value (which, it seems, is the case), then exactly this very must_check should be removed

Typically, the EDID creation in radeonfb :-) In fact, I'm not even sure there's -any- user of those sysfs files. I added them back then to allow distros to extract the EDID infos that were probed by radeonfb to properly configure the X server (because on some machines, the EDID is coming from the firmware/BIOS, not from DDC, and X can't get at it). I don't know if they ever used them.

In any case, it doesn't make sense to abort initialization of the driver if for some reason those files can't be created (for example, the core fbdev starts exposing EDID files, radeonfb isn't properly updated, name clash, error). Aborting the initialization will make sure that on some machines such as powermacs with radeon, whatever error is displayed will never be seen by the user.

That's a typical example, but I have plenty more. For example, the powermac thermal control drivers. They work pretty well by themselves. They also expose via sysfs all the current values, fan speeds, temps, etc... for the sake of whoever wants to do a GUI or monitor what's going on, but that is not critical to the operation of the driver. Thus, failure to create those files is not critical. I have plenty of other examples.

Thus, we have two choices here:
- The simple one: sysfs_create_blah() displays a warning when it fails and has no must_check
- The one that adds code everywhere (the current one): sysfs_create_blah() returns an error, has must_check, and thus all callers like I described above need to add code to test it and print a warning. Lots of added .text and .data for little benefit.

Cheers, Ben.
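The two choices Ben lists can be sketched in userspace C. This is only an illustration under stated assumptions: fake_create() is a hypothetical stand-in for a sysfs_create_*() call, the file name "edid1" is made up, and the must_check macro here simply wraps the compiler's warn_unused_result attribute, which is what the kernel's __must_check is built on.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* The compiler attribute behind the kernel's __must_check annotation. */
#define must_check __attribute__((warn_unused_result))

/* Hypothetical stand-in for a sysfs_create_*() call: fails on a name clash. */
int fake_create(const char *name, int clash)
{
	assert(name != NULL);
	return clash ? -EEXIST : 0;
}

/* Choice 1: the helper warns by itself and carries no must_check,
 * so callers are free to ignore the result. */
int create_file_warn(const char *name, int clash)
{
	int err = fake_create(name, clash);

	if (err)
		fprintf(stderr, "warning: could not create %s (%d)\n", name, err);
	return err;
}

/* Choice 2: the helper is silent and annotated, so every caller has to
 * test the result and print its own warning. */
must_check int create_file_checked(const char *name, int clash)
{
	return fake_create(name, clash);
}

/* Driver-style init where the sysfs file is optional (choice 1). */
int driver_init_choice1(int clash)
{
	create_file_warn("edid1", clash);	/* result deliberately ignored */
	return 0;				/* initialization continues */
}

/* The same init under choice 2: extra code at every call site. */
int driver_init_choice2(int clash)
{
	int err = create_file_checked("edid1", clash);

	if (err)
		fprintf(stderr, "warning: could not create edid1 (%d)\n", err);
	return 0;				/* initialization continues */
}
```

Both init functions keep the driver alive when the file cannot be created; the difference is only where the test and the warning live, which is the .text/.data cost the message is about.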
[GIT PULL] MMC updates
Linus, please pull from

        git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc.git for-linus

to receive the following updates:

 MAINTAINERS                 |    7 ++++++-
 drivers/mmc/host/at91_mci.c |   13 ++++++++++++-
 drivers/mmc/host/sdhci.c    |    2 ++
 drivers/mmc/host/sdhci.h    |    1 +
 4 files changed, 21 insertions(+), 2 deletions(-)

Marc Pignat (1):
      mmc: at91_mci: wakeup on card insertion (or removal)

Pierre Ossman (2):
      mmc: add maintainer for at91
      sdhci: make sure to clear the error interrupt

diff --git a/MAINTAINERS b/MAINTAINERS
index fbe0dca..c9fab2b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -645,7 +645,12 @@ W:	http://linux-atm.sourceforge.net
 S:	Maintained
 
 ATMEL AT91 MCI DRIVER
-S:	Orphan
+P:	Nicolas Ferre
+M:	[EMAIL PROTECTED]
+L:	[EMAIL PROTECTED] (subscribers-only)
+W:	http://www.atmel.com/products/AT91/
+W:	http://www.at91.com/
+S:	Maintained
 
 ATMEL MACB ETHERNET DRIVER
 P:	Haavard Skinnemoen
diff --git a/drivers/mmc/host/at91_mci.c b/drivers/mmc/host/at91_mci.c
index 28c8818..15aab37 100644
--- a/drivers/mmc/host/at91_mci.c
+++ b/drivers/mmc/host/at91_mci.c
@@ -903,8 +903,10 @@ static int __init at91_mci_probe(struct platform_device *pdev)
 	/*
 	 * Add host to MMC layer
 	 */
-	if (host->board->det_pin)
+	if (host->board->det_pin) {
 		host->present = !at91_get_gpio_value(host->board->det_pin);
+		device_init_wakeup(&pdev->dev, 1);
+	}
 	else
 		host->present = -1;
 
@@ -940,6 +942,7 @@ static int __exit at91_mci_remove(struct platform_device *pdev)
 	host = mmc_priv(mmc);
 
 	if (host->present != -1) {
+		device_init_wakeup(&pdev->dev, 0);
 		free_irq(host->board->det_pin, host);
 		cancel_delayed_work(&host->mmc->detect);
 	}
@@ -966,8 +969,12 @@ static int __exit at91_mci_remove(struct platform_device *pdev)
 static int at91_mci_suspend(struct platform_device *pdev, pm_message_t state)
 {
 	struct mmc_host *mmc = platform_get_drvdata(pdev);
+	struct at91mci_host *host = mmc_priv(mmc);
 	int ret = 0;
 
+	if (device_may_wakeup(&pdev->dev))
+		enable_irq_wake(host->board->det_pin);
+
 	if (mmc)
 		ret = mmc_suspend_host(mmc, state);
 
@@ -977,8 +984,12 @@ static int at91_mci_suspend(struct platform_device *pdev, pm_message_t state)
 static int at91_mci_resume(struct platform_device *pdev)
 {
 	struct mmc_host *mmc = platform_get_drvdata(pdev);
+	struct at91mci_host *host = mmc_priv(mmc);
 	int ret = 0;
 
+	if (device_may_wakeup(&pdev->dev))
+		disable_irq_wake(host->board->det_pin);
+
 	if (mmc)
 		ret = mmc_resume_host(mmc);
 
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 10d15c3..4a24db0 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1024,6 +1024,8 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
 
 	intmask &= ~(SDHCI_INT_CMD_MASK | SDHCI_INT_DATA_MASK);
 
+	intmask &= ~SDHCI_INT_ERROR;
+
 	if (intmask & SDHCI_INT_BUS_POWER) {
 		printk(KERN_ERR "%s: Card is consuming too much power!\n",
 			mmc_hostname(host->mmc));
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 7400f4b..a6c8704 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -107,6 +107,7 @@
 #define  SDHCI_INT_CARD_INSERT	0x00000040
 #define  SDHCI_INT_CARD_REMOVE	0x00000080
 #define  SDHCI_INT_CARD_INT	0x00000100
+#define  SDHCI_INT_ERROR	0x00008000
 #define  SDHCI_INT_TIMEOUT	0x00010000
 #define  SDHCI_INT_CRC		0x00020000
 #define  SDHCI_INT_END_BIT	0x00040000

-- 
     -- Pierre Ossman

  Linux kernel, MMC maintainer        http://www.kernel.org
  PulseAudio, core developer          http://pulseaudio.org
  rdesktop, core developer          http://www.rdesktop.org
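The sdhci.c hunk in the pull request above clears one bit from the pending-interrupt mask before the remaining bits are examined. A minimal sketch of that masking step follows; the helper name strip_error_summary() is hypothetical, and the reading that the bit is a summary flag which would otherwise be left over as an unhandled interrupt is an assumption based on the commit title, not something the mail states.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as introduced or shown in the sdhci.h hunk. */
#define SDHCI_INT_CARD_INSERT	0x00000040u
#define SDHCI_INT_ERROR		0x00008000u

/* Hypothetical helper: drop the error bit and leave every other pending
 * bit untouched, mirroring `intmask &= ~SDHCI_INT_ERROR;`. */
uint32_t strip_error_summary(uint32_t intmask)
{
	uint32_t out = intmask & ~SDHCI_INT_ERROR;

	/* Postcondition: the error bit can no longer be pending. */
	assert((out & SDHCI_INT_ERROR) == 0);
	return out;
}
```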
Re: Documentation for sysfs, hotplug, and firmware loading.
On Wednesday 18 July 2007 7:40:20 pm Greg KH wrote: On Wed, Jul 18, 2007 at 01:39:53PM -0400, Rob Landley wrote: PICK ONE! JUST #*%(#% PICK ONE! HHH! I don't care where it is. Just put it somewhere I can find it, and keep it there. All this gratuitous moving stuff around serves NO PURPOSE other than to break userspace. I'm trying to document this so that the next time you go oh wait, it should be at /sys/tarantula/fruitbat I can show that you're breaking an existing documented userspace API. There's a kernel config option to make symlinks from the old location. /sys/block makes as much sense as any other location, and it's what's there now. Read the sysfs documentation file we just added, it describes how this is all documented and should be used. So well that I do not think you need to try to document it again. I'm not trying to document all of sysfs, I'm trying to document hotplug. I realize now I should have been more clear about that. I've been working on the document I just posted on and off since may, (Possibly longer but I lost a lot of data in the hard drive crash on my laptop last month. For example, I can't find a copy of my half-finished history of hotplug document and will probably need to start over, although I've still got a few places to look to see if I backed up a copy...) This document has been sitting mostly unchanged on my hard drive since OLS, until I finally tracked down example code to do the netlink bit so I could finish it. I tried to bounce a copy of the everything but netlink version off of kay by replying to his email with notes from OLS, and that's when I bumped into the he's spam-blocking me issue. It got lost in the shuffle of OLS, and I just got back to it at the start of this thread. Earlier today I read (and commented on, in the message to Cornelia Huck) the copy of Documentation/sysfs-rules.txt. (Ah, darn it. I have too many open windows on my desktop. Hits send on message to Cornelia huck I _wrote_ earlier today.) 
Documentation/sysfs-rules.txt doesn't talk about /sbin/hotplug or netlink hotplug. It doesn't say how to distinguish a char device from a block device. It mostly talks about finding stuff under the /sys/devices directory, most of which isn't relevant to populating /dev. It doesn't clearly distinguish where you can find information in current kernels (2.6.22 and earlier) from stuff that hasn't gone into any existing release. Ideally I'd like to identify a subset of that information which is not only present in current kernels but should remain findable at that location in future kernels. Over half the document is about what _not_ to do, and consists of warnings about buggy apps, despite the assumption that anything _not_ explicitly documented is forbidden because most of the things sysfs exports are considered unmaintainable. I've read the stuff under Documentation/ABI/{stable,testing}, and would be happy to refer to it rather than duplicating if I could get the info I needed out of it. Documentation/filesystems/sysfs.txt is still from Patrick Mochel in 2003 and mostly about the kernel side rather than an API exported to userspace, and sysfs-pci.txt in that directory is similar. Is there more I missed? thanks, greg k-h Sorry, I'm not trying to be a pain. I'm trying to document something I had to figure out for myself experimentally in 2005, which has been broken for me by kernel changes twice since then (when the device symlink went in back around 2.6.14, and when subdirs turned to symlinks recently), and I'm told is changing again with the additon of /sys/class/block (which means /sys/class/* no longer contains just char devices). Ideally I'd like to come up with documentation that allows somebody to write one program that works on existing AND on new kernels, hence stable API. Rob -- One of my most productive days was throwing away 1000 lines of code. - Ken Thompson. 
Re: Documentation for sysfs, hotplug, and firmware loading.
On Fri, Jul 20, 2007 at 08:21:39PM -0400, Rob Landley wrote:
> Ok, back up. /sys/devices does not contain all the information necessary to populate /dev, because it hasn't got things like ramdisks, /dev/zero, /dev/console which are THERE in sysfs, which may or may not be supported by the kernel (the kernel might have ramdisk support, might not).

Welcome to 2007:

$ ls /sys/devices/virtual/mem/
full  kmem  kmsg  mem  null  port  random  urandom  zero
$ ls /sys/devices/virtual/tty/
console  tty12  tty19  tty25  tty31  tty38  tty44  tty50  tty57  tty63
ptmx     tty13  tty2   tty26  tty32  tty39  tty45  tty51  tty58  tty7
tty      tty14  tty20  tty27  tty33  tty4   tty46  tty52  tty59  tty8
tty0     tty15  tty21  tty28  tty34  tty40  tty47  tty53  tty6   tty9
tty1     tty16  tty22  tty29  tty35  tty41  tty48  tty54  tty60
tty10    tty17  tty23  tty3   tty36  tty42  tty49  tty55  tty61
tty11    tty18  tty24  tty30  tty37  tty43  tty5   tty56  tty62

I suggest you take a close look at the kernel before making statements like the above :)

> These things could also, in future, have their major and minor numbers dynamically (even randomly) assigned. That's been discussed on this list.

I tried that once, it will require some core api kernel changes and a lot of infrastructure work to get that to work properly. Not that it will never happen in the future, but it's just not a trivial change at the moment...

thanks, greg k-h
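For readers following the /dev-population argument: a device directory such as the ones listed under /sys/devices/virtual/ normally carries a `dev` attribute containing `MAJOR:MINOR` followed by a newline. A small sketch of reading it is below; the function names are hypothetical, and the example path in the comment is an assumption about a typical system rather than something quoted from the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the "MAJOR:MINOR" text of a sysfs `dev` attribute, with or
 * without the trailing newline. Returns 0 on success, -1 otherwise. */
int parse_dev_attr(const char *text, unsigned int *major, unsigned int *minor)
{
	char trailing;
	int n;

	assert(text && major && minor);
	n = sscanf(text, "%u:%u%c", major, minor, &trailing);
	if (n == 2)
		return 0;
	if (n == 3 && trailing == '\n')
		return 0;
	return -1;
}

/* Read and parse an attribute file, e.g. (assumed path)
 * /sys/devices/virtual/mem/zero/dev. Returns 0 on success, -1 otherwise. */
int read_dev_attr(const char *path, unsigned int *major, unsigned int *minor)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_dev_attr(buf, major, minor);
}
```

A /dev populator would pass the resulting pair to makedev() and mknod(); that step is left out here because it needs root.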
Re: Documentation for sysfs, hotplug, and firmware loading.
On Fri, Jul 20, 2007 at 08:21:39PM -0400, Rob Landley wrote:
> I'm not trying to document /sys/devices. I'm trying to document hotplug, populating /dev, and things like firmware loading that fall out of that. This requires use of sysfs, and I'm only trying to document as much of sysfs as you need to do that.

Like I stated before, you do not need to even have sysfs mounted to have a dynamic /dev.

And why do you need to document populating /dev dynamically? udev already solves this problem for you; it's not as if anyone going off and reinventing udev for their own enjoyment would not at least look at how it solves this problem first. To do otherwise would be foolish :)

Firmware loading is fine to document if you wish to do so. But again, why? We already have multiple userspace programs that provide this feature. Perhaps you want to document how to add firmware to a system in order for these different programs to pick them up? Or perhaps you want to document how to add this kind of functionality to your kernel driver so that it can handle firmware loading by using the firmware interface that the kernel provides?

If you just want to document the hotplug/uevent api, then do just that. However I think you are overreaching with your scope here and getting mighty confused in the process.

thanks, greg k-h
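The hotplug/uevent API mentioned above can be used without sysfs mounted: events arrive on a netlink socket as a header string followed by NUL-separated KEY=VALUE strings. The sketch below is a minimal illustration, not udev's implementation; uevent_open() and uevent_get() are hypothetical names, and the sample payload used to exercise the parser is made up.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Open a netlink socket subscribed to kernel uevents (multicast group 1).
 * Returns the descriptor, or -1 on failure. */
int uevent_open(void)
{
	struct sockaddr_nl addr;
	int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);

	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = 1;
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Look up KEY in a payload of NUL-separated "KEY=VALUE" strings.
 * Returns a pointer to VALUE inside buf, or NULL if the key is absent.
 * The caller must make sure the payload ends with a NUL byte. */
const char *uevent_get(const char *buf, size_t len, const char *key)
{
	size_t klen, pos = 0;

	assert(buf && key);
	klen = strlen(key);
	while (pos < len) {
		const char *s = buf + pos;
		size_t slen = strnlen(s, len - pos);

		if (slen > klen && strncmp(s, key, klen) == 0 && s[klen] == '=')
			return s + klen + 1;
		pos += slen + 1;	/* skip the string and its terminator */
	}
	return NULL;
}
```

A receiver would call recv() on the descriptor, NUL-terminate the buffer, and then ask for keys such as ACTION, SUBSYSTEM, MAJOR and MINOR.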
Re: Documentation for sysfs, hotplug, and firmware loading.
On Fri, Jul 20, 2007 at 08:21:39PM -0400, Rob Landley wrote:
> > Always look at the parent devices themselves for determining device context properties.
> For determining? What was the original language of this document?

Ok, that's just being mean, cut it out right now if you ever want my help again. I'll gladly accept patches for this document that is in the kernel tree now if you want to send them. But criticizing the grammar of a document with statements like this one gets you nowhere and is damn rude.

I suggest you start this thread over if you want my feedback, I'm not going to respond anymore to this one.

greg k-h
Re: [PATCH] Use the tsk argument in init_new_context()
On Thu, Jul 19, 2007 at 05:42:38PM -0700, Andrew Morton wrote:
> On Sun, 8 Jul 2007 22:55:08 -0300 Diego Woitasen [EMAIL PROTECTED] wrote:
> > Signed-off-by: Diego Woitasen [EMAIL PROTECTED]
> > ---
> >  arch/i386/kernel/ldt.c   |    2 +-
> >  arch/x86_64/kernel/ldt.c |    2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/i386/kernel/ldt.c b/arch/i386/kernel/ldt.c
> > index e0b2d17..c2eb4fb 100644
> > --- a/arch/i386/kernel/ldt.c
> > +++ b/arch/i386/kernel/ldt.c
> > @@ -96,7 +96,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
> >
> >  	init_MUTEX(&mm->context.sem);
> >  	mm->context.size = 0;
> > -	old_mm = current->mm;
> > +	old_mm = tsk->mm;
> >  	if (old_mm && old_mm->context.size > 0) {
> >  		down(&old_mm->context.sem);
> >  		retval = copy_ldt(&mm->context, &old_mm->context);
> > diff --git a/arch/x86_64/kernel/ldt.c b/arch/x86_64/kernel/ldt.c
> > index bc9ffd5..99a92ed 100644
> > --- a/arch/x86_64/kernel/ldt.c
> > +++ b/arch/x86_64/kernel/ldt.c
> > @@ -100,7 +100,7 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
> >
> >  	init_MUTEX(&mm->context.sem);
> >  	mm->context.size = 0;
> > -	old_mm = current->mm;
> > +	old_mm = tsk->mm;
> >  	if (old_mm && old_mm->context.size > 0) {
> >  		down(&old_mm->context.sem);
> >  		retval = copy_ldt(&mm->context, &old_mm->context);
>
> When called from dup_mm(), `tsk' refers to the new task and `current' refers to the old one.
>
> I'd have expected this to crash during your testing?

Yes, sorry... that patch is bad.

Now my question is: why do all architectures have the task argument when none of them uses it? I understand now that init_new_context() works with current, but what about the *tsk arg?

-- Diego Woitasen
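Andrew's objection can be illustrated with a small userspace mock. Everything here is hypothetical (struct mock_task, struct mock_mm, mock_init_new_context), and the sketch assumes that the new task does not yet have an mm attached when the hook runs, which the thread itself does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for mm_struct and task_struct. */
struct mock_mm {
	int ldt_size;
};

struct mock_task {
	struct mock_mm *mm;
};

/* Initialise new_mm, inheriting the LDT size from the mm of `src`.
 * Returns the size that was inherited (0 if there was nothing to copy). */
int mock_init_new_context(const struct mock_task *src, struct mock_mm *new_mm)
{
	assert(src != NULL && new_mm != NULL);
	new_mm->ldt_size = 0;
	if (src->mm && src->mm->ldt_size > 0)
		new_mm->ldt_size = src->mm->ldt_size;
	return new_mm->ldt_size;
}
```

Passing the parent (the role `current` plays during fork) copies the parent's LDT size; passing the half-built child finds no mm and copies nothing, so under the stated assumption the patched code would lose the LDT without necessarily crashing.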
Re: [RFC, Announce] Unified x86 architecture, arch/x86
Thomas Gleixner wrote:
> [...]
> As usual, comments and suggestions are welcome!

Compiles and boots fine here (on my Dell Precision WorkStation 530 MT), and nothing has broken so far. I only got some Kconfig warnings[1] with my config[2], but that is all. (I don't know whether this matters, but it boots 7.52 seconds faster than current git head.)

[1]http://194.231.229.228/linux-x86/warning
[2]http://194.231.229.228/linux-x86/config-x86

Thomas, Ingo

Regards, Gabriel C
Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570
On Sat, Jul 21, 2007 at 02:28:52AM +0200, Michal Piotrowski wrote:
On 21/07/07, Satyam Sharma [EMAIL PROTECTED] wrote:
Oh, which means ...
On 7/21/07, Satyam Sharma [EMAIL PROTECTED] wrote:
On 7/21/07, Greg KH [EMAIL PROTECTED] wrote:
On Fri, Jul 20, 2007 at 03:59:12PM -0700, Andrew Morton wrote:
On Fri, 20 Jul 2007 15:50:47 -0700 Greg KH [EMAIL PROTECTED] wrote:
On Fri, Jul 20, 2007 at 06:32:21PM +0200, Michal Piotrowski wrote:

Hi Greg,

This looks like a sysfs bug
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/3.jpg

l *kernel_param_sysfs_setup+0x75
0xc13c0894 is in kernel_param_sysfs_setup (kernel/params.c:570).
565		mk->mod = THIS_MODULE;
566		kobj_set_kset_s(mk, module_subsys);
567		kobject_set_name(&mk->kobj, name);

Shouldn't the return of kobject_set_name() be checked here? [ Looking at code, and realizing that kobject_set_name() manages to succeed even when given a null string! ]

568		kobject_init(&mk->kobj);
569		ret = kobject_add(&mk->kobj);
570		BUG_ON(ret < 0);
571		param_sysfs_setup(mk, kparam, num_params, name_skip);
572		kobject_uevent(&mk->kobj, KOBJ_ADD);
573	}
574

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/broken-out-2007-07-20-00-22/mm-config

What kernel version is this happening on? The -mm tree? Can you try Linus's tree instead?

It looks like there was some needed information right before the first stack dump, showing exactly what kobject was trying to be added that was already present. Odds are this is a kernel parameter with the same name as a duplicate one within the same module,

I don't think that's an -EEXIST. I think what we have here is kobject_add() exiting with -EINVAL. (kobject attempted to be registered with no name!)

[ The first trace on that screen shows: kobject_shadow_add+0x5b/0x189. That's the WARN_ON(1) at lib/kobject.c:176. If it was an EEXIST case, we would've seen an offset in kobject_shadow_add closer to 0x189, because the dump_stack() for EEXIST is barely 4 instructions before we return from that function. ]

but the trick is going to be trying to figure out what module is causing this.

So I'd guess we want to search for a module that's passing a kobject * to kobject_add() such that !kobj->k_name is true.

Oh, that's kernel_param_sysfs_setup itself. So we actually need to search for a built-in module in Michal's config that ... has an ... empty modname !?

I'll try to figure out this

Try the patch below to help you boot and figure out what went wrong. Post the kernel log results and I'll try to help you out.

thanks, greg k-h

---
 kernel/params.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/kernel/params.c
+++ b/kernel/params.c
@@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
 	kobject_set_name(&mk->kobj, name);
 	kobject_init(&mk->kobj);
 	ret = kobject_add(&mk->kobj);
-	BUG_ON(ret < 0);
+	if (ret) {
+		printk(KERN_ERR "module '%s' failed to be added to sysfs, "
+		       "the system will be unstable now.\n", name);
+		return;
+	}
 	param_sysfs_setup(mk, kparam, num_params, name_skip);
 	kobject_uevent(&mk->kobj, KOBJ_ADD);
 }
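The two failure modes debated in this thread, and the report-and-return handling that Greg's debugging patch substitutes for BUG_ON(), can be mimicked in userspace. This is a mock under stated assumptions: mock_kobject_add() and mock_param_sysfs_setup() are hypothetical, and only the error codes (-EINVAL for a missing name, -EEXIST for a duplicate) are taken from the discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MOCK_MAX_OBJS 8

static const char *mock_registered[MOCK_MAX_OBJS];
static int mock_nregistered;

/* Hypothetical stand-in for kobject_add(): -EINVAL when the object has
 * no name, -EEXIST when the name is already registered. */
int mock_kobject_add(const char *name)
{
	int i;

	if (!name || !*name)
		return -EINVAL;
	for (i = 0; i < mock_nregistered; i++)
		if (strcmp(mock_registered[i], name) == 0)
			return -EEXIST;
	if (mock_nregistered == MOCK_MAX_OBJS)
		return -ENOMEM;
	mock_registered[mock_nregistered++] = name;
	return 0;
}

/* Mirrors the control flow of the patch: log the failing module name and
 * bail out, so the log shows which name (possibly empty) was involved. */
int mock_param_sysfs_setup(const char *name)
{
	int ret = mock_kobject_add(name);

	if (ret) {
		fprintf(stderr, "module '%s' failed to be added to sysfs (%d)\n",
			name ? name : "(null)", ret);
		return ret;
	}
	assert(mock_nregistered > 0);	/* the object is now registered */
	return 0;
}
```

With logging in place of a BUG, an empty module name shows up as `module ''` in the output, which is the information the thread is trying to recover.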
film at 11: kernel update breaks udev.
Just updated one of my machines to 2.6.22.1, and got this during boot..

Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists

Under 2.6.21, all was fine. sdc is one disk of a 3 disk raid5 set. The raidset still manages to come up despite this. This is a Fedora 7 box, with udev-106-4.1.fc7

What changed this time?

Dave

-- http://www.codemonkey.org.uk
Re: [PATCH] hugetlbfs read() support
Nishanth Aravamudan wrote:
On 19.07.2007 [09:58:50 -0700], Andrew Morton wrote:
On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty [EMAIL PROTECTED] wrote:

+		}
+
+		offset += ret;
+		retval += ret;
+		len -= ret;
+		index += offset >> HPAGE_SHIFT;
+		offset &= ~HPAGE_MASK;
+
+		page_cache_release(page);
+		if (ret == nr && len)
+			continue;
+		goto out;
+	}
+out:
+	return retval;
+}

This code doesn't have all the ghastly tricks which we deploy to handle concurrent truncate.

Do I need to ? Baaahh!! I don't want to deal with them.

Nick, can you think of any serious consequences of a read/truncate race in there? I can't..

All I want is a simple read() to get my oprofile working. Please advise.

Did you consider changing oprofile userspace to read the executable with mmap?

It's not actually oprofile's code, though, it's libbfd (used by oprofile). And it works fine (presumably) for other binaries.

So... what's the problem with changing it? The fact that it is a library doesn't really make a difference except that you'll also help everyone else who links with it. It won't break backwards compatibility, and it will work on older kernels...

-- SUSE Labs, Novell Inc.
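The userspace alternative raised in this thread, reading file contents through mmap() instead of read(), might look roughly like the sketch below. It is a generic illustration with a hypothetical helper name, mmap_pread(), and is not taken from libbfd or oprofile; hugetlbfs has its own mapping-length and alignment rules, which this sketch does not attempt to handle.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy up to `len` bytes starting at offset `off` of `path` into `dst`,
 * using a read-only private mapping rather than read().
 * Returns the number of bytes copied, 0 at or past end of file, -1 on error. */
ssize_t mmap_pread(const char *path, void *dst, size_t len, off_t off)
{
	struct stat st;
	void *map;
	int fd;

	assert(path != NULL && dst != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || off < 0) {
		close(fd);
		return -1;
	}
	if (off >= st.st_size) {		/* also covers empty files */
		close(fd);
		return 0;
	}
	if (len > (size_t)(st.st_size - off))
		len = (size_t)(st.st_size - off);

	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);				/* the mapping outlives the descriptor */
	if (map == MAP_FAILED)
		return -1;

	memcpy(dst, (const char *)map + off, len);
	munmap(map, (size_t)st.st_size);
	return (ssize_t)len;
}
```

Mapping from offset 0 sidesteps the requirement that mmap() offsets be page-aligned, at the cost of mapping more than strictly needed.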
Re: film at 11: kernel update breaks udev.
On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:
> Just updated one of my machines to 2.6.22.1, and got this during boot..
>
> Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists
>
> Under 2.6.21, all was fine. sdc is one disk of a 3 disk raid5 set. The raidset still manages to come up despite this. This is a Fedora 7 box, with udev-106-4.1.fc7
>
> What changed this time?

CONFIG_BLK_DEV_BSG=y? There's a name-clash, because bsg tries to create devices with the same name. James sent a patch, it's on lkml.

Kay
Re: where is the code for read system call?
On Saturday, 21 July 2007, Agarwal, Lomesh wrote:
> My application reads from socket. I need to change the behavior of read system call for an experiment. Can someone point me to code?

fs/read_write.c: line 356

asmlinkage ssize_t sys_read(unsigned int fd, char __user * buf, size_t count)
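As background to the pointer above: the generic read entry point does little more than look up the open file and dispatch through that file's operations table, so for a socket the type-specific work happens in the socket code rather than in fs/read_write.c. The userspace mock below sketches that dispatch pattern only; every name in it (mock_file, mock_file_ops, mock_sock_read, mock_sys_read) is hypothetical, and the real kernel returns specific error codes where this sketch returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct mock_file;

/* Per-file-type operations table, in the spirit of file_operations. */
struct mock_file_ops {
	ssize_t (*read)(struct mock_file *f, char *buf, size_t count);
};

struct mock_file {
	const struct mock_file_ops *ops;
	const char *data;	/* pretend receive queue */
	size_t pos;
};

/* Type-specific implementation standing in for the socket read path.
 * This is the layer to change when only socket reads should behave
 * differently. */
ssize_t mock_sock_read(struct mock_file *f, char *buf, size_t count)
{
	size_t left = strlen(f->data) - f->pos;

	if (count > left)
		count = left;
	memcpy(buf, f->data + f->pos, count);
	f->pos += count;
	return (ssize_t)count;
}

const struct mock_file_ops mock_sock_ops = { .read = mock_sock_read };

/* Generic entry point: validate, then dispatch through the ops table. */
ssize_t mock_sys_read(struct mock_file *f, char *buf, size_t count)
{
	if (!f || !f->ops || !f->ops->read)
		return -1;
	assert(buf != NULL);
	return f->ops->read(f, buf, count);
}
```

Changing the generic function affects every file type; changing the entry in one ops table affects only that type, which is usually what an experiment on socket reads wants.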
Re: film at 11: kernel update breaks udev.
On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote:
> On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote:
> > Just updated one of my machines to 2.6.22.1, and got this during boot..
> >
> > Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists
> >
> > Under 2.6.21, all was fine. sdc is one disk of a 3 disk raid5 set. The raidset still manages to come up despite this. This is a Fedora 7 box, with udev-106-4.1.fc7
> >
> > What changed this time?
>
> CONFIG_BLK_DEV_BSG=y? There's a name-clash, because bsg tries to create devices with the same name. James sent a patch, it's on lkml.

BSG isn't in 2.6.22

Dave

-- http://www.codemonkey.org.uk
[PATCH 1/7] lguest: documentation pt I: Preparation
The netfilter code had very good documentation: the Netfilter Hacking HOWTO. Noone ever read it. So this time I'm trying something different, using a bit of Knuthiness. Start with drivers/lguest/README. Signed-off-by: Rusty Russell [EMAIL PROTECTED] --- Documentation/lguest/extract | 58 + Documentation/lguest/lguest.c |9 +++-- drivers/lguest/Makefile | 12 ++ drivers/lguest/README | 47 ++ drivers/lguest/core.c |7 ++- drivers/lguest/hypercalls.c |9 +++-- drivers/lguest/interrupts_and_traps.c | 13 +++ drivers/lguest/io.c |8 +++- drivers/lguest/lguest.c | 30 +++-- drivers/lguest/lguest_bus.c |3 + drivers/lguest/lguest_user.c |7 +++ drivers/lguest/page_tables.c | 10 - drivers/lguest/segments.c | 11 ++ drivers/lguest/switcher.S | 13 +++ 14 files changed, 218 insertions(+), 19 deletions(-) === --- /dev/null +++ b/Documentation/lguest/extract @@ -0,0 +1,58 @@ +#! /bin/sh + +set -e + +PREFIX=$1 +shift + +trap 'rm -r $TMPDIR' 0 +TMPDIR=`mktemp -d` + +exec 3/dev/null +for f; do +while IFS= + read -r LINE; do + case $LINE in + *$PREFIX:[0-9]*:\**) + NUM=`echo $LINE | sed s/.*$PREFIX:\([0-9]*\).*/\1/` + if [ -f $TMPDIR/$NUM ]; then + echo $TMPDIR/$NUM already exits prior to $f + exit 1 + fi + exec 3$TMPDIR/$NUM + echo $f | sed 's,\.\./,,g' $TMPDIR/.$NUM + /bin/echo $LINE | sed -e s/$PREFIX:[0-9]*// -e s/:\*/*/ 3 + ;; + *$PREFIX:[0-9]*) + NUM=`echo $LINE | sed s/.*$PREFIX:\([0-9]*\).*/\1/` + if [ -f $TMPDIR/$NUM ]; then + echo $TMPDIR/$NUM already exits prior to $f + exit 1 + fi + exec 3$TMPDIR/$NUM + echo $f | sed 's,\.\./,,g' $TMPDIR/.$NUM + /bin/echo $LINE | sed s/$PREFIX:[0-9]*// 3 + ;; + *:\**) + /bin/echo $LINE | sed -e s/:\*/*/ -e s,/\*\*/,, 3 + echo 3 + exec 3/dev/null + ;; + *) + /bin/echo $LINE 3 + ;; + esac +done $f +echo 3 +exec 3/dev/null +done + +LASTFILE= +for f in $TMPDIR/*; do +if [ $LASTFILE != $(cat $TMPDIR/.$(basename $f) ) ]; then + LASTFILE=$(cat $TMPDIR/.$(basename $f) ) + echo [ $LASTFILE ] +fi +cat $f +done + === --- a/Documentation/lguest/lguest.c +++ 
b/Documentation/lguest/lguest.c @@ -1,5 +1,10 @@ -/* Simple program to layout physical memory for new lguest guest. - * Linked high to avoid likely physical memory. */ +/*P:100 This is the Launcher code, a simple program which lays out the + * physical memory for the new Guest by mapping the kernel image and the + * virtual devices, then reads repeatedly from /dev/lguest to run the Guest. + * + * The only trick: the Makefile links it statically at a high address, so it + * will be clear of the guest memory region. It means that each Guest cannot + * have more than 2.5G of memory on a normally configured Host. :*/ #define _LARGEFILE64_SOURCE #define _GNU_SOURCE #include stdio.h === --- a/drivers/lguest/Makefile +++ b/drivers/lguest/Makefile @@ -5,3 +5,15 @@ obj-$(CONFIG_LGUEST) += lg.o obj-$(CONFIG_LGUEST) += lg.o lg-y := core.o hypercalls.o page_tables.o interrupts_and_traps.o \ segments.o io.o lguest_user.o switcher.o + +Preparation Preparation!: PREFIX=P +Guest: PREFIX=G +Drivers: PREFIX=D +Launcher: PREFIX=L +Host: PREFIX=H +Switcher: PREFIX=S +Mastery: PREFIX=M +Beer: + @for f in Preparation Guest Drivers Launcher Host Switcher Mastery; do echo {==- $$f -==}; make -s $$f; done; echo {==-==} +Preparation Preparation! Guest Drivers Launcher Host Switcher Mastery: + @sh ../../Documentation/lguest/extract $(PREFIX) `find ../../* -name '*.[chS]' -wholename '*lguest*'` === --- /dev/null +++ b/drivers/lguest/README @@ -0,0 +1,47 @@ +Welcome, friend reader, to lguest. + +Lguest is an adventure, with you, the reader, as Hero. I can't think of many +5000-line projects which offer both such capability and glimpses of future +potential; it is an exciting time to be delving into the source! + +But be warned; this is an arduous journey of several hours or more! And as we +know, all true Heroes are driven by a Noble Goal. Thus I offer a Beer (or +equivalent) to anyone I meet who has completed this documentation. + +So get
[PATCH 2/7] lguest: documentation pt II: Guest
Documentation: The Guest Signed-off-by: Rusty Russell [EMAIL PROTECTED] --- drivers/lguest/lguest.c | 458 --- drivers/lguest/lguest_asm.S | 57 +++-- include/linux/lguest.h | 47 +++- 3 files changed, 512 insertions(+), 50 deletions(-) === --- a/drivers/lguest/lguest.c +++ b/drivers/lguest/lguest.c @@ -66,6 +66,12 @@ #include asm/mce.h #include asm/io.h +/*G:010 Welcome to the Guest! + * + * The Guest in our tale is a simple creature: identical to the Host but + * behaving in simplified but equivalent ways. In particular, the Guest is the + * same kernel as the Host (or at least, built from the same source code). :*/ + /* Declarations for definitions in lguest_guest.S */ extern char lguest_noirq_start[], lguest_noirq_end[]; extern const char lgstart_cli[], lgend_cli[]; @@ -84,7 +90,26 @@ struct lguest_device_desc *lguest_device struct lguest_device_desc *lguest_devices; static cycle_t clock_base; -static enum paravirt_lazy_mode lazy_mode; +/*G:035 Notice the lazy_hcall() above, rather than hcall(). This is our first + * real optimization trick! + * + * When lazy_mode is set, it means we're allowed to defer all hypercalls and do + * them as a batch when lazy_mode is eventually turned off. Because hypercalls + * are reasonably expensive, batching them up makes sense. For example, a + * large mmap might update dozens of page table entries: that code calls + * lguest_lazy_mode(PARAVIRT_LAZY_MMU), does the dozen updates, then calls + * lguest_lazy_mode(PARAVIRT_LAZY_NONE). + * + * So, when we're in lazy mode, we call async_hypercall() to store the call for + * future processing. When lazy mode is turned off we issue a hypercall to + * flush the stored calls. + * + * There's also a hack where mode is set to PARAVIRT_LAZY_FLUSH which + * indicates we're to flush any outstanding calls immediately. This is used + * when an interrupt handler does a kmap_atomic(): the page table changes must + * happen immediately even if we're in the middle of a batch. 
Usually we're + * not, though, so there's nothing to do. */ +static enum paravirt_lazy_mode lazy_mode; /* Note: not SMP-safe! */ static void lguest_lazy_mode(enum paravirt_lazy_mode mode) { if (mode == PARAVIRT_LAZY_FLUSH) { @@ -108,6 +133,16 @@ static void lazy_hcall(unsigned long cal async_hcall(call, arg1, arg2, arg3); } +/* async_hcall() is pretty simple: I'm quite proud of it really. We have a + * ring buffer of stored hypercalls which the Host will run though next time we + * do a normal hypercall. Each entry in the ring has 4 slots for the hypercall + * arguments, and a hcall_status word which is 0 if the call is ready to go, + * and 255 once the Host has finished with it. + * + * If we come around to a slot which hasn't been finished, then the table is + * full and we just make the hypercall directly. This has the nice side + * effect of causing the Host to run all the stored calls in the ring buffer + * which empties it for next time! */ void async_hcall(unsigned long call, unsigned long arg1, unsigned long arg2, unsigned long arg3) { @@ -115,6 +150,9 @@ void async_hcall(unsigned long call, static unsigned int next_call; unsigned long flags; + /* Disable interrupts if not already disabled: we don't want an +* interrupt handler making a hypercall while we're already doing +* one! */ local_irq_save(flags); if (lguest_data.hcall_status[next_call] != 0xFF) { /* Table full, so do normal hcall which will flush table. */ @@ -124,7 +162,7 @@ void async_hcall(unsigned long call, lguest_data.hcalls[next_call].edx = arg1; lguest_data.hcalls[next_call].ebx = arg2; lguest_data.hcalls[next_call].ecx = arg3; - /* Make sure host sees arguments before valid flag. */ + /* Arguments must all be written before we mark it to go */ wmb(); lguest_data.hcall_status[next_call] = 0; if (++next_call == LHCALL_RING_SIZE) @@ -132,9 +170,14 @@ void async_hcall(unsigned long call, } local_irq_restore(flags); } - +/*:*/ + +/* Wrappers for the SEND_DMA and BIND_DMA hypercalls. 
This is mainly because + * Jeff Garzik complained that __pa() should never appear in drivers, and this + * helps remove most of them. But also, it wraps some ugliness. */ void lguest_send_dma(unsigned long key, struct lguest_dma *dma) { + /* The hcall might not write this if something goes wrong */ dma-used_len = 0; hcall(LHCALL_SEND_DMA, key, __pa(dma), 0); } @@ -142,11 +185,16 @@ int lguest_bind_dma(unsigned long key, s int lguest_bind_dma(unsigned long key, struct lguest_dma *dmas, unsigned int num, u8 irq) { + /* This is the only hypercall which actually wants 5
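The async_hcall() comments in this patch describe a ring of deferred hypercalls: a slot marked 0xFF is free, a slot marked 0 is ready for the Host, and when the next slot is not free the Guest makes a direct hypercall, which has the side effect of draining the ring. A userspace mock of that scheme is sketched below. All names and the ring size are hypothetical, the Host side is reduced to a counter, and the interrupt disabling and the wmb() of the real code are only noted in comments.

```c
#include <assert.h>

#define MOCK_RING_SIZE	4	/* hypothetical; the patch uses LHCALL_RING_SIZE */
#define SLOT_FREE	0xFF	/* Host has finished with the slot */
#define SLOT_READY	0	/* Guest has filled the slot */

struct mock_hcall {
	unsigned long eax, edx, ebx, ecx;
};

static struct mock_hcall mock_ring[MOCK_RING_SIZE];
static unsigned char mock_status[MOCK_RING_SIZE];
static unsigned int mock_next_call;
static unsigned long mock_executed;	/* calls the mock Host has run */

void mock_ring_init(void)
{
	unsigned int i;

	for (i = 0; i < MOCK_RING_SIZE; i++)
		mock_status[i] = SLOT_FREE;
	mock_next_call = 0;
	mock_executed = 0;
}

unsigned int mock_ring_pending(void)
{
	unsigned int i, n = 0;

	for (i = 0; i < MOCK_RING_SIZE; i++)
		if (mock_status[i] == SLOT_READY)
			n++;
	return n;
}

unsigned long mock_ring_executed(void)
{
	return mock_executed;
}

/* Direct hypercall: the mock Host first runs everything stored in the
 * ring, then the call itself. */
void mock_hcall_now(unsigned long call, unsigned long arg1,
		    unsigned long arg2, unsigned long arg3)
{
	unsigned int i;

	(void)call; (void)arg1; (void)arg2; (void)arg3;
	for (i = 0; i < MOCK_RING_SIZE; i++) {
		if (mock_status[i] == SLOT_READY) {
			mock_executed++;
			mock_status[i] = SLOT_FREE;
		}
	}
	mock_executed++;
}

/* Deferred hypercall, following the structure described in the patch.
 * The real code runs this with interrupts disabled. */
void mock_async_hcall(unsigned long call, unsigned long arg1,
		      unsigned long arg2, unsigned long arg3)
{
	assert(mock_next_call < MOCK_RING_SIZE);
	if (mock_status[mock_next_call] != SLOT_FREE) {
		/* Ring full: a direct call also flushes the ring. */
		mock_hcall_now(call, arg1, arg2, arg3);
		return;
	}
	mock_ring[mock_next_call].eax = call;
	mock_ring[mock_next_call].edx = arg1;
	mock_ring[mock_next_call].ebx = arg2;
	mock_ring[mock_next_call].ecx = arg3;
	/* The real code issues wmb() here so the arguments are visible
	 * before the slot is marked ready. */
	mock_status[mock_next_call] = SLOT_READY;
	if (++mock_next_call == MOCK_RING_SIZE)
		mock_next_call = 0;
}
```

Filling all four slots runs nothing; the fifth call finds the ring full, goes direct, and leaves the ring empty again, which is the "nice side effect" the comment describes.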
[PATCH 3/7] lguest: documentation pt III: Drivers
Documentation: The Drivers Signed-off-by: Rusty Russell [EMAIL PROTECTED] --- drivers/block/lguest_blk.c | 171 +++--- drivers/char/hvc_lguest.c | 77 + drivers/lguest/lguest_bus.c | 72 drivers/net/lguest_net.c| 222 +++ include/linux/lguest_bus.h |5 include/linux/lguest_launcher.h | 60 ++ 6 files changed, 565 insertions(+), 42 deletions(-) === --- a/drivers/block/lguest_blk.c +++ b/drivers/block/lguest_blk.c @@ -1,6 +1,12 @@ -/* A simple block driver for lguest. - * - * Copyright 2006 Rusty Russell [EMAIL PROTECTED] IBM Corporation +/*D:400 + * The Guest block driver + * + * This is a simple block driver, which appears as /dev/lgba, lgbb, lgbc etc. + * The mechanism is simple: we place the information about the request in the + * device page, then use SEND_DMA (containing the data for a write, or an empty + * ping DMA for a read). + :*/ +/* Copyright 2006 Rusty Russell [EMAIL PROTECTED] IBM Corporation * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -25,27 +31,50 @@ static char next_block_index = 'a'; +/*D:420 Here is the structure which holds all the information we need about + * each Guest block device. + * + * I'm sure at this stage, you're wondering hey, where was the adventure I was + * promised? and thinking Rusty sucks, I shall say nasty things about him on + * my blog. I think Real adventures have boring bits, too, and you're in the + * middle of one. But it gets better. Just not quite yet. */ struct blockdev { + /* The block queue infrastructure wants a spinlock: it is held while it +* calls our block request function. We grab it in our interrupt +* handler so the responses don't mess with new requests. */ spinlock_t lock; - /* The disk structure for the kernel. */ + /* The disk structure registered with kernel. */ struct gendisk *disk; - /* The major number for this disk. */ + /* The major device number for this disk, and the interrupt. 
We only +* really keep them here for completeness; we'd need them if we +* supported device unplugging. */ int major; int irq; + /* The physical address of this device's memory page */ unsigned long phys_addr; - /* The mapped block page. */ + /* The mapped memory page for convenient acces. */ struct lguest_block_page *lb_page; - /* We only have a single request outstanding at a time. */ + /* We only have a single request outstanding at a time: this is it. */ struct lguest_dma dma; struct request *req; }; -/* Jens gave me this nice helper to end all chunks of a request. */ +/*D:495 We originally used end_request() throughout the driver, but it turns + * out that end_request() is deprecated, and doesn't actually end the request + * (which seems like a good reason to deprecate it!). It simply ends the first + * bio. So if we had 3 bios in a struct request we would do all 3, + * end_request(), do 2, end_request(), do 1 and end_request(): twice as much + * work as we needed to do. + * + * This reinforced to me that I do not understand the block layer. + * + * Nonetheless, Jens Axboe gave me this nice helper to end all chunks of a + * request. This improved disk speed by 130%. */ static void end_entire_request(struct request *req, int uptodate) { if (end_that_request_first(req, uptodate, req-hard_nr_sectors)) @@ -55,30 +84,62 @@ static void end_entire_request(struct re end_that_request_last(req, uptodate); } +/* I'm told there are only two stories in the world worth telling: love and + * hate. So there used to be a love scene here like this: + * + * Launcher: We could make beautiful I/O together, you and I. + * Guest: My, that's a big disk! + * + * Unfortunately, it was just too raunchy for our otherwise-gentle tale. */ + +/*D:490 This is the interrupt handler, called when a block read or write has + * been completed for us. 
*/ static irqreturn_t lgb_irq(int irq, void *_bd) { + /* We handed our struct blockdev as the argument to request_irq(), so + * it is passed through to us here. This tells us which device we're + * dealing with in case we have more than one. */ struct blockdev *bd = _bd; unsigned long flags; + /* We weren't doing anything? Strange, but could happen if we shared + * interrupts (we don't!). */ if (!bd->req) { pr_debug("No work!\n"); return IRQ_NONE; } + /* Not done yet? That's equally strange. */ if (!bd->lb_page->result) { pr_debug("No result!\n"); return IRQ_NONE; } + /* We have
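The "do all 3, do 2, do 1" arithmetic in the D:495 comment above can be checked with a small model. This is only a sketch under stated assumptions: the function names are hypothetical and nothing here is block-layer code; it simply counts how many times a bio is visited under each strategy.

```c
/* Hypothetical model, not kernel code: if each pass ends only the first
 * bio and the remainder is walked again, the number of bio visits is
 * n + (n-1) + ... + 1. */
static unsigned int visits_ending_first_bio_only(unsigned int nr_bios)
{
	unsigned int visits = 0;

	while (nr_bios > 0) {
		visits += nr_bios;	/* walk every bio still outstanding */
		nr_bios--;		/* only the first one is ended */
	}
	return visits;
}

/* Ending the whole request in one go visits each bio exactly once. */
static unsigned int visits_ending_entire_request(unsigned int nr_bios)
{
	return nr_bios;
}
```

For the three-bio case in the comment this gives 6 visits against 3, which matches the "twice as much work" remark.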
[PATCH 6/7] lguest: documentation pt VI: Switcher
Documentation: The Switcher

Signed-off-by: Rusty Russell [EMAIL PROTECTED]
---
 drivers/lguest/core.c     |   51 +++-
 drivers/lguest/switcher.S |  271 ++---
 2 files changed, 276 insertions(+), 46 deletions(-)

===
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -394,46 +394,89 @@ static void set_ts(void)
 	write_cr0(cr0|8);
 }
+/*S:010
+ * We are getting close to the Switcher.
+ *
+ * Remember that each CPU has two pages which are visible to the Guest when it
+ * runs on that CPU. This has to contain the state for that Guest: we copy the
+ * state in just before we run the Guest.
+ *
+ * Each Guest has "changed" flags which indicate what has changed in the Guest
+ * since it last ran. We saw this set in interrupts_and_traps.c and
+ * segments.c.
+ */
 static void copy_in_guest_info(struct lguest *lg, struct lguest_pages *pages)
 {
+	/* Copying all this data can be quite expensive. We usually run the
+	 * same Guest we ran last time (and that Guest hasn't run anywhere else
+	 * meanwhile). If that's not the case, we pretend everything in the
+	 * Guest has changed. */
 	if (__get_cpu_var(last_guest) != lg || lg->last_pages != pages) {
 		__get_cpu_var(last_guest) = lg;
 		lg->last_pages = pages;
 		lg->changed = CHANGED_ALL;
 	}
-	/* These are pretty cheap, so we do them unconditionally. */
+	/* These copies are pretty cheap, so we do them unconditionally: */
+	/* Save the current Host top-level page directory. */
 	pages->state.host_cr3 = __pa(current->mm->pgd);
+	/* Set up the Guest's page tables to see this CPU's pages (and no
+	 * other CPU's pages). */
 	map_switcher_in_guest(lg, pages);
+	/* Set up the two TSS members which tell the CPU what stack to use
+	 * for traps which go directly into the Guest (ie. traps at privilege
+	 * level 1). */
 	pages->state.guest_tss.esp1 = lg->esp1;
 	pages->state.guest_tss.ss1 = lg->ss1;
-	/* Copy direct trap entries. */
+	/* Copy direct-to-Guest trap entries. */
 	if (lg->changed & CHANGED_IDT)
 		copy_traps(lg, pages->state.guest_idt, default_idt_entries);
-	/* Copy all GDT entries but the TSS. */
+	/* Copy all GDT entries which the Guest can change. */
 	if (lg->changed & CHANGED_GDT)
 		copy_gdt(lg, pages->state.guest_gdt);
 	/* If only the TLS entries have changed, copy them. */
 	else if (lg->changed & CHANGED_GDT_TLS)
 		copy_gdt_tls(lg, pages->state.guest_gdt);
+	/* Mark the Guest as unchanged for next time. */
 	lg->changed = 0;
 }
+/* Finally: the code to actually call into the Switcher to run the Guest. */
 static void run_guest_once(struct lguest *lg, struct lguest_pages *pages)
 {
+	/* This is a dummy value we need for GCC's sake. */
 	unsigned int clobber;
+	/* Copy the guest-specific information into this CPU's struct
+	 * lguest_pages. */
 	copy_in_guest_info(lg, pages);
-	/* Put eflags on stack, lcall does rest: suitable for iret return. */
+	/* Now: we push the eflags register on the stack, then do an lcall.
+	 * This is how we change from using the kernel code segment to using
+	 * the dedicated lguest code segment, as well as jumping into the
+	 * Switcher.
+	 *
+	 * The lcall also pushes the old code segment (KERNEL_CS) onto the
+	 * stack, then the address of this call. This stack layout happens to
+	 * exactly match the stack of an interrupt... */
 	asm volatile("pushf; lcall *lguest_entry"
+		     /* This is how we tell GCC that %eax ("a") and %ebx ("b")
+		      * are changed by this routine. The "=" means output. */
 		     : "=a"(clobber), "=b"(clobber)
+		     /* %eax contains the pages pointer. ("0" refers to the
+		      * 0-th argument above, ie "a"). %ebx contains the
+		      * physical address of the Guest's top-level page
+		      * directory. */
 		     : "0"(pages), "1"(__pa(lg->pgdirs[lg->pgdidx].pgdir))
+		     /* We tell gcc that all these registers could change,
+		      * which means we don't have to save and restore them in
+		      * the Switcher. */
 		     : "memory", "%edx", "%ecx", "%edi", "%esi");
 }
+/*:*/
 /*H:030 Let's jump straight to the main loop which runs the Guest.
  * Remember, this is called by the Launcher reading /dev/lguest, and we keep
===
--- a/drivers/lguest/switcher.S
+++ b/drivers/lguest/switcher.S
@@ -6,41 +6,131 @@
  * are feeling invigorated and refreshed then the next, more challenging stage
  * can be found in make Guest. :*/
+/*S:100
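The "changed" flags logic that copy_in_guest_info() documents above can be tried out in user space. The following is a minimal sketch, assuming made-up flag values and a plain static variable standing in for the per-CPU last_guest; the struct names and the counters are hypothetical and exist only to make the behaviour observable.

```c
/* Hypothetical flag values, chosen for illustration only. */
enum {
	CHANGED_IDT	= 1,
	CHANGED_GDT	= 2,
	CHANGED_GDT_TLS	= 4,
	CHANGED_ALL	= CHANGED_IDT | CHANGED_GDT,
};

struct guest_sketch {
	unsigned int changed;
	const void *last_pages;
};

/* Counts how often each (expensive) copy would have been performed. */
struct copy_counts {
	int idt, gdt, gdt_tls;
};

/* Stands in for the per-CPU "last Guest run here" variable. */
static const struct guest_sketch *last_guest;

static void copy_in_guest_info_sketch(struct guest_sketch *g,
				      const void *pages,
				      struct copy_counts *c)
{
	/* Different Guest, or same Guest on different pages: assume
	 * everything is stale. */
	if (last_guest != g || g->last_pages != pages) {
		last_guest = g;
		g->last_pages = pages;
		g->changed = CHANGED_ALL;
	}

	if (g->changed & CHANGED_IDT)
		c->idt++;
	if (g->changed & CHANGED_GDT)
		c->gdt++;		/* full copy covers the TLS entries */
	else if (g->changed & CHANGED_GDT_TLS)
		c->gdt_tls++;		/* cheaper, TLS-only copy */

	g->changed = 0;			/* clean until something sets a flag */
}
```

Running the same Guest twice on the same pages performs the copies only once; setting just CHANGED_GDT_TLS afterwards triggers the TLS-only path.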
Re: [PATCH] hugetlbfs read() support
Andrew Morton wrote: On Thu, 19 Jul 2007 08:51:49 -0700 Badari Pulavarty [EMAIL PROTECTED] wrote: This code doesn't have all the ghastly tricks which we deploy to handle concurrent truncate. Do I need to ? Baaahh!! I don't want to deal with them. Nick, can you think of any serious consequences of a read/truncate race in there? I can't.. As it doesn't allow writes, then I _think_ it should be OK. If you ever did want to add write(2) support, then you would have transient zeroes problems. But I'm not completely sure.. we've had a lot of (and still have some known and probably unknown) bugs just in that single generic_mapping_read function, most of which are due to our rabid aversion to doing any locking whatsoever there. So why not just hold i_mutex around the whole thing to be safe? -- SUSE Labs, Novell Inc.
[PATCH 7/7] lguest: documentation pt VII: FIXMEs
Documentation: The FIXMEs Signed-off-by: Rusty Russell [EMAIL PROTECTED] --- Documentation/lguest/lguest.c | 12 drivers/char/hvc_lguest.c |3 +++ drivers/lguest/interrupts_and_traps.c | 14 ++ drivers/lguest/io.c | 10 ++ drivers/lguest/lguest.c |8 drivers/lguest/lguest_asm.S | 14 ++ drivers/lguest/page_tables.c |5 + drivers/lguest/segments.c |4 drivers/net/lguest_net.c | 19 +++ 9 files changed, 89 insertions(+) === --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c @@ -1536,3 +1536,15 @@ int main(int argc, char *argv[]) /* Finally, run the Guest. This doesn't return. */ run_guest(lguest_fd, device_list); } +/*:*/ + +/*M:999 + * Mastery is done: you now know everything I do. + * + * But surely you have seen code, features and bugs in your wanderings which + * you now yearn to attack? That is the real game, and I look forward to you + * patching and forking lguest into the Your-Name-Here-visor. + * + * Farewell, and good coding! + * Rusty Russell. + */ === --- a/drivers/char/hvc_lguest.c +++ b/drivers/char/hvc_lguest.c @@ -13,6 +13,9 @@ * functions. :*/ +/*M:002 The console can be flooded: while the Guest is processing input the + * Host can send more. Buffering in the Host could alleviate this, but it is a + * difficult problem in general. :*/ /* Copyright (C) 2006 Rusty Russell, IBM Corporation * * This program is free software; you can redistribute it and/or modify === --- a/drivers/lguest/interrupts_and_traps.c +++ b/drivers/lguest/interrupts_and_traps.c @@ -231,6 +231,20 @@ static int direct_trap(const struct lgue * go direct, of course 8) */ return idt_type(trap-a, trap-b) == 0xF; } +/*:*/ + +/*M:005 The Guest has the ability to turn its interrupt gates into trap gates, + * if it is careful. The Host will let trap gates can go directly to the + * Guest, but the Guest needs the interrupts atomically disabled for an + * interrupt gate. 
It can do this by pointing the trap gate at instructions + * within noirq_start and noirq_end, where it can safely disable interrupts. */ + +/*M:006 The Guests do not use the sysenter (fast system call) instruction, + * because it's hardcoded to enter privilege level 0 and so can't go direct. + * It's about twice as fast as the older int 0x80 system call, so it might + * still be worthwhile to handle it in the Switcher and lcall down to the + * Guest. The sysenter semantics are hairy tho: search for that keyword in + * entry.S :*/ /*H:260 When we make traps go directly into the Guest, we need to make sure * the kernel stack is valid (ie. mapped in the page tables). Otherwise, the === --- a/drivers/lguest/io.c +++ b/drivers/lguest/io.c @@ -553,6 +553,16 @@ void release_all_dma(struct lguest *lg) up_read(lg-mm-mmap_sem); } +/*M:007 We only return a single DMA buffer to the Launcher, but it would be + * more efficient to return a pointer to the entire array of DMA buffers, which + * it can cache and choose one whenever it wants. + * + * Currently the Launcher uses a write to /dev/lguest, and the return value is + * the address of the DMA structure with the interrupt number placed in + * dma-used_len. If we wanted to return the entire array, we need to return + * the address, array size and interrupt number: this seems to require an + * ioctl(). :*/ + /*L:320 This routine looks for a DMA buffer registered by the Guest on the * given key (using the BIND_DMA hypercall). */ unsigned long get_dma_buffer(struct lguest *lg, === --- a/drivers/lguest/lguest.c +++ b/drivers/lguest/lguest.c @@ -251,6 +251,14 @@ static void irq_enable(void) { lguest_data.irq_enabled = X86_EFLAGS_IF; } +/*:*/ +/*M:003 Note that we don't check for outstanding interrupts when we re-enable + * them (or when we unmask an interrupt). 
This seems to work for the moment, + * since interrupts are rare and we'll just get the interrupt on the next timer + * tick, but now we have CONFIG_NO_HZ, we should revisit this. One way + * would be to put the irq_enabled field in a page by itself, and have the + * Host write-protect it when an interrupt comes in when irqs are disabled. + * There will then be a page fault as soon as interrupts are re-enabled. :*/ /*G:034 * The Interrupt Descriptor Table (IDT). === --- a/drivers/lguest/lguest_asm.S +++ b/drivers/lguest/lguest_asm.S @@ -41,6 +41,20 @@ LGUEST_PATCH(pushf, movl
Re: [PATCH] AFS: Fix file locking
Andrew Morton wrote: On Wed, 18 Jul 2007 15:56:53 +1000 Nick Piggin [EMAIL PROTECTED] wrote: Andrew Morton wrote: On Tue, 17 Jul 2007 13:47:32 +0100 David Howells [EMAIL PROTECTED] wrote:

+	if (type == AFS_LOCK_READ &&
+	    vnode->flags & (1 << AFS_VNODE_READLOCKED)) {

Here we use vnode->flags & (1 << foo)

+	set_bit(AFS_VNODE_LOCKING, &vnode->flags);

and elsewhere we use set_bit(foo, &vnode->flags) and clear_bit(). This is a bit strange. Does the open-coded bit-test have any performance benefit on any architecture?

Not on x86 at least, afaik. It uses locked operations on x86, but you can use __set_bit instead (which should always be at least as efficient as the C version).

I said bit-test. ie: test_bit(). That doesn't use a locked operation.

So you did. Then to answer that, yes it could be faster because there are stupid volatiles sprinkled all over the bitops code so you could easily end up having to do more loads. Does it make a real difference? Unlikely, but David loves counting cycles :)

-- SUSE Labs, Novell Inc.
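The two styles of bit test being compared above can be put side by side in user space. This is a simplified sketch: test_bit_sketch() is a hypothetical, non-atomic stand-in written for illustration, not the kernel's test_bit(), and it deliberately has no volatile qualifier.

```c
#include <limits.h>

#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper in the style of test_bit(): index into an array
 * of longs, then shift and mask. */
static int test_bit_sketch(unsigned int nr, const unsigned long *addr)
{
	return (int)((addr[nr / BITS_PER_LONG_SKETCH] >>
		      (nr % BITS_PER_LONG_SKETCH)) & 1UL);
}

/* The open-coded form quoted from the patch: flags & (1 << nr). */
static int open_coded_test(unsigned long flags, unsigned int nr)
{
	return (flags & (1UL << nr)) != 0;
}
```

Both return the same answer for any bit inside a single word; the discussion is only about what code the compiler is allowed to generate for each.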
Re: film at 11: kernel update breaks udev.
On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote: On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote: On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote: Just one of my machines to 2.6.22.1, and got this during boot.. Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists Under 2.6.21, all was fine. sdc is one disk of a 3 disk raid5 set. The raidset still manages to come up despite this. This is a Fedora 7 box, with udev-106-4.1.fc7 What changed this time? CONFIG_BLK_DEV_BSG=y? There's a name-clash, because bsg tries to create devices with the same name. James sent a patch, it's on lkml. BSG isn't in 2.6.22

Ok. Nothing else has changed that I can think of that could cause this. The code in udev that prints this message looks like:

err("symlink(%s, %s) failed: %s", linktarget, filename, strerror(errno));

That doesn't really match what you posted. Are there chars missing? Can you please recheck? And what does: udevtest /block/sdc print?

Thanks, Kay
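For reference, the "File exists" text in the report is what strerror() gives for EEXIST when symlink() is asked to create a link whose name is already taken. Below is a minimal user-space sketch of that failure path; make_link() is a hypothetical helper written for illustration and is not udev's code, only the message format is borrowed from the err() call quoted above.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to create linkpath -> target.  On failure,
 * format a message in the "symlink(%s, %s) failed: %s" shape and return
 * the errno value; on success return 0 and leave msg empty. */
static int make_link(const char *target, const char *linkpath,
		     char *msg, size_t msglen)
{
	int err;

	if (symlink(target, linkpath) == 0) {
		msg[0] = '\0';
		return 0;
	}

	err = errno;	/* save it before any other call can change it */
	snprintf(msg, msglen, "symlink(%s, %s) failed: %s",
		 target, linkpath, strerror(err));
	return err;
}
```

Calling make_link() twice with the same linkpath succeeds the first time and returns EEXIST the second time, whatever the targets are. Note that the formatted message separates target and link name with ", ", which is the mismatch with the posted log that the question above is about.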
Re: [PATCH 2/3] i386: use x86_64's desc_def.h
* Rusty Russell ([EMAIL PROTECTED]) wrote: On Thu, 2007-07-19 at 09:27 +1000, Rusty Russell wrote: On Wed, 2007-07-18 at 09:19 -0700, Zachary Amsden wrote: +#define GET_CONTENTS(desc) (((desc)-raw32.b 10) 3) +#define GET_WRITABLE(desc) (((desc)-raw32.b 9) 1) You got rid of the duplicate definitions here, but then added new duplicates (GET_CONTENTS / WRITABLE). Can you stick them in desc.h? To be honest, I got sick of counting bits at this point, and didn't want to introduce bugs. Here's the updated version of PATCH 1/3: And 2/3: === i386: use x86_64's desc_def.h plus this needed as well now Index: linus-2.6/include/asm-i386/xen/hypercall.h === --- linus-2.6.orig/include/asm-i386/xen/hypercall.h +++ linus-2.6/include/asm-i386/xen/hypercall.h @@ -359,8 +359,8 @@ MULTI_update_descriptor(struct multicall mcl-op = __HYPERVISOR_update_descriptor; mcl-args[0] = maddr; mcl-args[1] = maddr 32; - mcl-args[2] = desc.a; - mcl-args[3] = desc.b; + mcl-args[2] = desc.raw32.a; + mcl-args[3] = desc.raw32.b; } static inline void Index: linus-2.6/drivers/lguest/interrupts_and_traps.c === --- linus-2.6.orig/drivers/lguest/interrupts_and_traps.c +++ linus-2.6/drivers/lguest/interrupts_and_traps.c @@ -103,9 +103,9 @@ void maybe_do_interrupt(struct lguest *l } idt = lg-idt[FIRST_EXTERNAL_VECTOR+irq]; - if (idt_present(idt-a, idt-b)) { + if (idt_present(idt-raw32.a, idt-raw32.b)) { clear_bit(irq, lg-irqs_pending); - set_guest_interrupt(lg, idt-a, idt-b, 0); + set_guest_interrupt(lg, idt-raw32.a, idt-raw32.b, 0); } } @@ -116,7 +116,7 @@ static int has_err(unsigned int trap) int deliver_trap(struct lguest *lg, unsigned int num) { - u32 lo = lg-idt[num].a, hi = lg-idt[num].b; + u32 lo = lg-idt[num].raw32.a, hi = lg-idt[num].raw32.b; if (!idt_present(lo, hi)) return 0; @@ -139,7 +139,7 @@ static int direct_trap(const struct lgue return 0; /* Interrupt gates (0xE) or not present (0x0) can't go direct. 
*/ - return idt_type(trap-a, trap-b) == 0xF; + return idt_type(trap-raw32.a, trap-raw32.b) == 0xF; } void pin_stack_pages(struct lguest *lg) @@ -170,15 +170,15 @@ static void set_trap(struct lguest *lg, u8 type = idt_type(lo, hi); if (!idt_present(lo, hi)) { - trap-a = trap-b = 0; + trap-raw32.a = trap-raw32.b = 0; return; } if (type != 0xE type != 0xF) kill_guest(lg, bad IDT type %i, type); - trap-a = ((__KERNEL_CS|GUEST_PL)16) | (lo0x); - trap-b = (hi0xEF00); + trap-raw32.a = ((__KERNEL_CS|GUEST_PL)16) | (lo0x); + trap-raw32.b = (hi0xEF00); } void load_guest_idt_entry(struct lguest *lg, unsigned int num, u32 lo, u32 hi) @@ -204,8 +204,8 @@ static void default_idt_entry(struct des if (trap == LGUEST_TRAP_ENTRY) flags |= (GUEST_PL 13); - idt-a = (LGUEST_CS16) | (handler0x); - idt-b = (handler0x) | flags; + idt-raw32.a = (LGUEST_CS16) | (handler0x); + idt-raw32.b = (handler0x) | flags; } void setup_default_idt_entries(struct lguest_ro_state *state, Index: linus-2.6/drivers/lguest/lg.h === --- linus-2.6.orig/drivers/lguest/lg.h +++ linus-2.6/drivers/lguest/lg.h @@ -44,8 +44,8 @@ void free_pagetables(void); int init_pagetables(struct page **switcher_page, unsigned int pages); /* Full 4G segment descriptors, suitable for CS and DS. 
*/ -#define FULL_EXEC_SEGMENT ((struct desc_struct){0x, 0x00cf9b00}) -#define FULL_SEGMENT ((struct desc_struct){0x, 0x00cf9300}) +#define FULL_EXEC_SEGMENT ((struct desc_struct){ {0x00cf9b00ULL} }) +#define FULL_SEGMENT ((struct desc_struct){ {0x00cf9300ULL} }) struct lguest_dma_info { Index: linus-2.6/drivers/lguest/lguest.c === --- linus-2.6.orig/drivers/lguest/lguest.c +++ linus-2.6/drivers/lguest/lguest.c @@ -173,7 +173,7 @@ static void lguest_load_idt(const struct struct desc_struct *idt = (void *)desc-address; for (i = 0; i (desc-size+1)/8; i++) - hcall(LHCALL_LOAD_IDT_ENTRY, i, idt[i].a, idt[i].b); + hcall(LHCALL_LOAD_IDT_ENTRY, i, idt[i].raw32.a, idt[i].raw32.b); } static void lguest_load_gdt(const struct Xgt_desc_struct *desc) Index: linus-2.6/drivers/lguest/segments.c === --- linus-2.6.orig/drivers/lguest/segments.c +++ linus-2.6/drivers/lguest/segments.c @@ -3,12 +3,12 @@ static int
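The GET_CONTENTS and GET_WRITABLE macros quoted at the top of this message pick individual fields out of the high 32-bit word of a descriptor. The sketch below reads them as shift-and-mask operations (`>>` and `&`), which is an assumption about the quoted text; the struct is a hypothetical two-word stand-in for the real descriptor type, and the low word 0x0000ffff used in the usage note is likewise assumed.

```c
#include <stdint.h>

/* Hypothetical stand-in for the descriptor layout under discussion:
 * just the two raw 32-bit halves. */
struct desc_sketch {
	struct {
		uint32_t a, b;
	} raw32;
};

/* Bits 10-11 of the high word: the "contents" field. */
#define GET_CONTENTS(desc)	(((desc)->raw32.b >> 10) & 3)
/* Bit 9 of the high word: writable (data) or readable (code). */
#define GET_WRITABLE(desc)	(((desc)->raw32.b >> 9) & 1)
```

With the high words that appear in FULL_EXEC_SEGMENT and FULL_SEGMENT above (0x00cf9b00 and 0x00cf9300), GET_CONTENTS gives 2 and 0 respectively, and GET_WRITABLE gives 1 for both.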
Re: [PATCH 3/3] i386: Replace struct Xgt_desc_struct with struct desc_ptr
* Rusty Russell ([EMAIL PROTECTED]) wrote: Remove i386's Xgt_desc_struct definition and use desc_def.h's desc_ptr.

plus this is needed now

Index: linus-2.6/drivers/lguest/lg.h
===
--- linus-2.6.orig/drivers/lguest/lg.h
+++ linus-2.6/drivers/lguest/lg.h
@@ -91,13 +91,13 @@ struct lguest_ro_state {
 	/* Host information we need to restore when we switch back. */
 	u32 host_cr3;
-	struct Xgt_desc_struct host_idt_desc;
-	struct Xgt_desc_struct host_gdt_desc;
+	struct desc_ptr host_idt_desc;
+	struct desc_ptr host_gdt_desc;
 	u32 host_sp;
 	/* Fields which are used when guest is running. */
-	struct Xgt_desc_struct guest_idt_desc;
-	struct Xgt_desc_struct guest_gdt_desc;
+	struct desc_ptr guest_idt_desc;
+	struct desc_ptr guest_gdt_desc;
 	struct i386_hw_tss guest_tss;
 	struct desc_struct guest_idt[IDT_ENTRIES];
 	struct desc_struct guest_gdt[GDT_ENTRIES];
Index: linus-2.6/arch/i386/xen/enlighten.c
===
--- linus-2.6.orig/arch/i386/xen/enlighten.c
+++ linus-2.6/arch/i386/xen/enlighten.c
@@ -301,7 +301,7 @@ static void xen_set_ldt(const void *addr
 	xen_mc_issue(PARAVIRT_LAZY_CPU);
 }
-static void xen_load_gdt(const struct Xgt_desc_struct *dtr)
+static void xen_load_gdt(const struct desc_ptr *dtr)
 {
 	unsigned long *frames;
 	unsigned long va = dtr->address;
@@ -401,7 +401,7 @@ static int cvt_gate_to_trap(int vector,
 }
 /* Locations of each CPU's IDT */
-static DEFINE_PER_CPU(struct Xgt_desc_struct, idt_desc);
+static DEFINE_PER_CPU(struct desc_ptr, idt_desc);
 /* Set an IDT entry. If the entry is part of the current IDT, then
    also update Xen. */
@@ -433,7 +433,7 @@ static void xen_write_idt_entry(struct d
 	preempt_enable();
 }
-static void xen_convert_trap_info(const struct Xgt_desc_struct *desc,
+static void xen_convert_trap_info(const struct desc_ptr *desc,
 				  struct trap_info *traps)
 {
 	unsigned in, out, count;
@@ -452,7 +452,7 @@ static void xen_convert_trap_info(const
 void xen_copy_trap_info(struct trap_info *traps)
 {
-	const struct Xgt_desc_struct *desc = &__get_cpu_var(idt_desc);
+	const struct desc_ptr *desc = &__get_cpu_var(idt_desc);
 	xen_convert_trap_info(desc, traps);
 }
@@ -460,7 +460,7 @@ void xen_copy_trap_info(struct trap_info
 /* Load a new IDT into Xen. In principle this can be per-CPU, so we
    hold a spinlock to protect the static traps[] array (static because
    it avoids allocation, and saves stack space). */
-static void xen_load_idt(const struct Xgt_desc_struct *desc)
+static void xen_load_idt(const struct desc_ptr *desc)
 {
 	static DEFINE_SPINLOCK(lock);
 	static struct trap_info traps[257];
Index: linus-2.6/drivers/lguest/lguest.c
===
--- linus-2.6.orig/drivers/lguest/lguest.c
+++ linus-2.6/drivers/lguest/lguest.c
@@ -167,7 +167,7 @@ static void lguest_write_idt_entry(struc
 	hcall(LHCALL_LOAD_IDT_ENTRY, entrynum, low, high);
 }
-static void lguest_load_idt(const struct Xgt_desc_struct *desc)
+static void lguest_load_idt(const struct desc_ptr *desc)
 {
 	unsigned int i;
 	struct desc_struct *idt = (void *)desc->address;
@@ -176,7 +176,7 @@ static void lguest_load_idt(const struct
 		hcall(LHCALL_LOAD_IDT_ENTRY, i, idt[i].raw32.a, idt[i].raw32.b);
 }
-static void lguest_load_gdt(const struct Xgt_desc_struct *desc)
+static void lguest_load_gdt(const struct desc_ptr *desc)
 {
 	BUG_ON((desc->size+1)/8 != GDT_ENTRIES);
 	hcall(LHCALL_LOAD_GDT, __pa(desc->address), GDT_ENTRIES, 0);
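The `(desc->size+1)/8` expression in the lguest hunks above relies on the table-pointer convention that the size field holds the table length in bytes minus one, with eight bytes per descriptor. A minimal sketch of that calculation, assuming a hypothetical user-space mirror of the structure (the real desc_ptr definition lives in the kernel headers and is not reproduced here):

```c
#include <stdint.h>

/* Hypothetical mirror of a descriptor-table pointer: a 16-bit limit
 * (table size in bytes, minus one) followed by the table's address. */
struct desc_ptr_sketch {
	uint16_t size;
	uintptr_t address;
} __attribute__((packed));

/* Number of 8-byte descriptors the table holds. */
static unsigned int desc_entries(const struct desc_ptr_sketch *d)
{
	return ((unsigned int)d->size + 1) / 8;
}
```

A table of 32 descriptors therefore carries a size of 255, and one of 256 descriptors a size of 2047.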
Re: film at 11: kernel update breaks udev.
On Sat, Jul 21, 2007 at 03:28:12AM +0200, Kay Sievers wrote: On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote: On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote: On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote: Just one of my machines to 2.6.22.1, and got this during boot.. Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists Under 2.6.21, all was fine. sdc is one disk of a 3 disk raid5 set. The raidset still manages to come up despite this. This is a Fedora 7 box, with udev-106-4.1.fc7 What changed this time? CONFIG_BLK_DEV_BSG=y? There's a name-clash, because bsg tries to create devices with the same name. James sent a patch, it's on lkml. BSG isn't in 2.6.22 Ok. Nothing else has changed that I can think of that could cause this. The code in udev that prints this message looks like: err("symlink(%s, %s) failed: %s", linktarget, filename, strerror(errno)); That doesn't really match what you posted. Are there chars missing?

Umm. Now I'm confused. Note above that it's talking about sdc. /dev/disk/by-uuid/ contains ..

lrwxrwxrwx 1 root root  9 2007-07-17 20:35 2d773baf-8174-10a6-14db-a78e0e676e89 -> ../../sdd
lrwxrwxrwx 1 root root 10 2007-07-20 18:44 3B69-1AFD -> ../../sdl1
lrwxrwxrwx 1 root root 10 2007-07-20 19:06 46A1-3FCB -> ../../sdi1
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 4e728818-fcf1-21ee-07a5-302b72bc6129 -> ../../sdc1
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 5f435361-5797-4a8c-a285-c72fa455d401 -> ../../sda1
lrwxrwxrwx 1 root root  9 2007-07-17 20:35 9502a546-dd98-41df-8916-45032d801b69 -> ../../md0
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 ed102ac9-5615-c34b-5fe7-1a9029705ebf -> ../../sda2

note that uuid matches sdd instead.

And what does: udevtest /block/sdc print?

parse_file: reading '/etc/udev/rules.d/05-udev-early.rules' as rules file
parse_file: reading '/etc/udev/rules.d/40-multipath.rules' as rules file
parse_file: reading '/etc/udev/rules.d/50-udev.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-libsane.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-net.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-pcmcia.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-wacom.rules' as rules file
parse_file: reading '/etc/udev/rules.d/85-pcscd_ccid.rules' as rules file
parse_file: reading '/etc/udev/rules.d/85-pcscd_egate.rules' as rules file
parse_file: reading '/etc/udev/rules.d/90-alsa.rules' as rules file
parse_file: reading '/etc/udev/rules.d/90-hal.rules' as rules file
parse_file: reading '/etc/udev/rules.d/95-pam-console.rules' as rules file
parse_file: reading '/etc/udev/rules.d/bluetooth.rules' as rules file
This program is for debugging only, it does not create any node, or run any program specified by a RUN key. It may show incorrect results, if rules match against subsystem specfic kernel event variables.
main: looking at device '/block/sdc' from subsystem 'block'
run_program: '/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath''
run_program: '/bin/bash' (stdout) 'dm_multipath 28889 0 '
run_program: '/bin/bash' returned with status 0
run_program: '/lib/udev/usb_id -x'
run_program: '/lib/udev/usb_id' returned with status 1
run_program: '/lib/udev/scsi_id -g -x -s /block/sdc -d /dev/.tmp-8-32'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_VENDOR=ATA'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_MODEL=WDC_WD2500KS-00M'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_REVISION=02.0'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL=SATA_WDC_WD2500KS-00_WD-WCANK6187088'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL_SHORT=WD-WCANK6187088'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_TYPE=disk'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_BUS=scsi'
run_program: '/lib/udev/scsi_id' returned with status 0
udev_rules_get_name: add symlink 'disk/by-id/scsi-SATA_WDC_WD2500KS-00_WD-WCANK6187088'
run_program: '/lib/udev/path_id /block/sdc'
run_program: '/lib/udev/path_id' (stdout) 'ID_PATH=pci-:05:05.0-scsi-0:0:0:0'
run_program: '/lib/udev/path_id' returned with status 0
udev_rules_get_name: add symlink 'disk/by-path/pci-:05:05.0-scsi-0:0:0:0'
run_program: '/lib/udev/vol_id --export /dev/.tmp-8-32'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_USAGE=raid'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_TYPE=linux_raid_member'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_VERSION=0.90.0'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_UUID=2d773baf-8174-10a6-14db-a78e0e676e89'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_LABEL='
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_LABEL_SAFE='
run_program: '/lib/udev/vol_id' returned with status 0
udev_rules_get_name: add symlink 'disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89'
Re: [PATCH] AFS: Fix file locking
On Fri, 20 Jul 2007, Nick Piggin wrote: So you did. Then to answer that, yes it could be faster because there are stupid volatiles sprinkled all over the bitops code so you could easily end up having to do more loads. Does it make a real difference? Unlikely, but David loves counting cycles :)

I thought we long long since removed the volatiles. They are buggy and horrible, and we really want to let the compiler combine multiple test-bits, and if they matter that implies locking is buggy or something worse.. Ie we'd *want*

	if (test_bit(x, y) || test_bit(z,y))

to be rewritten by the compiler as testing bits x/z at the same time. But now I'm too scared to look.

Linus
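The rewrite described above, two single-bit tests on the same word folded into one masked test, looks roughly like this when written out by hand. It is a sketch with hypothetical function names; whether a given compiler actually performs the transformation depends on the real test_bit() implementation and is not something this example demonstrates.

```c
/* Two independent single-bit tests on the same word. */
static int either_bit_separately(unsigned long word,
				 unsigned int x, unsigned int z)
{
	return ((word >> x) & 1UL) || ((word >> z) & 1UL);
}

/* The combined form: one load, one mask, one comparison.  A compiler
 * may only produce this if nothing (such as a volatile access) forces
 * it to read the word twice. */
static int either_bit_combined(unsigned long word,
			       unsigned int x, unsigned int z)
{
	return (word & ((1UL << x) | (1UL << z))) != 0;
}
```

The two functions agree for every input, which is what makes the combination a legal optimisation when the accesses are not volatile.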
Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570
On Fri, 20 Jul 2007 18:02:57 -0700 Greg KH [EMAIL PROTECTED] wrote:

--- a/kernel/params.c
+++ b/kernel/params.c
@@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
 	kobject_set_name(&mk->kobj, name);
 	kobject_init(&mk->kobj);
 	ret = kobject_add(&mk->kobj);
-	BUG_ON(ret < 0);
+	if (ret) {
+		printk(KERN_ERR "module '%s' failed to be added to sysfs, "
+		       "the system will be unstable now.\n", name);
+		return;
+	}

It would be nice to print the value of `ret' too.
Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570
On Fri, Jul 20, 2007 at 06:37:33PM -0700, Andrew Morton wrote: On Fri, 20 Jul 2007 18:02:57 -0700 Greg KH [EMAIL PROTECTED] wrote:

--- a/kernel/params.c
+++ b/kernel/params.c
@@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
 	kobject_set_name(&mk->kobj, name);
 	kobject_init(&mk->kobj);
 	ret = kobject_add(&mk->kobj);
-	BUG_ON(ret < 0);
+	if (ret) {
+		printk(KERN_ERR "module '%s' failed to be added to sysfs, "
+		       "the system will be unstable now.\n", name);
+		return;
+	}

It would be nice to print the value of `ret' too.

Ok, how about this version:

---
 kernel/params.c |    7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/kernel/params.c
+++ b/kernel/params.c
@@ -567,7 +567,12 @@ static void __init kernel_param_sysfs_se
 	kobject_set_name(&mk->kobj, name);
 	kobject_init(&mk->kobj);
 	ret = kobject_add(&mk->kobj);
-	BUG_ON(ret < 0);
+	if (ret) {
+		printk(KERN_ERR "Module '%s' failed to be added to sysfs, "
+		       "error number %d\n", name, ret);
+		printk(KERN_ERR "The system will be unstable now.\n");
+		return;
+	}
 	param_sysfs_setup(mk, kparam, num_params, name_skip);
 	kobject_uevent(&mk->kobj, KOBJ_ADD);
 }
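The change above swaps a hard stop for a logged error that names the module and the error number. The same shape can be sketched in user space; register_object() below is a hypothetical stand-in for the registration call (it is not kobject_add()), and it fails with -EINVAL for an empty name purely so that the error path can be exercised.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical registration call returning 0 or a negative errno. */
static int register_object(const char *name)
{
	if (name == NULL || name[0] == '\0')
		return -EINVAL;
	return 0;
}

/* Report which object failed and with what error number, then return
 * to the caller instead of halting. */
static int setup_module_entry(const char *name)
{
	int ret = register_object(name);

	if (ret) {
		fprintf(stderr,
			"Module '%s' failed to be added, error number %d\n",
			name ? name : "(null)", ret);
		return ret;
	}
	return 0;
}
```

Printing the error number is what lets a reader of the log tell a missing name apart from a duplicate one, which is the distinction being debated earlier in this thread.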
Re: film at 11: kernel update breaks udev.
On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote: On Sat, Jul 21, 2007 at 03:28:12AM +0200, Kay Sievers wrote: On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote: On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote: On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote: Just one of my machines to 2.6.22.1, and got this during boot.. Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists Under 2.6.21, all was fine. sdc is one disk of a 3 disk raid5 set. The raidset still manages to come up despite this. This is a Fedora 7 box, with udev-106-4.1.fc7 What changed this time? CONFIG_BLK_DEV_BSG=y? There's a name-clash, because bsg tries to create devices with the same name. James sent a patch, it's on lkml. BSG isn't in 2.6.22 Ok. Nothing else has changed that I can think of that could cause this. The code in udev that prints this message looks like: err("symlink(%s, %s) failed: %s", linktarget, filename, strerror(errno)); That doesn't really match what you posted. Are there chars missing?

Umm. Now I'm confused. Note above that it's talking about sdc. /dev/disk/by-uuid/ contains ..

lrwxrwxrwx 1 root root  9 2007-07-17 20:35 2d773baf-8174-10a6-14db-a78e0e676e89 -> ../../sdd
lrwxrwxrwx 1 root root 10 2007-07-20 18:44 3B69-1AFD -> ../../sdl1
lrwxrwxrwx 1 root root 10 2007-07-20 19:06 46A1-3FCB -> ../../sdi1
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 4e728818-fcf1-21ee-07a5-302b72bc6129 -> ../../sdc1
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 5f435361-5797-4a8c-a285-c72fa455d401 -> ../../sda1
lrwxrwxrwx 1 root root  9 2007-07-17 20:35 9502a546-dd98-41df-8916-45032d801b69 -> ../../md0
lrwxrwxrwx 1 root root 10 2007-07-17 20:35 ed102ac9-5615-c34b-5fe7-1a9029705ebf -> ../../sda2

note that uuid matches sdd instead.

And what does: udevtest /block/sdc print?
parse_file: reading '/etc/udev/rules.d/05-udev-early.rules' as rules file
parse_file: reading '/etc/udev/rules.d/40-multipath.rules' as rules file
parse_file: reading '/etc/udev/rules.d/50-udev.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-libsane.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-net.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-pcmcia.rules' as rules file
parse_file: reading '/etc/udev/rules.d/60-wacom.rules' as rules file
parse_file: reading '/etc/udev/rules.d/85-pcscd_ccid.rules' as rules file
parse_file: reading '/etc/udev/rules.d/85-pcscd_egate.rules' as rules file
parse_file: reading '/etc/udev/rules.d/90-alsa.rules' as rules file
parse_file: reading '/etc/udev/rules.d/90-hal.rules' as rules file
parse_file: reading '/etc/udev/rules.d/95-pam-console.rules' as rules file
parse_file: reading '/etc/udev/rules.d/bluetooth.rules' as rules file
This program is for debugging only, it does not create any node,
or run any program specified by a RUN key. It may show incorrect results,
if rules match against subsystem specfic kernel event variables.

main: looking at device '/block/sdc' from subsystem 'block'
run_program: '/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath''
run_program: '/bin/bash' (stdout) 'dm_multipath 28889 0 '
run_program: '/bin/bash' returned with status 0
run_program: '/lib/udev/usb_id -x'
run_program: '/lib/udev/usb_id' returned with status 1
run_program: '/lib/udev/scsi_id -g -x -s /block/sdc -d /dev/.tmp-8-32'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_VENDOR=ATA'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_MODEL=WDC_WD2500KS-00M'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_REVISION=02.0'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL=SATA_WDC_WD2500KS-00_WD-WCANK6187088'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_SERIAL_SHORT=WD-WCANK6187088'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_TYPE=disk'
run_program: '/lib/udev/scsi_id' (stdout) 'ID_BUS=scsi'
run_program: '/lib/udev/scsi_id' returned with status 0
udev_rules_get_name: add symlink 'disk/by-id/scsi-SATA_WDC_WD2500KS-00_WD-WCANK6187088'
run_program: '/lib/udev/path_id /block/sdc'
run_program: '/lib/udev/path_id' (stdout) 'ID_PATH=pci-:05:05.0-scsi-0:0:0:0'
run_program: '/lib/udev/path_id' returned with status 0
udev_rules_get_name: add symlink 'disk/by-path/pci-:05:05.0-scsi-0:0:0:0'
run_program: '/lib/udev/vol_id --export /dev/.tmp-8-32'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_USAGE=raid'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_TYPE=linux_raid_member'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_VERSION=0.90.0'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_UUID=2d773baf-8174-10a6-14db-a78e0e676e89'
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_LABEL='
run_program: '/lib/udev/vol_id' (stdout) 'ID_FS_LABEL_SAFE='
run_program: '/lib/udev/vol_id' returned with status 0
udev_rules_get_name: add symlink
Re: [RFC, Announce] Unified x86 architecture, arch/x86
On Sat, 21 Jul 2007, Arnd Bergmann wrote:

> On Saturday 21 July 2007, Thomas Gleixner wrote:
>
> In my experience, it's very helpful to have a single set of header files,
> and merging the two versions of one header usually exposes bugs that have
> been fixed in only one of the two, so you get to fix actual bugs in the
> process.

This can still be done after the merge tglx did.

> In the s390 merge, I also started out in an attempt to guarantee unchanged
> object files, much like what you describe. However, it turned out that
> fixing it in the process is actually easier. Either way,
> 'diff -D __x86_64__' is a great tool for a start, you should try it out to
> see how easy it is to merge a lot of files.
>
> To put it into perspective, I think the s390 merge was a lot easier than
> the x86 merge, because there is only a very limited set of hardware
> configurations for s390 compared to others. We ended up doing the full
> merge with three people within less than a week and no separate files at
> all.

This is the big reason they wanted to keep it binary identical, since there
are just way too many different configs out there in the x86 world.

> OTOH, the powerpc merge is now going into its third year, mostly because it
> was started with the intention to remove all cruft in the process and to
> only allow sane code into the new architecture. I'd expect x86 to move much
> faster, just because there are more developers and users of x86 PCs than
> there are for powerpc.
>
> The steps that I'd suggest instead are:
>
> * merge all exported header files of the two architectures. This alone is
>   a worthy goal, because it allows us to get rid of the ugly code for
>   deciding which version to use in installed headers and elsewhere.

I don't see why this can't be done after the first Big merge.

> * Create an arch/x86/Makefile that descends into ../i386/* and
>   ../x86_64/* instead of its subdirectories.

The thing that Thomas pointed out is that the physical location of the
source actually does matter. Having two files side by side with the same
name except for a _32.c and _64.c makes a developer want to merge them. A
perfect example is looking at both arch/x86/kernel/module_{32,64}.c. One
would be encouraged to make that into a single file. But with an
arch/i386/kernel/module.c and an arch/x86_64/kernel/module.c, it would take
some time before anyone would care.

> * Merge the arch/x86/* subdirectories, one at a time, starting with the
>   low-hanging fruit like oprofile or pci, and do the hard ones like mm and
>   kernel last.

You're looking at a 10-year-plus merge with that approach. I think that is
exactly what Ingo and Thomas _don't_ want. Doing it the big bang way, as
posted in this patch, is the fastest way to get where we want to go.

> Unfortunately, I don't think I'll spend much time on this, so I don't get
> to decide on it, but you asked for feedback ;-)

I'm actually looking forward to helping out here ;-)

-- Steve
Re: film at 11: kernel update breaks udev.
On 7/21/07, Kay Sievers [EMAIL PROTECTED] wrote: On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote: On Sat, Jul 21, 2007 at 03:28:12AM +0200, Kay Sievers wrote: On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote: On Sat, Jul 21, 2007 at 03:09:55AM +0200, Kay Sievers wrote: On 7/21/07, Dave Jones [EMAIL PROTECTED] wrote: Just one of my machines to 2.6.22.1, and got this during boot.. Starting udev: udevd-event[619]: udev_node_symlink: symlink(../../sdc/dev/disk/by-uuid/2d773baf-8174-10a6-14db-a78e0e676e89) failed: File exists Under 2.6.21, all was fine. sdc is one disk of a 3 disk raid5 set. The raidset still manages to come up despite this. This is a Fedora 7 box, with udev-106-4.1.fc7 What changed this time? CONFIG_BLK_DEV_BSG=y? There's a name-clash, because bsg tries to create devices with the same name. James sent a patch, it's on lkml. BSG isn't in 2.6.22 Ok. There has nothing else changed, that I could think of what could cause this. The code in udev that prints this message looks like: err(symlink(%s, %s) failed: %s, linktarget, filename, strerror(errno)); That doesn't really match what you posted. Are there chars missing? Umm. Now I'm confused. Note above that it's talking about sdc. /dev/disk/by-uuid/ contains .. lrwxrwxrwx 1 root root 9 2007-07-17 20:35 2d773baf-8174-10a6-14db-a78e0e676e89 - ../../sdd lrwxrwxrwx 1 root root 10 2007-07-20 18:44 3B69-1AFD - ../../sdl1 lrwxrwxrwx 1 root root 10 2007-07-20 19:06 46A1-3FCB - ../../sdi1 lrwxrwxrwx 1 root root 10 2007-07-17 20:35 4e728818-fcf1-21ee-07a5-302b72bc6129 - ../../sdc1 lrwxrwxrwx 1 root root 10 2007-07-17 20:35 5f435361-5797-4a8c-a285-c72fa455d401 - ../../sda1 lrwxrwxrwx 1 root root 9 2007-07-17 20:35 9502a546-dd98-41df-8916-45032d801b69 - ../../md0 lrwxrwxrwx 1 root root 10 2007-07-17 20:35 ed102ac9-5615-c34b-5fe7-1a9029705ebf - ../../sda2 note that uuid matches sdd instead. And what does: udevtest /block/sdc print? 
Re: net/ipv4/inetpeer.c stack warnings
From: Patrick McHardy [EMAIL PROTECTED]
Date: Thu, 19 Jul 2007 14:48:59 +0200

> Gabriel C wrote:
> > Hello,
> >
> > I noticed on current git this warning in net/ipv4/inetpeer.c
>
> Yeah, I have no idea why the gcc people thought that this was
> something worth warning about. Especially since explicitly checking
> for != NULL silences the warning again.

Sigh, applied :-)
Re: [RFC, Announce] Unified x86 architecture, arch/x86
On 7/20/07, Steven Rostedt [EMAIL PROTECTED] wrote:
> > I really like the idea of a unified source tree for the 2 x86 variants.
> > The technical differences are really small (of course there are
> > differences, especially in the boot sequence), and striving to unify
> > as much as possible while having a clean way to do per 32/64 bit parts
> > as well is something that imo is the right thing.
>
> Not to mention all the paravirt stuff that's going on. Having a single x86
> arch to work with would be greatly beneficial to the work being done to
> port paravirt to x86_64.

As for paravirt, it'd really help. As I had the tree lagged behind by so
much, a great part of the work now is checking where i386 is, seeing
if it applies for 64-bit, and so on. The differences are not so huge,
and I'm trying my best to not let them deviate too much. It could
mostly be built incrementally. And I bet a huge part of the tree could
be like this too: in most places, they are different for no particular
reason, just because two people implemented it separately.

There'd be a huge effort to bring those differences to an end, but I
think it'd pay off in future development speed (not to mention the
duplicate bugs Linus has already talked about).

> Way to go, Thomas and Ingo!

I am pretty much for it too.

--
Glauber de Oliveira Costa.
Free as in Freedom
http://glommer.net

The less confident you are, the more serious you have to act.
Re: PROBLEM: Dell Inspiron 1501 fails to boot in 2.6.21+
On 7/20/07, Mark Tiefenbruck [EMAIL PROTECTED] wrote:
> I'd appreciate any help on getting this report sent to the appropriate
> list and, of course, getting this fixed. I don't know what's useful,
> so you're getting everything. This will be a very long e-mail.
>
> My new laptop won't boot with kernel versions 2.6.21 or 2.6.22. No
> oops. No panic. It just stops printing messages. Maybe it would
> eventually continue if I wait long enough, but it's unacceptable
> either way.
>
> I include below the contents of dmesg for a working kernel up to the
> point where it halts. I'm also including what it usually does for a
> few lines after that point.
>
> I did git-bisect on the 2.6.21.y tree. I'm including the result of
> that as well. It mentions HPET, so I should mention my computer also
> fails to boot when I enable HPET in my BIOS. I don't have the details
> of this currently; I can reproduce it again if needed.
>
> I've also included my kernel configuration and ver_linux output.
> You'll notice that my gcc version is 4.2.0, but this also happens with
> 4.1.2.
>
> I'm including /proc/cpuinfo and lspci -vvv.
>
> I'm including /proc/ioports and /proc/iomem. I don't have a /proc/scsi.
>
> Thanks,
> Mark
>
> Here's the commit that causes the problem:
>
> e9e2cdb412412326c4827fc78ba27f410d837e6e is first bad commit
> commit e9e2cdb412412326c4827fc78ba27f410d837e6e
> Author: Thomas Gleixner [EMAIL PROTECTED]
> Date:   Fri Feb 16 01:28:04 2007 -0800
>
>     [PATCH] clockevents: i386 drivers
>
>     Add clockevent drivers for i386: lapic (local) and PIT/HPET (global).
>     Update the timer IRQ to call into the PIT/HPET driver's event handler
>     and the lapic-timer IRQ to call into the lapic clockevent driver. The
>     assignement of timer functionality is delegated to the core framework
>     code and replaces the compile and runtime evalution in
>     do_timer_interrupt_hook()
>
>     Use the clockevents broadcast support and implement the
>     lapic_broadcast function for ACPI.
>
>     No changes to existing functionality.
>
>     [ kdump fix from Vivek Goyal [EMAIL PROTECTED] ]
>     [ fixes based on review feedback from Arjan van de Ven [EMAIL PROTECTED] ]
>     Cleanups-from: Adrian Bunk [EMAIL PROTECTED]
>     Build-fixes-from: Andrew Morton [EMAIL PROTECTED]
>     Signed-off-by: Thomas Gleixner [EMAIL PROTECTED]
>     Signed-off-by: Ingo Molnar [EMAIL PROTECTED]
>     Cc: john stultz [EMAIL PROTECTED]
>     Cc: Roman Zippel [EMAIL PROTECTED]
>     Cc: Andi Kleen [EMAIL PROTECTED]
>     Signed-off-by: Andrew Morton [EMAIL PROTECTED]
>     Signed-off-by: Linus Torvalds [EMAIL PROTECTED]

As a wild guess, I'd bet that the rcu queues are failing to get called
(probably some problem with the timer interrupt in the APs?), thus
preventing the system from getting into a quiescent state. It does seem
timer related to me.

Maybe one of the timer gurus has another word on this?

--
Glauber de Oliveira Costa.
Free as in Freedom
http://glommer.net

The less confident you are, the more serious you have to act.
Re: [RFC, Announce] Unified x86 architecture, arch/x86
On 7/20/07, Ingo Molnar [EMAIL PROTECTED] wrote:
>
> * Jeff Garzik [EMAIL PROTECTED] wrote:
>
> > I agree with Andi... it's quite nice to be able to leave some
> > arch/i386 stuff, and not carry it over to arch/x86-64.
>
> we can leave those few items in arch/x86 just as much. No need to keep
> around a legacy tree for that.

How about making all files and directories take _32 or _64 in the name,
except the files or directories that are shared?

For example:
k8_bus.c is only needed by 64 ===> change it to k8_bus_64.c
mach-generic ===> mach-generic_32

YH
Re: [PATCH 4/5] [V2] Define is_global_init() and is_container_init()
Andrew Morton [EMAIL PROTECTED] wrote:
| On Thu, 19 Jul 2007 00:21:58 -0700
| [EMAIL PROTECTED] wrote:
|
| > --- lx26-22-rc6-mm1a.orig/kernel/pid.c	2007-07-16 12:55:15.0 -0700
| > +++ lx26-22-rc6-mm1a/kernel/pid.c	2007-07-16 13:10:48.0 -0700
| > @@ -69,6 +69,13 @@ struct pid_namespace init_pid_ns = {
| >  	.last_pid = 0,
| >  	.child_reaper = &init_task
| >  };
| > +EXPORT_SYMBOL(init_pid_ns);
| > +
| > +int is_global_init(struct task_struct *tsk)
| > +{
| > +	return tsk == init_pid_ns.child_reaper;
| > +}
| > +EXPORT_SYMBOL(is_global_init);
|
| I don't immediately see why init_pid_ns was exported to modules.
|
| It would need to be exported if is_global_init() was made static inline in a
| header (which seems like a sensible thing to do), but it wasn't.

It did not need to be exported in this patch. I have a couple of follow-on
patches that cleaned up some header-file dependencies and made
is_global_init() inline. Those patches are changing a bit as I merge them
with Pavel Emelianov's pid ns changes.

I will send a separate patch to inline is_global_init().

Suka
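The export question above comes down to where the comparison is compiled. A rough userspace mock-up of the inline variant is sketched below. The struct definitions are cut-down stand-ins for the kernel types, and only the names init_task, init_pid_ns, child_reaper and is_global_init() come from the quoted patch; everything else is assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down mock-ups of the kernel types; the fields are illustrative. */
struct task_struct {
	int pid;
};

struct pid_namespace {
	int last_pid;
	struct task_struct *child_reaper;
};

/* In the kernel this object lives in kernel/pid.c. Once the helper below is
 * inline in a header, each caller (modules included) refers to init_pid_ns
 * directly, which is the case where the symbol would need exporting. */
static struct task_struct init_task = { .pid = 0 };
static struct pid_namespace init_pid_ns = {
	.last_pid = 0,
	.child_reaper = &init_task,
};

/* Header-style inline form of the helper from the quoted patch. */
static inline int is_global_init(const struct task_struct *tsk)
{
	assert(tsk != NULL);	/* callers must pass a valid task */
	return tsk == init_pid_ns.child_reaper;
}
```

With the out-of-line version from the patch, a module only needs the is_global_init symbol; with the inline version it needs init_pid_ns instead.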
Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570
[ Considering this has sufficiently excited me, I became the second
person to illegitimately download 2.6.22-mm1 and am presently building
Michal's config. The strange thing is that I couldn't get 22-mm1 to even
build with the posted .config -- so had to deselect XFS, ATA, unionfs.
Hopefully this bug should be 100% reproducible at boot time anyway.
Don't care much for XFS and unionfs, but hoping deselecting ATA from
the config doesn't change the variables much in this equation. ]

On 7/21/07, Greg KH [EMAIL PROTECTED] wrote:
> On Fri, Jul 20, 2007 at 06:37:33PM -0700, Andrew Morton wrote:
> > On Fri, 20 Jul 2007 18:02:57 -0700 Greg KH [EMAIL PROTECTED] wrote:
> >
> > > --- a/kernel/params.c
> > > +++ b/kernel/params.c
> > > @@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
> > >  	kobject_set_name(&mk->kobj, name);
> > >  	kobject_init(&mk->kobj);
> > >  	ret = kobject_add(&mk->kobj);
> > > -	BUG_ON(ret < 0);
> > > +	if (ret) {
> > > +		printk(KERN_ERR "module '%s' failed to be added to sysfs, "
> > > +			"the system will be unstable now.\n", name);
> > > +		return;
> > > +	}
> >
> > It would be nice to print the value of `ret' too.

What I'm surprised about is that %eax doesn't seem to contain the
return value `ret' of kobject_add(). It's 1, which is funny, given:

	ret = kobject_add(&mk->kobj);
	BUG_ON(ret < 0);

One wouldn't expect BUG() -- or the corresponding exception handler --
to clobber registers, that would be a sad day.

> Ok, how about this version:
>
> ---
>  kernel/params.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> --- a/kernel/params.c
> +++ b/kernel/params.c
> @@ -567,7 +567,12 @@ static void __init kernel_param_sysfs_se
>  	kobject_set_name(&mk->kobj, name);
>  	kobject_init(&mk->kobj);
>  	ret = kobject_add(&mk->kobj);
> -	BUG_ON(ret < 0);
> +	if (ret) {
> +		printk(KERN_ERR "Module '%s' failed to be added to sysfs, "
> +			"error number %d\n", name, ret);
> +		printk(KERN_ERR "The system will be unstable now.\n");
> +		return;
> +	}
>  	param_sysfs_setup(mk, kparam, num_params, name_skip);
>  	kobject_uevent(&mk->kobj, KOBJ_ADD);
>  }

I'm building with this:

	if (ret) {
		printk("~ .%s.%d.%s. ~\n", name, ret, kparam->name);
		return;
	}

to also print out the evil kparam->name that caused us to crash.
When ret == EINVAL, name would be "", so not so helpful alone.

Also enabling netconsole, though I'm sure there's zero chance of
NET / ethXXX / netconsole being up _this_ early in the boot ...

Will keep you guys posted :-)

Satyam
Re: [PATCH] infiniband mlx4: potential leaks in __mlx4_ib_modify_qp
thanks, applied.
Re: [PATCH] Use descriptor's functions instead of inline assembly
* Glauber de Oliveira Costa ([EMAIL PROTECTED]) wrote: This patch provides a new set of functions for managing the descriptor tables that can be used instead of putting the raw assembly in .c files. Looks alright, some cleanups below Remodeling of store_tr() suggested by Frederik Deweerdt. Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED] diff --git a/arch/x86_64/kernel/head64.c b/arch/x86_64/kernel/head64.c index 6c34bdd..dde41d7 100644 --- a/arch/x86_64/kernel/head64.c +++ b/arch/x86_64/kernel/head64.c @@ -70,7 +70,7 @@ void __init x86_64_start_kernel(char * real_mode_data) for (i = 0; i IDT_ENTRIES; i++) set_intr_gate(i, early_idt_handler); - asm volatile(lidt %0 :: m (idt_descr)); + load_idt((const struct desc_ptr *)idt_descr); No need for extra casting early_printk(Kernel alive\n); diff --git a/arch/x86_64/kernel/reboot.c b/arch/x86_64/kernel/reboot.c index 7503068..7c50a12 100644 --- a/arch/x86_64/kernel/reboot.c +++ b/arch/x86_64/kernel/reboot.c @@ -11,6 +11,7 @@ #include linux/sched.h #include asm/io.h #include asm/delay.h +#include asm/desc.h #include asm/hw_irq.h #include asm/system.h #include asm/pgtable.h @@ -132,7 +133,7 @@ void machine_emergency_restart(void) } case BOOT_TRIPLE: - __asm__ __volatile__(lidt (%0): :r (no_idt)); + load_idt((const struct desc_ptr *)no_idt); same here, plus opportunity for cleanup __asm__ __volatile__(int3); reboot_type = BOOT_KBD; diff --git a/arch/x86_64/kernel/setup64.c b/arch/x86_64/kernel/setup64.c index 1200aaa..fef7290 100644 --- a/arch/x86_64/kernel/setup64.c +++ b/arch/x86_64/kernel/setup64.c @@ -224,8 +224,8 @@ void __cpuinit cpu_init (void) memcpy(cpu_gdt(cpu), cpu_gdt_table, GDT_SIZE); cpu_gdt_descr[cpu].size = GDT_SIZE; - asm volatile(lgdt %0 :: m (cpu_gdt_descr[cpu])); - asm volatile(lidt %0 :: m (idt_descr)); + load_gdt((const struct desc_ptr *)cpu_gdt_descr[cpu]); + load_idt((const struct desc_ptr *)idt_descr); same here memset(me-thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8); syscall_init(); 
diff --git a/arch/x86_64/kernel/suspend.c b/arch/x86_64/kernel/suspend.c index b39d478..ddedadf 100644 --- a/arch/x86_64/kernel/suspend.c +++ b/arch/x86_64/kernel/suspend.c @@ -32,9 +32,9 @@ void __save_processor_state(struct saved_context *ctxt) /* * descriptor tables */ - asm volatile (sgdt %0 : =m (ctxt-gdt_limit)); - asm volatile (sidt %0 : =m (ctxt-idt_limit)); - asm volatile (str %0 : =m (ctxt-tr)); + store_gdt((struct desc_ptr *)ctxt-gdt_limit); + store_idt((struct desc_ptr *)ctxt-idt_limit); same here, opportunity for cleanup + store_tr(ctxt-tr); /* XMM0..XMM15 should be handled by kernel_fpu_begin(). */ /* @@ -91,8 +91,9 @@ void __restore_processor_state(struct saved_context *ctxt) * now restore the descriptor tables to their proper values * ltr is done i fix_processor_context(). */ - asm volatile (lgdt %0 :: m (ctxt-gdt_limit)); - asm volatile (lidt %0 :: m (ctxt-idt_limit)); + load_gdt((const struct desc_ptr *)ctxt-gdt_limit); + load_idt((const struct desc_ptr *)ctxt-idt_limit); + /* * segment registers diff --git a/include/asm-x86_64/desc.h b/include/asm-x86_64/desc.h index ac991b5..f2b0a6f 100644 --- a/include/asm-x86_64/desc.h +++ b/include/asm-x86_64/desc.h @@ -20,6 +20,15 @@ extern struct desc_struct cpu_gdt_table[GDT_ENTRIES]; #define load_LDT_desc() asm volatile(lldt %w0::r (GDT_ENTRY_LDT*8)) #define clear_LDT() asm volatile(lldt %w0::r (0)) +static inline unsigned long __store_tr(void) +{ + unsigned long tr; + asm volatile (str %w0:=r (tr)); + return tr; +} native_store_tr (although I've no objection to just fixing the interface) Index: linus-2.6/arch/x86_64/kernel/head64.c === --- linus-2.6.orig/arch/x86_64/kernel/head64.c +++ linus-2.6/arch/x86_64/kernel/head64.c @@ -70,7 +70,7 @@ void __init x86_64_start_kernel(char * r for (i = 0; i IDT_ENTRIES; i++) set_intr_gate(i, early_idt_handler); - load_idt((const struct desc_ptr *)idt_descr); + load_idt(idt_descr); early_printk(Kernel alive\n); Index: linus-2.6/arch/x86_64/kernel/reboot.c === --- 
linus-2.6.orig/arch/x86_64/kernel/reboot.c +++ linus-2.6/arch/x86_64/kernel/reboot.c @@ -24,7 +24,7 @@ void (*pm_power_off)(void); EXPORT_SYMBOL(pm_power_off); -static long no_idt[3]; +static struct desc_ptr no_idt; static enum { BOOT_TRIPLE = 't', BOOT_KBD = 'k' @@ -133,7 +133,7 @@ void machine_emergency_restart(void) }
Re: posible latency issues in seq_read
Chris Friesen wrote:
> Lee Revell wrote:
> > On 7/20/07, Chris Friesen [EMAIL PROTECTED] wrote:
> > > We've run into an issue (on 2.6.10) where calling lsof triggers lost
> > > packets on our server. Preempt is disabled, and NAPI is enabled.
> >
> > Can you reproduce with a recent kernel? Lots of latency issues have
> > been fixed since then.
>
> Unfortunately I have to fix it on this version (the bug was found on
> shipped product), so if there was a difference I'd have to isolate the
> changes and backport them. Also, I can't run the software that triggers
> the problem on a newer kernel as it has dependencies on various patches
> that are not in mainline.
>
> Basically what I'd like to know is whether calling schedule() in
> seq_read() is safe or whether it would break assumptions made by
> seq_file users.

It won't help much. seq_read() is fine in itself.

The problem is in established_get_next() and established_get_first() not
allowing softirq processing while scanning a possibly huge hash table,
even if few sockets are hashed in.

As cond_resched_softirq() was added in linux-2.6.11, you probably *need*
to check the diffs between linux-2.6.10 and linux-2.6.11 for these files:

include/linux/sched.h
net/core/sock.c (__release_sock() latency)
net/ipv4/tcp_ipv4.c (/proc/net/tcp latency)
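To illustrate the latency point above: walking a huge, mostly empty hash table costs time proportional to the number of buckets rather than the number of entries, so a long walk needs periodic points where pending work can run. The userspace sketch below shows only that shape. count_entries(), the relax callback and count_relax() are hypothetical names; the kernel-side counterpart mentioned in the reply is cond_resched_softirq(), which this code does not call.

```c
#include <assert.h>
#include <stddef.h>

struct entry {
	struct entry *next;
};

/* Demo hook that just counts how often the walker paused. */
static size_t relax_calls;
static void count_relax(void)
{
	relax_calls++;
}

/* Walk every bucket and count the chained entries. After each `batch`
 * buckets, call relax() so other work gets a chance to run, even when
 * almost all buckets are empty. */
static size_t count_entries(struct entry *const *buckets, size_t nbuckets,
			    size_t batch, void (*relax)(void))
{
	size_t total = 0;

	assert(batch > 0);
	for (size_t i = 0; i < nbuckets; i++) {
		for (const struct entry *e = buckets[i]; e != NULL; e = e->next)
			total++;
		if (relax != NULL && (i + 1) % batch == 0)
			relax();
	}
	return total;
}
```

The batch size trades scan throughput against worst-case latency; a real fix would also have to drop and retake whatever lock protects the table around the pause.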
Re: Documentation for sysfs, hotplug, and firmware loading.
On Friday 20 July 2007 4:09:36 am Greg KH wrote:
> On Fri, Jul 20, 2007 at 09:54:01AM +0200, Cornelia Huck wrote:
> > On Fri, 20 Jul 2007 00:00:01 -0700,
> > Greg KH [EMAIL PROTECTED] wrote:
> >
> > > > I don't insist on it, mknod insists on it. You cannot mknod a dev
> > > > node without specifying block or char. You're saying that sysfs
> > > > should provide major and minor numbers without anywhere specifying
> > > > char or block, meaning the major and minor numbers cannot be _used_.
> > > > I am insisting on getting the third piece of information without
> > > > which major and minor are useless.
> > > >
> > > > I asked very specifically about this at OLS, several times. What
> > > > you're telling me now seems to contradict what you told me then.
> > >
> > > Here's the rule:
> > >   If the SUBSYSTEM is block, it's a block device.
> > >   Otherwise it's a char device.
> >
> > That's actually quite confusing to the casual reader, since:
> >
> > > But also realize that the majority of events you will get have
> > > nothing to do with device nodes. I think you are forgetting this
> > > fact.
> >
> > So the rule should be:
> >   If the SUBSYSTEM is block (implying major/minor are provided), it's a
> >   block device.
> >   If the SUBSYSTEM is not block, and major/minor are provided, it's a
> >   char device.
> >   If major/minor are not provided, the event/device is not relevant to
> >   device node creation.
>
> Yes, that is much more descriptive, thanks.

agreed, thanks. I'll try to post an updated version of my hotplug
documentation later tonight. (Just a _touch_ jetlagged at the moment,
though. It may only be 9:47 california time, but it's 11:47 on the east
coast. I think.)

> greg k-h

Rob
--
One of my most productive days was throwing away 1000 lines of code.
  - Ken Thompson.
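The three-way rule agreed on above is small enough to write down as code. This is only a sketch of the decision, assuming the caller has already pulled the SUBSYSTEM value out of the event and knows whether major/minor numbers were supplied; classify_event() and the enum are made-up names, not part of any hotplug API.

```c
#include <assert.h>
#include <string.h>

enum node_kind { NODE_NONE, NODE_BLOCK, NODE_CHAR };

/* Apply the rule to one event. `subsystem` is the SUBSYSTEM value and
 * `has_devnum` is non-zero when the event supplied major/minor numbers.
 * Events without major/minor are not about device nodes at all, so that
 * check comes first. */
static enum node_kind classify_event(const char *subsystem, int has_devnum)
{
	assert(subsystem != NULL);

	if (!has_devnum)
		return NODE_NONE;
	if (strcmp(subsystem, "block") == 0)
		return NODE_BLOCK;
	return NODE_CHAR;
}
```

A caller would map NODE_BLOCK and NODE_CHAR to the block or char type that mknod insists on, and skip node creation for NODE_NONE.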
Re: [broken-out-2007-07-20-00-22] kernel bug at kernel/params:570
On 7/21/07, Satyam Sharma [EMAIL PROTECTED] wrote:
> Hopefully this bug should be 100% reproducible at boot time anyway.
> Don't care much for XFS and unionfs, but hoping deselecting ATA from
> the config doesn't change the variables much in this equation. ]

Gargh! My system obviously cannot boot without libata. Guess it's time
to go through git log and see how to fix that build breakage myself ...
Michal, how did you even manage to build / boot this kernel!

> On 7/21/07, Greg KH [EMAIL PROTECTED] wrote:
> > On Fri, Jul 20, 2007 at 06:37:33PM -0700, Andrew Morton wrote:
> > > On Fri, 20 Jul 2007 18:02:57 -0700 Greg KH [EMAIL PROTECTED] wrote:
> > >
> > > > --- a/kernel/params.c
> > > > +++ b/kernel/params.c
> > > > @@ -567,7 +567,11 @@ static void __init kernel_param_sysfs_se
> > > >  	kobject_set_name(&mk->kobj, name);
> > > >  	kobject_init(&mk->kobj);
> > > >  	ret = kobject_add(&mk->kobj);
> > > > -	BUG_ON(ret < 0);
> > > > +	if (ret) {
> > > > +		printk(KERN_ERR "module '%s' failed to be added to sysfs, "
> > > > +			"the system will be unstable now.\n", name);
> > > > +		return;
> > > > +	}
> > >
> > > It would be nice to print the value of `ret' too.
>
> What I'm surprised about is that %eax doesn't seem to contain the
> return value `ret' of kobject_add(). It's 1, which is funny, given:
>
> 	ret = kobject_add(&mk->kobj);
> 	BUG_ON(ret < 0);
>
> One wouldn't expect BUG() -- or the corresponding exception handler --
> to clobber registers, that would be a sad day.

But I cracked this one alright. His .config has CONFIG_PROFILE_LIKELY=y
which replaces unlikely() / likely() with do_check_likely() and forces
gcc to clobber %eax with the condition itself, which in our case was
(ret < 0) == TRUE, and thus, the 1 value we saw in %eax in the register
dumps. We should probably document somewhere that CONFIG_PROFILE_LIKELY
is not good for debugging.

Hmmm ... thinking out aloud here, but probably I don't need to fix that
libata breakage at all. I'll just put the BUG_ON(ret < 0) back in the
code, deselect PROFILE_LIKELY, and this time we _will_ have the return
of kobject_add() in %eax ...

That'll at least clear up the EEXIST vs EINVAL mystery, that'll be a
good data point, yes.

Anyway, I guess I must stop my running commentary -- will only post
after this is cleared up now :-)

Satyam
[PATCH] Kconfig: Remove top level menu Code maturity level options
This patch removes the top level menu "Code maturity level options", and
moves its options into menu "General setup".

This makes Kconfig less cluttered and easier to set up.

Cc: Andrew Morton [EMAIL PROTECTED]
Signed-off-by: Al Boldi [EMAIL PROTECTED]
---
--- a/init/Kconfig	2007-07-09 06:38:47.0 +0300
+++ b/init/Kconfig	2007-07-21 06:42:06.0 +0300
@@ -7,7 +7,7 @@ config DEFCONFIG_LIST
 	default "/boot/config-$UNAME_RELEASE"
 	default "arch/$ARCH/defconfig"
 
-menu "Code maturity level options"
+menu "General setup"
 
 config EXPERIMENTAL
 	bool "Prompt for development and/or incomplete code/drivers"
@@ -61,9 +61,6 @@ config INIT_ENV_ARG_LIMIT
 	  Maximum of each of the number of arguments and environment
 	  variables passed to init from the kernel command line.
 
-endmenu
-
-menu "General setup"
 
 config LOCALVERSION
 	string "Local version - append to kernel release"
[PATCH 0/8] readahead cleanups and interleaved readahead take 2
Linus,

To save you from some merge conflicts, I rebased this readahead patchset to 2.6.22-git5. The following patches are based on yesterday's discussions, compiled and tested OK.

smaller file_ra_state:
	[PATCH 1/8] compacting file_ra_state
	[PATCH 2/8] mmap read-around simplification
	[PATCH 3/8] combine file_ra_state.prev_index/prev_offset into prev_pos

code cleanups:
	[PATCH 4/8] trivial filemap.c cleanups
	[PATCH 5/8] remove several readahead macros
	[PATCH 6/8] remove the limit max_sectors_kb imposed on max_readahead_kb

support of interleaved reads:
	[PATCH 7/8] introduce radix_tree_scan_hole()
	[PATCH 8/8] basic support of interleaved reads

The diffstat is

 block/ll_rw_blk.c          |    9
 fs/ext3/dir.c              |    2
 fs/ext4/dir.c              |    2
 fs/splice.c                |    2
 include/linux/fs.h         |   14
 include/linux/mm.h         |    2
 include/linux/radix-tree.h |    2
 lib/radix-tree.c           |   34
 mm/filemap.c               |   31
 mm/readahead.c             |   58
 10 files changed, 92 insertions(+), 64 deletions(-)

Regards,
Fengguang Wu
---
[PATCH 8/8] basic support of interleaved reads
This is a simplified version of the pagecache context based readahead. It handles the case of multiple threads reading on the same fd and invalidating each other's readahead state. It does the trick by scanning the pagecache and recovering the current read stream's readahead status.

The algorithm works in an opportunistic way, in that it does not try to detect interleaved reads _actively_, which would require a probe into the page cache (and mean a little more overhead for random reads). It only tries to handle a previously started sequential readahead whose state was overwritten by another concurrent stream, and it can do this job pretty well.

Negative and positive examples (or what you can expect from it):

1) it cannot detect and serve perfect request-by-request interleaved reads right:

	time	stream 1	stream 2
	0	1
	1			1001
	2	2
	3			1002
	4	3
	5			1003
	6	4
	7			1004
	8	5
	9			1005

Here not a single readahead will be carried out.

2) However, if it's two concurrent reads by two threads, the chance that the initial sequential readahead gets started is huge. Once the first sequential readahead is started for a stream, this patch will ensure that the readahead window continues to ramp up and won't be disturbed by other streams.

	time	stream 1	stream 2
	0	1
	1	2
	2			1001
	3	3
	4			1002
	5			1003
	6	4
	7	5
	8			1004
	9	6
	10			1005
	11	7
	12			1006
	13			1007

Here stream 1 will start a readahead at page 2, and stream 2 will start its first readahead at page 1003. From then on the two streams will be served right.

Cc: Nick Piggin [EMAIL PROTECTED]
Cc: Rusty Russell [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 mm/readahead.c |   33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -371,6 +371,29 @@ ondemand_readahead(struct address_space
 	}

 	/*
+	 * Hit a marked page without valid readahead state.
+	 * E.g. interleaved reads.
+	 * Query the pagecache for async_size, which normally equals to
+	 * readahead size. Ramp it up and use it as the new readahead size.
+	 */
+	if (hit_readahead_marker) {
+		pgoff_t start;
+
+		read_lock_irq(&mapping->tree_lock);
+		start = radix_tree_scan_hole(&mapping->page_tree, offset, max+1);
+		read_unlock_irq(&mapping->tree_lock);
+
+		if (!start || start - offset > max)
+			return 0;
+
+		ra->start = start;
+		ra->size = start - offset;	/* old async_size */
+		ra->size = get_next_ra_size(ra, max);
+		ra->async_size = ra->size;
+		goto readit;
+	}
+
+	/*
 	 * It may be one of
 	 *	- first read on start of file
 	 *	- sequential cache miss
@@ -381,16 +404,6 @@ ondemand_readahead(struct address_space
 	ra->size = get_init_ra_size(req_size, max);
 	ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;

-	/*
-	 * Hit on a marked page without valid readahead state.
-	 * E.g. interleaved reads.
-	 * Not knowing its readahead pos/size, bet on the minimal possible one.
-	 */
-	if (hit_readahead_marker) {
-		ra->start++;
-		ra->size = get_next_ra_size(ra, max);
-	}
-
 readit:
 	return ra_submit(ra, mapping, filp);
 }
--
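The recovery step in this patch can be modelled in user space. What follows is a rough sketch only, under stated assumptions: the page cache is a plain bool array, the ramp-up rule is a simple doubling capped at max (the kernel's get_next_ra_size() may differ), and every name ending in _sketch is hypothetical. It shows how the distance from the marked page to the next hole is taken as the old async_size and turned into a new window.

```c
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-in for the three fields the patch touches. */
struct ra_state_sketch {
	unsigned long start;      /* where the next readahead starts */
	unsigned int size;        /* number of readahead pages */
	unsigned int async_size;  /* pages left when async readahead kicks in */
};

/* First index >= index that is not cached, scanning at most max_scan pages. */
static unsigned long scan_hole_sketch(const bool *cached, size_t npages,
				      unsigned long index,
				      unsigned long max_scan)
{
	unsigned long i;

	for (i = 0; i < max_scan; i++) {
		if (index >= npages || !cached[index])
			break;
		index++;
	}
	return index;
}

/* Assumed ramp-up rule for illustration: double the window, capped at max. */
static unsigned int next_ra_size_sketch(unsigned int cur, unsigned int max)
{
	unsigned int next = 2 * cur;

	return next < max ? next : max;
}

/*
 * Called when a read hits a readahead-marked page at 'offset' while the
 * per-file state belongs to another stream. Returns false when no usable
 * window can be recovered from the cache contents.
 */
static bool recover_ra_sketch(struct ra_state_sketch *ra, const bool *cached,
			      size_t npages, unsigned long offset,
			      unsigned int max)
{
	unsigned long start = scan_hole_sketch(cached, npages, offset, max + 1);

	if (!start || start - offset > max)
		return false;

	ra->start = start;
	ra->size = next_ra_size_sketch((unsigned int)(start - offset), max);
	ra->async_size = ra->size;
	return true;
}
```

With pages 10..17 cached and a marker hit at page 14, the hole is found at page 18, the old async_size is taken to be 4, and the recovered window starts at 18 with 8 pages. If more than max pages are cached ahead of the marker, the sketch gives up, mirroring the `return 0` in the patch.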
[PATCH 1/8] compacting file_ra_state
Use 'unsigned int' instead of 'unsigned long' for readahead sizes.

This helps reduce memory consumption on 64bit CPU when a lot of files are opened.

CC: Andi Kleen [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/fs.h |    8
 mm/readahead.c     |    2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/fs.h
+++ linux-2.6.22-git15/include/linux/fs.h
@@ -697,12 +697,12 @@ struct fown_struct {
  * Track a single file's readahead state
  */
 struct file_ra_state {
-	pgoff_t start;			/* where readahead started */
-	unsigned long size;		/* # of readahead pages */
-	unsigned long async_size;	/* do asynchronous readahead when
+	pgoff_t start;			/* where readahead started */
+	unsigned int size;		/* # of readahead pages */
+	unsigned int async_size;	/* do asynchronous readahead when
 					   there are only # of pages ahead */

-	unsigned long ra_pages;		/* Maximum readahead window */
+	unsigned int ra_pages;		/* Maximum readahead window */
 	unsigned long mmap_hit;		/* Cache hit stat for mmap accesses */
 	unsigned long mmap_miss;	/* Cache miss stat for mmap accesses */
 	unsigned long prev_index;	/* Cache last read() position */
--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -350,7 +350,7 @@ ondemand_readahead(struct address_space
 		   bool hit_readahead_marker, pgoff_t offset,
 		   unsigned long req_size)
 {
-	unsigned long max;	/* max readahead pages */
+	int max;	/* max readahead pages */
 	int sequential;

 	max = ra->ra_pages;
--
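The memory saving claimed above can be checked with sizeof. This is a small sketch, not the kernel's struct: it models only the four fields the patch touches, and pgoff_sketch_t, ra_wide and ra_compact are hypothetical names. On a typical LP64 ABI (8-byte long, 4-byte int) the compact layout comes out 8 bytes smaller; on ILP32 the two are the same size.

```c
#include <stddef.h>

typedef unsigned long pgoff_sketch_t;	/* stand-in for the kernel's pgoff_t */

struct ra_wide {		/* sizes held in unsigned long, as before the patch */
	pgoff_sketch_t start;
	unsigned long size;
	unsigned long async_size;
	unsigned long ra_pages;
};

struct ra_compact {		/* sizes held in unsigned int, as after the patch */
	pgoff_sketch_t start;
	unsigned int size;
	unsigned int async_size;
	unsigned int ra_pages;
};

/* Bytes saved per open file for these four fields on the current ABI. */
static size_t ra_bytes_saved(void)
{
	return sizeof(struct ra_wide) - sizeof(struct ra_compact);
}
```

In the real struct the fields that follow may fill the trailing padding, so the actual saving depends on the full layout; the sketch only shows the direction of the change.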
[PATCH 5/8] remove several readahead macros
Remove VM_MAX_CACHE_HIT, MAX_RA_PAGES and MIN_RA_PAGES.

Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/mm.h |    2 --
 mm/readahead.c     |   10 +-
 2 files changed, 1 insertion(+), 11 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/mm.h
+++ linux-2.6.22-git15/include/linux/mm.h
@@ -1136,8 +1136,6 @@ int write_one_page(struct page *page, in
 /* readahead.c */
 #define VM_MAX_READAHEAD	128	/* kbytes */
 #define VM_MIN_READAHEAD	16	/* kbytes (includes current page) */
-#define VM_MAX_CACHE_HIT	256	/* max pages in a row in cache before
-					 * turning readahead off */

 int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 			pgoff_t offset, unsigned long nr_to_read);
--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -21,16 +21,8 @@ void default_unplug_io_fn(struct backing
 }
 EXPORT_SYMBOL(default_unplug_io_fn);

-/*
- * Convienent macros for min/max read-ahead pages.
- * Note that MAX_RA_PAGES is rounded down, while MIN_RA_PAGES is rounded up.
- * The latter is necessary for systems with large page size(i.e. 64k).
- */
-#define	MAX_RA_PAGES	(VM_MAX_READAHEAD*1024 / PAGE_CACHE_SIZE)
-#define	MIN_RA_PAGES	DIV_ROUND_UP(VM_MIN_READAHEAD*1024, PAGE_CACHE_SIZE)
-
 struct backing_dev_info default_backing_dev_info = {
-	.ra_pages	= MAX_RA_PAGES,
+	.ra_pages	= VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
 	.state		= 0,
 	.capabilities	= BDI_CAP_MAP_COPY,
 	.unplug_io_fn	= default_unplug_io_fn,
--
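For reference, the arithmetic behind the removed macros is easy to work through. The sketch below takes the page size as a parameter instead of using PAGE_CACHE_SIZE, and the helper names and DIV_ROUND_UP_SKETCH are hypothetical; the two VM_* constants are the ones shown in the diff. It illustrates the comment being deleted: rounding down is fine for the maximum, but the minimum has to round up or it becomes zero on 64k pages.

```c
#define VM_MAX_READAHEAD 128	/* kbytes */
#define VM_MIN_READAHEAD 16	/* kbytes (includes current page) */

/* Local copy of the usual round-up division idiom. */
#define DIV_ROUND_UP_SKETCH(n, d) (((n) + (d) - 1) / (d))

/* Maximum readahead window in pages; integer division rounds down. */
static unsigned long max_ra_pages(unsigned long page_size)
{
	return VM_MAX_READAHEAD * 1024UL / page_size;
}

/* Minimum readahead window in pages; rounded up so it never reaches 0. */
static unsigned long min_ra_pages(unsigned long page_size)
{
	return DIV_ROUND_UP_SKETCH(VM_MIN_READAHEAD * 1024UL, page_size);
}
```

With 4 KiB pages this gives 32 and 4 pages. With 64 KiB pages the maximum is 2 pages and the minimum is 1 page, whereas a plain division of 16 KiB by 64 KiB would yield 0.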
[PATCH 6/8] remove the limit max_sectors_kb imposed on max_readahead_kb
Remove the size limit max_sectors_kb imposed on max_readahead_kb.

The size restriction is unreasonable. Especially when max_sectors_kb cannot grow larger than max_hw_sectors_kb, which can be rather small for some disk drives.

Cc: Jens Axboe [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
Acked-by: Jens Axboe [EMAIL PROTECTED]
---
 block/ll_rw_blk.c |    9 -
 1 file changed, 9 deletions(-)

--- linux-2.6.22-git15.orig/block/ll_rw_blk.c
+++ linux-2.6.22-git15/block/ll_rw_blk.c
@@ -3946,7 +3946,6 @@ queue_max_sectors_store(struct request_q
 			max_hw_sectors_kb = q->max_hw_sectors >> 1,
 			page_kb = 1 << (PAGE_CACHE_SHIFT - 10);
 	ssize_t ret = queue_var_store(&max_sectors_kb, page, count);
-	int ra_kb;

 	if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb)
 		return -EINVAL;
@@ -3955,14 +3954,6 @@ queue_max_sectors_store(struct request_q
 	 * values synchronously:
 	 */
 	spin_lock_irq(q->queue_lock);
-	/*
-	 * Trim readahead window as well, if necessary:
-	 */
-	ra_kb = q->backing_dev_info.ra_pages << (PAGE_CACHE_SHIFT - 10);
-	if (ra_kb > max_sectors_kb)
-		q->backing_dev_info.ra_pages =
-				max_sectors_kb >> (PAGE_CACHE_SHIFT - 10);
-
 	q->max_sectors = max_sectors_kb << 1;
 	spin_unlock_irq(q->queue_lock);

--
[PATCH 7/8] introduce radix_tree_scan_hole()
Introduce radix_tree_scan_hole(root, index, max_scan) to scan radix tree for the first hole. It will be used in interleaved readahead.

The implementation is dumb and obviously correct. It can help debug (and document) the possible smart one in future.

Cc: Nick Piggin [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/radix-tree.h |    2 ++
 lib/radix-tree.c           |   34 ++
 2 files changed, 36 insertions(+)

--- linux-2.6.22-git15.orig/include/linux/radix-tree.h
+++ linux-2.6.22-git15/include/linux/radix-tree.h
@@ -155,6 +155,8 @@ void *radix_tree_delete(struct radix_tre
 unsigned int
 radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
 			unsigned long first_index, unsigned int max_items);
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+				unsigned long index, unsigned long max_scan);
 int radix_tree_preload(gfp_t gfp_mask);
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *root,
--- linux-2.6.22-git15.orig/lib/radix-tree.c
+++ linux-2.6.22-git15/lib/radix-tree.c
@@ -599,6 +599,40 @@ int radix_tree_tag_get(struct radix_tree
 EXPORT_SYMBOL(radix_tree_tag_get);
 #endif

+static unsigned long
+radix_tree_scan_hole_dumb(struct radix_tree_root *root,
+				unsigned long index, unsigned long max_scan)
+{
+	unsigned long i;
+
+	for (i = 0; i < max_scan; i++) {
+		if (!radix_tree_lookup(root, index))
+			break;
+		if (++index == 0)
+			break;
+	}
+
+	return index;
+}
+
+/**
+ *	radix_tree_scan_hole - scan for hole
+ *	@root:		radix tree root
+ *	@index:		index key
+ *	@max_scan:	advice on max items to scan (it may scan a little more)
+ *
+ *	Scan forward from @index for a hole/empty item, stop when
+ *	- hit hole
+ *	- wrap-around to index 0
+ *	- @max_scan or more items scanned
+ */
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+				unsigned long index, unsigned long max_scan)
+{
+	return radix_tree_scan_hole_dumb(root, index, max_scan);
+}
+EXPORT_SYMBOL(radix_tree_scan_hole);
+
 static unsigned int
 __lookup(struct radix_tree_node *slot, void **results, unsigned long index,
 	unsigned int max_items, unsigned long *next_index)
--
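The three stop conditions listed in the kernel-doc comment can be exercised with a user-space model. This is a sketch that replaces radix_tree_lookup() with a caller-supplied predicate; present_fn, scan_hole_model and demo_present are hypothetical names, and the demo population (indices 0..4 plus ULONG_MAX) is chosen only to trigger each case.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for radix_tree_lookup(): true if an item exists at index. */
typedef bool (*present_fn)(unsigned long index);

/* Same loop shape as the "dumb" scan in the patch. */
static unsigned long scan_hole_model(present_fn present, unsigned long index,
				     unsigned long max_scan)
{
	unsigned long i;

	for (i = 0; i < max_scan; i++) {
		if (!present(index))
			break;		/* hit a hole */
		if (++index == 0)
			break;		/* wrapped around to index 0 */
	}
	return index;
}

/* Demo population: items at 0..4 and at the very last index. */
static bool demo_present(unsigned long index)
{
	return index < 5 || index == ULONG_MAX;
}
```

Starting at 2 with a budget of 10 the scan stops at the hole, index 5. Starting at 0 with a budget of 3 it stops at index 3 because the budget ran out, so the returned index is not guaranteed to be a hole. Starting at ULONG_MAX it wraps and returns 0, which is why the caller in the interleaved-read patch tests `!start`.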
[PATCH 3/8] combine file_ra_state.prev_index/prev_offset into prev_pos
Combine the file_ra_state members
	unsigned long prev_index
	unsigned int prev_offset
into
	loff_t prev_pos

It is more consistent and better supports huge files.

Thanks to Peter for the nice proposal!

Cc: Peter Zijlstra [EMAIL PROTECTED]
Cc: Christoph Lameter [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 fs/ext3/dir.c      |    2 +-
 fs/ext4/dir.c      |    2 +-
 fs/splice.c        |    2 +-
 include/linux/fs.h |    3 +--
 mm/filemap.c       |   11 ++-
 mm/readahead.c     |   15 ---
 6 files changed, 18 insertions(+), 17 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/fs.h
+++ linux-2.6.22-git15/include/linux/fs.h
@@ -704,8 +704,7 @@ struct file_ra_state {

 	unsigned int ra_pages;		/* Maximum readahead window */
 	int mmap_miss;			/* Cache miss stat for mmap accesses */
-	unsigned long prev_index;	/* Cache last read() position */
-	unsigned int prev_offset;	/* Offset where last read() ended in a page */
+	loff_t prev_pos;		/* Cache last read() position */
 };

 /*
--- linux-2.6.22-git15.orig/mm/filemap.c
+++ linux-2.6.22-git15/mm/filemap.c
@@ -879,8 +879,8 @@ void do_generic_mapping_read(struct addr
 	cached_page = NULL;
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	next_index = index;
-	prev_index = ra.prev_index;
-	prev_offset = ra.prev_offset;
+	prev_index = ra.prev_pos >> PAGE_CACHE_SHIFT;
+	prev_offset = ra.prev_pos & (PAGE_CACHE_SIZE-1);
 	last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;

@@ -966,7 +966,6 @@ page_ok:
 		index += offset >> PAGE_CACHE_SHIFT;
 		offset &= ~PAGE_CACHE_MASK;
 		prev_offset = offset;
-		ra.prev_offset = offset;

 		page_cache_release(page);
 		if (ret == nr && desc->count)
@@ -1056,7 +1055,9 @@ no_cached_page:

 out:
 	*_ra = ra;
-	_ra->prev_index = prev_index;
+	_ra->prev_pos = prev_index;
+	_ra->prev_pos <<= PAGE_CACHE_SHIFT;
+	_ra->prev_pos |= prev_offset;

 	*ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
 	if (cached_page)
@@ -1415,7 +1416,7 @@ retry_find:
 	 * Found the page and have a reference on it.
 	 */
 	mark_page_accessed(page);
-	ra->prev_index = page->index;
+	ra->prev_pos = page->index << PAGE_CACHE_SHIFT;
 	vmf->page = page;
 	return ret | VM_FAULT_LOCKED;

--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -45,7 +45,7 @@ void
 file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
 {
 	ra->ra_pages = mapping->backing_dev_info->ra_pages;
-	ra->prev_index = -1;
+	ra->prev_pos = -1;
 }
 EXPORT_SYMBOL_GPL(file_ra_state_init);

@@ -326,7 +326,7 @@ static unsigned long get_next_ra_size(st
  * indicator. The flag won't be set on already cached pages, to avoid the
  * readahead-for-nothing fuss, saving pointless page cache lookups.
  *
- * prev_index tracks the last visited page in the _previous_ read request.
+ * prev_pos tracks the last visited byte in the _previous_ read request.
  * It should be maintained by the caller, and will be used for detecting
  * small random reads. Note that the readahead algorithm checks loosely
  * for sequential patterns. Hence interleaved reads might be served as
@@ -350,11 +350,9 @@ ondemand_readahead(struct address_space
 		   bool hit_readahead_marker, pgoff_t offset,
 		   unsigned long req_size)
 {
-	int max;	/* max readahead pages */
-	int sequential;
-
-	max = ra->ra_pages;
-	sequential = (offset - ra->prev_index <= 1UL) || (req_size > max);
+	int	max = ra->ra_pages;	/* max readahead pages */
+	pgoff_t prev_offset;
+	int	sequential;

 	/*
 	 * It's the expected callback offset, assume sequential access.
@@ -368,6 +366,9 @@ ondemand_readahead(struct address_space
 		goto readit;
 	}

+	prev_offset = ra->prev_pos >> PAGE_CACHE_SHIFT;
+	sequential = offset - prev_offset <= 1UL || req_size > max;
+
 	/*
 	 * Standalone, small read.
 	 * Read as is, and do not pollute the readahead state.
--- linux-2.6.22-git15.orig/fs/ext3/dir.c
+++ linux-2.6.22-git15/fs/ext3/dir.c
@@ -143,7 +143,7 @@ static int ext3_readdir(struct file * fi
 					sb->s_bdev->bd_inode->i_mapping,
 					&filp->f_ra, filp,
 					index, 1);
-			filp->f_ra.prev_index = index;
+			filp->f_ra.prev_pos = index << PAGE_CACHE_SHIFT;
 			bh = ext3_bread(NULL, inode, blk, 0, &err);
 		}

---
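The packing that this patch introduces, a page index and an in-page offset folded into one byte position, can be tried out in isolation. The sketch below assumes 4 KiB pages (shift of 12) and uses hypothetical names throughout (loff_sketch_t, pack_prev_pos and the two accessors); it mirrors the shift-then-or sequence in the `out:` hunk and the shift and mask used to read the value back.

```c
#define PAGE_SHIFT_SKETCH 12			/* assumes 4 KiB pages */
#define PAGE_SIZE_SKETCH  (1UL << PAGE_SHIFT_SKETCH)

typedef long long loff_sketch_t;		/* stand-in for loff_t */

/* Fold page index and in-page offset into a single byte position. */
static loff_sketch_t pack_prev_pos(unsigned long index, unsigned int offset)
{
	loff_sketch_t pos = index;	/* widen first so the shift cannot overflow */

	pos <<= PAGE_SHIFT_SKETCH;
	pos |= offset;
	return pos;
}

/* Page index of the last read position. */
static unsigned long prev_index_of(loff_sketch_t pos)
{
	return (unsigned long)(pos >> PAGE_SHIFT_SKETCH);
}

/* Offset within that page. */
static unsigned int prev_offset_of(loff_sketch_t pos)
{
	return (unsigned int)(pos & (loff_sketch_t)(PAGE_SIZE_SKETCH - 1));
}
```

Assigning the index to the 64-bit variable before shifting matters on 32-bit systems: shifting a 32-bit index left by 12 in place would drop the high bits for files beyond 4 GiB, which is the "better supports huge files" point in the description.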
[PATCH 4/8] trivial filemap.c cleanups
- remove unused local next_index in do_generic_mapping_read()
- convert some 'unsigned long' to pgoff_t
- wrap a long line

Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 mm/filemap.c |   16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

--- linux-2.6.22-git15.orig/mm/filemap.c
+++ linux-2.6.22-git15/mm/filemap.c
@@ -866,11 +866,10 @@ void do_generic_mapping_read(struct addr
 			     read_actor_t actor)
 {
 	struct inode *inode = mapping->host;
-	unsigned long index;
-	unsigned long offset;
-	unsigned long last_index;
-	unsigned long next_index;
-	unsigned long prev_index;
+	pgoff_t index;
+	pgoff_t offset;
+	pgoff_t last_index;
+	pgoff_t prev_index;
 	unsigned int prev_offset;
 	struct page *cached_page;
 	int error;
@@ -878,7 +877,6 @@ void do_generic_mapping_read(struct addr

 	cached_page = NULL;
 	index = *ppos >> PAGE_CACHE_SHIFT;
-	next_index = index;
 	prev_index = ra.prev_pos >> PAGE_CACHE_SHIFT;
 	prev_offset = ra.prev_pos & (PAGE_CACHE_SIZE-1);
 	last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
@@ -1219,7 +1217,8 @@ out:
 }
 EXPORT_SYMBOL(generic_file_aio_read);

-int file_send_actor(read_descriptor_t * desc, struct page *page, unsigned long offset, unsigned long size)
+int file_send_actor(read_descriptor_t * desc, struct page *page,
+		    unsigned long offset, unsigned long size)
 {
 	ssize_t written;
 	unsigned long count = desc->count;
@@ -1272,7 +1271,6 @@ asmlinkage ssize_t sys_readahead(int fd,
 }

 #ifdef CONFIG_MMU
-static int FASTCALL(page_cache_read(struct file * file, unsigned long offset));
 /**
  * page_cache_read - adds requested page to the page cache if not already there
  * @file:	file to read
@@ -1281,7 +1279,7 @@ static int FASTCALL(page_cache_read(stru
  * This adds the requested page to the page cache if it isn't already there,
  * and schedules an I/O to read in its contents from disk.
  */
-static int fastcall page_cache_read(struct file * file, unsigned long offset)
+static int fastcall page_cache_read(struct file * file, pgoff_t offset)
 {
 	struct address_space *mapping = file->f_mapping;
 	struct page *page;
--
[PATCH 2/8] mmap read-around simplification
Fold file_ra_state.mmap_hit into file_ra_state.mmap_miss and make it an int.

Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/fs.h |    3 +--
 mm/filemap.c       |    4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/fs.h
+++ linux-2.6.22-git15/include/linux/fs.h
@@ -703,8 +703,7 @@ struct file_ra_state {
 					   there are only # of pages ahead */

 	unsigned int ra_pages;		/* Maximum readahead window */
-	unsigned long mmap_hit;		/* Cache hit stat for mmap accesses */
-	unsigned long mmap_miss;	/* Cache miss stat for mmap accesses */
+	int mmap_miss;			/* Cache miss stat for mmap accesses */
 	unsigned long prev_index;	/* Cache last read() position */
 	unsigned int prev_offset;	/* Offset where last read() ended in a page */
 };
--- linux-2.6.22-git15.orig/mm/filemap.c
+++ linux-2.6.22-git15/mm/filemap.c
@@ -1369,7 +1369,7 @@ retry_find:
 		 * Do we miss much more than hit in this file? If so,
 		 * stop bothering with read-ahead. It will only hurt.
 		 */
-		if (ra->mmap_miss > ra->mmap_hit + MMAP_LOTSAMISS)
+		if (ra->mmap_miss > MMAP_LOTSAMISS)
 			goto no_cached_page;

 		/*
@@ -1395,7 +1395,7 @@ retry_find:
 	}

 	if (!did_readaround)
-		ra->mmap_hit++;
+		ra->mmap_miss--;

 	/*
 	 * We have a locked page in the page cache, now we need to check
--
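The folded counter behaves like "misses minus hits". A brief sketch of that bookkeeping follows; the threshold value of 100 is an assumption made for illustration, and mmap_ra_sketch, note_miss, note_hit and should_skip_readaround are hypothetical names, not kernel functions. It shows that comparing one signed counter against the threshold is equivalent to the old two-counter test.

```c
#include <stdbool.h>

#define MMAP_LOTSAMISS_SKETCH 100	/* assumed threshold for illustration */

struct mmap_ra_sketch {
	int mmap_miss;	/* misses minus hits; may go negative */
};

/* A page fault that had to go to disk. */
static void note_miss(struct mmap_ra_sketch *ra)
{
	ra->mmap_miss++;
}

/* A page fault served from the cache without read-around. */
static void note_hit(struct mmap_ra_sketch *ra)
{
	ra->mmap_miss--;
}

/* Old test: miss > hit + LOTSAMISS. New test: (miss - hit) > LOTSAMISS. */
static bool should_skip_readaround(const struct mmap_ra_sketch *ra)
{
	return ra->mmap_miss > MMAP_LOTSAMISS_SKETCH;
}
```

Because hits now decrement the counter, it has to be a signed type, which is why the patch changes the field to int rather than keeping it unsigned.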
Re: [PATCH 1/8] compacting file_ra_state
Sorry, forgot to prefix the patch titles with [readahead]. Should I repost?
Re: [ofa-general] [PATCH 1/5] ehca: Supports large page MRs
I applied this, but I agree with checkpatch.pl:

WARNING: externs should be avoided in .c files
#227: FILE: drivers/infiniband/hw/ehca/ehca_mrmw.c:67:
+extern int ehca_mr_largepage;

WARNING: externs should be avoided in .c files
#949: FILE: drivers/infiniband/hw/ehca/hcp_if.c:753:
+extern int ehca_debug_level;

If you need to use a variable in more than one .c file, put the extern declaration in a common header that's included everywhere you use the variable, including the .c file that it is defined in. That way the compiler can see if you get confused about the type of the variable.

When you get a chance, please post a follow-on patch to fix this.

 - R.
Re: [PATCH 2/5] ehca: Generate event when SRQ limit reached
thanks, applied.

BTW, does your SRQ-capable hardware support generating the last WQE reached event? There's not any reliable way to avoid problems when destroying QPs attached to an SRQ without it, and the IB spec requires CAs that support SRQs to generate it (o11-5.2.5 in chapter 11 of vol 1). I don't see any code in ehca to generate the event, and IPoIB CM at least will be very unhappy when using SRQs if the event is not generated.

 - R.
Re: [PATCH 3/5] ehca: Make ehca2ib_return_code() non-inline
thanks, applied
Fixing lables after GNU indent (Re: [PATCH 1/2] run scripts/Lindent on it to match Documentation/CodingStyle)
[]

	sed -i -e 's/^\t* \(\w*:\)/ \1/' $@

which will replace the leading tabs and spaces with one space. It should leave case labels unmolested, as they should be indented with tabs, not 6 spaces. Any regexp ninjas want to have a go at something better?

I'm the one. Trying to write portable, optimized and easy to understand scripts [0]. Please, describe more what must be done, and i will do it. Case labels are handled very strangely in your example.

OK. indent will indent labels to a column number that's a multiple of 8, plus 6. So it may start in column 6, 14, 22, 30, etc. I'm not quite sure what the definition of a label is; I had it as \w*: up there, but I don't know if that would match the _. The point is to *not* handle case labels, only goto labels.

	t=`printf '\t'`
	sed -i s_^\($t*\) *\([^:]*:\)_\1\2_ $@

^-_ I'm not sure about leaving one space `here, otherwise it removes spaces between (supposedly right indented) line start, i.e. nothing or tab(s), and a label, i.e. `label_name:' without space before colon; `label_name' here actually not a colon, let's leave that kind of breakage to compiler.

The variable $t is used for readability of the regex and because POSIX BREs leave undefined characters after a backslash, POSIX sed defines only \n as a new line.

--
-o--=O`C #oo'L O ___=E M
Re: [ofa-general] [PATCH 5/5] ehca: Support small QP queues
thanks, applied.

I fixed this up myself to work with commit 20c2df83, which got rid of the destructor argument to kmem_cache_create() -- you probably want to check my tree to make sure it's OK.

Also the same as I said before about checkpatch.pl's warning:

WARNING: externs should be avoided in .c files
#337: FILE: drivers/infiniband/hw/ehca/ehca_pd.c:91:
+	extern struct kmem_cache *small_qp_cache;

please fix that up when you get a chance
Re: [PATCH 1/8] compacting file_ra_state
On Sat, 21 Jul 2007, Fengguang Wu wrote:
Sorry, forgot to prefix the patch titles with [readahead]. Should I repost?

Not for me, but on the other hand, I'd prefer for this to be in -mm a bit, even if it does mean missing the merge window this time around.

Linus
Re: [PATCH 1/8] compacting file_ra_state
On Fri, Jul 20, 2007 at 09:27:01PM -0700, Linus Torvalds wrote:
On Sat, 21 Jul 2007, Fengguang Wu wrote:
Sorry, forgot to prefix the patch titles with [readahead]. Should I repost?

Not for me, but on the other hand, I'd prefer for this to be in -mm a bit, even if it does mean missing the merge window this time around.

OK. Let me repost it...
[PATCH 5/7] readahead: remove the limit max_sectors_kb imposed on max_readahead_kb
Remove the size limit max_sectors_kb imposed on max_readahead_kb.

The size restriction is unreasonable. Especially when max_sectors_kb cannot grow larger than max_hw_sectors_kb, which can be rather small for some disk drives.

Cc: Jens Axboe [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
Acked-by: Jens Axboe [EMAIL PROTECTED]
---
 block/ll_rw_blk.c |    9 -
 1 file changed, 9 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/block/ll_rw_blk.c
+++ linux-2.6.22-rc6-mm1/block/ll_rw_blk.c
@@ -3945,7 +3945,6 @@ queue_max_sectors_store(struct request_q
 			max_hw_sectors_kb = q->max_hw_sectors >> 1,
 			page_kb = 1 << (PAGE_CACHE_SHIFT - 10);
 	ssize_t ret = queue_var_store(&max_sectors_kb, page, count);
-	int ra_kb;

 	if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb)
 		return -EINVAL;
@@ -3954,14 +3953,6 @@ queue_max_sectors_store(struct request_q
 	 * values synchronously:
 	 */
 	spin_lock_irq(q->queue_lock);
-	/*
-	 * Trim readahead window as well, if necessary:
-	 */
-	ra_kb = q->backing_dev_info.ra_pages << (PAGE_CACHE_SHIFT - 10);
-	if (ra_kb > max_sectors_kb)
-		q->backing_dev_info.ra_pages =
-				max_sectors_kb >> (PAGE_CACHE_SHIFT - 10);
-
 	q->max_sectors = max_sectors_kb << 1;
 	spin_unlock_irq(q->queue_lock);

--
[PATCH 4/7] readahead: remove several readahead macros
Remove VM_MAX_CACHE_HIT, MAX_RA_PAGES and MIN_RA_PAGES.

Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/mm.h |    2 --
 mm/readahead.c     |   10 +-
 2 files changed, 1 insertion(+), 11 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/include/linux/mm.h
+++ linux-2.6.22-rc6-mm1/include/linux/mm.h
@@ -1148,8 +1148,6 @@ int write_one_page(struct page *page, in
 /* readahead.c */
 #define VM_MAX_READAHEAD	128	/* kbytes */
 #define VM_MIN_READAHEAD	16	/* kbytes (includes current page) */
-#define VM_MAX_CACHE_HIT	256	/* max pages in a row in cache before
-					 * turning readahead off */

 int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 			pgoff_t offset, unsigned long nr_to_read);
--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -21,16 +21,8 @@ void default_unplug_io_fn(struct backing
 }
 EXPORT_SYMBOL(default_unplug_io_fn);

-/*
- * Convienent macros for min/max read-ahead pages.
- * Note that MAX_RA_PAGES is rounded down, while MIN_RA_PAGES is rounded up.
- * The latter is necessary for systems with large page size(i.e. 64k).
- */
-#define	MAX_RA_PAGES	(VM_MAX_READAHEAD*1024 / PAGE_CACHE_SIZE)
-#define	MIN_RA_PAGES	DIV_ROUND_UP(VM_MIN_READAHEAD*1024, PAGE_CACHE_SIZE)
-
 struct backing_dev_info default_backing_dev_info = {
-	.ra_pages	= MAX_RA_PAGES,
+	.ra_pages	= VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
 	.state		= 0,
 	.capabilities	= BDI_CAP_MAP_COPY,
 	.unplug_io_fn	= default_unplug_io_fn,
--
[PATCH 7/7] readahead: basic support of interleaved reads
This is a simplified version of the pagecache context based readahead. It handles the case of multiple threads reading on the same fd and invalidating each other's readahead state. It does the trick by scanning the pagecache and recovering the current read stream's readahead status.

The algorithm works in an opportunistic way, in that it does not try to detect interleaved reads _actively_, which would require a probe into the page cache (and mean a little more overhead for random reads). It only tries to handle a previously started sequential readahead whose state was overwritten by another concurrent stream, and it can do this job pretty well.

Negative and positive examples (or what you can expect from it):

1) it cannot detect and serve perfect request-by-request interleaved reads right:

	time	stream 1	stream 2
	0	1
	1			1001
	2	2
	3			1002
	4	3
	5			1003
	6	4
	7			1004
	8	5
	9			1005

Here not a single readahead will be carried out.

2) However, if it's two concurrent reads by two threads, the chance that the initial sequential readahead gets started is huge. Once the first sequential readahead is started for a stream, this patch will ensure that the readahead window continues to ramp up and won't be disturbed by other streams.

	time	stream 1	stream 2
	0	1
	1	2
	2			1001
	3	3
	4			1002
	5			1003
	6	4
	7	5
	8			1004
	9	6
	10			1005
	11	7
	12			1006
	13			1007

Here stream 1 will start a readahead at page 2, and stream 2 will start its first readahead at page 1003. From then on the two streams will be served right.

Cc: Rusty Russell [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 mm/readahead.c |   33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -363,6 +363,29 @@ ondemand_readahead(struct address_space
 	}

 	/*
+	 * Hit a marked page without valid readahead state.
+	 * E.g. interleaved reads.
+	 * Query the pagecache for async_size, which normally equals to
+	 * readahead size. Ramp it up and use it as the new readahead size.
+	 */
+	if (hit_readahead_marker) {
+		pgoff_t start;
+
+		read_lock_irq(&mapping->tree_lock);
+		start = radix_tree_scan_hole(&mapping->page_tree, offset, max+1);
+		read_unlock_irq(&mapping->tree_lock);
+
+		if (!start || start - offset > max)
+			return 0;
+
+		ra->start = start;
+		ra->size = start - offset;	/* old async_size */
+		ra->size = get_next_ra_size(ra, max);
+		ra->async_size = ra->size;
+		goto readit;
+	}
+
+	/*
 	 * It may be one of
 	 *	- first read on start of file
 	 *	- sequential cache miss
@@ -373,16 +396,6 @@ ondemand_readahead(struct address_space
 	ra->size = get_init_ra_size(req_size, max);
 	ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;

-	/*
-	 * Hit on a marked page without valid readahead state.
-	 * E.g. interleaved reads.
-	 * Not knowing its readahead pos/size, bet on the minimal possible one.
-	 */
-	if (hit_readahead_marker) {
-		ra->start++;
-		ra->size = get_next_ra_size(ra, max);
-	}
-
 readit:
 	return ra_submit(ra, mapping, filp);
 }
--
[PATCH 1/7] readahead: compacting file_ra_state
Use 'unsigned int' instead of 'unsigned long' for readahead sizes.

This helps reduce memory consumption on 64bit CPU when a lot of files are
opened.

CC: Andi Kleen [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/fs.h |8
 mm/filemap.c |2 +-
 mm/readahead.c |2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/include/linux/fs.h
+++ linux-2.6.22-rc6-mm1/include/linux/fs.h
@@ -771,12 +771,12 @@ struct fown_struct {
  * Track a single file's readahead state
  */
 struct file_ra_state {
-	pgoff_t start;			/* where readahead started */
-	unsigned long size;		/* # of readahead pages */
-	unsigned long async_size;	/* do asynchronous readahead when
+	pgoff_t start;			/* where readahead started */
+	unsigned int size;		/* # of readahead pages */
+	unsigned int async_size;	/* do asynchronous readahead when
 					   there are only # of pages ahead */

-	unsigned long ra_pages;		/* Maximum readahead window */
+	unsigned int ra_pages;		/* Maximum readahead window */
 	unsigned long mmap_hit;		/* Cache hit stat for mmap accesses */
 	unsigned long mmap_miss;	/* Cache miss stat for mmap accesses */
 	unsigned long prev_index;	/* Cache last read() position */
--- linux-2.6.22-rc6-mm1.orig/mm/filemap.c
+++ linux-2.6.22-rc6-mm1/mm/filemap.c
@@ -840,7 +840,7 @@ static void shrink_readahead_size_eio(st
 	if (count > 5)
 		return;
 	count++;
-	printk(KERN_WARNING "Reducing readahead size to %luK\n",
+	printk(KERN_WARNING "Reducing readahead size to %dK\n",
 			ra->ra_pages << (PAGE_CACHE_SHIFT - 10));
 }

--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -342,7 +342,7 @@ ondemand_readahead(struct address_space
 		   bool hit_readahead_marker, pgoff_t offset,
 		   unsigned long req_size)
 {
-	unsigned long max;	/* max readahead pages */
+	int max;	/* max readahead pages */
 	int sequential;

 	max = ra->ra_pages;
--
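To get a feel for the saving, the two layouts can be compared with sizeof in userspace. This is a rough sketch: the struct names are invented, only the four fields touched by the patch are modelled, and `unsigned long` stands in for pgoff_t.

```c
#include <stddef.h>

/* Fields as they were before the patch: all sizes are unsigned long. */
struct ra_wide {
	unsigned long start;
	unsigned long size;
	unsigned long async_size;
	unsigned long ra_pages;
};

/* Fields after the patch: the three size fields become unsigned int. */
struct ra_compact {
	unsigned long start;
	unsigned int size;
	unsigned int async_size;
	unsigned int ra_pages;
};

/* Bytes saved per instance; 0 on ABIs where int and long have equal width. */
size_t ra_bytes_saved(void)
{
	return sizeof(struct ra_wide) - sizeof(struct ra_compact);
}
```

On the usual LP64 ABIs (8-byte long, 4-byte int) this model shrinks from 32 to 24 bytes once tail padding is counted. The real structure has more members, so the exact figure per open file will differ.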
[PATCH 6/7] radixtree: introduce radix_tree_scan_hole()
Introduce radix_tree_scan_hole(root, index, max_scan) to scan radix tree
for the first hole. It will be used in interleaved readahead.

The implementation is dumb and obviously correct.
It can help debug (and document) the possible smart one in future.

Cc: Nick Piggin [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/radix-tree.h |2 ++
 lib/radix-tree.c | 34 ++
 2 files changed, 36 insertions(+)

--- linux-2.6.22-rc6-mm1.orig/include/linux/radix-tree.h
+++ linux-2.6.22-rc6-mm1/include/linux/radix-tree.h
@@ -155,6 +155,8 @@ void *radix_tree_delete(struct radix_tre
 unsigned int
 radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
 			unsigned long first_index, unsigned int max_items);
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+				unsigned long index, unsigned long max_scan);
 int radix_tree_preload(gfp_t gfp_mask);
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *root,
--- linux-2.6.22-rc6-mm1.orig/lib/radix-tree.c
+++ linux-2.6.22-rc6-mm1/lib/radix-tree.c
@@ -601,6 +601,40 @@ int radix_tree_tag_get(struct radix_tree
 EXPORT_SYMBOL(radix_tree_tag_get);
 #endif

+static unsigned long
+radix_tree_scan_hole_dumb(struct radix_tree_root *root,
+				unsigned long index, unsigned long max_scan)
+{
+	unsigned long i;
+
+	for (i = 0; i < max_scan; i++) {
+		if (!radix_tree_lookup(root, index))
+			break;
+		if (++index == 0)
+			break;
+	}
+
+	return index;
+}
+
+/**
+ *	radix_tree_scan_hole    -    scan for hole
+ *	@root:		radix tree root
+ *	@index:		index key
+ *	@max_scan:	advice on max items to scan (it may scan a little more)
+ *
+ *	Scan forward from @index for a hole/empty item, stop when
+ *	- hit hole
+ *	- wrap-around to index 0
+ *	- @max_scan or more items scanned
+ */
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+				unsigned long index, unsigned long max_scan)
+{
+	return radix_tree_scan_hole_dumb(root, index, max_scan);
+}
+EXPORT_SYMBOL(radix_tree_scan_hole);
+
 static unsigned int
 __lookup(struct radix_tree_node *slot, void **results, unsigned long index,
 	unsigned int max_items, unsigned long *next_index)
--
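The three stop conditions in the kernel-doc comment can be exercised outside the kernel with a small model. This is a sketch only: the lookup is replaced by a callback, and `model_scan_hole()` and the two sample populations are hypothetical names, not part of the patch.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for radix_tree_lookup(): says whether an index is populated. */
typedef bool (*model_lookup_fn)(unsigned long index);

/* Same loop shape as radix_tree_scan_hole_dumb() in the patch. */
unsigned long model_scan_hole(model_lookup_fn lookup,
			      unsigned long index, unsigned long max_scan)
{
	unsigned long i;

	for (i = 0; i < max_scan; i++) {
		if (!lookup(index))	/* stop: hit a hole */
			break;
		if (++index == 0)	/* stop: wrapped around to index 0 */
			break;
	}
	return index;			/* also reached when max_scan runs out */
}

/* Sample population with no holes at all. */
bool model_all_present(unsigned long index)
{
	(void)index;
	return true;
}

/* Sample population with a single hole at index 3. */
bool model_gap_at_3(unsigned long index)
{
	return index != 3;
}
```

Note that a return value of 0 is ambiguous between "index 0 is a hole" and "the scan wrapped around", which is consistent with callers treating 0 as "nothing found".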
[PATCH 2/7] readahead: mmap read-around simplification
Fold file_ra_state.mmap_hit into file_ra_state.mmap_miss
and make it an int.

Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 include/linux/fs.h |3 +--
 mm/filemap.c |4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/include/linux/fs.h
+++ linux-2.6.22-rc6-mm1/include/linux/fs.h
@@ -777,8 +777,7 @@ struct file_ra_state {
 					   there are only # of pages ahead */

 	unsigned int ra_pages;		/* Maximum readahead window */
-	unsigned long mmap_hit;		/* Cache hit stat for mmap accesses */
-	unsigned long mmap_miss;	/* Cache miss stat for mmap accesses */
+	int mmap_miss;			/* Cache miss stat for mmap accesses */
 	unsigned long prev_index;	/* Cache last read() position */
 	unsigned int prev_offset;	/* Offset where last read() ended in a page */
 };
--- linux-2.6.22-rc6-mm1.orig/mm/filemap.c
+++ linux-2.6.22-rc6-mm1/mm/filemap.c
@@ -1389,7 +1389,7 @@ retry_find:
 		 * Do we miss much more than hit in this file? If so,
 		 * stop bothering with read-ahead. It will only hurt.
 		 */
-		if (ra->mmap_miss > ra->mmap_hit + MMAP_LOTSAMISS)
+		if (ra->mmap_miss > MMAP_LOTSAMISS)
 			goto no_cached_page;

 		/*
@@ -1415,7 +1415,7 @@ retry_find:
 	}

 	if (!did_readaround)
-		ra->mmap_hit++;
+		ra->mmap_miss--;

 	/*
 	 * We have a locked page in the page cache, now we need to check
--
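Why a single signed counter is enough can be checked by replaying the same history through both schemes. A minimal sketch, assuming a threshold of 100 (`TOY_LOTSAMISS` is a placeholder; the patch does not show the value of MMAP_LOTSAMISS) and using invented struct and function names:

```c
#include <stdbool.h>

/* Assumed threshold for illustration; not taken from the patch. */
#define TOY_LOTSAMISS 100

/* Old scheme: separate hit and miss counters. */
struct two_counters {
	unsigned long hit;
	unsigned long miss;
};

/* New scheme: one signed counter, raised on a miss and lowered on a hit. */
struct one_counter {
	int miss;
};

bool two_gives_up(const struct two_counters *c)
{
	return c->miss > c->hit + TOY_LOTSAMISS;
}

bool one_gives_up(const struct one_counter *c)
{
	return c->miss > TOY_LOTSAMISS;
}

/* Feed the same number of misses and hits to both schemes. */
void toy_replay(struct two_counters *two, struct one_counter *one,
		unsigned int misses, unsigned int hits)
{
	two->miss += misses;
	two->hit += hits;
	one->miss += (int)misses;
	one->miss -= (int)hits;
}
```

As long as nothing clamps the single counter, `miss - hit > threshold` and `folded_miss > threshold` are the same test, so the read-around cutoff should behave as before.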
[PATCH 3/7] readahead: combine file_ra_state.prev_index/prev_offset into prev_pos
Combine the file_ra_state members
				unsigned long prev_index
				unsigned int prev_offset
into
				loff_t prev_pos

It is more consistent and better supports huge files.
Thanks to Peter for the nice proposal!

Cc: Peter Zijlstra [EMAIL PROTECTED]
Signed-off-by: Fengguang Wu [EMAIL PROTECTED]
---
 fs/ext3/dir.c |2 +-
 fs/ext4/dir.c |2 +-
 fs/splice.c|2 +-
 include/linux/fs.h |3 +--
 mm/filemap.c | 11 ++-
 mm/readahead.c | 15 ---
 6 files changed, 18 insertions(+), 17 deletions(-)

--- linux-2.6.22-rc6-mm1.orig/include/linux/fs.h
+++ linux-2.6.22-rc6-mm1/include/linux/fs.h
@@ -778,8 +778,7 @@ struct file_ra_state {

 	unsigned int ra_pages;		/* Maximum readahead window */
 	int mmap_miss;			/* Cache miss stat for mmap accesses */
-	unsigned long prev_index;	/* Cache last read() position */
-	unsigned int prev_offset;	/* Offset where last read() ended in a page */
+	loff_t prev_pos;		/* Cache last read() position */
 };

 /*
--- linux-2.6.22-rc6-mm1.orig/mm/filemap.c
+++ linux-2.6.22-rc6-mm1/mm/filemap.c
@@ -881,8 +881,8 @@ void do_generic_mapping_read(struct addr

 	index = *ppos >> PAGE_CACHE_SHIFT;
 	next_index = index;
-	prev_index = ra.prev_index;
-	prev_offset = ra.prev_offset;
+	prev_index = ra.prev_pos >> PAGE_CACHE_SHIFT;
+	prev_offset = ra.prev_pos & (PAGE_CACHE_SIZE-1);
 	last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;

@@ -968,7 +968,6 @@ page_ok:
 		index += offset >> PAGE_CACHE_SHIFT;
 		offset &= ~PAGE_CACHE_MASK;
 		prev_offset = offset;
-		ra.prev_offset = offset;

 		page_cache_release(page);
 		if (ret == nr && desc->count)
@@ -1055,7 +1054,9 @@ no_cached_page:

 out:
 	*_ra = ra;
-	_ra->prev_index = prev_index;
+	_ra->prev_pos = prev_index;
+	_ra->prev_pos <<= PAGE_CACHE_SHIFT;
+	_ra->prev_pos |= prev_offset;

 	*ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
 	if (filp)
@@ -1435,7 +1436,7 @@ retry_find:
 	 * Found the page and have a reference on it.
 	 */
 	mark_page_accessed(page);
-	ra->prev_index = page->index;
+	ra->prev_pos = page->index << PAGE_CACHE_SHIFT;
 	return page;

 outside_data_content:
--- linux-2.6.22-rc6-mm1.orig/mm/readahead.c
+++ linux-2.6.22-rc6-mm1/mm/readahead.c
@@ -45,7 +45,7 @@ void
 file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
 {
 	ra->ra_pages = mapping->backing_dev_info->ra_pages;
-	ra->prev_index = -1;
+	ra->prev_pos = -1;
 }
 EXPORT_SYMBOL_GPL(file_ra_state_init);

@@ -318,7 +318,7 @@ static unsigned long get_next_ra_size(st
  * indicator. The flag won't be set on already cached pages, to avoid the
  * readahead-for-nothing fuss, saving pointless page cache lookups.
  *
- * prev_index tracks the last visited page in the _previous_ read request.
+ * prev_pos tracks the last visited byte in the _previous_ read request.
  * It should be maintained by the caller, and will be used for detecting
  * small random reads. Note that the readahead algorithm checks loosely
  * for sequential patterns. Hence interleaved reads might be served as
@@ -342,11 +342,9 @@ ondemand_readahead(struct address_space
 		   bool hit_readahead_marker, pgoff_t offset,
 		   unsigned long req_size)
 {
-	int max;	/* max readahead pages */
-	int sequential;
-
-	max = ra->ra_pages;
-	sequential = (offset - ra->prev_index <= 1UL) || (req_size > max);
+	int max = ra->ra_pages;	/* max readahead pages */
+	pgoff_t prev_offset;
+	int sequential;

 	/*
 	 * It's the expected callback offset, assume sequential access.
@@ -360,6 +358,9 @@ ondemand_readahead(struct address_space
 		goto readit;
 	}

+	prev_offset = ra->prev_pos >> PAGE_CACHE_SHIFT;
+	sequential = offset - prev_offset <= 1UL || req_size > max;
+
 	/*
 	 * Standalone, small read.
 	 * Read as is, and do not pollute the readahead state.
--- linux-2.6.22-rc6-mm1.orig/fs/ext3/dir.c
+++ linux-2.6.22-rc6-mm1/fs/ext3/dir.c
@@ -143,7 +143,7 @@ static int ext3_readdir(struct file * fi
 				sb->s_bdev->bd_inode->i_mapping,
 				&filp->f_ra, filp,
 				index, 1);
-			filp->f_ra.prev_index = index;
+			filp->f_ra.prev_pos = index << PAGE_CACHE_SHIFT;
 			bh = ext3_bread(NULL, inode, blk, 0, &err);
 		}

--- linux-2.6.22-rc6-mm1.orig/fs/ext4/dir.c
+++ linux-2.6.22-rc6-mm1/fs/ext4/dir.c
@@ -142,7 +142,7 @@ static
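The index/offset folding done in the filemap.c hunks can be tried out in isolation. A minimal sketch, assuming 4 KiB pages and using `int64_t` in place of loff_t; all `toy_*` names are invented here:

```c
#include <stdint.h>

/* Assumed page geometry for the sketch: 4 KiB pages. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)

/* Fold a page index and an in-page byte offset into one byte position. */
int64_t toy_pack_pos(uint64_t index, unsigned int offset)
{
	return (int64_t)((index << TOY_PAGE_SHIFT) | offset);
}

/* Recover the page index, as "prev_pos >> PAGE_CACHE_SHIFT" does. */
uint64_t toy_pos_index(int64_t pos)
{
	return (uint64_t)pos >> TOY_PAGE_SHIFT;
}

/* Recover the in-page offset, as "prev_pos & (PAGE_CACHE_SIZE-1)" does. */
unsigned int toy_pos_offset(int64_t pos)
{
	return (unsigned int)(pos & (TOY_PAGE_SIZE - 1));
}
```

The widening to 64 bits has to happen before the shift, which is why the sketch takes the index as `uint64_t`. The sketch does not model the initial value of -1 set in file_ra_state_init().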
[PATCH 0/7] readahead cleanups and interleaved readahead take 3
Andrew,

The following patches are based on yesterday's discussions, compiled and
tested OK:

smaller file_ra_state:
        [PATCH 1/7] readahead: compacting file_ra_state
        [PATCH 2/7] readahead: mmap read-around simplification
        [PATCH 3/7] readahead: combine file_ra_state.prev_index/prev_offset into prev_

code cleanups:
        [PATCH 4/7] readahead: remove several readahead macros
        [PATCH 5/7] readahead: remove the limit max_sectors_kb imposed on max_readahead_kb

support of interleaved reads:
        [PATCH 6/7] radixtree: introduce radix_tree_scan_hole()
        [PATCH 7/7] readahead: basic support of interleaved reads

The diffstat is

 block/ll_rw_blk.c |9 -
 fs/ext3/dir.c |2 -
 fs/ext4/dir.c |2 -
 fs/splice.c|2 -
 include/linux/fs.h | 14 +++-
 include/linux/mm.h |2 -
 include/linux/radix-tree.h |2 +
 lib/radix-tree.c | 34
 mm/filemap.c | 17 +-
 mm/readahead.c | 58 +++
 10 files changed, 86 insertions(+), 56 deletions(-)

Regards,
Fengguang Wu
--
[GIT PULL] please pull infiniband.git
Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This will get another small batch of changes for 2.6.23:

Arthur Jones (1):
      IB/ipath: Remove ipath_layer dead code

Florin Malita (1):
      IB/mlx4: Fix leaks in __mlx4_ib_modify_qp

Hoang-Nam Nguyen (3):
      IB/ehca: Support large page MRs
      IB/ehca: Generate async event when SRQ limit reached
      IB/ehca: Move ehca2ib_return_code() out of line

Joachim Fenkes (1):
      IB/ehca: Make internal_create/destroy_qp() static

Michael S. Tsirkin (1):
      IB/mthca: Change command token on timeout

Roland Dreier (2):
      mlx4_core: Change command token on timeout
      IB/mlx4: Fix error path in create_qp_common()

Stefan Roscher (1):
      IB/ehca: Support small QP queues

 drivers/infiniband/hw/ehca/ehca_classes.h | 50 +++--
 drivers/infiniband/hw/ehca/ehca_cq.c |8 +-
 drivers/infiniband/hw/ehca/ehca_eq.c |8 +-
 drivers/infiniband/hw/ehca/ehca_irq.c | 42 +++-
 drivers/infiniband/hw/ehca/ehca_main.c| 49 -
 drivers/infiniband/hw/ehca/ehca_mrmw.c| 371 -
 drivers/infiniband/hw/ehca/ehca_mrmw.h|2 +-
 drivers/infiniband/hw/ehca/ehca_pd.c | 25 ++-
 drivers/infiniband/hw/ehca/ehca_qp.c | 178 --
 drivers/infiniband/hw/ehca/ehca_tools.h | 19 +--
 drivers/infiniband/hw/ehca/ehca_uverbs.c |2 +-
 drivers/infiniband/hw/ehca/hcp_if.c | 50 +++-
 drivers/infiniband/hw/ehca/ipz_pt_fn.c| 222 +
 drivers/infiniband/hw/ehca/ipz_pt_fn.h| 26 ++-
 drivers/infiniband/hw/ipath/Makefile |1 -
 drivers/infiniband/hw/ipath/ipath_layer.c | 365
 drivers/infiniband/hw/ipath/ipath_layer.h | 71 --
 drivers/infiniband/hw/ipath/ipath_verbs.h |2 -
 drivers/infiniband/hw/mlx4/qp.c | 20 +-
 drivers/infiniband/hw/mthca/mthca_cmd.c |3 +-
 drivers/net/mlx4/cmd.c|3 +-
 21 files changed, 802 insertions(+), 715 deletions(-)
 delete mode 100644 drivers/infiniband/hw/ipath/ipath_layer.c
 delete mode 100644 drivers/infiniband/hw/ipath/ipath_layer.h
Re: where is the code for read system call?
> My application reads from socket. I need to change the behavior of read
> system call for an experiment. Can someone point me to code?

Wouldn't it be easier to create a preload-library-wrapper around glibc?

Folkert van Heusden

--
MultiTail is a versatile tool for watching logfiles and output of
commands. Filtering, coloring, merging, diff-view, etc.
http://www.vanheusden.com/multitail/
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
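A preload wrapper along the lines suggested above could look roughly like the following. It is only a sketch: the 16-byte cap and the `shim_*` names are made up for the experiment, and it calls the raw system call directly rather than chaining to glibc's own read(), which is a simplification (it skips whatever glibc does around the call).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical experiment knob: never ask the kernel for more than this. */
#define SHIM_MAX_READ 16

/* Policy kept apart from the I/O so it can be checked on its own. */
size_t shim_clamp(size_t count)
{
	return count > SHIM_MAX_READ ? SHIM_MAX_READ : count;
}

/*
 * Same prototype as the C library's read(). When this object is preloaded,
 * the dynamic linker resolves the application's read() calls here first.
 */
ssize_t read(int fd, void *buf, size_t count)
{
	/* Issue the real system call with the adjusted length. */
	return syscall(SYS_read, fd, buf, shim_clamp(count));
}
```

Built as a shared object (for example `gcc -shared -fPIC -o libreadshim.so readshim.c`, file names being placeholders) and run with `LD_PRELOAD=./libreadshim.so ./app`, this changes what the application sees without touching the kernel. It only intercepts read(); an application that pulls socket data with recv() or recvfrom() would need those wrapped as well.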
Re: [linux-pm] Re: Hibernation considerations
Hi.

On Saturday 21 July 2007 08:43:20 [EMAIL PROTECTED] wrote:
> On Fri, 20 Jul 2007, Alan Stern wrote:
> On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:
>
> when doing a suspend-to-ram you get to a point where you just don't use
> any userspace.
>
> What do you mean? How can you prevent user tasks from running? That's
> basically what the freezer does, and the whole point of this approach is
> to eliminate the freezer. Right?
>
> Presumably no tasks at all would be scheduled.
>
> How would you prevent tasks from being scheduled? How would you prevent
> drivers from deadlocking because in order to put their device in a
> low-power state they need to acquire a lock which is held by a user task?
>
> you give up on the suspend because you have no way of getting the user
> task to give up the lock.
>
> however, kernel locks should not be held by user tasks, user tasks are
> not expected to behave in rational ways, allowing them to compete with
> kernel tasks for locks is a sure way to get a deadlock or indefinite
> stall.
>
> what locks are accessed this way?

Any userspace process can do a syscall. In the process of the syscall, it
can take kernel locks, and it can schedule (eg, while seeking to take a
second lock).

Regards,

Nigel
Re: [RFC, Announce] Unified x86 architecture, arch/x86
On Saturday 21 July 2007 00:32, Thomas Gleixner wrote:
> We are pleased to announce a project we've been working on for some
> time: the unified x86 architecture tree, or arch/x86 - and we'd like to
> solicit feedback about it.

Well you know my position on this. I think it's a bad idea because
it means we can never get rid of any old junk. IMNSHO arch/x86_64
is significantly cleaner and simpler in many ways than arch/i386 and I would
like to preserve that. Also in general arch/x86_64 is much easier to hack
than arch/i386 because it's easier to regression test and in general has to
care about much less junk. And I don't know of any way to ever fix that for
i386 besides splitting the old stuff off completely.

Besides radical file movements like this are bad anyways. They cause
a big break in patchkits and forward/backwards porting that doesn't
really help anybody.

> This causes double maintenance even for functionality that is
> conceptually the same for the 32-bit and the 64-bit tree. (such as
> support for standard PC platform architecture devices)

It's not really the same platform: one is PC hardware going back forever
with zillions of bugs, the other is modern PC platforms with much less bugs
and quirks.

To see it otherwise it's more a junkification of arch/x86_64 than
a cleanup of arch/i386 -- in fact you didn't really clean up arch/i386 at all.

> How did we do it?
>
> - As an initial matter, we made it painstakingly sure that the resulting
>   .o files in a 32-bit build are bit for bit equal.

You got not a single line less code duplication then, so I don't
really see the point of this.

-Andi
Re: [RFC, Announce] Unified x86 architecture, arch/x86
On Saturday 21 July 2007 01:55, Michal Piotrowski wrote:
> Hi,
>
> On 21/07/07, Thomas Gleixner [EMAIL PROTECTED] wrote:
> > We are pleased to announce a project we've been working on for some
> > time: the unified x86 architecture tree, or arch/x86 - and we'd like to
> > solicit feedback about it.
> >
> > What is this about?
> [..]
> > As usual, comments and suggestions are welcome!
>
> I really like this idea - code duplication is a bad thing.

Did you actually look at the patch? It doesn't have a single line less
duplication than there was before. Everything that could be easily shared
was shared already. It's just new window dressing without any real
advantages.

-Andi
Re: [PATCH 2/7] console: fix section mismatch warning in vgacon.c
On Sat, Jul 21, 2007 at 07:37:29AM +0800, Antonino A. Daplas wrote:
> On Fri, 2007-07-20 at 23:27 +0200, Sam Ravnborg wrote:
> > Fix following section mismatch warning:
> > WARNING: vmlinux.o(.text+0x121e62): Section mismatch: reference to
> > .init.text:__alloc_bootmem (between 'vgacon_startup' and 'vgacon_scrolldelta')
> >
> > Browsing the code it seems that vgacon_scrollback_startup()
> > is only called during the init phase so the
> > reference to the .init.text section is OK.
> > Teach modpost not to warn using ___init_refok.
> >
> > Signed-off-by: Sam Ravnborg [EMAIL PROTECTED]
>
> Acked-by: Antonino Daplas [EMAIL PROTECTED]

Thanks.
Will you take care of forwarding it or do we rely on Andrew in this area?

	Sam
Re: [PATCH 6/7] radixtree: introduce radix_tree_scan_hole()
On Sat, 21 Jul 2007 12:43:06 +0800 Fengguang Wu [EMAIL PROTECTED] wrote:

> Introduce radix_tree_scan_hole(root, index, max_scan) to scan radix tree
> for the first hole. It will be used in interleaved readahead.

If you're ever feeling fantastically bored, please consider updating the
userspace radix-tree test harness for this? Cook up a couple of testcases
for the new functionality?

Thanks.

http://www.zip.com.au/~akpm/linux/patches/stuff/rtth.tar.gz is the latest.
Re: [RFC, Announce] Unified x86 architecture, arch/x86
> On Saturday 21 July 2007 01:55, Michal Piotrowski wrote:
> > I really like this idea - code duplication is a bad thing.
>
> Did you actually look at the patch? It doesn't have a single line less
> duplication than there was before. Everything that could be easily shared
> was shared already. It's just new window dressing without any real
> advantages.

And did you read what tglx wrote? This patch was the beginning of the
merger, not the end result. It strived for binary identical images. It
was to put everything together as a _starting_point_! The next thing to
do after this is to start the merging.

-- Steve
Re: [PATCH 1/8] compacting file_ra_state
On Fri, Jul 20, 2007 at 09:27:01PM -0700, Linus Torvalds wrote:
> On Sat, 21 Jul 2007, Fengguang Wu wrote:
> > Sorry, forgot to prefix the patch titles with [readahead].
> > Should I repost?
>
> Not for me, but on the other hand, I'd prefer for this to be in -mm a bit,

Haven't the readahead patches already essentially been in -mm* for some
time? I thought the new patches were some restructured code, but
essentially the tested algorithms?

-Andi
Re: [PATCH] Move KVM, paravirt, lguest, VMI and Xen under arch-level Virtualization option
On Fri, 2007-07-20 at 08:24 +0300, Avi Kivity wrote:
> Rusty Russell wrote:
> > Any objections?
> > Rusty.
> > ===
> > Having KVM appear in the middle of drivers is kinda strange, and
> > having it alone under a menu called virtualization doubly so.
> >
> > 1) Move the Virtualization menu into the arch-specific i386 and x86-64
> >    Kconfig.
>
> Virtualization is hardly x86 specific. How about moving it to
> top-level, and having individual items disable themselves on archs they
> don't apply to? Otherwise we end up with $NARCH copies of that Kconfig,
> each slightly different.
>
> The top-level entry can be made to depend on the archs that actually
> have some virt capability, so as not to show an empty menu.

I dislike the duplication, too, but

1) it's a CPU capability, and that's where it belongs in the menu.
2) And as you can see from the difference between the x86_64 and i386
   help text, there are real platform differences (and not mentioning
   what's under the menu would be kinda cheating).
3) Virtualization doesn't even make sense as an option for some
   platforms where it's always on.

Cheers,
Rusty.
[PATCH] mm: Fix memory hotplug oops from ZONE_MOVABLE changes.
zone_movable_pfn is presently marked as __initdata and referenced from
adjust_zone_range_for_zone_movable(), which in turn is referenced by
zone_spanned_pages_in_node(). Both of these are __meminit annotated.
When memory hotplug is enabled, this will oops on a hot-add, due to
zone_movable_pfn having been freed.

__meminitdata annotation gives the desired behaviour.

This will only impact platforms that enable both memory hotplug
and ARCH_POPULATES_NODE_MAP.

Signed-off-by: Paul Mundt [EMAIL PROTECTED]

--

 mm/page_alloc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 43cb3b3..40954fb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -138,7 +138,7 @@ static unsigned long __meminitdata dma_reserve;
 #endif /* CONFIG_MEMORY_HOTPLUG_RESERVE */
   unsigned long __initdata required_kernelcore;
   unsigned long __initdata required_movablecore;
-  unsigned long __initdata zone_movable_pfn[MAX_NUMNODES];
+  unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];

 /* movable_zone is the real zone pages in ZONE_MOVABLE are taken from */
 int movable_zone;
Re: [PATCH] virtual sched_clock() for s390
Paul Mackerras wrote:
> Do you think this makes the PURR more useful for CFS, or less? To me
> it looks like this would mean that CFS can make a more equitable
> distribution of CPU time if, for example, you had 3 runnable tasks on
> a 2-core x dual-threaded machine (4 virtual CPUs).

Sounds reasonable to me. I've proposed in the past that sched_clock
should be scaled by the cpufreq frequency to achieve the same effect
(ie, measure the actual number of cpu cycles that are really available
to tasks). But more specifically, what you've described is exactly
analogous to hypervisor stolen time, since one thread steals time from
the other.

> BTW, what does time spent running during sleep mean? Does it mean
> time that other tasks are running while this task is sleeping?

That's how I interpreted it. You're only credited for sleeping if
someone else wanted the CPU in the meantime.

    J
Re: [PATCH] Fix Dreamcast DMA
On Thursday 19 July 2007, Adrian McMenamin wrote:
> Signed-off by: Adrian McMenamin [EMAIL PROTECTED]
>
> @@ -183,6 +183,7 @@ int dmac_search_free_channel(const char *dev_id)
>  		return result;
>
>  	atomic_set(&channel->busy, 1);
> +
>  	return channel->chan;
>  }
> @@ -194,7 +195,6 @@ int request_dma(unsigned int chan, const char *dev_id)
>  	struct dma_channel *channel = { 0 };
>  	struct dma_info *info = get_dma_info(chan);
>  	int result;
> -
>  	channel = get_dma_channel(chan);
>  	if (atomic_xchg(&channel->busy, 1))
>  		return -EBUSY;
> @@ -387,7 +388,7 @@ int register_dmac(struct dma_info *info)
>  	}
>
>  	list_add(&info->list, &registered_dmac_list);
> -
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL(register_dmac);

seems like whitespace noise in here ...
-mike
Re: [PATCH] [updated] PHY fixed driver: rework release path and update phy_id notation
On Thu, 19 Jul 2007 03:38:04 +0400 Vitaly Bordug [EMAIL PROTECTED] wrote:

> device_bind_driver() error code returning has been fixed.
> release() function has been written, so that to free resources
> in correct way; the release path is now clean.
>
> Before the rework, it used to cause
>  Device '[EMAIL PROTECTED]:1' does not have a release() function, it is broken
>  and must be fixed.
>  BUG: at drivers/base/core.c:104 device_release()
>
>  Call Trace:
>   [802ec380] kobject_cleanup+0x53/0x7e
>   [802ec3ab] kobject_release+0x0/0x9
>   [802ecf3f] kref_put+0x74/0x81
>   [8035493b] fixed_mdio_register_device+0x230/0x265
>   [80564d31] fixed_init+0x1f/0x35
>   [802071a4] init+0x147/0x2fb
>   [80223b6e] schedule_tail+0x36/0x92
>   [8020a678] child_rip+0xa/0x12
>   [80311714] acpi_ds_init_one_object+0x0/0x83
>   [8020705d] init+0x0/0x2fb
>   [8020a66e] child_rip+0x0/0x12
>
> Also changed the notation of the fixed phy definition on
> mdio bus to the form of speed+duplex to make it able to be used by
> gianfar and ucc_geth that define phy_id strictly as %d:%d and cleaned up
> the whitespace issues.

Confused. Does the above refer to the difference between this patch and
the previous version, or does it just describe this patch? Hopefully the
latter, because the former isn't interesting, long-term.

If it _is_ a full standalone description of this patch then it's a bit
hard to follow ;)

> +config FIXED_MII_1000_FDX
> +	bool "Emulation for 1000M Fdx fixed PHY behavior"
> +	depends on FIXED_PHY
> +
> +config FIXED_MII_AMNT
> +        int "Number of emulated PHYs to allocate"
> +        depends on FIXED_PHY
> +        default "1"
> +        ---help---
> +        Sometimes it is required to have several independent emulated
> +        PHYs on the bus (in case of multi-eth but phy-less HW for instance).
> +        This control will have specified number allocated for each fixed
> +        PHY type enabled.

Shouldn't these be runtime options (ie: module parameters)?

> ...
> + * Private information hoder for mii_bus

tpyo.
Re: libata [ata_piix] still no resume from S3 ?
Rúben Fonseca wrote:
> I wish I could disable this card reader. It is built in on the hardware,
> and there are no drivers for Linux. There is no option on the BIOS to
> disable the device.
>
> Is there any way (kernel parameters, magic program, etc) to disable this
> device without opening my laptop to cut the wires? :D

OIC. How about not loading tifm_7xx1 module? Does that make any
difference?

--
tejun