Re: NAK new drivers without proper power management?
> > to a large degree, a device driver that doesn't suspend is better than
> > no device driver at all, right?
>
> I'm not sure it is. It only makes more work for everyone else: We have
> to help people figure out what causes their computer to fail to resume
> (which can take quite a while),

so we make the kernel printk on suspend if there are devices without suspend/resume. Heck, make a config option that prints that at modprobe time.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))
Emmeran Seehuber wrote:
> # smartctl -a /dev/sda
> smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> Device: ATA WDC WD1500ADFD-0 Version: 20.0
> Serial number: WD-WMAP41246348
> Device type: disk
> Local Time is: Fri Feb 9 18:06:23 2007 CET
> Device does not support SMART

Hmmm... Raptor not supporting SMART. That's weird. Please try 'smartctl -d ata -a /dev/sda'.

Thanks.

--
tejun
[git pull] Input patches for 2.6.20+
Hi Linus,

Please pull from:

	git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus

or

	master.kernel.org:/pub/scm/linux/kernel/git/dtor/input.git for-linus

to receive updates for input subsystem.

Changelog:
--

Akinobu Mita (1):
	Input: pc110pad - return proper error

Cyrill V. Gorcunov (1):
	Input: HIL - handle erros from input_register_device()

David Brownell (1):
	Input: ads7846 - be more compatible with the hwmon framework

Dmitry Torokhov (3):
	Input: i8042 - really suppress ACK/NAK during panic blink
	Input: i8042 - fix AUX IRQ delivery check

Imre Deak (5):
	Input: ads7846 - pluggable filtering logic
	Input: ads7846 - optionally leave Vref on during differential measurements
	Input: ads7846 - switch to using hrtimer
	Input: ads7846 - select correct SPI mode
	Input: ads7846 - detect pen up from GPIO state

Jaya Kumar (1):
	Input: add Atlas button driver

Jiri Slaby (2):
	Input: hid-ff - add support for Logitech Momo racing wheel
	Input: remove scan_keyb driver

Michael Leun (1):
	Input: wistron - add support for Fujitsu-Siemens Amilo D88x0

Phil Blundell (1):
	Input: gpio-keys - keyboard driver for GPIO buttons

Richard Purdie (1):
	Input: tsdev - schedule removal

Robert P. J. Day (1):
	Input: inport - use correct config option for ATIXL

Diffstat:
-

 b/Documentation/feature-removal-schedule.txt |  15 +
 b/drivers/input/keyboard/Kconfig             |  19 +
 b/drivers/input/keyboard/Makefile            |   5
 b/drivers/input/keyboard/gpio_keys.c         | 147
 b/drivers/input/keyboard/hilkbd.c            | 114 +-
 b/drivers/input/misc/Kconfig                 |  10
 b/drivers/input/misc/Makefile                |   1
 b/drivers/input/misc/atlas_btns.c            | 170 +++
 b/drivers/input/misc/wistron_btns.c          |  20 +
 b/drivers/input/mouse/inport.c               |   2
 b/drivers/input/mouse/pc110pad.c             |   2
 b/drivers/input/serio/i8042.c                |   5
 b/drivers/input/touchscreen/Kconfig          |   9
 b/drivers/input/touchscreen/ads7846.c        | 306 +++
 b/drivers/input/tsdev.c                      |   4
 b/drivers/usb/input/hid-ff.c                 |   1
 b/drivers/usb/input/hid-lgff.c               |   1
 b/include/asm-arm/hardware/gpio_keys.h       |  17 +
 b/include/linux/spi/ads7846.h                |   2
 drivers/char/scan_keyb.c                     | 149 -
 drivers/char/scan_keyb.h                     |  15 -
 drivers/input/serio/i8042.c                  |   7
 drivers/input/touchscreen/ads7846.c          | 275 +++-
 include/linux/spi/ads7846.h                  |  10
 24 files changed, 889 insertions(+), 417 deletions(-)

--
Dmitry
Re: NAK new drivers without proper power management?
On Fri, Feb 09, 2007 at 07:25:34PM -0500, Jeff Garzik wrote:
> Nigel Cunningham wrote:
> >Hi.
> >
> >On Fri, 2007-02-09 at 23:17 +0100, Arjan van de Ven wrote:
> >>On Sat, 2007-02-10 at 08:57 +1100, Nigel Cunningham wrote:
> >>>Hi.
> >>>
> >>>I don't think this is already done (feel free to correct me if I'm
> >>>wrong)..
> >>>
> >>>Can we start to NAK new drivers that don't have proper power management
> >>>implemented? There really is no excuse for writing a new driver and not
> >>>putting .suspend and .resume methods in anymore, is there?
> >>
> >>to a large degree, a device driver that doesn't suspend is better than
> >>no device driver at all, right?
> >
> >I'm not sure it is. It only makes more work for everyone else: We have
> >to help people figure out what causes their computer to fail to resume
> >(which can take quite a while), then get them them complain to driver
> >author, and the driver author has to submit patches to fix it.
> >
> >All of this is avoided if they'll just do it right in the first place.
>
> A lot of a lot of things could have been avoided, if they just did it
> right the first time.
>
> I think it's more valuable to users to get a basic network driver that
> pings or a basic ATA driver that reads/writes, than peripheral issues
> like suspend/resume.

100% agreed. I've been used to a notebook (VAIO) which did not correctly shut down, and did not support reboot. Now the one I have behaves normally on both features. I've never ever felt the need for suspend/resume, which I've always attributed to "geeks" requirements. I had to debug the shutdown code myself for the previous notebook, and discovered that it was caused by bugs in the ACPI state transitions for suspend and such fancy features. I would really have preferred that the people writing the ACPI code had focused first on power-on/power-off before the rest.

> Certainly we should ask for it, but it shouldn't be a merge-stopper.

I think we should even proceed in the opposite direction: refuse to suspend if at least one driver does not support the feature, and enumerate the faulty drivers on the console. While I agree that a machine which resumes in a bad state is no fun at all to debug, at least when the user expects his notebook to suspend and sees that it refuses, he can complain about the drivers which do not support it, and can even unload them first if unneeded.

Regards,
Willy
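[Editor's sketch] Willy's proposal (refuse to suspend when any loaded driver lacks support, and name the offenders) could look roughly like the following. This is a userspace illustration only: struct toy_driver and every toy_* function are hypothetical stand-ins, not the kernel's driver model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a driver; NOT struct device_driver. */
struct toy_driver {
	const char *name;
	int (*suspend)(void);	/* NULL when the driver has no suspend method */
	int (*resume)(void);	/* NULL when the driver has no resume method */
};

/* Trivial callback used to mark a method as "implemented". */
static int toy_pm_ok(void)
{
	return 0;
}

/*
 * Print every driver that cannot take part in a suspend/resume cycle
 * and return how many there are.
 */
static size_t toy_count_pm_blockers(const struct toy_driver *drv, size_t n)
{
	size_t blockers = 0;

	assert(drv != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (drv[i].suspend && drv[i].resume)
			continue;
		fprintf(stderr, "suspend refused: driver %s lacks PM support\n",
			drv[i].name);
		blockers++;
	}
	return blockers;
}

/* Returns 0 if suspend may proceed, -1 if it must be refused. */
static int toy_try_suspend(const struct toy_driver *drv, size_t n)
{
	return toy_count_pm_blockers(drv, n) == 0 ? 0 : -1;
}
```

The user then sees the list of blocking drivers before anything has been powered down, and can unload them and retry.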
Re: [git patches] libata updates 1 of 3
This update breaks sata_via on my VIA K8T800Pro machine:

sata_via :00:0f.0 : failed to iomap PCI BAR 0
sata_via :00:0f.0 : out of memory
sata_via probe of :00:0f.0 failed with error -12

--
Markus
Re: Fix null pointer dereference in appledisplay driver
Applied.

thanks,
-Len

On Friday 09 February 2007 19:18, Michael Hanselmann wrote:
> Commit 40b20c257a13c5a526ac540bc5e43d0fdf29792a by Len Brown introduced
> a null pointer dereference in the appledisplay driver. This patch fixes
> it.
>
> Signed-off-by: Michael Hanselmann <[EMAIL PROTECTED]>
>
> ---
> I suggest adding this to 2.6.20.1 because this bug causes the kernel to
> panic on boot when the driver is compiled in.
>
> diff -Nrup --exclude-from linux-exclude-from linux-2.6.20.orig/drivers/usb/misc/appledisplay.c linux-2.6.20/drivers/usb/misc/appledisplay.c
> --- linux-2.6.20.orig/drivers/usb/misc/appledisplay.c 2007-02-09 22:35:56.0 +0100
> +++ linux-2.6.20/drivers/usb/misc/appledisplay.c 2007-02-10 01:00:28.0 +0100
> @@ -281,8 +281,8 @@ static int appledisplay_probe(struct usb
>  	/* Register backlight device */
>  	snprintf(bl_name, sizeof(bl_name), "appledisplay%d",
>  		atomic_inc_return(_displays) - 1);
> -	pdata->bd = backlight_device_register(bl_name, NULL, NULL,
> -		_bl_data);
> +	pdata->bd = backlight_device_register(bl_name, NULL,
> +		pdata, _bl_data);
>  	if (IS_ERR(pdata->bd)) {
>  		err("appledisplay: Backlight registration failed");
>  		goto error;
[PATCH -mm] libata: warn if speed limited due to 40-wire cable
Print an explicit warning when a device's UDMA mode is limited due to a 40-wire cable being detected, so that users have some idea why their device isn't running as fast as it should. This moves the application of the drive's mode masks before the cable rule, so that we can tell whether the rate is being limited by the cable and not the drive or controller.

I haven't tested whether the message actually shows up, as my system isn't horked in this manner..

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.20-rc6-mm3/drivers/ata/libata-core.c 2007-02-04 21:48:25.0 -0600
+++ linux-2.6.20-rc6-mm3edit/drivers/ata/libata-core.c 2007-02-09 21:04:14.0 -0600
@@ -3393,22 +3393,24 @@ static void ata_dev_xfermask(struct ata_
 	xfer_mask = ata_pack_xfermask(ap->pio_mask,
 				      ap->mwdma_mask, ap->udma_mask);
 
+	/* drive modes available */
+	xfer_mask &= ata_pack_xfermask(dev->pio_mask,
+				       dev->mwdma_mask, dev->udma_mask);
+	xfer_mask &= ata_id_xfermask(dev->id);
+
 	/* Apply cable rule here. Don't apply it early because when
 	 * we handle hot plug the cable type can itself change.
+	 * Unknown or 80 wire cables reported host side are checked
+	 * drive side as well. Cases where we know a 40wire cable
+	 * is used safely for 80 are not checked here.
 	 */
-	if (ap->cbl == ATA_CBL_PATA40)
-		xfer_mask &= ~(0xF8 << ATA_SHIFT_UDMA);
-	/* Apply drive side cable rule. Unknown or 80 pin cables reported
-	 * host side are checked drive side as well. Cases where we know a
-	 * 40wire cable is used safely for 80 are not checked here.
-	 */
-	if (ata_drive_40wire(dev->id) && (ap->cbl == ATA_CBL_PATA_UNK || ap->cbl == ATA_CBL_PATA80))
+	if ((xfer_mask & (0xF8 << ATA_SHIFT_UDMA)) &&
+	    ((ap->cbl == ATA_CBL_PATA40) ||
+	     (ata_drive_40wire(dev->id) &&
+	      (ap->cbl == ATA_CBL_PATA_UNK || ap->cbl == ATA_CBL_PATA80)))) {
+		ata_dev_printk(dev, KERN_WARNING, "limited to UDMA2 due to 40-wire cable\n");
 		xfer_mask &= ~(0xF8 << ATA_SHIFT_UDMA);
-
-
-	xfer_mask &= ata_pack_xfermask(dev->pio_mask,
-				       dev->mwdma_mask, dev->udma_mask);
-	xfer_mask &= ata_id_xfermask(dev->id);
+	}
 
 	/*
 	 * CFA Advanced TrueIDE timings are not allowed on a shared
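[Editor's sketch] The reordering in the patch matters because the warning should fire only when the cable is what actually removes a mode. A minimal userspace model of that logic follows. The toy_* names are hypothetical, and the shift value of 12 is an assumption made for illustration rather than a value taken from the patch.

```c
#include <stdio.h>

/* Assumed bit position of the UDMA field; illustration only. */
#define TOY_SHIFT_UDMA	12
/* Bits 3..7 of the UDMA field: UDMA3 and faster, which need 80 wires. */
#define TOY_UDMA_FAST	(0xF8u << TOY_SHIFT_UDMA)

enum toy_cable { TOY_CBL_PATA40, TOY_CBL_PATA80, TOY_CBL_PATA_UNK };

/*
 * Intersect controller and drive modes first, then apply the cable rule.
 * *warned is set to 1 only when the cable removed a mode that both the
 * controller and the drive would otherwise have supported.
 */
static unsigned int toy_apply_cable_rule(unsigned int port_mask,
					 unsigned int drive_mask,
					 enum toy_cable cbl,
					 int drive_sees_40wire,
					 int *warned)
{
	unsigned int xfer_mask = port_mask & drive_mask;
	int cable_is_40 = (cbl == TOY_CBL_PATA40) ||
			  (drive_sees_40wire &&
			   (cbl == TOY_CBL_PATA_UNK || cbl == TOY_CBL_PATA80));

	*warned = 0;
	if ((xfer_mask & TOY_UDMA_FAST) && cable_is_40) {
		fprintf(stderr, "limited to UDMA2 due to 40-wire cable\n");
		xfer_mask &= ~TOY_UDMA_FAST;
		*warned = 1;
	}
	return xfer_mask;
}
```

With the old ordering the cable mask was applied before the drive mask, so there was no way to tell whether the cable or the drive was the limiting factor.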
Re: NAK new drivers without proper power management?
Hi.

On Sat, 2007-02-10 at 03:42 +, Matthew Garrett wrote:
> On Sat, Feb 10, 2007 at 08:57:49AM +1100, Nigel Cunningham wrote:
>
> > Can we start to NAK new drivers that don't have proper power management
> > implemented? There really is no excuse for writing a new driver and not
> > putting .suspend and .resume methods in anymore, is there?
>
> The PCI layer is able to deal with drivers that have no PM methods in
> the most simple case.

Yeah. I suppose we could use a pm_safe bit flag in struct device_driver and/or struct pci_driver. I have other things to do right now, but will seek to understand the relationship between those structs better later.

Regards,

Nigel
Re: forcedeth problems on 2.6.20-rc6-mm3
Ayaz Abdulla wrote:
> For all those who are having issues, please try out the attached patch.
>
> Ayaz

Seems to solve the problem for me (not heavily tested, but certainly isn't totally dead as it was before).

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: NAK new drivers without proper power management?
On Fri, Feb 09, 2007 at 08:59:55PM -0500, Lee Revell wrote:
> On 2/9/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
> >I would disagree that it's a peripheral issue, it's pretty core these
> >days, at least for any hardware that you can stuff in a laptop (though a
> >fair number of desktops get suspended and resumed these days too).
>
> Servers are still the most important Linux market, and don't care
> about suspend/resume. I would consider implementing suspend./resume
> for a driver that will only be used in server or HPC class hardware a
> waste of valuable development resources.

Please allow me to be offensively blunt for a moment. So, the situation seems to be:

1. The work of the suspend developer who engages the users who put effort into making suspend work on their hardware (bless their addled little heads) often doesn't meet kernel standards, or isn't well enough documented to prove the real *need* for the features and/or hacks that have happened to get actual users' systems sleeping and running again.

2. The swsusp maintainer continues in the belief that as long as there are no bug reports in kernel bugzilla or crossing the (relatively obscure) swsusp mailing lists, it has zarro boogs and meanwhile works on the fourth implementation of suspend support in as many years. It's in CVS on sourceforge. There's no documentation whatsoever.

3. There's another guy who appears to be doing a lot of work, so I shan't leave him out. Like the two developers previously mentioned, he seems to be working pretty hard on the whole thing. The previously mentioned fourth suspend implementation seems to be largely his doing, for good and for ill.

4. "Everybody" knows suspend doesn't work on Linux without a huge amount of tinkering, deep magic, and dead chickens. Only Gentoo users seem to bother; everyone else waits for Ubuntu 12.04 wherein suspend will "just work". The Gentoo users all use swsusp2, as it contains the hacks to work around:

5. All the suspend developers blame the lack of power-management support in drivers for the inability of Linux to properly suspend on anything that doesn't support APM.

6. Getting proper power-management support in Linux device drivers is not a priority; drivers without any power management support whatsoever should not only be accepted -- they should be merged without comment or complaint.

How is working suspend support ever supposed to happen?

--
Joseph Fannin
[EMAIL PROTECTED] || [EMAIL PROTECTED]
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Fri, 9 Feb 2007, Roland McGrath wrote: > I don't think I really object to the ABI change of clearing %dr6 after an > exception so that it does not accumulate multiple results. But first I'll > have to convince myself that we never actually do want to accumulate > multiple results. Hmm, I think we can, so maybe I do object. If you set > two watchpoints inside a user buffer and then do a system call that touches > both those addresses (e.g. read), then you will go through do_debug (to > send_sigtrap) twice before returning to user mode. When the syscall is > done, you'll have a pending SIGTRAP for the debugger to handle. By looking > at your %dr6 the debugger can see that both watchpoints hit. (gdb does not > handle this case, but it should.) Am I wrong? I think you're right. > So this gets to the more complicated view of %dr6 handling that I had first > had in mind yesterday. Each allocation "owns" one of the low 4 bits in > %dr6 too. Only the dr6 bits owned by the userland "raw" allocation > (i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6]. > So when kwatch swallows a debug exception, it should mask off its bit from > %dr6 in the CPU, but not clear %dr6 completely. That way you can have a > sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one > system call (including interrupt handlers), and when it gets to the > userland debugger examining dr6 it sees the low 2 bits both set. Okay; I'll fix this too. Come to think of it, kwatch needs to handle multiple hits as well -- there might be two watchpoints set to the same address. > > It's really quite a tricky matter. Should a register be allocated to > > kwatch only when no user process needs it? Should we really go about > > checking the requirements of every single process whenever a kwatch > > allocation request comes in? What if the processes which need a > > particular register aren't running -- should the register then be given to > > kwatch? 
What if one of those processes then does start running on one > > CPU? > > To "go about checking the requirements of every single process" is not so > hard as it sounds when they're recorded as a single global use count per > slot, as your original code does. When you mentioned a "your allocation is > available" callback, I was thinking it might come to that being called > inside context switch. It's all rather tricky, indeed. > > The obvious answer is to start simple. If any user process anywhere uses > drN, kwatch has to give it up for all CPUs (watchpoints with less than > "break ptrace" priority do). If anyone really cares about more flexibility > than that, we can change or extend it. Some copious comments in the > interface descriptions can lead them in the right direction if the > situation comes up. Probably with systemtap support in a while, we'll get > a lot more concrete uses of watchpoints and people finding out what really > matters to them. It's still more complicated than you might think. Let's say two user processes each have dr1 allocated, one with low priority and the other with high priority. The kernel has to be aware of the high-priority allocation, so that it can refuse intermediate-priority kwatch allocation attempts. Now suppose the second process exits. dr1 is still allocated to the first user process but only with low priority, so now intermediate-priority kwatch allocation attempts should succeed. In order for this to work, when the second process gives up its allocation I would have to either scan though all tasks to see the first process, or else keep several global use counts for each slot -- in fact, one use count for each priority level. That's doable if there are only a few levels, but not if there are many. How do you suggest this be handled? Maybe we should just keep track of a maximum user priority level for each slot, allowing it to go up but not down until all user processes have given up the slot. 
(I.e., in the example above the later kwatch requests would still fail because we would continue to remember the high user priority level so long as the first process maintained its allocation.) That would be overly pessimistic, but it would at least be safe.

Alan Stern
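[Editor's sketch] The fallback Alan describes, remembering only the maximum user priority per slot and letting it drop only once every user has released the slot, needs very little state. The model below is hypothetical (toy_* names, plain integer priorities, strict "greater than" comparison) and is not taken from the kwatch patch.

```c
#include <assert.h>

#define TOY_NUM_SLOTS 4	/* one entry per debug address register, dr0..dr3 */

/* Hypothetical per-slot bookkeeping. */
struct toy_slot {
	unsigned int users;	/* user processes currently holding the slot */
	int max_user_prio;	/* highest priority seen since users was last 0 */
};

static struct toy_slot toy_slots[TOY_NUM_SLOTS];

/* A user process allocates the slot at the given priority. */
static void toy_user_get(int slot, int prio)
{
	assert(slot >= 0 && slot < TOY_NUM_SLOTS);
	if (toy_slots[slot].users == 0 || prio > toy_slots[slot].max_user_prio)
		toy_slots[slot].max_user_prio = prio;
	toy_slots[slot].users++;
}

/* A user process releases the slot; the remembered maximum is NOT lowered. */
static void toy_user_put(int slot)
{
	assert(slot >= 0 && slot < TOY_NUM_SLOTS);
	assert(toy_slots[slot].users > 0);
	toy_slots[slot].users--;
}

/* kwatch may take the slot only if it outranks every remembered user. */
static int toy_kwatch_may_take(int slot, int prio)
{
	assert(slot >= 0 && slot < TOY_NUM_SLOTS);
	return toy_slots[slot].users == 0 ||
	       prio > toy_slots[slot].max_user_prio;
}
```

This trades accuracy for simplicity: no task scan and no per-priority counters, at the cost of refusing some kwatch requests that would in fact be safe.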
Re: [GIT PATCH] ACPI patches for 2.6.21
On Friday 09 February 2007 18:09, Pavel Machek wrote: > Hi! > > > Per your request, and the request of the distros, we've changed > > how ACPICA Core releases are integrated into Linux so that each > > upstream (CVS) check-in appears as a single git commit. > > While this process is not yet perfect, it should be vastly better > > than previous "code drops" in allowing git bisect to work, > > and allowing distros to cherry-pick individual fixes. > > > > The "bay" driver is new (and marked EXPERIMENTAL) -- adding initial > > hot-plug support for ACPI controlled drive bays such as the > > IBM ultrabay or the Dell Module Bay. > > Could you describe userland interface it uses? /proc? Will it be > usable for bays on notebooks not using acpi? No, Not until somebody finds one and writes code to support it. > > The "asus-laptop" driver is also new. Consistent with msi-laptop, > > it uses ACPI in platform-specific ways, but strives to avoid > > exposing ACPI-specific implementation details to the user. > > asus-laptop is mutually exclusive with asus_acpi, which it will > > replace over time. > > Not including another /proc/acpi/ibm -like nightmare, is it? No. See discussion on linux-acpi. I've prohibited new files under /proc/acpi/ for quite some time now. > > the old /proc/acpi/ interfaces with cleaner interfaces in sysfs -- > > non-ACPI-specific generic ones whenever possible. This effort > > is not complete, but it has been in -mm for a long time and > > I believe that it is time to push it upstream to benefit > > from broader exposure and testing. > > Does it still include completely broken alarm interface? Can't find it > in changelogs, so hopefully not. No. See discussion on linux-acpi. David Brownell's RTC driver will provide the new RTC interface in sysfs. /proc/acpi/alarm will go away when the rest of /proc/acpi goes away. 
thanks,
-Len
Re: NAK new drivers without proper power management?
Hi Dmitry!

On Fri, 2007-02-09 at 22:27 -0500, Dmitry Torokhov wrote:
> Hi Nigel,
>
> On Friday 09 February 2007 21:05, Nigel Cunningham wrote:
> > [ 17.684475] Device driver serio0 lacks bus and class support for being resumed.
> > [ 17.684724] Device driver serio1 lacks bus and class support for being resumed.
> > [ 17.684874] Device driver psaux lacks bus and class support for being suspended or resumed.
> > [ 17.685015] Device driver serio2 lacks bus and class support for being resumed.
> > [ 18.373576] Device driver serio3 lacks bus and class support for being resumed.
> > [ 18.375666] Device driver serio4 lacks bus and class support for being resumed.
> >
>
> You should probably only warn if driver does not have resume method - not
> having suspend is quite valid if driver is able to restore state at resume
> without explicitly saving anything at suspend time.

Can do. Will do :)

Nigel
Re: [PATCH 4 of 7] lguest: Config and headers
On Sat, 10 Feb 2007, Rusty Russell wrote:
> Well it was the use of get_order() which triggered Andi's alarm bells,
> so I went back to deriving it. This code is correct, however.

+	hype_pages = alloc_pages(GFP_KERNEL|__GFP_ZERO, HYPERVISOR_MAP_ORDER);
+	if (!hype_pages)
+		return -ENOMEM;

This will try and allocate 2^16 pages. I guess we need a HYPERVISOR_PAGE_ORDER ?

- James
--
James Morris <[EMAIL PROTECTED]>
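[Editor's sketch] The second argument of alloc_pages() is an order, and the call asks for 2^order physically contiguous pages. The arithmetic behind James's objection can be checked with the helpers below, which are hypothetical and assume 4 KiB pages. If the constant passed in is 16, as his figure implies, the request is for 65536 pages, or 256 MiB.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Number of pages requested by an allocation of the given order. */
static unsigned long toy_pages_for_order(unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long));
	return 1UL << order;
}

/* Size in bytes of an allocation of the given order. */
static unsigned long long toy_bytes_for_order(unsigned int order)
{
	return (unsigned long long)toy_pages_for_order(order) * TOY_PAGE_SIZE;
}

/* Smallest order whose block covers at least nr_pages pages. */
static unsigned int toy_order_for_pages(unsigned long nr_pages)
{
	unsigned int order = 0;

	assert(nr_pages <= (1UL << 30));
	while ((1UL << order) < nr_pages)
		order++;
	return order;
}
```

A page order derived from the number of pages actually needed, for example toy_order_for_pages(16) giving order 4, is the kind of value a separate HYPERVISOR_PAGE_ORDER would hold.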
Re: NAK new drivers without proper power management?
On Sat, Feb 10, 2007 at 08:57:49AM +1100, Nigel Cunningham wrote:

> Can we start to NAK new drivers that don't have proper power management
> implemented? There really is no excuse for writing a new driver and not
> putting .suspend and .resume methods in anymore, is there?

The PCI layer is able to deal with drivers that have no PM methods in the most simple case.

--
Matthew Garrett | [EMAIL PROTECTED]
Re: [GIT PATCH] ACPI patches for 2.6.21
On Fri, Feb 09, 2007 at 05:24:10PM -0800, Kristen Carlson Accardi wrote:

> The user interface for the Bay driver is via sysfs - it is a platform
> driver

Though, ideally, in the long run it'll be tied into the PATA/SATA interface that it's associated with. That involves a little more magic, though :)

--
Matthew Garrett | [EMAIL PROTECTED]
Re: NAK new drivers without proper power management?
Hi Nigel,

On Friday 09 February 2007 21:05, Nigel Cunningham wrote:
> [ 17.684475] Device driver serio0 lacks bus and class support for being resumed.
> [ 17.684724] Device driver serio1 lacks bus and class support for being resumed.
> [ 17.684874] Device driver psaux lacks bus and class support for being suspended or resumed.
> [ 17.685015] Device driver serio2 lacks bus and class support for being resumed.
> [ 18.373576] Device driver serio3 lacks bus and class support for being resumed.
> [ 18.375666] Device driver serio4 lacks bus and class support for being resumed.
>

You should probably only warn if the driver does not have a resume method - not having suspend is quite valid if the driver is able to restore state at resume without explicitly saving anything at suspend time.

--
Dmitry
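[Editor's sketch] Dmitry's rule, under which a missing resume method is the only thing worth a warning, reduces to a small decision function. The enum and function names are hypothetical and the message texts are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical outcome of checking one driver's PM methods. */
enum toy_pm_verdict {
	TOY_PM_OK,		/* has resume; suspend is optional */
	TOY_PM_WARN_NO_RESUME,	/* saves state but can never restore it */
	TOY_PM_WARN_NO_PM,	/* neither method is present */
};

/* has_suspend and has_resume are non-zero when the method exists. */
static enum toy_pm_verdict toy_pm_check(int has_suspend, int has_resume)
{
	if (has_resume)
		return TOY_PM_OK;
	return has_suspend ? TOY_PM_WARN_NO_RESUME : TOY_PM_WARN_NO_PM;
}

/* Text to print for a verdict, or NULL when nothing should be printed. */
static const char *toy_pm_message(enum toy_pm_verdict v)
{
	switch (v) {
	case TOY_PM_WARN_NO_RESUME:
		return "lacks resume support";
	case TOY_PM_WARN_NO_PM:
		return "lacks suspend and resume support";
	default:
		return NULL;
	}
}
```

Under this rule a driver that has only a resume method produces no warning at all.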
Re: [ANNOUNCE] d80211 based driver for Intel PRO/Wireless 3945ABG
> Over the past year we were able to make the necessary changes to the
> microcode used with the 3945 such that we were able to remove the
> regulatory daemon.

Great news !! Congratz ;-)

--
As you read this post global entropy rises. Have Fun ;-)
Nick
Re: NAK new drivers without proper power management?
On Fri, 2007-02-09 at 18:22 -0800, Lee Revell wrote:
> On 2/9/07, Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> > On Fri, 2007-02-09 at 20:59 -0500, Lee Revell wrote:
> > > On 2/9/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
> > > > I would disagree that it's a peripheral issue, it's pretty core these
> > > > days, at least for any hardware that you can stuff in a laptop (though a
> > > > fair number of desktops get suspended and resumed these days too).
> > >
> > > Servers are still the most important Linux market, and don't care
> > > about suspend/resume. I would consider implementing suspend./resume
> > > for a driver that will only be used in server or HPC class hardware a
> > > waste of valuable development resources.
> >
> > Not necessarily. Imagine suspending to disk in order to replace a faulty
> > card. That could be way faster and less disruptive than shutting down
> > normally and loosing caches and so on.
> >
>
> Hmm. If uptime is critical I would make sure to have redundant
> systems anyway and I would just reboot the thing. I would not expect
> the suspend/resume paths on server class hardware like 10gig ethernet,
> Infiniband adapters, or high end SCSI to be particularly well tested.

Speaking from the HPC standpoint, we are gaining more and more nodes in clusters as time goes on, so the potential for single failures affecting performance is growing. A lot of the server class nodes have redundancy such that a node slows down but does not die on failure. Unfortunately, a slowdown in a single node in a tightly coupled job can greatly affect performance.

A good example would be ECC memory. If a chip is going bad, the machine can detect it but it will run slower until the memory is replaced. This one node can affect thousands of other nodes in the same job. Having a mechanism to migrate the operating system that is running on this failing node to another node would be quite beneficial to performance. If all drivers properly supported suspend/resume, it could possibly be extended to support migration to another node as well.

At least for the HPC world, we'd like to see, and encourage, the hardware you describe getting full support for suspend/resume.

Kevin

> > Irrespective of the above, servers tend not to have too much in the way
> > of hardware unique to them anyway, and even if you don't find it useful,
> > that's not to say others won't want it.
>
> Yes but for such hardware, suspend/resume is likely to be a lot of
> work to implement, and I'd rather the developers devote those
> resources to making the driver as stable and performant as possible.
>
> I agree 100% that drivers for desktop and laptop hardware should be
> rejected if missing suspend/resume.
>
> Lee
[PATCH] Re: NAK new drivers without proper power management?
Hi.

On Fri, 2007-02-09 at 19:50 -0600, Robert Hancock wrote:
> It also kind of bothers me that if a driver has no suspend/resume
> functions, and you suspend and resume the system, we don't complain
> about it even though there's a very good chance that device is not going
> to function properly. How about something in dmesg like:
>
> Warning: driver for device has no suspend or resume support.
> Device may not function properly after resume.
>
> so that users know who to complain to. Maybe there are some devices that
> truly don't need any handling for suspend, but if so I suspect the
> number of those is small enough that adding empty functions would be a
> good-enough solution.

Here's my current version of a patch to do this, if anyone wants to try
it out. It dumps stack with the warning to make it easier to see what
the source of the message is:

 drivers/base/core.c       |   25 +
 drivers/pci/pci-driver.c  |    6 ++
 drivers/usb/core/driver.c |    5 +
 include/linux/device.h    |    1 +
 4 files changed, 37 insertions(+)
diff -ruNp 920-report-no-pm-support.patch-old/drivers/base/core.c 920-report-no-pm-support.patch-new/drivers/base/core.c
--- 920-report-no-pm-support.patch-old/drivers/base/core.c	2007-02-06 14:48:31.0 +1100
+++ 920-report-no-pm-support.patch-new/drivers/base/core.c	2007-02-10 13:36:33.0 +1100
@@ -552,6 +552,30 @@ int device_add(struct device *dev)
 				class_intf->add_dev(dev, class_intf);
 		up(&dev->class->sem);
 	}
+
+#ifdef CONFIG_PM
+	{
+		int nosusp = 0, nores = 0;
+
+		if (!((dev->class && dev->class->suspend) ||
+		      (dev->bus && (dev->bus->suspend || dev->bus->suspend_late))))
+			nosusp = 1;
+
+		if (!((dev->class && dev->class->resume) ||
+		      (dev->bus && (dev->bus->resume || dev->bus->resume_early))))
+			nores = 1;
+
+		if ((nosusp || nores) && !dev->pm_safe) {
+			printk("Device driver %s lacks bus and class support for "
+				"being %s.\n",
+				kobject_name(&dev->kobj),
+				nosusp ? (nores ? "suspended or resumed" :
+				"resumed") : "suspended");
+			dump_stack();
+		}
+	}
+#endif
+
  Done:
 	kfree(class_name);
 	put_device(dev);
@@ -851,6 +875,7 @@ struct device *device_create(struct clas
 	dev->class = class;
 	dev->parent = parent;
 	dev->release = device_create_release;
+	dev->pm_safe = 1;
 
 	va_start(args, fmt);
 	vsnprintf(dev->bus_id, BUS_ID_SIZE, fmt, args);
diff -ruNp 920-report-no-pm-support.patch-old/drivers/pci/pci-driver.c 920-report-no-pm-support.patch-new/drivers/pci/pci-driver.c
--- 920-report-no-pm-support.patch-old/drivers/pci/pci-driver.c	2007-02-06 14:48:44.0 +1100
+++ 920-report-no-pm-support.patch-new/drivers/pci/pci-driver.c	2007-02-10 14:00:39.0 +1100
@@ -449,6 +449,12 @@ int __pci_register_driver(struct pci_dri
 	if (error)
 		driver_unregister(&drv->driver);
 
+	if (!drv->suspend || !drv->resume)
+		printk("PCI driver %s lacks driver specific %s support.\n",
+			drv->name,
+			!drv->suspend ? (drv->resume ? "suspend" :
+				"suspend and resume") : "resume");
+
 	return error;
 }
diff -ruNp 920-report-no-pm-support.patch-old/drivers/usb/core/driver.c 920-report-no-pm-support.patch-new/drivers/usb/core/driver.c
--- 920-report-no-pm-support.patch-old/drivers/usb/core/driver.c	2007-02-06 14:48:47.0 +1100
+++ 920-report-no-pm-support.patch-new/drivers/usb/core/driver.c	2007-02-10 12:32:57.0 +1100
@@ -709,6 +709,11 @@ int usb_register_device_driver(struct us
 		pr_info("%s: registered new device driver %s\n",
 			usbcore_name, new_udriver->name);
 		usbfs_update_special();
+		if (!new_udriver->suspend || !new_udriver->resume)
+			printk("USB driver %s lacks %s support.\n",
+				new_udriver->name, !new_udriver->suspend ?
+				(new_udriver->resume ? "suspend" :
+				"suspend and resume") : "resume");
 	} else {
 		printk(KERN_ERR "%s: error %d registering device "
 			"	driver %s\n",
diff -ruNp 920-report-no-pm-support.patch-old/include/linux/device.h 920-report-no-pm-support.patch-new/include/linux/device.h
--- 920-report-no-pm-support.patch-old/include/linux/device.h	2007-02-06 14:48:56.0 +1100
+++ 920-report-no-pm-support.patch-new/include/linux/device.h	2007-02-10 13:36:01.0 +1100
@@ -356,6 +356,7 @@ struct device {
 	struct kobject kobj;
 	char	bus_id[BUS_ID_SIZE];	/*
Re:
You're doing it wrong. Please read the bottom of your emails.

On 9 February 2007, at 00:29, Priyanka Sharma wrote:
> unsubscribe linux-kernel
>
> --
> Priyanka
> 202.141.151.80/~priyanka
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
hackmiester (Hunter Fuller)

who can help me ? i'm french and i don't know irc
can't help you with the being french part, you are screwed their mate

Phone
Voice: +1 251 589 6348
Fax: Call the voice number and ask.

Email
General chat: [EMAIL PROTECTED]
Large attachments: [EMAIL PROTECTED]
SPS-related stuff: [EMAIL PROTECTED]

IM
AIM: hackmiester1337
Skype: hackmiester31337
YIM: hackm1ester
Gtalk: hackmiester
MSN: [EMAIL PROTECTED]
Xfire: hackmiester
[patch 3/3] mm: fix PageUptodate memorder
After running SetPageUptodate, preceding stores to the page contents to
actually bring it uptodate may not be ordered with the store to set the page
uptodate. Therefore, another CPU which checks that PageUptodate is true and
then reads the page contents can get stale data.

Fix this by ensuring SetPageUptodate is always called with the page locked
(except in the case of a new page that cannot be visible to other CPUs), and
requiring PageUptodate be checked only when the page is locked.

To facilitate lockless checks, SetPageUptodate contains an smp_wmb to order
preceding stores before the store to page flags, and a new
PageUptodate_NoLock is introduced, which issues an smp_rmb after the page
flags are loaded for the test.

I'm still not sure that a DMA memory barrier is not required; however, I
think the logical place to put such a barrier would be in the IO completion
routines, when they come back to tell us that they have succeeded. (Help?
Anyone?)

One thing I like about it is that it unifies the anonymous page handling
with the rest of the page management, by marking anon pages as uptodate when
they _are_ uptodate, rather than when our implementation requires that they
be marked as such. Doing this let me get rid of the smp_wmb's in the page
copying functions, which were specially added for anonymous pages for a
closely related issue and which always vaguely troubled me.

Convert core code and some filesystems to use PageUptodate_NoLock, just for
reference (a more complete patch follows).
Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

 fs/ext2/dir.c              |    2 -
 fs/namei.c                 |    2 -
 fs/partitions/check.c      |    2 -
 fs/splice.c                |    4 +--
 include/linux/highmem.h    |    4 ---
 include/linux/page-flags.h |   57 +
 mm/filemap.c               |   28 +++---
 mm/hugetlb.c               |    2 +
 mm/memory.c                |    9 +++
 mm/page_io.c               |    2 -
 mm/swap_state.c            |    2 -
 mm/swapfile.c              |    2 -
 12 files changed, 86 insertions(+), 30 deletions(-)

Index: linux-2.6/include/linux/highmem.h
===================================================================
--- linux-2.6.orig/include/linux/highmem.h
+++ linux-2.6/include/linux/highmem.h
@@ -57,8 +57,6 @@ static inline void clear_user_highpage(s
 	void *addr = kmap_atomic(page, KM_USER0);
 	clear_user_page(addr, vaddr, page);
 	kunmap_atomic(addr, KM_USER0);
-	/* Make sure this page is cleared on other CPU's too before using it */
-	smp_wmb();
 }
 
 #ifndef __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
@@ -108,8 +106,6 @@ static inline void copy_user_highpage(st
 	copy_user_page(vto, vfrom, vaddr, to);
 	kunmap_atomic(vfrom, KM_USER0);
 	kunmap_atomic(vto, KM_USER1);
-	/* Make sure this page is cleared on other CPU's too before using it */
-	smp_wmb();
 }
 
 #endif
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h
+++ linux-2.6/include/linux/page-flags.h
@@ -126,16 +126,60 @@
 #define ClearPageReferenced(page)	clear_bit(PG_referenced, &(page)->flags)
 #define TestClearPageReferenced(page) test_and_clear_bit(PG_referenced, &(page)->flags)
 
-#define PageUptodate(page)	test_bit(PG_uptodate, &(page)->flags)
-#ifdef CONFIG_S390
+static inline int PageUptodate(struct page *page)
+{
+	WARN_ON(!PageLocked(page));
+	return test_bit(PG_uptodate, &(page)->flags);
+}
+
+/*
+ * PageUptodate to be used when not holding the page lock.
+ */
+static inline int PageUptodate_NoLock(struct page *page)
+{
+	int ret = test_bit(PG_uptodate, &(page)->flags);
+
+	/*
+	 * Must ensure that the data we read out of the page is loaded
+	 * _after_ we've loaded page->flags and found that it is uptodate.
+	 * See SetPageUptodate() for the other side of the story.
+	 */
+	if (ret)
+		smp_rmb();
+
+	return ret;
+}
+
 static inline void SetPageUptodate(struct page *page)
 {
+	WARN_ON(!PageLocked(page));
+#ifdef CONFIG_S390
 	if (!test_and_set_bit(PG_uptodate, &page->flags))
 		page_test_and_clear_dirty(page);
-}
 #else
-#define SetPageUptodate(page)	set_bit(PG_uptodate, &(page)->flags)
+	/*
+	 * Memory barrier must be issued before setting the PG_uptodate bit,
+	 * so all previous writes that served to bring the page uptodate are
+	 * visible before PageUptodate becomes true.
+	 *
+	 * S390 is guaranteed to have a barrier in the test_and_set operation
+	 * (see Documentation/atomic_ops.txt).
+	 *
+	 * This memory barrier should not need to provide ordering against
+	 * DMA writes into the page, because the IO completion should really
+	 * be doing that.
+	 */
+	smp_wmb();
+	set_bit(PG_uptodate, &(page)->flags);
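
The pairing this patch relies on (a write barrier before the flag is set, a read barrier after the flag is seen set) can be sketched in userspace with C11 atomics. This is only an analogy under stated assumptions: struct fake_page, fake_set_uptodate() and fake_uptodate_nolock() are hypothetical names, and the C11 release/acquire fences stand in for smp_wmb()/smp_rmb() (they are at least as strong). It is not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for struct page: some contents plus one flag. */
struct fake_page {
	char data[64];
	atomic_bool uptodate;
};

/* Writer side: fill in the contents first, then publish the flag. */
static void fake_set_uptodate(struct fake_page *p, const char *src)
{
	strncpy(p->data, src, sizeof(p->data) - 1);
	p->data[sizeof(p->data) - 1] = '\0';

	/* Plays the role of smp_wmb(): data stores are ordered before the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->uptodate, true, memory_order_relaxed);
}

/* Reader side: lockless check, barrier issued only if the flag was set. */
static bool fake_uptodate_nolock(struct fake_page *p)
{
	bool ret = atomic_load_explicit(&p->uptodate, memory_order_relaxed);

	/* Plays the role of smp_rmb(): later reads of data happen after the flag load. */
	if (ret)
		atomic_thread_fence(memory_order_acquire);

	return ret;
}
```

A reader that gets true from fake_uptodate_nolock() may then read p->data and will see what the writer stored before publishing; without the two fences that would not be guaranteed on weakly ordered CPUs.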
[PATCH] Make aout executables work again
This is a reworked, replacement version of the
x86-fix-vdso-mapping-for-aout-executables-* series of patches in -mm.

1) Define arch_setup_additional_pages() as weak in linux/interp.h
2) Include linux/interp.h in appropriate places
3) Conditionally call arch_setup_additional_pages() from binfmt_*.c if
   the arch defines it
4) EXPORT_SYMBOL_GPL(arch_setup_additional_pages) for all x86{64},
   powerpc, sh - binfmt_aout can be built as module
5) Get rid of ARCH_HAS_SETUP_ADDITIONAL_PAGES from various places
6) For x86_64 - define and export arch_setup_additional_pages as a wrapper
   over syscall32_setup_pages, call it from ia32_aout.c

Fully tested on x86. (Compile, boot and run the aout binary at
http://ftp.funet.fi/pub/Linux/bin/as86.tar.Z). Other arches: the changes
are minimal, but I'd still appreciate it if someone tests them.

Signed-off-by: Parag Warudkar <[EMAIL PROTECTED]>

diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/arch/i386/kernel/sysenter.c linux-2.6-wk/arch/i386/kernel/sysenter.c
--- linux-2.6-us/arch/i386/kernel/sysenter.c	2007-02-09 17:29:34.0 -0500
+++ linux-2.6-wk/arch/i386/kernel/sysenter.c	2007-02-09 17:54:48.0 -0500
@@ -137,6 +137,7 @@
 	up_write(&mm->mmap_sem);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(arch_setup_additional_pages);
 
 const char *arch_vma_name(struct vm_area_struct *vma)
 {
diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/arch/powerpc/kernel/vdso.c linux-2.6-wk/arch/powerpc/kernel/vdso.c
--- linux-2.6-us/arch/powerpc/kernel/vdso.c	2007-02-09 17:29:34.0 -0500
+++ linux-2.6-wk/arch/powerpc/kernel/vdso.c	2007-02-09 18:02:09.0 -0500
@@ -254,6 +254,7 @@
 	up_write(&mm->mmap_sem);
 	return rc;
 }
+EXPORT_SYMBOL_GPL(arch_setup_additional_pages);
 
 const char *arch_vma_name(struct vm_area_struct *vma)
 {
diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/arch/sh/kernel/vsyscall/vsyscall.c linux-2.6-wk/arch/sh/kernel/vsyscall/vsyscall.c
--- linux-2.6-us/arch/sh/kernel/vsyscall/vsyscall.c	2007-02-09 17:29:34.0 -0500
+++ linux-2.6-wk/arch/sh/kernel/vsyscall/vsyscall.c	2007-02-09 18:02:51.0 -0500
@@ -85,6 +85,7 @@
 	up_write(&mm->mmap_sem);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(arch_setup_additional_pages);
 
 const char *arch_vma_name(struct vm_area_struct *vma)
 {
diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/arch/x86_64/ia32/ia32_aout.c linux-2.6-wk/arch/x86_64/ia32/ia32_aout.c
--- linux-2.6-us/arch/x86_64/ia32/ia32_aout.c	2007-01-26 18:49:37.0 -0500
+++ linux-2.6-wk/arch/x86_64/ia32/ia32_aout.c	2007-02-09 20:29:01.0 -0500
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -410,6 +411,12 @@
 		send_sig(SIGKILL, current, 0);
 		return retval;
 	}
+
+	retval = arch_setup_additional_pages(bprm, EXSTACK_DEFAULT);
+	if (retval < 0) {
+		send_sig(SIGKILL, current, 0);
+		return retval;
+	}
 
 	current->mm->start_stack =
 		(unsigned long)create_aout_tables((char __user *)bprm->p, bprm);
diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/arch/x86_64/ia32/ia32_binfmt.c linux-2.6-wk/arch/x86_64/ia32/ia32_binfmt.c
--- linux-2.6-us/arch/x86_64/ia32/ia32_binfmt.c	2007-01-27 17:23:08.0 -0500
+++ linux-2.6-wk/arch/x86_64/ia32/ia32_binfmt.c	2007-02-09 17:58:42.0 -0500
@@ -258,10 +258,6 @@
 
 static void elf32_init(struct pt_regs *);
 
-#define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
-#define arch_setup_additional_pages syscall32_setup_pages
-extern int syscall32_setup_pages(struct linux_binprm *, int exstack);
-
 #include "../../../fs/binfmt_elf.c"
 
 static void elf32_init(struct pt_regs *regs)
diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/arch/x86_64/ia32/syscall32.c linux-2.6-wk/arch/x86_64/ia32/syscall32.c
--- linux-2.6-us/arch/x86_64/ia32/syscall32.c	2007-02-09 17:29:34.0 -0500
+++ linux-2.6-wk/arch/x86_64/ia32/syscall32.c	2007-02-09 18:01:23.0 -0500
@@ -48,6 +48,12 @@
 	return ret;
 }
 
+int arch_setup_additional_pages(struct linux_binprm* bprm, int exstack)
+{
+	return syscall32_setup_pages(bprm, exstack);
+}
+EXPORT_SYMBOL_GPL(arch_setup_additional_pages);
+
 const char *arch_vma_name(struct vm_area_struct *vma)
 {
 	if (vma->vm_start == VSYSCALL32_BASE &&
diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/fs/binfmt_aout.c linux-2.6-wk/fs/binfmt_aout.c
--- linux-2.6-us/fs/binfmt_aout.c	2007-01-26 18:49:39.0 -0500
+++ linux-2.6-wk/fs/binfmt_aout.c	2007-02-09 17:53:33.0 -0500
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -445,6 +446,14 @@
 		send_sig(SIGKILL, current, 0);
 		return retval;
 	}
+
+	if(arch_setup_additional_pages) {
+		retval = arch_setup_additional_pages(bprm, EXSTACK_DEFAULT);
+		if (retval < 0) {
+
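
Items 1 and 3 of the description (declare the hook weak, then call it only if something defines it) can be sketched in userspace as below. This assumes GCC or Clang on an ELF target such as Linux, where an undefined weak symbol resolves to a null address; demo_arch_setup_pages() and demo_load_binary() are hypothetical names, not functions from the patch.

```c
/*
 * Hypothetical hook, declared weak: if no object file defines it,
 * the linker resolves its address to NULL instead of failing the link.
 */
extern int demo_arch_setup_pages(int exstack) __attribute__((weak));

/* Call the hook only when some "arch" actually provided a definition. */
static int demo_load_binary(int exstack)
{
	int retval;

	if (demo_arch_setup_pages) {
		retval = demo_arch_setup_pages(exstack);
		if (retval < 0)
			return retval;
	}

	return 0;
}
```

Linking a second file that defines demo_arch_setup_pages() makes the same call site start using it, with no #ifdef at the caller; that appears to be the effect the patch is after in dropping ARCH_HAS_SETUP_ADDITIONAL_PAGES.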
[patch 2/3] fs: buffer don't PageUptodate without page locked
__block_write_full_page is calling SetPageUptodate without the page locked.
This is unusual, but not incorrect, as PG_writeback is still set.

However the next patch will require that SetPageUptodate always be called
with the page locked. Simply don't bother setting the page uptodate in this
case (it is unusual that the write path does such a thing anyway). Instead
just leave it to the read side to bring the page uptodate when it notices
that all buffers are uptodate.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

 fs/buffer.c |   11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1698,17 +1698,8 @@ done:
 		 * clean.  Someone wrote them back by hand with
 		 * ll_rw_block/submit_bh.  A rare case.
 		 */
-		int uptodate = 1;
-		do {
-			if (!buffer_uptodate(bh)) {
-				uptodate = 0;
-				break;
-			}
-			bh = bh->b_this_page;
-		} while (bh != head);
-		if (uptodate)
-			SetPageUptodate(page);
 		end_page_writeback(page);
+
 		/*
 		 * The page and buffer_heads can be released at any time from
 		 * here on.
[patch 1/3] mm: make read_cache_page synchronous
Ensure pages are uptodate after returning from read_cache_page, which allows
us to cut out most of the filesystem-internal PageUptodate calls.

I didn't have a great look down the call chains, but this appears to fix 7
possible use-before-uptodate cases in hfs, 2 in hfsplus, 1 in jfs, a few in
ecryptfs, 1 in jffs2, and a possible case of cleared data being overwritten
by readpage in block2mtd. All depending on whether the filler is async
and/or can return with a !uptodate page.

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

 drivers/mtd/devices/block2mtd.c |    3 --
 fs/afs/dir.c                    |    3 --
 fs/afs/mntpt.c                  |   11 +++--
 fs/cramfs/inode.c               |    3 +-
 fs/ecryptfs/mmap.c              |   10
 fs/ext2/dir.c                   |    3 --
 fs/freevxfs/vxfs_subr.c         |    3 --
 fs/minix/dir.c                  |    1
 fs/namei.c                      |   12 --
 fs/nfs/dir.c                    |    5
 fs/nfs/symlink.c                |    6 -
 fs/ntfs/aops.h                  |    3 --
 fs/ntfs/attrib.c                |   18 +--
 fs/ntfs/file.c                  |    3 --
 fs/ntfs/super.c                 |   30 +++--
 fs/ocfs2/symlink.c              |    7 -
 fs/partitions/check.c           |    3 --
 fs/reiserfs/xattr.c             |    4 ---
 fs/sysv/dir.c                   |   10
 fs/ufs/dir.c                    |    6 -
 fs/ufs/util.c                   |    6 +
 include/linux/pagemap.h         |   11 +
 mm/filemap.c                    |   48 ++--
 mm/swapfile.c                   |    3 --
 24 files changed, 68 insertions(+), 144 deletions(-)

Index: linux-2.6/fs/afs/dir.c
===================================================================
--- linux-2.6.orig/fs/afs/dir.c
+++ linux-2.6/fs/afs/dir.c
@@ -187,10 +187,7 @@ static struct page *afs_dir_get_page(str
 
 	page = read_mapping_page(dir->i_mapping, index, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate(page))
-			goto fail;
 		if (!PageChecked(page))
 			afs_dir_check_page(dir, page);
 		if (PageError(page))
Index: linux-2.6/fs/afs/mntpt.c
===================================================================
--- linux-2.6.orig/fs/afs/mntpt.c
+++ linux-2.6/fs/afs/mntpt.c
@@ -77,13 +77,11 @@ int afs_mntpt_check_symlink(struct afs_v
 	}
 
 	ret = -EIO;
-	wait_on_page_locked(page);
-	buf = kmap(page);
-	if (!PageUptodate(page))
-		goto out_free;
 	if (PageError(page))
 		goto out_free;
 
+	buf = kmap(page);
+
 	/* examine the symlink's contents */
 	size = vnode->status.size;
 	_debug("symlink to %*.*s", size, (int) size, buf);
@@ -100,8 +98,8 @@ int afs_mntpt_check_symlink(struct afs_v
 
 	ret = 0;
 
- out_free:
 	kunmap(page);
+ out_free:
 	page_cache_release(page);
  out:
 	_leave(" = %d", ret);
@@ -184,8 +182,7 @@ static struct vfsmount *afs_mntpt_do_aut
 	}
 
 	ret = -EIO;
-	wait_on_page_locked(page);
-	if (!PageUptodate(page) || PageError(page))
+	if (PageError(page))
 		goto error;
 
 	buf = kmap(page);
Index: linux-2.6/fs/cramfs/inode.c
===================================================================
--- linux-2.6.orig/fs/cramfs/inode.c
+++ linux-2.6/fs/cramfs/inode.c
@@ -180,7 +180,8 @@ static void *cramfs_read(struct super_bl
 		struct page *page = NULL;
 
 		if (blocknr + i < devsize) {
-			page = read_mapping_page(mapping, blocknr + i, NULL);
+			page = read_mapping_page_async(mapping, blocknr + i,
+									NULL);
 			/* synchronous error? */
 			if (IS_ERR(page))
 				page = NULL;
Index: linux-2.6/fs/ext2/dir.c
===================================================================
--- linux-2.6.orig/fs/ext2/dir.c
+++ linux-2.6/fs/ext2/dir.c
@@ -161,10 +161,7 @@ static struct page * ext2_get_page(struc
 	struct address_space *mapping = dir->i_mapping;
 	struct page *page = read_mapping_page(mapping, n, NULL);
 	if (!IS_ERR(page)) {
-		wait_on_page_locked(page);
 		kmap(page);
-		if (!PageUptodate(page))
-			goto fail;
 		if (!PageChecked(page))
 			ext2_check_page(page);
 		if (PageError(page))
Index: linux-2.6/fs/freevxfs/vxfs_subr.c
===================================================================
--- linux-2.6.orig/fs/freevxfs/vxfs_subr.c
+++ linux-2.6/fs/freevxfs/vxfs_subr.c
@@ -74,10 +74,7 @@ vxfs_get_page(struct address_space *mapp
 	pp =
[patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 3)
OK, I have got rid of SetPageUptodate_nowarn, and removed the atomic op from
SetNewPageUptodate. Made PageUptodate_NoLock only issue the memory barrier
if the page was uptodate (hopefully the compiler can thread the branch into
the caller's branch).

SetNewPageUptodate does not do the S390 page_test_and_clear_dirty, so I'd
like to make sure that's OK.

Rearranged the patch series so we don't have the first patch introducing a
lot of WARN_ONs that are solved in the next two patches (rather, solve those
issues first).

Thanks,
Nick

--
SuSE Labs
Re: NAK new drivers without proper power management?
On 2/9/07, Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> On Fri, 2007-02-09 at 20:59 -0500, Lee Revell wrote:
> > On 2/9/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
> > > I would disagree that it's a peripheral issue, it's pretty core these
> > > days, at least for any hardware that you can stuff in a laptop (though a
> > > fair number of desktops get suspended and resumed these days too).
> >
> > Servers are still the most important Linux market, and don't care
> > about suspend/resume. I would consider implementing suspend/resume
> > for a driver that will only be used in server or HPC class hardware a
> > waste of valuable development resources.
>
> Not necessarily. Imagine suspending to disk in order to replace a faulty
> card. That could be way faster and less disruptive than shutting down
> normally and losing caches and so on.

Hmm. If uptime is critical I would make sure to have redundant systems
anyway and I would just reboot the thing. I would not expect the
suspend/resume paths on server class hardware like 10gig ethernet,
Infiniband adapters, or high end SCSI to be particularly well tested.

> Irrespective of the above, servers tend not to have too much in the way
> of hardware unique to them anyway, and even if you don't find it useful,
> that's not to say others won't want it.

Yes but for such hardware, suspend/resume is likely to be a lot of
work to implement, and I'd rather the developers devote those
resources to making the driver as stable and performant as possible.

I agree 100% that drivers for desktop and laptop hardware should be
rejected if missing suspend/resume.

Lee
Re: NAK new drivers without proper power management?
Hi.

On Fri, 2007-02-09 at 20:59 -0500, Lee Revell wrote:
> On 2/9/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
> > I would disagree that it's a peripheral issue, it's pretty core these
> > days, at least for any hardware that you can stuff in a laptop (though a
> > fair number of desktops get suspended and resumed these days too).
>
> Servers are still the most important Linux market, and don't care
> about suspend/resume. I would consider implementing suspend/resume
> for a driver that will only be used in server or HPC class hardware a
> waste of valuable development resources.

Not necessarily. Imagine suspending to disk in order to replace a faulty
card. That could be way faster and less disruptive than shutting down
normally and losing caches and so on.

Irrespective of the above, servers tend not to have too much in the way
of hardware unique to them anyway, and even if you don't find it useful,
that's not to say others won't want it.

Regards,

Nigel
Re: NAK new drivers without proper power management?
Hi.

On Fri, 2007-02-09 at 19:50 -0600, Robert Hancock wrote:
> Jeff Garzik wrote:
> > Nigel Cunningham wrote:
> >> Hi.
> >>
> >> On Fri, 2007-02-09 at 23:17 +0100, Arjan van de Ven wrote:
> >>> On Sat, 2007-02-10 at 08:57 +1100, Nigel Cunningham wrote:
> >>>> Hi.
> >>>>
> >>>> I don't think this is already done (feel free to correct me if I'm
> >>>> wrong)..
> >>>>
> >>>> Can we start to NAK new drivers that don't have proper power management
> >>>> implemented? There really is no excuse for writing a new driver and not
> >>>> putting .suspend and .resume methods in anymore, is there?
> >>>
> >>> to a large degree, a device driver that doesn't suspend is better than
> >>> no device driver at all, right?
> >>
> >> I'm not sure it is. It only makes more work for everyone else: We have
> >> to help people figure out what causes their computer to fail to resume
> >> (which can take quite a while), then get them to complain to the driver
> >> author, and the driver author has to submit patches to fix it.
> >>
> >> All of this is avoided if they'll just do it right in the first place.
> >
> > A lot of things could have been avoided, if they just did it
> > right the first time.
> >
> > I think it's more valuable to users to get a basic network driver that
> > pings or a basic ATA driver that reads/writes, than peripheral issues
> > like suspend/resume.
> >
> > Certainly we should ask for it, but it shouldn't be a merge-stopper.
> >
> > Jeff
>
> I would disagree that it's a peripheral issue, it's pretty core these
> days, at least for any hardware that you can stuff in a laptop (though a
> fair number of desktops get suspended and resumed these days too). One
> driver on a system which doesn't suspend or resume properly can ruin the
> entire process, causing a ton of user frustration. Certainly I would
> consider a driver without suspend/resume support to be incomplete.
>
> The trouble with deferring adding this support is that it's a lot harder
> to add this support in after the fact than if it was considered during
> the original driver development.
>
> I would be in favor of not merging drivers lacking suspend unless
> there's a very good reason they're lacking it.
>
> It also kind of bothers me that if a driver has no suspend/resume
> functions, and you suspend and resume the system, we don't complain
> about it even though there's a very good chance that device is not going
> to function properly. How about something in dmesg like:
>
> Warning: driver for device has no suspend or resume support.
> Device may not function properly after resume.
>
> so that users know who to complain to. Maybe there are some devices that
> truly don't need any handling for suspend, but if so I suspect the
> number of those is small enough that adding empty functions would be a
> good-enough solution.

I've already made a start on doing just that. Rafael was clearly right
in asserting that some drivers would need to have warnings suppressed,
but that can be dealt with (see below).

Even if no-one wants it for vanilla, I think I'll put this in Suspend2.
It will at least help my users with debugging issues.

Regards,

Nigel

[ 14.936667] Device driver platform lacks bus and class support for being suspended or resumed.
[ 14.937612] Device driver vtcon0 lacks bus and class support for being suspended or resumed.
[ 14.955258] Device driver pci:00 lacks bus and class support for being suspended or resumed.
[ 15.004268] Device driver pnp0 lacks bus and class support for being suspended or resumed.
[ 15.010618] Device driver mem lacks bus and class support for being suspended or resumed.
[ 15.010779] Device driver kmem lacks bus and class support for being suspended or resumed.
[ 15.010932] Device driver null lacks bus and class support for being suspended or resumed.
[ 15.011090] Device driver port lacks bus and class support for being suspended or resumed.
[ 15.011248] Device driver zero lacks bus and class support for being suspended or resumed.
[ 15.011414] Device driver full lacks bus and class support for being suspended or resumed.
[ 15.011566] Device driver random lacks bus and class support for being suspended or resumed.
[ 15.011723] Device driver urandom lacks bus and class support for being suspended or resumed.
[ 15.011875] Device driver kmsg lacks bus and class support for being suspended or resumed.
[ 15.305495] Device driver mcelog lacks bus and class support for being suspended or resumed.
[ 15.305688] Device driver msr0 lacks bus and class support for being suspended or resumed.
[ 15.306571] Device driver snapshot lacks bus and class support for being suspended or resumed.
[ 15.359006] Device driver fb0 lacks bus and class support for being suspended or resumed.
[ 15.359471] Device driver vtcon1 lacks bus and class support for being suspended or resumed.
[ 15.455642] Device driver tty lacks bus and class support
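
Every line in the log above reports both callbacks as missing. For the cases where only one is absent, the wording has to be picked from two booleans, and a nested ternary is easy to get backwards: the drivers/base/core.c hunk posted in this thread appears to print "resumed" when only suspend support is missing and "suspended" when only resume support is missing. A minimal sketch of a flatter selection follows; missing_pm_support() is a hypothetical helper and is not part of any posted patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: name the missing callbacks, or return NULL
 * when both are present and there is nothing to warn about.
 */
static const char *missing_pm_support(bool has_suspend, bool has_resume)
{
	if (!has_suspend && !has_resume)
		return "suspend and resume";
	if (!has_suspend)
		return "suspend";
	if (!has_resume)
		return "resume";
	return NULL;
}
```

A caller would print the warning only when the result is non-NULL, which also keeps the "nothing is missing" case out of the format string.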
Re: NAK new drivers without proper power management?
On 2/9/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
> I would disagree that it's a peripheral issue, it's pretty core these
> days, at least for any hardware that you can stuff in a laptop (though a
> fair number of desktops get suspended and resumed these days too).

Servers are still the most important Linux market, and don't care
about suspend/resume. I would consider implementing suspend/resume
for a driver that will only be used in server or HPC class hardware a
waste of valuable development resources.

Lee
Re: [GIT PATCH] ACPI patches for 2.6.21
On Fri, 09 Feb 2007, Pavel Machek wrote:
> Not including another /proc/acpi/ibm -like nightmare, is it?

Don't worry, I am already on my way to kill /proc/acpi/ibm... :-)

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
Re: -mm merge plans for 2.6.21
> From: Russell King
> Newsgroups: gmane.linux.kernel
> Subject: Re: -mm merge plans for 2.6.21
> Date: Fri, 9 Feb 2007 22:03:27 +
[]
> However:
>
> sys_foo(int a, int c, unsigned long long b, unsigned long long d)
>
> is entirely reasonable and leaves us with spare room for one additional
> 32-bit arg to be passed.
>
>> Is that actually written anywhere, and does anyone bother to check?
>
> Mostly mailing list archives I'd guess. As far as anyone bothering
> to check, that's me when I'm aware of new syscalls... which typically
> happens a long time after the syscalls have been introduced on x86
> etc.

Why not have a "largest argument first" rule here?

sys_bar(largest,..., larger,..., smaller,..., small);

Put it in Documentation/ABI/README and only bother when the compiler
barks on the -mm tree.
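
Why argument order matters can be shown with a small slot-counting model. The rule assumed here is not spelled out in this message: each 32-bit argument takes one register slot and each 64-bit argument must start on an even-numbered slot (in the style of the ARM EABI). slots_used() is a hypothetical helper written only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: widths[i] is the size of argument i in bytes (4 or 8). */
static unsigned int slots_used(const unsigned int *widths, size_t n)
{
	unsigned int slot = 0;

	for (size_t i = 0; i < n; i++) {
		assert(widths[i] == 4 || widths[i] == 8);
		if (widths[i] == 8) {
			slot += slot & 1;	/* skip one slot to reach an even one */
			slot += 2;		/* a 64-bit argument takes a pair */
		} else {
			slot += 1;
		}
	}

	return slot;
}
```

Under that assumed rule, the sys_foo(int, int, unsigned long long, unsigned long long) layout quoted above uses six slots, while interleaving the same arguments as (int, unsigned long long, int, unsigned long long) uses eight because of two padding slots. Putting the widest arguments first avoids the padding in the same way.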
Re: [PATCH] kfifo: overflow of unsigned integer
2007/2/9, Andrew Morton <[EMAIL PROTECTED]>:
> On Thu, 8 Feb 2007 20:16:55 +0800 "Cong WANG" <[EMAIL PROTECTED]> wrote:
> > 2007/2/8, Andrew Morton <[EMAIL PROTECTED]>:
> > > On Thu, 8 Feb 2007 17:07:28 +0800 "Cong WANG" <[EMAIL PROTECTED]> wrote:
> > >
> > > > Kfifo is a ring-buffer in kernel which can be used as a lock-free way
> > > > for concurrent read/write when there is only one producer and one
> > > > consumer. Details of its design can be found in kernel/kfifo.c and
> > > > include/linux/kfifo.h.
> > > >
> > > > You will find that the 'in' and 'out' fields of 'struct kfifo' are
> > > > both represented as 'unsigned int' and in most cases 'in' is larger
> > > > than 'out' and their difference will NOT be over 'size'.
> > > >
> > > > Now the problem is that 'in' will be *smaller* than 'out' when 'in'
> > > > overflows and 'out' doesn't (Yes, this may occur quietly.). This is
> > > > NOT what we expect, though it may not cause any serious problems if we
> > > > carefully use kfifo*() functions. And this is really a bug.
> > >
> > > You seem to be saying that it's not a bug, but it's a bug.
> > >
> > > Exactly what goes wrong?
> >
> > I wrote a module on my machine to test this bug. And when the overflow
> > occurs, I can't put any data into the fifo even though it is not
> > full.
>
> Why did you remove the mailing list? Please don't do that.

Sorry. I used the poor 'reply'.

> I can't find any bug. I converted the code so that it'll run in userspace:
>
> http://userweb.kernel.org/~akpm/kfifo.c
> http://userweb.kernel.org/~akpm/kfifo.h
>
> Please see if you can reproduce the problem with that setup and then let's
> see if we can understand what's going on, and fix it.

Thanks for your work. And you are right. I think the OLD /proc API
which I used in my module confused me. I got completely lost by
that. The OLD /proc API is very bad, isn't it?

BTW, can you tell me which method you use to exchange information
between user-space and kernel-space when debugging the kernel?

Thanks again! And have a nice day!
Re: NAK new drivers without proper power management?
Jeff Garzik wrote:
> Nigel Cunningham wrote:
> > Hi.
> >
> > On Fri, 2007-02-09 at 23:17 +0100, Arjan van de Ven wrote:
> > > On Sat, 2007-02-10 at 08:57 +1100, Nigel Cunningham wrote:
> > > > Hi.
> > > >
> > > > I don't think this is already done (feel free to correct me if I'm wrong).. Can we start to NAK new drivers that don't have proper power management implemented? There really is no excuse for writing a new driver and not putting .suspend and .resume methods in anymore, is there?
> > >
> > > to a large degree, a device driver that doesn't suspend is better than no device driver at all, right?
> >
> > I'm not sure it is. It only makes more work for everyone else: We have to help people figure out what causes their computer to fail to resume (which can take quite a while), then get them to complain to the driver author, and the driver author has to submit patches to fix it. All of this is avoided if they'll just do it right in the first place.
>
> A lot of things could have been avoided, if they just did it right the first time.
>
> I think it's more valuable to users to get a basic network driver that pings or a basic ATA driver that reads/writes, than peripheral issues like suspend/resume. Certainly we should ask for it, but it shouldn't be a merge-stopper.
>
> Jeff

I would disagree that it's a peripheral issue, it's pretty core these days, at least for any hardware that you can stuff in a laptop (though a fair number of desktops get suspended and resumed these days too). One driver on a system which doesn't suspend or resume properly can ruin the entire process, causing a ton of user frustration. Certainly I would consider a driver without suspend/resume support to be incomplete. The trouble with deferring this support is that it's a lot harder to add after the fact than if it was considered during the original driver development. I would be in favor of not merging drivers lacking suspend unless there's a very good reason they're lacking it.

It also kind of bothers me that if a driver has no suspend/resume functions, and you suspend and resume the system, we don't complain about it even though there's a very good chance that device is not going to function properly. How about something in dmesg like:

Warning: driver for device has no suspend or resume support. Device may not function properly after resume.

so that users know who to complain to. Maybe there are some devices that truly don't need any handling for suspend, but if so I suspect the number of those is small enough that adding empty functions would be a good-enough solution.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
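The dmesg warning suggested in the mail above could be sketched roughly as follows. This is a userspace illustration only: struct toy_driver, warn_missing_pm() and noop() are hypothetical names invented for this sketch and are not the kernel's driver-model API, and the message text is adapted from the wording proposed in the mail.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver and its power-management hooks. */
struct toy_driver {
	const char *name;
	int (*suspend)(void);
	int (*resume)(void);
};

/* Empty hook, as suggested for devices that need no suspend handling. */
static int noop(void)
{
	return 0;
}

/*
 * Print one warning for every driver that lacks a suspend or a resume
 * hook, and return how many such drivers were found.
 */
static int warn_missing_pm(const struct toy_driver *drv, size_t n, FILE *log)
{
	int missing = 0;

	for (size_t i = 0; i < n; i++) {
		if (drv[i].suspend && drv[i].resume)
			continue;
		fprintf(log,
			"Warning: driver %s has no suspend or resume support. "
			"Device may not function properly after resume.\n",
			drv[i].name);
		missing++;
	}
	return missing;
}
```

A driver that truly needs no handling would register noop for both hooks and stay silent; anything with a NULL hook gets named in the log, so the user knows which driver to report.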
Re: [patch 3/3] ext2: use perform_write aop
On Sat, 10 Feb 2007 02:34:07 +0100 Nick Piggin <[EMAIL PROTECTED]> wrote: > On Fri, Feb 09, 2007 at 11:45:39AM -0800, Andrew Morton wrote: > > On Fri, 9 Feb 2007 11:14:55 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > > If so, that might be preventable by leaving the buffer nonuptodate. > > > > oh, OK, it was buffer_new(), so zeroes are the right thing for a reader to > > see. > > > > But if it wasn't buffer_new() then the appropriate thing for the reader to > > see is what's on the disk. But __block_prepare_write() won't read a buffer > > which is fully-inside the write area from disk. > > > > And that's seemingly OK, because if a reader gets in there after the short > > copy, that reader will see the non-uptodate buffer and will populate it > > from disk. > > > > But doing that will overwrite the data which the write() caller managed to > > copy into the page before it took a fault. And that's not OK because > > block_perform_write() does iovec_iterator_advance(i, copied) in this case > > and hence will not rerun the copy after acquiring the page lock? > > Hmm, yeah. This can be handled by not advancing partially into a !uptodate > buffer. Think so, yeah. Overall, the implementation you have there seems reasonable to me. Basically it's passing the responsibility for preventing the deadlock and the exposure-of-zeroes problem down into the filesystem itself, where we have visibility of the state of the various subsections of the page and can take appropriate actions in response to that. It's got conceptually harder to follow as a result, which is a shame. But still no magic bullet is on offer. I pity the poor schmuck who has to write ext3_journalled_perform_write(), ext3_ordered_perform_write(), ext3_writeback_perform_write(), ext3_writeback_nobh_perform_write() and all that other stuff. But I think we need to do that pretty soon to validate the whole approach. Also xfs and reiser3. NTFS will be interesting from the can-this-be-made-to-work POV. 
Is NFS vulnerable to the deadlock? It looks to be. Shudder. We'd need to find a way of communicating all this to the poor old fs maintainers. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH -mm] readahead: partial sendfile fix
Enable readahead to handle partially done read requests, e.g.

	sendfile(188, 1921, [1478592], 19553028) = 37440
	sendfile(188, 1921, [1516032], 19515588) = 28800
	sendfile(188, 1921, [1544832], 19486788) = 37440
	sendfile(188, 1921, [1582272], 19449348) = 14400
	sendfile(188, 1921, [1596672], 19434948) = 37440
	sendfile(188, 1921, [1634112], 19397508) = 37440

In the above strace log,
- some lighttpd is doing _sequential_ reading
- every sendfile() returns with only _partial_ work done

page_cache_readahead() expects that if it returns @next_index, it will be called exactly at @next_index next time. That's not true here. So the pattern will be falsely recognized as a random read trace.

Also documented in "Linux AIO Performance and Robustness for Enterprise Workloads" section 3.5:

	sendfile(fd, 0, 2GB, fd2) = 8192, tells readahead about up to 128KB of the read
	sendfile(fd, 8192, 2GB - 8192, fd2) = 8192, tells readahead about 8KB - 132KB of the read
	sendfile(fd, 16384, 2GB - 16384, fd2) = 8192, tells readahead about 16KB-140KB of the read
	...

	This confuses the readahead logic about the I/O pattern which appears to be 0-128K, 8K-132K, 16K-140K instead of clear sequentiality from 0-2GB that is really appropriate.

Retry based AIO shares the same read pattern and readahead problem. In this case, simply disabling readahead on restarted aio is not a good option: we still need to call into readahead in the rare case of (req_size > ra_max).

Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 mm/filemap.c   |    3 ---
 mm/readahead.c |    9 +++++++++
 2 files changed, 9 insertions(+), 3 deletions(-)

--- linux-2.6.20-rc6-mm3.orig/mm/readahead.c
+++ linux-2.6.20-rc6-mm3/mm/readahead.c
@@ -581,6 +581,15 @@ page_cache_readahead(struct address_spac
 	int sequential;
 
 	/*
+	 * A previous read request is partially completed,
+	 * causing the retried/continued read calls into us prematurely.
+	 */
+	if (ra->start < offset &&
+			offset < ra->prev_page &&
+			ra->prev_page < ra->ahead_start + ra->ahead_size)
+		goto out;
+
+	/*
 	 * We avoid doing extra work and bogusly perturbing the readahead
 	 * window expansion logic.
 	 */
--- linux-2.6.20-rc6-mm3.orig/mm/filemap.c
+++ linux-2.6.20-rc6-mm3/mm/filemap.c
@@ -915,9 +915,6 @@ void do_generic_mapping_read(struct addr
 	if (!isize)
 		goto out;
 
-	if (unlikely(aio_restarted()))
-		next_index = last_index; /* Avoid repeat readahead */
-
 	end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
 	for (;;) {
 		struct page *page;
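As a rough illustration of the test the readahead.c hunk adds, the condition can be pulled out into a standalone predicate. This is a userspace sketch, not the kernel code: struct toy_ra_state and is_partial_retry() are hypothetical names, and only the field names and the three comparisons are taken from the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical cut-down readahead state. The field names mirror the
 * ones the patch reads; all values are page indexes or page counts.
 */
struct toy_ra_state {
	unsigned long start;
	unsigned long prev_page;
	unsigned long ahead_start;
	unsigned long ahead_size;
};

/*
 * True when @offset falls strictly between the window start and the
 * previously requested page, and that previous page still lies before
 * the end of the ahead window. The patch treats this as a retried or
 * continued read and skips the readahead logic ("goto out").
 */
static bool is_partial_retry(const struct toy_ra_state *ra,
			     unsigned long offset)
{
	return ra->start < offset &&
	       offset < ra->prev_page &&
	       ra->prev_page < ra->ahead_start + ra->ahead_size;
}
```

With start = 100, prev_page = 200, ahead_start = 150 and ahead_size = 100, an offset of 150 counts as a continuation, while offsets of 100 or 250 do not and would go through the normal sequential/random detection.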
Re: [patch 3/3] ext2: use perform_write aop
On Fri, Feb 09, 2007 at 11:45:39AM -0800, Andrew Morton wrote: > On Fri, 9 Feb 2007 11:14:55 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > > > If so, that might be preventable by leaving the buffer nonuptodate. > > oh, OK, it was buffer_new(), so zeroes are the right thing for a reader to > see. > > But if it wasn't buffer_new() then the appropriate thing for the reader to > see is what's on the disk. But __block_prepare_write() won't read a buffer > which is fully-inside the write area from disk. > > And that's seemingly OK, because if a reader gets in there after the short > copy, that reader will see the non-uptodate buffer and will populate it > from disk. > > But doing that will overwrite the data which the write() caller managed to > copy into the page before it took a fault. And that's not OK because > block_perform_write() does iovec_iterator_advance(i, copied) in this case > and hence will not rerun the copy after acquiring the page lock? Hmm, yeah. This can be handled by not advancing partially into a !uptodate buffer. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sata_nv - ADMA issues with 2.6.20
David R wrote: I've just upgraded my home server to 2.6.20. It's got an Athlon64 on an ASUS nForce-4 motherboard running a 32 bit kernel. I've had to fall back to using sata_nv.adma=0 on the kernel command line. One of the NCQ capable drives repeatedly produced the following errors. There wasn't much disk IO going on at the time. It's perfectly happy now with ADMA disabled. Strange thing is the other identical drive ata8 showed no problems (they're both part of a software raid1) Some clues follow. Cheers David Feb 9 18:40:27 server kernel: ata7: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400 Feb 9 18:40:27 server kernel: ata7: CPB 0: ctl_flags 0x1f, resp_flags 0x0 Feb 9 18:40:27 server kernel: ata7: CPB 1: ctl_flags 0x1f, resp_flags 0x1 Feb 9 18:40:27 server kernel: ata7: CPB 2: ctl_flags 0x1f, resp_flags 0x1 Feb 9 18:40:27 server kernel: ata7: CPB 3: ctl_flags 0x1f, resp_flags 0x1 etc etc.. Feb 9 18:40:29 server kernel: ata7: CPB 27: ctl_flags 0x1f, resp_flags 0x1 Feb 9 18:40:29 server kernel: ata7: CPB 28: ctl_flags 0x1f, resp_flags 0x1 Feb 9 18:40:29 server kernel: ata7: CPB 29: ctl_flags 0x1f, resp_flags 0x1 Feb 9 18:40:29 server kernel: ata7: CPB 30: ctl_flags 0x1f, resp_flags 0x1 Feb 9 18:40:29 server kernel: ata7: Resetting port Feb 9 18:40:29 server kernel: ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen Feb 9 18:40:29 server kernel: ata7.00: cmd 61/08:00:1f:e4:50/00:00:09:00:00/40 tag 0 cdb 0x0 data 4096 out Feb 9 18:40:29 server kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) So it was tag 0 that timed out , and according to the CPBs the controller indeed believes the command is still outstanding, i.e. we didn't lose an interrupt. I'm suspicious of the fact that only one of two identical drives produced this error.. some kind of hardware-related problem perhaps? 30 seconds is an awfully long time for a drive to take to finish a command. 
You can also try disabling NCQ without disabling ADMA and see what that does:

echo 1 > /sys/block/sdX/device/queue_depth

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: -mm merge plans for 2.6.21
On Sat, 10 Feb 2007 02:15:11 +0100 Carl-Daniel Hailfinger <[EMAIL PROTECTED]> wrote: > Andrew Morton wrote: > > On Fri, 9 Feb 2007 19:37:53 + > > Alan <[EMAIL PROTECTED]> wrote: > > > >> Please just push the EDAC K8 stuff. > > > > OK. > > > >> Andi will say "no" from now until the > >> end of time, but end users want it, distributions want it, and Andi is > >> not the EDAC maintainer so should consider himself overruled on what > >> isn't a technical issue but a personal political viewpoint. > > > > I'll just tell him I sent it by accident. > > Could you please merge ACPI-DSDT-in-initrd for the same reasons? > I don't know what that is. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
arch/arm: typos in KERN_ERR, KERN_INFO
Typos in KERN_ERR, KERN_INFO. Signed-off-by: Nicolas Kaiser <[EMAIL PROTECTED]> --- arch/arm/mach-imx/dma.c |2 +- arch/arm/mach-s3c2410/pm-simtec.c |2 +- arch/arm/plat-omap/dma.c |2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff -ur a/arch/arm/mach-imx/dma.c b/arch/arm/mach-imx/dma.c --- a/arch/arm/mach-imx/dma.c 2006-11-29 22:57:37.0 +0100 +++ b/arch/arm/mach-imx/dma.c 2007-02-09 23:42:15.0 +0100 @@ -234,7 +234,7 @@ imxdma->resbytes = dma_length; if (!sg || !sgcount) { - printk(KERN_ERR "imxdma%d: imx_dma_setup_sg epty sg list\n", + printk(KERN_ERR "imxdma%d: imx_dma_setup_sg empty sg list\n", dma_ch); return -EINVAL; } diff -ur a/arch/arm/mach-s3c2410/pm-simtec.c b/arch/arm/mach-s3c2410/pm-simtec.c --- a/arch/arm/mach-s3c2410/pm-simtec.c 2007-01-21 15:40:56.0 +0100 +++ b/arch/arm/mach-s3c2410/pm-simtec.c 2007-02-09 23:40:46.0 +0100 @@ -52,7 +52,7 @@ !machine_is_aml_m5900()) return 0; - printk(KERN_INFO "Simtec Board Power Manangement" COPYRIGHT "\n"); + printk(KERN_INFO "Simtec Board Power Management" COPYRIGHT "\n"); gstatus4 = (__raw_readl(S3C2410_BANKCON7) & 0x3) << 30; gstatus4 |= (__raw_readl(S3C2410_BANKCON6) & 0x3) << 28; diff -ur a/arch/arm/plat-omap/dma.c b/arch/arm/plat-omap/dma.c --- a/arch/arm/plat-omap/dma.c 2006-11-29 22:57:37.0 +0100 +++ b/arch/arm/plat-omap/dma.c 2007-02-09 23:39:56.0 +0100 @@ -1053,7 +1053,7 @@ void omap_set_lcd_dma_b1_vxres(unsigned long vxres) { if (omap_dma_in_1510_mode()) { - printk(KERN_ERR "DMA virtual resulotion is not supported " + printk(KERN_ERR "DMA virtual resolution is not supported " "in 1510 mode\n"); BUG(); } - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [GIT PATCH] ACPI patches for 2.6.21
On Fri, 9 Feb 2007 23:09:29 + Pavel Machek <[EMAIL PROTECTED]> wrote: > Hi! > > > Per your request, and the request of the distros, we've changed > > how ACPICA Core releases are integrated into Linux so that each > > upstream (CVS) check-in appears as a single git commit. > > While this process is not yet perfect, it should be vastly better > > than previous "code drops" in allowing git bisect to work, > > and allowing distros to cherry-pick individual fixes. > > > > The "bay" driver is new (and marked EXPERIMENTAL) -- adding initial > > hot-plug support for ACPI controlled drive bays such as the > > IBM ultrabay or the Dell Module Bay. > > Could you describe userland interface it uses? /proc? Will it be > usable for bays on notebooks not using acpi? The user interface for the Bay driver is via sysfs - it is a platform driver, so once you load it you will find 2 files created under /sys/devices/platform/bay.X, "eject" and "present". When the user writes 1 to the "eject" file, the driver will call the ACPI eject routine - this normally blinks leds and does whatever the system vendor thinks is necessary to safely eject the device. The "present" file will query the driver to determine if the device is present or not (note, not good for poll(), it's on my todo list...). Depending on the system implementation, when the user presses the eject button on the laptop for the bay device, the driver will inform user space via a CHANGE uevent. User space is then responsible for doing whatever needs to be done to cleanup and safely eject the drive, the driver will not call the ACPI eject routine without user space initiation. The driver currently only handles module bays that use ACPI to send eject notifications or need "something" done before ejecting (i.e. _EJ0 in ACPI). 
The bay driver will also register with the dock driver if the bay is on the dock device (such as with the IBM X60) so that when the dock station is ejected, the bay driver is notified with the eject request as well. This notification will be passed to user space via the CHANGE uevent. Kristen - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Strange message in log upon resuming a PCIe system
Larry Finger wrote:
> A bcm43xx user is having problems with suspend/resume with a PCIe system. This may be the first time we have tried to resume with PCIe. The problem occurs someplace within the initialization of the bcm43xx chip and we are still tracing it; however, there are some strange messages in the log from the pnp, namely:
>
> pnp: Device 00:04 does not support activation.
> pnp: Device 00:05 does not support activation.
>
> How does one trace back these device numbers? The output of 'lspci -v' shows the following:

Those aren't PCI devices, they're PnP devices, likely on the motherboard. If you look in sysfs (not booted into Linux right now so I can't tell you exactly where) you can get some idea of what those are. In any case I think those messages are harmless and unrelated to any bcm43xx problems.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Re: -mm merge plans for 2.6.21
Andrew Morton wrote: > On Fri, 9 Feb 2007 19:37:53 + > Alan <[EMAIL PROTECTED]> wrote: > >> Please just push the EDAC K8 stuff. > > OK. > >> Andi will say "no" from now until the >> end of time, but end users want it, distributions want it, and Andi is >> not the EDAC maintainer so should consider himself overruled on what >> isn't a technical issue but a personal political viewpoint. > > I'll just tell him I sent it by accident. Could you please merge ACPI-DSDT-in-initrd for the same reasons? Regards, Carl-Daniel - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[git patches] libata updates 1 of 3
(just sent this upstream to Andrew and Linus) This is libata push 1 of 3. This is largely the "accumulated driver updates" push: lots of minor changes. A few new drivers. The most notable thing is "devres", an optional subsystem for drivers that greatly simplifies the task of driver housekeeping, if you have to acquire+map then later unmap+free a bunch of MMIO resources, some PIO resources, an IRQ (or two or three) like we do with ATA host controllers. devres is only used by libata drivers at the moment, but the APIs are generic enough to be used by any driver. This should enable the elimination of several highly common code patterns in various drivers. devres, in turn, has enabled us to finally merge the patches that convert libata to using the lib/iomap.c stuff. Anyone with eyes can see the code savings in libata-sff that iomap brings. Kudos to Tejun Heo for the devres work. I will be pushing ACPI support on Saturday or Sunday, in order to stage it into a separate 2.6.20-gitX snapshot. That's libata push 2 of 3. The third push will eliminate the ugly split-driver configuration created by quirk_intel_ide_combined() and request_resource(), whereby libata claims one half of a controller (SATA), and old-IDE claims the other half (PATA). libata wins the battle for DMA and IRQ, and so old-IDE (PATA) is driven via the slower PIO data xfer methods. Was necessary at the time, as libata lacked ATAPI support and old-IDE failed to handle irq storms created by newer Intel IDE irq-ack behavior. But times have changed, and neither conditions remain true. So we can remove the hacks. 
Please pull from 'upstream-linus' branch of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git upstream-linus to receive the following updates: Documentation/driver-model/devres.txt | 268 +++ drivers/ata/Kconfig | 41 ++- drivers/ata/Makefile |3 + drivers/ata/ahci.c| 236 +++--- drivers/ata/ata_generic.c |8 +- drivers/ata/ata_piix.c| 56 ++-- drivers/ata/libata-core.c | 591 - drivers/ata/libata-eh.c |7 +- drivers/ata/libata-scsi.c | 98 +++-- drivers/ata/libata-sff.c | 641 --- drivers/ata/libata.h |4 +- drivers/ata/pata_ali.c| 32 +- drivers/ata/pata_amd.c| 36 +- drivers/ata/pata_artop.c | 12 +- drivers/ata/pata_atiixp.c |6 +- drivers/ata/pata_cmd64x.c | 18 +- drivers/ata/pata_cs5520.c | 41 ++- drivers/ata/pata_cs5530.c | 41 +- drivers/ata/pata_cs5535.c |6 +- drivers/ata/pata_cypress.c|6 +- drivers/ata/pata_efar.c |6 +- drivers/ata/pata_hpt366.c | 26 +- drivers/ata/pata_hpt37x.c | 61 +-- drivers/ata/pata_hpt3x2n.c| 26 +- drivers/ata/pata_hpt3x3.c |8 +- drivers/ata/pata_isapnp.c | 21 +- drivers/ata/pata_it8213.c | 354 +++ drivers/ata/pata_it821x.c | 58 +-- drivers/ata/pata_ixp4xx_cf.c | 50 +-- drivers/ata/pata_jmicron.c|8 +- drivers/ata/pata_legacy.c | 166 drivers/ata/pata_marvell.c| 12 +- drivers/ata/pata_mpc52xx.c| 538 +++ drivers/ata/pata_mpiix.c | 113 ++--- drivers/ata/pata_netcell.c|6 +- drivers/ata/pata_ns87410.c|6 +- drivers/ata/pata_oldpiix.c| 24 +- drivers/ata/pata_opti.c | 24 +- drivers/ata/pata_optidma.c| 40 +- drivers/ata/pata_pcmcia.c | 27 +- drivers/ata/pata_pdc2027x.c | 122 ++--- drivers/ata/pata_pdc202xx_old.c | 41 +- drivers/ata/pata_platform.c | 67 +--- drivers/ata/pata_qdi.c| 50 ++- drivers/ata/pata_radisys.c|6 +- drivers/ata/pata_rz1000.c |6 +- drivers/ata/pata_sc1200.c |6 +- drivers/ata/pata_serverworks.c| 31 +- drivers/ata/pata_sil680.c |8 +- drivers/ata/pata_sis.c| 70 +++- drivers/ata/pata_sl82c105.c | 10 +- drivers/ata/pata_triflex.c|6 +- drivers/ata/pata_via.c| 22 +- drivers/ata/pata_winbond.c| 49 ++- drivers/ata/pdc_adma.c| 120 
++ drivers/ata/sata_inic162x.c | 781 + drivers/ata/sata_mv.c | 200 +++-- drivers/ata/sata_nv.c | 629 --- drivers/ata/sata_promise.c| 379 +++- drivers/ata/sata_qstor.c | 138 ++ drivers/ata/sata_sil.c| 99 ++---
Re: [ipw3945-devel] [ANNOUNCE] d80211 based driver for Intel PRO/Wireless 3945ABG
Hi all! On Fre, 09 Feb 2007, James Ketrenos wrote: > We are pleased to announce the availability of a new driver for the > Intel PRO/Wireless 3945ABG Network Connection adapter. This new driver I am impressed: I had 2.6.20 running with ipw3945 + wpa_supplicant. I installed the d80211 system and the new driver, rebooted, and you won't believe it, I had network connection even with WEP encryption. That was a big surprise for me that it worked out of the box without any magic. If you are interested in any dmesg/log output, please let me know. Ahhh ... one thing: The LED on my Acer Laptop (TM3012) does not show up, maybe because I have CONFIG_D80211_LEDS off? Again, thanks a lot for the good work! Norbert --- Dr. Norbert Preining <[EMAIL PROTECTED]>Università di Siena Debian Developer <[EMAIL PROTECTED]> Debian TeX Group gpg DSA: 0x09C5B094 fp: 14DF 2E6C 0307 BE6D AD76 A9C0 D2BF 4AA3 09C5 B094 --- HARPENDEN (n.) The coda to a phone conversion, consisting of about eight exchanges, by which people try gracefully to get off the line. --- Douglas Adams, The Meaning of Liff - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: + smaps-add-clear_refs-file-to-clear-reference.patch added to -mm tree
Do not clear references when the task_struct's mm is NULL by using /proc/pid/clear_refs. Also, use mmap_sem since the mm_struct's VMA's are being iterated in fs/proc/task_mmu.c.

Reported by Oleg Nesterov <[EMAIL PROTECTED]>.

Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 fs/proc/base.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -719,6 +719,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 				size_t count, loff_t *ppos)
 {
 	struct task_struct *task;
+	struct mm_struct *mm;
 	char buffer[PROC_NUMBUF], *end;
 
 	memset(buffer, 0, sizeof(buffer));
@@ -733,7 +734,13 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 	task = get_proc_task(file->f_path.dentry->d_inode);
 	if (!task)
 		return -ESRCH;
-	clear_refs_smap(task->mm->mmap);
+	mm = get_task_mm(task);
+	if (mm) {
+		down_read(&mm->mmap_sem);
+		clear_refs_smap(mm->mmap);
+		up_read(&mm->mmap_sem);
+		mmput(mm);
+	}
 	put_task_struct(task);
 	if (end - buffer == 0)
 		return -EIO;
Re: + smaps-add-clear_refs-file-to-clear-reference.patch added to -mm tree
On Sat, 10 Feb 2007 03:39:58 +0300 Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> David Rientjes wrote:
> >
> > +static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> > +				size_t count, loff_t *ppos)
> > +{
> > ...
> > +	task = get_proc_task(file->f_path.dentry->d_inode);
> > +	if (!task)
> > +		return -ESRCH;
> > +	clear_refs_smap(task->mm->mmap);
>
> task->mm may be NULL and not stable, this needs get_task_mm() (may fail).

yup.

> Don't we also need ->mmap_sem to iterate vmas?

and yup. Like this?

--- a/fs/proc/base.c~smaps-add-clear_refs-file-to-clear-reference-fix
+++ a/fs/proc/base.c
@@ -720,6 +720,7 @@ static ssize_t clear_refs_write(struct f
 {
 	struct task_struct *task;
 	char buffer[PROC_NUMBUF], *end;
+	struct mm_struct *mm;
 
 	memset(buffer, 0, sizeof(buffer));
 	if (count > sizeof(buffer) - 1)
@@ -733,7 +734,11 @@ static ssize_t clear_refs_write(struct f
 	task = get_proc_task(file->f_path.dentry->d_inode);
 	if (!task)
 		return -ESRCH;
-	clear_refs_smap(task->mm->mmap);
+	mm = get_task_mm(task);
+	if (mm) {
+		clear_refs_smap(mm);
+		mmput(mm);
+	}
 	put_task_struct(task);
 	if (end - buffer == 0)
 		return -EIO;
diff -puN fs/proc/task_mmu.c~smaps-add-clear_refs-file-to-clear-reference-fix fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~smaps-add-clear_refs-file-to-clear-reference-fix
+++ a/fs/proc/task_mmu.c
@@ -350,11 +350,15 @@ static int show_smap(struct seq_file *m,
 	return show_map_internal(m, v, &mss);
 }
 
-void clear_refs_smap(struct vm_area_struct *vma)
+void clear_refs_smap(struct mm_struct *mm)
 {
-	for (; vma; vma = vma->vm_next)
+	struct vm_area_struct *vma;
+
+	down_read(&mm->mmap_sem);
+	for (vma = mm->mmap; vma; vma = vma->vm_next)
 		if (vma->vm_mm && !is_vm_hugetlb_page(vma))
 			for_each_pmd(vma, clear_refs_one_pmd, NULL);
+	up_read(&mm->mmap_sem);
 }
 
 static void *m_start(struct seq_file *m, loff_t *pos)
diff -puN include/linux/proc_fs.h~smaps-add-clear_refs-file-to-clear-reference-fix include/linux/proc_fs.h
--- a/include/linux/proc_fs.h~smaps-add-clear_refs-file-to-clear-reference-fix
+++ a/include/linux/proc_fs.h
@@ -104,7 +104,7 @@ int proc_pid_readdir(struct file * filp,
 unsigned long task_vsize(struct mm_struct *);
 int task_statm(struct mm_struct *, int *, int *, int *, int *);
 char *task_mem(struct mm_struct *, char *);
-void clear_refs_smap(struct vm_area_struct *);
+void clear_refs_smap(struct mm_struct *mm);
 
 extern struct proc_dir_entry *create_proc_entry(const char *name, mode_t mode,
 						struct proc_dir_entry *parent);
_
PROBLEM: xt_state compiles without errors but cannot be loaded
[1.] module: xt_state compiles without errors but cannot be loaded [2.] Here's what shows up in /var/log/messages: kernel: xt_state: Unknown symbol nf_conntrack_untracked kernel: xt_state: Unknown symbol nf_ct_l3proto_module_put kernel: xt_state: disagrees about version of symbol xt_unregister_matches kernel: xt_state: Unknown symbol xt_unregister_matches kernel: xt_state: Unknown symbol nf_ct_l3proto_try_module_get kernel: xt_state: disagrees about version of symbol xt_register_matches kernel: xt_state: Unknown symbol xt_register_matches [3.] modules, netfilter: [4.] 2.6.20: [5.] 2.6.19.2 possibly 2.6.9.3: [7.] try loading module using insmod [8.] CentOS 4.4 [8.1.] apf 0.9.6 (modified to use xt_state), and insmod - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[ANNOUNCE] sparse-0.2-cl2 is now available
Temporarily at:
http://userweb.kernel.org/~chrisl/sparse-0.2-cl2

Will appear later at:
http://ftp.kernel.org//pub/linux/kernel/people/chrisl/patches/sparse/sparse-0.2-cl2/

I have been playing with sparse to add more Stanford-checker-style checking. The paper is "Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions" by Dawson Engler et al.

Unlike the Stanford checker and smatch, this checker works at the linearization level instead of the AST level. Linearized code can be very convenient (when it works) for tracing the data flow, because pseudos are in SSA form. There is a define/use chain, so the checker does not have to scan every instruction.

I will take the malloc checker as an example to explain how the checker works. The checking usually happens in three steps:

The first step is scanning the linearized instructions, looking for relevant operations. For the malloc checker, the task is to find the malloc/free function calls and the uses of malloced pointers.

The second step is converting the relevant operations into checker instructions. The checker instructions are a simplification of the whole program, containing only the operations relevant to the checker.

The third step is executing the checker instructions. It tries to execute every possible execution flow in the function. The execution engine lets the checker instructions perform state changes. Thanks to step two, the size and complexity of the program have been greatly reduced.

The new checking is very fast; it adds a few seconds to the make C=1 run.

Again, comments and feedback are always welcome.

Chris

Change log in sparse-0.2-cl2:
- adding pointer signedness fix
- adding spinlock checking

Change log in sparse-0.2-cl1:
The most interesting part is the inline function annotation. The new checker can find out inlined function usage. The interrupt checker does not depend on x86 asm instructions any more.

origin.patch              006eff06c7adcfb0d06c6fadf6e9b64f0488b2bf URL: git://git.kernel.org/pub/scm/linux/kernel/git/josh/sparse.git
incompatible-ptr-signess  Bug fix in pointer modifiers inherent at function degeneration.
sizeof-incomplete         Fix double semicolon in struct declare
anon-symbol               Fix core dump on anonymous symbol.
instruction-buffer-size   Fix core dump on huge switch
debug-checker             Adding debug option for showing the linearized instruction.
no-dead-instruction       Disable liveness "dead" instruction by default.
ptr-allocator             Make the ptrlist using the sparse allocator.
annotate-inline-2         Add annotation for inline function call.
malloc-checker            Adding the malloc NULL pointer checker.
interrupt-checker         Adding the interrupt checker
spinlock-checker          Adding spinlock checker

Total 12 patches
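To make the third step of the announcement more concrete, here is a toy executor for "checker instructions" in the spirit of the malloc checker. It is a sketch under strong assumptions: the names (chk_insn, run_checker, the CHK_* and PTR_* values) are invented here and are not sparse's data structures, and it only walks one straight-line instruction list, whereas the announcement says the real engine tries every execution flow in a function.

```c
#include <stddef.h>

/* Hypothetical checker instructions left over after steps one and two. */
enum chk_op { CHK_MALLOC, CHK_NULL_TEST, CHK_USE, CHK_FREE };

struct chk_insn {
	enum chk_op op;
	int ptr;		/* index of the tracked pointer */
};

enum ptr_state { PTR_UNTRACKED, PTR_MAYBE_NULL, PTR_VALID, PTR_FREED };

#define MAX_PTRS 8

/*
 * Execute the checker instructions in order, letting each one change
 * the state of its pointer. Returns the number of problems found:
 * a use before the NULL test, a use after free, or a double free.
 */
static int run_checker(const struct chk_insn *insn, size_t n)
{
	enum ptr_state state[MAX_PTRS] = { PTR_UNTRACKED };
	int reports = 0;

	for (size_t i = 0; i < n; i++) {
		int p = insn[i].ptr;

		if (p < 0 || p >= MAX_PTRS)
			continue;	/* not a pointer this toy tracks */

		switch (insn[i].op) {
		case CHK_MALLOC:
			state[p] = PTR_MAYBE_NULL;
			break;
		case CHK_NULL_TEST:
			if (state[p] == PTR_MAYBE_NULL)
				state[p] = PTR_VALID;
			break;
		case CHK_USE:
			if (state[p] == PTR_MAYBE_NULL || state[p] == PTR_FREED)
				reports++;
			break;
		case CHK_FREE:
			if (state[p] == PTR_FREED)
				reports++;
			state[p] = PTR_FREED;
			break;
		}
	}
	return reports;
}
```

Because only the handful of relevant operations survive into the instruction list, the state machine stays tiny, which fits the claim that the extra checking costs only a few seconds.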
Re: [PATCH] fix quadratic behavior of shrink_dcache_parent()
On Fri, 9 Feb 2007 19:23:31 -0500 "Russ Cox" <[EMAIL PROTECTED]> wrote: > > "The file system mounted on /tmp/z in the example contains 2^50 > > directories". heh. > > > > I do wonder how realistic this problem is in real life. > > That's a fair concern, although I was trying this as part > of evaluating how much someone could hose a system > if we let them mount arbitrary FUSE servers. And the > answer is: they could make it completely unusable, > requiring reboot. > > I ran a later test that printed how deep it got into > the file tree and it was only a few hundred thousand > if I recall correctly. A determined attacker might even > manage to do this in a normal file system. > > But sure, it's not a common case. ;-) Well that's a good point - sometimes people do crazy things on purpose. We were all University students once ;) The patches look nice and as I said, potentially of some use for memory reclaim. But I hope that someone who has worked on dcache.c more recently than I has time to apply a toothcomb to this work. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: + smaps-add-clear_refs-file-to-clear-reference.patch added to -mm tree
David Rientjes wrote: > > +static ssize_t clear_refs_write(struct file *file, const char __user *buf, > + size_t count, loff_t *ppos) > +{ > ... > + task = get_proc_task(file->f_path.dentry->d_inode); > + if (!task) > + return -ESRCH; > + clear_refs_smap(task->mm->mmap); task->mm may be NULL and not stable, this needs get_task_mm() (may fail). Don't we also need ->mmap_sem to iterate vmas? Oleg. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] fbdev driver for S3 Trio/Virge, updated
Ondrej Zajicek wrote:

This patch adds driver for S3 Trio / S3 Virge. Driver is tested with
most versions of S3 Trio and S3 Virge, on i386. It is tested both as
compiled-in and module. It is against linux-2.6.20 .

This is version 3. There are some minor modifications from version 2
(mostly coding style cleanups).

Signed-off-by: Ondrej Zajicek <[EMAIL PROTECTED]>

---

[...]

+/* PCI probe */
+
+static int __devinit s3_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+	struct fb_info *info;
+	struct s3fb_info *par;
+	int rc;
+	u8 regval, cr38, cr39;
+
+	/* Ignore secondary VGA device because there is no VGA arbitration */
+	if (! svga_primary_device(dev)) {
+		dev_info(&(dev->dev), "ignoring secondary device\n");
+		return -ENODEV;
+	}
+
+	/* Allocate and fill driver data structure */
+	info = framebuffer_alloc(sizeof(struct s3fb_info), NULL);
+	if (!info) {
+		dev_err(&(dev->dev), "cannot allocate memory\n");
+		return -ENOMEM;
+	}
+
+	par = info->par;
+	mutex_init(&par->open_lock);
+
+	info->flags = FBINFO_PARTIAL_PAN_OK | FBINFO_HWACCEL_YPAN;
+	info->fbops = &s3fb_ops;
+
+	/* Prepare PCI device */
+	rc = pci_enable_device(dev);
+	if (rc < 0) {
+		dev_err(&(dev->dev), "cannot enable PCI device\n");
+		goto err_enable_device;
+	}
+
+	rc = pci_request_regions(dev, "s3fb");
+	if (rc < 0) {
+		dev_err(&(dev->dev), "cannot reserve framebuffer region\n");
+		goto err_request_regions;
+	}
+
+
+	info->fix.smem_start = pci_resource_start(dev, 0);
+	info->fix.smem_len = pci_resource_len(dev, 0);
+
+	/* Map physical IO memory address into kernel space */
+	info->screen_base = pci_iomap(dev, 0, 0);
+	if (! info->screen_base) {
+		rc = -ENOMEM;
+		dev_err(&(dev->dev), "iomap for framebuffer failed\n");
+		goto err_iomap;
+	}
+
+	/* Unlock regs */
+	cr38 = vga_rcrt(NULL, 0x38);
+	cr39 = vga_rcrt(NULL, 0x39);
+	vga_wseq(NULL, 0x08, 0x06);
+	vga_wcrt(NULL, 0x38, 0x48);
+	vga_wcrt(NULL, 0x39, 0xA5);
+
+	/* Find how many physical memory there is on card */
+	/* 0x36 register is accessible even if other registers are locked */
+	regval = vga_rcrt(NULL, 0x36);
+	info->screen_size = s3_memsizes[regval >> 5] << 10;
+	info->fix.smem_len = info->screen_size;
+
+	par->chip = id->driver_data & CHIP_MASK;
+	par->rev = vga_rcrt(NULL, 0x2f);
+	if (par->chip & CHIP_UNDECIDED_FLAG)
+		par->chip = s3_identification(par->chip);
+
+	/* Find MCLK frequency */
+	regval = vga_rseq(NULL, 0x10);
+	par->mclk_freq = ((vga_rseq(NULL, 0x11) + 2) * 14318) / ((regval & 0x1F) + 2);
+	par->mclk_freq = par->mclk_freq >> (regval >> 5);
+
+	/* Restore locks */
+	vga_wcrt(NULL, 0x38, cr38);
+	vga_wcrt(NULL, 0x39, cr39);
+
+	strcpy(info->fix.id, s3_names [par->chip]);
+	info->fix.mmio_start = 0;
+	info->fix.mmio_len = 0;
+	info->fix.type = FB_TYPE_PACKED_PIXELS;
+	info->fix.visual = FB_VISUAL_PSEUDOCOLOR;
+	info->fix.ypanstep = 0;
+	info->fix.accel = FB_ACCEL_NONE;
+	info->pseudo_palette = (void*) (par->pseudo_palette);
+
+	/* Prepare startup mode */
+	rc = fb_find_mode(&(info->var), info, mode, NULL, 0, NULL, 8);
+	if (! ((rc == 1) || (rc == 2))) {
+		rc = -EINVAL;
+		dev_err(&(dev->dev), "mode %s not found\n", mode);
+		goto err_find_mode;
+	}
+
+	rc = fb_alloc_cmap(&info->cmap, 256, 0);
+	if (rc < 0) {
+		dev_err(&(dev->dev), "cannot allocate colormap\n");
+		goto err_alloc_cmap;
+	}
+
+	rc = register_framebuffer(info);
+	if (rc < 0) {
+		dev_err(&(dev->dev), "cannot register framebugger\n");

Bugger :DD LOL? Buffer?

+		goto err_reg_fb;
+	}
+
+	printk(KERN_INFO "fb%d: %s on %s, %d MB RAM, %d MHz MCLK\n", info->node, info->fix.id,
+		pci_name(dev), info->fix.smem_len >> 20, (par->mclk_freq + 500) / 1000);
+
+	if (par->chip == CHIP_UNKNOWN)
+		printk(KERN_INFO "fb%d: unknown chip, CR2D=%x, CR2E=%x, CRT2F=%x, CRT30=%x\n",
+			info->node, vga_rcrt(NULL, 0x2d), vga_rcrt(NULL, 0x2e),
+			vga_rcrt(NULL, 0x2f), vga_rcrt(NULL, 0x30));

dev_info x 2, but it's a dite.

+
+	/* Record a reference to the driver data */
+	pci_set_drvdata(dev, info);
+
+#ifdef CONFIG_MTRR
+	if (mtrr) {
+		par->mtrr_reg = -1;
+		par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
+	}
+#endif
+
+	return 0;
+
+	/* Error handling */
+err_reg_fb:
+
Re: [PATCH 21/22] honor r/w changes at do_remount() time
On 9 Feb 2007, at 23:22, Andrew Morton wrote:
> On Fri, 09 Feb 2007 14:53:44 -0800 Dave Hansen <[EMAIL PROTECTED]> wrote:
> > This is the core of the read-only bind mount patch set.
>
> Who wants read-only bind mounts, and for what reason?

On our local mirror server (mirrors just under 3TiB worth of stuff) we
hold all data on r/w mounted storage in a private location in the file
tree. (Note the server runs Solaris 10 not Linux or the following would
not be possible at present...)

We then bind mount (i.e. loopback mount on Solaris) various directories
from inside the private paths to various other locations so for example
we create /export/ftp/pub/* where "*" are directories we want to export
via FTP and we do all of those as read-only bind mounts. This gives us
that little bit of extra confidence that no-one from the outside can
cause any writes to happen to our mirrored data.

We do similar for NFS by creating lots of read-only bind mounts in /*
that again point into the private locations.

It would be nice if the Linux box that we have that is a copy/backup of
the Solaris box could do the same rather than have all the bind mounts
be read-write because we need the storage in the private locations to be
writable.

Best regards,

Anton
Re: NAK new drivers without proper power management?
Nigel Cunningham wrote:
> Hi.
>
> On Fri, 2007-02-09 at 23:17 +0100, Arjan van de Ven wrote:
> > On Sat, 2007-02-10 at 08:57 +1100, Nigel Cunningham wrote:
> > > Hi.
> > >
> > > I don't think this is already done (feel free to correct me if I'm
> > > wrong).. Can we start to NAK new drivers that don't have proper power
> > > management implemented? There really is no excuse for writing a new
> > > driver and not putting .suspend and .resume methods in anymore, is
> > > there?
> >
> > to a large degree, a device driver that doesn't suspend is better than
> > no device driver at all, right?
>
> I'm not sure it is. It only makes more work for everyone else: We have
> to help people figure out what causes their computer to fail to resume
> (which can take quite a while), then get them to complain to driver
> author, and the driver author has to submit patches to fix it. All of
> this is avoided if they'll just do it right in the first place.

A lot of a lot of things could have been avoided, if they just did it
right the first time.

I think it's more valuable to users to get a basic network driver that
pings or a basic ATA driver that reads/writes, than peripheral issues
like suspend/resume.

Certainly we should ask for it, but it shouldn't be a merge-stopper.

	Jeff
Fix null pointer dereference in appledisplay driver
Commit 40b20c257a13c5a526ac540bc5e43d0fdf29792a by Len Brown introduced
a null pointer dereference in the appledisplay driver. This patch fixes
it.

Signed-off-by: Michael Hanselmann <[EMAIL PROTECTED]>

---
I suggest adding this to 2.6.20.1 because this bug causes the kernel to
panic on boot when the driver is compiled in.

diff -Nrup --exclude-from linux-exclude-from linux-2.6.20.orig/drivers/usb/misc/appledisplay.c linux-2.6.20/drivers/usb/misc/appledisplay.c
--- linux-2.6.20.orig/drivers/usb/misc/appledisplay.c	2007-02-09 22:35:56.0 +0100
+++ linux-2.6.20/drivers/usb/misc/appledisplay.c	2007-02-10 01:00:28.0 +0100
@@ -281,8 +281,8 @@ static int appledisplay_probe(struct usb
 	/* Register backlight device */
 	snprintf(bl_name, sizeof(bl_name), "appledisplay%d",
 		atomic_inc_return(&count_displays) - 1);
-	pdata->bd = backlight_device_register(bl_name, NULL, NULL,
-						&appledisplay_bl_data);
+	pdata->bd = backlight_device_register(bl_name, NULL,
+		pdata, &appledisplay_bl_data);
 	if (IS_ERR(pdata->bd)) {
 		err("appledisplay: Backlight registration failed");
 		goto error;
Re: [PATCH] fix quadratic behavior of shrink_dcache_parent()
> "The file system mounted on /tmp/z in the example contains 2^50
> directories". heh.
>
> I do wonder how realistic this problem is in real life.

That's a fair concern, although I was trying this as part
of evaluating how much someone could hose a system
if we let them mount arbitrary FUSE servers. And the
answer is: they could make it completely unusable,
requiring reboot.

I ran a later test that printed how deep it got into
the file tree and it was only a few hundred thousand
if I recall correctly. A determined attacker might even
manage to do this in a normal file system.

But sure, it's not a common case. ;-)

Russ
Re: [PATCH 0 of 4] Generic AIO by scheduling stacks
On Sat, 10 Feb 2007, Eric Dumazet wrote:
>
> Well, I guess if the original program was mono-threaded, and syscall used
> fget_light(), we might have a problem here if the child try a close(). So you
> may have to disable fget_light() magic if async call is the originator of the
> syscall.

Yes. All the issues that I already brought up with Zach's patches are
still there. This doesn't really change any of them. Any optimization
that checks for "am I single-threaded" will need to be aware of pending
and running async things.

With my patch, any _running_ async things will always be seen as normal
clones, but the pending ones won't. So you'd need to effectively change
anything that looks like

	if (atomic_read(&current->mm->count) == 1)
		.. do some simplified version ..

into

	if (!current->async_cookie && atomic_read(..) == 1)
		.. do the simplified thing ..

to make it safe.

I think we only do it for fget_light and some VM TLB simplification, so
it shouldn't be a big burden to check.

Side note: the real issues still remain. The interfaces, and the
performance testing.

		Linus
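The check Linus describes can be sketched in userspace as follows. This is only a model under stated assumptions: `task_model` and its fields are hypothetical stand-ins for `current`, the mm user count and the proposed `async_cookie`, not kernel definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bits of "current" that the check reads. */
struct task_model {
	atomic_int *mm_count;	/* shared count of users of the address space */
	void *async_cookie;	/* non-NULL while an async call is pending */
};

/*
 * The simplified path is only safe when nobody else shares the address
 * space AND no pending async call could turn into a second thread.
 */
static bool may_use_single_threaded_path(const struct task_model *cur)
{
	assert(cur != NULL && cur->mm_count != NULL);
	return cur->async_cookie == NULL && atomic_load(cur->mm_count) == 1;
}
```

A count of 1 on its own is not enough once a system call may be continued by another context later, which is why the cookie test comes first.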
Re: NAK new drivers without proper power management?
On Saturday, 10 February 2007 00:28, Nigel Cunningham wrote:
> Hi.
>
> On Sat, 2007-02-10 at 00:12 +0100, Rafael J. Wysocki wrote:
> > > > I think if CONFIG_PM_DEBUG is set, the core should warn about drivers not
> > > > having .suspend or .resume routines.
> > >
> > > The only problem with that is, not everyone turns on CONFIG_PM_DEBUG.
> > > CONFIG_PM instead?
> >
> > Well, I can imagine a driver that doesn't need a .suspend routine, for example,
> > and I don't think we should make the kernel always complain about that.
>
> How about...
>
> #ifdef CONFIG_PM_PARANOIA
> static int empty_suspend_routine(struct device *dev, pm_message_t state)
> {
> 	return 0;
> }
> #define empty_suspend empty_suspend_routine
> #else
> #define empty_suspend NULL
> #endif
>
> ...
>
> 	.suspend = empty_suspend;
> ...
>
>
> Then CONFIG_PM_PARANOIA can be enabled by default for now, and when we
> eventually decide it's not needed anymore, someone can submit a patch
> replacing either turning off the CONFIG by default or removing the whole
> mechanism.

I think that would be tempting people to abuse it, for example by
defining or undefining things just to quieten the warning. In my opinion
the only way to make the warning go away should be to define a non-NULL
.suspend (.resume) routine and that's why I don't think the warning
should be mandatory.

> > I think if someone doesn't set CONFIG_PM_DEBUG, we can ask him to set it
> > and report back.
>
> We can, but the whole point to the suggestion was to make your life and
> mine easier, as well as those of our users.
>
> Making it dependent on CONFIG_PM instead achieves that by:
> - Saving you, I and distro people from having to tell their users to
>   enable the option (and how to)

I think the distro people can patch their kernels to fit their needs.

> - Saving the users the problem of going through all the steps, making
>   mistakes, potentially ending up with unbootable systems because they
>   make mistakes and so on.
>
> This way, they just need to look in dmesg.

Well, IMO, if someone doesn't know how to compile and install the kernel,
he'll be using a distro kernel anyway and then see above. Otherwise we
can safely ask him to turn on whatever debugging options we need.

Greetings,
Rafael

--
If you don't have the time to read, you don't have the time or the tools
to write.
		- Stephen King
Re: [PATCH 02/22] r/o bind mounts: add vfsmount writer counts
On Sat, 2007-02-10 at 00:41 +0100, Eric Dumazet wrote:
> Dave, please read again this comment in struct vfsmount definition.
>
> If I understand your infrastructure, mnt_writers is going to be frequently
> modified, so it should be placed at the end of struct vfsmount, in the same
> cache line than mnt_count.

That's an excellent point, thanks for catching it. Here's an updated
patch.

-- Dave

This patch actually adds the mount and superblock writer counts, and the
mnt_want/drop_write() functions that use them.

Before these can become useful, we must first cover each place in the
VFS where writes are performed with a want/drop pair. When that is
complete, we can actually introduce code that will safely check the
counts before allowing r/w<->r/o transitions to occur.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/namespace.c        |   53 +
 lxc-dave/fs/super.c            |   18 ++---
 lxc-dave/include/linux/fs.h    |    2 +
 lxc-dave/include/linux/mount.h |   28 +++--
 4 files changed, 94 insertions(+), 7 deletions(-)

diff -puN fs/namespace.c~03-24-add-vfsmount-writer-count fs/namespace.c
--- lxc/fs/namespace.c~03-24-add-vfsmount-writer-count	2007-02-09 16:04:40.0 -0800
+++ lxc-dave/fs/namespace.c	2007-02-09 16:04:40.0 -0800
@@ -58,6 +58,7 @@ struct vfsmount *alloc_vfsmnt(const char
 	if (mnt) {
 		mnt->mnt_user_ns = get_user_ns(current->nsproxy->user_ns);
 		atomic_set(&mnt->mnt_count, 1);
+		mnt->mnt_writers = 0;
 		INIT_LIST_HEAD(&mnt->mnt_hash);
 		INIT_LIST_HEAD(&mnt->mnt_child);
 		INIT_LIST_HEAD(&mnt->mnt_mounts);
@@ -78,6 +79,56 @@ struct vfsmount *alloc_vfsmnt(const char
 	return mnt;
 }
 
+int mnt_make_readonly(struct vfsmount *mnt)
+{
+	int ret = 0;
+
+	WARN_ON(__mnt_is_readonly(mnt));
+
+	/*
+	 * This flag set is actually redundant with what
+	 * happens in do_remount(), but since we do this
+	 * under the lock, anyone attempting to get a write
+	 * on it after this will fail.
+	 */
+	spin_lock(&mnt->mnt_sb->s_mnt_writers_lock);
+	if (!mnt->mnt_writers)
+		mnt->mnt_flags |= MNT_READONLY;
+	else
+		ret = -EBUSY;
+	spin_unlock(&mnt->mnt_sb->s_mnt_writers_lock);
+	return ret;
+}
+
+int mnt_want_write(struct vfsmount *mnt)
+{
+	int ret = 0;
+
+	spin_lock(&mnt->mnt_sb->s_mnt_writers_lock);
+	if (mnt->mnt_writers)
+		goto out;
+
+	if (__mnt_is_readonly(mnt)) {
+		ret = -EROFS;
+		goto out;
+	}
+	mnt->mnt_sb->s_writers++;
+	mnt->mnt_writers++;
+out:
+	spin_unlock(&mnt->mnt_sb->s_mnt_writers_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mnt_want_write);
+
+void mnt_drop_write(struct vfsmount *mnt)
+{
+	spin_lock(&mnt->mnt_sb->s_mnt_writers_lock);
+	mnt->mnt_sb->s_writers--;
+	mnt->mnt_writers--;
+	spin_unlock(&mnt->mnt_sb->s_mnt_writers_lock);
+}
+EXPORT_SYMBOL_GPL(mnt_drop_write);
+
 int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
@@ -1415,6 +1466,8 @@ long do_mount(char *dev_name, char *dir_
 		((char *)data_page)[PAGE_SIZE - 1] = 0;
 
 	/* Separate the per-mountpoint flags */
+	if (flags & MS_RDONLY)
+		mnt_flags |= MNT_READONLY;
 	if (flags & MS_NOSUID)
 		mnt_flags |= MNT_NOSUID;
 	if (flags & MS_NODEV)
diff -puN fs/super.c~03-24-add-vfsmount-writer-count fs/super.c
--- lxc/fs/super.c~03-24-add-vfsmount-writer-count	2007-02-09 16:04:40.0 -0800
+++ lxc-dave/fs/super.c	2007-02-09 16:04:40.0 -0800
@@ -93,6 +93,8 @@ static struct super_block *alloc_super(s
 		s->s_qcop = sb_quotactl_ops;
 		s->s_op = &default_op;
 		s->s_time_gran = 1000000000;
+		s->s_writers = 0;
+		spin_lock_init(&s->s_mnt_writers_lock);
 	}
 out:
 	return s;
@@ -576,6 +578,11 @@ static void mark_files_ro(struct super_b
 	file_list_unlock();
 }
 
+static int sb_remount_ro(struct super_block *sb)
+{
+	return fs_may_remount_ro(sb);
+}
+
 /**
  *	do_remount_sb - asks filesystem to change mount options.
  *	@sb:	superblock in question
@@ -587,7 +594,8 @@ static void mark_files_ro(struct super_b
  */
 int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 {
-	int retval;
+	int retval = 0;
+	int sb_started_ro = (sb->s_flags & MS_RDONLY);
 
 #ifdef CONFIG_BLOCK
 	if (!(flags & MS_RDONLY) && bdev_read_only(sb->s_bdev))
@@ -600,11 +608,13 @@ int do_remount_sb(struct super_block *sb
 
 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
-	if ((flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY)) {
+	if ((flags & MS_RDONLY) &&
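The want/drop idea can be modelled in userspace as below. This is a simplified sketch, not the patch's code: the `model_` names are hypothetical, a small C11 spinlock stands in for `s_mnt_writers_lock`, the superblock-wide counter is left out, and every successful want is counted.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of one mount. */
struct mount_model {
	atomic_flag lock;	/* stand-in for s_mnt_writers_lock */
	int writers;		/* write intents currently held */
	bool readonly;		/* stand-in for MNT_READONLY */
};

static void model_lock(struct mount_model *m)
{
	while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void model_unlock(struct mount_model *m)
{
	atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

/* Announce an upcoming write; refused once the mount is read-only. */
static int model_want_write(struct mount_model *m)
{
	int ret = 0;

	model_lock(m);
	if (m->readonly)
		ret = -EROFS;
	else
		m->writers++;
	model_unlock(m);
	return ret;
}

/* Pair with every successful model_want_write(). */
static void model_drop_write(struct mount_model *m)
{
	model_lock(m);
	assert(m->writers > 0);
	m->writers--;
	model_unlock(m);
}

/* Flip to read-only only when no writer holds an intent. */
static int model_make_readonly(struct mount_model *m)
{
	int ret = 0;

	model_lock(m);
	if (m->writers)
		ret = -EBUSY;
	else
		m->readonly = true;
	model_unlock(m);
	return ret;
}
```

Because the writer check and the flag change happen under the same lock, a writer either registers before the transition and blocks it, or arrives afterwards and gets -EROFS.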
Re: [PATCH 0 of 4] Generic AIO by scheduling stacks
Linus Torvalds wrote:
> Ok, here's another entry in this discussion.
>
> - IF the system call blocks, we call the architecture-specific
>   "schedule_async()" function before we even get any scheduler locks, and
>   it can just do a fork() at that time, and let the *child* return to the
>   original user space. The process that already started doing the system
>   call will just continue to do the system call.

Well, I guess if the original program was mono-threaded, and syscall used
fget_light(), we might have a problem here if the child try a close().

So you may have to disable fget_light() magic if async call is the
originator of the syscall.

Eric
Re: [PATCH 02/22] r/o bind mounts: add vfsmount writer counts
Dave Hansen wrote:
> @@ -56,6 +57,7 @@ struct vfsmount {
>  	struct vfsmount *mnt_master;	/* slave is on master->mnt_slave_list */
>  	struct mnt_namespace *mnt_ns;	/* containing namespace */
>  	struct user_namespace *mnt_user_ns; /* namespace for uid interpretation */
> +	int mnt_writers;	/* nr files open for write */
>  	/*
>  	 * We put mnt_count & mnt_expiry_mark at the end of struct vfsmount
>  	 * to let these frequently modified fields in a separate cache line
> @@ -72,7 +74,26 @@ static inline struct vfsmount *mntget(st
>  		atomic_inc(&mnt->mnt_count);
>  	return mnt;

Dave, please read again this comment in struct vfsmount definition.

If I understand your infrastructure, mnt_writers is going to be frequently
modified, so it should be placed at the end of struct vfsmount, in the same
cache line than mnt_count.

Thank you
Eric
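Eric's layout point can be illustrated with a small userspace sketch. The struct below is hypothetical and much smaller than the real `struct vfsmount`; the 64-byte line size and the explicit `_Alignas` are assumptions made to keep the example deterministic.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumed line size; real hardware varies */

/* Hypothetical, cut-down layout: read-mostly fields first, hot fields last. */
struct mount_layout_model {
	void *mnt_parent;	/* read-mostly after mount time */
	void *mnt_sb;
	void *mnt_ns;
	/* frequently written fields start on their own cache line */
	_Alignas(CACHE_LINE) int mnt_count;
	int mnt_expiry_mark;
	int mnt_writers;
};

/* Index of the cache line, relative to the struct start, holding an offset. */
static size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE;
}

static_assert(offsetof(struct mount_layout_model, mnt_count) % CACHE_LINE == 0,
	      "hot fields should begin on a cache line boundary");
```

Keeping the counter next to `mnt_count` means writers dirty one line that is already bouncing between CPUs, instead of also invalidating the line that holds the read-mostly pointers.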
Re: [PATCH] fix quadratic behavior of shrink_dcache_parent()
On Fri, 09 Feb 2007 23:01:06 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> From: Miklos Szeredi <[EMAIL PROTECTED]>
>
> The time shrink_dcache_parent() takes, grows quadratically with the
> depth of the tree under 'parent'. This starts to get noticeable at
> about 10,000.
>
> These kinds of depths don't occur normally, and filesystems which
> invoke shrink_dcache_parent() via d_invalidate() seem to have other
> depth dependent timings, so it's not even easy to expose this problem.
>
> However with FUSE it's easy to create a deep tree and d_invalidate()
> will also get called. This can make a syscall hang for a very long
> time.
>
> This is the original discovery of the problem by Russ Cox:
>
> http://article.gmane.org/gmane.comp.file-systems.fuse.devel/3826

"The file system mounted on /tmp/z in the example contains 2^50
directories". heh.

I do wonder how realistic this problem is in real life.

> The following patch fixes the quadratic behavior, by optionally
> allowing prune_dcache() to prune ancestors of a dentry in one go,
> instead of doing it one at a time.
>
> Common code in dput() and prune_one_dentry() is extracted into a new
> helper function d_kill().
>
> shrink_dcache_parent() as well as shrink_dcache_sb() are converted to
> use the ancestry-pruner option. Only for shrink_dcache_memory() is
> this behavior not desirable, so it keeps using the old algorithm.
>

I wonder if we should be setting shrink_parents=1 in
shrink_dcache_memory()?

Because we have this problem where the dentry slabs suffer lots of
internal fragmentation and we end up with whole slab pages pinned by a
single directory dentry. I expect that if shrink_dcache_memory() were
aggressive about reaping newly-childless directory dentries, some
improvements might be realised there.

If so, we should change prune_dcache() to return the number pruned, so
that shrink_dcache_memory() can keep its arithmetic correct.

Would require some careful testing and is out of scope for your work.
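A rough cost model of the behavior discussed in this thread, as a sketch under simplifying assumptions: the tree is a single chain of `depth` directories, one "step" is one dentry visited or pruned, and the function names are hypothetical. It is not the kernel's algorithm, only the arithmetic behind "quadratic".

```c
#include <assert.h>

/*
 * Old behavior in this model: every pass starts again at the parent,
 * walks down to the deepest remaining dentry and prunes only that one.
 */
static unsigned long long steps_one_leaf_per_pass(unsigned long long depth)
{
	unsigned long long steps = 0;
	unsigned long long remaining;

	for (remaining = depth; remaining > 0; remaining--)
		steps += remaining;	/* walk "remaining" levels, prune one */
	return steps;			/* depth * (depth + 1) / 2 */
}

/*
 * Patched behavior in this model: one walk down, then each ancestor is
 * pruned on the way back up as soon as it becomes childless.
 */
static unsigned long long steps_prune_ancestors(unsigned long long depth)
{
	return 2 * depth;	/* depth visits down plus depth prunes up */
}
```

At a depth of 10,000 the first function counts about fifty million steps against twenty thousand for the second, which matches the point where the slowdown was reported to become noticeable.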
Re: [PATCH 21/22] honor r/w changes at do_remount() time
On Fri, 2007-02-09 at 15:22 -0800, Andrew Morton wrote:
> On Fri, 09 Feb 2007 14:53:44 -0800
> Dave Hansen <[EMAIL PROTECTED]> wrote:
>
> > This is the core of the read-only bind mount patch set.
>
> Who wants read-only bind mounts, and for what reason?

The original desire came out of the linux-vserver project. It allows a
sysadmin to share directories between many vservers/containers and keep
those containers from writing to it, even though the users in that
vserver may have "root" privileges.

This also has the advantage of cleaning up the somewhat hackish "look
for writable-open-files during remount/ro operations". It should also
allow us to separate the concepts of the user wanting a filesystem to
be r/o and the filesystem _itself_ being r/o because of a r/o device or
some kind of corruption.

-- Dave
Re: [ck] Re: Swap prefetch merge plans
On Fri, 09 Feb 2007 18:35:51 -0500 Chuck Ebbert wrote:

> Andrew Morton wrote:
> > I have an email sitting in my drafts folder stating that I'll no longer
> > accept any features unless they've been publically reviewed in detail and
> > run-time tested by a third party. The idea being to force people to spend
> > more time reviewing and testing each other's stuff and less time writing
> > new stuff. Maybe on a sufficiently gloomy day I'll actually send it.
> >
> /me sneaks into Andrew's office and sends it out.

Thanks. 8)

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Re: [PATCH] Add PM_TRACE x86_64 support.
Hi!

> > > Nigel Cunningham <[EMAIL PROTECTED]> writes:
> > >
> > > > - for (tracedata = &__tracedata_start ; tracedata < &__tracedata_end ; tracedata += 6) {
> > > > + for (tracedata = &__tracedata_start ; tracedata < &__tracedata_end ; tracedata += 2 + sizeof(unsigned long)) {
> > >
> > > Could you split this line?
> >
> > Sure.
> >
> > -- New version -- (What's the right way to do this?)
> >
> > This patch add x86_64 support for PM_TRACE, and shifts per-arch code to
> > the appropriate subdirectories.
> >
> > Symbol exports are added so tracing can be used from drivers built as
> > modules too.
>
> Don't include exports in a patch that doesn't use them. Introduce the
> exports in a later patch series, for when you actually need it.

It is debugging infrastructure, so export actually makes sense... It
will not ever be used in mainline kernel; you need to modify code
manually to use this code..

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [GIT PATCH] ACPI patches for 2.6.21
Hi!

> Per your request, and the request of the distros, we've changed
> how ACPICA Core releases are integrated into Linux so that each
> upstream (CVS) check-in appears as a single git commit.
> While this process is not yet perfect, it should be vastly better
> than previous "code drops" in allowing git bisect to work,
> and allowing distros to cherry-pick individual fixes.
>
> The "bay" driver is new (and marked EXPERIMENTAL) -- adding initial
> hot-plug support for ACPI controlled drive bays such as the
> IBM ultrabay or the Dell Module Bay.

Could you describe userland interface it uses? /proc? Will it be usable
for bays on notebooks not using acpi?

> The "asus-laptop" driver is also new. Consistent with msi-laptop,
> it uses ACPI in platform-specific ways, but strives to avoid
> exposing ACPI-specific implementation details to the user.
> asus-laptop is mutually exclusive with asus_acpi, which it will
> replace over time.

Not including another /proc/acpi/ibm -like nightmare, is it?

> the old /proc/acpi/ interfaces with cleaner interfaces in sysfs --
> non-ACPI-specific generic ones whenever possible. This effort
> is not complete, but it has been in -mm for a long time and
> I believe that it is time to push it upstream to benefit
> from broader exposure and testing.

Does it still include completely broken alarm interface? Can't find it
in changelogs, so hopefully not.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [patch 1/1] PM: Adds remount fs ro at suspend
On Wed 2007-02-07 09:25:39, Henrique de Moraes Holschuh wrote:
> On Wed, 07 Feb 2007, Nigel Cunningham wrote:
> > Ok, as far as usage scenario goes, that's fair enough. But as to the
> > solution, I wonder though whether it's making life more complicated than
> > it needs to be. After all, we should also be able to cope okay with
> > having the power suddenly go out. If we can cope with that, cleaning
> > filesystems prior to suspending should be a non-issue.
>
> We don't cope okay with the power going out, at all. And as an user case, a
> need for fsck if you do something that is a reasonable use case (unplugging
> devices while suspended) is not okay, either.

It would be nice to umount devices over suspend, but I do not think
solution is as easy as patch that started this thread.

For now it is 'dont do that' and fsck is nice reminder that you done
something wrong.

							Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH] [NETDEV] [004] dmfe : Add suspend/resume support
Hi!

> From: Maxim Levitsky <[EMAIL PROTECTED]>
> Subject: [PATCH] [NETDEV] [004] dmfe : Add suspend/resume support
>
> Adds support for suspend/resume

Patch looks ok, but your mailer damaged it heavily.

> --- linux-2.6.20-mod/drivers/net/tulip/dmfe.c 2007-02-07 18:46:13.0
> +0200
> +++ linux-2.6.20-test/drivers/net/tulip/dmfe.c 2007-02-07 18:50:52.0
> +0200
> @@ -55,9 +55,6 @@
>
> TODO
>
> - Implement pci_driver::suspend() and pci_driver::resume()
> - power management methods.
> -
> Check on 64 bit boxes.
> Check and fix on big endian boxes.
>
> @@ -2027,11 +2024,59 @@ static struct pci_device_id dmfe_pci_tbl
> MODULE_DEVICE_TABLE(pci, dmfe_pci_tbl);
>
>
> +
> +static int dmfe_suspend(struct pci_dev *pci_dev, pm_message_t state)
> +{

							Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: DMA mapping API for non-system memory pools
Yes- this would be interesting to know wrt to doing things like
PCI<>PCI xfers (e.g., for things like the Micromemory NVRAM card).

On 2/9/07, Kumar Gala <[EMAIL PROTECTED]> wrote:
> We've been having a discussion on the linuxppc-dev list about how to
> handle IO memory that exists on some PPC SoC devices. These IO
> memories behave like system memory but are faster to the processor or
> device needed accessing for things like buffer descriptors.
Re: DMA mapping API for non-system memory pools
On Fri, 2007-02-09 at 17:33 -0600, Kumar Gala wrote:
> ideally all this would be handled via the dma mapping API, the
> question is how to convey to the API to use the IO memory vs the
> system memory? Should we look at adding a new GFP_IOMEM flag or do
> something based on struct device?
>
> Any ideas on direction (or if this is a solved problem elsewhere)
> would be appreciated.

Doesn't the dma_declare_coherent_memory() API work for this case? it
was designed for the ARM SoC (and the voyager weird SCSI card).

James
Re: [ck] Re: Swap prefetch merge plans
Andrew Morton wrote:
> I have an email sitting in my drafts folder stating that I'll no longer
> accept any features unless they've been publically reviewed in detail and
> run-time tested by a third party. The idea being to force people to spend
> more time reviewing and testing each other's stuff and less time writing
> new stuff. Maybe on a sufficiently gloomy day I'll actually send it.
>
/me sneaks into Andrew's office and sends it out.
[PATCH 11/10] lguest: use disable_acpi()
On Fri, 2007-02-09 at 12:49 -0500, Len Brown wrote:
> On Friday 09 February 2007 12:14, James Morris wrote:
> > This is being disabled in the guest kernel only. The host and guest
> > kernels are expected to be the same build.
>
> Okay, but better to use disable_acpi()
> indeed, since this would be the first code not already inside CONFIG_ACPI
> to invoke disable_acpi(), we could define the inline as empty and you could
> then scratch the #ifdef too.

Thanks Len! This applies on top of that series.
==
Len Brown <[EMAIL PROTECTED]> said:
> Okay, but better to use disable_acpi()
> indeed, since this would be the first code not already inside CONFIG_ACPI
> to invoke disable_acpi(), we could define the inline as empty and you could
> then scratch the #ifdef too.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 85363b87e20b arch/i386/lguest/lguest.c
--- a/arch/i386/lguest/lguest.c	Sat Feb 10 01:52:37 2007 +1100
+++ b/arch/i386/lguest/lguest.c	Sat Feb 10 10:28:36 2007 +1100
@@ -555,10 +555,7 @@ static __attribute_used__ __init void lg
 	mce_disabled = 1;
 #endif
 
-#ifdef CONFIG_ACPI
-	acpi_disabled = 1;
-	acpi_ht = 0;
-#endif
+	disable_acpi();
 	if (boot->initrd_size) {
 		/* We stash this at top of memory. */
 		INITRD_START = boot->max_pfn*PAGE_SIZE - boot->initrd_size;
diff -r 85363b87e20b include/asm-i386/acpi.h
--- a/include/asm-i386/acpi.h	Sat Feb 10 01:52:37 2007 +1100
+++ b/include/asm-i386/acpi.h	Sat Feb 10 10:43:43 2007 +1100
@@ -127,6 +127,7 @@ extern int acpi_irq_balance_set(char *st
 #define acpi_ioapic 0
 static inline void acpi_noirq_set(void) { }
 static inline void acpi_disable_pci(void) { }
+static inline void disable_acpi(void) { }
 
 #endif /* !CONFIG_ACPI */
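The empty-inline idea Len suggests can be sketched in userspace like this. All `_model` names are hypothetical stand-ins (they are not the kernel's symbols), and `CONFIG_ACPI_MODEL` plays the role of the config option.

```c
#include <assert.h>

/* Comment this out to model a build without ACPI support. */
#define CONFIG_ACPI_MODEL 1

#ifdef CONFIG_ACPI_MODEL
static int acpi_disabled_model;
static int acpi_ht_model = 1;

/* Real implementation: flips the state that the rest of the code reads. */
static inline void disable_acpi_model(void)
{
	acpi_disabled_model = 1;
	acpi_ht_model = 0;
}

static inline int acpi_is_disabled_model(void)
{
	return acpi_disabled_model;
}
#else
/* Empty stubs: callers compile unchanged and the calls cost nothing. */
static inline void disable_acpi_model(void) { }
static inline int acpi_is_disabled_model(void) { return 1; }
#endif

/* The call site needs no #ifdef in either configuration. */
static int guest_init_model(void)
{
	disable_acpi_model();
	assert(acpi_is_disabled_model());
	return acpi_is_disabled_model();
}
```

Pushing the conditional into the header keeps the configuration knowledge in one place, so callers such as the guest setup code read the same in both builds.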
Re: [PATCH] fix misannotation of linkinfo_dn
From: Al Viro <[EMAIL PROTECTED]>
Date: Fri, 09 Feb 2007 18:13:42 +

>
> Signed-off-by: Al Viro <[EMAIL PROTECTED]>
> ---
>  include/linux/dn.h |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)

Also applied, thanks Al.
Re: [PATCH] FRA_{DST,SRC} are le16 for decnet
From: Al Viro <[EMAIL PROTECTED]>
Date: Fri, 09 Feb 2007 18:13:37 +

>
> Signed-off-by: Al Viro <[EMAIL PROTECTED]>
> ---
>  net/decnet/dn_rules.c | 12 ++--
>  1 files changed, 6 insertions(+), 6 deletions(-)

Applied.
[stable patch 2.6.20 3/3] ieee1394: fix host device registering when nodemgr disabled
Date: Tue, 6 Feb 2007 02:34:45 +0100 (CET)
From: Stefan Richter <[EMAIL PROTECTED]>

Since my commit 8252bbb1363b7fe963a3eb6f8a36da619a6f5a65 in 2.6.20-rc1,
host devices have a dummy driver attached. Alas the driver was not
registered before use if ieee1394 was loaded with disable_nodemgr=1.

This resulted in non-functional FireWire drivers or kernel lockup.
http://bugzilla.kernel.org/show_bug.cgi?id=7942

Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
---
 drivers/ieee1394/nodemgr.c | 24
 1 file changed, 16 insertions(+), 8 deletions(-)

same as commit 91efa462054d44ae52b0c6c8325ed5e899f2cd17 in linux-2.6.20-git#

(Side note: The parameter disable_nodemgr=1 is merely an optional tuning
parameter for people who know what they are doing and who don't need
device discovery and bus management.)

Index: linux-2.6.20/drivers/ieee1394/nodemgr.c
===
--- linux-2.6.20.orig/drivers/ieee1394/nodemgr.c
+++ linux-2.6.20/drivers/ieee1394/nodemgr.c
@@ -274,7 +274,6 @@ static struct device_driver nodemgr_mid_
 struct device nodemgr_dev_template_host = {
 	.bus= _bus_type,
 	.release= nodemgr_release_host,
-	.driver = _mid_layer_driver,
 };
 
 
@@ -1889,22 +1888,31 @@ int init_ieee1394_nodemgr(void)
 
 	error = class_register(_ne_class);
 	if (error)
-		return error;
-
+		goto fail_ne;
 	error = class_register(_ud_class);
-	if (error) {
-		class_unregister(_ne_class);
-		return error;
-	}
+	if (error)
+		goto fail_ud;
 	error = driver_register(_mid_layer_driver);
+	if (error)
+		goto fail_ml;
+	/* This driver is not used if nodemgr is off (disable_nodemgr=1). */
+	nodemgr_dev_template_host.driver = _mid_layer_driver;
+
 	hpsb_register_highlevel(_highlevel);
 	return 0;
+
+fail_ml:
+	class_unregister(_ud_class);
+fail_ud:
+	class_unregister(_ne_class);
+fail_ne:
+	return error;
 }
 
 void cleanup_ieee1394_nodemgr(void)
 {
 	hpsb_unregister_highlevel(_highlevel);
-
+	driver_unregister(_mid_layer_driver);
 	class_unregister(_ud_class);
 	class_unregister(_ne_class);
 }

-- 
Stefan Richter
-=-=-=== --=- -=-=-
http://arcgraph.de/sr/
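The error path in the patch above registers three things in order and, on failure, unwinds only what was already registered by jumping to a label. As a rough userspace sketch of that pattern (not the nodemgr code itself; demo_register(), demo_unregister(), demo_init() and the fail_at parameter are made-up names used only for illustration):

```c
#include <assert.h>

/* Hypothetical stand-ins for class_register()/driver_register(). */
static int demo_registered[3];

/* Pretend to register resource idx; fail if idx == fail_at. */
static int demo_register(int idx, int fail_at)
{
	if (idx == fail_at)
		return -1;
	demo_registered[idx] = 1;
	return 0;
}

static void demo_unregister(int idx)
{
	demo_registered[idx] = 0;
}

/*
 * Register three resources in order. Each failure jumps to the label
 * that undoes everything registered so far, in reverse order.
 */
static int demo_init(int fail_at)
{
	int error;

	error = demo_register(0, fail_at);
	if (error)
		goto fail_0;
	error = demo_register(1, fail_at);
	if (error)
		goto fail_1;
	error = demo_register(2, fail_at);
	if (error)
		goto fail_2;
	return 0;

fail_2:
	demo_unregister(1);
fail_1:
	demo_unregister(0);
fail_0:
	return error;
}
```

The labels are named after the step that failed, so each one only has to undo the steps before it, and adding a fourth step means adding one label rather than touching every earlier error branch.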
[stable patch 2.6.20 2/3] ieee1394: video1394: DMA fix
Date: Sat, 03 Feb 2007 03:09:09 -0500
From: David Moore <[EMAIL PROTECTED]>

This together with the phys_to_virt fix in lib/swiotlb.c::swiotlb_sync_sg
fixes video1394 DMA on machines with DMA bounce buffers, especially Intel
x86-64 machines with > 3GB RAM.

Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
Signed-off-by: David Moore <[EMAIL PROTECTED]>
Tested-by: Nicolas Turro <[EMAIL PROTECTED]>
---
 drivers/ieee1394/video1394.c |8
 1 file changed, 8 insertions(+)

same as commit a5782010b4e75cba571357efaa27df22a89427c2 in linux-2.6.20-git#

Index: linux-2.6.20/drivers/ieee1394/video1394.c
===
--- linux-2.6.20.orig/drivers/ieee1394/video1394.c
+++ linux-2.6.20/drivers/ieee1394/video1394.c
@@ -489,6 +489,9 @@ static void wakeup_dma_ir_ctx(unsigned l
 			reset_ir_status(d, i);
 			d->buffer_status[d->buffer_prg_assignment[i]] = VIDEO1394_BUFFER_READY;
 			do_gettimeofday(>buffer_time[d->buffer_prg_assignment[i]]);
+			dma_region_sync_for_cpu(>dma,
+				d->buffer_prg_assignment[i] * d->buf_size,
+				d->buf_size);
 		}
 	}
 
@@ -1096,6 +1099,8 @@ static long video1394_ioctl(struct file
 			DBGMSG(ohci->host->id, "Starting iso transmit DMA ctx=%d",
 			       d->ctx);
 			put_timestamp(ohci, d, d->last_buffer);
+			dma_region_sync_for_device(>dma,
+				v.buffer * d->buf_size, d->buf_size);
 
 			/* Tell the controller where the first program is */
 			reg_write(ohci, d->cmdPtr,
@@ -,6 +1116,9 @@ static long video1394_ioctl(struct file
 					"Waking up iso transmit dma ctx=%d",
 					d->ctx);
 				put_timestamp(ohci, d, d->last_buffer);
+				dma_region_sync_for_device(>dma,
+					v.buffer * d->buf_size, d->buf_size);
+
 				reg_write(ohci, d->ctrlSet, 0x1000);
 			}
 		}

-- 
Stefan Richter
-=-=-=== --=- -=-=-
http://arcgraph.de/sr/
Re: [PATCH 4 of 7] lguest: Config and headers
On Fri, 2007-02-09 at 13:15 -0500, James Morris wrote:
> On Sat, 10 Feb 2007, Rusty Russell wrote:
>
> > +/* 64k ought to be enough for anybody! */
> > +#define HYPERVISOR_MAP_ORDER 16
> > +#define HYPERVISOR_PAGES ((1 << HYPERVISOR_MAP_ORDER)/PAGE_SIZE)
>
> I think it'd be better to go back to defining HYPERVISOR_SIZE then derive
> the map order from that via get_order(), as it should be 4 instead of 16;
> and this code is now both implying PAGE_SIZE while also using it for
> calculations.

Well it was the use of get_order() which triggered Andi's alarm bells, so
I went back to deriving it. This code is correct, however.

get_order() is one of those classic functions only a kernel coder could
love. Look how lovingly it has been optimized:

#define get_order(n)						\
(								\
	__builtin_constant_p(n) ?				\
	((n < (1UL << PAGE_SHIFT)) ? 0 : ilog2(n) - PAGE_SHIFT) :	\
	__get_order(n, PAGE_SHIFT)				\
)

All that time spent, yet no consideration that it should be called
"get_page_order()" or some name which hints that the divide by page size
is happening. It's even documented in the comment above, so someone
thought it needed explaining. Too bad they chose to explain it instead of
actually clarifying it. 8(

Cheers,
Rusty.
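To make the "16 versus 4" point concrete: 16 is a power-of-two exponent in bytes, while get_order() answers in pages. A small userspace sketch, assuming 4 KiB pages (the demo_* names and DEMO_PAGE_SHIFT are invented for this example and this is not the kernel's implementation):

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages, as on i386 */
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)

/* Smallest order such that (DEMO_PAGE_SIZE << order) covers size bytes. */
static unsigned int demo_page_order(unsigned long size)
{
	unsigned int order = 0;

	while ((DEMO_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Number of pages in a region of 2^map_order bytes. */
static unsigned long demo_pages(unsigned int map_order)
{
	return (1UL << map_order) / DEMO_PAGE_SIZE;
}
```

Under that assumption a 64k region (byte order 16) is 16 pages, and its page order is 4, which is how both numbers in the exchange above can be right at the same time.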
[stable patch 2.6.20 1/3] Missing critical phys_to_virt in lib/swiotlb.c
Date: Sun, 04 Feb 2007 13:39:40 -0500
From: David Moore <[EMAIL PROTECTED]>

Adds missing call to phys_to_virt() in the lib/swiotlb.c:swiotlb_sync_sg()
function. Without this change, a kernel panic will always occur whenever
a SWIOTLB bounce buffer from a scatter-gather list gets synced.

Signed-off-by: David Moore <[EMAIL PROTECTED]>
Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
---

This is a fraction of patch "[IA64] swiotlb bug fixes" in 2.6.20-git#,
commit cde14bbfb3aa79b479db35bd29e6c083513d8614. Unlike its heading
suggests, it is also important for EM64T.

Example crashes caused by swiotlb_sync_sg:
http://lists.opensuse.org/opensuse-bugs/2006-12/msg02943.html
http://qa.mandriva.com/show_bug.cgi?id=28224
http://www.pchdtv.com/forum/viewtopic.php?t=2063=a959a14a4c2db0eebaab7b0df56103ce

--- linux-2.6.20.orig/lib/swiotlb.c	2007-02-04 13:18:41.0 -0500
+++ linux-2.6.20/lib/swiotlb.c	2007-02-04 13:19:43.0 -0500
@@ -750,7 +750,7 @@ swiotlb_sync_sg(struct device *hwdev, st
 
 	for (i = 0; i < nelems; i++, sg++)
 		if (sg->dma_address != SG_ENT_PHYS_ADDRESS(sg))
-			sync_single(hwdev, (void *) sg->dma_address,
+			sync_single(hwdev, phys_to_virt(sg->dma_address),
 				    sg->dma_length, dir, target);
 }

-- 
Stefan Richter
-=-=-=== --=- -=-=-
http://arcgraph.de/sr/
Re: [PATCH 0 of 4] Generic AIO by scheduling stacks
On Fri, 9 Feb 2007, Davide Libenzi wrote:
>
> That's another way to do it. But you end up creating/destroying a new
> thread for every request. May be performing just fine.

Well, I actually wanted to add a special CLONE_ASYNC flag, because I
think we could do it better if we know it's a particularly limited special
case. But that's really just a "small implementation detail", and I don't
know how big a deal it is. I didn't want to obscure the basic idea with
anything bigger.

I agree that the create/destroy is a big overhead, but at least it's now
only done when we actually end up doing some IO (and _after_ we've started
the IO, of course - that's when we block), so compared to doing it up
front, I'm hoping that it's not actually that horrid.

The "fork-like" approach also means that it's very flexible. It's not
really even limited to doing simple system calls any more: you *could*,
for example, decide that since you already have the thread, and now that
it's asynchronous, you'd actually return to user space (to let user space
"complete" whatever asynchronous action it wanted to complete).

> Another, even simpler way IMO, is to just have a plain per-task kthread
> pool, and a queue.

Yes, that is actually quite doable with basically the same interface. It's
literally a "small decision" inside of "schedule_async()" on how it
actually would want to handle the case of "hey, we now have concurrent
work to be done".

But I actually don't think a per-task kthread pool is necessarily a good
idea. If a thread pool works for this, then it should have worked for
regular thread create/destroy loads too - ie there really is little reason
to special-case the "async system call" case.

NOTE! I'm also not at all sure that we actually want to waste real threads
on this. My patch is in no way meant to be an "exclusive alternative" to
fibrils. Quite the reverse, actually: I _like_ those synchronous fibrils,
but I didn't like how Zach did the overhead of creating them up-front,
because I really would like the cached case to be totally *synchronous*.

So I wrote my patch with a "schedule_async()" implementation that just
creates a full-sized thread, but I actually wanted very much to try to
make it use fibrils that are allocated on-demand too. I was just too lazy.

So the patch is really meant as a "ok, this is how easy it is to make the
thread allocation be 'on-demand' instead of 'up-front'". The actual
_policy_ on how thread allocation is done isn't even interesting to me, to
some degree. I think Zach's fibrils would work fine, a thread pool would
work fine, and just the silly outright "new thread for everything" that
the example patch actually used may also possibly work well enough.

It's one reason I liked my patch. It was not only small and simple, it
really is very flexible, I think. It's also totally independent of how you
actually end up _executing_ the async requests.

(In fact, you could easily make it a config option whether you support any
asynchronous behaviour AT ALL. The "async()" system call might still be
there, but it would just return "0" all the time, and do the actual work
synchronously).

		Linus
DMA mapping API for non-system memory pools
We've been having a discussion on the linuxppc-dev list about how to
handle IO memory that exists on some PPC SoC devices. These IO memories
behave like system memory but are faster for the processor or device that
needs to access them, for things like buffer descriptors.

Here's an example in which allocation is done either via system memory or
a specialized allocator for MURAM, from drivers/net/ucc_geth.c:

(Yes, the system memory case should be moved to use the dma mapping api)

	if (uf_info->bd_mem_part == MEM_PART_SYSTEM) {
		u32 align = 4;
		if (UCC_GETH_TX_BD_RING_ALIGNMENT > 4)
			align = UCC_GETH_TX_BD_RING_ALIGNMENT;
		ugeth->tx_bd_ring_offset[j] =
			kmalloc((u32) (length + align), GFP_KERNEL);
		if (ugeth->tx_bd_ring_offset[j] != 0)
			ugeth->p_tx_bd_ring[j] =
				(void*)((ugeth->tx_bd_ring_offset[j] +
				align) & ~(align - 1));
	} else if (uf_info->bd_mem_part == MEM_PART_MURAM) {
		ugeth->tx_bd_ring_offset[j] =
			qe_muram_alloc(length,
				UCC_GETH_TX_BD_RING_ALIGNMENT);
		if (!IS_MURAM_ERR(ugeth->tx_bd_ring_offset[j]))
			ugeth->p_tx_bd_ring[j] =
				(u8 *) qe_muram_addr(ugeth->
					tx_bd_ring_offset[j]);
	}

Ideally all this would be handled via the dma mapping API. The question is
how to convey to the API to use the IO memory vs the system memory.

Should we look at adding a new GFP_IOMEM flag or do something based on
struct device?

Any ideas on direction (or if this is a solved problem elsewhere) would be
appreciated.

Thanks

- kumar
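As an aside on the system-memory branch of that snippet: it over-allocates by "align" bytes and then rounds the address with (addr + align) & ~(align - 1). A small userspace sketch of that arithmetic next to the more common round-up form, assuming align is a power of two (demo_align_next() and demo_align_up() are names invented here, not ucc_geth functions):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same expression as in the snippet: always moves forward to the next
 * aligned address, even when addr is already aligned. The result is at
 * most addr + align, which is why the buffer is allocated with
 * length + align bytes.
 */
static uintptr_t demo_align_next(uintptr_t addr, uintptr_t align)
{
	return (addr + align) & ~(align - 1);
}

/* Conventional round-up: leaves an already aligned addr unchanged. */
static uintptr_t demo_align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}
```

Either form keeps the aligned pointer plus "length" bytes inside the over-sized allocation; the first simply gives up one full alignment unit when the allocator already returned an aligned block.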
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
> Yes. In fact, the current existing code does not handle dr6 correctly.
> It never clears the register, which means you're likely to get into
> trouble when multiple breakpoints (or watchpoints) are enabled.

This is a subtle change from the existing ABI, in which userland has to
clear %dr6 via ptrace itself. But gdb never does that AFAICT. So it's in
fact subject to confusion when two watchpoints are set and the second hits
after the first. So gdb ought to be fixed to clear dr6 via ptrace, to work
with existing and older kernels.

I don't think I really object to the ABI change of clearing %dr6 after an
exception so that it does not accumulate multiple results. But first I'll
have to convince myself that we never actually do want to accumulate
multiple results. Hmm, I think we can, so maybe I do object. If you set
two watchpoints inside a user buffer and then do a system call that
touches both those addresses (e.g. read), then you will go through
do_debug (to send_sigtrap) twice before returning to user mode. When the
syscall is done, you'll have a pending SIGTRAP for the debugger to handle.
By looking at your %dr6 the debugger can see that both watchpoints hit.
(gdb does not handle this case, but it should.) Am I wrong?

So this gets to the more complicated view of %dr6 handling that I had
first had in mind yesterday. Each allocation "owns" one of the low 4 bits
in %dr6 too. Only the dr6 bits owned by the userland "raw" allocation
(i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6].
So when kwatch swallows a debug exception, it should mask off its bit from
%dr6 in the CPU, but not clear %dr6 completely. That way you can have a
sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one
system call (including interrupt handlers), and when it gets to the
userland debugger examining dr6 it sees the low 2 bits both set.

> It's really quite a tricky matter. Should a register be allocated to
> kwatch only when no user process needs it? Should we really go about
> checking the requirements of every single process whenever a kwatch
> allocation request comes in? What if the processes which need a
> particular register aren't running -- should the register then be given to
> kwatch? What if one of those processes then does start running on one
> CPU?

To "go about checking the requirements of every single process" is not so
hard as it sounds when they're recorded as a single global use count per
slot, as your original code does. When you mentioned a "your allocation is
available" callback, I was thinking it might come to that being called
inside context switch. It's all rather tricky, indeed.

The obvious answer is to start simple. If any user process anywhere uses
drN, kwatch has to give it up for all CPUs (watchpoints with less than
"break ptrace" priority do). If anyone really cares about more flexibility
than that, we can change or extend it. Some copious comments in the
interface descriptions can lead them in the right direction if the
situation comes up. Probably with systemtap support in a while, we'll get
a lot more concrete uses of watchpoints and people finding out what really
matters to them.

Thanks,
Roland
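The per-bit ownership idea above can be sketched in plain C. This is only an illustration of the masking, not kwatch or ptrace code: the demo_* names are invented, the virtual dr6 is just an integer, and "kernel_owned" is a hypothetical bitmask saying which of dr0..dr3 belong to kernel watchpoints.

```c
#include <assert.h>
#include <stdint.h>

/* Low four bits of dr6 (B0..B3): one hit flag per debug register dr0..dr3. */
#define DEMO_DR6_TRAP_MASK 0xfu

struct demo_dr6_split {
	uint32_t user_bits;	/* hits to pass on to the userland debugger */
	uint32_t kernel_bits;	/* hits swallowed by kernel watchpoints */
};

/* Split the hit bits of a raw dr6 value by owner. */
static struct demo_dr6_split demo_split_dr6(uint32_t dr6, uint32_t kernel_owned)
{
	struct demo_dr6_split s;
	uint32_t hits = dr6 & DEMO_DR6_TRAP_MASK;

	s.kernel_bits = hits & kernel_owned;
	s.user_bits = hits & ~kernel_owned;
	return s;
}

/* Accumulate only the user-owned hits into the per-thread virtual dr6. */
static uint32_t demo_accumulate(uint32_t virt_dr6, uint32_t dr6,
				uint32_t kernel_owned)
{
	return virt_dr6 | demo_split_dr6(dr6, kernel_owned).user_bits;
}
```

Feeding it the sequence from the message (user dr0 hit, kernel-owned dr3 hit, user dr1 hit, with kernel_owned set to 0x8) leaves the accumulated value with the low two bits set and nothing from dr3.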
Re: NAK new drivers without proper power management?
Hi.

On Sat, 2007-02-10 at 00:12 +0100, Rafael J. Wysocki wrote:
> > > I think if CONFIG_PM_DEBUG is set, the core should warn about drivers not
> > > having .suspend or .resume routines.
> >
> > The only problem with that is, not everyone turns on CONFIG_PM_DEBUG.
> > CONFIG_PM instead?
>
> Well, I can imagine a driver that doesn't need a .suspend routine, for
> example, and I don't think we should make the kernel always complain
> about that.

How about...

#ifdef CONFIG_PM_PARANOIA
static int empty_suspend_routine(struct device *dev, pm_message_t state)
{
	return 0;
}
#define empty_suspend empty_suspend_routine
#else
#define empty_suspend NULL
#endif

...
	.suspend = empty_suspend,
...

Then CONFIG_PM_PARANOIA can be enabled by default for now, and when we
eventually decide it's not needed anymore, someone can submit a patch
either turning off the CONFIG by default or removing the whole mechanism.

> I think if someone doesn't set CONFIG_PM_DEBUG, we can ask him to set it
> and report back.

We can, but the whole point of the suggestion was to make your life and
mine easier, as well as those of our users. Making it dependent on
CONFIG_PM instead achieves that by:

- Saving you, me and distro people from having to tell users to enable
  the option (and how to)
- Saving the users the problem of going through all the steps, making
  mistakes, and potentially ending up with unbootable systems as a result.

This way, they just need to look in dmesg.

Regards,

Nigel
Re: [PATCH 21/22] honor r/w changes at do_remount() time
On Fri, 09 Feb 2007 14:53:44 -0800 Dave Hansen <[EMAIL PROTECTED]> wrote:

> This is the core of the read-only bind mount patch set.

Who wants read-only bind mounts, and for what reason?
Re: [PATCH 01/22] filesystem helpers for custom 'struct file's
On Fri, 09 Feb 2007 14:53:29 -0800 Dave Hansen <[EMAIL PROTECTED]> wrote:

> +/*
> + * Note: This is a crappy interface. It is here to make
> + * merging with the existing users of get_empty_filp()
> + * who have complex failure logic easier. All users
> + * of this should be moving to alloc_file().
> + */
> +int init_file(struct file *file, struct vfsmount *mnt,
> +	struct dentry *dentry, mode_t mode,
> +	const struct file_operations *fop)

crappy name too ;)

At least two filesystems have defined their own static-scope init_file()
and so they'll explode if they somehow manage to include file.h.

I guess we can cross that bridge when we fall off it, but sometime it
might be prudent to do s/init_file/configfs_init_file/ and ditto
sysfs_init_file.
Re: NAK new drivers without proper power management?
Hi,

On Friday, 9 February 2007 23:51, Nigel Cunningham wrote:
> Hi.
>
> On Fri, 2007-02-09 at 23:44 +0100, Rafael J. Wysocki wrote:
> > On Friday, 9 February 2007 23:26, Nigel Cunningham wrote:
> > > Hi.
> > >
> > > On Fri, 2007-02-09 at 23:17 +0100, Arjan van de Ven wrote:
> > > > On Sat, 2007-02-10 at 08:57 +1100, Nigel Cunningham wrote:
> > > > > Hi.
> > > > >
> > > > > I don't think this is already done (feel free to correct me if I'm
> > > > > wrong)..
> > > > >
> > > > > Can we start to NAK new drivers that don't have proper power management
> > > > > implemented? There really is no excuse for writing a new driver and not
> > > > > putting .suspend and .resume methods in anymore, is there?
> > > >
> > > >
> > > > to a large degree, a device driver that doesn't suspend is better than
> > > > no device driver at all, right?
> > >
> > > I'm not sure it is. It only makes more work for everyone else: We have
> > > to help people figure out what causes their computer to fail to resume
> > > (which can take quite a while), then get them to complain to the driver
> > > author, and the driver author has to submit patches to fix it.
> > >
> > > All of this is avoided if they'll just do it right in the first place.
> > >
> > > > now.. if you want to make the core warn about it, that's very fair
> > >
> > > That's probably a good idea too, since I'm only suggesting this for new
> > > drivers.
> >
> > I think if CONFIG_PM_DEBUG is set, the core should warn about drivers not
> > having .suspend or .resume routines.
>
> The only problem with that is, not everyone turns on CONFIG_PM_DEBUG.
> CONFIG_PM instead?

Well, I can imagine a driver that doesn't need a .suspend routine, for
example, and I don't think we should make the kernel always complain about
that.

I think if someone doesn't set CONFIG_PM_DEBUG, we can ask him to set it
and report back.

Greetings,
Rafael
Re: [PATCH 0 of 4] Generic AIO by scheduling stacks
On Fri, 9 Feb 2007, Linus Torvalds wrote:

>
> Ok, here's another entry in this discussion.

That's another way to do it. But you end up creating/destroying a new
thread for every request. May be performing just fine.

Another, even simpler way IMO, is to just have a plain per-task kthread
pool, and a queue. An async_submit() drops a request in the queue, and
wakes the requests queue-head where the kthreads are sleeping. One kthread
picks up the request, services it, drops a result in the result queue, and
wakes the results queue-head (where async_fetch() callers are sleeping).
Cancellation is not a problem here (by means of sending a signal to the
service kthread). Also, no problem with arch-dependent code. This is a
1:1 match of what my userspace implementation does.

Of course, no hot-path optimizations are performed here, and you need a
few more context switches than necessary.

Let's have Zach (Ingo's support for Zach would be great) play with the
optimized version, and then we can maybe bench the three to see if the
more complex code that the optimized version requires gets a pay-back from
the performance side.

/me thinks it likely will

- Davide
Re: [ANNOUNCE] d80211 based driver for Intel PRO/Wireless 3945ABG
Hello.

On Sat, 2007-02-10 at 09:26, Neil Brown wrote:
> On Friday February 9, [EMAIL PROTECTED] wrote:
> >
> > Ok. Now... any questions?
> >
>
> Yes. Does this require a closed user-space helper like the other
> 3945ABG driver, or is it completely open (maybe excepting firmware)?

Quote from the mentioned website:

"In addition to using the new d80211 subsystem, this project uses a new
microcode image which removes the need for the user space regulatory
daemon for this adapter"

regards
Stefan Schmidt
[PATCH] saa7134: cleanup
A cleanup patch against 2.6.20 for saa7134 video4linux driver:

- use generic sort instead of bubblesort
- removed useless saa7134_video_fini function
- small coding style changes

Signed-off-by: Heikki Orsila <[EMAIL PROTECTED]>

-- 
Heikki Orsila			Barbie's law:
[EMAIL PROTECTED]		"Math is hard, let's go shopping!"
http://www.iki.fi/shd

diff -urp linux-2.6.20-org/drivers/media/video/saa7134/saa7134-core.c linux-2.6.20/drivers/media/video/saa7134/saa7134-core.c
--- linux-2.6.20-org/drivers/media/video/saa7134/saa7134-core.c	2007-02-04 20:44:54.0 +0200
+++ linux-2.6.20/drivers/media/video/saa7134/saa7134-core.c	2007-02-10 00:51:01.0 +0200
@@ -703,7 +703,6 @@ static int saa7134_hwfini(struct saa7134
 	saa7134_ts_fini(dev);
 	saa7134_input_fini(dev);
 	saa7134_vbi_fini(dev);
-	saa7134_video_fini(dev);
 	saa7134_tvaudio_fini(dev);
 	return 0;
 }
diff -urp linux-2.6.20-org/drivers/media/video/saa7134/saa7134-video.c linux-2.6.20/drivers/media/video/saa7134/saa7134-video.c
--- linux-2.6.20-org/drivers/media/video/saa7134/saa7134-video.c	2007-02-04 20:44:54.0 +0200
+++ linux-2.6.20/drivers/media/video/saa7134/saa7134-video.c	2007-02-10 00:51:01.0 +0200
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 
 #include "saa7134-reg.h"
 #include "saa7134.h"
@@ -516,14 +517,12 @@ static int res_get(struct saa7134_dev *d
 	return 1;
 }
 
-static
-int res_check(struct saa7134_fh *fh, unsigned int bit)
+static int res_check(struct saa7134_fh *fh, unsigned int bit)
 {
 	return (fh->resources & bit);
 }
 
-static
-int res_locked(struct saa7134_dev *dev, unsigned int bit)
+static int res_locked(struct saa7134_dev *dev, unsigned int bit)
 {
 	return (dev->resources & bit);
 }
@@ -732,25 +731,6 @@ struct cliplist {
 	__u8 disable;
 };
 
-static void sort_cliplist(struct cliplist *cl, int entries)
-{
-	struct cliplist swap;
-	int i,j,n;
-
-	for (i = entries-2; i >= 0; i--) {
-		for (n = 0, j = 0; j <= i; j++) {
-			if (cl[j].position > cl[j+1].position) {
-				swap = cl[j];
-				cl[j] = cl[j+1];
-				cl[j+1] = swap;
-				n++;
-			}
-		}
-		if (0 == n)
-			break;
-	}
-}
-
 static void set_cliplist(struct saa7134_dev *dev, int reg,
 			struct cliplist *cl, int entries, char *name)
 {
@@ -784,15 +764,27 @@ static int clip_range(int val)
 	return val;
 }
 
+/* Sort into smallest position first order */
+static int cliplist_cmp(const void *a, const void *b)
+{
+	const struct cliplist *cla = a;
+	const struct cliplist *clb = b;
+	if (cla->position < clb->position)
+		return -1;
+	if (cla->position > clb->position)
+		return 1;
+	return 0;
+}
+
 static int setup_clipping(struct saa7134_dev *dev, struct v4l2_clip *clips,
 			  int nclips, int interlace)
 {
 	struct cliplist col[16], row[16];
-	int cols, rows, i;
+	int cols = 0, rows = 0, i;
 	int div = interlace ? 2 : 1;
 
-	memset(col,0,sizeof(col)); cols = 0;
-	memset(row,0,sizeof(row)); rows = 0;
+	memset(col, 0, sizeof(col));
+	memset(row, 0, sizeof(row));
 	for (i = 0; i < nclips && i < 8; i++) {
 		col[cols].position = clip_range(clips[i].c.left);
 		col[cols].enable = (1 << i);
@@ -808,8 +800,8 @@ static int setup_clipping(struct saa7134
 		row[rows].disable = (1 << i);
 		rows++;
 	}
-	sort_cliplist(col,cols);
-	sort_cliplist(row,rows);
+	sort(col, cols, sizeof col[0], cliplist_cmp, NULL);
+	sort(row, rows, sizeof row[0], cliplist_cmp, NULL);
 	set_cliplist(dev,0x380,col,cols,"cols");
 	set_cliplist(dev,0x384,row,rows,"rows");
 	return 0;
@@ -1261,19 +1253,14 @@ static struct videobuf_queue* saa7134_qu
 
 static int saa7134_resource(struct saa7134_fh *fh)
 {
-	int res = 0;
+	if (fh->type == V4L2_BUF_TYPE_VIDEO_CAPTURE)
+		return RESOURCE_VIDEO;
 
-	switch (fh->type) {
-	case V4L2_BUF_TYPE_VIDEO_CAPTURE:
-		res = RESOURCE_VIDEO;
-		break;
-	case V4L2_BUF_TYPE_VBI_CAPTURE:
-		res = RESOURCE_VBI;
-		break;
-	default:
-		BUG();
-	}
-	return res;
+	if (fh->type == V4L2_BUF_TYPE_VBI_CAPTURE)
+		return RESOURCE_VBI;
+
+	BUG();
+	return 0;
 }
 
 static int video_open(struct inode *inode, struct file *file)
@@ -1461,8 +1448,7 @@ static int video_release(struct inode *i
 	return 0;
 }
 
-static int
-video_mmap(struct file *file, struct vm_area_struct * vma)
+static int
[PATCH 06/22] elevate write count during entire ncp_ioctl()
Some ioctls need write access, but others don't. Make a helper function
to decide when write access is needed, and take it.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/ncpfs/ioctl.c | 55 +-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff -puN fs/ncpfs/ioctl.c~08-24-elevate-write-count-during-entire-ncp-ioctl fs/ncpfs/ioctl.c
--- lxc/fs/ncpfs/ioctl.c~08-24-elevate-write-count-during-entire-ncp-ioctl	2007-02-09 14:26:50.0 -0800
+++ lxc-dave/fs/ncpfs/ioctl.c	2007-02-09 14:26:50.0 -0800
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -260,7 +261,7 @@ ncp_get_charsets(struct ncp_server* serv
 }
 #endif /* CONFIG_NCPFS_NLS */
 
-int ncp_ioctl(struct inode *inode, struct file *filp,
+static int __ncp_ioctl(struct inode *inode, struct file *filp,
 	      unsigned int cmd, unsigned long arg)
 {
 	struct ncp_server *server = NCP_SERVER(inode);
@@ -821,6 +822,58 @@ outrel:
 	return -EINVAL;
 }
 
+static int ncp_ioctl_need_write(unsigned int cmd)
+{
+	switch (cmd) {
+	case NCP_IOC_GET_FS_INFO:
+	case NCP_IOC_GET_FS_INFO_V2:
+	case NCP_IOC_NCPREQUEST:
+	case NCP_IOC_SETDENTRYTTL:
+	case NCP_IOC_SIGN_INIT:
+	case NCP_IOC_LOCKUNLOCK:
+	case NCP_IOC_SET_SIGN_WANTED:
+		return 1;
+	case NCP_IOC_GETOBJECTNAME:
+	case NCP_IOC_SETOBJECTNAME:
+	case NCP_IOC_GETPRIVATEDATA:
+	case NCP_IOC_SETPRIVATEDATA:
+	case NCP_IOC_SETCHARSETS:
+	case NCP_IOC_GETCHARSETS:
+	case NCP_IOC_CONN_LOGGED_IN:
+	case NCP_IOC_GETDENTRYTTL:
+	case NCP_IOC_GETMOUNTUID2:
+	case NCP_IOC_SIGN_WANTED:
+	case NCP_IOC_GETROOT:
+	case NCP_IOC_SETROOT:
+		return 0;
+	default:
+		/* unknown IOCTL command, assume write */
+		WARN_ON(1);
+	}
+	return 1;
+}
+
+int ncp_ioctl(struct inode *inode, struct file *filp,
+	      unsigned int cmd, unsigned long arg)
+{
+	int ret;
+
+	if (ncp_ioctl_need_write(cmd)) {
+		/*
+		 * inside the ioctl(), any failures which
+		 * are because of file_permission() are
+		 * -EACCES, so it seems consistent to keep
+		 * that here.
+		 */
+		if (mnt_want_write(filp->f_vfsmnt))
+			return -EACCES;
+	}
+	ret = __ncp_ioctl(inode, filp, cmd, arg);
+	if (ncp_ioctl_need_write(cmd))
+		mnt_drop_write(filp->f_vfsmnt);
+	return ret;
+}
+
 #ifdef CONFIG_COMPAT
 long ncp_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
_
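The wrapper in the patch above takes a write reference before the real ioctl body runs and drops it afterwards, but only for commands that need it. A self-contained userspace sketch of that pairing follows; everything in it is hypothetical (struct demo_mount, the DEMO_CMD_* values and the demo_* helpers are invented for illustration, and the real mnt_want_write()/mnt_drop_write() semantics are whatever the patch series defines):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mount: a read-only flag plus a count of active writers. */
struct demo_mount {
	int readonly;
	int writers;
};

enum { DEMO_CMD_GET = 1, DEMO_CMD_SET = 2 };

/* Take a write reference; refuse on a read-only mount. */
static int demo_want_write(struct demo_mount *m)
{
	if (m->readonly)
		return -EROFS;
	m->writers++;
	return 0;
}

static void demo_drop_write(struct demo_mount *m)
{
	m->writers--;
}

/* Decide per command whether write access is needed. */
static int demo_need_write(unsigned int cmd)
{
	return cmd != DEMO_CMD_GET;	/* unknown commands assume write */
}

/* Stand-in for the real ioctl body. */
static int demo_do_ioctl(unsigned int cmd)
{
	(void)cmd;
	return 0;
}

/* Wrapper: every successful want_write is matched by one drop_write. */
static int demo_ioctl(struct demo_mount *m, unsigned int cmd)
{
	int need = demo_need_write(cmd);
	int ret;

	if (need && demo_want_write(m))
		return -EACCES;
	ret = demo_do_ioctl(cmd);
	if (need)
		demo_drop_write(m);
	return ret;
}
```

Computing "need" once and reusing it keeps the take and drop decisions from drifting apart, which is the property the writer count depends on.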
[PATCH 04/22] elevate writer count for chown and friends
chown/chmod, etc. don't call permission() in the same way that the normal "open for write" calls do. They still write to the filesystem, so bump the write count during these operations.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/open.c | 37 +
 1 file changed, 33 insertions(+), 4 deletions(-)

diff -puN fs/open.c~06-24-elevate-writer-count-for-chown-and-friends fs/open.c
--- lxc/fs/open.c~06-24-elevate-writer-count-for-chown-and-friends 2007-02-09 14:26:48.0 -0800
+++ lxc-dave/fs/open.c 2007-02-09 14:26:48.0 -0800
@@ -511,9 +511,12 @@ asmlinkage long sys_fchmod(unsigned int
 	err = -EROFS;
 	if (IS_RDONLY(inode))
 		goto out_putf;
+	err = mnt_want_write(file->f_vfsmnt);
+	if (err)
+		goto out_putf;
 	err = -EPERM;
 	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-		goto out_putf;
+		goto out_drop_write;
 	mutex_lock(&inode->i_mutex);
 	if (mode == (mode_t) -1)
 		mode = inode->i_mode;
@@ -522,6 +525,8 @@ asmlinkage long sys_fchmod(unsigned int
 	err = notify_change(dentry, &newattrs);
 	mutex_unlock(&inode->i_mutex);

+out_drop_write:
+	mnt_drop_write(file->f_vfsmnt);
 out_putf:
 	fput(file);
 out:
@@ -541,13 +546,16 @@ asmlinkage long sys_fchmodat(int dfd, co
 		goto out;
 	inode = nd.dentry->d_inode;

+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto dput_and_out;
 	error = -EROFS;
 	if (IS_RDONLY(inode))
-		goto dput_and_out;
+		goto out_drop_write;

 	error = -EPERM;
 	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-		goto dput_and_out;
+		goto out_drop_write;

 	mutex_lock(&inode->i_mutex);
 	if (mode == (mode_t) -1)
@@ -557,6 +565,8 @@ asmlinkage long sys_fchmodat(int dfd, co
 	error = notify_change(nd.dentry, &newattrs);
 	mutex_unlock(&inode->i_mutex);

+out_drop_write:
+	mnt_drop_write(nd.mnt);
 dput_and_out:
 	path_release(&nd);
 out:
@@ -582,7 +592,7 @@ static int chown_common(struct dentry *
 	error = -EROFS;
 	if (IS_RDONLY(inode))
 		goto out;
-	error = -EPERM;
+	error = -EPERM;
 	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
 		goto out;
 	newattrs.ia_valid = ATTR_CTIME;
@@ -611,7 +621,12 @@ asmlinkage long sys_chown(const char __u
 	error = user_path_walk(filename, &nd);
 	if (error)
 		goto out;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto out_release;
 	error = chown_common(nd.dentry, user, group);
+	mnt_drop_write(nd.mnt);
+out_release:
 	path_release(&nd);
 out:
 	return error;
@@ -631,7 +646,12 @@ asmlinkage long sys_fchownat(int dfd, co
 	error = __user_walk_fd(dfd, filename, follow, &nd);
 	if (error)
 		goto out;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto out_release;
 	error = chown_common(nd.dentry, user, group);
+	mnt_drop_write(nd.mnt);
+out_release:
 	path_release(&nd);
 out:
 	return error;
@@ -645,7 +665,11 @@ asmlinkage long sys_lchown(const char __
 	error = user_path_walk_link(filename, &nd);
 	if (error)
 		goto out;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto out_release;
 	error = chown_common(nd.dentry, user, group);
+out_release:
 	path_release(&nd);
 out:
 	return error;
@@ -662,9 +686,14 @@ asmlinkage long sys_fchown(unsigned int
 	if (!file)
 		goto out;

+	error = mnt_want_write(file->f_vfsmnt);
+	if (error)
+		goto out_fput;
 	dentry = file->f_path.dentry;
 	audit_inode(NULL, dentry->d_inode);
 	error = chown_common(dentry, user, group);
+	mnt_drop_write(file->f_vfsmnt);
+out_fput:
 	fput(file);
 out:
 	return error;
_
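The want/drop pairing with goto labels used in this patch can be tried out in user space. What follows is only a minimal sketch, not the kernel code: struct toy_mount, toy_want_write(), toy_drop_write() and toy_chmod() are hypothetical names made up for illustration, and all locking is left out. It shows the one property that matters: every exit path taken after a successful "want" goes through the label that does the "drop", and the path where "want" fails skips it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a vfsmount: a read-only flag and a writer count. */
struct toy_mount {
	int readonly;
	int writers;
};

/* Take a write reference, or fail with -EROFS on a read-only mount. */
static int toy_want_write(struct toy_mount *m)
{
	if (m->readonly)
		return -EROFS;
	m->writers++;
	return 0;
}

/* Give the write reference back; must pair with a successful want. */
static void toy_drop_write(struct toy_mount *m)
{
	assert(m->writers > 0);
	m->writers--;
}

/* Models a chmod-like call; immutable != 0 forces a failure after the want. */
static int toy_chmod(struct toy_mount *m, int immutable)
{
	int err;

	err = toy_want_write(m);
	if (err)
		goto out;		/* nothing taken, nothing to drop */

	err = -EPERM;
	if (immutable)
		goto out_drop_write;	/* failed, but the count is held */

	err = 0;			/* the attribute change would go here */
out_drop_write:
	toy_drop_write(m);
out:
	return err;
}
```

Whatever toy_chmod() returns, the writers field is back at its starting value afterwards, which is the invariant a later r/w to r/o check would rely on.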
[PATCH 07/22] elevate write count for link and symlink calls
Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/namei.c | 10 ++
 1 file changed, 10 insertions(+)

diff -puN fs/namei.c~09-24-elevate-write-count-for-link-and-symlink-calls fs/namei.c
--- lxc/fs/namei.c~09-24-elevate-write-count-for-link-and-symlink-calls 2007-02-09 14:26:50.0 -0800
+++ lxc-dave/fs/namei.c 2007-02-09 14:26:50.0 -0800
@@ -2236,7 +2236,12 @@ asmlinkage long sys_symlinkat(const char
 	if (IS_ERR(dentry))
 		goto out_unlock;

+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto out_dput;
 	error = vfs_symlink(nd.dentry->d_inode, dentry, from, S_IALLUGO);
+	mnt_drop_write(nd.mnt);
+out_dput:
 	dput(dentry);
 out_unlock:
 	mutex_unlock(&nd.dentry->d_inode->i_mutex);
@@ -2331,7 +2336,12 @@ asmlinkage long sys_linkat(int olddfd, c
 	error = PTR_ERR(new_dentry);
 	if (IS_ERR(new_dentry))
 		goto out_unlock;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		goto out_dput;
 	error = vfs_link(old_nd.dentry, nd.dentry->d_inode, new_dentry);
+	mnt_drop_write(nd.mnt);
+out_dput:
 	dput(new_dentry);
 out_unlock:
 	mutex_unlock(&nd.dentry->d_inode->i_mutex);
_
[PATCH 08/22] elevate mount count for extended attributes
This basically audits the callers of xattr_permission(), which calls permission() and can perform writes to the filesystem.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/nfsd/nfs4proc.c |  7 ++-
 lxc-dave/fs/xattr.c         | 14 ++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff -puN fs/nfsd/nfs4proc.c~10-24-elevate-mount-count-for-extended-attributes fs/nfsd/nfs4proc.c
--- lxc/fs/nfsd/nfs4proc.c~10-24-elevate-mount-count-for-extended-attributes 2007-02-09 14:26:51.0 -0800
+++ lxc-dave/fs/nfsd/nfs4proc.c 2007-02-09 14:26:51.0 -0800
@@ -626,14 +626,19 @@ nfsd4_setattr(struct svc_rqst *rqstp, st
 			return status;
 		}
 	}
+	status = mnt_want_write(cstate->current_fh.fh_export->ex_mnt);
+	if (status)
+		return status;
 	status = nfs_ok;
 	if (setattr->sa_acl != NULL)
 		status = nfsd4_set_nfs4_acl(rqstp, &cstate->current_fh,
 					    setattr->sa_acl);
 	if (status)
-		return status;
+		goto out;
 	status = nfsd_setattr(rqstp, &cstate->current_fh, &setattr->sa_iattr,
 				0, (time_t)0);
+out:
+	mnt_drop_write(cstate->current_fh.fh_export->ex_mnt);
 	return status;
 }

diff -puN fs/xattr.c~10-24-elevate-mount-count-for-extended-attributes fs/xattr.c
--- lxc/fs/xattr.c~10-24-elevate-mount-count-for-extended-attributes 2007-02-09 14:26:51.0 -0800
+++ lxc-dave/fs/xattr.c 2007-02-09 14:26:51.0 -0800
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -237,7 +238,11 @@ sys_setxattr(char __user *path, char __u
 	error = user_path_walk(path, &nd);
 	if (error)
 		return error;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		return error;
 	error = setxattr(nd.dentry, name, value, size, flags);
+	mnt_drop_write(nd.mnt);
 	path_release(&nd);
 	return error;
 }
@@ -252,7 +257,11 @@ sys_lsetxattr(char __user *path, char __
 	error = user_path_walk_link(path, &nd);
 	if (error)
 		return error;
+	error = mnt_want_write(nd.mnt);
+	if (error)
+		return error;
 	error = setxattr(nd.dentry, name, value, size, flags);
+	mnt_drop_write(nd.mnt);
 	path_release(&nd);
 	return error;
 }
@@ -268,9 +277,14 @@ sys_fsetxattr(int fd, char __user *name,
 	f = fget(fd);
 	if (!f)
 		return error;
+	error = mnt_want_write(f->f_vfsmnt);
+	if (error)
+		goto out_fput;
 	dentry = f->f_path.dentry;
 	audit_inode(NULL, dentry->d_inode);
 	error = setxattr(dentry, name, value, size, flags);
+	mnt_drop_write(f->f_vfsmnt);
+out_fput:
 	fput(f);
 	return error;
 }
_
[PATCH 02/22] r/o bind mounts: add vfsmount writer counts
This patch actually adds the mount and superblock writer counts, and the mnt_want/drop_write() functions that use them.

Before these can become useful, we must first cover each place in the VFS where writes are performed with a want/drop pair. When that is complete, we can actually introduce code that will safely check the counts before allowing r/w<->r/o transitions to occur.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/namespace.c        | 53 +
 lxc-dave/fs/super.c            | 18 ++---
 lxc-dave/include/linux/fs.h    |  2 +
 lxc-dave/include/linux/mount.h | 21
 4 files changed, 90 insertions(+), 4 deletions(-)

diff -puN fs/namespace.c~03-24-add-vfsmount-writer-count fs/namespace.c
--- lxc/fs/namespace.c~03-24-add-vfsmount-writer-count 2007-02-09 14:26:47.0 -0800
+++ lxc-dave/fs/namespace.c 2007-02-09 14:26:47.0 -0800
@@ -58,6 +58,7 @@ struct vfsmount *alloc_vfsmnt(const char
 	if (mnt) {
 		mnt->mnt_user_ns = get_user_ns(current->nsproxy->user_ns);
 		atomic_set(&mnt->mnt_count, 1);
+		mnt->mnt_writers = 0;
 		INIT_LIST_HEAD(&mnt->mnt_hash);
 		INIT_LIST_HEAD(&mnt->mnt_child);
 		INIT_LIST_HEAD(&mnt->mnt_mounts);
@@ -78,6 +79,56 @@ struct vfsmount *alloc_vfsmnt(const char
 	return mnt;
 }

+int mnt_make_readonly(struct vfsmount *mnt)
+{
+	int ret = 0;
+
+	WARN_ON(__mnt_is_readonly(mnt));
+
+	/*
+	 * This flag set is actually redundant with what
+	 * happens in do_remount(), but since we do this
+	 * under the lock, anyone attempting to get a write
+	 * on it after this will fail.
+	 */
+	spin_lock(&mnt->mnt_sb->s_mnt_writers_lock);
+	if (!mnt->mnt_writers)
+		mnt->mnt_flags |= MNT_READONLY;
+	else
+		ret = -EBUSY;
+	spin_unlock(&mnt->mnt_sb->s_mnt_writers_lock);
+	return ret;
+}
+
+int mnt_want_write(struct vfsmount *mnt)
+{
+	int ret = 0;
+
+	spin_lock(&mnt->mnt_sb->s_mnt_writers_lock);
+	if (mnt->mnt_writers)
+		goto out;
+
+	if (__mnt_is_readonly(mnt)) {
+		ret = -EROFS;
+		goto out;
+	}
+	mnt->mnt_sb->s_writers++;
+	mnt->mnt_writers++;
+out:
+	spin_unlock(&mnt->mnt_sb->s_mnt_writers_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(mnt_want_write);
+
+void mnt_drop_write(struct vfsmount *mnt)
+{
+	spin_lock(&mnt->mnt_sb->s_mnt_writers_lock);
+	mnt->mnt_sb->s_writers--;
+	mnt->mnt_writers--;
+	spin_unlock(&mnt->mnt_sb->s_mnt_writers_lock);
+}
+EXPORT_SYMBOL_GPL(mnt_drop_write);
+
 int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
 	mnt->mnt_sb = sb;
@@ -1415,6 +1466,8 @@ long do_mount(char *dev_name, char *dir_
 	((char *)data_page)[PAGE_SIZE - 1] = 0;

 	/* Separate the per-mountpoint flags */
+	if (flags & MS_RDONLY)
+		mnt_flags |= MNT_READONLY;
 	if (flags & MS_NOSUID)
 		mnt_flags |= MNT_NOSUID;
 	if (flags & MS_NODEV)
diff -puN fs/super.c~03-24-add-vfsmount-writer-count fs/super.c
--- lxc/fs/super.c~03-24-add-vfsmount-writer-count 2007-02-09 14:26:47.0 -0800
+++ lxc-dave/fs/super.c 2007-02-09 14:26:47.0 -0800
@@ -93,6 +93,8 @@ static struct super_block *alloc_super(s
 		s->s_qcop = sb_quotactl_ops;
 		s->s_op = &default_op;
 		s->s_time_gran = 1000000000;
+		s->s_writers = 0;
+		spin_lock_init(&s->s_mnt_writers_lock);
 	}
 out:
 	return s;
@@ -576,6 +578,11 @@ static void mark_files_ro(struct super_b
 	file_list_unlock();
 }

+static int sb_remount_ro(struct super_block *sb)
+{
+	return fs_may_remount_ro(sb);
+}
+
 /**
  *	do_remount_sb - asks filesystem to change mount options.
  *	@sb:	superblock in question
@@ -587,7 +594,8 @@ static void mark_files_ro(struct super_b
  */
 int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 {
-	int retval;
+	int retval = 0;
+	int sb_started_ro = (sb->s_flags & MS_RDONLY);

 #ifdef CONFIG_BLOCK
 	if (!(flags & MS_RDONLY) && bdev_read_only(sb->s_bdev))
@@ -600,11 +608,13 @@ int do_remount_sb(struct super_block *sb

 	/* If we are remounting RDONLY and current sb is read/write,
 	   make sure there are no rw files opened */
-	if ((flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY)) {
+	if ((flags & MS_RDONLY) && !sb_started_ro) {
 		if (force)
 			mark_files_ro(sb);
-		else if (!fs_may_remount_ro(sb))
-			return -EBUSY;
+		else
+			retval = sb_remount_ro(sb);
+		if (retval)
+			return retval;
 	}

 	if (sb->s_op->remount_fs) {
diff -puN
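For anybody who wants to play with the two-level counting idea outside the kernel, here is a minimal user-space sketch. Everything in it (struct model_sb, struct model_mnt and the model_*() helpers) is a hypothetical name invented for this illustration, and the s_mnt_writers_lock locking is left out entirely. One deliberate difference: the mnt_want_write() posted above returns early when mnt_writers is already nonzero, while this sketch increments on every successful call so that each drop has a matching increment.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins; the real code takes a spinlock around all of this. */
struct model_sb {
	int writers;		/* sum of writers over every mount of this sb */
};

struct model_mnt {
	struct model_sb *sb;
	int writers;		/* writers that came in through this mount */
	int readonly;
};

/* Take a write reference on the mount and on its superblock. */
static int model_want_write(struct model_mnt *m)
{
	if (m->readonly)
		return -EROFS;
	m->sb->writers++;
	m->writers++;
	return 0;
}

/* Release a reference taken by a successful model_want_write(). */
static void model_drop_write(struct model_mnt *m)
{
	assert(m->writers > 0 && m->sb->writers > 0);
	m->sb->writers--;
	m->writers--;
}

/* r/w -> r/o on one mount: refused while that mount has writers. */
static int model_make_readonly(struct model_mnt *m)
{
	if (m->writers)
		return -EBUSY;
	m->readonly = 1;
	return 0;
}

/* r/w -> r/o on the superblock: refused while any mount has writers. */
static int model_sb_remount_ro(const struct model_sb *sb)
{
	return sb->writers ? -EBUSY : 0;
}
```

With two mounts of the same superblock, a writer on one mount blocks the superblock-wide check but does not stop the other mount from going read-only, which is the behaviour r/o bind mounts need.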
[PATCH 22/22] kill open files traverse on remount ro
Now that we have the sb writer count, and all of the writers marked with mnt_want_write(), we don't need to go looking at all of the individual open files. Kill the open files walk, and use the sb writer count.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/file_table.c    | 25 -
 lxc-dave/fs/super.c         | 13 -
 lxc-dave/include/linux/fs.h |  2 --
 3 files changed, 12 insertions(+), 28 deletions(-)

diff -puN fs/file_table.c~24-24-kill-open-files-traverse-on-remount-ro fs/file_table.c
--- lxc/fs/file_table.c~24-24-kill-open-files-traverse-on-remount-ro 2007-02-09 14:27:01.0 -0800
+++ lxc-dave/fs/file_table.c 2007-02-09 14:27:01.0 -0800
@@ -308,31 +308,6 @@ void file_kill(struct file *file)
 	}
 }

-int fs_may_remount_ro(struct super_block *sb)
-{
-	struct list_head *p;
-
-	/* Check that no files are currently opened for writing. */
-	file_list_lock();
-	list_for_each(p, &sb->s_files) {
-		struct file *file = list_entry(p, struct file, f_u.fu_list);
-		struct inode *inode = file->f_path.dentry->d_inode;
-
-		/* File with pending delete? */
-		if (inode->i_nlink == 0)
-			goto too_bad;
-
-		/* Writeable file? */
-		if (S_ISREG(inode->i_mode) && (file->f_mode & FMODE_WRITE))
-			goto too_bad;
-	}
-	file_list_unlock();
-	return 1; /* Tis' cool bro. */
-too_bad:
-	file_list_unlock();
-	return 0;
-}
-
 void __init files_init(unsigned long mempages)
 {
 	int n;
diff -puN fs/super.c~24-24-kill-open-files-traverse-on-remount-ro fs/super.c
--- lxc/fs/super.c~24-24-kill-open-files-traverse-on-remount-ro 2007-02-09 14:27:01.0 -0800
+++ lxc-dave/fs/super.c 2007-02-09 14:27:01.0 -0800
@@ -580,7 +580,18 @@ static void mark_files_ro(struct super_b

 static int sb_remount_ro(struct super_block *sb)
 {
-	return fs_may_remount_ro(sb);
+	int ret = 0;
+
+	/*
+	 * The r/o flag actually gets set
+	 * by the caller.
+	 */
+	spin_lock(&sb->s_mnt_writers_lock);
+	if (sb->s_writers)
+		ret = -EBUSY;
+	spin_unlock(&sb->s_mnt_writers_lock);
+
+	return ret;
 }

 /**
diff -puN include/linux/fs.h~24-24-kill-open-files-traverse-on-remount-ro include/linux/fs.h
--- lxc/include/linux/fs.h~24-24-kill-open-files-traverse-on-remount-ro 2007-02-09 14:27:01.0 -0800
+++ lxc-dave/include/linux/fs.h 2007-02-09 14:27:01.0 -0800
@@ -1657,8 +1657,6 @@ extern const struct file_operations read
 extern const struct file_operations write_fifo_fops;
 extern const struct file_operations rdwr_fifo_fops;

-extern int fs_may_remount_ro(struct super_block *);
-
 #ifdef CONFIG_BLOCK
 /*
  * return READ, READA, or WRITE
_
Re: somebody dropped a (warning) bomb
Hello!

> void* comparisons are unsigned. Period.

As far as the C standard is concerned, there is no relationship between comparison on pointers and comparison of their values cast to uintptr_t. The address space needn't be linear and on some machines it isn't.

So speaking about signedness of pointer comparisons doesn't make sense, except for concrete implementations.

Have a nice fortnight
--
Martin `MJ' Mares <[EMAIL PROTECTED]> http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Top ten reasons to procrastinate: 1.
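To make the distinction concrete, here is a small sketch in C. The function names are made up for illustration. The first helper uses the only relational pointer comparison the standard defines, between pointers into (or one past) the same array. The second goes through uintptr_t, an optional type whose mapping from pointers is implementation-defined, so its ordering is a property of the platform and not of the language. The only thing the standard promises about uintptr_t is the round trip shown in the third helper.

```c
#include <stdint.h>

/*
 * Defined by the C standard only when both pointers point into,
 * or one past the end of, the same array object.
 */
static int before_in_same_array(const int *a, const int *b)
{
	return a < b;
}

/*
 * Compares the integer images of two pointers. The pointer-to-integer
 * mapping is implementation-defined, so the result need not agree with
 * any notion of "earlier in memory" on a non-linear address space.
 */
static int before_as_integers(const void *a, const void *b)
{
	return (uintptr_t)a < (uintptr_t)b;
}

/* Guaranteed: a valid void pointer survives the trip through uintptr_t. */
static int round_trips(const void *p)
{
	return (const void *)(uintptr_t)p == p;
}
```

On a flat-memory machine the first two helpers will usually agree for elements of one array, but that is an observation about the implementation, which is exactly the point made above.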
[PATCH 19/22] elevate writer count for custom struct_file
Some filesystems forgo the use of normal vfs calls to create struct files. Make sure that these users elevate the mnt writer count. The counts probably don't have any real meaning here because there is no real backing store for these mounts, but the call is here for consistency.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/file_table.c | 4
 1 file changed, 4 insertions(+)

diff -puN fs/file_table.c~22-24-elevate-writer-count-for-custom-struct-file fs/file_table.c
--- lxc/fs/file_table.c~22-24-elevate-writer-count-for-custom-struct-file 2007-02-09 14:26:59.0 -0800
+++ lxc-dave/fs/file_table.c 2007-02-09 14:26:59.0 -0800
@@ -171,6 +171,10 @@ int init_file(struct file *file, struct
 	file->f_mapping = dentry->d_inode->i_mapping;
 	file->f_mode = mode;
 	file->f_op = fop;
+	if (mode & FMODE_WRITE) {
+		error = mnt_want_write(mnt);
+		WARN_ON(error);
+	}
 	return error;
 }

_
[PATCH 15/22] elevate write count for do_sys_utime() and touch_atime()
Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/inode.c | 20
 1 file changed, 12 insertions(+), 8 deletions(-)

diff -puN fs/inode.c~17-24-elevate-write-count-for-do-sys-utime-and-touch-atime fs/inode.c
--- lxc/fs/inode.c~17-24-elevate-write-count-for-do-sys-utime-and-touch-atime 2007-02-09 14:26:56.0 -0800
+++ lxc-dave/fs/inode.c 2007-02-09 14:26:56.0 -0800
@@ -1170,22 +1170,23 @@ void touch_atime(struct vfsmount *mnt, s
 	struct inode *inode = dentry->d_inode;
 	struct timespec now;

-	if (inode->i_flags & S_NOATIME)
+	if (mnt && mnt_want_write(mnt))
 		return;
+	if (inode->i_flags & S_NOATIME)
+		goto out;
 	if (IS_NOATIME(inode))
-		return;
+		goto out;
 	if ((inode->i_sb->s_flags & MS_NODIRATIME) && S_ISDIR(inode->i_mode))
-		return;
+		goto out;

 	/*
 	 * We may have a NULL vfsmount when coming from NFSD
 	 */
 	if (mnt) {
 		if (mnt->mnt_flags & MNT_NOATIME)
-			return;
+			goto out;
 		if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
-			return;
-
+			goto out;
 		if (mnt->mnt_flags & MNT_RELATIME) {
 			/*
 			 * With relative atime, only update atime if the
@@ -1196,16 +1197,19 @@ void touch_atime(struct vfsmount *mnt, s
 						&inode->i_atime) < 0 &&
 			    timespec_compare(&inode->i_ctime,
 						&inode->i_atime) < 0)
-				return;
+				goto out;
 		}
 	}

 	now = current_fs_time(inode->i_sb);
 	if (timespec_equal(&inode->i_atime, &now))
-		return;
+		goto out;

 	inode->i_atime = now;
 	mark_inode_dirty_sync(inode);
+out:
+	if (mnt)
+		mnt_drop_write(mnt);
 }

 EXPORT_SYMBOL(touch_atime);
_
[PATCH 12/22] elevate write count files are open()ed
This is the first really tricky patch in the series. It elevates the writer count on a mount each time a non-special file is opened for write.

This is not completely apparent in the patch because the two if() conditions in may_open() above the mnt_want_write() call are, combined, equivalent to special_file().

There is also an elevated count around the vfs_create() call in open_namei(). The count needs to be kept elevated all the way into the may_open() call. Otherwise, when the write is dropped, a rw->ro transition could occur. This would lead to having rw access on the newly created file, while the vfsmount is ro. That is bad.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/file_table.c |  5 -
 lxc-dave/fs/namei.c      | 22 ++
 lxc-dave/ipc/mqueue.c    |  3 +++
 3 files changed, 25 insertions(+), 5 deletions(-)

diff -puN fs/file_table.c~14-24-tricky-elevate-write-count-files-are-open-ed fs/file_table.c
--- lxc/fs/file_table.c~14-24-tricky-elevate-write-count-files-are-open-ed 2007-02-09 14:26:54.0 -0800
+++ lxc-dave/fs/file_table.c 2007-02-09 14:26:54.0 -0800
@@ -209,8 +209,11 @@ void fastcall __fput(struct file *file)
 	if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL))
 		cdev_put(inode->i_cdev);
 	fops_put(file->f_op);
-	if (file->f_mode & FMODE_WRITE)
+	if (file->f_mode & FMODE_WRITE) {
 		put_write_access(inode);
+		if(!special_file(inode->i_mode))
+			mnt_drop_write(mnt);
+	}
 	put_pid(file->f_owner.pid);
 	put_user_ns(file->f_owner.user_ns);
 	file_kill(file);
diff -puN fs/namei.c~14-24-tricky-elevate-write-count-files-are-open-ed fs/namei.c
--- lxc/fs/namei.c~14-24-tricky-elevate-write-count-files-are-open-ed 2007-02-09 14:26:54.0 -0800
+++ lxc-dave/fs/namei.c 2007-02-09 14:26:54.0 -0800
@@ -1548,8 +1548,17 @@ int may_open(struct nameidata *nd, int a
 			return -EACCES;

 		flag &= ~O_TRUNC;
-	} else if (IS_RDONLY(inode) && (flag & FMODE_WRITE))
-		return -EROFS;
+	} else if (flag & FMODE_WRITE) {
+		/*
+		 * effectively: !special_file()
+		 * balanced by __fput()
+		 */
+		error = mnt_want_write(nd->mnt);
+		if (error)
+			return error;
+		if (IS_RDONLY(inode))
+			return -EROFS;
+	}
 	/*
 	 * An append-only file must be opened in append mode for writing.
 	 */
@@ -1688,14 +1697,17 @@ do_last:
 	}

 	if (IS_ERR(nd->intent.open.file)) {
-		mutex_unlock(&dir->d_inode->i_mutex);
 		error = PTR_ERR(nd->intent.open.file);
-		goto exit_dput;
+		goto exit_mutex_unlock;
 	}

 	/* Negative dentry, just create the file */
 	if (!path.dentry->d_inode) {
+		error = mnt_want_write(nd->mnt);
+		if (error)
+			goto exit_mutex_unlock;
 		error = open_namei_create(nd, &path, flag, mode);
+		mnt_drop_write(nd->mnt);
 		if (error)
 			goto exit;
 		return 0;
@@ -1733,6 +1745,8 @@ ok:
 		goto exit;
 	return 0;

+exit_mutex_unlock:
+	mutex_unlock(&dir->d_inode->i_mutex);
 exit_dput:
 	dput_path(&path, nd);
 exit:
diff -puN ipc/mqueue.c~14-24-tricky-elevate-write-count-files-are-open-ed ipc/mqueue.c
--- lxc/ipc/mqueue.c~14-24-tricky-elevate-write-count-files-are-open-ed 2007-02-09 14:26:54.0 -0800
+++ lxc-dave/ipc/mqueue.c 2007-02-09 14:26:54.0 -0800
@@ -687,6 +687,9 @@ asmlinkage long sys_mq_open(const char _
 			goto out;
 		filp = do_open(dentry, oflag);
 	} else {
+		error = mnt_want_write(mqueue_mnt);
+		if (error)
+			goto out;
 		filp = do_create(mqueue_mnt->mnt_root, dentry,
 					oflag, mode, u_attr);
 	}
_
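The open/close balance described here (take the count in may_open(), give it back in __fput(), skip special files on both sides) can be modelled in a few lines. This is only a sketch under simplifying assumptions: struct sketch_mnt, struct sketch_file, sketch_open() and sketch_close() are hypothetical names, a single int flag stands in for special_file(), and there is no locking.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for a mount and an open file. */
struct sketch_mnt {
	int writers;
	int readonly;
};

struct sketch_file {
	struct sketch_mnt *mnt;
	int for_write;
	int special;		/* stands in for special_file(inode->i_mode) */
};

/*
 * Only a non-special file opened for write pins the mount. The count
 * stays elevated for the whole lifetime of the open file.
 */
static int sketch_open(struct sketch_file *f, struct sketch_mnt *m,
		       int for_write, int special)
{
	if (for_write && !special) {
		if (m->readonly)
			return -EROFS;
		m->writers++;	/* balanced by sketch_close() */
	}
	f->mnt = m;
	f->for_write = for_write;
	f->special = special;
	return 0;
}

/* Must apply the same test as sketch_open(), or the count drifts. */
static void sketch_close(struct sketch_file *f)
{
	if (f->for_write && !f->special) {
		assert(f->mnt->writers > 0);
		f->mnt->writers--;
	}
}
```

The important design point is that open and close apply the identical condition. If one side counted special files and the other did not, the writer count would slowly leak or underflow.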
[PATCH 13/22] elevate writer count for do_sys_truncate()
Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/open.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff -puN fs/open.c~15-24-elevate-writer-count-for-do-sys-truncate fs/open.c
--- lxc/fs/open.c~15-24-elevate-writer-count-for-do-sys-truncate 2007-02-09 14:26:55.0 -0800
+++ lxc-dave/fs/open.c 2007-02-09 14:26:55.0 -0800
@@ -241,28 +241,32 @@ static long do_sys_truncate(const char _
 	if (!S_ISREG(inode->i_mode))
 		goto dput_and_out;

-	error = vfs_permission(&nd, MAY_WRITE);
+	error = mnt_want_write(nd.mnt);
 	if (error)
 		goto dput_and_out;

+	error = vfs_permission(&nd, MAY_WRITE);
+	if (error)
+		goto mnt_drop_write_and_out;
+
 	error = -EROFS;
 	if (IS_RDONLY(inode))
-		goto dput_and_out;
+		goto mnt_drop_write_and_out;

 	error = -EPERM;
 	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-		goto dput_and_out;
+		goto mnt_drop_write_and_out;

 	/*
 	 * Make sure that there are no leases.
 	 */
 	error = break_lease(inode, FMODE_WRITE);
 	if (error)
-		goto dput_and_out;
+		goto mnt_drop_write_and_out;

 	error = get_write_access(inode);
 	if (error)
-		goto dput_and_out;
+		goto mnt_drop_write_and_out;

 	error = locks_verify_truncate(inode, NULL, length);
 	if (!error) {
@@ -271,6 +275,8 @@ static long do_sys_truncate(const char _
 	}
 	put_write_access(inode);

+mnt_drop_write_and_out:
+	mnt_drop_write(nd.mnt);
 dput_and_out:
 	path_release(&nd);
 out:
_
[PATCH 16/22] sys_mknodat(): elevate write count for vfs_mknod/create()
This takes care of all of the direct callers of vfs_mknod(). Since a few of these cases also handle normal file creation, this covers some calls to vfs_create() too.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/namei.c         | 12
 lxc-dave/fs/nfsd/vfs.c      |  4
 lxc-dave/net/unix/af_unix.c |  4
 3 files changed, 20 insertions(+)

diff -puN fs/namei.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create fs/namei.c
--- lxc/fs/namei.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create 2007-02-09 14:26:57.0 -0800
+++ lxc-dave/fs/namei.c 2007-02-09 14:26:57.0 -0800
@@ -1903,14 +1903,26 @@ asmlinkage long sys_mknodat(int dfd, con
 	if (!IS_ERR(dentry)) {
 		switch (mode & S_IFMT) {
 		case 0: case S_IFREG:
+			error = mnt_want_write(nd.mnt);
+			if (error)
+				break;
 			error = vfs_create(nd.dentry->d_inode,dentry,mode,&nd);
+			mnt_drop_write(nd.mnt);
 			break;
 		case S_IFCHR: case S_IFBLK:
+			error = mnt_want_write(nd.mnt);
+			if (error)
+				break;
 			error = vfs_mknod(nd.dentry->d_inode,dentry,mode,
 					new_decode_dev(dev));
+			mnt_drop_write(nd.mnt);
 			break;
 		case S_IFIFO: case S_IFSOCK:
+			error = mnt_want_write(nd.mnt);
+			if (error)
+				break;
 			error = vfs_mknod(nd.dentry->d_inode,dentry,mode,0);
+			mnt_drop_write(nd.mnt);
 			break;
 		case S_IFDIR:
 			error = -EPERM;
diff -puN fs/nfsd/vfs.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create fs/nfsd/vfs.c
--- lxc/fs/nfsd/vfs.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create 2007-02-09 14:26:57.0 -0800
+++ lxc-dave/fs/nfsd/vfs.c 2007-02-09 14:26:57.0 -0800
@@ -664,6 +664,9 @@ nfsd_open(struct svc_rqst *rqstp, struct
 	/* Disallow write access to files with the append-only bit set
 	 * or any access when mandatory locking enabled
 	 */
+	err = mnt_want_write(fhp->fh_export->ex_mnt);
+	if (err)
+		goto out_nfserr;
 	err = nfserr_perm;
 	if (IS_APPEND(inode) && (access & MAY_WRITE))
 		goto out;
@@ -1199,6 +1202,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
 		printk("nfsd: bad file type %o in nfsd_create\n", type);
 		host_err = -EINVAL;
 	}
+	mnt_drop_write(fhp->fh_export->ex_mnt);
 	if (host_err < 0)
 		goto out_nfserr;

diff -puN net/unix/af_unix.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create net/unix/af_unix.c
--- lxc/net/unix/af_unix.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create 2007-02-09 14:26:57.0 -0800
+++ lxc-dave/net/unix/af_unix.c 2007-02-09 14:26:57.0 -0800
@@ -816,7 +816,11 @@ static int unix_bind(struct socket *sock
 		 */
 		mode = S_IFSOCK |
 		       (SOCK_INODE(sock)->i_mode & ~current->fs->umask);
+		err = mnt_want_write(nd.mnt);
+		if (err)
+			goto out_mknod_dput;
 		err = vfs_mknod(nd.dentry->d_inode, dentry, mode, 0);
+		mnt_drop_write(nd.mnt);
 		if (err)
 			goto out_mknod_dput;
 		mutex_unlock(&nd.dentry->d_inode->i_mutex);
_