Re: NAK new drivers without proper power management?

2007-02-09 Thread Arjan van de Ven

> > 
> > to a large degree, a device driver that doesn't suspend is better than
> > no device driver at all, right?
> 
> I'm not sure it is. It only makes more work for everyone else: We have
> to help people figure out what causes their computer to fail to resume
> (which can take quite a while), 

so we make the kernel printk on suspend if there are devices without
suspend/resume. Heck, make a config option that prints that at modprobe
time.


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))

2007-02-09 Thread Tejun Heo

Emmeran Seehuber wrote:

# smartctl -a /dev/sda
smartctl version 5.36 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

Device: ATA  WDC WD1500ADFD-0 Version: 20.0
Serial number:  WD-WMAP41246348
Device type: disk
Local Time is: Fri Feb  9 18:06:23 2007 CET
Device does not support SMART


Hmmm... Raptor not supporting SMART.  That's weird.  Please try 
'smartctl -d ata -a /dev/sda'.


Thanks.

--
tejun


[git pull] Input patches for 2.6.20+

2007-02-09 Thread Dmitry Torokhov
Hi Linus,

Please pull from:

        git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git for-linus

or
        master.kernel.org:/pub/scm/linux/kernel/git/dtor/input.git for-linus

to receive updates for input subsystem.

Changelog:
--
Akinobu Mita (1):
  Input: pc110pad - return proper error

Cyrill V. Gorcunov (1):
  Input: HIL - handle erros from input_register_device()

David Brownell (1):
  Input: ads7846 - be more compatible with the hwmon framework

Dmitry Torokhov (3):
  Input: i8042 - really suppress ACK/NAK during panic blink
  Input: i8042 - fix AUX IRQ delivery check

Imre Deak (5):
  Input: ads7846 - pluggable filtering logic
  Input: ads7846 - optionally leave Vref on during differential measurements
  Input: ads7846 - switch to using hrtimer
  Input: ads7846 - select correct SPI mode
  Input: ads7846 - detect pen up from GPIO state

Jaya Kumar (1):
  Input: add Atlas button driver

Jiri Slaby (2):
  Input: hid-ff - add support for Logitech Momo racing wheel
  Input: remove scan_keyb driver

Michael Leun (1):
  Input: wistron - add support for Fujitsu-Siemens Amilo D88x0

Phil Blundell (1):
  Input: gpio-keys - keyboard driver for GPIO buttons

Richard Purdie (1):
  Input: tsdev - schedule removal

Robert P. J. Day (1):
  Input: inport - use correct config option for ATIXL

Diffstat:
-
 b/Documentation/feature-removal-schedule.txt |   15 +
 b/drivers/input/keyboard/Kconfig             |   19 +
 b/drivers/input/keyboard/Makefile            |    5
 b/drivers/input/keyboard/gpio_keys.c         |  147
 b/drivers/input/keyboard/hilkbd.c            |  114 +-
 b/drivers/input/misc/Kconfig                 |   10
 b/drivers/input/misc/Makefile                |    1
 b/drivers/input/misc/atlas_btns.c            |  170 +++
 b/drivers/input/misc/wistron_btns.c          |   20 +
 b/drivers/input/mouse/inport.c               |    2
 b/drivers/input/mouse/pc110pad.c             |    2
 b/drivers/input/serio/i8042.c                |    5
 b/drivers/input/touchscreen/Kconfig          |    9
 b/drivers/input/touchscreen/ads7846.c        |  306 +++
 b/drivers/input/tsdev.c                      |    4
 b/drivers/usb/input/hid-ff.c                 |    1
 b/drivers/usb/input/hid-lgff.c               |    1
 b/include/asm-arm/hardware/gpio_keys.h       |   17 +
 b/include/linux/spi/ads7846.h                |    2
 drivers/char/scan_keyb.c                     |  149 -
 drivers/char/scan_keyb.h                     |   15 -
 drivers/input/serio/i8042.c                  |    7
 drivers/input/touchscreen/ads7846.c          |  275 +++-
 include/linux/spi/ads7846.h                  |   10
 24 files changed, 889 insertions(+), 417 deletions(-)


-- 
Dmitry


Re: NAK new drivers without proper power management?

2007-02-09 Thread Willy Tarreau
On Fri, Feb 09, 2007 at 07:25:34PM -0500, Jeff Garzik wrote:
> Nigel Cunningham wrote:
> >Hi.
> >
> >On Fri, 2007-02-09 at 23:17 +0100, Arjan van de Ven wrote:
> >>On Sat, 2007-02-10 at 08:57 +1100, Nigel Cunningham wrote:
> >>>Hi.
> >>>
> >>>I don't think this is already done (feel free to correct me if I'm
> >>>wrong)..
> >>>
> >>>Can we start to NAK new drivers that don't have proper power management
> >>>implemented? There really is no excuse for writing a new driver and not
> >>>putting .suspend and .resume methods in anymore, is there?
> >>
> >>to a large degree, a device driver that doesn't suspend is better than
> >>no device driver at all, right?
> >
> >I'm not sure it is. It only makes more work for everyone else: We have
> >to help people figure out what causes their computer to fail to resume
> >(which can take quite a while), then get them to complain to the driver
> >author, and the driver author has to submit patches to fix it.
> >
> >All of this is avoided if they'll just do it right in the first place.
> 
> A lot of things could have been avoided, if they just did it 
> right the first time.
> 
> I think it's more valuable to users to get a basic network driver that 
> pings or a basic ATA driver that reads/writes, than peripheral issues 
> like suspend/resume.

100% agreed.

I used to have a notebook (VAIO) which did not shut down correctly and
did not support reboot. The one I have now behaves normally on both counts.
I've never ever felt the need for suspend/resume, which I've always attributed
to "geek" requirements. I had to debug the shutdown code myself on the
previous notebook, and discovered that the problem was caused by bugs in the
ACPI state transitions for suspend and similar fancy features. I would really
have preferred that the people writing the ACPI code had focused on
power-on/power-off before the rest.

> Certainly we should ask for it, but it shouldn't be a merge-stopper.

I think we should even proceed in the opposite direction: refuse to suspend
if at least one driver does not support the feature, and enumerate the
faulty drivers on the console. While I agree that a machine which resumes
in a bad state is not funny at all to debug, at least when the user expects
his notebook to suspend and sees that it refuses, he can complain about the
drivers which do not support it, and can even unload them first if unneeded.
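Willy's proposal (check every driver before suspending, list the ones without PM support, and refuse) can be sketched as a small user-space C model. Everything here is hypothetical: `struct toy_driver`, `toy_pm_noop()` and `toy_report_unsuspendable()` are illustrative names, not kernel APIs, and a real check would walk the driver core's device lists rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified driver descriptor: only the two PM hooks matter. */
struct toy_driver {
	const char *name;
	int (*suspend)(void);
	int (*resume)(void);
};

/* Placeholder hook, used only to mark a driver as implementing a method. */
static int toy_pm_noop(void)
{
	return 0;
}

/*
 * Walk all registered drivers before suspending, print every driver that
 * lacks a suspend or resume hook, and return how many were found. A caller
 * following this proposal would refuse to suspend when the result is non-zero.
 */
static size_t toy_report_unsuspendable(const struct toy_driver *drv, size_t n)
{
	size_t bad = 0;

	assert(drv != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const char *missing;

		if (drv[i].suspend && drv[i].resume)
			continue;
		if (!drv[i].suspend && !drv[i].resume)
			missing = "suspend and resume";
		else if (!drv[i].suspend)
			missing = "suspend";
		else
			missing = "resume";
		printf("suspend refused: driver %s lacks %s support\n",
		       drv[i].name, missing);
		bad++;
	}
	return bad;
}
```

The user then knows exactly which drivers to complain about, or to unload before retrying the suspend.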

Regards,
Willy



Re: [git patches] libata updates 1 of 3

2007-02-09 Thread Markus Trippelsdorf
This update breaks sata_via on my VIA K8T800Pro machine:
 
 sata_via :00:0f.0 : failed to iomap PCI BAR 0
 sata_via :00:0f.0 : out of memory
 sata_via probe of :00:0f.0 failed with error -12
-- 
Markus


Re: Fix null pointer dereference in appledisplay driver

2007-02-09 Thread Len Brown
Applied.

thanks,
-Len

On Friday 09 February 2007 19:18, Michael Hanselmann wrote:
> Commit 40b20c257a13c5a526ac540bc5e43d0fdf29792a by Len Brown introduced
> a null pointer dereference in the appledisplay driver. This patch fixes
> it.
> 
> Signed-off-by: Michael Hanselmann <[EMAIL PROTECTED]>
> 
> ---
> I suggest adding this to 2.6.20.1 because this bug causes the kernel to
> panic on boot when the driver is compiled in.
> 
> diff -Nrup --exclude-from linux-exclude-from linux-2.6.20.orig/drivers/usb/misc/appledisplay.c linux-2.6.20/drivers/usb/misc/appledisplay.c
> --- linux-2.6.20.orig/drivers/usb/misc/appledisplay.c 2007-02-09 22:35:56.0 +0100
> +++ linux-2.6.20/drivers/usb/misc/appledisplay.c  2007-02-10 01:00:28.0 +0100
> @@ -281,8 +281,8 @@ static int appledisplay_probe(struct usb
>   /* Register backlight device */
>   snprintf(bl_name, sizeof(bl_name), "appledisplay%d",
>   atomic_inc_return(_displays) - 1);
> - pdata->bd = backlight_device_register(bl_name, NULL, NULL,
> - _bl_data);
> + pdata->bd = backlight_device_register(bl_name, NULL,
> + pdata, _bl_data);
>   if (IS_ERR(pdata->bd)) {
>   err("appledisplay: Backlight registration failed");
>   goto error;
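The fix changes the third argument of backlight_device_register() from NULL to pdata. A plausible reading, assumed here because the crash path is not shown in the thread, is that this argument becomes the backlight device's private data, which the driver's callbacks later dereference; with NULL registered, the first callback faults. The model below uses hypothetical `toy_*` names to show that pattern and does not reproduce the real backlight API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-display driver state (stands in for pdata). */
struct toy_display {
	int brightness;
};

/* Hypothetical registered device carrying an opaque private pointer. */
struct toy_bl_device {
	void *devdata; /* private pointer handed in at registration time */
	int (*get_brightness)(struct toy_bl_device *bd);
};

static int toy_get_brightness(struct toy_bl_device *bd)
{
	struct toy_display *pdata = bd->devdata;

	/* If NULL had been registered as devdata, this dereference would crash. */
	return pdata->brightness;
}

/* Registration stores the caller's private pointer for later callbacks. */
static void toy_bl_register(struct toy_bl_device *bd, void *devdata)
{
	assert(bd != NULL);
	bd->devdata = devdata;
	bd->get_brightness = toy_get_brightness;
}
```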


[PATCH -mm] libata: warn if speed limited due to 40-wire cable

2007-02-09 Thread Robert Hancock

Print an explicit warning when a device's UDMA mode is limited due to
a 40-wire cable being detected, so that users have some idea why their
device isn't running as fast as it should.

This moves the application of the drive's mode masks before the cable
rule, so that we can tell whether the rate is being limited by the cable
and not by the drive or controller.

I haven't tested whether the message actually shows up, as my system
isn't horked in this manner.
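The reordering matters because the warning should fire only when the cable, rather than the drive or host, is what removes UDMA3 and above (the `0xF8 << ATA_SHIFT_UDMA` bits). A minimal sketch of that decision, assuming a packed mask in which UDMA0 starts at bit 12 (`TOY_SHIFT_UDMA` is an assumption; check `include/linux/libata.h`), and with host and drive limits passed in as plain masks instead of via `ata_pack_xfermask()`/`ata_id_xfermask()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed bit position of UDMA0 in the packed mask (illustrative only). */
#define TOY_SHIFT_UDMA 12
/* UDMA3..UDMA7: the modes that require an 80-wire cable. */
#define TOY_UDMA_80WIRE_ONLY (0xF8u << TOY_SHIFT_UDMA)

/*
 * Apply host and drive limits first, then the cable rule, so the warning
 * is printed only when the cable is what removes the fast modes.
 * *cable_limited reports whether that happened.
 */
static unsigned int toy_xfermask(unsigned int host_mask, unsigned int drive_mask,
				 bool forty_wire, bool *cable_limited)
{
	unsigned int mask = host_mask & drive_mask;

	assert(cable_limited != NULL);
	*cable_limited = false;
	if (forty_wire && (mask & TOY_UDMA_80WIRE_ONLY)) {
		printf("limited to UDMA2 due to 40-wire cable\n");
		mask &= ~TOY_UDMA_80WIRE_ONLY;
		*cable_limited = true;
	}
	return mask;
}
```

With the old ordering the cable rule ran first, so a drive that only supports UDMA2 anyway would have triggered a misleading warning.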

Signed-off-by: Robert Hancock <[EMAIL PROTECTED]>

--- linux-2.6.20-rc6-mm3/drivers/ata/libata-core.c  2007-02-04 21:48:25.0 -0600
+++ linux-2.6.20-rc6-mm3edit/drivers/ata/libata-core.c  2007-02-09 21:04:14.0 -0600
@@ -3393,22 +3393,24 @@ static void ata_dev_xfermask(struct ata_
xfer_mask = ata_pack_xfermask(ap->pio_mask,
  ap->mwdma_mask, ap->udma_mask);

+   /* drive modes available */
+   xfer_mask &= ata_pack_xfermask(dev->pio_mask,
+  dev->mwdma_mask, dev->udma_mask);
+   xfer_mask &= ata_id_xfermask(dev->id);
+
/* Apply cable rule here.  Don't apply it early because when
 * we handle hot plug the cable type can itself change.
+* Unknown or 80 wire cables reported host side are checked
+* drive side as well. Cases where we know a 40wire cable
+* is used safely for 80 are not checked here.
 */
-   if (ap->cbl == ATA_CBL_PATA40)
-   xfer_mask &= ~(0xF8 << ATA_SHIFT_UDMA);
-   /* Apply drive side cable rule. Unknown or 80 pin cables reported
-* host side are checked drive side as well. Cases where we know a
-* 40wire cable is used safely for 80 are not checked here.
-*/
-if (ata_drive_40wire(dev->id) && (ap->cbl == ATA_CBL_PATA_UNK || ap->cbl == ATA_CBL_PATA80))
+   if ((xfer_mask & (0xF8 << ATA_SHIFT_UDMA)) &&
+   ((ap->cbl == ATA_CBL_PATA40) ||
+(ata_drive_40wire(dev->id) &&
+ (ap->cbl == ATA_CBL_PATA_UNK || ap->cbl == ATA_CBL_PATA80)))) {
+   ata_dev_printk(dev, KERN_WARNING, "limited to UDMA2 due to 40-wire cable\n");
xfer_mask &= ~(0xF8 << ATA_SHIFT_UDMA);
-
-
-   xfer_mask &= ata_pack_xfermask(dev->pio_mask,
-  dev->mwdma_mask, dev->udma_mask);
-   xfer_mask &= ata_id_xfermask(dev->id);
+   }

/*
 *  CFA Advanced TrueIDE timings are not allowed on a shared



Re: NAK new drivers without proper power management?

2007-02-09 Thread Nigel Cunningham
Hi.

On Sat, 2007-02-10 at 03:42 +, Matthew Garrett wrote:
> On Sat, Feb 10, 2007 at 08:57:49AM +1100, Nigel Cunningham wrote:
> 
> > Can we start to NAK new drivers that don't have proper power management
> > implemented? There really is no excuse for writing a new driver and not
> > putting .suspend and .resume methods in anymore, is there?
> 
> The PCI layer is able to deal with drivers that have no PM methods in 
> the most simple case.

Yeah. I suppose we could use a pm_safe bit flag in struct device_driver
and/or struct pci_driver. I have other things to do right now, but will
seek to understand the relationship between those structs better later.

Regards,

Nigel



Re: forcedeth problems on 2.6.20-rc6-mm3

2007-02-09 Thread Robert Hancock

Ayaz Abdulla wrote:

For all those who are having issues, please try out the attached patch.

Ayaz


Seems to solve the problem for me (not heavily tested, but certainly 
isn't totally dead as it was before).


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: NAK new drivers without proper power management?

2007-02-09 Thread Joseph Fannin
On Fri, Feb 09, 2007 at 08:59:55PM -0500, Lee Revell wrote:
> On 2/9/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
> >I would disagree that it's a peripheral issue, it's pretty core these
> >days, at least for any hardware that you can stuff in a laptop (though a
> >fair number of desktops get suspended and resumed these days too).
>
> Servers are still the most important Linux market, and don't care
> about suspend/resume.  I would consider implementing suspend/resume
> for a driver that will only be used in server or HPC class hardware a
> waste of valuable development resources.

Please allow me to be offensively blunt for a moment.

So, the situation seems to be:

1. The work of the suspend developer who engages the users who put
   effort into making suspend work on their hardware (bless
   their addled little heads) often doesn't meet kernel standards,
   or isn't well enough documented to prove the real *need* for
   the features and/or hacks that have happened to get actual
   users' systems sleeping and running again.

2. The swsusp maintainer continues in the belief that as long as
   there are no bug reports in kernel bugzilla or crossing the
   (relatively obscure) swsusp mailing lists, it has zarro boogs
   and meanwhile works on the fourth implementation of suspend
   support in as many years.  It's in CVS on sourceforge.  There's
   no documentation whatsoever.

3. There's another guy who appears to be doing a lot of work, so I
   shan't leave him out.  Like the two developers previously
   mentioned, he seems to be working pretty hard on the whole
   thing.  The previously mentioned fourth suspend implementation
   seems to be largely his doing, for good and for ill.

4. "Everybody" knows suspend doesn't work on Linux without a huge
   amount of tinkering, deep magic, and dead chickens.  Only
   Gentoo users seem to bother; everyone else waits for Ubuntu
   12.04 wherein suspend will "just work".  The Gentoo users all
   use swsusp2, as it contains the hacks to work around:

5. All the suspend developers blame the lack of power-management
   support in drivers for the inability of Linux to properly
   suspend on anything that doesn't support APM.

6. Getting proper power-management support in Linux device drivers
   is not a priority; drivers without any power management support
   whatsoever should not only be accepted -- they should be merged
   without comment or complaint.

   How is working suspend support ever supposed to happen?

--
Joseph Fannin
[EMAIL PROTECTED] || [EMAIL PROTECTED]



Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-09 Thread Alan Stern
On Fri, 9 Feb 2007, Roland McGrath wrote:

> I don't think I really object to the ABI change of clearing %dr6 after an
> exception so that it does not accumulate multiple results.  But first I'll
> have to convince myself that we never actually do want to accumulate
> multiple results.  Hmm, I think we can, so maybe I do object.  If you set
> two watchpoints inside a user buffer and then do a system call that touches
> both those addresses (e.g. read), then you will go through do_debug (to
> send_sigtrap) twice before returning to user mode.  When the syscall is
> done, you'll have a pending SIGTRAP for the debugger to handle.  By looking
> at your %dr6 the debugger can see that both watchpoints hit.  (gdb does not
> handle this case, but it should.)  Am I wrong?

I think you're right.

> So this gets to the more complicated view of %dr6 handling that I had first
> had in mind yesterday.  Each allocation "owns" one of the low 4 bits in
> %dr6 too.  Only the dr6 bits owned by the userland "raw" allocation
> (i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6].
> So when kwatch swallows a debug exception, it should mask off its bit from
> %dr6 in the CPU, but not clear %dr6 completely.  That way you can have a
> sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one
> system call (including interrupt handlers), and when it gets to the
> userland debugger examining dr6 it sees the low 2 bits both set.

Okay; I'll fix this too.  Come to think of it, kwatch needs to handle 
multiple hits as well -- there might be two watchpoints set to the same 
address.

> > It's really quite a tricky matter.  Should a register be allocated to
> > kwatch only when no user process needs it?  Should we really go about
> > checking the requirements of every single process whenever a kwatch
> > allocation request comes in?  What if the processes which need a
> > particular register aren't running -- should the register then be given to
> > kwatch?  What if one of those processes then does start running on one
> > CPU?
> 
> To "go about checking the requirements of every single process" is not so
> hard as it sounds when they're recorded as a single global use count per
> slot, as your original code does.  When you mentioned a "your allocation is
> available" callback, I was thinking it might come to that being called
> inside context switch.  It's all rather tricky, indeed.  
> 
> The obvious answer is to start simple.  If any user process anywhere uses
> drN, kwatch has to give it up for all CPUs (watchpoints with less than
> "break ptrace" priority do).  If anyone really cares about more flexibility
> than that, we can change or extend it.  Some copious comments in the
> interface descriptions can lead them in the right direction if the
> situation comes up.  Probably with systemtap support in a while, we'll get
> a lot more concrete uses of watchpoints and people finding out what really 
> matters to them.

It's still more complicated than you might think.  Let's say two user
processes each have dr1 allocated, one with low priority and the other
with high priority.  The kernel has to be aware of the high-priority
allocation, so that it can refuse intermediate-priority kwatch allocation
attempts.  Now suppose the second process exits.  dr1 is still allocated
to the first user process but only with low priority, so now
intermediate-priority kwatch allocation attempts should succeed.

In order for this to work, when the second process gives up its allocation 
I would have to either scan through all tasks to see the first process, or 
I would have to either scan though all tasks to see the first process, or 
else keep several global use counts for each slot -- in fact, one use 
count for each priority level.  That's doable if there are only a few 
levels, but not if there are many.

How do you suggest this be handled?  Maybe we should just keep track of a
maximum user priority level for each slot, allowing it to go up but not
down until all user processes have given up the slot.  (I.e., in the
example above the later kwatch requests would still fail because we would
continue to remember the high user priority level so long as the first
process maintained its allocation.)  That would be overly pessimistic, but
it would at least be safe.
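The "one use count per priority level" option described above looks roughly like this, assuming three hypothetical levels and the tie-breaking rule that a user holder at the same priority keeps the slot. All names are illustrative, not part of the kwatch patch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NUM_SLOTS 4 /* DR0..DR3 */
#define TOY_NUM_PRIO  3 /* hypothetical levels: 0 = low ... 2 = high */

/* One global use count per (slot, priority) pair. */
static unsigned int toy_user_count[TOY_NUM_SLOTS][TOY_NUM_PRIO];

static void toy_user_get(unsigned int slot, unsigned int prio)
{
	assert(slot < TOY_NUM_SLOTS && prio < TOY_NUM_PRIO);
	toy_user_count[slot][prio]++;
}

static void toy_user_put(unsigned int slot, unsigned int prio)
{
	assert(slot < TOY_NUM_SLOTS && prio < TOY_NUM_PRIO);
	assert(toy_user_count[slot][prio] > 0);
	toy_user_count[slot][prio]--;
}

/* Highest priority at which any user task still holds the slot, or -1. */
static int toy_max_user_prio(unsigned int slot)
{
	assert(slot < TOY_NUM_SLOTS);
	for (int p = TOY_NUM_PRIO - 1; p >= 0; p--)
		if (toy_user_count[slot][p] > 0)
			return p;
	return -1;
}

/* A kwatch request wins only if it outranks every remaining user holder. */
static bool toy_kwatch_may_take(unsigned int slot, int prio)
{
	return prio > toy_max_user_prio(slot);
}
```

The table grows as slots times levels, which is why this stays cheap only with a few levels; the pessimistic alternative keeps a single per-slot maximum that is not lowered until every user has released the slot.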

Alan Stern



Re: [GIT PATCH] ACPI patches for 2.6.21

2007-02-09 Thread Len Brown
On Friday 09 February 2007 18:09, Pavel Machek wrote:
> Hi!
> 
> > Per your request, and the request of the distros, we've changed
> > how ACPICA Core releases are integrated into Linux so that each
> > upstream (CVS) check-in appears as a single git commit.
> > While this process is not yet perfect, it should be vastly better
> > than previous "code drops" in allowing git bisect to work,
> > and allowing distros to cherry-pick individual fixes.
> > 
> > The "bay" driver is new (and marked EXPERIMENTAL) -- adding initial
> > hot-plug support for ACPI controlled drive bays such as the
> > IBM ultrabay or the Dell Module Bay.
> 
> Could you describe userland interface it uses? /proc? Will it be
> usable for bays on notebooks not using acpi?

No, not until somebody finds one and writes code to support it.

> > The "asus-laptop" driver is also new.  Consistent with msi-laptop,
> > it uses ACPI in platform-specific ways, but strives to avoid
> > exposing ACPI-specific implementation details to the user.
> > asus-laptop is mutually exclusive with asus_acpi, which it will
> > replace over time.
> 
> Not including another /proc/acpi/ibm -like nightmare, is it?

No. See discussion on linux-acpi.
I've prohibited new files under /proc/acpi/ for quite some time now.

> > the old /proc/acpi/ interfaces with cleaner interfaces in sysfs --
> > non-ACPI-specific generic ones whenever possible.  This effort
> > is not complete, but it has been in -mm for a long time and
> > I believe that it is time to push it upstream to benefit
> > from broader exposure and testing.
> 
> Does it still include completely broken alarm interface? Can't find it
> in changelogs, so hopefully not.

No. See discussion on linux-acpi.
David Brownell's RTC driver will provide the new RTC interface in sysfs.
/proc/acpi/alarm will go away when the rest of /proc/acpi goes away.

thanks,
-Len


Re: NAK new drivers without proper power management?

2007-02-09 Thread Nigel Cunningham
Hi Dmitry!

On Fri, 2007-02-09 at 22:27 -0500, Dmitry Torokhov wrote:
> Hi Nigel,
> 
> On Friday 09 February 2007 21:05, Nigel Cunningham wrote:
> > [   17.684475] Device driver serio0 lacks bus and class support for being resumed.
> > [   17.684724] Device driver serio1 lacks bus and class support for being resumed.
> > [   17.684874] Device driver psaux lacks bus and class support for being suspended or resumed.
> > [   17.685015] Device driver serio2 lacks bus and class support for being resumed.
> > [   18.373576] Device driver serio3 lacks bus and class support for being resumed.
> > [   18.375666] Device driver serio4 lacks bus and class support for being resumed.
> > 
> 
> You should probably only warn if driver does not have resume method - not
> having suspend is quite valid if driver is able to restore state at resume
> without explicitly saving anything at suspend time.

Can do. Will do :)

Nigel



Re: [PATCH 4 of 7] lguest: Config and headers

2007-02-09 Thread James Morris
On Sat, 10 Feb 2007, Rusty Russell wrote:

> Well it was the use of get_order() which triggered Andi's alarm bells,
> so I went back to deriving it.  This code is correct, however.

+   hype_pages = alloc_pages(GFP_KERNEL|__GFP_ZERO, HYPERVISOR_MAP_ORDER);
+   if (!hype_pages)
+   return -ENOMEM;

This will try and allocate 2^16 pages.  I guess we need a 
HYPERVISOR_PAGE_ORDER ?
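The arithmetic behind this remark: alloc_pages() takes an allocation order, so an order of 16 asks for 2^16 contiguous pages, which is 256 MiB with 4 KiB pages. The helper below recomputes an order from a byte size the way get_order() does; `TOY_PAGE_SHIFT = 12` (4 KiB pages) is an assumption, and the helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Smallest order such that (1 << order) pages cover 'size' bytes (size > 0). */
static unsigned int toy_get_order(size_t size)
{
	unsigned int order = 0;
	size_t pages;

	assert(size > 0);
	pages = (size - 1) >> TOY_PAGE_SHIFT; /* pages needed beyond the first */
	while (pages) {
		order++;
		pages >>= 1;
	}
	return order;
}

/* Bytes actually requested from the page allocator at a given order. */
static size_t toy_order_bytes(unsigned int order)
{
	return (size_t)1 << (order + TOY_PAGE_SHIFT);
}
```

Deriving the order from the size of the area that really has to be allocated keeps the allocation proportional to the hypervisor image rather than to the size of the address range it is mapped into.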


- James
-- 
James Morris
<[EMAIL PROTECTED]>


Re: NAK new drivers without proper power management?

2007-02-09 Thread Matthew Garrett
On Sat, Feb 10, 2007 at 08:57:49AM +1100, Nigel Cunningham wrote:

> Can we start to NAK new drivers that don't have proper power management
> implemented? There really is no excuse for writing a new driver and not
> putting .suspend and .resume methods in anymore, is there?

The PCI layer is able to deal with drivers that have no PM methods in 
the most simple case.

-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: [GIT PATCH] ACPI patches for 2.6.21

2007-02-09 Thread Matthew Garrett
On Fri, Feb 09, 2007 at 05:24:10PM -0800, Kristen Carlson Accardi wrote:

> The user interface for the Bay driver is via sysfs - it is a platform
> driver

Though, ideally, in the long run it'll be tied into the PATA/SATA 
interface that it's associated with. That involves a little more magic, 
though :)

-- 
Matthew Garrett | [EMAIL PROTECTED]


Re: NAK new drivers without proper power management?

2007-02-09 Thread Dmitry Torokhov
Hi Nigel,

On Friday 09 February 2007 21:05, Nigel Cunningham wrote:
> [   17.684475] Device driver serio0 lacks bus and class support for being resumed.
> [   17.684724] Device driver serio1 lacks bus and class support for being resumed.
> [   17.684874] Device driver psaux lacks bus and class support for being suspended or resumed.
> [   17.685015] Device driver serio2 lacks bus and class support for being resumed.
> [   18.373576] Device driver serio3 lacks bus and class support for being resumed.
> [   18.375666] Device driver serio4 lacks bus and class support for being resumed.
> 

You should probably only warn if driver does not have resume method - not
having suspend is quite valid if driver is able to restore state at resume
without explicitly saving anything at suspend time.
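Dmitry's rule (a missing suspend hook is acceptable, a missing resume hook is not), combined with the pm_safe opt-out Nigel floats elsewhere in the thread, reduces to a small predicate. This is a hypothetical user-space model; `struct toy_pm_hooks` and `toy_should_warn()` are illustrative, not the driver core's structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified view of the bus/class PM hooks for one device. */
struct toy_pm_hooks {
	bool has_suspend;
	bool has_resume;
	bool pm_safe; /* driver explicitly declares that no PM work is needed */
};

/*
 * Warn only when a resume hook is missing and the driver has not opted out.
 * A missing suspend hook alone is fine if resume can rebuild device state
 * without anything having been saved at suspend time.
 */
static bool toy_should_warn(const char *name, const struct toy_pm_hooks *h)
{
	assert(name != NULL && h != NULL);
	if (h->pm_safe || h->has_resume)
		return false;
	printf("Device driver %s lacks bus and class support for being resumed.\n",
	       name);
	return true;
}
```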

-- 
Dmitry


Re: [ANNOUNCE] d80211 based driver for Intel PRO/Wireless 3945ABG

2007-02-09 Thread Nick Kossifidis

Over the past year we were able to make the necessary changes to the
microcode used with the 3945 such that we were able to remove the
regulatory daemon.


Great news !! Congratz ;-)


--
As you read this post global entropy rises. Have Fun ;-)
Nick


Re: NAK new drivers without proper power management?

2007-02-09 Thread Kevin Fox
On Fri, 2007-02-09 at 18:22 -0800, Lee Revell wrote:
> On 2/9/07, Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> > On Fri, 2007-02-09 at 20:59 -0500, Lee Revell wrote:
> > > On 2/9/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
> > > > I would disagree that it's a peripheral issue, it's pretty core these
> > > > days, at least for any hardware that you can stuff in a laptop (though a
> > > > fair number of desktops get suspended and resumed these days too).
> > >
> > > Servers are still the most important Linux market, and don't care
> > > about suspend/resume.  I would consider implementing suspend/resume
> > > for a driver that will only be used in server or HPC class hardware a
> > > waste of valuable development resources.
> >
> > Not necessarily. Imagine suspending to disk in order to replace a faulty
> > card. That could be way faster and less disruptive than shutting down
> > normally and losing caches and so on.
> >
> 
> Hmm.  If uptime is critical I would make sure to have redundant
> systems anyway and I would just reboot the thing.  I would not expect
> the suspend/resume paths on server class hardware like 10gig ethernet,
> Infiniband adapters, or high end SCSI to be particularly well tested.

Speaking from the HPC standpoint, we are gaining more and more nodes in
clusters as time goes on, so the potential for single failures affecting
performance is growing. A lot of the server class nodes have redundancy
such that the node slows down but does not die on failure. Unfortunately,
slowdown in a single node in a tightly coupled job can greatly affect
performance. A good example would be ECC memory. If a chip is going bad,
the machine can detect it but it will run slower until the memory is
replaced. This one node can affect thousands of other nodes in the same
job. Having a mechanism to migrate the operating system that is running
on this failing node to another node would be quite beneficial to
performance. If all drivers properly supported suspend/resume, it could
possibly be extended to support migration to another node as well. At
least for the HPC world, we'd like to see, and encourage, the hardware
you describe getting full support for suspend/resume.

Kevin

> > Irrespective of the above, servers tend not to have too much in the way
> > of hardware unique to them anyway, and even if you don't find it useful,
> > that's not to say others won't want it.
> 
> Yes but for such hardware, suspend/resume is likely to be a lot of
> work to implement, and I'd rather the developers devote those
> resources to making the driver as stable and performant as possible.
> 
> I agree 100% that drivers for desktop and laptop hardware should be
> rejected if missing suspend/resume.
> 
> Lee


[PATCH] Re: NAK new drivers without proper power management?

2007-02-09 Thread Nigel Cunningham
Hi.

On Fri, 2007-02-09 at 19:50 -0600, Robert Hancock wrote:
> It also kind of bothers me that if a driver has no suspend/resume 
> functions, and you suspend and resume the system, we don't complain 
> about it even though there's a very good chance that device is not going 
> to function properly. How about something in dmesg like:
> 
> Warning: driver for device  has no suspend or resume support.
> Device may not function properly after resume.
> 
> so that users know who to complain to. Maybe there are some devices that 
> truly don't need any handling for suspend, but if so I suspect the 
> number of those is small enough that adding empty functions would be a 
> good-enough solution.
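The check Robert proposes comes down to asking whether a driver supplies both callbacks and, if not, naming what is missing. Below is a minimal userspace sketch of that decision. `struct demo_driver`, `demo_noop_pm`, `demo_missing_pm` and `demo_report` are hypothetical names; they are not the kernel's `struct pci_driver` or `struct usb_driver` and do not reproduce the patch that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a driver's PM callbacks. */
struct demo_driver {
	const char *name;
	int (*suspend)(void *dev);
	int (*resume)(void *dev);
};

/* A deliberately empty callback, for devices that truly need no PM work. */
static int demo_noop_pm(void *dev)
{
	(void)dev;
	return 0;
}

/* Returns the phrase for a "lacks ... support" warning, or NULL when both
 * callbacks are present. */
static const char *demo_missing_pm(const struct demo_driver *drv)
{
	if (drv->suspend && drv->resume)
		return NULL;
	if (!drv->suspend && !drv->resume)
		return "suspend and resume";
	return drv->suspend ? "resume" : "suspend";
}

/* Formats the warning into buf; returns 0 and leaves buf untouched when
 * there is nothing to report. */
static int demo_report(const struct demo_driver *drv, char *buf, size_t n)
{
	const char *what = demo_missing_pm(drv);

	assert(buf != NULL && n > 0);
	if (!what)
		return 0;
	return snprintf(buf, n, "driver %s lacks %s support.", drv->name, what);
}
```

Wiring `demo_noop_pm` into both slots is the "adding empty functions" escape hatch Robert mentions; a per-device opt-out flag is the other obvious way to silence the warning for devices that need nothing.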

Here's my current version of a patch to do this, if anyone wants to try
it out. It dumps stack with the warning to make it easier to see what
the source of the message is:

 drivers/base/core.c   |   25 +
 drivers/pci/pci-driver.c  |6 ++
 drivers/usb/core/driver.c |5 +
 include/linux/device.h|1 +
 4 files changed, 37 insertions(+)
diff -ruNp 920-report-no-pm-support.patch-old/drivers/base/core.c 920-report-no-pm-support.patch-new/drivers/base/core.c
--- 920-report-no-pm-support.patch-old/drivers/base/core.c  2007-02-06 14:48:31.0 +1100
+++ 920-report-no-pm-support.patch-new/drivers/base/core.c  2007-02-10 13:36:33.0 +1100
@@ -552,6 +552,30 @@ int device_add(struct device *dev)
class_intf->add_dev(dev, class_intf);
up(&dev->class->sem);
}
+
+#ifdef CONFIG_PM
+   {
+   int nosusp = 0, nores = 0;
+
+   if (!((dev->class && dev->class->suspend) ||
+ (dev->bus && (dev->bus->suspend || dev->bus->suspend_late))))
+   nosusp = 1;
+
+   if (!((dev->class && dev->class->resume) ||
+ (dev->bus && (dev->bus->resume || dev->bus->resume_early))))
+   nores = 1;
+
+   if ((nosusp || nores) && !dev->pm_safe) {
+   printk("Device driver %s lacks bus and class support for "
+   "being %s.\n",
+   kobject_name(&dev->kobj),
+   nosusp ? (nores ? "suspended or resumed" :
+   "resumed") : "suspended");
+   dump_stack();
+   }
+   }
+#endif
+
  Done:
kfree(class_name);
put_device(dev);
@@ -851,6 +875,7 @@ struct device *device_create(struct clas
dev->class = class;
dev->parent = parent;
dev->release = device_create_release;
+   dev->pm_safe = 1;
 
va_start(args, fmt);
vsnprintf(dev->bus_id, BUS_ID_SIZE, fmt, args);
diff -ruNp 920-report-no-pm-support.patch-old/drivers/pci/pci-driver.c 920-report-no-pm-support.patch-new/drivers/pci/pci-driver.c
--- 920-report-no-pm-support.patch-old/drivers/pci/pci-driver.c 2007-02-06 14:48:44.0 +1100
+++ 920-report-no-pm-support.patch-new/drivers/pci/pci-driver.c 2007-02-10 14:00:39.0 +1100
@@ -449,6 +449,12 @@ int __pci_register_driver(struct pci_dri
if (error)
driver_unregister(&drv->driver);
 
+   if (!drv->suspend || !drv->resume)
+   printk("PCI driver %s lacks driver specific %s support.\n",
+   drv->name,
+   !drv->suspend ? (drv->resume ? "suspend" :
+   "suspend and resume") : "resume");
+
return error;
 }
 
diff -ruNp 920-report-no-pm-support.patch-old/drivers/usb/core/driver.c 920-report-no-pm-support.patch-new/drivers/usb/core/driver.c
--- 920-report-no-pm-support.patch-old/drivers/usb/core/driver.c  2007-02-06 14:48:47.0 +1100
+++ 920-report-no-pm-support.patch-new/drivers/usb/core/driver.c  2007-02-10 12:32:57.0 +1100
@@ -709,6 +709,11 @@ int usb_register_device_driver(struct us
pr_info("%s: registered new device driver %s\n",
usbcore_name, new_udriver->name);
usbfs_update_special();
+   if (!new_udriver->suspend || !new_udriver->resume)
+   printk("USB driver %s lacks %s support.\n",
+   new_udriver->name, !new_udriver->suspend ?
+   (new_udriver->resume ? "suspend" :
+"suspend and resume") : "resume");
} else {
printk(KERN_ERR "%s: error %d registering device "
"   driver %s\n",
diff -ruNp 920-report-no-pm-support.patch-old/include/linux/device.h 920-report-no-pm-support.patch-new/include/linux/device.h
--- 920-report-no-pm-support.patch-old/include/linux/device.h   2007-02-06 14:48:56.0 +1100
+++ 920-report-no-pm-support.patch-new/include/linux/device.h   2007-02-10 13:36:01.0 +1100
@@ -356,6 +356,7 @@ struct device {
struct kobject kobj;
charbus_id[BUS_ID_SIZE];/* 

Re:

2007-02-09 Thread hackmiester (Hunter Fuller)

You're doing it wrong. Please read the bottom of your emails.
On 9 February 2007, at 00:29, Priyanka Sharma wrote:


unsubscribe linux-kernel

--  
Priyanka

202.141.151.80/~priyanka
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


--
hackmiester (Hunter Fuller)

 who can help me ? i'm french and i don't know irc
 can't help you with the being french part, you are screwed there, mate





Phone
Voice: +1 251 589 6348
Fax: Call the voice number and ask.

Email
General chat: [EMAIL PROTECTED]
Large attachments: [EMAIL PROTECTED]
SPS-related stuff: [EMAIL PROTECTED]

IM
AIM: hackmiester1337
Skype: hackmiester31337
YIM: hackm1ester
Gtalk: hackmiester
MSN: [EMAIL PROTECTED]
Xfire: hackmiester








[patch 3/3] mm: fix PageUptodate memorder

2007-02-09 Thread Nick Piggin
After running SetPageUptodate, the preceding stores to the page contents that
actually bring it uptodate may not be ordered with the store that sets the page
uptodate.

Therefore, another CPU which checks PageUptodate is true, then reads the
page contents can get stale data.

Fix this by ensuring SetPageUptodate is always called with the page locked
(except in the case of a new page that cannot be visible to other CPUs), and
requiring PageUptodate be checked only when the page is locked.

To facilitate lockless checks, SetPageUptodate contains an smp_wmb to order
preceding stores before the store to page flags, and a new PageUptodate_NoLock
is introduced, which issues an smp_rmb after the page flags are loaded for the
test.
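The writer/reader pairing described above can be sketched in userspace C11, with a release fence standing in for smp_wmb() and an acquire fence for smp_rmb(). This is an analogy, not the kernel code: C11 fences are somewhat stronger than those barriers (they also order loads against stores), and `demo_page` and the `demo_*` functions are hypothetical. The single-threaded usage only exercises the API; it cannot demonstrate the reordering itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a page: contents plus an "uptodate" flag. */
struct demo_page {
	char data[64];
	atomic_bool uptodate;
};

static void demo_page_init(struct demo_page *p)
{
	memset(p->data, 0, sizeof(p->data));
	atomic_init(&p->uptodate, false);
}

/* Writer side, analogous to SetPageUptodate(): fill the contents first,
 * then a release fence (the smp_wmb() role) orders those stores before
 * the store that sets the flag. */
static void demo_set_uptodate(struct demo_page *p, const char *src)
{
	assert(src != NULL);
	strncpy(p->data, src, sizeof(p->data) - 1);
	p->data[sizeof(p->data) - 1] = '\0';
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->uptodate, true, memory_order_relaxed);
}

/* Lockless reader side, analogous to PageUptodate_NoLock(): load the flag
 * and, only if it is set, issue an acquire fence (the smp_rmb() role) so
 * later reads of the contents cannot observe stale data. */
static bool demo_uptodate_nolock(struct demo_page *p)
{
	bool ret = atomic_load_explicit(&p->uptodate, memory_order_relaxed);

	if (ret)
		atomic_thread_fence(memory_order_acquire);
	return ret;
}
```

Issuing the reader-side fence only when the flag is set mirrors the conditional smp_rmb() in the patch: a reader that sees a !uptodate page will not touch its contents, so it needs no ordering.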

I'm still not sure that a DMA memory barrier is not required, however I think
the logical place to put such a barrier would be in the IO completion routines,
when they come back to tell us that they have succeeded. (Help? Anyone?)

One thing I like about it is that it unifies the anonymous page handling
with the rest of the page management, by marking anon pages as uptodate
when they _are_ uptodate, rather than when our implementation requires
that they be marked as such. Doing this let me get rid of the smp_wmb's
in the page copying functions, which had been added specifically for
anonymous pages to handle a closely related issue, and which always
vaguely troubled me.

Convert core code and some filesystems to use PageUptodate_NoLock, just for
reference (a more complete patch follows).

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

 fs/ext2/dir.c  |2 -
 fs/namei.c |2 -
 fs/partitions/check.c  |2 -
 fs/splice.c|4 +--
 include/linux/highmem.h|4 ---
 include/linux/page-flags.h |   57 +
 mm/filemap.c   |   28 +++---
 mm/hugetlb.c   |2 +
 mm/memory.c|9 +++
 mm/page_io.c   |2 -
 mm/swap_state.c|2 -
 mm/swapfile.c  |2 -
 12 files changed, 86 insertions(+), 30 deletions(-)

Index: linux-2.6/include/linux/highmem.h
===
--- linux-2.6.orig/include/linux/highmem.h
+++ linux-2.6/include/linux/highmem.h
@@ -57,8 +57,6 @@ static inline void clear_user_highpage(s
void *addr = kmap_atomic(page, KM_USER0);
clear_user_page(addr, vaddr, page);
kunmap_atomic(addr, KM_USER0);
-   /* Make sure this page is cleared on other CPU's too before using it */
-   smp_wmb();
 }
 
 #ifndef __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE
@@ -108,8 +106,6 @@ static inline void copy_user_highpage(st
copy_user_page(vto, vfrom, vaddr, to);
kunmap_atomic(vfrom, KM_USER0);
kunmap_atomic(vto, KM_USER1);
-   /* Make sure this page is cleared on other CPU's too before using it */
-   smp_wmb();
 }
 
 #endif
Index: linux-2.6/include/linux/page-flags.h
===
--- linux-2.6.orig/include/linux/page-flags.h
+++ linux-2.6/include/linux/page-flags.h
@@ -126,16 +126,60 @@
 #define ClearPageReferenced(page)  clear_bit(PG_referenced, &(page)->flags)
 #define TestClearPageReferenced(page) test_and_clear_bit(PG_referenced, &(page)->flags)
 
-#define PageUptodate(page) test_bit(PG_uptodate, &(page)->flags)
-#ifdef CONFIG_S390
+static inline int PageUptodate(struct page *page)
+{
+   WARN_ON(!PageLocked(page));
+   return test_bit(PG_uptodate, &(page)->flags);
+}
+
+/*
+ * PageUptodate to be used when not holding the page lock.
+ */
+static inline int PageUptodate_NoLock(struct page *page)
+{
+   int ret = test_bit(PG_uptodate, &(page)->flags);
+
+   /*
+* Must ensure that the data we read out of the page is loaded
+* _after_ we've loaded page->flags and found that it is uptodate.
+* See SetPageUptodate() for the other side of the story.
+*/
+   if (ret)
+   smp_rmb();
+
+   return ret;
+}
+
 static inline void SetPageUptodate(struct page *page)
 {
+   WARN_ON(!PageLocked(page));
+#ifdef CONFIG_S390
if (!test_and_set_bit(PG_uptodate, &page->flags))
page_test_and_clear_dirty(page);
-}
 #else
-#define SetPageUptodate(page)  set_bit(PG_uptodate, &(page)->flags)
+   /*
+* Memory barrier must be issued before setting the PG_uptodate bit,
+* so all previous writes that served to bring the page uptodate are
+* visible before PageUptodate becomes true.
+*
+* S390 is guaranteed to have a barrier in the test_and_set operation
+* (see Documentation/atomic_ops.txt).
+*
+* This memory barrier should not need to provide ordering against
+* DMA writes into the page, because the IO completion should really
+* be doing that.
+*/
+   smp_wmb();
+   set_bit(PG_uptodate, &(page)->flags);
 

[PATCH] Make aout executables work again

2007-02-09 Thread Parag Warudkar


This is a reworked replacement version of the
x86-fix-vdso-mapping-for-aout-executables-* series of patches in -mm.


1) Define arch_setup_additional_pages() as weak in linux/interp.h
2) Include linux/interp.h in appropriate places
3) Conditionally call arch_setup_additional_pages() from binfmt_*.c if the 
arch defines it
4) EXPORT_SYMBOL_GPL(arch_setup_additional_pages) for all of x86{64},
powerpc and sh, since binfmt_aout can be built as a module

5) Get rid of ARCH_HAS_SETUP_ADDITIONAL_PAGES from various places
6) For x86_64 - define and export arch_setup_additional_pages as a wrapper 
over syscall32_setup_pages, call it from ia32_aout.c


Fully tested on x86 (compile, boot and run the aout binary at
http://ftp.funet.fi/pub/Linux/bin/as86.tar.Z). For other arches the changes are
minimal, but I'd still appreciate it if someone tested them.


Signed-off-by: Parag Warudkar <[EMAIL PROTECTED]>

diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/arch/i386/kernel/sysenter.c linux-2.6-wk/arch/i386/kernel/sysenter.c
--- linux-2.6-us/arch/i386/kernel/sysenter.c  2007-02-09 17:29:34.0 -0500
+++ linux-2.6-wk/arch/i386/kernel/sysenter.c  2007-02-09 17:54:48.0 -0500
@@ -137,6 +137,7 @@
up_write(>mmap_sem);
return ret;
 }
+EXPORT_SYMBOL_GPL(arch_setup_additional_pages);

 const char *arch_vma_name(struct vm_area_struct *vma)
 {
diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/arch/powerpc/kernel/vdso.c linux-2.6-wk/arch/powerpc/kernel/vdso.c
--- linux-2.6-us/arch/powerpc/kernel/vdso.c 2007-02-09 17:29:34.0 -0500
+++ linux-2.6-wk/arch/powerpc/kernel/vdso.c 2007-02-09 18:02:09.0 -0500
@@ -254,6 +254,7 @@
up_write(>mmap_sem);
return rc;
 }
+EXPORT_SYMBOL_GPL(arch_setup_additional_pages);

 const char *arch_vma_name(struct vm_area_struct *vma)
 {
diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/arch/sh/kernel/vsyscall/vsyscall.c linux-2.6-wk/arch/sh/kernel/vsyscall/vsyscall.c
--- linux-2.6-us/arch/sh/kernel/vsyscall/vsyscall.c 2007-02-09 17:29:34.0 -0500
+++ linux-2.6-wk/arch/sh/kernel/vsyscall/vsyscall.c 2007-02-09 18:02:51.0 -0500
@@ -85,6 +85,7 @@
up_write(>mmap_sem);
return ret;
 }
+EXPORT_SYMBOL_GPL(arch_setup_additional_pages);

 const char *arch_vma_name(struct vm_area_struct *vma)
 {
diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/arch/x86_64/ia32/ia32_aout.c linux-2.6-wk/arch/x86_64/ia32/ia32_aout.c
--- linux-2.6-us/arch/x86_64/ia32/ia32_aout.c   2007-01-26 18:49:37.0 -0500
+++ linux-2.6-wk/arch/x86_64/ia32/ia32_aout.c   2007-02-09 20:29:01.0 -0500
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 

@@ -410,6 +411,12 @@
send_sig(SIGKILL, current, 0);
return retval;
}
+ 
+	retval = arch_setup_additional_pages(bprm, EXSTACK_DEFAULT);
+	if (retval < 0) { 
+		send_sig(SIGKILL, current, 0); 
+		return retval;
+	}

current->mm->start_stack =
(unsigned long)create_aout_tables((char __user *)bprm->p, bprm);
diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/arch/x86_64/ia32/ia32_binfmt.c linux-2.6-wk/arch/x86_64/ia32/ia32_binfmt.c
--- linux-2.6-us/arch/x86_64/ia32/ia32_binfmt.c 2007-01-27 17:23:08.0 -0500
+++ linux-2.6-wk/arch/x86_64/ia32/ia32_binfmt.c 2007-02-09 17:58:42.0 -0500
@@ -258,10 +258,6 @@

 static void elf32_init(struct pt_regs *);

-#define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
-#define arch_setup_additional_pages syscall32_setup_pages
-extern int syscall32_setup_pages(struct linux_binprm *, int exstack);
-
 #include "../../../fs/binfmt_elf.c"

 static void elf32_init(struct pt_regs *regs)
diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/arch/x86_64/ia32/syscall32.c linux-2.6-wk/arch/x86_64/ia32/syscall32.c
--- linux-2.6-us/arch/x86_64/ia32/syscall32.c   2007-02-09 17:29:34.0 -0500
+++ linux-2.6-wk/arch/x86_64/ia32/syscall32.c   2007-02-09 18:01:23.0 -0500
@@ -48,6 +48,12 @@
return ret;
 }

+int arch_setup_additional_pages(struct linux_binprm* bprm, int exstack)
+{
+   return syscall32_setup_pages(bprm, exstack);
+}
+EXPORT_SYMBOL_GPL(arch_setup_additional_pages);
+
 const char *arch_vma_name(struct vm_area_struct *vma)
 {
if (vma->vm_start == VSYSCALL32_BASE &&
diff -urN --exclude='*git*' --exclude='scripts*' linux-2.6-us/fs/binfmt_aout.c linux-2.6-wk/fs/binfmt_aout.c
--- linux-2.6-us/fs/binfmt_aout.c   2007-01-26 18:49:39.0 -0500
+++ linux-2.6-wk/fs/binfmt_aout.c   2007-02-09 17:53:33.0 -0500
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 

@@ -445,6 +446,14 @@
send_sig(SIGKILL, current, 0);
return retval;
}
+ 
+	if (arch_setup_additional_pages) {
+		retval = arch_setup_additional_pages(bprm, EXSTACK_DEFAULT);
+		if (retval < 0) { 
+			

[patch 2/3] fs: buffer don't PageUptodate without page locked

2007-02-09 Thread Nick Piggin
__block_write_full_page is calling SetPageUptodate without the page locked.
This is unusual, but not incorrect, as PG_writeback is still set.

However the next patch will require that SetPageUptodate always be called
with the page locked. Simply don't bother setting the page uptodate in this
case (it is unusual that the write path does such a thing anyway). Instead
just leave it to the read side to bring the page uptodate when it notices
that all buffers are uptodate.
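The test the write path used to make, and which the read side is now expected to make, is a walk around the page's circular list of buffers. A simplified sketch follows; `struct demo_bh` is a hypothetical stand-in for `struct buffer_head`, with only the `b_this_page` link name borrowed from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified buffer head: the buffers attached to one page
 * form a circular singly linked list through b_this_page. */
struct demo_bh {
	bool uptodate;
	struct demo_bh *b_this_page;
};

/* Returns true only if every buffer on the page is uptodate; this is the
 * same walk the hunk below removes from __block_write_full_page(). */
static bool demo_all_buffers_uptodate(struct demo_bh *head)
{
	struct demo_bh *bh = head;

	do {
		if (!bh->uptodate)
			return false;
		bh = bh->b_this_page;
	} while (bh != head);
	return true;
}
```

The `do ... while` form guarantees the head buffer is examined even on a page with a single buffer, whose `b_this_page` points back to itself.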

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

 fs/buffer.c |   11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

Index: linux-2.6/fs/buffer.c
===
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -1698,17 +1698,8 @@ done:
 * clean.  Someone wrote them back by hand with
 * ll_rw_block/submit_bh.  A rare case.
 */
-   int uptodate = 1;
-   do {
-   if (!buffer_uptodate(bh)) {
-   uptodate = 0;
-   break;
-   }
-   bh = bh->b_this_page;
-   } while (bh != head);
-   if (uptodate)
-   SetPageUptodate(page);
end_page_writeback(page);
+
/*
 * The page and buffer_heads can be released at any time from
 * here on.


[patch 1/3] mm: make read_cache_page synchronous

2007-02-09 Thread Nick Piggin
Ensure pages are uptodate after returning from read_cache_page, which allows
us to cut out most of the filesystem-internal PageUptodate calls.

I didn't have a great look down the call chains, but this appears to fix 7
possible use-before-uptodate bugs in hfs, 2 in hfsplus, 1 in jfs, a few in
ecryptfs, 1 in jffs2, and a possible case in block2mtd where cleared data is
overwritten by readpage. All of these depend on whether the filler is async
and/or can return with a !uptodate page.
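The caller-side contract this patch aims for can be modeled in userspace: after the call, the caller holds either an uptodate page or an error pointer, so the per-filesystem `wait_on_page_locked()` plus `PageUptodate()` checks removed below become unnecessary. This is a hypothetical sketch, not the mm/filemap.c implementation: the `demo_*` names are invented, I/O completion is simulated by simply unlocking the page, and returning `-EIO` for a page that never became uptodate is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel's ERR_PTR/IS_ERR idiom. */
#define DEMO_MAX_ERRNO 4095
static inline void *demo_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long demo_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline bool demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_page {
	bool locked;   /* still under I/O */
	bool uptodate; /* contents valid */
};

/* Simulated I/O completion: the page is unlocked once the read finishes. */
static void demo_wait_on_page_locked(struct demo_page *p)
{
	p->locked = false;
}

/* Synchronous read: wait for the I/O, then hand back either an uptodate
 * page or an error pointer, never a page whose state the caller must check. */
static struct demo_page *demo_read_page_sync(struct demo_page *p)
{
	demo_wait_on_page_locked(p);
	if (!p->uptodate)
		return demo_err_ptr(-EIO);
	return p;
}
```

Callers then need only the single `IS_ERR()`-style test they already had, which is why most of the hunks below are pure deletions.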

Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

 drivers/mtd/devices/block2mtd.c |3 --
 fs/afs/dir.c|3 --
 fs/afs/mntpt.c  |   11 +++--
 fs/cramfs/inode.c   |3 +-
 fs/ecryptfs/mmap.c  |   10 
 fs/ext2/dir.c   |3 --
 fs/freevxfs/vxfs_subr.c |3 --
 fs/minix/dir.c  |1 
 fs/namei.c  |   12 --
 fs/nfs/dir.c|5 
 fs/nfs/symlink.c|6 -
 fs/ntfs/aops.h  |3 --
 fs/ntfs/attrib.c|   18 +--
 fs/ntfs/file.c  |3 --
 fs/ntfs/super.c |   30 +++--
 fs/ocfs2/symlink.c  |7 -
 fs/partitions/check.c   |3 --
 fs/reiserfs/xattr.c |4 ---
 fs/sysv/dir.c   |   10 
 fs/ufs/dir.c|6 -
 fs/ufs/util.c   |6 +
 include/linux/pagemap.h |   11 +
 mm/filemap.c|   48 ++--
 mm/swapfile.c   |3 --
 24 files changed, 68 insertions(+), 144 deletions(-)

Index: linux-2.6/fs/afs/dir.c
===
--- linux-2.6.orig/fs/afs/dir.c
+++ linux-2.6/fs/afs/dir.c
@@ -187,10 +187,7 @@ static struct page *afs_dir_get_page(str
 
page = read_mapping_page(dir->i_mapping, index, NULL);
if (!IS_ERR(page)) {
-   wait_on_page_locked(page);
kmap(page);
-   if (!PageUptodate(page))
-   goto fail;
if (!PageChecked(page))
afs_dir_check_page(dir, page);
if (PageError(page))
Index: linux-2.6/fs/afs/mntpt.c
===
--- linux-2.6.orig/fs/afs/mntpt.c
+++ linux-2.6/fs/afs/mntpt.c
@@ -77,13 +77,11 @@ int afs_mntpt_check_symlink(struct afs_v
}
 
ret = -EIO;
-   wait_on_page_locked(page);
-   buf = kmap(page);
-   if (!PageUptodate(page))
-   goto out_free;
if (PageError(page))
goto out_free;
 
+   buf = kmap(page);
+
/* examine the symlink's contents */
size = vnode->status.size;
_debug("symlink to %*.*s", size, (int) size, buf);
@@ -100,8 +98,8 @@ int afs_mntpt_check_symlink(struct afs_v
 
ret = 0;
 
- out_free:
kunmap(page);
+ out_free:
page_cache_release(page);
  out:
_leave(" = %d", ret);
@@ -184,8 +182,7 @@ static struct vfsmount *afs_mntpt_do_aut
}
 
ret = -EIO;
-   wait_on_page_locked(page);
-   if (!PageUptodate(page) || PageError(page))
+   if (PageError(page))
goto error;
 
buf = kmap(page);
Index: linux-2.6/fs/cramfs/inode.c
===
--- linux-2.6.orig/fs/cramfs/inode.c
+++ linux-2.6/fs/cramfs/inode.c
@@ -180,7 +180,8 @@ static void *cramfs_read(struct super_bl
struct page *page = NULL;
 
if (blocknr + i < devsize) {
-   page = read_mapping_page(mapping, blocknr + i, NULL);
+   page = read_mapping_page_async(mapping, blocknr + i,
+   NULL);
/* synchronous error? */
if (IS_ERR(page))
page = NULL;
Index: linux-2.6/fs/ext2/dir.c
===
--- linux-2.6.orig/fs/ext2/dir.c
+++ linux-2.6/fs/ext2/dir.c
@@ -161,10 +161,7 @@ static struct page * ext2_get_page(struc
struct address_space *mapping = dir->i_mapping;
struct page *page = read_mapping_page(mapping, n, NULL);
if (!IS_ERR(page)) {
-   wait_on_page_locked(page);
kmap(page);
-   if (!PageUptodate(page))
-   goto fail;
if (!PageChecked(page))
ext2_check_page(page);
if (PageError(page))
Index: linux-2.6/fs/freevxfs/vxfs_subr.c
===
--- linux-2.6.orig/fs/freevxfs/vxfs_subr.c
+++ linux-2.6/fs/freevxfs/vxfs_subr.c
@@ -74,10 +74,7 @@ vxfs_get_page(struct address_space *mapp
pp = 

[patch 0/3] 2.6.20 fix for PageUptodate memorder problem (try 3)

2007-02-09 Thread Nick Piggin
OK, I have got rid of SetPageUptodate_nowarn, and removed the atomic op
from SetNewPageUptodate. Made PageUptodate_NoLock only issue the memory
barrier if the page was uptodate (hopefully the compiler can thread the
branch into the caller's branch).

SetNewPageUptodate does not do the S390 page_test_and_clear_dirty, so
I'd like to make sure that's OK.

Rearranged the patch series so we don't have the first patch introducing
a lot of WARN_ONs that are solved in the next two patches (rather, solve
those issues first).

Thanks,
Nick

--
SuSE Labs



Re: NAK new drivers without proper power management?

2007-02-09 Thread Lee Revell

On 2/9/07, Nigel Cunningham <[EMAIL PROTECTED]> wrote:
> On Fri, 2007-02-09 at 20:59 -0500, Lee Revell wrote:
> > On 2/9/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
> > > I would disagree that it's a peripheral issue, it's pretty core these
> > > days, at least for any hardware that you can stuff in a laptop (though a
> > > fair number of desktops get suspended and resumed these days too).
> >
> > Servers are still the most important Linux market, and don't care
> > about suspend/resume.  I would consider implementing suspend/resume
> > for a driver that will only be used in server or HPC class hardware a
> > waste of valuable development resources.
>
> Not necessarily. Imagine suspending to disk in order to replace a faulty
> card. That could be way faster and less disruptive than shutting down
> normally and losing caches and so on.

Hmm.  If uptime is critical I would make sure to have redundant
systems anyway and I would just reboot the thing.  I would not expect
the suspend/resume paths on server class hardware like 10gig ethernet,
Infiniband adapters, or high end SCSI to be particularly well tested.

> Irrespective of the above, servers tend not to have too much in the way
> of hardware unique to them anyway, and even if you don't find it useful,
> that's not to say others won't want it.

Yes but for such hardware, suspend/resume is likely to be a lot of
work to implement, and I'd rather the developers devote those
resources to making the driver as stable and performant as possible.

I agree 100% that drivers for desktop and laptop hardware should be
rejected if missing suspend/resume.

Lee


Re: NAK new drivers without proper power management?

2007-02-09 Thread Nigel Cunningham
Hi.

On Fri, 2007-02-09 at 20:59 -0500, Lee Revell wrote:
> On 2/9/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
> > I would disagree that it's a peripheral issue, it's pretty core these
> > days, at least for any hardware that you can stuff in a laptop (though a
> > fair number of desktops get suspended and resumed these days too).
> 
> Servers are still the most important Linux market, and don't care
> about suspend/resume.  I would consider implementing suspend/resume
> for a driver that will only be used in server or HPC class hardware a
> waste of valuable development resources.

Not necessarily. Imagine suspending to disk in order to replace a faulty
card. That could be way faster and less disruptive than shutting down
normally and losing caches and so on.

Irrespective of the above, servers tend not to have too much in the way
of hardware unique to them anyway, and even if you don't find it useful,
that's not to say others won't want it.

Regards,

Nigel



Re: NAK new drivers without proper power management?

2007-02-09 Thread Nigel Cunningham
Hi.

On Fri, 2007-02-09 at 19:50 -0600, Robert Hancock wrote:
> Jeff Garzik wrote:
> > Nigel Cunningham wrote:
> >> Hi.
> >>
> >> On Fri, 2007-02-09 at 23:17 +0100, Arjan van de Ven wrote:
> >>> On Sat, 2007-02-10 at 08:57 +1100, Nigel Cunningham wrote:
>  Hi.
> 
>  I don't think this is already done (feel free to correct me if I'm
>  wrong)..
> 
>  Can we start to NAK new drivers that don't have proper power management
>  implemented? There really is no excuse for writing a new driver and not
>  putting .suspend and .resume methods in anymore, is there?
> >>>
> >>> to a large degree, a device driver that doesn't suspend is better than
> >>> no device driver at all, right?
> >>
> >> I'm not sure it is. It only makes more work for everyone else: We have
> >> to help people figure out what causes their computer to fail to resume
> >> (which can take quite a while), then get them to complain to the driver
> >> author, and the driver author has to submit patches to fix it.
> >>
> >> All of this is avoided if they'll just do it right in the first place.
> > 
> > A lot of things could have been avoided if they had just done it
> > right the first time.
> > 
> > I think it's more valuable to users to get a basic network driver that 
> > pings or a basic ATA driver that reads/writes, than peripheral issues 
> > like suspend/resume.
> > 
> > Certainly we should ask for it, but it shouldn't be a merge-stopper.
> > 
> > Jeff
> 
> I would disagree that it's a peripheral issue, it's pretty core these 
> days, at least for any hardware that you can stuff in a laptop (though a 
> fair number of desktops get suspended and resumed these days too). One 
> driver on a system which doesn't suspend or resume properly can ruin the 
> entire process, causing a ton of user frustration. Certainly I would 
> consider a driver without suspend/resume support to be incomplete.
> 
> The trouble with deferring adding this support is that it's a lot harder 
> to add this support in after the fact than if it was considered during 
> the original driver development.
> 
> I would be in favor of not merging drivers lacking suspend unless 
> there's a very good reason they're lacking it.
> 
> It also kind of bothers me that if a driver has no suspend/resume 
> functions, and you suspend and resume the system, we don't complain 
> about it even though there's a very good chance that device is not going 
> to function properly. How about something in dmesg like:
> 
> Warning: driver for device  has no suspend or resume support.
> Device may not function properly after resume.
> 
> so that users know who to complain to. Maybe there are some devices that 
> truly don't need any handling for suspend, but if so I suspect the 
> number of those is small enough that adding empty functions would be a 
> good-enough solution.

I've already made a start on doing just that. Rafael was clearly right
in asserting that some drivers would need to have warnings suppressed,
but that can be dealt with (see below).

Even if no-one wants it for vanilla, I think I'll put this in Suspend2.
It will at least help my users with debugging issues.

Regards,

Nigel

[   14.936667] Device driver platform lacks bus and class support for being suspended or resumed.
[   14.937612] Device driver vtcon0 lacks bus and class support for being suspended or resumed.
[   14.955258] Device driver pci:00 lacks bus and class support for being suspended or resumed.
[   15.004268] Device driver pnp0 lacks bus and class support for being suspended or resumed.
[   15.010618] Device driver mem lacks bus and class support for being suspended or resumed.
[   15.010779] Device driver kmem lacks bus and class support for being suspended or resumed.
[   15.010932] Device driver null lacks bus and class support for being suspended or resumed.
[   15.011090] Device driver port lacks bus and class support for being suspended or resumed.
[   15.011248] Device driver zero lacks bus and class support for being suspended or resumed.
[   15.011414] Device driver full lacks bus and class support for being suspended or resumed.
[   15.011566] Device driver random lacks bus and class support for being suspended or resumed.
[   15.011723] Device driver urandom lacks bus and class support for being suspended or resumed.
[   15.011875] Device driver kmsg lacks bus and class support for being suspended or resumed.
[   15.305495] Device driver mcelog lacks bus and class support for being suspended or resumed.
[   15.305688] Device driver msr0 lacks bus and class support for being suspended or resumed.
[   15.306571] Device driver snapshot lacks bus and class support for being suspended or resumed.
[   15.359006] Device driver fb0 lacks bus and class support for being suspended or resumed.
[   15.359471] Device driver vtcon1 lacks bus and class support for being suspended or resumed.
[   15.455642] Device driver tty lacks bus and class support 

Re: NAK new drivers without proper power management?

2007-02-09 Thread Lee Revell

On 2/9/07, Robert Hancock <[EMAIL PROTECTED]> wrote:
> I would disagree that it's a peripheral issue, it's pretty core these
> days, at least for any hardware that you can stuff in a laptop (though a
> fair number of desktops get suspended and resumed these days too).

Servers are still the most important Linux market, and don't care
about suspend/resume.  I would consider implementing suspend/resume
for a driver that will only be used in server or HPC class hardware a
waste of valuable development resources.

Lee


Re: [GIT PATCH] ACPI patches for 2.6.21

2007-02-09 Thread Henrique de Moraes Holschuh
On Fri, 09 Feb 2007, Pavel Machek wrote:
> Not including another /proc/acpi/ibm -like nightmare, is it?

Don't worry, I am already on my way to kill /proc/acpi/ibm... :-)

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


Re: -mm merge plans for 2.6.21

2007-02-09 Thread Oleg Verych
> From: Russell King
> Newsgroups: gmane.linux.kernel
> Subject: Re: -mm merge plans for 2.6.21
> Date: Fri, 9 Feb 2007 22:03:27 +
[]
> However:
>
> sys_foo(int a, int c, unsigned long long b, unsigned long long d)
>
> is entirely reasonable and leaves us with spare room for one additional
> 32-bit arg to be passed.
>
>> Is that actually written anywhere, and does anyone bother to check?
>
> Mostly mailing list archives I'd guess.  As far as anyone bothering
> to check, that's me when I'm aware of new syscalls... which typically
> happens a long time after the syscalls have been introduced on x86
> etc.

Why not have a "largest argument first" rule here?

  sys_bar(largest,..., larger,..., smaller,..., small);

Put it in Documentation/ABI/README, and only bother when the compiler
barks on the -mm tree.
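Why argument order matters can be checked with a small register-counting model. The model is an assumption, based on the ARM EABI rule that a 64-bit argument must start in an even-numbered 32-bit register (wasting one register as padding otherwise); `demo_regs_used` is a hypothetical helper, not part of any kernel or toolchain API.

```c
#include <assert.h>

/* Counts the 32-bit argument registers consumed by a list of argument
 * sizes (4 or 8 bytes), padding each 64-bit argument up to an even
 * register number before placing it in a register pair. */
static int demo_regs_used(const int *sizes, int n)
{
	int reg = 0;

	for (int i = 0; i < n; i++) {
		assert(sizes[i] == 4 || sizes[i] == 8);
		if (sizes[i] == 8) {
			reg += reg & 1; /* pad up to an even register */
			reg += 2;
		} else {
			reg += 1;
		}
	}
	return reg;
}
```

Under this model Russell's `(int a, int c, u64 b, u64 d)` and the largest-first `(u64, u64, int, int)` both use 6 registers, while the interleaved `(int, u64, int, u64)` needs 8 because of two padding slots. Assuming seven syscall argument registers (r0 to r6 on ARM), the first two orderings leave room for one more 32-bit argument and the interleaved one does not fit.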




Re: [PATCH] kfifo: overflow of unsigned integer

2007-02-09 Thread Cong WANG

2007/2/9, Andrew Morton <[EMAIL PROTECTED]>:

On Thu, 8 Feb 2007 20:16:55 +0800 "Cong WANG" <[EMAIL PROTECTED]> wrote:

> 2007/2/8, Andrew Morton <[EMAIL PROTECTED]>:
> > On Thu, 8 Feb 2007 17:07:28 +0800 "Cong WANG" <[EMAIL PROTECTED]> wrote:
> >
> > > Kfifo is a ring-buffer in kernel which can be used as a lock-free way
> > > for concurrent read/write when there are only one producer and one
> > > consumer. Details of its design can be found in kernel/kfifo.c and
> > > include/linux/kfifo.h.
> > >
> > > You will find that the 'in' and 'out' fields of 'struct kfifo' are
> > > both represented as 'unsigned int' and in most cases 'in' is larger
> > > than 'out' and their difference will NOT be  over 'size'.
> > >
> > > Now the problem is that 'in' will be *smaller* than 'out' when 'in'
> > > overflows and 'out' doesn't (Yes, this may occur quietly.). This is
> > > NOT what we expect, though it may not cause any serious problems if we
> > > carefully use kfifo*() functions. And this is really a bug.
> >
> > You seem to be saying that it's not a bug, but it's a bug.
> >
> > Exactly what goes wrong?
>
> I wrote a module on my machine to test this bug. And when the overflow
> occurs, I can't put any data into the fifo even though it is not
> full.

Why did you remove the mailing list?  Please don't do that.


Sorry. I used the poor 'reply'.



I can't find any bug.

I converted the code so that it'll run in userspace:

http://userweb.kernel.org/~akpm/kfifo.c
http://userweb.kernel.org/~akpm/kfifo.h

Please see if you can reproduce the problem with that setup and then let's
see if we can understand what's going on, and fix it.




Thanks for your work. And you are right.

I think the OLD /proc API which I used in my module confused me.
I got completely lost in it. The OLD /proc API is very bad, isn't it?

BTW, can you tell me which method you use to exchange information
between user space and kernel space when debugging the kernel?

Thanks again! And have a nice day!
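The wraparound concern from this thread can be illustrated with a simplified model of the index fields (not the kernel's kfifo code; the struct and helper names are hypothetical). With 'in' and 'out' as free-running unsigned counters and a power-of-two size, the fill level is computed as in - out, and unsigned subtraction is defined modulo UINT_MAX + 1, so the result stays correct even when 'in' has wrapped and is numerically smaller than 'out'.

```c
#include <assert.h>
#include <limits.h>

/* Simplified model: free-running unsigned indices, power-of-two size. */
struct fifo_idx {
	unsigned int in;	/* total bytes ever put */
	unsigned int out;	/* total bytes ever taken */
	unsigned int size;	/* buffer size, a power of two */
};

/* Bytes currently stored. Unsigned arithmetic wraps modulo
 * UINT_MAX + 1, so this is correct after 'in' wraps past zero
 * while 'out' has not yet wrapped. */
static unsigned int fifo_len(const struct fifo_idx *f)
{
	return f->in - f->out;
}

/* Free space left in the buffer. */
static unsigned int fifo_avail(const struct fifo_idx *f)
{
	return f->size - fifo_len(f);
}

/* Offset inside the buffer; masking only works for power-of-two sizes. */
static unsigned int fifo_pos(unsigned int idx, unsigned int size)
{
	assert((size & (size - 1)) == 0);
	return idx & (size - 1);
}
```

For example, with in = 5 (already wrapped) and out = UINT_MAX - 2, in < out holds, yet fifo_len() still reports 8 bytes stored.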


Re: NAK new drivers without proper power management?

2007-02-09 Thread Robert Hancock

Jeff Garzik wrote:

Nigel Cunningham wrote:

Hi.

On Fri, 2007-02-09 at 23:17 +0100, Arjan van de Ven wrote:

On Sat, 2007-02-10 at 08:57 +1100, Nigel Cunningham wrote:

Hi.

I don't think this is already done (feel free to correct me if I'm
wrong)..

Can we start to NAK new drivers that don't have proper power management
implemented? There really is no excuse for writing a new driver and not
putting .suspend and .resume methods in anymore, is there?


to a large degree, a device driver that doesn't suspend is better than
no device driver at all, right?


I'm not sure it is. It only makes more work for everyone else: We have
to help people figure out what causes their computer to fail to resume
(which can take quite a while), then get them to complain to the driver
author, and the driver author has to submit patches to fix it.

All of this is avoided if they'll just do it right in the first place.


A lot of things could have been avoided if they had just done it
right the first time.


I think it's more valuable to users to get a basic network driver that 
pings or a basic ATA driver that reads/writes, than peripheral issues 
like suspend/resume.


Certainly we should ask for it, but it shouldn't be a merge-stopper.

Jeff


I would disagree that it's a peripheral issue, it's pretty core these 
days, at least for any hardware that you can stuff in a laptop (though a 
fair number of desktops get suspended and resumed these days too). One 
driver on a system which doesn't suspend or resume properly can ruin the 
entire process, causing a ton of user frustration. Certainly I would 
consider a driver without suspend/resume support to be incomplete.


The trouble with deferring adding this support is that it's a lot harder 
to add this support in after the fact than if it was considered during 
the original driver development.


I would be in favor of not merging drivers lacking suspend unless 
there's a very good reason they're lacking it.


It also kind of bothers me that if a driver has no suspend/resume 
functions, and you suspend and resume the system, we don't complain 
about it even though there's a very good chance that device is not going 
to function properly. How about something in dmesg like:


Warning: driver for device  has no suspend or resume support.
Device may not function properly after resume.

so that users know who to complain to. Maybe there are some devices that 
truly don't need any handling for suspend, but if so I suspect the 
number of those is small enough that adding empty functions would be a 
good-enough solution.
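Robert's proposed warning can be modeled in a few lines. This is a hypothetical userspace sketch, not driver-core code: struct pm_driver and warn_missing_pm() are invented names, and a driver is assumed to lack support if either callback is missing. pm_noop() shows the "empty functions" escape hatch for devices that truly need no handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified driver descriptor. */
struct pm_driver {
	const char *name;
	int (*suspend)(void);
	int (*resume)(void);
};

/* Print the proposed warning for each driver lacking suspend or resume
 * support; return how many drivers were flagged. */
static int warn_missing_pm(const struct pm_driver *drv, size_t n)
{
	int flagged = 0;

	assert(drv != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (drv[i].suspend && drv[i].resume)
			continue;
		printf("Warning: driver %s has no suspend or resume support.\n"
		       "Device may not function properly after resume.\n",
		       drv[i].name);
		flagged++;
	}
	return flagged;
}

/* Empty callback for devices that genuinely need no suspend handling. */
static int pm_noop(void)
{
	return 0;
}
```

A driver that fills both slots with pm_noop() is silent, which matches the idea that the few devices needing nothing can say so explicitly.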


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: [patch 3/3] ext2: use perform_write aop

2007-02-09 Thread Andrew Morton
On Sat, 10 Feb 2007 02:34:07 +0100
Nick Piggin <[EMAIL PROTECTED]> wrote:

> On Fri, Feb 09, 2007 at 11:45:39AM -0800, Andrew Morton wrote:
> > On Fri, 9 Feb 2007 11:14:55 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:
> > 
> > > If so, that might be preventable by leaving the buffer nonuptodate.
> > 
> > oh, OK, it was buffer_new(), so zeroes are the right thing for a reader to
> > see.
> > 
> > But if it wasn't buffer_new() then the appropriate thing for the reader to
> > see is what's on the disk.  But __block_prepare_write() won't read a buffer
> > which is fully-inside the write area from disk.
> > 
> > And that's seemingly OK, because if a reader gets in there after the short
> > copy, that reader will see the non-uptodate buffer and will populate it
> > from disk.
> > 
> > But doing that will overwrite the data which the write() caller managed to
> > copy into the page before it took a fault.  And that's not OK because
> > block_perform_write() does iovec_iterator_advance(i, copied) in this case
> > and hence will not rerun the copy after acquiring the page lock?
> 
> Hmm, yeah. This can be handled by not advancing partially into a !uptodate
> buffer.

Think so, yeah.

Overall, the implementation you have there seems reasonable to me. 
Basically it's passing the responsibility for preventing the deadlock and
the exposure-of-zeroes problem down into the filesystem itself, where we
have visibility of the state of the various subsections of the page and can
take appropriate actions in response to that.

It's got conceptually harder to follow as a result, which is a shame.  But
still no magic bullet is on offer.

I pity the poor schmuck who has to write ext3_journalled_perform_write(),
ext3_ordered_perform_write(), ext3_writeback_perform_write(),
ext3_writeback_nobh_perform_write() and all that other stuff.  But I think
we need to do that pretty soon to validate the whole approach.  Also xfs
and reiser3.

NTFS will be interesting from the can-this-be-made-to-work POV.

Is NFS vulnerable to the deadlock?  It looks to be.  Shudder.

We'd need to find a way of communicating all this to the poor old
fs maintainers.



[PATCH -mm] readahead: partial sendfile fix

2007-02-09 Thread Fengguang Wu
Enable readahead to handle partially done read requests, e.g.

sendfile(188, 1921, [1478592], 19553028) = 37440
sendfile(188, 1921, [1516032], 19515588) = 28800
sendfile(188, 1921, [1544832], 19486788) = 37440
sendfile(188, 1921, [1582272], 19449348) = 14400
sendfile(188, 1921, [1596672], 19434948) = 37440
sendfile(188, 1921, [1634112], 19397508) = 37440

In the above strace log,
- a lighttpd process is doing _sequential_ reading
- every sendfile() returns with only _partial_ work done

page_cache_readahead() expects that if it returns @next_index, it will be
called exactly at @next_index next time. That's not true here. So the pattern
will be falsely recognized as a random read trace.

Also documented in "Linux AIO Performance and Robustness for Enterprise
Workloads" section 3.5:

  sendfile(fd, 0, 2GB, fd2) = 8192,
tells readahead about up to 128KB of the read
  sendfile(fd, 8192, 2GB - 8192, fd2) = 8192,
tells readahead about 8KB - 132KB of the read
  sendfile(fd, 16384, 2GB - 16384, fd2) = 8192,
tells readahead about 16KB-140KB of the read
   ...
This confuses the readahead logic about the I/O pattern which appears
to be 0-128K, 8K-132K, 16K-140K instead of clear sequentiality from
0-2GB that is really appropriate.

Retry-based AIO shares the same read pattern and readahead problem.
In this case, simply disabling readahead on restarted aio is not a good option:
we still need to call into readahead in the rare case of (req_size > ra_max).

Signed-off-by: Fengguang Wu <[EMAIL PROTECTED]>
---
 mm/filemap.c   |3 ---
 mm/readahead.c |9 +
 2 files changed, 9 insertions(+), 3 deletions(-)

--- linux-2.6.20-rc6-mm3.orig/mm/readahead.c
+++ linux-2.6.20-rc6-mm3/mm/readahead.c
@@ -581,6 +581,15 @@ page_cache_readahead(struct address_spac
int sequential;
 
/*
+* A previous read request is partially completed,
+* causing the retried/continued read calls into us prematurely.
+*/
+   if (ra->start < offset &&
+   offset < ra->prev_page &&
+   ra->prev_page < ra->ahead_start + ra->ahead_size)
+   goto out;
+
+   /*
 * We avoid doing extra work and bogusly perturbing the readahead
 * window expansion logic.
 */
--- linux-2.6.20-rc6-mm3.orig/mm/filemap.c
+++ linux-2.6.20-rc6-mm3/mm/filemap.c
@@ -915,9 +915,6 @@ void do_generic_mapping_read(struct addr
if (!isize)
goto out;
 
-   if (unlikely(aio_restarted()))
-   next_index = last_index; /* Avoid repeat readahead */
-
end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
for (;;) {
struct page *page;


Re: [patch 3/3] ext2: use perform_write aop

2007-02-09 Thread Nick Piggin
On Fri, Feb 09, 2007 at 11:45:39AM -0800, Andrew Morton wrote:
> On Fri, 9 Feb 2007 11:14:55 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > If so, that might be preventable by leaving the buffer nonuptodate.
> 
> oh, OK, it was buffer_new(), so zeroes are the right thing for a reader to
> see.
> 
> But if it wasn't buffer_new() then the appropriate thing for the reader to
> see is what's on the disk.  But __block_prepare_write() won't read a buffer
> which is fully-inside the write area from disk.
> 
> And that's seemingly OK, because if a reader gets in there after the short
> copy, that reader will see the non-uptodate buffer and will populate it
> from disk.
> 
> But doing that will overwrite the data which the write() caller managed to
> copy into the page before it took a fault.  And that's not OK because
> block_perform_write() does iovec_iterator_advance(i, copied) in this case
> and hence will not rerun the copy after acquiring the page lock?

Hmm, yeah. This can be handled by not advancing partially into a !uptodate
buffer.
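Nick's suggestion, not advancing partially into a !uptodate buffer, can be expressed as a small calculation. This is a hypothetical userspace model rather than block_perform_write() itself: it assumes the page is split into equal-sized buffers with a per-buffer uptodate flag, and rounds a short copy back to the start of a partially written, non-uptodate buffer, so that part of the copy is redone after the fault is resolved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: a write starts at byte 'pos' within a page and
 * only 'copied' bytes arrived before the user copy faulted. Return how
 * many bytes are safe to commit and advance the iterator by.
 * uptodate[i] says whether buffer i of the page holds valid data. */
static size_t safe_copied(size_t pos, size_t copied, size_t blocksize,
			  const bool *uptodate)
{
	size_t end = pos + copied;
	size_t blk = end / blocksize;

	assert(blocksize > 0);
	/* The copy stopped partway into a buffer that is not uptodate. A
	 * reader could fill that buffer from disk and overwrite the partial
	 * data, so back off to the start of that buffer (never before pos). */
	if (end % blocksize != 0 && !uptodate[blk]) {
		end = blk * blocksize;
		if (end < pos)
			end = pos;
	}
	return end - pos;
}
```

A copy that ends exactly on a buffer boundary, or inside an uptodate buffer, is committed in full; only the partial tail in a non-uptodate buffer is discarded.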


Re: sata_nv - ADMA issues with 2.6.20

2007-02-09 Thread Robert Hancock

David R wrote:

I've just upgraded my home server to 2.6.20. It's got an Athlon64 on an ASUS
nForce-4 motherboard running a 32 bit kernel. I've had to fall back to using
sata_nv.adma=0 on the kernel command line. One of the NCQ capable drives
repeatedly produced the following errors. There wasn't much disk IO going on
at the time. It's perfectly happy now with ADMA disabled. The strange thing is that
the other, identical drive on ata8 showed no problems (they're both part of a
software RAID1).

Some clues follow.

Cheers
David


Feb  9 18:40:27 server kernel: ata7: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000 status 0x400
Feb  9 18:40:27 server kernel: ata7: CPB 0: ctl_flags 0x1f, resp_flags 0x0
Feb  9 18:40:27 server kernel: ata7: CPB 1: ctl_flags 0x1f, resp_flags 0x1
Feb  9 18:40:27 server kernel: ata7: CPB 2: ctl_flags 0x1f, resp_flags 0x1
Feb  9 18:40:27 server kernel: ata7: CPB 3: ctl_flags 0x1f, resp_flags 0x1

etc etc..

Feb  9 18:40:29 server kernel: ata7: CPB 27: ctl_flags 0x1f, resp_flags 0x1
Feb  9 18:40:29 server kernel: ata7: CPB 28: ctl_flags 0x1f, resp_flags 0x1
Feb  9 18:40:29 server kernel: ata7: CPB 29: ctl_flags 0x1f, resp_flags 0x1
Feb  9 18:40:29 server kernel: ata7: CPB 30: ctl_flags 0x1f, resp_flags 0x1
Feb  9 18:40:29 server kernel: ata7: Resetting port
Feb  9 18:40:29 server kernel: ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
Feb  9 18:40:29 server kernel: ata7.00: cmd 61/08:00:1f:e4:50/00:00:09:00:00/40 tag 0 cdb 0x0 data 4096 out
Feb  9 18:40:29 server kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)


So it was tag 0 that timed out, and according to the CPBs the
controller indeed believes the command is still outstanding, i.e. we
didn't lose an interrupt. I'm suspicious of the fact that only one of
two identical drives produced this error; some kind of hardware-related
problem, perhaps? 30 seconds is an awfully long time for a drive to take
to finish a command.
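The reading above relies on the SAct field, which libata prints as a bitmask of active NCQ tags (bit n set means tag n is outstanding), so "SAct 0x1" means only tag 0 was in flight. A small hypothetical decoder makes that explicit:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode an NCQ SActive-style bitmask into the list
 * of outstanding command tags (NCQ allows tags 0-31). Writes at most
 * 'max' tags and returns how many were written. */
static size_t sact_tags(unsigned int sact, unsigned int *tags, size_t max)
{
	size_t n = 0;

	assert(tags != NULL || max == 0);
	for (unsigned int tag = 0; tag < 32 && n < max; tag++)
		if (sact & (1u << tag))
			tags[n++] = tag;
	return n;
}
```

For the log above, sact_tags(0x1, ...) yields the single tag 0, matching "tag 0" on the cmd line.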


You can also try disabling NCQ without disabling ADMA and see what that 
does:


echo 1 > /sys/block/sdX/device/queue_depth

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: -mm merge plans for 2.6.21

2007-02-09 Thread Andrew Morton
On Sat, 10 Feb 2007 02:15:11 +0100
Carl-Daniel Hailfinger <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> > On Fri, 9 Feb 2007 19:37:53 +
> > Alan <[EMAIL PROTECTED]> wrote:
> > 
> >> Please just push the EDAC K8 stuff.
> > 
> > OK.
> > 
> >> Andi will say "no" from now until the
> >> end of time, but end users want it, distributions want it, and Andi is
> >> not the EDAC maintainer so should consider himself overruled on what
> >> isn't a technical issue but a personal political viewpoint.
> > 
> > I'll just tell him I sent it by accident.
> 
> Could you please merge ACPI-DSDT-in-initrd for the same reasons?
> 

I don't know what that is.


arch/arm: typos in KERN_ERR, KERN_INFO

2007-02-09 Thread Nicolas Kaiser
Typos in KERN_ERR, KERN_INFO.

Signed-off-by: Nicolas Kaiser <[EMAIL PROTECTED]>
---

 arch/arm/mach-imx/dma.c   |2 +-
 arch/arm/mach-s3c2410/pm-simtec.c |2 +-
 arch/arm/plat-omap/dma.c  |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff -ur a/arch/arm/mach-imx/dma.c b/arch/arm/mach-imx/dma.c
--- a/arch/arm/mach-imx/dma.c   2006-11-29 22:57:37.0 +0100
+++ b/arch/arm/mach-imx/dma.c   2007-02-09 23:42:15.0 +0100
@@ -234,7 +234,7 @@
imxdma->resbytes = dma_length;
 
if (!sg || !sgcount) {
-   printk(KERN_ERR "imxdma%d: imx_dma_setup_sg epty sg list\n",
+   printk(KERN_ERR "imxdma%d: imx_dma_setup_sg empty sg list\n",
   dma_ch);
return -EINVAL;
}
diff -ur a/arch/arm/mach-s3c2410/pm-simtec.c b/arch/arm/mach-s3c2410/pm-simtec.c
--- a/arch/arm/mach-s3c2410/pm-simtec.c 2007-01-21 15:40:56.0 +0100
+++ b/arch/arm/mach-s3c2410/pm-simtec.c 2007-02-09 23:40:46.0 +0100
@@ -52,7 +52,7 @@
!machine_is_aml_m5900())
return 0;
 
-   printk(KERN_INFO "Simtec Board Power Manangement" COPYRIGHT "\n");
+   printk(KERN_INFO "Simtec Board Power Management" COPYRIGHT "\n");
 
gstatus4  = (__raw_readl(S3C2410_BANKCON7) & 0x3) << 30;
gstatus4 |= (__raw_readl(S3C2410_BANKCON6) & 0x3) << 28;
diff -ur a/arch/arm/plat-omap/dma.c b/arch/arm/plat-omap/dma.c
--- a/arch/arm/plat-omap/dma.c  2006-11-29 22:57:37.0 +0100
+++ b/arch/arm/plat-omap/dma.c  2007-02-09 23:39:56.0 +0100
@@ -1053,7 +1053,7 @@
 void omap_set_lcd_dma_b1_vxres(unsigned long vxres)
 {
if (omap_dma_in_1510_mode()) {
-   printk(KERN_ERR "DMA virtual resulotion is not supported "
+   printk(KERN_ERR "DMA virtual resolution is not supported "
"in 1510 mode\n");
BUG();
}


Re: [GIT PATCH] ACPI patches for 2.6.21

2007-02-09 Thread Kristen Carlson Accardi
On Fri, 9 Feb 2007 23:09:29 +
Pavel Machek <[EMAIL PROTECTED]> wrote:

> Hi!
> 
> > Per your request, and the request of the distros, we've changed
> > how ACPICA Core releases are integrated into Linux so that each
> > upstream (CVS) check-in appears as a single git commit.
> > While this process is not yet perfect, it should be vastly better
> > than previous "code drops" in allowing git bisect to work,
> > and allowing distros to cherry-pick individual fixes.
> > 
> > The "bay" driver is new (and marked EXPERIMENTAL) -- adding initial
> > hot-plug support for ACPI controlled drive bays such as the
> > IBM ultrabay or the Dell Module Bay.
> 
> Could you describe the userland interface it uses? /proc? Will it be
> usable for bays on notebooks not using acpi?

The user interface for the Bay driver is via sysfs - it is a platform
driver, so once you load it you will find 2 files created under
/sys/devices/platform/bay.X, "eject" and "present".  When the user
writes 1 to the "eject" file, the driver will call the ACPI eject
routine - this normally blinks leds and does whatever the system vendor
thinks is necessary to safely eject the device.  The "present" file
will query the driver to determine if the device is present or not (note,
not good for poll(), it's on my todo list...).  Depending on the system
implementation, when the user presses the eject button on the laptop for
the bay device, the driver will inform user space via a CHANGE uevent.  User
space is then responsible for doing whatever needs to be done to cleanup
and safely eject the drive, the driver will not call the ACPI eject
routine without user space initiation.  The driver currently only handles
module bays that use ACPI to send eject notifications or need "something"
done before ejecting (i.e. _EJ0 in ACPI).  The bay driver will also register 
with the dock driver if the bay is on the dock device (such
as with the IBM X60) so that when the dock station is ejected, the bay
driver is notified with the eject request as well.  This notification will
be passed to user space via the CHANGE uevent.
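A minimal userspace sketch of driving this interface, assuming the device directory is something like /sys/devices/platform/bay.0 (the instance number varies) and that "present" reads back as an integer, nonzero when a device is in the bay. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Ask the bay driver to run the ACPI eject routine by writing 1 to
 * <dir>/eject. Returns 0 on success, -1 on error. */
static int bay_eject(const char *dir)
{
	char path[256];
	FILE *f;

	assert(dir != NULL);
	snprintf(path, sizeof(path), "%s/eject", dir);
	f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fputs("1", f) >= 0;
	return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Read <dir>/present; assumed to hold an integer. Returns the value,
 * or -1 on error. */
static int bay_present(const char *dir)
{
	char path[256];
	int val = -1;
	FILE *f;

	assert(dir != NULL);
	snprintf(path, sizeof(path), "%s/present", dir);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

Since the driver only reports the eject button through a CHANGE uevent, a real tool would typically wait for that event, unmount and clean up, and only then call something like bay_eject().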

Kristen


Re: Strange message in log upon resuming a PCIe system

2007-02-09 Thread Robert Hancock

Larry Finger wrote:

A bcm43xx user is having problems with suspend/resume on a PCIe system. This
may be the first time we have tried to resume with PCIe. The problem occurs
someplace within the initialization of the bcm43xx chip and we are still
tracing it; however, there are some strange messages in the log from the
pnp, namely:

pnp: Device 00:04 does not support activation.
pnp: Device 00:05 does not support activation.

How does one trace back these device numbers? The output of 'lspci -v' shows
the following:


Those aren't PCI devices, they're PnP devices, likely on the 
motherboard. If you look in sysfs (not booted into Linux right now so I 
can't tell you exactly where) you can get some idea of what those are. 
In any case I think those messages are harmless and unrelated to any
bcm43xx problems.
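To see what a PnP device such as 00:04 is, one place to look is its "id" attribute under /sys/bus/pnp/devices/ (that path is an assumption here, since the reply above does not give it). A minimal hypothetical reader; the base directory is a parameter only so the sketch can be exercised against a scratch directory:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the first PnP ID listed in <base>/<dev>/id, e.g. base
 * "/sys/bus/pnp/devices" and dev "00:04". Returns 0 on success. */
static int pnp_first_id(const char *base, const char *dev,
			char *id, size_t idlen)
{
	char path[256];
	FILE *f;

	assert(idlen > 0);
	snprintf(path, sizeof(path), "%s/%s/id", base, dev);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(id, (int)idlen, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	id[strcspn(id, "\n")] = '\0';	/* strip the trailing newline */
	return 0;
}
```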


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



Re: -mm merge plans for 2.6.21

2007-02-09 Thread Carl-Daniel Hailfinger
Andrew Morton wrote:
> On Fri, 9 Feb 2007 19:37:53 +
> Alan <[EMAIL PROTECTED]> wrote:
> 
>> Please just push the EDAC K8 stuff.
> 
> OK.
> 
>> Andi will say "no" from now until the
>> end of time, but end users want it, distributions want it, and Andi is
>> not the EDAC maintainer so should consider himself overruled on what
>> isn't a technical issue but a personal political viewpoint.
> 
> I'll just tell him I sent it by accident.

Could you please merge ACPI-DSDT-in-initrd for the same reasons?

Regards,
Carl-Daniel


[git patches] libata updates 1 of 3

2007-02-09 Thread Jeff Garzik

(just sent this upstream to Andrew and Linus)

This is libata push 1 of 3.  This is largely the "accumulated driver
updates" push: lots of minor changes.  A few new drivers.

The most notable thing is "devres", an optional subsystem for drivers
that greatly simplifies the task of driver housekeeping, if you have
to acquire+map then later unmap+free a bunch of MMIO resources,
some PIO resources, an IRQ (or two or three) like we do with ATA
host controllers.  devres is only used by libata drivers at the moment,
but the APIs are generic enough to be used by any driver.  This should
enable the elimination of several highly common code patterns in various
drivers.
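The core idea behind devres, recording each acquired resource together with its release action so teardown happens automatically, can be sketched in userspace. This is a hypothetical model: dev_add_resource(), dev_release_all() and the structs are invented for illustration and are not the devres API documented in Documentation/driver-model/devres.txt. Releasing newest-first is the assumed ordering.

```c
#include <assert.h>
#include <stdlib.h>

struct res_node {
	void (*release)(void *data);
	void *data;
	struct res_node *next;
};

struct managed_dev {
	struct res_node *head;	/* most recently added resource first */
};

/* Record a resource and how to release it. Returns 0 or -1. */
static int dev_add_resource(struct managed_dev *dev,
			    void (*release)(void *), void *data)
{
	struct res_node *node;

	assert(release != NULL);
	node = malloc(sizeof(*node));
	if (!node)
		return -1;
	node->release = release;
	node->data = data;
	node->next = dev->head;
	dev->head = node;
	return 0;
}

/* Release everything, newest first, e.g. an IRQ before the MMIO
 * mapping it depends on. */
static void dev_release_all(struct managed_dev *dev)
{
	while (dev->head) {
		struct res_node *node = dev->head;

		dev->head = node->next;
		node->release(node->data);
		free(node);
	}
}

/* Demo-only callback: logs the order in which resources are released. */
static int release_log[8];
static int release_count;

static void record_release(void *data)
{
	if (release_count < 8)
		release_log[release_count++] = *(const int *)data;
}
```

A driver's error paths and remove() then shrink to a single dev_release_all() call, which is the kind of repeated cleanup pattern the mail expects devres to eliminate.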

devres, in turn, has enabled us to finally merge the patches that convert
libata to using the lib/iomap.c stuff.  Anyone with eyes can see the
code savings in libata-sff that iomap brings.

Kudos to Tejun Heo for the devres work.

I will be pushing ACPI support on Saturday or Sunday, in order to stage
it into a separate 2.6.20-gitX snapshot.  That's libata push 2 of 3.

The third push will eliminate the ugly split-driver configuration
created by quirk_intel_ide_combined() and request_resource(),
whereby libata claims one half of a controller (SATA), and old-IDE
claims the other half (PATA).  libata wins the battle for DMA and IRQ,
and so old-IDE (PATA) is driven via the slower PIO data xfer methods.

This was necessary at the time, as libata lacked ATAPI support and old-IDE
failed to handle IRQ storms created by newer Intel IDE irq-ack behavior.
But times have changed, and neither condition remains true. So we can
remove the hacks.

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git upstream-linus

to receive the following updates:

 Documentation/driver-model/devres.txt |  268 +++
 drivers/ata/Kconfig   |   41 ++-
 drivers/ata/Makefile  |3 +
 drivers/ata/ahci.c|  236 +++---
 drivers/ata/ata_generic.c |8 +-
 drivers/ata/ata_piix.c|   56 ++--
 drivers/ata/libata-core.c |  591 -
 drivers/ata/libata-eh.c   |7 +-
 drivers/ata/libata-scsi.c |   98 +++--
 drivers/ata/libata-sff.c  |  641 ---
 drivers/ata/libata.h  |4 +-
 drivers/ata/pata_ali.c|   32 +-
 drivers/ata/pata_amd.c|   36 +-
 drivers/ata/pata_artop.c  |   12 +-
 drivers/ata/pata_atiixp.c |6 +-
 drivers/ata/pata_cmd64x.c |   18 +-
 drivers/ata/pata_cs5520.c |   41 ++-
 drivers/ata/pata_cs5530.c |   41 +-
 drivers/ata/pata_cs5535.c |6 +-
 drivers/ata/pata_cypress.c|6 +-
 drivers/ata/pata_efar.c   |6 +-
 drivers/ata/pata_hpt366.c |   26 +-
 drivers/ata/pata_hpt37x.c |   61 +--
 drivers/ata/pata_hpt3x2n.c|   26 +-
 drivers/ata/pata_hpt3x3.c |8 +-
 drivers/ata/pata_isapnp.c |   21 +-
 drivers/ata/pata_it8213.c |  354 +++
 drivers/ata/pata_it821x.c |   58 +--
 drivers/ata/pata_ixp4xx_cf.c  |   50 +--
 drivers/ata/pata_jmicron.c|8 +-
 drivers/ata/pata_legacy.c |  166 
 drivers/ata/pata_marvell.c|   12 +-
 drivers/ata/pata_mpc52xx.c|  538 +++
 drivers/ata/pata_mpiix.c  |  113 ++---
 drivers/ata/pata_netcell.c|6 +-
 drivers/ata/pata_ns87410.c|6 +-
 drivers/ata/pata_oldpiix.c|   24 +-
 drivers/ata/pata_opti.c   |   24 +-
 drivers/ata/pata_optidma.c|   40 +-
 drivers/ata/pata_pcmcia.c |   27 +-
 drivers/ata/pata_pdc2027x.c   |  122 ++---
 drivers/ata/pata_pdc202xx_old.c   |   41 +-
 drivers/ata/pata_platform.c   |   67 +---
 drivers/ata/pata_qdi.c|   50 ++-
 drivers/ata/pata_radisys.c|6 +-
 drivers/ata/pata_rz1000.c |6 +-
 drivers/ata/pata_sc1200.c |6 +-
 drivers/ata/pata_serverworks.c|   31 +-
 drivers/ata/pata_sil680.c |8 +-
 drivers/ata/pata_sis.c|   70 +++-
 drivers/ata/pata_sl82c105.c   |   10 +-
 drivers/ata/pata_triflex.c|6 +-
 drivers/ata/pata_via.c|   22 +-
 drivers/ata/pata_winbond.c|   49 ++-
 drivers/ata/pdc_adma.c|  120 ++
 drivers/ata/sata_inic162x.c   |  781 +
 drivers/ata/sata_mv.c |  200 +++--
 drivers/ata/sata_nv.c |  629 ---
 drivers/ata/sata_promise.c|  379 +++-
 drivers/ata/sata_qstor.c  |  138 ++
 drivers/ata/sata_sil.c|   99 ++---
 

Re: [ipw3945-devel] [ANNOUNCE] d80211 based driver for Intel PRO/Wireless 3945ABG

2007-02-09 Thread Norbert Preining
Hi all!

On Fre, 09 Feb 2007, James Ketrenos wrote:
> We are pleased to announce the availability of a new driver for the 
> Intel PRO/Wireless 3945ABG Network Connection adapter.  This new driver 

I am impressed: I had 2.6.20 running with ipw3945 + wpa_supplicant. I
installed the d80211 system and the new driver, rebooted, and you won't
believe it, I had network connection even with WEP encryption.

That was a big surprise for me that it worked out of the box without any
magic.

If you are interested in any dmesg/log output, please let me know.


Ahhh ... one thing: The LED on my Acer Laptop (TM3012) does not show up,
maybe because I have CONFIG_D80211_LEDS off?

Again, thanks a lot for the good work!

Norbert

---
Dr. Norbert Preining <[EMAIL PROTECTED]>Università di Siena
Debian Developer <[EMAIL PROTECTED]> Debian TeX Group
gpg DSA: 0x09C5B094  fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
---
HARPENDEN (n.)
The coda to a phone conversation, consisting of about eight exchanges,
by which people try gracefully to get off the line.
--- Douglas Adams, The Meaning of Liff


Re: + smaps-add-clear_refs-file-to-clear-reference.patch added to -mm tree

2007-02-09 Thread David Rientjes
Do not clear references via /proc/pid/clear_refs when the task_struct's
mm is NULL.

Also, take mmap_sem, since the mm_struct's VMAs are being iterated in
fs/proc/task_mmu.c.

Reported by Oleg Nesterov <[EMAIL PROTECTED]>.

Signed-off-by: David Rientjes <[EMAIL PROTECTED]>
---
 fs/proc/base.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -719,6 +719,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
 {
struct task_struct *task;
+   struct mm_struct *mm;
char buffer[PROC_NUMBUF], *end;
 
memset(buffer, 0, sizeof(buffer));
@@ -733,7 +734,13 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
task = get_proc_task(file->f_path.dentry->d_inode);
if (!task)
return -ESRCH;
-   clear_refs_smap(task->mm->mmap);
+   mm = get_task_mm(task);
+   if (mm) {
+   down_read(&mm->mmap_sem);
+   clear_refs_smap(mm->mmap);
+   up_read(&mm->mmap_sem);
+   mmput(mm);
+   }
put_task_struct(task);
if (end - buffer == 0)
return -EIO;


Re: + smaps-add-clear_refs-file-to-clear-reference.patch added to -mm tree

2007-02-09 Thread Andrew Morton
On Sat, 10 Feb 2007 03:39:58 +0300
Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> David Rientjes wrote:
> >
> > +static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> > +   size_t count, loff_t *ppos)
> > +{
> > ...
> > +   task = get_proc_task(file->f_path.dentry->d_inode);
> > +   if (!task)
> > +   return -ESRCH;
> > +   clear_refs_smap(task->mm->mmap);
> 
> task->mm may be NULL and not stable, this needs get_task_mm() (may fail).

yup.

> Don't we also need ->mmap_sem to iterate vmas?

and yup.

Like this?

--- a/fs/proc/base.c~smaps-add-clear_refs-file-to-clear-reference-fix
+++ a/fs/proc/base.c
@@ -720,6 +720,7 @@ static ssize_t clear_refs_write(struct f
 {
struct task_struct *task;
char buffer[PROC_NUMBUF], *end;
+   struct mm_struct *mm;
 
memset(buffer, 0, sizeof(buffer));
if (count > sizeof(buffer) - 1)
@@ -733,7 +734,11 @@ static ssize_t clear_refs_write(struct f
task = get_proc_task(file->f_path.dentry->d_inode);
if (!task)
return -ESRCH;
-   clear_refs_smap(task->mm->mmap);
+   mm = get_task_mm(task);
+   if (mm) {
+   clear_refs_smap(mm);
+   mmput(mm);
+   }
put_task_struct(task);
if (end - buffer == 0)
return -EIO;
diff -puN fs/proc/task_mmu.c~smaps-add-clear_refs-file-to-clear-reference-fix fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~smaps-add-clear_refs-file-to-clear-reference-fix
+++ a/fs/proc/task_mmu.c
@@ -350,11 +350,15 @@ static int show_smap(struct seq_file *m,
return show_map_internal(m, v, );
 }
 
-void clear_refs_smap(struct vm_area_struct *vma)
+void clear_refs_smap(struct mm_struct *mm)
 {
-   for (; vma; vma = vma->vm_next)
+   struct vm_area_struct *vma;
+
+   down_read(&mm->mmap_sem);
+   for (vma = mm->mmap; vma; vma = vma->vm_next)
if (vma->vm_mm && !is_vm_hugetlb_page(vma))
for_each_pmd(vma, clear_refs_one_pmd, NULL);
+   up_read(&mm->mmap_sem);
 }
 
 static void *m_start(struct seq_file *m, loff_t *pos)
diff -puN include/linux/proc_fs.h~smaps-add-clear_refs-file-to-clear-reference-fix include/linux/proc_fs.h
--- a/include/linux/proc_fs.h~smaps-add-clear_refs-file-to-clear-reference-fix
+++ a/include/linux/proc_fs.h
@@ -104,7 +104,7 @@ int proc_pid_readdir(struct file * filp,
 unsigned long task_vsize(struct mm_struct *);
 int task_statm(struct mm_struct *, int *, int *, int *, int *);
 char *task_mem(struct mm_struct *, char *);
-void clear_refs_smap(struct vm_area_struct *);
+void clear_refs_smap(struct mm_struct *mm);
 
 extern struct proc_dir_entry *create_proc_entry(const char *name, mode_t mode,
struct proc_dir_entry *parent);
_



PROBLEM: xt_state compiles without errors but cannot be loaded

2007-02-09 Thread Administrative Services
[1.] module: xt_state compiles without errors but cannot be loaded

[2.] Here's what shows up in /var/log/messages:

kernel: xt_state: Unknown symbol nf_conntrack_untracked

kernel: xt_state: Unknown symbol nf_ct_l3proto_module_put

kernel: xt_state: disagrees about version of symbol xt_unregister_matches

kernel: xt_state: Unknown symbol xt_unregister_matches

kernel: xt_state: Unknown symbol nf_ct_l3proto_try_module_get

kernel: xt_state: disagrees about version of symbol xt_register_matches

kernel: xt_state: Unknown symbol xt_register_matches


[3.] modules, netfilter:

[4.] 2.6.20:

[5.] 2.6.19.2 possibly 2.6.9.3:

[7.] try loading module using insmod

[8.] CentOS 4.4
[8.1.] apf 0.9.6 (modified to use xt_state), and insmod



[ANNOUNCE] sparse-0.2-cl2 is now available

2007-02-09 Thread Christopher Li

Temporarily at:
http://userweb.kernel.org/~chrisl/sparse-0.2-cl2

Will appear later at:

http://ftp.kernel.org//pub/linux/kernel/people/chrisl/patches/sparse/sparse-0.2-cl2/


I have been playing with sparse to add more Stanford-checker-style
checking. The paper is "Checking System Rules Using System-
Specific, Programmer-Written Compiler Extensions" by Dawson Engler
et al.

Unlike the Stanford checker and smatch, this checker works at
the linearization level instead of the AST level. Linearized code
can be very convenient (when it works) for tracing data flow because
pseudos are in SSA form, and there is a def/use chain that avoids
scanning every instruction.

I will take the malloc checker as an example to explain how the
checker works. Checking usually happens in three steps:

The first step is scanning the linearized instructions, looking for
relevant operations. For the malloc checker, the task is to find
the malloc/free function calls and the uses of malloced pointers.

The second step is converting the relevant operations into checker
instructions. The checker instructions are a simplification of the whole
program, containing only the operations relevant to the checker.

The third step is executing the checker instructions. It tries to execute
every possible execution flow in the function. The execution engine
lets the checker instructions perform state changes.

Thanks to step two, the size and complexity of the program have been
greatly reduced.
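The three steps above can be illustrated with a toy model of steps two and three. This is a hypothetical sketch, not sparse's code: it assumes step one has already produced a tiny, acyclic list of checker instructions for a single pointer, it follows only the non-NULL side of a NULL check, and it forks at every branch so that each execution path is walked with its own copy of the pointer state.

```c
#include <stdio.h>

/* Hypothetical checker instructions: what is left of a function after
 * step two has dropped everything unrelated to malloc/free. */
enum op { OP_MALLOC, OP_NULL_CHECK, OP_USE, OP_FREE, OP_BRANCH, OP_RET };

struct insn {
	enum op op;
	int target_a, target_b;	/* successors, used by OP_BRANCH only */
};

/* Abstract state of the single tracked pointer. */
enum state { ST_NONE, ST_MAYBE_NULL, ST_NONNULL, ST_FREED };

/* Execute every path starting at 'pc' with pointer state 'st' and
 * return the number of problems found along all of them.
 * Assumes the instruction list has no loops. */
static int run(const struct insn *prog, int pc, enum state st)
{
	int errors = 0;

	for (;;) {
		const struct insn *in = &prog[pc];

		switch (in->op) {
		case OP_MALLOC:
			st = ST_MAYBE_NULL;	/* malloc() may return NULL */
			break;
		case OP_NULL_CHECK:
			/* Model only the "pointer is non-NULL" side. */
			if (st == ST_MAYBE_NULL)
				st = ST_NONNULL;
			break;
		case OP_USE:
			if (st == ST_MAYBE_NULL) {
				printf("insn %d: possible NULL dereference\n", pc);
				errors++;
			} else if (st == ST_FREED) {
				printf("insn %d: use after free\n", pc);
				errors++;
			}
			break;
		case OP_FREE:
			if (st == ST_FREED) {
				printf("insn %d: double free\n", pc);
				errors++;
			}
			st = ST_FREED;
			break;
		case OP_BRANCH:
			/* Fork: explore both successors with a copy of the state. */
			return errors + run(prog, in->target_a, st)
				      + run(prog, in->target_b, st);
		case OP_RET:
			return errors;
		}
		pc++;
	}
}

/* Check one function's checker instructions; returns the error count. */
int check_function(const struct insn *prog)
{
	return run(prog, 0, ST_NONE);
}
```

Forking at branches keeps the per-path state trivially correct at the cost of exponential work on deeply nested branches, which is one reason shrinking the program in step two matters.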

The new checking is very fast; it adds only a few seconds to a make C=1
run.

Again, comments and feedback are always welcome.

Chris

Change log in sparse-0.2-cl2:
 - add pointer signedness fix
 - add spinlock checking

Change log in sparse-0.2-cl1:
  The most interesting part is the inline function annotation.
  The new checker can detect inlined function usage. The interrupt
  checker no longer depends on x86 asm instructions.


origin.patch
    006eff06c7adcfb0d06c6fadf6e9b64f0488b2bf URL: git://git.kernel.org/pub/scm/linux/kernel/git/josh/sparse.git
incompatible-ptr-signess
    Bug fix for pointer modifiers inherited at function degeneration.
sizeof-incomplete
    Fix double semicolon in struct declaration.
anon-symbol
    Fix core dump on anonymous symbols.
instruction-buffer-size
    Fix core dump on huge switch statements.
debug-checker
    Add a debug option for showing the linearized instructions.
no-dead-instruction
    Disable liveness "dead" instructions by default.
ptr-allocator
    Make the ptrlist use the sparse allocator.
annotate-inline-2
    Add annotation for inline function calls.
malloc-checker
    Add the malloc NULL pointer checker.
interrupt-checker
    Add the interrupt checker.
spinlock-checker
    Add the spinlock checker.


Total 12 patches




Re: [PATCH] fix quadratic behavior of shrink_dcache_parent()

2007-02-09 Thread Andrew Morton
On Fri, 9 Feb 2007 19:23:31 -0500
"Russ Cox" <[EMAIL PROTECTED]> wrote:

> > "The file system mounted on /tmp/z in the example contains 2^50
> > directories".   heh.
> >
> > I do wonder how realistic this problem is in real life.
> 
> That's a fair concern, although I was trying this as part
> of evaluating how much someone could hose a system
> if we let them mount arbitrary FUSE servers.  And the
> answer is: they could make it completely unusable,
> requiring reboot.
> 
> I ran a later test that printed how deep it got into
> the file tree and it was only a few hundred thousand
> if I recall correctly.  A determined attacker might even
> manage to do this in a normal file system.
> 
> But sure, it's not a common case.  ;-)

Well that's a good point - sometimes people do crazy things on purpose.  We
were all University students once ;) 

The patches look nice and as I said, potentially of some use for memory
reclaim.  But I hope that someone who has worked on dcache.c more recently
than I has time to apply a toothcomb to this work.



Re: + smaps-add-clear_refs-file-to-clear-reference.patch added to -mm tree

2007-02-09 Thread Oleg Nesterov
David Rientjes wrote:
>
> +static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> ...
> + task = get_proc_task(file->f_path.dentry->d_inode);
> + if (!task)
> + return -ESRCH;
> + clear_refs_smap(task->mm->mmap);

task->mm may be NULL and not stable, this needs get_task_mm() (may fail).

Don't we also need ->mmap_sem to iterate vmas?
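Oleg's two points can be illustrated with a userspace model of the order of operations: pin the mm (which may fail), hold mmap_sem for reading while walking the VMA list, then release both. Everything below is a hypothetical stand-in; the structs and the model_* functions only mimic the roles of the kernel's mm_struct, get_task_mm(), mmput() and mmap_sem, and the lock is reduced to a reader counter so the invariants can be asserted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct vma  { struct vma *next; int referenced; };
struct mm   { int users; int readers; struct vma *mmap; };
struct task { struct mm *mm; };	/* mm is NULL for kernel threads or exited tasks */

/* Model of get_task_mm(): take a reference on the mm, or fail. */
static struct mm *model_get_task_mm(struct task *t)
{
	struct mm *mm = t->mm;

	if (mm)
		mm->users++;
	return mm;
}

/* Model of mmput(): drop the reference taken above. */
static void model_mmput(struct mm *mm)
{
	assert(mm->users > 0);
	mm->users--;
}

/* Model of clear_refs_smap(mm): the caller must hold the lock. */
static void model_clear_refs(struct mm *mm)
{
	assert(mm->readers > 0);
	for (struct vma *v = mm->mmap; v; v = v->next)
		v->referenced = 0;
}

/* Returns 0 on success, -1 where the kernel code would return an error. */
int model_clear_refs_write(struct task *t)
{
	struct mm *mm = model_get_task_mm(t);

	if (!mm)
		return -1;
	mm->readers++;		/* stands for down_read(&mm->mmap_sem) */
	model_clear_refs(mm);
	mm->readers--;		/* stands for up_read(&mm->mmap_sem) */
	model_mmput(mm);
	return 0;
}
```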

Oleg.



Re: [PATCH] fbdev driver for S3 Trio/Virge, updated

2007-02-09 Thread Jiri Slaby

Ondrej Zajicek wrote:

This patch adds a driver for S3 Trio / S3 Virge. The driver is tested
with most versions of S3 Trio and S3 Virge, on i386.
It is tested both compiled-in and as a module. It is against
linux-2.6.20.

This is version 3. There are some minor modifications from version 2
(mostly coding style cleanups).


Signed-off-by: Ondrej Zajicek <[EMAIL PROTECTED]>

---


[...]

+/* PCI probe */
+
+static int __devinit s3_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
+{
+   struct fb_info *info;
+   struct s3fb_info *par;
+   int rc;
+   u8 regval, cr38, cr39;
+
+   /* Ignore secondary VGA device because there is no VGA arbitration */
+   if (! svga_primary_device(dev)) {
+   dev_info(&(dev->dev), "ignoring secondary device\n");
+   return -ENODEV;
+   }
+
+   /* Allocate and fill driver data structure */
+   info = framebuffer_alloc(sizeof(struct s3fb_info), NULL);
+   if (!info) {
+   dev_err(&(dev->dev), "cannot allocate memory\n");
+   return -ENOMEM;
+   }
+
+   par = info->par;
+   mutex_init(&par->open_lock);
+
+   info->flags = FBINFO_PARTIAL_PAN_OK | FBINFO_HWACCEL_YPAN;
+   info->fbops = _ops;
+
+   /* Prepare PCI device */
+   rc = pci_enable_device(dev);
+   if (rc < 0) {
+   dev_err(&(dev->dev), "cannot enable PCI device\n");
+   goto err_enable_device;
+   }
+
+   rc = pci_request_regions(dev, "s3fb");
+   if (rc < 0) {
+   dev_err(&(dev->dev), "cannot reserve framebuffer region\n");
+   goto err_request_regions;
+   }
+
+
+   info->fix.smem_start = pci_resource_start(dev, 0);
+   info->fix.smem_len = pci_resource_len(dev, 0);
+
+   /* Map physical IO memory address into kernel space */
+   info->screen_base = pci_iomap(dev, 0, 0);
+   if (! info->screen_base) {
+   rc = -ENOMEM;
+   dev_err(&(dev->dev), "iomap for framebuffer failed\n");
+   goto err_iomap;
+   }
+
+   /* Unlock regs */
+   cr38 = vga_rcrt(NULL, 0x38);
+   cr39 = vga_rcrt(NULL, 0x39);
+   vga_wseq(NULL, 0x08, 0x06);
+   vga_wcrt(NULL, 0x38, 0x48);
+   vga_wcrt(NULL, 0x39, 0xA5);
+
+   /* Find how many physical memory there is on card */
+   /* 0x36 register is accessible even if other registers are locked */
+   regval = vga_rcrt(NULL, 0x36);
+   info->screen_size = s3_memsizes[regval >> 5] << 10;
+   info->fix.smem_len = info->screen_size;
+
+   par->chip = id->driver_data & CHIP_MASK;
+   par->rev = vga_rcrt(NULL, 0x2f);
+   if (par->chip & CHIP_UNDECIDED_FLAG)
+   par->chip = s3_identification(par->chip);
+
+   /* Find MCLK frequency */
+   regval = vga_rseq(NULL, 0x10);
+   par->mclk_freq = ((vga_rseq(NULL, 0x11) + 2) * 14318) / ((regval & 0x1F) + 2);
+   par->mclk_freq = par->mclk_freq >> (regval >> 5);
+
+   /* Restore locks */
+   vga_wcrt(NULL, 0x38, cr38);
+   vga_wcrt(NULL, 0x39, cr39);
+
+   strcpy(info->fix.id, s3_names [par->chip]);
+   info->fix.mmio_start = 0;
+   info->fix.mmio_len = 0;
+   info->fix.type = FB_TYPE_PACKED_PIXELS;
+   info->fix.visual = FB_VISUAL_PSEUDOCOLOR;
+   info->fix.ypanstep = 0;
+   info->fix.accel = FB_ACCEL_NONE;
+   info->pseudo_palette = (void*) (par->pseudo_palette);
+
+   /* Prepare startup mode */
+   rc = fb_find_mode(&(info->var), info, mode, NULL, 0, NULL, 8);
+   if (! ((rc == 1) || (rc == 2))) {
+   rc = -EINVAL;
+   dev_err(&(dev->dev), "mode %s not found\n", mode);
+   goto err_find_mode;
+   }
+
+   rc = fb_alloc_cmap(&info->cmap, 256, 0);
+   if (rc < 0) {
+   dev_err(&(dev->dev), "cannot allocate colormap\n");
+   goto err_alloc_cmap;
+   }
+
+   rc = register_framebuffer(info);
+   if (rc < 0) {
+   dev_err(&(dev->dev), "cannot register framebugger\n");


Bugger :DD LOL? Buffer?


+   goto err_reg_fb;
+   }
+
+   printk(KERN_INFO "fb%d: %s on %s, %d MB RAM, %d MHz MCLK\n", info->node, info->fix.id,
+pci_name(dev), info->fix.smem_len >> 20, (par->mclk_freq + 500) / 1000);
+
+   if (par->chip == CHIP_UNKNOWN)
+   printk(KERN_INFO "fb%d: unknown chip, CR2D=%x, CR2E=%x, CRT2F=%x, CRT30=%x\n",
+   info->node, vga_rcrt(NULL, 0x2d), vga_rcrt(NULL, 0x2e),
+   vga_rcrt(NULL, 0x2f), vga_rcrt(NULL, 0x30));


dev_info x 2, but it's a dite.


+
+   /* Record a reference to the driver data */
+   pci_set_drvdata(dev, info);
+
+#ifdef CONFIG_MTRR
+   if (mtrr) {
+   par->mtrr_reg = -1;
+   par->mtrr_reg = mtrr_add(info->fix.smem_start, info->fix.smem_len, MTRR_TYPE_WRCOMB, 1);
+   }
+#endif
+
+   return 0;
+
+   /* Error handling */
+err_reg_fb:
+   

Re: [PATCH 21/22] honor r/w changes at do_remount() time

2007-02-09 Thread Anton Altaparmakov

On 9 Feb 2007, at 23:22, Andrew Morton wrote:

On Fri, 09 Feb 2007 14:53:44 -0800
Dave Hansen <[EMAIL PROTECTED]> wrote:


This is the core of the read-only bind mount patch set.


Who wants read-only bind mounts, and for what reason?


On our local mirror server (mirrors just under 3TiB worth of stuff)  
we hold all data on r/w mounted storage in a private location in the  
file tree.  (Note the server runs Solaris 10 not Linux or the  
following would not be possible at present...)


We then bind mount (i.e. loopback mount on Solaris) various  
directories from inside the private paths to various other locations  
so for example we create /export/ftp/pub/* where "*" are directories  
we want to export via FTP and we do all of those as read-only bind  
mounts.  This gives us that little bit of extra confidence that no- 
one from the outside can cause any writes to happen to our mirrored  
data.  We do similar for NFS by creating lots of read-only bind  
mounts in /* that again point into the private locations.


It would be nice if the Linux box that we have that is a copy/backup  
of the Solaris box could do the same rather than have all the bind  
mounts be read-write because we need the storage in the private  
locations to be writable.


Best regards,

Anton


Re: NAK new drivers without proper power management?

2007-02-09 Thread Jeff Garzik

Nigel Cunningham wrote:

Hi.

On Fri, 2007-02-09 at 23:17 +0100, Arjan van de Ven wrote:

On Sat, 2007-02-10 at 08:57 +1100, Nigel Cunningham wrote:

Hi.

I don't think this is already done (feel free to correct me if I'm
wrong)..

Can we start to NAK new drivers that don't have proper power management
implemented? There really is no excuse for writing a new driver and not
putting .suspend and .resume methods in anymore, is there?


to a large degree, a device driver that doesn't suspend is better than
no device driver at all, right?


I'm not sure it is. It only makes more work for everyone else: We have
to help people figure out what causes their computer to fail to resume
(which can take quite a while), then get them to complain to the driver
author, and the driver author has to submit patches to fix it.

All of this is avoided if they'll just do it right in the first place.


A lot of things could have been avoided if they had just done it
right the first time.


I think it's more valuable to users to get a basic network driver that 
pings or a basic ATA driver that reads/writes, than peripheral issues 
like suspend/resume.


Certainly we should ask for it, but it shouldn't be a merge-stopper.

Jeff





Fix null pointer dereference in appledisplay driver

2007-02-09 Thread Michael Hanselmann
Commit 40b20c257a13c5a526ac540bc5e43d0fdf29792a by Len Brown introduced
a null pointer dereference in the appledisplay driver. This patch fixes
it.

Signed-off-by: Michael Hanselmann <[EMAIL PROTECTED]>

---
I suggest adding this to 2.6.20.1 because this bug causes the kernel to
panic on boot when the driver is compiled in.
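The failure mode this fix addresses can be modeled in userspace. The structures and model_register() below are hypothetical and only mimic the shape of the registration call in the diff, assuming (as the fix implies) that the backlight callbacks later fetch the private pointer passed at registration and dereference it. Registering with NULL, as the old code did, would make the first callback dereference a NULL pointer; passing pdata hands it the driver state.

```c
#include <stddef.h>

/* Hypothetical driver state, standing in for the driver's pdata. */
struct display_priv { int brightness; };

/* Hypothetical registered device that keeps an opaque devdata pointer. */
struct model_bl_device {
	void *devdata;
	int (*get_brightness)(struct model_bl_device *bd);
};

/* The callback trusts devdata to point at the driver's state. */
static int display_get_brightness(struct model_bl_device *bd)
{
	struct display_priv *p = bd->devdata;

	return p->brightness;	/* NULL dereference if devdata was NULL */
}

/* Model of registration: stores whatever devdata the caller passes. */
void model_register(struct model_bl_device *bd, void *devdata)
{
	bd->devdata = devdata;
	bd->get_brightness = display_get_brightness;
}
```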

diff -Nrup --exclude-from linux-exclude-from linux-2.6.20.orig/drivers/usb/misc/appledisplay.c linux-2.6.20/drivers/usb/misc/appledisplay.c
--- linux-2.6.20.orig/drivers/usb/misc/appledisplay.c	2007-02-09 22:35:56.0 +0100
+++ linux-2.6.20/drivers/usb/misc/appledisplay.c	2007-02-10 01:00:28.0 +0100
@@ -281,8 +281,8 @@ static int appledisplay_probe(struct usb
/* Register backlight device */
snprintf(bl_name, sizeof(bl_name), "appledisplay%d",
atomic_inc_return(_displays) - 1);
-   pdata->bd = backlight_device_register(bl_name, NULL, NULL,
-   _bl_data);
+   pdata->bd = backlight_device_register(bl_name, NULL,
+   pdata, _bl_data);
if (IS_ERR(pdata->bd)) {
err("appledisplay: Backlight registration failed");
goto error;


Re: [PATCH] fix quadratic behavior of shrink_dcache_parent()

2007-02-09 Thread Russ Cox

"The file system mounted on /tmp/z in the example contains 2^50
directories".   heh.

I do wonder how realistic this problem is in real life.


That's a fair concern, although I was trying this as part
of evaluating how much someone could hose a system
if we let them mount arbitrary FUSE servers.  And the
answer is: they could make it completely unusable,
requiring reboot.

I ran a later test that printed how deep it got into
the file tree and it was only a few hundred thousand
if I recall correctly.  A determined attacker might even
manage to do this in a normal file system.

But sure, it's not a common case.  ;-)

Russ


Re: [PATCH 0 of 4] Generic AIO by scheduling stacks

2007-02-09 Thread Linus Torvalds


On Sat, 10 Feb 2007, Eric Dumazet wrote:
> 
> Well, I guess if the original program was mono-threaded, and syscall used
> fget_light(), we might have a problem here if the child tries a close(). So you
> may have to disable fget_light() magic if async call is the originator of the
> syscall.

Yes. All the issues that I already brought up with Zach's patches are 
still there. This doesn't really change any of them. Any optimization that 
checks for "am I single-threaded" will need to be aware of pending and 
running async things.

With my patch, any _running_ async things will always be seen as normal 
clones, but the pending ones won't. So you'd need to effectively change 
anything that looks like

if (atomic_read(&current->mm->count) == 1)
.. do some simplified version ..

into

if (!current->async_cookie && atomic_read(..) == 1)
.. do the simplified thing ..

to make it safe.

I think we only do it for fget_light and some VM TLB simplification, so it 
shouldn't be a big burden to check.
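The change Linus describes can be modeled in a few lines. The two structures are hypothetical stand-ins for the task and its mm: only async_cookie is a name taken from the message, and users plays the role of the mm's user count. The point is that a task with a pending async call must not take the single-threaded shortcut even though its mm still reports a single user.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the task and its address space. */
struct model_mm   { int users; };
struct model_task { struct model_mm *mm; unsigned long async_cookie; };

/* The existing test: "am I the only user of this mm?" */
bool single_threaded_old(const struct model_task *t)
{
	return t->mm->users == 1;
}

/* The suggested test: a pending async call may later turn into a clone
 * sharing the mm, so it disables the shortcut as well. */
bool may_take_shortcut(const struct model_task *t)
{
	return !t->async_cookie && t->mm->users == 1;
}
```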

Side note: the real issues still remain. The interfaces, and the 
performance testing.

Linus


Re: NAK new drivers without proper power management?

2007-02-09 Thread Rafael J. Wysocki
On Saturday, 10 February 2007 00:28, Nigel Cunningham wrote:
> Hi.
> 
> On Sat, 2007-02-10 at 00:12 +0100, Rafael J. Wysocki wrote:
> > > > I think if CONFIG_PM_DEBUG is set, the core should warn about drivers not
> > > > having .suspend or .resume routines.
> > > 
> > > The only problem with that is, not everyone turns on CONFIG_PM_DEBUG.
> > > CONFIG_PM instead?
> > 
> > Well, I can imagine a driver that doesn't need a .suspend routine, for example,
> > and I don't think we should make the kernel always complain about that.
> 
> How about...
> 
> #ifdef CONFIG_PM_PARANOIA
> static int empty_suspend_routine(struct device *dev, pm_message_t state)
> {
>   return 0;
> }
> #define empty_suspend empty_suspend_routine
> #else
> #define empty_suspend NULL
> #endif
> 
> ...
> 
>   .suspend = empty_suspend,
> ...
> 
> 
> Then CONFIG_PM_PARANOIA can be enabled by default for now, and when we
> eventually decide it's not needed anymore, someone can submit a patch
> either turning off the option by default or removing the whole
> mechanism.

I think that would tempt people to abuse it, for example by defining or
undefining things just to quiet the warning.

In my opinion the only way to make the warning go away should be to define
a non-NULL .suspend (.resume) routine and that's why I don't think the warning
should be mandatory.
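Rafael's suggestion, a warning that is compiled in only with CONFIG_PM_DEBUG and that goes away only when a non-NULL routine is supplied, can be sketched as follows. The descriptor, warn_missing_pm() and model_register_driver() are hypothetical userspace stand-ins, not the driver core's API; where such a check would live in the kernel is an assumption.

```c
#include <stdio.h>

/* Hypothetical, cut-down driver descriptor with optional PM callbacks. */
struct model_driver {
	const char *name;
	int (*suspend)(void *dev, int state);
	int (*resume)(void *dev);
};

/* Print one warning per missing callback and return how many are missing.
 * Only a non-NULL routine silences the warning. */
int warn_missing_pm(const struct model_driver *drv)
{
	int missing = 0;

	if (!drv->suspend) {
		fprintf(stderr, "%s: no .suspend routine\n", drv->name);
		missing++;
	}
	if (!drv->resume) {
		fprintf(stderr, "%s: no .resume routine\n", drv->name);
		missing++;
	}
	return missing;
}

/* Model of a registration hook: the check is built in only when the
 * CONFIG_PM_DEBUG macro is defined, so it is not mandatory. */
int model_register_driver(const struct model_driver *drv)
{
#ifdef CONFIG_PM_DEBUG
	warn_missing_pm(drv);
#endif
	(void)drv;
	return 0;
}
```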

> > I think if someone doesn't set CONFIG_PM_DEBUG, we can ask him to set it
> > and report back.
> 
> We can, but the whole point to the suggestion was to make your life and
> mine easier, as well as those of our users.
> 
> Making it dependent on CONFIG_PM instead achieves that by:
> - Saving you, me and distro people from having to tell their users to
> enable the option (and how to)

I think the distro people can patch their kernels to fit their needs.

> - Saving the users the problem of going through all the steps, making
> mistakes, potentially ending up with unbootable systems because they
> make mistakes and so on.
> 
> This way, they just need to look in dmesg.

Well, IMO, if someone doesn't know how to compile and install the kernel,
he'll be using a distro kernel anyway and then see above.  Otherwise we can
safely ask him to turn on whatever debugging options we need.

Greetings,
Rafael


-- 
If you don't have the time to read,
you don't have the time or the tools to write.
- Stephen King


Re: [PATCH 02/22] r/o bind mounts: add vfsmount writer counts

2007-02-09 Thread Dave Hansen
On Sat, 2007-02-10 at 00:41 +0100, Eric Dumazet wrote:
> Dave, please read again this comment in struct vfsmount definition.
> 
> If I understand your infrastructure, mnt_writers is going to be frequently
> modified, so it should be placed at the end of struct vfsmount, in the same
> cache line as mnt_count.

That's an excellent point, thanks for catching it.  Here's an updated
patch.

-- Dave

This patch actually adds the mount and superblock writer
counts, and the mnt_want/drop_write() functions that use
them.

Before these can become useful, we must first cover each
place in the VFS where writes are performed with a
want/drop pair.  When that is complete, we can actually
introduce code that will safely check the counts before
allowing r/w<->r/o transitions to occur.
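The counting protocol described above (count writers per mount, refuse a new writer on a read-only mount, refuse the r/w to r/o transition while writers remain) can be sketched as a single-threaded userspace model. model_want_write(), model_drop_write() and model_make_readonly() are hypothetical names that loosely mirror mnt_want_write(), mnt_drop_write() and mnt_make_readonly(); the s_mnt_writers_lock locking and the superblock-wide count from the patch are left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one mount's writer state (no locking). */
struct model_mnt {
	int writers;
	bool readonly;
};

/* Take a write reference, or fail with -EROFS on a read-only mount.
 * Every successful call is balanced by one model_drop_write(). */
int model_want_write(struct model_mnt *m)
{
	if (m->readonly)
		return -EROFS;
	m->writers++;
	return 0;
}

void model_drop_write(struct model_mnt *m)
{
	m->writers--;
}

/* r/w -> r/o transition: refused with -EBUSY while writers remain. */
int model_make_readonly(struct model_mnt *m)
{
	if (m->writers)
		return -EBUSY;
	m->readonly = true;
	return 0;
}
```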

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/namespace.c|   53 +
 lxc-dave/fs/super.c|   18 ++---
 lxc-dave/include/linux/fs.h|2 +
 lxc-dave/include/linux/mount.h |   28 +++--
 4 files changed, 94 insertions(+), 7 deletions(-)

diff -puN fs/namespace.c~03-24-add-vfsmount-writer-count fs/namespace.c
--- lxc/fs/namespace.c~03-24-add-vfsmount-writer-count  2007-02-09 16:04:40.0 -0800
+++ lxc-dave/fs/namespace.c 2007-02-09 16:04:40.0 -0800
@@ -58,6 +58,7 @@ struct vfsmount *alloc_vfsmnt(const char
if (mnt) {
mnt->mnt_user_ns = get_user_ns(current->nsproxy->user_ns);
atomic_set(&mnt->mnt_count, 1);
+   mnt->mnt_writers = 0;
INIT_LIST_HEAD(&mnt->mnt_hash);
INIT_LIST_HEAD(&mnt->mnt_child);
INIT_LIST_HEAD(&mnt->mnt_mounts);
@@ -78,6 +79,56 @@ struct vfsmount *alloc_vfsmnt(const char
return mnt;
 }
 
+int mnt_make_readonly(struct vfsmount *mnt)
+{
+   int ret = 0;
+
+   WARN_ON(__mnt_is_readonly(mnt));
+
+   /*
+* This flag set is actually redundant with what
+* happens in do_remount(), but since we do this
+* under the lock, anyone attempting to get a write
+* on it after this will fail.
+*/
+   spin_lock(&mnt->mnt_sb->s_mnt_writers_lock);
+   if (!mnt->mnt_writers)
+   mnt->mnt_flags |= MNT_READONLY;
+   else
+   ret = -EBUSY;
+   spin_unlock(&mnt->mnt_sb->s_mnt_writers_lock);
+   return ret;
+}
+
+int mnt_want_write(struct vfsmount *mnt)
+{
+   int ret = 0;
+
+   spin_lock(&mnt->mnt_sb->s_mnt_writers_lock);
+   if (mnt->mnt_writers)
+   goto out;
+
+   if (__mnt_is_readonly(mnt)) {
+   ret = -EROFS;
+   goto out;
+   }
+   mnt->mnt_sb->s_writers++;
+   mnt->mnt_writers++;
+out:
+   spin_unlock(&mnt->mnt_sb->s_mnt_writers_lock);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(mnt_want_write);
+
+void mnt_drop_write(struct vfsmount *mnt)
+{
+   spin_lock(&mnt->mnt_sb->s_mnt_writers_lock);
+   mnt->mnt_sb->s_writers--;
+   mnt->mnt_writers--;
+   spin_unlock(&mnt->mnt_sb->s_mnt_writers_lock);
+}
+EXPORT_SYMBOL_GPL(mnt_drop_write);
+
 int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
mnt->mnt_sb = sb;
@@ -1415,6 +1466,8 @@ long do_mount(char *dev_name, char *dir_
((char *)data_page)[PAGE_SIZE - 1] = 0;
 
/* Separate the per-mountpoint flags */
+   if (flags & MS_RDONLY)
+   mnt_flags |= MNT_READONLY;
if (flags & MS_NOSUID)
mnt_flags |= MNT_NOSUID;
if (flags & MS_NODEV)
diff -puN fs/super.c~03-24-add-vfsmount-writer-count fs/super.c
--- lxc/fs/super.c~03-24-add-vfsmount-writer-count  2007-02-09 16:04:40.0 -0800
+++ lxc-dave/fs/super.c 2007-02-09 16:04:40.0 -0800
@@ -93,6 +93,8 @@ static struct super_block *alloc_super(s
s->s_qcop = sb_quotactl_ops;
s->s_op = _op;
s->s_time_gran = 10;
+   s->s_writers = 0;
+   spin_lock_init(&s->s_mnt_writers_lock);
}
 out:
return s;
@@ -576,6 +578,11 @@ static void mark_files_ro(struct super_b
file_list_unlock();
 }
 
+static int sb_remount_ro(struct super_block *sb)
+{
+   return fs_may_remount_ro(sb);
+}
+
 /**
  * do_remount_sb - asks filesystem to change mount options.
  * @sb:superblock in question
@@ -587,7 +594,8 @@ static void mark_files_ro(struct super_b
  */
 int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 {
-   int retval;
+   int retval = 0;
+   int sb_started_ro = (sb->s_flags & MS_RDONLY);

 #ifdef CONFIG_BLOCK
if (!(flags & MS_RDONLY) && bdev_read_only(sb->s_bdev))
@@ -600,11 +608,13 @@ int do_remount_sb(struct super_block *sb
 
/* If we are remounting RDONLY and current sb is read/write,
   make sure there are no rw files opened */
-   if ((flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY)) {
+   if ((flags & MS_RDONLY) && 

Re: [PATCH 0 of 4] Generic AIO by scheduling stacks

2007-02-09 Thread Eric Dumazet

Linus Torvalds wrote:

Ok, here's another entry in this discussion.




 - IF the system call blocks, we call the architecture-specific 
   "schedule_async()" function before we even get any scheduler locks, and 
   it can just do a fork() at that time, and let the *child* return to the 
   original user space. The process that already started doing the system 
   call will just continue to do the system call.



Well, I guess if the original program was mono-threaded, and syscall used 
fget_light(), we might have a problem here if the child tries a close(). So you 
may have to disable fget_light() magic if async call is the originator of the 
syscall.


Eric


Re: [PATCH 02/22] r/o bind mounts: add vfsmount writer counts

2007-02-09 Thread Eric Dumazet

Dave Hansen wrote:


@@ -56,6 +57,7 @@ struct vfsmount {
struct vfsmount *mnt_master;/* slave is on master->mnt_slave_list */
struct mnt_namespace *mnt_ns;   /* containing namespace */
struct user_namespace *mnt_user_ns; /* namespace for uid interpretation */
+   int mnt_writers;/* nr files open for write */
/*
 * We put mnt_count & mnt_expiry_mark at the end of struct vfsmount
 * to let these frequently modified fields in a separate cache line
@@ -72,7 +74,26 @@ static inline struct vfsmount *mntget(st
atomic_inc(&mnt->mnt_count);
return mnt;


Dave, please read again this comment in struct vfsmount definition.

If I understand your infrastructure, mnt_writers is going to be frequently 
modified, so it should be placed at the end of struct vfsmount, in the same 
cache line as mnt_count.
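The layout concern Eric raises can be checked with offsetof(). The structure below is a hypothetical miniature, not the real struct vfsmount: it assumes 64-byte cache lines and uses C11 alignas to force the frequently modified counters onto their own line, whereas the real struct relies on placing those fields at the end, as the quoted comment describes.

```c
#include <stddef.h>
#include <stdalign.h>

#define MODEL_CACHE_LINE 64	/* assumed line size; varies by CPU */

/* Hypothetical layout: mostly-read fields first, then the frequently
 * modified counters grouped together on a separate cache line. */
struct model_vfsmount {
	void *mnt_parent;
	void *mnt_root;
	void *mnt_sb;
	int mnt_flags;			/* read often, rarely written */
	alignas(MODEL_CACHE_LINE) int mnt_count;	/* frequently modified */
	int mnt_expiry_mark;
	int mnt_writers;		/* frequently modified */
};

/* Index of the cache line that a byte offset falls in. */
static inline size_t cache_line_of(size_t offset)
{
	return offset / MODEL_CACHE_LINE;
}
```

With this layout, writes to mnt_count or mnt_writers do not invalidate the line holding mnt_flags on other CPUs.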


Thank you
Eric



Re: [PATCH] fix quadratic behavior of shrink_dcache_parent()

2007-02-09 Thread Andrew Morton
On Fri, 09 Feb 2007 23:01:06 +0100
Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> From: Miklos Szeredi <[EMAIL PROTECTED]>
> 
> The time shrink_dcache_parent() takes grows quadratically with the
> depth of the tree under 'parent'.  This starts to get noticeable at
> about 10,000.
> 
> These kinds of depths don't occur normally, and filesystems which
> invoke shrink_dcache_parent() via d_invalidate() seem to have other
> depth dependent timings, so it's not even easy to expose this problem.
> 
> However with FUSE it's easy to create a deep tree and d_invalidate()
> will also get called.  This can make a syscall hang for a very long
> time.
> 
> This is the original discovery of the problem by Russ Cox:
> 
>   http://article.gmane.org/gmane.comp.file-systems.fuse.devel/3826

"The file system mounted on /tmp/z in the example contains 2^50
directories".   heh.

I do wonder how realistic this problem is in real life.

> The following patch fixes the quadratic behavior, by optionally
> allowing prune_dcache() to prune ancestors of a dentry in one go,
> instead of doing it one at a time.
> 
> Common code in dput() and prune_one_dentry() is extracted into a new
> helper function d_kill().
> 
> shrink_dcache_parent() as well as shrink_dcache_sb() are converted to
> use the ancestry-pruner option.  Only for shrink_dcache_memory() is
> this behavior not desirable, so it keeps using the old algorithm.
> 

I wonder if we should be setting shrink_parents=1 in
shrink_dcache_memory()?  Because we have this problem where the dentry
slabs suffer lots of internal fragmentation and we end up with whole slab
pages pinned by a single directory dentry.  I expect that if
shrink_dcache_memory() were aggressive about reaping newly-childless
directory dentries, some improvements might be realised there.

If so, we should change prune_dcache() to return the number pruned, so that
shrink_dcache_memory() can keep its arithmetic correct.  Would require some
careful testing and is out of scope for your work.
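The quadratic cost described in the quoted patch can be reproduced with a simple cost model. This is a hypothetical simulation, not dcache.c: it assumes a single chain of depth dentries below 'parent', each pinned by its only child, and counts one step per dentry visited by a scan or pruned. It compares rescanning after every single prune with pruning newly unused ancestors in the same pass.

```c
#include <stdlib.h>

/* refs[i] = references on the dentry at level i below 'parent'.  Each
 * dentry is pinned by its child, so only the deepest one starts unused. */
static unsigned long long prune(unsigned long depth, int with_ancestors)
{
	unsigned char *refs = calloc(depth, 1);
	unsigned long live = depth;	/* dentries 0 .. live-1 still exist */
	unsigned long long steps = 0;

	if (!refs)
		return 0;
	for (unsigned long i = 0; i + 1 < depth; i++)
		refs[i] = 1;		/* pinned by its child at level i + 1 */

	while (live > 0) {
		steps += live;		/* scan the surviving chain for unused dentries */
		do {
			live--;		/* prune the deepest dentry (index 'live') */
			steps++;
			if (live > 0)
				refs[live - 1]--;	/* its parent loses the pin */
		} while (with_ancestors && live > 0 && refs[live - 1] == 0);
	}
	free(refs);
	return steps;
}

/* Old behaviour: one dentry per scan, n(n+1)/2 + n steps. */
unsigned long long prune_one_at_a_time(unsigned long depth)
{
	return prune(depth, 0);
}

/* Patched behaviour: one scan, then ancestors in the same pass, 2n steps. */
unsigned long long prune_with_ancestors(unsigned long depth)
{
	return prune(depth, 1);
}
```

For a chain 10,000 deep, this model gives 50,015,000 steps for the one-at-a-time strategy and 20,000 with ancestor pruning.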




Re: [PATCH 21/22] honor r/w changes at do_remount() time

2007-02-09 Thread Dave Hansen
On Fri, 2007-02-09 at 15:22 -0800, Andrew Morton wrote:
> On Fri, 09 Feb 2007 14:53:44 -0800
> Dave Hansen <[EMAIL PROTECTED]> wrote:
> 
> > This is the core of the read-only bind mount patch set.
> 
> Who wants read-only bind mounts, and for what reason?

The original desire came out of the linux-vserver project.  It allows a
sysadmin to share directories between many vservers/containers and keep
those containers from writing to it, even though the users in that
vserver may have "root" privileges.

This also has the advantage of cleaning up the somewhat hackish "look
for writable-open-files during remount/ro operations".  It should also
allow us to separate the concepts of the user wanting a filesystem to be
r/o and the filesystem _itself_ being r/o because of a r/o device or
some kind of corruption.

-- Dave



Re: [ck] Re: Swap prefetch merge plans

2007-02-09 Thread Randy Dunlap
On Fri, 09 Feb 2007 18:35:51 -0500 Chuck Ebbert wrote:

> Andrew Morton wrote:
> > I have an email sitting in my drafts folder stating that I'll no longer
> > accept any features unless they've been publically reviewed in detail and
> > run-time tested by a third party.  The idea being to force people to spend
> > more time reviewing and testing each other's stuff and less time writing
> > new stuff.  Maybe on a sufficiently gloomy day I'll actually send it.
> >   
> /me sneaks into Andrew's office and sends it out.

Thanks.  8)

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


Re: [PATCH] Add PM_TRACE x86_64 support.

2007-02-09 Thread Pavel Machek
Hi!

> > > Nigel Cunningham <[EMAIL PROTECTED]> writes:
> > > 
> > > > -   for (tracedata = &__tracedata_start ; tracedata < &__tracedata_end ; tracedata += 6) {
> > > > +   for (tracedata = &__tracedata_start ; tracedata < &__tracedata_end ; tracedata += 2 + sizeof(unsigned long)) {
> > > 
> > > Could you split this line?
> > 
> > Sure.
> > 
> > -- New version -- (What's the right way to do this?)
> > 
> > This patch adds x86_64 support for PM_TRACE, and shifts per-arch code to
> > the appropriate subdirectories.
> > 
> > Symbol exports are added so tracing can be used from drivers built as
> > modules too.
> 
> Don't include exports in a patch that doesn't use them. Introduce the
> exports in a later patch series, for when you actually need it.

It is debugging infrastructure, so the export actually makes sense. It
will never be used in the mainline kernel; you need to modify the code
manually to use it.
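The new loop stride in Nigel's quoted diff follows from the record layout the loop walks, assuming each record is a 2-byte line number immediately followed by a pointer-sized file-name pointer with no padding: 2 + 4 = 6 bytes on i386 and 2 + 8 = 10 bytes on x86_64. The model below is a hypothetical userspace sketch of such a table, not the kernel's trace code; it also assumes unsigned long is pointer-sized, as it is on Linux, and reads fields with memcpy() because records after the first are not naturally aligned.

```c
#include <stddef.h>
#include <string.h>

/* Packed record: 2-byte line number, then a pointer-sized address. */
#define TRACE_REC_SIZE (2 + sizeof(unsigned long))

/* Append one record at 'pos'; returns the position of the next record. */
size_t put_record(unsigned char *buf, size_t pos,
		  unsigned short line, const char *file)
{
	unsigned long addr = (unsigned long)file;

	memcpy(buf + pos, &line, 2);
	memcpy(buf + pos + 2, &addr, sizeof(addr));
	return pos + TRACE_REC_SIZE;
}

/* Walk the table with the per-arch stride and return the file name
 * recorded for 'line', or NULL if there is none. */
const char *find_file(const unsigned char *start, const unsigned char *end,
		      unsigned short line)
{
	for (const unsigned char *p = start; p < end; p += TRACE_REC_SIZE) {
		unsigned short l;
		unsigned long addr;

		memcpy(&l, p, 2);
		if (l == line) {
			memcpy(&addr, p + 2, sizeof(addr));
			return (const char *)addr;
		}
	}
	return NULL;
}
```

A hardcoded stride of 6 would read the second record from the middle of the first one's pointer on a 64-bit build, which is the mismatch the sizeof() expression avoids.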

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [GIT PATCH] ACPI patches for 2.6.21

2007-02-09 Thread Pavel Machek
Hi!

> Per your request, and the request of the distros, we've changed
> how ACPICA Core releases are integrated into Linux so that each
> upstream (CVS) check-in appears as a single git commit.
> While this process is not yet perfect, it should be vastly better
> than previous "code drops" in allowing git bisect to work,
> and allowing distros to cherry-pick individual fixes.
> 
> The "bay" driver is new (and marked EXPERIMENTAL) -- adding initial
> hot-plug support for ACPI controlled drive bays such as the
> IBM ultrabay or the Dell Module Bay.

Could you describe the userland interface it uses? /proc? Will it be
usable for bays on notebooks not using ACPI?

> The "asus-laptop" driver is also new.  Consistent with msi-laptop,
> it uses ACPI in platform-specific ways, but strives to avoid
> exposing ACPI-specific implementation details to the user.
> asus-laptop is mutually exclusive with asus_acpi, which it will
> replace over time.

Not including another /proc/acpi/ibm -like nightmare, is it?

> the old /proc/acpi/ interfaces with cleaner interfaces in sysfs --
> non-ACPI-specific generic ones whenever possible.  This effort
> is not complete, but it has been in -mm for a long time and
> I believe that it is time to push it upstream to benefit
> from broader exposure and testing.

Does it still include the completely broken alarm interface? I can't find
it in the changelogs, so hopefully not.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [patch 1/1] PM: Adds remount fs ro at suspend

2007-02-09 Thread Pavel Machek
On Wed 2007-02-07 09:25:39, Henrique de Moraes Holschuh wrote:
> On Wed, 07 Feb 2007, Nigel Cunningham wrote:
> > Ok, as far as usage scenario goes, that's fair enough. But as to the
> > solution, I wonder though whether it's making life more complicated than
> > it needs to be. After all, we should also be able to cope okay with
> > having the power suddenly go out. If we can cope with that, cleaning
> > filesystems prior to suspending should be a non-issue.
> 
> We don't cope okay with the power going out, at all.  And as an user case, a
> need for fsck if you do something that is a reasonable use case (unplugging
> devices while suspended) is not okay, either.

It would be nice to unmount devices over suspend, but I do not think the
solution is as easy as the patch that started this thread. For now it is
'don't do that', and fsck is a nice reminder that you did something
wrong.

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [PATCH] [NETDEV] [004] dmfe : Add suspend/resume support

2007-02-09 Thread Pavel Machek
Hi!

> From: Maxim Levitsky <[EMAIL PROTECTED]>
> Subject: [PATCH] [NETDEV] [004] dmfe : Add suspend/resume support
> 
> Adds support for suspend/resume

Patch looks ok, but your mailer damaged it heavily.

> --- linux-2.6.20-mod/drivers/net/tulip/dmfe.c   2007-02-07 18:46:13.0 
> +0200
> +++ linux-2.6.20-test/drivers/net/tulip/dmfe.c  2007-02-07 18:50:52.0 
> +0200
> @@ -55,9 +55,6 @@
>  
>      TODO
>  
> -    Implement pci_driver::suspend() and pci_driver::resume()
> -    power management methods.
> -
>      Check on 64 bit boxes.
>      Check and fix on big endian boxes.
>  
> @@ -2027,11 +2024,59 @@ static struct pci_device_id dmfe_pci_tbl
>  MODULE_DEVICE_TABLE(pci, dmfe_pci_tbl);
>  
>  
> +
> +static int dmfe_suspend(struct pci_dev *pci_dev, pm_message_t state)
> +{

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: DMA mapping API for non-system memory pools

2007-02-09 Thread Matthew Jacob

Yes, this would be interesting to know with respect to things like
PCI<>PCI transfers (e.g., for the Micromemory NVRAM card).

On 2/9/07, Kumar Gala <[EMAIL PROTECTED]> wrote:
>
> We've been having a discussion on the linuxppc-dev list about how to
> handle IO memory that exists on some PPC SoC devices.  These IO
> memories behave like system memory but are faster to the processor or
> device needed accessing for things like buffer descriptors.




Re: DMA mapping API for non-system memory pools

2007-02-09 Thread James Bottomley
On Fri, 2007-02-09 at 17:33 -0600, Kumar Gala wrote:
> ideally all this would be handled via the dma mapping API, the  
> question is how to convey to the API to use the IO memory vs the  
> system memory?  Should we look at adding a new GFP_IOMEM flag or do  
> something based on struct device?
> 
> Any ideas on direction (or if this is a solved problem elsewhere)  
> would be appreciated.

Doesn't the dma_declare_coherent_memory() API work for this case?  It
was designed for the ARM SoC (and the weird Voyager SCSI card).

James




Re: [ck] Re: Swap prefetch merge plans

2007-02-09 Thread Chuck Ebbert
Andrew Morton wrote:
> I have an email sitting in my drafts folder stating that I'll no longer
> accept any features unless they've been publically reviewed in detail and
> run-time tested by a third party.  The idea being to force people to spend
> more time reviewing and testing each other's stuff and less time writing
> new stuff.  Maybe on a sufficiently gloomy day I'll actually send it.
>   
/me sneaks into Andrew's office and sends it out.




[PATCH 11/10] lguest: use disable_acpi()

2007-02-09 Thread Rusty Russell
On Fri, 2007-02-09 at 12:49 -0500, Len Brown wrote:
> On Friday 09 February 2007 12:14, James Morris wrote:
> > This is being disabled in the guest kernel only.  The host and guest 
> > kernels are expected to be the same build.
> 
> Okay, but better to use disable_acpi()
> indeed, since this would be the first code not already inside CONFIG_ACPI
> to invoke disable_acpi(), we could define the inline as empty and you could
> then scratch the #ifdef too.

Thanks Len!

This applies on top of that series.

== 
Len Brown <[EMAIL PROTECTED]> said:
> Okay, but better to use disable_acpi()
> indeed, since this would be the first code not already inside CONFIG_ACPI
> to invoke disable_acpi(), we could define the inline as empty and you could
> then scratch the #ifdef too.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>

diff -r 85363b87e20b arch/i386/lguest/lguest.c
--- a/arch/i386/lguest/lguest.c Sat Feb 10 01:52:37 2007 +1100
+++ b/arch/i386/lguest/lguest.c Sat Feb 10 10:28:36 2007 +1100
@@ -555,10 +555,7 @@ static __attribute_used__ __init void lg
mce_disabled = 1;
 #endif
 
-#ifdef CONFIG_ACPI
-   acpi_disabled = 1;
-   acpi_ht = 0;
-#endif
+   disable_acpi();
if (boot->initrd_size) {
/* We stash this at top of memory. */
INITRD_START = boot->max_pfn*PAGE_SIZE - boot->initrd_size;
diff -r 85363b87e20b include/asm-i386/acpi.h
--- a/include/asm-i386/acpi.h   Sat Feb 10 01:52:37 2007 +1100
+++ b/include/asm-i386/acpi.h   Sat Feb 10 10:43:43 2007 +1100
@@ -127,6 +127,7 @@ extern int acpi_irq_balance_set(char *st
 #define acpi_ioapic 0
 static inline void acpi_noirq_set(void) { }
 static inline void acpi_disable_pci(void) { }
+static inline void disable_acpi(void) { }
 
 #endif /* !CONFIG_ACPI */
 





Re: [PATCH] fix misannotation of linkinfo_dn

2007-02-09 Thread David Miller
From: Al Viro <[EMAIL PROTECTED]>
Date: Fri, 09 Feb 2007 18:13:42 +

> 
> Signed-off-by: Al Viro <[EMAIL PROTECTED]>
> ---
>  include/linux/dn.h |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)

Also applied, thanks Al.


Re: [PATCH] FRA_{DST,SRC} are le16 for decnet

2007-02-09 Thread David Miller
From: Al Viro <[EMAIL PROTECTED]>
Date: Fri, 09 Feb 2007 18:13:37 +

> 
> Signed-off-by: Al Viro <[EMAIL PROTECTED]>
> ---
>  net/decnet/dn_rules.c |   12 ++--
>  1 files changed, 6 insertions(+), 6 deletions(-)

Applied.


[stable patch 2.6.20 3/3] ieee1394: fix host device registering when nodemgr disabled

2007-02-09 Thread Stefan Richter
Date: Tue, 6 Feb 2007 02:34:45 +0100 (CET)
From: Stefan Richter <[EMAIL PROTECTED]>

Since my commit 8252bbb1363b7fe963a3eb6f8a36da619a6f5a65 in 2.6.20-rc1,
host devices have a dummy driver attached.  Alas the driver was not
registered before use if ieee1394 was loaded with disable_nodemgr=1.

This resulted in non-functional FireWire drivers or kernel lockup.
http://bugzilla.kernel.org/show_bug.cgi?id=7942

Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
---
 drivers/ieee1394/nodemgr.c |   24 
 1 file changed, 16 insertions(+), 8 deletions(-)

same as commit 91efa462054d44ae52b0c6c8325ed5e899f2cd17 in linux-2.6.20-git#

(Side note:  The parameter disable_nodemgr=1 is merely an optional
tuning parameter for people who know what they are doing and who don't
need device discovery and bus management.)

Index: linux-2.6.20/drivers/ieee1394/nodemgr.c
===
--- linux-2.6.20.orig/drivers/ieee1394/nodemgr.c
+++ linux-2.6.20/drivers/ieee1394/nodemgr.c
@@ -274,7 +274,6 @@ static struct device_driver nodemgr_mid_
 struct device nodemgr_dev_template_host = {
 	.bus		= &ieee1394_bus_type,
 	.release	= nodemgr_release_host,
-	.driver		= &nodemgr_mid_layer_driver,
 };
 
 
@@ -1889,22 +1888,31 @@ int init_ieee1394_nodemgr(void)
 
 	error = class_register(&nodemgr_ne_class);
 	if (error)
-		return error;
-
+		goto fail_ne;
 	error = class_register(&nodemgr_ud_class);
-	if (error) {
-		class_unregister(&nodemgr_ne_class);
-		return error;
-	}
+	if (error)
+		goto fail_ud;
 	error = driver_register(&nodemgr_mid_layer_driver);
+	if (error)
+		goto fail_ml;
+	/* This driver is not used if nodemgr is off (disable_nodemgr=1). */
+	nodemgr_dev_template_host.driver = &nodemgr_mid_layer_driver;
+
 	hpsb_register_highlevel(&nodemgr_highlevel);
 	return 0;
+
+fail_ml:
+	class_unregister(&nodemgr_ud_class);
+fail_ud:
+	class_unregister(&nodemgr_ne_class);
+fail_ne:
+	return error;
 }
 
 void cleanup_ieee1394_nodemgr(void)
 {
 	hpsb_unregister_highlevel(&nodemgr_highlevel);
-
+	driver_unregister(&nodemgr_mid_layer_driver);
 	class_unregister(&nodemgr_ud_class);
 	class_unregister(&nodemgr_ne_class);
 }


-- 
Stefan Richter
-=-=-=== --=- -=-=-
http://arcgraph.de/sr/



[stable patch 2.6.20 2/3] ieee1394: video1394: DMA fix

2007-02-09 Thread Stefan Richter
Date: Sat, 03 Feb 2007 03:09:09 -0500
From: David Moore <[EMAIL PROTECTED]>

This together with the phys_to_virt fix in lib/swiotlb.c::swiotlb_sync_sg
fixes video1394 DMA on machines with DMA bounce buffers, especially Intel
x86-64 machines with > 3GB RAM.

Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
Signed-off-by: David Moore <[EMAIL PROTECTED]>
Tested-by: Nicolas Turro <[EMAIL PROTECTED]>
---
 drivers/ieee1394/video1394.c |8 
 1 file changed, 8 insertions(+)

same as commit a5782010b4e75cba571357efaa27df22a89427c2 in linux-2.6.20-git#

Index: linux-2.6.20/drivers/ieee1394/video1394.c
===
--- linux-2.6.20.orig/drivers/ieee1394/video1394.c
+++ linux-2.6.20/drivers/ieee1394/video1394.c
@@ -489,6 +489,9 @@ static void wakeup_dma_ir_ctx(unsigned l
reset_ir_status(d, i);
d->buffer_status[d->buffer_prg_assignment[i]] = 
VIDEO1394_BUFFER_READY;

do_gettimeofday(&d->buffer_time[d->buffer_prg_assignment[i]]);
+   dma_region_sync_for_cpu(&d->dma,
+   d->buffer_prg_assignment[i] * d->buf_size,
+   d->buf_size);
}
}
 
@@ -1096,6 +1099,8 @@ static long video1394_ioctl(struct file 
DBGMSG(ohci->host->id, "Starting iso transmit DMA 
ctx=%d",
   d->ctx);
put_timestamp(ohci, d, d->last_buffer);
+   dma_region_sync_for_device(&d->dma,
+   v.buffer * d->buf_size, d->buf_size);
 
/* Tell the controller where the first program is */
reg_write(ohci, d->cmdPtr,
@@ -1111,6 +1116,9 @@ static long video1394_ioctl(struct file 
  "Waking up iso transmit dma ctx=%d",
  d->ctx);
put_timestamp(ohci, d, d->last_buffer);
+   dma_region_sync_for_device(&d->dma,
+   v.buffer * d->buf_size, d->buf_size);
+
reg_write(ohci, d->ctrlSet, 0x1000);
}
}


-- 
Stefan Richter
-=-=-=== --=- -=-=-
http://arcgraph.de/sr/



Re: [PATCH 4 of 7] lguest: Config and headers

2007-02-09 Thread Rusty Russell
On Fri, 2007-02-09 at 13:15 -0500, James Morris wrote:
> On Sat, 10 Feb 2007, Rusty Russell wrote:
> 
> > +/* 64k ought to be enough for anybody! */
> > +#define HYPERVISOR_MAP_ORDER 16
> > +#define HYPERVISOR_PAGES ((1 << HYPERVISOR_MAP_ORDER)/PAGE_SIZE)
> 
> I think it'd be better to go back to defining HYPERVISOR_SIZE then derive 
> the map order from that via get_order(), as it should be 4 instead of 16; 
> and this code is now both implying PAGE_SIZE while also using it for 
> calculations. 

Well it was the use of get_order() which triggered Andi's alarm bells,
so I went back to deriving it.  This code is correct, however.

get_order() is one of those classic functions only a kernel coder could
love.  Look how lovingly it has been optimized:

#define get_order(n)\
(   \
__builtin_constant_p(n) ?   \
((n < (1UL << PAGE_SHIFT)) ? 0 : ilog2(n) - PAGE_SHIFT) :   \
__get_order(n, PAGE_SHIFT)  \
 )

All that time spent, yet no consideration that it should be called
"get_page_order()" or some name which hints that the divide by page size
is happening.  It's even documented in the comment above, so someone
thought it needed explaining.  Too bad they chose to explain it instead
of actually clarifying it. 8(

Cheers,
Rusty.




[stable patch 2.6.20 1/3] Missing critical phys_to_virt in lib/swiotlb.c

2007-02-09 Thread Stefan Richter
Date: Sun, 04 Feb 2007 13:39:40 -0500
From: David Moore <[EMAIL PROTECTED]>

Adds missing call to phys_to_virt() in the
lib/swiotlb.c:swiotlb_sync_sg() function.  Without this change, a kernel
panic will always occur whenever a SWIOTLB bounce buffer from a
scatter-gather list gets synced.

Signed-off-by: David Moore <[EMAIL PROTECTED]>
Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
---

This is a fraction of patch "[IA64] swiotlb bug fixes" in 2.6.20-git#,
commit cde14bbfb3aa79b479db35bd29e6c083513d8614.  Despite what its heading
suggests, it is also important for EM64T.

Example crashes caused by swiotlb_sync_sg:
http://lists.opensuse.org/opensuse-bugs/2006-12/msg02943.html
http://qa.mandriva.com/show_bug.cgi?id=28224
http://www.pchdtv.com/forum/viewtopic.php?t=2063=a959a14a4c2db0eebaab7b0df56103ce

--- linux-2.6.20.orig/lib/swiotlb.c 2007-02-04 13:18:41.0 -0500
+++ linux-2.6.20/lib/swiotlb.c  2007-02-04 13:19:43.0 -0500
@@ -750,7 +750,7 @@ swiotlb_sync_sg(struct device *hwdev, st
 
for (i = 0; i < nelems; i++, sg++)
if (sg->dma_address != SG_ENT_PHYS_ADDRESS(sg))
-   sync_single(hwdev, (void *) sg->dma_address,
+   sync_single(hwdev, phys_to_virt(sg->dma_address),
sg->dma_length, dir, target);
 }
 

-- 
Stefan Richter
-=-=-=== --=- -=-=-
http://arcgraph.de/sr/



Re: [PATCH 0 of 4] Generic AIO by scheduling stacks

2007-02-09 Thread Linus Torvalds


On Fri, 9 Feb 2007, Davide Libenzi wrote:
> 
> That's another way to do it. But you end up creating/destroying a new 
> thread for every request. May be performing just fine.

Well, I actually wanted to add a special CLONE_ASYNC flag, because I
think we could do it better if we know it's a particularly limited special 
case. But that's really just a "small implementation detail", and I don't 
know how big a deal it is. I didn't want to obscure the basic idea with 
anything bigger.

I agree that the create/destroy is a big overhead, but at least it's now 
only done when we actually end up doing some IO (and _after_ we've started 
the IO, of course - that's when we block), so compared to doing it up 
front, I'm hoping that it's not actually that horrid.

The "fork-like" approach also means that it's very flexible. It's not 
really even limited to doing simple system calls any more: you *could*, 
for example, decide that since you already have the thread, and now that 
it's asynchronous, you'd actually return to user space (to let user space 
"complete" whatever asynchronous action it wanted to complete).

> Another, even simpler way IMO, is to just have a plain per-task kthread 
> pool, and a queue.

Yes, that is actually quite doable with basically the same interface. It's 
literally a "small decision" inside of "schedule_async()" on how it 
actually would want to handle the case of "hey, we now have concurrent 
work to be done".

But I actually don't think a per-task kthread pool is necessarily a good 
idea. If a thread pool works for this, then it should have worked for 
regular thread create/destroy loads too - ie there really is little reason 
to special-case the "async system call" case.

NOTE! I'm also not at all sure that we actually want to waste real threads 
on this. My patch is in no way meant to be an "exclusive alternative" to 
fibrils. Quite the reverse, actually: I _like_ those synchronous fibrils, 
but I didn't like how Zach did the overhead of creating them up-front, 
because I really would like the cached case to be totally *synchronous*.

So I wrote my patch with a "schedule_async()" implementation that just 
creates a full-sized thread, but I actually wanted very much to try to 
make it use fibrils that are allocated on-demand too. I was just too lazy.

So the patch is really meant as a "ok, this is how easy it is to make the 
thread allocation be 'on-demand' instead of 'up-front'". The actual 
_policy_ on how thread allocation is done isn't even interesting to me, to 
some degree. I think Zach's fibrils would work fine, a thread pool would 
work fine, and just the silly outright "new thread for everything" that 
the example patch actually used may also possibly work well enough.

It's one reason I liked my patch. It was not only small and simple, it 
really is very flexible, I think. It's also totally independent of how 
you actually end up _executing_ the async requests.

(In fact, you could easily make it a config option whether you support any 
asynchronous behaviour AT ALL. The "async()" system call might still be 
there, but it would just return "0" all the time, and do the actual work 
synchronously).

Linus


DMA mapping API for non-system memory pools

2007-02-09 Thread Kumar Gala
We've been having a discussion on the linuxppc-dev list about how to
handle IO memory that exists on some PPC SoC devices.  These IO
memories behave like system memory but are faster for the processor or
device to access, which is useful for things like buffer descriptors.


Here's an example from drivers/net/ucc_geth.c in which allocation is
done either from system memory or via a specialized allocator for MURAM:

(Yes, the system memory case should be moved to use the DMA mapping API.)

if (uf_info->bd_mem_part == MEM_PART_SYSTEM) {
	u32 align = 4;
	if (UCC_GETH_TX_BD_RING_ALIGNMENT > 4)
		align = UCC_GETH_TX_BD_RING_ALIGNMENT;
	ugeth->tx_bd_ring_offset[j] =
		kmalloc((u32) (length + align), GFP_KERNEL);

	if (ugeth->tx_bd_ring_offset[j] != 0)
		ugeth->p_tx_bd_ring[j] =
			(void*)((ugeth->tx_bd_ring_offset[j] +
				 align) & ~(align - 1));
} else if (uf_info->bd_mem_part == MEM_PART_MURAM) {
	ugeth->tx_bd_ring_offset[j] =
		qe_muram_alloc(length,
			       UCC_GETH_TX_BD_RING_ALIGNMENT);

	if (!IS_MURAM_ERR(ugeth->tx_bd_ring_offset[j]))
		ugeth->p_tx_bd_ring[j] =
			(u8 *) qe_muram_addr(ugeth->tx_bd_ring_offset[j]);
}

Ideally all this would be handled via the DMA mapping API; the
question is how to tell the API to use the IO memory rather than
system memory.  Should we look at adding a new GFP_IOMEM flag, or do
something based on struct device?


Any ideas on direction (or if this is a solved problem elsewhere)  
would be appreciated.


Thanks

- kumar


Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-09 Thread Roland McGrath
> Yes.  In fact, the current existing code does not handle dr6 correctly.  
> It never clears the register, which means you're likely to get into 
> trouble when multiple breakpoints (or watchpoints) are enabled.

This is a subtle change from the existing ABI, in which userland has to
clear %dr6 via ptrace itself.  But gdb never does that AFAICT.  So it's in
fact subject to confusion when two watchpoints are set and the second hits
after the first.  So gdb ought to be fixed to clear dr6 via ptrace, to work
with existing and older kernels.

I don't think I really object to the ABI change of clearing %dr6 after an
exception so that it does not accumulate multiple results.  But first I'll
have to convince myself that we never actually do want to accumulate
multiple results.  Hmm, I think we can, so maybe I do object.  If you set
two watchpoints inside a user buffer and then do a system call that touches
both those addresses (e.g. read), then you will go through do_debug (to
send_sigtrap) twice before returning to user mode.  When the syscall is
done, you'll have a pending SIGTRAP for the debugger to handle.  By looking
at your %dr6 the debugger can see that both watchpoints hit.  (gdb does not
handle this case, but it should.)  Am I wrong?

So this gets to the more complicated view of %dr6 handling that I had first
had in mind yesterday.  Each allocation "owns" one of the low 4 bits in
%dr6 too.  Only the dr6 bits owned by the userland "raw" allocation
(i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6].
So when kwatch swallows a debug exception, it should mask off its bit from
%dr6 in the CPU, but not clear %dr6 completely.  That way you can have a
sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one
system call (including interrupt handlers), and when it gets to the
userland debugger examining dr6 it sees the low 2 bits both set.

> It's really quite a tricky matter.  Should a register be allocated to
> kwatch only when no user process needs it?  Should we really go about
> checking the requirements of every single process whenever a kwatch
> allocation request comes in?  What if the processes which need a
> particular register aren't running -- should the register then be given to
> kwatch?  What if one of those processes then does start running on one
> CPU?

To "go about checking the requirements of every single process" is not so
hard as it sounds when they're recorded as a single global use count per
slot, as your original code does.  When you mentioned a "your allocation is
available" callback, I was thinking it might come to that being called
inside context switch.  It's all rather tricky, indeed.  

The obvious answer is to start simple.  If any user process anywhere uses
drN, kwatch has to give it up for all CPUs (watchpoints with less than
"break ptrace" priority do).  If anyone really cares about more flexibility
than that, we can change or extend it.  Some copious comments in the
interface descriptions can lead them in the right direction if the
situation comes up.  Probably with systemtap support in a while, we'll get
a lot more concrete uses of watchpoints and people finding out what really 
matters to them.


Thanks,
Roland



Re: NAK new drivers without proper power management?

2007-02-09 Thread Nigel Cunningham
Hi.

On Sat, 2007-02-10 at 00:12 +0100, Rafael J. Wysocki wrote:
> > > I think if CONFIG_PM_DEBUG is set, the core should warn about drivers not
> > > having .suspend or .resume routines.
> > 
> > The only problem with that is, not everyone turns on CONFIG_PM_DEBUG.
> > CONFIG_PM instead?
> 
> Well, I can imagine a driver that doesn't need a .suspend routine, for 
> example,
> and I don't think we should make the kernel always complain about that.

How about...

#ifdef CONFIG_PM_PARANOIA
static int empty_suspend_routine(struct device *dev, pm_message_t state)
{
return 0;
}
#define empty_suspend empty_suspend_routine
#else
#define empty_suspend NULL
#endif

...

.suspend = empty_suspend,
...


Then CONFIG_PM_PARANOIA can be enabled by default for now, and when we
eventually decide it's not needed anymore, someone can submit a patch
either turning the option off by default or removing the whole
mechanism.

> I think if someone doesn't set CONFIG_PM_DEBUG, we can ask him to set it
> and report back.

We can, but the whole point to the suggestion was to make your life and
mine easier, as well as those of our users.

Making it dependent on CONFIG_PM instead achieves that by:
- Saving you, me and distro people from having to tell users to
enable the option (and how to do it)
- Saving users from going through all the steps, making
mistakes, and potentially ending up with unbootable systems because of
those mistakes.

This way, they just need to look in dmesg.

Regards,

Nigel



Re: [PATCH 21/22] honor r/w changes at do_remount() time

2007-02-09 Thread Andrew Morton
On Fri, 09 Feb 2007 14:53:44 -0800
Dave Hansen <[EMAIL PROTECTED]> wrote:

> This is the core of the read-only bind mount patch set.

Who wants read-only bind mounts, and for what reason?


Re: [PATCH 01/22] filesystem helpers for custom 'struct file's

2007-02-09 Thread Andrew Morton
On Fri, 09 Feb 2007 14:53:29 -0800
Dave Hansen <[EMAIL PROTECTED]> wrote:

> +/*
> + * Note: This is a crappy interface.  It is here to make
> + * merging with the existing users of get_empty_filp()
> + * who have complex failure logic easier.  All users
> + * of this should be moving to alloc_file().
> + */
> +int init_file(struct file *file, struct vfsmount *mnt,
> +struct dentry *dentry, mode_t mode,
> +const struct file_operations *fop)

crappy name too ;)  At least two filesystems have defined their own
static-scope init_file() and so they'll explode if they somehow manage
to include file.h.

I guess we can cross that bridge when we fall off it, but sometime it might be
prudent to do s/init_file/configfs_init_file/ and ditto sysfs_init_file.


Re: NAK new drivers without proper power management?

2007-02-09 Thread Rafael J. Wysocki
Hi,

On Friday, 9 February 2007 23:51, Nigel Cunningham wrote:
> Hi.
> 
> On Fri, 2007-02-09 at 23:44 +0100, Rafael J. Wysocki wrote:
> > On Friday, 9 February 2007 23:26, Nigel Cunningham wrote:
> > > Hi.
> > > 
> > > On Fri, 2007-02-09 at 23:17 +0100, Arjan van de Ven wrote:
> > > > On Sat, 2007-02-10 at 08:57 +1100, Nigel Cunningham wrote:
> > > > > Hi.
> > > > > 
> > > > > I don't think this is already done (feel free to correct me if I'm
> > > > > wrong)..
> > > > > 
> > > > > Can we start to NAK new drivers that don't have proper power 
> > > > > management
> > > > > implemented? There really is no excuse for writing a new driver and 
> > > > > not
> > > > > putting .suspend and .resume methods in anymore, is there?
> > > > 
> > > > 
> > > > to a large degree, a device driver that doesn't suspend is better than
> > > > no device driver at all, right?
> > > 
> > > I'm not sure it is. It only makes more work for everyone else: We have
> > > to help people figure out what causes their computer to fail to resume
> > > (which can take quite a while), then get them them complain to driver
> > > author, and the driver author has to submit patches to fix it.
> > > 
> > > All of this is avoided if they'll just do it right in the first place.
> > > 
> > > > now.. if you want to make the core warn about it, that's very fair
> > > 
> > > That's probably a good idea too, since I'm only suggesting this for new
> > > drivers.
> > 
> > I think if CONFIG_PM_DEBUG is set, the core should warn about drivers not
> > having .suspend or .resume routines.
> 
> The only problem with that is, not everyone turns on CONFIG_PM_DEBUG.
> CONFIG_PM instead?

Well, I can imagine a driver that doesn't need a .suspend routine, for example,
and I don't think we should make the kernel always complain about that.

I think if someone doesn't set CONFIG_PM_DEBUG, we can ask him to set it
and report back.

Greetings,
Rafael


Re: [PATCH 0 of 4] Generic AIO by scheduling stacks

2007-02-09 Thread Davide Libenzi
On Fri, 9 Feb 2007, Linus Torvalds wrote:

> 
> Ok, here's another entry in this discussion.

That's another way to do it. But you end up creating/destroying a new 
thread for every request. May be performing just fine.
Another, even simpler way IMO, is to just have a plain per-task kthread 
pool, and a queue. An async_submit() drops a request in the queue, and 
wakes the request queue-head where the kthreads are sleeping. One kthread 
picks up the request, services it, drops a result in the result queue, and 
wakes the result queue-head (where async_fetch() callers are sleeping).
Cancellation is not a problem here (by means of sending a signal to the
servicing kthread). Also, no problem with arch-dependent code. This is a 1:1 
match of what my userspace implementation does.
Of course, no hot-path optimizations are performed here, and you need a few 
more context switches than necessary.
Let's have Zach (Ingo's support for Zach would be great) play with the 
optimized version, and then we can maybe bench the three to see whether the 
more complex code that the optimized version requires pays back on the 
performance side.

/me thinks it likely will



- Davide
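The per-task pool Davide describes can be sketched in userspace with POSIX threads: a request queue that sleeping workers wait on, and a result queue that async_fetch() callers wait on. This is a minimal sketch under stated assumptions, not his implementation: only the names async_submit() and async_fetch() come from the mail, while their signatures, `struct req`, `struct pool`, the fixed limit of eight workers and `example_double()` are hypothetical, and cancellation by signal is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct req {                  /* caller-owned; returned by async_fetch() */
	long (*fn)(long);
	long arg, result;
	struct req *next;
};

struct queue { struct req *head, *tail; };

struct pool {
	pthread_mutex_t lock;     /* guards both queues and stopping */
	pthread_cond_t req_wait;  /* idle workers sleep here */
	pthread_cond_t res_wait;  /* async_fetch() callers sleep here */
	struct queue reqs, results;
	bool stopping;
	int nthreads;
	pthread_t threads[8];
};

static void q_push(struct queue *q, struct req *r)
{
	r->next = NULL;
	*(q->tail ? &q->tail->next : &q->head) = r;   /* append at tail */
	q->tail = r;
}

static struct req *q_pop(struct queue *q)
{
	struct req *r = q->head;
	if (r && !(q->head = r->next))
		q->tail = NULL;
	return r;
}

static void *worker(void *data)
{
	struct pool *p = data;
	struct req *r;
	pthread_mutex_lock(&p->lock);
	for (;;) {
		while (!p->reqs.head && !p->stopping)
			pthread_cond_wait(&p->req_wait, &p->lock);
		if (!(r = q_pop(&p->reqs)))
			break;                  /* stopping and queue drained */
		pthread_mutex_unlock(&p->lock);
		r->result = r->fn(r->arg);  /* service the request unlocked */
		pthread_mutex_lock(&p->lock);
		q_push(&p->results, r);
		pthread_cond_signal(&p->res_wait);
	}
	pthread_mutex_unlock(&p->lock);
	return NULL;
}

static int pool_init(struct pool *p, int nthreads)
{
	assert(nthreads > 0 && nthreads <= 8);
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->req_wait, NULL);
	pthread_cond_init(&p->res_wait, NULL);
	p->reqs.head = p->reqs.tail = p->results.head = p->results.tail = NULL;
	p->stopping = false;
	for (p->nthreads = 0; p->nthreads < nthreads; p->nthreads++)
		if (pthread_create(&p->threads[p->nthreads], NULL, worker, p))
			return -1;
	return 0;
}

static void async_submit(struct pool *p, struct req *r)
{
	pthread_mutex_lock(&p->lock);
	q_push(&p->reqs, r);
	pthread_cond_signal(&p->req_wait);   /* wake one sleeping worker */
	pthread_mutex_unlock(&p->lock);
}

static struct req *async_fetch(struct pool *p)
{
	struct req *r;
	pthread_mutex_lock(&p->lock);
	while (!(r = q_pop(&p->results)))
		pthread_cond_wait(&p->res_wait, &p->lock);
	pthread_mutex_unlock(&p->lock);
	return r;
}

static void pool_destroy(struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->stopping = true;
	pthread_cond_broadcast(&p->req_wait);   /* wake every idle worker */
	pthread_mutex_unlock(&p->lock);
	for (int i = 0; i < p->nthreads; i++)
		pthread_join(p->threads[i], NULL);
}

static long example_double(long x) { return 2 * x; }  /* sample work */
```

A caller fills in `struct req` entries, submits them, and then calls async_fetch() once per request; results come back in completion order, so a caller that needs to match them up should look at the returned `struct req` rather than assume submission order.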




Re: [ANNOUNCE] d80211 based driver for Intel PRO/Wireless 3945ABG

2007-02-09 Thread Stefan Schmidt
Hello.

On Sat, 2007-02-10 at 09:26, Neil Brown wrote:
> On Friday February 9, [EMAIL PROTECTED] wrote:
> > 
> > Ok.  Now... any questions?
> > 
> 
> Yes.  Does this require a closed user-space helper like the other
> 3945ABG driver, or is it completely open (maybe excepting firmware)?

Quote from the mentioned website:

"In addition to using the new d80211 subsystem, this project uses a
new microcode image which removes the need for the user space
regulatory daemon for this adapter"

regards
Stefan Schmidt




[PATCH] saa7134: cleanup

2007-02-09 Thread Heikki Orsila
A cleanup patch against 2.6.20 for saa7134 video4linux driver:

 - use generic sort instead of bubblesort
 - removed useless saa7134_video_fini function
 - small coding style changes
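The comparator-plus-generic-sort approach can be tried in userspace with the C library's qsort(), which takes the same `int (*)(const void *, const void *)` callback as the kernel's sort(). This is a sketch with a simplified, hypothetical copy of `struct cliplist` and a hypothetical `sort_cliplist()` wrapper, not the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace copy of the driver's struct cliplist. */
struct cliplist {
	unsigned int position;
	uint8_t enable;
	uint8_t disable;
};

/* Smallest position first, written like the patch's cliplist_cmp():
 * explicit comparisons rather than a "return a - b" shortcut. */
static int cliplist_cmp(const void *a, const void *b)
{
	const struct cliplist *cla = a;
	const struct cliplist *clb = b;

	if (cla->position < clb->position)
		return -1;
	if (cla->position > clb->position)
		return 1;
	return 0;
}

/* qsort() plays the role of the kernel's sort() here. */
static void sort_cliplist(struct cliplist *cl, size_t entries)
{
	assert(cl != NULL || entries == 0);
	if (entries > 1)
		qsort(cl, entries, sizeof(cl[0]), cliplist_cmp);
}
```

One behavioural difference is worth keeping in mind: the removed bubble sort only swapped strictly greater neighbours and was therefore stable, whereas neither qsort() nor the kernel's heapsort-based sort() guarantees the relative order of entries with equal positions.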

Signed-off-by: Heikki Orsila <[EMAIL PROTECTED]>

-- 
Heikki Orsila   Barbie's law:
[EMAIL PROTECTED]   "Math is hard, let's go shopping!"
http://www.iki.fi/shd
diff -urp linux-2.6.20-org/drivers/media/video/saa7134/saa7134-core.c linux-2.6.20/drivers/media/video/saa7134/saa7134-core.c
--- linux-2.6.20-org/drivers/media/video/saa7134/saa7134-core.c 2007-02-04 20:44:54.0 +0200
+++ linux-2.6.20/drivers/media/video/saa7134/saa7134-core.c 2007-02-10 00:51:01.0 +0200
@@ -703,7 +703,6 @@ static int saa7134_hwfini(struct saa7134
saa7134_ts_fini(dev);
saa7134_input_fini(dev);
saa7134_vbi_fini(dev);
-   saa7134_video_fini(dev);
saa7134_tvaudio_fini(dev);
return 0;
 }
diff -urp linux-2.6.20-org/drivers/media/video/saa7134/saa7134-video.c linux-2.6.20/drivers/media/video/saa7134/saa7134-video.c
--- linux-2.6.20-org/drivers/media/video/saa7134/saa7134-video.c 2007-02-04 20:44:54.0 +0200
+++ linux-2.6.20/drivers/media/video/saa7134/saa7134-video.c 2007-02-10 00:51:01.0 +0200
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include <linux/sort.h>
 
 #include "saa7134-reg.h"
 #include "saa7134.h"
@@ -516,14 +517,12 @@ static int res_get(struct saa7134_dev *d
return 1;
 }
 
-static
-int res_check(struct saa7134_fh *fh, unsigned int bit)
+static int res_check(struct saa7134_fh *fh, unsigned int bit)
 {
return (fh->resources & bit);
 }
 
-static
-int res_locked(struct saa7134_dev *dev, unsigned int bit)
+static int res_locked(struct saa7134_dev *dev, unsigned int bit)
 {
return (dev->resources & bit);
 }
@@ -732,25 +731,6 @@ struct cliplist {
__u8  disable;
 };
 
-static void sort_cliplist(struct cliplist *cl, int entries)
-{
-   struct cliplist swap;
-   int i,j,n;
-
-   for (i = entries-2; i >= 0; i--) {
-   for (n = 0, j = 0; j <= i; j++) {
-   if (cl[j].position > cl[j+1].position) {
-   swap = cl[j];
-   cl[j] = cl[j+1];
-   cl[j+1] = swap;
-   n++;
-   }
-   }
-   if (0 == n)
-   break;
-   }
-}
-
 static void set_cliplist(struct saa7134_dev *dev, int reg,
struct cliplist *cl, int entries, char *name)
 {
@@ -784,15 +764,27 @@ static int clip_range(int val)
return val;
 }
 
+/* Sort into smallest position first order */
+static int cliplist_cmp(const void *a, const void *b)
+{
+   const struct cliplist *cla = a;
+   const struct cliplist *clb = b;
+   if (cla->position < clb->position)
+   return -1;
+   if (cla->position > clb->position)
+   return 1;
+   return 0;
+}
+
 static int setup_clipping(struct saa7134_dev *dev, struct v4l2_clip *clips,
  int nclips, int interlace)
 {
struct cliplist col[16], row[16];
-   int cols, rows, i;
+   int cols = 0, rows = 0, i;
int div = interlace ? 2 : 1;
 
-   memset(col,0,sizeof(col)); cols = 0;
-   memset(row,0,sizeof(row)); rows = 0;
+   memset(col, 0, sizeof(col));
+   memset(row, 0, sizeof(row));
for (i = 0; i < nclips && i < 8; i++) {
col[cols].position = clip_range(clips[i].c.left);
col[cols].enable   = (1 << i);
@@ -808,8 +800,8 @@ static int setup_clipping(struct saa7134
row[rows].disable  = (1 << i);
rows++;
}
-   sort_cliplist(col,cols);
-   sort_cliplist(row,rows);
+   sort(col, cols, sizeof col[0], cliplist_cmp, NULL);
+   sort(row, rows, sizeof row[0], cliplist_cmp, NULL);
set_cliplist(dev,0x380,col,cols,"cols");
set_cliplist(dev,0x384,row,rows,"rows");
return 0;
@@ -1261,19 +1253,14 @@ static struct videobuf_queue* saa7134_qu
 
 static int saa7134_resource(struct saa7134_fh *fh)
 {
-   int res = 0;
+   if (fh->type == V4L2_BUF_TYPE_VIDEO_CAPTURE)
+   return RESOURCE_VIDEO;
 
-   switch (fh->type) {
-   case V4L2_BUF_TYPE_VIDEO_CAPTURE:
-   res = RESOURCE_VIDEO;
-   break;
-   case V4L2_BUF_TYPE_VBI_CAPTURE:
-   res = RESOURCE_VBI;
-   break;
-   default:
-   BUG();
-   }
-   return res;
+   if (fh->type == V4L2_BUF_TYPE_VBI_CAPTURE)
+   return RESOURCE_VBI;
+
+   BUG();
+   return 0;
 }
 
 static int video_open(struct inode *inode, struct file *file)
@@ -1461,8 +1448,7 @@ static int video_release(struct inode *i
return 0;
 }
 
-static int
-video_mmap(struct file *file, struct vm_area_struct * vma)
+static int 

[PATCH 06/22] elevate write count during entire ncp_ioctl()

2007-02-09 Thread Dave Hansen


Some ioctls need write access, but others don't.  Make a helper
function to decide when write access is needed, and take it.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/ncpfs/ioctl.c |   55 +-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff -puN fs/ncpfs/ioctl.c~08-24-elevate-write-count-during-entire-ncp-ioctl fs/ncpfs/ioctl.c
--- lxc/fs/ncpfs/ioctl.c~08-24-elevate-write-count-during-entire-ncp-ioctl 2007-02-09 14:26:50.0 -0800
+++ lxc-dave/fs/ncpfs/ioctl.c   2007-02-09 14:26:50.0 -0800
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include <linux/mount.h>
 #include 
 #include 
 #include 
@@ -260,7 +261,7 @@ ncp_get_charsets(struct ncp_server* serv
 }
 #endif /* CONFIG_NCPFS_NLS */
 
-int ncp_ioctl(struct inode *inode, struct file *filp,
+static int __ncp_ioctl(struct inode *inode, struct file *filp,
  unsigned int cmd, unsigned long arg)
 {
struct ncp_server *server = NCP_SERVER(inode);
@@ -821,6 +822,58 @@ outrel:
return -EINVAL;
 }
 
+static int ncp_ioctl_need_write(unsigned int cmd)
+{
+   switch (cmd) {
+   case NCP_IOC_GET_FS_INFO:
+   case NCP_IOC_GET_FS_INFO_V2:
+   case NCP_IOC_NCPREQUEST:
+   case NCP_IOC_SETDENTRYTTL:
+   case NCP_IOC_SIGN_INIT:
+   case NCP_IOC_LOCKUNLOCK:
+   case NCP_IOC_SET_SIGN_WANTED:
+   return 1;
+   case NCP_IOC_GETOBJECTNAME:
+   case NCP_IOC_SETOBJECTNAME:
+   case NCP_IOC_GETPRIVATEDATA:
+   case NCP_IOC_SETPRIVATEDATA:
+   case NCP_IOC_SETCHARSETS:
+   case NCP_IOC_GETCHARSETS:
+   case NCP_IOC_CONN_LOGGED_IN:
+   case NCP_IOC_GETDENTRYTTL:
+   case NCP_IOC_GETMOUNTUID2:
+   case NCP_IOC_SIGN_WANTED:
+   case NCP_IOC_GETROOT:
+   case NCP_IOC_SETROOT:
+   return 0;
+   default:
+   /* unknown IOCTL command, assume write */
+   WARN_ON(1);
+   }
+   return 1;
+}
+
+int ncp_ioctl(struct inode *inode, struct file *filp,
+ unsigned int cmd, unsigned long arg)
+{
+   int ret;
+
+   if (ncp_ioctl_need_write(cmd)) {
+   /*
+* inside the ioctl(), any failures which
+* are because of file_permission() are
+* -EACCES, so it seems consistent to keep
+*  that here.
+*/
+   if (mnt_want_write(filp->f_vfsmnt))
+   return -EACCES;
+   }
+   ret = __ncp_ioctl(inode, filp, cmd, arg);
+   if (ncp_ioctl_need_write(cmd))
+   mnt_drop_write(filp->f_vfsmnt);
+   return ret;
+}
+
 #ifdef CONFIG_COMPAT
 long ncp_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
_
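The wrapper introduced here (classify the command, take write access only when needed, run the real handler, then release) can be modelled in userspace. This is a sketch: `struct fake_mnt`, the `CMD_*` values and all helper names are hypothetical, and -EACCES simply mirrors the errno the patch returns when write access cannot be obtained.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mount: a read-only flag plus a writer count. */
struct fake_mnt {
	bool readonly;
	int writers;
};

static int fake_want_write(struct fake_mnt *m)
{
	if (m->readonly)
		return -EROFS;
	m->writers++;
	return 0;
}

static void fake_drop_write(struct fake_mnt *m)
{
	assert(m->writers > 0);
	m->writers--;
}

/* Hypothetical command numbers standing in for the NCP_IOC_* values. */
enum { CMD_GET_INFO, CMD_SET_TTL };

static bool cmd_needs_write(unsigned int cmd)
{
	switch (cmd) {
	case CMD_GET_INFO:
		return false;
	case CMD_SET_TTL:
	default:        /* unknown command: assume it writes */
		return true;
	}
}

/* Stand-in for __ncp_ioctl(): reports the writer count it ran under. */
static int inner_ioctl(struct fake_mnt *m, unsigned int cmd)
{
	(void)cmd;
	return m->writers;
}

static int wrapped_ioctl(struct fake_mnt *m, unsigned int cmd)
{
	bool need_write = cmd_needs_write(cmd);   /* classify once */
	int ret;

	if (need_write && fake_want_write(m))
		return -EACCES;
	ret = inner_ioctl(m, cmd);
	if (need_write)
		fake_drop_write(m);
	return ret;
}
```

Classifying the command once and reusing the result guarantees that the release decision always matches the acquire decision, and means an unknown command passes through the classifier (and any warning in it) only once.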


[PATCH 04/22] elevate writer count for chown and friends

2007-02-09 Thread Dave Hansen


chown/chmod, etc. don't call permission() in the same way
that the normal "open for write" calls do.  They still
write to the filesystem, so bump the write count during
these operations.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/open.c |   37 +
 1 file changed, 33 insertions(+), 4 deletions(-)

diff -puN fs/open.c~06-24-elevate-writer-count-for-chown-and-friends fs/open.c
--- lxc/fs/open.c~06-24-elevate-writer-count-for-chown-and-friends 2007-02-09 14:26:48.0 -0800
+++ lxc-dave/fs/open.c  2007-02-09 14:26:48.0 -0800
@@ -511,9 +511,12 @@ asmlinkage long sys_fchmod(unsigned int 
err = -EROFS;
if (IS_RDONLY(inode))
goto out_putf;
+   err = mnt_want_write(file->f_vfsmnt);
+   if (err)
+   goto out_putf;
err = -EPERM;
if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-   goto out_putf;
+   goto out_drop_write;
mutex_lock(&inode->i_mutex);
if (mode == (mode_t) -1)
mode = inode->i_mode;
@@ -522,6 +525,8 @@ asmlinkage long sys_fchmod(unsigned int 
err = notify_change(dentry, &newattrs);
mutex_unlock(&inode->i_mutex);
 
+out_drop_write:
+   mnt_drop_write(file->f_vfsmnt);
 out_putf:
fput(file);
 out:
@@ -541,13 +546,16 @@ asmlinkage long sys_fchmodat(int dfd, co
goto out;
inode = nd.dentry->d_inode;
 
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   goto dput_and_out;
error = -EROFS;
if (IS_RDONLY(inode))
-   goto dput_and_out;
+   goto out_drop_write;
 
error = -EPERM;
if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-   goto dput_and_out;
+   goto out_drop_write;
 
mutex_lock(&inode->i_mutex);
if (mode == (mode_t) -1)
@@ -557,6 +565,8 @@ asmlinkage long sys_fchmodat(int dfd, co
error = notify_change(nd.dentry, &newattrs);
mutex_unlock(&inode->i_mutex);
 
+out_drop_write:
+   mnt_drop_write(nd.mnt);
 dput_and_out:
path_release(&nd);
 out:
@@ -582,7 +592,7 @@ static int chown_common(struct dentry * 
error = -EROFS;
if (IS_RDONLY(inode))
goto out;
-   error = -EPERM;
+   error = -EPERM;
if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
goto out;
newattrs.ia_valid =  ATTR_CTIME;
@@ -611,7 +621,12 @@ asmlinkage long sys_chown(const char __u
error = user_path_walk(filename, &nd);
if (error)
goto out;
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   goto out_release;
error = chown_common(nd.dentry, user, group);
+   mnt_drop_write(nd.mnt);
+out_release:
path_release(&nd);
 out:
return error;
@@ -631,7 +646,12 @@ asmlinkage long sys_fchownat(int dfd, co
error = __user_walk_fd(dfd, filename, follow, &nd);
if (error)
goto out;
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   goto out_release;
error = chown_common(nd.dentry, user, group);
+   mnt_drop_write(nd.mnt);
+out_release:
path_release(&nd);
 out:
return error;
@@ -645,7 +665,11 @@ asmlinkage long sys_lchown(const char __
error = user_path_walk_link(filename, &nd);
if (error)
goto out;
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   goto out_release;
error = chown_common(nd.dentry, user, group);
+out_release:
path_release(&nd);
 out:
return error;
@@ -662,9 +686,14 @@ asmlinkage long sys_fchown(unsigned int 
if (!file)
goto out;
 
+   error = mnt_want_write(file->f_vfsmnt);
+   if (error)
+   goto out_fput;
dentry = file->f_path.dentry;
audit_inode(NULL, dentry->d_inode);
error = chown_common(dentry, user, group);
+   mnt_drop_write(file->f_vfsmnt);
+out_fput:
fput(file);
 out:
return error;
_
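The error-unwinding shape this patch gives the chown family, where each `goto` target releases only what was acquired before the failing step, can be checked in a small userspace model. The counters and helper names below are hypothetical stand-ins for the path reference and the mount writer reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical counters standing in for a path reference and a
 * mount writer reference; both must be back at zero afterwards. */
static int paths_held, writers_held;
static bool mount_readonly;

static int lookup_path(void)
{
	paths_held++;
	return 0;
}

static void release_path(void)
{
	paths_held--;
}

static int want_write(void)
{
	if (mount_readonly)
		return -EROFS;
	writers_held++;
	return 0;
}

static void drop_write(void)
{
	assert(writers_held > 0);
	writers_held--;
}

/* Same shape as sys_chown() after the patch: each failure jumps to
 * the label that undoes only what was already acquired. */
static int chown_like(bool immutable)
{
	int error;

	error = lookup_path();
	if (error)
		goto out;
	error = want_write();
	if (error)
		goto out_release;
	error = immutable ? -EPERM : 0;   /* stands in for chown_common() */
	drop_write();
out_release:
	release_path();
out:
	return error;
}
```

Labels appear in the reverse order of acquisition, so adding a new resource means adding one label above the existing ones; an early `return` placed after want_write() would skip drop_write() and leave the count elevated.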


[PATCH 07/22] elevate write count for link and symlink calls

2007-02-09 Thread Dave Hansen



Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/namei.c |   10 ++
 1 file changed, 10 insertions(+)

diff -puN fs/namei.c~09-24-elevate-write-count-for-link-and-symlink-calls fs/namei.c
--- lxc/fs/namei.c~09-24-elevate-write-count-for-link-and-symlink-calls 2007-02-09 14:26:50.0 -0800
+++ lxc-dave/fs/namei.c 2007-02-09 14:26:50.0 -0800
@@ -2236,7 +2236,12 @@ asmlinkage long sys_symlinkat(const char
if (IS_ERR(dentry))
goto out_unlock;
 
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   goto out_dput;
error = vfs_symlink(nd.dentry->d_inode, dentry, from, S_IALLUGO);
+   mnt_drop_write(nd.mnt);
+out_dput:
dput(dentry);
 out_unlock:
mutex_unlock(&nd.dentry->d_inode->i_mutex);
@@ -2331,7 +2336,12 @@ asmlinkage long sys_linkat(int olddfd, c
error = PTR_ERR(new_dentry);
if (IS_ERR(new_dentry))
goto out_unlock;
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   goto out_dput;
error = vfs_link(old_nd.dentry, nd.dentry->d_inode, new_dentry);
+   mnt_drop_write(nd.mnt);
+out_dput:
dput(new_dentry);
 out_unlock:
mutex_unlock(&nd.dentry->d_inode->i_mutex);
_


[PATCH 08/22] elevate mount count for extended attributes

2007-02-09 Thread Dave Hansen


This basically audits the callers of xattr_permission(), which
calls permission() and can perform writes to the filesystem.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/nfsd/nfs4proc.c |7 ++-
 lxc-dave/fs/xattr.c |   14 ++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff -puN fs/nfsd/nfs4proc.c~10-24-elevate-mount-count-for-extended-attributes fs/nfsd/nfs4proc.c
--- lxc/fs/nfsd/nfs4proc.c~10-24-elevate-mount-count-for-extended-attributes 2007-02-09 14:26:51.0 -0800
+++ lxc-dave/fs/nfsd/nfs4proc.c 2007-02-09 14:26:51.0 -0800
@@ -626,14 +626,19 @@ nfsd4_setattr(struct svc_rqst *rqstp, st
return status;
}
}
+   status = mnt_want_write(cstate->current_fh.fh_export->ex_mnt);
+   if (status)
+   return status;
status = nfs_ok;
if (setattr->sa_acl != NULL)
status = nfsd4_set_nfs4_acl(rqstp, &cstate->current_fh,
setattr->sa_acl);
if (status)
-   return status;
+   goto out;
status = nfsd_setattr(rqstp, &cstate->current_fh, &setattr->sa_iattr,
0, (time_t)0);
+out:
+   mnt_drop_write(cstate->current_fh.fh_export->ex_mnt);
return status;
 }
 
diff -puN fs/xattr.c~10-24-elevate-mount-count-for-extended-attributes fs/xattr.c
--- lxc/fs/xattr.c~10-24-elevate-mount-count-for-extended-attributes 2007-02-09 14:26:51.0 -0800
+++ lxc-dave/fs/xattr.c 2007-02-09 14:26:51.0 -0800
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include <linux/mount.h>
 #include 
 #include 
 #include 
@@ -237,7 +238,11 @@ sys_setxattr(char __user *path, char __u
error = user_path_walk(path, &nd);
if (error)
return error;
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   return error;
error = setxattr(nd.dentry, name, value, size, flags);
+   mnt_drop_write(nd.mnt);
path_release(&nd);
return error;
 }
@@ -252,7 +257,11 @@ sys_lsetxattr(char __user *path, char __
error = user_path_walk_link(path, &nd);
if (error)
return error;
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   return error;
error = setxattr(nd.dentry, name, value, size, flags);
+   mnt_drop_write(nd.mnt);
path_release(&nd);
return error;
 }
@@ -268,9 +277,14 @@ sys_fsetxattr(int fd, char __user *name,
f = fget(fd);
if (!f)
return error;
+   error = mnt_want_write(f->f_vfsmnt);
+   if (error)
+   goto out_fput;
dentry = f->f_path.dentry;
audit_inode(NULL, dentry->d_inode);
error = setxattr(dentry, name, value, size, flags);
+   mnt_drop_write(f->f_vfsmnt);
+out_fput:
fput(f);
return error;
 }
_


[PATCH 02/22] r/o bind mounts: add vfsmount writer counts

2007-02-09 Thread Dave Hansen

This patch actually adds the mount and superblock writer
counts, and the mnt_want/drop_write() functions that use
them.

Before these can become useful, we must first cover each
place in the VFS where writes are performed with a
want/drop pair.  When that is complete, we can actually
introduce code that will safely check the counts before
allowing r/w<->r/o transitions to occur.
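The counting idea can be modelled in userspace, with a mutex standing in for s_mnt_writers_lock. This is a sketch rather than the patch's code: `struct model_mnt` and the `model_*` functions are hypothetical, a single per-mount count replaces the separate mount and superblock counts, and every successful want increments the count so that each drop has a matching increment.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* One lock guards both the writer count and the read-only flag. */
struct model_mnt {
	pthread_mutex_t lock;
	int writers;
	bool readonly;
};

static int model_want_write(struct model_mnt *m)
{
	int ret = 0;

	pthread_mutex_lock(&m->lock);
	if (m->readonly)
		ret = -EROFS;
	else
		m->writers++;       /* every successful call is counted */
	pthread_mutex_unlock(&m->lock);
	return ret;
}

static void model_drop_write(struct model_mnt *m)
{
	pthread_mutex_lock(&m->lock);
	assert(m->writers > 0);  /* catches an unbalanced drop */
	m->writers--;
	pthread_mutex_unlock(&m->lock);
}

/* Refuse the r/w -> r/o transition while any writer is active. */
static int model_make_readonly(struct model_mnt *m)
{
	int ret = 0;

	pthread_mutex_lock(&m->lock);
	if (m->writers)
		ret = -EBUSY;
	else
		m->readonly = true;
	pthread_mutex_unlock(&m->lock);
	return ret;
}
```

Taking the same lock for the "any writers?" test and for setting the read-only flag is what prevents a new writer from appearing between the check and the transition.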

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/namespace.c|   53 +
 lxc-dave/fs/super.c|   18 ++---
 lxc-dave/include/linux/fs.h|2 +
 lxc-dave/include/linux/mount.h |   21 
 4 files changed, 90 insertions(+), 4 deletions(-)

diff -puN fs/namespace.c~03-24-add-vfsmount-writer-count fs/namespace.c
--- lxc/fs/namespace.c~03-24-add-vfsmount-writer-count 2007-02-09 14:26:47.0 -0800
+++ lxc-dave/fs/namespace.c 2007-02-09 14:26:47.0 -0800
@@ -58,6 +58,7 @@ struct vfsmount *alloc_vfsmnt(const char
if (mnt) {
mnt->mnt_user_ns = get_user_ns(current->nsproxy->user_ns);
atomic_set(&mnt->mnt_count, 1);
+   mnt->mnt_writers = 0;
INIT_LIST_HEAD(&mnt->mnt_hash);
INIT_LIST_HEAD(&mnt->mnt_child);
INIT_LIST_HEAD(&mnt->mnt_mounts);
@@ -78,6 +79,56 @@ struct vfsmount *alloc_vfsmnt(const char
return mnt;
 }
 
+int mnt_make_readonly(struct vfsmount *mnt)
+{
+   int ret = 0;
+
+   WARN_ON(__mnt_is_readonly(mnt));
+
+   /*
+* This flag set is actually redundant with what
+* happens in do_remount(), but since we do this
+* under the lock, anyone attempting to get a write
+* on it after this will fail.
+*/
+   spin_lock(&mnt->mnt_sb->s_mnt_writers_lock);
+   if (!mnt->mnt_writers)
+   mnt->mnt_flags |= MNT_READONLY;
+   else
+   ret = -EBUSY;
+   spin_unlock(&mnt->mnt_sb->s_mnt_writers_lock);
+   return ret;
+}
+
+int mnt_want_write(struct vfsmount *mnt)
+{
+   int ret = 0;
+
+   spin_lock(&mnt->mnt_sb->s_mnt_writers_lock);
+   if (mnt->mnt_writers)
+   goto out;
+
+   if (__mnt_is_readonly(mnt)) {
+   ret = -EROFS;
+   goto out;
+   }
+   mnt->mnt_sb->s_writers++;
+   mnt->mnt_writers++;
+out:
+   spin_unlock(&mnt->mnt_sb->s_mnt_writers_lock);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(mnt_want_write);
+
+void mnt_drop_write(struct vfsmount *mnt)
+{
+   spin_lock(&mnt->mnt_sb->s_mnt_writers_lock);
+   mnt->mnt_sb->s_writers--;
+   mnt->mnt_writers--;
+   spin_unlock(&mnt->mnt_sb->s_mnt_writers_lock);
+}
+EXPORT_SYMBOL_GPL(mnt_drop_write);
+
 int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
mnt->mnt_sb = sb;
@@ -1415,6 +1466,8 @@ long do_mount(char *dev_name, char *dir_
((char *)data_page)[PAGE_SIZE - 1] = 0;
 
/* Separate the per-mountpoint flags */
+   if (flags & MS_RDONLY)
+   mnt_flags |= MNT_READONLY;
if (flags & MS_NOSUID)
mnt_flags |= MNT_NOSUID;
if (flags & MS_NODEV)
diff -puN fs/super.c~03-24-add-vfsmount-writer-count fs/super.c
--- lxc/fs/super.c~03-24-add-vfsmount-writer-count 2007-02-09 14:26:47.0 -0800
+++ lxc-dave/fs/super.c 2007-02-09 14:26:47.0 -0800
@@ -93,6 +93,8 @@ static struct super_block *alloc_super(s
s->s_qcop = sb_quotactl_ops;
s->s_op = &default_op;
s->s_time_gran = 1000000000;
+   s->s_writers = 0;
+   spin_lock_init(&s->s_mnt_writers_lock);
}
 out:
return s;
@@ -576,6 +578,11 @@ static void mark_files_ro(struct super_b
file_list_unlock();
 }
 
+static int sb_remount_ro(struct super_block *sb)
+{
+   return fs_may_remount_ro(sb);
+}
+
 /**
  * do_remount_sb - asks filesystem to change mount options.
  * @sb:superblock in question
@@ -587,7 +594,8 @@ static void mark_files_ro(struct super_b
  */
 int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
 {
-   int retval;
+   int retval = 0;
+   int sb_started_ro = (sb->s_flags & MS_RDONLY);

 #ifdef CONFIG_BLOCK
if (!(flags & MS_RDONLY) && bdev_read_only(sb->s_bdev))
@@ -600,11 +608,13 @@ int do_remount_sb(struct super_block *sb
 
/* If we are remounting RDONLY and current sb is read/write,
   make sure there are no rw files opened */
-   if ((flags & MS_RDONLY) && !(sb->s_flags & MS_RDONLY)) {
+   if ((flags & MS_RDONLY) && !sb_started_ro) {
if (force)
mark_files_ro(sb);
-   else if (!fs_may_remount_ro(sb))
-   return -EBUSY;
+   else
+   retval = sb_remount_ro(sb);
+   if (retval)
+   return retval;
}
 
if (sb->s_op->remount_fs) {
diff -puN 

[PATCH 22/22] kill open files traverse on remount ro

2007-02-09 Thread Dave Hansen

Now that we have the sb writer count, and all of the
writers marked with mnt_want_write(), we don't need to
go looking at all of the individual open files.

Kill the open files walk, and use the sb writer count.

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/file_table.c|   25 -
 lxc-dave/fs/super.c |   13 -
 lxc-dave/include/linux/fs.h |2 --
 3 files changed, 12 insertions(+), 28 deletions(-)

diff -puN fs/file_table.c~24-24-kill-open-files-traverse-on-remount-ro fs/file_table.c
--- lxc/fs/file_table.c~24-24-kill-open-files-traverse-on-remount-ro 2007-02-09 14:27:01.0 -0800
+++ lxc-dave/fs/file_table.c 2007-02-09 14:27:01.0 -0800
@@ -308,31 +308,6 @@ void file_kill(struct file *file)
}
 }
 
-int fs_may_remount_ro(struct super_block *sb)
-{
-   struct list_head *p;
-
-   /* Check that no files are currently opened for writing. */
-   file_list_lock();
-   list_for_each(p, &sb->s_files) {
-   struct file *file = list_entry(p, struct file, f_u.fu_list);
-   struct inode *inode = file->f_path.dentry->d_inode;
-
-   /* File with pending delete? */
-   if (inode->i_nlink == 0)
-   goto too_bad;
-
-   /* Writeable file? */
-   if (S_ISREG(inode->i_mode) && (file->f_mode & FMODE_WRITE))
-   goto too_bad;
-   }
-   file_list_unlock();
-   return 1; /* Tis' cool bro. */
-too_bad:
-   file_list_unlock();
-   return 0;
-}
-
 void __init files_init(unsigned long mempages)
 { 
int n; 
diff -puN fs/super.c~24-24-kill-open-files-traverse-on-remount-ro fs/super.c
--- lxc/fs/super.c~24-24-kill-open-files-traverse-on-remount-ro 2007-02-09 14:27:01.0 -0800
+++ lxc-dave/fs/super.c 2007-02-09 14:27:01.0 -0800
@@ -580,7 +580,18 @@ static void mark_files_ro(struct super_b
 
 static int sb_remount_ro(struct super_block *sb)
 {
-   return fs_may_remount_ro(sb);
+   int ret = 0;
+
+   /*
+* The r/o flag actually gets set
+* by the caller.
+*/
+   spin_lock(&sb->s_mnt_writers_lock);
+   if (sb->s_writers)
+   ret = -EBUSY;
+   spin_unlock(&sb->s_mnt_writers_lock);
+
+   return ret;
 }
 
 /**
diff -puN include/linux/fs.h~24-24-kill-open-files-traverse-on-remount-ro include/linux/fs.h
--- lxc/include/linux/fs.h~24-24-kill-open-files-traverse-on-remount-ro 2007-02-09 14:27:01.0 -0800
+++ lxc-dave/include/linux/fs.h 2007-02-09 14:27:01.0 -0800
@@ -1657,8 +1657,6 @@ extern const struct file_operations read
 extern const struct file_operations write_fifo_fops;
 extern const struct file_operations rdwr_fifo_fops;
 
-extern int fs_may_remount_ro(struct super_block *);
-
 #ifdef CONFIG_BLOCK
 /*
  * return READ, READA, or WRITE
_


Re: somebody dropped a (warning) bomb

2007-02-09 Thread Martin Mares
Hello!

> void* comparisons are unsigned. Period.

As far as the C standard is concerned, there is no relationship between
comparison of pointers and comparison of their values cast to uintptr_t.
The address space needn't be linear and on some machines it isn't. So
speaking about signedness of pointer comparisons doesn't make sense,
except for concrete implementations.

Have a nice fortnight
-- 
Martin `MJ' Mares  <[EMAIL PROTECTED]>   http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Top ten reasons to procrastinate: 1.
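A short illustration of the distinction, assuming a hosted C99 implementation that provides the optional `uintptr_t` type; `comes_before()` and `roundtrips()` are made-up helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Relational comparison is defined only for pointers into the same
 * array object (or one past its end); it follows element order. */
static int comes_before(const int *arr, size_t i, size_t j)
{
	return &arr[i] < &arr[j];
}

/* uintptr_t is optional; where it exists, the only portable guarantee
 * is that void * -> uintptr_t -> void * gives back an equal pointer. */
static int roundtrips(int *p)
{
	void *vp = p;
	uintptr_t v = (uintptr_t)vp;

	return (void *)v == vp;
}
```

Nothing in the standard promises that the integers obtained this way are ordered the same way as the pointers, which is why the "signedness" of a pointer comparison only has meaning for a concrete implementation.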


[PATCH 19/22] elevate writer count for custom struct_file

2007-02-09 Thread Dave Hansen


Some filesystems forgo the use of normal vfs calls to create
struct files.  Make sure that these users elevate the mnt writer
count.  These probably don't have any real meaning because there
is no real backing store for these mounts, but it is here for
consistency.


Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/file_table.c |4 
 1 file changed, 4 insertions(+)

diff -puN fs/file_table.c~22-24-elevate-writer-count-for-custom-struct-file fs/file_table.c
--- lxc/fs/file_table.c~22-24-elevate-writer-count-for-custom-struct-file 2007-02-09 14:26:59.0 -0800
+++ lxc-dave/fs/file_table.c 2007-02-09 14:26:59.0 -0800
@@ -171,6 +171,10 @@ int init_file(struct file *file, struct 
file->f_mapping = dentry->d_inode->i_mapping;
file->f_mode = mode;
file->f_op = fop;
+   if (mode & FMODE_WRITE) {
+   error = mnt_want_write(mnt);
+   WARN_ON(error);
+   }
return error;
 }
 
_


[PATCH 15/22] elevate write count for do_sys_utime() and touch_atime()

2007-02-09 Thread Dave Hansen



Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/inode.c |   20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff -puN fs/inode.c~17-24-elevate-write-count-for-do-sys-utime-and-touch-atime fs/inode.c
--- lxc/fs/inode.c~17-24-elevate-write-count-for-do-sys-utime-and-touch-atime 2007-02-09 14:26:56.0 -0800
+++ lxc-dave/fs/inode.c 2007-02-09 14:26:56.0 -0800
@@ -1170,22 +1170,23 @@ void touch_atime(struct vfsmount *mnt, s
struct inode *inode = dentry->d_inode;
struct timespec now;
 
-   if (inode->i_flags & S_NOATIME)
+   if (mnt && mnt_want_write(mnt))
return;
+   if (inode->i_flags & S_NOATIME)
+   goto out;
if (IS_NOATIME(inode))
-   return;
+   goto out;
if ((inode->i_sb->s_flags & MS_NODIRATIME) && S_ISDIR(inode->i_mode))
-   return;
+   goto out;
 
/*
 * We may have a NULL vfsmount when coming from NFSD
 */
if (mnt) {
if (mnt->mnt_flags & MNT_NOATIME)
-   return;
+   goto out;
if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
-   return;
-
+   goto out;
if (mnt->mnt_flags & MNT_RELATIME) {
/*
 * With relative atime, only update atime if the
@@ -1196,16 +1197,19 @@ void touch_atime(struct vfsmount *mnt, s
&inode->i_atime) < 0 &&
timespec_compare(&inode->i_ctime,
&inode->i_atime) < 0)
-   return;
+   goto out;
}
}
 
now = current_fs_time(inode->i_sb);
if (timespec_equal(&inode->i_atime, &now))
-   return;
+   goto out;
 
inode->i_atime = now;
mark_inode_dirty_sync(inode);
+out:
+   if (mnt)
+   mnt_drop_write(mnt);
 }
 EXPORT_SYMBOL(touch_atime);
 
_
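The MNT_RELATIME test that touch_atime() keeps, now leaving through the new `out:` label, reduces to a small predicate. In this userspace sketch `struct ts_model` is a hypothetical stand-in for `struct timespec`, and `ts_compare()` returns -1, 0 or 1; only the sign of the kernel's timespec_compare() result is relied on here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct timespec. */
struct ts_model {
	long sec;
	long nsec;
};

/* Same sign convention as timespec_compare(): <0, 0 or >0. */
static int ts_compare(const struct ts_model *a, const struct ts_model *b)
{
	if (a->sec != b->sec)
		return a->sec < b->sec ? -1 : 1;
	if (a->nsec != b->nsec)
		return a->nsec < b->nsec ? -1 : 1;
	return 0;
}

/* Relatime rule: skip the atime update only when both mtime and
 * ctime are strictly older than the current atime. */
static bool relatime_should_update(const struct ts_model *mtime,
				   const struct ts_model *ctime,
				   const struct ts_model *atime)
{
	return !(ts_compare(mtime, atime) < 0 && ts_compare(ctime, atime) < 0);
}
```

An inode whose ctime equals its atime still gets updated, because only a strictly older mtime and ctime suppress the write.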


[PATCH 12/22] elevate write count files are open()ed

2007-02-09 Thread Dave Hansen


This is the first really tricky patch in the series.  It
elevates the writer count on a mount each time a
non-special file is opened for write.

This is not completely apparent in the patch because the
two if() conditions in may_open() above the
mnt_want_write() call are, combined, equivalent to
special_file().

There is also an elevated count around the vfs_create()
call in open_namei().  The count needs to be kept elevated
all the way into the may_open() call.  Otherwise, when the
write is dropped, a rw->ro transition could occur.  This
would lead to having rw access on the newly created file,
while the vfsmount is ro.  That is bad.
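The pairing described above, where a non-special file opened for write takes a writer reference that is given back on release, can be modelled in userspace. `struct mount_model`, `struct file_model`, `open_model()` and `release_model()` are hypothetical; the model records the special-file decision at open time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mount and open-file models. */
struct mount_model {
	bool readonly;
	int writers;
};

struct file_model {
	struct mount_model *mnt;
	bool write;
	bool special;       /* device node, FIFO or socket */
};

/* Opening a non-special file for write takes a writer reference,
 * which release_model() gives back. */
static int open_model(struct file_model *f, struct mount_model *m,
		      bool write, bool special)
{
	f->mnt = m;
	f->write = write;
	f->special = special;
	if (write && !special) {
		if (m->readonly)
			return -EROFS;
		m->writers++;
	}
	return 0;
}

static void release_model(struct file_model *f)
{
	if (f->write && !f->special) {
		assert(f->mnt->writers > 0);
		f->mnt->writers--;
	}
}
```

Recording the decision in the file keeps open and release symmetric. The patch instead re-evaluates special_file(inode->i_mode) in __fput(), which should give the same answer because a file's type does not change while it is open.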

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/file_table.c |5 -
 lxc-dave/fs/namei.c  |   22 ++
 lxc-dave/ipc/mqueue.c|3 +++
 3 files changed, 25 insertions(+), 5 deletions(-)

diff -puN fs/file_table.c~14-24-tricky-elevate-write-count-files-are-open-ed fs/file_table.c
--- lxc/fs/file_table.c~14-24-tricky-elevate-write-count-files-are-open-ed 2007-02-09 14:26:54.0 -0800
+++ lxc-dave/fs/file_table.c 2007-02-09 14:26:54.0 -0800
@@ -209,8 +209,11 @@ void fastcall __fput(struct file *file)
if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL))
cdev_put(inode->i_cdev);
fops_put(file->f_op);
-   if (file->f_mode & FMODE_WRITE)
+   if (file->f_mode & FMODE_WRITE) {
put_write_access(inode);
+   if(!special_file(inode->i_mode))
+   mnt_drop_write(mnt);
+   }
put_pid(file->f_owner.pid);
put_user_ns(file->f_owner.user_ns);
file_kill(file);
diff -puN fs/namei.c~14-24-tricky-elevate-write-count-files-are-open-ed fs/namei.c
--- lxc/fs/namei.c~14-24-tricky-elevate-write-count-files-are-open-ed 2007-02-09 14:26:54.0 -0800
+++ lxc-dave/fs/namei.c 2007-02-09 14:26:54.0 -0800
@@ -1548,8 +1548,17 @@ int may_open(struct nameidata *nd, int a
return -EACCES;
 
flag &= ~O_TRUNC;
-   } else if (IS_RDONLY(inode) && (flag & FMODE_WRITE))
-   return -EROFS;
+   } else if (flag & FMODE_WRITE) {
+   /*
+* effectively: !special_file()
+* balanced by __fput()
+*/
+   error = mnt_want_write(nd->mnt);
+   if (error)
+   return error;
+   if (IS_RDONLY(inode))
+   return -EROFS;
+   }
/*
 * An append-only file must be opened in append mode for writing.
 */
@@ -1688,14 +1697,17 @@ do_last:
}
 
if (IS_ERR(nd->intent.open.file)) {
-   mutex_unlock(&dir->d_inode->i_mutex);
error = PTR_ERR(nd->intent.open.file);
-   goto exit_dput;
+   goto exit_mutex_unlock;
}
 
/* Negative dentry, just create the file */
if (!path.dentry->d_inode) {
+   error = mnt_want_write(nd->mnt);
+   if (error)
+   goto exit_mutex_unlock;
error = open_namei_create(nd, &path, flag, mode);
+   mnt_drop_write(nd->mnt);
if (error)
goto exit;
return 0;
@@ -1733,6 +1745,8 @@ ok:
goto exit;
return 0;
 
+exit_mutex_unlock:
+   mutex_unlock(&dir->d_inode->i_mutex);
 exit_dput:
dput_path(&path, nd);
 exit:
diff -puN ipc/mqueue.c~14-24-tricky-elevate-write-count-files-are-open-ed ipc/mqueue.c
--- lxc/ipc/mqueue.c~14-24-tricky-elevate-write-count-files-are-open-ed 2007-02-09 14:26:54.0 -0800
+++ lxc-dave/ipc/mqueue.c   2007-02-09 14:26:54.0 -0800
@@ -687,6 +687,9 @@ asmlinkage long sys_mq_open(const char _
goto out;
filp = do_open(dentry, oflag);
} else {
+   error = mnt_want_write(mqueue_mnt);
+   if (error)
+   goto out;
filp = do_create(mqueue_mnt->mnt_root, dentry,
oflag, mode, u_attr);
}
_


[PATCH 13/22] elevate writer count for do_sys_truncate()

2007-02-09 Thread Dave Hansen



Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/open.c |   16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff -puN fs/open.c~15-24-elevate-writer-count-for-do-sys-truncate fs/open.c
--- lxc/fs/open.c~15-24-elevate-writer-count-for-do-sys-truncate 2007-02-09 14:26:55.0 -0800
+++ lxc-dave/fs/open.c  2007-02-09 14:26:55.0 -0800
@@ -241,28 +241,32 @@ static long do_sys_truncate(const char _
if (!S_ISREG(inode->i_mode))
goto dput_and_out;
 
-   error = vfs_permission(&nd, MAY_WRITE);
+   error = mnt_want_write(nd.mnt);
if (error)
goto dput_and_out;
 
+   error = vfs_permission(&nd, MAY_WRITE);
+   if (error)
+   goto mnt_drop_write_and_out;
+
error = -EROFS;
if (IS_RDONLY(inode))
-   goto dput_and_out;
+   goto mnt_drop_write_and_out;
 
error = -EPERM;
if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-   goto dput_and_out;
+   goto mnt_drop_write_and_out;
 
/*
 * Make sure that there are no leases.
 */
error = break_lease(inode, FMODE_WRITE);
if (error)
-   goto dput_and_out;
+   goto mnt_drop_write_and_out;
 
error = get_write_access(inode);
if (error)
-   goto dput_and_out;
+   goto mnt_drop_write_and_out;
 
error = locks_verify_truncate(inode, NULL, length);
if (!error) {
@@ -271,6 +275,8 @@ static long do_sys_truncate(const char _
}
put_write_access(inode);
 
+mnt_drop_write_and_out:
+   mnt_drop_write(nd.mnt);
 dput_and_out:
path_release(&nd);
 out:
_
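The do_sys_truncate() change follows the usual kernel unwind pattern: the new label mnt_drop_write_and_out sits above dput_and_out, so every check after a successful mnt_want_write() releases the write reference and then falls through to path_release(), while a failed mnt_want_write() jumps straight to dput_and_out. The sketch below models only that reference bookkeeping. struct truncate_model, model_truncate() and the fail_at selector are hypothetical, and the individual checks (permission, IS_RDONLY, immutable or append-only, lease, get_write_access) are collapsed into one stand-in failure with a stand-in error code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the two references do_sys_truncate()
 * holds after this patch. */
struct truncate_model {
	int path_refs;     /* taken by the path lookup, released by path_release() */
	int mnt_writers;   /* taken by mnt_want_write(), released by mnt_drop_write() */
};

/*
 * fail_at picks which step fails: 0 = none, 1 = mnt_want_write(),
 * 2 or more = one of the later checks (permission, IS_RDONLY,
 * immutable/append-only, break_lease, get_write_access).
 */
int model_truncate(struct truncate_model *t, int fail_at)
{
	int error = 0;

	t->path_refs++;                      /* lookup succeeded */

	if (fail_at == 1) {
		error = -EROFS;              /* stand-in error from mnt_want_write() */
		goto dput_and_out;           /* no write reference was taken */
	}
	t->mnt_writers++;

	if (fail_at >= 2) {
		error = -EPERM;              /* stand-in for any later check */
		goto mnt_drop_write_and_out;
	}

	/* ... locks_verify_truncate() and do_truncate() would run here ... */

mnt_drop_write_and_out:
	t->mnt_writers--;                    /* mnt_drop_write() */
dput_and_out:
	t->path_refs--;                      /* path_release() */
	assert(t->mnt_writers >= 0 && t->path_refs >= 0);
	return error;
}
```

Calling model_truncate() with each fail_at from 0 to 6 leaves both counters at zero, which is the property the relabelled gotos are meant to preserve; jumping to mnt_drop_write_and_out after a failed mnt_want_write() would instead drive mnt_writers negative.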


[PATCH 16/22] sys_mknodat(): elevate write count for vfs_mknod/create()

2007-02-09 Thread Dave Hansen


This takes care of all of the direct callers of vfs_mknod().
Since a few of these cases also handle normal file creation
as well, this also covers some calls to vfs_create().

Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---

 lxc-dave/fs/namei.c |   12 
 lxc-dave/fs/nfsd/vfs.c  |4 
 lxc-dave/net/unix/af_unix.c |4 
 3 files changed, 20 insertions(+)

diff -puN fs/namei.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create fs/namei.c
--- lxc/fs/namei.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create	2007-02-09 14:26:57.0 -0800
+++ lxc-dave/fs/namei.c 2007-02-09 14:26:57.0 -0800
@@ -1903,14 +1903,26 @@ asmlinkage long sys_mknodat(int dfd, con
if (!IS_ERR(dentry)) {
switch (mode & S_IFMT) {
case 0: case S_IFREG:
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   break;
error = vfs_create(nd.dentry->d_inode,dentry,mode,&nd);
+   mnt_drop_write(nd.mnt);
break;
case S_IFCHR: case S_IFBLK:
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   break;
error = vfs_mknod(nd.dentry->d_inode,dentry,mode,
new_decode_dev(dev));
+   mnt_drop_write(nd.mnt);
break;
case S_IFIFO: case S_IFSOCK:
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   break;
error = vfs_mknod(nd.dentry->d_inode,dentry,mode,0);
+   mnt_drop_write(nd.mnt);
break;
case S_IFDIR:
error = -EPERM;
diff -puN fs/nfsd/vfs.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create fs/nfsd/vfs.c
--- lxc/fs/nfsd/vfs.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create	2007-02-09 14:26:57.0 -0800
+++ lxc-dave/fs/nfsd/vfs.c  2007-02-09 14:26:57.0 -0800
@@ -664,6 +664,9 @@ nfsd_open(struct svc_rqst *rqstp, struct
/* Disallow write access to files with the append-only bit set
 * or any access when mandatory locking enabled
 */
+   err = mnt_want_write(fhp->fh_export->ex_mnt);
+   if (err)
+   goto out_nfserr;
err = nfserr_perm;
if (IS_APPEND(inode) && (access & MAY_WRITE))
goto out;
@@ -1199,6 +1202,7 @@ nfsd_create(struct svc_rqst *rqstp, stru
printk("nfsd: bad file type %o in nfsd_create\n", type);
host_err = -EINVAL;
}
+   mnt_drop_write(fhp->fh_export->ex_mnt);
if (host_err < 0)
goto out_nfserr;
 
diff -puN net/unix/af_unix.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create net/unix/af_unix.c
--- lxc/net/unix/af_unix.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create	2007-02-09 14:26:57.0 -0800
+++ lxc-dave/net/unix/af_unix.c 2007-02-09 14:26:57.0 -0800
@@ -816,7 +816,11 @@ static int unix_bind(struct socket *sock
 */
mode = S_IFSOCK |
   (SOCK_INODE(sock)->i_mode & ~current->fs->umask);
+   err = mnt_want_write(nd.mnt);
+   if (err)
+   goto out_mknod_dput;
err = vfs_mknod(nd.dentry->d_inode, dentry, mode, 0);
+   mnt_drop_write(nd.mnt);
if (err)
goto out_mknod_dput;
mutex_unlock(&nd.dentry->d_inode->i_mutex);
_
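In sys_mknodat() only the switch arms that reach vfs_create() or vfs_mknod() take a write reference; the S_IFDIR arm returns -EPERM without touching the mount. The model below is a hypothetical sketch of that control flow. struct mknod_model, mknod_want_write(), mknod_drop_write() and model_mknod() are invented for illustration, the create and mknod arms are merged because they bracket their calls identically, and the default -EINVAL arm is an assumption about the part of the switch not shown in the hunk. In the model, as in the hunk, a directory request fails before any write reference is taken, so a read-only mount does not change that result.

```c
/* Request POSIX/XSI declarations such as S_IFSOCK. */
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical mount state: read-only flag plus active-writer count. */
struct mknod_model {
	int readonly;
	int writers;
};

int mknod_want_write(struct mknod_model *m)
{
	if (m->readonly)
		return -EROFS;
	m->writers++;
	return 0;
}

void mknod_drop_write(struct mknod_model *m)
{
	assert(m->writers > 0);
	m->writers--;
}

/* Control flow of the patched switch in sys_mknodat(). */
int model_mknod(struct mknod_model *m, mode_t mode)
{
	int error;

	switch (mode & S_IFMT) {
	case 0: case S_IFREG:            /* vfs_create() arm */
	case S_IFCHR: case S_IFBLK:      /* vfs_mknod() with a device number */
	case S_IFIFO: case S_IFSOCK:     /* vfs_mknod() with dev 0 */
		error = mknod_want_write(m);
		if (error)
			break;           /* no reference taken, nothing to drop */
		error = 0;               /* the create/mknod call would go here */
		mknod_drop_write(m);
		break;
	case S_IFDIR:
		error = -EPERM;          /* rejected before touching the mount */
		break;
	default:
		error = -EINVAL;         /* assumed: unknown file type */
	}
	return error;
}
```

Taking the reference inside each arm, rather than once before the switch, is what keeps the S_IFDIR and invalid-type paths free of any write-count traffic.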

