Re: 2.6.19: ACPI reports AC not present after resume from STD

2007-03-07 Thread Andrey Borzenkov
On Tuesday 06 March 2007, Rafael J. Wysocki wrote:
 [changed Cc list]

 On Sunday, 25 February 2007 18:14, Andrey Borzenkov wrote:
  On Sunday, 25 February 2007, Rafael J. Wysocki wrote:
   On Sunday, 25 February 2007 11:37, Andrey Borzenkov wrote:
On Sunday, 25 February 2007, Rafael J. Wysocki wrote:
 On Sunday, 25 February 2007 00:26, Andrey Borzenkov wrote:
  On Saturday, 24 February 2007, Rafael J. Wysocki wrote:
   Hi,
  
   On Saturday, 24 February 2007 10:55, Andrey Borzenkov wrote:
On Tuesday, 13 February 2007, Andrey Borzenkov wrote:
 On Thursday, 07 December 2006, Lebedev, Vladimir P wrote:
  Please register new bug, attach acpidump and dmesg.

 http://bugzilla.kernel.org/show_bug.cgi?id=7995

 regards
   
Well, this starts looking like ACPI is not at fault.
   
When reporting AC state ACPI just reads contents of system
memory (I presume it gets updated by BIOS/ACPI when AC state
changes). It looks like this memory area is restored during
resume from STD. I updated mentioned bug report with more
detailed description. Now if someone could suggest a way to
catch if specific physical address gets saved/restored this
would finally explain it.
  
   First, if you want the reserved memory areas to be left alone
   by swsusp, you need to mark them as 'nosave'.  On x86_64 this
   is done by the function e820_mark_nosave_range() in
   arch/x86_64/kernel/e820.c that can be ported to i386 with no
   problems.  However, we haven't found that very useful, so far,
   since no one has ever reported any problems with the current
   approach, which is to save and restore them.
 
  Well, the following proof of concept patch fixes this issue for
  me. Please notice that original version of
  e820_mark_nosave_range() could fail to exclude some areas due to
  alignment issues (exactly what happened to me on first try) so it
  still can explain your problem too.

 Great job, thanks for the patch!  It looks good, so I'm going to
 forward it for merging.
   
Please no; I'm currently testing slightly more polished version; I
will send it later.
  
   OK
  
Could anybody explain (or give a pointer to) what happens with a region
that is not page-aligned? In particular, the very first one:
   
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
   
Will the kernel allocate partial page (how?) or will the kernel
ignore last (first) incomplete page? In the former case how those
incomplete pages can be detected?
  
   Well, on x86_64, if I understand e820_register_active_regions()
   correctly, the partial pages won't be registered.
 
  It appears that for low memory kernel will ignore incomplete pages for
  sure. I hope it does the same for high memory - but for now I just throw
  this in and pray :) This also significantly simplifies patch.
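
To make the alignment question concrete, here is a minimal sketch of the rounding involved. It assumes 4 KiB pages and uses local stand-ins for the kernel's PFN_UP()/PFN_DOWN() macros; struct pfn_range, usable_pages() and nosave_gap() are hypothetical names, and the policy shown (only whole pages of a usable region count as RAM, so a gap starts at the partial page) is an assumption about the intended behaviour, not the code from the patch.

```c
#include <assert.h>

#define PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Local stand-ins for the kernel macros of the same name. */
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

struct pfn_range {
	unsigned long start_pfn;	/* first pfn in the range */
	unsigned long end_pfn;		/* one past the last pfn */
};

/*
 * Whole pages contained in the usable byte range [start, end).
 * A partial page at either edge is left out.
 */
static struct pfn_range usable_pages(unsigned long start, unsigned long end)
{
	struct pfn_range r = { PFN_UP(start), PFN_DOWN(end) };

	assert(start <= end);
	if (r.start_pfn > r.end_pfn)
		r.start_pfn = r.end_pfn;	/* region smaller than a page */
	return r;
}

/*
 * Pages to mark nosave between the end of one usable region and the
 * start of the next: everything that is not a whole usable page.
 */
static struct pfn_range nosave_gap(unsigned long prev_end,
				   unsigned long next_start)
{
	struct pfn_range r = { PFN_DOWN(prev_end), PFN_UP(next_start) };

	assert(prev_end <= next_start);
	return r;
}
```

For a usable region that ends at 0x9fc00, the last whole page is pfn 0x9e, and the partial page 0x9f becomes the first page of the nosave gap.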

 Well, can you please check if the appended modification of your patch still
 works?


It works for me with caveat

/home/bor/src/linux-git/arch/i386/kernel/e820.c: In 
function ‘e820_mark_nosave_range’:
/home/bor/src/linux-git/arch/i386/kernel/e820.c:328: warning: format ‘%016Lx’ 
expects type ‘long long unsigned int’, but argument 2 has type ‘long unsigned 
int’
/home/bor/src/linux-git/arch/i386/kernel/e820.c:328: warning: format ‘%016Lx’ 
expects type ‘long long unsigned int’, but argument 3 has type ‘long unsigned 
int’
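
The warnings come from passing an unsigned long to a conversion that expects unsigned long long. One common way to silence them, sketched here in userspace C rather than as a kernel patch, is to cast the arguments explicitly. pfn_phys() and format_nosave_range() are hypothetical stand-ins, 4 KiB pages are assumed, and %llx is used because it is the portable userspace spelling of the 64-bit conversion.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Stand-in for PFN_PHYS(): the result is only as wide as unsigned long. */
static unsigned long pfn_phys(unsigned long pfn)
{
	return pfn << PAGE_SHIFT;
}

/*
 * Format the range with a 64-bit conversion. The explicit casts make the
 * argument type match the format on both 32-bit and 64-bit builds.
 * Returns the number of characters written (excluding the NUL).
 */
static int format_nosave_range(char *buf, size_t len,
			       unsigned long start_pfn, unsigned long end_pfn)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "Nosave address range: %016llx - %016llx\n",
			(unsigned long long)pfn_phys(start_pfn),
			(unsigned long long)pfn_phys(end_pfn));
}
```

The alternative is to keep the arguments as they are and use a conversion that matches unsigned long; which of the two is preferable depends on whether the addresses can exceed 32 bits.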

regards 

-andrey

 Thanks,
 Rafael


 ---
  arch/i386/kernel/e820.c  |   47 +++
  arch/i386/kernel/setup.c |    1 +
  include/asm-i386/e820.h  |    1 +
  3 files changed, 49 insertions(+)

 Index: linux-2.6.21-rc2/arch/i386/kernel/e820.c
 ===
 --- linux-2.6.21-rc2.orig/arch/i386/kernel/e820.c
 +++ linux-2.6.21-rc2/arch/i386/kernel/e820.c
 @@ -313,6 +313,53 @@ static int __init request_standard_resou

  subsys_initcall(request_standard_resources);

 +/*
 + * Mark pages corresponding to given pfn range as 'nosave'.
 + */
 +static void __init
 +e820_mark_nosave_range(unsigned long start_pfn, unsigned long end_pfn)
 +{
 +	unsigned long pfn;
 +
 +	if (start_pfn >= end_pfn)
 +		return;
 +
 +	printk("Nosave address range: %016Lx - %016Lx\n",
 +	       PFN_PHYS(start_pfn), PFN_PHYS(end_pfn));
 +	for (pfn = start_pfn; pfn < end_pfn; pfn++)
 +		if (pfn_valid(pfn))
 +			SetPageNosave(pfn_to_page(pfn));
 +}
 +
 +/*
 + * Find the ranges of physical addresses that do not correspond to
 + * e820 RAM areas and mark the corresponding pages as nosave for software
 + * suspend and suspend to RAM.
 + *
 + * This function requires the e820 map to be sorted and without any
 + * overlapping entries 

[patch 055/101] sata_sil: ignore and clear spurious IRQs while executing commands by polling

2007-03-07 Thread Greg KH

sata_sil used to trigger HSM error if IRQ occurs during polling
command.  This didn't matter because polling wasn't used in sata_sil.
However, as of 2.6.20, all IDENTIFYs are performed by polling and
device detection sometimes fails due to spurious IRQ.  This patch
makes sata_sil ignore and clear spurious IRQ while executing commands
by polling.

This fixes bug#7996 and IMHO should also be included in -stable.

Signed-off-by: Tejun Heo [EMAIL PROTECTED]
Cc: Jeff Garzik [EMAIL PROTECTED]
Signed-off-by: Greg Kroah-Hartman [EMAIL PROTECTED]

---
 drivers/ata/sata_sil.c |8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- linux-2.6.20.1.orig/drivers/ata/sata_sil.c
+++ linux-2.6.20.1/drivers/ata/sata_sil.c
@@ -383,9 +383,15 @@ static void sil_host_intr(struct ata_por
 		goto freeze;
 	}
 
-	if (unlikely(!qc || qc->tf.ctl & ATA_NIEN))
+	if (unlikely(!qc))
 		goto freeze;
 
+	if (unlikely(qc->tf.flags & ATA_TFLAG_POLLING)) {
+		/* this sometimes happens, just clear IRQ */
+		ata_chk_status(ap);
+		return;
+	}
+
 	/* Check whether we are expecting interrupt in this state */
 	switch (ap->hsm_task_state) {
 	case HSM_ST_FIRST:
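
The decision the patched handler makes can be sketched outside the driver as a small classifier. This is only an illustration: enum intr_action, classify_intr() and the TFLAG_POLLING value are hypothetical, and the real flag is ATA_TFLAG_POLLING from libata.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value for this sketch; the real one lives in libata. */
#define TFLAG_POLLING (1u << 0)

enum intr_action {
	INTR_FREEZE,	/* no active command: treat as an HSM violation */
	INTR_CLEAR,	/* polled command: read status to clear the IRQ, return */
	INTR_HANDLE	/* interrupt-driven command: go on to the state machine */
};

static enum intr_action classify_intr(bool have_qc, unsigned int tf_flags)
{
	if (!have_qc)
		return INTR_FREEZE;
	if (tf_flags & TFLAG_POLLING)
		return INTR_CLEAR;
	return INTR_HANDLE;
}
```

Before the patch, a polled command with an unexpected IRQ took the freeze path; with it, the IRQ is acknowledged and the polling code carries on.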

--
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch 005/101] pata_amd: fix an obvious bug in cable detection

2007-03-07 Thread Greg KH

80c test mask is at bits 18 and 19 of EIDE Controller Configuration
not 22 and 23.  Fix it.

Signed-off-by: Tejun Heo [EMAIL PROTECTED]
Acked-by: Alan Cox [EMAIL PROTECTED]

---
 drivers/ata/pata_amd.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-2.6.20.1.orig/drivers/ata/pata_amd.c
+++ linux-2.6.20.1/drivers/ata/pata_amd.c
@@ -128,7 +128,7 @@ static void timing_setup(struct ata_port
 
 static int amd_pre_reset(struct ata_port *ap)
 {
-	static const u32 bitmask[2] = {0x03, 0xC0};
+	static const u32 bitmask[2] = {0x03, 0x0C};
 	static const struct pci_bits amd_enable_bits[] = {
 		{ 0x40, 1, 0x02, 0x02 },
 		{ 0x40, 1, 0x01, 0x01 }
@@ -247,7 +247,7 @@ static void amd133_set_dmamode(struct at
  */
 
 static int nv_pre_reset(struct ata_port *ap) {
-	static const u8 bitmask[2] = {0x03, 0xC0};
+	static const u8 bitmask[2] = {0x03, 0x0C};
 	static const struct pci_bits nv_enable_bits[] = {
 		{ 0x50, 1, 0x02, 0x02 },
 		{ 0x50, 1, 0x01, 0x01 }
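
The relation between the byte masks in the patch and the bit numbers in the description can be checked with a few lines of C. The sketch assumes the driver tests the mask against a single byte read from offset 2 of the 32-bit configuration register (bits 23:16); byte_mask_to_dword_mask() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Translate a mask applied to one byte of a little-endian 32-bit
 * register into the equivalent mask on the whole register.
 * byte_index is 0 for bits 7:0, 1 for bits 15:8, and so on.
 */
static uint32_t byte_mask_to_dword_mask(uint8_t mask, unsigned int byte_index)
{
	assert(byte_index < 4);
	return (uint32_t)mask << (8 * byte_index);
}
```

Under that assumption, 0x0C selects bits 18 and 19 of the register, while the old 0xC0 selected bits 22 and 23.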



regression: SATA dead after resume from RAM 2.6.21-rc3

2007-03-07 Thread Mark Lord

Mmm.. like others, I've now been bitten by what looks like
a SATA failure on resume from RAM, with 2.6.21-rc3.

I don't have enough info to blame this specific -rc* kernel,
as it has only done it once to me so far.

So, a datapoint, but not much of a clue beyond that.
Unless it happens again.

Yes, the GUI did come back from suspend, but the disk
(ICH6M) did not seem to want to talk to anything afterwards,
and the system hung on a manual alt-sysrq-Sync.
I waited only about a minute or so before rebooting.

Here's the lspci and the last kernel logs from around
the suspend/resume before I rebooted.

# lspci
:00:00.0 Host bridge: Intel Corporation Mobile 915GM/PM/GMS/910GML Express 
Processor to DRAM Controller (rev 03)
:00:01.0 PCI bridge: Intel Corporation Mobile 915GM/PM Express PCI Express 
Root Port (rev 03)
:00:1d.0 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 
Family) USB UHCI #1 (rev 03)
:00:1d.1 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 
Family) USB UHCI #2 (rev 03)
:00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 
Family) USB UHCI #3 (rev 03)
:00:1d.3 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 
Family) USB UHCI #4 (rev 03)
:00:1d.7 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 
Family) USB2 EHCI Controller (rev 03)
:00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev d3)
:00:1e.2 Multimedia audio controller: Intel Corporation 
82801FB/FBM/FR/FW/FRW (ICH6 Family) AC'97 Audio Controller (rev 03)
:00:1f.0 ISA bridge: Intel Corporation 82801FBM (ICH6M) LPC Interface 
Bridge (rev 03)
:00:1f.2 IDE interface: Intel Corporation 82801FBM (ICH6M) SATA Controller 
(rev 03)
:00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus 
Controller (rev 03)
:01:00.0 VGA compatible controller: ATI Technologies Inc M22 [Radeon 
Mobility M300]
:03:00.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX 
(rev 02)
:03:01.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev b3)
:03:01.1 FireWire (IEEE 1394): Ricoh Co Ltd R5C552 IEEE 1394 Controller 
(rev 08)
:03:01.2 0805: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 
17)
:03:03.0 Network controller: Intel Corporation PRO/Wireless 2915ABG MiniPCI 
Adapter (rev 05)

14:28:37 gconfd (root-10138): starting (version 2.14.0), pid 10138 user 'root'
14:28:37 gconfd (root-10138): Resolved address 
xml:readonly:/etc/gconf/gconf.xml.mandatory to a read-only configuration 
source at position 0
14:28:37 gconfd (root-10138): Resolved address xml:readwrite:/root/.gconf to 
a writable configuration source at position 1
14:28:37 gconfd (root-10138): Resolved address 
xml:readonly:/etc/gconf/gconf.xml.defaults to a read-only configuration 
source at position 2
14:28:37 gconfd (root-10138): Resolved address 
xml:readonly:/var/lib/gconf/debian.defaults to a read-only configuration 
source at position 3
14:28:37 gconfd (root-10138): Resolved address 
xml:readonly:/var/lib/gconf/defaults to a read-only configuration source at 
position 4
14:32:43 gconfd (root-10138): GConf server is not in use, shutting down.
14:32:43 gconfd (root-10138): Exiting
14:32:43 kernel: Stopping tasks ... done.
14:32:43 kernel: Suspending console(s)
14:32:43 kernel: pl2303 5-1.3:1.0: no suspend for driver pl2303?
14:32:43 kernel: ACPI: PCI interrupt for device :03:01.2 disabled
14:32:43 kernel: ACPI: PCI interrupt for device :03:00.0 disabled
14:32:43 kernel: ACPI: PCI interrupt for device :00:1f.2 disabled
14:32:43 kernel: ACPI: PCI interrupt for device :00:1e.2 disabled
14:32:43 kernel: ACPI: PCI interrupt for device :00:1d.7 disabled
14:32:43 kernel: ACPI: PCI interrupt for device :00:1d.3 disabled
14:32:43 kernel: ACPI: PCI interrupt for device :00:1d.2 disabled
14:32:43 kernel: ACPI: PCI interrupt for device :00:1d.1 disabled
14:32:43 kernel: ACPI: PCI interrupt for device :00:1d.0 disabled
14:32:43 kernel: Intel machine check architecture supported.
14:32:43 kernel: Intel machine check reporting enabled on CPU#0.
14:32:43 kernel: Back to C!
14:32:43 kernel: PM: Writing back config space on device :00:01.0 at offset 
3 (was 1, writing 10010)
14:32:43 kernel: PCI: Setting latency timer of device :00:01.0 to 64
14:32:43 kernel: ACPI: PCI Interrupt :00:1d.0[A] - GSI 16 (level, low) - 
IRQ 16
14:32:43 kernel: PCI: Setting latency timer of device :00:1d.0 to 64
14:32:43 kernel: usb usb1: root hub lost power or was reset
14:32:43 kernel: PCI: Enabling device :00:1d.1 ( - 0001)
14:32:43 kernel: ACPI: PCI Interrupt :00:1d.1[B] - GSI 17 (level, low) - 
IRQ 18
14:32:43 kernel: PCI: Setting latency timer of device :00:1d.1 to 64
14:32:43 kernel: PM: Writing back config space on device :00:1d.1 at offset 
f (was 200, writing 20a)
14:32:43 kernel: PM: Writing back config space on device :00:1d.1 at offset 
8 (was 1, writing bf61)

Re: [PATCH 3/3] Use correct IDE error recovery

2007-03-07 Thread Bartlomiej Zolnierkiewicz

Hi,

(sorry for the long delay)

On Wednesday 21 February 2007, Suleiman Souhlal wrote:
 IDE error recovery is using WIN_IDLEIMMEDIATE which was only valid for
 IDE V1 and IDE V2.  Modern drives will not be able to recover using
 this error handling.  The correct thing to do is issue a SRST followed
 by a SET_FEATURES.

This change looks fine; indeed, we are better off using SRST + SET_FEATURES
than IDLE_IMMEDIATE.

 Signed-off-by:Suleiman Souhlal [EMAIL PROTECTED]
 
 ---
  drivers/ide/ide-io.c   |   35 +++-
  drivers/ide/ide-iops.c |  105 
 
  include/linux/ide.h|2 +
  3 files changed, 88 insertions(+), 54 deletions(-)
 
 diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
 index c193553..2f05b4d 100644
 --- a/drivers/ide/ide-io.c
 +++ b/drivers/ide/ide-io.c
 @@ -519,21 +519,21 @@ static ide_startstop_t ide_ata_error(ide
   if ((stat & DRQ_STAT) && rq_data_dir(rq) == READ &&
 hwif->err_stops_fifo == 0)
   try_to_flush_leftover_data(drive);
  
 + if (rq->errors >= ERROR_MAX || blk_noretry_request(rq)) {
 + ide_kill_rq(drive, rq);
 + return ide_stopped;
 + }
 +
   if (hwif->INB(IDE_STATUS_REG) & (BUSY_STAT|DRQ_STAT))
 - /* force an abort */
 - hwif->OUTB(WIN_IDLEIMMEDIATE, IDE_COMMAND_REG);
 + rq->errors |= ERROR_RESET;
  
 - if (rq->errors >= ERROR_MAX || blk_noretry_request(rq))
 - ide_kill_rq(drive, rq);
 - else {
 - if ((rq->errors & ERROR_RESET) == ERROR_RESET) {
 - ++rq->errors;
 - return ide_do_reset(drive);
 - }
 - if ((rq->errors & ERROR_RECAL) == ERROR_RECAL)
 - drive->special.b.recalibrate = 1;

Is the removal of ERROR_RECAL handling intentional?
There is nothing about it in the patch description...

 + if ((rq->errors & ERROR_RESET) == ERROR_RESET) {
   ++rq->errors;
 + return ide_do_reset(drive);
   }
 +
 + ++rq->errors;
 +
   return ide_stopped;
  }
  
 @@ -586,6 +586,13 @@ EXPORT_SYMBOL_GPL(__ide_error);
   *   both new-style (taskfile) and old style command handling here.
   *   In the case of taskfile command handling there is work left to
   *   do
 + *   This used to send a idle immediate to the drive if the drive was
 + *   busy or had drq set.  This violates the ATA spec (can only send IDLE
 + *   immediate when drive is not busy) and really hoses up some drives.

Could this part of the comment be merged into the patch description?
We don't want to clutter the code with the history of the changes.

 + *   We've changed it to just do a SRST followed by a set features (set
 + *   udma mode) it those cases.  This is what Western Digital recommends

hmm, it doesn't have to be UDMA mode,
->current_speed can also be PIO/SWDMA/MWDMA

 + *   for error recovery and what Western Digital says Windows does.  It
 + *   also does not violate the ATA spec as far as I can tell.
   */

The patch fixes the code in ide_ata_error() and updates the comment
for ide_error(), but ide_atapi_error() is left untouched
(it still uses IDLE IMMEDIATE).

I suppose that ide_atapi_error() (for ATAPI devices) needs a similar fix?

  ide_startstop_t ide_error (ide_drive_t *drive, const char *msg, u8 stat)
 @@ -1004,6 +1011,12 @@ #endif
   goto kill_rq;
   }
  
 + /* We reset the drive so we need to issue a SETFEATURES. */
 + if ((drive->current_speed == 0xff) &&
 + ((rq->cmd_type == REQ_TYPE_ATA_CMD) ||
 + (rq->cmd_type == REQ_TYPE_ATA_TASK)))
 + ide_config_drive_speed_irq(drive, drive->desired_speed);

Please update the patch to not depend on ide_config_drive_speed() fixes
[PATCH 2/3] which need more work (shouldn't be a problem as the code here
uses _irq variant anyway).

Please respin the patch so I could merge it.

Thanks,
Bart


Re: [PATCH 3/3] Use correct IDE error recovery

2007-03-07 Thread Alan Cox
 On Wednesday 21 February 2007, Suleiman Souhlal wrote:
  IDE error recovery is using WIN_IDLEIMMEDIATE which was only valid for
  IDE V1 and IDE V2.  Modern drives will not be able to recover using
  this error handling.  The correct thing to do is issue a SRST followed
  by a SET_FEATURES.
 
 This change looks fine; indeed, we are better off using SRST + SET_FEATURES
 than IDLE_IMMEDIATE.
 
  Signed-off-by:  Suleiman Souhlal [EMAIL PROTECTED]

Acked-by: Alan Cox [EMAIL PROTECTED]

And this is well worth doing - IDLEIMMEDIATE blows the mind of some later
drive firmware that doesn't expect to be treated in an IDE v1 manner.


Alan


Re: [2/6] 2.6.21-rc2: known regressions

2007-03-07 Thread Jeff Garzik

Adrian Bunk wrote:

Subject: AT keyboard only works with pci=noacpi
References : http://lkml.org/lkml/2007/3/3/68
Submitter  : Ash Milsted [EMAIL PROTECTED]
Status : unknown



sounds like a BIOS bug, even though it appears to be a regression?



Re: 2.6.21-rc2 : Oops in rtc_cmos...

2007-03-07 Thread Alessandro Zummo
On Wed, 7 Mar 2007 05:42:13 +0100
Paul Rolland [EMAIL PROTECTED] wrote:

 Hello,
 
   Yes, it does, so it's a Good One (tm),
  
  And points out that $SUBJECT is misleading; the root cause of
  the oops isn't rtc_cmos.  Workaround, don't enable the legacy
  driver for this hardware.
 
 Well, sorry for that, but my point was that without enabling
 CONFIG_DRV_RTC_CMOS and only using CONFIG_RTC, my dmesg says :
 
 drivers/rtc/hctosys.c: unable to open rtc device (rtc0)

 yep. the layer that copies the hw clock to the system clock
 is saying that it cannot find any driver to work on. so
 you made the correct move in searching a driver. :)

   drivers/rtc/hctosys.c: unable to open rtc device (rtc0) 
  Because probing 00:03 failed, was never fully usable.
  So then rtc0 couldn't be found.  You'd get the same
  message if, say, the RTC was loaded as a module.
 
 It seems to me that the DRV_RTC_CMOS and the standard CONFIG_RTC
 shouldn't be used at the same time... Am I correct on that ? 
 Wouldn't it be better to have this dependency enforced?

 I will try to push a patch asap.

-- 

 Best regards,

 Alessandro Zummo,
  Tower Technologies - Torino, Italy

  http://www.towertech.it



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 08:08:53AM +0100, Nick Piggin wrote:
 On Tue, Mar 06, 2007 at 10:51:01PM -0800, Andrew Morton wrote:
 
  This patch seems to churn things around an awful lot for minimal benefit.
 
 Well it fixes the whole design of the nonlinear fault path.

If it doesn't look very impressive, it could be because it leaves all
the old crud around for backwards compatibility (the worst offenders
are removed in patch 6/6).

If you look at the patchset as a whole, it removes about 250 lines,
mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c
fremap.c, that is nonlinear pages specific and doesn't get anywhere
near the testing that the linear fault path does.

A minimal fix for nonlinear pages would have required changing all
->populate handlers, which I simply thought was not very productive
considering the testing and coverage issues, and that I was going to
rewrite the nonlinear path anyway.

If you like, you can consider patches 1,2,3 as the fix, and ignore
nonlinear (hey, it doesn't even bother checking truncate_count today!).

Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I thought
you would have liked the patches...



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Ingo Molnar

* Nick Piggin [EMAIL PROTECTED] wrote:

 If it doesn't look very impressive, it could be because it leaves all 
 the old crud around for backwards compatibility (the worst offenders 
 are removed in patch 6/6).
 
 If you look at the patchset as a whole, it removes about 250 lines, 
 mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c 
 fremap.c, that is nonlinear pages specific and doesn't get anywhere 
 near the testing that the linear fault path does.
 
 A minimal fix for nonlinear pages would have required changing all 
 ->populate handlers, which I simply thought was not very productive
 considering the testing and coverage issues, and that I was going to 
 rewrite the nonlinear path anyway.
 
 If you like, you can consider patches 1,2,3 as the fix, and ignore 
 nonlinear (hey, it doesn't even bother checking truncate_count 
 today!).
 
 Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I 
 thought you would have liked the patches...

btw., if we decide that nonlinear isn't worth the continuing maintenance
pain, we could internally implement/emulate sys_remap_file_pages() via a 
call to mremap() and essentially deprecate it, without breaking the ABI 
- and remove all the nonlinear code. (This would split fremap areas into 
separate vmas)
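
What "split fremap areas into separate vmas" means can be illustrated from userspace. The sketch below does not use mremap() or the kernel-internal emulation suggested above; it swaps in plain mmap() with MAP_FIXED, one call per page, to build a window whose pages show a file in arbitrary order. make_pattern_file() and map_pages_in_order() are hypothetical helpers, and the /tmp path is an assumption.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create an unlinked temp file of npages pages; page k is filled with 'A' + k. */
static int make_pattern_file(size_t npages)
{
	char path[] = "/tmp/nonlinear-XXXXXX";
	long ps = sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	for (size_t k = 0; k < npages; k++) {
		char *page = malloc((size_t)ps);
		ssize_t n;

		if (!page) {
			close(fd);
			return -1;
		}
		memset(page, 'A' + (int)k, (size_t)ps);
		n = write(fd, page, (size_t)ps);
		free(page);
		if (n != ps) {
			close(fd);
			return -1;
		}
	}
	return fd;
}

/*
 * Build a window of n pages in which window page i shows file page
 * order[i]. Each MAP_FIXED call replaces one page of the reservation
 * with its own file mapping.
 */
static char *map_pages_in_order(int fd, const size_t *order, size_t n)
{
	long ps = sysconf(_SC_PAGESIZE);
	char *base;

	assert(order != NULL && n > 0);
	base = mmap(NULL, n * (size_t)ps, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	for (size_t i = 0; i < n; i++) {
		void *p = mmap(base + i * (size_t)ps, (size_t)ps, PROT_READ,
			       MAP_SHARED | MAP_FIXED, fd,
			       (off_t)(order[i] * (size_t)ps));
		if (p == MAP_FAILED) {
			munmap(base, n * (size_t)ps);
			return NULL;
		}
	}
	return base;
}
```

Neighbouring pages whose file offsets are not contiguous cannot share a vma, so a heavily shuffled window costs roughly one vma per page. That is the price of dropping the nonlinear code.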

Ingo


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar [EMAIL PROTECTED] wrote:

 
 * Nick Piggin [EMAIL PROTECTED] wrote:
 
  If it doesn't look very impressive, it could be because it leaves all 
  the old crud around for backwards compatibility (the worst offenders 
  are removed in patch 6/6).
  
  If you look at the patchset as a whole, it removes about 250 lines, 
  mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c 
  fremap.c, that is nonlinear pages specific and doesn't get anywhere 
  near the testing that the linear fault path does.
  
  A minimal fix for nonlinear pages would have required changing all 
  ->populate handlers, which I simply thought was not very productive
  considering the testing and coverage issues, and that I was going to 
  rewrite the nonlinear path anyway.
  
  If you like, you can consider patches 1,2,3 as the fix, and ignore 
  nonlinear (hey, it doesn't even bother checking truncate_count 
  today!).
  
  Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I 
  thought you would have liked the patches...
 
 btw., if we decide that nonlinear isn't worth the continuing maintenance
 pain, we could internally implement/emulate sys_remap_file_pages() via a 
 call to mremap() and essentially deprecate it, without breaking the ABI 
 - and remove all the nonlinear code. (This would split fremap areas into 
 separate vmas)
 

I'm rather regretting having merged it - I don't think it has been used for
much.

Paolo's UML speedup patches might use nonlinear though.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
  If it doesn't look very impressive, it could be because it leaves all 
  the old crud around for backwards compatibility (the worst offenders 
  are removed in patch 6/6).
  
  If you look at the patchset as a whole, it removes about 250 lines, 
  mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c 
  fremap.c, that is nonlinear pages specific and doesn't get anywhere 
  near the testing that the linear fault path does.
  
  A minimal fix for nonlinear pages would have required changing all 
  ->populate handlers, which I simply thought was not very productive
  considering the testing and coverage issues, and that I was going to 
  rewrite the nonlinear path anyway.
  
  If you like, you can consider patches 1,2,3 as the fix, and ignore 
  nonlinear (hey, it doesn't even bother checking truncate_count 
  today!).
  
  Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I 
  thought you would have liked the patches...
 
 btw., if we decide that nonlinear isn't worth the continuing maintenance
 pain, we could internally implement/emulate sys_remap_file_pages() via a 
 call to mremap() and essentially deprecate it, without breaking the ABI 
 - and remove all the nonlinear code. (This would split fremap areas into 
 separate vmas)

That would make sense.  Dirty page accounting doesn't work either on
non-linear mappings, and I can't see how that could be fixed in any
other way.

Miklos


Re: 2.6.21-rc2-mm2

2007-03-07 Thread Sébastien Dugué

  Hi Andrew,

On Tue, 6 Mar 2007 00:44:08 -0800 Andrew Morton [EMAIL PROTECTED] wrote:

 
 Temporarily at
 
   http://userweb.kernel.org/~akpm/2.6.21-rc2-mm2/
 
 Will appear later at
 
   
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc2/2.6.20-rc2-mm2/
 
 
 - git-block.patch is having problems which are getting in the way - it has
   been dropped.
 
 - As a consequence all the AIO patches were dropped
 

  Why? The aio notification and listio patches have nothing to do with
git-block.patch.

  If you think those patches will be rendered obsolete by the new
syslet/fibril/whatever approach, then fine. Otherwise do you expect me to
resubmit?

  Thanks,

  Sébastien.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 07 Mar 2007 09:38:34 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:

 Dirty page accounting doesn't work either on
 non-linear mappings

It doesn't?  Confused - these things don't have anything to do with each
other do they?


Re: 2.6.21-rc2-mm2

2007-03-07 Thread Andrew Morton
On Wed, 7 Mar 2007 09:39:48 +0100 Sébastien Dugué [EMAIL PROTECTED] wrote:

 
   Hi Andrew,
 
 On Tue, 6 Mar 2007 00:44:08 -0800 Andrew Morton [EMAIL PROTECTED] wrote:
 
  
  Temporarily at
  
http://userweb.kernel.org/~akpm/2.6.21-rc2-mm2/
  
  Will appear later at
  

  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc2/2.6.20-rc2-mm2/
  
  
  - git-block.patch is having problems which are getting in the way - it has
been dropped.
  
  - As a consequence all the AIO patches were dropped
  
 
   Why? The aio notification and listio patches have nothing to do with
 git-block.patch.

Yes, I could have retained those with a little bit of jiggling.  But...

   If you think those patches will be rendered obsolete by the new
 syslet/fibril/whatever approach, then fine.

Until this is sorted out I don't think we can add new core AIO code. 
Particularly not new syscalls.

 Otherwise do you expect me to
 resubmit?

Is OK for now, I think.


Re: [PATCH 2/3] msi: Fixup the msi enable/disable logic

2007-03-07 Thread Michael Ellerman
On Tue, 2007-03-06 at 22:19 -0700, Eric W. Biederman wrote:
 Michael Ellerman [EMAIL PROTECTED] writes:
 
 
  Hi Eric, comments below ..
 
 
  I get the reasoning for disabling MSI before we start writing back the
  config space, but don't we want to re-enable MSI on the way out?
 
 We are restoring the entire msi flags register which includes the enable bit,
 setting it a second time is gratuitous.
 
 In addition, if we are restoring the register when the enable bit is not set
 (because we don't have a mask bit), enabling the msi state is actually
 the wrong thing to do.  But I admit that case can only happen after
 the additions in my last patch.

Yeah, duh.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
  Dirty page accounting doesn't work either on
  non-linear mappings
 
 It doesn't?  Confused - these things don't have anything to do with each
 other do they?

Look in page_mkclean().  Where does it handle non-linear mappings?

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Ingo Molnar

* Andrew Morton [EMAIL PROTECTED] wrote:

  btw., if we decide that nonlinear isn't worth the continuing
  maintenance pain, we could internally implement/emulate
  sys_remap_file_pages() via a call to mremap() and essentially 
  deprecate it, without breaking the ABI - and remove all the 
  nonlinear code. (This would split fremap areas into separate vmas)
  
 
 I'm rather regretting having merged it - I don't think it has been 
 used for much.
 
 Paolo's UML speedup patches might use nonlinear though.

yes, i wrote the first, prototype version of that for UML, it needs an 
extended version of the syscall, sys_remap_file_pages_prot():

 
http://redhat.com/~mingo/remap-file-pages-patches/remap-file-pages-prot-2.6.4-rc1-mm1-A1

i also wrote an x86 hypervisor kind of thing for UML, called 
'sys_vcpu()', which allows UML to execute guest user-mode in a box, 
which also relies on sys_remap_file_pages_prot():

 http://redhat.com/~mingo/remap-file-pages-patches/vcpu-2.6.4-rc2-mm1-A2

which reduced the UML guest syscall overhead from 30 usecs to 4 usecs 
(with native syscalls taking 2 usecs, on the box i tested, years ago).

So it certainly looked useful to me - but wasn't really picked up widely.

We'll always have the option to get rid of it (and hence completely 
reverse the decision to merge it) without breaking the ABI, by emulating 
the API via mremap(). That eliminates the UML speedup though. So no need 
to feel sorry about having merged it, we can easily revisit that 
years-old 'do we want it' decision, without any ABI worries.
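
A rough user-space sketch of the "separate vmas" idea, for illustration only. It assumes a Linux/POSIX system and uses one mmap(MAP_FIXED) call per page instead of mremap(); the helper names make_pattern_file() and map_pages_in_order() are hypothetical and appear in none of the patches discussed here. It is not how an in-kernel emulation of sys_remap_file_pages() would be written. It only shows that an arbitrary page order can be built from ordinary linear mappings, at the cost of up to one vma per page.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an unlinked temp file of npages pages whose
 * first byte in page i is i, so the mapping order can be observed.
 */
static int make_pattern_file(size_t npages)
{
	char path[] = "/tmp/fremap-sketch-XXXXXX";
	long ps = sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);
	size_t i;

	if (fd < 0)
		return -1;
	unlink(path);			/* the file lives until fd is closed */
	if (ftruncate(fd, (off_t)(npages * ps)) < 0)
		goto fail;
	for (i = 0; i < npages; i++) {
		unsigned char tag = (unsigned char)i;

		if (pwrite(fd, &tag, 1, (off_t)(i * ps)) != 1)
			goto fail;
	}
	return fd;
fail:
	close(fd);
	return -1;
}

/*
 * Build a contiguous window in which window page i shows file page order[i].
 * Each page gets its own mmap(MAP_FIXED) call, so the kernel tracks separate
 * linear vmas rather than one nonlinear vma. Returns MAP_FAILED on error.
 */
static void *map_pages_in_order(int fd, const size_t *order, size_t npages)
{
	long ps = sysconf(_SC_PAGESIZE);
	char *win;
	size_t i;

	assert(ps > 0 && npages > 0);

	/* Reserve address space for the whole window first. */
	win = mmap(NULL, npages * ps, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (win == MAP_FAILED)
		return MAP_FAILED;

	for (i = 0; i < npages; i++) {
		void *p = mmap(win + i * ps, ps, PROT_READ,
			       MAP_SHARED | MAP_FIXED, fd,
			       (off_t)(order[i] * ps));

		if (p == MAP_FAILED) {
			munmap(win, npages * ps);
			return MAP_FAILED;
		}
	}
	return win;
}
```

With order = {3, 1, 0, 2}, the first byte of each window page reads back as 3, 1, 0, 2. The vma count grows with the number of discontiguous pages, which is the overhead remap_file_pages() was meant to avoid.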

Ingo


Re: [patch] epoll use a single inode ...

2007-03-07 Thread Christoph Hellwig
On Wed, Mar 07, 2007 at 08:16:14AM +0100, Eric Dumazet wrote:
 Crazy ideas : (some readers are going to kill me)
 
 1) Use the low order bit of f_path.dentry to say : this pointer is not a 
 pointer to a dentry but the inode pointer (with the low order bit set to 1)
 
 OR
 
 2) file->f_path.dentry set to NULL for these special files (so that we don't 
 need to dput() and cache line ping pong the common dentry each time we 
 __fput() a pipe/socket).

No way on either one.  f_path.dentry always being there is an assumption
we make all over the place, and changing that would be a big regression
for code quality and reliability all over the place.

Face it folks, memory is generally cheap, and we're not going to uglify
huge amounts of code to shave off a little bit.

[and that is only in reply to this one, the single dentry optimizations
 for epoll and friends are perfectly fine from the high level view]
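
To make the objection concrete, here is a small user-space sketch of proposal 1 (tagging the low-order bit). The struct layouts and the helpers file_set_dentry(), file_set_inode(), file_inode() and file_dentry() are hypothetical stand-ins, not the kernel's definitions. The point it illustrates is that once the slot can hold either kind of pointer, every reader has to test the tag before dereferencing, and any caller expecting a dentry must cope with getting none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct inode  { int i_ino; };
struct dentry { struct inode *d_inode; };

/* The slot holds either a dentry pointer or an inode pointer tagged in bit 0. */
struct file { uintptr_t f_dentry_or_inode; };

#define PTR_IS_INODE ((uintptr_t)1)

static void file_set_dentry(struct file *f, struct dentry *d)
{
	/* Tagging only works if bit 0 of a valid pointer is always clear. */
	assert(((uintptr_t)d & PTR_IS_INODE) == 0);
	f->f_dentry_or_inode = (uintptr_t)d;
}

static void file_set_inode(struct file *f, struct inode *i)
{
	assert(((uintptr_t)i & PTR_IS_INODE) == 0);
	f->f_dentry_or_inode = (uintptr_t)i | PTR_IS_INODE;
}

/* Every reader must now check the tag before it may dereference. */
static struct inode *file_inode(const struct file *f)
{
	uintptr_t v = f->f_dentry_or_inode;

	if (v & PTR_IS_INODE)
		return (struct inode *)(v & ~PTR_IS_INODE);
	return ((struct dentry *)v)->d_inode;
}

/* Callers that want the dentry can no longer assume there is one. */
static struct dentry *file_dentry(const struct file *f)
{
	uintptr_t v = f->f_dentry_or_inode;

	return (v & PTR_IS_INODE) ? NULL : (struct dentry *)v;
}
```

In this model any code path that uses the dentry slot directly, without going through such accessors, would dereference a tagged pointer. That is the kind of breakage the "assumption we make all over the place" remark refers to.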

 Same trick could be used for file->f_path.mnt, because there is a big SMP 
 cache line ping/pong to maintain a mnt_count on pipe/sockets mountpoint 
 while these file systems cannot be un-mounted.

Same thing as above.  We might do a hack to not refcount these vfsmounts,
but we definitely want to keep the invariant of f_path.mnt never
being NULL.



Re: [Cbe-oss-dev] [PATCH 14/22] spufs: use SPU master control to prevent wild SPU execution

2007-03-07 Thread Michael Ellerman
On Mon, 2007-03-05 at 02:02 +0100, Arnd Bergmann wrote:
 On Friday 02 March 2007, Michael Ellerman wrote:
  There's also the error case for spu_run_init() which skips the master
  stop. I guess that's ok because we've only set the master control in the
  backing store, and the only way that will ever get propagated to an
  actual spu is by coming back through spufs_run_spu().
 
 Hmm, the correct way would be to switch off the master control in there,
 afaics. Fixing it only in spu_run_init would mean that we also handle
 the case of spu_reacquire_runnable along with it.
 
  What originally caught my eye on this was the output from xmon. When we
  drop into xmon with no spu programs running and stop the spus, it
  reports that they _all_ have the master run enabled,
 
 That looks right, there is no problem to have master control enabled,
 as long as user space can't access the spu through a context that is
 bound to it.
 
  and some of them 
  have the runcntl enabled (those that have had spu programs run on them
  since boot it seems).
 
 While this sounds wrong. Maybe the runcntl is active on those that have
 _not_ run since boot, which would make more sense. We should investigate
 this.

No I'm pretty sure it's enabled on the ones that _have_ run since boot.
I'm booting up fresh, running two spu programs, and then I see two spus
with master and runcntl set.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person




Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 09:27:55AM +0100, Ingo Molnar wrote:
 
 * Nick Piggin [EMAIL PROTECTED] wrote:
 
  Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I 
  thought you would have liked the patches...
 
 btw., if we decide that nonlinear isnt worth the continuing maintenance 
 pain, we could internally implement/emulate sys_remap_file_pages() via a 
 call to mremap() and essentially deprecate it, without breaking the ABI 
 - and remove all the nonlinear code. (This would split fremap areas into 
 separate vmas)

Well I think it has a few possible uses outside the PAE database
workloads. UML for one seems to be interested... as much as I don't
use them, I think nonlinear mappings are kinda cool ;)

After these patches, I don't think there is too much burden. The main
thing left really is just the objrmap stuff, but that is just handled
with a minimal 'dumb' algorithm that doesn't cost much.

Then the core of it is just the file pte handling, which really doesn't
seem to be much problem.

Apart from a handful of trivial if (pte_file()) cases throughout mm/,
our maintenance burden basically now amounts to the following patch.
Even the rmap.c change looks bigger than it is because I split out
the nonlinear unmapping code from try_to_unmap_file. Not too bad, eh? :)

--

 include/asm-powerpc/pgtable.h |   12 
 mm/Kconfig                    |    6 ++
 mm/Makefile                   |    6 +-
 mm/rmap.c                     |  101 +-
 4 files changed, 83 insertions(+), 42 deletions(-)

Index: linux-2.6/include/asm-powerpc/pgtable.h
===
--- linux-2.6.orig/include/asm-powerpc/pgtable.h
+++ linux-2.6/include/asm-powerpc/pgtable.h
@@ -243,7 +243,12 @@ static inline int pte_write(pte_t pte) {
 static inline int pte_exec(pte_t pte)  { return pte_val(pte) & _PAGE_EXEC;}
 static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY;}
 static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED;}
+
+#ifdef CONFIG_NONLINEAR
 static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE;}
+#else
+static inline int pte_file(pte_t pte) { return 0; }
+#endif
 
 static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; }
 static inline void pte_cache(pte_t pte)   { pte_val(pte) &= ~_PAGE_NO_CACHE; }
@@ -483,9 +488,16 @@ extern void update_mmu_cache(struct vm_a
 #define __swp_entry(type, offset) ((swp_entry_t){((type) << 1)|((offset) << 8)})
 #define __pte_to_swp_entry(pte)((swp_entry_t){pte_val(pte) >> PTE_RPN_SHIFT})
 #define __swp_entry_to_pte(x)  ((pte_t) { (x).val << PTE_RPN_SHIFT })
+
+#ifdef CONFIG_NONLINEAR
 #define pte_to_pgoff(pte)  (pte_val(pte) >> PTE_RPN_SHIFT)
 #define pgoff_to_pte(off)  ((pte_t) {((off) << PTE_RPN_SHIFT)|_PAGE_FILE})
 #define PTE_FILE_MAX_BITS  (BITS_PER_LONG - PTE_RPN_SHIFT)
+#else
+#define pte_to_pgoff(pte)  ({BUG(); -1;})
+#define pgoff_to_pte(off)  ({BUG(); (pte_t){-1};})
+#define PTE_FILE_MAX_BITS  0
+#endif
 
 /*
  * kern_addr_valid is intended to indicate whether an address is a valid
Index: linux-2.6/mm/Kconfig
===
--- linux-2.6.orig/mm/Kconfig
+++ linux-2.6/mm/Kconfig
@@ -142,6 +142,12 @@ config SPLIT_PTLOCK_CPUS
 #
 # support for page migration
 #
+config NONLINEAR
+   bool "Non linear mappings"
+   def_bool y
+   help
+ Provides support for the remap_file_pages syscall.
+
 config MIGRATION
bool "Page migration"
def_bool y
Index: linux-2.6/mm/Makefile
===
--- linux-2.6.orig/mm/Makefile
+++ linux-2.6/mm/Makefile
@@ -3,9 +3,8 @@
 #
 
 mmu-y  := nommu.o
-mmu-$(CONFIG_MMU)  := fremap.o highmem.o madvise.o memory.o mincore.o \
-  mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
-  vmalloc.o
+mmu-$(CONFIG_MMU)  := highmem.o madvise.o memory.o mincore.o mlock.o \
+  mmap.o mprotect.o mremap.o msync.o rmap.o vmalloc.o
 
 obj-y  := bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
   page_alloc.o page-writeback.o pdflush.o \
@@ -27,5 +26,6 @@ obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
+obj-$(CONFIG_NONLINEAR) += fremap.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -756,6 +756,7 @@ out:
return ret;
 }
 
+#ifdef CONFIG_NONLINEAR
 /*
  * objrmap doesn't work for nonlinear VMAs because the assumption that
  * offset-into-file correlates with offset-into-virtual-addresses does not hold.
@@ -845,53 +846,18 @@ static void 

Re: [SLUB 2/3] Large kmalloc pass through. Removal of large general slabs

2007-03-07 Thread Peter Zijlstra
On Tue, 2007-03-06 at 18:35 -0800, Christoph Lameter wrote:
 Unlimited kmalloc size and removal of general caches >=4.
 
 We can directly use the page allocator for all allocations 4K and larger. This
 means that no general slabs are necessary and the size of the allocation passed
 to kmalloc() can be arbitrarily large. Remove the useless general caches over 4k.
 

 Index: linux-2.6.21-rc2-mm1/include/linux/slub_def.h
 ===
 --- linux-2.6.21-rc2-mm1.orig/include/linux/slub_def.h 2007-03-06 17:56:14.0 -0800
 +++ linux-2.6.21-rc2-mm1/include/linux/slub_def.h 2007-03-06 17:57:11.0 -0800
 @@ -55,7 +55,7 @@ struct kmem_cache {
   */
  #define KMALLOC_SHIFT_LOW 3
  
 -#define KMALLOC_SHIFT_HIGH 18
 +#define KMALLOC_SHIFT_HIGH 11
  
  #if L1_CACHE_BYTES <= 64
  #define KMALLOC_EXTRAS 2
 @@ -93,13 +93,6 @@ static inline int kmalloc_index(int size
   if (size <=  512) return 9;
   if (size <= 1024) return 10;
   if (size <= 2048) return 11;
 - if (size <= 4096) return 12;
 - if (size <=   8 * 1024) return 13;
 - if (size <=  16 * 1024) return 14;
 - if (size <=  32 * 1024) return 15;
 - if (size <=  64 * 1024) return 16;
 - if (size <= 128 * 1024) return 17;
 - if (size <= 256 * 1024) return 18;
   return -1;
  }

Perhaps do something with PAGE_SIZE here, as you know there are
platforms/configs where PAGE_SIZE != 4k :-)
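
One way the PAGE_SIZE suggestion could look, as a stand-alone sketch rather than the actual SLUB code: derive the highest slab-backed size from the page shift instead of hard-coding 11. SKETCH_PAGE_SHIFT and kmalloc_index_sketch() are hypothetical names used so this compiles outside the kernel, and the sketch ignores the KMALLOC_EXTRAS (non-power-of-two) caches from the quoted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for the kernel's PAGE_SHIFT; 12 means 4 KiB pages. */
#ifndef SKETCH_PAGE_SHIFT
#define SKETCH_PAGE_SHIFT 12
#endif

#define KMALLOC_SHIFT_LOW  3
/* Largest slab-backed size is half a page; anything above goes to the page allocator. */
#define KMALLOC_SHIFT_HIGH (SKETCH_PAGE_SHIFT - 1)

/*
 * Return log2 of the smallest power-of-two cache that fits `size`,
 * or -1 if the request should bypass the slab caches entirely.
 */
static int kmalloc_index_sketch(size_t size)
{
	int shift;

	assert(KMALLOC_SHIFT_HIGH >= KMALLOC_SHIFT_LOW);

	for (shift = KMALLOC_SHIFT_LOW; shift <= KMALLOC_SHIFT_HIGH; shift++)
		if (size <= ((size_t)1 << shift))
			return shift;
	return -1;
}
```

With a 4 KiB page this gives the same cutoff as the quoted patch (2048 bytes maps to index 11, larger requests return -1). Building with -DSKETCH_PAGE_SHIFT=16 for a 64 KiB page moves the cutoff to 32 KiB without touching the function.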



Re: [RFC] BadRAM still not ready for inclusion ? (was: Re: Free Linux Driver Development!)

2007-03-07 Thread Pavel Machek
Hi!

 What's so hard about submitting a 200 line patch to LKML?
 
 that's what i was wondering about ;D
 it's not the first time, that i see nice features not being submitted to lkml.
 
 anyway - pavel (thanks btw!) just pointed me to some param mem=exactmap :
 
 I think functionality is there in latest vanila with mem=exactmap even
 w/o patches.
 
 so, maybe we need to find out _if_ and _how_ this could make BadRAM obsolete 
 (i.e. if this is a good alternative) ?
 (so we won't need to discuss about BadRAM inclusion anymore. :)
 
 what i found is:
 
 memmap=exactmap [KNL,IA-32,X86_64] Enable setting of an exact
         E820 memory map, as specified by the user.
         Such memmap=exactmap lines can be constructed based on
         BIOS output or other requirements. See the memmap=nn@ss
         option description.
 
 memmap=nn[KMG]@ss[KMG]
         [KNL] Force usage of a specific region of memory
         Region of memory to be used, from ss to ss+nn.
 
 memmap=nn[KMG]#ss[KMG]
         [KNL,ACPI] Mark specific memory as ACPI data.
         Region of memory to be used, from ss to ss+nn.
 
 memmap=nn[KMG]$ss[KMG]
         [KNL,ACPI] Mark specific memory as reserved.
         Region of memory to be used, from ss to ss+nn.

  this indeed looks like something being able to replace BadRAM, but
 the question is, how to handle/enable that and how to translate
 BadRAM patterns from memtest86 to be usable. (i.e.: writing a HowTo
 for the average user not being a kernel wizard) 

Writing a howto, or maybe writing a shell script to do a translation
:-). It will not be trivial, but certainly better than trying to push
the badram patch. Good luck ;-).

(e820 map is available from dmesg after boot, and perhaps from other
places. First, you'll need to duplicate it on cmdline using
memmap=... arguments. Then, if bad ram is in the middle of something,
you'll need to split memmap= accordingly).
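
The splitting step could be sketched roughly as below. This is only an illustration of the arithmetic, written in C rather than shell: given one usable e820 region and one bad range, it emits the memmap=nn@ss arguments for what is left. The function names are hypothetical, inputs are assumed to be byte values that are multiples of 1 KiB, and a real memmap=exactmap command line would also have to reproduce the reserved and ACPI regions, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Append one "memmap=nnK@ssK" argument (inputs in bytes) to buf at *used. */
static void append_memmap(char *buf, size_t buflen, size_t *used,
			  uint64_t start, uint64_t len)
{
	int n = snprintf(buf + *used, buflen - *used, "%smemmap=%lluK@%lluK",
			 *used ? " " : "",
			 (unsigned long long)(len >> 10),
			 (unsigned long long)(start >> 10));

	assert(n > 0 && (size_t)n < buflen - *used);	/* no truncation */
	*used += (size_t)n;
}

/*
 * Write arguments covering [start, start + len) minus [bad, bad + bad_len).
 * Returns the number of memmap= arguments written (0, 1 or 2).
 */
static int split_usable_region(char *buf, size_t buflen,
			       uint64_t start, uint64_t len,
			       uint64_t bad, uint64_t bad_len)
{
	uint64_t end = start + len;
	uint64_t bad_end = bad + bad_len;
	size_t used = 0;
	int count = 0;

	assert(buflen > 0);
	assert(((start | len | bad | bad_len) & 1023) == 0);
	buf[0] = '\0';

	if (bad_end <= start || bad >= end) {
		/* The bad range misses this region: keep it whole. */
		append_memmap(buf, buflen, &used, start, len);
		return 1;
	}
	if (bad > start) {		/* usable piece below the bad range */
		append_memmap(buf, buflen, &used, start, bad - start);
		count++;
	}
	if (bad_end < end) {		/* usable piece above the bad range */
		append_memmap(buf, buflen, &used, bad_end, end - bad_end);
		count++;
	}
	return count;
}
```

For a region from 1 MiB to 512 MiB with one bad 4 KiB page at 256 MiB, this produces two arguments, one for the memory below the bad page and one for the memory above it.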
Pavel


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [patch 0/7 -rt] powerpc 2.6.20-rt8: fix build breakage for PowerPC(ppc64)

2007-03-07 Thread Ingo Molnar

* Tsutomu OWA [EMAIL PROTECTED] wrote:

 
 Hi Ingo,
 
   Please apply.
 
   This series of patches fixes build breakage on arch/powerpc with 
 realtime preempt patch.  This applies on top of linux-2.6.20 and 
 patch-2.6.20-rt8.

thanks, applied - these fixes all look straightforward and clean.

Ingo


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:

   Dirty page accounting doesn't work either on
   non-linear mappings
  
  It doesn't?  Confused - these things don't have anything to do with each
  other do they?
 
 Look in page_mkclean().  Where does it handle non-linear mappings?
 

OK, I'd forgotten about that.  It won't break dirty memory accounting,
but it'll potentially break dirty memory balancing.

If we have the wrong page (due to nonlinear), page_check_address() will
fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
algorithms and I guess it'll break the msync guarantees.

Peter, I thought we went through the nonlinear problem ages ago and decided
it was OK?


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 09:59:44AM +0100, Nick Piggin wrote:
 Apart from a handful of trivial if (pte_file()) cases throughout mm/,
 our maintenance burden basically now amounts to the following patch.
 Even the rmap.c change looks bigger than it is because I split out
 the nonlinear unmapping code from try_to_unmap_file. Not too bad, eh? :)

Oh, there is a bit more nonlinear mmap list manipulation I'd forgotten
about too... makes things a little bit worse, but not too much.


Re: [patch 0/6 -rt] powerpc 2.6.20-rt8: fix boot/runtime errors/warnings for PowerPC(ppc64)

2007-03-07 Thread Ingo Molnar

* Tsutomu OWA [EMAIL PROTECTED] wrote:

 
 Hi Ingo,
 
   Please consider for inclusion in your rt tree.
 
   This series of patches fixes boot and runtime errors/warnings for
 powerpc (esp. 64 bit).  This applies to linux-2.6.20, patch-2.6.20-rt8
 and previous my patch set;
   http://ozlabs.org/pipermail/linuxppc-dev/2007-March/032640.html
   http://lkml.org/lkml/2007/3/6/503

thanks, applied.

Ingo


Re: [RFC] [patch 4/6 -rt] powerpc 2.6.20-rt8: fix a runtime warnings for xmon

2007-03-07 Thread Ingo Molnar

* Tsutomu OWA [EMAIL PROTECTED] wrote:

 @@ -342,6 +342,7 @@ static int xmon_core(struct pt_regs *reg
  
   msr = mfmsr();
   mtmsr(msr & ~MSR_EE);   /* disable interrupts */
 + preempt_disable();

i'm not an xmon expert, but maybe it might make more sense to first 
disable preemption, then interrupts - otherwise you could be preempted 
right after having disabled these interrupts (and be scheduled to 
another CPU, etc.). What is the difference between local_irq_save() and 
the above 'disable interrupts' sequence? If it's not the same and 
xmon_core() relied on having hardirqs disabled then it might make sense 
to do a local_irq_save() there, instead of a preempt_disable().

Ingo


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 01:07:56AM -0800, Andrew Morton wrote:
 On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:
 
Dirty page accounting doesn't work either on
non-linear mappings
   
   It doesn't?  Confused - these things don't have anything to do with each
   other do they?
  
  Look in page_mkclean().  Where does it handle non-linear mappings?
  
 
 OK, I'd forgotten about that.  It won't break dirty memory accounting,
 but it'll potentially break dirty memory balancing.
 
 If we have the wrong page (due to nonlinear), page_check_address() will
 fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
 algorithms and I guess it'll break the msync guarantees.
 
 Peter, I thought we went through the nonlinear problem ages ago and decided
 it was OK?

msync breakage is bad, but otherwise I don't know that we care about
dirty page writeout efficiency.

But I think we discovered that those msync changes are bogus anyway
because there is a small race window where pte could be dirtied without
page being set dirty?




Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Ingo Molnar

* Nick Piggin [EMAIL PROTECTED] wrote:

 After these patches, I don't think there is too much burden. The main 
 thing left really is just the objrmap stuff, but that is just handled 
 with a minimal 'dumb' algorithm that doesn't cost much.

ok. What do you think about the sys_remap_file_pages_prot() thing that 
Paolo has done in a nicely split up form - does that complicate things 
in any fundamental way? That is what is useful to UML.

Ingo


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
  
  Look in page_mkclean().  Where does it handle non-linear mappings?
  
 
 OK, I'd forgotten about that.  It won't break dirty memory accounting,
 but it'll potentially break dirty memory balancing.
 
 If we have the wrong page (due to nonlinear), page_check_address() will
 fail and we'll leave the pte dirty.

It won't even get that far, because it only looks at vmas on
mapping->i_mmap, and not on i_mmap_nonlinear.

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 7 Mar 2007 10:18:23 +0100 Nick Piggin [EMAIL PROTECTED] wrote:

 On Wed, Mar 07, 2007 at 01:07:56AM -0800, Andrew Morton wrote:
  On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:
  
 Dirty page accounting doesn't work either on
 non-linear mappings

It doesn't?  Confused - these things don't have anything to do with each
other do they?
   
   Look in page_mkclean().  Where does it handle non-linear mappings?
   
  
  OK, I'd forgotten about that.  It won't break dirty memory accounting,
  but it'll potentially break dirty memory balancing.
  
  If we have the wrong page (due to nonlinear), page_check_address() will
  fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
  algorithms and I guess it'll break the msync guarantees.
  
  Peter, I thought we went through the nonlinear problem ages ago and decided
  it was OK?
 
 msync breakage is bad, but otherwise I don't know that we care about
 dirty page writeout efficiency.

Well.  We made so many changes to support the synchronous
dirty-the-page-when-we-dirty-the-pte thing that I'm rather doubtful that
the old-style approach still works.  It might seem to, most of the time. 
But if it _is_ subtly broken, boy it's going to take a long time for us to
find out.

 But I think we discovered that those msync changes are bogus anyway
 because there is a small race window where pte could be dirtied without
 page being set dirty?

Dunno, I don't recall that.  We dirty the page before the pte...


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 09:53:23AM +0100, Ingo Molnar wrote:
 
 * Andrew Morton [EMAIL PROTECTED] wrote:
 
   btw., if we decide that nonlinear isnt worth the continuing 
   maintenance pain, we could internally implement/emulate 
   sys_remap_file_pages() via a call to mremap() and essentially 
   deprecate it, without breaking the ABI - and remove all the 
   nonlinear code. (This would split fremap areas into separate vmas)
   
  
  I'm rather regretting having merged it - I don't think it has been 
  used for much.
  
  Paolo's UML speedup patches might use nonlinear though.
 
 yes, i wrote the first, prototype version of that for UML, it needs an 
 extended version of the syscall, sys_remap_file_pages_prot():
 
  
 http://redhat.com/~mingo/remap-file-pages-patches/remap-file-pages-prot-2.6.4-rc1-mm1-A1
 
 i also wrote an x86 hypervisor kind of thing for UML, called 
 'sys_vcpu()', which allows UML to execute guest user-mode in a box, 
 which also relies on sys_remap_file_pages_prot():
 
  http://redhat.com/~mingo/remap-file-pages-patches/vcpu-2.6.4-rc2-mm1-A2
 
 which reduced the UML guest syscall overhead from 30 usecs to 4 usecs 
 (with native syscalls taking 2 usecs, on the box i tested, years ago).
 
 So it certainly looked useful to me - but wasnt really picked up widely. 
 
 We'll always have the option to get rid of it (and hence completely 
 reverse the decision to merge it) without breaking the ABI, by emulating 
 the API via mremap(). That eliminates the UML speedup though. So no need 
 to feel sorry about having merged it, we can easily revisit that 
 years-old 'do we want it' decision, without any ABI worries.

Depending on whether anyone wants it, and what features they want, we
could emulate the old syscall, and make a new restricted one which is
much less intrusive.

For example, if we can operate only on MAP_ANONYMOUS memory and specify
that nonlinear mappings effectively mlock the pages, then we can get
rid of all the objrmap and unmap_mapping_range handling, forget about
the writeout and msync problems...



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
  But I think we discovered that those msync changes are bogus anyway
  because there is a small race window where pte could be dirtied without
  page being set dirty?
 
 Dunno, I don't recall that.  We dirty the page before the pte...

That's the one I just submitted a fix for ;)

  http://lkml.org/lkml/2007/3/6/308

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar [EMAIL PROTECTED] wrote:
 btw., if we decide that nonlinear isnt worth the continuing maintenance 
 pain, we could internally implement/emulate sys_remap_file_pages() via a 
 call to mremap() and essentially deprecate it, without breaking the ABI 
 - and remove all the nonlinear code. (This would split fremap areas into 
 separate vmas)

On Wed, Mar 07, 2007 at 12:35:20AM -0800, Andrew Morton wrote:
 I'm rather regretting having merged it - I don't think it has been used for
 much.
 Paolo's UML speedup patches might use nonlinear though.

Guess what major real-life application not only uses nonlinear daily
but would even be very happy to see it extended with non-vma-creating
protections and more? It's not terribly typical for things to be
truncated while remap_file_pages() is doing its work, though it's been
proposed as a method of dynamism. It won't stress remap_file_pages() vs.
truncate() in any meaningful way, though, as userspace will be rather
diligent about clearing in-use data out of the file offset range to be
truncated away anyway, and all that via O_DIRECT.


-- wli


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
* Nick Piggin [EMAIL PROTECTED] wrote:
 After these patches, I don't think there is too much burden. The main 
 thing left really is just the objrmap stuff, but that is just handled 
 with a minimal 'dumb' algorithm that doesn't cost much.

On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote:
 ok. What do you think about the sys_remap_file_pages_prot() thing that 
 Paolo has done in a nicely split up form - does that complicate things 
 in any fundamental way? That is what is useful to UML.

Oracle would love it. You don't want to know how far back I've been
asked to backport that.


-- wli


Re: 2.6.21-rc2-mm2

2007-03-07 Thread Sébastien Dugué
On Wed, 7 Mar 2007 00:49:19 -0800 Andrew Morton [EMAIL PROTECTED] wrote:

 On Wed, 7 Mar 2007 09:39:48 +0100 Sébastien Dugué [EMAIL PROTECTED] wrote:
 
  
Hi Andrew,
  
  On Tue, 6 Mar 2007 00:44:08 -0800 Andrew Morton [EMAIL PROTECTED] wrote:
  
   
   Temporarily at
   
 http://userweb.kernel.org/~akpm/2.6.21-rc2-mm2/
   
   Will appear later at
   
 
   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc2/2.6.20-rc2-mm2/
   
   
   - git-block.patch is having problems which are getting in the way - it has
 been dropped.
   
   - As a consequence all the AIO patches were dropped
   
  
Why? The aio notification and listio patches have nothing to do with
  git-block.patch.
 
 Yes, I could have retained those with a little bit of jiggling.  But...
 
If you think those patches will be rendered obsolete by the new
  syslet/fibril/whatever approach, then fine.
 
 Until this is sorted out I don't think we can add new core AIO code. 
 Particularly not new syscalls.

  Makes sense.

 
  Otherwise do you expect me to
  resubmit?
 
 Is OK for now, I think.

  Ok, thanks for the good job.

  Sébastien.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 01:07 -0800, Andrew Morton wrote:
 On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:
 
Dirty page accounting doesn't work either on
non-linear mappings
   
   It doesn't?  Confused - these things don't have anything to do with each
   other do they?
  
  Look in page_mkclean().  Where does it handle non-linear mappings?
  
 
 OK, I'd forgotten about that.  It won't break dirty memory accounting,
 but it'll potentially break dirty memory balancing.
 
 If we have the wrong page (due to nonlinear), page_check_address() will
 fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
 algorithms and I guess it'll break the msync guarantees.
 
 Peter, I thought we went through the nonlinear problem ages ago and decided
 it was OK?

Can recollect as much, I modelled it after page_referenced() and can't
find any VM_NONLINEAR specific code in there either.

Will have a hard look, but if it's broken, then page_referenced is
equally broken it seems, which would make page reclaim funny in the
light of nonlinear mappings.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Ingo Molnar

* Bill Irwin [EMAIL PROTECTED] wrote:

 * Nick Piggin [EMAIL PROTECTED] wrote:
  After these patches, I don't think there is too much burden. The main 
  thing left really is just the objrmap stuff, but that is just handled 
  with a minimal 'dumb' algorithm that doesn't cost much.
 
 On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote:
  ok. What do you think about the sys_remap_file_pages_prot() thing that 
  Paolo has done in a nicely split up form - does that complicate things 
  in any fundamental way? That is what is useful to UML.
 
 Oracle would love it. You don't want to know how far back I've been 
 asked to backport that.

ok, cool! Then the first step would be for you to talk to Paolo and to 
pick up the patches, review them, nurse it in -mm, etc. Suffering in 
silence is just a pointless act of masochism, not an efficient 
upstream-merge tactic ;-)

Ingo


Re: [PATCH 1/6] UDF cleanup and fixes

2007-03-07 Thread Christoph Hellwig
On Tue, Mar 06, 2007 at 05:46:38PM +0100, Jan Kara wrote:
   Use sector_t and loff_t for file offsets in UDF filesystem. Otherwise
 an overflow may occur for long files. Also make inode_bmap() return offset in
 the extent in number of blocks instead of number of bytes - for most callers
 this is more convenient.

Looks good, but can you make sure to add line breaks after 80 chars in
all the lines you touch?
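
For readers wondering what the overflow looks like: the UDF code itself is not quoted in this thread, so the following is only a generic sketch with hypothetical function names. It shows a byte offset computed from a block number in 32 bits wrapping around, while the same computation widened to 64 bits first (as a loff_t-sized type allows) does not.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset computed in 32 bits: wraps once block << bits reaches 4 GiB. */
static uint32_t offset_32bit(uint32_t block, unsigned blocksize_bits)
{
	assert(blocksize_bits < 32);
	return block << blocksize_bits;
}

/* Same computation with the block number widened to 64 bits before shifting. */
static int64_t offset_64bit(uint64_t block, unsigned blocksize_bits)
{
	assert(blocksize_bits < 32);
	return (int64_t)(block << blocksize_bits);
}
```

With 2048-byte blocks (blocksize_bits = 11), block 3145728 sits at byte 6442450944 (6 GiB). The 32-bit version yields 2147483648 instead, which silently points 4 GiB earlier in the file.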


Re: [PATCH 2/6] UDF cleanup and fixes

2007-03-07 Thread Christoph Hellwig
On Tue, Mar 06, 2007 at 05:46:59PM +0100, Jan Kara wrote:
 Introduce a structure extent_position to store a position of an extent and
 the corresponding buffer_head in one place.

Looks good.



Re: [PATCH 3/6] UDF cleanup and fixes

2007-03-07 Thread Christoph Hellwig
On Tue, Mar 06, 2007 at 05:47:27PM +0100, Jan Kara wrote:
 Make UDF use get_bh() instead of directly accessing b_count and use brelse()
 instead of udf_release_data() which does just brelse()...

Looks good.



Re: [PATCH 6/6] UDF cleanup and fixes

2007-03-07 Thread Christoph Hellwig
On Tue, Mar 06, 2007 at 05:48:39PM +0100, Jan Kara wrote:
 We have to decrease link-count of the parent directory when removing a
 subdirectory.

Ok.


Re: [PATCH 4/6] UDF cleanup and fixes

2007-03-07 Thread Christoph Hellwig
On Tue, Mar 06, 2007 at 05:47:45PM +0100, Jan Kara wrote:
 Add a few assertions into udf_discard_prealloc() to check that the file
 is sane (mostly helps debugging further patches ;).

Ok.



Re: [PATCH 5/6] UDF cleanup and fixes

2007-03-07 Thread Christoph Hellwig
On Tue, Mar 06, 2007 at 05:48:05PM +0100, Jan Kara wrote:
 Make UDF work correctly for files larger than 1GB. As no extent can
 be longer than (1<<30)-blocksize bytes, we have to create several extents
 if a big hole is being created. As a side-effect, we now don't discard
 preallocated blocks when creating a hole.

Ok.
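
The extent arithmetic in the patch description can be sketched as follows, assuming the per-extent limit is (1<<30) minus the block size. The function names are hypothetical and this is not the code from the patch:

```c
#include <stdint.h>

/* Largest extent length in bytes under the assumed (1<<30)-blocksize limit. */
static uint64_t max_extent_bytes(uint32_t blocksize)
{
    return ((uint64_t)1 << 30) - blocksize;
}

/* Number of extents needed to describe a hole of hole_bytes bytes
 * (ceiling division by the maximum extent length). */
static uint64_t extents_for_hole(uint64_t hole_bytes, uint32_t blocksize)
{
    uint64_t max = max_extent_bytes(blocksize);

    return (hole_bytes + max - 1) / max;
}
```

With 2048-byte blocks a hole of exactly 1 GiB already needs two extents, and a 3 GiB hole needs four.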



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 01:26:38AM -0800, Andrew Morton wrote:
 On Wed, 7 Mar 2007 10:18:23 +0100 Nick Piggin [EMAIL PROTECTED] wrote:
 
  
  msync breakage is bad, but otherwise I don't know that we care about
  dirty page writeout efficiency.
 
 Well.  We made so many changes to support the synchronous
 dirty-the-page-when-we-dirty-the-pte thing that I'm rather doubtful that
 the old-style approach still works.  It might seem to, most of the time. 
 But if it _is_ subtly broken, boy it's going to take a long time for us to
 find out.

I can't think of anything that should have caused breakage (except for
the msync thing). We're still careful about not dropping pte dirty bits.

  But I think we discovered that those msync changes are bogus anyway
  becuase there is a small race window where pte could be dirtied without
  page being set dirty?
 
 Dunno, I don't recall that.  We dirty the page before the pte...

I don't think it's really that simple. There is a big comment in
clear_page_dirty_for_io.



How to distinguish original kernel vs -rt kernel

2007-03-07 Thread Pierre Peiffer

Hi,

Supposing I have an external kernel module which I would like to compile against 
both original kernel and -rt kernel, what is the proper/most elegant way to know 
which kernel I'm compiling with ?

I've only found the EXTRAVERSION define, am I missing a better way ?

In fact, I'm facing the problem of HRTIMER_ABS/REL being renamed to 
HRTIMER_MODE_ABS/REL by the -rt patch. Is there a reason for this?


Does anyone object to keeping it the same (let's say 
HRTIMER_ABS/REL) in the -rt kernel?


Thanks,

--
Pierre


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 7 Mar 2007 01:29:03 -0800 Bill Irwin [EMAIL PROTECTED] wrote:

 On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar [EMAIL PROTECTED] wrote:
  btw., if we decide that nonlinear isn't worth the continuing maintenance 
  pain, we could internally implement/emulate sys_remap_file_pages() via a 
  call to mremap() and essentially deprecate it, without breaking the ABI 
  - and remove all the nonlinear code. (This would split fremap areas into 
  separate vmas)
 
 On Wed, Mar 07, 2007 at 12:35:20AM -0800, Andrew Morton wrote:
  I'm rather regretting having merged it - I don't think it has been used for
  much.
  Paolo's UML speedup patches might use nonlinear though.
 
 Guess what major real-life application not only uses nonlinear daily
 but would even be very happy to see it extended with non-vma-creating
 protections and more?

uh-oh.  SQL server?

 It's not terribly typical for things to be
 truncated while remap_file_pages() is doing its work, though it's been
 proposed as a method of dynamism. It won't stress remap_file_pages() vs.
 truncate() in any meaningful way, though, as userspace will be rather
 diligent about clearing in-use data out of the file offset range to be
 truncated away anyway, and all that via O_DIRECT.

The problem here isn't related to truncate or direct-IO.  It's just
plain-old MAP_SHARED.  nonlinear VMAs are now using the old-style
dirty-memory management.  msync() is basically a no-op and the code is
wildly tricky and pretty much untested.  The chances that we broke it are
considerable.



Re: [PATCH] Fix building kernel under Solaris

2007-03-07 Thread Christoph Hellwig
On Tue, Mar 06, 2007 at 10:09:40AM -0800, Deepak Saxena wrote:
 @@ -16,8 +16,10 @@
  #include <sys/time.h>
  #include <sys/ioctl.h>
  #include <sys/types.h>
 +#ifndef __sun__
  #include <asm/types.h>
  #endif
 +#endif

So if solaris doesn't need it, why do we need it on Linux?

 +/*
 + * Solaris does not strsep
 + */
 +#ifndef __sun__
   while ((fname = strsep(&sources, " ")) != NULL) {
   if (!*fname)
   continue;
 +#else
 + for (fname = strtok(sources, " "); fname; fname = strtok(NULL, " ")) {
 +#endif
   if (!parse_source_files(fname, md))
   goto release;
   }

Please either provide a strsep for solaris, or use strtok unconditionally.
But this ifdef mess is not acceptable.
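
One way to follow the first suggestion is a small fallback with strsep() semantics that the host tool could compile on platforms lacking it. The sketch below is an illustration only: compat_strsep and count_fields are hypothetical names, not part of the patch or of modpost:

```c
#include <stddef.h>
#include <string.h>

/* Fallback with strsep() semantics: returns the next token (possibly
 * empty), terminates it in place, and advances *stringp past the
 * delimiter, or sets *stringp to NULL after the last token. */
static char *compat_strsep(char **stringp, const char *delim)
{
    char *start = *stringp;
    char *end;

    if (start == NULL)
        return NULL;

    end = start + strcspn(start, delim);
    if (*end != '\0') {
        *end = '\0';
        *stringp = end + 1;
    } else {
        *stringp = NULL;
    }
    return start;
}

/* Counts non-empty fields, mirroring the loop shape in the quoted patch. */
static int count_fields(char *s)
{
    int n = 0;
    char *tok;

    while ((tok = compat_strsep(&s, " ")) != NULL) {
        if (!*tok)
            continue;   /* skip empty tokens from repeated spaces */
        n++;
    }
    return n;
}
```

Unlike strtok(), this keeps no hidden static state and reports empty fields, so the existing `if (!*fname) continue;` check stays meaningful and the loop needs no #ifdef.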




Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
 Depending on whether anyone wants it, and what features they want, we
 could emulate the old syscall, and make a new restricted one which is
 much less intrusive.
 For example, if we can operate only on MAP_ANONYMOUS memory and specify
 that nonlinear mappings effectively mlock the pages, then we can get
 rid of all the objrmap and unmap_mapping_range handling, forget about
 the writeout and msync problems...

Anonymous-only would make it a doorstop for Oracle, since its entire
motive for using it is to window into objects larger than user virtual
address spaces (this likely also applies to UML, though they should
really chime in to confirm). Restrictions to tmpfs and/or ramfs would
likely be liveable, though I suspect some things might want to do it to
shm segments (I'll ask about that one). There's definitely no need for a
persistent backing store for the object to be remapped in Oracle's case,
in any event. It's largely the in-core destination and source of IO, not
something saved on-disk itself.


-- wli
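
The "windowing" use described above can be approximated in userspace without nonlinear VMAs by mapping a different file offset over a fixed address, which is roughly what the emulation idea raised elsewhere in this thread amounts to. The sketch below is only an illustration under that assumption: it uses plain mmap() with MAP_FIXED on a temporary file rather than remap_file_pages(), and map_window and window_demo are hypothetical names:

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map one page of fd, starting at file page pgoff. If addr is non-NULL
 * the new mapping replaces whatever is currently mapped there. */
static char *map_window(int fd, void *addr, long pgoff, long page)
{
    int flags = MAP_SHARED | (addr ? MAP_FIXED : 0);
    void *p = mmap(addr, (size_t)page, PROT_READ | PROT_WRITE, flags,
                   fd, (off_t)pgoff * page);

    return p == MAP_FAILED ? NULL : (char *)p;
}

/* Slides a one-page window over a two-page file; returns 0 on success. */
static int window_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    char *w;
    int ok = 0;

    if (f == NULL)
        return -1;
    if (ftruncate(fileno(f), 2 * page) != 0)
        goto out;

    w = map_window(fileno(f), NULL, 0, page);
    if (w == NULL)
        goto out;
    strcpy(w, "page0");                 /* written to file page 0 */

    if (map_window(fileno(f), w, 1, page) != w)
        goto unmap;
    strcpy(w, "page1");                 /* same address, file page 1 */

    if (map_window(fileno(f), w, 0, page) != w)
        goto unmap;
    ok = (strcmp(w, "page0") == 0);     /* page 0 contents are back */
unmap:
    munmap(w, (size_t)page);
out:
    fclose(f);
    return ok ? 0 : -1;
}
```

Each window change here is a separate mmap() call replacing a whole mapping. With many small windows this creates one VMA per window, which is the cost that nonlinear mappings were meant to avoid.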


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 10:32:22AM +0100, Peter Zijlstra wrote:
 On Wed, 2007-03-07 at 01:07 -0800, Andrew Morton wrote:
  On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:
  
 Dirty page accounting doesn't work either on
 non-linear mappings

It doesn't?  Confused - these things don't have anything to do with each
other do they?
   
   Look in page_mkclean().  Where does it handle non-linear mappings?
   
  
  OK, I'd forgotten about that.  It won't break dirty memory accounting,
  but it'll potentially break dirty memory balancing.
  
  If we have the wrong page (due to nonlinear), page_check_address() will
  fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
  algorithms and I guess it'll break the msync guarantees.
  
  Peter, I thought we went through the nonlinear problem ages ago and decided
  it was OK?
 
 Can recollect as much, I modelled it after page_referenced() and can't
 find any VM_NONLINEAR specific code in there either.
 
 Will have a hard look, but if it's broken, then page_referenced is
 equally broken it seems, which would make page reclaim funny in the
 light of nonlinear mappings.

page_referenced is just an heuristic, and it ignores nonlinear mappings
and the page which will get filtered down to try_to_unmap.

Page reclaim is already funny for nonlinear mappings, page_referenced
is the least of its worries ;) It works, though.




Re: How to distinguish original kernel vs -rt kernel

2007-03-07 Thread Thomas Gleixner
On Wed, 2007-03-07 at 10:38 +0100, Pierre Peiffer wrote:
 In fact, I'm facing the problem of HRTIMER_ABS/REL being renamed to 
 HRTIMER_MODE_ABS/REL by the -rt patch. Is there a reason for this?
 
 Does anyone object to keeping it the same (let's say 
 HRTIMER_ABS/REL) in the -rt kernel?

It is HRTIMER_MODE_xx in mainline as of 2.6.21-rc1. -rt kernels are
always a bit ahead of time. :)

tglx
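
Given the answer above, an out-of-tree module built against mainline could pick the constant names by kernel version. The sketch below reproduces the KERNEL_VERSION() encoding from <linux/version.h> in userspace so that the comparison can be checked. has_hrtimer_mode_names is a hypothetical helper; a real module would instead use `#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,21)`:

```c
/* Same encoding as the kernel's KERNEL_VERSION() macro. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Returns 1 if a mainline kernel with this version code uses the
 * HRTIMER_MODE_ABS/REL names (2.6.21 and later, per the reply above). */
static int has_hrtimer_mode_names(long version_code)
{
    return version_code >= KERNEL_VERSION(2, 6, 21);
}
```

A version check alone cannot identify an -rt kernel based on 2.6.20, which already carries the new names, so the original question about detecting -rt is not fully answered by this approach.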






Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 01:44:20AM -0800, Bill Irwin wrote:
 On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
  Depending on whether anyone wants it, and what features they want, we
  could emulate the old syscall, and make a new restricted one which is
  much less intrusive.
  For example, if we can operate only on MAP_ANONYMOUS memory and specify
  that nonlinear mappings effectively mlock the pages, then we can get
  rid of all the objrmap and unmap_mapping_range handling, forget about
  the writeout and msync problems...
 
 Anonymous-only would make it a doorstop for Oracle, since its entire
 motive for using it is to window into objects larger than user virtual

Uh, duh yes I don't mean MAP_ANONYMOUS, I was just thinking of the shmem
inode that sits behind MAP_ANONYMOUS|MAP_SHARED. Of course if you don't
have a file descriptor to get a pgoff, then remap_file_pages is a doorstop
for everyone ;)

 address spaces (this likely also applies to UML, though they should
 really chime in to confirm). Restrictions to tmpfs and/or ramfs would
 likely be liveable, though I suspect some things might want to do it to
 shm segments (I'll ask about that one). There's definitely no need for a
 persistent backing store for the object to be remapped in Oracle's case,
 in any event. It's largely the in-core destination and source of IO, not
 something saved on-disk itself.

Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with
that as well, then I think it might be a good option.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote:
 ok. What do you think about the sys_remap_file_pages_prot() thing that 
 Paolo has done in a nicely split up form - does that complicate things 
 in any fundamental way? That is what is useful to UML.

* Bill Irwin [EMAIL PROTECTED] wrote:
 Oracle would love it. You don't want to know how far back I've been 
 asked to backport that.

On Wed, Mar 07, 2007 at 10:35:18AM +0100, Ingo Molnar wrote:
 ok, cool! Then the first step would be for you to talk to Paolo and to 
 pick up the patches, review them, nurse it in -mm, etc. Suffering in 
 silence is just a pointless act of masochism, not an efficient 
 upstream-merge tactic ;-)

It was intended for use in a debugging mode for the database, so given
the general mood where fighting backouts was an issue, I was relatively
loath to bring it up. With UML behind it I don't feel that's as much of
a concern.


-- wli


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote:
 
 * Nick Piggin [EMAIL PROTECTED] wrote:
 
  After these patches, I don't think there is too much burden. The main 
  thing left really is just the objrmap stuff, but that is just handled 
  with a minimal 'dumb' algorithm that doesn't cost much.
 
 ok. What do you think about the sys_remap_file_pages_prot() thing that 
 Paolo has done in a nicely split up form - does that complicate things 
 in any fundamental way? That is what is useful to UML.

Last time I looked (a while ago), the only issue I had was that he was
doing a weird special case rather than using another !present pte bit
for his nonlinear protection ptes.

I think he fixed that now and so it should be quite good now.


Re: [PATCH -mm] Blackfin: blackfin i2c driver

2007-03-07 Thread Wu, Bryan

  Signed-off-by: Bryan Wu [EMAIL PROTECTED] 
  ---
   drivers/i2c/busses/Kconfig |   47 
   drivers/i2c/busses/i2c-bfin-gpio.c |   98 +
   drivers/i2c/busses/i2c-bfin-twi.c  |  589 
  
 
 I'd prefer i2c-blackfin-gpio and i2c-blackfin-twi. Abbreviations tend to
 confuse newcomers.
 

There is plenty of code using the bfin abbreviation in function, file,
and variable names, so we prefer to follow that naming scheme.

   3 files changed, 734 insertions(+)
  
  Index: linux-2.6/drivers/i2c/busses/Kconfig
  ===
  --- linux-2.6.orig/drivers/i2c/busses/Kconfig   2007-03-07 13:32:02.0 +0800
  +++ linux-2.6/drivers/i2c/busses/Kconfig   2007-03-07 13:44:19.0 +0800
  @@ -5,6 +5,53 @@
   menu I2C Hardware Bus support
  depends on I2C
   
  +config I2C_BFIN_GPIO
 
 I2C_BLACKFIN_GPIO
 

In Kconfig we are trying to use CONFIG_BLACKFIN, so I changed
all the BFIN names to BLACKFIN as you suggested.

 Please move the entries to the right location. The list is sorted
 alphabetically if you didn't notice.
 
  +   tristate Generic Blackfin and HHBF533/561 development board I2C 
  support
 
 You can drop the trailing "I2C support"; the user is in a menu named
 "I2C Hardware Bus support", so it's pretty clear what we're talking
 about.
 
  +   depends on I2C && EXPERIMENTAL
  +   select I2C_ALGOBIT
  +   help
  +   --
  +
  +menu BFIN I2C SDA/SCL Selection
  +   depends on I2C_BFIN_GPIO
  +config BFIN_SDA
 
 I2C_BLACKFIN_SDA
 
  +   int SDA is GPIO Number
 
 SDA GPIO pin number
 
  +   range 0 15 if (BF533 || BF532 || BF531) 
 
 Trailing whitespace.
 
  +   range 0 47 if (BF534 || BF536 || BF537)
  +   range 0 47 if BF561
  +   default 2 if (BF533 || BF532 || BF531) 
 
 Trailing whitespace.
 
 No default for the other cases?
 
  +
  +config BFIN_SCL
 
 I2C_BLACKFIN_SCL
 Etc etc, all the options should start with I2C_BLACKFIN.
 
  +   int SCL is GPIO Number
 
 SCL GPIO pin number
 
  +   range 0 15 if (BF533 || BF532 || BF531) 
 
 Trailing whitespace, and many more after that. Please fix them all!
 

All above fixed.

  +   range 0 47 if (BF534 || BF536 || BF537)
  +   range 0 47 if BF561
  +   default 3 
  +endmenu
  +
  +config I2C_BFIN_GPIO_CYCLE_DELAY
  +   int Cycle Delay in usec
  +   depends on I2C_BFIN_GPIO
  +   range 1 100 
  +   default 40
 
 This should really not be a kernel configuration option. Please turn it
 into a kernel module parameter or a sysfs attribute if you really need
 it. Also note that we already have an interface to change this
 value from user-space (using an ioctl on /dev/i2c-N) and that might be
 sufficient for your needs.
 

Actually, some customers require this: they want to set the value at
kernel configuration time, not at runtime.


 And allowing 1 usec delay is probably not a good idea, I don't
 recommend values below 6 usec with i2c-algo-bit.

I added a range here from 5 to 100; the default is 40.

  +
  +config I2C_BFIN_TWI
  +   tristate Blackfin TWI I2C support
  +   depends on I2C && (BF534 || BF536 || BF537)
  +   help
  + This is the TWI I2C device driver for Blackfin 534/536/537.
  +
  + This driver can also be built as a module.  If so, the module
  + will be called i2c-bfin-twi.
  +
  +config TWICLK_KHZ
  +   int TWI clock (kHZ)
 
 kHz
 
  +   depends on I2C_BFIN_TWI
  +   default 50
  +   help
  + The unit of the TWI clock is kilo HZ. Please divide the clock 
  + by 1024 if you count it in HZ. The value should be less than 400.
 
 Why don't you use range here too to ensure that the value is actually
 less than 400? Either way, same as above, IMHO this should not be a
 compilation-time decision.
 
 A kHz is really 1000 Hz, not 1024. And everybody skilled enough to
 configure a kernel should know that, I doubt it's worth reminding.
 

All above fixed.

  +
   config I2C_ALI1535
  tristate ALI 1535
  depends on I2C && PCI
 
 All these options won't work really well until you also change
 drivers/i2c/busses/Makefile to make something useful with them...
 

Sorry for missing the Makefile in this patch. 

Thanks for your review and this is the latest one.

Andrew, I also fixed Kconfig bug in your
blackfin-blackfin-i2c-driver-fix.patch
Please add this one to -mm tree and remove old
one/update.patch/fix.patch from -mm tree.

[PATCH] Blackfin: blackfin i2c driver

The i2c linux driver for blackfin architecture which supports both GPIO
i2c operation and blackfin on-chip TWI controller i2c operation.

Signed-off-by: Bryan Wu [EMAIL PROTECTED]
Reviewed-by: Andrew Morton [EMAIL PROTECTED]
Reviewed-by: Alexey Dobriyan [EMAIL PROTECTED]
Reviewed-by: Jean Delvare [EMAIL PROTECTED]
---

 drivers/i2c/busses/Kconfig |   47 
 drivers/i2c/busses/Makefile|2
 drivers/i2c/busses/i2c-bfin-gpio.c |  100 +
 drivers/i2c/busses/i2c-bfin-twi.c  |  589 

Re: tdfx framebuffer garbles display in 2.6.19.5

2007-03-07 Thread DervishD
Hi Antonino :)

 * Antonino A. Daplas [EMAIL PROTECTED] dixit:
 On Tue, 2007-03-06 at 07:25 +0100, DervishD wrote:
If you want me to test other patches, just tell :)
   
   Can you change the mdelay to udelay and use higher/lower delay values
   to see if there's any improvement?

Regarding the delay: I've discovered a weird thing. When the display
is garbled, if I insist on outputting more text to the screen, sooner or
later it de-garbles! In fact, once the display has been garbled (not
easy to do, sometimes I can work for hours in a terminal before it gets
garbled, I can't reproduce it always), a continuous output makes it
de-garble and garble again, in cycles.

Looks like an off-by-one error rather than a speed/sync error, am I
completely clueless?

This happens with vanilla 2.6.19.5, not with the patched one, which
I haven't been able to test yet (sorry...).

Raúl Núñez de Arenas Coronado

-- 
Linux Registered User 88736 | http://www.dervishd.net
It's my PC and I'll cry if I want to... RAmen!


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 10:49:47AM +0100, Nick Piggin wrote:
 On Wed, Mar 07, 2007 at 01:44:20AM -0800, Bill Irwin wrote:
  On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
   Depending on whether anyone wants it, and what features they want, we
   could emulate the old syscall, and make a new restricted one which is
   much less intrusive.
   For example, if we can operate only on MAP_ANONYMOUS memory and specify
   that nonlinear mappings effectively mlock the pages, then we can get
   rid of all the objrmap and unmap_mapping_range handling, forget about
   the writeout and msync problems...
  
  Anonymous-only would make it a doorstop for Oracle, since its entire
  motive for using it is to window into objects larger than user virtual
 
 Uh, duh yes I don't mean MAP_ANONYMOUS, I was just thinking of the shmem
 inode that sits behind MAP_ANONYMOUS|MAP_SHARED. Of course if you don't
 have a file descriptor to get a pgoff, then remap_file_pages is a doorstop
 for everyone ;)
 
  address spaces (this likely also applies to UML, though they should
  really chime in to confirm). Restrictions to tmpfs and/or ramfs would
  likely be liveable, though I suspect some things might want to do it to
  shm segments (I'll ask about that one). There's definitely no need for a
  persistent backing store for the object to be remapped in Oracle's case,
  in any event. It's largely the in-core destination and source of IO, not
  something saved on-disk itself.
 
 Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with
 that as well, then I think it might be a good option.

Oh, hmm if you can truncate these things then you still need to
force unmap so you still need i_mmap_nonlinear.

But come to think of it, I still don't think nonlinear mappings are
too bad as they are ;)


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 10:45:03AM +0100, Nick Piggin wrote:
 On Wed, Mar 07, 2007 at 10:32:22AM +0100, Peter Zijlstra wrote:
  
  Can recollect as much, I modelled it after page_referenced() and can't
  find any VM_NONLINEAR specific code in there either.
  
  Will have a hard look, but if it's broken, then page_referenced is
  equally broken it seems, which would make page reclaim funny in the
  light of nonlinear mappings.
 
 page_referenced is just an heuristic, and it ignores nonlinear mappings
 and the page which will get filtered down to try_to_unmap.
 
 Page reclaim is already funny for nonlinear mappings, page_referenced
 is the least of its worries ;) It works, though.

Or, to be more helpful, unmap_mapping_range is what it should be
modelled on.


Re: [PATCH] x86_64, i386: Add command line length to boot protocol

2007-03-07 Thread Vivek Goyal
On Tue, Mar 06, 2007 at 07:14:30PM +0100, Bernhard Walle wrote:
 Because the command line is increased to 2048 characters after 2.6.21,
 it's not possible for boot loaders and userspace tools to determine the length
 of the command line the kernel can understand. The benefit of knowing the
 length is that users can be warned if the command line size is too long which
 prevents surprise if things don't work after bootup.

This makes sense to me. It can be used in the kexec boot loader to warn the
user if the command line size exceeds what the kernel supports.
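
A minimal sketch of the check a boot loader could perform with such a field. The function names are hypothetical. The 0x0206 version test follows the quoted patch, while the 255-character fallback for older protocol versions is an assumption based on the documented legacy limit, not something stated in this thread:

```c
#include <stdint.h>
#include <string.h>

/* Assumed limit for kernels that predate the cmdline_size field. */
#define LEGACY_CMDLINE_MAX 255u

/* Maximum command line length (without the terminating zero) that a
 * kernel advertising the given protocol version accepts. */
static uint32_t max_cmdline_len(uint16_t proto_version, uint32_t cmdline_size)
{
    if (proto_version >= 0x0206)
        return cmdline_size;
    return LEGACY_CMDLINE_MAX;
}

/* Returns nonzero if cmdline fits, so the loader can warn otherwise. */
static int cmdline_fits(uint16_t proto_version, uint32_t cmdline_size,
                        const char *cmdline)
{
    return strlen(cmdline) <= max_cmdline_len(proto_version, cmdline_size);
}
```

A loader would read both values from the setup header of the kernel image and print a warning when cmdline_fits() returns 0, rather than silently truncating.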

 
 This patch updates the boot protocol to contain a field called
 cmdline_size that contains the length of the command line (excluding
 the terminating zero).
 
 The patch also adds missing fields (of protocol version 2.05) to the x86_64
 setup code.

Today I have posted the x86_64 relocatable kernel patches which also fill
in missing 2.05 fields for x86_64.

[..]
  #define SIG1 0xAA55
 @@ -81,7 +82,7 @@ start:
  # This is the setup header, and it must start at %cs:2 (old 0x9020:2)
 
   .ascii  "HdrS"  # header signature
 - .word   0x0205  # header version number (>= 0x0105)
 + .word   0x0206  # header version number (>= 0x0105)
   # or else old loadlin-1.5 will fail)
  realmode_swtch:  .word   0, 0# default_switch, SETUPSEG
  start_sys_seg:   .word   SYSSEG
 @@ -171,6 +172,10 @@ relocatable_kernel:.byte 0
  pad2:.byte 0
  pad3:.word 0
 
 +cmdline_size:   .long   COMMAND_LINE_SIZE-1 #length of the command line,
 +#added with boot protocol
 +#version 2.06
 +

I think you will not need more than two bytes to represent the supported
command line size, so you can replace pad3 and use those two bytes. They
were padding bytes reserved for future needs anyway.

[..]
   # or else old loadlin-1.5 will fail)
  realmode_swtch:  .word   0, 0# default_switch, SETUPSEG
  start_sys_seg:   .word   SYSSEG
 @@ -155,6 +156,18 @@ cmd_line_ptr:.long 0 # (Header versio
   # low memory 0x1 or higher.
 
  ramdisk_max: .long 0x
 +
 +kernel_alignment:  .long CONFIG_PHYSICAL_START   #physical addr alignment
 + #(not relocatable =
 + #fixed start == alignment)
 +

This is wrong. CONFIG_PHYSICAL_START is not the alignment. On x86_64 the
required alignment is 2MB (0x200000).
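
The distinction between a start address and an alignment can be illustrated with a short check. The 1 MB start address used in the note below is a hypothetical example value, not taken from the patch:

```c
#include <stdint.h>

#define ALIGN_2MB 0x200000ULL   /* 2 MB, the x86_64 requirement named above */

/* Nonzero if addr is a multiple of align (align must be a power of two). */
static int is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Rounds addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

A start address of 0x100000 (1 MB) is a valid load address in itself but is not 2 MB aligned; a loader honouring the alignment would round it up to 0x200000.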

Thanks
Vivek


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 11:04 +0100, Nick Piggin wrote:
 On Wed, Mar 07, 2007 at 10:45:03AM +0100, Nick Piggin wrote:
  On Wed, Mar 07, 2007 at 10:32:22AM +0100, Peter Zijlstra wrote:
   
   Can recollect as much, I modelled it after page_referenced() and can't
   find any VM_NONLINEAR specific code in there either.
   
   Will have a hard look, but if it's broken, then page_referenced is
   equally broken it seems, which would make page reclaim funny in the
   light of nonlinear mappings.
  
  page_referenced is just an heuristic, and it ignores nonlinear mappings
  and the page which will get filtered down to try_to_unmap.
  
  Page reclaim is already funny for nonlinear mappings, page_referenced
  is the least of its worries ;) It works, though.
 
 Or, to be more helpful, unmap_mapping_range is what it should be
 modelled on.

*sigh* yes was looking at all that code, that's gonna be darn slow
though, but I'll whip up a patch.

/me feels terribly bad about having missed this..



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
On Wed, 7 Mar 2007 01:29:03 -0800 Bill Irwin [EMAIL PROTECTED] wrote:
 Guess what major real-life application not only uses nonlinear daily
 but would even be very happy to see it extended with non-vma-creating
 protections and more?

On Wed, Mar 07, 2007 at 01:39:42AM -0800, Andrew Morton wrote:
 uh-oh.  SQL server?

Close enough. ;)


On Wed, 7 Mar 2007 01:29:03 -0800 Bill Irwin [EMAIL PROTECTED] wrote:
 It's not terribly typical for things to be
 truncated while remap_file_pages() is doing its work, though it's been
 proposed as a method of dynamism. It won't stress remap_file_pages() vs.
 truncate() in any meaningful way, though, as userspace will be rather
 diligent about clearing in-use data out of the file offset range to be
 truncated away anyway, and all that via O_DIRECT.

On Wed, Mar 07, 2007 at 01:39:42AM -0800, Andrew Morton wrote:
 The problem here isn't related to truncate or direct-IO.  It's just
 plain-old MAP_SHARED.  nonlinear VMAs are now using the old-style
 dirty-memory management.  msync() is basically a no-op and the code is
 wildly tricky and pretty much untested.  The chances that we broke it are
 considerable.

This would be of concern for swapping out tmpfs-backed nonlinearly-
mapped files under extreme stress in Oracle's case, though it's rather
typical for it all to be mlock()'d in-core and cases where that's
necessary to be considered grossly underprovisioned. As far as I know,
msync() is not used to manage the nonlinearly-mapped objects, which are
most typically expected to be memory-backed, rendering writeback to
disk of questionable value. Also quite happily, I'm not aware of any
data integrity issues it would explain. Bug though it may be, it
requires a usage model very rarely used by Oracle to trigger, so we've
not run into it.


-- wli


[GIT PULL] AVR32 fixes

2007-03-07 Thread Haavard Skinnemoen
Hi Linus,

Please pull the 'for-linus' branch of

git://www.atmel.no/~hskinnemoen/linux/kernel/avr32.git for-linus

to receive the following updates.

Gary Zambrano (1):
  avr32: dma-mapping.h

Haavard Skinnemoen (5):
  [AVR32] at32_spi_setup_slaves should be __init
  [AVR32] show_trace: Only walk valid stack addresses
  [AVR32] Fix typo in include/asm-avr32/Kbuild
  [AVR32] Fix bogus ti->flags manipulation in debug handler
  [AVR32] Don't use kmap() in flush_icache_page()

 arch/avr32/kernel/ptrace.c  |4 +-
 arch/avr32/kernel/traps.c   |   52 +++---
 arch/avr32/mach-at32ap/at32ap7000.c |2 +-
 arch/avr32/mm/cache.c   |3 +-
 include/asm-avr32/Kbuild|2 +-
 include/asm-avr32/dma-mapping.h |   18 
 6 files changed, 52 insertions(+), 29 deletions(-)
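
The thread-flag fix in the diff below appears to hinge on the difference between a flag's bit number (TIF_x) and its mask (_TIF_x): OR-ing a bit number into the flags word sets the wrong bits. The userspace sketch below uses a hypothetical bit number and a stand-in struct, not the AVR32 definitions:

```c
#define TIF_SINGLE_STEP   6                         /* hypothetical bit number */
#define _TIF_SINGLE_STEP  (1UL << TIF_SINGLE_STEP)  /* corresponding mask */

struct fake_thread_info {
    unsigned long flags;
};

/* What a set-by-bit-number helper does: shift, then OR. */
static void set_flag_by_bit(struct fake_thread_info *ti, int bit)
{
    ti->flags |= 1UL << bit;
}

/* The bogus pattern: the bit number itself is OR-ed in as if it were
 * a mask, so 6 (binary 110) sets bits 1 and 2 instead of bit 6. */
static void set_flag_bogus(struct fake_thread_info *ti, int bit)
{
    ti->flags |= (unsigned long)bit;
}
```

The kernel's set_ti_thread_flag() additionally performs the update as an atomic bit operation, which this sketch does not model.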

diff --git a/arch/avr32/kernel/ptrace.c b/arch/avr32/kernel/ptrace.c
index f2e81cd..6f4388f 100644
--- a/arch/avr32/kernel/ptrace.c
+++ b/arch/avr32/kernel/ptrace.c
@@ -313,7 +313,7 @@ asmlinkage void do_debug_priv(struct pt_regs *regs)
 		__mtdr(DBGREG_DC, dc);
 
 		ti = current_thread_info();
-		ti->flags |= _TIF_BREAKPOINT;
+		set_ti_thread_flag(ti, TIF_BREAKPOINT);
 
 		/* The TLB miss handlers don't check thread flags */
 		if ((regs->pc >= (unsigned long)&itlb_miss)
@@ -328,7 +328,7 @@ asmlinkage void do_debug_priv(struct pt_regs *regs)
 		 * single step.
 		 */
 		if ((regs->sr & MODE_MASK) != MODE_SUPERVISOR)
-			ti->flags |= TIF_SINGLE_STEP;
+			set_ti_thread_flag(ti, TIF_SINGLE_STEP);
 	} else {
 		panic("Unable to handle debug trap at pc = %08lx\n",
 		      regs->pc);
diff --git a/arch/avr32/kernel/traps.c b/arch/avr32/kernel/traps.c
index 7e803f4..adc01a1 100644
--- a/arch/avr32/kernel/traps.c
+++ b/arch/avr32/kernel/traps.c
@@ -49,39 +49,45 @@ out:
 	return;
 }
 
+static inline int valid_stack_ptr(struct thread_info *tinfo, unsigned long p)
+{
+	return (p > (unsigned long)tinfo)
+		&& (p < (unsigned long)tinfo + THREAD_SIZE - 3);
+}
+
 #ifdef CONFIG_FRAME_POINTER
 static inline void __show_trace(struct task_struct *tsk, unsigned long *sp,
 				struct pt_regs *regs)
 {
-	unsigned long __user *fp;
-	unsigned long __user *last_fp = NULL;
-
-	if (regs) {
-		fp = (unsigned long __user *)regs->r7;
-	} else if (tsk == current) {
-		register unsigned long __user *real_fp __asm__("r7");
-		fp = real_fp;
-	} else {
-		fp = (unsigned long __user *)tsk->thread.cpu_context.r7;
-	}
+	unsigned long lr, fp;
+	struct thread_info *tinfo;
+
+	tinfo = (struct thread_info *)
+		((unsigned long)sp & ~(THREAD_SIZE - 1));
+
+	if (regs)
+		fp = regs->r7;
+	else if (tsk == current)
+		asm("mov %0, r7" : "=r"(fp));
+	else
+		fp = tsk->thread.cpu_context.r7;
 
 	/*
-	 * Walk the stack until (a) we get an exception, (b) the frame
-	 * pointer becomes zero, or (c) the frame pointer gets stuck
-	 * at the same value.
+	 * Walk the stack as long as the frame pointer (a) is within
+	 * the kernel stack of the task, and (b) it doesn't move
+	 * downwards.
 	 */
-	while (fp && fp != last_fp) {
-		unsigned long lr, new_fp = 0;
-
-		last_fp = fp;
-		if (__get_user(lr, fp))
-			break;
-		if (fp && __get_user(new_fp, fp + 1))
-			break;
-		fp = (unsigned long __user *)new_fp;
+	while (valid_stack_ptr(tinfo, fp)) {
+		unsigned long new_fp;
 
+		lr = *(unsigned long *)fp;
 		printk(" [<%08lx>] ", lr);
 		print_symbol("%s\n", lr);
+
+		new_fp = *(unsigned long *)(fp + 4);
+		if (new_fp <= fp)
+			break;
+		fp = new_fp;
 	}
 	printk("\n");
 }
diff --git a/arch/avr32/mach-at32ap/at32ap7000.c b/arch/avr32/mach-at32ap/at32ap7000.c
index bc23550..472703f 100644
--- a/arch/avr32/mach-at32ap/at32ap7000.c
+++ b/arch/avr32/mach-at32ap/at32ap7000.c
@@ -752,7 +752,7 @@ static struct resource atmel_spi1_resource[] = {
 DEFINE_DEV(atmel_spi, 1);
 DEV_CLK(spi_clk, atmel_spi1, pba, 1);
 
-static void
+static void __init
 at32_spi_setup_slaves(unsigned int bus_num, struct spi_board_info *b,
  unsigned int n, const u8 *pins)
 {
diff --git a/arch/avr32/mm/cache.c b/arch/avr32/mm/cache.c
index fb13f72..8f7b1c3 100644
--- a/arch/avr32/mm/cache.c
+++ b/arch/avr32/mm/cache.c
@@ -121,9 +121,8 @@ void flush_icache_range(unsigned long start, unsigned long 
end)
 void flush_icache_page(struct vm_area_struct *vma, struct 
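
The bounded frame-pointer walk introduced in the traps.c hunk above can be tried out in user space. What follows is only a sketch of the idea, not the kernel code: the stack is a plain array, every demo_* name and DEMO_STACK_WORDS are made up for illustration, and the bounds check is slightly stricter than valid_stack_ptr() (it also insists on word alignment and on both words of the frame record fitting inside the array).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_STACK_WORDS 64	/* hypothetical stack size, in words */

/*
 * A frame pointer is usable only if it is word aligned and both words of
 * the frame record (return address, then saved frame pointer) lie inside
 * the stack array.
 */
static int demo_valid_fp(const uintptr_t *stack, uintptr_t fp)
{
	uintptr_t lo = (uintptr_t)stack;
	uintptr_t hi = lo + DEMO_STACK_WORDS * sizeof(uintptr_t);

	return fp > lo && fp % sizeof(uintptr_t) == 0 &&
	       fp + 2 * sizeof(uintptr_t) <= hi;
}

/* Write a frame record at word index idx: return address, next fp. */
static void demo_push_frame(uintptr_t *stack, size_t idx,
			    uintptr_t lr, uintptr_t next_fp)
{
	assert(idx + 1 < DEMO_STACK_WORDS);
	stack[idx] = lr;
	stack[idx + 1] = next_fp;
}

/*
 * Walk frame records starting at fp, storing up to max return addresses
 * in out[]. Returns the number of frames visited.
 */
static size_t demo_walk(const uintptr_t *stack, uintptr_t fp,
			uintptr_t *out, size_t max)
{
	size_t n = 0;

	while (n < max && demo_valid_fp(stack, fp)) {
		const uintptr_t *frame = (const uintptr_t *)fp;
		uintptr_t new_fp = frame[1];

		out[n++] = frame[0];	/* saved return address */
		if (new_fp <= fp)	/* must move strictly upwards */
			break;
		fp = new_fp;
	}
	return n;
}
```

Because every dereference is preceded by the range check, a corrupted or looping chain ends the walk instead of faulting, which is the property the patch relies on once __get_user() is gone.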

Re: PROBLEM: Crash on device_shutdown

2007-03-07 Thread Julien RF
Hi,

I still have the problem with 2.6.20.1 kernel :(
Julien

On Thursday 01 February 2007 17:12, Oleg Verych wrote:
  From: Julien RF
  Newsgroups: gmane.linux.kernel
  Subject: PROBLEM: Crash on device_shutdown
  Date: Mon, 29 Jan 2007 00:03:49 +0100
  Archived-At: http://permalink.gmane.org/gmane.linux.kernel/487140

 Hallo, Julien.

 Do you have such problem with more recent kernels, or mm one?

 Also, it would be good to have at least some of the stack trace.

 You can set up serial console for purpose of logging all information
 to remote host.

  One line summary of the problem:
  Crash on device_shutdown
 
 
  Full description of the problem/report:
  When I shut down the system, I get a crash on the device_shutdown
  function (called by kernel_shutdown_prepare).
  Since the bug happens when the system is almost shut down, I can't get
  the log...
  I partially copied it by hand, ask me if you need further details :
  Will now halt.
  BUG: unable to handle kernel NULL pointer dereference at virtual address
  0128
  ...
  EIP is at device_shutdown + 0x49/0x6c
  ...
  Process halt
  ...
  /etc/rc0.d/S90halt: line 21: 11198 Segmentation fault halt -d -f
  $shutdown $poweroff $hddown
 
  Julien Richard-Foy

 p.s. Please, care to honor Mail-Followup-To header information. Thanks.
 



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Benjamin Herrenschmidt

  NOPAGE_REFAULT is removed. This should be implemented with ->fault, and
  no users have hit mainline yet.
 
 Did benh agree with that?

I won't use NOPAGE_REFAULT, I use NOPFN_REFAULT and that has hit
mainline. I will switch to ->fault when I have time to adapt the code;
in the meantime, NOPFN_REFAULT should stay.

Note that one thing we really want with the new ->fault (though I
haven't looked at the patches lately to see if it's available) is to be
able to differentiate faults coming from userspace from faults coming
from the kernel. The major difference is that the former can be
re-executed to handle signals, the latter can't. Thus waiting in the
fault handler can be made interruptible in the former case, but not in
the latter.

Ben.




Re: [RFC] [patch 4/6 -rt] powerpc 2.6.20-rt8: fix a runtime warnings for xmon

2007-03-07 Thread Benjamin Herrenschmidt
On Wed, 2007-03-07 at 10:16 +0100, Ingo Molnar wrote:
 * Tsutomu OWA [EMAIL PROTECTED] wrote:
 
  @@ -342,6 +342,7 @@ static int xmon_core(struct pt_regs *reg
   
  msr = mfmsr();
  mtmsr(msr & ~MSR_EE);   /* disable interrupts */
  +   preempt_disable();
 
 i'm not an xmon expert, but maybe it might make more sense to first 
 disable preemption, then interrupts - otherwise you could be preempted 
 right after having disabled these interrupts (and be scheduled to 
 another CPU, etc.). What is the difference between local_irq_save() and 
 the above 'disable interrupts' sequence? If it's not the same and 
 xmon_core() relied on having hardirqs disabled then it might make sense 
 to do a local_irq_save() there, instead of a preempt_disable().

64-bit powerpc nowadays does lazy HW masking, so local_irq_disable()
will not actually switch MSR_EE off. However, xmon needs that to happen
(though we have a nicer accessor to do it; I suspect some bitrot needs
fixing in there, possibly already fixed in .21).

I agree that preempt_disable() should be put before the MSR tweaking
though.

Ben.
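
To illustrate the lazy masking mentioned above, here is a toy user-space model. It is a sketch under assumptions, not the powerpc implementation: the struct, its fields and all demo_* functions are invented for this example, and the behaviour on an interrupt that arrives while soft-disabled (record it as pending, mask in hardware at that point, replay on enable) is assumed rather than taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy per-CPU state; all field names are hypothetical. */
struct demo_cpu {
	bool msr_ee;		/* hardware interrupt enable bit */
	bool soft_enabled;	/* software "interrupts allowed" flag */
	bool pending;		/* an interrupt arrived while soft-disabled */
	int handled;		/* interrupts actually delivered */
};

/* Lazy disable: only the software flag changes, MSR_EE stays on. */
static void demo_local_irq_disable(struct demo_cpu *cpu)
{
	assert(cpu != NULL);
	cpu->soft_enabled = false;
}

/* Hard disable, which is what xmon wants: clear the hardware bit. */
static void demo_hard_irq_disable(struct demo_cpu *cpu)
{
	assert(cpu != NULL);
	cpu->msr_ee = false;
}

/* Hardware raises an interrupt. Returns true if the CPU took the exception. */
static bool demo_raise_irq(struct demo_cpu *cpu)
{
	assert(cpu != NULL);
	if (!cpu->msr_ee)
		return false;		/* masked in hardware */
	if (cpu->soft_enabled) {
		cpu->handled++;
	} else {
		cpu->pending = true;	/* defer it and mask for real */
		cpu->msr_ee = false;
	}
	return true;
}

/* Re-enable: restore both flags and replay anything that was deferred. */
static void demo_local_irq_enable(struct demo_cpu *cpu)
{
	assert(cpu != NULL);
	cpu->soft_enabled = true;
	cpu->msr_ee = true;
	if (cpu->pending) {
		cpu->pending = false;
		cpu->handled++;
	}
}
```

In this model the CPU still takes an exception after demo_local_irq_disable(), whereas after demo_hard_irq_disable() it does not, which is why a debugger would want the hard variant.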




[PATCH 0/2] xfs: only use refcounted pages for I/O

2007-03-07 Thread Christoph Hellwig
Many block drivers (aoe, iscsi) really want refcountable pages in
bios, which is what almost everyone sends down.  XFS unfortunately
has a few places where it sends down buffers that may come from
kmalloc, which breaks them.  The patches in this series fix this
issue.


[PATCH 1/2] xfs: use xfs_buf_get_noaddr for iclogs

2007-03-07 Thread Christoph Hellwig
Currently xlog_alloc allocates memory for the iclogs first, then
allocates a buffer using xfs_buf_get_empty and finally assigns
the memory to the buffer.  We don't really want to do this, but
rather allocate a buffer with memory attached to it using
xfs_buf_get_noaddr.  There's a subtle change because
xfs_buf_get_empty returns the buffer locked, but xfs_buf_get_noaddr
returns it unlocked.  From my auditing and testing nothing in the
log I/O code cares about this distinction, but I'd be happy if
someone could try to prove this independently.


Signed-off-by: Christoph Hellwig [EMAIL PROTECTED]

Index: linux-2.6/fs/xfs/xfs_log.c
===
--- linux-2.6.orig/fs/xfs/xfs_log.c 2007-03-06 17:26:40.0 +0100
+++ linux-2.6/fs/xfs/xfs_log.c  2007-03-06 17:28:03.0 +0100
@@ -1199,11 +1199,16 @@
 		*iclogp = (xlog_in_core_t *)
 			  kmem_zalloc(sizeof(xlog_in_core_t), KM_SLEEP);
 		iclog = *iclogp;
-		iclog->hic_data = (xlog_in_core_2_t *)
-			  kmem_zalloc(iclogsize, KM_SLEEP | KM_LARGE);
-
 		iclog->ic_prev = prev_iclog;
 		prev_iclog = iclog;
+
+		bp = xfs_buf_get_noaddr(log->l_iclog_size, mp->m_logdev_targp);
+		XFS_BUF_SET_IODONE_FUNC(bp, xlog_iodone);
+		XFS_BUF_SET_BDSTRAT_FUNC(bp, xlog_bdstrat_cb);
+		XFS_BUF_SET_FSPRIVATE2(bp, (unsigned long)1);
+		iclog->ic_bp = bp;
+		iclog->hic_data = bp->b_addr;
+
 		log->l_iclog_bak[i] = (xfs_caddr_t)&(iclog->ic_header);
 
 		head = &iclog->ic_header;
@@ -1216,11 +1221,6 @@
 		INT_SET(head->h_fmt, ARCH_CONVERT, XLOG_FMT);
 		memcpy(&head->h_fs_uuid, &mp->m_sb.sb_uuid, sizeof(uuid_t));
 
-		bp = xfs_buf_get_empty(log->l_iclog_size, mp->m_logdev_targp);
-		XFS_BUF_SET_IODONE_FUNC(bp, xlog_iodone);
-		XFS_BUF_SET_BDSTRAT_FUNC(bp, xlog_bdstrat_cb);
-		XFS_BUF_SET_FSPRIVATE2(bp, (unsigned long)1);
-		iclog->ic_bp = bp;
 
 		iclog->ic_size = XFS_BUF_SIZE(bp) - log->l_iclog_hsize;
 		iclog->ic_state = XLOG_STATE_ACTIVE;
@@ -1229,7 +1229,6 @@
 		iclog->ic_datap = (char *)iclog->hic_data + log->l_iclog_hsize;
 
 		ASSERT(XFS_BUF_ISBUSY(iclog->ic_bp));
-		ASSERT(XFS_BUF_VALUSEMA(iclog->ic_bp) <= 0);
 		sv_init(&iclog->ic_forcesema, SV_DEFAULT, "iclog-force");
 		sv_init(&iclog->ic_writesema, SV_DEFAULT, "iclog-write");
 
@@ -1528,7 +1527,6 @@
 		}
 #endif
 		next_iclog = iclog->ic_next;
-		kmem_free(iclog->hic_data, log->l_iclog_size);
 		kmem_free(iclog, sizeof(xlog_in_core_t));
 		iclog = next_iclog;
 	}


[PATCH 2/2] xfs: stop using kmalloc in xfs_buf_get_noaddr

2007-03-07 Thread Christoph Hellwig
Currently xfs_buf_get_noaddr allocates memory using kmem_alloc, which
can end up in either kmalloc or vmalloc, and assigns it to the buffer.
This patch changes it to allocate individual pages and, if there is
more than one, map them into kernel virtual space using vmap.

This means the minimum buffer allocation is PAGE_SIZE now.  For two
of the three callers (log buffers, log recovery) that is perfectly
fine, because they always allocate buffers that are a power of two
of the page size anyway.  For xfs_zero_remaining_bytes the minimum
allocation goes up from blocksize to pagesize and thus there is
a potential waste of memory for blocksize < pagesize allocations,
which is unfortunate but not directly solvable when block
drivers expect reference countable pages.  To fix this waste
xfs_zero_remaining_bytes could be rewritten to zero more than
a single block at a time, which sounds like a good idea in general.
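
To make the size arithmetic concrete, here is a small user-space sketch of the rounding that PAGE_ALIGN(len) >> PAGE_SHIFT performs. The 4 KiB page size and the names demo_page_align, pages_for_len and wasted_bytes are assumptions for illustration only; none of them appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  ((size_t)1 << DEMO_PAGE_SHIFT)

/* Round len up to a page boundary (user-space stand-in for PAGE_ALIGN). */
static size_t demo_page_align(size_t len)
{
	/* Guard against wrap-around in the addition below. */
	assert(len <= SIZE_MAX - (DEMO_PAGE_SIZE - 1));
	return (len + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}

/* Number of whole pages needed to back a buffer of len bytes. */
static size_t pages_for_len(size_t len)
{
	return demo_page_align(len) >> DEMO_PAGE_SHIFT;
}

/* Bytes allocated beyond what the caller asked for. */
static size_t wasted_bytes(size_t len)
{
	return demo_page_align(len) - len;
}
```

Under these assumptions a 512-byte block still costs one full page (3584 bytes unused), while a 32 KiB log buffer maps to exactly eight pages with nothing wasted, which matches the trade-off described above.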


Signed-off-by: Christoph Hellwig [EMAIL PROTECTED]

Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.c
===
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.c   2007-03-05 15:54:40.0 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.c        2007-03-05 15:54:47.0 +0100
@@ -314,7 +314,7 @@
 
 	ASSERT(list_empty(&bp->b_hash_list));
 
-	if (bp->b_flags & _XBF_PAGE_CACHE) {
+	if (bp->b_flags & (_XBF_PAGE_CACHE|_XBF_PAGES)) {
 		uint		i;
 
 		if ((bp->b_flags & XBF_MAPPED) && (bp->b_page_count > 1))
@@ -323,18 +323,11 @@
 		for (i = 0; i < bp->b_page_count; i++) {
 			struct page	*page = bp->b_pages[i];
 
-			ASSERT(!PagePrivate(page));
+			if (bp->b_flags & _XBF_PAGE_CACHE)
+				ASSERT(!PagePrivate(page));
 			page_cache_release(page);
 		}
 		_xfs_buf_free_pages(bp);
-	} else if (bp->b_flags & _XBF_KMEM_ALLOC) {
-		 /*
-		  * XXX(hch): bp->b_count_desired might be incorrect (see
-		  * xfs_buf_associate_memory for details), but fortunately
-		  * the Linux version of kmem_free ignores the len argument..
-		  */
-		kmem_free(bp->b_addr, bp->b_count_desired);
-		_xfs_buf_free_pages(bp);
 	}
 
 	xfs_buf_deallocate(bp);
@@ -764,41 +757,41 @@
 	size_t			len,
 	xfs_buftarg_t		*target)
 {
-	size_t			malloc_len = len;
+	unsigned long		page_count = PAGE_ALIGN(len) >> PAGE_SHIFT;
+	int			error, i;
 	xfs_buf_t		*bp;
-	void			*data;
-	int			error;
 
 	bp = xfs_buf_allocate(0);
 	if (unlikely(bp == NULL))
 		goto fail;
 	_xfs_buf_initialize(bp, target, 0, len, 0);
 
- try_again:
-	data = kmem_alloc(malloc_len, KM_SLEEP | KM_MAYFAIL | KM_LARGE);
-	if (unlikely(data == NULL))
+	error = _xfs_buf_get_pages(bp, page_count, 0);
+	if (error)
 		goto fail_free_buf;
 
-	/* check whether alignment matches.. */
-	if ((__psunsigned_t)data !=
-	    ((__psunsigned_t)data & ~target->bt_smask)) {
-		/* .. else double the size and try again */
-		kmem_free(data, malloc_len);
-		malloc_len <<= 1;
-		goto try_again;
-	}
-
-	error = xfs_buf_associate_memory(bp, data, len);
-	if (error)
+	for (i = 0; i < page_count; i++) {
+		bp->b_pages[i] = alloc_page(GFP_KERNEL);
+		if (!bp->b_pages[i])
+			goto fail_free_mem;
+	}
+	bp->b_flags |= _XBF_PAGES;
+
+	error = _xfs_buf_map_pages(bp, XBF_MAPPED);
+	if (unlikely(error)) {
+		printk(KERN_WARNING "%s: failed to map pages\n",
+				__FUNCTION__);
 		goto fail_free_mem;
-	bp->b_flags |= _XBF_KMEM_ALLOC;
+	}
 
 	xfs_buf_unlock(bp);
 
 	XB_TRACE(bp, "no_daddr", data);
 	return bp;
+
  fail_free_mem:
-	kmem_free(data, malloc_len);
+	while (--i >= 0)
+		__free_page(bp->b_pages[i]);
  fail_free_buf:
 	xfs_buf_free(bp);
  fail:
Index: linux-2.6/fs/xfs/linux-2.6/xfs_buf.h
===
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_buf.h   2007-03-05 15:54:40.0 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_buf.h        2007-03-05 15:55:06.0 +0100
@@ -63,7 +63,7 @@
 
 	/* flags used only internally */
 	_XBF_PAGE_CACHE = (1 << 17),/* backed by pagecache */
-	_XBF_KMEM_ALLOC = (1 << 18),/* backed by kmem_alloc()  */
+	_XBF_PAGES = (1 << 18),	/* backed by refcounted pages  */
 	_XBF_RUN_QUEUES = (1 << 19),/* run block device task queue */

Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
 *sigh* yes was looking at all that code, thats gonna be darn slow
 though, but I'll whip up a patch.

Well, if it's going to be darn slow, maybe it's better to go with
mingo's plan on emulating nonlinear vmas with linear ones.  That'll be
darn slow as well, but at least it will be much less complicated.

Miklos


Re: [patch 1/8] fix race in clear_page_dirty_for_io()

2007-03-07 Thread Andrew Morton

(cc's reinstated)

On Wed, 07 Mar 2007 09:09:50 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:

 There's a race in clear_page_dirty_for_io() that allows a page to have
 cleared PG_dirty, while being mapped read-write into the page table(s).

I assume you refer to this:

 * FIXME! We still have a race here: if somebody
 * adds the page back to the page tables in
 * between the page_mkclean() and the TestClearPageDirty(),
 * we might have it mapped without the dirty bit set.
 */
if (page_mkclean(page))
set_page_dirty(page);
if (TestClearPageDirty(page)) {
dec_zone_page_state(page, NR_FILE_DIRTY);
return 1;
}

I guess the comment actually refers to a writefault after the
set_page_dirty() and before the TestClearPageDirty().  The fault handler
will run set_page_dirty() and will return to userspace to rerun the write. 
The page then gets set pte-dirty but this thread of control will now make
the page !PageDirty() and will write it out.

With Nick's proposed lock-the-page-in-pagefaults patches, we have
lock_page() synchronisation between pagefaults and
clear_page_dirty_for_io() which I think will fix this.
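
The interleaving described above can be replayed step by step in a small user-space model. This is only a sketch: struct demo_page and the demo_* helpers are hypothetical stand-ins for the page flag and pte state, a single mapping is assumed, and the "race" is replayed sequentially rather than with real threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one page with a single mapping. All names are hypothetical. */
struct demo_page {
	bool pg_dirty;		/* stands in for PG_dirty */
	bool pte_writable;	/* mapping allows stores without faulting */
	bool pte_dirty;		/* dirty bit in the pte */
};

/* Like page_mkclean(): write-protect and clean the pte, report old state. */
static bool demo_page_mkclean(struct demo_page *p)
{
	bool was_dirty = p->pte_dirty;

	p->pte_dirty = false;
	p->pte_writable = false;
	return was_dirty;
}

/* Like TestClearPageDirty(): clear the flag and return its old value. */
static bool demo_test_clear_dirty(struct demo_page *p)
{
	bool old = p->pg_dirty;

	p->pg_dirty = false;
	return old;
}

/* A write fault: dirty the page, make the pte writable, rerun the store. */
static void demo_write_fault(struct demo_page *p)
{
	p->pg_dirty = true;
	p->pte_writable = true;
	p->pte_dirty = true;	/* the retried store dirties the pte */
}

/*
 * Replay clear_page_dirty_for_io() with an optional write fault squeezed
 * in between set_page_dirty() and TestClearPageDirty(). Returns true if
 * the page ends up with PG_dirty clear while still mapped writable.
 */
static bool demo_replay(bool fault_in_window)
{
	struct demo_page page = {
		.pg_dirty = true, .pte_writable = true, .pte_dirty = true,
	};
	bool was_dirty;

	if (demo_page_mkclean(&page))
		page.pg_dirty = true;		/* set_page_dirty() */
	if (fault_in_window)
		demo_write_fault(&page);	/* the racing fault */
	was_dirty = demo_test_clear_dirty(&page);
	assert(was_dirty);			/* page goes to writeback */
	(void)was_dirty;

	return !page.pg_dirty && page.pte_writable;
}
```

Calling demo_replay(false) leaves the page clean and write-protected, while demo_replay(true) ends with PG_dirty cleared on a page that is still mapped read-write, which is the state the original report complains about.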



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 11:05:48AM +0100, Benjamin Herrenschmidt wrote:
 
   NOPAGE_REFAULT is removed. This should be implemented with ->fault, and
   no users have hit mainline yet.
  
  Did benh agree with that?
 
 I won't use NOPAGE_REFAULT, I use NOPFN_REFAULT and that has hit
 mainline. I will switch to ->fault when I have time to adapt the code,
 in the meantime, NOPFN_REFAULT should stay.

I think I removed not only NOPFN_REFAULT, but also nopfn itself, *and*
adapted the code for you ;) it is in patch 5/6, sent a while ago. 



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 11:13:20AM +0100, Miklos Szeredi wrote:
  *sigh* yes was looking at all that code, thats gonna be darn slow
  though, but I'll whip up a patch.
 
 Well, if it's going to be darn slow, maybe it's better to go with
 mingo's plan on emulating nonlinear vmas with linear ones.  That'll be

There are real users who want these fast, though.

 darn slow as well, but at least it will be much less complicated.

IMO, the best thing to do is just restore msync behaviour, and comment
the fact that we ignore nonlinears. We need to restore msync behaviour
to fix races in regular mappings anyway, at least for now.



Re: [PATCH - RFC] allow setting vm_dirty below 1% for large memory machines

2007-03-07 Thread Leroy van Logchem

  actually a global dirty_ratio causes interference between devices which 
  should otherwise not block each other...
  
  if you set up a dd if=/dev/zero of=/dev/sdb bs=1M it shouldn't affect 
  write performance on sda -- but it does... because the dd basically 
  dirties all of the dirty_background_ratio pages and then any task 
  writing to sda has to block in the foreground...  (i've had this happen in 
  practice -- my hack fix is oflag=direct on the dd... but the problem still 
  exists.)
 
 yeah.  Plus your heavy-dd-to-/dev/sda tends to block light-writers to
 /dev/sda in perhaps disproportionate ways.
 
 This is on my list of things to look at.  Hah.

It really exists in the wild on both large-memory and storage machines. I hope
we don't have to patch Samba on every release to add POSIX fadvise calls in
order to have a more polite VM. A 'cfq' for write generators from the VM to
devices would be nice if it could auto-probe the device write speed so we
don't have to use knobs. In an ideal world the kernel would just figure out
the best values for these algorithms itself.
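
For reference, the kind of per-application workaround alluded to above might look like the following. This is a minimal sketch, not Samba code: demo_polite_write is a made-up helper that writes a buffer, forces it to the device with fdatasync(), and then uses posix_fadvise() with POSIX_FADV_DONTNEED to hint that the cached pages can be dropped.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for posix_fadvise() and fdatasync() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write len bytes to fd, flush them to the device, then tell the kernel
 * the cached pages are no longer needed. Returns 0 on success, -1 on
 * failure. (Hypothetical helper, for illustration only.)
 */
static int demo_polite_write(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(fd >= 0);
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0)
			return -1;
		p += n;
		len -= (size_t)n;
	}
	if (fdatasync(fd) != 0)
		return -1;
	/* posix_fadvise() returns an error number directly, not -1/errno. */
	return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) == 0 ? 0 : -1;
}
```

The flush comes first because POSIX_FADV_DONTNEED is only advice and dirty pages generally cannot be dropped until they have been written back. Having to do this in every heavy writer is exactly the burden the paragraph above hopes to avoid.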



Re: Linux v2.6.21-rc3

2007-03-07 Thread Benjamin Herrenschmidt
On Tue, 2007-03-06 at 20:59 -0800, Linus Torvalds wrote:

 Linus Torvalds (2):
   Revert "[PATCH] LOG2: Alter get_order() so that it can make use of ilog2() on a constant"
   Linux 2.6.21-rc3

Greg, I think we should revert that patch in the 2.6.20.x stable series too,
as get_order is broken there as well, causing random kernel memory
corruption every now and then, among other things.

Cheers,
Ben



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 11:21 +0100, Nick Piggin wrote:
 On Wed, Mar 07, 2007 at 11:13:20AM +0100, Miklos Szeredi wrote:
   *sigh* yes was looking at all that code, thats gonna be darn slow
   though, but I'll whip up a patch.
  
  Well, if it's going to be darn slow, maybe it's better to go with
  mingo's plan on emulating nonlinear vmas with linear ones.  That'll be
 
 There are real users who want these fast, though.

Yeah, why don't we have a tree per nonlinear vma to find these pages?

wli mentions shadow page tables..

  darn slow as well, but at least it will be much less complicated.
 
 IMO, the best thing to do is just restore msync behaviour, and comment
 the fact that we ignore nonlinears. We need to restore msync behaviour
 to fix races in regular mappings anyway, at least for now.

Seems to be the best quick solution indeed.



Re: [patch 1/8] fix race in clear_page_dirty_for_io()

2007-03-07 Thread Miklos Szeredi
 (cc's reinstated)
 
 On Wed, 07 Mar 2007 09:09:50 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:
 
  There's a race in clear_page_dirty_for_io() that allows a page to have
  cleared PG_dirty, while being mapped read-write into the page table(s).
 
 I assume you refer to this:
 
* FIXME! We still have a race here: if somebody
* adds the page back to the page tables in
* between the page_mkclean() and the TestClearPageDirty(),
* we might have it mapped without the dirty bit set.
*/
   if (page_mkclean(page))
   set_page_dirty(page);
   if (TestClearPageDirty(page)) {
   dec_zone_page_state(page, NR_FILE_DIRTY);
   return 1;
   }
 

Yes.

 I guess the comment actually refers to a writefault after the
 set_page_dirty() and before the TestClearPageDirty().  The fault handler
 will run set_page_dirty() and will return to userspace to rerun the write. 
 The page then gets set pte-dirty but this thread of control will now make
 the page !PageDirty() and will write it out.

Yes.

 With Nick's proposed lock-the-page-in-pagefaults patches, we have
 lock_page() synchronisation between pagefaults and
 clear_page_dirty_for_io() which I think will fix this.

After a quick look, I don't think it does.  It locks the page in
do_no_page(), but not for the whole fault.  In particular do_wp_page()
is not affected.  But I haven't yet looked closely at that patch, so I
could be wrong.

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 11:24:45AM +0100, Peter Zijlstra wrote:
 On Wed, 2007-03-07 at 11:21 +0100, Nick Piggin wrote:
  On Wed, Mar 07, 2007 at 11:13:20AM +0100, Miklos Szeredi wrote:
*sigh* yes was looking at all that code, thats gonna be darn slow
though, but I'll whip up a patch.
   
   Well, if it's going to be darn slow, maybe it's better to go with
   mingo's plan on emulating nonlinear vmas with linear ones.  That'll be
  
  There are real users who want these fast, though.
 
 Yeah, why don't we have a tree per nonlinear vma to find these pages?
 
 wli mentions shadow page tables..

We could do something more efficient, but I thought that half the point
was that they didn't carry any of this extra memory, and they could be
really fast to set up at the expense of efficiency elsewhere.

   darn slow as well, but at least it will be much less complicated.
  
  IMO, the best thing to do is just restore msync behaviour, and comment
  the fact that we ignore nonlinears. We need to restore msync behaviour
  to fix races in regular mappings anyway, at least for now.
 
 Seems to be the best quick solution indeed.

If we fix the race in the linear mappings, then we can just do the full
msync for nonlinear vmas, and the fast noop version for everyone else.

I don't see it being a big deal. I doubt anybody is writing out huge
amounts of data via nonlinear mappings.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Benjamin Herrenschmidt
On Wed, 2007-03-07 at 11:17 +0100, Nick Piggin wrote:
 On Wed, Mar 07, 2007 at 11:05:48AM +0100, Benjamin Herrenschmidt wrote:
  
NOPAGE_REFAULT is removed. This should be implemented with ->fault, and
no users have hit mainline yet.
   
   Did benh agree with that?
  
  I won't use NOPAGE_REFAULT, I use NOPFN_REFAULT and that has hit
  mainline. I will switch to ->fault when I have time to adapt the code,
  in the meantime, NOPFN_REFAULT should stay.
 
 I think I removed not only NOPFN_REFAULT, but also nopfn itself, *and*
 adapted the code for you ;) it is in patch 5/6, sent a while ago. 

Ok, I need to look. I've been travelling, having meetings, etc. for the
last couple of weeks and I'm taking a week off next week :-)

Ben.




Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 11:38 +0100, Nick Piggin wrote:

   There are real users who want these fast, though.
  
  Yeah, why don't we have a tree per nonlinear vma to find these pages?
  
  wli mentions shadow page tables..
 
 We could do something more efficient, but I thought that half the point
 was that they didn't carry any of this extra memory, and they could be
 really fast to set up at the expense of efficiency elsewhere.

I'm failing to understand this :-(

That extra memory, and apparently they don't want the inefficiency
either.

 I don't see it being a big deal. I doubt anybody is writing out huge
 amounts of data via nonlinear mappings.

Well, now they don't, but it could be done or even exploited as a DoS.



Re: [RFC] [patch 4/6 -rt] powerpc 2.6.20-rt8: fix a runtime warnings for xmon

2007-03-07 Thread Tsutomu OWA

At Wed, 07 Mar 2007 11:10:59 +0100,
Benjamin Herrenschmidt wrote:
 
 On Wed, 2007-03-07 at 10:16 +0100, Ingo Molnar wrote:
  * Tsutomu OWA [EMAIL PROTECTED] wrote:
  
   @@ -342,6 +342,7 @@ static int xmon_core(struct pt_regs *reg

 msr = mfmsr();
 mtmsr(msr & ~MSR_EE);   /* disable interrupts */
   + preempt_disable();
  
  i'm not an xmon expert, but maybe it might make more sense to first 
  disable preemption, then interrupts - otherwise you could be preempted 
  right after having disabled these interrupts (and be scheduled to 
  another CPU, etc.). What is the difference between local_irq_save() and 
  the above 'disable interrupts' sequence? If it's not the same and 
  xmon_core() relied on having hardirqs disabled then it might make sense 
  to do a local_irq_save() there, instead of a preempt_disable().
 
 64-bit powerpc nowadays does lazy HW masking, so local_irq_disable()
 will not actually switch MSR_EE off. However, xmon needs that to happen
 (though we have a nicer accessor to do it; I suspect some bitrot needs
 fixing in there, possibly already fixed in .21).
 
 I agree that preempt_disable() should be put before the MSR tweaking
 though.

  As you all suggested, I'm resending the patch here.

  It fixes the following runtime warnings when entering xmon.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Entering xmon
BUG: using smp_processor_id() in preemptible [] code: khvcd/280
caller is .xmon_core+0xb8/0x8ec
Call Trace:
[CFD737C0] [C000FAAC] .show_stack+0x68/0x1b0 (unreliable)
[CFD73860] [C01F71F0] .debug_smp_processor_id+0xc8/0xf8
[CFD738F0] [C004AF30] .xmon_core+0xb8/0x8ec
[CFD73A80] [C004B918] .xmon+0x38/0x4c
[CFD73C60] [C004BA8C] .sysrq_handle_xmon+0x48/0x5c
[CFD73CD0] [C0243A68] .__handle_sysrq+0xe0/0x1b0
[CFD73D70] [C0244974] .hvc_poll+0x18c/0x2b4
[CFD73E50] [C0244E80] .khvcd+0x88/0x164
[CFD73EE0] [C0075014] .kthread+0x124/0x174
[CFD73F90] [C0023D48] .kernel_thread+0x4c/0x68
BUG: khvcd:280 task might have lost a preemption check!
Call Trace:
[CFD73740] [C000FAAC] .show_stack+0x68/0x1b0 (unreliable)
[CFD737E0] [C0054920] .preempt_enable_no_resched+0x64/0x7c
[CFD73860] [C01F71F8] .debug_smp_processor_id+0xd0/0xf8
[CFD738F0] [C004AF30] .xmon_core+0xb8/0x8ec
[CFD73A80] [C004B918] .xmon+0x38/0x4c
[CFD73C60] [C004BA8C] .sysrq_handle_xmon+0x48/0x5c
[CFD73CD0] [C0243A68] .__handle_sysrq+0xe0/0x1b0
[CFD73D70] [C0244974] .hvc_poll+0x18c/0x2b4
[CFD73E50] [C0244E80] .khvcd+0x88/0x164
[CFD73EE0] [C0075014] .kthread+0x124/0x174
[CFD73F90] [C0023D48] .kernel_thread+0x4c/0x68
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

thanks a lot!

Signed-off-by: Tsutomu Owa [EMAIL PROTECTED]
-- owa

--- linux-rt8/arch/powerpc/xmon/xmon.c  2007-02-20 09:38:52.0 +0900
+++ rt/arch/powerpc/xmon/xmon.c 2007-03-07 19:49:38.0 +0900
@@ -340,6 +340,7 @@ static int xmon_core(struct pt_regs *reg
 	unsigned long timeout;
 #endif
 
+	preempt_disable();
 	msr = mfmsr();
 	mtmsr(msr & ~MSR_EE);	/* disable interrupts */
 
@@ -517,6 +518,7 @@ static int xmon_core(struct pt_regs *reg
 	insert_cpu_bpts();
 
 	mtmsr(msr);		/* restore interrupt enable */
+	preempt_enable();
 
 	return cmd != 'X' && cmd != EOF;
 }


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 11:47:42AM +0100, Peter Zijlstra wrote:
 On Wed, 2007-03-07 at 11:38 +0100, Nick Piggin wrote:
 
There are real users who want these fast, though.
   
   Yeah, why don't we have a tree per nonlinear vma to find these pages?
   
   wli mentions shadow page tables..
  
  We could do something more efficient, but I thought that half the point
  was that they didn't carry any of this extra memory, and they could be
  really fast to set up at the expense of efficiency elsewhere.
 
 I'm failing to understand this :-(
 
 That extra memory, and apparently they don't want the inefficiency
 either.

Sorry, I didn't understand your misunderstandings ;)

 
  I don't see it being a big deal. I doubt anybody is writing out huge
  amounts of data via nonlinear mappings.
 
 Well, now they don't, but it could be done or even exploited as a DoS.

But so could nonlinear page reclaim. I think we need to restrict nonlinear
mappings to root if we're worried about that.


Re: PROBLEM: 2.6.20-1 not working on ibook g4 (BUG/Oops)

2007-03-07 Thread Benjamin Herrenschmidt
On Wed, 2007-03-07 at 17:53 +1300, Paul Collins wrote:
 David Woodhouse [EMAIL PROTECTED] writes:
 
  On Tue, 2007-03-06 at 14:53 +1300, Paul Collins wrote:
  In case it's of interest, 2.6.20 has been running fine on my
  PowerBook5,4. 
 
  How much memory? What if you boot with mem=512M or mem=256M?
 
 1GB.  Also works fine when booted with those options.

Can you try 2.6.21-rc3 ? We just fixed a nasty bug causing memory
corruption.

Ben.




Re: [RFC] [patch 4/6 -rt] powerpc 2.6.20-rt8: fix a runtime warnings for xmon

2007-03-07 Thread Arnd Bergmann
On Wednesday 07 March 2007, Ingo Molnar wrote:
 i'm not an xmon expert, but maybe it might make more sense to first 
 disable preemption, then interrupts - otherwise you could be preempted 
 right after having disabled these interrupts (and be scheduled to 
 another CPU, etc.). What is the difference between local_irq_save() and 
 the above 'disable interrupts' sequence? If it's not the same and 
 xmon_core() relied on having hardirqs disabled then it might make sense 
 to do a local_irq_save() there, instead of a preempt_disable().

Since relatively recently, powerpc no longer actually disables
the hardware interrupts with local_irq_disable(), but rather sets
a per-cpu flag that is checked if an actual interrupt comes
in during the critical section.

The mtmsr() sequence in xmon corresponds to hard_irq_disable()
and should probably be changed to that, but then you still need
the extra preempt_disable() / preempt_enable().

I think you're right about the sequence having to be
1. preempt_disable()
2. hard_irq_disable()
3.
4. hard_irq_enable()
5. preempt_enable()
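
The ordering above can be sketched in userspace as follows. This is only a model: the four primitives are hypothetical stand-ins that track state rather than the powerpc implementations, and debugger_core() is an invented name for the body of xmon_core().

```c
#include <assert.h>

/*
 * Hypothetical userspace stand-ins for the kernel primitives named above.
 * They only record state so that the nesting order can be checked.
 */
static int preempt_count;          /* > 0 means preemption is disabled */
static int hard_irqs_enabled = 1;  /* models the MSR_EE bit */

static void preempt_disable(void)
{
	preempt_count++;
}

static void preempt_enable(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

static void hard_irq_disable(void)
{
	/* Interrupts must only be turned off once we can no longer migrate. */
	assert(preempt_count > 0);
	hard_irqs_enabled = 0;
}

static void hard_irq_enable(void)
{
	/* Interrupts come back on before preemption is allowed again. */
	assert(preempt_count > 0);
	hard_irqs_enabled = 1;
}

/* Returns 1 if the critical section ran pinned and with interrupts off. */
int debugger_core(void)
{
	int pinned_and_masked;

	preempt_disable();
	hard_irq_disable();

	/* critical section: the debugger work would go here */
	pinned_and_masked = (preempt_count > 0 && !hard_irqs_enabled);

	hard_irq_enable();
	preempt_enable();

	return pinned_and_masked;
}
```

The assertions inside the stand-ins fail if the two pairs are nested the other way round, which is the situation Ingo describes (preempted right after disabling interrupts).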

Arnd 


Re: [PATCH -mm] utrace: nommu fixup support utrace

2007-03-07 Thread David Howells
Roland McGrath [EMAIL PROTECTED] wrote:

 That old ptrace check seems pretty questionable to me.  I think what you
 want is for the nommu world's get_user_pages/access_process_vm when called
 with force=1,write=1 on a read-only MAP_PRIVATE page to do something more
 morally similar to the mmu world's COW than it does now.

Such as what?  You *can't* do COW without relocating all the pointers userspace
may have into that VMA.  However, unless you force non-sharing of R/O
MAP_PRIVATE VMAs, you will have text segments of executables and libraries
shared with other processes.  Imagine: you set a breakpoint in uclibc read()
and your whole system dies instantly.

What I did is to say that if a process has PT_TRACED set then the MAP_PRIVATE
VMAs start with their own copies.  The debugger can set this in a new process
by cloning it with appropriate CLONE_xxx flags.

It's not perfect, I know, but it's the best I could come up with as a solution
to debugging things in a NOMMU environment that supports shared libraries and
executables.

David


Re: [patch 8/6] mm: fix cpdfio vs fault race

2007-03-07 Thread Andrew Morton

(cc's reestablished yet again)

On Wed, 7 Mar 2007 12:04:29 +0100 Nick Piggin [EMAIL PROTECTED] wrote:

 OK, this is how we can plug that hole, leveraging my
 previous patches to lock page over do_no_page.
 
 I'm pretty sure the PageLocked invariant is correct.
 
 
 --
 Fix msync data loss and (less importantly) dirty page accounting inaccuracies
 due to the race remaining in clear_page_dirty_for_io().
 
 The deleted comment explains what the race was, and the added comments
 explain how it is fixed.
 
 Signed-off-by: Nick Piggin [EMAIL PROTECTED]
 
 Index: linux-2.6/mm/memory.c
 ===
 --- linux-2.6.orig/mm/memory.c
 +++ linux-2.6/mm/memory.c
 @@ -1676,6 +1676,17 @@ gotten:
  unlock:
   pte_unmap_unlock(page_table, ptl);
   if (dirty_page) {
 + /*
 +  * Yes, Virginia, this is actually required to prevent a race
 +  * with clear_page_dirty_for_io() from clearing the page dirty
 +  * bit after it clear all dirty ptes, but before a racing
 +  * do_wp_page installs a dirty pte.
 +  *
 +  * do_fault is protected similarly by holding the page lock
 +  * after the dirty pte is installed.
 +  */
 + lock_page(dirty_page);
 + unlock_page(dirty_page);
   set_page_dirty_balance(dirty_page);
   put_page(dirty_page);

Yes, I think that'll plug it.  A wait_on_page_locked() should suffice.

But does this have any dependency on the lock-page-over-do_no_page patches?


   }
 Index: linux-2.6/mm/page-writeback.c
 ===
 --- linux-2.6.orig/mm/page-writeback.c
 +++ linux-2.6/mm/page-writeback.c
 @@ -903,6 +903,8 @@ int clear_page_dirty_for_io(struct page 
  {
   struct address_space *mapping = page_mapping(page);
  
 + BUG_ON(!PageLocked(page));
 +
   if (mapping && mapping_cap_account_dirty(mapping)) {
   /*
* Yes, Virginia, this is indeed insane.
 @@ -928,14 +930,19 @@ int clear_page_dirty_for_io(struct page 
* We basically use the page master dirty bit
* as a serialization point for all the different
* threads doing their things.
 -  *
 -  * FIXME! We still have a race here: if somebody
 -  * adds the page back to the page tables in
 -  * between the page_mkclean() and the TestClearPageDirty(),
 -  * we might have it mapped without the dirty bit set.
*/
   if (page_mkclean(page))
   set_page_dirty(page);
 + /*
 +  * We carefully synchronise fault handlers against
 +  * installing a dirty pte and marking the page dirty
 +  * at this point. We do this by having them hold the
 +  * page lock at some point after installing their
 +  * pte, but before marking the page dirty.
 +  * Pages are always locked coming in here, so we get
 +  * the desired exclusion. See mm/memory.c:do_wp_page()
 +  * for more comments.
 +  */
   if (TestClearPageDirty(page)) {
   dec_zone_page_state(page, NR_FILE_DIRTY);
   return 1;


[1/1] eventfs: pseudo fs which allows to bind events to file descriptors.

2007-03-07 Thread Evgeniy Polyakov
Hello.

This pseudo fs allows binding a file descriptor to different kinds of
events, so that they can be polled using epoll().

This particular morning hack supports signals only.

If the idea is considered sound, I can cook up POSIX timers support.

Signal delivery note.
If a special flag is set in signalfd(signo, flag), then signals are _not_
delivered through a pending mask update but only through the epoll queue.
(Copied from kevent).
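
For comparison, here is a minimal sketch of consuming a signal through epoll with the signalfd(2) interface found in later mainline kernels. It is not the interface proposed in this patch: mainline signalfd() takes an fd, a signal mask and flags, and the signal has to be blocked for normal delivery first. wait_signal_via_epoll() is a made-up helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/epoll.h>
#include <sys/signalfd.h>
#include <unistd.h>

/*
 * Sends signo to the current process and reads it back through
 * epoll + signalfd. Returns the signal number, or -1 on error.
 * Note: signo stays blocked for normal delivery afterwards.
 */
int wait_signal_via_epoll(int signo)
{
	struct signalfd_siginfo info;
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event ready;
	sigset_t mask;
	int sfd, efd, result = -1;

	sigemptyset(&mask);
	sigaddset(&mask, signo);

	/* Block normal delivery so the signal is only seen through the fd. */
	if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
		return -1;

	sfd = signalfd(-1, &mask, 0);
	if (sfd < 0)
		return -1;

	efd = epoll_create1(0);
	if (efd < 0)
		goto close_sfd;

	ev.data.fd = sfd;
	if (epoll_ctl(efd, EPOLL_CTL_ADD, sfd, &ev) < 0)
		goto close_efd;

	/* Queue the signal to ourselves; it stays pending because it is blocked. */
	if (kill(getpid(), signo) < 0)
		goto close_efd;

	/* The descriptor becomes readable once the signal is pending. */
	if (epoll_wait(efd, &ready, 1, 1000) == 1 &&
	    read(sfd, &info, sizeof(info)) == (ssize_t)sizeof(info))
		result = (int)info.ssi_signo;

close_efd:
	close(efd);
close_sfd:
	close(sfd);
	return result;
}
```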

Userspace signal code and the patch itself can be found at:
http://tservice.net.ru/~s0mbre/archive/eventfs/

signal.c is also attached for the interested reader.

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/arch/i386/kernel/syscall_table.S b/arch/i386/kernel/syscall_table.S
index 2697e92..b14ee54 100644
--- a/arch/i386/kernel/syscall_table.S
+++ b/arch/i386/kernel/syscall_table.S
@@ -319,3 +319,4 @@ ENTRY(sys_call_table)
.long sys_move_pages
.long sys_getcpu
.long sys_epoll_pwait
+   .long sys_signalfd  /* 320 */
diff --git a/arch/x86_64/ia32/ia32entry.S b/arch/x86_64/ia32/ia32entry.S
index eda7a0d..bc6336c 100644
--- a/arch/x86_64/ia32/ia32entry.S
+++ b/arch/x86_64/ia32/ia32entry.S
@@ -719,4 +719,5 @@ ia32_sys_call_table:
.quad compat_sys_move_pages
.quad sys_getcpu
.quad sys_epoll_pwait
+   .quad sys_signalfd
 ia32_syscall_end:  
diff --git a/fs/Kconfig b/fs/Kconfig
index 3c4886b..09803ad 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -1032,6 +1032,15 @@ config CONFIGFS_FS
  Both sysfs and configfs can and should exist together on the
  same system. One is not a replacement for the other.
 
+config EVENTFS
+   bool "Enable eventpoll filesystem support" if EMBEDDED
+   depends on EPOLL
+   default y
+   help
+ Allows to bind file descriptors to different kinds of objects
+ like signals and timers and work with them using epoll 
+ family of system calls.
+
 endmenu
 
 menu "Miscellaneous filesystems"
diff --git a/fs/Makefile b/fs/Makefile
index 9edf411..185bcb1 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -22,6 +22,7 @@ endif
 obj-$(CONFIG_INOTIFY)  += inotify.o
 obj-$(CONFIG_INOTIFY_USER) += inotify_user.o
 obj-$(CONFIG_EPOLL)+= eventpoll.o
+obj-$(CONFIG_EVENTFS)  += eventfs.o
 obj-$(CONFIG_COMPAT)   += compat.o compat_ioctl.o
 
 nfsd-$(CONFIG_NFSD):= nfsctl.o
diff --git a/fs/eventfs.c b/fs/eventfs.c
new file mode 100644
index 000..dae108c
--- /dev/null
+++ b/fs/eventfs.c
@@ -0,0 +1,221 @@
+/*
+ * 2007 Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED]
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/mount.h>
+#include <linux/device.h>
+#include <linux/poll.h>
+#include <asm/io.h>
+
+static inline void eventfs_set_signal_file(int sig, struct file *file)
+{
+   spin_lock_irq(&current->sighand->siglock);
+   current->signal_file[sig-1] = file;
+   spin_unlock_irq(&current->sighand->siglock);
+}
+
+static int eventfs_signal_release(struct inode *inode, struct file *file)
+{
+   int sig = (int)((unsigned long)(file->private_data) & 0x0fff);
+   eventfs_set_signal_file(sig, NULL);
+   return 0;
+}
+
+static unsigned int eventfs_signal_poll(struct file *file, struct poll_table_struct *wait)
+{
+   int sig = (int)((unsigned long)(file->private_data) & 0x0fff);
+   unsigned int mask = 0;
+   unsigned long flags;
+
+   poll_wait(file, &current->signal_wait, wait);
+
+   spin_lock_irqsave(&current->sighand->siglock, flags);
+   if (!sigismember(&current->blocked, sig) && (((unsigned long)(file->private_data)) & 0x4000)) {
+       mask = POLLIN | POLLRDNORM;
+       file->private_data = (void *)(((unsigned long)(file->private_data)) & ~0x4000);
+   }
+   spin_unlock_irqrestore(&current->sighand->siglock, flags);
+
+   return mask;
+}
+
+struct file_operations eventfs_signal_fops = {
+   .release= eventfs_signal_release,
+   .poll   = eventfs_signal_poll,
+   .owner  = THIS_MODULE,
+};
+

Re: [patch 8/6] mm: fix cpdfio vs fault race

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 03:20:38AM -0800, Andrew Morton wrote:
 
 (cc's reestablished yet again)
 
 On Wed, 7 Mar 2007 12:04:29 +0100 Nick Piggin [EMAIL PROTECTED] wrote:
 
  OK, this is how we can plug that hole, leveraging my
  previous patches to lock page over do_no_page.
  
  I'm pretty sure the PageLocked invariant is correct.
  
  
  --
  Fix msync data loss and (less importantly) dirty page accounting 
  inaccuracies
  due to the race remaining in clear_page_dirty_for_io().
  
  The deleted comment explains what the race was, and the added comments
  explain how it is fixed.
  
  Signed-off-by: Nick Piggin [EMAIL PROTECTED]
  
  Index: linux-2.6/mm/memory.c
  ===
  --- linux-2.6.orig/mm/memory.c
  +++ linux-2.6/mm/memory.c
  @@ -1676,6 +1676,17 @@ gotten:
   unlock:
  pte_unmap_unlock(page_table, ptl);
  if (dirty_page) {
  +   /*
  +* Yes, Virginia, this is actually required to prevent a race
  +* with clear_page_dirty_for_io() from clearing the page dirty
  +* bit after it clear all dirty ptes, but before a racing
  +* do_wp_page installs a dirty pte.
  +*
  +* do_fault is protected similarly by holding the page lock
  +* after the dirty pte is installed.
  +*/
  +   lock_page(dirty_page);
  +   unlock_page(dirty_page);
  set_page_dirty_balance(dirty_page);
  put_page(dirty_page);
 
 Yes, I think that'll plug it.  A wait_on_page_locked() should suffice.

Ooohh, so _that's_ what it's called when you don't want all those
pesky locked operations and memory barriers ;)

 But does this have any dependency on the lock-page-over-do_no_page patches?

No, I guess not. Updated patch follows.

--
Fix msync data loss and (less importantly) dirty page accounting inaccuracies
due to the race remaining in clear_page_dirty_for_io().

The deleted comment explains what the race was, and the added comments
explain how it is fixed.

Signed-off-by: Nick Piggin [EMAIL PROTECTED]

Index: linux-2.6/mm/memory.c
===
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -1664,6 +1664,15 @@ gotten:
 unlock:
pte_unmap_unlock(page_table, ptl);
if (dirty_page) {
+   /*
+* Yes, Virginia, this is actually required to prevent a race
+* with clear_page_dirty_for_io() from clearing the page dirty
+* bit after it clear all dirty ptes, but before a racing
+* do_wp_page installs a dirty pte.
+*
+* do_no_page is protected similarly.
+*/
+   wait_on_page_locked(dirty_page);
set_page_dirty_balance(dirty_page);
put_page(dirty_page);
}
@@ -2316,6 +2325,7 @@ retry:
 unlock:
pte_unmap_unlock(page_table, ptl);
if (dirty_page) {
+   wait_on_page_locked(dirty_page);
set_page_dirty_balance(dirty_page);
put_page(dirty_page);
}
Index: linux-2.6/mm/page-writeback.c
===
--- linux-2.6.orig/mm/page-writeback.c
+++ linux-2.6/mm/page-writeback.c
@@ -903,6 +903,8 @@ int clear_page_dirty_for_io(struct page 
 {
struct address_space *mapping = page_mapping(page);
 
+   BUG_ON(!PageLocked(page));
+
if (mapping && mapping_cap_account_dirty(mapping)) {
/*
 * Yes, Virginia, this is indeed insane.
@@ -928,14 +930,19 @@ int clear_page_dirty_for_io(struct page 
 * We basically use the page master dirty bit
 * as a serialization point for all the different
 * threads doing their things.
-*
-* FIXME! We still have a race here: if somebody
-* adds the page back to the page tables in
-* between the page_mkclean() and the TestClearPageDirty(),
-* we might have it mapped without the dirty bit set.
 */
if (page_mkclean(page))
set_page_dirty(page);
+   /*
+* We carefully synchronise fault handlers against
+* installing a dirty pte and marking the page dirty
+* at this point. We do this by having them hold the
+* page lock at some point after installing their
+* pte, but before marking the page dirty.
+* Pages are always locked coming in here, so we get
+* the desired exclusion. See mm/memory.c:do_wp_page()
+* for more comments.
+*/
if (TestClearPageDirty(page)) {
dec_zone_page_state(page, NR_FILE_DIRTY);
return 1;

Re: [patch 8/6] mm: fix cpdfio vs fault race

2007-03-07 Thread Andrew Morton
On Wed, 7 Mar 2007 03:20:38 -0800 Andrew Morton [EMAIL PROTECTED] wrote:

 
  ===
  --- linux-2.6.orig/mm/memory.c
  +++ linux-2.6/mm/memory.c
  @@ -1676,6 +1676,17 @@ gotten:
   unlock:
  pte_unmap_unlock(page_table, ptl);
  if (dirty_page) {
  +   /*
  +* Yes, Virginia, this is actually required to prevent a race
  +* with clear_page_dirty_for_io() from clearing the page dirty
  +* bit after it clear all dirty ptes, but before a racing
  +* do_wp_page installs a dirty pte.
  +*
  +* do_fault is protected similarly by holding the page lock
  +* after the dirty pte is installed.
  +*/
  +   lock_page(dirty_page);
  +   unlock_page(dirty_page);
  set_page_dirty_balance(dirty_page);
  put_page(dirty_page);
 
 Yes, I think that'll plug it.  A wait_on_page_locked() should suffice.

Or will it?  Suppose after the unlock_page() a _second_
clear_page_dirty_for_io() gets run - the same thing happens?

Extending the lock_page() coverage around the set_page_dirty() would
prevent that.

I guess not needed - the second clear_page_dirty_for_io() will have cleaned the
pte.


Re: [RFC] [PATCH] Kconfig: enlarge printk buffer size limit

2007-03-07 Thread Artem Bityutskiy
On Fri, 2007-03-02 at 09:11 -0800, Randy Dunlap wrote:
 That's simple enough, but you could also just add
   log_buf_len=huge_number

Yeah, thanks for the tip, although the Kconfig change would not hurt as
well.

-- 
Best regards,
Artem Bityutskiy (Битюцкий Артём)



UUID support

2007-03-07 Thread Marcos Dione

first of all, I'm not subscribed to the list, so please CC me the answers.

I'm being forced by my distro to specify the boot device by UUID. the problem
is that I don't know how to add UUID support to the kernel, that is, I don't
know which option I should enable.

some might say «go fix your distro», some might say «change your distro».
yes, those are options, but for me it is more interesting to know which part of
the kernel is responsible for this.

so, thanks in advance for any answer.

-- 
(Not so) Random fortune:
They were technicians, mechanics--and never thought of it in that manner [that
they were taking all the power of decision]. They just wanted to right an
obvious wrong.
-- Harry Harrison, To the stars.


Re: [patch 8/6] mm: fix cpdfio vs fault race

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 03:34:00AM -0800, Andrew Morton wrote:
 On Wed, 7 Mar 2007 03:20:38 -0800 Andrew Morton [EMAIL PROTECTED] wrote:
 
  
   ===
   --- linux-2.6.orig/mm/memory.c
   +++ linux-2.6/mm/memory.c
   @@ -1676,6 +1676,17 @@ gotten:
unlock:
 pte_unmap_unlock(page_table, ptl);
 if (dirty_page) {
   + /*
   +  * Yes, Virginia, this is actually required to prevent a race
   +  * with clear_page_dirty_for_io() from clearing the page dirty
   +  * bit after it clear all dirty ptes, but before a racing
   +  * do_wp_page installs a dirty pte.
   +  *
   +  * do_fault is protected similarly by holding the page lock
   +  * after the dirty pte is installed.
   +  */
   + lock_page(dirty_page);
   + unlock_page(dirty_page);
 set_page_dirty_balance(dirty_page);
 put_page(dirty_page);
  
  Yes, I think that'll plug it.  A wait_on_page_locked() should suffice.
 
 Or will it?  Suppose after the unlock_page() a _second_
 clear_page_dirty_for_io() gets run - the same thing happens?
 
 Extending the lock_page() coverage around the set_page_dirty() would
 prevent that.
 
 I guess not needed - the second clear_page_dirty_for_io() will have cleaned 
 the
 pte.

Yeah, all we need to do is keep page faults out of that little window
in clear_page_dirty_for_io() where I stuck the comment.
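
The window can be illustrated with a toy, single-threaded model that simply replays the two interleavings step by step. The struct and helper names below are invented for illustration and are not the kernel's data structures; the only assumption carried over from the patch is that the fault path cannot mark the page dirty until the page lock held around clear_page_dirty_for_io() is released.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: one page mapped by a single pte. */
struct toy_page {
	bool pte_dirty;   /* dirty bit in the page-table entry */
	bool page_dirty;  /* the page's "master" dirty bit */
};

/* First half of clear_page_dirty_for_io(): page_mkclean() + set_page_dirty(). */
static void cpdfio_mkclean(struct toy_page *p)
{
	if (p->pte_dirty) {
		p->pte_dirty = false;
		p->page_dirty = true;
	}
}

/* Second half: TestClearPageDirty() just before the page is written out. */
static bool cpdfio_test_clear(struct toy_page *p)
{
	bool was_dirty = p->page_dirty;

	p->page_dirty = false;
	return was_dirty;
}

static void fault_install_dirty_pte(struct toy_page *p)
{
	p->pte_dirty = true;
}

static void fault_set_page_dirty(struct toy_page *p)
{
	p->page_dirty = true;
}

/* Without the fix: the whole fault runs inside the window. */
struct toy_page racy_interleaving(void)
{
	struct toy_page p = { .pte_dirty = true, .page_dirty = false };

	cpdfio_mkclean(&p);
	fault_install_dirty_pte(&p);
	fault_set_page_dirty(&p);
	cpdfio_test_clear(&p);   /* wipes the dirtiness the fault just recorded */
	return p;
}

/* With the fix: the fault waits for the page lock before marking dirty. */
struct toy_page fixed_interleaving(void)
{
	struct toy_page p = { .pte_dirty = true, .page_dirty = false };

	cpdfio_mkclean(&p);
	fault_install_dirty_pte(&p);
	/* wait_on_page_locked() would block here until the writer is done */
	cpdfio_test_clear(&p);
	fault_set_page_dirty(&p);
	return p;
}
```

In the racy replay the page ends up with a dirty pte but a clean page flag, which is the state the deleted FIXME comment warned about; in the fixed replay the page flag is still set afterwards.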



Re: mm snapshot broken-out-2007-03-05-02-22.tar.gz uploaded

2007-03-07 Thread Michal Piotrowski
Michal Piotrowski wrote:
 Hi,
 
 [EMAIL PROTECTED] wrote:
 The mm snapshot broken-out-2007-03-05-02-22.tar.gz has been uploaded to


 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/mm/broken-out-2007-03-05-02-22.tar.gz

 It contains the following patches against 2.6.21-rc2:

 
 Outstanding issue - my 3d surround doesn't work since 2.6.20.
 
 I'll bisect these commits
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=724339d76d9407cd1a8ad32a9c1fdf64840cc51b
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6026179519896e7d35b2564e7544487d1c8948e7
 
 00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER 
 (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
 Subsystem: ASUSTeK Computer Inc. P4P800 Mainboard
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
 Stepping- SERR- FastB2B-
 Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium TAbort- 
 TAbort- MAbort- SERR- PERR-
 Latency: 0
 Interrupt: pin B routed to IRQ 17
 Region 0: I/O ports at d000 [size=256]
 Region 1: I/O ports at d400 [size=64]
 Region 2: Memory at f5fff800 (32-bit, non-prefetchable) [size=512]
 Region 3: Memory at f5fff400 (32-bit, non-prefetchable) [size=256]
 Capabilities: [50] Power Management version 2
 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA 
 PME(D0+,D1-,D2-,D3hot+,D3cold+)
 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
 
 Advanced Linux Sound Architecture Driver Version 1.0.14rc2 (Wed Feb 14 
 07:42:13 2007 UTC).
 
 CONFIG_SND=m
 CONFIG_SND_TIMER=m
 CONFIG_SND_PCM=m
 CONFIG_SND_RAWMIDI=m
 CONFIG_SND_SEQUENCER=m
 CONFIG_SND_SEQ_DUMMY=m
 CONFIG_SND_OSSEMUL=y
 CONFIG_SND_MIXER_OSS=m
 CONFIG_SND_PCM_OSS=m
 CONFIG_SND_PCM_OSS_PLUGINS=y
 CONFIG_SND_SEQUENCER_OSS=y
 CONFIG_SND_RTCTIMER=m
 CONFIG_SND_SEQ_RTCTIMER_DEFAULT=y
 # CONFIG_SND_DYNAMIC_MINORS is not set
 CONFIG_SND_SUPPORT_OLD_API=y
 CONFIG_SND_VERBOSE_PROCFS=y
 # CONFIG_SND_VERBOSE_PRINTK is not set
 # CONFIG_SND_DEBUG is not set
 
 CONFIG_SND_AC97_CODEC=m
 CONFIG_SND_DUMMY=m
 CONFIG_SND_VIRMIDI=m
 
 CONFIG_SND_INTEL8X0=m

As I said above, the center and rear speakers don't work with this patch.

$ git-bisect good
831466f4ad2b5fe23dff77edbe6a7c244435e973 is first bad commit
commit 831466f4ad2b5fe23dff77edbe6a7c244435e973
Author: Randy Cushman [EMAIL PROTECTED]
Date:   Tue Dec 19 18:42:16 2006 +0100

[ALSA] ac97 - fix microphone and line_in selection logic

This patch fixes the Microphone and LINE_IN select logic for
Analog Devices surround codecs with shared jacks.  The existing
code can never utilize the shared jacks for Microphone and LINE_IN
due to the reversed jack selection logic.  The patched code
correctly selects the shared jack for input if the 'Channel Mode'
selector does not specify that the jack is to be used for output.
Specifically, in '2ch' mode the Center/LFE jack is used for
microphone input and the Surround jack is used for LINE_IN,
in '4ch' mode the Center/LFE jack is used for microphone input
and the Surround jack is used for output, and in '6ch' mode
both jacks are used for output.

Signed-off-by: Randy Cushman [EMAIL PROTECTED]
Signed-off-by: Takashi Iwai [EMAIL PROTECTED]
Signed-off-by: Jaroslav Kysela [EMAIL PROTECTED]

:04 04 7146a2c5350578fe1b05586c64df99889fa423fe 
10e98a9b4819b34ce2abb2c36adbf269d39b9e4c M  sound

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)


Re: userspace pagecache management tool

2007-03-07 Thread Pádraig Brady
Andrew Morton wrote:
 On Tue, 06 Mar 2007 12:10:49 +
 P__draig Brady [EMAIL PROTECTED] wrote:
 Perhaps one could possibly just evict pages with _mapcount==0 ?
 
 That is the present fadvise(FADV_DONTNEED) behaviour.

Ah right. It doesn't invalidate page_mapped() pages.
If that means it doesn't invalidate pages previously cached
by other processes, then great.

However, I think what I meant was that fadvise(FADV_DONTNEED)
should only invalidate pages where page_count()=1

From include/linux/mm.h

 For pages belonging to inodes, the page_count() is the number of
  attaches, plus 1 if `private' contains something, plus one for
  the page cache itself.

cheers,
Pádraig.



Re: [PATCH] Fix get_order()

2007-03-07 Thread David Howells
Linus Torvalds [EMAIL PROTECTED] wrote:

  +#define ilog2_up(n) ((n) == 1 ? 0 : ilog2((n) - 1) + 1)
 
 This is wrong. It uses n twice, which makes it unsafe as a macro.

Damn.  I missed that.
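
The double evaluation is easy to demonstrate in userspace. In the sketch below toy_ilog2() is a stand-in for the kernel's ilog2(), and next_size() is an invented argument with a side effect (it counts its own calls); the macro body is the one quoted above with toy_ilog2 substituted.

```c
#include <assert.h>

/* Stand-in for the kernel's ilog2(): floor(log2(n)) for n >= 1. */
static int toy_ilog2(unsigned long n)
{
	int log = -1;

	while (n) {
		n >>= 1;
		log++;
	}
	return log;
}

/* The macro from the quoted patch, with toy_ilog2() substituted. */
#define ilog2_up_macro(n) ((n) == 1 ? 0 : toy_ilog2((n) - 1) + 1)

/* Same arithmetic as a function: the argument is evaluated exactly once. */
static inline int ilog2_up_fn(unsigned long n)
{
	return n == 1 ? 0 : toy_ilog2(n - 1) + 1;
}

/* An argument expression with a visible side effect. */
static int calls;

static unsigned long next_size(void)
{
	calls++;
	return 4096;
}

/* Number of times the argument is evaluated by the macro form. */
int evaluations_with_macro(void)
{
	calls = 0;
	(void)ilog2_up_macro(next_size());
	return calls;
}

/* Number of times the argument is evaluated by the function form. */
int evaluations_with_function(void)
{
	calls = 0;
	(void)ilog2_up_fn(next_size());
	return calls;
}
```

With the macro, next_size() runs once for the comparison and once more for the subtraction; the inline function avoids that at the cost of losing compile-time constant folding unless it is combined with __builtin_constant_p() as suggested.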

 Or it could use a __builtin_constant_p() (which gcc defines to not have 
 side effects) to allow the multiple use for constant data.

I should have, yes.

 Or we could require that ilog2(0) returns -1, and then we could just say
 
   #define ilog2_up(n) (ilog2((n)-1)+1)

I'd rather not do that as the inline assembly variants then have to special
case ilog2(0) rather than just having an undefined result.

 The whole get_order() macro also has some serious lack of parenthesis. 
 In general, commit 39d61db0edb34d60b83c5e0d62d0e906578cc707 just was 
 pretty damn bad!

Unfortunately, I can't disagree.

 I'm becoming a bit disgruntled about this whole thing, I have to admit. 
 I'm just not sure the bugs here are worth it. Especially considering that 
 __get_order() has apparently never even tested these things to begin with, 

It was tested...  I've just re-examined my test program and I've realised I've
only tested power-of-2 parameters.  Sigh.

 since nobody but FRV has ever #defined the ARCH_HAS_ILOG2_U?? macros.

Well, that should be CONFIG_ARCH_HAS_ILOG2_U?? macros, and powerpc defines
those too.

  - buggy

True, for N being a non-power-of-two, unfortunately; and also where evaluating
N has side-effects.

  - untested

Not true, just that my userspace test program isn't sufficiently exhaustive.

  - has untrue comments

Unfortunately so.

  - makes no real sense

Not true.

Various archs (including i386, x86_64, powerpc and frv) have instructions that
can be used to calculate integer log2(N).  The fallback position is to use a
loop:

	size = (size - 1) >> (PAGE_SHIFT - 1);
	order = -1;
	do {
		size >>= 1;
		order++;
	} while (size);
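
A userspace check of that fallback loop that also covers sizes that are not a power of two might look like the sketch below. TOY_PAGE_SHIFT (4 KiB pages), get_order_ref() and count_mismatches() are assumptions made for the test and do not come from the kernel tree.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* The loop-based fallback, for size >= 1. */
int get_order_loop(unsigned long size)
{
	int order;

	size = (size - 1) >> (TOY_PAGE_SHIFT - 1);
	order = -1;
	do {
		size >>= 1;
		order++;
	} while (size);
	return order;
}

/* Reference: smallest order such that (page size << order) >= size. */
int get_order_ref(unsigned long size)
{
	int order = 0;

	while ((1UL << (TOY_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

/* Compare both versions for every size in 1..max, not just powers of two. */
unsigned long count_mismatches(unsigned long max)
{
	unsigned long size, bad = 0;

	for (size = 1; size <= max; size++)
		if (get_order_loop(size) != get_order_ref(size))
			bad++;
	return bad;
}
```

Sweeping every size up to some bound, rather than only powers of two, is the kind of coverage that would have exposed the problem with non-power-of-two arguments.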

 and I'm inclined to just revert 39d61db0 instead of adding more and more 
 breakage to it, since it's simply not going to help with the fundamental 
 problems!

Probably a good idea.  I'll work on it some more and improve my test program
(which is actually quite simple to do).

David


connector: Bugfix for cn_call_callback()

2007-03-07 Thread Philipp Reisner
Hi Evgeniy,

When one stresses the connector code by sending many messages
from userspace to the kernel, one can end up in the unlikely()
branch of cn_call_callback().

There a new __cbq gets allocated, and a NULL pointer gets assigned
to the callback by dereferencing the new __cbq. This is the bug. The
right thing is to dereference the original __cbq. Therefore the bugfix
is to use a new variable for the newly allocated __cbq.

This is tested, and it fixes the issue.
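
The shape of the bug can be reproduced in a few lines of userspace C. The types and function names below are simplified stand-ins, not the connector's real structures; calloc() plays the role of kzalloc().

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*toy_cb_t)(void *);

/* Simplified stand-in for struct cn_callback_entry. */
struct toy_entry {
	toy_cb_t callback;
};

/* A registered handler, used only as a non-NULL function pointer. */
void toy_handler(void *unused)
{
	(void)unused;
}

/*
 * Buggy shape: the pointer to the registered entry is overwritten by the
 * zeroed allocation before the callback has been copied out of it.
 */
toy_cb_t copy_callback_buggy(struct toy_entry *registered)
{
	struct toy_entry *cbq = registered;
	toy_cb_t cb;

	cbq = calloc(1, sizeof(*cbq));   /* the registered entry is lost here */
	if (!cbq)
		return NULL;
	cb = cbq->callback;              /* reads the zeroed field: NULL */
	free(cbq);
	return cb;
}

/* Fixed shape: a second variable keeps the registered entry reachable. */
toy_cb_t copy_callback_fixed(struct toy_entry *registered)
{
	struct toy_entry *cbq = registered, *new_cbq;
	toy_cb_t cb;

	new_cbq = calloc(1, sizeof(*new_cbq));
	if (!new_cbq)
		return NULL;
	new_cbq->callback = cbq->callback;   /* copy from the original entry */
	cb = new_cbq->callback;
	free(new_cbq);
	return cb;
}
```

In the buggy variant the queued work item would later call through a NULL callback pointer.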

Signed-off-by: Philipp Reisner [EMAIL PROTECTED]
Signed-off-by: Lars Ellenberg [EMAIL PROTECTED]

--- /usr/src/linux-2.6.20/drivers/connector/connector.c 2007-03-07 11:45:38.0 +0100
+++ /usr/src/linux-2.6.20-modified/drivers/connector/connector.c 2007-03-07 11:39:11.0 +0100
@@ -128,7 +128,7 @@
  */
 static int cn_call_callback(struct cn_msg *msg, void (*destruct_data)(void *), void *data)
 {
-   struct cn_callback_entry *__cbq;
+   struct cn_callback_entry *__cbq, *__new_cbq;
struct cn_dev *dev = cdev;
int err = -ENODEV;

@@ -148,23 +148,23 @@
} else {
struct cn_callback_data *d;

-   __cbq = kzalloc(sizeof(*__cbq), GFP_ATOMIC);
-   if (__cbq) {
-   d = &__cbq->data;
+   __new_cbq = kzalloc(sizeof(*__new_cbq), GFP_ATOMIC);
+   if (__new_cbq) {
+   d = &__new_cbq->data;
d->callback_priv = msg;
d->callback = __cbq->data.callback;
d->ddata = data;
d->destruct_data = destruct_data;
-   d->free = __cbq;
+   d->free = __new_cbq;

-   INIT_WORK(&__cbq->work,
-   cn_queue_wrapper);
+   INIT_WORK(&__new_cbq->work,
+ cn_queue_wrapper);

if (queue_work(dev->cbdev->cn_queue,
-   &__cbq->work))
+   &__new_cbq->work))
err = 0;
else {
-   kfree(__cbq);
+   kfree(__new_cbq);
err = -EINVAL;
}
} else


-- 
: Dipl-Ing Philipp Reisner  Tel +43-1-8178292-50 :
: LINBIT Information Technologies GmbH  Fax +43-1-8178292-82 :
: Vivenotgasse 48, 1120 Vienna, Austriahttp://www.linbit.com :


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 12:00 +0100, Nick Piggin wrote:
 On Wed, Mar 07, 2007 at 11:47:42AM +0100, Peter Zijlstra wrote:
  On Wed, 2007-03-07 at 11:38 +0100, Nick Piggin wrote:
  
 There are real users who want these fast, though.

Yeah, why don't we have a tree per nonlinear vma to find these pages?

wli mentions shadow page tables..
   
   We could do something more efficient, but I thought that half the point
   was that they didn't carry any of this extra memory, and they could be
   really fast to set up at the expense of efficiency elsewhere.
  
  I'm failing to understand this :-(
  
  That extra memory, and apparently they don't want the inefficiency

s/T/W/

  either.
 
 Sorry, I didn't understand your misunderstandings ;)

Bah, my brain is thick and foggy today. Let us try again;

Nonlinear vmas exist because many vmas are expensive somehow, right?
Nonlinear vmas keep the page mapping in the page tables and screw rmaps.

This 'extra memory' you mentioned would be the overhead of tracking the
actual ranges?

And apparently now we want it to not suck on the rmap case :-(

Anyway, if used on a non writeback capable backing store (ramfs)
page_mkclean will never be called. If also mlocked (I think oracle does
this) then page reclaim will pass over too.

So we're only interested in the bdi_cap_accounting_dirty and VM_SHARED
case, right?

Tracking these ranges on a per-vma basis would avoid taking the mm wide
mmap_sem and so would be cheaper than regular vmas.

Would that still be too expensive?

  Well, now they don't, but it could be done or even exploited as a DoS.
 
 But so could nonlinear page reclaim. I think we need to restrict nonlinear
 mappings to root if we're worried about that.

Can't we just 'fix' it?



2.6.21-rc2-mm2 fails does not compile

2007-03-07 Thread Marc Dietrich

Hi,

the fix revert-optimize-and-simplify-get_cycles_sync breaks the build:

  CC  arch/i386/kernel/asm-offsets.s
In file included from include/asm/timex.h:10,
                 from include/linux/timex.h:187,
                 from include/linux/sched.h:50,
                 from include/linux/utsname.h:35,
                 from include/asm/elf.h:12,
                 from include/linux/elf.h:7,
                 from include/linux/module.h:15,
                 from include/linux/crypto.h:21,
                 from arch/i386/kernel/asm-offsets.c:7:
include/asm/tsc.h: In function ‘get_cycles_sync’:
include/asm/tsc.h:45: warning: implicit declaration of function ‘alternative_io’
include/asm/tsc.h:46: error: called object ‘=a’ is not a function
include/asm/tsc.h:46: error: called object ‘0’ is not a function
include/asm/tsc.h:46: error: expected ‘)’ before ‘:’ token
make[1]: *** [arch/i386/kernel/asm-offsets.s] Error 1
make: *** [prepare0] Error 2


It seems that alternative_io is only defined in the x86_64 case ...

What's the fix for the fix?

Marc


-- 
Those who question our statements are traitors
 Lord Arthur Ponsonby, Falsehood in Wartime: Propaganda Lies of the First World War, 1928


Re: [PATCH 8/8] Convert PDA into the percpu section

2007-03-07 Thread Rusty Russell
On Wed, 2007-03-07 at 11:33 +1100, Rusty Russell wrote:
 On Tue, 2007-03-06 at 20:34 +0100, Andi Kleen wrote:
  Do you have text size comparisons before/after and possible lmbench? 
 
 No, but I'll run them this evening.  Last time the size reduction was
 slight, and there was no measurable performance improvement in
 microbenchmarks.

Here are the size results, for a start:


UP:
Before:
size vmlinux
   text    data     bss     dec     hex filename
3094881  243110  221184 3559175  364f07 vmlinux

After:
size vmlinux 
   text    data     bss     dec     hex filename
3093409  243142  221184 3557735  364967 vmlinux

SMP:
Before:
size vmlinux
   text    data     bss     dec     hex filename
3222269  318770  237568 3778607  39a82f vmlinux

After:
size vmlinux
   text    data     bss     dec     hex filename
3221421  314674  237568 3773663  3994df vmlinux

(The data size changes come from moving pda -> percpu, and on SMP
from removing the page-aligned PDA.)

So, a slight win.  lmbench tomorrow...
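
As a cross-check of the "slight win", the savings can be recomputed from the size columns. The sketch below is only arithmetic on the figures quoted above; the struct and function names are made up for illustration. The SMP "before" text value used here, 3222269, is derived as dec - data - bss rather than taken on trust from the table.

```c
#include <assert.h>
#include <stdio.h>

/* One row of `size vmlinux` output; all values are bytes. */
struct size_row {
    long text, data, bss, dec;
};

/* Figures from the tables above. The SMP "before" text value is an
 * assumption: it is derived as dec - data - bss. */
static const struct size_row up_before  = { 3094881, 243110, 221184, 3559175 };
static const struct size_row up_after   = { 3093409, 243142, 221184, 3557735 };
static const struct size_row smp_before = { 3222269, 318770, 237568, 3778607 };
static const struct size_row smp_after  = { 3221421, 314674, 237568, 3773663 };

/* Bytes saved in the whole image (positive means the kernel got smaller). */
static long total_saved(struct size_row before, struct size_row after)
{
    /* Sanity check: the dec column must equal text + data + bss. */
    assert(before.text + before.data + before.bss == before.dec);
    assert(after.text + after.data + after.bss == after.dec);
    return before.dec - after.dec;
}

/* Print text and total savings for both configurations. */
static void print_savings(void)
{
    printf("UP:  text %ld, total %ld\n",
           up_before.text - up_after.text,
           total_saved(up_before, up_after));
    printf("SMP: text %ld, total %ld\n",
           smp_before.text - smp_after.text,
           total_saved(smp_before, smp_after));
}
```

Calling print_savings() reports 1472 bytes of text and 1440 bytes overall saved on UP, and 848 bytes of text and 4944 bytes overall on SMP. The SMP data segment shrinks by exactly 4096 bytes, which fits the remark about dropping the page-aligned PDA.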

Rusty.



Re: connector: Bugfix for cn_call_callback()

2007-03-07 Thread Evgeniy Polyakov
On Wed, Mar 07, 2007 at 12:26:12PM +0100, Philipp Reisner ([EMAIL PROTECTED]) wrote:
 Hi Evgeniy,

Hi Philipp.

 When one stresses the connector code by sending many messages
 from userspace to the kernel, one can end up in the unlikely()
 part of cn_call_callback().
 
 There a new __cbq gets allocated, and a NULL pointer gets assigned
 to the callback by dereferencing the new __cbq. This is the bug. The
 right thing is to dereference the original __cbq. Therefore the bugfix
 is to use a new variable for the newly allocated __cbq.
 
 This is tested, and it fixes the issue.

Yes, your patch is correct.

 Signed-off-by: Philipp Reisner [EMAIL PROTECTED]
 Signed-off-by: Lars Ellenberg [EMAIL PROTECTED]

I will push it, thanks a lot.
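
For readers without the patch at hand, the bug pattern described above can be reproduced in a few lines of userspace C. This is a simplified model, not the connector code: struct cb_entry, struct cb_data and the two copy functions are hypothetical stand-ins, and calloc() plays the role of the zeroing kernel allocation.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*cb_fn)(void *);

/* Hypothetical stand-ins for the callback bookkeeping structures. */
struct cb_data {
    cb_fn callback;
    void *arg;
};

struct cb_entry {
    struct cb_data data;
};

/* Sample callback used by the demo below. */
static void noop_cb(void *arg)
{
    (void)arg;
}

/* Buggy pattern: the new allocation overwrites the pointer to the
 * registered entry, so the callback is read from the zeroed copy. */
static struct cb_entry *copy_entry_buggy(struct cb_entry *cbq, void *arg)
{
    struct cb_data *d;

    cbq = calloc(1, sizeof(*cbq));      /* original entry is lost here */
    if (!cbq)
        return NULL;
    d = &cbq->data;
    d->callback = cbq->data.callback;   /* reads the new entry: NULL */
    d->arg = arg;
    return cbq;
}

/* Fixed pattern: a separate variable holds the new allocation, so the
 * callback is still read from the original entry. */
static struct cb_entry *copy_entry_fixed(const struct cb_entry *cbq, void *arg)
{
    struct cb_entry *new_cbq = calloc(1, sizeof(*new_cbq));
    struct cb_data *d;

    if (!new_cbq)
        return NULL;
    d = &new_cbq->data;
    d->callback = cbq->data.callback;   /* reads the original entry */
    d->arg = arg;
    return new_cbq;
}

/* Shows that only the fixed copy keeps the registered callback. */
static void copy_demo(void)
{
    struct cb_entry reg = { { noop_cb, NULL } };
    struct cb_entry *bad = copy_entry_buggy(&reg, NULL);
    struct cb_entry *good = copy_entry_fixed(&reg, NULL);

    assert(bad && bad->data.callback == NULL);
    assert(good && good->data.callback == noop_cb);
    free(bad);
    free(good);
}
```

In the buggy variant the NULL callback would later be invoked when the queued work runs, which matches the failure seen under load.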

-- 
Evgeniy Polyakov

