Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's

2013-10-15 Thread Sébastien Dugué
Hi Eric,

On Tue, 15 Oct 2013 07:06:25 -0700
Eric Dumazet  wrote:

> On Tue, 2013-10-15 at 15:56 +0200, Sébastien Dugué wrote:
> > On Tue, 15 Oct 2013 15:33:36 +0200
> > Andi Kleen  wrote:
> > 
> > > >   indeed, our typical workload is connected mode IPoIB on mlx4 QDR 
> > > > hardware
> > > > where one cannot benefit from hardware offloads.
> > > 
> > > Is this with sendfile?
> > 
> >   Tests were done with iperf at the time without any extra funky options, 
> > and
> > looking at the code it looks like it does plain write() / recv() on the 
> > socket.
> > 
> 
> But the csum cost is both for sender and receiver ?

  No, it was only on the receiver side that I noticed it.

> 
> Please post the following :
> 
> perf record -g "your iperf session"
> 
> perf report | head -n 200

  Sorry, but this is 3 years old stuff and I do not have the
setup anymore to reproduce.

  Sébastien.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's

2013-10-15 Thread Sébastien Dugué
On Tue, 15 Oct 2013 15:33:36 +0200
Andi Kleen  wrote:

> >   indeed, our typical workload is connected mode IPoIB on mlx4 QDR hardware
> > where one cannot benefit from hardware offloads.
> 
> Is this with sendfile?

  Tests were done with iperf at the time without any extra funky options, and
looking at the code it looks like it does plain write() / recv() on the socket.

  Sébastien.

> 
> For normal send() the checksum is done in the user copy and for receiving it
> can be also done during the copy in most cases
> 
> -Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's

2013-10-15 Thread Sébastien Dugué

  Hi Neil, Andi,

On Mon, 14 Oct 2013 16:25:28 -0400
Neil Horman  wrote:

> On Sun, Oct 13, 2013 at 09:38:33PM -0700, Andi Kleen wrote:
> > Neil Horman  writes:
> > 
> > > Sébastien Dugué reported to me that devices implementing ipoib (which 
> > > don't have
> > > checksum offload hardware were spending a significant amount of time 
> > > computing
> > 
> > Must be an odd workload, most TCP/UDP workloads do copy-checksum
> > anyways. I would rather investigate why that doesn't work.
> > 
> FWIW, the reporter was reporting this using an IP over Infiniband network.
> Neil

  indeed, our typical workload is connected mode IPoIB on mlx4 QDR hardware
where one cannot benefit from hardware offloads.

  For a bit of background on the issue:

  It all started nearly 3 years ago when trying to understand why IPoIB BW was
so low in our setups and why ksoftirqd used 100% of one CPU. A kernel profile
trace showed that the CPU spent most of it's time in checksum computation (from
the only old trace I managed to unearth):

  Function   HitTimeAvg
     ------
  schedule  1730629976998 us 364148.5 us
  csum_partial  1081346520944414 us 1.936 us
  mwait_idle_with_hints 14519858861 us 6794.529 us
  get_page_from_freelist101104348120524 us 0.803 us
  alloc_pages_current   100936755180650 us 0.513 us
  __phys_addr   355547834471387 us 0.125 us
  zone_statistics   101104344360871 us 0.431 us
  ipoib_cm_alloc_rx_skb   6738994343949 us 6.445 us

  After having recoded the checksum to use 2 ALUs, csum_partial() disappeared
from the tracer radar. IPoIB BW got from ~12Gb/s to ~ 20Gb/s and ksoftirqd load
dropped down drastically. Sorry, I could not manage to locate my old traces and
results, those seem to have been lost in the mist of time.

  I did some micro benchmark (dirty hack code below) of different solutions.
It looks like processing 128-byte blocks in 4 chains allows the best 
performance,
but there are plenty other possibilities.

  FWIW, this code has been running as is at our customers sites for 3 years now.

  Sébastien.

> 
> > That said the change looks reasonable, but may not fix the root cause.
> > 
> > -Andi
> > 
> > -- 
> > a...@linux.intel.com -- Speaking for myself only
> > 

8<--


/*
 * gcc -Wall -O3 -o csum_test csum_test.c -lrt
 */

#include 
#include 
#include 
#include 
#include 
#include 

#define __force
#define unlikely(x) (x)

typedef uint32_t u32;
typedef uint16_t u16;

typedef u16 __sum16;
typedef u32 __wsum;

#define NUM_LOOPS   10
#define BUF_LEN 65536
unsigned char buf[BUF_LEN];


/*
 * csum_fold - Fold and invert a 32bit checksum.
 * sum: 32bit unfolded sum
 *
 * Fold a 32bit running checksum to 16bit and invert it. This is usually
 * the last step before putting a checksum into a packet.
 * Make sure not to mix with 64bit checksums.
 */
static inline __sum16 csum_fold(__wsum sum)
{
asm("  addl %1,%0\n"
"  adcl $0x,%0"
: "=r" (sum)
: "r" ((__force u32)sum << 16),
  "0" ((__force u32)sum & 0x));
return (__force __sum16)(~(__force u32)sum >> 16);
}

static inline unsigned short from32to16(unsigned a)
{
unsigned short b = a >> 16;
asm("addw %w2,%w0\n\t"
"adcw $0,%w0\n"
: "=r" (b)
: "0" (b), "r" (a));
return b;
}

static inline unsigned add32_with_carry(unsigned a, unsigned b)
{
asm("addl %2,%0\n\t"
"adcl $0,%0"
: "=r" (a)
: "0" (a), "r" (b));
return a;
}

/*
 * Do a 64-bit checksum on an arbitrary memory area.
 * Returns a 32bit checksum.
 *
 * This isn't as time critical as it used to be because many NICs
 * do hardware checksumming these days.
 *
 * Things tried and found to not make it faster:
 * Manual Prefetching
 * Unrolling to an 128 bytes inner loop.
 * Using interleaving with more registers to break the carry chains.
 */
static unsigned do_csum(const unsigned char *buff, unsigned len)
{
unsigned odd, count;
unsigned long result = 0;

if (unlikely(len == 0))
return result;
odd = 1 & (unsigned long) buff;
if (unlikely(odd)) {
result = *buff << 8;
len--;
buff++;
}
count = len >> 1;

Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's

2013-10-15 Thread Sébastien Dugué

  Hi Neil, Andi,

On Mon, 14 Oct 2013 16:25:28 -0400
Neil Horman nhor...@tuxdriver.com wrote:

 On Sun, Oct 13, 2013 at 09:38:33PM -0700, Andi Kleen wrote:
  Neil Horman nhor...@tuxdriver.com writes:
  
   Sébastien Dugué reported to me that devices implementing ipoib (which 
   don't have
   checksum offload hardware were spending a significant amount of time 
   computing
  
  Must be an odd workload, most TCP/UDP workloads do copy-checksum
  anyways. I would rather investigate why that doesn't work.
  
 FWIW, the reporter was reporting this using an IP over Infiniband network.
 Neil

  indeed, our typical workload is connected mode IPoIB on mlx4 QDR hardware
where one cannot benefit from hardware offloads.

  For a bit of background on the issue:

  It all started nearly 3 years ago when trying to understand why IPoIB BW was
so low in our setups and why ksoftirqd used 100% of one CPU. A kernel profile
trace showed that the CPU spent most of it's time in checksum computation (from
the only old trace I managed to unearth):

  Function   HitTimeAvg
     ------
  schedule  1730629976998 us 364148.5 us
  csum_partial  1081346520944414 us 1.936 us
  mwait_idle_with_hints 14519858861 us 6794.529 us
  get_page_from_freelist101104348120524 us 0.803 us
  alloc_pages_current   100936755180650 us 0.513 us
  __phys_addr   355547834471387 us 0.125 us
  zone_statistics   101104344360871 us 0.431 us
  ipoib_cm_alloc_rx_skb   6738994343949 us 6.445 us

  After having recoded the checksum to use 2 ALUs, csum_partial() disappeared
from the tracer radar. IPoIB BW got from ~12Gb/s to ~ 20Gb/s and ksoftirqd load
dropped down drastically. Sorry, I could not manage to locate my old traces and
results, those seem to have been lost in the mist of time.

  I did some micro benchmark (dirty hack code below) of different solutions.
It looks like processing 128-byte blocks in 4 chains allows the best 
performance,
but there are plenty other possibilities.

  FWIW, this code has been running as is at our customers sites for 3 years now.

  Sébastien.

 
  That said the change looks reasonable, but may not fix the root cause.
  
  -Andi
  
  -- 
  a...@linux.intel.com -- Speaking for myself only
  

8--


/*
 * gcc -Wall -O3 -o csum_test csum_test.c -lrt
 */

#include stdio.h
#include stdlib.h
#include stdint.h
#include time.h
#include string.h
#include errno.h

#define __force
#define unlikely(x) (x)

typedef uint32_t u32;
typedef uint16_t u16;

typedef u16 __sum16;
typedef u32 __wsum;

#define NUM_LOOPS   10
#define BUF_LEN 65536
unsigned char buf[BUF_LEN];


/*
 * csum_fold - Fold and invert a 32bit checksum.
 * sum: 32bit unfolded sum
 *
 * Fold a 32bit running checksum to 16bit and invert it. This is usually
 * the last step before putting a checksum into a packet.
 * Make sure not to mix with 64bit checksums.
 */
static inline __sum16 csum_fold(__wsum sum)
{
asm(  addl %1,%0\n
  adcl $0x,%0
: =r (sum)
: r ((__force u32)sum  16),
  0 ((__force u32)sum  0x));
return (__force __sum16)(~(__force u32)sum  16);
}

static inline unsigned short from32to16(unsigned a)
{
unsigned short b = a  16;
asm(addw %w2,%w0\n\t
adcw $0,%w0\n
: =r (b)
: 0 (b), r (a));
return b;
}

static inline unsigned add32_with_carry(unsigned a, unsigned b)
{
asm(addl %2,%0\n\t
adcl $0,%0
: =r (a)
: 0 (a), r (b));
return a;
}

/*
 * Do a 64-bit checksum on an arbitrary memory area.
 * Returns a 32bit checksum.
 *
 * This isn't as time critical as it used to be because many NICs
 * do hardware checksumming these days.
 *
 * Things tried and found to not make it faster:
 * Manual Prefetching
 * Unrolling to an 128 bytes inner loop.
 * Using interleaving with more registers to break the carry chains.
 */
static unsigned do_csum(const unsigned char *buff, unsigned len)
{
unsigned odd, count;
unsigned long result = 0;

if (unlikely(len == 0))
return result;
odd = 1  (unsigned long) buff;
if (unlikely(odd)) {
result = *buff  8;
len--;
buff++;
}
count = len  1;   /* nr of 16-bit words.. */
if (count) {
if (2  (unsigned long) buff) {
result += *(unsigned short *)buff;
count--;
len -= 2;
buff += 2;
}
count = 1;/* nr

Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's

2013-10-15 Thread Sébastien Dugué
On Tue, 15 Oct 2013 15:33:36 +0200
Andi Kleen a...@firstfloor.org wrote:

indeed, our typical workload is connected mode IPoIB on mlx4 QDR hardware
  where one cannot benefit from hardware offloads.
 
 Is this with sendfile?

  Tests were done with iperf at the time without any extra funky options, and
looking at the code it looks like it does plain write() / recv() on the socket.

  Sébastien.

 
 For normal send() the checksum is done in the user copy and for receiving it
 can be also done during the copy in most cases
 
 -Andi
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's

2013-10-15 Thread Sébastien Dugué
Hi Eric,

On Tue, 15 Oct 2013 07:06:25 -0700
Eric Dumazet eric.duma...@gmail.com wrote:

 On Tue, 2013-10-15 at 15:56 +0200, Sébastien Dugué wrote:
  On Tue, 15 Oct 2013 15:33:36 +0200
  Andi Kleen a...@firstfloor.org wrote:
  
  indeed, our typical workload is connected mode IPoIB on mlx4 QDR 
hardware
where one cannot benefit from hardware offloads.
   
   Is this with sendfile?
  
Tests were done with iperf at the time without any extra funky options, 
  and
  looking at the code it looks like it does plain write() / recv() on the 
  socket.
  
 
 But the csum cost is both for sender and receiver ?

  No, it was only on the receiver side that I noticed it.

 
 Please post the following :
 
 perf record -g your iperf session
 
 perf report | head -n 200

  Sorry, but this is 3 years old stuff and I do not have the
setup anymore to reproduce.

  Sébastien.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Is the PCI clock within the spec?

2007-12-04 Thread Sébastien Dugué

  Hi John,

On Tue, 04 Dec 2007 11:57:43 +0100 John Sigler <[EMAIL PROTECTED]> wrote:

> Hello everyone,
> 
> I have an x86 system, running Linux 2.6.22.1-rt9, in which I plug one
> or two PCI I/O boards. I had been experiencing complete system lock-ups
> until I sent the system to the board manufacturer, and he fixed the
> problem. However, he told me that the PCI clock seemed out of spec,
> as far as voltage is concerned.
> 
> (Disclaimer: my knowledge of PCI is 0.)
> 
> The board manufacturer sent me the plot of (what appears to be) voltage
> versus time for the PCI clock.
> 
> http://linux.kernel.free.fr/plot1.jpg
> 
> The system manufacturer sent me a similar plot.
> 
> http://linux.kernel.free.fr/plot2.jpg

  Why did they send you those plots? What was their point?

> 
> As far as my understanding goes, the signal should alternate between
> 0 V and 3.3 V (??).

  Yep, that's the idealized 3.3V signaling case. However, it looks like
the signal is overshooting a bit (-0.8V below 0 and +0.8V over 3.3V from looking
at the 1st plot) which may be due to incorrect impedance matching on the bus,
probes artifacts, ...

> In the second plot, it looks like Vmax ~ 4.6V
> and Vmin ~ -1.4V (Pk-Pk(C1)=6.08V might mean peak-to-peak voltage?)

  This one looks a bit high (if they measured the same voltages I wonder
where they got their scopes calibrated ;-) )

> 
> 0) What is this C1 both plots mention?

  Scope Channel 1

> 1) Am I reading the plot correctly?

  Yep

> 2) Is -1.4V in DC even possible?

  Why not!

> 3) 4.6V is 1.3V above 3.3V and -1.4V is -1.4V below 0. (Assuming I read 
> the numbers correctly) Are these values within the PCI spec? Or are 
> these voltages dangerous and / or might cause some problems with some 
> PCI boards?

  Well it depends on which of the plot is lying. Looking at the PCI spec
(4.2.2.1) the Vih max for a device is Vcc-max+0.5 = 3.6 + 0.5 = 4.1V
the Vil min is -0.5V so in this case it looks a bit high. But I would not
worry too much, those are only the overshoots, and the circuits have
clamping diodes on their inputs.

  The test waveform voltages for the maximum ratings (4.2.2.3) against which
every PCI device should be qualified are higher than what you have here: 7.1V
peak-to-peak.

  Hope this helps.

  Sebastien.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Is the PCI clock within the spec?

2007-12-04 Thread Sébastien Dugué

  Hi John,

On Tue, 04 Dec 2007 11:57:43 +0100 John Sigler [EMAIL PROTECTED] wrote:

 Hello everyone,
 
 I have an x86 system, running Linux 2.6.22.1-rt9, in which I plug one
 or two PCI I/O boards. I had been experiencing complete system lock-ups
 until I sent the system to the board manufacturer, and he fixed the
 problem. However, he told me that the PCI clock seemed out of spec,
 as far as voltage is concerned.
 
 (Disclaimer: my knowledge of PCI is 0.)
 
 The board manufacturer sent me the plot of (what appears to be) voltage
 versus time for the PCI clock.
 
 http://linux.kernel.free.fr/plot1.jpg
 
 The system manufacturer sent me a similar plot.
 
 http://linux.kernel.free.fr/plot2.jpg

  Why did they send you those plots? What was their point?

 
 As far as my understanding goes, the signal should alternate between
 0 V and 3.3 V (??).

  Yep, that's the idealized 3.3V signaling case. However, it looks like
the signal is overshooting a bit (-0.8V below 0 and +0.8V over 3.3V from looking
at the 1st plot) which may be due to incorrect impedance matching on the bus,
probes artifacts, ...

 In the second plot, it looks like Vmax ~ 4.6V
 and Vmin ~ -1.4V (Pk-Pk(C1)=6.08V might mean peak-to-peak voltage?)

  This one looks a bit high (if they measured the same voltages I wonder
where they got their scopes calibrated ;-) )

 
 0) What is this C1 both plots mention?

  Scope Channel 1

 1) Am I reading the plot correctly?

  Yep

 2) Is -1.4V in DC even possible?

  Why not!

 3) 4.6V is 1.3V above 3.3V and -1.4V is -1.4V below 0. (Assuming I read 
 the numbers correctly) Are these values within the PCI spec? Or are 
 these voltages dangerous and / or might cause some problems with some 
 PCI boards?

  Well it depends on which of the plot is lying. Looking at the PCI spec
(4.2.2.1) the Vih max for a device is Vcc-max+0.5 = 3.6 + 0.5 = 4.1V
the Vil min is -0.5V so in this case it looks a bit high. But I would not
worry too much, those are only the overshoots, and the circuits have
clamping diodes on their inputs.

  The test waveform voltages for the maximum ratings (4.2.2.3) against which
every PCI device should be qualified are higher than what you have here: 7.1V
peak-to-peak.

  Hope this helps.

  Sebastien.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm] remove HT1000 from the quirk_msi_ht_cap quirk

2007-11-28 Thread Sébastien Dugué

  Andrew, due to mailer problems, I'm not sure you received my previous mail
titled: "[PATCH] V2 -  quirk: Enable MSI Mapping on HT1000" if you did,
then forget it and use this one instead. Or if you already merged V2 then
drop this one.

Sorry for the inconvenience.

  Sebastien.


From: Sebastien Dugue <[EMAIL PROTECTED]>

  Now that the HyperTransport MSI mapping capability is enabled for Broadcom's
HT1000 bridge, the quirk_msi_ht_cap quirk is no longer necessary.

Signed-off-by: Sebastien Dugue <[EMAIL PROTECTED]>
Cc: Andrew Morton <[EMAIL PROTECTED]>
Cc: "Eric W. Biederman" <[EMAIL PROTECTED]>
Cc: Michael Chan <[EMAIL PROTECTED]>


---
 drivers/pci/quirks.c |4 
 1 file changed, 4 deletions(-)

Index: linux-2.6.24-rc3-mm2/drivers/pci/quirks.c
===
--- linux-2.6.24-rc3-mm2.orig/drivers/pci/quirks.c
+++ linux-2.6.24-rc3-mm2/drivers/pci/quirks.c
@@ -1711,10 +1711,6 @@ static void __devinit quirk_msi_ht_cap(s
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SERVERWORKS, 
PCI_DEVICE_ID_SERVERWORKS_HT2000_PCIE,
quirk_msi_ht_cap);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SERVERWORKS,
-   PCI_DEVICE_ID_SERVERWORKS_HT1000_PXB,
-   quirk_msi_ht_cap);
-
 
 /*
  *  Force enable MSI mapping capability on HT bridges
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm] remove HT1000 from the quirk_msi_ht_cap quirk

2007-11-28 Thread Sébastien Dugué

  Andrew, due to mailer problems, I'm not sure you received my previous mail
titled: [PATCH] V2 -  quirk: Enable MSI Mapping on HT1000 if you did,
then forget it and use this one instead. Or if you already merged V2 then
drop this one.

Sorry for the inconvenience.

  Sebastien.


From: Sebastien Dugue [EMAIL PROTECTED]

  Now that the HyperTransport MSI mapping capability is enabled for Broadcom's
HT1000 bridge, the quirk_msi_ht_cap quirk is no longer necessary.

Signed-off-by: Sebastien Dugue [EMAIL PROTECTED]
Cc: Andrew Morton [EMAIL PROTECTED]
Cc: Eric W. Biederman [EMAIL PROTECTED]
Cc: Michael Chan [EMAIL PROTECTED]


---
 drivers/pci/quirks.c |4 
 1 file changed, 4 deletions(-)

Index: linux-2.6.24-rc3-mm2/drivers/pci/quirks.c
===
--- linux-2.6.24-rc3-mm2.orig/drivers/pci/quirks.c
+++ linux-2.6.24-rc3-mm2/drivers/pci/quirks.c
@@ -1711,10 +1711,6 @@ static void __devinit quirk_msi_ht_cap(s
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SERVERWORKS, 
PCI_DEVICE_ID_SERVERWORKS_HT2000_PCIE,
quirk_msi_ht_cap);
-DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SERVERWORKS,
-   PCI_DEVICE_ID_SERVERWORKS_HT1000_PXB,
-   quirk_msi_ht_cap);
-
 
 /*
  *  Force enable MSI mapping capability on HT bridges
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] msi: set 'En' bit of MSI Mapping Capability on HT platform

2007-11-26 Thread Sébastien Dugué
On Sun, 25 Nov 2007 11:21:48 +0800 "peerchen" <[EMAIL PROTECTED]> wrote:

> According to the HyperTransport spec, 'En' indicate if the MSI Mapping is
> active. So it should be set when enable the MSI.
> 

  Cool, I had a patch that added a quirk to enable MSI Mapping on Broadcom's
HT1000 so that the builtin BCM5706S Gigabit Ethernet would use MSI. This one
looks even cleaner.

  As a side note, the bnx2 driver tries to use MSI by default (in the kernel I'm
using - 2.6.21.5-rt10) and falls back to INTx if no interrupt is received within
a reasonable amount of time. If the HT MSI Mapping is not enabled beforehand,
the first MSI message (before reverting to INTx) is not trapped by the bridge
and lands in memory thus corrupting what is there at that time. In my case
it's the mapcount of a struct page in the freelist which is overwritten with
the MSI message.

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] msi: set 'En' bit of MSI Mapping Capability on HT platform

2007-11-26 Thread Sébastien Dugué
On Sun, 25 Nov 2007 11:21:48 +0800 peerchen [EMAIL PROTECTED] wrote:

 According to the HyperTransport spec, 'En' indicate if the MSI Mapping is
 active. So it should be set when enable the MSI.
 

  Cool, I had a patch that added a quirk to enable MSI Mapping on Broadcom's
HT1000 so that the builtin BCM5706S Gigabit Ethernet would use MSI. This one
looks even cleaner.

  As a side note, the bnx2 driver tries to use MSI by default (in the kernel I'm
using - 2.6.21.5-rt10) and falls back to INTx if no interrupt is received within
a reasonable amount of time. If the HT MSI Mapping is not enabled beforehand,
the first MSI message (before reverting to INTx) is not trapped by the bridge
and lands in memory thus corrupting what is there at that time. In my case
it's the mapcount of a struct page in the freelist which is overwritten with
the MSI message.

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] RT: fix spin_trylock_irq

2007-10-11 Thread Sébastien Dugué

  This patch fixes a bug in spin_trylock_irq() where __spin_trylock_irq() is
picked for regular (non-raw) spinlocks instead of _spin_trylock_irq().

  This results in systematic boot hangs and may have been going unnoticed
for quite some time as it only manifests (aside from a compile warning) when
booting with a NUMA config or when using the Chelsio T3 (cxgb3) driver as
these seems to be the sole users.


Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>

---
 include/linux/spinlock.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.23-rc9-rt2/include/linux/spinlock.h
===
--- linux-2.6.23-rc9-rt2.orig/include/linux/spinlock.h
+++ linux-2.6.23-rc9-rt2/include/linux/spinlock.h
@@ -501,7 +501,7 @@ do {
\
 
 #define spin_trylock_irq(lock) \
__cond_lock(lock, PICK_SPIN_OP_RET(__spin_trylock_irq,  \
-   __spin_trylock_irq, lock))
+   _spin_trylock_irq, lock))
 
 #define spin_trylock_irqsave(lock, flags) \
__cond_lock(lock, PICK_SPIN_OP_RET(__spin_trylock_irqsave,  \
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] RT: fix spin_trylock_irq

2007-10-11 Thread Sébastien Dugué

  This patch fixes a bug in spin_trylock_irq() where __spin_trylock_irq() is
picked for regular (non-raw) spinlocks instead of _spin_trylock_irq().

  This results in systematic boot hangs and may have been going unnoticed
for quite some time as it only manifests (aside from a compile warning) when
booting with a NUMA config or when using the Chelsio T3 (cxgb3) driver as
these seems to be the sole users.


Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]

---
 include/linux/spinlock.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.23-rc9-rt2/include/linux/spinlock.h
===
--- linux-2.6.23-rc9-rt2.orig/include/linux/spinlock.h
+++ linux-2.6.23-rc9-rt2/include/linux/spinlock.h
@@ -501,7 +501,7 @@ do {
\
 
 #define spin_trylock_irq(lock) \
__cond_lock(lock, PICK_SPIN_OP_RET(__spin_trylock_irq,  \
-   __spin_trylock_irq, lock))
+   _spin_trylock_irq, lock))
 
 #define spin_trylock_irqsave(lock, flags) \
__cond_lock(lock, PICK_SPIN_OP_RET(__spin_trylock_irqsave,  \
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Search for x86_64 documentation.

2007-08-02 Thread Sébastien Dugué

  Hi Francis,

On Wed, 1 Aug 2007 17:30:35 +0200 "Francis Moreau" <[EMAIL PROTECTED]> wrote:

> Hello Rene,
> 
> On 8/1/07, Rene Herman <[EMAIL PROTECTED]> wrote:
> > On 08/01/2007 03:27 PM, Francis Moreau wrote:
> >
> > > I'm used to hack Linux on a ARM based board and would like to be
> > > involved in x86_64 architecture but I don't know where I should
> > > start...
> > >
> > > Could anyone point out some nice documentations/books describing this
> > > architecture ?
> >
> > First and foremost the AMD64 architecture documentation from AMD itself:
> >
> > http://www.amd.com/gb-uk/Processors/TechnicalResources/0,,30_182_739_7044,00.html
> >
> 
> Thank you for the pointer but I alread knew about them.
> 
> I was actually more interested in books which are more pleasant to
> read than a raw datasheet. Something like "x86_64 arch for newbies" ;)
> 

  Then you may have a look at
 
http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html

  It's a bit more pleasant to read than the AMD or Intel programming manuals
but in the long run you will need those along with the accompanying erratas.

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ck] Re: Linux Kernel cfs scheduler and xorg kbd

2007-08-02 Thread Sébastien Dugué

  Hi all,

  just a datapoint FWIW

On Wed, 01 Aug 2007 17:53:32 +0200 "<::.. Teresa_II ..::>" <[EMAIL PROTECTED]> 
wrote:

> У чт, 2007-08-02 у 01:00 +1000, Matthew Hawkins пише:
> 
> > Are you sure its not just a setting in Gnome/KDE for accessibility?
> > That tends to do crazy things like making control keys sticky...
> 
> Yes, i use gnome, and all keyboard layouts are set in gnome, also group
> switching for layouts. But accessibility is switched completely off.
> 

  This reminds me I had something similar happening about a year or so ago
running with Debian's unstable xorg and gnome.

  The keys would stick for a short time (no, no coffee spilled on the keyboard,
I can assure you ;-), the most annoying one being the delete key. Nothing
deterministic, it could happen anytime (or not at all for several days).

  As a final resort, I upgraded the xorg and gnome packages - problem solved.

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ck] Re: Linux Kernel cfs scheduler and xorg kbd

2007-08-02 Thread Sébastien Dugué

  Hi all,

  just a datapoint FWIW

On Wed, 01 Aug 2007 17:53:32 +0200 ::.. Teresa_II ..:: [EMAIL PROTECTED] 
wrote:

 У чт, 2007-08-02 у 01:00 +1000, Matthew Hawkins пише:
 
  Are you sure its not just a setting in Gnome/KDE for accessibility?
  That tends to do crazy things like making control keys sticky...
 
 Yes, i use gnome, and all keyboard layouts are set in gnome, also group
 switching for layouts. But accessibility is switched completely off.
 

  This reminds me I had something similar happening about a year or so ago
running with Debian's unstable xorg and gnome.

  The keys would stick for a short time (no, no coffee spilled on the keyboard,
I can assure you ;-), the most annoying one being the delete key. Nothing
deterministic, it could happen anytime (or not at all for several days).

  As a final resort, I upgraded the xorg and gnome packages - problem solved.

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Search for x86_64 documentation.

2007-08-02 Thread Sébastien Dugué

  Hi Francis,

On Wed, 1 Aug 2007 17:30:35 +0200 Francis Moreau [EMAIL PROTECTED] wrote:

 Hello Rene,
 
 On 8/1/07, Rene Herman [EMAIL PROTECTED] wrote:
  On 08/01/2007 03:27 PM, Francis Moreau wrote:
 
   I'm used to hack Linux on a ARM based board and would like to be
   involved in x86_64 architecture but I don't know where I should
   start...
  
   Could anyone point out some nice documentations/books describing this
   architecture ?
 
  First and foremost the AMD64 architecture documentation from AMD itself:
 
  http://www.amd.com/gb-uk/Processors/TechnicalResources/0,,30_182_739_7044,00.html
 
 
 Thank you for the pointer but I alread knew about them.
 
 I was actually more interested in books which are more pleasant to
 read than a raw datasheet. Something like x86_64 arch for newbies ;)
 

  Then you may have a look at
 
http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html

  It's a bit more pleasant to read than the AMD or Intel programming manuals
but in the long run you will need those along with the accompanying erratas.

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: ATA over ethernet swapping and obfuscated code

2007-07-31 Thread Sébastien Dugué
  Hi Pavel,

On Tue, 31 Jul 2007 15:58:31 +0200 Pavel Machek <[EMAIL PROTECTED]> wrote:

> Hi!
> 
> I wanted to know if it is possible/okay to swap over AOE... 
> 
> According to
> http://www.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.20
> .. it runs OOM even during normal use, so I guess swapping over it is
> no-no?
> 
> Can I build both client and server for these using free software?
> 
> In the process, I looked at the aoe code, and parts of it look like
> obfuscated C contest. The use of switch() as an if was particulary
> creative; I'm not even sure if I translated it properly... can you
> take a look?
> 
> (Patch is 
> 
> Signed-off-by: Pavel Machek <[EMAIL PROTECTED]>
> 
> but I did not even compile test it)
> 
> diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
> index 05a9719..38ba35d 100644
> --- a/drivers/block/aoe/aoedev.c
> +++ b/drivers/block/aoe/aoedev.c
> @@ -64,29 +64,26 @@ aoedev_newdev(ulong nframes)
>  
>   d = kzalloc(sizeof *d, GFP_ATOMIC);
>   f = kcalloc(nframes, sizeof *f, GFP_ATOMIC);
> - switch (!d || !f) {
> - case 0:
> - d->nframes = nframes;
> - d->frames = f;
> - e = f + nframes;
> - for (; f - f->tag = FREETAG;
> - f->skb = new_skb(ETH_ZLEN);
> - if (!f->skb)
> - break;
> - }
> - if (f == e)
> - break;
> + if (!d || !f) {
> + kfree(f);
> + kfree(d);
> + return NULL;
> + }
> +
> + d->nframes = nframes;
> + d->frames = f;
> + e = f + nframes;
> + for (; f + f->tag = FREETAG;
> + f->skb = new_skb(ETH_ZLEN);
> + if (!f->skb)
> + break;
> + }
> + if (f != e) {
>   while (f > d->frames) {
>   f--;
>   dev_kfree_skb(f->skb);
>   }
> - default:
> - if (f)
> - kfree(f);
> - if (d)
> - kfree(d);
> - return NULL;
>   }
>   INIT_WORK(>work, aoecmd_sleepwork);
>   spin_lock_init(>lock);

  Creative it is.

> 
> 
> aoedev_by_sysminor_m() returns with spinlock held in error case; I
> guess that's bad.
> 
> struct aoedev *
> aoedev_by_sysminor_m(ulong sysminor, ulong bufcnt)
> {
>   struct aoedev *d;
>   ulong flags;
> 
>   spin_lock_irqsave(_lock, flags);
> 
>   for (d=devlist; d; d=d->next)
>   if (d->sysminor == sysminor)
>   break;
> 
>   if (d == NULL) {
>   d = aoedev_newdev(bufcnt);
>   if (d == NULL) {
>   spin_unlock_irqrestore(_lock, flags);

~ what about here

>   printk(KERN_INFO "aoe: aoedev_newdev
> failure.\n");
>   return NULL;
> ~ here
>   }
>   d->sysminor = sysminor;
>   d->aoemajor = AOEMAJOR(sysminor);
>   d->aoeminor = AOEMINOR(sysminor);
>   }
> 
>   spin_unlock_irqrestore(_lock, flags);
>   return d;
> }
> 

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RT] Fix RT balancing tasks pulling

2007-07-31 Thread Sébastien Dugué

  Hi Josh,

On Mon, 30 Jul 2007 14:50:34 -0700 Josh Triplett <[EMAIL PROTECTED]> wrote:

> On Tue, 2007-07-24 at 15:42 +0200, Sébastien Dugué wrote:
> >   this one-liner fixes a bug in balance_rt_tasks() which sometimes 
> > manifests by
> > having a lower prio task being scheduled while a higher prio task is sitting
> > waiting on another runqueue.
> > 
> >   This is pretty hard to reproduce on low cpu count machines, for example, I
> > had to have sched_football run in a loop for ~38h before it failed on a 
> > dual HT
> > Xeon box.
> 
> Very impressive find, and this does seem to improve things in some
> cases.
> 
> However, with 2.6.22.1-rt8, which includes this patch, I still managed
> to get some failures after a few hours on an 8-way (quad dual-core) box.
> Preliminary results so far:
> 420 Final ball position: 0
>   2 Final ball position: 1
> 

  Darn, I thought it was it :(

  I had a ~60h run on my box with the very same kernel without a single
failure.

  Do you have any other user RT load running when those failures pop up?

  Anyway, back to scheduler code dissection now.

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RT] Fix RT balancing tasks pulling

2007-07-31 Thread Sébastien Dugué

  Hi Josh,

On Mon, 30 Jul 2007 14:50:34 -0700 Josh Triplett [EMAIL PROTECTED] wrote:

 On Tue, 2007-07-24 at 15:42 +0200, Sébastien Dugué wrote:
this one-liner fixes a bug in balance_rt_tasks() which sometimes 
  manifests by
  having a lower prio task being scheduled while a higher prio task is sitting
  waiting on another runqueue.
  
This is pretty hard to reproduce on low cpu count machines, for example, I
  had to have sched_football run in a loop for ~38h before it failed on a 
  dual HT
  Xeon box.
 
 Very impressive find, and this does seem to improve things in some
 cases.
 
 However, with 2.6.22.1-rt8, which includes this patch, I still managed
 to get some failures after a few hours on an 8-way (quad dual-core) box.
 Preliminary results so far:
 420 Final ball position: 0
   2 Final ball position: 1
 

  Darn, I thought it was it :(

  I had a ~60h run on my box with the very same kernel without a single
failure.

  Do you have any other user RT load running when those failures pop up?

  Anyway, back to scheduler code dissection now.

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: ATA over ethernet swapping and obfuscated code

2007-07-31 Thread Sébastien Dugué
  Hi Pavel,

On Tue, 31 Jul 2007 15:58:31 +0200 Pavel Machek [EMAIL PROTECTED] wrote:

 Hi!
 
 I wanted to know if it is possible/okay to swap over AOE... 
 
 According to
 http://www.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.20
 .. it runs OOM even during normal use, so I guess swapping over it is
 no-no?
 
 Can I build both client and server for these using free software?
 
 In the process, I looked at the aoe code, and parts of it look like
 obfuscated C contest. The use of switch() as an if was particulary
 creative; I'm not even sure if I translated it properly... can you
 take a look?
 
 (Patch is 
 
 Signed-off-by: Pavel Machek [EMAIL PROTECTED]
 
 but I did not even compile test it)
 
 diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
 index 05a9719..38ba35d 100644
 --- a/drivers/block/aoe/aoedev.c
 +++ b/drivers/block/aoe/aoedev.c
 @@ -64,29 +64,26 @@ aoedev_newdev(ulong nframes)
  
   d = kzalloc(sizeof *d, GFP_ATOMIC);
   f = kcalloc(nframes, sizeof *f, GFP_ATOMIC);
 - switch (!d || !f) {
 - case 0:
 - d-nframes = nframes;
 - d-frames = f;
 - e = f + nframes;
 - for (; fe; f++) {
 - f-tag = FREETAG;
 - f-skb = new_skb(ETH_ZLEN);
 - if (!f-skb)
 - break;
 - }
 - if (f == e)
 - break;
 + if (!d || !f) {
 + kfree(f);
 + kfree(d);
 + return NULL;
 + }
 +
 + d-nframes = nframes;
 + d-frames = f;
 + e = f + nframes;
 + for (; fe; f++) {
 + f-tag = FREETAG;
 + f-skb = new_skb(ETH_ZLEN);
 + if (!f-skb)
 + break;
 + }
 + if (f != e) {
   while (f  d-frames) {
   f--;
   dev_kfree_skb(f-skb);
   }
 - default:
 - if (f)
 - kfree(f);
 - if (d)
 - kfree(d);
 - return NULL;
   }
   INIT_WORK(d-work, aoecmd_sleepwork);
   spin_lock_init(d-lock);

  Creative it is.

 
 
 aoedev_by_sysminor_m() returns with spinlock held in error case; I
 guess that's bad.
 
 struct aoedev *
 aoedev_by_sysminor_m(ulong sysminor, ulong bufcnt)
 {
   struct aoedev *d;
   ulong flags;
 
   spin_lock_irqsave(devlist_lock, flags);
 
   for (d=devlist; d; d=d-next)
   if (d-sysminor == sysminor)
   break;
 
   if (d == NULL) {
   d = aoedev_newdev(bufcnt);
   if (d == NULL) {
   spin_unlock_irqrestore(devlist_lock, flags);

~ what about here

   printk(KERN_INFO aoe: aoedev_newdev
 failure.\n);
   return NULL;
 ~ here
   }
   d-sysminor = sysminor;
   d-aoemajor = AOEMAJOR(sysminor);
   d-aoeminor = AOEMINOR(sysminor);
   }
 
   spin_unlock_irqrestore(devlist_lock, flags);
   return d;
 }
 

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console

2007-07-26 Thread Sébastien Dugué

On Wed, 25 Jul 2007 16:41:23 -0600 Bjorn Helgaas <[EMAIL PROTECTED]> wrote:

> On Wednesday 25 July 2007 07:32:53 am Sébastien Dugué wrote:
> > On Wed, 25 Jul 2007 07:16:44 -0600 Bjorn Helgaas <[EMAIL PROTECTED]> wrote:
> > 
> > > The _DDN is a "DOS device name", and the _UID is a "logical device ID
> > > that does not change across reboots."  Both are optional, and PNPACPI
> > > ignores them.  But maybe we could change PNPACPI to sort by them if
> > > they are present.  I'll think about this a bit.
> > 
> >   That would be nice, but I wish you good luck with all those
> > crappy BIOSes out there.
> 
> Yeah, it's an ugly world we live in.  Would you be able to try the
> attached patch just for testing?  It should sort devices with the
> same _HID by their _UID.  It doesn't have any effect on my systems,
> because my devices are already ordered by _UID by default.  But I
> think it should switch your COM1/COM2 ports back to the order you
> expect.
> 

  Works well, thanks Bjorn.

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console

2007-07-26 Thread Sébastien Dugué

On Wed, 25 Jul 2007 16:41:23 -0600 Bjorn Helgaas [EMAIL PROTECTED] wrote:

 On Wednesday 25 July 2007 07:32:53 am Sébastien Dugué wrote:
  On Wed, 25 Jul 2007 07:16:44 -0600 Bjorn Helgaas [EMAIL PROTECTED] wrote:
  
   The _DDN is a DOS device name, and the _UID is a logical device ID
   that does not change across reboots.  Both are optional, and PNPACPI
   ignores them.  But maybe we could change PNPACPI to sort by them if
   they are present.  I'll think about this a bit.
  
That would be nice, but I wish you good luck with all those
  crappy BIOSes out there.
 
 Yeah, it's an ugly world we live in.  Would you be able to try the
 attached patch just for testing?  It should sort devices with the
 same _HID by their _UID.  It doesn't have any effect on my systems,
 because my devices are already ordered by _UID by default.  But I
 think it should switch your COM1/COM2 ports back to the order you
 expect.
 

  Works well, thanks Bjorn.

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console

2007-07-25 Thread Sébastien Dugué
On Wed, 25 Jul 2007 15:32:53 +0200 Sébastien Dugué <[EMAIL PROTECTED]> wrote:

> 
>   Hi Bjorn,
> 
> On Wed, 25 Jul 2007 07:16:44 -0600 Bjorn Helgaas <[EMAIL PROTECTED]> wrote:
> 
> > On Wednesday 25 July 2007 01:45:54 am Sébastien Dugué wrote:
> > >   looks like the commit was dropped, nevertheless here is some more info
> > > to try and understand what may be going on so it may benefit the 
> > > posterity ;-)
> > 
> > Thank you very much.
> > 
> > Your machine does indeed describe both UARTs in ACPI, and the PNP probe
> > finds them correctly.  The only wrinkle is that the PNP probe names the
> > ports in the order they appear in the ACPI namespace, and your firmware
> > has them "backwards":
> > 
> > Device (COMB)
> > {
> > Name (_HID, EisaId ("PNP0501"))
> > Name (_DDN, "COM2")
> > Name (_UID, 0x02)
> > 
> > Device (COMA)
> > {
> > Name (_HID, EisaId ("PNP0501"))
> > Name (_DDN, "COM1")
> > Name (_UID, 0x01)
> 
> 
>   Yes, that's what I thought was weird in the DSDT, but thought the PnP layer
> would map those devices according to UID or at least DDN. It seems that's
> not the case.
> 
> > 
> > There's nothing illegal about this, but with the current PNPACPI,
> > it causes the names to be reversed.  The blind probe tries 0x3f8
> > first, and names that ttyS0.  But the PNP probe, using the namespace
> > order, finds the "COM2" port at 0x2f8 first:
> > 
> >   00:0b: ttyS0 at I/O 0x2f8 (irq = 3) is a 16550A
> >   00:0c: ttyS1 at I/O 0x3f8 (irq = 4) is a 16550A
> 
>   Oh I see, didn't notice this though.
> 
> > 
> > So I think your serial console would work without "legacy_serial.force"
> > if you used "console=ttyS1" instead of "console=ttyS0".  But you
> > shouldn't have to do that, of course.
> 
>   Will try ttyS1 just to confirm...
> 
>   Waiting for reboot...
> 
>   OK boots fine with the serial console fully operational again.

  But I also need to change my inittab. At least it confirms the
root cause of the problem.

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console

2007-07-25 Thread Sébastien Dugué

  Hi Bjorn,

On Wed, 25 Jul 2007 07:16:44 -0600 Bjorn Helgaas <[EMAIL PROTECTED]> wrote:

> On Wednesday 25 July 2007 01:45:54 am Sébastien Dugué wrote:
> >   looks like the commit was dropped, nevertheless here is some more info
> > to try and understand what may be going on so it may benefit the posterity 
> > ;-)
> 
> Thank you very much.
> 
> Your machine does indeed describe both UARTs in ACPI, and the PNP probe
> finds them correctly.  The only wrinkle is that the PNP probe names the
> ports in the order they appear in the ACPI namespace, and your firmware
> has them "backwards":
> 
> Device (COMB)
> {
> Name (_HID, EisaId ("PNP0501"))
> Name (_DDN, "COM2")
> Name (_UID, 0x02)
> 
> Device (COMA)
> {
> Name (_HID, EisaId ("PNP0501"))
> Name (_DDN, "COM1")
> Name (_UID, 0x01)


  Yes, that's what I thought was weird in the DSDT, but thought the PnP layer
would map those devices according to UID or at least DDN. It seems that's
not the case.

> 
> There's nothing illegal about this, but with the current PNPACPI,
> it causes the names to be reversed.  The blind probe tries 0x3f8
> first, and names that ttyS0.  But the PNP probe, using the namespace
> order, finds the "COM2" port at 0x2f8 first:
> 
>   00:0b: ttyS0 at I/O 0x2f8 (irq = 3) is a 16550A
>   00:0c: ttyS1 at I/O 0x3f8 (irq = 4) is a 16550A

  Oh I see, didn't notice this though.

> 
> So I think your serial console would work without "legacy_serial.force"
> if you used "console=ttyS1" instead of "console=ttyS0".  But you
> shouldn't have to do that, of course.

  Will try ttyS1 just to confirm...

  Waiting for reboot...

  OK boots fine with the serial console fully operational again.

> 
> The _DDN is a "DOS device name", and the _UID is a "logical device ID
> that does not change across reboots."  Both are optional, and PNPACPI
> ignores them.  But maybe we could change PNPACPI to sort by them if
> they are present.  I'll think about this a bit.

  That would be nice, but I wish you good luck with all those
crappy BIOSes out there.

  Thanks,

  Sébastien.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console

2007-07-25 Thread Sébastien Dugué

  Hi Bjorn, 

  looks like the commit was dropped, nevertheless here is some more info
to try and understand what may be going on so it may benefit the posterity ;-)

On Tue, 24 Jul 2007 09:48:45 -0600 Bjorn Helgaas <[EMAIL PROTECTED]> wrote:

> On Tuesday 24 July 2007 08:28:05 am Sébastien Dugué wrote:
> >  your commit 7e92b4fc345f5b6f57585fbe5ffdb0f24d7c9b26 broke the serial 
> > console
> > on my box. Adding 'legacy_serial.force=1' to my boot param as a workaround
> > solves the issue, but this may be hiding bugs in Linux PnP support or
> > in my firmware.
> 
> Thanks for your report.  We need to figure out why the 8250_pnp driver
> didn't find your serial console device.  Can you confirm that you also
> have CONFIG_ACPI and CONFIG_PNPACPI in your .config?

  Yep, both =y

CONFIG_ACPI=y
CONFIG_PNPACPI=y

> 
> If you have those, and it still doesn't work, can you collect the DSDT
> dump, the output of "grep . /sys/bus/pnp/devices/*/*", and the dmesg
> from your "legacy_serial.force=1" boot? 

  Please find those attached. I also include the output from dmidecode
just in case (I'm not sure it contains anything interesting in this case
though).

  dsdt.dsl -> DSDT dump
  dmi.txt  -> output from dmidecode
  pnp-info.txt -> output of the grep

  dmesg-ok.txt -> dmesg with 'legacy_serial.force=1' (CONFIG_PNP_DEBUG=y)
  dmesg-bad.txt -> dmesg without 'legacy_serial.force=1' (CONFIG_PNP_DEBUG=y)

> Then we can tell which port
> the blind probe finds and whether it's described somewhere by ACPI.
> 
> Thanks,
>   Bjorn

  Thanks,

  Sébastien.

> 
> >   The box is a dual HT Xeon running a vanilla 2.6.22 x86_64 kernel
> > 
> >   here is my .config:
> > 
> > CONFIG_PNP=y
> > CONFIG_SERIAL_8250=y
> > CONFIG_SERIAL_8250_CONSOLE=y
> > CONFIG_SERIAL_8250_PCI=y
> > CONFIG_SERIAL_8250_PNP=y
> > CONFIG_SERIAL_8250_NR_UARTS=4
> > CONFIG_SERIAL_8250_RUNTIME_UARTS=4
> > CONFIG_SERIAL_8250_EXTENDED=y
> > CONFIG_SERIAL_8250_SHARE_IRQ=y
> > CONFIG_SERIAL_8250_DETECT_IRQ=y
> > 
> >   lspci output attached.
> >  
> >   Any ideas to help me debug this?
> > 
> >   If you need more info (like DSDT dump), just ask.
> > 
> >   Thanks,
> > 
> >   Sébastien.
> > 
> > 
> > 
> 
> 


logs.tar.bz2
Description: application/bzip


Re: commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console

2007-07-25 Thread Sébastien Dugué

  Hi Bjorn, 

  looks like the commit was dropped, nevertheless here is some more info
to try and understand what may be going on so it may benefit the posterity ;-)

On Tue, 24 Jul 2007 09:48:45 -0600 Bjorn Helgaas [EMAIL PROTECTED] wrote:

 On Tuesday 24 July 2007 08:28:05 am Sébastien Dugué wrote:
   your commit 7e92b4fc345f5b6f57585fbe5ffdb0f24d7c9b26 broke the serial 
  console
  on my box. Adding 'legacy_serial.force=1' to my boot param as a workaround
  solves the issue, but this may be hiding bugs in Linux PnP support or
  in my firmware.
 
 Thanks for your report.  We need to figure out why the 8250_pnp driver
 didn't find your serial console device.  Can you confirm that you also
 have CONFIG_ACPI and CONFIG_PNPACPI in your .config?

  Yep, both =y

CONFIG_ACPI=y
CONFIG_PNPACPI=y

 
 If you have those, and it still doesn't work, can you collect the DSDT
 dump, the output of grep . /sys/bus/pnp/devices/*/*, and the dmesg
 from your legacy_serial.force=1 boot? 

  Please find those attached. I also include the output from dmidecode
just in case (I'm not sure it contains anything interesting in this case
though).

  dsdt.dsl - DSDT dump
  dmi.txt  - output from dmidecode
  pnp-info.txt - output of the grep

  dmesg-ok.txt - dmesg with 'legacy_serial.force=1' (CONFIG_PNP_DEBUG=y)
  dmesg-bad.txt - dmesg without 'legacy_serial.force=1' (CONFIG_PNP_DEBUG=y)

 Then we can tell which port
 the blind probe finds and whether it's described somewhere by ACPI.
 
 Thanks,
   Bjorn

  Thanks,

  Sébastien.

 
The box is a dual HT Xeon running a vanilla 2.6.22 x86_64 kernel
  
here is my .config:
  
  CONFIG_PNP=y
  CONFIG_SERIAL_8250=y
  CONFIG_SERIAL_8250_CONSOLE=y
  CONFIG_SERIAL_8250_PCI=y
  CONFIG_SERIAL_8250_PNP=y
  CONFIG_SERIAL_8250_NR_UARTS=4
  CONFIG_SERIAL_8250_RUNTIME_UARTS=4
  CONFIG_SERIAL_8250_EXTENDED=y
  CONFIG_SERIAL_8250_SHARE_IRQ=y
  CONFIG_SERIAL_8250_DETECT_IRQ=y
  
lspci output attached.
   
Any ideas to help me debug this?
  
If you need more info (like DSDT dump), just ask.
  
Thanks,
  
Sébastien.
  
  
  
 
 


logs.tar.bz2
Description: application/bzip


Re: commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console

2007-07-25 Thread Sébastien Dugué
On Wed, 25 Jul 2007 15:32:53 +0200 Sébastien Dugué [EMAIL PROTECTED] wrote:

 
   Hi Bjorn,
 
 On Wed, 25 Jul 2007 07:16:44 -0600 Bjorn Helgaas [EMAIL PROTECTED] wrote:
 
  On Wednesday 25 July 2007 01:45:54 am Sébastien Dugué wrote:
 looks like the commit was dropped, nevertheless here is some more info
   to try and understand what may be going on so it may benefit the 
   posterity ;-)
  
  Thank you very much.
  
  Your machine does indeed describe both UARTs in ACPI, and the PNP probe
  finds them correctly.  The only wrinkle is that the PNP probe names the
  ports in the order they appear in the ACPI namespace, and your firmware
  has them backwards:
  
  Device (COMB)
  {
  Name (_HID, EisaId (PNP0501))
  Name (_DDN, COM2)
  Name (_UID, 0x02)
  
  Device (COMA)
  {
  Name (_HID, EisaId (PNP0501))
  Name (_DDN, COM1)
  Name (_UID, 0x01)
 
 
   Yes, that's what I thought was weird in the DSDT, but thought the PnP layer
 would map those devices according to UID or at least DDN. It seems that's
 not the case.
 
  
  There's nothing illegal about this, but with the current PNPACPI,
  it causes the names to be reversed.  The blind probe tries 0x3f8
  first, and names that ttyS0.  But the PNP probe, using the namespace
  order, finds the COM2 port at 0x2f8 first:
  
00:0b: ttyS0 at I/O 0x2f8 (irq = 3) is a 16550A
00:0c: ttyS1 at I/O 0x3f8 (irq = 4) is a 16550A
 
   Oh I see, didn't notice this though.
 
  
  So I think your serial console would work without legacy_serial.force
  if you used console=ttyS1 instead of console=ttyS0.  But you
  shouldn't have to do that, of course.
 
   Will try ttyS1 just to confirm...
 
   Waiting for reboot...
 
   OK boots fine with the serial console fully operational again.

  But I also need to change my inittab. At least it confirms the
root cause of the problem.

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console

2007-07-25 Thread Sébastien Dugué

  Hi Bjorn,

On Wed, 25 Jul 2007 07:16:44 -0600 Bjorn Helgaas [EMAIL PROTECTED] wrote:

 On Wednesday 25 July 2007 01:45:54 am Sébastien Dugué wrote:
looks like the commit was dropped, nevertheless here is some more info
  to try and understand what may be going on so it may benefit the posterity 
  ;-)
 
 Thank you very much.
 
 Your machine does indeed describe both UARTs in ACPI, and the PNP probe
 finds them correctly.  The only wrinkle is that the PNP probe names the
 ports in the order they appear in the ACPI namespace, and your firmware
 has them backwards:
 
 Device (COMB)
 {
 Name (_HID, EisaId (PNP0501))
 Name (_DDN, COM2)
 Name (_UID, 0x02)
 
 Device (COMA)
 {
 Name (_HID, EisaId (PNP0501))
 Name (_DDN, COM1)
 Name (_UID, 0x01)


  Yes, that's what I thought was weird in the DSDT, but thought the PnP layer
would map those devices according to UID or at least DDN. It seems that's
not the case.

 
 There's nothing illegal about this, but with the current PNPACPI,
 it causes the names to be reversed.  The blind probe tries 0x3f8
 first, and names that ttyS0.  But the PNP probe, using the namespace
 order, finds the COM2 port at 0x2f8 first:
 
   00:0b: ttyS0 at I/O 0x2f8 (irq = 3) is a 16550A
   00:0c: ttyS1 at I/O 0x3f8 (irq = 4) is a 16550A

  Oh I see, didn't notice this though.

 
 So I think your serial console would work without legacy_serial.force
 if you used console=ttyS1 instead of console=ttyS0.  But you
 shouldn't have to do that, of course.

  Will try ttyS1 just to confirm...

  Waiting for reboot...

  OK boots fine with the serial console fully operational again.

 
 The _DDN is a DOS device name, and the _UID is a logical device ID
 that does not change across reboots.  Both are optional, and PNPACPI
 ignores them.  But maybe we could change PNPACPI to sort by them if
 they are present.  I'll think about this a bit.

  That would be nice, but I wish you good luck with all those
crappy BIOSes out there.

  Thanks,

  Sébastien.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console

2007-07-24 Thread Sébastien Dugué

  Hi Bjorn,

 your commit 7e92b4fc345f5b6f57585fbe5ffdb0f24d7c9b26 broke the serial console
on my box. Adding 'legacy_serial.force=1' to my boot param as a workaround
solves the issue, but this may be hiding bugs in Linux PnP support or
in my firmware.

  The box is a dual HT Xeon running a vanilla 2.6.22 x86_64 kernel

  here is my .config:

CONFIG_PNP=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y

  lspci output attached.
 
  Any ideas to help me debug this?

  If you need more info (like DSDT dump), just ask.

  Thanks,

  Sébastien.


00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 0c)
00:00.1 Class ff00: Intel Corporation E7525/E7520 Error Reporting Registers 
(rev 0c)
00:01.0 System peripheral: Intel Corporation E7520 DMA Controller (rev 0c)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 
0c)
00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 0c)
00:05.0 PCI bridge: Intel Corporation E7520 PCI Express Port B1 (rev 0c)
00:06.0 PCI bridge: Intel Corporation E7520 PCI Express Port C (rev 0c)
00:07.0 PCI bridge: Intel Corporation E7520 PCI Express Port C1 (rev 0c)
00:1c.0 PCI bridge: Intel Corporation 6300ESB 64-bit PCI-X Bridge (rev 02)
00:1d.0 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller 
(rev 02)
00:1d.1 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller 
(rev 02)
00:1d.4 System peripheral: Intel Corporation 6300ESB Watchdog Timer (rev 02)
00:1d.5 PIC: Intel Corporation 6300ESB I/O Advanced Programmable Interrupt 
Controller (rev 02)
00:1d.7 USB Controller: Intel Corporation 6300ESB USB2 Enhanced Host Controller 
(rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 0a)
00:1f.0 ISA bridge: Intel Corporation 6300ESB LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 6300ESB PATA Storage Controller (rev 
02)
00:1f.3 SMBus: Intel Corporation 6300ESB SMBus Controller (rev 02)
01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 
09)
01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 
09)
02:01.0 Ethernet controller: Intel Corporation 82544EI Gigabit Ethernet 
Controller (Copper) (rev 02)
02:03.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
02:03.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
03:01.0 Ethernet controller: Intel Corporation 82544EI Gigabit Ethernet 
Controller (Copper) (rev 02)
08:01.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 
7000/VE]
08:02.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet 
Controller
08:03.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet 
Controller


[PATCH RT] Fix RT balancing tasks pulling

2007-07-24 Thread Sébastien Dugué


  Hi Ingo, Thomas,

  this one-liner fixes a bug in balance_rt_tasks() which sometimes manifests by
having a lower prio task being scheduled while a higher prio task is sitting
waiting on another runqueue.

  This is pretty hard to reproduce on low cpu count machines, for example, I
had to have sched_football run in a loop for ~38h before it failed on a dual HT
Xeon box.

  Sébastien.

---


  In the CPU loop in balance_rt_tasks(), the 'next' task pointer is only ever
updated if this_rq->lock has been dropped in double_lock_balance().

  This sometimes lead to 'this_cpu' pulling tasks that are only garanteed to
have a higher priority than the 'next' picked before the loop.

  Fix this to update 'next' to the last picked RT task.


Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>

---
 kernel/sched.c |8 
 1 file changed, 8 insertions(+)

Index: linux-2.6.22.1-rt6/kernel/sched.c
===
--- linux-2.6.22.1-rt6.orig/kernel/sched.c  2007-07-24 11:34:18.0 
+0200
+++ linux-2.6.22.1-rt6/kernel/sched.c   2007-07-24 11:34:35.0 +0200
@@ -1499,6 +1499,14 @@ static void balance_rt_tasks(struct rq *
 * in another runqueue. (low likelyhood
 * but possible)
 */
+
+   /*
+* Update next so that we won't pick a task
+* on another cpu with a priority lower (or equal)
+* than the one we just picked.
+*/
+   next = p;
+
}
spin_unlock(_rq->lock);
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH RT] Fix RT balancing tasks pulling

2007-07-24 Thread Sébastien Dugué


  Hi Ingo, Thomas,

  this one-liner fixes a bug in balance_rt_tasks() which sometimes manifests by
having a lower prio task being scheduled while a higher prio task is sitting
waiting on another runqueue.

  This is pretty hard to reproduce on low cpu count machines, for example, I
had to have sched_football run in a loop for ~38h before it failed on a dual HT
Xeon box.

  Sébastien.

---


  In the CPU loop in balance_rt_tasks(), the 'next' task pointer is only ever
updated if this_rq-lock has been dropped in double_lock_balance().

  This sometimes lead to 'this_cpu' pulling tasks that are only garanteed to
have a higher priority than the 'next' picked before the loop.

  Fix this to update 'next' to the last picked RT task.


Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]

---
 kernel/sched.c |8 
 1 file changed, 8 insertions(+)

Index: linux-2.6.22.1-rt6/kernel/sched.c
===
--- linux-2.6.22.1-rt6.orig/kernel/sched.c  2007-07-24 11:34:18.0 
+0200
+++ linux-2.6.22.1-rt6/kernel/sched.c   2007-07-24 11:34:35.0 +0200
@@ -1499,6 +1499,14 @@ static void balance_rt_tasks(struct rq *
 * in another runqueue. (low likelyhood
 * but possible)
 */
+
+   /*
+* Update next so that we won't pick a task
+* on another cpu with a priority lower (or equal)
+* than the one we just picked.
+*/
+   next = p;
+
}
spin_unlock(src_rq-lock);
}
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console

2007-07-24 Thread Sébastien Dugué

  Hi Bjorn,

 your commit 7e92b4fc345f5b6f57585fbe5ffdb0f24d7c9b26 broke the serial console
on my box. Adding 'legacy_serial.force=1' to my boot param as a workaround
solves the issue, but this may be hiding bugs in Linux PnP support or
in my firmware.

  The box is a dual HT Xeon running a vanilla 2.6.22 x86_64 kernel

  here is my .config:

CONFIG_PNP=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y

  lspci output attached.
 
  Any ideas to help me debug this?

  If you need more info (like DSDT dump), just ask.

  Thanks,

  Sébastien.


00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 0c)
00:00.1 Class ff00: Intel Corporation E7525/E7520 Error Reporting Registers 
(rev 0c)
00:01.0 System peripheral: Intel Corporation E7520 DMA Controller (rev 0c)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 
0c)
00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 0c)
00:05.0 PCI bridge: Intel Corporation E7520 PCI Express Port B1 (rev 0c)
00:06.0 PCI bridge: Intel Corporation E7520 PCI Express Port C (rev 0c)
00:07.0 PCI bridge: Intel Corporation E7520 PCI Express Port C1 (rev 0c)
00:1c.0 PCI bridge: Intel Corporation 6300ESB 64-bit PCI-X Bridge (rev 02)
00:1d.0 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller 
(rev 02)
00:1d.1 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller 
(rev 02)
00:1d.4 System peripheral: Intel Corporation 6300ESB Watchdog Timer (rev 02)
00:1d.5 PIC: Intel Corporation 6300ESB I/O Advanced Programmable Interrupt 
Controller (rev 02)
00:1d.7 USB Controller: Intel Corporation 6300ESB USB2 Enhanced Host Controller 
(rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 0a)
00:1f.0 ISA bridge: Intel Corporation 6300ESB LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 6300ESB PATA Storage Controller (rev 
02)
00:1f.3 SMBus: Intel Corporation 6300ESB SMBus Controller (rev 02)
01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 
09)
01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 
09)
02:01.0 Ethernet controller: Intel Corporation 82544EI Gigabit Ethernet 
Controller (Copper) (rev 02)
02:03.0 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
02:03.1 SCSI storage controller: Adaptec AIC-7902B U320 (rev 10)
03:01.0 Ethernet controller: Intel Corporation 82544EI Gigabit Ethernet 
Controller (Copper) (rev 02)
08:01.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 
7000/VE]
08:02.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet 
Controller
08:03.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet 
Controller


Re: [Patch RT] Fix CFS load balancing for RT tasks

2007-07-13 Thread Sébastien Dugué

 Hi Josh,

On Thu, 12 Jul 2007 10:13:57 -0700 Josh Triplett <[EMAIL PROTECTED]> wrote:

> On Thu, 2007-07-12 at 09:41 -0700, Josh Triplett wrote:
> > On Wed, 2007-07-11 at 16:47 +0200, Sébastien Dugué wrote:
> > >   there seems to be something wrong with the way the CFS balances (or 
> > > does not
> > > balance) RT tasks. This was evidenced using the sched_football testcase
> > > available from the RT wiki 
> > > (http://rt.wiki.kernel.org/index.php/IBM_Test_Cases)
> > > which I modified and attached to this mail.
> > > 
> > >   The testcase starts a number of threads which fall into 3 categories:
> > > 
> > >   1 referee thread: SCHED_FIFO, RT prio 5
> > >   ncpus defensive threads: SCHED_FIFO, RT prio 4
> > >   ncpus offensive threads: SCHED_FIFO, RT prio 3
> > > 
> > >   (ncpus being the number of CPUs)
> > > 
> > >   To make a long story short, the defensive threads should end up 
> > > distributed
> > > among all CPUs, but that's not the case. For example, on a dual HT Xeon 
> > > box,
> > > after task migration stabilizes we have the following running on the 
> > > different
> > > CPUs:
> > > 
> > >   CPU 0: defense2
> > >   CPU 1: referee offense2 offense3 offense4 defense3
> > >   CPU 2: offense1
> > >   CPU 3: defense1 defense4
> > > 
> > > which clearly show the imbalance between CPU 2 and CPU 3 where offense1
> > > should not be allowed to run while the higher prio defense1 and defense4
> > > are sharing the same CPU.
> > > 
> > >   The following patch fixes this by re-enabling the RT overload detection
> > > for the CFS. It may not be the right solution, maybe it should be 
> > > incorporated
> > > into the other load balancing mechanisms. I did not digg deep enough yet
> > > to make that call ;-)
> > 
> > 2.6.21.5-rt20 plus this patch passed 1000 runs of the standard
> > sched_football on an 8 processor (quad dual-core) x86-64 box.  Nice
> > work.
> 
> Hmmm, seems I spoke a bit too soon; due to a bug in the test log
> checker, the test failed but the log checker said PASS.  Actual results:
> $ grep 'Final ball position' rt-tests.log | sort | uniq -c
> 960 Final ball position: 0
>  39 Final ball position: 1
>   1 Final ball position: 2
> 
> So it failed 4% of the runs.  However, it failed much less
> spectacularly; rather than overrunning the integer maximum, it only
> reached 1-2.  Still a huge improvement despite not solving the problem
> completely.


  Yeah, I had a 5000 iterations test run last night which failed with
one of the runs having a final ball position of 1. So it's not yet
perfect. I think we're hitting the same bug as with the O(1) scheduler
in 2.6.21-rt8.

  It seems that the more CPUs there are, the easier it is to reproduce.
Hopefully I should have access to a quad dual core box soon now to
verify this.

  One more thing, the bug seems even harder to reproduce if I have
a logdev LD_MARK (direct marker, no kprobe) in context_switch().

  Ingo, Thomas, any ideas, opinions? It looks like there might be a race in
the RT balancing code that prevents pulling the highest prio task and
instead selects a lower one to run.

  In the mean time I'll continue poring over the 2.6.21-rt8 scheduler
to try to find something.


  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch RT] Fix CFS load balancing for RT tasks

2007-07-13 Thread Sébastien Dugué

  Hi Josh,

On Thu, 12 Jul 2007 09:41:55 -0700 Josh Triplett <[EMAIL PROTECTED]> wrote:

> On Wed, 2007-07-11 at 16:47 +0200, Sébastien Dugué wrote:
> >   there seems to be something wrong with the way the CFS balances (or does 
> > not
> > balance) RT tasks. This was evidenced using the sched_football testcase
> > available from the RT wiki 
> > (http://rt.wiki.kernel.org/index.php/IBM_Test_Cases)
> > which I modified and attached to this mail.
> > 
> >   The testcase starts a number of threads which fall into 3 categories:
> > 
> > 1 referee thread: SCHED_FIFO, RT prio 5
> > ncpus defensive threads: SCHED_FIFO, RT prio 4
> > ncpus offensive threads: SCHED_FIFO, RT prio 3
> > 
> > (ncpus being the number of CPUs)
> > 
> >   To make a long story short, the defensive threads should end up 
> > distributed
> > among all CPUs, but that's not the case. For example, on a dual HT Xeon box,
> > after task migration stabilizes we have the following running on the 
> > different
> > CPUs:
> > 
> >   CPU 0: defense2
> >   CPU 1: referee offense2 offense3 offense4 defense3
> >   CPU 2: offense1
> >   CPU 3: defense1 defense4
> > 
> > which clearly show the imbalance between CPU 2 and CPU 3 where offense1
> > should not be allowed to run while the higher prio defense1 and defense4
> > are sharing the same CPU.
> > 
> >   The following patch fixes this by re-enabling the RT overload detection
> > for the CFS. It may not be the right solution, maybe it should be 
> > incorporated
> > into the other load balancing mechanisms. I did not digg deep enough yet
> > to make that call ;-)
> 
> 2.6.21.5-rt20 plus this patch passed 1000 runs of the standard
> sched_football on an 8 processor (quad dual-core) x86-64 box.  Nice
> work.

  Thanks, but a nightly run revealed that there is still a bug. At least
now CFS is on par with O(1).


> 
> >   The RT overload mechanism of the O(1) scheduler has not been activated
> > in the new CFS.
> > 
> >   This patch fixes that by inserting calls to inc_rt_tasks() and 
> > dec_rt_tasks()
> > in enqueue_task_rt() and dequeue_task_rt() respectively, which enables the
> > balance_rt_tasks() to be run in the rt_overload case.
> > 
> > 
> > Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>
> 
> Acked-by: Josh Triplett <[EMAIL PROTECTED]>
> 
> - Josh Triplett

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch RT] Fix CFS load balancing for RT tasks

2007-07-13 Thread Sébastien Dugué

  Hi Josh,

On Thu, 12 Jul 2007 09:41:55 -0700 Josh Triplett [EMAIL PROTECTED] wrote:

 On Wed, 2007-07-11 at 16:47 +0200, Sébastien Dugué wrote:
there seems to be something wrong with the way the CFS balances (or does 
  not
  balance) RT tasks. This was evidenced using the sched_football testcase
  available from the RT wiki 
  (http://rt.wiki.kernel.org/index.php/IBM_Test_Cases)
  which I modified and attached to this mail.
  
The testcase starts a number of threads which fall into 3 categories:
  
  1 referee thread: SCHED_FIFO, RT prio 5
  ncpus defensive threads: SCHED_FIFO, RT prio 4
  ncpus offensive threads: SCHED_FIFO, RT prio 3
  
  (ncpus being the number of CPUs)
  
To make a long story short, the defensive threads should end up 
  distributed
  among all CPUs, but that's not the case. For example, on a dual HT Xeon box,
  after task migration stabilizes we have the following running on the 
  different
  CPUs:
  
CPU 0: defense2
CPU 1: referee offense2 offense3 offense4 defense3
CPU 2: offense1
CPU 3: defense1 defense4
  
  which clearly show the imbalance between CPU 2 and CPU 3 where offense1
  should not be allowed to run while the higher prio defense1 and defense4
  are sharing the same CPU.
  
The following patch fixes this by re-enabling the RT overload detection
  for the CFS. It may not be the right solution, maybe it should be 
  incorporated
  into the other load balancing mechanisms. I did not digg deep enough yet
  to make that call ;-)
 
 2.6.21.5-rt20 plus this patch passed 1000 runs of the standard
 sched_football on an 8 processor (quad dual-core) x86-64 box.  Nice
 work.

  Thanks, but a nightly run revealed that there is still a bug. At least
now CFS is on par with O(1).


 
The RT overload mechanism of the O(1) scheduler has not been activated
  in the new CFS.
  
This patch fixes that by inserting calls to inc_rt_tasks() and 
  dec_rt_tasks()
  in enqueue_task_rt() and dequeue_task_rt() respectively, which enables the
  balance_rt_tasks() to be run in the rt_overload case.
  
  
  Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]
 
 Acked-by: Josh Triplett [EMAIL PROTECTED]
 
 - Josh Triplett

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch RT] Fix CFS load balancing for RT tasks

2007-07-13 Thread Sébastien Dugué

 Hi Josh,

On Thu, 12 Jul 2007 10:13:57 -0700 Josh Triplett [EMAIL PROTECTED] wrote:

 On Thu, 2007-07-12 at 09:41 -0700, Josh Triplett wrote:
  On Wed, 2007-07-11 at 16:47 +0200, Sébastien Dugué wrote:
 there seems to be something wrong with the way the CFS balances (or 
   does not
   balance) RT tasks. This was evidenced using the sched_football testcase
   available from the RT wiki 
   (http://rt.wiki.kernel.org/index.php/IBM_Test_Cases)
   which I modified and attached to this mail.
   
 The testcase starts a number of threads which fall into 3 categories:
   
 1 referee thread: SCHED_FIFO, RT prio 5
 ncpus defensive threads: SCHED_FIFO, RT prio 4
 ncpus offensive threads: SCHED_FIFO, RT prio 3
   
 (ncpus being the number of CPUs)
   
 To make a long story short, the defensive threads should end up 
   distributed
   among all CPUs, but that's not the case. For example, on a dual HT Xeon 
   box,
   after task migration stabilizes we have the following running on the 
   different
   CPUs:
   
 CPU 0: defense2
 CPU 1: referee offense2 offense3 offense4 defense3
 CPU 2: offense1
 CPU 3: defense1 defense4
   
   which clearly show the imbalance between CPU 2 and CPU 3 where offense1
   should not be allowed to run while the higher prio defense1 and defense4
   are sharing the same CPU.
   
 The following patch fixes this by re-enabling the RT overload detection
   for the CFS. It may not be the right solution, maybe it should be 
   incorporated
   into the other load balancing mechanisms. I did not digg deep enough yet
   to make that call ;-)
  
  2.6.21.5-rt20 plus this patch passed 1000 runs of the standard
  sched_football on an 8 processor (quad dual-core) x86-64 box.  Nice
  work.
 
 Hmmm, seems I spoke a bit too soon; due to a bug in the test log
 checker, the test failed but the log checker said PASS.  Actual results:
 $ grep 'Final ball position' rt-tests.log | sort | uniq -c
 960 Final ball position: 0
  39 Final ball position: 1
   1 Final ball position: 2
 
 So it failed 4% of the runs.  However, it failed much less
 spectacularly; rather than overrunning the integer maximum, it only
 reached 1-2.  Still a huge improvement despite not solving the problem
 completely.


  Yeah, I had a 5000 iterations test run last night which failed with
one of the runs having a final ball position of 1. So it's not yet
perfect. I think we're hitting the same bug as with the O(1) scheduler
in 2.6.21-rt8.

  It seems that the more CPUs there are, the easier it is to reproduce.
Hopefully I should have access to a quad dual core box soon now to
verify this.

  One more thing, the bug seems even harder to reproduce if I have
a logdev LD_MARK (direct marker, no kprobe) in context_switch().

  Ingo, Thomas, any ideas, opinions? It looks like there might be a race in
the RT balancing code that prevents pulling the highest prio task and
instead selects a lower one to run.

  In the mean time I'll continue poring over the 2.6.21-rt8 scheduler
to try to find something.


  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch RT] Fix CFS load balancing for RT tasks

2007-07-12 Thread Sébastien Dugué
On Wed, 11 Jul 2007 13:41:55 -0400 (EDT) Steven Rostedt <[EMAIL PROTECTED]> 
wrote:

> > On Wed, 2007-07-11 at 16:47 +0200, Sébastien Dugué wrote:
> > >
> > >   P.S. Thanks to Steven Rostedt for logdev which is proving invaluable in
> > >cases like this.
> > >
> 
> Great to hear!!!
> 
> I'll be uploading a new version of Logdev in a week or so.
> 
> Thanks for the feedback.
> 
> -- Steve
> 

  Great! It's quite a nice and lightweight tool which I think will become
one of my top tools for kernel debugging.

  Just one suggestion, maybe, would be to add to the readme file the
order in which the patches should apply along with a short description
of what functionality they provide (or maybe document the patches and add a
series file).

  Anyway, thanks for that great tool.

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch RT] Fix CFS load balancing for RT tasks

2007-07-12 Thread Sébastien Dugué
On Wed, 11 Jul 2007 19:06:20 +0200 Thomas Gleixner <[EMAIL PROTECTED]> wrote:

> On Wed, 2007-07-11 at 16:47 +0200, Sébastien Dugué wrote:
> >   The following patch fixes this by re-enabling the RT overload detection
> > for the CFS. It may not be the right solution, maybe it should be 
> > incorporated
> > into the other load balancing mechanisms. I did not digg deep enough yet
> > to make that call ;-)
> > 
> >   P.S. Thanks to Steven Rostedt for logdev which is proving invaluable in
> >cases like this.
> > 
> >   Sébastien.
> 
> Nice catch. That was dropped during the CFS -> -rt merge.
> 
>   tglx

  Thanks, and sorry, I forgot to CC you :(

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch RT] Fix CFS load balancing for RT tasks

2007-07-12 Thread Sébastien Dugué
On Wed, 11 Jul 2007 19:06:20 +0200 Thomas Gleixner [EMAIL PROTECTED] wrote:

 On Wed, 2007-07-11 at 16:47 +0200, Sébastien Dugué wrote:
The following patch fixes this by re-enabling the RT overload detection
  for the CFS. It may not be the right solution, maybe it should be 
  incorporated
  into the other load balancing mechanisms. I did not digg deep enough yet
  to make that call ;-)
  
P.S. Thanks to Steven Rostedt for logdev which is proving invaluable in
 cases like this.
  
Sébastien.
 
 Nice catch. That was dropped during the CFS - -rt merge.
 
   tglx

  Thanks, and sorry, I forgot to CC you :(

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Patch RT] Fix CFS load balancing for RT tasks

2007-07-12 Thread Sébastien Dugué
On Wed, 11 Jul 2007 13:41:55 -0400 (EDT) Steven Rostedt [EMAIL PROTECTED] 
wrote:

  On Wed, 2007-07-11 at 16:47 +0200, Sébastien Dugué wrote:
  
 P.S. Thanks to Steven Rostedt for logdev which is proving invaluable in
  cases like this.
  
 
 Great to hear!!!
 
 I'll be uploading a new version of Logdev in a week or so.
 
 Thanks for the feedback.
 
 -- Steve
 

  Great! It's quite a nice and lightweight tool which I think will become
one of my top tools for kernel debugging.

  Just one suggestion, maybe, would be to add to the readme file the
order in which the patches should apply along with a short description
of what functionality they provide (or maybe document the patches and add a
series file).

  Anyway, thanks for that great tool.

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[Patch RT] Fix CFS load balancing for RT tasks

2007-07-11 Thread Sébastien Dugué

  Hi Ingo, all

  there seems to be something wrong with the way the CFS balances (or does not
balance) RT tasks. This was evidenced using the sched_football testcase
available from the RT wiki (http://rt.wiki.kernel.org/index.php/IBM_Test_Cases)
which I modified and attached to this mail.

  The testcase starts a number of threads which fall into 3 categories:

1 referee thread: SCHED_FIFO, RT prio 5
ncpus defensive threads: SCHED_FIFO, RT prio 4
ncpus offensive threads: SCHED_FIFO, RT prio 3

(ncpus being the number of CPUs)

  To make a long story short, the defensive threads should end up distributed
among all CPUs, but that's not the case. For example, on a dual HT Xeon box,
after task migration stabilizes we have the following running on the different
CPUs:

  CPU 0: defense2
  CPU 1: referee offense2 offense3 offense4 defense3
  CPU 2: offense1
  CPU 3: defense1 defense4

which clearly show the imbalance between CPU 2 and CPU 3 where offense1
should not be allowed to run while the higher prio defense1 and defense4
are sharing the same CPU.

  The following patch fixes this by re-enabling the RT overload detection
for the CFS. It may not be the right solution, maybe it should be incorporated
into the other load balancing mechanisms. I did not digg deep enough yet
to make that call ;-)

  P.S. Thanks to Steven Rostedt for logdev which is proving invaluable in
   cases like this.

  Sébastien.


--

  The RT overload mechanism of the O(1) scheduler has not been activated
in the new CFS.

  This patch fixes that by inserting calls to inc_rt_tasks() and dec_rt_tasks()
in enqueue_task_rt() and dequeue_task_rt() respectively, which enables the
balance_rt_tasks() to be run in the rt_overload case.


Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>

---
 kernel/sched_rt.c |4 
 1 file changed, 4 insertions(+)

Index: linux-2.6.21.5-rt20/kernel/sched_rt.c
===
--- linux-2.6.21.5-rt20.orig/kernel/sched_rt.c  2007-07-11 10:46:26.0 
+0200
+++ linux-2.6.21.5-rt20/kernel/sched_rt.c   2007-07-11 10:46:50.0 
+0200
@@ -32,6 +32,8 @@ enqueue_task_rt(struct rq *rq, struct ta
 
list_add_tail(>run_list, array->queue + p->prio);
__set_bit(p->prio, array->bitmap);
+
+   inc_rt_tasks(p, rq);
 }
 
 /*
@@ -44,6 +46,8 @@ dequeue_task_rt(struct rq *rq, struct ta
 
update_curr_rt(rq, now);
 
+   dec_rt_tasks(p, rq);
+
list_del(>run_list);
if (list_empty(array->queue + p->prio))
__clear_bit(p->prio, array->bitmap);
/* Threaded Football - by John Stultz <[EMAIL PROTECTED]>
 *
 * This is a scheduler test that uses a football analogy.
 * The premise is that we want to make sure that lower priority threads
 * (the offensive team) do not preempt higher priority threads (the
 * defensive team). The offense is trying to increment the balls position,
 * while the defense is trying to block that from happening.
 * And the ref (highest priority thread) will blow the wistle if the
 * ball moves. Finally, we have crazy fans (higer prority) that try to
 * distract the defense by occasionally running onto the field.
 *
 * Steps:
 * - Create a fixed number of offense threads (lower priority)
 * - Create a fixed number of defense threads (higher priority)
 * - Create a referee thread (highest priority)
 * - Once everyone is on the field, the offense thread increments the value of
 *   'the_ball' and yields. The defense thread tries to block the ball by never
 *   letting the offense players get the CPU (it just does a sched_yield)
 * - The refree threads wakes up regularly to check if the game is over :)
 * - In the end, if the value of 'the_ball' is >0, the test is considered to
 *   have failed.
 *
 * PS: I really don't like football that much, it just seemed to fit.
 *
 * 2006-03-16 Reduced verbosity, non binary failure reporting, removal of
 * crazy_fans thread, added game_length argument by Darren Hart.
 */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define gettid() syscall(__NR_gettid)

#define MAXTHREADS 256

#define DEF_GAME_LENGTH 5

/* Here's the position of the ball */
volatile int the_ball;

/* Game status */
volatile int game_started;
volatile int game_over;

/* Keep track of who's on the field */
volatile int offense_count;
volatile int defense_count;

/* simple mutex for our atomic increments */
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

#define min(x,y) (x < y ? x: y)


/* This is the defensive team. They're trying to block the offense */
void *thread_defense(void* arg)
{
	pthread_mutex_lock();
	defense_count++;
	pthread_mutex_unlock();

	printf("thread_defense (%ld) starting\n", gettid());

	/*keep the ball from being moved */
	while (!game_over) {
		sched_yield(); /* let other defenders

[Patch RT] Fix CFS load balancing for RT tasks

2007-07-11 Thread Sébastien Dugué

  Hi Ingo, all

  there seems to be something wrong with the way the CFS balances (or does not
balance) RT tasks. This was evidenced using the sched_football testcase
available from the RT wiki (http://rt.wiki.kernel.org/index.php/IBM_Test_Cases)
which I modified and attached to this mail.

  The testcase starts a number of threads which fall into 3 categories:

1 referee thread: SCHED_FIFO, RT prio 5
ncpus defensive threads: SCHED_FIFO, RT prio 4
ncpus offensive threads: SCHED_FIFO, RT prio 3

(ncpus being the number of CPUs)

  To make a long story short, the defensive threads should end up distributed
among all CPUs, but that's not the case. For example, on a dual HT Xeon box,
after task migration stabilizes we have the following running on the different
CPUs:

  CPU 0: defense2
  CPU 1: referee offense2 offense3 offense4 defense3
  CPU 2: offense1
  CPU 3: defense1 defense4

which clearly show the imbalance between CPU 2 and CPU 3 where offense1
should not be allowed to run while the higher prio defense1 and defense4
are sharing the same CPU.

  The following patch fixes this by re-enabling the RT overload detection
for the CFS. It may not be the right solution, maybe it should be incorporated
into the other load balancing mechanisms. I did not digg deep enough yet
to make that call ;-)

  P.S. Thanks to Steven Rostedt for logdev which is proving invaluable in
   cases like this.

  Sébastien.


--

  The RT overload mechanism of the O(1) scheduler has not been activated
in the new CFS.

  This patch fixes that by inserting calls to inc_rt_tasks() and dec_rt_tasks()
in enqueue_task_rt() and dequeue_task_rt() respectively, which enables the
balance_rt_tasks() to be run in the rt_overload case.


Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]

---
 kernel/sched_rt.c |4 
 1 file changed, 4 insertions(+)

Index: linux-2.6.21.5-rt20/kernel/sched_rt.c
===
--- linux-2.6.21.5-rt20.orig/kernel/sched_rt.c  2007-07-11 10:46:26.0 
+0200
+++ linux-2.6.21.5-rt20/kernel/sched_rt.c   2007-07-11 10:46:50.0 
+0200
@@ -32,6 +32,8 @@ enqueue_task_rt(struct rq *rq, struct ta
 
list_add_tail(p-run_list, array-queue + p-prio);
__set_bit(p-prio, array-bitmap);
+
+   inc_rt_tasks(p, rq);
 }
 
 /*
@@ -44,6 +46,8 @@ dequeue_task_rt(struct rq *rq, struct ta
 
update_curr_rt(rq, now);
 
+   dec_rt_tasks(p, rq);
+
list_del(p-run_list);
if (list_empty(array-queue + p-prio))
__clear_bit(p-prio, array-bitmap);
/* Threaded Football - by John Stultz [EMAIL PROTECTED]
 *
 * This is a scheduler test that uses a football analogy.
 * The premise is that we want to make sure that lower priority threads
 * (the offensive team) do not preempt higher priority threads (the
 * defensive team). The offense is trying to increment the balls position,
 * while the defense is trying to block that from happening.
 * And the ref (highest priority thread) will blow the wistle if the
 * ball moves. Finally, we have crazy fans (higer prority) that try to
 * distract the defense by occasionally running onto the field.
 *
 * Steps:
 * - Create a fixed number of offense threads (lower priority)
 * - Create a fixed number of defense threads (higher priority)
 * - Create a referee thread (highest priority)
 * - Once everyone is on the field, the offense thread increments the value of
 *   'the_ball' and yields. The defense thread tries to block the ball by never
 *   letting the offense players get the CPU (it just does a sched_yield)
 * - The refree threads wakes up regularly to check if the game is over :)
 * - In the end, if the value of 'the_ball' is 0, the test is considered to
 *   have failed.
 *
 * PS: I really don't like football that much, it just seemed to fit.
 *
 * 2006-03-16 Reduced verbosity, non binary failure reporting, removal of
 * crazy_fans thread, added game_length argument by Darren Hart.
 */

#include stdio.h
#include stdlib.h
#include signal.h
#include time.h
#include string.h
#include pthread.h
#include sched.h
#include errno.h
#include sys/syscall.h
#include unistd.h
#include sys/time.h

#define gettid() syscall(__NR_gettid)

#define MAXTHREADS 256

#define DEF_GAME_LENGTH 5

/* Here's the position of the ball */
volatile int the_ball;

/* Game status */
volatile int game_started;
volatile int game_over;

/* Keep track of who's on the field */
volatile int offense_count;
volatile int defense_count;

/* simple mutex for our atomic increments */
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

#define min(x,y) (x  y ? x: y)


/* This is the defensive team. They're trying to block the offense */
void *thread_defense(void* arg)
{
	pthread_mutex_lock(mutex);
	defense_count++;
	pthread_mutex_unlock(mutex);

	printf(thread_defense (%ld) starting\n, gettid());

	/*keep the ball from being moved */
	while (!game_over

Re: [x86 setup 22/33] CPU features verification for the new x86 setup code

2007-07-10 Thread Sébastien Dugué

  Hi Peter,

  Only two sub-minor nits:

On Mon, 9 Jul 2007 19:52:01 -0700 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:

> From: H. Peter Anvin <[EMAIL PROTECTED]>
> 
> Verify that the CPU has enough features to run the kernel.  This may
> entail enabling features on some CPUs.
> 
> By doing this in the setup code we can be guaranteed to still be able to
> write to the console through the BIOS.
> 
> Signed-off-by: H. Peter Anvin <[EMAIL PROTECTED]>
> ---
>  arch/i386/boot/cpu.c  |   69 
>  arch/i386/boot/cpucheck.c |  267 
> +
>  2 files changed, 336 insertions(+), 0 deletions(-)
>  create mode 100644 arch/i386/boot/cpu.c
>  create mode 100644 arch/i386/boot/cpucheck.c
> 
> diff --git a/arch/i386/boot/cpu.c b/arch/i386/boot/cpu.c
> new file mode 100644
> index 000..042d894
> --- /dev/null
> +++ b/arch/i386/boot/cpu.c
> @@ -0,0 +1,69 @@
> +/* -*- linux-c -*- --- *
> + *
> + *   Copyright (C) 1991, 1992 Linus Torvalds
> + *   Copyright 2007 rPath, Inc. - All Rights Reserved
> + *
> + *   This file is part of the Linux kernel, and is made available under
> + *   the terms of the GNU General Public License version 2.
> + *
> + * --- */
> +
> +/*
> + * arch/i386/boot/cpucheck.c

^
cpu.c ?

> + *
> + * Check for obligatory CPU features and abort if the features are not
> + * present.
> + */
> +
> +#include "boot.h"
> +#include "bitops.h"
> +#include 
> +
> +static char *cpu_name(int level)
> +{
> + static char buf[6];
> +
> + if (level == 64) {
> + return "x86-64";
> + } else {
> + sprintf(buf, "i%d86", level);
> + return buf;
> + }
> +}
> +
> +int validate_cpu(void)
> +{
> + u32 *err_flags;
> + int cpu_level, req_level;
> +
> + check_cpu(_level, _level, _flags);
> +
> + if (cpu_level < req_level) {
> + printf("This kernel requires an %s CPU, ",
> +cpu_name(req_level));
> + printf("but only detected an %s CPU.\n",
> +cpu_name(cpu_level));
> + return -1;
> + }
> +
> + if (err_flags) {
> + int i, j;
> + puts("This kernel requires the following features "
> +  "not present on the CPU:\n");
> +
> + for (i = 0; i < NCAPINTS; i++) {
> + u32 e = err_flags[i];
> +
> + for (j = 0; j < 32; j++) {
> + if (e & 1)
> + printf("%d:%d ", i, j);
> +
> + e >>= 1;
> + }
> + }
> + putchar('\n');
> + return -1;
> + } else {
> + return 0;
> + }
> +}
> diff --git a/arch/i386/boot/cpucheck.c b/arch/i386/boot/cpucheck.c
> new file mode 100644
> index 000..431bef8
> --- /dev/null
> +++ b/arch/i386/boot/cpucheck.c
> @@ -0,0 +1,267 @@
> +/* -*- linux-c -*- --- *
> + *
> + *   Copyright (C) 1991, 1992 Linus Torvalds
> + *   Copyright 2007 rPath, Inc. - All Rights Reserved
> + *
> + *   This file is part of the Linux kernel, and is made available under
> + *   the terms of the GNU General Public License version 2.
> + *
> + * --- */
> +
> +/*
> + * arch/i386/boot/cpu.c

^
cpucheck.c ?

> + *
> + * Check for obligatory CPU features and abort if the features are not
> + * present.  This code should be compilable as 16-, 32- or 64-bit
> + * code, so be very careful with types and inline assembly.
> + *
> + * This code should not contain any messages; that requires an
> + * additional wrapper.
> + *


  Sébastien.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [x86 setup 22/33] CPU features verification for the new x86 setup code

2007-07-10 Thread Sébastien Dugué

  Hi Peter,

  Only two sub-minor nits:

On Mon, 9 Jul 2007 19:52:01 -0700 H. Peter Anvin [EMAIL PROTECTED] wrote:

 From: H. Peter Anvin [EMAIL PROTECTED]
 
 Verify that the CPU has enough features to run the kernel.  This may
 entail enabling features on some CPUs.
 
 By doing this in the setup code we can be guaranteed to still be able to
 write to the console through the BIOS.
 
 Signed-off-by: H. Peter Anvin [EMAIL PROTECTED]
 ---
  arch/i386/boot/cpu.c  |   69 
  arch/i386/boot/cpucheck.c |  267 
 +
  2 files changed, 336 insertions(+), 0 deletions(-)
  create mode 100644 arch/i386/boot/cpu.c
  create mode 100644 arch/i386/boot/cpucheck.c
 
 diff --git a/arch/i386/boot/cpu.c b/arch/i386/boot/cpu.c
 new file mode 100644
 index 000..042d894
 --- /dev/null
 +++ b/arch/i386/boot/cpu.c
 @@ -0,0 +1,69 @@
 +/* -*- linux-c -*- --- *
 + *
 + *   Copyright (C) 1991, 1992 Linus Torvalds
 + *   Copyright 2007 rPath, Inc. - All Rights Reserved
 + *
 + *   This file is part of the Linux kernel, and is made available under
 + *   the terms of the GNU General Public License version 2.
 + *
 + * --- */
 +
 +/*
 + * arch/i386/boot/cpucheck.c

^
cpu.c ?

 + *
 + * Check for obligatory CPU features and abort if the features are not
 + * present.
 + */
 +
 +#include boot.h
 +#include bitops.h
 +#include asm/cpufeature.h
 +
 +static char *cpu_name(int level)
 +{
 + static char buf[6];
 +
 + if (level == 64) {
 + return x86-64;
 + } else {
 + sprintf(buf, i%d86, level);
 + return buf;
 + }
 +}
 +
 +int validate_cpu(void)
 +{
 + u32 *err_flags;
 + int cpu_level, req_level;
 +
 + check_cpu(cpu_level, req_level, err_flags);
 +
 + if (cpu_level  req_level) {
 + printf(This kernel requires an %s CPU, ,
 +cpu_name(req_level));
 + printf(but only detected an %s CPU.\n,
 +cpu_name(cpu_level));
 + return -1;
 + }
 +
 + if (err_flags) {
 + int i, j;
 + puts(This kernel requires the following features 
 +  not present on the CPU:\n);
 +
 + for (i = 0; i  NCAPINTS; i++) {
 + u32 e = err_flags[i];
 +
 + for (j = 0; j  32; j++) {
 + if (e  1)
 + printf(%d:%d , i, j);
 +
 + e = 1;
 + }
 + }
 + putchar('\n');
 + return -1;
 + } else {
 + return 0;
 + }
 +}
 diff --git a/arch/i386/boot/cpucheck.c b/arch/i386/boot/cpucheck.c
 new file mode 100644
 index 000..431bef8
 --- /dev/null
 +++ b/arch/i386/boot/cpucheck.c
 @@ -0,0 +1,267 @@
 +/* -*- linux-c -*- --- *
 + *
 + *   Copyright (C) 1991, 1992 Linus Torvalds
 + *   Copyright 2007 rPath, Inc. - All Rights Reserved
 + *
 + *   This file is part of the Linux kernel, and is made available under
 + *   the terms of the GNU General Public License version 2.
 + *
 + * --- */
 +
 +/*
 + * arch/i386/boot/cpu.c

^
cpucheck.c ?

 + *
 + * Check for obligatory CPU features and abort if the features are not
 + * present.  This code should be compilable as 16-, 32- or 64-bit
 + * code, so be very careful with types and inline assembly.
 + *
 + * This code should not contain any messages; that requires an
 + * additional wrapper.
 + *


  Sébastien.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: New systems: eu.kernel.org

2007-07-05 Thread Sébastien Dugué
On Wed, 04 Jul 2007 12:51:16 -0700 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:

> Sébastien Dugué wrote:
> >   
> >   One last thing, is there a git server as well for eu?
> > 
> 
> FWIW, I've set up git.eu.kernel.org; right now it's an alias to
> git.kernel.org, but that might change in the future.

  OK, thanks, will update my pointers.

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: New systems: eu.kernel.org

2007-07-05 Thread Sébastien Dugué
On Wed, 04 Jul 2007 12:51:16 -0700 H. Peter Anvin [EMAIL PROTECTED] wrote:

 Sébastien Dugué wrote:

One last thing, is there a git server as well for eu?
  
 
 FWIW, I've set up git.eu.kernel.org; right now it's an alias to
 git.kernel.org, but that might change in the future.

  OK, thanks, will update my pointers.

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: New systems: eu.kernel.org

2007-07-04 Thread Sébastien Dugué
On Tue, 03 Jul 2007 16:16:01 -0700 "H. Peter Anvin" <[EMAIL PROTECTED]> wrote:

> Hi all,
> 
> Thanks to HP, ISC and Umeå Universitet, kernel.org now has official
> servers in Europe.  Specifically, we have www1.eu.kernel.org hosted at
> ISC Amsterdam, and www2.eu.kernel.org at UMU (Umeå, Sweden.)  They are
> collectively accessible as www.eu.kernel.org.
> 
> This is a full service of kernel.org and not a mirror.  In the future,
> we hope to add some sort of smart DNS to automatically redirect traffic,
> as well as additional services in Europe.
> 

  First, thanks a lot to all those involved in this.

  Just a few things at the top of the main page:

- the http, ftp and rsync links still point at kernel.org, as
  the content is also at eu.kernel.org maybe change these.

- idem for the 'prepatch', 'snapshot' and '-mm patch' links

- idem for the 'current changesets' 'C' links

  
  One last thing, is there a git server as well for eu?


  Thanks again,

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: New systems: eu.kernel.org

2007-07-04 Thread Sébastien Dugué
On Tue, 03 Jul 2007 16:16:01 -0700 H. Peter Anvin [EMAIL PROTECTED] wrote:

 Hi all,
 
 Thanks to HP, ISC and Umeå Universitet, kernel.org now has official
 servers in Europe.  Specifically, we have www1.eu.kernel.org hosted at
 ISC Amsterdam, and www2.eu.kernel.org at UMU (Umeå, Sweden.)  They are
 collectively accessible as www.eu.kernel.org.
 
 This is a full service of kernel.org and not a mirror.  In the future,
 we hope to add some sort of smart DNS to automatically redirect traffic,
 as well as additional services in Europe.
 

  First, thanks a lot to all those involved in this.

  Just a few things at the top of the main page:

- the http, ftp and rsync links still point at kernel.org, as
  the content is also at eu.kernel.org maybe change these.

- idem for the 'prepatch', 'snapshot' and '-mm patch' links

- idem for the 'current changesets' 'C' links

  
  One last thing, is there a git server as well for eu?


  Thanks again,

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch-mm 19/23] x86_64: Convert to cleckevents

2007-06-11 Thread Sébastien Dugué

  Hi Thomas,

On Sun, 10 Jun 2007 09:44:18 - Thomas Gleixner <[EMAIL PROTECTED]> wrote:

> Convert x86_64 to the clockevents code. Share code with i386 for
> hpet and PIT.
> 
> Build and whitespace fixups from:
>  Venki Pallipadi <[EMAIL PROTECTED]>
> and
>  Chris Wright <[EMAIL PROTECTED]>


  It seems that the hpet clocksource's ->vread has been lost in the conversion.
Is this on purpose?

  The patch below fixes it for x86_64 on 2.6.21.3-rt9. If you want it rebased
on -mm let me know.

  Thanks,

  Sébastien.

---

  It seems the hpet clocksource's vread method was lost in the x86_64 conversion
to clockevent. So here it is.

Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>

 arch/i386/kernel/hpet.c |   10 ++
 1 file changed, 10 insertions(+)

Index: linux-2.6.21.3-rt9/arch/i386/kernel/hpet.c
===
--- linux-2.6.21.3-rt9.orig/arch/i386/kernel/hpet.c 2007-06-07 
15:09:41.0 +0200
+++ linux-2.6.21.3-rt9/arch/i386/kernel/hpet.c  2007-06-07 15:50:50.0 
+0200
@@ -313,6 +313,13 @@ static cycle_t notrace read_hpet(void)
return (cycle_t)hpet_readl(HPET_COUNTER);
 }
 
+#ifdef CONFIG_X86_64
+static notrace cycle_t __vsyscall_fn vread_hpet(void)
+{
+   return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + 0xf0);
+}
+#endif
+
 static struct clocksource clocksource_hpet = {
.name   = "hpet",
.rating = 250,
@@ -321,6 +328,9 @@ static struct clocksource clocksource_hp
.shift  = HPET_SHIFT,
.flags  = CLOCK_SOURCE_IS_CONTINUOUS,
.resume = hpet_restart_counter,
+#ifdef CONFIG_X86_64
+   .vread  = vread_hpet,
+#endif
 };
 
 static int hpet_clocksource_register(void)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch-mm 19/23] x86_64: Convert to cleckevents

2007-06-11 Thread Sébastien Dugué

  Hi Thomas,

On Sun, 10 Jun 2007 09:44:18 - Thomas Gleixner [EMAIL PROTECTED] wrote:

 Convert x86_64 to the clockevents code. Share code with i386 for
 hpet and PIT.
 
 Build and whitespace fixups from:
  Venki Pallipadi [EMAIL PROTECTED]
 and
  Chris Wright [EMAIL PROTECTED]


  It seems that the hpet clocksource's -vread has been lost in the conversion.
Is this on purpose?

  The patch below fixes it for x86_64 on 2.6.21.3-rt9. If you want it rebased
on -mm let me know.

  Thanks,

  Sébastien.

---

  It seems the hpet clocksource's vread method was lost in the x86_64 conversion
to clockevent. So here it is.

Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]

 arch/i386/kernel/hpet.c |   10 ++
 1 file changed, 10 insertions(+)

Index: linux-2.6.21.3-rt9/arch/i386/kernel/hpet.c
===
--- linux-2.6.21.3-rt9.orig/arch/i386/kernel/hpet.c 2007-06-07 
15:09:41.0 +0200
+++ linux-2.6.21.3-rt9/arch/i386/kernel/hpet.c  2007-06-07 15:50:50.0 
+0200
@@ -313,6 +313,13 @@ static cycle_t notrace read_hpet(void)
return (cycle_t)hpet_readl(HPET_COUNTER);
 }
 
+#ifdef CONFIG_X86_64
+static notrace cycle_t __vsyscall_fn vread_hpet(void)
+{
+   return readl((const void __iomem *)fix_to_virt(VSYSCALL_HPET) + 0xf0);
+}
+#endif
+
 static struct clocksource clocksource_hpet = {
.name   = hpet,
.rating = 250,
@@ -321,6 +328,9 @@ static struct clocksource clocksource_hp
.shift  = HPET_SHIFT,
.flags  = CLOCK_SOURCE_IS_CONTINUOUS,
.resume = hpet_restart_counter,
+#ifdef CONFIG_X86_64
+   .vread  = vread_hpet,
+#endif
 };
 
 static int hpet_clocksource_register(void)

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: checkpatch, a patch checking script.

2007-05-03 Thread Sébastien Dugué
On Wed, 2 May 2007 16:28:27 +0200 (CEST) Geert Uytterhoeven <[EMAIL PROTECTED]> 
wrote:


> + $warnings += search(qr/__FUNCION__/,

   ^__FUNCTION__ maybe?

> + "Should use C99 __func__ instead of GNU 
> __FUNCTION__\n");


  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: checkpatch, a patch checking script.

2007-05-03 Thread Sébastien Dugué
On Wed, 2 May 2007 16:28:27 +0200 (CEST) Geert Uytterhoeven [EMAIL PROTECTED] 
wrote:


 + $warnings += search(qr/__FUNCION__/,

   ^__FUNCTION__ maybe?

 + Should use C99 __func__ instead of GNU 
 __FUNCTION__\n);


  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] i386 GDT cleanups: Rename boot_gdt_table to boot_gdt

2007-03-22 Thread Sébastien Dugué

  Rusty,

On Thu, 22 Mar 2007 20:24:10 +1100 Rusty Russell <[EMAIL PROTECTED]> wrote:

> On Thu, 2007-03-22 at 09:10 +0100, Sébastien Dugué wrote:
> >   Why not take on the opportunity to rename boot_gt_table to boot_gtd,
> > to avoid the duplicate T(able)?
> 
> That's not a bad idea, but IMHO deserves its own patch.
> 
> I look forward to it!
> Rusty.
> 
> 

  Here it is, boot tested on an SMP x86 box.

  Sébastien.



 Rename boot_gdt_table to boot_gdt to avoid the duplicate
T(able).

Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>

---
 arch/i386/kernel/head.S   |9 -
 arch/i386/kernel/trampoline.S |   12 ++--
 2 files changed, 10 insertions(+), 11 deletions(-)

Index: linux-2.6/arch/i386/kernel/head.S
===
--- linux-2.6.orig/arch/i386/kernel/head.S  2007-03-22 15:09:02.0 
+0100
+++ linux-2.6/arch/i386/kernel/head.S   2007-03-22 16:14:38.0 +0100
@@ -147,8 +147,7 @@ page_pde_offset = (__PAGE_OFFSET >> 20);
 /*
  * Non-boot CPU entry point; entered from trampoline.S
  * We can't lgdt here, because lgdt itself uses a data segment, but
- * we know the trampoline has already loaded the boot_gdt_table GDT
- * for us.
+ * we know the trampoline has already loaded the boot_gdt for us.
  *
  * If cpu hotplug is not supported then this code can go in init section
  * which will be freed later
@@ -588,7 +587,7 @@ fault_msg:
.word 0 # 32 bit align gdt_desc.address
 boot_gdt_descr:
.word __BOOT_DS+7
-   .long boot_gdt_table - __PAGE_OFFSET
+   .long boot_gdt - __PAGE_OFFSET
 
.word 0 # 32-bit align idt_desc.address
 idt_descr:
@@ -602,11 +601,11 @@ ENTRY(early_gdt_descr)
.long cpu_gdt_table
 
 /*
- * The boot_gdt_table must mirror the equivalent in setup.S and is
+ * The boot_gdt must mirror the equivalent in setup.S and is
  * used only for booting.
  */
.align L1_CACHE_BYTES
-ENTRY(boot_gdt_table)
+ENTRY(boot_gdt)
.fill GDT_ENTRY_BOOT_CS,8,0
.quad 0x00cf9a00/* kernel 4GB code at 0x */
.quad 0x00cf9200/* kernel 4GB data at 0x */
Index: linux-2.6/arch/i386/kernel/trampoline.S
===
--- linux-2.6.orig/arch/i386/kernel/trampoline.S2007-03-22 
15:09:02.0 +0100
+++ linux-2.6/arch/i386/kernel/trampoline.S 2007-03-22 16:13:23.0 
+0100
@@ -29,7 +29,7 @@
  *
  * TYPE  VALUE
  * R_386_32  startup_32_smp
- * R_386_32  boot_gdt_table
+ * R_386_32  boot_gdt
  */
 
 #include 
@@ -62,8 +62,8 @@ r_base = .
 * to 32 bit.
 */
 
-   lidtl   boot_idt - r_base   # load idt with 0, 0
-   lgdtl   boot_gdt - r_base   # load gdt with whatever is appropriate
+   lidtl   boot_idt_descr - r_base # load idt with 0, 0
+   lgdtl   boot_gdt_descr - r_base # load gdt with whatever is appropriate
 
xor %ax, %ax
inc %ax # protected mode (PE) bit
@@ -73,11 +73,11 @@ r_base = .
 
# These need to be in the same 64K segment as the above;
# hence we don't use the boot_gdt_descr defined in head.S
-boot_gdt:
+boot_gdt_descr:
.word   __BOOT_DS + 7   # gdt limit
-   .long   boot_gdt_table-__PAGE_OFFSET# gdt base
+   .long   boot_gdt - __PAGE_OFFSET# gdt base
 
-boot_idt:
+boot_idt_descr:
.word   0   # idt limit = 0
.long   0   # idt base = 0L
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] i386 GDT cleanups: Use per-cpu GDT immediately upon boot

2007-03-22 Thread Sébastien Dugué

  Hi Rusty,

On Thu, 22 Mar 2007 10:58:30 +1100 Rusty Russell <[EMAIL PROTECTED]> wrote:

> On Wed, 2007-03-21 at 10:51 -0600, Eric W. Biederman wrote:
> > Rusty Russell <[EMAIL PROTECTED]> writes:
> > 
> > > On Wed, 2007-03-21 at 03:31 -0600, Eric W. Biederman wrote:
> > >> Rusty Russell <[EMAIL PROTECTED]> writes:
> > >> > -/*
> > >> > - * The boot_gdt_table must mirror the equivalent in setup.S and is
> > >> > - * used only for booting.
> > >> > - */
> > >> 
> > >> It looks like you are killing a useful comment here for no good reason.
> > >
> > > Hi Eric,
> > >
> > >   I think one has to look harder, then.  There is no "equivalent in
> > > setup.S": there is no setup.S, and it's certainly not clear what GDT
> > > this "must mirror": it doesn't mirror any GDT at the moment.
> > 
> > see the gdt in:
> > arch/i386/boot/setup.S
> 
> Erk, what a dumb mistake.  Apologies for my snarky comment above 8(
> 
> > If anything the comment should read these values are fixed by the boot
> > protocol and we can't change them.
> 
> Since lguest doesn't use setup.S, it's outside my experience.  I'll just
> leave the comment, and try to pretend this never happened 8)
> 
> Thanks muchly,
> Rusty.
> ==
> Now we are no longer dynamically allocating the GDT, we don't need the
> "cpu_gdt_table" at all: we can switch straight from "boot_gdt_table"
> to the per-cpu GDT.  This means initializing the cpu_gdt array in C.


  Why not take on the opportunity to rename boot_gt_table to boot_gtd,
to avoid the duplicate T(able)?

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/4] i386 GDT cleanups: Use per-cpu GDT immediately upon boot

2007-03-22 Thread Sébastien Dugué

  Hi Rusty,

On Thu, 22 Mar 2007 10:58:30 +1100 Rusty Russell [EMAIL PROTECTED] wrote:

 On Wed, 2007-03-21 at 10:51 -0600, Eric W. Biederman wrote:
  Rusty Russell [EMAIL PROTECTED] writes:
  
   On Wed, 2007-03-21 at 03:31 -0600, Eric W. Biederman wrote:
   Rusty Russell [EMAIL PROTECTED] writes:
-/*
- * The boot_gdt_table must mirror the equivalent in setup.S and is
- * used only for booting.
- */
   
   It looks like you are killing a useful comment here for no good reason.
  
   Hi Eric,
  
 I think one has to look harder, then.  There is no equivalent in
   setup.S: there is no setup.S, and it's certainly not clear what GDT
   this must mirror: it doesn't mirror any GDT at the moment.
  
  see the gdt in:
  arch/i386/boot/setup.S
 
 Erk, what a dumb mistake.  Apologies for my snarky comment above 8(
 
  If anything the comment should read these values are fixed by the boot
  protocol and we can't change them.
 
 Since lguest doesn't use setup.S, it's outside my experience.  I'll just
 leave the comment, and try to pretend this never happened 8)
 
 Thanks muchly,
 Rusty.
 ==
 Now we are no longer dynamically allocating the GDT, we don't need the
 cpu_gdt_table at all: we can switch straight from boot_gdt_table
 to the per-cpu GDT.  This means initializing the cpu_gdt array in C.


  Why not take on the opportunity to rename boot_gt_table to boot_gtd,
to avoid the duplicate T(able)?

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] i386 GDT cleanups: Rename boot_gdt_table to boot_gdt

2007-03-22 Thread Sébastien Dugué

  Rusty,

On Thu, 22 Mar 2007 20:24:10 +1100 Rusty Russell [EMAIL PROTECTED] wrote:

 On Thu, 2007-03-22 at 09:10 +0100, Sébastien Dugué wrote:
Why not take on the opportunity to rename boot_gt_table to boot_gtd,
  to avoid the duplicate T(able)?
 
 That's not a bad idea, but IMHO deserves its own patch.
 
 I look forward to it!
 Rusty.
 
 

  Here it is, boot tested on an SMP x86 box.

  Sébastien.



 Rename boot_gdt_table to boot_gdt to avoid the duplicate
T(able).

Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]

---
 arch/i386/kernel/head.S   |9 -
 arch/i386/kernel/trampoline.S |   12 ++--
 2 files changed, 10 insertions(+), 11 deletions(-)

Index: linux-2.6/arch/i386/kernel/head.S
===
--- linux-2.6.orig/arch/i386/kernel/head.S  2007-03-22 15:09:02.0 
+0100
+++ linux-2.6/arch/i386/kernel/head.S   2007-03-22 16:14:38.0 +0100
@@ -147,8 +147,7 @@ page_pde_offset = (__PAGE_OFFSET  20);
 /*
  * Non-boot CPU entry point; entered from trampoline.S
  * We can't lgdt here, because lgdt itself uses a data segment, but
- * we know the trampoline has already loaded the boot_gdt_table GDT
- * for us.
+ * we know the trampoline has already loaded the boot_gdt for us.
  *
  * If cpu hotplug is not supported then this code can go in init section
  * which will be freed later
@@ -588,7 +587,7 @@ fault_msg:
.word 0 # 32 bit align gdt_desc.address
 boot_gdt_descr:
.word __BOOT_DS+7
-   .long boot_gdt_table - __PAGE_OFFSET
+   .long boot_gdt - __PAGE_OFFSET
 
.word 0 # 32-bit align idt_desc.address
 idt_descr:
@@ -602,11 +601,11 @@ ENTRY(early_gdt_descr)
.long cpu_gdt_table
 
 /*
- * The boot_gdt_table must mirror the equivalent in setup.S and is
+ * The boot_gdt must mirror the equivalent in setup.S and is
  * used only for booting.
  */
.align L1_CACHE_BYTES
-ENTRY(boot_gdt_table)
+ENTRY(boot_gdt)
.fill GDT_ENTRY_BOOT_CS,8,0
.quad 0x00cf9a00/* kernel 4GB code at 0x */
.quad 0x00cf9200/* kernel 4GB data at 0x */
Index: linux-2.6/arch/i386/kernel/trampoline.S
===
--- linux-2.6.orig/arch/i386/kernel/trampoline.S2007-03-22 
15:09:02.0 +0100
+++ linux-2.6/arch/i386/kernel/trampoline.S 2007-03-22 16:13:23.0 
+0100
@@ -29,7 +29,7 @@
  *
  * TYPE  VALUE
  * R_386_32  startup_32_smp
- * R_386_32  boot_gdt_table
+ * R_386_32  boot_gdt
  */
 
 #include linux/linkage.h
@@ -62,8 +62,8 @@ r_base = .
 * to 32 bit.
 */
 
-   lidtl   boot_idt - r_base   # load idt with 0, 0
-   lgdtl   boot_gdt - r_base   # load gdt with whatever is appropriate
+   lidtl   boot_idt_descr - r_base # load idt with 0, 0
+   lgdtl   boot_gdt_descr - r_base # load gdt with whatever is appropriate
 
xor %ax, %ax
inc %ax # protected mode (PE) bit
@@ -73,11 +73,11 @@ r_base = .
 
# These need to be in the same 64K segment as the above;
# hence we don't use the boot_gdt_descr defined in head.S
-boot_gdt:
+boot_gdt_descr:
.word   __BOOT_DS + 7   # gdt limit
-   .long   boot_gdt_table-__PAGE_OFFSET# gdt base
+   .long   boot_gdt - __PAGE_OFFSET# gdt base
 
-boot_idt:
+boot_idt_descr:
.word   0   # idt limit = 0
.long   0   # idt base = 0L
 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21-rc2-mm2

2007-03-07 Thread Sébastien Dugué
On Wed, 7 Mar 2007 00:49:19 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Wed, 7 Mar 2007 09:39:48 +0100 Sébastien Dugué <[EMAIL PROTECTED]> wrote:
> 
> > 
> >   Hi Andrew,
> > 
> > On Tue, 6 Mar 2007 00:44:08 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:
> > 
> > > 
> > > Temporarily at
> > > 
> > >   http://userweb.kernel.org/~akpm/2.6.21-rc2-mm2/
> > > 
> > > Will appear later at
> > > 
> > >   
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc2/2.6.20-rc2-mm2/
> > > 
> > > 
> > > - git-block.patch is having problems which are getting in the way - it has
> > >   been dropped.
> > > 
> > > - As a consequence all the AIO patches were dropped
> > > 
> > 
> >   Why? The aio notification and listio patches have nothing to do with
> > git-block.patch.
> 
> Yes, I could have retained those with a little bit of jiggling.  But...
> 
> >   If you think those patches will be rendered obsolete by the new
> > syslet/fibril/whatever approach, then fine.
> 
> Until this is sorted out I don't think we can add new core AIO code. 
> Particularly not new syscalls.

  Makes sense.

> 
> > Otherwise do you expect me to
> > resubmit?
> 
> Is OK for now, I think.

  Ok, thanks for the good job.

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21-rc2-mm2

2007-03-07 Thread Sébastien Dugué

  Hi Andrew,

On Tue, 6 Mar 2007 00:44:08 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:

> 
> Temporarily at
> 
>   http://userweb.kernel.org/~akpm/2.6.21-rc2-mm2/
> 
> Will appear later at
> 
>   
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc2/2.6.20-rc2-mm2/
> 
> 
> - git-block.patch is having problems which are getting in the way - it has
>   been dropped.
> 
> - As a consequence all the AIO patches were dropped
> 

  Why? The aio notification and listio patches have nothing to do with
git-block.patch.

  If you think those patches will be rendered obsolete by the new
syslet/fibril/whatever approach, then fine. Otherwise do you expect me to
resubmit?

  Thanks,

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21-rc2-mm2

2007-03-07 Thread Sébastien Dugué

  Hi Andrew,

On Tue, 6 Mar 2007 00:44:08 -0800 Andrew Morton [EMAIL PROTECTED] wrote:

 
 Temporarily at
 
   http://userweb.kernel.org/~akpm/2.6.21-rc2-mm2/
 
 Will appear later at
 
   
 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc2/2.6.20-rc2-mm2/
 
 
 - git-block.patch is having problems which are getting in the way - it has
   been dropped.
 
 - As a consequence all the AIO patches were dropped
 

  Why? The aio notification and listio patches have nothing to do with
git-block.patch.

  If you think those patches will be rendered obsolete by the new
syslet/fibril/whatever approach, then fine. Otherwise do you expect me to
resubmit?

  Thanks,

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21-rc2-mm2

2007-03-07 Thread Sébastien Dugué
On Wed, 7 Mar 2007 00:49:19 -0800 Andrew Morton [EMAIL PROTECTED] wrote:

 On Wed, 7 Mar 2007 09:39:48 +0100 Sébastien Dugué [EMAIL PROTECTED] wrote:
 
  
Hi Andrew,
  
  On Tue, 6 Mar 2007 00:44:08 -0800 Andrew Morton [EMAIL PROTECTED] wrote:
  
   
   Temporarily at
   
 http://userweb.kernel.org/~akpm/2.6.21-rc2-mm2/
   
   Will appear later at
   
 
   ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.20-rc2/2.6.20-rc2-mm2/
   
   
   - git-block.patch is having problems which are getting in the way - it has
 been dropped.
   
   - As a consequence all the AIO patches were dropped
   
  
Why? The aio notification and listio patches have nothing to do with
  git-block.patch.
 
 Yes, I could have retained those with a little bit of jiggling.  But...
 
If you think those patches will be rendered obsolete by the new
  syslet/fibril/whatever approach, then fine.
 
 Until this is sorted out I don't think we can add new core AIO code. 
 Particularly not new syscalls.

  Makes sense.

 
  Otherwise do you expect me to
  resubmit?
 
 Is OK for now, I think.

  Ok, thanks for the good job.

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: git issues while fetching from http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/

2007-02-13 Thread Sébastien Dugué

  Hi Laurent,


On Tue, 13 Feb 2007 10:29:59 +0100 Laurent Pinchart <[EMAIL PROTECTED]> wrote:

> Hi everybody,
> 
> I'm trying to fetch from the Linux kernel repository at 
> http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/. After 
> downloading the objects, git failed while trying to fetch tags. Retrying the 
> fetch operation gives the following output:
> 
> $ git-fetch origin
> Fetching refs/heads/master from 
> http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/ using 
> http
> Auto-following refs/tags/v2.6.20
> Auto-following refs/tags/v2.6.20
> Auto-following refs/tags/v2.6.20-rc1
> Auto-following refs/tags/v2.6.20-rc1
> Auto-following refs/tags/v2.6.20-rc2
> Auto-following refs/tags/v2.6.20-rc2
> Auto-following refs/tags/v2.6.20-rc3
> Auto-following refs/tags/v2.6.20-rc3
> Auto-following refs/tags/v2.6.20-rc4
> Auto-following refs/tags/v2.6.20-rc4
> Auto-following refs/tags/v2.6.20-rc5
> Auto-following refs/tags/v2.6.20-rc5
> Auto-following refs/tags/v2.6.20-rc6
> Auto-following refs/tags/v2.6.20-rc6
> Auto-following refs/tags/v2.6.20-rc7
> Auto-following refs/tags/v2.6.20-rc7
> Failed to fetch refs/tags/v2.6.20 from 
> http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/
> 
> While trying to solve the issue, I noticed that the refs/tags directory in 
> the 
> remote repository is empty. Is that a normal situation ? Should git be able 
> to regenerate the tags locally from the objects ? I'm using git 1.4.4.4.

  I had the same issue. Upgrade git. the refs are now packed
in a single file packed-refs.

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: git issues while fetching from http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/

2007-02-13 Thread Sébastien Dugué

  Hi Laurent,


On Tue, 13 Feb 2007 10:29:59 +0100 Laurent Pinchart [EMAIL PROTECTED] wrote:

 Hi everybody,
 
 I'm trying to fetch from the Linux kernel repository at 
 http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/. After 
 downloading the objects, git failed while trying to fetch tags. Retrying the 
 fetch operation gives the following output:
 
 $ git-fetch origin
 Fetching refs/heads/master from 
 http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/ using 
 http
 Auto-following refs/tags/v2.6.20
 Auto-following refs/tags/v2.6.20
 Auto-following refs/tags/v2.6.20-rc1
 Auto-following refs/tags/v2.6.20-rc1
 Auto-following refs/tags/v2.6.20-rc2
 Auto-following refs/tags/v2.6.20-rc2
 Auto-following refs/tags/v2.6.20-rc3
 Auto-following refs/tags/v2.6.20-rc3
 Auto-following refs/tags/v2.6.20-rc4
 Auto-following refs/tags/v2.6.20-rc4
 Auto-following refs/tags/v2.6.20-rc5
 Auto-following refs/tags/v2.6.20-rc5
 Auto-following refs/tags/v2.6.20-rc6
 Auto-following refs/tags/v2.6.20-rc6
 Auto-following refs/tags/v2.6.20-rc7
 Auto-following refs/tags/v2.6.20-rc7
 Failed to fetch refs/tags/v2.6.20 from 
 http://www.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git/
 
 While trying to solve the issue, I noticed that the refs/tags directory in 
 the 
 remote repository is empty. Is that a normal situation ? Should git be able 
 to regenerate the tags locally from the objects ? I'm using git 1.4.4.4.

  I had the same issue. Upgrade git. the refs are now packed
in a single file packed-refs.

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: -mm merge plans for 2.6.21

2007-02-09 Thread Sébastien Dugué
On Fri, 9 Feb 2007 01:05:57 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Fri, 9 Feb 2007 09:26:11 +0100 Sébastien Dugué <[EMAIL PROTECTED]> wrote:
> 
> > On Fri, 9 Feb 2007 11:02:28 +0530 "Bharata B Rao" <[EMAIL PROTECTED]> wrote:
> > 
> > > Andrew,
> > > 
> > > On 2/9/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > > > fsaio-add-a-wait-queue-arg-to-the-wait_bit-action-routine.patch
> > > > fsaio-add-a-wait-queue-arg-to-the-wait_bit-action-routine-gfs2-fix.patch
> > > > fsaio-rename-__lock_page-to-lock_page_blocking.patch
> > > > fsaio-interfaces-to-initialize-and-to-test-a-wait-bit-key.patch
> > > > fsaio-add-a-default-io-wait-bit-field-in-task-struct.patch
> > > > fsaio-enable-wait-bit-based-filtered-wakeups-to-work-for-aio.patch
> > > > fsaio-enable-wait-bit-based-filtered-wakeups-to-work-for-aio-fix.patch
> > > > fsaio-enable-wait-bit-based-filtered-wakeups-to-work-for-aio-fix-sparse-fix.patch
> > > > fsaio-enable-asynchronous-wait-page-and-lock-page.patch
> > > > fsaio-filesystem-aio-read.patch
> > > > fsaio-aio-o_sync-filesystem-write.patch
> > > >
> > > >  Will wait to see what happens with febrils.
> > > >
> > > > aio-is-unlikely.patch
> > > >
> > > >  Will merge.
> > > >
> > > > rework-compat_sys_io_submit.patch
> > > > fix-aioh-includes.patch
> > > > fix-access_ok-checks.patch
> > > > make-good_sigevent-non-static.patch
> > > > make-good_sigevent-non-static-fix.patch
> > > > make-__sigqueue_free-and.patch
> > > > aio-completion-signal-notification.patch
> > > > aio-completion-signal-notification-fix.patch
> > > > aio-completion-signal-notification-fixes-and-cleanups.patch
> > > > aio-completion-signal-notification-small-cleanup.patch
> > > > add-listio-syscall-support.patch
> > > >
> > > >  I guess these are dependent upon fsaio.
> > > 
> > > No. AIO completion signal notification patches and listio patch work
> > > independently of fsaio patches. Just that for buffered AIO case, the
> > > fsaio patches will make the behaviour truly asynchronous.
> > > 
> > > Regards,
> > > Bharata.
> > 
> >   I concur.
> > 
> 
> But is the proposed listio implementation still relevant if we go and add
> the async syscalls? 

  I think the listio syscall could be emulated in userspace using async
syscalls.

  Even more, the whole POSIX AIO functions could benefit from this, rendering
aio.c obsolete.

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: -mm merge plans for 2.6.21

2007-02-09 Thread Sébastien Dugué
On Fri, 9 Feb 2007 11:02:28 +0530 "Bharata B Rao" <[EMAIL PROTECTED]> wrote:

> Andrew,
> 
> On 2/9/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > fsaio-add-a-wait-queue-arg-to-the-wait_bit-action-routine.patch
> > fsaio-add-a-wait-queue-arg-to-the-wait_bit-action-routine-gfs2-fix.patch
> > fsaio-rename-__lock_page-to-lock_page_blocking.patch
> > fsaio-interfaces-to-initialize-and-to-test-a-wait-bit-key.patch
> > fsaio-add-a-default-io-wait-bit-field-in-task-struct.patch
> > fsaio-enable-wait-bit-based-filtered-wakeups-to-work-for-aio.patch
> > fsaio-enable-wait-bit-based-filtered-wakeups-to-work-for-aio-fix.patch
> > fsaio-enable-wait-bit-based-filtered-wakeups-to-work-for-aio-fix-sparse-fix.patch
> > fsaio-enable-asynchronous-wait-page-and-lock-page.patch
> > fsaio-filesystem-aio-read.patch
> > fsaio-aio-o_sync-filesystem-write.patch
> >
> >  Will wait to see what happens with febrils.
> >
> > aio-is-unlikely.patch
> >
> >  Will merge.
> >
> > rework-compat_sys_io_submit.patch
> > fix-aioh-includes.patch
> > fix-access_ok-checks.patch
> > make-good_sigevent-non-static.patch
> > make-good_sigevent-non-static-fix.patch
> > make-__sigqueue_free-and.patch
> > aio-completion-signal-notification.patch
> > aio-completion-signal-notification-fix.patch
> > aio-completion-signal-notification-fixes-and-cleanups.patch
> > aio-completion-signal-notification-small-cleanup.patch
> > add-listio-syscall-support.patch
> >
> >  I guess these are dependent upon fsaio.
> 
> No. AIO completion signal notification patches and listio patch work
> independently of fsaio patches. Just that for buffered AIO case, the
> fsaio patches will make the behaviour truly asynchronous.
> 
> Regards,
> Bharata.

  I concur.

  Thanks,

  Sébastien.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: -mm merge plans for 2.6.21

2007-02-09 Thread Sébastien Dugué
On Fri, 9 Feb 2007 11:02:28 +0530 Bharata B Rao [EMAIL PROTECTED] wrote:

 Andrew,
 
 On 2/9/07, Andrew Morton [EMAIL PROTECTED] wrote:
  fsaio-add-a-wait-queue-arg-to-the-wait_bit-action-routine.patch
  fsaio-add-a-wait-queue-arg-to-the-wait_bit-action-routine-gfs2-fix.patch
  fsaio-rename-__lock_page-to-lock_page_blocking.patch
  fsaio-interfaces-to-initialize-and-to-test-a-wait-bit-key.patch
  fsaio-add-a-default-io-wait-bit-field-in-task-struct.patch
  fsaio-enable-wait-bit-based-filtered-wakeups-to-work-for-aio.patch
  fsaio-enable-wait-bit-based-filtered-wakeups-to-work-for-aio-fix.patch
  fsaio-enable-wait-bit-based-filtered-wakeups-to-work-for-aio-fix-sparse-fix.patch
  fsaio-enable-asynchronous-wait-page-and-lock-page.patch
  fsaio-filesystem-aio-read.patch
  fsaio-aio-o_sync-filesystem-write.patch
 
   Will wait to see what happens with febrils.
 
  aio-is-unlikely.patch
 
   Will merge.
 
  rework-compat_sys_io_submit.patch
  fix-aioh-includes.patch
  fix-access_ok-checks.patch
  make-good_sigevent-non-static.patch
  make-good_sigevent-non-static-fix.patch
  make-__sigqueue_free-and.patch
  aio-completion-signal-notification.patch
  aio-completion-signal-notification-fix.patch
  aio-completion-signal-notification-fixes-and-cleanups.patch
  aio-completion-signal-notification-small-cleanup.patch
  add-listio-syscall-support.patch
 
   I guess these are dependent upon fsaio.
 
 No. AIO completion signal notification patches and listio patch work
 independently of fsaio patches. Just that for buffered AIO case, the
 fsaio patches will make the behaviour truly asynchronous.
 
 Regards,
 Bharata.

  I concur.

  Thanks,

  Sébastien.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: -mm merge plans for 2.6.21

2007-02-09 Thread Sébastien Dugué
On Fri, 9 Feb 2007 01:05:57 -0800 Andrew Morton [EMAIL PROTECTED] wrote:

 On Fri, 9 Feb 2007 09:26:11 +0100 Sébastien Dugué [EMAIL PROTECTED] wrote:
 
  On Fri, 9 Feb 2007 11:02:28 +0530 Bharata B Rao [EMAIL PROTECTED] wrote:
  
   Andrew,
   
   On 2/9/07, Andrew Morton [EMAIL PROTECTED] wrote:
fsaio-add-a-wait-queue-arg-to-the-wait_bit-action-routine.patch
fsaio-add-a-wait-queue-arg-to-the-wait_bit-action-routine-gfs2-fix.patch
fsaio-rename-__lock_page-to-lock_page_blocking.patch
fsaio-interfaces-to-initialize-and-to-test-a-wait-bit-key.patch
fsaio-add-a-default-io-wait-bit-field-in-task-struct.patch
fsaio-enable-wait-bit-based-filtered-wakeups-to-work-for-aio.patch
fsaio-enable-wait-bit-based-filtered-wakeups-to-work-for-aio-fix.patch
fsaio-enable-wait-bit-based-filtered-wakeups-to-work-for-aio-fix-sparse-fix.patch
fsaio-enable-asynchronous-wait-page-and-lock-page.patch
fsaio-filesystem-aio-read.patch
fsaio-aio-o_sync-filesystem-write.patch
   
 Will wait to see what happens with febrils.
   
aio-is-unlikely.patch
   
 Will merge.
   
rework-compat_sys_io_submit.patch
fix-aioh-includes.patch
fix-access_ok-checks.patch
make-good_sigevent-non-static.patch
make-good_sigevent-non-static-fix.patch
make-__sigqueue_free-and.patch
aio-completion-signal-notification.patch
aio-completion-signal-notification-fix.patch
aio-completion-signal-notification-fixes-and-cleanups.patch
aio-completion-signal-notification-small-cleanup.patch
add-listio-syscall-support.patch
   
 I guess these are dependent upon fsaio.
   
   No. AIO completion signal notification patches and listio patch work
   independently of fsaio patches. Just that for buffered AIO case, the
   fsaio patches will make the behaviour truly asynchronous.
   
   Regards,
   Bharata.
  
I concur.
  
 
 But is the proposed listio implementation still relevant if we go and add
 the async syscalls? 

  I think the listio syscall could be emulated in userspace using async
syscalls.

  Even more, the whole POSIX AIO functions could benefit from this, rendering
aio.c obsolete.

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm][AIO] AIO completion signal notification small cleanup

2007-02-06 Thread Sébastien Dugué

  Andrew, one more cleanup to aio_setup_sigevent() to make it more readable.

  Sébastien.


From: Sébastien Dugué <[EMAIL PROTECTED]>

AIO completion signal notification small cleanup

  This patch cleans up aio_setup_sigevent() to make it more readable.

 aio.c |   15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>


Index: linux-2.6.20-rc6-mm3/fs/aio.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c  2007-02-06 09:33:55.0 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c   2007-02-06 12:43:52.0 +0100
@@ -962,21 +962,11 @@ static long aio_setup_sigevent(struct ai
event.sigev_notify != SIGEV_THREAD_ID)
return -EINVAL;
 
-   notify->notify = event.sigev_notify;
-   notify->signo = event.sigev_signo;
-   notify->value = event.sigev_value;
-
rcu_read_lock();
target = sigevent_find_task();
 
-   if (unlikely(!target)) {
-   /*
-* Revert notify to SIGEV_NONE so that really_put_req()
-* knows that no ref has been taken on a task.
-*/
-   notify->notify = SIGEV_NONE;
+   if (unlikely(!target))
goto out_unlock;
-   }
 
/*
 * At this point, we know that notify is either SIGEV_SIGNAL or
@@ -988,6 +978,9 @@ static long aio_setup_sigevent(struct ai
notify->target = target;
rcu_read_unlock();
 
+   notify->notify = event.sigev_notify;
+   notify->signo = event.sigev_signo;
+   notify->value = event.sigev_value;
notify->sigq = __sigqueue_alloc(current, GFP_KERNEL, 0);
 
/*
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm][AIO] AIO completion signal notification fixes and cleanups

2007-02-06 Thread Sébastien Dugué
On Tue, 6 Feb 2007 14:05:39 +0300 Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> On 02/06, S?bastien Dugu? wrote:
> > 
> > @@ -970,8 +969,14 @@ static long aio_setup_sigevent(struct ai
> > rcu_read_lock();
> > target = sigevent_find_task();
> >  
> > -   if (unlikely(!target))
> > +   if (unlikely(!target)) {
> > +   /*
> > +* Revert notify to SIGEV_NONE so that really_put_req()
> > +* knows that no ref has been taken on a task.
> > +*/
> > +   notify->notify = SIGEV_NONE;
> > goto out_unlock;
> > +   }
> 
> Very minor nit, feel free to ignore.
> 
> Isn't it better to move "notify->* = event.*;" down, when we know that
> target != NULL. Imho, a bit easier to follow. This way we don't need to
> reset notify->notify = SIGEV_NONE.
> 
> aio_setup_sigevent() relies on the fact that ->notify = SIGEV_NONE on
> entry anyway.

  Yep, right, it will make things cleaner.

  Thanks,

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm][AIO] AIO completion signal notification fixes and cleanups

2007-02-06 Thread Sébastien Dugué

  Andrew, here is another incremental patch which does a bit of cleanup
as well as fixing a possible release on a task ref that was not taken.

  Thanks,

  Sébastien.


From: Sébastien Dugué <[EMAIL PROTECTED]>

AIO completion signal notification misc fixes and cleanups

  This patches cleans up the notification path and fixes a possible
release on a task ref that was not taken in aio_setup_sigevent().


 aio.c |   15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>


Index: linux-2.6.20-rc6-mm3/fs/aio.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c  2007-02-05 16:53:43.0 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c   2007-02-06 09:33:55.0 +0100
@@ -469,8 +469,7 @@ static inline void really_put_req(struct
kfree(req->ki_iovec);
 
/* Release task ref */
-   if (req->ki_notify.notify == SIGEV_THREAD_ID ||
-   req->ki_notify.notify == SIGEV_SIGNAL)
+   if (req->ki_notify.notify != SIGEV_NONE)
put_task_struct(req->ki_notify.target);
 
kmem_cache_free(kiocb_cachep, req);
@@ -970,8 +969,14 @@ static long aio_setup_sigevent(struct ai
rcu_read_lock();
target = sigevent_find_task();
 
-   if (unlikely(!target))
+   if (unlikely(!target)) {
+   /*
+* Revert notify to SIGEV_NONE so that really_put_req()
+* knows that no ref has been taken on a task.
+*/
+   notify->notify = SIGEV_NONE;
goto out_unlock;
+   }
 
/*
 * At this point, we know that notify is either SIGEV_SIGNAL or
@@ -996,7 +1001,7 @@ static long aio_setup_sigevent(struct ai
return 0;
 
 out_unlock:
-   read_unlock(_lock);
+   rcu_read_unlock();
return -EINVAL;
 }
 
@@ -1763,7 +1768,7 @@ int fastcall io_submit_one(struct kioctx
 (struct sigevent __user *)(unsigned 
long)
 iocb->aio_sigeventp);
if (ret)
-   goto out_put_req;
+   goto out_sigqfree;
}
 
/* Attach this iocb to its lio */
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm][AIO] Fix AIO completion signal notification possible ref leak

2007-02-06 Thread Sébastien Dugué
On Mon, 5 Feb 2007 20:13:35 +0300 Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> On 02/05, S?bastien Dugu? wrote:
> > 
> >   Make sure we only accept valid sigev_notify values in 
> > aio_setup_sigevent(),
> > namely SIGEV_NONE, SIGEV_THREAD_ID or SIGEV_SIGNAL.
> 
> I think this is correct, but I have another concern (most probably I just
> confused looking at non-applied patch), could you re-check?
> 
> > @@ -959,6 +959,10 @@ static long aio_setup_sigevent(struct ai
> > if (event.sigev_notify == SIGEV_NONE)
> > return 0;
> >  
> > +   if (event.sigev_notify != SIGEV_SIGNAL &&
> > +   event.sigev_notify != SIGEV_THREAD_ID)
> > +   return -EINVAL;
> > +
> > notify->notify = event.sigev_notify;
> > notify->signo = event.sigev_signo;
> > notify->value = event.sigev_value;
> 
> Ok. But what if sigevent_find_task() fails after that? Doesn't this mean
> that really_put_req() will do put_task_struct(NULL) ?
> 

  Argh, right, a patch to fix that and a couple of other corner cases to
follow soon.

  Thanks Oleg for looking through this.

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm][AIO] Fix AIO completion signal notification possible ref leak

2007-02-06 Thread Sébastien Dugué
On Mon, 5 Feb 2007 20:13:35 +0300 Oleg Nesterov [EMAIL PROTECTED] wrote:

 On 02/05, S?bastien Dugu? wrote:
  
Make sure we only accept valid sigev_notify values in 
  aio_setup_sigevent(),
  namely SIGEV_NONE, SIGEV_THREAD_ID or SIGEV_SIGNAL.
 
 I think this is correct, but I have another concern (most probably I just
 confused looking at non-applied patch), could you re-check?
 
  @@ -959,6 +959,10 @@ static long aio_setup_sigevent(struct ai
  if (event.sigev_notify == SIGEV_NONE)
  return 0;
   
  +   if (event.sigev_notify != SIGEV_SIGNAL 
  +   event.sigev_notify != SIGEV_THREAD_ID)
  +   return -EINVAL;
  +
  notify-notify = event.sigev_notify;
  notify-signo = event.sigev_signo;
  notify-value = event.sigev_value;
 
 Ok. But what if sigevent_find_task() fails after that? Doesn't this mean
 that really_put_req() will do put_task_struct(NULL) ?
 

  Argh, right, a patch to fix that and a couple of other corner cases to
follow soon.

  Thanks Oleg for looking through this.

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm][AIO] AIO completion signal notification fixes and cleanups

2007-02-06 Thread Sébastien Dugué

  Andrew, here is another incremental patch which does a bit of cleanup
as well as fixing a possible release on a task ref that was not taken.

  Thanks,

  Sébastien.


From: Sébastien Dugué [EMAIL PROTECTED]

AIO completion signal notification misc fixes and cleanups

  This patches cleans up the notification path and fixes a possible
release on a task ref that was not taken in aio_setup_sigevent().


 aio.c |   15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]


Index: linux-2.6.20-rc6-mm3/fs/aio.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c  2007-02-05 16:53:43.0 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c   2007-02-06 09:33:55.0 +0100
@@ -469,8 +469,7 @@ static inline void really_put_req(struct
kfree(req-ki_iovec);
 
/* Release task ref */
-   if (req-ki_notify.notify == SIGEV_THREAD_ID ||
-   req-ki_notify.notify == SIGEV_SIGNAL)
+   if (req-ki_notify.notify != SIGEV_NONE)
put_task_struct(req-ki_notify.target);
 
kmem_cache_free(kiocb_cachep, req);
@@ -970,8 +969,14 @@ static long aio_setup_sigevent(struct ai
rcu_read_lock();
target = sigevent_find_task(event);
 
-   if (unlikely(!target))
+   if (unlikely(!target)) {
+   /*
+* Revert notify to SIGEV_NONE so that really_put_req()
+* knows that no ref has been taken on a task.
+*/
+   notify-notify = SIGEV_NONE;
goto out_unlock;
+   }
 
/*
 * At this point, we know that notify is either SIGEV_SIGNAL or
@@ -996,7 +1001,7 @@ static long aio_setup_sigevent(struct ai
return 0;
 
 out_unlock:
-   read_unlock(tasklist_lock);
+   rcu_read_unlock();
return -EINVAL;
 }
 
@@ -1763,7 +1768,7 @@ int fastcall io_submit_one(struct kioctx
 (struct sigevent __user *)(unsigned 
long)
 iocb-aio_sigeventp);
if (ret)
-   goto out_put_req;
+   goto out_sigqfree;
}
 
/* Attach this iocb to its lio */
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm][AIO] AIO completion signal notification fixes and cleanups

2007-02-06 Thread Sébastien Dugué
On Tue, 6 Feb 2007 14:05:39 +0300 Oleg Nesterov [EMAIL PROTECTED] wrote:

 On 02/06, S?bastien Dugu? wrote:
  
  @@ -970,8 +969,14 @@ static long aio_setup_sigevent(struct ai
  rcu_read_lock();
  target = sigevent_find_task(event);
   
  -   if (unlikely(!target))
  +   if (unlikely(!target)) {
  +   /*
  +* Revert notify to SIGEV_NONE so that really_put_req()
  +* knows that no ref has been taken on a task.
  +*/
  +   notify-notify = SIGEV_NONE;
  goto out_unlock;
  +   }
 
 Very minor nit, feel free to ignore.
 
 Isn't it better to move notify-* = event.*; down, when we know that
 target != NULL. Imho, a bit easier to follow. This way we don't need to
 reset notify-notify = SIGEV_NONE.
 
 aio_setup_sigevent() relies on the fact that -notify = SIGEV_NONE on
 entry anyway.

  Yep, right, it will make things cleaner.

  Thanks,

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm][AIO] AIO completion signal notification small cleanup

2007-02-06 Thread Sébastien Dugué

  Andrew, one more cleanup to aio_setup_sigevent() to make it more readable.

  Sébastien.


From: Sébastien Dugué [EMAIL PROTECTED]

AIO completion signal notification small cleanup

  This patch cleans up aio_setup_sigevent() to make it more readable.

 aio.c |   15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]


Index: linux-2.6.20-rc6-mm3/fs/aio.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c  2007-02-06 09:33:55.0 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c   2007-02-06 12:43:52.0 +0100
@@ -962,21 +962,11 @@ static long aio_setup_sigevent(struct ai
event.sigev_notify != SIGEV_THREAD_ID)
return -EINVAL;
 
-   notify-notify = event.sigev_notify;
-   notify-signo = event.sigev_signo;
-   notify-value = event.sigev_value;
-
rcu_read_lock();
target = sigevent_find_task(event);
 
-   if (unlikely(!target)) {
-   /*
-* Revert notify to SIGEV_NONE so that really_put_req()
-* knows that no ref has been taken on a task.
-*/
-   notify-notify = SIGEV_NONE;
+   if (unlikely(!target))
goto out_unlock;
-   }
 
/*
 * At this point, we know that notify is either SIGEV_SIGNAL or
@@ -988,6 +978,9 @@ static long aio_setup_sigevent(struct ai
notify-target = target;
rcu_read_unlock();
 
+   notify-notify = event.sigev_notify;
+   notify-signo = event.sigev_signo;
+   notify-value = event.sigev_value;
notify-sigq = __sigqueue_alloc(current, GFP_KERNEL, 0);
 
/*
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm][AIO] Fix AIO completion signal notification possible ref leak

2007-02-05 Thread Sébastien Dugué

  Andrew,

  here is an incremental patch to fix a possible ref leak when users
do weird things with the ->sigev_notify flags.

  Thanks Oleg for spotting this.

  Sébastien.


From: Sébastien Dugué <[EMAIL PROTECTED]>

 Fix AIO completion signal notification possible ref leak

  Make sure we only accept valid sigev_notify values in aio_setup_sigevent(),
namely SIGEV_NONE, SIGEV_THREAD_ID or SIGEV_SIGNAL.

 aio.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>


Index: linux-2.6.20-rc6-mm3/fs/aio.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c  2007-02-05 16:50:27.0 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c   2007-02-05 16:53:43.0 +0100
@@ -939,7 +939,7 @@ static int aio_send_signal(struct aio_no
info->si_uid = 0;
info->si_value = notify->value;
 
-   if (notify->notify & SIGEV_THREAD_ID)
+   if (notify->notify == SIGEV_THREAD_ID)
ret = send_sigqueue(notify->signo, sigq, notify->target);
else
ret = send_group_sigqueue(notify->signo, sigq, notify->target);
@@ -959,6 +959,10 @@ static long aio_setup_sigevent(struct ai
if (event.sigev_notify == SIGEV_NONE)
return 0;
 
+   if (event.sigev_notify != SIGEV_SIGNAL &&
+   event.sigev_notify != SIGEV_THREAD_ID)
+   return -EINVAL;
+
notify->notify = event.sigev_notify;
notify->signo = event.sigev_signo;
notify->value = event.sigev_value;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 4/7][AIO] Resend - Make good_sigevent() non-static

2007-02-05 Thread Sébastien Dugué

  Andrew,

  Could you replace make-good_sigevent-non-static.patch with this
patch, as it fixes some problems reported by Oleg Nesterov.

  Oleg, chime in if you still see something I missed.

  Hopefully I will have gotten it right this time.

  Sorry for the hassle.

  Sébastien.


From: Sébastien Dugué <[EMAIL PROTECTED]>

Make good_sigevent() non-static and rename it


  Move good_sigevent() from posix-timers.c to signal.c where it belongs,
clean it up, rename it to sigevent_find_task() to better describe what the
function does and make it non-static so that it can be used by other
 subsystems.


 include/linux/signal.h |1 +
 kernel/posix-timers.c  |   19 +--
 kernel/signal.c|   24 
 3 files changed, 26 insertions(+), 18 deletions(-)

Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>


Index: linux-2.6.20-rc6-mm3/include/linux/signal.h
===
--- linux-2.6.20-rc6-mm3.orig/include/linux/signal.h2007-01-30 
11:41:27.0 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/signal.h 2007-02-05 12:25:12.0 
+0100
@@ -240,6 +240,7 @@ extern int sigprocmask(int, sigset_t *, 
 
 struct pt_regs;
 extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction 
*return_ka, struct pt_regs *regs, void *cookie);
+extern struct task_struct * sigevent_find_task(sigevent_t *);
 
 extern struct kmem_cache *sighand_cachep;
 
Index: linux-2.6.20-rc6-mm3/kernel/posix-timers.c
===
--- linux-2.6.20-rc6-mm3.orig/kernel/posix-timers.c 2007-01-30 
11:41:27.0 +0100
+++ linux-2.6.20-rc6-mm3/kernel/posix-timers.c  2007-01-30 11:41:36.0 
+0100
@@ -367,23 +367,6 @@ static enum hrtimer_restart posix_timer_
return ret;
 }
 
-static struct task_struct * good_sigevent(sigevent_t * event)
-{
-   struct task_struct *rtn = current->group_leader;
-
-   if ((event->sigev_notify & SIGEV_THREAD_ID ) &&
-   (!(rtn = find_task_by_pid(event->sigev_notify_thread_id)) ||
-rtn->tgid != current->tgid ||
-(event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL))
-   return NULL;
-
-   if (((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) &&
-   ((event->sigev_signo <= 0) || (event->sigev_signo > SIGRTMAX)))
-   return NULL;
-
-   return rtn;
-}
-
 void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock)
 {
if ((unsigned) clock_id >= MAX_CLOCKS) {
@@ -496,7 +479,7 @@ sys_timer_create(const clockid_t which_c
new_timer->it_sigev_value = event.sigev_value;
 
read_lock(_lock);
-   if ((process = good_sigevent())) {
+   if ((process = sigevent_find_task())) {
/*
 * We may be setting up this process for another
 * thread.  It may be exiting.  To catch this
Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c   2007-01-30 11:41:27.0 
+0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c2007-02-05 12:27:33.0 
+0100
@@ -1213,6 +1213,34 @@ int group_send_sig_info(int sig, struct 
return ret;
 }
 
+/***
+ * sigevent_find_task - check and get target task from a sigevent.
+ * @event: the sigevent to be checked
+ *
+ * This function must be called with the tasklist_lock held for reading.
+ */
+struct task_struct * sigevent_find_task(sigevent_t * event)
+{
+   struct task_struct *task = current->group_leader;
+
+   if ((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) {
+   if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
+   return NULL;
+   }
+
+   if (event->sigev_notify & SIGEV_THREAD_ID) {
+   if ((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL)
+   return NULL;
+
+   task = find_task_by_pid(event->sigev_notify_thread_id);
+
+   if (!task || task->tgid != current->tgid)
+   task = NULL;
+   }
+
+   return task;
+}
+
 /*
  * kill_pgrp_info() sends a signal to a process group: this is what the tty
  * control characters do (^C, ^Z etc)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 4/7][AIO] - Make good_sigevent() non-static

2007-02-05 Thread Sébastien Dugué

  Hi Oleg,

  thanks for your comments.

On Fri, 2 Feb 2007 21:00:39 +0300 Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> On 02/01, S?bastien Dugu? wrote:
> > 
> > +struct task_struct * sigevent_find_task(sigevent_t * event)
> > +{
> > +   struct task_struct *task = NULL;
> > +
> > +   if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
> > +   return NULL;
> > +
> > +   if ((event->sigev_notify & SIGEV_THREAD_ID ) == SIGEV_THREAD_ID) {
> > +   task = find_task_by_pid(event->sigev_notify_thread_id);
> > +
> > +   if (!task || task->tgid != current->tgid)
> > +   task = NULL;
> > +   } else if (event->sigev_notify == SIGEV_SIGNAL)
> > +   task = current->group_leader;
> > +
> > +   return task;
> > +}
> 
> I am afraid this is still not right. Consider
> 
>   ->sigev_notify == SIGEV_THREAD_ID | RANDOM_BIT
> 
> Now, the second "if (SIGEV_THREAD_ID)" returns a valid task. However,
> 
>   really_put_req:
> 
>   if (notify == SIGEV_THREAD_ID || notify == SIGEV_SIGNAL)
>   put_task_struct();
> 
> doesn't work, so we have task_struct leak.

  Right, I'll revert back to the old code with cleanups.

> 
> Worse, this breaks posix-timers. Note that posix-timers allow SIGEV_NONE,
> the timer is not queued in that case, we shouldn't do ->sigev_signo check.
> This means that aio should check SIGEV_NONE itself.
> 
> Also, it is critical for posix-timers that SIGEV_THREAD_ID doesn't come
> with another bit (like in the example below), note the code like
> 
>   if (sigev_notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID))
>   ...
> 
> IOW: good_sigevent() in its current form is very cryptic, and it _really_
> needs a cleanup, but we should not change its behaviour.

  Yep. I must admit that I didn't pay enough attention to this 10-liner,
but in the end it rightfully backfired on me.

> 
> Apart from this, I don't see other problems in the signal related code in
> this series.
> 
> Oleg.
> 

  Thanks again for your review.

  Sébastien.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 4/7][AIO] - Make good_sigevent() non-static

2007-02-05 Thread Sébastien Dugué

  Hi Oleg,

  thanks for your comments.

On Fri, 2 Feb 2007 21:00:39 +0300 Oleg Nesterov [EMAIL PROTECTED] wrote:

 On 02/01, S?bastien Dugu? wrote:
  
  +struct task_struct * sigevent_find_task(sigevent_t * event)
  +{
  +   struct task_struct *task = NULL;
  +
  +   if (event-sigev_signo = 0 || event-sigev_signo  SIGRTMAX)
  +   return NULL;
  +
  +   if ((event-sigev_notify  SIGEV_THREAD_ID ) == SIGEV_THREAD_ID) {
  +   task = find_task_by_pid(event-sigev_notify_thread_id);
  +
  +   if (!task || task-tgid != current-tgid)
  +   task = NULL;
  +   } else if (event-sigev_notify == SIGEV_SIGNAL)
  +   task = current-group_leader;
  +
  +   return task;
  +}
 
 I am afraid this is still not right. Consider
 
   -sigev_notify == SIGEV_THREAD_ID | RANDOM_BIT
 
 Now, the second if (SIGEV_THREAD_ID) returns a valid task. However,
 
   really_put_req:
 
   if (notify == SIGEV_THREAD_ID || notify == SIGEV_SIGNAL)
   put_task_struct();
 
 doesn't work, so we have task_struct leak.

  Right, I'll revert back to the old code with cleanups.

 
 Worse, this breaks posix-timers. Note that posix-timers allow SIGEV_NONE,
 the timer is not queued in that case, we shouldn't do -sigev_signo check.
 This means that aio should check SIGEV_NONE itself.
 
 Also, it is critical for posix-timers that SIGEV_THREAD_ID doesn't come
 with another bit (like in the example below), note the code like
 
   if (sigev_notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID))
   ...
 
 IOW: good_sigevent() in its current form is very cryptic, and it _really_
 needs a cleanup, but we should not change its behaviour.

  Yep. I must admit that I didn't pay enough attention to this 10-liner,
but in the end it rightfully backfired on me.

 
 Apart from this, I don't see other problems in the signal related code in
 this series.
 
 Oleg.
 

  Thanks again for your review.

  Sébastien.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 4/7][AIO] Resend - Make good_sigevent() non-static

2007-02-05 Thread Sébastien Dugué

  Andrew,

  Could you replace make-good_sigevent-non-static.patch with this
patch, as it fixes some problems reported by Oleg Nesterov.

  Oleg, chime in if you still see something I missed.

  Hopefully I will have gotten it right this time.

  Sorry for the hassle.

  Sébastien.


From: Sébastien Dugué [EMAIL PROTECTED]

Make good_sigevent() non-static and rename it


  Move good_sigevent() from posix-timers.c to signal.c where it belongs,
clean it up, rename it to sigevent_find_task() to better describe what the
function does and make it non-static so that it can be used by other
 subsystems.


 include/linux/signal.h |1 +
 kernel/posix-timers.c  |   19 +--
 kernel/signal.c|   24 
 3 files changed, 26 insertions(+), 18 deletions(-)

Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]


Index: linux-2.6.20-rc6-mm3/include/linux/signal.h
===
--- linux-2.6.20-rc6-mm3.orig/include/linux/signal.h2007-01-30 
11:41:27.0 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/signal.h 2007-02-05 12:25:12.0 
+0100
@@ -240,6 +240,7 @@ extern int sigprocmask(int, sigset_t *, 
 
 struct pt_regs;
 extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction 
*return_ka, struct pt_regs *regs, void *cookie);
+extern struct task_struct * sigevent_find_task(sigevent_t *);
 
 extern struct kmem_cache *sighand_cachep;
 
Index: linux-2.6.20-rc6-mm3/kernel/posix-timers.c
===
--- linux-2.6.20-rc6-mm3.orig/kernel/posix-timers.c 2007-01-30 
11:41:27.0 +0100
+++ linux-2.6.20-rc6-mm3/kernel/posix-timers.c  2007-01-30 11:41:36.0 
+0100
@@ -367,23 +367,6 @@ static enum hrtimer_restart posix_timer_
return ret;
 }
 
-static struct task_struct * good_sigevent(sigevent_t * event)
-{
-   struct task_struct *rtn = current-group_leader;
-
-   if ((event-sigev_notify  SIGEV_THREAD_ID ) 
-   (!(rtn = find_task_by_pid(event-sigev_notify_thread_id)) ||
-rtn-tgid != current-tgid ||
-(event-sigev_notify  ~SIGEV_THREAD_ID) != SIGEV_SIGNAL))
-   return NULL;
-
-   if (((event-sigev_notify  ~SIGEV_THREAD_ID) != SIGEV_NONE) 
-   ((event-sigev_signo = 0) || (event-sigev_signo  SIGRTMAX)))
-   return NULL;
-
-   return rtn;
-}
-
 void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock)
 {
if ((unsigned) clock_id = MAX_CLOCKS) {
@@ -496,7 +479,7 @@ sys_timer_create(const clockid_t which_c
new_timer-it_sigev_value = event.sigev_value;
 
read_lock(tasklist_lock);
-   if ((process = good_sigevent(event))) {
+   if ((process = sigevent_find_task(event))) {
/*
 * We may be setting up this process for another
 * thread.  It may be exiting.  To catch this
Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c   2007-01-30 11:41:27.0 
+0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c2007-02-05 12:27:33.0 
+0100
@@ -1213,6 +1213,34 @@ int group_send_sig_info(int sig, struct 
return ret;
 }
 
+/***
+ * sigevent_find_task - check and get target task from a sigevent.
+ * @event: the sigevent to be checked
+ *
+ * This function must be called with the tasklist_lock held for reading.
+ */
+struct task_struct * sigevent_find_task(sigevent_t * event)
+{
+   struct task_struct *task = current-group_leader;
+
+   if ((event-sigev_notify  ~SIGEV_THREAD_ID) != SIGEV_NONE) {
+   if (event-sigev_signo = 0 || event-sigev_signo  SIGRTMAX)
+   return NULL;
+   }
+
+   if (event-sigev_notify  SIGEV_THREAD_ID) {
+   if ((event-sigev_notify  ~SIGEV_THREAD_ID) != SIGEV_SIGNAL)
+   return NULL;
+
+   task = find_task_by_pid(event-sigev_notify_thread_id);
+
+   if (!task || task-tgid != current-tgid)
+   task = NULL;
+   }
+
+   return task;
+}
+
 /*
  * kill_pgrp_info() sends a signal to a process group: this is what the tty
  * control characters do (^C, ^Z etc)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm][AIO] Fix AIO completion signal notification possible ref leak

2007-02-05 Thread Sébastien Dugué

  Andrew,

  here is an incremental patch to fix a possible ref leak when users
do weird things with the -sigev_notify flags.

  Thanks Oleg for spotting this.

  Sébastien.


From: Sébastien Dugué [EMAIL PROTECTED]

 Fix AIO completion signal notification possible ref leak

  Make sure we only accept valid sigev_notify values in aio_setup_sigevent(),
namely SIGEV_NONE, SIGEV_THREAD_ID or SIGEV_SIGNAL.

 aio.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]


Index: linux-2.6.20-rc6-mm3/fs/aio.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c  2007-02-05 16:50:27.0 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c   2007-02-05 16:53:43.0 +0100
@@ -939,7 +939,7 @@ static int aio_send_signal(struct aio_no
info-si_uid = 0;
info-si_value = notify-value;
 
-   if (notify-notify  SIGEV_THREAD_ID)
+   if (notify-notify == SIGEV_THREAD_ID)
ret = send_sigqueue(notify-signo, sigq, notify-target);
else
ret = send_group_sigqueue(notify-signo, sigq, notify-target);
@@ -959,6 +959,10 @@ static long aio_setup_sigevent(struct ai
if (event.sigev_notify == SIGEV_NONE)
return 0;
 
+   if (event.sigev_notify != SIGEV_SIGNAL 
+   event.sigev_notify != SIGEV_THREAD_ID)
+   return -EINVAL;
+
notify-notify = event.sigev_notify;
notify-signo = event.sigev_signo;
notify-value = event.sigev_value;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 3/7][AIO] - Fix access_ok() checks

2007-02-01 Thread Sébastien Dugué

From: Sébastien Dugué <[EMAIL PROTECTED]>

   Fix access_ok() checks in sys_io_submit() and 
compat_sys_io_submit()


  sys_io_submit() and compat_sys_io_submit() check for the number of
requests (nr) being positive, but the following access_ok() call uses
nr * sizeof(ptr) which can overlow to a negative number.

  Just make sure that nr * sizeof(ptr) is positive in the first place.


 aio.c|2 +-
 compat.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc6-mm3/fs/aio.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c  2007-01-30 11:28:56.0 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c   2007-01-30 11:32:42.0 +0100
@@ -1630,7 +1630,7 @@ asmlinkage long sys_io_submit(aio_contex
long ret = 0;
int i;
 
-   if (unlikely(nr < 0))
+   if (unlikely((nr*sizeof(*iocbpp)) < 0))
return -EINVAL;
 
if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)
Index: linux-2.6.20-rc6-mm3/fs/compat.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/compat.c   2007-01-30 11:30:29.0 
+0100
+++ linux-2.6.20-rc6-mm3/fs/compat.c2007-01-30 11:31:55.0 +0100
@@ -651,7 +651,7 @@ compat_sys_io_submit(aio_context_t ctx_i
long ret = 0;
int i;
 
-   if (unlikely(nr < 0))
+   if (unlikely((nr * sizeof(u32)) < 0))
return -EINVAL;
 
if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 5/7][AIO] - Make __sigqueue_free() and __sigqueue_alloc() non static

2007-02-01 Thread Sébastien Dugué

From: Sébastien Dugué <[EMAIL PROTECTED]>

 Make __sigqueue_alloc() and __sigqueue_free() non static

  Allow subsystems to directly call into __sigqueue_alloc() and __sigqueue_free.
This is used by the AIO signal notification patch.


 include/linux/signal.h |3 +++
 kernel/signal.c|6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)

Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>


Index: linux-2.6.20-rc6-mm3/include/linux/signal.h
===
--- linux-2.6.20-rc6-mm3.orig/include/linux/signal.h2007-01-30 
11:41:36.0 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/signal.h 2007-01-30 11:41:39.0 
+0100
@@ -241,6 +241,9 @@ extern int sigprocmask(int, sigset_t *, 
 struct pt_regs;
 extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction 
*return_ka, struct pt_regs *regs, void *cookie);
 extern struct task_struct * sigevent_find_task(sigevent_t *);
+extern struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
+int override_rlimit);
+extern void __sigqueue_free(struct sigqueue *q);
 
 extern struct kmem_cache *sighand_cachep;
 
Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c   2007-01-30 11:41:36.0 
+0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c2007-01-30 11:41:39.0 
+0100
@@ -268,8 +268,8 @@ next_signal(struct sigpending *pending, 
return sig;
 }
 
-static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
-int override_rlimit)
+struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
+ int override_rlimit)
 {
struct sigqueue *q = NULL;
struct user_struct *user;
@@ -295,7 +295,7 @@ static struct sigqueue *__sigqueue_alloc
return(q);
 }
 
-static void __sigqueue_free(struct sigqueue *q)
+void __sigqueue_free(struct sigqueue *q)
 {
if (q->flags & SIGQUEUE_PREALLOC)
return;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 2/7][AIO] - fix aio.h includes

2007-02-01 Thread Sébastien Dugué

From: Sébastien Dugué <[EMAIL PROTECTED]>

  Fix the double inclusion of linux/uio.h in linux/aio.h



 aio.h |1 -
 1 file changed, 1 deletion(-)

Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc6-mm3/include/linux/aio.h
===
--- linux-2.6.20-rc6-mm3.orig/include/linux/aio.h   2007-01-30 
10:05:08.0 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/aio.h2007-01-30 10:11:22.0 
+0100
@@ -7,7 +7,6 @@
 #include 
 
 #include 
-#include 
 
 #define AIO_MAXSEGS4
 #define AIO_KIOGRP_NR_ATOMIC   8
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 7/7][AIO] - Add listio syscall support

2007-02-01 Thread Sébastien Dugué

From: Bharata B Rao <[EMAIL PROTECTED]>

This patch provides POSIX listio support by means of a new system call.

long lio_submit(aio_context_t ctx_id, int mode, long nr,
struct iocb __user * __user *iocbpp, struct sigevent __user *event)

This system call is similar to the io_submit() system call, but takes
two more arguments.

'mode' argument can be LIO_WAIT or LIO_NOWAIT.

'event' argument specifies the signal notification mechanism.

This patch is built on support provided by the aio signal notification
patch by Sebastien. The following two structures together provide
the support for grouping iocbs belonging to a list (lio).

struct aio_notify {
struct task_struct  *target;
__u16   signo;
__u16   notify;
sigval_tvalue;
struct sigqueue *sigq;
};

struct lio_event {
atomic_tlio_users;
struct aio_notify   lio_notify;
};

A single lio_event struct is maintained for the list of iocbs.
lio_users holds the number of requests attached to this lio and lio_notify
has the necessary information for generating completion notification
signal.

If the mode is LIO_WAIT, the event argument is ignored and the system
call waits until all the requests of the lio are completed.

If the mode is LIO_NOWAIT, the system call returns immediately after
submitting the io requests and may optionally notify the process on
list io completion depending on the event argument.

Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>
Signed-off-by: Laurent Vivier <[EMAIL PROTECTED]>
Signed-off-by: Bharata B Rao <[EMAIL PROTECTED]>
---

 arch/i386/kernel/syscall_table.S |1 
 arch/x86_64/ia32/ia32entry.S |5 -
 fs/aio.c |  193 ++-
 fs/compat.c  |  125 +
 include/asm-i386/unistd.h|3 
 include/asm-x86_64/unistd.h  |4 
 include/linux/aio.h  |   14 ++
 include/linux/aio_abi.h  |5 +
 include/linux/syscalls.h |2 
 9 files changed, 307 insertions(+), 45 deletions(-)

Index: linux-2.6.20-rc6-mm3/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.20-rc6-mm3.orig/arch/i386/kernel/syscall_table.S  2007-01-31 
15:31:59.0 +0100
+++ linux-2.6.20-rc6-mm3/arch/i386/kernel/syscall_table.S   2007-01-31 
15:58:55.0 +0100
@@ -320,3 +320,4 @@ ENTRY(sys_call_table)
.long sys_getcpu
.long sys_epoll_pwait
.long sys_lutimesat /* 320 */
+   .long sys_lio_submit
Index: linux-2.6.20-rc6-mm3/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.20-rc6-mm3.orig/arch/x86_64/ia32/ia32entry.S  2007-01-31 
15:31:59.0 +0100
+++ linux-2.6.20-rc6-mm3/arch/x86_64/ia32/ia32entry.S   2007-01-31 
15:58:55.0 +0100
@@ -714,8 +714,11 @@ ia32_sys_call_table:
.quad compat_sys_get_robust_list
.quad sys_splice
.quad sys_sync_file_range
-   .quad sys_tee
+   .quad sys_tee   /* 315 */
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
.quad sys_getcpu
+   .quad quiet_ni_syscall  /* sys_epoll_wait */
+   .quad quiet_ni_syscall  /* 320: sys_lutimesat */
+   .quad compat_sys_lio_submit
 ia32_syscall_end:  
Index: linux-2.6.20-rc6-mm3/fs/aio.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c  2007-01-31 15:31:59.0 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c   2007-01-31 16:00:57.0 +0100
@@ -416,6 +416,7 @@ static struct kiocb fastcall *__aio_get_
req->ki_ctx = ctx;
req->ki_cancel = NULL;
req->ki_retry = NULL;
+   req->ki_lio = NULL;
req->ki_dtor = NULL;
req->private = NULL;
req->ki_iovec = NULL;
@@ -995,6 +996,72 @@ out_unlock:
return -EINVAL;
 }
 
+/*
+ * lio_notify
+ * When all iocbs belonging to an lio are complete, this initiates
+ * the completion notification for the lio.
+ *
+ * NOTE: lio is freed here after the last iocb completes, but only
+ * for LIO_NOWAIT case. In case of LIO_WAIT, the submitting thread
+ * waits for iocbs to complete and later frees the lio.
+ */
+void lio_notify(struct lio_event *lio)
+{
+   int ret;
+
+   ret = atomic_dec_and_test(>lio_users);
+
+   if (unlikely(ret) && lio->lio_notify.notify != SIGEV_NONE) {
+   /* last one -> notify process */
+   if (aio_send_signal(>lio_notify))
+   __sigqueue_free(lio->lio_notify.sigq);
+   kfree(lio);
+   }
+}
+
+/*
+ * lio_create
+ * Allocates a lio_event structure which groups the iocbs belonging
+ * to the lio and sets up the c

[PATCH -mm 4/7][AIO] - Make good_sigevent() non-static

2007-02-01 Thread Sébastien Dugué

From: Sébastien Dugué <[EMAIL PROTECTED]>

Make good_sigevent() non-static and rename it


  Move good_sigevent() from posix-timers.c to signal.c where it belongs,
clean it up, rename it to sigevent_find_task() to better describe what the
function does and make it non-static so that it can be used by other
 subsystems.


 include/linux/signal.h |1 +
 kernel/posix-timers.c  |   19 +--
 kernel/signal.c|   24 
 3 files changed, 26 insertions(+), 18 deletions(-)

Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>


Index: linux-2.6.20-rc6-mm3/include/linux/signal.h
===
--- linux-2.6.20-rc6-mm3.orig/include/linux/signal.h2007-01-30 
11:41:27.0 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/signal.h 2007-01-30 11:41:36.0 
+0100
@@ -240,6 +240,7 @@ extern int sigprocmask(int, sigset_t *, 
 
 struct pt_regs;
 extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction 
*return_ka, struct pt_regs *regs, void *cookie);
+extern struct task_struct * sigevent_find_task(sigevent_t *);
 
 extern struct kmem_cache *sighand_cachep;
 
Index: linux-2.6.20-rc6-mm3/kernel/posix-timers.c
===
--- linux-2.6.20-rc6-mm3.orig/kernel/posix-timers.c 2007-01-30 
11:41:27.0 +0100
+++ linux-2.6.20-rc6-mm3/kernel/posix-timers.c  2007-01-30 11:41:36.0 
+0100
@@ -367,23 +367,6 @@ static enum hrtimer_restart posix_timer_
return ret;
 }
 
-static struct task_struct * good_sigevent(sigevent_t * event)
-{
-   struct task_struct *rtn = current->group_leader;
-
-   if ((event->sigev_notify & SIGEV_THREAD_ID ) &&
-   (!(rtn = find_task_by_pid(event->sigev_notify_thread_id)) ||
-rtn->tgid != current->tgid ||
-(event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL))
-   return NULL;
-
-   if (((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) &&
-   ((event->sigev_signo <= 0) || (event->sigev_signo > SIGRTMAX)))
-   return NULL;
-
-   return rtn;
-}
-
 void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock)
 {
if ((unsigned) clock_id >= MAX_CLOCKS) {
@@ -496,7 +479,7 @@ sys_timer_create(const clockid_t which_c
new_timer->it_sigev_value = event.sigev_value;
 
read_lock(_lock);
-   if ((process = good_sigevent())) {
+   if ((process = sigevent_find_task())) {
/*
 * We may be setting up this process for another
 * thread.  It may be exiting.  To catch this
Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c   2007-01-30 11:41:27.0 
+0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c2007-01-30 11:41:36.0 
+0100
@@ -1213,6 +1213,30 @@ int group_send_sig_info(int sig, struct 
return ret;
 }
 
+/***
+ * sigevent_find_task - check and get target task from a sigevent.
+ * @event: the sigevent to be checked
+ *
+ * This function must be called with the tasklist_lock held for reading.
+ */
+struct task_struct * sigevent_find_task(sigevent_t * event)
+{
+   struct task_struct *task = NULL;
+
+   if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
+   return NULL;
+
+   if ((event->sigev_notify & SIGEV_THREAD_ID ) == SIGEV_THREAD_ID) {
+   task = find_task_by_pid(event->sigev_notify_thread_id);
+
+   if (!task || task->tgid != current->tgid)
+   task = NULL;
+   } else if (event->sigev_notify == SIGEV_SIGNAL)
+   task = current->group_leader;
+
+   return task;
+}
+
 /*
  * kill_pgrp_info() sends a signal to a process group: this is what the tty
  * control characters do (^C, ^Z etc)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 1/7][AIO] - Rework compat_sys_io_submit

2007-02-01 Thread Sébastien Dugué

From: Sébastien Dugué <[EMAIL PROTECTED]>

 compat_sys_io_submit() cleanup


  Cleanup compat_sys_io_submit by duplicating some of the native syscall
logic in the compat layer and directly calling io_submit_one() instead
of fooling the syscall into thinking it is called from a native 64-bit
caller.

  This eliminates:

  - the overhead of copying the nr iocb pointers on the userspace stack

  - the PAGE_SIZE/(sizeof(void *)) limit on the number of iocbs that
can be submitted.

  This is also needed for the completion notification patch to avoid having
to rewrite each iocb on the caller stack for io_submit_one() to find the
sigevents.


 compat.c |   61 ++---
 1 file changed, 34 insertions(+), 27 deletions(-)

Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc6-mm3/fs/compat.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/compat.c   2007-01-30 10:05:08.0 
+0100
+++ linux-2.6.20-rc6-mm3/fs/compat.c2007-01-30 11:30:29.0 +0100
@@ -644,40 +644,47 @@ out:
return ret;
 }
 
-static inline long
-copy_iocb(long nr, u32 __user *ptr32, struct iocb __user * __user *ptr64)
-{
-   compat_uptr_t uptr;
-   int i;
-
-   for (i = 0; i < nr; ++i) {
-   if (get_user(uptr, ptr32 + i))
-   return -EFAULT;
-   if (put_user(compat_ptr(uptr), ptr64 + i))
-   return -EFAULT;
-   }
-   return 0;
-}
-
-#define MAX_AIO_SUBMITS(PAGE_SIZE/sizeof(struct iocb *))
-
 asmlinkage long
 compat_sys_io_submit(aio_context_t ctx_id, int nr, u32 __user *iocb)
 {
-   struct iocb __user * __user *iocb64; 
-   long ret;
+   struct kioctx *ctx;
+   long ret = 0;
+   int i;
 
if (unlikely(nr < 0))
return -EINVAL;
 
-   if (nr > MAX_AIO_SUBMITS)
-   nr = MAX_AIO_SUBMITS;
-   
-   iocb64 = compat_alloc_user_space(nr * sizeof(*iocb64));
-   ret = copy_iocb(nr, iocb, iocb64);
-   if (!ret)
-   ret = sys_io_submit(ctx_id, nr, iocb64);
-   return ret;
+   if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)
+   return -EFAULT;
+
+   ctx = lookup_ioctx(ctx_id);
+   if (unlikely(!ctx))
+   return -EINVAL;
+
+   for (i=0; ihttp://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 0/7][AIO] - AIO completion signal notification v6

2007-02-01 Thread Sébastien Dugué

  Hi Andrew,

  here is the latest rework of the aio notification and listio support
patches with comments from Oleg Nesterov and you addressed.

  Sébastien.

  This set now consists in 7 patches:

1. rework-compat-sys-io-submit: cleanup the sys_io_submit() compat 
layer,
   making it more efficient and laying out the base for the following 
patches

2. aio-header-fix-includes: fixes the double inclusion of uio.h in aio.h

3. aio-fix-access_ok-check: fixes the access_ok() checks

4. make-good_sigevent-non-static: move good_sigevent into signal.c, 
rename
   it to something more meaningfull and make it non-static

5. make-__sigqueue_alloc-free-non-static: make __sigqueue_alloc() and
   __sigqueue_free() non static

6. aio-notify-sig: the AIO completion signal notification

7. add-listio-syscall-support: adds listio support via a new syscall

  Description are in the individual patches.


  Changes from v5: based on comments from Oleg Nesterov and Andrew Morton

- Fix for signo range not checked in the SIGEV_THREAD_ID case in
  good_sigevent()
- renamed good_sigevent() to sigevent_find_task()

- Fixed the access_ok checks in sys_io_submit() and
  compat_sys_io_submit()

- Made __sigqueue_alloc() and __sigqueue_free() non static
- Reworked the aio code to directly use __sigqueue_alloc() and
  __sigqueue_free() and avoid messing with SIGQUEUE_PREALLOC

- Removed the unneeded PF_EXITING check in aio_setup_sigevent()
- Added a comment in aio_setup_sigevent() for why there is no task
  ref leak
- Changed read_lock(_lock) to rcu_read_lock() in
  aio_setup_sigevent()

- Fixed an ioctx reference leak in compat_sys_lio_submit()
- Renamed lio_check() as lio_notify()
- Added comments to lio_notify() and lio_create()
- Replaced sigqueue_free() by __sigqueue_free() in lio_notify()

  Changes from v4:
No comments received for v4, so it must be perfect ;-)
replaced the IOCB_CMD_GROUP listio implementation with Bharata's
syscall approach which I find much cleaner.

  Changes from v3:

- added justification for the compat_sys_io_submit() cleanup - Zach 
Brown
- more cleanups in compat_sys_io_submit() to put it in line with
  sys_io_submit() - Zach Brown

- Changed "Export good_sigevent()" patch name to "Make good_sigevent()
  non-static" to better describe what it does. - Christoph Hellwig
- Reworked good_sigevent() to make it more readable. - Christoph Hellwig

- Simplified the use of the SIGEV_* constants in signal notification -
  Christoph Hellwig
- Take a reference on the target task both for the SIGEV_THREAD_ID and
  SIGEV_SIGNAL cases.

  Changes from v2:
- rebased to 2.6.19-rc6-mm2
- reworked the sys_io_submit() compat layer as suggested by Zach Brown
- fixed saving of a pointer to a task struct in aio-notify-sig as
  pointed out by Andrew Morton

  Changes from v1:
- cleanups suggested by Christoph Hellwig, Badari Pulavarty and
Zach Brown
- added lisio patch


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 6/7][AIO] - AIO completion signal notification

2007-02-01 Thread Sébastien Dugué

From: Sébastien Dugué <[EMAIL PROTECTED]>

  AIO completion signal notification

  The current 2.6 kernel does not support notification of user space via
an RT signal upon an asynchronous IO completion. The POSIX specification
states that when an AIO request completes, a signal can be delivered to
the application as notification.

  This patch adds a struct sigevent *aio_sigeventp to the iocb.
The relevant fields (pid, signal number and value) are stored in the kiocb
for use when the request completes.

  That sigevent structure is filled by the application as part of the AIO
request preparation. Upon request completion, the kernel notifies the
application using those sigevent parameters. If SIGEV_NONE has been specified,
then the old behaviour is retained and the application must rely on polling
the completion queue using io_getevents().


  A struct sigevent *aio_sigeventp field is added to struct iocb in
include/linux/aio_abi.h

  A struct aio_notify containing the sigevent parameters is defined in aio.h:

  struct aio_notify {
struct task_struct  *target;
__u16   signo;
__u16   notify;
sigval_tvalue;
  };

  A struct aio_notify ki_notify is added to struct kiocb in include/linux/aio.h

  In io_submit_one(), if the application provided a sigevent then
aio_setup_sigevent() is called which does the following:

- check access to the user sigevent and make a local copy

- if the requested notification is SIGEV_NONE, then nothing to do

- fill in the kiocb->ki_notify fields (notify, signo, value)

- check sigevent consistency, get the signal target task and
  save it in kiocb->ki_notify.target

- preallocate a sigqueue for this event using __sigqueue_alloc()

  Upon request completion, in aio_complete(), if notification is needed for
this request (iocb->ki_notify.notify != SIGEV_NONE), then aio_send_signal()
is called to signal the target task as follows:

- fill in the siginfo struct to be sent to the task

- if notify is SIGEV_THREAD_ID then send signal to specific task
  using send_sigqueue()

- else send signal to task group using send_5group_sigqueue()



Notes concerning sigqueue preallocation:

 To ensure reliable delivery of completion notification, the sigqueue is
preallocated in the submission path so that there is no chance it can fail
in the completion path.

 Unlike the posix-timers case (currently the single other user of sigqueue
preallocation), where the sigqueue is allocated for the lifetime of the timer
and freed at timer destruction time, the aio case is a bit more tricky due to
the async nature of the whole thing.

  In the aio case, the sigqueue exists for the lifetime of the request, 
therefore
it must be freed only once the signal for the request completion has been
delivered.  The sigqueue is therefore freed by the signal delivery mechanism
when the signal is collected. The aio submission and completion error paths
do an explicit __sigqueue_free() to free the sigqueue.


 fs/aio.c|  114 ++--
 fs/compat.c |   18 +++
 include/linux/aio.h |   12 +
 include/linux/aio_abi.h |3 -
 kernel/signal.c |4 -
 5 files changed, 144 insertions(+), 7 deletions(-)

Signed-off-by: Sébastien Dugué <[EMAIL PROTECTED]>
Signed-off-by: Laurent Vivier <[EMAIL PROTECTED]>

Index: linux-2.6.20-rc6-mm3/fs/aio.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c  2007-01-30 11:32:42.0 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c   2007-01-30 11:41:41.0 +0100
@@ -419,6 +419,7 @@ static struct kiocb fastcall *__aio_get_
req->ki_dtor = NULL;
req->private = NULL;
req->ki_iovec = NULL;
+   req->ki_notify.sigq = NULL;
INIT_LIST_HEAD(>ki_run_list);
 
/* Check if the completion queue has enough free space to
@@ -465,6 +466,12 @@ static inline void really_put_req(struct
req->ki_dtor(req);
if (req->ki_iovec != >ki_inline_vec)
kfree(req->ki_iovec);
+
+   /* Release task ref */
+   if (req->ki_notify.notify == SIGEV_THREAD_ID ||
+   req->ki_notify.notify == SIGEV_SIGNAL)
+   put_task_struct(req->ki_notify.target);
+
kmem_cache_free(kiocb_cachep, req);
ctx->reqs_active--;
 
@@ -916,6 +923,78 @@ void fastcall kick_iocb(struct kiocb *io
 }
 EXPORT_SYMBOL(kick_iocb);
 
+static int aio_send_signal(struct aio_notify *notify)
+{
+   struct sigqueue *sigq = notify->sigq;
+   struct siginfo *info = >info;
+   int ret;
+
+   memset(info, 0, sizeof(struct siginfo));
+
+   info->si_signo = notify->signo;
+   info->si_errno = 0;
+   info->si_code = SI

[PATCH -mm 0/7][AIO] - AIO completion signal notification v6

2007-02-01 Thread Sébastien Dugué

  Hi Andrew,

  here is the latest rework of the aio notification and listio support
patches with comments from Oleg Nesterov and you addressed.

  Sébastien.

  This set now consists in 7 patches:

1. rework-compat-sys-io-submit: cleanup the sys_io_submit() compat 
layer,
   making it more efficient and laying out the base for the following 
patches

2. aio-header-fix-includes: fixes the double inclusion of uio.h in aio.h

3. aio-fix-access_ok-check: fixes the access_ok() checks

4. make-good_sigevent-non-static: move good_sigevent into signal.c, 
rename
   it to something more meaningfull and make it non-static

5. make-__sigqueue_alloc-free-non-static: make __sigqueue_alloc() and
   __sigqueue_free() non static

6. aio-notify-sig: the AIO completion signal notification

7. add-listio-syscall-support: adds listio support via a new syscall

  Description are in the individual patches.


  Changes from v5: based on comments from Oleg Nesterov and Andrew Morton

- Fix for signo range not checked in the SIGEV_THREAD_ID case in
  good_sigevent()
- renamed good_sigevent() to sigevent_find_task()

- Fixed the access_ok checks in sys_io_submit() and
  compat_sys_io_submit()

- Made __sigqueue_alloc() and __sigqueue_free() non static
- Reworked the aio code to directly use __sigqueue_alloc() and
  __sigqueue_free() and avoid messing with SIGQUEUE_PREALLOC

- Removed the unneeded PF_EXITING check in aio_setup_sigevent()
- Added a comment in aio_setup_sigevent() for why there is no task
  ref leak
- Changed read_lock(tasklist_lock) to rcu_read_lock() in
  aio_setup_sigevent()

- Fixed an ioctx reference leak in compat_sys_lio_submit()
- Renamed lio_check() as lio_notify()
- Added comments to lio_notify() and lio_create()
- Replaced sigqueue_free() by __sigqueue_free() in lio_notify()

  Changes from v4:
No comments received for v4, so it must be perfect ;-)
replaced the IOCB_CMD_GROUP listio implementation with Bharata's
syscall approach which I find much cleaner.

  Changes from v3:

- added justification for the compat_sys_io_submit() cleanup - Zach 
Brown
- more cleanups in compat_sys_io_submit() to put it in line with
  sys_io_submit() - Zach Brown

- Changed Export good_sigevent() patch name to Make good_sigevent()
  non-static to better describe what it does. - Christoph Hellwig
- Reworked good_sigevent() to make it more readable. - Christoph Hellwig

- Simplified the use of the SIGEV_* constants in signal notification -
  Christoph Hellwig
- Take a reference on the target task both for the SIGEV_THREAD_ID and
  SIGEV_SIGNAL cases.

  Changes from v2:
- rebased to 2.6.19-rc6-mm2
- reworked the sys_io_submit() compat layer as suggested by Zach Brown
- fixed saving of a pointer to a task struct in aio-notify-sig as
  pointed out by Andrew Morton

  Changes from v1:
- cleanups suggested by Christoph Hellwig, Badari Pulavarty and
Zach Brown
- added lisio patch


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 4/7][AIO] - Make good_sigevent() non-static

2007-02-01 Thread Sébastien Dugué

From: Sébastien Dugué [EMAIL PROTECTED]

Make good_sigevent() non-static and rename it


  Move good_sigevent() from posix-timers.c to signal.c where it belongs,
clean it up, rename it to sigevent_find_task() to better describe what the
function does and make it non-static so that it can be used by other
 subsystems.


 include/linux/signal.h |1 +
 kernel/posix-timers.c  |   19 +--
 kernel/signal.c|   24 
 3 files changed, 26 insertions(+), 18 deletions(-)

Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]


Index: linux-2.6.20-rc6-mm3/include/linux/signal.h
===
--- linux-2.6.20-rc6-mm3.orig/include/linux/signal.h2007-01-30 
11:41:27.0 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/signal.h 2007-01-30 11:41:36.0 
+0100
@@ -240,6 +240,7 @@ extern int sigprocmask(int, sigset_t *, 
 
 struct pt_regs;
 extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction 
*return_ka, struct pt_regs *regs, void *cookie);
+extern struct task_struct * sigevent_find_task(sigevent_t *);
 
 extern struct kmem_cache *sighand_cachep;
 
Index: linux-2.6.20-rc6-mm3/kernel/posix-timers.c
===
--- linux-2.6.20-rc6-mm3.orig/kernel/posix-timers.c 2007-01-30 
11:41:27.0 +0100
+++ linux-2.6.20-rc6-mm3/kernel/posix-timers.c  2007-01-30 11:41:36.0 
+0100
@@ -367,23 +367,6 @@ static enum hrtimer_restart posix_timer_
return ret;
 }
 
-static struct task_struct * good_sigevent(sigevent_t * event)
-{
-   struct task_struct *rtn = current-group_leader;
-
-   if ((event-sigev_notify  SIGEV_THREAD_ID ) 
-   (!(rtn = find_task_by_pid(event-sigev_notify_thread_id)) ||
-rtn-tgid != current-tgid ||
-(event-sigev_notify  ~SIGEV_THREAD_ID) != SIGEV_SIGNAL))
-   return NULL;
-
-   if (((event-sigev_notify  ~SIGEV_THREAD_ID) != SIGEV_NONE) 
-   ((event-sigev_signo = 0) || (event-sigev_signo  SIGRTMAX)))
-   return NULL;
-
-   return rtn;
-}
-
 void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock)
 {
if ((unsigned) clock_id = MAX_CLOCKS) {
@@ -496,7 +479,7 @@ sys_timer_create(const clockid_t which_c
new_timer-it_sigev_value = event.sigev_value;
 
read_lock(tasklist_lock);
-   if ((process = good_sigevent(event))) {
+   if ((process = sigevent_find_task(event))) {
/*
 * We may be setting up this process for another
 * thread.  It may be exiting.  To catch this
Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c   2007-01-30 11:41:27.0 
+0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c2007-01-30 11:41:36.0 
+0100
@@ -1213,6 +1213,30 @@ int group_send_sig_info(int sig, struct 
return ret;
 }
 
+/***
+ * sigevent_find_task - check and get target task from a sigevent.
+ * @event: the sigevent to be checked
+ *
+ * This function must be called with the tasklist_lock held for reading.
+ */
+struct task_struct * sigevent_find_task(sigevent_t * event)
+{
+   struct task_struct *task = NULL;
+
+   if (event-sigev_signo = 0 || event-sigev_signo  SIGRTMAX)
+   return NULL;
+
+   if ((event-sigev_notify  SIGEV_THREAD_ID ) == SIGEV_THREAD_ID) {
+   task = find_task_by_pid(event-sigev_notify_thread_id);
+
+   if (!task || task-tgid != current-tgid)
+   task = NULL;
+   } else if (event-sigev_notify == SIGEV_SIGNAL)
+   task = current-group_leader;
+
+   return task;
+}
+
 /*
  * kill_pgrp_info() sends a signal to a process group: this is what the tty
  * control characters do (^C, ^Z etc)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 1/7][AIO] - Rework compat_sys_io_submit

2007-02-01 Thread Sébastien Dugué

From: Sébastien Dugué [EMAIL PROTECTED]

 compat_sys_io_submit() cleanup


  Cleanup compat_sys_io_submit by duplicating some of the native syscall
logic in the compat layer and directly calling io_submit_one() instead
of fooling the syscall into thinking it is called from a native 64-bit
caller.

  This eliminates:

  - the overhead of copying the nr iocb pointers on the userspace stack

  - the PAGE_SIZE/(sizeof(void *)) limit on the number of iocbs that
can be submitted.

  This is also needed for the completion notification patch to avoid having
to rewrite each iocb on the caller stack for io_submit_one() to find the
sigevents.


 compat.c |   61 ++---
 1 file changed, 34 insertions(+), 27 deletions(-)

Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]

Index: linux-2.6.20-rc6-mm3/fs/compat.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/compat.c   2007-01-30 10:05:08.0 
+0100
+++ linux-2.6.20-rc6-mm3/fs/compat.c2007-01-30 11:30:29.0 +0100
@@ -644,40 +644,47 @@ out:
return ret;
 }
 
-static inline long
-copy_iocb(long nr, u32 __user *ptr32, struct iocb __user * __user *ptr64)
-{
-   compat_uptr_t uptr;
-   int i;
-
-   for (i = 0; i  nr; ++i) {
-   if (get_user(uptr, ptr32 + i))
-   return -EFAULT;
-   if (put_user(compat_ptr(uptr), ptr64 + i))
-   return -EFAULT;
-   }
-   return 0;
-}
-
-#define MAX_AIO_SUBMITS(PAGE_SIZE/sizeof(struct iocb *))
-
 asmlinkage long
 compat_sys_io_submit(aio_context_t ctx_id, int nr, u32 __user *iocb)
 {
-   struct iocb __user * __user *iocb64; 
-   long ret;
+   struct kioctx *ctx;
+   long ret = 0;
+   int i;
 
if (unlikely(nr  0))
return -EINVAL;
 
-   if (nr  MAX_AIO_SUBMITS)
-   nr = MAX_AIO_SUBMITS;
-   
-   iocb64 = compat_alloc_user_space(nr * sizeof(*iocb64));
-   ret = copy_iocb(nr, iocb, iocb64);
-   if (!ret)
-   ret = sys_io_submit(ctx_id, nr, iocb64);
-   return ret;
+   if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)
+   return -EFAULT;
+
+   ctx = lookup_ioctx(ctx_id);
+   if (unlikely(!ctx))
+   return -EINVAL;
+
+   for (i=0; inr; i++) {
+   compat_uptr_t uptr;
+   struct iocb __user *user_iocb;
+   struct iocb tmp;
+
+   if (unlikely(get_user(uptr, iocb + i))) {
+   ret = -EFAULT;
+   break;
+   }
+
+   user_iocb = compat_ptr(uptr);
+
+   if (unlikely(copy_from_user(tmp, user_iocb, sizeof(tmp {
+   ret = -EFAULT;
+   break;
+   }
+
+   ret = io_submit_one(ctx, user_iocb, tmp);
+   if (ret)
+   break;
+   }
+
+   put_ioctx(ctx);
+   return i ? i: ret;
 }
 
 struct compat_ncp_mount_data {
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 7/7][AIO] - Add listio syscall support

2007-02-01 Thread Sébastien Dugué

From: Bharata B Rao [EMAIL PROTECTED]

This patch provides POSIX listio support by means of a new system call.

long lio_submit(aio_context_t ctx_id, int mode, long nr,
struct iocb __user * __user *iocbpp, struct sigevent __user *event)

This system call is similar to the io_submit() system call, but takes
two more arguments.

'mode' argument can be LIO_WAIT or LIO_NOWAIT.

'event' argument specifies the signal notification mechanism.

This patch is built on support provided by the aio signal notification
patch by Sebastien. The following two structures together provide
the support for grouping iocbs belonging to a list (lio).

struct aio_notify {
struct task_struct  *target;
__u16   signo;
__u16   notify;
sigval_tvalue;
struct sigqueue *sigq;
};

struct lio_event {
atomic_tlio_users;
struct aio_notify   lio_notify;
};

A single lio_event struct is maintained for the list of iocbs.
lio_users holds the number of requests attached to this lio and lio_notify
has the necessary information for generating completion notification
signal.

If the mode is LIO_WAIT, the event argument is ignored and the system
call waits until all the requests of the lio are completed.

If the mode is LIO_NOWAIT, the system call returns immediately after
submitting the io requests and may optionally notify the process on
list io completion depending on the event argument.

Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]
Signed-off-by: Laurent Vivier [EMAIL PROTECTED]
Signed-off-by: Bharata B Rao [EMAIL PROTECTED]
---

 arch/i386/kernel/syscall_table.S |1 
 arch/x86_64/ia32/ia32entry.S |5 -
 fs/aio.c |  193 ++-
 fs/compat.c  |  125 +
 include/asm-i386/unistd.h|3 
 include/asm-x86_64/unistd.h  |4 
 include/linux/aio.h  |   14 ++
 include/linux/aio_abi.h  |5 +
 include/linux/syscalls.h |2 
 9 files changed, 307 insertions(+), 45 deletions(-)

Index: linux-2.6.20-rc6-mm3/arch/i386/kernel/syscall_table.S
===
--- linux-2.6.20-rc6-mm3.orig/arch/i386/kernel/syscall_table.S  2007-01-31 
15:31:59.0 +0100
+++ linux-2.6.20-rc6-mm3/arch/i386/kernel/syscall_table.S   2007-01-31 
15:58:55.0 +0100
@@ -320,3 +320,4 @@ ENTRY(sys_call_table)
.long sys_getcpu
.long sys_epoll_pwait
.long sys_lutimesat /* 320 */
+   .long sys_lio_submit
Index: linux-2.6.20-rc6-mm3/arch/x86_64/ia32/ia32entry.S
===
--- linux-2.6.20-rc6-mm3.orig/arch/x86_64/ia32/ia32entry.S  2007-01-31 
15:31:59.0 +0100
+++ linux-2.6.20-rc6-mm3/arch/x86_64/ia32/ia32entry.S   2007-01-31 
15:58:55.0 +0100
@@ -714,8 +714,11 @@ ia32_sys_call_table:
.quad compat_sys_get_robust_list
.quad sys_splice
.quad sys_sync_file_range
-   .quad sys_tee
+   .quad sys_tee   /* 315 */
.quad compat_sys_vmsplice
.quad compat_sys_move_pages
.quad sys_getcpu
+   .quad quiet_ni_syscall  /* sys_epoll_wait */
+   .quad quiet_ni_syscall  /* 320: sys_lutimesat */
+   .quad compat_sys_lio_submit
 ia32_syscall_end:  
Index: linux-2.6.20-rc6-mm3/fs/aio.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c  2007-01-31 15:31:59.0 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c   2007-01-31 16:00:57.0 +0100
@@ -416,6 +416,7 @@ static struct kiocb fastcall *__aio_get_
req-ki_ctx = ctx;
req-ki_cancel = NULL;
req-ki_retry = NULL;
+   req-ki_lio = NULL;
req-ki_dtor = NULL;
req-private = NULL;
req-ki_iovec = NULL;
@@ -995,6 +996,72 @@ out_unlock:
return -EINVAL;
 }
 
+/*
+ * lio_notify
+ * When all iocbs belonging to an lio are complete, this initiates
+ * the completion notification for the lio.
+ *
+ * NOTE: lio is freed here after the last iocb completes, but only
+ * for LIO_NOWAIT case. In case of LIO_WAIT, the submitting thread
+ * waits for iocbs to complete and later frees the lio.
+ */
+void lio_notify(struct lio_event *lio)
+{
+   int ret;
+
+   ret = atomic_dec_and_test(lio-lio_users);
+
+   if (unlikely(ret)  lio-lio_notify.notify != SIGEV_NONE) {
+   /* last one - notify process */
+   if (aio_send_signal(lio-lio_notify))
+   __sigqueue_free(lio-lio_notify.sigq);
+   kfree(lio);
+   }
+}
+
+/*
+ * lio_create
+ * Allocates a lio_event structure which groups the iocbs belonging
+ * to the lio and sets up the completion notification.
+ *
+ * Case 1: LIO_NOWAIT and NULL sigevent - No need to group

[PATCH -mm 6/7][AIO] - AIO completion signal notification

2007-02-01 Thread Sébastien Dugué

From: Sébastien Dugué [EMAIL PROTECTED]

  AIO completion signal notification

  The current 2.6 kernel does not support notification of user space via
an RT signal upon an asynchronous IO completion. The POSIX specification
states that when an AIO request completes, a signal can be delivered to
the application as notification.

  This patch adds a struct sigevent *aio_sigeventp to the iocb.
The relevant fields (pid, signal number and value) are stored in the kiocb
for use when the request completes.

  That sigevent structure is filled by the application as part of the AIO
request preparation. Upon request completion, the kernel notifies the
application using those sigevent parameters. If SIGEV_NONE has been specified,
then the old behaviour is retained and the application must rely on polling
the completion queue using io_getevents().


  A struct sigevent *aio_sigeventp field is added to struct iocb in
include/linux/aio_abi.h

  A struct aio_notify containing the sigevent parameters is defined in aio.h:

  struct aio_notify {
struct task_struct  *target;
__u16   signo;
__u16   notify;
sigval_tvalue;
  };

  A struct aio_notify ki_notify is added to struct kiocb in include/linux/aio.h

  In io_submit_one(), if the application provided a sigevent then
aio_setup_sigevent() is called which does the following:

- check access to the user sigevent and make a local copy

- if the requested notification is SIGEV_NONE, then nothing to do

- fill in the kiocb-ki_notify fields (notify, signo, value)

- check sigevent consistency, get the signal target task and
  save it in kiocb-ki_notify.target

- preallocate a sigqueue for this event using __sigqueue_alloc()

  Upon request completion, in aio_complete(), if notification is needed for
this request (iocb-ki_notify.notify != SIGEV_NONE), then aio_send_signal()
is called to signal the target task as follows:

- fill in the siginfo struct to be sent to the task

- if notify is SIGEV_THREAD_ID then send signal to specific task
  using send_sigqueue()

- else send signal to task group using send_5group_sigqueue()



Notes concerning sigqueue preallocation:

 To ensure reliable delivery of completion notification, the sigqueue is
preallocated in the submission path so that there is no chance it can fail
in the completion path.

 Unlike the posix-timers case (currently the single other user of sigqueue
preallocation), where the sigqueue is allocated for the lifetime of the timer
and freed at timer destruction time, the aio case is a bit more tricky due to
the async nature of the whole thing.

  In the aio case, the sigqueue exists for the lifetime of the request, 
therefore
it must be freed only once the signal for the request completion has been
delivered.  The sigqueue is therefore freed by the signal delivery mechanism
when the signal is collected. The aio submission and completion error paths
do an explicit __sigqueue_free() to free the sigqueue.


 fs/aio.c|  114 ++--
 fs/compat.c |   18 +++
 include/linux/aio.h |   12 +
 include/linux/aio_abi.h |3 -
 kernel/signal.c |4 -
 5 files changed, 144 insertions(+), 7 deletions(-)

Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]
Signed-off-by: Laurent Vivier [EMAIL PROTECTED]

Index: linux-2.6.20-rc6-mm3/fs/aio.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c  2007-01-30 11:32:42.0 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c   2007-01-30 11:41:41.0 +0100
@@ -419,6 +419,7 @@ static struct kiocb fastcall *__aio_get_
req-ki_dtor = NULL;
req-private = NULL;
req-ki_iovec = NULL;
+   req-ki_notify.sigq = NULL;
INIT_LIST_HEAD(req-ki_run_list);
 
/* Check if the completion queue has enough free space to
@@ -465,6 +466,12 @@ static inline void really_put_req(struct
req-ki_dtor(req);
if (req-ki_iovec != req-ki_inline_vec)
kfree(req-ki_iovec);
+
+   /* Release task ref */
+   if (req-ki_notify.notify == SIGEV_THREAD_ID ||
+   req-ki_notify.notify == SIGEV_SIGNAL)
+   put_task_struct(req-ki_notify.target);
+
kmem_cache_free(kiocb_cachep, req);
ctx-reqs_active--;
 
@@ -916,6 +923,78 @@ void fastcall kick_iocb(struct kiocb *io
 }
 EXPORT_SYMBOL(kick_iocb);
 
+static int aio_send_signal(struct aio_notify *notify)
+{
+   struct sigqueue *sigq = notify-sigq;
+   struct siginfo *info = sigq-info;
+   int ret;
+
+   memset(info, 0, sizeof(struct siginfo));
+
+   info-si_signo = notify-signo;
+   info-si_errno = 0;
+   info-si_code = SI_ASYNCIO;
+   info-si_pid = 0;
+   info-si_uid = 0;
+   info-si_value = notify-value

[PATCH -mm 2/7][AIO] - fix aio.h includes

2007-02-01 Thread Sébastien Dugué

From: Sébastien Dugué [EMAIL PROTECTED]

  Fix the double inclusion of linux/uio.h in linux/aio.h



 aio.h |1 -
 1 file changed, 1 deletion(-)

Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]

Index: linux-2.6.20-rc6-mm3/include/linux/aio.h
===
--- linux-2.6.20-rc6-mm3.orig/include/linux/aio.h   2007-01-30 
10:05:08.0 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/aio.h2007-01-30 10:11:22.0 
+0100
@@ -7,7 +7,6 @@
 #include linux/uio.h
 
 #include asm/atomic.h
-#include linux/uio.h
 
 #define AIO_MAXSEGS4
 #define AIO_KIOGRP_NR_ATOMIC   8
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 5/7][AIO] - Make __sigqueue_free() and __sigqueue_alloc() non static

2007-02-01 Thread Sébastien Dugué

From: Sébastien Dugué [EMAIL PROTECTED]

 Make __sigqueue_alloc() and __sigqueue_free() non static

  Allow subsystems to directly call into __sigqueue_alloc() and __sigqueue_free.
This is used by the AIO signal notification patch.


 include/linux/signal.h |3 +++
 kernel/signal.c|6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)

Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]


Index: linux-2.6.20-rc6-mm3/include/linux/signal.h
===
--- linux-2.6.20-rc6-mm3.orig/include/linux/signal.h2007-01-30 
11:41:36.0 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/signal.h 2007-01-30 11:41:39.0 
+0100
@@ -241,6 +241,9 @@ extern int sigprocmask(int, sigset_t *, 
 struct pt_regs;
 extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction 
*return_ka, struct pt_regs *regs, void *cookie);
 extern struct task_struct * sigevent_find_task(sigevent_t *);
+extern struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
+int override_rlimit);
+extern void __sigqueue_free(struct sigqueue *q);
 
 extern struct kmem_cache *sighand_cachep;
 
Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c   2007-01-30 11:41:36.0 
+0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c2007-01-30 11:41:39.0 
+0100
@@ -268,8 +268,8 @@ next_signal(struct sigpending *pending, 
return sig;
 }
 
-static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
-int override_rlimit)
+struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
+ int override_rlimit)
 {
struct sigqueue *q = NULL;
struct user_struct *user;
@@ -295,7 +295,7 @@ static struct sigqueue *__sigqueue_alloc
return(q);
 }
 
-static void __sigqueue_free(struct sigqueue *q)
+void __sigqueue_free(struct sigqueue *q)
 {
if (q-flags  SIGQUEUE_PREALLOC)
return;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 3/7][AIO] - Fix access_ok() checks

2007-02-01 Thread Sébastien Dugué

From: Sébastien Dugué [EMAIL PROTECTED]

   Fix access_ok() checks in sys_io_submit() and 
compat_sys_io_submit()


  sys_io_submit() and compat_sys_io_submit() check for the number of
requests (nr) being positive, but the following access_ok() call uses
nr * sizeof(ptr) which can overlow to a negative number.

  Just make sure that nr * sizeof(ptr) is positive in the first place.


 aio.c|2 +-
 compat.c |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Signed-off-by: Sébastien Dugué [EMAIL PROTECTED]

Index: linux-2.6.20-rc6-mm3/fs/aio.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c  2007-01-30 11:28:56.0 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c   2007-01-30 11:32:42.0 +0100
@@ -1630,7 +1630,7 @@ asmlinkage long sys_io_submit(aio_contex
long ret = 0;
int i;
 
-   if (unlikely(nr  0))
+   if (unlikely((nr*sizeof(*iocbpp))  0))
return -EINVAL;
 
if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)
Index: linux-2.6.20-rc6-mm3/fs/compat.c
===
--- linux-2.6.20-rc6-mm3.orig/fs/compat.c   2007-01-30 11:30:29.0 
+0100
+++ linux-2.6.20-rc6-mm3/fs/compat.c2007-01-30 11:31:55.0 +0100
@@ -651,7 +651,7 @@ compat_sys_io_submit(aio_context_t ctx_i
long ret = 0;
int i;
 
-   if (unlikely(nr  0))
+   if (unlikely((nr * sizeof(u32))  0))
return -EINVAL;
 
if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: + aio-completion-signal-notification.patch added to -mm tree

2007-01-26 Thread Sébastien Dugué
On Fri, 26 Jan 2007 14:52:33 +0300 Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> On 01/26, S?bastien Dugu? wrote:
> >
> > On Thu, 25 Jan 2007 19:21:41 +0300 Oleg Nesterov <[EMAIL PROTECTED]> wrote:
> > 
> > > > +   target = good_sigevent();
> > > > +
> > > > +   if (unlikely(!target || (target->flags & PF_EXITING)))
> > > > +   goto out_unlock;
> > > 
> > > PF_EXITING check is racy and unneded. In fact, it is wrong. If the main
> > > thread is already died, we can only use SIGEV_THREAD_ID signals, because
> > > otherwise good_sigevent() returns ->group_leader.
> > 
> >   Care to explain here please, I'm not following you.
> 
> My apologies, I was unclear.
> 
> This check is racy, the condition could be changed right after the check.
> 
> It is unneeded, it is ok to do send_sigqueue(tsk) if if that task is already
> dead. (we hold the reference to task_struct).
> 
> Now suppose that the main thread (->group_leader) already exited. This is
> normal, the thread group is still alive, it should be ok to send a signal to
> it via send_group_sigqueue(). But we can't: without SIGEV_THREAD_ID in
> ->sigev_notify good_event() returns ->group_leader, and it has PF_EXITING.

  Thanks, I understand the problem now. I will fix this.

> 
> Yes, kernel/posix-timers.c needs a cleanup too. But please note that it does
> this check for another reason (according to the comment). This reason is not
> valid now, the callsite for exit_itimers() was moved from __exit_signal() to
> do_exit().
> 
> > > > +   if (iocb->ki_notify.notify != SIGEV_NONE) {
> > > > +   ret = aio_send_signal(>ki_notify);
> > > > +
> > > > +   /* If signal generation failed, release the sigqueue */
> > > > +   if (ret)
> > > > +   sigqueue_free(iocb->ki_notify.sigq);
> > > 
> > > We should not use sigqueue_free() here. It takes current->sighand->siglock
> > > to remove sigqueue from "struct sigpending". But current is just a 
> > > "random"
> > > process here.
> > > 
> > > Yes, if I understand this patch correctly, it is not possible that this
> > > sigqueue is pending, but still this is bad imho.
> > 
> >  Yes, in fact the sigqueue is used for a single signal delivery and then
> > free. In fact I could have used directly __sigqueue_free() instead here
> > except for the fact that it's private to signal.c and I'm reluctant
> > to export it to other subsystems.
> 
> I personally think it is better to export __sigqueue_free() even if 
> sigqueue_free()
> happens to work. It is to fragile imho to reference current->sighand. At least
> we need a fat comment.

  OK.

> 
> > > >  static void __sigqueue_free(struct sigqueue *q)
> > > >  {
> > > > -   if (q->flags & SIGQUEUE_PREALLOC)
> > > > +   if (q->flags & SIGQUEUE_PREALLOC && q->info.si_code != 
> > > > SI_ASYNCIO)
> > > > return;
> > > 
> > > Oh, this is not nice. Could we change send_sigqueue/send_group_sigqueue
> > > instead ?
> > 
> >   Yep, that's the other solution.
> > 
> > > 
> > > - BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
> > > + BUG_ON(!(q->flags & SIGQUEUE_PREALLOC) && q->info.si_code != 
> > > SI_ASYNCIO);
> > > 
> > > This way aio can use __sigqueue_alloc/__sigqueue_free directly and forget
> > > about SIGQUEUE_PREALLOC.
> > 
> >   Well, I don't think it's cleaner. The aio error path calls sigqueue_free()
> > directly whereas in case of success sigqueue_free() is called from the 
> > signal
> > delivery path.
> 
> Hmm... now I don't understand you. Of course, the aio error path should use
> __sigqueue_free() if we don't use SIGQUEUE_PREALLOC (and imho we should not).
> 
> And the signal delivery path uses __sigqueue_free() too.
> 
> ?
> 
> > > I'd suggest to not use this interface. Just use group_send_sig_info() or
> > > specific_send_sig_info(). Yes, this way we will do GFP_ATOMIC allocation
> > > of sigqueue in interrupt context, but is this so bad in this case?
> > 
> >   Well, the thihere is that in the past we used group_send_sig_info()
> > and specific_send_sig_info() for notification but Zach Brown raised
> > the question about reliable signal delivery. IOW an aio submission
> > should not succeed if signal delivery is going to fail. Hence the
> > use of the preallocated sigqueue.
> 
> Ok, I see, thanks.
> 
> Oleg.
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: + aio-completion-signal-notification.patch added to -mm tree

2007-01-26 Thread Sébastien Dugué
On Thu, 25 Jan 2007 19:21:41 +0300 Oleg Nesterov <[EMAIL PROTECTED]> wrote:

> Sebastien Dugue wrote:
> >
> > +static long aio_setup_sigevent(struct aio_notify *notify,
> > +  struct sigevent __user *user_event)
> > +{
> > +   sigevent_t event;
> > +   struct task_struct *target;
> > +
> > +   if (copy_from_user(, user_event, sizeof(event)))
> > +   return -EFAULT;
> > +
> > +   if (event.sigev_notify == SIGEV_NONE)
> > +   return 0;
> > +
> > +   notify->notify = event.sigev_notify;
> > +   notify->signo = event.sigev_signo;
> > +   notify->value = event.sigev_value;
> > +
> > +   read_lock(_lock);
> 
> We don't need tasklist_lock, we can use rcu_read_lock() instead.

  Right that will be beneficial, will change.

> 
> > +   target = good_sigevent();
> > +
> > +   if (unlikely(!target || (target->flags & PF_EXITING)))
> > +   goto out_unlock;
> 
> PF_EXITING check is racy and unneded. In fact, it is wrong. If the main
> thread is already died, we can only use SIGEV_THREAD_ID signals, because
> otherwise good_sigevent() returns ->group_leader.

  Care to explain here please, I'm not following you.

> 
> > @@ -994,6 +1077,15 @@ int fastcall aio_complete(struct kiocb *
> > kunmap_atomic(ring, KM_IRQ1);
> >
> > pr_debug("added to ring %p at [%lu]\n", iocb, tail);
> > +
> > +   if (iocb->ki_notify.notify != SIGEV_NONE) {
> > +   ret = aio_send_signal(>ki_notify);
> > +
> > +   /* If signal generation failed, release the sigqueue */
> > +   if (ret)
> > +   sigqueue_free(iocb->ki_notify.sigq);
> 
> We should not use sigqueue_free() here. It takes current->sighand->siglock
> to remove sigqueue from "struct sigpending". But current is just a "random"
> process here.
> 
> Yes, if I understand this patch correctly, it is not possible that this
> sigqueue is pending, but still this is bad imho.

 Yes, in fact the sigqueue is used for a single signal delivery and then
free. In fact I could have used directly __sigqueue_free() instead here
except for the fact that it's private to signal.c and I'm reluctant
to export it to other subsystems.

> 
> >  static void __sigqueue_free(struct sigqueue *q)
> >  {
> > -   if (q->flags & SIGQUEUE_PREALLOC)
> > +   if (q->flags & SIGQUEUE_PREALLOC && q->info.si_code != SI_ASYNCIO)
> > return;
> 
> Oh, this is not nice. Could we change send_sigqueue/send_group_sigqueue
> instead ?

  Yep, that's the other solution.

> 
> - BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
> + BUG_ON(!(q->flags & SIGQUEUE_PREALLOC) && q->info.si_code != 
> SI_ASYNCIO);
> 
> This way aio can use __sigqueue_alloc/__sigqueue_free directly and forget
> about SIGQUEUE_PREALLOC.

  Well, I don't think it's cleaner. The aio error path calls sigqueue_free()
directly whereas in case of success sigqueue_free() is called from the signal
delivery path.

> 
> On the other hand, imho this patch takes a wrong direction.
> 
> The purpose of SIGQUEUE_PREALLOC + send_sigqueue() is to re-use the same
> sigqueue while sending a stream of signals. But in aio case we allocate
> sigqueue to send only 1 signal, then it freed after the delivery like
> the regular sigqueue. So what is the point?
> 
> I'd suggest to not use this interface. Just use group_send_sig_info() or
> specific_send_sig_info(). Yes, this way we will do GFP_ATOMIC allocation
> of sigqueue in interrupt context, but is this so bad in this case?

  Well, the thihere is that in the past we used group_send_sig_info()
and specific_send_sig_info() for notification but Zach Brown raised
the question about reliable signal delivery. IOW an aio submission
should not succeed if signal delivery is going to fail. Hence the
use of the preallocated sigqueue.

  Thanks,

  Sébastien.





-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   >