Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Paul Mackerras
Herbert Xu writes:

 It doesn't matter.  The memory pressure flag is an *advisory*
 flag.  If we get it wrong the worst that'll happen is that we'd
 waste some time doing work that'll be thrown away.

Ah, so it's the racy but I don't care because it's only an
optimization case.  That's fine.  Somehow I find it hard to believe
that all the racy uses of atomic_read in the kernel are like that,
though. :)

 In any case, this actually illustrates why the addition of
 volatile is completely pointless.  Even if this code was
 broken, which it definitely is not, having the volatile
 there wouldn't have helped at all.

Yes, adding volatile to racy code doesn't somehow make it race-free.
Neither does using atomic_t, despite what some seem to believe.

I have actually started going through all the uses of atomic_read in
the kernel.  So far out of the first 100 I have found none where we
have two atomic_reads of the same variable and the compiler could
usefully use the value from the first as the result of the second.
But there's still  2500 to go...

Paul.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: drivers/net/tokenring/3c359.c

2007-08-16 Thread Sivakumar Subramani
What exactly the difference between kzalloc and kcalloc? From the
definition, I could see that kcalloc should be used for array
allocation. But I could see kzalloc is used for allocation arrays as in
the below patch.

Any coding standard (or) developers can use kzalloc and kcalloc as per
their coding practice??

Thanks,
~Siva
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Jeff Garzik
Sent: Tuesday, August 14, 2007 11:20 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org; [EMAIL PROTECTED];
Linux Kernel; kernel-janitors
Subject: Re: drivers/net/tokenring/3c359.c

Surya Prabhakar N wrote:
 Hi,
Replacing kmalloc with kzalloc and cleaning up memset in 
 drivers/net/tokenring/3c359.c
 
 
 Signed-off-by: Surya Prabhakar [EMAIL PROTECTED]

applied


-
To unsubscribe from this list: send the line unsubscribe netdev in the
body of a message to [EMAIL PROTECTED] More majordomo info at
http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Paul Mackerras
Herbert Xu writes:

 On Thu, Aug 16, 2007 at 02:11:43PM +1000, Paul Mackerras wrote:
 
  The uses of atomic_read where one might want it to allow caching of
  the result seem to me to fall into 3 categories:
  
  1. Places that are buggy because of a race arising from the way it's
 used.
  
  2. Places where there is a race but it doesn't matter because we're
 doing some clever trick.
  
  3. Places where there is some locking in place that eliminates any
 potential race.
 
 Agreed.
 
  In case 1, adding volatile won't solve the race, of course, but it's
  hard to argue that we shouldn't do something because it will slow down
  buggy code.  Case 2 is hopefully pretty rare and accompanied by large
  comment blocks, and in those cases caching the result of atomic_read
  explicitly in a local variable would probably make the code clearer.
  And in case 3 there is no reason to use atomic_t at all; we might as
  well just use an int.
 
 Since adding volatile doesn't help any of the 3 cases, and
 takes away optimisations from both 2 and 3, I wonder what
 is the point of the addition after all?

Note that I said these are the cases _where one might want to allow
caching_, so of course adding volatile doesn't help _these_ cases.
There are of course other cases where one definitely doesn't want to
allow the compiler to cache the value, such as when polling an atomic
variable waiting for another CPU to change it, and from my inspection
so far these cases seem to be the majority.

The reasons for having volatile behaviour of atomic_read (whether or
not that is achieved by use of the volatile C keyword) are

- It matches the normal expectation based on the name atomic_read
- It matches the behaviour of the other atomic_* primitives
- It avoids bugs in the cases where volatile behaviour is required

To my mind these outweigh the small benefit for some code of the
non-volatile (caching-allowed) behaviour.  In fact it's pretty minor
either way, and since x86[-64] has this behaviour, one can expect the
potential bugs in generic code to have mostly been found, although
perhaps not all of them since x86[-64] has less aggressive reordering
of memory accesses and fewer registers in which to cache things than
some other architectures.

Paul.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Herbert Xu
On Thu, Aug 16, 2007 at 04:56:21PM +1000, Paul Mackerras wrote:

 Note that I said these are the cases _where one might want to allow
 caching_, so of course adding volatile doesn't help _these_ cases.
 There are of course other cases where one definitely doesn't want to
 allow the compiler to cache the value, such as when polling an atomic
 variable waiting for another CPU to change it, and from my inspection
 so far these cases seem to be the majority.

We've been through that already.  If it's a busy-wait it
should use cpu_relax.  If it's scheduling away that already
forces the compiler to reread anyway.

Do you have an actual example where volatile is needed?

 - It matches the normal expectation based on the name atomic_read
 - It matches the behaviour of the other atomic_* primitives

Can't argue since you left out what those expectations
or properties are.

 - It avoids bugs in the cases where volatile behaviour is required

Do you (or anyone else for that matter) have an example of this?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Stefan Richter
Herbert Xu wrote:
 On Thu, Aug 16, 2007 at 04:56:21PM +1000, Paul Mackerras wrote:

 Note that I said these are the cases _where one might want to allow
 caching_, so of course adding volatile doesn't help _these_ cases.
 There are of course other cases where one definitely doesn't want to
 allow the compiler to cache the value, such as when polling an atomic
 variable waiting for another CPU to change it, and from my inspection
 so far these cases seem to be the majority.
 
 We've been through that already.  If it's a busy-wait it
 should use cpu_relax.  If it's scheduling away that already
 forces the compiler to reread anyway.
 
 Do you have an actual example where volatile is needed?
 
 - It matches the normal expectation based on the name atomic_read
 - It matches the behaviour of the other atomic_* primitives
 
 Can't argue since you left out what those expectations
 or properties are.

We use atomic_t for data that is concurrently locklessly written and
read at arbitrary times.  My naive expectation as driver author (driver
maintainer) is that all atomic_t accessors, including atomic_read, (and
atomic bitops) work with the then current value of the atomic data.

 - It avoids bugs in the cases where volatile behaviour is required
 
 Do you (or anyone else for that matter) have an example of this?

The only code I somewhat know, the ieee1394 subsystem, was perhaps
authored and is currently maintained with the expectation that each
occurrence of atomic_read actually results in a load operation, i.e. is
not optimized away.  This means all atomic_t (bus generation, packet and
buffer refcounts, and some other state variables)* and likewise all
atomic bitops in that subsystem.

If that assumption is wrong, then what is the API or language primitive
to force a load operation to occur?


*)  Interesting what a quick LXR session in search for all atomic_t
usages in 'my' subsystem brings to light.  I now noticed an apparently
unused struct member in the bitrotting pcilynx driver, and more
importantly, a pairing of two atomic_t variables in raw1394 that should
be audited for race conditions and for possible replacement by plain int.
-- 
Stefan Richter
-=-=-=== =--- =
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [DM9000] Add support for big-endian hosts

2007-08-16 Thread Laurent Pinchart
On Tuesday 14 August 2007 20:25, Ben Dooks wrote:
 On Tue, Aug 14, 2007 at 01:22:32PM +0200, Laurent Pinchart wrote:
  This patch splits the receive status in 8bit wide fields and convert the
  packet length from little endian to CPU byte order.

 Which version of the the kernel was this against, it applies with
 fuzz to 2.6.23-rc3:

 $ patch -p1  ~/dm9000-fix-endianness.patch
 patching file drivers/net/dm9000.c
 Hunk #1 succeeded at 894 (offset 15 lines).
 Hunk #2 succeeded at 936 (offset 15 lines).
 Hunk #3 succeeded at 948 (offset 15 lines).

That was against 2.6.22. Sorry. I'll post updated patches against 2.6.23-rc3.

Best regards.

-- 
Laurent Pinchart
CSE Semaphore Belgium

Chaussée de Bruxelles, 732A
B-1410 Waterloo
Belgium

T +32 (2) 387 42 59
F +32 (2) 387 42 75
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] [DM9000] Support for big endian hosts and external PHY

2007-08-16 Thread Laurent Pinchart
These patches update the DM9000 driver to add support for big endian hosts and 
external PHY.

They are resubmissions of patches based on 2.6.22, that I have now updated to 
2.6.23-rc3.

-- 
Laurent Pinchart
CSE Semaphore Belgium

Chaussée de Bruxelles, 732A
B-1410 Waterloo
Belgium

T +32 (2) 387 42 59
F +32 (2) 387 42 75
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] [DM9000] Added support for big-endian hosts

2007-08-16 Thread Laurent Pinchart
This patch splits the receive status in 8bit wide fields and convert the
packet length from little endian to CPU byte order.

Signed-off-by: Laurent Pinchart [EMAIL PROTECTED]
---
 drivers/net/dm9000.c |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c
index c3de81b..a424810 100644
--- a/drivers/net/dm9000.c
+++ b/drivers/net/dm9000.c
@@ -894,7 +894,8 @@ dm9000_timer(unsigned long data)
 }
 
 struct dm9000_rxhdr {
-   u16 RxStatus;
+   u8  RxPktReady;
+   u8  RxStatus;
u16 RxLen;
 } __attribute__((__packed__));
 
@@ -935,7 +936,7 @@ dm9000_rx(struct net_device *dev)
 
(db-inblk)(db-io_data, rxhdr, sizeof(rxhdr));
 
-   RxLen = rxhdr.RxLen;
+   RxLen = le16_to_cpu(rxhdr.RxLen);
 
/* Packet Status check */
if (RxLen  0x40) {
@@ -947,17 +948,17 @@ dm9000_rx(struct net_device *dev)
PRINTK1(RST: RX Len:%x\n, RxLen);
}
 
-   if (rxhdr.RxStatus  0xbf00) {
+   if (rxhdr.RxStatus  0xbf) {
GoodPacket = false;
-   if (rxhdr.RxStatus  0x100) {
+   if (rxhdr.RxStatus  0x01) {
PRINTK1(fifo error\n);
db-stats.rx_fifo_errors++;
}
-   if (rxhdr.RxStatus  0x200) {
+   if (rxhdr.RxStatus  0x02) {
PRINTK1(crc error\n);
db-stats.rx_crc_errors++;
}
-   if (rxhdr.RxStatus  0x8000) {
+   if (rxhdr.RxStatus  0x80) {
PRINTK1(length error\n);
db-stats.rx_length_errors++;
}
-- 
1.5.0

-- 
Laurent Pinchart
CSE Semaphore Belgium

Chaussée de Bruxelles, 732A
B-1410 Waterloo
Belgium

T +32 (2) 387 42 59
F +32 (2) 387 42 75
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] [DM9000] External PHY support

2007-08-16 Thread Laurent Pinchart
This patch adds a flag to the DM9000 platform data which, when set,
configures the device to use an external PHY.

Signed-off-by: Laurent Pinchart [EMAIL PROTECTED]
---
 drivers/net/dm9000.c   |6 ++
 include/linux/dm9000.h |1 +
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c
index a424810..a86984e 100644
--- a/drivers/net/dm9000.c
+++ b/drivers/net/dm9000.c
@@ -136,6 +136,7 @@ typedef struct board_info {
u16 dbug_cnt;
u8 io_mode; /* 0:word, 2:byte */
u8 phy_addr;
+   unsigned int flags;
 
void (*inblk)(void __iomem *port, void *data, int length);
void (*outblk)(void __iomem *port, void *data, int length);
@@ -528,6 +529,8 @@ dm9000_probe(struct platform_device *pdev)
 
if (pdata-dumpblk != NULL)
db-dumpblk = pdata-dumpblk;
+
+   db-flags = pdata-flags;
}
 
dm9000_reset(db);
@@ -670,6 +673,9 @@ dm9000_init_dm9000(struct net_device *dev)
iow(db, DM9000_GPCR, GPCR_GEP_CNTL);/* Let GPIO0 output */
iow(db, DM9000_GPR, 0); /* Enable PHY */
 
+   if (db-flags  DM9000_PLATF_EXT_PHY)
+   iow(db, DM9000_NCR, NCR_EXT_PHY);
+
/* Program operating register */
iow(db, DM9000_TCR, 0); /* TX Polling clear */
iow(db, DM9000_BPTR, 0x3f); /* Less 3Kb, 200us */
diff --git a/include/linux/dm9000.h b/include/linux/dm9000.h
index 0008e2a..ea530fd 100644
--- a/include/linux/dm9000.h
+++ b/include/linux/dm9000.h
@@ -19,6 +19,7 @@
 #define DM9000_PLATF_8BITONLY  (0x0001)
 #define DM9000_PLATF_16BITONLY (0x0002)
 #define DM9000_PLATF_32BITONLY (0x0004)
+#define DM9000_PLATF_EXT_PHY   (0x0008)
 
 /* platfrom data for platfrom device structure's platfrom_data field */
 
-- 
1.5.0

-- 
Laurent Pinchart
CSE Semaphore Belgium

Chaussée de Bruxelles, 732A
B-1410 Waterloo
Belgium

T +32 (2) 387 42 59
F +32 (2) 387 42 75
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc3 and SKY2 driver issue

2007-08-16 Thread Michal Piotrowski
[Adding Stephen and netdev to CC]

On 15/08/07, James Corey [EMAIL PROTECTED] wrote:

 I tried running a D-link gig card on kernel 2.6.21.1
 and it came up fine, but when I did a sftp of
 an linux dvd ISO to it, the interface would lock
 up hard with the error

 eth1: hw csum failure.

 Call Trace:
  IRQ  [804d3e31]
 __skb_checksum_complete_head+0x46/0x5f
  [804d3e56] __skb_checksum_complete+0xc/0x11
  [805046c7] tcp_v4_rcv+0x157/0x810
  [804d8a6f] dev_queue_xmit+0x237/0x260
  [80229990] find_busiest_group+0x252/0x684
  [804ead50] ip_local_deliver+0xca/0x14c
  [804eb24a] ip_rcv+0x478/0x4ba
  [803ec7d3] sky2_poll+0x6f9/0x9b9
  [8022bd5d]
 run_rebalance_domains+0x13e/0x408
  [804d877a] net_rx_action+0xa8/0x166
  [80235d62] __do_softirq+0x55/0xc3
  [8020a4ec] call_softirq+0x1c/0x28
  [8020b611] do_softirq+0x2c/0x7d
  [8020b8cf] do_IRQ+0x13e/0x15f
  [802086ce] mwait_idle+0x0/0x46
  [80209871] ret_from_intr+0x0/0xa
  EOI  [80208710] mwait_idle+0x42/0x46
  [80208666] cpu_idle+0x8c/0xaf
  [8078174e] start_kernel+0x2ac/0x2b8
  [80781140] _sinittext+0x140/0x144


 So I tried running the latest snapshot 2.6.23-rc3
 and the almost the same thing happens. Only
 difference is that now the entire box locks up.
 The error is almost the same

 eth1: hw csum failure.

 Call Trace:
  IRQ  [804779b6]
 __skb_checksum_complete_head+0x43/0x56
  [804779d5] __skb_checksum_complete+0xc/0x11
  [804a989d] tcp_v4_rcv+0x14e/0x801
  [8048ff84] ip_local_deliver+0xca/0x14c
  [80490472] ip_rcv+0x46c/0x4ae
  [880060f9] :sky2:sky2_poll+0x72b/0x9c7
  [8047c934] net_rx_action+0xa8/0x166
  [80235ced] __do_softirq+0x55/0xc4
  [8020c5cc] call_softirq+0x1c/0x28
  [8020d6fd] do_softirq+0x2c/0x7d
  [8020d9bb] do_IRQ+0x13e/0x15f
  [8020a780] mwait_idle+0x0/0x48
  [8020b951] ret_from_intr+0x0/0xa
  EOI  [880063e7]
 :sky2:sky2_xmit_frame+0x0/0x41d
  [8020a7c2] mwait_idle+0x42/0x48
  [8020a718] cpu_idle+0xbd/0xe0
  [80704a5a] start_kernel+0x2ac/0x2b8
  [80704140] _sinittext+0x140/0x144


 I see that the new kernel includes some sort of
 SKY2 DEBUG stuff. I would be happy to rerun
 the test with that turned on, given some minor
 direction.


Regards,
Michal

-- 
LOG
http://www.stardust.webpages.pl/log/
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Herbert Xu
On Thu, Aug 16, 2007 at 10:06:31AM +0200, Stefan Richter wrote:
  
  Do you (or anyone else for that matter) have an example of this?
 
 The only code I somewhat know, the ieee1394 subsystem, was perhaps
 authored and is currently maintained with the expectation that each
 occurrence of atomic_read actually results in a load operation, i.e. is
 not optimized away.  This means all atomic_t (bus generation, packet and
 buffer refcounts, and some other state variables)* and likewise all
 atomic bitops in that subsystem.

Can you find an actual atomic_read code snippet there that is
broken without the volatile modifier?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [patch 4/4] Update e1000 driver to use devres.

2007-08-16 Thread Waskiewicz Jr, Peter P
 Index: linux-2.6/drivers/net/e1000/e1000_main.c
 ===
 --- linux-2.6.orig/drivers/net/e1000/e1000_main.c
 +++ linux-2.6/drivers/net/e1000/e1000_main.c
 @@ -860,15 +860,14 @@ e1000_probe(struct pci_dev *pdev,  {
   struct net_device *netdev;
   struct e1000_adapter *adapter;
 - unsigned long mmio_start, mmio_len;
 - unsigned long flash_start, flash_len;
 + unsigned long mmio_len, flash_len;
  
   static int cards_found = 0;
   static int global_quad_port_a = 0; /* global ksp3 port 
 a indication */
   int i, err, pci_using_dac;
   uint16_t eeprom_data = 0;
   uint16_t eeprom_apme_mask = E1000_EEPROM_APME;
 - if ((err = pci_enable_device(pdev)))
 + if ((err = pcim_enable_device(pdev)))
   return err;
  
   if (!(err = pci_set_dma_mask(pdev, DMA_64BIT_MASK))  
 @@ -878,20 +877,19 @@ e1000_probe(struct pci_dev *pdev,
   if ((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK)) 
   (err = pci_set_consistent_dma_mask(pdev, 
 DMA_32BIT_MASK))) {
   E1000_ERR(No usable DMA configuration, 
 aborting\n);
 - goto err_dma;
 + return err;
   }
   pci_using_dac = 0;
   }
  
   if ((err = pci_request_regions(pdev, e1000_driver_name)))
 - goto err_pci_reg;
 + return err;
  
   pci_set_master(pdev);
  
 - err = -ENOMEM;
 - netdev = alloc_etherdev(sizeof(struct e1000_adapter));
 + netdev = devm_alloc_etherdev(pdev-dev, sizeof(struct 
 +e1000_adapter));
   if (!netdev)
 - goto err_alloc_etherdev;
 + return -ENOMEM;

I'm a bit confused why you removed the goto's, and then removed all the
target unwinding code at the bottom of e1000_probe().   Those labels
clean up resources if something fails, like the err_sw_init label.  I
don't see anything in the devres code that jumps out at me that explains
why we can do away with these cleanup routines.  Thoughts?

Thanks,
-PJ Waskiewicz
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: drivers/infiniband/mlx/mad.c misplaced ;

2007-08-16 Thread Heiko Carstens
On Wed, Aug 15, 2007 at 05:40:11PM -0700, Joe Perches wrote:
 On Wed, 2007-08-15 at 19:58 -0400, Dave Jones wrote:
  Signed-off-by: Dave Jones [EMAIL PROTECTED]
  
  diff --git a/drivers/infiniband/hw/mlx4/mad.c 
  b/drivers/infiniband/hw/mlx4/mad.c
  index 3330917..0ed02b7 100644
  --- a/drivers/infiniband/hw/mlx4/mad.c
  +++ b/drivers/infiniband/hw/mlx4/mad.c
  @@ -109,7 +109,7 @@ int mlx4_MAD_IFC(struct mlx4_ib_dev *dev, int 
  ignore_mkey, int ignore_bkey,
 in_modifier, op_modifier,
 MLX4_CMD_MAD_IFC, MLX4_CMD_TIME_CLASS_C);
   
  -   if (!err);
  +   if (!err)
 
 There's more than a few of these (not inspected).
 
 $ egrep -r --include=*.c \bif[[:space:]]*\([^\)]*\)[[:space:]]*\; * 

[...]

 drivers/s390/scsi/zfcp_erp.c:   if (status == ZFCP_ERP_SUCCEEDED) ;   
   /* no further action */

At least this one is not a bug. But I'm going to add a break there, so it
doesn't look that strange anymore. Thanks!
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GENETLINK] some thoughts on the usage

2007-08-16 Thread Richard MUSIL
Thomas Graf wrote:
 * Richard MUSIL [EMAIL PROTECTED] 2007-08-10 10:45
 I have noticed that although ops for each family are the same (each
 device is functionally same) I cannot use same genl_ops struct for
 registration, because it uses internal member to link in list. Therefore
 it is necessary to allocate new genl_ops for each device and pass it to
 registration. But I cannot officially use this list to track those
 genl_ops (so I can properly destroy them later), because there is no
 interface. So I need to redo the management of the structures on my own.
 
 The intended usage of the interface in your example would be to register
 only one genetlink family, say tpm, register one set of operations
 and then have an attribute in every message which specifies which TPM
 device to use. This helps keeping the total number of genetlink families
 down.

I got your point. The fact that there are several families of the same
device type seems however quite convenient. For example, I
create/register virtual device /dev/tpm0 and register family with the
same name for that device, the same for /dev/tpm1 etc. Then I got
straightforward association in between devices and families and get for
free the whole management what happen if I try to talk to device which
is not registered/present etc.
I could multiplex it over one channel, but I will need to make the
communication protocol more complex and make me handle all exceptions
myself.
Since in my case there will be probably not more than one device
present, and the device itself is quite exotic I would probably not
rewrite it to the multiplexing scheme, but I understand what you mean
and will take it into account next time I face decision how to use
genetlink.

However I am still wondering, whether the allocation of structures
(genl_family, genl_ops) should not be done by genetlink layer instead of
the user (I mean allocating copy of the struct passed by user). This is
for example, what TPM layer (tpm.c) does. This would come at slight cost
at memory usage and performance, but will protect the caller from
inspecting internal behavior and taking care of that. And internal links
would nicely help in keeping of track of allocated structures. I could
write a patch for this.

 The second inconvenience is that for each family I register, I also
 register basically same ops (basically means, the definitions, and doit,
 dumpit handlers are same, though the structures are at different
 addresses for reasons described above). When the handler receives the
 message it needs to associate the message with the actual device it is
 handling. This could be done through family lookup (using
 nlmsghdr::nlmsg_type), but I wondered if it would make sense to extend
 genl_family for user custom data pointer and then pass this custom data
 (or genl_family reference) to each handler (for example inside
 genl_info). It is already parsed by genetlink layer, so it should not
 slow things down.
 
 That's not a bad idea, although I think we should try and keep the
 generic netlink part as simple as possible. There is a family specific
 header, referred to as user header in genl_info which is basically
 what you're looking for with the custom header. I believe making the
 generic netlink family aware of anything beyond family id and operations
 id only complicates things.

Ok, this was just an idea ;-) - probably important only for high
performance genetlink users.

Richard
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Satyam Sharma
[ Bill tells me in private communication he gets this already, but I
  think it's more complicated than the shoddy explanation I'd made
  earlier so would wish to make this clearer in detail one last time,
  for the benefit of others listening in or reading the archives. ]


On Thu, 16 Aug 2007, Satyam Sharma wrote:

 On Thu, 16 Aug 2007, Satyam Sharma wrote:
 [...]
  On Wed, 15 Aug 2007, Bill Fink wrote:
   [...]
   I'm curious about one minor tangential point.  Why, instead of:
   
 b = *(volatile int *)a;
   
   why can't this just be expressed as:
   
 b = (volatile int)a;
   
   Isn't it the contents of a that's volatile, i.e. it's value can change
   invisibly to the compiler, and that's why you want to force a read from
   memory?  Why do you need the *(volatile int *) construct?
  
  b = (volatile int)a; doesn't help us because a cast to a qualified type
  has the same effect as a cast to an unqualified version of that type, as
  mentioned in 6.5.4:4 (footnote 86) of the standard. Note that volatile
  is a type-qualifier, not a type itself, so a cast of the _object_ itself
  to a qualified-type i.e. (volatile int) would not make the access itself
  volatile-qualified.

Casts don't produce lvalues, and the cast ((volatile int)a) does not
produce the object-int-a-qualified-as-volatile -- in fact, the
result of the above cast is whatever is the _value_ of int a, with
the access to that object having _already_ taken place, as per the
actual type-qualification of the object (that was originally declared
as being _non-volatile_, in fact). Hence, defining atomic_read() as:

#define atomic_read(v)  ((volatile int)((v)-counter))

would be buggy and not give volatility semantics at all, unless the
counter object itself isn't volatile-qualified already (which it
isn't).

The result of the cast itself being the _value_ of the int object, and
not the object itself (i.e., not an lvalue), is thereby independent of
type-qualification in that cast itself (it just wouldn't make any
difference), hence the cast to a qualified type has the same effect
as a cast to an unqualified version of that type bit in section 6.5.4:4
of the standard.


  To serve our purposes, it is necessary for us to take the address of this
  (non-volatile) object, cast the resulting _pointer_ to the corresponding
  volatile-qualified pointer-type, and then dereference it. This makes that
  particular _access_ be volatile-qualified, without the object itself being
  such. Also note that the (dereferenced) result is also a valid lvalue and
  hence can be used in *(volatile int *)a = b; kind of construction
  (which we use for the atomic_set case).

Dereferencing using the *(pointer-type-cast) construct, OTOH, serves
us well:

#define atomic_read(v)  (*(volatile int *)(v)-counter)

Firstly, note that the cast here being (volatile int *) and not
(int * volatile) qualifies the type of the _object_ being pointed to
by the pointer in question as being volatile-qualified, and not the
pointer itself (6.2.5:27 of the standard, and 6.3.2.3:2 allows us to
convert from a pointer-to-non-volatile-qualified-int to a pointer-to-
volatile-qualified-int, which suits us just fine) -- but note that
the _access_ to that address itself has not yet occurred.

_After_ specifying the memory address as containing a volatile-qualified-
int-type object, (and GCC co-operates as mentioned below), we proceed to
dereference it, which is when the _actual access_ occurs, therefore with
volatility semantics this time.

Interesting.


 Here, I should obviously admit that the semantics of *(volatile int *)
 aren't any neater or well-defined in the _language standard_ at all. The
 standard does say (verbatim) precisely what constitutes as access to
 object of volatile-qualified type is implementation-defined, but GCC
 does help us out here by doing the right thing. Accessing the non-volatile
 object there using the volatile-qualified pointer-type cast makes GCC
 treat the object stored at that memory address itself as if it were a 
 volatile object, thus making the access end up having what we're calling
 as volatility semantics here.
 
 Honestly, given such confusion, and the propensity of the volatile
 type-qualifier keyword to be ill-defined (or at least poorly understood,
 often inconsistently implemented), I'd (again) express my opinion that it
 would be best to avoid its usage, given other alternatives do exist.


Satyam
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPv6]: Invalid semicolon after if statement

2007-08-16 Thread Ilpo Järvinen
On Wed, 15 Aug 2007, David Miller wrote:

 From: Dave Jones [EMAIL PROTECTED]
 Date: Wed, 15 Aug 2007 19:52:20 -0400
 
  On Wed, Aug 15, 2007 at 03:08:14PM -0700, David Miller wrote:
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 16 Aug 2007 00:57:00 +0300 (EEST)

 A similar fix to netfilter from Eric Dumazet inspired me to
 look around a bit by using some grep/sed stuff as looking for
 this kind of bugs seemed easy to automate. This is one of them
 I found where it looks like this semicolon is not valid.
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Yikes!  Makes you want to audit the entire tree for these
things :-)))
   
  Indeed.  Here's another one.
  
  Signed-off-by: Dave Jones [EMAIL PROTECTED]
 
 That got fixed the other day and is the A similar fix to netfilter
 from Eric Dumazet Ilpo is reffering to above :)

...heh, when I said a bit, I meant that it took a very little effort
to check over the whole tree... :-)

...I've already reported all four things one could find from whole tree 
with this cmd (to the relevant parties), so no need for you to redo the 
effort :-) :

$ for i in `find . -name '*.[ch]'`; do echo $i;
sed -n -e '/^\t*if *[(].*[)] *; *$/ N'
-e '/^\(\t*\)if *[(].*[)] *; *\n\1\t/ p' $i; done | tee result

...Basically it checks if the next line has more indentation than the
if line. ...Obviously it will work only if the code follows current 
CodingStyle in indentation. I'm currently experimenting with indent 
preprocessing step... ...but I'm not yet sure if I can pull something 
useful out from that too :-)

...Better cmdlines could be invented, e.g. by manually checking every if 
(); lines once one has first automated separation of them from if () 
do_smthg(); lines (I haven't learned myself how to easily count/work with 
parenthesis pairs in a cmdline, which seems necessary here)...

-- 
 i.

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Stefan Richter
Herbert Xu wrote:
 On Thu, Aug 16, 2007 at 10:06:31AM +0200, Stefan Richter wrote:
  
  Do you (or anyone else for that matter) have an example of this?
 
 The only code I somewhat know, the ieee1394 subsystem, was perhaps
 authored and is currently maintained with the expectation that each
 occurrence of atomic_read actually results in a load operation, i.e. is
 not optimized away.  This means all atomic_t (bus generation, packet and
 buffer refcounts, and some other state variables)* and likewise all
 atomic bitops in that subsystem.
 
 Can you find an actual atomic_read code snippet there that is
 broken without the volatile modifier?

What do I have to look for?  atomic_read after another read or write
access to the same variable, in the same function scope?  Or in the sum
of scopes of functions that could be merged by function inlining?

One example was discussed here earlier:  The for (;;) loop in
nodemgr_host_thread.  There an msleep_interruptible implicitly acted as
barrier (at the moment because it's in a different translation unit; if
it were the same, then because it hopefully has own barriers).  So that
happens to work, although such an implicit barrier is bad style:  Better
enforce the desired behaviour (== guaranteed load operation) *explicitly*.
-- 
Stefan Richter
-=-=-=== =--- =
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: drivers/infiniband/mlx/mad.c misplaced ;

2007-08-16 Thread Ilpo Järvinen

...I guess those guys hunting for broken busyloops in the other thread 
could also benefit from similar searching commands introduced in this 
thread... ...Ccing Satyam to caught their attention too.


 On Wed, Aug 15, 2007 at 05:40:11PM -0700, Joe Perches wrote:
  
  There's more than a few of these (not inspected).
  
  $ egrep -r --include=*.c \bif[[:space:]]*\([^\)]*\)[[:space:]]*\; * 

...Hmm, I plugged in a preprocessor too to manage with non compliant 
coding styles :-). Please understand that the line numbers are not an 
exact match due to preprocessor changes:

$ for i in `find . -name '*.[ch]'`; do echo $i; indent -npro -kr -i8 -ts8 
-sob -l8000 -ss -ncs -cp1 -nhnl -st $i | egrep -n [[:space:]]if [(].*[)] ;$;
done | grep -B1 ^[^.]

./arch/arm/mach-omap1/leds-innovator.c
97: if (led_state  LED_STATE_ENABLED) ;
--
./arch/mips/sibyte/cfe/console.c
23: if (written  0) ;
32: if (written  0) ;
--
./arch/powerpc/kernel/legacy_serial.c
524:if (0) ;
--
./arch/powerpc/xmon/ppc-opc.c
938:else if (value == 0) ;
--
./arch/sh/boards/se/7343/io.c
137:if (0) ;
--
./arch/um/kernel/tt/tracer.c
254:if (WIFEXITED(status)) ;
--
./arch/x86_64/ia32/ptrace32.c
363:if (__copy_from_user(child-thread.i387.fxsave, u, 
sizeof(*u))) ;
--
./arch/x86_64/kernel/traps.c
801:if (eregs == (struct pt_regs *)eregs-rsp) ;
--
./drivers/atm/iphase.c
159:if (!desc1) ;
--
./drivers/isdn/capi/capiutil.c
456:else if (c = 0x0f) ;
--
./drivers/isdn/hisax/hfc_pci.c
125:if (Read_hfc(cs, HFCPCI_INT_S1)) ;
155:if (Read_hfc(cs, HFCPCI_INT_S1)) ;
1483:   if (Read_hfc(cs, HFCPCI_INT_S1)) ;
--
./drivers/isdn/hisax/hfc_sx.c
377:if (Read_hfc(cs, HFCSX_INT_S1)) ;
407:if (Read_hfc(cs, HFCSX_INT_S2)) ;
1246:   if (Read_hfc(cs, HFCSX_INT_S1)) ;
--
./drivers/media/video/video-buf.c
1141:   if (q-bufs[i]) ;
--
./drivers/net/lp486e.c
777:if (lp-scb.command  i596_timeout(dev, i596_cleanup_cmd, 100)) ;
785:if (lp-scb.command  i596_timeout(dev, i596_reset, 100)) ;
794:if (lp-scb.command  i596_timeout(dev, i596_reset(2), 400)) ;
820:if (lp-scb.command  i596_timeout(dev, i596_add_cmd, 100)) ;
1146:   if (lp-scb.command  i596_timeout(dev, interrupt, 40)) ;
1192:   if (lp-scb.command  i596_timeout(dev, i596 interrupt, 100)) ;
1217:   if (lp-scb.command  i596_timeout(dev, i596_close, 200)) ;
--
./drivers/net/ni5010.c
273:if (dev-irq == 0xff) ;
--
./drivers/net/ni52.c
648:if (result  TDR_LNK_OK) ;
--
./drivers/net/sun3_82586.c
498:if (result  TDR_LNK_OK) ;
--
./drivers/pci/hotplug/ibmphp_core.c
418:else if (mode == BUS_MODE_PCI) ;
636:else if (mode == BUS_MODE_PCI) ;
--
./drivers/usb/gadget/file_storage.c
2480:   if (protocol_is_scsi()) ;
--
./drivers/usb/host/uhci-debug.c
416:if (i = SKEL_ISO) ;
419:else if (!uhci-fsbr_is_on) ;
--
./drivers/usb/host/uhci-q.c
541:if (qh-skel == SKEL_ISO) ;
--
./drivers/usb/misc/usbtest.c
1401:   if (status != 0) ;
--
./drivers/video/intelfb/intelfbdrv.c
337:if (get_opt_bool(this_opt, accel, accel)) ;
338:else if (get_opt_int(this_opt, vram, vram)) ;
339:else if (get_opt_bool(this_opt, hwcursor, hwcursor)) ;
340:else if (get_opt_bool(this_opt, mtrr, mtrr)) ;
341:else if (get_opt_bool(this_opt, fixed, fixed)) ;
--
./drivers/video/matrox/matroxfb_DAC1064.c
46: if (fvco = 10) ;
--
./drivers/video/matrox/matroxfb_maven.c
298:if (fvco = 1) ;
316:if (fvco = 10) ;
--
./fs/hfs/inode.c
72: if (!node) ;
--
./fs/hfsplus/inode.c
67: if (!node) ;
--
./fs/hostfs/hostfs_user.c
300:if (attrs-ia_valid  HOSTFS_ATTR_CTIME) ;
--
./fs/xfs/xfs_bmap.c
2287:   if (nullfb || XFS_FSB_TO_AGNO(mp, ap-rval) == fb_agno) ;
--
./fs/xfs/xfs_dir2.c
281:else if ((rval = xfs_dir2_isblock(tp, dp, v))) ;
--
./fs/xfs/xfs_iomap.c
248:if (io-io_flags  XFS_IOCORE_RT) ;
--
./include/asm-cris/uaccess.h
255:if (n == 0) ;
303:if (n == 0) ;
351:if (n == 0) ;
--
./mm/swapfile.c
791:if (swcount = 1) ;
--
./net/core/pktgen.c
2256:   if (pkt_dev-min_in6_daddr.s6_addr32[0] == 0  
pkt_dev-min_in6_daddr.s6_addr32[1] == 0  pkt_dev-min_in6_daddr.s6_addr32[2] 
== 0  pkt_dev-min_in6_daddr.s6_addr32[3] == 0) ;
--
./net/irda/af_irda.c
1357:   if (ret) ;
1358:   else if (sk-sk_shutdown  RCV_SHUTDOWN) ;
--
./net/irda/irnetlink.c
105:if (nla_put_string(msg, IRDA_NL_ATTR_IFNAME, dev-name)) ;
--
./net/netfilter/xt_u32.c
38: if (skb-len  4 || pos  skb-len - 4) ;
--
./sound/pci/au88x0/au88x0_core.c
2076:   if (vortex_adbdma_bufshift(vortex, i)) ;
2085:   if (vortex_wtdma_bufshift(vortex, i)) ;
--

Re: [REVISED PATCH] 3c59x: check return of pci_enable_device()

2007-08-16 Thread Mark Hindley
Revised patch for this.

Mark


commit 5cf33391eba81a49038fa8be8cbad8425b80bf7f
Author: Mark Hindley [EMAIL PROTECTED]
Date:   Thu Aug 16 11:26:35 2007 +0100

Check return of pci_enable_device in vortex_up().

Also modify vortex_up to return error to callers. Handle failure of 
vortex_up in
vortex_open and vortex_resume.

Signed-off-by: Mark Hindley [EMAIL PROTECTED]

diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index 001c66d..7b3050c 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -705,7 +705,7 @@ static struct {
 
 static int vortex_probe1(struct device *gendev, void __iomem *ioaddr, int irq,
   int chip_idx, int card_idx);
-static void vortex_up(struct net_device *dev);
+static int vortex_up(struct net_device *dev);
 static void vortex_down(struct net_device *dev, int final);
 static int vortex_open(struct net_device *dev);
 static void mdio_sync(void __iomem *ioaddr, int bits);
@@ -841,8 +841,11 @@ static int vortex_resume(struct pci_dev *pdev)
return -EBUSY;
}
if (netif_running(dev)) {
-   vortex_up(dev);
-   netif_device_attach(dev);
+   err = vortex_up(dev);
+   if (err)
+   return err;
+   else
+   netif_device_attach(dev);
}
}
return 0;
@@ -1484,19 +1487,24 @@ static void vortex_check_media(struct net_device *dev, 
unsigned int init)
}
 }
 
-static void
+static int
 vortex_up(struct net_device *dev)
 {
struct vortex_private *vp = netdev_priv(dev);
void __iomem *ioaddr = vp-ioaddr;
unsigned int config;
-   int i, mii_reg1, mii_reg5;
+   int i, mii_reg1, mii_reg5, err;
 
if (VORTEX_PCI(vp)) {
pci_set_power_state(VORTEX_PCI(vp), PCI_D0);/* Go active */
if (vp-pm_state_valid)
pci_restore_state(VORTEX_PCI(vp));
-   pci_enable_device(VORTEX_PCI(vp));
+   err = pci_enable_device(VORTEX_PCI(vp));
+   if (err) {
+   printk(KERN_WARNING %s: Could not enable device \n,
+   dev-name);
+   goto err_out;
+   }
}
 
/* Before initializing select the active media port. */
@@ -1660,6 +1668,8 @@ vortex_up(struct net_device *dev)
if (vp-cb_fn_base) /* The PCMCIA people are 
idiots.  */
iowrite32(0x8000, vp-cb_fn_base + 4);
netif_start_queue (dev);
+err_out:
+   return err;
 }
 
 static int
@@ -1673,7 +1683,7 @@ vortex_open(struct net_device *dev)
if ((retval = request_irq(dev-irq, vp-full_bus_master_rx ?
boomerang_interrupt : vortex_interrupt, 
IRQF_SHARED, dev-name, dev))) {
printk(KERN_ERR %s: Could not reserve IRQ %d\n, dev-name, 
dev-irq);
-   goto out;
+   goto err;
}
 
if (vp-full_bus_master_rx) { /* Boomerang bus master. */
@@ -1702,20 +1712,22 @@ vortex_open(struct net_device *dev)
}
}
retval = -ENOMEM;
-   goto out_free_irq;
+   goto err_free_irq;
}
/* Wrap the ring. */
vp-rx_ring[i-1].next = cpu_to_le32(vp-rx_ring_dma);
}
 
-   vortex_up(dev);
-   return 0;
+   retval = vortex_up(dev);
+   if (!retval)
+   goto out;
 
-out_free_irq:
+err_free_irq:
free_irq(dev-irq, dev);
-out:
+err:
if (vortex_debug  1)
printk(KERN_ERR %s: vortex_open() fails: returning %d\n, 
dev-name, retval);
+out:
return retval;
 }
 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Stefan Richter
I wrote:
 Herbert Xu wrote:
 On Thu, Aug 16, 2007 at 10:06:31AM +0200, Stefan Richter wrote:
[...]
 expectation that each
 occurrence of atomic_read actually results in a load operation, i.e. is
 not optimized away.
[...]
 Can you find an actual atomic_read code snippet there that is
 broken without the volatile modifier?

PS:  Just to clarify, I'm not speaking for the volatile modifier.  I'm
not speaking for any particular implementation of atomic_t and its
accessors at all.  All I am saying is that
  - we use atomically accessed data types because we concurrently but
locklessly access this data,
  - hence a read access to this data that could be optimized away
makes *no sense at all*.

The only sensible read accessor to an atomic datatype is a read accessor
that will not be optimized away.

So, the architecture guys can implement atomic_read however they want
--- as long as it cannot be optimized away.*

PPS:  If somebody has code where he can afford to let the compiler
coalesce atomic_read with a previous access to the same data, i.e.
doesn't need and doesn't want all guarantees that the atomic_read API
makes (or IMO should make), then he can replace the atomic_read by a
local temporary variable.


*) Exceptions:
if (known_to_be_false)
read_access(a);
and the like.
-- 
Stefan Richter
-=-=-=== =--- =
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Herbert Xu
On Thu, Aug 16, 2007 at 11:54:44AM +0200, Stefan Richter wrote:
 
 One example was discussed here earlier:  The for (;;) loop in
 nodemgr_host_thread.  There an msleep_interruptible implicitly acted as
 barrier (at the moment because it's in a different translation unit; if
 it were the same, then because it hopefully has own barriers).  So that
 happens to work, although such an implicit barrier is bad style:  Better
 enforce the desired behaviour (== guaranteed load operation) *explicitly*.

Hmm, it's not bad style at all.  Let's assume that everything
is in the same scope.  Such a loop must either call a function
that busy-waits, which should always have a cpu_relax or
something equivalent, or it'll call a function that schedules
away which immediately invalidates any values the compiler might
have cached for the atomic_read.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Herbert Xu
On Thu, Aug 16, 2007 at 12:31:03PM +0200, Stefan Richter wrote:
 
 PS:  Just to clarify, I'm not speaking for the volatile modifier.  I'm
 not speaking for any particular implementation of atomic_t and its
 accessors at all.  All I am saying is that
   - we use atomically accessed data types because we concurrently but
 locklessly access this data,
   - hence a read access to this data that could be optimized away
 makes *no sense at all*.

No sane compiler can optimise away an atomic_read per se.
That's only possible if there's a preceding atomic_set or
atomic_read, with no barriers in the middle.

If that's the case, then one has to conclude that doing
away with the second read is acceptable, as otherwise
a memory (or at least a compiler) barrier should have been
used.

In fact, volatile doesn't guarantee that the memory gets
read anyway.  You might be reading some stale value out
of the cache.  Granted this doesn't happen on x86 but
when you're coding for the kernel you can't make such
assumptions.

So the point here is that if you don't mind getting a stale
value from the CPU cache when doing an atomic_read, then
surely you won't mind getting a stale value from the compiler
cache.

 So, the architecture guys can implement atomic_read however they want
 --- as long as it cannot be optimized away.*

They can implement it however they want as long as it stays
atomic.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: drivers/infiniband/mlx/mad.c misplaced ;

2007-08-16 Thread Karsten Keil
On Thu, Aug 16, 2007 at 01:22:04PM +0300, Ilpo Järvinen wrote:
 
 ...I guess those guys hunting for broken busyloops in the other thread 
 could also benefit from similar searching commands introduced in this 
 thread... ...Ccing Satyam to caught their attention too.
 
 
 ./drivers/isdn/hisax/hfc_pci.c
 125:  if (Read_hfc(cs, HFCPCI_INT_S1)) ;
 155:  if (Read_hfc(cs, HFCPCI_INT_S1)) ;
 1483: if (Read_hfc(cs, HFCPCI_INT_S1)) ;
 --
 ./drivers/isdn/hisax/hfc_sx.c
 377:  if (Read_hfc(cs, HFCSX_INT_S1)) ;
 407:  if (Read_hfc(cs, HFCSX_INT_S2)) ;
 1246: if (Read_hfc(cs, HFCSX_INT_S1)) ;
 --

These are workaround to not get compiler warnings about ignored return
values I got some time ago under some architecture.


-- 
Karsten Keil
SuSE Labs
ISDN and VOIP development
SUSE LINUX Products GmbH, Maxfeldstr.5 90409 Nuernberg, GF: Markus Rex, HRB 
16746 (AG Nuernberg)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 2/4] Update e100 driver to use devres.

2007-08-16 Thread Tejun Heo
Brandon Philips wrote:
 @@ -2554,8 +2547,10 @@ static int __devinit e100_probe(struct p
   struct net_device *netdev;
   struct nic *nic;
   int err;
 + void __iomem **iomap;
 + int bar;
  
 - if(!(netdev = alloc_etherdev(sizeof(struct nic {
 + if (!(netdev = devm_alloc_etherdev(pdev-dev, sizeof(struct nic {
   if(((1  debug) - 1)  NETIF_MSG_PROBE)
   printk(KERN_ERR PFX Etherdev alloc failed, abort.\n);
   return -ENOMEM;
 @@ -2585,26 +2580,28 @@ static int __devinit e100_probe(struct p
   nic-msg_enable = (1  debug) - 1;
   pci_set_drvdata(pdev, netdev);
  
 - if((err = pci_enable_device(pdev))) {
 + if ((err = pcim_enable_device(pdev))) {
   DPRINTK(PROBE, ERR, Cannot enable PCI device, aborting.\n);
 - goto err_out_free_dev;
 + return err;
   }
  
   if(!(pci_resource_flags(pdev, 0)  IORESOURCE_MEM)) {
   DPRINTK(PROBE, ERR, Cannot find proper PCI device 
   base address, aborting.\n);
   err = -ENODEV;
 - goto err_out_disable_pdev;
 + return err;
   }
  
 - if((err = pci_request_regions(pdev, DRV_NAME))) {
 + bar = use_io ? 1 : 0;
 + if((err = pcim_iomap_regions(pdev, 1  bar, DRV_NAME))) {
   DPRINTK(PROBE, ERR, Cannot obtain PCI resources, aborting.\n);
 - goto err_out_disable_pdev;
 + return err;
   }
 + iomap = pcim_iomap_table(pdev)[bar];

Type mismatch.  Didn't compiler warn about it?

  
   if((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK))) {
   DPRINTK(PROBE, ERR, No usable DMA configuration, aborting.\n);
 - goto err_out_free_res;
 + return err;
   }
  
   SET_MODULE_OWNER(netdev);
 @@ -2613,12 +2610,6 @@ static int __devinit e100_probe(struct p
   if (use_io)
   DPRINTK(PROBE, INFO, using i/o access mode\n);
  
 - nic-csr = pci_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr));
 - if(!nic-csr) {
 - DPRINTK(PROBE, ERR, Cannot map device registers, aborting.\n);
 - err = -ENOMEM;
 - goto err_out_free_res;
 - }

nic-csr initialization seems missing.  Have you tested the patched code?

-- 
tejun
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 4/4] Update e1000 driver to use devres.

2007-08-16 Thread Tejun Heo
Brandon Philips wrote:
 - mmio_start = pci_resource_start(pdev, BAR_0);
   mmio_len = pci_resource_len(pdev, BAR_0);

You don't need mmio_len either.

 - err = -EIO;
 - adapter-hw.hw_addr = ioremap(mmio_start, mmio_len);
 + adapter-hw.hw_addr = pcim_iomap(pdev, BAR_0, mmio_len);

Passing 0 as @max_len tells pci[m]_iomap() to use pci_resource_len() of
the BAR.

 @@ -952,16 +948,15 @@ e1000_probe(struct pci_dev *pdev,
   /* setup the private structure */
  
   if ((err = e1000_sw_init(adapter)))
 - goto err_sw_init;
 + return err;
  
   err = -EIO;
   /* Flash BAR mapping must happen after e1000_sw_init
* because it depends on mac_type */
   if ((adapter-hw.mac_type == e1000_ich8lan) 
  (pci_resource_flags(pdev, 1)  IORESOURCE_MEM)) {
 - flash_start = pci_resource_start(pdev, 1);
   flash_len = pci_resource_len(pdev, 1);

Ditto.

 - adapter-hw.flash_address = ioremap(flash_start, flash_len);
 + adapter-hw.flash_address = pcim_iomap(pdev, 1, flash_len);
   if (!adapter-hw.flash_address)
   goto err_flashmap;
   }

-- 
tejun
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?

2007-08-16 Thread Eric Dumazet
Hi all

When the periodic route cache flush is done, this machine suffers a lot and 
eventually triggers the soft lockup message.

dst_run_gc() is doing a scan of a possibly huge list of dst_entries, eventually 
freeing some (less than 1%) of them, but holding the dst_lock spinlock for the 
whole scan.

Then it rearms a timer to redo the full thing 1/10 s later. The slowdown can 
last one minute or so.
No need to say that machine is not really usable in the meantime.

I already looked at the code and I am testing a patch (included in this mail) 
where I limit the number of entries that can be scanned at each dst_run_gc() 
call, and not holding dst_lock during the scan. But then I dont know how to 
address the dst_dev_event() problem, since we must be able to dst_ifdown() all 
entries attached to one dev.

I am wondering how to really solve the problem. Could a workqueue be used here 
instead of a timer ?


Aug 16 06:21:37 SRV1 kernel: BUG: soft lockup detected on CPU#0!
Aug 16 06:21:37 SRV1 kernel: 
Aug 16 06:21:37 SRV1 kernel: Call Trace:
Aug 16 06:21:37 SRV1 kernel:  IRQ  [802286f0] 
wake_up_process+0x10/0x20
Aug 16 06:21:37 SRV1 kernel:  [80251e09] softlockup_tick+0xe9/0x110
Aug 16 06:21:37 SRV1 kernel:  [803cd380] dst_run_gc+0x0/0x140
Aug 16 06:21:37 SRV1 kernel:  [802376f3] run_local_timers+0x13/0x20
Aug 16 06:21:37 SRV1 kernel:  [802379c7] 
update_process_times+0x57/0x90
Aug 16 06:21:37 SRV1 kernel:  [80216034] 
smp_local_timer_interrupt+0x34/0x60
Aug 16 06:21:37 SRV1 kernel:  [802165cc] 
smp_apic_timer_interrupt+0x5c/0x80
Aug 16 06:21:37 SRV1 kernel:  [8020a816] 
apic_timer_interrupt+0x66/0x70
Aug 16 06:21:37 SRV1 kernel:  [803cd3d3] dst_run_gc+0x53/0x140
Aug 16 06:21:37 SRV1 kernel:  [803cd3c6] dst_run_gc+0x46/0x140
Aug 16 06:21:37 SRV1 kernel:  [80237148] run_timer_softirq+0x148/0x1c0
Aug 16 06:21:37 SRV1 kernel:  [8023340c] __do_softirq+0x6c/0xe0
Aug 16 06:21:37 SRV1 kernel:  [8020ad6c] call_softirq+0x1c/0x30
Aug 16 06:21:37 SRV1 kernel:  EOI  [8020cb34] do_softirq+0x34/0x90
Aug 16 06:21:37 SRV1 kernel:  [802331cf] local_bh_enable_ip+0x3f/0x60
Aug 16 06:21:37 SRV1 kernel:  [80422913] _spin_unlock_bh+0x13/0x20
Aug 16 06:21:37 SRV1 kernel:  [803dfde8] 
rt_garbage_collect+0x1d8/0x320
Aug 16 06:21:37 SRV1 kernel:  [803cd4dd] dst_alloc+0x1d/0xa0
Aug 16 06:21:37 SRV1 kernel:  [803e1433] 
__ip_route_output_key+0x573/0x800
Aug 16 06:21:37 SRV1 kernel:  [803c02e2] sock_common_recvmsg+0x32/0x50
Aug 16 06:21:37 SRV1 kernel:  [803e16dc] 
ip_route_output_flow+0x1c/0x60
Aug 16 06:21:37 SRV1 kernel:  [80400160] tcp_v4_connect+0x150/0x610
Aug 16 06:21:37 SRV1 kernel:  [803ebf07] 
inet_bind_bucket_create+0x17/0x60
Aug 16 06:21:37 SRV1 kernel:  [8040cd16] 
inet_stream_connect+0xa6/0x2c0
Aug 16 06:21:37 SRV1 kernel:  [80422981] _spin_lock_bh+0x11/0x30
Aug 16 06:21:37 SRV1 kernel:  [803c0bdf] lock_sock_nested+0xcf/0xe0
Aug 16 06:21:37 SRV1 kernel:  [80422981] _spin_lock_bh+0x11/0x30
Aug 16 06:21:37 SRV1 kernel:  [803be551] sys_connect+0x71/0xa0
Aug 16 06:21:37 SRV1 kernel:  [803eee3f] tcp_setsockopt+0x1f/0x30
Aug 16 06:21:37 SRV1 kernel:  [803c030f] 
sock_common_setsockopt+0xf/0x20
Aug 16 06:21:37 SRV1 kernel:  [803be4bd] sys_setsockopt+0x9d/0xc0
Aug 16 06:21:37 SRV1 kernel:  [8028881e] sys_ioctl+0x5e/0x80
Aug 16 06:21:37 SRV1 kernel:  [80209c4e] system_call+0x7e/0x83

Prelimary patch :

--- linux-2.6.22/net/core/dst.c
+++ linux-2.6.22-ed/net/core/dst.c
@@ -28,6 +28,9 @@
  * 4) All operations modify state, so a spinlock is used.
  */
 static struct dst_entry*dst_garbage_list;
+static struct dst_entry *dst_garbage_wrk;
+static struct dst_entry *dst_garbage_inuse;
+static int dst_gc_running;
 #if RT_CACHE_DEBUG = 2
 static atomic_t dst_total = ATOMIC_INIT(0);
 #endif
@@ -42,26 +45,42 @@
 
 static void dst_run_gc(unsigned long dummy)
 {
-   intdelayed = 0;
-   intwork_performed;
-   struct dst_entry * dst, **dstp;
-
-   if (!spin_trylock(dst_lock)) {
-   mod_timer(dst_gc_timer, jiffies + HZ/10);
-   return;
-   }
-
+   intquota = 1000;
+   intwork_performed = 0;
+   struct dst_entry *dst, *next;
+   struct dst_entry *first = NULL, *last = NULL;
+#if RT_CACHE_DEBUG = 2
+   unsigned long t0 = jiffies;
+#endif
+   spin_lock(dst_lock);
+   if (dst_gc_running)
+   goto out;
del_timer(dst_gc_timer);
-   dstp = dst_garbage_list;
-   work_performed = 0;
-   while ((dst = *dstp) != NULL) {
-   if (atomic_read(dst-__refcnt)) {
-   dstp = dst-next;
-   delayed++;
+   if (!dst_garbage_wrk) {
+   dst_garbage_wrk = dst_garbage_list;
+   

Re: drivers/infiniband/mlx/mad.c misplaced ;

2007-08-16 Thread Satyam Sharma
Hi Ilpo,


On Thu, 16 Aug 2007, Ilpo Järvinen wrote:

 
 ...I guess those guys hunting for broken busyloops in the other thread 
 could also benefit from similar searching commands introduced in this 
 thread... ...Ccing Satyam to caught their attention too.
 
 
  On Wed, Aug 15, 2007 at 05:40:11PM -0700, Joe Perches wrote:
   
   There's more than a few of these (not inspected).
   
   $ egrep -r --include=*.c \bif[[:space:]]*\([^\)]*\)[[:space:]]*\; * 
 
 ...Hmm, I plugged in a preprocessor too to manage with non compliant 
 coding styles :-). Please understand that the line numbers are not an 
 exact match due to preprocessor changes:
 
 $ for i in `find . -name '*.[ch]'`; do echo $i; indent -npro -kr -i8 -ts8 
 -sob -l8000 -ss -ncs -cp1 -nhnl -st $i | egrep -n [[:space:]]if [(].*[)] ;$;
 done | grep -B1 ^[^.]

Thanks, looks useful, will check with this.


Satyam

Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?

2007-08-16 Thread Herbert Xu
Eric Dumazet [EMAIL PROTECTED] wrote:

 I am wondering how to really solve the problem. Could a workqueue be used 
 here instead of a timer ?

I think so.

A mutex would separate the GC and dst_ifdown.  You'd take the
spin lock in the GC only to replace the garbage list with NULL.
You then drop the lock to process it.  Once it's done you take
the lock again and join it with whatever that's been added in
the mean time.  This is easy because you should already have
the tail after the GC process.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.23-rc3 and SKY2 driver issue

2007-08-16 Thread Stephen Hemminger
On Thu, 16 Aug 2007 10:25:45 +0200
Michal Piotrowski [EMAIL PROTECTED] wrote:

 [Adding Stephen and netdev to CC]
 
 On 15/08/07, James Corey [EMAIL PROTECTED] wrote:
 
  I tried running a D-link gig card on kernel 2.6.21.1
  and it came up fine, but when I did a sftp of
  an linux dvd ISO to it, the interface would lock
  up hard with the error
 
  eth1: hw csum failure.
 
  Call Trace:
   IRQ  [804d3e31]
  __skb_checksum_complete_head+0x46/0x5f
   [804d3e56] __skb_checksum_complete+0xc/0x11
   [805046c7] tcp_v4_rcv+0x157/0x810
   [804d8a6f] dev_queue_xmit+0x237/0x260
   [80229990] find_busiest_group+0x252/0x684
   [804ead50] ip_local_deliver+0xca/0x14c
   [804eb24a] ip_rcv+0x478/0x4ba
   [803ec7d3] sky2_poll+0x6f9/0x9b9
   [8022bd5d]
  run_rebalance_domains+0x13e/0x408
   [804d877a] net_rx_action+0xa8/0x166
   [80235d62] __do_softirq+0x55/0xc3
   [8020a4ec] call_softirq+0x1c/0x28
   [8020b611] do_softirq+0x2c/0x7d
   [8020b8cf] do_IRQ+0x13e/0x15f
   [802086ce] mwait_idle+0x0/0x46
   [80209871] ret_from_intr+0x0/0xa
   EOI  [80208710] mwait_idle+0x42/0x46
   [80208666] cpu_idle+0x8c/0xaf
   [8078174e] start_kernel+0x2ac/0x2b8
   [80781140] _sinittext+0x140/0x144
 
 
  So I tried running the latest snapshot 2.6.23-rc3
  and the almost the same thing happens. Only
  difference is that now the entire box locks up.
  The error is almost the same
 
  eth1: hw csum failure.
 
  Call Trace:
   IRQ  [804779b6]
  __skb_checksum_complete_head+0x43/0x56
   [804779d5] __skb_checksum_complete+0xc/0x11
   [804a989d] tcp_v4_rcv+0x14e/0x801
   [8048ff84] ip_local_deliver+0xca/0x14c
   [80490472] ip_rcv+0x46c/0x4ae
   [880060f9] :sky2:sky2_poll+0x72b/0x9c7
   [8047c934] net_rx_action+0xa8/0x166
   [80235ced] __do_softirq+0x55/0xc4
   [8020c5cc] call_softirq+0x1c/0x28
   [8020d6fd] do_softirq+0x2c/0x7d
   [8020d9bb] do_IRQ+0x13e/0x15f
   [8020a780] mwait_idle+0x0/0x48
   [8020b951] ret_from_intr+0x0/0xa
   EOI  [880063e7]
  :sky2:sky2_xmit_frame+0x0/0x41d
   [8020a7c2] mwait_idle+0x42/0x48
   [8020a718] cpu_idle+0xbd/0xe0
   [80704a5a] start_kernel+0x2ac/0x2b8
   [80704140] _sinittext+0x140/0x144
 
 
  I see that the new kernel includes some sort of
  SKY2 DEBUG stuff. I would be happy to rerun
  the test with that turned on, given some minor
  direction.
 
 
 Regards,
 Michal

Please reproduce with a more recent kernel?
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] backport of sky2 stability fixes

2007-08-16 Thread Stephen Hemminger
These are bugfixes that happened late in 2.6.22 cycle that missed
getting merged.

-- 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] sky2: restore workarounds for lost interrupts

2007-08-16 Thread Stephen Hemminger
Backport of commit c59697e06058fc2361da8cefcfa3de85ac107582

This patch restores a couple of workarounds from 2.6.16:
 * restart transmit moderation timer in case it expires during IRQ routine
 * default to having 10 HZ watchdog timer.
At this point it more important not to hang than to worry about the
power cost.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/drivers/net/sky2.c2007-08-08 21:50:10.0 +0100
+++ b/drivers/net/sky2.c2007-08-08 22:01:00.0 +0100
@@ -96,7 +96,7 @@ static int disable_msi = 0;
 module_param(disable_msi, int, 0);
 MODULE_PARM_DESC(disable_msi, Disable Message Signaled Interrupt (MSI));
 
-static int idle_timeout = 0;
+static int idle_timeout = 100;
 module_param(idle_timeout, int, 0);
 MODULE_PARM_DESC(idle_timeout, Watchdog timer for lost interrupts (ms));
 
@@ -2442,6 +2442,13 @@ static int sky2_poll(struct net_device *
 
work_done = sky2_status_intr(hw, work_limit);
if (work_done  work_limit) {
+   /* Bug/Errata workaround?
+* Need to kick the TX irq moderation timer.
+*/
+   if (sky2_read8(hw, STAT_TX_TIMER_CTRL) == TIM_START) {
+   sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
+   sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
+   }
netif_rx_complete(dev0);
 
/* end of interrupt, re-enables also acts as I/O 
synchronization */

-- 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] sky2: check for more work before leaving NAPI

2007-08-16 Thread Stephen Hemminger
Backport of commit 5c11ce700f77fada15b6264417d72462da4bbb1c

This patch avoids generating another IRQ if more packets
arrive while in the NAPI poll routine. Before marking device as
finished, it rechecks that the status ring is empty.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/drivers/net/sky2.c2007-08-08 22:01:11.0 +0100
+++ b/drivers/net/sky2.c2007-08-08 22:01:28.0 +0100
@@ -2428,8 +2428,7 @@ static void sky2_err_intr(struct sky2_hw
 static int sky2_poll(struct net_device *dev0, int *budget)
 {
struct sky2_hw *hw = ((struct sky2_port *) netdev_priv(dev0))-hw;
-   int work_limit = min(dev0-quota, *budget);
-   int work_done = 0;
+   int work_done;
u32 status = sky2_read32(hw, B0_Y2_SP_EISR);
 
if (unlikely(status  Y2_IS_ERROR))
@@ -2441,25 +2440,25 @@ static int sky2_poll(struct net_device *
if (status  Y2_IS_IRQ_PHY2)
sky2_phy_intr(hw, 1);
 
-   work_done = sky2_status_intr(hw, work_limit);
-   if (work_done  work_limit) {
-   /* Bug/Errata workaround?
-* Need to kick the TX irq moderation timer.
-*/
-   if (sky2_read8(hw, STAT_TX_TIMER_CTRL) == TIM_START) {
-   sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
-   sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
-   }
-   netif_rx_complete(dev0);
+   work_done = sky2_status_intr(hw, min(dev0-quota, *budget));
+   *budget -= work_done;
+   dev0-quota -= work_done;
 
-   /* end of interrupt, re-enables also acts as I/O 
synchronization */
-   sky2_read32(hw, B0_Y2_SP_LISR);
-   return 0;
-   } else {
-   *budget -= work_done;
-   dev0-quota -= work_done;
+   /* More work? */
+   if (hw-st_idx != sky2_read16(hw, STAT_PUT_IDX))
return 1;
+
+   /* Bug/Errata workaround?
+* Need to kick the TX irq moderation timer.
+*/
+   if (sky2_read8(hw, STAT_TX_TIMER_CTRL) == TIM_START) {
+   sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
+   sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
}
+   netif_rx_complete(dev0);
+
+   sky2_read32(hw, B0_Y2_SP_LISR);
+   return 0;
 }
 
 static irqreturn_t sky2_intr(int irq, void *dev_id)

-- 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] sky2: check drop truncated packets

2007-08-16 Thread Stephen Hemminger
Backport of commit 71749531f2d1954137a1a77422ef4ff29eb102dd

If packet larger than MTU is received, the driver uses hardware to
truncate the packet. Use the status registers to catch/drop them.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/drivers/net/sky2.c2007-08-08 22:01:28.0 +0100
+++ b/drivers/net/sky2.c2007-08-08 22:01:37.0 +0100
@@ -2065,6 +2065,9 @@ static struct sk_buff *sky2_receive(stru
if (!(status  GMR_FS_RX_OK))
goto resubmit;
 
+   if (status  16 != length)
+   goto len_mismatch;
+
if (length  copybreak)
skb = receive_copy(sky2, re, length);
else
@@ -2074,6 +2077,11 @@ resubmit:
 
return skb;
 
+len_mismatch:
+   /* Truncation of overlength packets
+  causes PHY length to not match MAC length */
+   ++sky2-net_stats.rx_length_errors;
+
 error:
++sky2-net_stats.rx_errors;
if (status  GMR_FS_RX_FF_OV) {

-- 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] sky2: carrier management

2007-08-16 Thread Stephen Hemminger
backport of commit 55d7b4e6ed6ad3ec5e5e30b3b4515a0a6a53e344

Make sky2 handle carrier similar to other drivers,
eliminate some possible races in carrier state transistions.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/drivers/net/sky2.c2007-08-08 22:01:00.0 +0100
+++ b/drivers/net/sky2.c2007-08-08 22:01:11.0 +0100
@@ -1234,6 +1234,8 @@ static int sky2_up(struct net_device *de
if (netif_msg_ifup(sky2))
printk(KERN_INFO PFX %s: enabling interface\n, dev-name);
 
+   netif_carrier_off(dev);
+
/* must be power of 2 */
sky2-tx_le = pci_alloc_consistent(hw-pdev,
   TX_RING_SIZE *
@@ -1573,7 +1575,6 @@ static int sky2_down(struct net_device *
 
/* Stop more packets from being queued */
netif_stop_queue(dev);
-   netif_carrier_off(dev);
 
/* Disable port IRQ */
imask = sky2_read32(hw, B0_IMSK);
@@ -1625,6 +1626,8 @@ static int sky2_down(struct net_device *
 
sky2_phy_power(hw, port, 0);
 
+   netif_carrier_off(dev);
+
/* turn off LED's */
sky2_write16(hw, B0_Y2LED, LED_STAT_OFF);
 
@@ -1689,7 +1692,6 @@ static void sky2_link_up(struct sky2_por
gm_phy_write(hw, port, PHY_MARV_INT_MASK, PHY_M_DEF_MSK);
 
netif_carrier_on(sky2-netdev);
-   netif_wake_queue(sky2-netdev);
 
/* Turn on link LED */
sky2_write8(hw, SK_REG(port, LNK_LED_REG),
@@ -1741,7 +1743,6 @@ static void sky2_link_down(struct sky2_p
gma_write16(hw, port, GM_GP_CTRL, reg);
 
netif_carrier_off(sky2-netdev);
-   netif_stop_queue(sky2-netdev);
 
/* Turn on link LED */
sky2_write8(hw, SK_REG(port, LNK_LED_REG), LINKLED_OFF);
@@ -3493,10 +3494,6 @@ static __devinit struct net_device *sky2
memcpy_fromio(dev-dev_addr, hw-regs + B2_MAC_1 + port * 8, ETH_ALEN);
memcpy(dev-perm_addr, dev-dev_addr, dev-addr_len);
 
-   /* device is off until link detection */
-   netif_carrier_off(dev);
-   netif_stop_queue(dev);
-
return dev;
 }
 

-- 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: skge and sky2 stop working after a few minutes.

2007-08-16 Thread Stephen Hemminger
On Wed, 15 Aug 2007 21:20:37 + (UTC)
Sébastien Judenherc  [EMAIL PROTECTED] wrote:

 Hello,
 I can see some very strange problem with sky2 and skge interfaces with 2.6.22,
 and also 2.6.23-rc2/3.
 After 8-9 minutes, the interfaces stop working. ethtool reports that the link 
 is
 down and the only way to make the interfaces usable again is
 removing/reinserting the module or running ethtool -r.
 This occurs for both sky2 and skge for two PCs connected to a gigabit switch.
 The other PCs on the same switch continue to work fine (most are 100Mbit/s). 
 It
 also occurs on a third PC with skge, directly connected to a 100Mbit/s 
 interface
 over a cross over cable. I didn't notice the problem with 2.6.19.
 

Looks like some noise sensitivity in the PHY. Could you try the vendor sk98lin
driver, although all drivers use the same phy settings.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] net: add dev_get_stats

2007-08-16 Thread Stephen Hemminger
Since we now have internal stats, it cleans up code to have a
dev_get_stats() interface.  This allows for future patches where
'network device ops' patch where get_stats is immutable.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/include/linux/netdevice.h 2007-08-16 06:34:25.0 -0400
+++ b/include/linux/netdevice.h 2007-08-16 06:38:33.0 -0400
@@ -809,6 +809,7 @@ extern int  dev_set_mac_address(struct n
struct sockaddr *);
 extern int dev_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev);
+extern struct net_device_stats *dev_get_stats(struct net_device *dev);
 
 extern int netdev_budget;
 
--- a/net/core/dev.c2007-08-16 06:34:25.0 -0400
+++ b/net/core/dev.c2007-08-16 08:26:04.0 -0400
@@ -2304,7 +2304,7 @@ void dev_seq_stop(struct seq_file *seq, 
 
 static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
 {
-   struct net_device_stats *stats = dev-get_stats(dev);
+   struct net_device_stats *stats = dev_get_stats(dev);
 
seq_printf(seq, %6s:%8lu %7lu %4lu %4lu %4lu %5lu %10lu %9lu 
   %8lu %7lu %4lu %4lu %4lu %5lu %7lu %10lu\n,
@@ -3682,10 +3682,21 @@ out:
mutex_unlock(net_todo_run_mutex);
 }
 
-static struct net_device_stats *internal_stats(struct net_device *dev)
+/**
+ * dev_get_stats - get network device statistics
+ * @dev:   network device
+ *
+ * Get standard network device statistics.
+ * Use internal stastics unless device overides
+ */
+struct net_device_stats *dev_get_stats(struct net_device *dev)
 {
-   return dev-stats;
+   if (!dev-get_stats)
+   return dev-stats;
+
+   return dev-get_stats(dev);
 }
+EXPORT_SYMBOL(dev_get_stats);
 
 /**
  * alloc_netdev_mq - allocate network device
@@ -3733,7 +3744,6 @@ struct net_device *alloc_netdev_mq(int s
 
dev-egress_subqueue_count = queue_count;
 
-   dev-get_stats = internal_stats;
setup(dev);
strcpy(dev-name, name);
return dev;
--- a/net/core/net-sysfs.c  2007-08-16 06:34:25.0 -0400
+++ b/net/core/net-sysfs.c  2007-08-16 09:23:13.0 -0400
@@ -264,8 +264,7 @@ static ssize_t netstat_show(const struct
WARN_ON(1);
 
read_lock(dev_base_lock);
-   if (dev_isalive(dev)  dev-get_stats 
-   (stats = (*dev-get_stats)(dev)))
+   if (dev_isalive(dev)  (stats = dev_get_stats(dev)))
ret = sprintf(buf, fmt_ulong,
  *(unsigned long *)(((u8 *) stats) + offset));
 
@@ -481,8 +480,7 @@ int netdev_register_sysfs(struct net_dev
BUILD_BUG_ON(BUS_ID_SIZE  IFNAMSIZ);
strlcpy(dev-bus_id, net-name, BUS_ID_SIZE);
 
-   if (net-get_stats)
-   *groups++ = netstat_group;
+   *groups++ = netstat_group;
 
 #ifdef CONFIG_WIRELESS_EXT
if (net-wireless_handlers  
net-wireless_handlers-get_wireless_stats)
--- a/net/core/rtnetlink.c  2007-08-16 06:34:25.0 -0400
+++ b/net/core/rtnetlink.c  2007-08-16 08:26:13.0 -0400
@@ -619,6 +619,7 @@ static int rtnl_fill_ifinfo(struct sk_bu
 {
struct ifinfomsg *ifm;
struct nlmsghdr *nlh;
+   struct net_device_stats *stats;
 
nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ifm), flags);
if (nlh == NULL)
@@ -666,18 +667,13 @@ static int rtnl_fill_ifinfo(struct sk_bu
NLA_PUT(skb, IFLA_BROADCAST, dev-addr_len, dev-broadcast);
}
 
-   if (dev-get_stats) {
-   struct net_device_stats *stats = dev-get_stats(dev);
-   if (stats) {
-   struct nlattr *attr;
-
-   attr = nla_reserve(skb, IFLA_STATS,
-  sizeof(struct rtnl_link_stats));
-   if (attr == NULL)
-   goto nla_put_failure;
+   if ((stats = dev_get_stats(dev))) {
+   struct nlattr *attr = nla_reserve(skb, IFLA_STATS,
+ sizeof(struct 
rtnl_link_stats));
+   if (attr == NULL)
+   goto nla_put_failure;
 
-   copy_rtnl_link_stats(nla_data(attr), stats);
-   }
+   copy_rtnl_link_stats(nla_data(attr), stats);
}
 
if (dev-rtnl_link_ops) {

-- 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] net: cleanup left over decl

2007-08-16 Thread Stephen Hemminger
Remove unneeded declaration.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/include/linux/netdevice.h 2007-08-15 14:29:02.0 +0100
+++ b/include/linux/netdevice.h 2007-08-15 14:35:25.0 +0100
@@ -833,8 +833,6 @@ extern int  dev_set_mac_address(struct n
 extern int dev_hard_start_xmit(struct sk_buff *skb,
struct net_device *dev);
 
-extern voiddev_init(void);
-
 extern int netdev_budget;
 
 /* Called by rtnetlink.c:rtnl_unlock() */

-- 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] Small network device interface changes

2007-08-16 Thread Stephen Hemminger
These cleanups should go in 2.6.24. Hope they don't conflict too bad
with the NAPI stuff.

-- 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] net: deinline dev_kfree_skb_irq

2007-08-16 Thread Stephen Hemminger
Deinline dev_kfree_skb_irq. This saves about 100bytes per call
site on UP, probably more on SMP.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
--- a/include/linux/netdevice.h 2007-08-06 09:26:41.0 +0100
+++ b/include/linux/netdevice.h 2007-08-15 14:12:32.0 +0100
@@ -793,29 +793,6 @@ static inline int netif_is_multiqueue(co
 #endif
 }
 
-/* Use this variant when it is known for sure that it
- * is executing from interrupt context.
- */
-static inline void dev_kfree_skb_irq(struct sk_buff *skb)
-{
-   if (atomic_dec_and_test(skb-users)) {
-   struct softnet_data *sd;
-   unsigned long flags;
-
-   local_irq_save(flags);
-   sd = __get_cpu_var(softnet_data);
-   skb-next = sd-completion_queue;
-   sd-completion_queue = skb;
-   raise_softirq_irqoff(NET_TX_SOFTIRQ);
-   local_irq_restore(flags);
-   }
-}
-
-/* Use this variant in places where it could be invoked
- * either from interrupt or non-interrupt context.
- */
-extern void dev_kfree_skb_any(struct sk_buff *skb);
-
 #define HAVE_NETIF_RX 1
 extern int netif_rx(struct sk_buff *skb);
 extern int netif_rx_ni(struct sk_buff *skb);
--- a/include/linux/skbuff.h2007-08-06 09:26:43.0 +0100
+++ b/include/linux/skbuff.h2007-08-15 14:09:04.0 +0100
@@ -378,6 +378,8 @@ extern int skb_cow_data(struct sk
struct sk_buff **trailer);
 extern intskb_pad(struct sk_buff *skb, int pad);
 #define dev_kfree_skb(a)   kfree_skb(a)
+extern void  dev_kfree_skb_any(struct sk_buff *skb);
+extern void  dev_kfree_skb_irq(struct sk_buff *skb);
 extern void  skb_over_panic(struct sk_buff *skb, int len,
 void *here);
 extern void  skb_under_panic(struct sk_buff *skb, int len,
--- a/net/core/dev.c2007-08-06 09:26:48.0 +0100
+++ b/net/core/dev.c2007-08-15 14:12:16.0 +0100
@@ -1249,6 +1249,36 @@ void __netif_rx_schedule(struct net_devi
 }
 EXPORT_SYMBOL(__netif_rx_schedule);
 
+/**
+ * dev_kfree_skb_irq - free skb from IRQ context
+ * @skb: skb to free
+ *
+ * Queue skb for later release. Used when skb needs to be
+ * free but executing in IRQ context.
+ */
+void dev_kfree_skb_irq(struct sk_buff *skb)
+{
+   if (atomic_dec_and_test(skb-users)) {
+   struct softnet_data *sd;
+   unsigned long flags;
+
+   local_irq_save(flags);
+   sd = __get_cpu_var(softnet_data);
+   skb-next = sd-completion_queue;
+   sd-completion_queue = skb;
+   raise_softirq_irqoff(NET_TX_SOFTIRQ);
+   local_irq_restore(flags);
+   }
+}
+EXPORT_SYMBOL(dev_kfree_skb_irq);
+
+/**
+ * dev_kfree_skb_any - free skb from any context
+ * @skb: skb to free
+ *
+ * Free skb from interrupt or non-interrupt context.
+ * Use this variant in places where context is not easily determined.
+ */
 void dev_kfree_skb_any(struct sk_buff *skb)
 {
if (in_irq() || irqs_disabled())

-- 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] net: netdev_budget rearrangement

2007-08-16 Thread Stephen Hemminger
Trivial patch, move netdev_budget declaration out of netdevice.h
to a new home with the other sysctl externs.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/include/linux/netdevice.h 2007-08-16 06:38:33.0 -0400
+++ b/include/linux/netdevice.h 2007-08-16 09:24:08.0 -0400
@@ -811,8 +811,6 @@ extern int  dev_hard_start_xmit(struct s
struct net_device *dev);
 extern struct net_device_stats *dev_get_stats(struct net_device *dev);
 
-extern int netdev_budget;
-
 /* Called by rtnetlink.c:rtnl_unlock() */
 extern void netdev_run_todo(void);
 
--- a/net/core/sysctl_net_core.c2007-08-06 04:26:48.0 -0400
+++ b/net/core/sysctl_net_core.c2007-08-16 09:24:08.0 -0400
@@ -13,6 +13,7 @@
 
 #ifdef CONFIG_SYSCTL
 
+extern int netdev_budget;
 extern int netdev_max_backlog;
 extern int weight_p;
 

-- 

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?

2007-08-16 Thread Eric Dumazet
On Thu, 16 Aug 2007 20:41:58 +0800
Herbert Xu [EMAIL PROTECTED] wrote:

 Eric Dumazet [EMAIL PROTECTED] wrote:
 
  I am wondering how to really solve the problem. Could a workqueue be used 
  here instead of a timer ?
 
 I think so.
 
 A mutex would separate the GC and dst_ifdown.  You'd take the
 spin lock in the GC only to replace the garbage list with NULL.
 You then drop the lock to process it.  Once it's done you take
 the lock again and join it with whatever that's been added in
 the mean time.  This is easy because you should already have
 the tail after the GC process.

Thanks Herbert

Yes, I already did this (with the current softirq based timer model),
 but how can dst_dev_event() do its work, since the GC is using 
a private list. (In my patch, time to GC process XXX.000 entries is about XX 
seconds.)

We would have to change dst_dev_event() to :
- Signal to GC it has to stop as soon as possible.
- Wait for GC be stoped (busy wait I suspect we cannot sleep in dst_dev_event() 
? )
, giving us a full garbage_list.
- Process the whole list.
- Restart GC

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-16 Thread Tom Tucker
On Wed, 2007-08-15 at 22:26 -0400, Jeff Garzik wrote:

[...snip...]

  I think removing the RDMA stack is the wrong thing to do, and you 
  shouldn't just threaten to yank entire subsystems because you don't like 
  the technology.  Lets keep this constructive, can we?  RDMA should get 
  the respect of any other technology in Linux.  Maybe its a niche in your 
  opinion, but come on, there's more RDMA users than say, the sparc64 
  port.  Eh?
 
 It's not about being a niche.  It's about creating a maintainable 
 software net stack that has predictable behavior.

Isn't RDMA _part_ of the software net stack within Linux? Why isn't
making RDMA stable, supportable and maintainable equally as important as
any other subsystem? 

 
 Needing to reach out of the RDMA sandbox and reserve net stack resources 
 away from itself travels a path we've consistently avoided.
 
 
  I will NACK any patch that opens up sockets to eat up ports or
  anything stupid like that.
  
  Got it.
 
 Ditto for me as well.
 
   Jeff
 
 
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] use correct array index. (array is just 6 bytes long)

2007-08-16 Thread Marcus Meissner
From: Marcus Meissner [EMAIL PROTECTED]

Use correct array index (goes from 0-6 instead of 10-16).

Signed-Off-By: Marcus Meissner [EMAIL PROTECTED]
---
 drivers/net/tokenring/3c359.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/tokenring/3c359.c b/drivers/net/tokenring/3c359.c
index 9f1b6ab..5e2437e 100644
--- a/drivers/net/tokenring/3c359.c
+++ b/drivers/net/tokenring/3c359.c
@@ -763,7 +763,7 @@ static int xl_open_hw(struct net_device
if (xl_priv-xl_laa[0]) {  /* If using a LAA address */
for (i=10;i16;i++) { 
writel( (MEM_BYTE_WRITE | 0xD | xl_priv-srb) + i, 
xl_mmio + MMIO_MAC_ACCESS_CMD) ; 
-   writeb(xl_priv-xl_laa[i],xl_mmio + MMIO_MACDATA) ; 
+   writeb(xl_priv-xl_laa[i-10],xl_mmio + MMIO_MACDATA) ;
}
memcpy(dev-dev_addr,xl_priv-xl_laa,dev-addr_len) ; 
} else { /* Regular hardware address */ 
-- 
1.4.3.4
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: drivers/infiniband/mlx/mad.c misplaced ;

2007-08-16 Thread Andy Whitcroft
Kok, Auke wrote:
 Joe Perches wrote:
 On Wed, 2007-08-15 at 19:58 -0400, Dave Jones wrote:
 Signed-off-by: Dave Jones [EMAIL PROTECTED]

 diff --git a/drivers/infiniband/hw/mlx4/mad.c
 b/drivers/infiniband/hw/mlx4/mad.c
 index 3330917..0ed02b7 100644
 --- a/drivers/infiniband/hw/mlx4/mad.c
 +++ b/drivers/infiniband/hw/mlx4/mad.c
 @@ -109,7 +109,7 @@ int mlx4_MAD_IFC(struct mlx4_ib_dev *dev, int
 ignore_mkey, int ignore_bkey,
 in_modifier, op_modifier,
 MLX4_CMD_MAD_IFC, MLX4_CMD_TIME_CLASS_C);
  
 -if (!err);
 +if (!err)

 There's more than a few of these (not inspected).

 $ egrep -r --include=*.c \bif[[:space:]]*\([^\)]*\)[[:space:]]*\; *
 arch/sh/boards/se/7343/io.c:if (0) ;
 drivers/atm/iphase.c: if (!desc1) ;
 drivers/infiniband/hw/mlx4/mad.c:   if (!err);
 drivers/isdn/capi/capiutil.c:   else if (c = 0x0f);
 drivers/net/tokenring/ibmtr.c:  else if (ti-shared_ram_paging ==
 0xf);  /* No paging in adapter */
 drivers/s390/scsi/zfcp_erp.c:   if (status ==
 ZFCP_ERP_SUCCEEDED) ; /* no further action */
 fs/hostfs/hostfs_user.c:if(attrs-ia_valid  HOSTFS_ATTR_CTIME) ;
 net/netfilter/xt_u32.c: if (skb-len  4 || pos  skb-len - 4);
 sound/pci/au88x0/au88x0_synth.c:if
 (eax == 0) ;
 
 sounds like an excellent candidate check for checkpatch.pl !!!

Yep.  My scan of 2.6.23-rc3 with a checkpatch with this new test added,
found 6 which seemed wrong.

-apw
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PROBLEM: 2.6.23-rc NETDEV WATCHDOG: eth0: transmit timed out

2007-08-16 Thread Francois Romieu
(please do not remove the netdev Cc:)

Francois Romieu [EMAIL PROTECTED] :
[...]
 If it does not work I'll dissect 0e4851502f846b13b29b7f88f1250c980d57e944
 tomorrow.

You will find a tgz archive in attachment which contains a serie of patches
(0001-... to 0005-...) to walk from 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2
to 0e4851502f846b13b29b7f88f1250c980d57e944 in smaller steps.

Please apply 0001 on top of 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2. If it
still works, apply 0002 on top of 0001, etc.

-- 
Ueimor


r8169-meyer.tgz
Description: GNU Zip compressed data


Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?

2007-08-16 Thread Herbert Xu
Eric Dumazet [EMAIL PROTECTED] wrote:
 
 Yes, I already did this (with the current softirq based timer model),
 but how can dst_dev_event() do its work, since the GC is using 
 a private list. (In my patch, time to GC process XXX.000 entries is about XX 
 seconds.)
 
 We would have to change dst_dev_event() to :
 - Signal to GC it has to stop as soon as possible.
 - Wait for GC be stoped (busy wait I suspect we cannot sleep in 
 dst_dev_event() ? )

You can sleep in dst_dev_event so just use a mutex to separate
it from the GC.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] lockdep: annotate rcu_read_{,un}lock()

2007-08-16 Thread Peter Zijlstra

There seem to be some unbalanced rcu_read_{,un}lock() issues of late,
how about doing something like this:

Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
---
 include/linux/lockdep.h  |7 +++
 include/linux/rcupdate.h |   12 
 kernel/rcupdate.c|8 
 3 files changed, 27 insertions(+)

Index: linux-2.6/include/linux/lockdep.h
===
--- linux-2.6.orig/include/linux/lockdep.h
+++ linux-2.6/include/linux/lockdep.h
@@ -252,6 +252,13 @@ extern void lockdep_init_map(struct lock
 struct lock_class_key *key, int subclass);
 
 /*
+ * To initialize a lockdep_map statically use this macro.
+ * Note that _name must not be NULL.
+ */
+#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
+   { .name = (_name), .key = (void *)(_key), }
+
+/*
  * Reinitialize a lock key - for cases where there is special locking or
  * special initialization of locks so that the validator gets the scope
  * of dependencies wrong: they are either too broad (they need a class-split)
Index: linux-2.6/include/linux/rcupdate.h
===
--- linux-2.6.orig/include/linux/rcupdate.h
+++ linux-2.6/include/linux/rcupdate.h
@@ -41,6 +41,7 @@
 #include linux/percpu.h
 #include linux/cpumask.h
 #include linux/seqlock.h
+#include linux/lockdep.h
 
 /**
  * struct rcu_head - callback structure for use with RCU
@@ -133,6 +134,15 @@ static inline void rcu_bh_qsctr_inc(int 
 extern int rcu_pending(int cpu);
 extern int rcu_needs_cpu(int cpu);
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+extern struct lockdep_map rcu_lock_map;
+# define rcu_read_acquire()lock_acquire(rcu_lock_map, 0, 0, 2, 1, 
_THIS_IP_)
+# define rcu_read_release()lock_release(rcu_lock_map, 1, _THIS_IP_)
+#else
+# define rcu_read_acquire()do { } while (0)
+# define rcu_read_release()do { } while (0)
+#endif
+
 /**
  * rcu_read_lock - mark the beginning of an RCU read-side critical section.
  *
@@ -166,6 +176,7 @@ extern int rcu_needs_cpu(int cpu);
do { \
preempt_disable(); \
__acquire(RCU); \
+   rcu_read_acquire(); \
} while(0)
 
 /**
@@ -175,6 +186,7 @@ extern int rcu_needs_cpu(int cpu);
  */
 #define rcu_read_unlock() \
do { \
+   rcu_read_release(); \
__release(RCU); \
preempt_enable(); \
} while(0)
Index: linux-2.6/kernel/rcupdate.c
===
--- linux-2.6.orig/kernel/rcupdate.c
+++ linux-2.6/kernel/rcupdate.c
@@ -49,6 +49,14 @@
 #include linux/cpu.h
 #include linux/mutex.h
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+static struct lock_class_key rcu_lock_key;
+struct lockdep_map rcu_lock_map =
+   STATIC_LOCKDEP_MAP_INIT(rcu_read_lock, rcu_lock_key);
+
+EXPORT_SYMBOL_GPL(rcu_lock_map);
+#endif
+
 /* Definition for rcupdate control block. */
 static struct rcu_ctrlblk rcu_ctrlblk = {
.cur = -300,


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Ilpo Järvinen
On Thu, 16 Aug 2007, Herbert Xu wrote:

 We've been through that already.  If it's a busy-wait it
 should use cpu_relax. 

I looked around a bit by using some command lines and ended up wondering 
if these are equal to busy-wait case (and should be fixed) or not:

./drivers/telephony/ixj.c
6674:   while (atomic_read(j-DSPWrite)  0)
6675-   atomic_dec(j-DSPWrite);

...besides that, there are couple of more similar cases in the same file 
(with braces)...


-- 
 i.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] nl80211: remove VLAN type

2007-08-16 Thread Johannes Berg
On Wed, 2007-08-15 at 11:18 +0200, Johannes Berg wrote:
 There is no point in trying to handle source MAC address based VLANs in
 the wireless stack, something like smacvlan modeled after macvlan
 should be sufficient.

Whoops. I only forgot the use with encryption. Please consider this
patch withdrawn, at least until I see clearer through that crypto issue.

johannes


signature.asc
Description: This is a digitally signed message part


Re: drivers/infiniband/mlx/mad.c misplaced ;

2007-08-16 Thread Jeff Dike
On Wed, Aug 15, 2007 at 05:40:11PM -0700, Joe Perches wrote:
 fs/hostfs/hostfs_user.c:if(attrs-ia_valid  HOSTFS_ATTR_CTIME) ;

This one can be deleted, but I think I did it for documentation
reasons, to make it clear that ctime handling wasn't left out by
mistake.

Jeff

-- 
Work email - jdike at linux dot intel dot com
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?

2007-08-16 Thread Eric Dumazet
On Thu, 16 Aug 2007 21:59:43 +0800
Herbert Xu [EMAIL PROTECTED] wrote:

 Eric Dumazet [EMAIL PROTECTED] wrote:
  
  Yes, I already did this (with the current softirq based timer model),
  but how can dst_dev_event() do its work, since the GC is using 
  a private list. (In my patch, time to GC process XXX.000 entries is about 
  XX seconds.)
  
  We would have to change dst_dev_event() to :
  - Signal to GC it has to stop as soon as possible.
  - Wait for GC be stoped (busy wait I suspect we cannot sleep in 
  dst_dev_event() ? )
 
 You can sleep in dst_dev_event so just use a mutex to separate
 it from the GC.

Thanks for the suggestion, I added dst_mutex and attempt a mutex_trylock() 
inside dst_run_gc().

So do you think this patch is enough or should we convert dst_run_gc processing 
from softirq to workqueue too ?

Eric

[PATCH] net/core/dst.c : let dst_run_gc() do its job incrementally

When the periodic route cache flush is done, we can feed dst_garbage_list with 
a huge number of dst_entry structs.

Then dst_run_gc() is doing a scan of a possibly huge list of dst_entries, 
eventually freeing some (less than 1%) of them, while holding the dst_lock 
spinlock for the whole scan. This sometimes triggers a soft lockup. 

Then it rearms a timer to redo the full thing 1/10 s later. The slowdown can 
last one minute or so.
No need to say that machine is not really usable in the meantime.

This patch attempts to solve the problem giving dst_run_gc() the ability to 
perform its work by chunks instead of whole list scan. We limit a chunk by the 
number of entries that could not be freed in a run. This should limit the CPU 
cache trashing (currently 128 bytes per entry on x86_64), yet giving a chance 
to free unreferenced entries (not included in the quota) if the load is really 
high.

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

--- linux-2.6.22/net/core/dst.c.orig
+++ linux-2.6.22/net/core/dst.c
@@ -9,6 +9,7 @@
 #include linux/errno.h
 #include linux/init.h
 #include linux/kernel.h
+#include linux/mutex.h
 #include linux/mm.h
 #include linux/module.h
 #include linux/netdevice.h
@@ -28,10 +29,17 @@
  * 4) All operations modify state, so a spinlock is used.
  */
 static struct dst_entry*dst_garbage_list;
+static struct dst_entry *dst_garbage_wrk;
+static struct dst_entry *dst_garbage_inuse;
+static int dst_gc_running;
 #if RT_CACHE_DEBUG = 2
 static atomic_t dst_total = ATOMIC_INIT(0);
 #endif
 static DEFINE_SPINLOCK(dst_lock);
+/*
+ * A mutex is used to synchronize GC with dst_dev_event()
+ */
+static DEFINE_MUTEX(dst_mutex);
 
 static unsigned long dst_gc_timer_expires;
 static unsigned long dst_gc_timer_inc = DST_GC_MAX;
@@ -42,26 +50,39 @@ static DEFINE_TIMER(dst_gc_timer, dst_ru
 
 static void dst_run_gc(unsigned long dummy)
 {
-   intdelayed = 0;
-   intwork_performed;
-   struct dst_entry * dst, **dstp;
+   intquota = 1000;
+   intwork_performed = 0;
+   struct dst_entry *dst, *next;
+   struct dst_entry *first = NULL, *last = NULL;
 
-   if (!spin_trylock(dst_lock)) {
+   if (!mutex_trylock(dst_mutex)) {
mod_timer(dst_gc_timer, jiffies + HZ/10);
return;
}
+   dst_gc_running = 1;
 
+   spin_lock(dst_lock);
del_timer(dst_gc_timer);
-   dstp = dst_garbage_list;
-   work_performed = 0;
-   while ((dst = *dstp) != NULL) {
-   if (atomic_read(dst-__refcnt)) {
-   dstp = dst-next;
-   delayed++;
+   if (!dst_garbage_wrk) {
+   dst_garbage_wrk = dst_garbage_list;
+   dst_garbage_list = dst_garbage_inuse;
+   dst_garbage_inuse = NULL;
+   }
+   next = dst_garbage_wrk;
+   spin_unlock(dst_lock);
+
+   while ((dst = next) != NULL) {
+   next = dst-next;
+   if (likely(atomic_read(dst-__refcnt))) {
+   if (unlikely(last == NULL))
+   last = dst;
+   dst-next = first;
+   first = dst;
+   if (--quota == 0)
+   break;
continue;
}
-   *dstp = dst-next;
-   work_performed = 1;
+   work_performed++;
 
dst = dst_destroy(dst);
if (dst) {
@@ -77,16 +98,26 @@ static void dst_run_gc(unsigned long dum
continue;
 
___dst_free(dst);
-   dst-next = *dstp;
-   *dstp = dst;
-   dstp = dst-next;
+   dst-next = next;
+   next = dst;
}
}
-   if (!dst_garbage_list) {
+   dst_garbage_wrk = next;
+
+   spin_lock(dst_lock);
+   dst_gc_running = 0;
+   if (last != NULL) {
+   last-next = 

Re: [patch 2/4] Update e100 driver to use devres.

2007-08-16 Thread Brandon Philips
On 20:34 Thu 16 Aug 2007, Tejun Heo wrote:
 Brandon Philips wrote:
  @@ -2554,8 +2547,10 @@ static int __devinit e100_probe(struct p
  struct net_device *netdev;
  struct nic *nic;
  int err;
  +   void __iomem **iomap;
  +   int bar;
   
  -   if(!(netdev = alloc_etherdev(sizeof(struct nic {
  +   if (!(netdev = devm_alloc_etherdev(pdev-dev, sizeof(struct nic {
  if(((1  debug) - 1)  NETIF_MSG_PROBE)
  printk(KERN_ERR PFX Etherdev alloc failed, abort.\n);
  return -ENOMEM;
  @@ -2585,26 +2580,28 @@ static int __devinit e100_probe(struct p
  nic-msg_enable = (1  debug) - 1;
  pci_set_drvdata(pdev, netdev);
   
  -   if((err = pci_enable_device(pdev))) {
  +   if ((err = pcim_enable_device(pdev))) {
  DPRINTK(PROBE, ERR, Cannot enable PCI device, aborting.\n);
  -   goto err_out_free_dev;
  +   return err;
  }
   
  if(!(pci_resource_flags(pdev, 0)  IORESOURCE_MEM)) {
  DPRINTK(PROBE, ERR, Cannot find proper PCI device 
  base address, aborting.\n);
  err = -ENODEV;
  -   goto err_out_disable_pdev;
  +   return err;
  }
   
  -   if((err = pci_request_regions(pdev, DRV_NAME))) {
  +   bar = use_io ? 1 : 0;
  +   if((err = pcim_iomap_regions(pdev, 1  bar, DRV_NAME))) {
  DPRINTK(PROBE, ERR, Cannot obtain PCI resources, aborting.\n);
  -   goto err_out_disable_pdev;
  +   return err;
  }
  +   iomap = pcim_iomap_table(pdev)[bar];
 
 Type mismatch.  Didn't compiler warn about it?

s/iomap/nic-csr/

   
  if((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK))) {
  DPRINTK(PROBE, ERR, No usable DMA configuration, aborting.\n);
  -   goto err_out_free_res;
  +   return err;
  }
   
  SET_MODULE_OWNER(netdev);
  @@ -2613,12 +2610,6 @@ static int __devinit e100_probe(struct p
  if (use_io)
  DPRINTK(PROBE, INFO, using i/o access mode\n);
   
  -   nic-csr = pci_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr));
  -   if(!nic-csr) {
  -   DPRINTK(PROBE, ERR, Cannot map device registers, aborting.\n);
  -   err = -ENOMEM;
  -   goto err_out_free_res;
  -   }
 
 nic-csr initialization seems missing.  Have you tested the patched code?

No I didn't test it; my e100 box gave up on me.  I shouldn't have sent
this patch since I wasn't able to get it tested.

Thanks,

Brandon
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GENETLINK]: Question: global lock (genl_mutex) possible refinement?

2007-08-16 Thread Thomas Graf
* Richard MUSIL [EMAIL PROTECTED] 2007-07-24 13:09
 Thomas Graf wrote:
  Please provide a new overall patch which is not based on your
  initial patch so I can review your idea properly.
 
 Here it goes (merging two previous patches). I have diffed
 against v2.6.22, which I am using currently as my base:

Sorry for taking so long.

 @@ -150,9 +176,9 @@ int genl_register_ops(struct genl_family *family, struct 
 genl_ops *ops)
   if (ops-policy)
   ops-flags |= GENL_CMD_CAP_HASPOL;
  
 - genl_lock();
 + genl_fam_lock(family);
   list_add_tail(ops-ops_list, family-ops_list);
 - genl_unlock();
 + genl_fam_unlock(family);

For registering operations, it is sufficient to just acquire the
family lock, the family itself can't disappear while holding it.

 @@ -216,8 +242,9 @@ int genl_register_family(struct genl_family *family)
   goto errout;
  
   INIT_LIST_HEAD(family-ops_list);
 + mutex_init(family-lock);
  
 - genl_lock();
 + genl_fam_lock(family);
  
   if (genl_family_find_byname(family-name)) {
   err = -EEXIST;
 @@ -251,14 +278,14 @@ int genl_register_family(struct genl_family *family)
   family-attrbuf = NULL;
  
   list_add_tail(family-family_list, genl_family_chain(family-id));
 - genl_unlock();
 + genl_fam_unlock(family);

This looks good.

 @@ -303,38 +332,57 @@ static int genl_rcv_msg(struct sk_buff *skb, struct 
 nlmsghdr *nlh)
   struct genlmsghdr *hdr = nlmsg_data(nlh);
   int hdrlen, err;
  
 + genl_fam_lock(NULL);
   family = genl_family_find_byid(nlh-nlmsg_type);
 - if (family == NULL)
 + if (family == NULL) {
 + genl_fam_unlock(NULL);
   return -ENOENT;
 + }
 +
 + /* get particular family lock, but release global family lock
 +  * so registering operations for other families are possible */
 + genl_onefam_lock(family);
 + genl_fam_unlock(NULL);

I don't like having two locks for something as trivial as this.
Basically the only reason the global lock is required here is to
protect from family removal which can be avoided otherwise by
using RCU list operations.

Therefore, I'd propose the following lock semantics:
Use own global mutex to protect writing to the family list, make
reading side lockless using rcu for use when looking up family
upon mesage processing. Use a family lock to protect writing to
operations list and serialize messae processing with unregister
operations.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lockdep: annotate rcu_read_{,un}lock()

2007-08-16 Thread Paul E. McKenney
On Thu, Aug 16, 2007 at 04:25:07PM +0200, Peter Zijlstra wrote:
 
 There seem to be some unbalanced rcu_read_{,un}lock() issues of late,
 how about doing something like this:

This will break when rcu_read_lock() and rcu_read_unlock() are invoked
from NMI/SMI handlers -- the raw_local_irq_save() in lock_acquire() will
not mask NMIs or SMIs.

One approach would be to check for being in an NMI/SMI handler, and
to avoid calling lock_acquire() and lock_release() in those cases.
Another approach would be to use sparse, which has checks for
rcu_read_lock()/rcu_read_unlock() nesting.

Thanx, Paul

 Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
 ---
  include/linux/lockdep.h  |7 +++
  include/linux/rcupdate.h |   12 
  kernel/rcupdate.c|8 
  3 files changed, 27 insertions(+)
 
 Index: linux-2.6/include/linux/lockdep.h
 ===
 --- linux-2.6.orig/include/linux/lockdep.h
 +++ linux-2.6/include/linux/lockdep.h
 @@ -252,6 +252,13 @@ extern void lockdep_init_map(struct lock
struct lock_class_key *key, int subclass);
 
  /*
 + * To initialize a lockdep_map statically use this macro.
 + * Note that _name must not be NULL.
 + */
 +#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
 + { .name = (_name), .key = (void *)(_key), }
 +
 +/*
   * Reinitialize a lock key - for cases where there is special locking or
   * special initialization of locks so that the validator gets the scope
   * of dependencies wrong: they are either too broad (they need a class-split)
 Index: linux-2.6/include/linux/rcupdate.h
 ===
 --- linux-2.6.orig/include/linux/rcupdate.h
 +++ linux-2.6/include/linux/rcupdate.h
 @@ -41,6 +41,7 @@
  #include linux/percpu.h
  #include linux/cpumask.h
  #include linux/seqlock.h
 +#include linux/lockdep.h
 
  /**
   * struct rcu_head - callback structure for use with RCU
 @@ -133,6 +134,15 @@ static inline void rcu_bh_qsctr_inc(int 
  extern int rcu_pending(int cpu);
  extern int rcu_needs_cpu(int cpu);
 
 +#ifdef CONFIG_DEBUG_LOCK_ALLOC
 +extern struct lockdep_map rcu_lock_map;
 +# define rcu_read_acquire()  lock_acquire(rcu_lock_map, 0, 0, 2, 1, 
 _THIS_IP_)
 +# define rcu_read_release()  lock_release(rcu_lock_map, 1, _THIS_IP_)
 +#else
 +# define rcu_read_acquire()  do { } while (0)
 +# define rcu_read_release()  do { } while (0)
 +#endif
 +
  /**
   * rcu_read_lock - mark the beginning of an RCU read-side critical section.
   *
 @@ -166,6 +176,7 @@ extern int rcu_needs_cpu(int cpu);
   do { \
   preempt_disable(); \
   __acquire(RCU); \
 + rcu_read_acquire(); \
   } while(0)
 
  /**
 @@ -175,6 +186,7 @@ extern int rcu_needs_cpu(int cpu);
   */
  #define rcu_read_unlock() \
   do { \
 + rcu_read_release(); \
   __release(RCU); \
   preempt_enable(); \
   } while(0)
 Index: linux-2.6/kernel/rcupdate.c
 ===
 --- linux-2.6.orig/kernel/rcupdate.c
 +++ linux-2.6/kernel/rcupdate.c
 @@ -49,6 +49,14 @@
  #include linux/cpu.h
  #include linux/mutex.h
 
 +#ifdef CONFIG_DEBUG_LOCK_ALLOC
 +static struct lock_class_key rcu_lock_key;
 +struct lockdep_map rcu_lock_map =
 + STATIC_LOCKDEP_MAP_INIT(rcu_read_lock, rcu_lock_key);
 +
 +EXPORT_SYMBOL_GPL(rcu_lock_map);
 +#endif
 +
  /* Definition for rcupdate control block. */
  static struct rcu_ctrlblk rcu_ctrlblk = {
   .cur = -300,
 
 
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Stefan Richter
Ilpo Järvinen wrote:
 I looked around a bit by using some command lines and ended up wondering 
 if these are equal to busy-wait case (and should be fixed) or not:
 
 ./drivers/telephony/ixj.c
 6674:   while (atomic_read(j-DSPWrite)  0)
 6675-   atomic_dec(j-DSPWrite);
 
 ...besides that, there are couple of more similar cases in the same file 
 (with braces)...

Generally, ixj.c has several occurrences of couples of atomic write and
atomic read which potentially do not do what the author wanted.
-- 
Stefan Richter
-=-=-=== =--- =
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: drivers/infiniband/mlx/mad.c misplaced ;

2007-08-16 Thread Paul Mundt
On Wed, Aug 15, 2007 at 05:40:11PM -0700, Joe Perches wrote:
 $ egrep -r --include=*.c \bif[[:space:]]*\([^\)]*\)[[:space:]]*\; * 
 arch/sh/boards/se/7343/io.c:if (0) ;
 drivers/atm/iphase.c: if (!desc1) ;
 drivers/infiniband/hw/mlx4/mad.c:   if (!err);
 drivers/isdn/capi/capiutil.c:   else if (c = 0x0f);
 drivers/net/tokenring/ibmtr.c:  else if (ti-shared_ram_paging == 0xf);  /* 
 No paging in adapter */
 drivers/s390/scsi/zfcp_erp.c:   if (status == ZFCP_ERP_SUCCEEDED) ;   
   /* no further action */
 fs/hostfs/hostfs_user.c:if(attrs-ia_valid  HOSTFS_ATTR_CTIME) ;
 net/netfilter/xt_u32.c: if (skb-len  4 || pos  skb-len - 4);
 sound/pci/au88x0/au88x0_synth.c:if (eax == 0) 
 ;

On Thu, Aug 16, 2007 at 02:18:40PM +0100, Andy Whitcroft wrote:
 A couple of people suggested adding checks to checkpatch for trailing
 semicolons on conditionals, where the conditional block may not be
 actually conditional:
 
   if (err);
   return err;
 
 While regression testing the changes, I ran these checks across the
 whole of 2.6.23-rc3 and there appear to be 5 places where this is
 occurs (above and beyond the IPv6 one which triggered this effort)
 and a benign use which could be confused later which it seems safest
 to fix.
 
 Following this email are 6 patches for these issues, relevant
 maintainers cc'd.  All against 2.6.23-rc3
 
It looks like you may want to refine your search parameters to match the
above, as there are at least a few cases where the space exists.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Paul E. McKenney
On Thu, Aug 16, 2007 at 06:42:50PM +0800, Herbert Xu wrote:
 On Thu, Aug 16, 2007 at 12:31:03PM +0200, Stefan Richter wrote:
  
  PS:  Just to clarify, I'm not speaking for the volatile modifier.  I'm
  not speaking for any particular implementation of atomic_t and its
  accessors at all.  All I am saying is that
- we use atomically accessed data types because we concurrently but
  locklessly access this data,
- hence a read access to this data that could be optimized away
  makes *no sense at all*.
 
 No sane compiler can optimise away an atomic_read per se.
 That's only possible if there's a preceding atomic_set or
 atomic_read, with no barriers in the middle.
 
 If that's the case, then one has to conclude that doing
 away with the second read is acceptable, as otherwise
 a memory (or at least a compiler) barrier should have been
 used.

The compiler can also reorder non-volatile accesses.  For an example
patch that cares about this, please see:

http://lkml.org/lkml/2007/8/7/280

This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and
rcu_read_unlock() to ensure that accesses aren't reordered with respect
to interrupt handlers and NMIs/SMIs running on that same CPU.

 In fact, volatile doesn't guarantee that the memory gets
 read anyway.  You might be reading some stale value out
 of the cache.  Granted this doesn't happen on x86 but
 when you're coding for the kernel you can't make such
 assumptions.
 
 So the point here is that if you don't mind getting a stale
 value from the CPU cache when doing an atomic_read, then
 surely you won't mind getting a stale value from the compiler
 cache.

Absolutely disagree.  An interrupt/NMI/SMI handler running on the CPU
will see the same value (whether in cache or in store buffer) that
the mainline code will see.  In this case, we don't care about CPU
misordering, only about compiler misordering.  It is easy to see
other uses that combine communication with handlers on the current
CPU with communication among CPUs -- again, see prior messages in
this thread.

  So, the architecture guys can implement atomic_read however they want
  --- as long as it cannot be optimized away.*
 
 They can implement it however they want as long as it stays
 atomic.

Precisely.  And volatility is a key property of atomic.  Let's please
not throw it away.

Thanx, Paul
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


lockdep report in the bonding code.

2007-08-16 Thread Dave Jones
A Fedora users reported this against our 2.6.23-rc3 build

Dave

NET: Registered protocol family 10
lo: Disabled Privacy Extensions
Ethernet Channel Bonding Driver: v3.1.3 (June 13, 2007)
bonding: MII link monitoring set to 100 ms
ADDRCONF(NETDEV_UP): bond0: link is not ready
bonding: bond0: Adding slave eth0.
e100: eth0: e100_watchdog: link up, 100Mbps, half-duplex
bonding: bond0: making interface eth0 the new active one.
bonding: bond0: enslaving eth0 as an active interface with an up link.
ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready

=
[ INFO: inconsistent lock state ]
2.6.23-0.104.rc3.fc8 #1
-
inconsistent {softirq-on-W} - {in-softirq-W} usage.
events/0/9 [HC0[0]:SC1[2]:HE1:SE0] takes:
 ((bond_info-tx_hashtbl_lock)){-+..}, at: [f8ae4cc2] 
bond_alb_xmit+0x26a/0x42c [bonding]
{softirq-on-W} state was registered at:
  [c044a6e4] __lock_acquire+0x4ff/0xc67
  [c044b2c6] lock_acquire+0x7b/0x9e
  [c062ecc8] _spin_lock+0x2e/0x58
  [f8ae492e] bond_alb_initialize+0x64/0x18e [bonding]
  [f8ae1256] bond_open+0x33/0x175 [bonding]
  [c05ca8aa] dev_open+0x31/0x6c
  [c05c8a01] dev_change_flags+0xa3/0x156
  [c0609231] devinet_ioctl+0x207/0x50e
  [c06098df] inet_ioctl+0x86/0xa4
  [c05bec32] sock_ioctl+0x1ac/0x1c9
  [c0490b76] do_ioctl+0x22/0x68
  [c0490e05] vfs_ioctl+0x249/0x25c
  [c0490e61] sys_ioctl+0x49/0x64
  [c040519e] syscall_call+0x7/0xb
  [] 0x
irq event stamp: 266326
hardirqs last  enabled at (266326): [c043258f] local_bh_enable_ip+0xf3/0x106
hardirqs last disabled at (266325): [c0432516] local_bh_enable_ip+0x7a/0x106
softirqs last  enabled at (266306): [c05e2d20] rt_run_flush+0x6e/0x97
softirqs last disabled at (266309): [c0407548] do_softirq+0x74/0xf7

other info that might help us debug this:
3 locks held by events/0/9:
 #0:  (rtnl_mutex){--..}, at: [c062d8a9] mutex_lock+0x21/0x24
 #1:  (bond-lock){-.-+}, at: [f8ae4a90] bond_alb_xmit+0x38/0x42c [bonding]
 #2:  (bond-curr_slave_lock){..-+}, at: [f8ae4a98] bond_alb_xmit+0x40/0x42c 
[bonding]

stack backtrace:
 [c04063d8] show_trace_log_lvl+0x1a/0x2f
 [c0406e71] show_trace+0x12/0x14
 [c0406e89] dump_stack+0x16/0x18
 [c0448f8a] print_usage_bug+0x141/0x14b
 [c0449810] mark_lock+0x12f/0x472
 [c044a66c] __lock_acquire+0x487/0xc67
 [c044b2c6] lock_acquire+0x7b/0x9e
 [c062ecc8] _spin_lock+0x2e/0x58
 [f8ae4cc2] bond_alb_xmit+0x26a/0x42c [bonding]
 [c05c9467] dev_hard_start_xmit+0x21c/0x27a
 [c05cb489] dev_queue_xmit+0x1f7/0x2aa
 [f8b274e0] mld_sendpack+0x210/0x35a [ipv6]
 [f8b28535] mld_ifc_timer_expire+0x1e9/0x211 [ipv6]
 [c0435a5f] run_timer_softirq+0x127/0x18f
 [c0432708] __do_softirq+0x78/0xff
 [c0407548] do_softirq+0x74/0xf7
 [c04325e6] irq_exit+0x44/0x46
 [c041d3a3] smp_apic_timer_interrupt+0x74/0x81
 [c0405cb7] apic_timer_interrupt+0x33/0x38
 [c062eba6] _spin_unlock_bh+0x25/0x28
 [c05e2d20] rt_run_flush+0x6e/0x97
 [c05e471c] rt_cache_flush+0x7c/0xaf
 [c060e86f] fib_netdev_event+0x62/0x68
 [c0630e6c] notifier_call_chain+0x2b/0x4a
 [c04391c8] __raw_notifier_call_chain+0x19/0x1e
 [c04391e7] raw_notifier_call_chain+0x1a/0x1c
 [c05c84f3] netdev_state_change+0x20/0x31
 [c05d251a] __linkwatch_run_queue+0x159/0x183
 [c05d2564] linkwatch_event+0x20/0x27
 [c043bd9b] run_workqueue+0x7d/0x129
 [c043c75b] worker_thread+0xbb/0xc8
 [c043ef6f] kthread+0x3b/0x64
 [c0405e5b] kernel_thread_helper+0x7/0x10
 ===
bonding: bond0: Adding slave eth1.
e100: eth1: e100_watchdog: link up, 100Mbps, half-duplex
bonding: bond0: enslaving eth1 as an active interface with an up link.
audit(1187170179.728:3): audit_pid=1885 old=0 by auid=4294967295
bond0: no IPv6 routers present

-- 
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 4/4] Update e1000 driver to use devres.

2007-08-16 Thread Brandon Philips
On 01:38 Thu 16 Aug 2007, Waskiewicz Jr, Peter P wrote:
  -   err = -ENOMEM;
  -   netdev = alloc_etherdev(sizeof(struct e1000_adapter));
  +   netdev = devm_alloc_etherdev(pdev-dev, sizeof(struct 
  +e1000_adapter));
  if (!netdev)
  -   goto err_alloc_etherdev;
  +   return -ENOMEM;
 
 I'm a bit confused why you removed the goto's, and then removed all the
 target unwinding code at the bottom of e1000_probe().   Those labels
 clean up resources if something fails, like the err_sw_init label.  I
 don't see anything in the devres code that jumps out at me that explains
 why we can do away with these cleanup routines.  Thoughts?

Have you read Documentation/driver-model/devres.txt?  That has a good
explanation.  Here is a practical explanation on how it works too.

This is the output from a normal modprobe then rmmod of e1000 with
devres debugging on.

# modprobe
DEVRES ADD f7fd6cc0 pcim_release (8 bytes)
DEVRES ADD f7a80fe0 devm_free_netdev (4 bytes) # netdev **p for free_netdev
DEVRES ADD f7dbe780 pcim_iomap_release (24 bytes) # adapter-hw.hw_addr
DEVRES ADD f7dbe9c0 devm_kzalloc_release (40 bytes) # adapter-tx_ring
DEVRES ADD f7dbe8c0 devm_kzalloc_release (44 bytes) # adapter-rx_ring

# rmmod
DEVRES REL f7dbe8c0 devm_kzalloc_release (44 bytes) # adapter-tx_ring
DEVRES REL f7dbe9c0 devm_kzalloc_release (40 bytes) # adapter-rx_ring
DEVRES REL f7dbe780 pcim_iomap_release (24 bytes) # adapter-hw.hw_addr
DEVRES REL f7a80fe0 devm_free_netdev (4 bytes) # called free_netdev
DEVRES REL f7fd6cc0 pcim_release (8 bytes)

Now if I insert a return -ENOMEM right after allocating tx_ring:
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -1356,6 +1356,8 @@ e1000_alloc_queues(struct e1000_adapter *adapter)
 {
adapter-tx_ring = kcalloc(adapter-num_tx_queues,
   sizeof(struct e1000_tx_ring),
GFP_KERNEL);
+
+   return -ENOMEM;
if (!adapter-tx_ring)
return -ENOMEM;

#insmod
DEVRES ADD f7a80e80 pcim_release (8 bytes)
DEVRES ADD f7a80ca0 devm_free_netdev (4 bytes)
DEVRES ADD eb7f0080 pcim_iomap_release (24 bytes)
DEVRES ADD eb7f devm_kzalloc_release (40 bytes)
e1000_sw_init: Unable to allocate memory for queues 
DEVRES REL eb7f devm_kzalloc_release (40 bytes)
DEVRES REL eb7f0080 pcim_iomap_release (24 bytes)
DEVRES REL f7a80ca0 devm_free_netdev (4 bytes)
DEVRES REL f7a80e80 pcim_release (8 bytes)
ACPI: PCI interrupt for device :02:00.0 disabled
e1000: probe of :02:00.0 failed with error -12

Since we are returning an error from probe the driver core calls
devres_release_all(dev) which releases all of the resources in the right
order.  See really_probe() in drivers/base/dd.c.

SIDE NOTE
-
I ran into a possible e1000 bug with the little -ENOMEM patch above both
with and without the devres patches.  The driver seems to leave the
EEPROM in a bad state on error because I get this error after trying to
insert the module again: 

e1000_probe: The EEPROM Checksum Is Not Valid 

A power cycle but not a reboot fixes it.

Thanks,

Brandon
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] [DM9000] External PHY support

2007-08-16 Thread Ben Dooks
On Thu, Aug 16, 2007 at 10:16:08AM +0200, Laurent Pinchart wrote:
 This patch adds a flag to the DM9000 platform data which, when set,
 configures the device to use an external PHY.
 
 Signed-off-by: Laurent Pinchart [EMAIL PROTECTED]
Acked-by: Ben Dooks [EMAIL PROTECTED]
 ---
  drivers/net/dm9000.c   |6 ++
  include/linux/dm9000.h |1 +
  2 files changed, 7 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c
 index a424810..a86984e 100644
 --- a/drivers/net/dm9000.c
 +++ b/drivers/net/dm9000.c
 @@ -136,6 +136,7 @@ typedef struct board_info {
   u16 dbug_cnt;
   u8 io_mode; /* 0:word, 2:byte */
   u8 phy_addr;
 + unsigned int flags;
  
   void (*inblk)(void __iomem *port, void *data, int length);
   void (*outblk)(void __iomem *port, void *data, int length);
 @@ -528,6 +529,8 @@ dm9000_probe(struct platform_device *pdev)
  
   if (pdata-dumpblk != NULL)
   db-dumpblk = pdata-dumpblk;
 +
 + db-flags = pdata-flags;
   }
  
   dm9000_reset(db);
 @@ -670,6 +673,9 @@ dm9000_init_dm9000(struct net_device *dev)
   iow(db, DM9000_GPCR, GPCR_GEP_CNTL);/* Let GPIO0 output */
   iow(db, DM9000_GPR, 0); /* Enable PHY */
  
 + if (db-flags  DM9000_PLATF_EXT_PHY)
 + iow(db, DM9000_NCR, NCR_EXT_PHY);
 +
   /* Program operating register */
   iow(db, DM9000_TCR, 0); /* TX Polling clear */
   iow(db, DM9000_BPTR, 0x3f); /* Less 3Kb, 200us */
 diff --git a/include/linux/dm9000.h b/include/linux/dm9000.h
 index 0008e2a..ea530fd 100644
 --- a/include/linux/dm9000.h
 +++ b/include/linux/dm9000.h
 @@ -19,6 +19,7 @@
  #define DM9000_PLATF_8BITONLY(0x0001)
  #define DM9000_PLATF_16BITONLY   (0x0002)
  #define DM9000_PLATF_32BITONLY   (0x0004)
 +#define DM9000_PLATF_EXT_PHY (0x0008)
  
  /* platfrom data for platfrom device structure's platfrom_data field */
  
 -- 
 1.5.0
 
 -- 
 Laurent Pinchart
 CSE Semaphore Belgium
 
 Chauss?e de Bruxelles, 732A
 B-1410 Waterloo
 Belgium
 
 T +32 (2) 387 42 59
 F +32 (2) 387 42 75
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Ben ([EMAIL PROTECTED], http://www.fluff.org/)

  'a smiley only costs 4 bytes'
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] [DM9000] Added support for big-endian hosts

2007-08-16 Thread Ben Dooks
On Thu, Aug 16, 2007 at 10:15:35AM +0200, Laurent Pinchart wrote:
 This patch splits the receive status in 8bit wide fields and convert the
 packet length from little endian to CPU byte order.
 
 Signed-off-by: Laurent Pinchart [EMAIL PROTECTED]
Acked-by: Ben Dooks [EMAIL PROTECTED]
 ---
  drivers/net/dm9000.c |   13 +++--
  1 files changed, 7 insertions(+), 6 deletions(-)
 
 diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c
 index c3de81b..a424810 100644
 --- a/drivers/net/dm9000.c
 +++ b/drivers/net/dm9000.c
 @@ -894,7 +894,8 @@ dm9000_timer(unsigned long data)
  }
  
  struct dm9000_rxhdr {
 - u16 RxStatus;
 + u8  RxPktReady;
 + u8  RxStatus;
   u16 RxLen;
  } __attribute__((__packed__));
  
 @@ -935,7 +936,7 @@ dm9000_rx(struct net_device *dev)
  
   (db-inblk)(db-io_data, rxhdr, sizeof(rxhdr));
  
 - RxLen = rxhdr.RxLen;
 + RxLen = le16_to_cpu(rxhdr.RxLen);
  
   /* Packet Status check */
   if (RxLen  0x40) {
 @@ -947,17 +948,17 @@ dm9000_rx(struct net_device *dev)
   PRINTK1(RST: RX Len:%x\n, RxLen);
   }
  
 - if (rxhdr.RxStatus  0xbf00) {
 + if (rxhdr.RxStatus  0xbf) {
   GoodPacket = false;
 - if (rxhdr.RxStatus  0x100) {
 + if (rxhdr.RxStatus  0x01) {
   PRINTK1(fifo error\n);
   db-stats.rx_fifo_errors++;
   }
 - if (rxhdr.RxStatus  0x200) {
 + if (rxhdr.RxStatus  0x02) {
   PRINTK1(crc error\n);
   db-stats.rx_crc_errors++;
   }
 - if (rxhdr.RxStatus  0x8000) {
 + if (rxhdr.RxStatus  0x80) {
   PRINTK1(length error\n);
   db-stats.rx_length_errors++;
   }
 -- 
 1.5.0
 
 -- 
 Laurent Pinchart
 CSE Semaphore Belgium
 
 Chauss?e de Bruxelles, 732A
 B-1410 Waterloo
 Belgium
 
 T +32 (2) 387 42 59
 F +32 (2) 387 42 75
 -
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Ben ([EMAIL PROTECTED], http://www.fluff.org/)

  'a smiley only costs 4 bytes'
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [patch 4/4] Update e1000 driver to use devres.

2007-08-16 Thread Waskiewicz Jr, Peter P
 Now if I insert a return -ENOMEM right after allocating tx_ring:
 --- a/drivers/net/e1000/e1000_main.c
 +++ b/drivers/net/e1000/e1000_main.c
 @@ -1356,6 +1356,8 @@ e1000_alloc_queues(struct e1000_adapter 
 *adapter)  {
 adapter-tx_ring = kcalloc(adapter-num_tx_queues,
sizeof(struct 
 e1000_tx_ring), GFP_KERNEL);
 +
 +   return -ENOMEM;
 if (!adapter-tx_ring)
 return -ENOMEM;
 
 #insmod
 DEVRES ADD f7a80e80 pcim_release (8 bytes) DEVRES ADD 
 f7a80ca0 devm_free_netdev (4 bytes) DEVRES ADD eb7f0080 
 pcim_iomap_release (24 bytes) DEVRES ADD eb7f 
 devm_kzalloc_release (40 bytes)
 e1000_sw_init: Unable to allocate memory for queues DEVRES 
 REL eb7f devm_kzalloc_release (40 bytes) DEVRES REL 
 eb7f0080 pcim_iomap_release (24 bytes) DEVRES REL f7a80ca0 
 devm_free_netdev (4 bytes) DEVRES REL f7a80e80 pcim_release (8 bytes)
 ACPI: PCI interrupt for device :02:00.0 disabled
 e1000: probe of :02:00.0 failed with error -12
 
 Since we are returning an error from probe the driver core calls
 devres_release_all(dev) which releases all of the resources 
 in the right order.  See really_probe() in drivers/base/dd.c.

This looks fine then to me.  Thanks for the explanation.

-PJ
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [E1000-devel] [patch 4/4] Update e1000 driver to use devres.

2007-08-16 Thread Kok, Auke

Tejun Heo wrote:

Brandon Philips wrote:

-   mmio_start = pci_resource_start(pdev, BAR_0);
mmio_len = pci_resource_len(pdev, BAR_0);


You don't need mmio_len either.


-   err = -EIO;
-   adapter-hw.hw_addr = ioremap(mmio_start, mmio_len);
+   adapter-hw.hw_addr = pcim_iomap(pdev, BAR_0, mmio_len);


Passing 0 as @max_len tells pci[m]_iomap() to use pci_resource_len() of
the BAR.


@@ -952,16 +948,15 @@ e1000_probe(struct pci_dev *pdev,
/* setup the private structure */
 
 	if ((err = e1000_sw_init(adapter)))

-   goto err_sw_init;
+   return err;
 
 	err = -EIO;

/* Flash BAR mapping must happen after e1000_sw_init
 * because it depends on mac_type */
if ((adapter-hw.mac_type == e1000_ich8lan) 
   (pci_resource_flags(pdev, 1)  IORESOURCE_MEM)) {
-   flash_start = pci_resource_start(pdev, 1);
flash_len = pci_resource_len(pdev, 1);


Ditto.


-   adapter-hw.flash_address = ioremap(flash_start, flash_len);
+   adapter-hw.flash_address = pcim_iomap(pdev, 1, flash_len);
if (!adapter-hw.flash_address)
goto err_flashmap;
}





brandon,

seeing the multiple revisions I am scared that this will produce some fallout 
and e1000 already is quite fragile. I would suggest that you instead work 
against e1000e which is in a branch on jgarzik's netdev tree instead. This 
driver is new and it would be much more interesting to have devres used in here 
instead.


Since this driver is in -mm as well this would give your patches some testing 
before it goes upstream.


Let's leave e1000 alone for now if we can.

Auke
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Christoph Lameter
On Thu, 16 Aug 2007, Paul Mackerras wrote:

 Herbert Xu writes:
 
  It doesn't matter.  The memory pressure flag is an *advisory*
  flag.  If we get it wrong the worst that'll happen is that we'd
  waste some time doing work that'll be thrown away.
 
 Ah, so it's the racy but I don't care because it's only an
 optimization case.  That's fine.  Somehow I find it hard to believe
 that all the racy uses of atomic_read in the kernel are like that,
 though. :)

My use of atomic_read in SLUB is like that. Volatile does not magically 
sync up reads somehow.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Christoph Lameter
On Thu, 16 Aug 2007, Paul Mackerras wrote:

 The uses of atomic_read where one might want it to allow caching of
 the result seem to me to fall into 3 categories:
 
 1. Places that are buggy because of a race arising from the way it's
used.
 
 2. Places where there is a race but it doesn't matter because we're
doing some clever trick.
 
 3. Places where there is some locking in place that eliminates any
potential race.
 
 In case 1, adding volatile won't solve the race, of course, but it's
 hard to argue that we shouldn't do something because it will slow down
 buggy code.  Case 2 is hopefully pretty rare and accompanied by large
 comment blocks, and in those cases caching the result of atomic_read
 explicitly in a local variable would probably make the code clearer.
 And in case 3 there is no reason to use atomic_t at all; we might as
 well just use an int.

In 2 + 3 you may increment the atomic variable in some places. The value 
of the atomic variable may not matter because you only do optimizations.

Checking a atomic_t for a definite state has to involve either
some side conditions (lock only taken if refcount is = 0 or so) or done 
by changing the state (see f.e. atomic_inc_unless_zero).

 So I don't see any good reason to make the atomic API more complex by
 having volatile and non-volatile versions of atomic_read.  It
 should just have the volatile behaviour.

If you want to make it less complex then drop volatile which causes weird 
side effects without solving any problems as you just pointed out.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Christoph Lameter
On Thu, 16 Aug 2007, Paul Mackerras wrote:

 
 It seems that there could be a lot of places where atomic_t is used in
 a non-atomic fashion, and that those uses are either buggy, or there
 is some lock held at the time which guarantees that other CPUs aren't
 changing the value.  In both cases there is no point in using
 atomic_t; we might as well just use an ordinary int.

The point of atomic_t is to do atomic *changes* to the variable.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Proposed interface for per-packet mesh-ttl

2007-08-16 Thread Luis Carlos Cobo
On 7/30/07, Stephen Hemminger [EMAIL PROTECTED] wrote:
 it would need an IP ttl to mesh mapping. The fundamental thing is to try
 and avoid topology specific options bleeding all the way up the socket layer,
 especially since the network layer is involved and may need to multipath.

I think the cleanest way would be to add a ll_ttl (ll for link layer)
field to struct sock and a SO_LL_TTL socket option that sets both the
field and a flag in sk-flags. This way it is useful for any driver
that can do mesh or any other protocol that involves a ttl at link
layer (not that I'm aware of any).

However I guess you are not supposed to add new socket options nor
modify struct socket too often so I'd appreciate feedback on whether
this would be considered a good approach.

-- 
Luis Carlos Cobo Rus   GnuPG ID: 44019B60
cozybit Inc.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool
Part of the motivation here is to fix heisenbugs.  If I knew where 
they



By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.

Almost everything is a tradeoff; and so is this.  I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.


So why is this a good tradeoff?


It certainly is better than disabling all compiler optimisations!


I also think that just adding things to APIs in the hope it might fix
up some bugs isn't really a good road to go down. Where do you stop?


I look at it the other way: keeping the volatile semantics in
atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs;
certainly most people expect that behaviour, and also that behaviour
is *needed* in some places and no other interface provides that
functionality.


[some confusion about barriers wrt atomics snipped]


Segher

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool

The only thing volatile on an asm does is create a side effect
on the asm statement; in effect, it tells the compiler do not
remove this asm even if you don't need any of its outputs.

It's not disabling optimisation likely to result in bugs,
heisen- or otherwise; _not_ putting the volatile on an asm
that needs it simply _is_ a bug :-)


Yep.  And the reason it is a bug is that it fails to disable
the relevant compiler optimizations.  So I suspect that we might
actually be saying the same thing here.


We're not saying the same thing, but we do agree :-)


Segher

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


e1000 autotuning doesn't get along with itself

2007-08-16 Thread Rick Jones

Folks -

I was trying to look at bonding vs discrete links and so put a couple dual-port 
e1000-driven NICs:


4a:01.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet 
Controller (rev 03)

Subsystem: Hewlett-Packard Company HP Dual Port 1000Base-T [A9900A]

into a pair of 8 core systems running a 2.6.22.2 kernel.  This gave me:

hpcpc109:~/netperf2_trunk# ethtool -i eth2
driver: e1000
version: 7.3.20-k2-NAPI
firmware-version: N/A
bus-info: :49:02.0

for the e1000 driver.  I connected the two systems back-to-back and started 
running some tests.  In the course of trying to look at something else 
(verifying the results reported by bwm-ng) I enabled demo-mode in netperf 
(./configure --enable-demo) and noticed a considerable oscillation.  I undid the 
bond and repeated the experiment with a discrete NIC:


hpcpc109:~/netperf2_trunk# src/netperf -t TCP_RR -H 192.168.2.105 -D 1.0 -l 15
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.105 
(192.168.2.105) port 0 AF_INET : demo : first burst 0

Interim result: 10014.93 Trans/s over 1.00 seconds
Interim result: 10015.79 Trans/s over 1.00 seconds
Interim result: 10014.30 Trans/s over 1.00 seconds
Interim result: 10016.29 Trans/s over 1.00 seconds
Interim result: 10085.80 Trans/s over 1.00 seconds
Interim result: 17526.61 Trans/s over 1.00 seconds
Interim result: 20007.60 Trans/s over 1.00 seconds
Interim result: 19626.46 Trans/s over 1.02 seconds
Interim result: 10616.44 Trans/s over 1.85 seconds
Interim result: 10014.88 Trans/s over 1.06 seconds
Interim result: 10015.79 Trans/s over 1.00 seconds
Interim result: 10014.80 Trans/s over 1.00 seconds
Interim result: 10035.30 Trans/s over 1.00 seconds
Interim result: 13974.69 Trans/s over 1.00 seconds
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size SizeTime Rate
bytes  Bytes  bytesbytes   secs.per sec

16384  87380  11   15.0012225.77
16384  87380

On a slightly informed whim I tried disabling the interrupt thottle on both 
sides (modprobe e1000 InterruptThrottleRate=0,0,0,0,0,0,0,0) and re-ran:


hpcpc109:~/netperf2_trunk# src/netperf -t TCP_RR -H 192.168.2.105 -D 1.0 -l 
15TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.2.105 (192.168.2.105) port 0 AF_INET : demo : first burst 0

Interim result: 18673.68 Trans/s over 1.00 seconds
Interim result: 18685.01 Trans/s over 1.00 seconds
Interim result: 18682.30 Trans/s over 1.00 seconds
Interim result: 18681.05 Trans/s over 1.00 seconds
Interim result: 18680.25 Trans/s over 1.00 seconds
Interim result: 18742.44 Trans/s over 1.00 seconds
Interim result: 18739.45 Trans/s over 1.00 seconds
Interim result: 18723.52 Trans/s over 1.00 seconds
Interim result: 18736.53 Trans/s over 1.00 seconds
Interim result: 18737.61 Trans/s over 1.00 seconds
Interim result: 18744.76 Trans/s over 1.00 seconds
Interim result: 18728.54 Trans/s over 1.00 seconds
Interim result: 18738.91 Trans/s over 1.00 seconds
Interim result: 18735.53 Trans/s over 1.00 seconds
Interim result: 18741.03 Trans/s over 1.00 seconds
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size SizeTime Rate
bytes  Bytes  bytesbytes   secs.per sec

16384  87380  11   15.0018717.94
16384  87380

and then just for grins I tried just disabling it on one side, leaving the other 
at defaults:


hpcpc109:~/netperf2_trunk# src/netperf -t TCP_RR -H 192.168.2.105 -D 1.0 -l 
15TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.2.105 (192.168.2.105) port 0 AF_INET : demo : first burst 0

Interim result: 19980.84 Trans/s over 1.00 seconds
Interim result: 19997.60 Trans/s over 1.00 seconds
Interim result: 19995.60 Trans/s over 1.00 seconds
Interim result: 20002.60 Trans/s over 1.00 seconds
Interim result: 20011.58 Trans/s over 1.00 seconds
Interim result: 19985.66 Trans/s over 1.00 seconds
Interim result: 20002.60 Trans/s over 1.00 seconds
Interim result: 20010.58 Trans/s over 1.00 seconds
Interim result: 20012.60 Trans/s over 1.00 seconds
Interim result: 19993.63 Trans/s over 1.00 seconds
Interim result: 19979.63 Trans/s over 1.00 seconds
Interim result: 19991.58 Trans/s over 1.00 seconds
Interim result: 20011.60 Trans/s over 1.00 seconds
Interim result: 19948.84 Trans/s over 1.00 seconds
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size SizeTime Rate
bytes  Bytes  bytesbytes   secs.per sec

16384  87380  11   15.0019990.14
16384  87380


It looks like the e1000 interrupt throttle autotuning works very nicely when the 
other side isn't doing any, but if the other side is also trying to autotune it 
doesn't seem to stablize.  At least not during a netperf TCP_RR test.


Does anyone else see this?  To try to eliminate netperf demo mode I re-ran 
without it and got the same end results.


rick jones
-
To unsubscribe from this list: send the 

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool

I'd go so far as to say that anywhere where you want a non-volatile
atomic_read, either your code is buggy, or else an int would work just
as well.


Even, the only way to implement a non-volatile atomic_read() is
essentially as a plain int (you can do some tricks so you cannot
assign to the result and stuff like that, but that's not the issue
here).

So if that would be the behaviour we wanted, just get rid of that
whole atomic_read() thing, so no one can misuse it anymore.


Segher

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Chris Snook

Ilpo Järvinen wrote:

On Thu, 16 Aug 2007, Herbert Xu wrote:


We've been through that already.  If it's a busy-wait it
should use cpu_relax. 


I looked around a bit by using some command lines and ended up wondering 
if these are equal to busy-wait case (and should be fixed) or not:


./drivers/telephony/ixj.c
6674:   while (atomic_read(j-DSPWrite)  0)
6675-   atomic_dec(j-DSPWrite);

...besides that, there are couple of more similar cases in the same file 
(with braces)...


atomic_dec() already has volatile behavior everywhere, so this is 
semantically okay, but this code (and any like it) should be calling 
cpu_relax() each iteration through the loop, unless there's a compelling 
reason not to.  I'll allow that for some hardware drivers (possibly this 
one) such a compelling reason may exist, but hardware-independent core 
subsystems probably have no excuse.


If the maintainer of this code doesn't see a compelling reason to add 
cpu_relax() in this loop, then it should be patched.


-- Chris
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Chris Snook

Herbert Xu wrote:

On Thu, Aug 16, 2007 at 10:06:31AM +0200, Stefan Richter wrote:

Do you (or anyone else for that matter) have an example of this?

The only code I somewhat know, the ieee1394 subsystem, was perhaps
authored and is currently maintained with the expectation that each
occurrence of atomic_read actually results in a load operation, i.e. is
not optimized away.  This means all atomic_t (bus generation, packet and
buffer refcounts, and some other state variables)* and likewise all
atomic bitops in that subsystem.


Can you find an actual atomic_read code snippet there that is
broken without the volatile modifier?


A whole bunch of atomic_read uses will be broken without the volatile 
modifier once we start removing barriers that aren't needed if volatile 
behavior is guaranteed.


barrier() clobbers all your registers.  volatile atomic_read() only 
clobbers one register, and more often than not it's a register you 
wanted to clobber anyway.


-- Chris
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Chris Snook

Ilpo Järvinen wrote:

On Thu, 16 Aug 2007, Herbert Xu wrote:


We've been through that already.  If it's a busy-wait it
should use cpu_relax. 


I looked around a bit by using some command lines and ended up wondering 
if these are equal to busy-wait case (and should be fixed) or not:


./drivers/telephony/ixj.c
6674:   while (atomic_read(j-DSPWrite)  0)
6675-   atomic_dec(j-DSPWrite);

...besides that, there are couple of more similar cases in the same file 
(with braces)...


atomic_dec() already has volatile behavior everywhere, so this is 
semantically okay, but this code (and any like it) should be calling 
cpu_relax() each iteration through the loop, unless there's a compelling 
reason not to.  I'll allow that for some hardware drivers (possibly this 
one) such a compelling reason may exist, but hardware-independent core 
subsystems probably have no excuse.


If the maintainer of this code doesn't see a compelling reason not to 
add cpu_relax() in this loop, then it should be patched.


-- Chris
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Paul E. McKenney
On Thu, Aug 16, 2007 at 11:54:54AM -0700, Christoph Lameter wrote:
 On Thu, 16 Aug 2007, Paul Mackerras wrote:
  So I don't see any good reason to make the atomic API more complex by
  having volatile and non-volatile versions of atomic_read.  It
  should just have the volatile behaviour.
 
 If you want to make it less complex then drop volatile which causes weird 
 side effects without solving any problems as you just pointed out.

The other set of problems are communication between process context
and interrupt/NMI handlers.  Volatile does help here.  And the performance
impact of volatile is pretty near zero, so why have the non-volatile
variant?

Thanx, Paul
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bugme-new] [Bug 8895] New: An ioctl to delete an ipv6 tunnel leads to a kernel panic

2007-08-16 Thread Andrew Morton
On Thu, 16 Aug 2007 12:24:05 -0700 (PDT)
[EMAIL PROTECTED] wrote:

 http://bugzilla.kernel.org/show_bug.cgi?id=8895
 
Summary: An ioctl to delete an ipv6 tunnel leads to a kernel
 panic
Product: Networking
Version: 2.5
  KernelVersion: 2.6.22.3 and also 2.6.21.5
   Platform: All
 OS/Version: Linux
   Tree: Mainline
 Status: NEW
   Severity: normal
   Priority: P1
  Component: IPV6
 AssignedTo: [EMAIL PROTECTED]
 ReportedBy: [EMAIL PROTECTED]
 
 
 Most recent kernel where this bug did not occur: ?
 Distribution: lfs and fedora
 Hardware Environment:user mode linux and vmware
 Software Environment:an evolution of mip6d (ip mobility daemon) 
 Problem Description: The mip6d HA was modified to make a redondancy evolution,
 when an HA is interrupted, the other takes over, this leads to some
 creation/deletion of routes and tunnels.
 Note: The HA ip address known by the mobile (MR) stays the same, the slave HA
 takes it with an override neighbor advertisement message. So the tunnel 
 between
 the mobile router and the HA(s) keep the same end adresses. 
 The problem occurs when a Ctrl C is done on the master HA, the slave takes 
 over
 but sometimes, the master gets a kernel panic. 
 
 Here is the dump of the master:
 
 ICMPv6 NA: someone advertises our address on eth1!
 Slab corruption: ip6_dst_cache start=0867ed00, len=224
 Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
 Last user: [08157c46](dst_destroy+0x79/0xad)
 0a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b
 Prev obj: start=0867ec08, len=224
 Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
 Last user: [08157b05](dst_alloc+0x26/0x62)
 000: 00 00 00 00 00 00 00 00 00 00 00 00 40 41 6f 08
 010: 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
 Next obj: start=0867edf8, len=224
 Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
 Last user: [08157b05](dst_alloc+0x26/0x62)
 000: 00 00 00 00 00 00 00 00 00 00 00 00 60 41 99 0b
 010: 00 00 ff ff 00 00 00 00 7d df ff ff 00 00 00 00
 BUG: failure at net/ipv6/ip6_fib.c:1151/fib6_del_route()!
 Kernel panic - not syncing: BUG!
 
 EIP: 0073:[080e10b4] CPU: 0 Not tainted ESP: 007b:bf6d0398 EFLAGS: 0246
 Not tainted
 EAX: ffda EBX: 0006 ECX: 89f2 EDX: bf6d0428
 ESI:  EDI: 0815c150 EBP: bf6d0458 DS: 007b ES: 007b
 08a37ae4:  [0806ba80] show_regs+0xb4/0xb9
 08a37b10:  [0805a044] panic_exit+0x25/0x3f
 08a37b24:  [0807b088] notifier_call_chain+0x21/0x46
 08a37b44:  [0807b123] __atomic_notifier_call_chain+0x17/0x19
 08a37b60:  [0807b13a] atomic_notifier_call_chain+0x15/0x17
 08a37b7c:  [0806fff6] panic+0x52/0xdd
 08a37b9c:  [081bb8d2] fib6_del_route+0x112/0x175
 08a37bc0:  [081bb9c6] fib6_del+0x91/0xcc
 08a37bdc:  [081bbba8] fib6_clean_node+0x26/0x73
 08a37bf4:  [081bba8a] fib6_walk_continue+0x89/0x11f
 08a37c04:  [081bbb57] fib6_walk+0x37/0x62
 08a37c18:  [081bbc23] fib6_clean_tree+0x2e/0x31
 08a37c4c:  [081bbc83] fib6_prune_clones+0x15/0x1a
 08a37c64:  [081bb9de] fib6_del+0xa9/0xcc
 08a37c7c:  [081bbba8] fib6_clean_node+0x26/0x73
 08a37c94:  [081bba8a] fib6_walk_continue+0x89/0x11f
 08a37ca4:  [081bbb57] fib6_walk+0x37/0x62
 08a37cb8:  [081bbc23] fib6_clean_tree+0x2e/0x31
 08a37cec:  [081bbc51] fib6_clean_all+0x2b/0x48
 08a37d10:  [081b9d15] rt6_ifdown+0x12/0x17
 08a37d24:  [081b56e3] addrconf_ifdown+0x54/0x275
 08a37d40:  [081b562d] addrconf_notify+0x18a/0x1ec
 08a37d5c:  [0807b088] notifier_call_chain+0x21/0x46
 08a37d7c:  [0807b257] __raw_notifier_call_chain+0x17/0x19
 08a37d98:  [0807b26e] raw_notifier_call_chain+0x15/0x17
 08a37db4:  [08153c18] dev_close+0x5e/0x68
 08a37dcc:  [0815619e] unregister_netdevice+0xb7/0x1bc
 08a37ddc:  [081d75d7] ip6_tnl_ioctl+0x1a9/0x1d2
 08a37e34:  [0815578c] dev_ifsioc+0x3b9/0x3d9
 08a37e54:  [08155a71] dev_ioctl+0x2c5/0x300
 08a37e9c:  [0814b435] sock_ioctl+0x230/0x243
 08a37ebc:  [080b0801] do_ioctl+0x21/0x5a
 08a37ed8:  [080b0ba8] vfs_ioctl+0x1ec/0x209
 08a37f00:  [080b0bf3] sys_ioctl+0x2e/0x4b
 08a37f28:  [0805a7ae] handle_syscall+0x86/0xa0
 08a37f74:  [08068d00] handle_trap+0xd8/0xe1
 08a37f90:  [080690f3] userspace+0x138/0x180
 08a37fdc:  [0805a4d1] fork_handler+0x74/0x7c
 08a37ffc:  [a55a5a5a] 0xa55a5a5a
 
 
 Program received signal SIGSEGV, Segmentation fault.
 0xb7e58761 in abort () from /lib/tls/i686/cmov/libc.so.6
 (gdb)
 
 
 
 Program received signal SIGSEGV, Segmentation fault.
 0xb7e58761 in abort () from /lib/tls/i686/cmov/libc.so.6
 (gdb) bt
 #0  0xb7e58761 in abort () from /lib/tls/i686/cmov/libc.so.6
 #1  0x080676df in os_dump_core () at arch/um/os-Linux/util.c:109
 #2  0x0805a05a in panic_exit (self=0x825d674, unused1=0, unused2=0x8277ee0)
 at arch/um/kernel/um_arch.c:477
 #3  0x0807b088 in notifier_call_chain (nl=0x8277ec0, val=0, v=0x8277ee0,
 nr_to_call=-2, nr_calls=0x0) at kernel/sys.c:163
 #4  0x0807b123 in __atomic_notifier_call_chain (nh=0x8277ec0, val=0,
 v=0x8277ee0, nr_to_call=-1, 

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Christoph Lameter
On Thu, 16 Aug 2007, Chris Snook wrote:

 atomic_dec() already has volatile behavior everywhere, so this is semantically
 okay, but this code (and any like it) should be calling cpu_relax() each
 iteration through the loop, unless there's a compelling reason not to.  I'll
 allow that for some hardware drivers (possibly this one) such a compelling
 reason may exist, but hardware-independent core subsystems probably have no
 excuse.

No it does not have any volatile semantics. atomic_dec() can be reordered 
at will by the compiler within the current basic unit if you do not add a 
barrier.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool
I can't speak for this particular case, but there could be similar 
code

examples elsewhere, where we do the atomic ops on an atomic_t object
inside a higher-level locking scheme that would take care of the kind 
of
problem you're referring to here. It would be useful for such or 
similar
code if the compiler kept the value of that atomic object in a 
register.


If there is a higher-level locking scheme then there is no point to
using atomic_t variables.  Atomic_t is specifically for the situation
where multiple CPUs are updating a variable without locking.


And don't forget about the case where it is an I/O device that is
updating the memory (in buffer descriptors or similar).  The driver
needs to do a volatile atomic read to get at the most recent version
of that data, which can be important for optimising latency (or 
throughput

even).  There is no other way the kernel can get that info -- doing an
MMIO read is way way too expensive.


Segher

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool

Note that volatile
is a type-qualifier, not a type itself, so a cast of the _object_ 
itself
to a qualified-type i.e. (volatile int) would not make the access 
itself

volatile-qualified.


There is no such thing as volatile-qualified access defined
anywhere; there only is the concept of a volatile-qualified
*object*.

To serve our purposes, it is necessary for us to take the address of 
this
(non-volatile) object, cast the resulting _pointer_ to the 
corresponding
volatile-qualified pointer-type, and then dereference it. This makes 
that
particular _access_ be volatile-qualified, without the object itself 
being
such. Also note that the (dereferenced) result is also a valid lvalue 
and

hence can be used in *(volatile int *)a = b; kind of construction
(which we use for the atomic_set case).


There is a quite convincing argument that such an access _is_ an
access to a volatile object; see GCC PR21568 comment #9.  This
probably isn't the last word on the matter though...


Segher

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Segher Boessenkool

Here, I should obviously admit that the semantics of *(volatile int *)
aren't any neater or well-defined in the _language standard_ at all. 
The

standard does say (verbatim) precisely what constitutes as access to
object of volatile-qualified type is implementation-defined, but GCC
does help us out here by doing the right thing.


Where do you get that idea?  GCC manual, section 6.1, When
is a Volatile Object Accessed? doesn't say anything of the
kind.  PR33053 and some others.


Honestly, given such confusion, and the propensity of the volatile
type-qualifier keyword to be ill-defined (or at least poorly 
understood,
often inconsistently implemented), I'd (again) express my opinion that 
it

would be best to avoid its usage, given other alternatives do exist.


Yeah.  Or we can have an email thread like this every time
someone proposes a patch that uses an atomic variable ;-)


Segher

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PROBLEM: 2.6.23-rc NETDEV WATCHDOG: eth0: transmit timed out

2007-08-16 Thread Karl Meyer
I did some testing today and found that the error occurs after
applying some of the patches. However I did not figure out the exact
patch in which the error starts since it sometimes occurs immediatly
when moving some data over the net and sometimes it takes 30 min till
I get the transmit timeout. I will be away till sunday and do some
more testing then.

2007/8/16, Francois Romieu [EMAIL PROTECTED]:
 (please do not remove the netdev Cc:)

 Francois Romieu [EMAIL PROTECTED] :
 [...]
  If it does not work I'll dissect 0e4851502f846b13b29b7f88f1250c980d57e944
  tomorrow.

 You will find a tgz archive in attachment which contains a serie of patches
 (0001-... to 0005-...) to walk from 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2
 to 0e4851502f846b13b29b7f88f1250c980d57e944 in smaller steps.

 Please apply 0001 on top of 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2. If it
 still works, apply 0002 on top of 0001, etc.

 --
 Ueimor


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Luck, Tony
 6674:   while (atomic_read(j-DSPWrite)  0)
 6675-   atomic_dec(j-DSPWrite);

 If the maintainer of this code doesn't see a compelling reason to add 
 cpu_relax() in this loop, then it should be patched.

Shouldn't it be just re-written without the loop:

if ((tmp = atomic_read(j-DSPWrite))  0)
atomic_sub(j-DSPWrite, tmp);

Has all the same bugs, but runs much faster :-)

-Tony
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.

2007-08-16 Thread David Miller
From: Tom Tucker [EMAIL PROTECTED]
Date: Thu, 16 Aug 2007 08:43:11 -0500

 Isn't RDMA _part_ of the software net stack within Linux?

It very much is not so.

When using RDMA you lose the capability to do packet shaping,
classification, and all the other wonderful networking facilities
you've grown to love and use over the years.

I'm glad this is a surprise to you, because it illustrates the
point some of us keep trying to make about technologies like
this.

Imagine if you didn't know any of this, you purchase and begin to
deploy a huge piece of RDMA infrastructure, you then get the mandate
from IT that you need to add firewalling on the RDMA connections at
the host level, and oh shit you can't?

This is why none of us core networking developers like RDMA at all.
It's totally not integrated with the rest of the Linux stack and on
top of that it even gets in the way.  It's an abberation, an eye sore,
and a constant source of consternation.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Proposed interface for per-packet mesh-ttl

2007-08-16 Thread Stephen Hemminger
On Thu, 16 Aug 2007 12:21:14 -0700
Luis Carlos Cobo [EMAIL PROTECTED] wrote:

 On 7/30/07, Stephen Hemminger [EMAIL PROTECTED] wrote:
  it would need an IP ttl to mesh mapping. The fundamental thing is to try
  and avoid topology specific options bleeding all the way up the socket 
  layer,
  especially since the network layer is involved and may need to multipath.
 
 I think the cleanest way would be to add a ll_ttl (ll for link layer)
 field to struct sock and a SO_LL_TTL socket option that sets both the
 field and a flag in sk-flags. This way it is useful for any driver
 that can do mesh or any other protocol that involves a ttl at link
 layer (not that I'm aware of any).
 
 However I guess you are not supposed to add new socket options nor
 modify struct socket too often so I'd appreciate feedback on whether
 this would be considered a good approach.
 
 -- 
 Luis Carlos Cobo Rus   GnuPG ID: 44019B60
 cozybit Inc.

The problem with socket options is how does the application know
the correct policy? Pushing configuration to application is just deferring
the problem, not solving it.  You want some policy to be done by the
infrastructure; that means kernel, libraries, daemons, etc. Doing it in
the application is often inflexible and unusable.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Proposed interface for per-packet mesh-ttl

2007-08-16 Thread Luis Carlos Cobo
On 8/16/07, Stephen Hemminger [EMAIL PROTECTED] wrote:
 The problem with socket options is how does the application know
 the correct policy? Pushing configuration to application is just deferring
 the problem, not solving it.  You want some policy to be done by the
 infrastructure; that means kernel, libraries, daemons, etc. Doing it in
 the application is often inflexible and unusable.

The policy will usually be done by the kernel. If the flag is not set,
which will happen most of the time, the driver will use a sensible
default ttl. The socket option would only be used by applications that
are specifically interested in a configurable ttl (like an application
to plot neighbors within an specific diameter). A per-interface
setting is not good enough, since we do not want the neighbor-plotting
tool to affect the ttl of other connections (e.g. a ssh session) that
might be going on at the same time.

-- 
Luis Carlos Cobo Rus   GnuPG ID: 44019B60
cozybit Inc.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?

2007-08-16 Thread Herbert Xu
On Thu, Aug 16, 2007 at 05:40:44PM +0200, Eric Dumazet wrote:
 
 So do you think this patch is enough or should we convert dst_run_gc 
 processing from softirq to workqueue too ?

I think a workqueue would be the best solution since with
that you wouldn't have to worry about processing things in
chunks.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Herbert Xu
On Thu, Aug 16, 2007 at 09:34:41AM -0700, Paul E. McKenney wrote:

 The compiler can also reorder non-volatile accesses.  For an example
 patch that cares about this, please see:
 
   http://lkml.org/lkml/2007/8/7/280
 
 This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and
 rcu_read_unlock() to ensure that accesses aren't reordered with respect
 to interrupt handlers and NMIs/SMIs running on that same CPU.

Good, finally we have some code to discuss (even though it's
not actually in the kernel yet).

First of all, I think this illustrates that what you want
here has nothing to do with atomic ops.  The ORDERED_WRT_IRQ
macro occurs a lot more times in your patch than atomic
reads/sets.  So *assuming* that it was necessary at all,
then having an ordered variant of the atomic_read/atomic_set
ops could do just as well.

However, I still don't know which atomic_read/atomic_set in
your patch would be broken if there were no volatile.  Could
you please point them out?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Herbert Xu
On Thu, Aug 16, 2007 at 03:48:54PM -0400, Chris Snook wrote:

 Can you find an actual atomic_read code snippet there that is
 broken without the volatile modifier?
 
 A whole bunch of atomic_read uses will be broken without the volatile 
 modifier once we start removing barriers that aren't needed if volatile 
 behavior is guaranteed.

Could you please cite the file/function names so we can
see whether removing the barrier makes sense?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Paul E. McKenney
On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote:
 On Thu, Aug 16, 2007 at 09:34:41AM -0700, Paul E. McKenney wrote:
 
  The compiler can also reorder non-volatile accesses.  For an example
  patch that cares about this, please see:
  
  http://lkml.org/lkml/2007/8/7/280
  
  This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and
  rcu_read_unlock() to ensure that accesses aren't reordered with respect
  to interrupt handlers and NMIs/SMIs running on that same CPU.
 
 Good, finally we have some code to discuss (even though it's
 not actually in the kernel yet).

There was some earlier in this thread as well.

 First of all, I think this illustrates that what you want
 here has nothing to do with atomic ops.  The ORDERED_WRT_IRQ
 macro occurs a lot more times in your patch than atomic
 reads/sets.  So *assuming* that it was necessary at all,
 then having an ordered variant of the atomic_read/atomic_set
 ops could do just as well.

Indeed.  If I could trust atomic_read()/atomic_set() to cause the compiler
to maintain ordering, then I could just use them instead of having to
create an  ORDERED_WRT_IRQ().  (Or ACCESS_ONCE(), as it is called in a
different patch.)

 However, I still don't know which atomic_read/atomic_set in
 your patch would be broken if there were no volatile.  Could
 you please point them out?

Suppose I tried replacing the ORDERED_WRT_IRQ() calls with
atomic_read() and atomic_set().  Starting with __rcu_read_lock():

o   If ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++
was ordered by the compiler after
ORDERED_WRT_IRQ(me-rcu_read_lock_nesting) = nesting + 1, then
suppose an NMI/SMI happened after the rcu_read_lock_nesting but
before the rcu_flipctr.

Then if there was an rcu_read_lock() in the SMI/NMI
handler (which is perfectly legal), the nested rcu_read_lock()
would believe that it could take the then-clause of the
enclosing if statement.  But because the rcu_flipctr per-CPU
variable had not yet been incremented, an RCU updater would
be within its rights to assume that there were no RCU reads
in progress, thus possibly yanking a data structure out from
under the reader in the SMI/NMI function.

Fatal outcome.  Note that only one CPU is involved here
because these are all either per-CPU or per-task variables.

o   If ORDERED_WRT_IRQ(me-rcu_read_lock_nesting) = nesting + 1
was ordered by the compiler to follow the
ORDERED_WRT_IRQ(me-rcu_flipctr_idx) = idx, and an NMI/SMI
happened between the two, then an __rcu_read_lock() in the NMI/SMI
would incorrectly take the else clause of the enclosing if
statement.  If some other CPU flipped the rcu_ctrlblk.completed
in the meantime, then the __rcu_read_lock() would (correctly)
write the new value into rcu_flipctr_idx.

Well and good so far.  But the problem arises in
__rcu_read_unlock(), which then decrements the wrong counter.
Depending on exactly how subsequent events played out, this could
result in either prematurely ending grace periods or never-ending
grace periods, both of which are fatal outcomes.

And the following are not needed in the current version of the
patch, but will be in a future version that either avoids disabling
irqs or that dispenses with the smp_read_barrier_depends() that I
have 99% convinced myself is unneeded:

o   nesting = ORDERED_WRT_IRQ(me-rcu_read_lock_nesting);

o   idx = ORDERED_WRT_IRQ(rcu_ctrlblk.completed)  0x1;

Furthermore, in that future version, irq handlers can cause the same
mischief that SMI/NMI handlers can in this version.

Next, looking at __rcu_read_unlock():

o   If ORDERED_WRT_IRQ(me-rcu_read_lock_nesting) = nesting - 1
was reordered by the compiler to follow the
ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])--,
then if an NMI/SMI containing an rcu_read_lock() occurs between
the two, this nested rcu_read_lock() would incorrectly believe
that it was protected by an enclosing RCU read-side critical
section as described in the first reversal discussed for
__rcu_read_lock() above.  Again, fatal outcome.

This is what we have now.  It is not hard to imagine situations that
interact with -both- interrupt handlers -and- other CPUs, as described
earlier.

Thanx, Paul
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Paul E. McKenney
On Thu, Aug 16, 2007 at 01:20:26PM -0700, Christoph Lameter wrote:
 On Thu, 16 Aug 2007, Chris Snook wrote:
 
  atomic_dec() already has volatile behavior everywhere, so this is 
  semantically
  okay, but this code (and any like it) should be calling cpu_relax() each
  iteration through the loop, unless there's a compelling reason not to.  I'll
  allow that for some hardware drivers (possibly this one) such a compelling
  reason may exist, but hardware-independent core subsystems probably have no
  excuse.
 
 No it does not have any volatile semantics. atomic_dec() can be reordered 
 at will by the compiler within the current basic unit if you do not add a 
 barrier.

Yep.  Or you can use atomic_dec_return() instead of using a barrier.

Thanx, Paul
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Herbert Xu
On Thu, Aug 16, 2007 at 06:02:32PM -0700, Paul E. McKenney wrote:
 
 Yep.  Or you can use atomic_dec_return() instead of using a barrier.

Or you could use smp_mb__{before,after}_atomic_dec.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Chris Snook

Herbert Xu wrote:

On Thu, Aug 16, 2007 at 03:48:54PM -0400, Chris Snook wrote:

Can you find an actual atomic_read code snippet there that is
broken without the volatile modifier?
A whole bunch of atomic_read uses will be broken without the volatile 
modifier once we start removing barriers that aren't needed if volatile 
behavior is guaranteed.


Could you please cite the file/function names so we can
see whether removing the barrier makes sense?

Thanks,


At a glance, several architectures' implementations of smp_call_function() have 
one or more legitimate atomic_read() busy-waits that shouldn't be using 
CPU-relax.  Some of them do work in the loop.


I'm sure there are plenty more examples that various maintainers could find in 
their own code.


-- Chris
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Herbert Xu
On Thu, Aug 16, 2007 at 10:04:24PM -0400, Chris Snook wrote:

 Could you please cite the file/function names so we can
 see whether removing the barrier makes sense?
 
 At a glance, several architectures' implementations of smp_call_function() 
 have one or more legitimate atomic_read() busy-waits that shouldn't be 
 using CPU-relax.  Some of them do work in the loop.

Care to name one so we can discuss it?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Nick Piggin

Segher Boessenkool wrote:
Part of the motivation here is to fix heisenbugs.  If I knew where 
they




By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.


Almost everything is a tradeoff; and so is this.  I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.



So why is this a good tradeoff?



It certainly is better than disabling all compiler optimisations!


It's easy to be better than something really stupid :)

So i386 and x86-64 don't have volatiles there, and it saves them a
few K of kernel text. What you need to justify is why it is a good
tradeoff to make them volatile (which btw, is much harder to go
the other way after we let people make those assumptions).



I also think that just adding things to APIs in the hope it might fix
up some bugs isn't really a good road to go down. Where do you stop?



I look at it the other way: keeping the volatile semantics in
atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs;


Yeah, but we could add lots of things to help prevent bugs and
would never be included. I would also contend that it helps _hide_
bugs and encourages people to be lazy when thinking about these
things.

Also, you dismiss the fact that we'd actually be *adding* volatile
semantics back to the 2 most widely tested architectures (in terms
of test time, number of testers, variety of configurations, and
coverage of driver code). This is a very important different from
just keeping volatile semantics because it is basically a one-way
API change.



certainly most people expect that behaviour, and also that behaviour
is *needed* in some places and no other interface provides that
functionality.


I don't know that most people would expect that behaviour. Is there
any documentation anywhere that would suggest this?

Also, barrier() most definitely provides the required functionality.
It is overkill in some situations, but volatile is overkill in _most_
situations. If that's what you're worried about, we should add a new
ordering primitive.



[some confusion about barriers wrt atomics snipped]


What were you confused about?

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Nick Piggin

Chris Snook wrote:

Herbert Xu wrote:


On Thu, Aug 16, 2007 at 03:48:54PM -0400, Chris Snook wrote:


Can you find an actual atomic_read code snippet there that is
broken without the volatile modifier?


A whole bunch of atomic_read uses will be broken without the volatile 
modifier once we start removing barriers that aren't needed if 
volatile behavior is guaranteed.



Could you please cite the file/function names so we can
see whether removing the barrier makes sense?

Thanks,



At a glance, several architectures' implementations of 
smp_call_function() have one or more legitimate atomic_read() busy-waits 
that shouldn't be using CPU-relax.  Some of them do work in the loop.


sh looks like the only one there that would be broken (and that's only
because they don't have a cpu_relax there, but it should be added anyway).
Sure, if we removed volatile from other architectures, it would be wise
to audit arch code because arch maintainers do sometimes make assumptions
about their implementation details... however we can assume most generic
code is safe without volatile.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Paul Mackerras
Christoph Lameter writes:

 No it does not have any volatile semantics. atomic_dec() can be reordered 
 at will by the compiler within the current basic unit if you do not add a 
 barrier.

Volatile doesn't mean it can't be reordered; volatile means the
accesses can't be eliminated.

Paul.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Linus Torvalds


On Fri, 17 Aug 2007, Paul Mackerras wrote:

 Volatile doesn't mean it can't be reordered; volatile means the
 accesses can't be eliminated.

It also does limit re-ordering. 

Of course, since *normal* accesses aren't necessarily limited wrt 
re-ordering, the question then becomes one of with regard to *what* does 
it limit re-ordering?.

A C compiler that re-orders two different volatile accesses that have a 
sequence point in between them is pretty clearly a buggy compiler. So at a 
minimum, it limits re-ordering wrt other volatiles (assuming sequence 
points exists). It also means that the compiler cannot move it 
speculatively across conditionals, but other than that it's starting to 
get fuzzy.

In general, I'd *much* rather we used barriers. Anything that depends on 
volatile is pretty much set up to be buggy. But I'm certainly also willing 
to have that volatile inside atomic_read/atomic_set() if it avoids code 
that would otherwise break - ie if it hides a bug.

Linus
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Nick Piggin

Paul E. McKenney wrote:

On Thu, Aug 16, 2007 at 06:42:50PM +0800, Herbert Xu wrote:



In fact, volatile doesn't guarantee that the memory gets
read anyway.  You might be reading some stale value out
of the cache.  Granted this doesn't happen on x86 but
when you're coding for the kernel you can't make such
assumptions.

So the point here is that if you don't mind getting a stale
value from the CPU cache when doing an atomic_read, then
surely you won't mind getting a stale value from the compiler
cache.



Absolutely disagree.  An interrupt/NMI/SMI handler running on the CPU
will see the same value (whether in cache or in store buffer) that
the mainline code will see.  In this case, we don't care about CPU
misordering, only about compiler misordering.  It is easy to see
other uses that combine communication with handlers on the current
CPU with communication among CPUs -- again, see prior messages in
this thread.


I still don't agree with the underlying assumption that everybody
(or lots of kernel code) treats atomic accesses as volatile.

Nobody that does has managed to explain my logic problem either:
loads and stores to long and ptr have always been considered to be
atomic, test_bit is atomic; so why are another special subclass of
atomic loads and stores? (and yes, it is perfectly legitimate to
want a non-volatile read for a data type that you also want to do
atomic RMW operations on)

Why are people making these undocumented and just plain false
assumptions about atomic_t? If they're using lockless code (ie.
which they must be if using atomics), then they actually need to be
thinking much harder about memory ordering issues. If that is too
much for them, then they can just use locks.



So, the architecture guys can implement atomic_read however they want
--- as long as it cannot be optimized away.*


They can implement it however they want as long as it stays
atomic.



Precisely.  And volatility is a key property of atomic.  Let's please
not throw it away.


It isn't, though (at least not since i386 and x86-64 don't have it).
_Adding_ it is trivial, and can be done any time. Throwing it away
(ie. making the API weaker) is _hard_. So let's not add it without
really good reasons. It most definitely results in worse code
generation in practice.

I don't know why people would assume volatile of atomics. AFAIK, most
of the documentation is pretty clear that all the atomic stuff can be
reordered etc. except for those that modify and return a value.

It isn't even intuitive: `*lp = value` is like the most fundamental
atomic operation in Linux.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Paul Mackerras
Nick Piggin writes:

 So i386 and x86-64 don't have volatiles there, and it saves them a
 few K of kernel text. What you need to justify is why it is a good

I'm really surprised it's as much as a few K.  I tried it on powerpc
and it only saved 40 bytes (10 instructions) for a G5 config.

Paul.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Nick Piggin

Paul Mackerras wrote:

Nick Piggin writes:



So i386 and x86-64 don't have volatiles there, and it saves them a
few K of kernel text. What you need to justify is why it is a good



I'm really surprised it's as much as a few K.  I tried it on powerpc
and it only saved 40 bytes (10 instructions) for a G5 config.

Paul.



I'm surprised too. Numbers were from the ...use asm() like the other
atomic operations already do thread. According to them,

  textdata bss dec hex filename
3434150  249176  176128 3859454  3ae3fe atomic_normal/vmlinux
3436203  249176  176128 3861507  3aec03 atomic_volatile/vmlinux

The first one is a stock kenel, the second is with atomic_read/set
cast to volatile. gcc-4.1 -- maybe if you have an earlier gcc it
won't optimise as much?

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >