Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Herbert Xu writes: It doesn't matter. The memory pressure flag is an *advisory* flag. If we get it wrong the worst that'll happen is that we'd waste some time doing work that'll be thrown away. Ah, so it's the racy but I don't care because it's only an optimization case. That's fine. Somehow I find it hard to believe that all the racy uses of atomic_read in the kernel are like that, though. :) In any case, this actually illustrates why the addition of volatile is completely pointless. Even if this code was broken, which it definitely is not, having the volatile there wouldn't have helped at all. Yes, adding volatile to racy code doesn't somehow make it race-free. Neither does using atomic_t, despite what some seem to believe. I have actually started going through all the uses of atomic_read in the kernel. So far out of the first 100 I have found none where we have two atomic_reads of the same variable and the compiler could usefully use the value from the first as the result of the second. But there's still 2500 to go... Paul. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: drivers/net/tokenring/3c359.c
What exactly the difference between kzalloc and kcalloc? From the definition, I could see that kcalloc should be used for array allocation. But I could see kzalloc is used for allocation arrays as in the below patch. Any coding standard (or) developers can use kzalloc and kcalloc as per their coding practice?? Thanks, ~Siva -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Garzik Sent: Tuesday, August 14, 2007 11:20 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org; [EMAIL PROTECTED]; Linux Kernel; kernel-janitors Subject: Re: drivers/net/tokenring/3c359.c Surya Prabhakar N wrote: Hi, Replacing kmalloc with kzalloc and cleaning up memset in drivers/net/tokenring/3c359.c Signed-off-by: Surya Prabhakar [EMAIL PROTECTED] applied - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Herbert Xu writes: On Thu, Aug 16, 2007 at 02:11:43PM +1000, Paul Mackerras wrote: The uses of atomic_read where one might want it to allow caching of the result seem to me to fall into 3 categories: 1. Places that are buggy because of a race arising from the way it's used. 2. Places where there is a race but it doesn't matter because we're doing some clever trick. 3. Places where there is some locking in place that eliminates any potential race. Agreed. In case 1, adding volatile won't solve the race, of course, but it's hard to argue that we shouldn't do something because it will slow down buggy code. Case 2 is hopefully pretty rare and accompanied by large comment blocks, and in those cases caching the result of atomic_read explicitly in a local variable would probably make the code clearer. And in case 3 there is no reason to use atomic_t at all; we might as well just use an int. Since adding volatile doesn't help any of the 3 cases, and takes away optimisations from both 2 and 3, I wonder what is the point of the addition after all? Note that I said these are the cases _where one might want to allow caching_, so of course adding volatile doesn't help _these_ cases. There are of course other cases where one definitely doesn't want to allow the compiler to cache the value, such as when polling an atomic variable waiting for another CPU to change it, and from my inspection so far these cases seem to be the majority. The reasons for having volatile behaviour of atomic_read (whether or not that is achieved by use of the volatile C keyword) are - It matches the normal expectation based on the name atomic_read - It matches the behaviour of the other atomic_* primitives - It avoids bugs in the cases where volatile behaviour is required To my mind these outweigh the small benefit for some code of the non-volatile (caching-allowed) behaviour. In fact it's pretty minor either way, and since x86[-64] has this behaviour, one can expect the potential bugs in generic code to have mostly been found, although perhaps not all of them since x86[-64] has less aggressive reordering of memory accesses and fewer registers in which to cache things than some other architectures. Paul. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, Aug 16, 2007 at 04:56:21PM +1000, Paul Mackerras wrote: Note that I said these are the cases _where one might want to allow caching_, so of course adding volatile doesn't help _these_ cases. There are of course other cases where one definitely doesn't want to allow the compiler to cache the value, such as when polling an atomic variable waiting for another CPU to change it, and from my inspection so far these cases seem to be the majority. We've been through that already. If it's a busy-wait it should use cpu_relax. If it's scheduling away that already forces the compiler to reread anyway. Do you have an actual example where volatile is needed? - It matches the normal expectation based on the name atomic_read - It matches the behaviour of the other atomic_* primitives Can't argue since you left out what those expectations or properties are. - It avoids bugs in the cases where volatile behaviour is required Do you (or anyone else for that matter) have an example of this? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Herbert Xu wrote: On Thu, Aug 16, 2007 at 04:56:21PM +1000, Paul Mackerras wrote: Note that I said these are the cases _where one might want to allow caching_, so of course adding volatile doesn't help _these_ cases. There are of course other cases where one definitely doesn't want to allow the compiler to cache the value, such as when polling an atomic variable waiting for another CPU to change it, and from my inspection so far these cases seem to be the majority. We've been through that already. If it's a busy-wait it should use cpu_relax. If it's scheduling away that already forces the compiler to reread anyway. Do you have an actual example where volatile is needed? - It matches the normal expectation based on the name atomic_read - It matches the behaviour of the other atomic_* primitives Can't argue since you left out what those expectations or properties are. We use atomic_t for data that is concurrently locklessly written and read at arbitrary times. My naive expectation as driver author (driver maintainer) is that all atomic_t accessors, including atomic_read, (and atomic bitops) work with the then current value of the atomic data. - It avoids bugs in the cases where volatile behaviour is required Do you (or anyone else for that matter) have an example of this? The only code I somewhat know, the ieee1394 subsystem, was perhaps authored and is currently maintained with the expectation that each occurrence of atomic_read actually results in a load operation, i.e. is not optimized away. This means all atomic_t (bus generation, packet and buffer refcounts, and some other state variables)* and likewise all atomic bitops in that subsystem. If that assumption is wrong, then what is the API or language primitive to force a load operation to occur? *) Interesting what a quick LXR session in search for all atomic_t usages in 'my' subsystem brings to light. I now noticed an apparently unused struct member in the bitrotting pcilynx driver, and more importantly, a pairing of two atomic_t variables in raw1394 that should be audited for race conditions and for possible replacement by plain int. -- Stefan Richter -=-=-=== =--- = http://arcgraph.de/sr/ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [DM9000] Add support for big-endian hosts
On Tuesday 14 August 2007 20:25, Ben Dooks wrote: On Tue, Aug 14, 2007 at 01:22:32PM +0200, Laurent Pinchart wrote: This patch splits the receive status in 8bit wide fields and convert the packet length from little endian to CPU byte order. Which version of the the kernel was this against, it applies with fuzz to 2.6.23-rc3: $ patch -p1 ~/dm9000-fix-endianness.patch patching file drivers/net/dm9000.c Hunk #1 succeeded at 894 (offset 15 lines). Hunk #2 succeeded at 936 (offset 15 lines). Hunk #3 succeeded at 948 (offset 15 lines). That was against 2.6.22. Sorry. I'll post updated patches against 2.6.23-rc3. Best regards. -- Laurent Pinchart CSE Semaphore Belgium Chaussée de Bruxelles, 732A B-1410 Waterloo Belgium T +32 (2) 387 42 59 F +32 (2) 387 42 75 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] [DM9000] Support for big endian hosts and external PHY
These patches update the DM9000 driver to add support for big endian hosts and external PHY. They are resubmissions of patches based on 2.6.22, that I have now updated to 2.6.23-rc3. -- Laurent Pinchart CSE Semaphore Belgium Chaussée de Bruxelles, 732A B-1410 Waterloo Belgium T +32 (2) 387 42 59 F +32 (2) 387 42 75 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] [DM9000] Added support for big-endian hosts
This patch splits the receive status in 8bit wide fields and convert the packet length from little endian to CPU byte order. Signed-off-by: Laurent Pinchart [EMAIL PROTECTED] --- drivers/net/dm9000.c | 13 +++-- 1 files changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c index c3de81b..a424810 100644 --- a/drivers/net/dm9000.c +++ b/drivers/net/dm9000.c @@ -894,7 +894,8 @@ dm9000_timer(unsigned long data) } struct dm9000_rxhdr { - u16 RxStatus; + u8 RxPktReady; + u8 RxStatus; u16 RxLen; } __attribute__((__packed__)); @@ -935,7 +936,7 @@ dm9000_rx(struct net_device *dev) (db-inblk)(db-io_data, rxhdr, sizeof(rxhdr)); - RxLen = rxhdr.RxLen; + RxLen = le16_to_cpu(rxhdr.RxLen); /* Packet Status check */ if (RxLen 0x40) { @@ -947,17 +948,17 @@ dm9000_rx(struct net_device *dev) PRINTK1(RST: RX Len:%x\n, RxLen); } - if (rxhdr.RxStatus 0xbf00) { + if (rxhdr.RxStatus 0xbf) { GoodPacket = false; - if (rxhdr.RxStatus 0x100) { + if (rxhdr.RxStatus 0x01) { PRINTK1(fifo error\n); db-stats.rx_fifo_errors++; } - if (rxhdr.RxStatus 0x200) { + if (rxhdr.RxStatus 0x02) { PRINTK1(crc error\n); db-stats.rx_crc_errors++; } - if (rxhdr.RxStatus 0x8000) { + if (rxhdr.RxStatus 0x80) { PRINTK1(length error\n); db-stats.rx_length_errors++; } -- 1.5.0 -- Laurent Pinchart CSE Semaphore Belgium Chaussée de Bruxelles, 732A B-1410 Waterloo Belgium T +32 (2) 387 42 59 F +32 (2) 387 42 75 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] [DM9000] External PHY support
This patch adds a flag to the DM9000 platform data which, when set, configures the device to use an external PHY. Signed-off-by: Laurent Pinchart [EMAIL PROTECTED] --- drivers/net/dm9000.c |6 ++ include/linux/dm9000.h |1 + 2 files changed, 7 insertions(+), 0 deletions(-) diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c index a424810..a86984e 100644 --- a/drivers/net/dm9000.c +++ b/drivers/net/dm9000.c @@ -136,6 +136,7 @@ typedef struct board_info { u16 dbug_cnt; u8 io_mode; /* 0:word, 2:byte */ u8 phy_addr; + unsigned int flags; void (*inblk)(void __iomem *port, void *data, int length); void (*outblk)(void __iomem *port, void *data, int length); @@ -528,6 +529,8 @@ dm9000_probe(struct platform_device *pdev) if (pdata-dumpblk != NULL) db-dumpblk = pdata-dumpblk; + + db-flags = pdata-flags; } dm9000_reset(db); @@ -670,6 +673,9 @@ dm9000_init_dm9000(struct net_device *dev) iow(db, DM9000_GPCR, GPCR_GEP_CNTL);/* Let GPIO0 output */ iow(db, DM9000_GPR, 0); /* Enable PHY */ + if (db-flags DM9000_PLATF_EXT_PHY) + iow(db, DM9000_NCR, NCR_EXT_PHY); + /* Program operating register */ iow(db, DM9000_TCR, 0); /* TX Polling clear */ iow(db, DM9000_BPTR, 0x3f); /* Less 3Kb, 200us */ diff --git a/include/linux/dm9000.h b/include/linux/dm9000.h index 0008e2a..ea530fd 100644 --- a/include/linux/dm9000.h +++ b/include/linux/dm9000.h @@ -19,6 +19,7 @@ #define DM9000_PLATF_8BITONLY (0x0001) #define DM9000_PLATF_16BITONLY (0x0002) #define DM9000_PLATF_32BITONLY (0x0004) +#define DM9000_PLATF_EXT_PHY (0x0008) /* platfrom data for platfrom device structure's platfrom_data field */ -- 1.5.0 -- Laurent Pinchart CSE Semaphore Belgium Chaussée de Bruxelles, 732A B-1410 Waterloo Belgium T +32 (2) 387 42 59 F +32 (2) 387 42 75 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc3 and SKY2 driver issue
[Adding Stephen and netdev to CC] On 15/08/07, James Corey [EMAIL PROTECTED] wrote: I tried running a D-link gig card on kernel 2.6.21.1 and it came up fine, but when I did a sftp of an linux dvd ISO to it, the interface would lock up hard with the error eth1: hw csum failure. Call Trace: IRQ [804d3e31] __skb_checksum_complete_head+0x46/0x5f [804d3e56] __skb_checksum_complete+0xc/0x11 [805046c7] tcp_v4_rcv+0x157/0x810 [804d8a6f] dev_queue_xmit+0x237/0x260 [80229990] find_busiest_group+0x252/0x684 [804ead50] ip_local_deliver+0xca/0x14c [804eb24a] ip_rcv+0x478/0x4ba [803ec7d3] sky2_poll+0x6f9/0x9b9 [8022bd5d] run_rebalance_domains+0x13e/0x408 [804d877a] net_rx_action+0xa8/0x166 [80235d62] __do_softirq+0x55/0xc3 [8020a4ec] call_softirq+0x1c/0x28 [8020b611] do_softirq+0x2c/0x7d [8020b8cf] do_IRQ+0x13e/0x15f [802086ce] mwait_idle+0x0/0x46 [80209871] ret_from_intr+0x0/0xa EOI [80208710] mwait_idle+0x42/0x46 [80208666] cpu_idle+0x8c/0xaf [8078174e] start_kernel+0x2ac/0x2b8 [80781140] _sinittext+0x140/0x144 So I tried running the latest snapshot 2.6.23-rc3 and the almost the same thing happens. Only difference is that now the entire box locks up. The error is almost the same eth1: hw csum failure. Call Trace: IRQ [804779b6] __skb_checksum_complete_head+0x43/0x56 [804779d5] __skb_checksum_complete+0xc/0x11 [804a989d] tcp_v4_rcv+0x14e/0x801 [8048ff84] ip_local_deliver+0xca/0x14c [80490472] ip_rcv+0x46c/0x4ae [880060f9] :sky2:sky2_poll+0x72b/0x9c7 [8047c934] net_rx_action+0xa8/0x166 [80235ced] __do_softirq+0x55/0xc4 [8020c5cc] call_softirq+0x1c/0x28 [8020d6fd] do_softirq+0x2c/0x7d [8020d9bb] do_IRQ+0x13e/0x15f [8020a780] mwait_idle+0x0/0x48 [8020b951] ret_from_intr+0x0/0xa EOI [880063e7] :sky2:sky2_xmit_frame+0x0/0x41d [8020a7c2] mwait_idle+0x42/0x48 [8020a718] cpu_idle+0xbd/0xe0 [80704a5a] start_kernel+0x2ac/0x2b8 [80704140] _sinittext+0x140/0x144 I see that the new kernel includes some sort of SKY2 DEBUG stuff. I would be happy to rerun the test with that turned on, given some minor direction. Regards, Michal -- LOG http://www.stardust.webpages.pl/log/ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, Aug 16, 2007 at 10:06:31AM +0200, Stefan Richter wrote: Do you (or anyone else for that matter) have an example of this? The only code I somewhat know, the ieee1394 subsystem, was perhaps authored and is currently maintained with the expectation that each occurrence of atomic_read actually results in a load operation, i.e. is not optimized away. This means all atomic_t (bus generation, packet and buffer refcounts, and some other state variables)* and likewise all atomic bitops in that subsystem. Can you find an actual atomic_read code snippet there that is broken without the volatile modifier? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [patch 4/4] Update e1000 driver to use devres.
Index: linux-2.6/drivers/net/e1000/e1000_main.c === --- linux-2.6.orig/drivers/net/e1000/e1000_main.c +++ linux-2.6/drivers/net/e1000/e1000_main.c @@ -860,15 +860,14 @@ e1000_probe(struct pci_dev *pdev, { struct net_device *netdev; struct e1000_adapter *adapter; - unsigned long mmio_start, mmio_len; - unsigned long flash_start, flash_len; + unsigned long mmio_len, flash_len; static int cards_found = 0; static int global_quad_port_a = 0; /* global ksp3 port a indication */ int i, err, pci_using_dac; uint16_t eeprom_data = 0; uint16_t eeprom_apme_mask = E1000_EEPROM_APME; - if ((err = pci_enable_device(pdev))) + if ((err = pcim_enable_device(pdev))) return err; if (!(err = pci_set_dma_mask(pdev, DMA_64BIT_MASK)) @@ -878,20 +877,19 @@ e1000_probe(struct pci_dev *pdev, if ((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK)) (err = pci_set_consistent_dma_mask(pdev, DMA_32BIT_MASK))) { E1000_ERR(No usable DMA configuration, aborting\n); - goto err_dma; + return err; } pci_using_dac = 0; } if ((err = pci_request_regions(pdev, e1000_driver_name))) - goto err_pci_reg; + return err; pci_set_master(pdev); - err = -ENOMEM; - netdev = alloc_etherdev(sizeof(struct e1000_adapter)); + netdev = devm_alloc_etherdev(pdev-dev, sizeof(struct +e1000_adapter)); if (!netdev) - goto err_alloc_etherdev; + return -ENOMEM; I'm a bit confused why you removed the goto's, and then removed all the target unwinding code at the bottom of e1000_probe(). Those labels clean up resources if something fails, like the err_sw_init label. I don't see anything in the devres code that jumps out at me that explains why we can do away with these cleanup routines. Thoughts? Thanks, -PJ Waskiewicz - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: drivers/infiniband/mlx/mad.c misplaced ;
On Wed, Aug 15, 2007 at 05:40:11PM -0700, Joe Perches wrote: On Wed, 2007-08-15 at 19:58 -0400, Dave Jones wrote: Signed-off-by: Dave Jones [EMAIL PROTECTED] diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index 3330917..0ed02b7 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -109,7 +109,7 @@ int mlx4_MAD_IFC(struct mlx4_ib_dev *dev, int ignore_mkey, int ignore_bkey, in_modifier, op_modifier, MLX4_CMD_MAD_IFC, MLX4_CMD_TIME_CLASS_C); - if (!err); + if (!err) There's more than a few of these (not inspected). $ egrep -r --include=*.c \bif[[:space:]]*\([^\)]*\)[[:space:]]*\; * [...] drivers/s390/scsi/zfcp_erp.c: if (status == ZFCP_ERP_SUCCEEDED) ; /* no further action */ At least this one is not a bug. But I'm going to add a break there, so it doesn't look that strange anymore. Thanks! - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GENETLINK] some thoughts on the usage
Thomas Graf wrote: * Richard MUSIL [EMAIL PROTECTED] 2007-08-10 10:45 I have noticed that although ops for each family are the same (each device is functionally same) I cannot use same genl_ops struct for registration, because it uses internal member to link in list. Therefore it is necessary to allocate new genl_ops for each device and pass it to registration. But I cannot officially use this list to track those genl_ops (so I can properly destroy them later), because there is no interface. So I need to redo the management of the structures on my own. The intended usage of the interface in your example would be to register only one genetlink family, say tpm, register one set of operations and then have an attribute in every message which specifies which TPM device to use. This helps keeping the total number of genetlink families down. I got your point. The fact that there are several families of the same device type seems however quite convenient. For example, I create/register virtual device /dev/tpm0 and register family with the same name for that device, the same for /dev/tpm1 etc. Then I got straightforward association in between devices and families and get for free the whole management what happen if I try to talk to device which is not registered/present etc. I could multiplex it over one channel, but I will need to make the communication protocol more complex and make me handle all exceptions myself. Since in my case there will be probably not more than one device present, and the device itself is quite exotic I would probably not rewrite it to the multiplexing scheme, but I understand what you mean and will take it into account next time I face decision how to use genetlink. However I am still wondering, whether the allocation of structures (genl_family, genl_ops) should not be done by genetlink layer instead of the user (I mean allocating copy of the struct passed by user). This is for example, what TPM layer (tpm.c) does. This would come at slight cost at memory usage and performance, but will protect the caller from inspecting internal behavior and taking care of that. And internal links would nicely help in keeping of track of allocated structures. I could write a patch for this. The second inconvenience is that for each family I register, I also register basically same ops (basically means, the definitions, and doit, dumpit handlers are same, though the structures are at different addresses for reasons described above). When the handler receives the message it needs to associate the message with the actual device it is handling. This could be done through family lookup (using nlmsghdr::nlmsg_type), but I wondered if it would make sense to extend genl_family for user custom data pointer and then pass this custom data (or genl_family reference) to each handler (for example inside genl_info). It is already parsed by genetlink layer, so it should not slow things down. That's not a bad idea, although I think we should try and keep the generic netlink part as simple as possible. There is a family specific header, referred to as user header in genl_info which is basically what you're looking for with the custom header. I believe making the generic netlink family aware of anything beyond family id and operations id only complicates things. Ok, this was just an idea ;-) - probably important only for high performance genetlink users. Richard - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
[ Bill tells me in private communication he gets this already, but I think it's more complicated than the shoddy explanation I'd made earlier so would wish to make this clearer in detail one last time, for the benefit of others listening in or reading the archives. ] On Thu, 16 Aug 2007, Satyam Sharma wrote: On Thu, 16 Aug 2007, Satyam Sharma wrote: [...] On Wed, 15 Aug 2007, Bill Fink wrote: [...] I'm curious about one minor tangential point. Why, instead of: b = *(volatile int *)a; why can't this just be expressed as: b = (volatile int)a; Isn't it the contents of a that's volatile, i.e. it's value can change invisibly to the compiler, and that's why you want to force a read from memory? Why do you need the *(volatile int *) construct? b = (volatile int)a; doesn't help us because a cast to a qualified type has the same effect as a cast to an unqualified version of that type, as mentioned in 6.5.4:4 (footnote 86) of the standard. Note that volatile is a type-qualifier, not a type itself, so a cast of the _object_ itself to a qualified-type i.e. (volatile int) would not make the access itself volatile-qualified. Casts don't produce lvalues, and the cast ((volatile int)a) does not produce the object-int-a-qualified-as-volatile -- in fact, the result of the above cast is whatever is the _value_ of int a, with the access to that object having _already_ taken place, as per the actual type-qualification of the object (that was originally declared as being _non-volatile_, in fact). Hence, defining atomic_read() as: #define atomic_read(v) ((volatile int)((v)-counter)) would be buggy and not give volatility semantics at all, unless the counter object itself isn't volatile-qualified already (which it isn't). The result of the cast itself being the _value_ of the int object, and not the object itself (i.e., not an lvalue), is thereby independent of type-qualification in that cast itself (it just wouldn't make any difference), hence the cast to a qualified type has the same effect as a cast to an unqualified version of that type bit in section 6.5.4:4 of the standard. To serve our purposes, it is necessary for us to take the address of this (non-volatile) object, cast the resulting _pointer_ to the corresponding volatile-qualified pointer-type, and then dereference it. This makes that particular _access_ be volatile-qualified, without the object itself being such. Also note that the (dereferenced) result is also a valid lvalue and hence can be used in *(volatile int *)a = b; kind of construction (which we use for the atomic_set case). Dereferencing using the *(pointer-type-cast) construct, OTOH, serves us well: #define atomic_read(v) (*(volatile int *)(v)-counter) Firstly, note that the cast here being (volatile int *) and not (int * volatile) qualifies the type of the _object_ being pointed to by the pointer in question as being volatile-qualified, and not the pointer itself (6.2.5:27 of the standard, and 6.3.2.3:2 allows us to convert from a pointer-to-non-volatile-qualified-int to a pointer-to- volatile-qualified-int, which suits us just fine) -- but note that the _access_ to that address itself has not yet occurred. _After_ specifying the memory address as containing a volatile-qualified- int-type object, (and GCC co-operates as mentioned below), we proceed to dereference it, which is when the _actual access_ occurs, therefore with volatility semantics this time. Interesting. Here, I should obviously admit that the semantics of *(volatile int *) aren't any neater or well-defined in the _language standard_ at all. The standard does say (verbatim) precisely what constitutes as access to object of volatile-qualified type is implementation-defined, but GCC does help us out here by doing the right thing. Accessing the non-volatile object there using the volatile-qualified pointer-type cast makes GCC treat the object stored at that memory address itself as if it were a volatile object, thus making the access end up having what we're calling as volatility semantics here. Honestly, given such confusion, and the propensity of the volatile type-qualifier keyword to be ill-defined (or at least poorly understood, often inconsistently implemented), I'd (again) express my opinion that it would be best to avoid its usage, given other alternatives do exist. Satyam - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPv6]: Invalid semicolon after if statement
On Wed, 15 Aug 2007, David Miller wrote: From: Dave Jones [EMAIL PROTECTED] Date: Wed, 15 Aug 2007 19:52:20 -0400 On Wed, Aug 15, 2007 at 03:08:14PM -0700, David Miller wrote: From: Ilpo_Järvinen [EMAIL PROTECTED] Date: Thu, 16 Aug 2007 00:57:00 +0300 (EEST) A similar fix to netfilter from Eric Dumazet inspired me to look around a bit by using some grep/sed stuff as looking for this kind of bugs seemed easy to automate. This is one of them I found where it looks like this semicolon is not valid. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] Yikes! Makes you want to audit the entire tree for these things :-))) Indeed. Here's another one. Signed-off-by: Dave Jones [EMAIL PROTECTED] That got fixed the other day and is the A similar fix to netfilter from Eric Dumazet Ilpo is reffering to above :) ...heh, when I said a bit, I meant that it took a very little effort to check over the whole tree... :-) ...I've already reported all four things one could find from whole tree with this cmd (to the relevant parties), so no need for you to redo the effort :-) : $ for i in `find . -name '*.[ch]'`; do echo $i; sed -n -e '/^\t*if *[(].*[)] *; *$/ N' -e '/^\(\t*\)if *[(].*[)] *; *\n\1\t/ p' $i; done | tee result ...Basically it checks if the next line has more indentation than the if line. ...Obviously it will work only if the code follows current CodingStyle in indentation. I'm currently experimenting with indent preprocessing step... ...but I'm not yet sure if I can pull something useful out from that too :-) ...Better cmdlines could be invented, e.g. by manually checking every if (); lines once one has first automated separation of them from if () do_smthg(); lines (I haven't learned myself how to easily count/work with parenthesis pairs in a cmdline, which seems necessary here)... -- i.
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Herbert Xu wrote: On Thu, Aug 16, 2007 at 10:06:31AM +0200, Stefan Richter wrote: Do you (or anyone else for that matter) have an example of this? The only code I somewhat know, the ieee1394 subsystem, was perhaps authored and is currently maintained with the expectation that each occurrence of atomic_read actually results in a load operation, i.e. is not optimized away. This means all atomic_t (bus generation, packet and buffer refcounts, and some other state variables)* and likewise all atomic bitops in that subsystem. Can you find an actual atomic_read code snippet there that is broken without the volatile modifier? What do I have to look for? atomic_read after another read or write access to the same variable, in the same function scope? Or in the sum of scopes of functions that could be merged by function inlining? One example was discussed here earlier: The for (;;) loop in nodemgr_host_thread. There an msleep_interruptible implicitly acted as barrier (at the moment because it's in a different translation unit; if it were the same, then because it hopefully has own barriers). So that happens to work, although such an implicit barrier is bad style: Better enforce the desired behaviour (== guaranteed load operation) *explicitly*. -- Stefan Richter -=-=-=== =--- = http://arcgraph.de/sr/ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: drivers/infiniband/mlx/mad.c misplaced ;
...I guess those guys hunting for broken busyloops in the other thread could also benefit from similar searching commands introduced in this thread... ...Ccing Satyam to caught their attention too. On Wed, Aug 15, 2007 at 05:40:11PM -0700, Joe Perches wrote: There's more than a few of these (not inspected). $ egrep -r --include=*.c \bif[[:space:]]*\([^\)]*\)[[:space:]]*\; * ...Hmm, I plugged in a preprocessor too to manage with non compliant coding styles :-). Please understand that the line numbers are not an exact match due to preprocessor changes: $ for i in `find . -name '*.[ch]'`; do echo $i; indent -npro -kr -i8 -ts8 -sob -l8000 -ss -ncs -cp1 -nhnl -st $i | egrep -n [[:space:]]if [(].*[)] ;$; done | grep -B1 ^[^.] ./arch/arm/mach-omap1/leds-innovator.c 97: if (led_state LED_STATE_ENABLED) ; -- ./arch/mips/sibyte/cfe/console.c 23: if (written 0) ; 32: if (written 0) ; -- ./arch/powerpc/kernel/legacy_serial.c 524:if (0) ; -- ./arch/powerpc/xmon/ppc-opc.c 938:else if (value == 0) ; -- ./arch/sh/boards/se/7343/io.c 137:if (0) ; -- ./arch/um/kernel/tt/tracer.c 254:if (WIFEXITED(status)) ; -- ./arch/x86_64/ia32/ptrace32.c 363:if (__copy_from_user(child-thread.i387.fxsave, u, sizeof(*u))) ; -- ./arch/x86_64/kernel/traps.c 801:if (eregs == (struct pt_regs *)eregs-rsp) ; -- ./drivers/atm/iphase.c 159:if (!desc1) ; -- ./drivers/isdn/capi/capiutil.c 456:else if (c = 0x0f) ; -- ./drivers/isdn/hisax/hfc_pci.c 125:if (Read_hfc(cs, HFCPCI_INT_S1)) ; 155:if (Read_hfc(cs, HFCPCI_INT_S1)) ; 1483: if (Read_hfc(cs, HFCPCI_INT_S1)) ; -- ./drivers/isdn/hisax/hfc_sx.c 377:if (Read_hfc(cs, HFCSX_INT_S1)) ; 407:if (Read_hfc(cs, HFCSX_INT_S2)) ; 1246: if (Read_hfc(cs, HFCSX_INT_S1)) ; -- ./drivers/media/video/video-buf.c 1141: if (q-bufs[i]) ; -- ./drivers/net/lp486e.c 777:if (lp-scb.command i596_timeout(dev, i596_cleanup_cmd, 100)) ; 785:if (lp-scb.command i596_timeout(dev, i596_reset, 100)) ; 794:if (lp-scb.command i596_timeout(dev, i596_reset(2), 400)) ; 820:if (lp-scb.command i596_timeout(dev, i596_add_cmd, 100)) ; 1146: if (lp-scb.command i596_timeout(dev, interrupt, 40)) ; 1192: if (lp-scb.command i596_timeout(dev, i596 interrupt, 100)) ; 1217: if (lp-scb.command i596_timeout(dev, i596_close, 200)) ; -- ./drivers/net/ni5010.c 273:if (dev-irq == 0xff) ; -- ./drivers/net/ni52.c 648:if (result TDR_LNK_OK) ; -- ./drivers/net/sun3_82586.c 498:if (result TDR_LNK_OK) ; -- ./drivers/pci/hotplug/ibmphp_core.c 418:else if (mode == BUS_MODE_PCI) ; 636:else if (mode == BUS_MODE_PCI) ; -- ./drivers/usb/gadget/file_storage.c 2480: if (protocol_is_scsi()) ; -- ./drivers/usb/host/uhci-debug.c 416:if (i = SKEL_ISO) ; 419:else if (!uhci-fsbr_is_on) ; -- ./drivers/usb/host/uhci-q.c 541:if (qh-skel == SKEL_ISO) ; -- ./drivers/usb/misc/usbtest.c 1401: if (status != 0) ; -- ./drivers/video/intelfb/intelfbdrv.c 337:if (get_opt_bool(this_opt, accel, accel)) ; 338:else if (get_opt_int(this_opt, vram, vram)) ; 339:else if (get_opt_bool(this_opt, hwcursor, hwcursor)) ; 340:else if (get_opt_bool(this_opt, mtrr, mtrr)) ; 341:else if (get_opt_bool(this_opt, fixed, fixed)) ; -- ./drivers/video/matrox/matroxfb_DAC1064.c 46: if (fvco = 10) ; -- ./drivers/video/matrox/matroxfb_maven.c 298:if (fvco = 1) ; 316:if (fvco = 10) ; -- ./fs/hfs/inode.c 72: if (!node) ; -- ./fs/hfsplus/inode.c 67: if (!node) ; -- ./fs/hostfs/hostfs_user.c 300:if (attrs-ia_valid HOSTFS_ATTR_CTIME) ; -- ./fs/xfs/xfs_bmap.c 2287: if (nullfb || XFS_FSB_TO_AGNO(mp, ap-rval) == fb_agno) ; -- ./fs/xfs/xfs_dir2.c 281:else if ((rval = xfs_dir2_isblock(tp, dp, v))) ; -- ./fs/xfs/xfs_iomap.c 248:if (io-io_flags XFS_IOCORE_RT) ; -- ./include/asm-cris/uaccess.h 255:if (n == 0) ; 303:if (n == 0) ; 351:if (n == 0) ; -- ./mm/swapfile.c 791:if (swcount = 1) ; -- ./net/core/pktgen.c 2256: if (pkt_dev-min_in6_daddr.s6_addr32[0] == 0 pkt_dev-min_in6_daddr.s6_addr32[1] == 0 pkt_dev-min_in6_daddr.s6_addr32[2] == 0 pkt_dev-min_in6_daddr.s6_addr32[3] == 0) ; -- ./net/irda/af_irda.c 1357: if (ret) ; 1358: else if (sk-sk_shutdown RCV_SHUTDOWN) ; -- ./net/irda/irnetlink.c 105:if (nla_put_string(msg, IRDA_NL_ATTR_IFNAME, dev-name)) ; -- ./net/netfilter/xt_u32.c 38: if (skb-len 4 || pos skb-len - 4) ; -- ./sound/pci/au88x0/au88x0_core.c 2076: if (vortex_adbdma_bufshift(vortex, i)) ; 2085: if (vortex_wtdma_bufshift(vortex, i)) ; --
Re: [REVISED PATCH] 3c59x: check return of pci_enable_device()
Revised patch for this. Mark commit 5cf33391eba81a49038fa8be8cbad8425b80bf7f Author: Mark Hindley [EMAIL PROTECTED] Date: Thu Aug 16 11:26:35 2007 +0100 Check return of pci_enable_device in vortex_up(). Also modify vortex_up to return error to callers. Handle failure of vortex_up in vortex_open and vortex_resume. Signed-off-by: Mark Hindley [EMAIL PROTECTED] diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c index 001c66d..7b3050c 100644 --- a/drivers/net/3c59x.c +++ b/drivers/net/3c59x.c @@ -705,7 +705,7 @@ static struct { static int vortex_probe1(struct device *gendev, void __iomem *ioaddr, int irq, int chip_idx, int card_idx); -static void vortex_up(struct net_device *dev); +static int vortex_up(struct net_device *dev); static void vortex_down(struct net_device *dev, int final); static int vortex_open(struct net_device *dev); static void mdio_sync(void __iomem *ioaddr, int bits); @@ -841,8 +841,11 @@ static int vortex_resume(struct pci_dev *pdev) return -EBUSY; } if (netif_running(dev)) { - vortex_up(dev); - netif_device_attach(dev); + err = vortex_up(dev); + if (err) + return err; + else + netif_device_attach(dev); } } return 0; @@ -1484,19 +1487,24 @@ static void vortex_check_media(struct net_device *dev, unsigned int init) } } -static void +static int vortex_up(struct net_device *dev) { struct vortex_private *vp = netdev_priv(dev); void __iomem *ioaddr = vp-ioaddr; unsigned int config; - int i, mii_reg1, mii_reg5; + int i, mii_reg1, mii_reg5, err; if (VORTEX_PCI(vp)) { pci_set_power_state(VORTEX_PCI(vp), PCI_D0);/* Go active */ if (vp-pm_state_valid) pci_restore_state(VORTEX_PCI(vp)); - pci_enable_device(VORTEX_PCI(vp)); + err = pci_enable_device(VORTEX_PCI(vp)); + if (err) { + printk(KERN_WARNING %s: Could not enable device \n, + dev-name); + goto err_out; + } } /* Before initializing select the active media port. */ @@ -1660,6 +1668,8 @@ vortex_up(struct net_device *dev) if (vp-cb_fn_base) /* The PCMCIA people are idiots. */ iowrite32(0x8000, vp-cb_fn_base + 4); netif_start_queue (dev); +err_out: + return err; } static int @@ -1673,7 +1683,7 @@ vortex_open(struct net_device *dev) if ((retval = request_irq(dev-irq, vp-full_bus_master_rx ? boomerang_interrupt : vortex_interrupt, IRQF_SHARED, dev-name, dev))) { printk(KERN_ERR %s: Could not reserve IRQ %d\n, dev-name, dev-irq); - goto out; + goto err; } if (vp-full_bus_master_rx) { /* Boomerang bus master. */ @@ -1702,20 +1712,22 @@ vortex_open(struct net_device *dev) } } retval = -ENOMEM; - goto out_free_irq; + goto err_free_irq; } /* Wrap the ring. */ vp-rx_ring[i-1].next = cpu_to_le32(vp-rx_ring_dma); } - vortex_up(dev); - return 0; + retval = vortex_up(dev); + if (!retval) + goto out; -out_free_irq: +err_free_irq: free_irq(dev-irq, dev); -out: +err: if (vortex_debug 1) printk(KERN_ERR %s: vortex_open() fails: returning %d\n, dev-name, retval); +out: return retval; } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
I wrote: Herbert Xu wrote: On Thu, Aug 16, 2007 at 10:06:31AM +0200, Stefan Richter wrote: [...] expectation that each occurrence of atomic_read actually results in a load operation, i.e. is not optimized away. [...] Can you find an actual atomic_read code snippet there that is broken without the volatile modifier? PS: Just to clarify, I'm not speaking for the volatile modifier. I'm not speaking for any particular implementation of atomic_t and its accessors at all. All I am saying is that - we use atomically accessed data types because we concurrently but locklessly access this data, - hence a read access to this data that could be optimized away makes *no sense at all*. The only sensible read accessor to an atomic datatype is a read accessor that will not be optimized away. So, the architecture guys can implement atomic_read however they want --- as long as it cannot be optimized away.* PPS: If somebody has code where he can afford to let the compiler coalesce atomic_read with a previous access to the same data, i.e. doesn't need and doesn't want all guarantees that the atomic_read API makes (or IMO should make), then he can replace the atomic_read by a local temporary variable. *) Exceptions: if (known_to_be_false) read_access(a); and the like. -- Stefan Richter -=-=-=== =--- = http://arcgraph.de/sr/ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, Aug 16, 2007 at 11:54:44AM +0200, Stefan Richter wrote: One example was discussed here earlier: The for (;;) loop in nodemgr_host_thread. There an msleep_interruptible implicitly acted as barrier (at the moment because it's in a different translation unit; if it were the same, then because it hopefully has own barriers). So that happens to work, although such an implicit barrier is bad style: Better enforce the desired behaviour (== guaranteed load operation) *explicitly*. Hmm, it's not bad style at all. Let's assume that everything is in the same scope. Such a loop must either call a function that busy-waits, which should always have a cpu_relax or something equivalent, or it'll call a function that schedules away which immediately invalidates any values the compiler might have cached for the atomic_read. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, Aug 16, 2007 at 12:31:03PM +0200, Stefan Richter wrote: PS: Just to clarify, I'm not speaking for the volatile modifier. I'm not speaking for any particular implementation of atomic_t and its accessors at all. All I am saying is that - we use atomically accessed data types because we concurrently but locklessly access this data, - hence a read access to this data that could be optimized away makes *no sense at all*. No sane compiler can optimise away an atomic_read per se. That's only possible if there's a preceding atomic_set or atomic_read, with no barriers in the middle. If that's the case, then one has to conclude that doing away with the second read is acceptable, as otherwise a memory (or at least a compiler) barrier should have been used. In fact, volatile doesn't guarantee that the memory gets read anyway. You might be reading some stale value out of the cache. Granted this doesn't happen on x86 but when you're coding for the kernel you can't make such assumptions. So the point here is that if you don't mind getting a stale value from the CPU cache when doing an atomic_read, then surely you won't mind getting a stale value from the compiler cache. So, the architecture guys can implement atomic_read however they want --- as long as it cannot be optimized away.* They can implement it however they want as long as it stays atomic. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: drivers/infiniband/mlx/mad.c misplaced ;
On Thu, Aug 16, 2007 at 01:22:04PM +0300, Ilpo Järvinen wrote: ...I guess those guys hunting for broken busyloops in the other thread could also benefit from similar searching commands introduced in this thread... ...Ccing Satyam to caught their attention too. ./drivers/isdn/hisax/hfc_pci.c 125: if (Read_hfc(cs, HFCPCI_INT_S1)) ; 155: if (Read_hfc(cs, HFCPCI_INT_S1)) ; 1483: if (Read_hfc(cs, HFCPCI_INT_S1)) ; -- ./drivers/isdn/hisax/hfc_sx.c 377: if (Read_hfc(cs, HFCSX_INT_S1)) ; 407: if (Read_hfc(cs, HFCSX_INT_S2)) ; 1246: if (Read_hfc(cs, HFCSX_INT_S1)) ; -- These are workaround to not get compiler warnings about ignored return values I got some time ago under some architecture. -- Karsten Keil SuSE Labs ISDN and VOIP development SUSE LINUX Products GmbH, Maxfeldstr.5 90409 Nuernberg, GF: Markus Rex, HRB 16746 (AG Nuernberg) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 2/4] Update e100 driver to use devres.
Brandon Philips wrote: @@ -2554,8 +2547,10 @@ static int __devinit e100_probe(struct p struct net_device *netdev; struct nic *nic; int err; + void __iomem **iomap; + int bar; - if(!(netdev = alloc_etherdev(sizeof(struct nic { + if (!(netdev = devm_alloc_etherdev(pdev-dev, sizeof(struct nic { if(((1 debug) - 1) NETIF_MSG_PROBE) printk(KERN_ERR PFX Etherdev alloc failed, abort.\n); return -ENOMEM; @@ -2585,26 +2580,28 @@ static int __devinit e100_probe(struct p nic-msg_enable = (1 debug) - 1; pci_set_drvdata(pdev, netdev); - if((err = pci_enable_device(pdev))) { + if ((err = pcim_enable_device(pdev))) { DPRINTK(PROBE, ERR, Cannot enable PCI device, aborting.\n); - goto err_out_free_dev; + return err; } if(!(pci_resource_flags(pdev, 0) IORESOURCE_MEM)) { DPRINTK(PROBE, ERR, Cannot find proper PCI device base address, aborting.\n); err = -ENODEV; - goto err_out_disable_pdev; + return err; } - if((err = pci_request_regions(pdev, DRV_NAME))) { + bar = use_io ? 1 : 0; + if((err = pcim_iomap_regions(pdev, 1 bar, DRV_NAME))) { DPRINTK(PROBE, ERR, Cannot obtain PCI resources, aborting.\n); - goto err_out_disable_pdev; + return err; } + iomap = pcim_iomap_table(pdev)[bar]; Type mismatch. Didn't compiler warn about it? if((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK))) { DPRINTK(PROBE, ERR, No usable DMA configuration, aborting.\n); - goto err_out_free_res; + return err; } SET_MODULE_OWNER(netdev); @@ -2613,12 +2610,6 @@ static int __devinit e100_probe(struct p if (use_io) DPRINTK(PROBE, INFO, using i/o access mode\n); - nic-csr = pci_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr)); - if(!nic-csr) { - DPRINTK(PROBE, ERR, Cannot map device registers, aborting.\n); - err = -ENOMEM; - goto err_out_free_res; - } nic-csr initialization seems missing. Have you tested the patched code? -- tejun - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 4/4] Update e1000 driver to use devres.
Brandon Philips wrote: - mmio_start = pci_resource_start(pdev, BAR_0); mmio_len = pci_resource_len(pdev, BAR_0); You don't need mmio_len either. - err = -EIO; - adapter-hw.hw_addr = ioremap(mmio_start, mmio_len); + adapter-hw.hw_addr = pcim_iomap(pdev, BAR_0, mmio_len); Passing 0 as @max_len tells pci[m]_iomap() to use pci_resource_len() of the BAR. @@ -952,16 +948,15 @@ e1000_probe(struct pci_dev *pdev, /* setup the private structure */ if ((err = e1000_sw_init(adapter))) - goto err_sw_init; + return err; err = -EIO; /* Flash BAR mapping must happen after e1000_sw_init * because it depends on mac_type */ if ((adapter-hw.mac_type == e1000_ich8lan) (pci_resource_flags(pdev, 1) IORESOURCE_MEM)) { - flash_start = pci_resource_start(pdev, 1); flash_len = pci_resource_len(pdev, 1); Ditto. - adapter-hw.flash_address = ioremap(flash_start, flash_len); + adapter-hw.flash_address = pcim_iomap(pdev, 1, flash_len); if (!adapter-hw.flash_address) goto err_flashmap; } -- tejun - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?
Hi all When the periodic route cache flush is done, this machine suffers a lot and eventually triggers the soft lockup message. dst_run_gc() is doing a scan of a possibly huge list of dst_entries, eventually freeing some (less than 1%) of them, but holding the dst_lock spinlock for the whole scan. Then it rearms a timer to redo the full thing 1/10 s later. The slowdown can last one minute or so. No need to say that machine is not really usable in the meantime. I already looked at the code and I am testing a patch (included in this mail) where I limit the number of entries that can be scanned at each dst_run_gc() call, and not holding dst_lock during the scan. But then I dont know how to address the dst_dev_event() problem, since we must be able to dst_ifdown() all entries attached to one dev. I am wondering how to really solve the problem. Could a workqueue be used here instead of a timer ? Aug 16 06:21:37 SRV1 kernel: BUG: soft lockup detected on CPU#0! Aug 16 06:21:37 SRV1 kernel: Aug 16 06:21:37 SRV1 kernel: Call Trace: Aug 16 06:21:37 SRV1 kernel: IRQ [802286f0] wake_up_process+0x10/0x20 Aug 16 06:21:37 SRV1 kernel: [80251e09] softlockup_tick+0xe9/0x110 Aug 16 06:21:37 SRV1 kernel: [803cd380] dst_run_gc+0x0/0x140 Aug 16 06:21:37 SRV1 kernel: [802376f3] run_local_timers+0x13/0x20 Aug 16 06:21:37 SRV1 kernel: [802379c7] update_process_times+0x57/0x90 Aug 16 06:21:37 SRV1 kernel: [80216034] smp_local_timer_interrupt+0x34/0x60 Aug 16 06:21:37 SRV1 kernel: [802165cc] smp_apic_timer_interrupt+0x5c/0x80 Aug 16 06:21:37 SRV1 kernel: [8020a816] apic_timer_interrupt+0x66/0x70 Aug 16 06:21:37 SRV1 kernel: [803cd3d3] dst_run_gc+0x53/0x140 Aug 16 06:21:37 SRV1 kernel: [803cd3c6] dst_run_gc+0x46/0x140 Aug 16 06:21:37 SRV1 kernel: [80237148] run_timer_softirq+0x148/0x1c0 Aug 16 06:21:37 SRV1 kernel: [8023340c] __do_softirq+0x6c/0xe0 Aug 16 06:21:37 SRV1 kernel: [8020ad6c] call_softirq+0x1c/0x30 Aug 16 06:21:37 SRV1 kernel: EOI [8020cb34] do_softirq+0x34/0x90 Aug 16 06:21:37 SRV1 kernel: [802331cf] local_bh_enable_ip+0x3f/0x60 Aug 16 06:21:37 SRV1 kernel: [80422913] _spin_unlock_bh+0x13/0x20 Aug 16 06:21:37 SRV1 kernel: [803dfde8] rt_garbage_collect+0x1d8/0x320 Aug 16 06:21:37 SRV1 kernel: [803cd4dd] dst_alloc+0x1d/0xa0 Aug 16 06:21:37 SRV1 kernel: [803e1433] __ip_route_output_key+0x573/0x800 Aug 16 06:21:37 SRV1 kernel: [803c02e2] sock_common_recvmsg+0x32/0x50 Aug 16 06:21:37 SRV1 kernel: [803e16dc] ip_route_output_flow+0x1c/0x60 Aug 16 06:21:37 SRV1 kernel: [80400160] tcp_v4_connect+0x150/0x610 Aug 16 06:21:37 SRV1 kernel: [803ebf07] inet_bind_bucket_create+0x17/0x60 Aug 16 06:21:37 SRV1 kernel: [8040cd16] inet_stream_connect+0xa6/0x2c0 Aug 16 06:21:37 SRV1 kernel: [80422981] _spin_lock_bh+0x11/0x30 Aug 16 06:21:37 SRV1 kernel: [803c0bdf] lock_sock_nested+0xcf/0xe0 Aug 16 06:21:37 SRV1 kernel: [80422981] _spin_lock_bh+0x11/0x30 Aug 16 06:21:37 SRV1 kernel: [803be551] sys_connect+0x71/0xa0 Aug 16 06:21:37 SRV1 kernel: [803eee3f] tcp_setsockopt+0x1f/0x30 Aug 16 06:21:37 SRV1 kernel: [803c030f] sock_common_setsockopt+0xf/0x20 Aug 16 06:21:37 SRV1 kernel: [803be4bd] sys_setsockopt+0x9d/0xc0 Aug 16 06:21:37 SRV1 kernel: [8028881e] sys_ioctl+0x5e/0x80 Aug 16 06:21:37 SRV1 kernel: [80209c4e] system_call+0x7e/0x83 Prelimary patch : --- linux-2.6.22/net/core/dst.c +++ linux-2.6.22-ed/net/core/dst.c @@ -28,6 +28,9 @@ * 4) All operations modify state, so a spinlock is used. */ static struct dst_entry*dst_garbage_list; +static struct dst_entry *dst_garbage_wrk; +static struct dst_entry *dst_garbage_inuse; +static int dst_gc_running; #if RT_CACHE_DEBUG = 2 static atomic_t dst_total = ATOMIC_INIT(0); #endif @@ -42,26 +45,42 @@ static void dst_run_gc(unsigned long dummy) { - intdelayed = 0; - intwork_performed; - struct dst_entry * dst, **dstp; - - if (!spin_trylock(dst_lock)) { - mod_timer(dst_gc_timer, jiffies + HZ/10); - return; - } - + intquota = 1000; + intwork_performed = 0; + struct dst_entry *dst, *next; + struct dst_entry *first = NULL, *last = NULL; +#if RT_CACHE_DEBUG = 2 + unsigned long t0 = jiffies; +#endif + spin_lock(dst_lock); + if (dst_gc_running) + goto out; del_timer(dst_gc_timer); - dstp = dst_garbage_list; - work_performed = 0; - while ((dst = *dstp) != NULL) { - if (atomic_read(dst-__refcnt)) { - dstp = dst-next; - delayed++; + if (!dst_garbage_wrk) { + dst_garbage_wrk = dst_garbage_list; +
Re: drivers/infiniband/mlx/mad.c misplaced ;
Hi Ilpo, On Thu, 16 Aug 2007, Ilpo Järvinen wrote: ...I guess those guys hunting for broken busyloops in the other thread could also benefit from similar searching commands introduced in this thread... ...Ccing Satyam to caught their attention too. On Wed, Aug 15, 2007 at 05:40:11PM -0700, Joe Perches wrote: There's more than a few of these (not inspected). $ egrep -r --include=*.c \bif[[:space:]]*\([^\)]*\)[[:space:]]*\; * ...Hmm, I plugged in a preprocessor too to manage with non compliant coding styles :-). Please understand that the line numbers are not an exact match due to preprocessor changes: $ for i in `find . -name '*.[ch]'`; do echo $i; indent -npro -kr -i8 -ts8 -sob -l8000 -ss -ncs -cp1 -nhnl -st $i | egrep -n [[:space:]]if [(].*[)] ;$; done | grep -B1 ^[^.] Thanks, looks useful, will check with this. Satyam
Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?
Eric Dumazet [EMAIL PROTECTED] wrote: I am wondering how to really solve the problem. Could a workqueue be used here instead of a timer ? I think so. A mutex would separate the GC and dst_ifdown. You'd take the spin lock in the GC only to replace the garbage list with NULL. You then drop the lock to process it. Once it's done you take the lock again and join it with whatever that's been added in the mean time. This is easy because you should already have the tail after the GC process. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc3 and SKY2 driver issue
On Thu, 16 Aug 2007 10:25:45 +0200 Michal Piotrowski [EMAIL PROTECTED] wrote: [Adding Stephen and netdev to CC] On 15/08/07, James Corey [EMAIL PROTECTED] wrote: I tried running a D-link gig card on kernel 2.6.21.1 and it came up fine, but when I did a sftp of an linux dvd ISO to it, the interface would lock up hard with the error eth1: hw csum failure. Call Trace: IRQ [804d3e31] __skb_checksum_complete_head+0x46/0x5f [804d3e56] __skb_checksum_complete+0xc/0x11 [805046c7] tcp_v4_rcv+0x157/0x810 [804d8a6f] dev_queue_xmit+0x237/0x260 [80229990] find_busiest_group+0x252/0x684 [804ead50] ip_local_deliver+0xca/0x14c [804eb24a] ip_rcv+0x478/0x4ba [803ec7d3] sky2_poll+0x6f9/0x9b9 [8022bd5d] run_rebalance_domains+0x13e/0x408 [804d877a] net_rx_action+0xa8/0x166 [80235d62] __do_softirq+0x55/0xc3 [8020a4ec] call_softirq+0x1c/0x28 [8020b611] do_softirq+0x2c/0x7d [8020b8cf] do_IRQ+0x13e/0x15f [802086ce] mwait_idle+0x0/0x46 [80209871] ret_from_intr+0x0/0xa EOI [80208710] mwait_idle+0x42/0x46 [80208666] cpu_idle+0x8c/0xaf [8078174e] start_kernel+0x2ac/0x2b8 [80781140] _sinittext+0x140/0x144 So I tried running the latest snapshot 2.6.23-rc3 and the almost the same thing happens. Only difference is that now the entire box locks up. The error is almost the same eth1: hw csum failure. Call Trace: IRQ [804779b6] __skb_checksum_complete_head+0x43/0x56 [804779d5] __skb_checksum_complete+0xc/0x11 [804a989d] tcp_v4_rcv+0x14e/0x801 [8048ff84] ip_local_deliver+0xca/0x14c [80490472] ip_rcv+0x46c/0x4ae [880060f9] :sky2:sky2_poll+0x72b/0x9c7 [8047c934] net_rx_action+0xa8/0x166 [80235ced] __do_softirq+0x55/0xc4 [8020c5cc] call_softirq+0x1c/0x28 [8020d6fd] do_softirq+0x2c/0x7d [8020d9bb] do_IRQ+0x13e/0x15f [8020a780] mwait_idle+0x0/0x48 [8020b951] ret_from_intr+0x0/0xa EOI [880063e7] :sky2:sky2_xmit_frame+0x0/0x41d [8020a7c2] mwait_idle+0x42/0x48 [8020a718] cpu_idle+0xbd/0xe0 [80704a5a] start_kernel+0x2ac/0x2b8 [80704140] _sinittext+0x140/0x144 I see that the new kernel includes some sort of SKY2 DEBUG stuff. I would be happy to rerun the test with that turned on, given some minor direction. Regards, Michal Please reproduce with a more recent kernel? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] backport of sky2 stability fixes
These are bugfixes that happened late in 2.6.22 cycle that missed getting merged. -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] sky2: restore workarounds for lost interrupts
Backport of commit c59697e06058fc2361da8cefcfa3de85ac107582 This patch restores a couple of workarounds from 2.6.16: * restart transmit moderation timer in case it expires during IRQ routine * default to having 10 HZ watchdog timer. At this point it more important not to hang than to worry about the power cost. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/drivers/net/sky2.c2007-08-08 21:50:10.0 +0100 +++ b/drivers/net/sky2.c2007-08-08 22:01:00.0 +0100 @@ -96,7 +96,7 @@ static int disable_msi = 0; module_param(disable_msi, int, 0); MODULE_PARM_DESC(disable_msi, Disable Message Signaled Interrupt (MSI)); -static int idle_timeout = 0; +static int idle_timeout = 100; module_param(idle_timeout, int, 0); MODULE_PARM_DESC(idle_timeout, Watchdog timer for lost interrupts (ms)); @@ -2442,6 +2442,13 @@ static int sky2_poll(struct net_device * work_done = sky2_status_intr(hw, work_limit); if (work_done work_limit) { + /* Bug/Errata workaround? +* Need to kick the TX irq moderation timer. +*/ + if (sky2_read8(hw, STAT_TX_TIMER_CTRL) == TIM_START) { + sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP); + sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START); + } netif_rx_complete(dev0); /* end of interrupt, re-enables also acts as I/O synchronization */ -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] sky2: check for more work before leaving NAPI
Backport of commit 5c11ce700f77fada15b6264417d72462da4bbb1c This patch avoids generating another IRQ if more packets arrive while in the NAPI poll routine. Before marking device as finished, it rechecks that the status ring is empty. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/drivers/net/sky2.c2007-08-08 22:01:11.0 +0100 +++ b/drivers/net/sky2.c2007-08-08 22:01:28.0 +0100 @@ -2428,8 +2428,7 @@ static void sky2_err_intr(struct sky2_hw static int sky2_poll(struct net_device *dev0, int *budget) { struct sky2_hw *hw = ((struct sky2_port *) netdev_priv(dev0))-hw; - int work_limit = min(dev0-quota, *budget); - int work_done = 0; + int work_done; u32 status = sky2_read32(hw, B0_Y2_SP_EISR); if (unlikely(status Y2_IS_ERROR)) @@ -2441,25 +2440,25 @@ static int sky2_poll(struct net_device * if (status Y2_IS_IRQ_PHY2) sky2_phy_intr(hw, 1); - work_done = sky2_status_intr(hw, work_limit); - if (work_done work_limit) { - /* Bug/Errata workaround? -* Need to kick the TX irq moderation timer. -*/ - if (sky2_read8(hw, STAT_TX_TIMER_CTRL) == TIM_START) { - sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP); - sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START); - } - netif_rx_complete(dev0); + work_done = sky2_status_intr(hw, min(dev0-quota, *budget)); + *budget -= work_done; + dev0-quota -= work_done; - /* end of interrupt, re-enables also acts as I/O synchronization */ - sky2_read32(hw, B0_Y2_SP_LISR); - return 0; - } else { - *budget -= work_done; - dev0-quota -= work_done; + /* More work? */ + if (hw-st_idx != sky2_read16(hw, STAT_PUT_IDX)) return 1; + + /* Bug/Errata workaround? +* Need to kick the TX irq moderation timer. +*/ + if (sky2_read8(hw, STAT_TX_TIMER_CTRL) == TIM_START) { + sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP); + sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START); } + netif_rx_complete(dev0); + + sky2_read32(hw, B0_Y2_SP_LISR); + return 0; } static irqreturn_t sky2_intr(int irq, void *dev_id) -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] sky2: check drop truncated packets
Backport of commit 71749531f2d1954137a1a77422ef4ff29eb102dd If packet larger than MTU is received, the driver uses hardware to truncate the packet. Use the status registers to catch/drop them. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/drivers/net/sky2.c2007-08-08 22:01:28.0 +0100 +++ b/drivers/net/sky2.c2007-08-08 22:01:37.0 +0100 @@ -2065,6 +2065,9 @@ static struct sk_buff *sky2_receive(stru if (!(status GMR_FS_RX_OK)) goto resubmit; + if (status 16 != length) + goto len_mismatch; + if (length copybreak) skb = receive_copy(sky2, re, length); else @@ -2074,6 +2077,11 @@ resubmit: return skb; +len_mismatch: + /* Truncation of overlength packets + causes PHY length to not match MAC length */ + ++sky2-net_stats.rx_length_errors; + error: ++sky2-net_stats.rx_errors; if (status GMR_FS_RX_FF_OV) { -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] sky2: carrier management
backport of commit 55d7b4e6ed6ad3ec5e5e30b3b4515a0a6a53e344 Make sky2 handle carrier similar to other drivers, eliminate some possible races in carrier state transistions. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/drivers/net/sky2.c2007-08-08 22:01:00.0 +0100 +++ b/drivers/net/sky2.c2007-08-08 22:01:11.0 +0100 @@ -1234,6 +1234,8 @@ static int sky2_up(struct net_device *de if (netif_msg_ifup(sky2)) printk(KERN_INFO PFX %s: enabling interface\n, dev-name); + netif_carrier_off(dev); + /* must be power of 2 */ sky2-tx_le = pci_alloc_consistent(hw-pdev, TX_RING_SIZE * @@ -1573,7 +1575,6 @@ static int sky2_down(struct net_device * /* Stop more packets from being queued */ netif_stop_queue(dev); - netif_carrier_off(dev); /* Disable port IRQ */ imask = sky2_read32(hw, B0_IMSK); @@ -1625,6 +1626,8 @@ static int sky2_down(struct net_device * sky2_phy_power(hw, port, 0); + netif_carrier_off(dev); + /* turn off LED's */ sky2_write16(hw, B0_Y2LED, LED_STAT_OFF); @@ -1689,7 +1692,6 @@ static void sky2_link_up(struct sky2_por gm_phy_write(hw, port, PHY_MARV_INT_MASK, PHY_M_DEF_MSK); netif_carrier_on(sky2-netdev); - netif_wake_queue(sky2-netdev); /* Turn on link LED */ sky2_write8(hw, SK_REG(port, LNK_LED_REG), @@ -1741,7 +1743,6 @@ static void sky2_link_down(struct sky2_p gma_write16(hw, port, GM_GP_CTRL, reg); netif_carrier_off(sky2-netdev); - netif_stop_queue(sky2-netdev); /* Turn on link LED */ sky2_write8(hw, SK_REG(port, LNK_LED_REG), LINKLED_OFF); @@ -3493,10 +3494,6 @@ static __devinit struct net_device *sky2 memcpy_fromio(dev-dev_addr, hw-regs + B2_MAC_1 + port * 8, ETH_ALEN); memcpy(dev-perm_addr, dev-dev_addr, dev-addr_len); - /* device is off until link detection */ - netif_carrier_off(dev); - netif_stop_queue(dev); - return dev; } -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: skge and sky2 stop working after a few minutes.
On Wed, 15 Aug 2007 21:20:37 + (UTC) Sébastien Judenherc [EMAIL PROTECTED] wrote: Hello, I can see some very strange problem with sky2 and skge interfaces with 2.6.22, and also 2.6.23-rc2/3. After 8-9 minutes, the interfaces stop working. ethtool reports that the link is down and the only way to make the interfaces usable again is removing/reinserting the module or running ethtool -r. This occurs for both sky2 and skge for two PCs connected to a gigabit switch. The other PCs on the same switch continue to work fine (most are 100Mbit/s). It also occurs on a third PC with skge, directly connected to a 100Mbit/s interface over a cross over cable. I didn't notice the problem with 2.6.19. Looks like some noise sensitivity in the PHY. Could you try the vendor sk98lin driver, although all drivers use the same phy settings. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] net: add dev_get_stats
Since we now have internal stats, it cleans up code to have a dev_get_stats() interface. This allows for future patches where 'network device ops' patch where get_stats is immutable. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/include/linux/netdevice.h 2007-08-16 06:34:25.0 -0400 +++ b/include/linux/netdevice.h 2007-08-16 06:38:33.0 -0400 @@ -809,6 +809,7 @@ extern int dev_set_mac_address(struct n struct sockaddr *); extern int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev); +extern struct net_device_stats *dev_get_stats(struct net_device *dev); extern int netdev_budget; --- a/net/core/dev.c2007-08-16 06:34:25.0 -0400 +++ b/net/core/dev.c2007-08-16 08:26:04.0 -0400 @@ -2304,7 +2304,7 @@ void dev_seq_stop(struct seq_file *seq, static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev) { - struct net_device_stats *stats = dev-get_stats(dev); + struct net_device_stats *stats = dev_get_stats(dev); seq_printf(seq, %6s:%8lu %7lu %4lu %4lu %4lu %5lu %10lu %9lu %8lu %7lu %4lu %4lu %4lu %5lu %7lu %10lu\n, @@ -3682,10 +3682,21 @@ out: mutex_unlock(net_todo_run_mutex); } -static struct net_device_stats *internal_stats(struct net_device *dev) +/** + * dev_get_stats - get network device statistics + * @dev: network device + * + * Get standard network device statistics. + * Use internal stastics unless device overides + */ +struct net_device_stats *dev_get_stats(struct net_device *dev) { - return dev-stats; + if (!dev-get_stats) + return dev-stats; + + return dev-get_stats(dev); } +EXPORT_SYMBOL(dev_get_stats); /** * alloc_netdev_mq - allocate network device @@ -3733,7 +3744,6 @@ struct net_device *alloc_netdev_mq(int s dev-egress_subqueue_count = queue_count; - dev-get_stats = internal_stats; setup(dev); strcpy(dev-name, name); return dev; --- a/net/core/net-sysfs.c 2007-08-16 06:34:25.0 -0400 +++ b/net/core/net-sysfs.c 2007-08-16 09:23:13.0 -0400 @@ -264,8 +264,7 @@ static ssize_t netstat_show(const struct WARN_ON(1); read_lock(dev_base_lock); - if (dev_isalive(dev) dev-get_stats - (stats = (*dev-get_stats)(dev))) + if (dev_isalive(dev) (stats = dev_get_stats(dev))) ret = sprintf(buf, fmt_ulong, *(unsigned long *)(((u8 *) stats) + offset)); @@ -481,8 +480,7 @@ int netdev_register_sysfs(struct net_dev BUILD_BUG_ON(BUS_ID_SIZE IFNAMSIZ); strlcpy(dev-bus_id, net-name, BUS_ID_SIZE); - if (net-get_stats) - *groups++ = netstat_group; + *groups++ = netstat_group; #ifdef CONFIG_WIRELESS_EXT if (net-wireless_handlers net-wireless_handlers-get_wireless_stats) --- a/net/core/rtnetlink.c 2007-08-16 06:34:25.0 -0400 +++ b/net/core/rtnetlink.c 2007-08-16 08:26:13.0 -0400 @@ -619,6 +619,7 @@ static int rtnl_fill_ifinfo(struct sk_bu { struct ifinfomsg *ifm; struct nlmsghdr *nlh; + struct net_device_stats *stats; nlh = nlmsg_put(skb, pid, seq, type, sizeof(*ifm), flags); if (nlh == NULL) @@ -666,18 +667,13 @@ static int rtnl_fill_ifinfo(struct sk_bu NLA_PUT(skb, IFLA_BROADCAST, dev-addr_len, dev-broadcast); } - if (dev-get_stats) { - struct net_device_stats *stats = dev-get_stats(dev); - if (stats) { - struct nlattr *attr; - - attr = nla_reserve(skb, IFLA_STATS, - sizeof(struct rtnl_link_stats)); - if (attr == NULL) - goto nla_put_failure; + if ((stats = dev_get_stats(dev))) { + struct nlattr *attr = nla_reserve(skb, IFLA_STATS, + sizeof(struct rtnl_link_stats)); + if (attr == NULL) + goto nla_put_failure; - copy_rtnl_link_stats(nla_data(attr), stats); - } + copy_rtnl_link_stats(nla_data(attr), stats); } if (dev-rtnl_link_ops) { -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] net: cleanup left over decl
Remove unneeded declaration. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/include/linux/netdevice.h 2007-08-15 14:29:02.0 +0100 +++ b/include/linux/netdevice.h 2007-08-15 14:35:25.0 +0100 @@ -833,8 +833,6 @@ extern int dev_set_mac_address(struct n extern int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev); -extern voiddev_init(void); - extern int netdev_budget; /* Called by rtnetlink.c:rtnl_unlock() */ -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] Small network device interface changes
These cleanups should go in 2.6.24. Hope they don't conflict too bad with the NAPI stuff. -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] net: deinline dev_kfree_skb_irq
Deinline dev_kfree_skb_irq. This saves about 100bytes per call site on UP, probably more on SMP. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/include/linux/netdevice.h 2007-08-06 09:26:41.0 +0100 +++ b/include/linux/netdevice.h 2007-08-15 14:12:32.0 +0100 @@ -793,29 +793,6 @@ static inline int netif_is_multiqueue(co #endif } -/* Use this variant when it is known for sure that it - * is executing from interrupt context. - */ -static inline void dev_kfree_skb_irq(struct sk_buff *skb) -{ - if (atomic_dec_and_test(skb-users)) { - struct softnet_data *sd; - unsigned long flags; - - local_irq_save(flags); - sd = __get_cpu_var(softnet_data); - skb-next = sd-completion_queue; - sd-completion_queue = skb; - raise_softirq_irqoff(NET_TX_SOFTIRQ); - local_irq_restore(flags); - } -} - -/* Use this variant in places where it could be invoked - * either from interrupt or non-interrupt context. - */ -extern void dev_kfree_skb_any(struct sk_buff *skb); - #define HAVE_NETIF_RX 1 extern int netif_rx(struct sk_buff *skb); extern int netif_rx_ni(struct sk_buff *skb); --- a/include/linux/skbuff.h2007-08-06 09:26:43.0 +0100 +++ b/include/linux/skbuff.h2007-08-15 14:09:04.0 +0100 @@ -378,6 +378,8 @@ extern int skb_cow_data(struct sk struct sk_buff **trailer); extern intskb_pad(struct sk_buff *skb, int pad); #define dev_kfree_skb(a) kfree_skb(a) +extern void dev_kfree_skb_any(struct sk_buff *skb); +extern void dev_kfree_skb_irq(struct sk_buff *skb); extern void skb_over_panic(struct sk_buff *skb, int len, void *here); extern void skb_under_panic(struct sk_buff *skb, int len, --- a/net/core/dev.c2007-08-06 09:26:48.0 +0100 +++ b/net/core/dev.c2007-08-15 14:12:16.0 +0100 @@ -1249,6 +1249,36 @@ void __netif_rx_schedule(struct net_devi } EXPORT_SYMBOL(__netif_rx_schedule); +/** + * dev_kfree_skb_irq - free skb from IRQ context + * @skb: skb to free + * + * Queue skb for later release. Used when skb needs to be + * free but executing in IRQ context. + */ +void dev_kfree_skb_irq(struct sk_buff *skb) +{ + if (atomic_dec_and_test(skb-users)) { + struct softnet_data *sd; + unsigned long flags; + + local_irq_save(flags); + sd = __get_cpu_var(softnet_data); + skb-next = sd-completion_queue; + sd-completion_queue = skb; + raise_softirq_irqoff(NET_TX_SOFTIRQ); + local_irq_restore(flags); + } +} +EXPORT_SYMBOL(dev_kfree_skb_irq); + +/** + * dev_kfree_skb_any - free skb from any context + * @skb: skb to free + * + * Free skb from interrupt or non-interrupt context. + * Use this variant in places where context is not easily determined. + */ void dev_kfree_skb_any(struct sk_buff *skb) { if (in_irq() || irqs_disabled()) -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] net: netdev_budget rearrangement
Trivial patch, move netdev_budget declaration out of netdevice.h to a new home with the other sysctl externs. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- a/include/linux/netdevice.h 2007-08-16 06:38:33.0 -0400 +++ b/include/linux/netdevice.h 2007-08-16 09:24:08.0 -0400 @@ -811,8 +811,6 @@ extern int dev_hard_start_xmit(struct s struct net_device *dev); extern struct net_device_stats *dev_get_stats(struct net_device *dev); -extern int netdev_budget; - /* Called by rtnetlink.c:rtnl_unlock() */ extern void netdev_run_todo(void); --- a/net/core/sysctl_net_core.c2007-08-06 04:26:48.0 -0400 +++ b/net/core/sysctl_net_core.c2007-08-16 09:24:08.0 -0400 @@ -13,6 +13,7 @@ #ifdef CONFIG_SYSCTL +extern int netdev_budget; extern int netdev_max_backlog; extern int weight_p; -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?
On Thu, 16 Aug 2007 20:41:58 +0800 Herbert Xu [EMAIL PROTECTED] wrote: Eric Dumazet [EMAIL PROTECTED] wrote: I am wondering how to really solve the problem. Could a workqueue be used here instead of a timer ? I think so. A mutex would separate the GC and dst_ifdown. You'd take the spin lock in the GC only to replace the garbage list with NULL. You then drop the lock to process it. Once it's done you take the lock again and join it with whatever that's been added in the mean time. This is easy because you should already have the tail after the GC process. Thanks Herbert Yes, I already did this (with the current softirq based timer model), but how can dst_dev_event() do its work, since the GC is using a private list. (In my patch, time to GC process XXX.000 entries is about XX seconds.) We would have to change dst_dev_event() to : - Signal to GC it has to stop as soon as possible. - Wait for GC be stoped (busy wait I suspect we cannot sleep in dst_dev_event() ? ) , giving us a full garbage_list. - Process the whole list. - Restart GC - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
On Wed, 2007-08-15 at 22:26 -0400, Jeff Garzik wrote: [...snip...] I think removing the RDMA stack is the wrong thing to do, and you shouldn't just threaten to yank entire subsystems because you don't like the technology. Lets keep this constructive, can we? RDMA should get the respect of any other technology in Linux. Maybe its a niche in your opinion, but come on, there's more RDMA users than say, the sparc64 port. Eh? It's not about being a niche. It's about creating a maintainable software net stack that has predictable behavior. Isn't RDMA _part_ of the software net stack within Linux? Why isn't making RDMA stable, supportable and maintainable equally as important as any other subsystem? Needing to reach out of the RDMA sandbox and reserve net stack resources away from itself travels a path we've consistently avoided. I will NACK any patch that opens up sockets to eat up ports or anything stupid like that. Got it. Ditto for me as well. Jeff - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] use correct array index. (array is just 6 bytes long)
From: Marcus Meissner [EMAIL PROTECTED] Use correct array index (goes from 0-6 instead of 10-16). Signed-Off-By: Marcus Meissner [EMAIL PROTECTED] --- drivers/net/tokenring/3c359.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/tokenring/3c359.c b/drivers/net/tokenring/3c359.c index 9f1b6ab..5e2437e 100644 --- a/drivers/net/tokenring/3c359.c +++ b/drivers/net/tokenring/3c359.c @@ -763,7 +763,7 @@ static int xl_open_hw(struct net_device if (xl_priv-xl_laa[0]) { /* If using a LAA address */ for (i=10;i16;i++) { writel( (MEM_BYTE_WRITE | 0xD | xl_priv-srb) + i, xl_mmio + MMIO_MAC_ACCESS_CMD) ; - writeb(xl_priv-xl_laa[i],xl_mmio + MMIO_MACDATA) ; + writeb(xl_priv-xl_laa[i-10],xl_mmio + MMIO_MACDATA) ; } memcpy(dev-dev_addr,xl_priv-xl_laa,dev-addr_len) ; } else { /* Regular hardware address */ -- 1.4.3.4 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: drivers/infiniband/mlx/mad.c misplaced ;
Kok, Auke wrote: Joe Perches wrote: On Wed, 2007-08-15 at 19:58 -0400, Dave Jones wrote: Signed-off-by: Dave Jones [EMAIL PROTECTED] diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c index 3330917..0ed02b7 100644 --- a/drivers/infiniband/hw/mlx4/mad.c +++ b/drivers/infiniband/hw/mlx4/mad.c @@ -109,7 +109,7 @@ int mlx4_MAD_IFC(struct mlx4_ib_dev *dev, int ignore_mkey, int ignore_bkey, in_modifier, op_modifier, MLX4_CMD_MAD_IFC, MLX4_CMD_TIME_CLASS_C); -if (!err); +if (!err) There's more than a few of these (not inspected). $ egrep -r --include=*.c \bif[[:space:]]*\([^\)]*\)[[:space:]]*\; * arch/sh/boards/se/7343/io.c:if (0) ; drivers/atm/iphase.c: if (!desc1) ; drivers/infiniband/hw/mlx4/mad.c: if (!err); drivers/isdn/capi/capiutil.c: else if (c = 0x0f); drivers/net/tokenring/ibmtr.c: else if (ti-shared_ram_paging == 0xf); /* No paging in adapter */ drivers/s390/scsi/zfcp_erp.c: if (status == ZFCP_ERP_SUCCEEDED) ; /* no further action */ fs/hostfs/hostfs_user.c:if(attrs-ia_valid HOSTFS_ATTR_CTIME) ; net/netfilter/xt_u32.c: if (skb-len 4 || pos skb-len - 4); sound/pci/au88x0/au88x0_synth.c:if (eax == 0) ; sounds like an excellent candidate check for checkpatch.pl !!! Yep. My scan of 2.6.23-rc3 with a checkpatch with this new test added, found 6 which seemed wrong. -apw - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PROBLEM: 2.6.23-rc NETDEV WATCHDOG: eth0: transmit timed out
(please do not remove the netdev Cc:) Francois Romieu [EMAIL PROTECTED] : [...] If it does not work I'll dissect 0e4851502f846b13b29b7f88f1250c980d57e944 tomorrow. You will find a tgz archive in attachment which contains a serie of patches (0001-... to 0005-...) to walk from 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2 to 0e4851502f846b13b29b7f88f1250c980d57e944 in smaller steps. Please apply 0001 on top of 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2. If it still works, apply 0002 on top of 0001, etc. -- Ueimor r8169-meyer.tgz Description: GNU Zip compressed data
Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?
Eric Dumazet [EMAIL PROTECTED] wrote: Yes, I already did this (with the current softirq based timer model), but how can dst_dev_event() do its work, since the GC is using a private list. (In my patch, time to GC process XXX.000 entries is about XX seconds.) We would have to change dst_dev_event() to : - Signal to GC it has to stop as soon as possible. - Wait for GC be stoped (busy wait I suspect we cannot sleep in dst_dev_event() ? ) You can sleep in dst_dev_event so just use a mutex to separate it from the GC. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] lockdep: annotate rcu_read_{,un}lock()
There seem to be some unbalanced rcu_read_{,un}lock() issues of late, how about doing something like this: Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/lockdep.h |7 +++ include/linux/rcupdate.h | 12 kernel/rcupdate.c|8 3 files changed, 27 insertions(+) Index: linux-2.6/include/linux/lockdep.h === --- linux-2.6.orig/include/linux/lockdep.h +++ linux-2.6/include/linux/lockdep.h @@ -252,6 +252,13 @@ extern void lockdep_init_map(struct lock struct lock_class_key *key, int subclass); /* + * To initialize a lockdep_map statically use this macro. + * Note that _name must not be NULL. + */ +#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \ + { .name = (_name), .key = (void *)(_key), } + +/* * Reinitialize a lock key - for cases where there is special locking or * special initialization of locks so that the validator gets the scope * of dependencies wrong: they are either too broad (they need a class-split) Index: linux-2.6/include/linux/rcupdate.h === --- linux-2.6.orig/include/linux/rcupdate.h +++ linux-2.6/include/linux/rcupdate.h @@ -41,6 +41,7 @@ #include linux/percpu.h #include linux/cpumask.h #include linux/seqlock.h +#include linux/lockdep.h /** * struct rcu_head - callback structure for use with RCU @@ -133,6 +134,15 @@ static inline void rcu_bh_qsctr_inc(int extern int rcu_pending(int cpu); extern int rcu_needs_cpu(int cpu); +#ifdef CONFIG_DEBUG_LOCK_ALLOC +extern struct lockdep_map rcu_lock_map; +# define rcu_read_acquire()lock_acquire(rcu_lock_map, 0, 0, 2, 1, _THIS_IP_) +# define rcu_read_release()lock_release(rcu_lock_map, 1, _THIS_IP_) +#else +# define rcu_read_acquire()do { } while (0) +# define rcu_read_release()do { } while (0) +#endif + /** * rcu_read_lock - mark the beginning of an RCU read-side critical section. * @@ -166,6 +176,7 @@ extern int rcu_needs_cpu(int cpu); do { \ preempt_disable(); \ __acquire(RCU); \ + rcu_read_acquire(); \ } while(0) /** @@ -175,6 +186,7 @@ extern int rcu_needs_cpu(int cpu); */ #define rcu_read_unlock() \ do { \ + rcu_read_release(); \ __release(RCU); \ preempt_enable(); \ } while(0) Index: linux-2.6/kernel/rcupdate.c === --- linux-2.6.orig/kernel/rcupdate.c +++ linux-2.6/kernel/rcupdate.c @@ -49,6 +49,14 @@ #include linux/cpu.h #include linux/mutex.h +#ifdef CONFIG_DEBUG_LOCK_ALLOC +static struct lock_class_key rcu_lock_key; +struct lockdep_map rcu_lock_map = + STATIC_LOCKDEP_MAP_INIT(rcu_read_lock, rcu_lock_key); + +EXPORT_SYMBOL_GPL(rcu_lock_map); +#endif + /* Definition for rcupdate control block. */ static struct rcu_ctrlblk rcu_ctrlblk = { .cur = -300, - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, 16 Aug 2007, Herbert Xu wrote: We've been through that already. If it's a busy-wait it should use cpu_relax. I looked around a bit by using some command lines and ended up wondering if these are equal to busy-wait case (and should be fixed) or not: ./drivers/telephony/ixj.c 6674: while (atomic_read(j-DSPWrite) 0) 6675- atomic_dec(j-DSPWrite); ...besides that, there are couple of more similar cases in the same file (with braces)... -- i. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] nl80211: remove VLAN type
On Wed, 2007-08-15 at 11:18 +0200, Johannes Berg wrote: There is no point in trying to handle source MAC address based VLANs in the wireless stack, something like smacvlan modeled after macvlan should be sufficient. Whoops. I only forgot the use with encryption. Please consider this patch withdrawn, at least until I see clearer through that crypto issue. johannes signature.asc Description: This is a digitally signed message part
Re: drivers/infiniband/mlx/mad.c misplaced ;
On Wed, Aug 15, 2007 at 05:40:11PM -0700, Joe Perches wrote: fs/hostfs/hostfs_user.c:if(attrs-ia_valid HOSTFS_ATTR_CTIME) ; This one can be deleted, but I think I did it for documentation reasons, to make it clear that ctime handling wasn't left out by mistake. Jeff -- Work email - jdike at linux dot intel dot com - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?
On Thu, 16 Aug 2007 21:59:43 +0800 Herbert Xu [EMAIL PROTECTED] wrote: Eric Dumazet [EMAIL PROTECTED] wrote: Yes, I already did this (with the current softirq based timer model), but how can dst_dev_event() do its work, since the GC is using a private list. (In my patch, time to GC process XXX.000 entries is about XX seconds.) We would have to change dst_dev_event() to : - Signal to GC it has to stop as soon as possible. - Wait for GC be stoped (busy wait I suspect we cannot sleep in dst_dev_event() ? ) You can sleep in dst_dev_event so just use a mutex to separate it from the GC. Thanks for the suggestion, I added dst_mutex and attempt a mutex_trylock() inside dst_run_gc(). So do you think this patch is enough or should we convert dst_run_gc processing from softirq to workqueue too ? Eric [PATCH] net/core/dst.c : let dst_run_gc() do its job incrementally When the periodic route cache flush is done, we can feed dst_garbage_list with a huge number of dst_entry structs. Then dst_run_gc() is doing a scan of a possibly huge list of dst_entries, eventually freeing some (less than 1%) of them, while holding the dst_lock spinlock for the whole scan. This sometimes triggers a soft lockup. Then it rearms a timer to redo the full thing 1/10 s later. The slowdown can last one minute or so. No need to say that machine is not really usable in the meantime. This patch attempts to solve the problem giving dst_run_gc() the ability to perform its work by chunks instead of whole list scan. We limit a chunk by the number of entries that could not be freed in a run. This should limit the CPU cache trashing (currently 128 bytes per entry on x86_64), yet giving a chance to free unreferenced entries (not included in the quota) if the load is really high. Signed-off-by: Eric Dumazet [EMAIL PROTECTED] --- linux-2.6.22/net/core/dst.c.orig +++ linux-2.6.22/net/core/dst.c @@ -9,6 +9,7 @@ #include linux/errno.h #include linux/init.h #include linux/kernel.h +#include linux/mutex.h #include linux/mm.h #include linux/module.h #include linux/netdevice.h @@ -28,10 +29,17 @@ * 4) All operations modify state, so a spinlock is used. */ static struct dst_entry*dst_garbage_list; +static struct dst_entry *dst_garbage_wrk; +static struct dst_entry *dst_garbage_inuse; +static int dst_gc_running; #if RT_CACHE_DEBUG = 2 static atomic_t dst_total = ATOMIC_INIT(0); #endif static DEFINE_SPINLOCK(dst_lock); +/* + * A mutex is used to synchronize GC with dst_dev_event() + */ +static DEFINE_MUTEX(dst_mutex); static unsigned long dst_gc_timer_expires; static unsigned long dst_gc_timer_inc = DST_GC_MAX; @@ -42,26 +50,39 @@ static DEFINE_TIMER(dst_gc_timer, dst_ru static void dst_run_gc(unsigned long dummy) { - intdelayed = 0; - intwork_performed; - struct dst_entry * dst, **dstp; + intquota = 1000; + intwork_performed = 0; + struct dst_entry *dst, *next; + struct dst_entry *first = NULL, *last = NULL; - if (!spin_trylock(dst_lock)) { + if (!mutex_trylock(dst_mutex)) { mod_timer(dst_gc_timer, jiffies + HZ/10); return; } + dst_gc_running = 1; + spin_lock(dst_lock); del_timer(dst_gc_timer); - dstp = dst_garbage_list; - work_performed = 0; - while ((dst = *dstp) != NULL) { - if (atomic_read(dst-__refcnt)) { - dstp = dst-next; - delayed++; + if (!dst_garbage_wrk) { + dst_garbage_wrk = dst_garbage_list; + dst_garbage_list = dst_garbage_inuse; + dst_garbage_inuse = NULL; + } + next = dst_garbage_wrk; + spin_unlock(dst_lock); + + while ((dst = next) != NULL) { + next = dst-next; + if (likely(atomic_read(dst-__refcnt))) { + if (unlikely(last == NULL)) + last = dst; + dst-next = first; + first = dst; + if (--quota == 0) + break; continue; } - *dstp = dst-next; - work_performed = 1; + work_performed++; dst = dst_destroy(dst); if (dst) { @@ -77,16 +98,26 @@ static void dst_run_gc(unsigned long dum continue; ___dst_free(dst); - dst-next = *dstp; - *dstp = dst; - dstp = dst-next; + dst-next = next; + next = dst; } } - if (!dst_garbage_list) { + dst_garbage_wrk = next; + + spin_lock(dst_lock); + dst_gc_running = 0; + if (last != NULL) { + last-next =
Re: [patch 2/4] Update e100 driver to use devres.
On 20:34 Thu 16 Aug 2007, Tejun Heo wrote: Brandon Philips wrote: @@ -2554,8 +2547,10 @@ static int __devinit e100_probe(struct p struct net_device *netdev; struct nic *nic; int err; + void __iomem **iomap; + int bar; - if(!(netdev = alloc_etherdev(sizeof(struct nic { + if (!(netdev = devm_alloc_etherdev(pdev-dev, sizeof(struct nic { if(((1 debug) - 1) NETIF_MSG_PROBE) printk(KERN_ERR PFX Etherdev alloc failed, abort.\n); return -ENOMEM; @@ -2585,26 +2580,28 @@ static int __devinit e100_probe(struct p nic-msg_enable = (1 debug) - 1; pci_set_drvdata(pdev, netdev); - if((err = pci_enable_device(pdev))) { + if ((err = pcim_enable_device(pdev))) { DPRINTK(PROBE, ERR, Cannot enable PCI device, aborting.\n); - goto err_out_free_dev; + return err; } if(!(pci_resource_flags(pdev, 0) IORESOURCE_MEM)) { DPRINTK(PROBE, ERR, Cannot find proper PCI device base address, aborting.\n); err = -ENODEV; - goto err_out_disable_pdev; + return err; } - if((err = pci_request_regions(pdev, DRV_NAME))) { + bar = use_io ? 1 : 0; + if((err = pcim_iomap_regions(pdev, 1 bar, DRV_NAME))) { DPRINTK(PROBE, ERR, Cannot obtain PCI resources, aborting.\n); - goto err_out_disable_pdev; + return err; } + iomap = pcim_iomap_table(pdev)[bar]; Type mismatch. Didn't compiler warn about it? s/iomap/nic-csr/ if((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK))) { DPRINTK(PROBE, ERR, No usable DMA configuration, aborting.\n); - goto err_out_free_res; + return err; } SET_MODULE_OWNER(netdev); @@ -2613,12 +2610,6 @@ static int __devinit e100_probe(struct p if (use_io) DPRINTK(PROBE, INFO, using i/o access mode\n); - nic-csr = pci_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr)); - if(!nic-csr) { - DPRINTK(PROBE, ERR, Cannot map device registers, aborting.\n); - err = -ENOMEM; - goto err_out_free_res; - } nic-csr initialization seems missing. Have you tested the patched code? No I didn't test it; my e100 box gave up on me. I shouldn't have sent this patch since I wasn't able to get it tested. Thanks, Brandon - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GENETLINK]: Question: global lock (genl_mutex) possible refinement?
* Richard MUSIL [EMAIL PROTECTED] 2007-07-24 13:09 Thomas Graf wrote: Please provide a new overall patch which is not based on your initial patch so I can review your idea properly. Here it goes (merging two previous patches). I have diffed against v2.6.22, which I am using currently as my base: Sorry for taking so long. @@ -150,9 +176,9 @@ int genl_register_ops(struct genl_family *family, struct genl_ops *ops) if (ops-policy) ops-flags |= GENL_CMD_CAP_HASPOL; - genl_lock(); + genl_fam_lock(family); list_add_tail(ops-ops_list, family-ops_list); - genl_unlock(); + genl_fam_unlock(family); For registering operations, it is sufficient to just acquire the family lock, the family itself can't disappear while holding it. @@ -216,8 +242,9 @@ int genl_register_family(struct genl_family *family) goto errout; INIT_LIST_HEAD(family-ops_list); + mutex_init(family-lock); - genl_lock(); + genl_fam_lock(family); if (genl_family_find_byname(family-name)) { err = -EEXIST; @@ -251,14 +278,14 @@ int genl_register_family(struct genl_family *family) family-attrbuf = NULL; list_add_tail(family-family_list, genl_family_chain(family-id)); - genl_unlock(); + genl_fam_unlock(family); This looks good. @@ -303,38 +332,57 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh) struct genlmsghdr *hdr = nlmsg_data(nlh); int hdrlen, err; + genl_fam_lock(NULL); family = genl_family_find_byid(nlh-nlmsg_type); - if (family == NULL) + if (family == NULL) { + genl_fam_unlock(NULL); return -ENOENT; + } + + /* get particular family lock, but release global family lock + * so registering operations for other families are possible */ + genl_onefam_lock(family); + genl_fam_unlock(NULL); I don't like having two locks for something as trivial as this. Basically the only reason the global lock is required here is to protect from family removal which can be avoided otherwise by using RCU list operations. Therefore, I'd propose the following lock semantics: Use own global mutex to protect writing to the family list, make reading side lockless using rcu for use when looking up family upon mesage processing. Use a family lock to protect writing to operations list and serialize messae processing with unregister operations. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lockdep: annotate rcu_read_{,un}lock()
On Thu, Aug 16, 2007 at 04:25:07PM +0200, Peter Zijlstra wrote: There seem to be some unbalanced rcu_read_{,un}lock() issues of late, how about doing something like this: This will break when rcu_read_lock() and rcu_read_unlock() are invoked from NMI/SMI handlers -- the raw_local_irq_save() in lock_acquire() will not mask NMIs or SMIs. One approach would be to check for being in an NMI/SMI handler, and to avoid calling lock_acquire() and lock_release() in those cases. Another approach would be to use sparse, which has checks for rcu_read_lock()/rcu_read_unlock() nesting. Thanx, Paul Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/lockdep.h |7 +++ include/linux/rcupdate.h | 12 kernel/rcupdate.c|8 3 files changed, 27 insertions(+) Index: linux-2.6/include/linux/lockdep.h === --- linux-2.6.orig/include/linux/lockdep.h +++ linux-2.6/include/linux/lockdep.h @@ -252,6 +252,13 @@ extern void lockdep_init_map(struct lock struct lock_class_key *key, int subclass); /* + * To initialize a lockdep_map statically use this macro. + * Note that _name must not be NULL. + */ +#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \ + { .name = (_name), .key = (void *)(_key), } + +/* * Reinitialize a lock key - for cases where there is special locking or * special initialization of locks so that the validator gets the scope * of dependencies wrong: they are either too broad (they need a class-split) Index: linux-2.6/include/linux/rcupdate.h === --- linux-2.6.orig/include/linux/rcupdate.h +++ linux-2.6/include/linux/rcupdate.h @@ -41,6 +41,7 @@ #include linux/percpu.h #include linux/cpumask.h #include linux/seqlock.h +#include linux/lockdep.h /** * struct rcu_head - callback structure for use with RCU @@ -133,6 +134,15 @@ static inline void rcu_bh_qsctr_inc(int extern int rcu_pending(int cpu); extern int rcu_needs_cpu(int cpu); +#ifdef CONFIG_DEBUG_LOCK_ALLOC +extern struct lockdep_map rcu_lock_map; +# define rcu_read_acquire() lock_acquire(rcu_lock_map, 0, 0, 2, 1, _THIS_IP_) +# define rcu_read_release() lock_release(rcu_lock_map, 1, _THIS_IP_) +#else +# define rcu_read_acquire() do { } while (0) +# define rcu_read_release() do { } while (0) +#endif + /** * rcu_read_lock - mark the beginning of an RCU read-side critical section. * @@ -166,6 +176,7 @@ extern int rcu_needs_cpu(int cpu); do { \ preempt_disable(); \ __acquire(RCU); \ + rcu_read_acquire(); \ } while(0) /** @@ -175,6 +186,7 @@ extern int rcu_needs_cpu(int cpu); */ #define rcu_read_unlock() \ do { \ + rcu_read_release(); \ __release(RCU); \ preempt_enable(); \ } while(0) Index: linux-2.6/kernel/rcupdate.c === --- linux-2.6.orig/kernel/rcupdate.c +++ linux-2.6/kernel/rcupdate.c @@ -49,6 +49,14 @@ #include linux/cpu.h #include linux/mutex.h +#ifdef CONFIG_DEBUG_LOCK_ALLOC +static struct lock_class_key rcu_lock_key; +struct lockdep_map rcu_lock_map = + STATIC_LOCKDEP_MAP_INIT(rcu_read_lock, rcu_lock_key); + +EXPORT_SYMBOL_GPL(rcu_lock_map); +#endif + /* Definition for rcupdate control block. */ static struct rcu_ctrlblk rcu_ctrlblk = { .cur = -300, - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Ilpo Järvinen wrote: I looked around a bit by using some command lines and ended up wondering if these are equal to busy-wait case (and should be fixed) or not: ./drivers/telephony/ixj.c 6674: while (atomic_read(j-DSPWrite) 0) 6675- atomic_dec(j-DSPWrite); ...besides that, there are couple of more similar cases in the same file (with braces)... Generally, ixj.c has several occurrences of couples of atomic write and atomic read which potentially do not do what the author wanted. -- Stefan Richter -=-=-=== =--- = http://arcgraph.de/sr/ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: drivers/infiniband/mlx/mad.c misplaced ;
On Wed, Aug 15, 2007 at 05:40:11PM -0700, Joe Perches wrote: $ egrep -r --include=*.c \bif[[:space:]]*\([^\)]*\)[[:space:]]*\; * arch/sh/boards/se/7343/io.c:if (0) ; drivers/atm/iphase.c: if (!desc1) ; drivers/infiniband/hw/mlx4/mad.c: if (!err); drivers/isdn/capi/capiutil.c: else if (c = 0x0f); drivers/net/tokenring/ibmtr.c: else if (ti-shared_ram_paging == 0xf); /* No paging in adapter */ drivers/s390/scsi/zfcp_erp.c: if (status == ZFCP_ERP_SUCCEEDED) ; /* no further action */ fs/hostfs/hostfs_user.c:if(attrs-ia_valid HOSTFS_ATTR_CTIME) ; net/netfilter/xt_u32.c: if (skb-len 4 || pos skb-len - 4); sound/pci/au88x0/au88x0_synth.c:if (eax == 0) ; On Thu, Aug 16, 2007 at 02:18:40PM +0100, Andy Whitcroft wrote: A couple of people suggested adding checks to checkpatch for trailing semicolons on conditionals, where the conditional block may not be actually conditional: if (err); return err; While regression testing the changes, I ran these checks across the whole of 2.6.23-rc3 and there appear to be 5 places where this is occurs (above and beyond the IPv6 one which triggered this effort) and a benign use which could be confused later which it seems safest to fix. Following this email are 6 patches for these issues, relevant maintainers cc'd. All against 2.6.23-rc3 It looks like you may want to refine your search parameters to match the above, as there are at least a few cases where the space exists. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, Aug 16, 2007 at 06:42:50PM +0800, Herbert Xu wrote: On Thu, Aug 16, 2007 at 12:31:03PM +0200, Stefan Richter wrote: PS: Just to clarify, I'm not speaking for the volatile modifier. I'm not speaking for any particular implementation of atomic_t and its accessors at all. All I am saying is that - we use atomically accessed data types because we concurrently but locklessly access this data, - hence a read access to this data that could be optimized away makes *no sense at all*. No sane compiler can optimise away an atomic_read per se. That's only possible if there's a preceding atomic_set or atomic_read, with no barriers in the middle. If that's the case, then one has to conclude that doing away with the second read is acceptable, as otherwise a memory (or at least a compiler) barrier should have been used. The compiler can also reorder non-volatile accesses. For an example patch that cares about this, please see: http://lkml.org/lkml/2007/8/7/280 This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and rcu_read_unlock() to ensure that accesses aren't reordered with respect to interrupt handlers and NMIs/SMIs running on that same CPU. In fact, volatile doesn't guarantee that the memory gets read anyway. You might be reading some stale value out of the cache. Granted this doesn't happen on x86 but when you're coding for the kernel you can't make such assumptions. So the point here is that if you don't mind getting a stale value from the CPU cache when doing an atomic_read, then surely you won't mind getting a stale value from the compiler cache. Absolutely disagree. An interrupt/NMI/SMI handler running on the CPU will see the same value (whether in cache or in store buffer) that the mainline code will see. In this case, we don't care about CPU misordering, only about compiler misordering. It is easy to see other uses that combine communication with handlers on the current CPU with communication among CPUs -- again, see prior messages in this thread. So, the architecture guys can implement atomic_read however they want --- as long as it cannot be optimized away.* They can implement it however they want as long as it stays atomic. Precisely. And volatility is a key property of atomic. Let's please not throw it away. Thanx, Paul - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
lockdep report in the bonding code.
A Fedora users reported this against our 2.6.23-rc3 build Dave NET: Registered protocol family 10 lo: Disabled Privacy Extensions Ethernet Channel Bonding Driver: v3.1.3 (June 13, 2007) bonding: MII link monitoring set to 100 ms ADDRCONF(NETDEV_UP): bond0: link is not ready bonding: bond0: Adding slave eth0. e100: eth0: e100_watchdog: link up, 100Mbps, half-duplex bonding: bond0: making interface eth0 the new active one. bonding: bond0: enslaving eth0 as an active interface with an up link. ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready = [ INFO: inconsistent lock state ] 2.6.23-0.104.rc3.fc8 #1 - inconsistent {softirq-on-W} - {in-softirq-W} usage. events/0/9 [HC0[0]:SC1[2]:HE1:SE0] takes: ((bond_info-tx_hashtbl_lock)){-+..}, at: [f8ae4cc2] bond_alb_xmit+0x26a/0x42c [bonding] {softirq-on-W} state was registered at: [c044a6e4] __lock_acquire+0x4ff/0xc67 [c044b2c6] lock_acquire+0x7b/0x9e [c062ecc8] _spin_lock+0x2e/0x58 [f8ae492e] bond_alb_initialize+0x64/0x18e [bonding] [f8ae1256] bond_open+0x33/0x175 [bonding] [c05ca8aa] dev_open+0x31/0x6c [c05c8a01] dev_change_flags+0xa3/0x156 [c0609231] devinet_ioctl+0x207/0x50e [c06098df] inet_ioctl+0x86/0xa4 [c05bec32] sock_ioctl+0x1ac/0x1c9 [c0490b76] do_ioctl+0x22/0x68 [c0490e05] vfs_ioctl+0x249/0x25c [c0490e61] sys_ioctl+0x49/0x64 [c040519e] syscall_call+0x7/0xb [] 0x irq event stamp: 266326 hardirqs last enabled at (266326): [c043258f] local_bh_enable_ip+0xf3/0x106 hardirqs last disabled at (266325): [c0432516] local_bh_enable_ip+0x7a/0x106 softirqs last enabled at (266306): [c05e2d20] rt_run_flush+0x6e/0x97 softirqs last disabled at (266309): [c0407548] do_softirq+0x74/0xf7 other info that might help us debug this: 3 locks held by events/0/9: #0: (rtnl_mutex){--..}, at: [c062d8a9] mutex_lock+0x21/0x24 #1: (bond-lock){-.-+}, at: [f8ae4a90] bond_alb_xmit+0x38/0x42c [bonding] #2: (bond-curr_slave_lock){..-+}, at: [f8ae4a98] bond_alb_xmit+0x40/0x42c [bonding] stack backtrace: [c04063d8] show_trace_log_lvl+0x1a/0x2f [c0406e71] show_trace+0x12/0x14 [c0406e89] dump_stack+0x16/0x18 [c0448f8a] print_usage_bug+0x141/0x14b [c0449810] mark_lock+0x12f/0x472 [c044a66c] __lock_acquire+0x487/0xc67 [c044b2c6] lock_acquire+0x7b/0x9e [c062ecc8] _spin_lock+0x2e/0x58 [f8ae4cc2] bond_alb_xmit+0x26a/0x42c [bonding] [c05c9467] dev_hard_start_xmit+0x21c/0x27a [c05cb489] dev_queue_xmit+0x1f7/0x2aa [f8b274e0] mld_sendpack+0x210/0x35a [ipv6] [f8b28535] mld_ifc_timer_expire+0x1e9/0x211 [ipv6] [c0435a5f] run_timer_softirq+0x127/0x18f [c0432708] __do_softirq+0x78/0xff [c0407548] do_softirq+0x74/0xf7 [c04325e6] irq_exit+0x44/0x46 [c041d3a3] smp_apic_timer_interrupt+0x74/0x81 [c0405cb7] apic_timer_interrupt+0x33/0x38 [c062eba6] _spin_unlock_bh+0x25/0x28 [c05e2d20] rt_run_flush+0x6e/0x97 [c05e471c] rt_cache_flush+0x7c/0xaf [c060e86f] fib_netdev_event+0x62/0x68 [c0630e6c] notifier_call_chain+0x2b/0x4a [c04391c8] __raw_notifier_call_chain+0x19/0x1e [c04391e7] raw_notifier_call_chain+0x1a/0x1c [c05c84f3] netdev_state_change+0x20/0x31 [c05d251a] __linkwatch_run_queue+0x159/0x183 [c05d2564] linkwatch_event+0x20/0x27 [c043bd9b] run_workqueue+0x7d/0x129 [c043c75b] worker_thread+0xbb/0xc8 [c043ef6f] kthread+0x3b/0x64 [c0405e5b] kernel_thread_helper+0x7/0x10 === bonding: bond0: Adding slave eth1. e100: eth1: e100_watchdog: link up, 100Mbps, half-duplex bonding: bond0: enslaving eth1 as an active interface with an up link. audit(1187170179.728:3): audit_pid=1885 old=0 by auid=4294967295 bond0: no IPv6 routers present -- http://www.codemonkey.org.uk - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 4/4] Update e1000 driver to use devres.
On 01:38 Thu 16 Aug 2007, Waskiewicz Jr, Peter P wrote: - err = -ENOMEM; - netdev = alloc_etherdev(sizeof(struct e1000_adapter)); + netdev = devm_alloc_etherdev(pdev-dev, sizeof(struct +e1000_adapter)); if (!netdev) - goto err_alloc_etherdev; + return -ENOMEM; I'm a bit confused why you removed the goto's, and then removed all the target unwinding code at the bottom of e1000_probe(). Those labels clean up resources if something fails, like the err_sw_init label. I don't see anything in the devres code that jumps out at me that explains why we can do away with these cleanup routines. Thoughts? Have you read Documentation/driver-model/devres.txt? That has a good explanation. Here is a practical explanation on how it works too. This is the output from a normal modprobe then rmmod of e1000 with devres debugging on. # modprobe DEVRES ADD f7fd6cc0 pcim_release (8 bytes) DEVRES ADD f7a80fe0 devm_free_netdev (4 bytes) # netdev **p for free_netdev DEVRES ADD f7dbe780 pcim_iomap_release (24 bytes) # adapter-hw.hw_addr DEVRES ADD f7dbe9c0 devm_kzalloc_release (40 bytes) # adapter-tx_ring DEVRES ADD f7dbe8c0 devm_kzalloc_release (44 bytes) # adapter-rx_ring # rmmod DEVRES REL f7dbe8c0 devm_kzalloc_release (44 bytes) # adapter-tx_ring DEVRES REL f7dbe9c0 devm_kzalloc_release (40 bytes) # adapter-rx_ring DEVRES REL f7dbe780 pcim_iomap_release (24 bytes) # adapter-hw.hw_addr DEVRES REL f7a80fe0 devm_free_netdev (4 bytes) # called free_netdev DEVRES REL f7fd6cc0 pcim_release (8 bytes) Now if I insert a return -ENOMEM right after allocating tx_ring: --- a/drivers/net/e1000/e1000_main.c +++ b/drivers/net/e1000/e1000_main.c @@ -1356,6 +1356,8 @@ e1000_alloc_queues(struct e1000_adapter *adapter) { adapter-tx_ring = kcalloc(adapter-num_tx_queues, sizeof(struct e1000_tx_ring), GFP_KERNEL); + + return -ENOMEM; if (!adapter-tx_ring) return -ENOMEM; #insmod DEVRES ADD f7a80e80 pcim_release (8 bytes) DEVRES ADD f7a80ca0 devm_free_netdev (4 bytes) DEVRES ADD eb7f0080 pcim_iomap_release (24 bytes) DEVRES ADD eb7f devm_kzalloc_release (40 bytes) e1000_sw_init: Unable to allocate memory for queues DEVRES REL eb7f devm_kzalloc_release (40 bytes) DEVRES REL eb7f0080 pcim_iomap_release (24 bytes) DEVRES REL f7a80ca0 devm_free_netdev (4 bytes) DEVRES REL f7a80e80 pcim_release (8 bytes) ACPI: PCI interrupt for device :02:00.0 disabled e1000: probe of :02:00.0 failed with error -12 Since we are returning an error from probe the driver core calls devres_release_all(dev) which releases all of the resources in the right order. See really_probe() in drivers/base/dd.c. SIDE NOTE - I ran into a possible e1000 bug with the little -ENOMEM patch above both with and without the devres patches. The driver seems to leave the EEPROM in a bad state on error because I get this error after trying to insert the module again: e1000_probe: The EEPROM Checksum Is Not Valid A power cycle but not a reboot fixes it. Thanks, Brandon - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] [DM9000] External PHY support
On Thu, Aug 16, 2007 at 10:16:08AM +0200, Laurent Pinchart wrote: This patch adds a flag to the DM9000 platform data which, when set, configures the device to use an external PHY. Signed-off-by: Laurent Pinchart [EMAIL PROTECTED] Acked-by: Ben Dooks [EMAIL PROTECTED] --- drivers/net/dm9000.c |6 ++ include/linux/dm9000.h |1 + 2 files changed, 7 insertions(+), 0 deletions(-) diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c index a424810..a86984e 100644 --- a/drivers/net/dm9000.c +++ b/drivers/net/dm9000.c @@ -136,6 +136,7 @@ typedef struct board_info { u16 dbug_cnt; u8 io_mode; /* 0:word, 2:byte */ u8 phy_addr; + unsigned int flags; void (*inblk)(void __iomem *port, void *data, int length); void (*outblk)(void __iomem *port, void *data, int length); @@ -528,6 +529,8 @@ dm9000_probe(struct platform_device *pdev) if (pdata-dumpblk != NULL) db-dumpblk = pdata-dumpblk; + + db-flags = pdata-flags; } dm9000_reset(db); @@ -670,6 +673,9 @@ dm9000_init_dm9000(struct net_device *dev) iow(db, DM9000_GPCR, GPCR_GEP_CNTL);/* Let GPIO0 output */ iow(db, DM9000_GPR, 0); /* Enable PHY */ + if (db-flags DM9000_PLATF_EXT_PHY) + iow(db, DM9000_NCR, NCR_EXT_PHY); + /* Program operating register */ iow(db, DM9000_TCR, 0); /* TX Polling clear */ iow(db, DM9000_BPTR, 0x3f); /* Less 3Kb, 200us */ diff --git a/include/linux/dm9000.h b/include/linux/dm9000.h index 0008e2a..ea530fd 100644 --- a/include/linux/dm9000.h +++ b/include/linux/dm9000.h @@ -19,6 +19,7 @@ #define DM9000_PLATF_8BITONLY(0x0001) #define DM9000_PLATF_16BITONLY (0x0002) #define DM9000_PLATF_32BITONLY (0x0004) +#define DM9000_PLATF_EXT_PHY (0x0008) /* platfrom data for platfrom device structure's platfrom_data field */ -- 1.5.0 -- Laurent Pinchart CSE Semaphore Belgium Chauss?e de Bruxelles, 732A B-1410 Waterloo Belgium T +32 (2) 387 42 59 F +32 (2) 387 42 75 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Ben ([EMAIL PROTECTED], http://www.fluff.org/) 'a smiley only costs 4 bytes' - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] [DM9000] Added support for big-endian hosts
On Thu, Aug 16, 2007 at 10:15:35AM +0200, Laurent Pinchart wrote: This patch splits the receive status in 8bit wide fields and convert the packet length from little endian to CPU byte order. Signed-off-by: Laurent Pinchart [EMAIL PROTECTED] Acked-by: Ben Dooks [EMAIL PROTECTED] --- drivers/net/dm9000.c | 13 +++-- 1 files changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/net/dm9000.c b/drivers/net/dm9000.c index c3de81b..a424810 100644 --- a/drivers/net/dm9000.c +++ b/drivers/net/dm9000.c @@ -894,7 +894,8 @@ dm9000_timer(unsigned long data) } struct dm9000_rxhdr { - u16 RxStatus; + u8 RxPktReady; + u8 RxStatus; u16 RxLen; } __attribute__((__packed__)); @@ -935,7 +936,7 @@ dm9000_rx(struct net_device *dev) (db-inblk)(db-io_data, rxhdr, sizeof(rxhdr)); - RxLen = rxhdr.RxLen; + RxLen = le16_to_cpu(rxhdr.RxLen); /* Packet Status check */ if (RxLen 0x40) { @@ -947,17 +948,17 @@ dm9000_rx(struct net_device *dev) PRINTK1(RST: RX Len:%x\n, RxLen); } - if (rxhdr.RxStatus 0xbf00) { + if (rxhdr.RxStatus 0xbf) { GoodPacket = false; - if (rxhdr.RxStatus 0x100) { + if (rxhdr.RxStatus 0x01) { PRINTK1(fifo error\n); db-stats.rx_fifo_errors++; } - if (rxhdr.RxStatus 0x200) { + if (rxhdr.RxStatus 0x02) { PRINTK1(crc error\n); db-stats.rx_crc_errors++; } - if (rxhdr.RxStatus 0x8000) { + if (rxhdr.RxStatus 0x80) { PRINTK1(length error\n); db-stats.rx_length_errors++; } -- 1.5.0 -- Laurent Pinchart CSE Semaphore Belgium Chauss?e de Bruxelles, 732A B-1410 Waterloo Belgium T +32 (2) 387 42 59 F +32 (2) 387 42 75 - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Ben ([EMAIL PROTECTED], http://www.fluff.org/) 'a smiley only costs 4 bytes' - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [patch 4/4] Update e1000 driver to use devres.
Now if I insert a return -ENOMEM right after allocating tx_ring: --- a/drivers/net/e1000/e1000_main.c +++ b/drivers/net/e1000/e1000_main.c @@ -1356,6 +1356,8 @@ e1000_alloc_queues(struct e1000_adapter *adapter) { adapter-tx_ring = kcalloc(adapter-num_tx_queues, sizeof(struct e1000_tx_ring), GFP_KERNEL); + + return -ENOMEM; if (!adapter-tx_ring) return -ENOMEM; #insmod DEVRES ADD f7a80e80 pcim_release (8 bytes) DEVRES ADD f7a80ca0 devm_free_netdev (4 bytes) DEVRES ADD eb7f0080 pcim_iomap_release (24 bytes) DEVRES ADD eb7f devm_kzalloc_release (40 bytes) e1000_sw_init: Unable to allocate memory for queues DEVRES REL eb7f devm_kzalloc_release (40 bytes) DEVRES REL eb7f0080 pcim_iomap_release (24 bytes) DEVRES REL f7a80ca0 devm_free_netdev (4 bytes) DEVRES REL f7a80e80 pcim_release (8 bytes) ACPI: PCI interrupt for device :02:00.0 disabled e1000: probe of :02:00.0 failed with error -12 Since we are returning an error from probe the driver core calls devres_release_all(dev) which releases all of the resources in the right order. See really_probe() in drivers/base/dd.c. This looks fine then to me. Thanks for the explanation. -PJ - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [E1000-devel] [patch 4/4] Update e1000 driver to use devres.
Tejun Heo wrote: Brandon Philips wrote: - mmio_start = pci_resource_start(pdev, BAR_0); mmio_len = pci_resource_len(pdev, BAR_0); You don't need mmio_len either. - err = -EIO; - adapter-hw.hw_addr = ioremap(mmio_start, mmio_len); + adapter-hw.hw_addr = pcim_iomap(pdev, BAR_0, mmio_len); Passing 0 as @max_len tells pci[m]_iomap() to use pci_resource_len() of the BAR. @@ -952,16 +948,15 @@ e1000_probe(struct pci_dev *pdev, /* setup the private structure */ if ((err = e1000_sw_init(adapter))) - goto err_sw_init; + return err; err = -EIO; /* Flash BAR mapping must happen after e1000_sw_init * because it depends on mac_type */ if ((adapter-hw.mac_type == e1000_ich8lan) (pci_resource_flags(pdev, 1) IORESOURCE_MEM)) { - flash_start = pci_resource_start(pdev, 1); flash_len = pci_resource_len(pdev, 1); Ditto. - adapter-hw.flash_address = ioremap(flash_start, flash_len); + adapter-hw.flash_address = pcim_iomap(pdev, 1, flash_len); if (!adapter-hw.flash_address) goto err_flashmap; } brandon, seeing the multiple revisions I am scared that this will produce some fallout and e1000 already is quite fragile. I would suggest that you instead work against e1000e which is in a branch on jgarzik's netdev tree instead. This driver is new and it would be much more interesting to have devres used in here instead. Since this driver is in -mm as well this would give your patches some testing before it goes upstream. Let's leave e1000 alone for now if we can. Auke - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, 16 Aug 2007, Paul Mackerras wrote: Herbert Xu writes: It doesn't matter. The memory pressure flag is an *advisory* flag. If we get it wrong the worst that'll happen is that we'd waste some time doing work that'll be thrown away. Ah, so it's the racy but I don't care because it's only an optimization case. That's fine. Somehow I find it hard to believe that all the racy uses of atomic_read in the kernel are like that, though. :) My use of atomic_read in SLUB is like that. Volatile does not magically sync up reads somehow. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, 16 Aug 2007, Paul Mackerras wrote: The uses of atomic_read where one might want it to allow caching of the result seem to me to fall into 3 categories: 1. Places that are buggy because of a race arising from the way it's used. 2. Places where there is a race but it doesn't matter because we're doing some clever trick. 3. Places where there is some locking in place that eliminates any potential race. In case 1, adding volatile won't solve the race, of course, but it's hard to argue that we shouldn't do something because it will slow down buggy code. Case 2 is hopefully pretty rare and accompanied by large comment blocks, and in those cases caching the result of atomic_read explicitly in a local variable would probably make the code clearer. And in case 3 there is no reason to use atomic_t at all; we might as well just use an int. In 2 + 3 you may increment the atomic variable in some places. The value of the atomic variable may not matter because you only do optimizations. Checking a atomic_t for a definite state has to involve either some side conditions (lock only taken if refcount is = 0 or so) or done by changing the state (see f.e. atomic_inc_unless_zero). So I don't see any good reason to make the atomic API more complex by having volatile and non-volatile versions of atomic_read. It should just have the volatile behaviour. If you want to make it less complex then drop volatile which causes weird side effects without solving any problems as you just pointed out. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, 16 Aug 2007, Paul Mackerras wrote: It seems that there could be a lot of places where atomic_t is used in a non-atomic fashion, and that those uses are either buggy, or there is some lock held at the time which guarantees that other CPUs aren't changing the value. In both cases there is no point in using atomic_t; we might as well just use an ordinary int. The point of atomic_t is to do atomic *changes* to the variable. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Proposed interface for per-packet mesh-ttl
On 7/30/07, Stephen Hemminger [EMAIL PROTECTED] wrote: it would need an IP ttl to mesh mapping. The fundamental thing is to try and avoid topology specific options bleeding all the way up the socket layer, especially since the network layer is involved and may need to multipath. I think the cleanest way would be to add a ll_ttl (ll for link layer) field to struct sock and a SO_LL_TTL socket option that sets both the field and a flag in sk-flags. This way it is useful for any driver that can do mesh or any other protocol that involves a ttl at link layer (not that I'm aware of any). However I guess you are not supposed to add new socket options nor modify struct socket too often so I'd appreciate feedback on whether this would be considered a good approach. -- Luis Carlos Cobo Rus GnuPG ID: 44019B60 cozybit Inc. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Part of the motivation here is to fix heisenbugs. If I knew where they By the same token we should probably disable optimisations altogether since that too can create heisenbugs. Almost everything is a tradeoff; and so is this. I don't believe most people would find disabling all compiler optimisations an acceptable price to pay for some peace of mind. So why is this a good tradeoff? It certainly is better than disabling all compiler optimisations! I also think that just adding things to APIs in the hope it might fix up some bugs isn't really a good road to go down. Where do you stop? I look at it the other way: keeping the volatile semantics in atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs; certainly most people expect that behaviour, and also that behaviour is *needed* in some places and no other interface provides that functionality. [some confusion about barriers wrt atomics snipped] Segher - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
The only thing volatile on an asm does is create a side effect on the asm statement; in effect, it tells the compiler do not remove this asm even if you don't need any of its outputs. It's not disabling optimisation likely to result in bugs, heisen- or otherwise; _not_ putting the volatile on an asm that needs it simply _is_ a bug :-) Yep. And the reason it is a bug is that it fails to disable the relevant compiler optimizations. So I suspect that we might actually be saying the same thing here. We're not saying the same thing, but we do agree :-) Segher - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
e1000 autotuning doesn't get along with itself
Folks - I was trying to look at bonding vs discrete links and so put a couple dual-port e1000-driven NICs: 4a:01.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03) Subsystem: Hewlett-Packard Company HP Dual Port 1000Base-T [A9900A] into a pair of 8 core systems running a 2.6.22.2 kernel. This gave me: hpcpc109:~/netperf2_trunk# ethtool -i eth2 driver: e1000 version: 7.3.20-k2-NAPI firmware-version: N/A bus-info: :49:02.0 for the e1000 driver. I connected the two systems back-to-back and started running some tests. In the course of trying to look at something else (verifying the results reported by bwm-ng) I enabled demo-mode in netperf (./configure --enable-demo) and noticed a considerable oscillation. I undid the bond and repeated the experiment with a discrete NIC: hpcpc109:~/netperf2_trunk# src/netperf -t TCP_RR -H 192.168.2.105 -D 1.0 -l 15 TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.105 (192.168.2.105) port 0 AF_INET : demo : first burst 0 Interim result: 10014.93 Trans/s over 1.00 seconds Interim result: 10015.79 Trans/s over 1.00 seconds Interim result: 10014.30 Trans/s over 1.00 seconds Interim result: 10016.29 Trans/s over 1.00 seconds Interim result: 10085.80 Trans/s over 1.00 seconds Interim result: 17526.61 Trans/s over 1.00 seconds Interim result: 20007.60 Trans/s over 1.00 seconds Interim result: 19626.46 Trans/s over 1.02 seconds Interim result: 10616.44 Trans/s over 1.85 seconds Interim result: 10014.88 Trans/s over 1.06 seconds Interim result: 10015.79 Trans/s over 1.00 seconds Interim result: 10014.80 Trans/s over 1.00 seconds Interim result: 10035.30 Trans/s over 1.00 seconds Interim result: 13974.69 Trans/s over 1.00 seconds Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size SizeTime Rate bytes Bytes bytesbytes secs.per sec 16384 87380 11 15.0012225.77 16384 87380 On a slightly informed whim I tried disabling the interrupt thottle on both sides (modprobe e1000 InterruptThrottleRate=0,0,0,0,0,0,0,0) and re-ran: hpcpc109:~/netperf2_trunk# src/netperf -t TCP_RR -H 192.168.2.105 -D 1.0 -l 15TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.105 (192.168.2.105) port 0 AF_INET : demo : first burst 0 Interim result: 18673.68 Trans/s over 1.00 seconds Interim result: 18685.01 Trans/s over 1.00 seconds Interim result: 18682.30 Trans/s over 1.00 seconds Interim result: 18681.05 Trans/s over 1.00 seconds Interim result: 18680.25 Trans/s over 1.00 seconds Interim result: 18742.44 Trans/s over 1.00 seconds Interim result: 18739.45 Trans/s over 1.00 seconds Interim result: 18723.52 Trans/s over 1.00 seconds Interim result: 18736.53 Trans/s over 1.00 seconds Interim result: 18737.61 Trans/s over 1.00 seconds Interim result: 18744.76 Trans/s over 1.00 seconds Interim result: 18728.54 Trans/s over 1.00 seconds Interim result: 18738.91 Trans/s over 1.00 seconds Interim result: 18735.53 Trans/s over 1.00 seconds Interim result: 18741.03 Trans/s over 1.00 seconds Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size SizeTime Rate bytes Bytes bytesbytes secs.per sec 16384 87380 11 15.0018717.94 16384 87380 and then just for grins I tried just disabling it on one side, leaving the other at defaults: hpcpc109:~/netperf2_trunk# src/netperf -t TCP_RR -H 192.168.2.105 -D 1.0 -l 15TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.105 (192.168.2.105) port 0 AF_INET : demo : first burst 0 Interim result: 19980.84 Trans/s over 1.00 seconds Interim result: 19997.60 Trans/s over 1.00 seconds Interim result: 19995.60 Trans/s over 1.00 seconds Interim result: 20002.60 Trans/s over 1.00 seconds Interim result: 20011.58 Trans/s over 1.00 seconds Interim result: 19985.66 Trans/s over 1.00 seconds Interim result: 20002.60 Trans/s over 1.00 seconds Interim result: 20010.58 Trans/s over 1.00 seconds Interim result: 20012.60 Trans/s over 1.00 seconds Interim result: 19993.63 Trans/s over 1.00 seconds Interim result: 19979.63 Trans/s over 1.00 seconds Interim result: 19991.58 Trans/s over 1.00 seconds Interim result: 20011.60 Trans/s over 1.00 seconds Interim result: 19948.84 Trans/s over 1.00 seconds Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size SizeTime Rate bytes Bytes bytesbytes secs.per sec 16384 87380 11 15.0019990.14 16384 87380 It looks like the e1000 interrupt throttle autotuning works very nicely when the other side isn't doing any, but if the other side is also trying to autotune it doesn't seem to stablize. At least not during a netperf TCP_RR test. Does anyone else see this? To try to eliminate netperf demo mode I re-ran without it and got the same end results. rick jones - To unsubscribe from this list: send the
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
I'd go so far as to say that anywhere where you want a non-volatile atomic_read, either your code is buggy, or else an int would work just as well. Even, the only way to implement a non-volatile atomic_read() is essentially as a plain int (you can do some tricks so you cannot assign to the result and stuff like that, but that's not the issue here). So if that would be the behaviour we wanted, just get rid of that whole atomic_read() thing, so no one can misuse it anymore. Segher - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Ilpo Järvinen wrote: On Thu, 16 Aug 2007, Herbert Xu wrote: We've been through that already. If it's a busy-wait it should use cpu_relax. I looked around a bit by using some command lines and ended up wondering if these are equal to busy-wait case (and should be fixed) or not: ./drivers/telephony/ixj.c 6674: while (atomic_read(j-DSPWrite) 0) 6675- atomic_dec(j-DSPWrite); ...besides that, there are couple of more similar cases in the same file (with braces)... atomic_dec() already has volatile behavior everywhere, so this is semantically okay, but this code (and any like it) should be calling cpu_relax() each iteration through the loop, unless there's a compelling reason not to. I'll allow that for some hardware drivers (possibly this one) such a compelling reason may exist, but hardware-independent core subsystems probably have no excuse. If the maintainer of this code doesn't see a compelling reason to add cpu_relax() in this loop, then it should be patched. -- Chris - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Herbert Xu wrote: On Thu, Aug 16, 2007 at 10:06:31AM +0200, Stefan Richter wrote: Do you (or anyone else for that matter) have an example of this? The only code I somewhat know, the ieee1394 subsystem, was perhaps authored and is currently maintained with the expectation that each occurrence of atomic_read actually results in a load operation, i.e. is not optimized away. This means all atomic_t (bus generation, packet and buffer refcounts, and some other state variables)* and likewise all atomic bitops in that subsystem. Can you find an actual atomic_read code snippet there that is broken without the volatile modifier? A whole bunch of atomic_read uses will be broken without the volatile modifier once we start removing barriers that aren't needed if volatile behavior is guaranteed. barrier() clobbers all your registers. volatile atomic_read() only clobbers one register, and more often than not it's a register you wanted to clobber anyway. -- Chris - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Ilpo Järvinen wrote: On Thu, 16 Aug 2007, Herbert Xu wrote: We've been through that already. If it's a busy-wait it should use cpu_relax. I looked around a bit by using some command lines and ended up wondering if these are equal to busy-wait case (and should be fixed) or not: ./drivers/telephony/ixj.c 6674: while (atomic_read(j-DSPWrite) 0) 6675- atomic_dec(j-DSPWrite); ...besides that, there are couple of more similar cases in the same file (with braces)... atomic_dec() already has volatile behavior everywhere, so this is semantically okay, but this code (and any like it) should be calling cpu_relax() each iteration through the loop, unless there's a compelling reason not to. I'll allow that for some hardware drivers (possibly this one) such a compelling reason may exist, but hardware-independent core subsystems probably have no excuse. If the maintainer of this code doesn't see a compelling reason not to add cpu_relax() in this loop, then it should be patched. -- Chris - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, Aug 16, 2007 at 11:54:54AM -0700, Christoph Lameter wrote: On Thu, 16 Aug 2007, Paul Mackerras wrote: So I don't see any good reason to make the atomic API more complex by having volatile and non-volatile versions of atomic_read. It should just have the volatile behaviour. If you want to make it less complex then drop volatile which causes weird side effects without solving any problems as you just pointed out. The other set of problems are communication between process context and interrupt/NMI handlers. Volatile does help here. And the performance impact of volatile is pretty near zero, so why have the non-volatile variant? Thanx, Paul - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 8895] New: An ioctl to delete an ipv6 tunnel leads to a kernel panic
On Thu, 16 Aug 2007 12:24:05 -0700 (PDT) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=8895 Summary: An ioctl to delete an ipv6 tunnel leads to a kernel panic Product: Networking Version: 2.5 KernelVersion: 2.6.22.3 and also 2.6.21.5 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: IPV6 AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Most recent kernel where this bug did not occur: ? Distribution: lfs and fedora Hardware Environment:user mode linux and vmware Software Environment:an evolution of mip6d (ip mobility daemon) Problem Description: The mip6d HA was modified to make a redondancy evolution, when an HA is interrupted, the other takes over, this leads to some creation/deletion of routes and tunnels. Note: The HA ip address known by the mobile (MR) stays the same, the slave HA takes it with an override neighbor advertisement message. So the tunnel between the mobile router and the HA(s) keep the same end adresses. The problem occurs when a Ctrl C is done on the master HA, the slave takes over but sometimes, the master gets a kernel panic. Here is the dump of the master: ICMPv6 NA: someone advertises our address on eth1! Slab corruption: ip6_dst_cache start=0867ed00, len=224 Redzone: 0x9f911029d74e35b/0x9f911029d74e35b. Last user: [08157c46](dst_destroy+0x79/0xad) 0a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b Prev obj: start=0867ec08, len=224 Redzone: 0xd84156c5635688c0/0xd84156c5635688c0. Last user: [08157b05](dst_alloc+0x26/0x62) 000: 00 00 00 00 00 00 00 00 00 00 00 00 40 41 6f 08 010: 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 Next obj: start=0867edf8, len=224 Redzone: 0xd84156c5635688c0/0xd84156c5635688c0. Last user: [08157b05](dst_alloc+0x26/0x62) 000: 00 00 00 00 00 00 00 00 00 00 00 00 60 41 99 0b 010: 00 00 ff ff 00 00 00 00 7d df ff ff 00 00 00 00 BUG: failure at net/ipv6/ip6_fib.c:1151/fib6_del_route()! Kernel panic - not syncing: BUG! EIP: 0073:[080e10b4] CPU: 0 Not tainted ESP: 007b:bf6d0398 EFLAGS: 0246 Not tainted EAX: ffda EBX: 0006 ECX: 89f2 EDX: bf6d0428 ESI: EDI: 0815c150 EBP: bf6d0458 DS: 007b ES: 007b 08a37ae4: [0806ba80] show_regs+0xb4/0xb9 08a37b10: [0805a044] panic_exit+0x25/0x3f 08a37b24: [0807b088] notifier_call_chain+0x21/0x46 08a37b44: [0807b123] __atomic_notifier_call_chain+0x17/0x19 08a37b60: [0807b13a] atomic_notifier_call_chain+0x15/0x17 08a37b7c: [0806fff6] panic+0x52/0xdd 08a37b9c: [081bb8d2] fib6_del_route+0x112/0x175 08a37bc0: [081bb9c6] fib6_del+0x91/0xcc 08a37bdc: [081bbba8] fib6_clean_node+0x26/0x73 08a37bf4: [081bba8a] fib6_walk_continue+0x89/0x11f 08a37c04: [081bbb57] fib6_walk+0x37/0x62 08a37c18: [081bbc23] fib6_clean_tree+0x2e/0x31 08a37c4c: [081bbc83] fib6_prune_clones+0x15/0x1a 08a37c64: [081bb9de] fib6_del+0xa9/0xcc 08a37c7c: [081bbba8] fib6_clean_node+0x26/0x73 08a37c94: [081bba8a] fib6_walk_continue+0x89/0x11f 08a37ca4: [081bbb57] fib6_walk+0x37/0x62 08a37cb8: [081bbc23] fib6_clean_tree+0x2e/0x31 08a37cec: [081bbc51] fib6_clean_all+0x2b/0x48 08a37d10: [081b9d15] rt6_ifdown+0x12/0x17 08a37d24: [081b56e3] addrconf_ifdown+0x54/0x275 08a37d40: [081b562d] addrconf_notify+0x18a/0x1ec 08a37d5c: [0807b088] notifier_call_chain+0x21/0x46 08a37d7c: [0807b257] __raw_notifier_call_chain+0x17/0x19 08a37d98: [0807b26e] raw_notifier_call_chain+0x15/0x17 08a37db4: [08153c18] dev_close+0x5e/0x68 08a37dcc: [0815619e] unregister_netdevice+0xb7/0x1bc 08a37ddc: [081d75d7] ip6_tnl_ioctl+0x1a9/0x1d2 08a37e34: [0815578c] dev_ifsioc+0x3b9/0x3d9 08a37e54: [08155a71] dev_ioctl+0x2c5/0x300 08a37e9c: [0814b435] sock_ioctl+0x230/0x243 08a37ebc: [080b0801] do_ioctl+0x21/0x5a 08a37ed8: [080b0ba8] vfs_ioctl+0x1ec/0x209 08a37f00: [080b0bf3] sys_ioctl+0x2e/0x4b 08a37f28: [0805a7ae] handle_syscall+0x86/0xa0 08a37f74: [08068d00] handle_trap+0xd8/0xe1 08a37f90: [080690f3] userspace+0x138/0x180 08a37fdc: [0805a4d1] fork_handler+0x74/0x7c 08a37ffc: [a55a5a5a] 0xa55a5a5a Program received signal SIGSEGV, Segmentation fault. 0xb7e58761 in abort () from /lib/tls/i686/cmov/libc.so.6 (gdb) Program received signal SIGSEGV, Segmentation fault. 0xb7e58761 in abort () from /lib/tls/i686/cmov/libc.so.6 (gdb) bt #0 0xb7e58761 in abort () from /lib/tls/i686/cmov/libc.so.6 #1 0x080676df in os_dump_core () at arch/um/os-Linux/util.c:109 #2 0x0805a05a in panic_exit (self=0x825d674, unused1=0, unused2=0x8277ee0) at arch/um/kernel/um_arch.c:477 #3 0x0807b088 in notifier_call_chain (nl=0x8277ec0, val=0, v=0x8277ee0, nr_to_call=-2, nr_calls=0x0) at kernel/sys.c:163 #4 0x0807b123 in __atomic_notifier_call_chain (nh=0x8277ec0, val=0, v=0x8277ee0, nr_to_call=-1,
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, 16 Aug 2007, Chris Snook wrote: atomic_dec() already has volatile behavior everywhere, so this is semantically okay, but this code (and any like it) should be calling cpu_relax() each iteration through the loop, unless there's a compelling reason not to. I'll allow that for some hardware drivers (possibly this one) such a compelling reason may exist, but hardware-independent core subsystems probably have no excuse. No it does not have any volatile semantics. atomic_dec() can be reordered at will by the compiler within the current basic unit if you do not add a barrier. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
I can't speak for this particular case, but there could be similar code examples elsewhere, where we do the atomic ops on an atomic_t object inside a higher-level locking scheme that would take care of the kind of problem you're referring to here. It would be useful for such or similar code if the compiler kept the value of that atomic object in a register. If there is a higher-level locking scheme then there is no point to using atomic_t variables. Atomic_t is specifically for the situation where multiple CPUs are updating a variable without locking. And don't forget about the case where it is an I/O device that is updating the memory (in buffer descriptors or similar). The driver needs to do a volatile atomic read to get at the most recent version of that data, which can be important for optimising latency (or throughput even). There is no other way the kernel can get that info -- doing an MMIO read is way way too expensive. Segher - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Note that volatile is a type-qualifier, not a type itself, so a cast of the _object_ itself to a qualified-type i.e. (volatile int) would not make the access itself volatile-qualified. There is no such thing as volatile-qualified access defined anywhere; there only is the concept of a volatile-qualified *object*. To serve our purposes, it is necessary for us to take the address of this (non-volatile) object, cast the resulting _pointer_ to the corresponding volatile-qualified pointer-type, and then dereference it. This makes that particular _access_ be volatile-qualified, without the object itself being such. Also note that the (dereferenced) result is also a valid lvalue and hence can be used in *(volatile int *)a = b; kind of construction (which we use for the atomic_set case). There is a quite convincing argument that such an access _is_ an access to a volatile object; see GCC PR21568 comment #9. This probably isn't the last word on the matter though... Segher - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Here, I should obviously admit that the semantics of *(volatile int *) aren't any neater or well-defined in the _language standard_ at all. The standard does say (verbatim) precisely what constitutes as access to object of volatile-qualified type is implementation-defined, but GCC does help us out here by doing the right thing. Where do you get that idea? GCC manual, section 6.1, When is a Volatile Object Accessed? doesn't say anything of the kind. PR33053 and some others. Honestly, given such confusion, and the propensity of the volatile type-qualifier keyword to be ill-defined (or at least poorly understood, often inconsistently implemented), I'd (again) express my opinion that it would be best to avoid its usage, given other alternatives do exist. Yeah. Or we can have an email thread like this every time someone proposes a patch that uses an atomic variable ;-) Segher - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PROBLEM: 2.6.23-rc NETDEV WATCHDOG: eth0: transmit timed out
I did some testing today and found that the error occurs after applying some of the patches. However I did not figure out the exact patch in which the error starts since it sometimes occurs immediatly when moving some data over the net and sometimes it takes 30 min till I get the transmit timeout. I will be away till sunday and do some more testing then. 2007/8/16, Francois Romieu [EMAIL PROTECTED]: (please do not remove the netdev Cc:) Francois Romieu [EMAIL PROTECTED] : [...] If it does not work I'll dissect 0e4851502f846b13b29b7f88f1250c980d57e944 tomorrow. You will find a tgz archive in attachment which contains a serie of patches (0001-... to 0005-...) to walk from 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2 to 0e4851502f846b13b29b7f88f1250c980d57e944 in smaller steps. Please apply 0001 on top of 6dccd16b7c2703e8bbf8bca62b5cf248332afbe2. If it still works, apply 0002 on top of 0001, etc. -- Ueimor - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 0/24] make atomic_read() behave consistently across all architectures
6674: while (atomic_read(j-DSPWrite) 0) 6675- atomic_dec(j-DSPWrite); If the maintainer of this code doesn't see a compelling reason to add cpu_relax() in this loop, then it should be patched. Shouldn't it be just re-written without the loop: if ((tmp = atomic_read(j-DSPWrite)) 0) atomic_sub(j-DSPWrite, tmp); Has all the same bugs, but runs much faster :-) -Tony - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ofa-general] Re: [PATCH RFC] RDMA/CMA: Allocate PS_TCP ports from the host TCP port space.
From: Tom Tucker [EMAIL PROTECTED] Date: Thu, 16 Aug 2007 08:43:11 -0500 Isn't RDMA _part_ of the software net stack within Linux? It very much is not so. When using RDMA you lose the capability to do packet shaping, classification, and all the other wonderful networking facilities you've grown to love and use over the years. I'm glad this is a surprise to you, because it illustrates the point some of us keep trying to make about technologies like this. Imagine if you didn't know any of this, you purchase and begin to deploy a huge piece of RDMA infrastructure, you then get the mandate from IT that you need to add firewalling on the RDMA connections at the host level, and oh shit you can't? This is why none of us core networking developers like RDMA at all. It's totally not integrated with the rest of the Linux stack and on top of that it even gets in the way. It's an abberation, an eye sore, and a constant source of consternation. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Proposed interface for per-packet mesh-ttl
On Thu, 16 Aug 2007 12:21:14 -0700 Luis Carlos Cobo [EMAIL PROTECTED] wrote: On 7/30/07, Stephen Hemminger [EMAIL PROTECTED] wrote: it would need an IP ttl to mesh mapping. The fundamental thing is to try and avoid topology specific options bleeding all the way up the socket layer, especially since the network layer is involved and may need to multipath. I think the cleanest way would be to add a ll_ttl (ll for link layer) field to struct sock and a SO_LL_TTL socket option that sets both the field and a flag in sk-flags. This way it is useful for any driver that can do mesh or any other protocol that involves a ttl at link layer (not that I'm aware of any). However I guess you are not supposed to add new socket options nor modify struct socket too often so I'd appreciate feedback on whether this would be considered a good approach. -- Luis Carlos Cobo Rus GnuPG ID: 44019B60 cozybit Inc. The problem with socket options is how does the application know the correct policy? Pushing configuration to application is just deferring the problem, not solving it. You want some policy to be done by the infrastructure; that means kernel, libraries, daemons, etc. Doing it in the application is often inflexible and unusable. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Proposed interface for per-packet mesh-ttl
On 8/16/07, Stephen Hemminger [EMAIL PROTECTED] wrote: The problem with socket options is how does the application know the correct policy? Pushing configuration to application is just deferring the problem, not solving it. You want some policy to be done by the infrastructure; that means kernel, libraries, daemons, etc. Doing it in the application is often inflexible and unusable. The policy will usually be done by the kernel. If the flag is not set, which will happen most of the time, the driver will use a sensible default ttl. The socket option would only be used by applications that are specifically interested in a configurable ttl (like an application to plot neighbors within an specific diameter). A per-interface setting is not good enough, since we do not want the neighbor-plotting tool to affect the ttl of other connections (e.g. a ssh session) that might be going on at the same time. -- Luis Carlos Cobo Rus GnuPG ID: 44019B60 cozybit Inc. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] net/core/dst.c : Should'nt dst_run_gc() be more scalable and friendly ?
On Thu, Aug 16, 2007 at 05:40:44PM +0200, Eric Dumazet wrote: So do you think this patch is enough or should we convert dst_run_gc processing from softirq to workqueue too ? I think a workqueue would be the best solution since with that you wouldn't have to worry about processing things in chunks. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, Aug 16, 2007 at 09:34:41AM -0700, Paul E. McKenney wrote: The compiler can also reorder non-volatile accesses. For an example patch that cares about this, please see: http://lkml.org/lkml/2007/8/7/280 This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and rcu_read_unlock() to ensure that accesses aren't reordered with respect to interrupt handlers and NMIs/SMIs running on that same CPU. Good, finally we have some code to discuss (even though it's not actually in the kernel yet). First of all, I think this illustrates that what you want here has nothing to do with atomic ops. The ORDERED_WRT_IRQ macro occurs a lot more times in your patch than atomic reads/sets. So *assuming* that it was necessary at all, then having an ordered variant of the atomic_read/atomic_set ops could do just as well. However, I still don't know which atomic_read/atomic_set in your patch would be broken if there were no volatile. Could you please point them out? Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, Aug 16, 2007 at 03:48:54PM -0400, Chris Snook wrote: Can you find an actual atomic_read code snippet there that is broken without the volatile modifier? A whole bunch of atomic_read uses will be broken without the volatile modifier once we start removing barriers that aren't needed if volatile behavior is guaranteed. Could you please cite the file/function names so we can see whether removing the barrier makes sense? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, Aug 17, 2007 at 07:59:02AM +0800, Herbert Xu wrote: On Thu, Aug 16, 2007 at 09:34:41AM -0700, Paul E. McKenney wrote: The compiler can also reorder non-volatile accesses. For an example patch that cares about this, please see: http://lkml.org/lkml/2007/8/7/280 This patch uses an ORDERED_WRT_IRQ() in rcu_read_lock() and rcu_read_unlock() to ensure that accesses aren't reordered with respect to interrupt handlers and NMIs/SMIs running on that same CPU. Good, finally we have some code to discuss (even though it's not actually in the kernel yet). There was some earlier in this thread as well. First of all, I think this illustrates that what you want here has nothing to do with atomic ops. The ORDERED_WRT_IRQ macro occurs a lot more times in your patch than atomic reads/sets. So *assuming* that it was necessary at all, then having an ordered variant of the atomic_read/atomic_set ops could do just as well. Indeed. If I could trust atomic_read()/atomic_set() to cause the compiler to maintain ordering, then I could just use them instead of having to create an ORDERED_WRT_IRQ(). (Or ACCESS_ONCE(), as it is called in a different patch.) However, I still don't know which atomic_read/atomic_set in your patch would be broken if there were no volatile. Could you please point them out? Suppose I tried replacing the ORDERED_WRT_IRQ() calls with atomic_read() and atomic_set(). Starting with __rcu_read_lock(): o If ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])++ was ordered by the compiler after ORDERED_WRT_IRQ(me-rcu_read_lock_nesting) = nesting + 1, then suppose an NMI/SMI happened after the rcu_read_lock_nesting but before the rcu_flipctr. Then if there was an rcu_read_lock() in the SMI/NMI handler (which is perfectly legal), the nested rcu_read_lock() would believe that it could take the then-clause of the enclosing if statement. But because the rcu_flipctr per-CPU variable had not yet been incremented, an RCU updater would be within its rights to assume that there were no RCU reads in progress, thus possibly yanking a data structure out from under the reader in the SMI/NMI function. Fatal outcome. Note that only one CPU is involved here because these are all either per-CPU or per-task variables. o If ORDERED_WRT_IRQ(me-rcu_read_lock_nesting) = nesting + 1 was ordered by the compiler to follow the ORDERED_WRT_IRQ(me-rcu_flipctr_idx) = idx, and an NMI/SMI happened between the two, then an __rcu_read_lock() in the NMI/SMI would incorrectly take the else clause of the enclosing if statement. If some other CPU flipped the rcu_ctrlblk.completed in the meantime, then the __rcu_read_lock() would (correctly) write the new value into rcu_flipctr_idx. Well and good so far. But the problem arises in __rcu_read_unlock(), which then decrements the wrong counter. Depending on exactly how subsequent events played out, this could result in either prematurely ending grace periods or never-ending grace periods, both of which are fatal outcomes. And the following are not needed in the current version of the patch, but will be in a future version that either avoids disabling irqs or that dispenses with the smp_read_barrier_depends() that I have 99% convinced myself is unneeded: o nesting = ORDERED_WRT_IRQ(me-rcu_read_lock_nesting); o idx = ORDERED_WRT_IRQ(rcu_ctrlblk.completed) 0x1; Furthermore, in that future version, irq handlers can cause the same mischief that SMI/NMI handlers can in this version. Next, looking at __rcu_read_unlock(): o If ORDERED_WRT_IRQ(me-rcu_read_lock_nesting) = nesting - 1 was reordered by the compiler to follow the ORDERED_WRT_IRQ(__get_cpu_var(rcu_flipctr)[idx])--, then if an NMI/SMI containing an rcu_read_lock() occurs between the two, this nested rcu_read_lock() would incorrectly believe that it was protected by an enclosing RCU read-side critical section as described in the first reversal discussed for __rcu_read_lock() above. Again, fatal outcome. This is what we have now. It is not hard to imagine situations that interact with -both- interrupt handlers -and- other CPUs, as described earlier. Thanx, Paul - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, Aug 16, 2007 at 01:20:26PM -0700, Christoph Lameter wrote: On Thu, 16 Aug 2007, Chris Snook wrote: atomic_dec() already has volatile behavior everywhere, so this is semantically okay, but this code (and any like it) should be calling cpu_relax() each iteration through the loop, unless there's a compelling reason not to. I'll allow that for some hardware drivers (possibly this one) such a compelling reason may exist, but hardware-independent core subsystems probably have no excuse. No it does not have any volatile semantics. atomic_dec() can be reordered at will by the compiler within the current basic unit if you do not add a barrier. Yep. Or you can use atomic_dec_return() instead of using a barrier. Thanx, Paul - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, Aug 16, 2007 at 06:02:32PM -0700, Paul E. McKenney wrote: Yep. Or you can use atomic_dec_return() instead of using a barrier. Or you could use smp_mb__{before,after}_atomic_dec. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Herbert Xu wrote: On Thu, Aug 16, 2007 at 03:48:54PM -0400, Chris Snook wrote: Can you find an actual atomic_read code snippet there that is broken without the volatile modifier? A whole bunch of atomic_read uses will be broken without the volatile modifier once we start removing barriers that aren't needed if volatile behavior is guaranteed. Could you please cite the file/function names so we can see whether removing the barrier makes sense? Thanks, At a glance, several architectures' implementations of smp_call_function() have one or more legitimate atomic_read() busy-waits that shouldn't be using CPU-relax. Some of them do work in the loop. I'm sure there are plenty more examples that various maintainers could find in their own code. -- Chris - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Thu, Aug 16, 2007 at 10:04:24PM -0400, Chris Snook wrote: Could you please cite the file/function names so we can see whether removing the barrier makes sense? At a glance, several architectures' implementations of smp_call_function() have one or more legitimate atomic_read() busy-waits that shouldn't be using CPU-relax. Some of them do work in the loop. Care to name one so we can discuss it? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Segher Boessenkool wrote: Part of the motivation here is to fix heisenbugs. If I knew where they By the same token we should probably disable optimisations altogether since that too can create heisenbugs. Almost everything is a tradeoff; and so is this. I don't believe most people would find disabling all compiler optimisations an acceptable price to pay for some peace of mind. So why is this a good tradeoff? It certainly is better than disabling all compiler optimisations! It's easy to be better than something really stupid :) So i386 and x86-64 don't have volatiles there, and it saves them a few K of kernel text. What you need to justify is why it is a good tradeoff to make them volatile (which btw, is much harder to go the other way after we let people make those assumptions). I also think that just adding things to APIs in the hope it might fix up some bugs isn't really a good road to go down. Where do you stop? I look at it the other way: keeping the volatile semantics in atomic_XXX() (or adding them to it, whatever) helps _prevent_ bugs; Yeah, but we could add lots of things to help prevent bugs and would never be included. I would also contend that it helps _hide_ bugs and encourages people to be lazy when thinking about these things. Also, you dismiss the fact that we'd actually be *adding* volatile semantics back to the 2 most widely tested architectures (in terms of test time, number of testers, variety of configurations, and coverage of driver code). This is a very important different from just keeping volatile semantics because it is basically a one-way API change. certainly most people expect that behaviour, and also that behaviour is *needed* in some places and no other interface provides that functionality. I don't know that most people would expect that behaviour. Is there any documentation anywhere that would suggest this? Also, barrier() most definitely provides the required functionality. It is overkill in some situations, but volatile is overkill in _most_ situations. If that's what you're worried about, we should add a new ordering primitive. [some confusion about barriers wrt atomics snipped] What were you confused about? -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Chris Snook wrote: Herbert Xu wrote: On Thu, Aug 16, 2007 at 03:48:54PM -0400, Chris Snook wrote: Can you find an actual atomic_read code snippet there that is broken without the volatile modifier? A whole bunch of atomic_read uses will be broken without the volatile modifier once we start removing barriers that aren't needed if volatile behavior is guaranteed. Could you please cite the file/function names so we can see whether removing the barrier makes sense? Thanks, At a glance, several architectures' implementations of smp_call_function() have one or more legitimate atomic_read() busy-waits that shouldn't be using CPU-relax. Some of them do work in the loop. sh looks like the only one there that would be broken (and that's only because they don't have a cpu_relax there, but it should be added anyway). Sure, if we removed volatile from other architectures, it would be wise to audit arch code because arch maintainers do sometimes make assumptions about their implementation details... however we can assume most generic code is safe without volatile. -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Christoph Lameter writes: No it does not have any volatile semantics. atomic_dec() can be reordered at will by the compiler within the current basic unit if you do not add a barrier. Volatile doesn't mean it can't be reordered; volatile means the accesses can't be eliminated. Paul. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
On Fri, 17 Aug 2007, Paul Mackerras wrote: Volatile doesn't mean it can't be reordered; volatile means the accesses can't be eliminated. It also does limit re-ordering. Of course, since *normal* accesses aren't necessarily limited wrt re-ordering, the question then becomes one of with regard to *what* does it limit re-ordering?. A C compiler that re-orders two different volatile accesses that have a sequence point in between them is pretty clearly a buggy compiler. So at a minimum, it limits re-ordering wrt other volatiles (assuming sequence points exists). It also means that the compiler cannot move it speculatively across conditionals, but other than that it's starting to get fuzzy. In general, I'd *much* rather we used barriers. Anything that depends on volatile is pretty much set up to be buggy. But I'm certainly also willing to have that volatile inside atomic_read/atomic_set() if it avoids code that would otherwise break - ie if it hides a bug. Linus - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Paul E. McKenney wrote: On Thu, Aug 16, 2007 at 06:42:50PM +0800, Herbert Xu wrote: In fact, volatile doesn't guarantee that the memory gets read anyway. You might be reading some stale value out of the cache. Granted this doesn't happen on x86 but when you're coding for the kernel you can't make such assumptions. So the point here is that if you don't mind getting a stale value from the CPU cache when doing an atomic_read, then surely you won't mind getting a stale value from the compiler cache. Absolutely disagree. An interrupt/NMI/SMI handler running on the CPU will see the same value (whether in cache or in store buffer) that the mainline code will see. In this case, we don't care about CPU misordering, only about compiler misordering. It is easy to see other uses that combine communication with handlers on the current CPU with communication among CPUs -- again, see prior messages in this thread. I still don't agree with the underlying assumption that everybody (or lots of kernel code) treats atomic accesses as volatile. Nobody that does has managed to explain my logic problem either: loads and stores to long and ptr have always been considered to be atomic, test_bit is atomic; so why are another special subclass of atomic loads and stores? (and yes, it is perfectly legitimate to want a non-volatile read for a data type that you also want to do atomic RMW operations on) Why are people making these undocumented and just plain false assumptions about atomic_t? If they're using lockless code (ie. which they must be if using atomics), then they actually need to be thinking much harder about memory ordering issues. If that is too much for them, then they can just use locks. So, the architecture guys can implement atomic_read however they want --- as long as it cannot be optimized away.* They can implement it however they want as long as it stays atomic. Precisely. And volatility is a key property of atomic. Let's please not throw it away. It isn't, though (at least not since i386 and x86-64 don't have it). _Adding_ it is trivial, and can be done any time. Throwing it away (ie. making the API weaker) is _hard_. So let's not add it without really good reasons. It most definitely results in worse code generation in practice. I don't know why people would assume volatile of atomics. AFAIK, most of the documentation is pretty clear that all the atomic stuff can be reordered etc. except for those that modify and return a value. It isn't even intuitive: `*lp = value` is like the most fundamental atomic operation in Linux. -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Nick Piggin writes: So i386 and x86-64 don't have volatiles there, and it saves them a few K of kernel text. What you need to justify is why it is a good I'm really surprised it's as much as a few K. I tried it on powerpc and it only saved 40 bytes (10 instructions) for a G5 config. Paul. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
Paul Mackerras wrote: Nick Piggin writes: So i386 and x86-64 don't have volatiles there, and it saves them a few K of kernel text. What you need to justify is why it is a good I'm really surprised it's as much as a few K. I tried it on powerpc and it only saved 40 bytes (10 instructions) for a G5 config. Paul. I'm surprised too. Numbers were from the ...use asm() like the other atomic operations already do thread. According to them, textdata bss dec hex filename 3434150 249176 176128 3859454 3ae3fe atomic_normal/vmlinux 3436203 249176 176128 3861507 3aec03 atomic_volatile/vmlinux The first one is a stock kenel, the second is with atomic_read/set cast to volatile. gcc-4.1 -- maybe if you have an earlier gcc it won't optimise as much? -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html