[PATCH] SMC911X: Fix using of dereferenced skb after netif_rx
After netif_rx(), skb will be freed. So get the skb-len before netif_rx(). Signed-off-by: Wang Chen [EMAIL PROTECTED] --- smc911x.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) --- linux-2.6.24.rc3.org/drivers/net/smc911x.c 2007-11-19 12:38:05.0 +0800 +++ linux-2.6.24.rc3/drivers/net/smc911x.c 2007-11-30 15:00:53.0 +0800 @@ -1287,7 +1287,7 @@ smc911x_rx_dma_irq(int dma, void *data) struct smc911x_local *lp = netdev_priv(dev); struct sk_buff *skb = lp-current_rx_skb; unsigned long flags; - unsigned int pkts; + unsigned int pkts, len; DBG(SMC_DEBUG_FUNC, %s: -- %s\n, dev-name, __FUNCTION__); DBG(SMC_DEBUG_RX | SMC_DEBUG_DMA, %s: RX DMA irq handler\n, dev-name); @@ -1299,9 +1299,10 @@ smc911x_rx_dma_irq(int dma, void *data) PRINT_PKT(skb-data, skb-len); dev-last_rx = jiffies; skb-protocol = eth_type_trans(skb, dev); + len = skb-len; netif_rx(skb); dev-stats.rx_packets++; - dev-stats.rx_bytes += skb-len; + dev-stats.rx_bytes += len; spin_lock_irqsave(lp-lock, flags); pkts = (SMC_GET_RX_FIFO_INF() RX_FIFO_INF_RXSUSED_) 16; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
On 12/3/07, Arnaldo Carvalho de Melo [EMAIL PROTECTED] wrote: WARNING: After reading some messages from Ingo Molnar on lkml I think we should really trim the number of lists we use for kernel development. And since I moved back to using mutt for reading e-mails, something I should have never, ever stopped doing, I guess we should move the DCCP discussions to netdev, where we hopefully can get more people interested and reviewing the work we do, so please consider moving DCCP discussion to netdev@vger.kernel.org, where lots of smart networking folks are present and can help our efforts on turning RFCs to code. I (and others too) don't necessarily have time to read netdev so would vote on keeping dccp. I would totally agree to making sure that cross-post to netdev as well as dccp. Ian -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: namespace support requires network modules to say GPL
Patrick McHardy wrote: Adrian Bunk wrote: On Sun, Dec 02, 2007 at 09:03:56PM +0100, Patrick McHardy wrote: For all I care binary modules can break, but frankly I don't see how encapsulating a couple of structures and pointers in a new structure and adding a new argument to existing functions shifts the decision about how a function should be usable to the namespace guys. IMO all functions should continue to be usable as before, as decided by whoever actually wrote them. ... Even ignoring the fact that it's unclear whether distributing modules with not GPLv2 compatible licences is legal at all or might bring you in jail, Agreed, lets ignore that :) your statement has an interesting implication: Stuff like e.g. the EXPORT_SYMBOL(sk_alloc) predates the EXPORT_SYMBOL_GPL stuff. Who is considered the author of this code? And when should he state whether he prefers to use EXPORT_SYMBOL_GPL but wasn't able to use it at that when he wrote it since his code predates it and is glad to be able to decide this now? He can state it when he feels like it, I don't see the point. Authors generally get to decide whether they use EXPORT_SYMBOL or EXPORT_SYMBOL_GPL unless in cases where its really clear-cut that EXPORT_SYMBOL is inapproriate. But thats a different matter. If a symbol was OK to be used previously and something using it would not automatically be considered a derived work, how does passing init_net to the function just to make the compiler happy, avoid BUG_ONs and generally keep things working as before make it more of a derived work? We, namely, Pavel Emelyanov and me, if we have some rights as a committers to this staff :), do not mind against change EXPORT_SYMBOL_GPL to EXPORT_SYMBOL. Regards, Den -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][RESEND] PHY: Add the phy_device_release device method.
In cases where more than a single PHY is found on the MDIO bus, the kernel will print a warning that this method is missing for each PHY device that is not attached to a networking device. Signed-off-by: Thierry Reding [EMAIL PROTECTED] --- drivers/net/phy/mdio_bus.c | 19 ++- 1 files changed, 18 insertions(+), 1 deletions(-) diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index fc2f0e6..cb7fb47 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -36,6 +36,23 @@ #include asm/uaccess.h /** + * phy_device_release - free a phy_device structure when all users of it are + * finished. + * + * @dev: device that's been disconnected + * + * Will be called only by the device core when all users of this phy_device + * are done. + */ +static void phy_device_release(struct device *dev) +{ + struct phy_device *phy; + + phy = to_phy_device(dev); + kfree(phy); +} + +/** * mdiobus_register - bring up all the PHYs on a given bus and attach them to bus * @bus: target mii_bus * @@ -83,6 +100,7 @@ int mdiobus_register(struct mii_bus *bus) if (phydev) { phydev-irq = bus-irq[i]; + phydev-dev.release = phy_device_release; phydev-dev.parent = bus-dev; phydev-dev.bus = mdio_bus_type; snprintf(phydev-dev.bus_id, BUS_ID_SIZE, PHY_ID_FMT, bus-id, i); @@ -112,7 +130,6 @@ void mdiobus_unregister(struct mii_bus *bus) for (i = 0; i PHY_MAX_ADDR; i++) { if (bus-phy_map[i]) { device_unregister(bus-phy_map[i]-dev); - kfree(bus-phy_map[i]); } } } signature.asc Description: Digital signature
Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
Hi Arnaldo, hank you for going through this. I have just backported your recent patches of 2.6.25 to the DCCP/CCID4/Faster Restart test tree at git://eden-feed.erg.abdn.ac.uk/dccp_exp {dccp,ccid4,dccp_fr} as per subsequent message. |do, so please consider moving DCCP discussion to netdev@vger.kernel.org, |where lots of smart networking folks are present and can help our efforts |on turning RFCs to code. Are you suggesting using netdev exclusively or in addition to [EMAIL PROTECTED] | Please take a look at this patch series where I reorganized your work on the new | TFRC rx history handling code. I'll wait for your considerations and then do as many | interactions as reasonable to get your work merged. | | It should be completely equivalent, plus some fixes and optimizations, such as: It will be necessary to address these points one-by-one. Before diving into making fixes and `optimisations', have you tested your code? The patches you are referring to have been posted and re-posted for a period of over 9 months on [EMAIL PROTECTED], and there are regression tests which show that this code improves on the existing Linux implementation. These are labelled as `test tree' on http://www.linux-foundation.org/en/Net:DCCP_Testing#Regression_testing So if you are making changes to the code, I would like to ask if you have run similar regression tests, to avoid having to step back later. | . The code that allocates the RX ring deals with failures when one of the entries in | the ring buffer is not successfully allocated, the original code was leaking the | successfully allocated entries. | | . We do just one allocation for the ring buffer, as the number of entries is fixed we | should just do one allocation and not TFRC_NDUPACK times. Will look at the first point in the patch; with regard to the second point I agree, this will make the code simpler, which is good. | . I haven't checked if all the code was commited, as I tried to introduce just what was | immediatelly used, probably we'll need to do some changes when working on the merge | of your loss intervals code. Sorry I don't understand this point. | . I changed the ccid3_hc_rx_packet_recv code to set hcrx-ccid3hcrx_s for the first | non-data packet instead of calling ccid3_hc_rx_set_state, that would use 0 as the | initial value in the EWMA calculation. This is a misunderstanding. Non-data packets are not considered in the moving average for the data packet size `s'; and it would be an error to do (consider 40byte Acks vs. 1460byte data packets, also it is in RFC 4342). Where would the zero initial value come from? I think this is also a misunderstanding. Please have a look below: static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) { // ... u32 sample, payload_size = skb-len - dccp_hdr(skb)-dccph_doff * 4; if (unlikely(hcrx-ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) { if (is_data_packet) { do_feedback = FBACK_INITIAL; ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); ccid3_hc_rx_update_s(hcrx, payload_size); } goto update_records; } == Non-data packets are ignored for the purposes of computing s (this is in the RFC), consequently update_s() is only called for data packets; using the two following functions: static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight) { return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval; } static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) { if (likely(len 0))/* don't update on empty packets (e.g. ACKs) */ hcrx-ccid3hcrx_s = tfrc_ewma(hcrx-ccid3hcrx_s, len, 9); } == Hence I can't see where a zero value should come from: ccid3hrx_s is initially initialised with zero (memset(...,0,...)); when first called, update_s() will feed a non-zero payload size to tfrc_ewma(), which will return `newval' = payload_size, hence the first data packet will contribute a non-zero payload_size. Zero-sized DCCP-Data packets are pathological and are ignored by the CCID calculations (not by the receiver); a corresponding counterpart for zero-sized | | It is available at: | | master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.25 | Need to do this separately. As said, the code has been developed and tested over a long time, it took a long while until it acted predictably, so being careful is very important. I would rather not have my patches merged and continue to run a test tree if the current changes alter the behaviour to the worse. -- To unsubscribe
Re: [PATCH (resubmit)] Fix inet_diag.ko register vs rcv race
Herbert Xu wrote: On Thu, Nov 29, 2007 at 04:01:25PM +0300, Pavel Emelyanov wrote: @@ -863,13 +861,13 @@ int inet_diag_register(const struct inet_diag_handler *h) if (type = INET_DIAG_GETSOCK_MAX) goto out; -spin_lock(inet_diag_register_lock); +mutex_lock(inet_diag_mutex); err = -EEXIST; if (inet_diag_table[type] == NULL) { inet_diag_table[type] = h; err = 0; } -spin_unlock(inet_diag_register_lock); +mutex_unlock(inet_diag_mutex); Actually this causes a dead-lock when the handlers are built as modules because we try to load them with that mutex held. Ouch! Sorry, I didn't notice this :( I've fixed it with this patch on top. Thanks! Cheers, Pavel -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6.25 1/6][CORE] Remove unneeded ifdefs from sysctl_net_core.c
Eric W. Biederman wrote: Pavel Emelyanov [EMAIL PROTECTED] writes: They include the whole file, but it is already compiled out when SYSCTL=n, since it is obj-$(CONFIG_SYSCTL) target in the Makefile. Pavel thanks for sending these patches. Might I ask for some level of acknowledgement when you rework one of my patches and send it off. Sure. I though I've been doing exactly this thing. I've said that unix ctls were from your tree and so on. I suppose this could be a case of duplicate thinking but this patch looks very familiar. Hm... Do you mean that this one is from your tree? Sorry then, I didn't know this. I can resend it with From: you if you wish. Eric Thanks, Pavel Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index 113cc72..277c8fa 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -13,8 +13,6 @@ #include net/sock.h #include net/xfrm.h -#ifdef CONFIG_SYSCTL - ctl_table core_table[] = { #ifdef CONFIG_NET { @@ -151,5 +149,3 @@ ctl_table core_table[] = { }, { .ctl_name = 0 } }; - -#endif -- 1.5.3.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: namespace support requires network modules to say GPL
On Sat, 2007-12-01 at 22:34 -0500, Mark Lord wrote: Stephen Hemminger wrote:. I spoke too soon earlier, ndiswrapper builds and loads against current 2.6.24-rc3. Vmware and proprietary VPN software probably do not. Once again I don't give a damn, but the enterprise distro vendors certainly care. ... Naw, enterprise (or any other) distro vendors shouldn't have any issues here, since they can just patch their kernels around any issues. Please pardon me for jumping in; I am not a kernel developer, but I try to help with debugging whenever I can (and it's not just hand-waving, I helped to track down a couple of nasty bugs on MMC or ACPI EC, recently). And I am an engineer and IANAL, so I wouldn't speak about laws here. But I think it's not just a distribution's problem. Unfortunately, I need VMware and ndiswrapper to get work done with my laptop. It's not the perfect world, but the only alternative is to boot in XP. So I normally stick with vendors kernels and, when I have time to play with new kernel, I go for it. If ndiswrapper and VMware work, perfect, I can test extensively the new kernel; if I find problems, I *know* I have to restart without proprietary modules, try to reproduce, report back. I did it a lot of times. What I think is that every time VMware or (worst) ndiswrapper breaks, the kernel loose an awful lot of testers. In the span of time before Giri and the VMware team post a patch (-rc1 and -rc4, tipically), my testing activity is just occasional. And I guess a lot of people is in the same situation. These are just my 2cents. I will continue to test new kernels every time I can, and to use native solutions as often as I can (go, ath5k, go!; and LabWindows/CVI for Linux, anyone?). But maybe a bit of tolerance can help everyone... Back in my hole, Romano -- Sorry for the disclaimer --- ¡I cannot stop it! -- La presente comunicación tiene carácter confidencial y es para el exclusivo uso del destinatario indicado en la misma. Si Ud. no es el destinatario indicado, le informamos que cualquier forma de distribución, reproducción o uso de esta comunicación y/o de la información contenida en la misma están estrictamente prohibidos por la ley. Si Ud. ha recibido esta comunicación por error, por favor, notifíquelo inmediatamente al remitente contestando a este mensaje y proceda a continuación a destruirlo. Gracias por su colaboración. This communication contains confidential information. It is for the exclusive use of the intended addressee. If you are not the intended addressee, please note that any form of distribution, copying or use of this communication or the information in it is strictly prohibited by law. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy this message. Thank you for your cooperation. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SSB: No is not an answer
On 01-12-2007 21:27, Michael Buesch wrote: On Saturday 01 December 2007 20:00:23 Arnaldo Carvalho de Melo wrote: Em Sat, Dec 01, 2007 at 12:45:32PM -0500, John W. Linville escreveu: On Sat, Dec 01, 2007 at 03:17:44PM -0200, Arnaldo Carvalho de Melo wrote: Sonics Silicon Backplane support (SSB) [M/y/?] (NEW) n Support for the Sonics Silicon Backplane bus. You only need to enable this option, if you are configuring a kernel for an embedded system with this bus. It will be auto-selected if needed in other environments. The module will be called ssb. If unsure, say N. Sonics Silicon Backplane support (SSB) [M/y/?] (NEW) I think this is OK -- it isn't really offering the choice to say no anyway. You must have turned-on B44 or B43(LEGACY) already? So, your choice is merely whether to have it built-in or as a module. Ok, so the comment on being unsure is wrong as we can't say N as suggested :-) Oh, come on... Read the _whole_ comment. Why? IMHO If unsure... are kind of defaults just to let you not read or understand the whole comment. But here even reading the whole doesn't make this less amusing, because it suggests you still have the choice (eg. to revert auto-selection). Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH]: fix lro_gen_skb() alignment
On Sun, Dec 02, 2007 at 07:07:26PM -0500, Andrew Gallatin wrote: That was my first thought as well, but it turns out that when lro_gen_skb() is called via the out1 label, mac_hdr_len may not be known. It seemed simplest and cleanest to just make it a field in lro_mgr. Fair enough. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] [IPSEC]: Reinject packet instead of calling netfilter directly on input
On Thu, Nov 29, 2007 at 03:49:34PM -0500, jamal wrote: Herbert, This is a simplified version of one of your earlier patches that never made it in. I liked it so much that i reduced it to this and infact given the cycles today, tested it (with transport and tunnel mode only;-). Sorry for the late response Jamal. I've been too busy to give this issue proper thought. It's still in my mailbox so I will respond to it once things quiten down a little. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] netem: trace enhancement
Patrick McHardy wrote: Ariane Keller wrote: Thanks for your comments! I'd like to better understand your dislike of the current implementation of the data transfer from user space to kernel space. Is it the fact that we use configfs? I think, we had already a discussion about this (and we changed from procfs to configfs). Or don't you like that we need a user space daemon which is responsible for feeding the data to the kernel module? I think we do not have another option, since the trace file may be of arbitrary length. Or anything else? I dislike using anything besides rtnetlink for qdisc configuration. The only way to transfer arbitary amounts of data over netlink would be to spread the data over multiple messages. But then again, you're using kmalloc and only seem to allocate 4k, so how large are these traces in practice? For each packet to be processed there is 32bit of data, which encodes delay and drop, duplicate etc. The size of the actual trace file can therefore reach any length, depending on for how many packets the information is encoded (up to several GB). Therefore we send the trace file in chunks of 4000bytes to the kernel. In order to have always a packet-delay-value ready, we maintain two delay queues in the kernel (each of 4k). In a first step, both queues are filled, and the values are read from the first queue, if this queue is finished, we read values from the second queue and fill the first queue with new values from the trace file etc. Therefore we have a user space process running, which reads the values from the trace file, and sends them to the kernel. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2.6.25] net: removes unnecessary dependencies for net_namespace.h
This patch removes some unneeded includes for net_namespace.h to speed up compilation. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index f285de6..28b7f25 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -2,7 +2,6 @@ #define __NET_PKT_CLS_H #include linux/pkt_cls.h -#include net/net_namespace.h #include net/sch_generic.h #include net/act_api.h diff --git a/include/net/sock.h b/include/net/sock.h index 43e3cd9..a04e361 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -57,7 +57,6 @@ #include asm/atomic.h #include net/dst.h #include net/checksum.h -#include net/net_namespace.h /* * This structure really needs to be cleaned up. @@ -95,6 +94,7 @@ typedef struct { struct sock; struct proto; +struct net; /** * struct sock_common - minimal network layer representation of sockets -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e1000 driver problems
On Tue, Nov 27, 2007 at 10:23:00AM -0800, Kok, Auke wrote: can you see if your problem goes away with this patch? I cannot test it right now but friend of mine has the same card with 2.6.23.1 kernel. it does not. he also tried module 7.6.12 from source fourge, your patch did not help. Moreover, it just hangs network connections after while. No message in dmesg about hangup. -- Lukáš Hejtmánek -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][IPVS] Don't leak sysctl tables if the scheduler registration fails
In case we load lblc or lblcr module we can leak some sysctl tables if the call to register_ip_vs_scheduler() fails. I've looked at the register_ip_vs_scheduler() code and saw, that the only reason to fail is the name collision, so I think that with some 3rd party schedulers this becomes a relevant issue. No? Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- diff --git a/net/ipv4/ipvs/ip_vs_lblc.c b/net/ipv4/ipvs/ip_vs_lblc.c index b843a11..ad89644 100644 --- a/net/ipv4/ipvs/ip_vs_lblc.c +++ b/net/ipv4/ipvs/ip_vs_lblc.c @@ -580,9 +580,14 @@ static struct ip_vs_scheduler ip_vs_lblc_scheduler = static int __init ip_vs_lblc_init(void) { + int ret; + INIT_LIST_HEAD(ip_vs_lblc_scheduler.n_list); sysctl_header = register_sysctl_table(lblc_root_table); - return register_ip_vs_scheduler(ip_vs_lblc_scheduler); + ret = register_ip_vs_scheduler(ip_vs_lblc_scheduler); + if (ret) + unregister_sysctl_table(sysctl_header); + return ret; } diff --git a/net/ipv4/ipvs/ip_vs_lblcr.c b/net/ipv4/ipvs/ip_vs_lblcr.c index e5b323a..2a5ed85 100644 --- a/net/ipv4/ipvs/ip_vs_lblcr.c +++ b/net/ipv4/ipvs/ip_vs_lblcr.c @@ -769,9 +769,14 @@ static struct ip_vs_scheduler ip_vs_lblcr_scheduler = static int __init ip_vs_lblcr_init(void) { + int ret; + INIT_LIST_HEAD(ip_vs_lblcr_scheduler.n_list); sysctl_header = register_sysctl_table(lblcr_root_table); - return register_ip_vs_scheduler(ip_vs_lblcr_scheduler); + ret = register_ip_vs_scheduler(ip_vs_lblcr_scheduler); + if (ret) + unregister_sysctl_table(sysctl_header); + return ret; } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] SMC911X: Fix using of dereferenced skb after netif_rx
Herbert Xu said the following on 2007-12-3 18:11: Please send driver patches to Jeff Garzik [EMAIL PROTECTED]. Sorry. Resend the patch. After netif_rx(), skb will be freed. So get the skb-len before netif_rx(). Signed-off-by: Wang Chen [EMAIL PROTECTED] --- smc911x.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) --- linux-2.6.24.rc3.org/drivers/net/smc911x.c 2007-11-19 12:38:05.0 +0800 +++ linux-2.6.24.rc3/drivers/net/smc911x.c 2007-11-30 15:00:53.0 +0800 @@ -1287,7 +1287,7 @@ smc911x_rx_dma_irq(int dma, void *data) struct smc911x_local *lp = netdev_priv(dev); struct sk_buff *skb = lp-current_rx_skb; unsigned long flags; - unsigned int pkts; + unsigned int pkts, len; DBG(SMC_DEBUG_FUNC, %s: -- %s\n, dev-name, __FUNCTION__); DBG(SMC_DEBUG_RX | SMC_DEBUG_DMA, %s: RX DMA irq handler\n, dev-name); @@ -1299,9 +1299,10 @@ smc911x_rx_dma_irq(int dma, void *data) PRINT_PKT(skb-data, skb-len); dev-last_rx = jiffies; skb-protocol = eth_type_trans(skb, dev); + len = skb-len; netif_rx(skb); dev-stats.rx_packets++; - dev-stats.rx_bytes += skb-len; + dev-stats.rx_bytes += len; spin_lock_irqsave(lp-lock, flags); pkts = (SMC_GET_RX_FIFO_INF() RX_FIFO_INF_RXSUSED_) 16; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] SMC911X: Fix using of dereferenced skb after netif_rx
On Mon, Dec 03, 2007 at 03:59:09PM +0800, Wang Chen wrote: After netif_rx(), skb will be freed. So get the skb-len before netif_rx(). Signed-off-by: Wang Chen [EMAIL PROTECTED] Please send driver patches to Jeff Garzik [EMAIL PROTECTED]. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][IPVS] Fix sched registration race when checking for name collision
The register_ip_vs_scheduler() checks for the scheduler with the same name under the read-locked __ip_vs_sched_lock, then drops, takes it for writing and puts the scheduler in list. This is racy, since we can have a race window between the lock being re-locked for writing. The fix is to search the scheduler with the given name right under the write-locked __ip_vs_sched_lock. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- diff --git a/net/ipv4/ipvs/ip_vs_sched.c b/net/ipv4/ipvs/ip_vs_sched.c index 1602304..4322358 100644 --- a/net/ipv4/ipvs/ip_vs_sched.c +++ b/net/ipv4/ipvs/ip_vs_sched.c @@ -183,19 +183,6 @@ int register_ip_vs_scheduler(struct ip_vs_scheduler *scheduler) /* increase the module use count */ ip_vs_use_count_inc(); - /* -* Make sure that the scheduler with this name doesn't exist -* in the scheduler list. -*/ - sched = ip_vs_sched_getbyname(scheduler-name); - if (sched) { - ip_vs_scheduler_put(sched); - ip_vs_use_count_dec(); - IP_VS_ERR(register_ip_vs_scheduler(): [%s] scheduler - already existed in the system\n, scheduler-name); - return -EINVAL; - } - write_lock_bh(__ip_vs_sched_lock); if (scheduler-n_list.next != scheduler-n_list) { @@ -207,6 +194,20 @@ int register_ip_vs_scheduler(struct ip_vs_scheduler *scheduler) } /* +* Make sure that the scheduler with this name doesn't exist +* in the scheduler list. +*/ + list_for_each_entry(sched, ip_vs_schedulers, n_list) { + if (strcmp(scheduler-name, sched-name) == 0) { + write_unlock_bh(__ip_vs_sched_lock); + ip_vs_use_count_dec(); + IP_VS_ERR(register_ip_vs_scheduler(): [%s] scheduler + already existed in the system\n, + scheduler-name); + return -EINVAL; + } + } + /* * Add it into the d-linked scheduler list */ list_add(scheduler-n_list, ip_vs_schedulers); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] sungem: fix napi regression with reset work
On Thu, 2007-11-29 at 21:57 +1100, Herbert Xu wrote: On Mon, Nov 26, 2007 at 09:02:08PM +0100, Johannes Berg wrote: sungem's gem_reset_task() will unconditionally try to disable NAPI even when it's called while the interface is not operating and hence the NAPI struct isn't enabled. Make napi_disable() depend on gp-running. Also removes a superfluous test of gp-running in the same function. Signed-off-by: Johannes Berg [EMAIL PROTECTED] Patch applied to net-2.6. Thanks Johannes! Thanks. I just noticed that I made a mistake in the patch description, it should read Make napi_disable() depend on gp-opened. I don't think it matters much but wanted to at least mention it. johannes signature.asc Description: This is a digitally signed message part
Re: [PATCH net-2.6.25 1/6][CORE] Remove unneeded ifdefs from sysctl_net_core.c
Pavel Emelyanov [EMAIL PROTECTED] writes: Eric W. Biederman wrote: Pavel Emelyanov [EMAIL PROTECTED] writes: They include the whole file, but it is already compiled out when SYSCTL=n, since it is obj-$(CONFIG_SYSCTL) target in the Makefile. Pavel thanks for sending these patches. Might I ask for some level of acknowledgement when you rework one of my patches and send it off. Sure. I though I've been doing exactly this thing. I've said that unix ctls were from your tree and so on. Ok. I guess I must have overlooked that attribution. I suppose this could be a case of duplicate thinking but this patch looks very familiar. Hm... Do you mean that this one is from your tree? Sorry then, I didn't know this. I can resend it with From: you if you wish. It is. But I don't see any of this as needing a resend. Please just don't forget me in the future is all I ask. Eric -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] [IrDA] irlmp_unregister_link needs to free lsaps
While testing the mcs7780 based IrDA USB dongle I've stumbled upon memory leak in irlmp_unregister_link(). Hashbin for lsaps is created in irlmp_register_link and should probably be freed in irlmp_unregister_link(). Signed-off-by: Hinko Kocevar [EMAIL PROTECTED] Signed-off-by: Samuel Ortiz [EMAIL PROTECTED] --- net/irda/irlmp.c |1 + 1 file changed, 1 insertion(+) Index: net-2.6/net/irda/irlmp.c === --- net-2.6.orig/net/irda/irlmp.c 2007-11-25 05:54:02.0 +0100 +++ net-2.6/net/irda/irlmp.c2007-11-25 07:12:13.0 +0100 @@ -353,6 +353,7 @@ /* Final cleanup */ del_timer(link-idle_timer); link-magic = 0; + hashbin_delete(link-lsaps, (FREE_FUNC) __irlmp_close_lsap); kfree(link); } } -- -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [1/4] dst: Distributed storage documentation.
Hi Matt. On Sun, Dec 02, 2007 at 10:50:59PM -0600, Matt Mackall ([EMAIL PROTECTED]) wrote: Distributed storage documentation. Algorithms used in the system, userspace interfaces (sysfs dirs and files), design and implementation details are described here. Can you give us a summary of how this differs from using device mapper with NBD? From the higher point ov view it does not, but it operates quite differently: it has async processing of the requests, thus not blocking, it has different protocol with smaller overhead, supports strong checksums, has in-kernel export server, which supports simple security attributes (i.e. allow to connect, to read or write). It uses smaller amount of memory (zero additional allocations in the common path for linear mapping, not including network allocations, it uses smaller amount of additional allocations for mirroring case). DST supports failure recovery in case of dropped connection (core will reconnect to the remote node when it is ready), thus it is possible to turn off and on remote nodes without special administration steps. DST has simple autoconfiguration at the startup time (support checksums and storage size autonegotiation). It is possible to turn one of the mirror nodes off and use it as a offline backup, since dst mirror node stores data at the end of the storage, so it can be mounted locally. -- Evgeniy Polyakov -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-2.6.25] TCP: Fix copy-paste (or code move) error
On Sun, Dec 02, 2007 at 06:45:44PM +0200, Ilpo Järvinen wrote: Should get the skb from the same queue. I had it first elsewhere and missed this change while moving. Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED] Applied to net-2.6.25. Thanks Ilpo! -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6.25 1/2][IPV6] Make the ipv6/sysctl_net_ipv6.c compilation cleaner
Since this file is enclosed with the #ifdef CONFIG_SYSCTL/#endif pair, it's OK to move this CONFIG_ into a Makefile. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile index 5ffa980..24f3aa0 100644 --- a/net/ipv6/Makefile +++ b/net/ipv6/Makefile @@ -8,9 +8,9 @@ ipv6-objs :=af_inet6.o anycast.o ip6_output.o ip6_input.o addrconf.o \ addrlabel.o \ route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \ raw.o protocol.o icmp.o mcast.o reassembly.o tcp_ipv6.o \ - exthdrs.o sysctl_net_ipv6.o datagram.o \ - ip6_flowlabel.o inet6_connection_sock.o + exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o +ipv6-$(CONFIG_SYSCTL) = sysctl_net_ipv6.o ipv6-$(CONFIG_XFRM) += xfrm6_policy.o xfrm6_state.o xfrm6_input.o \ xfrm6_output.o ipv6-$(CONFIG_NETFILTER) += netfilter.o diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c index 68bb254..227efa7 100644 --- a/net/ipv6/sysctl_net_ipv6.c +++ b/net/ipv6/sysctl_net_ipv6.c @@ -14,8 +14,6 @@ #include net/addrconf.h #include net/inet_frag.h -#ifdef CONFIG_SYSCTL - static ctl_table ipv6_table[] = { { .ctl_name = NET_IPV6_ROUTE, @@ -115,8 +113,3 @@ void ipv6_sysctl_unregister(void) { unregister_sysctl_table(ipv6_sysctl_header); } - -#endif /* CONFIG_SYSCTL */ - - - -- 1.5.3.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-2.6.25 2/2][IPV6] Use sysctl paths to register ipv6 sysctl tables
This makes the ipv6.ko smaller and creates the ground needed for net namespaces support in ipv6.ko ssctls. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c index 227efa7..0b5bec3 100644 --- a/net/ipv6/sysctl_net_ipv6.c +++ b/net/ipv6/sysctl_net_ipv6.c @@ -82,31 +82,17 @@ static ctl_table ipv6_table[] = { { .ctl_name = 0 } }; -static struct ctl_table_header *ipv6_sysctl_header; - -static ctl_table ipv6_net_table[] = { - { - .ctl_name = NET_IPV6, - .procname = ipv6, - .mode = 0555, - .child = ipv6_table - }, - { .ctl_name = 0 } +static struct ctl_path ipv6_ctl_path[] = { + { .procname = net, .ctl_name = CTL_NET, }, + { .procname = ipv6, .ctl_name = NET_IPV6, }, + { }, }; -static ctl_table ipv6_root_table[] = { - { - .ctl_name = CTL_NET, - .procname = net, - .mode = 0555, - .child = ipv6_net_table - }, - { .ctl_name = 0 } -}; +static struct ctl_table_header *ipv6_sysctl_header; void ipv6_sysctl_register(void) { - ipv6_sysctl_header = register_sysctl_table(ipv6_root_table); + ipv6_sysctl_header = register_sysctl_paths(ipv6_ctl_path, ipv6_table); } void ipv6_sysctl_unregister(void) -- 1.5.3.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC][Resend][PATCH 0/7]: CCID3/TFRC RX history and loss intervals implementation
I am re-submitting the original patch set which Arnaldo referred to, due to: 1/ It has been reworked to patch/compile against the 2.6.25 tree, which lead to a few changes (not conceptual in nature). 2/ Patch #7 in Arnaldo's set says that this patch set introduced some bugs. Please can you point out if and what those are, so that they can be fixed (just saying that there are bugs without saying where they are is not very helpful). Overview: - Patch #1: Provides a central module source file for dccp_tfrc_lib. Patch #2: Implements a new (reduced) RX history interface. Patch #3: Hooks CCID3 up with the new RX history interface. Patch #4: Removes the old/previous RX history interface. Patch #5: Ringbuffer-based loss intervals database. Patch #6: Interface CCID3 with loss intervals database. The algorithms that are implemented here have been documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/ -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/6] [TFRC]: Provide central source file and debug facility
This patch changes the tfrc_lib module in the following manner: (1) a dedicated tfrc_module source file (this is later populated with TX/RX hist and LI Database routines); (2) a dedicated tfrc_pr_debug macro with toggle switch `tfrc_debug'. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] Signed-off-by: Ian McDonald [EMAIL PROTECTED] --- net/dccp/ccids/Kconfig | 12 ++--- net/dccp/ccids/lib/Makefile |3 +- net/dccp/ccids/lib/loss_interval.c |2 +- net/dccp/ccids/lib/packet_history.c | 28 ++--- net/dccp/ccids/lib/packet_history.h |3 +- net/dccp/ccids/lib/tfrc.h | 17 ++--- net/dccp/ccids/lib/tfrc_module.c| 45 +++ 7 files changed, 73 insertions(+), 37 deletions(-) create mode 100644 net/dccp/ccids/lib/tfrc_module.c diff --git a/net/dccp/ccids/Kconfig b/net/dccp/ccids/Kconfig index 3d7d867..a0c42d7 100644 --- a/net/dccp/ccids/Kconfig +++ b/net/dccp/ccids/Kconfig @@ -63,10 +63,6 @@ config IP_DCCP_CCID3 If in doubt, say M. -config IP_DCCP_TFRC_LIB - depends on IP_DCCP_CCID3 - def_tristate IP_DCCP_CCID3 - config IP_DCCP_CCID3_DEBUG bool CCID3 debugging messages depends on IP_DCCP_CCID3 @@ -110,5 +106,13 @@ config IP_DCCP_CCID3_RTO is serious network congestion: experimenting with larger values should therefore not be performed on WANs. +# The TFRC Library: currently only has CCID 3 as customer +config IP_DCCP_TFRC_LIB + depends on IP_DCCP_CCID3 + def_tristate IP_DCCP_CCID3 + +config IP_DCCP_TFRC_DEBUG + bool + default y if IP_DCCP_CCID3_DEBUG endmenu diff --git a/net/dccp/ccids/lib/Makefile b/net/dccp/ccids/lib/Makefile index 5f940a6..1295635 100644 --- a/net/dccp/ccids/lib/Makefile +++ b/net/dccp/ccids/lib/Makefile @@ -1,3 +1,4 @@ obj-$(CONFIG_IP_DCCP_TFRC_LIB) += dccp_tfrc_lib.o -dccp_tfrc_lib-y := loss_interval.o packet_history.o tfrc_equation.o +dccp_tfrc_lib-y := tfrc_module.otfrc_equation.o \ + packet_history.o loss_interval.o diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c index f2ca4eb..4fe3c1c 100644 --- a/net/dccp/ccids/lib/loss_interval.c +++ b/net/dccp/ccids/lib/loss_interval.c @@ -285,7 +285,7 @@ int __init dccp_li_init(void) return dccp_li_cachep == NULL ? -ENOBUFS : 0; } -void dccp_li_exit(void) +void __exit dccp_li_exit(void) { if (dccp_li_cachep != NULL) { kmem_cache_destroy(dccp_li_cachep); diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index 4805de9..ebcb6c0 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -35,7 +35,6 @@ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ -#include linux/module.h #include linux/string.h #include packet_history.h @@ -277,39 +276,18 @@ void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list) EXPORT_SYMBOL_GPL(dccp_rx_hist_purge); -extern int __init dccp_li_init(void); -extern void dccp_li_exit(void); - -static __init int packet_history_init(void) +int __init packet_history_init(void) { - if (dccp_li_init() != 0) - goto out; - tfrc_tx_hist = kmem_cache_create(tfrc_tx_hist, sizeof(struct tfrc_tx_hist_entry), 0, SLAB_HWCACHE_ALIGN, NULL); - if (tfrc_tx_hist == NULL) - goto out_li_exit; - - return 0; -out_li_exit: - dccp_li_exit(); -out: - return -ENOBUFS; + return tfrc_tx_hist == NULL ? -ENOBUFS : 0; } -module_init(packet_history_init); -static __exit void packet_history_exit(void) +void __exit packet_history_exit(void) { if (tfrc_tx_hist != NULL) { kmem_cache_destroy(tfrc_tx_hist); tfrc_tx_hist = NULL; } - dccp_li_exit(); } -module_exit(packet_history_exit); - -MODULE_AUTHOR(Ian McDonald [EMAIL PROTECTED], - Arnaldo Carvalho de Melo [EMAIL PROTECTED]); -MODULE_DESCRIPTION(DCCP TFRC library); -MODULE_LICENSE(GPL); diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h index 0670f46..9a2642e 100644 --- a/net/dccp/ccids/lib/packet_history.h +++ b/net/dccp/ccids/lib/packet_history.h @@ -39,8 +39,7 @@ #include linux/ktime.h #include linux/list.h #include linux/slab.h - -#include ../../dccp.h +#include tfrc.h /* Number of later packets received before one is considered lost */ #define TFRC_RECV_NUM_LATE_LOSS 3 diff --git a/net/dccp/ccids/lib/tfrc.h b/net/dccp/ccids/lib/tfrc.h index 5a0ba86..ab8848c 100644 --- a/net/dccp/ccids/lib/tfrc.h +++ b/net/dccp/ccids/lib/tfrc.h @@ -3,10 +3,11 @@ /* * net/dccp/ccids/lib/tfrc.h * - * Copyright (c) 2005 The University of Waikato, Hamilton, New Zealand. - * Copyright (c) 2005 Ian McDonald [EMAIL
[PATCH 6/6] [CCID]: Interface CCID3 code with newer Loss Intervals Database
This hooks up the TFRC Loss Interval database with CCID 3 packet reception. In addition, it makes the CCID-specific computation of the first loss interval (which requires access to all the guts of CCID3) local to ccid3.c. The patch also fixes an omission in the DCCP code, that of a default / fallback RTT value (defined in section 3.4 of RFC 4340 as 0.2 sec); while at it, the upper bound of 4 seconds for an RTT sample has been reduced to match the initial TCP RTO value of 3 seconds from[RFC 1122, 4.2.3.1]. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] Signed-off-by: Ian McDonald [EMAIL PROTECTED] --- net/dccp/ccids/ccid3.c | 76 +++ net/dccp/ccids/ccid3.h |8 ++-- net/dccp/dccp.h|7 +++- 3 files changed, 72 insertions(+), 19 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index ef243dc..bf6d043 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -34,11 +34,7 @@ * along with this program; if not, write to the Free Software * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ -#include ../ccid.h #include ../dccp.h -#include lib/packet_history.h -#include lib/loss_interval.h -#include lib/tfrc.h #include ccid3.h #include asm/unaligned.h @@ -751,6 +747,46 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb) return 0; } +/** ccid3_first_li - Implements [RFC 3448, 6.3.1] + * + * Determine the length of the first loss interval via inverse lookup. + * Assume that X_recv can be computed by the throughput equation + * s + * X_recv = + * R * fval + * Find some p such that f(p) = fval; return 1/p (scaled). + */ +static u32 ccid3_first_li(struct sock *sk) +{ + struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); + u32 x_recv, p, delta; + u64 fval; + + if (hcrx-ccid3hcrx_rtt == 0) { + DCCP_WARN(No RTT estimate available, using fallback RTT\n); + hcrx-ccid3hcrx_rtt = DCCP_FALLBACK_RTT; + } + + delta = ktime_to_us(net_timedelta(hcrx-ccid3hcrx_last_feedback)); + x_recv = scaled_div32(hcrx-ccid3hcrx_bytes_recv, delta); + if (x_recv == 0) { /* would also trigger divide-by-zero */ + DCCP_WARN(X_recv==0\n); + if ((x_recv = hcrx-ccid3hcrx_x_recv) == 0) { + DCCP_BUG(stored value of X_recv is zero); + return ~0U; + } + } + + fval = scaled_div(hcrx-ccid3hcrx_s, hcrx-ccid3hcrx_rtt); + fval = scaled_div32(fval, x_recv); + p = tfrc_calc_x_reverse_lookup(fval); + + ccid3_pr_debug(%s(%p), receive rate=%u bytes/s, implied + loss rate=%u\n, dccp_role(sk), sk, x_recv, p); + + return p == 0 ? ~0U : scaled_div(1, p); +} + static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); @@ -779,6 +815,14 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) /* * Handle pending losses and otherwise check for new loss */ + if (tfrc_rx_loss_pending(hcrx-ccid3hcrx_hist) + tfrc_rx_handle_loss(hcrx-ccid3hcrx_hist, + hcrx-ccid3hcrx_li_hist, + skb, ndp, ccid3_first_li, sk) ) { + do_feedback = FBACK_PARAM_CHANGE; + goto done_receiving; + } + if (tfrc_rx_new_loss_indicated(hcrx-ccid3hcrx_hist, skb, ndp)) goto update_records; @@ -788,13 +832,23 @@ static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) if (unlikely(!is_data_packet)) goto update_records; - if (list_empty(hcrx-ccid3hcrx_li_hist)) { /* no loss so far: p = 0 */ - + if (! tfrc_lh_is_initialised(hcrx-ccid3hcrx_li_hist)) { + /* +* Empty loss history: no loss so far, hence p stays 0. +* Sample RTT values, since an RTT estimate is required for the +* computation of p when the first loss occurs; RFC 3448, 6.3.1. +*/ sample = tfrc_rx_sample_rtt(hcrx-ccid3hcrx_hist, skb); if (sample != 0) hcrx-ccid3hcrx_rtt = tfrc_ewma(hcrx-ccid3hcrx_rtt, sample, 9); + } else if (tfrc_lh_update_i_mean(hcrx-ccid3hcrx_li_hist, skb)) { + /* +* Step (3) of [RFC 3448, 6.1]: Recompute I_mean and, if I_mean +* has decreased (resp. p has increased), send feedback now. +*/ + do_feedback = FBACK_PARAM_CHANGE; } /* check if the periodic once-per-RTT feedback is due; RFC 4342, 10.3 */ @@ -816,10 +870,8 @@ static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) ccid3_pr_debug(entry\n);
[PATCH 4/6] [TFRC]: Remove old RX history interface
Signed-off-by: Gerrit Renker [EMAIL PROTECTED] Signed-off-by: Ian McDonald [EMAIL PROTECTED] --- net/dccp/ccids/lib/loss_interval.c |9 ++ net/dccp/ccids/lib/packet_history.c | 162 --- net/dccp/ccids/lib/packet_history.h | 84 -- 3 files changed, 9 insertions(+), 246 deletions(-) diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c index 4fe3c1c..0ebdc67 100644 --- a/net/dccp/ccids/lib/loss_interval.c +++ b/net/dccp/ccids/lib/loss_interval.c @@ -129,6 +129,12 @@ static u32 dccp_li_calc_first_li(struct sock *sk, u16 s, u32 bytes_recv, u32 previous_x_recv) { + /* +* XXX This function still relies on the old RX interface and thus can not be +* kept. But it can also not be removed since the loss interval code calls it. +* Since it will be replaced anyway, comment it out for this moment. +*/ +#ifdef __THIS_IS_RESOLVED_IN_NEXT_PATCH__ struct dccp_rx_hist_entry *entry, *next, *tail = NULL; u32 x_recv, p; suseconds_t rtt, delta; @@ -220,6 +226,9 @@ found: return ~0; else return 100 / p; +#else + return ~0; +#endif } void dccp_li_update_li(struct sock *sk, diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index 2f9e0a0..e87e445 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -129,156 +129,6 @@ EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt); /* * Receiver History Routines */ -struct dccp_rx_hist *dccp_rx_hist_new(const char *name) -{ - struct dccp_rx_hist *hist = kmalloc(sizeof(*hist), GFP_ATOMIC); - static const char dccp_rx_hist_mask[] = rx_hist_%s; - char *slab_name; - - if (hist == NULL) - goto out; - - slab_name = kmalloc(strlen(name) + sizeof(dccp_rx_hist_mask) - 1, - GFP_ATOMIC); - if (slab_name == NULL) - goto out_free_hist; - - sprintf(slab_name, dccp_rx_hist_mask, name); - hist-dccprxh_slab = kmem_cache_create(slab_name, -sizeof(struct dccp_rx_hist_entry), -0, SLAB_HWCACHE_ALIGN, NULL); - if (hist-dccprxh_slab == NULL) - goto out_free_slab_name; -out: - return hist; -out_free_slab_name: - kfree(slab_name); -out_free_hist: - kfree(hist); - hist = NULL; - goto out; -} - -EXPORT_SYMBOL_GPL(dccp_rx_hist_new); - -void dccp_rx_hist_delete(struct dccp_rx_hist *hist) -{ - const char* name = kmem_cache_name(hist-dccprxh_slab); - - kmem_cache_destroy(hist-dccprxh_slab); - kfree(name); - kfree(hist); -} - -EXPORT_SYMBOL_GPL(dccp_rx_hist_delete); - -int dccp_rx_hist_find_entry(const struct list_head *list, const u64 seq, - u8 *ccval) -{ - struct dccp_rx_hist_entry *packet = NULL, *entry; - - list_for_each_entry(entry, list, dccphrx_node) - if (entry-dccphrx_seqno == seq) { - packet = entry; - break; - } - - if (packet) - *ccval = packet-dccphrx_ccval; - - return packet != NULL; -} - -EXPORT_SYMBOL_GPL(dccp_rx_hist_find_entry); -struct dccp_rx_hist_entry * - dccp_rx_hist_find_data_packet(const struct list_head *list) -{ - struct dccp_rx_hist_entry *entry, *packet = NULL; - - list_for_each_entry(entry, list, dccphrx_node) - if (entry-dccphrx_type == DCCP_PKT_DATA || - entry-dccphrx_type == DCCP_PKT_DATAACK) { - packet = entry; - break; - } - - return packet; -} - -EXPORT_SYMBOL_GPL(dccp_rx_hist_find_data_packet); - -void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist, - struct list_head *rx_list, - struct list_head *li_list, - struct dccp_rx_hist_entry *packet, - u64 nonloss_seqno) -{ - struct dccp_rx_hist_entry *entry, *next; - u8 num_later = 0; - - list_add(packet-dccphrx_node, rx_list); - - num_later = TFRC_RECV_NUM_LATE_LOSS + 1; - - if (!list_empty(li_list)) { - list_for_each_entry_safe(entry, next, rx_list, dccphrx_node) { - if (num_later == 0) { - if (after48(nonloss_seqno, - entry-dccphrx_seqno)) { - list_del_init(entry-dccphrx_node); - dccp_rx_hist_entry_delete(hist, entry); - } - } else if (dccp_rx_hist_entry_data_packet(entry)) - --num_later; -
[PATCH 2/6] [TFRC]: New RX history implementation
This provides a new, self-contained and generic RX history service for TFRC based protocols. Details: * new data structure, initialisation and cleanup routines; * allocation of dccp_rx_hist entries local to packet_history.c, as a service exported by the dccp_tfrc_lib module. * interface to automatically track highest-received seqno; * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1); * a generic function to test for `data packets' as per RFC 4340, sec. 7.7. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] Signed-off-by: Ian McDonald [EMAIL PROTECTED] --- net/dccp/ccids/lib/packet_history.c | 120 ++--- net/dccp/ccids/lib/packet_history.h | 143 ++- net/dccp/ccids/lib/tfrc_module.c| 27 ++- net/dccp/dccp.h | 12 +++ 4 files changed, 281 insertions(+), 21 deletions(-) diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c index ebcb6c0..2f9e0a0 100644 --- a/net/dccp/ccids/lib/packet_history.c +++ b/net/dccp/ccids/lib/packet_history.c @@ -34,7 +34,6 @@ * along with this program; if not, write to the Free Software * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ - #include linux/string.h #include packet_history.h @@ -55,6 +54,22 @@ struct tfrc_tx_hist_entry { */ static struct kmem_cache *tfrc_tx_hist; +int __init tx_packet_history_init(void) +{ + tfrc_tx_hist = kmem_cache_create(tfrc_tx_hist, +sizeof(struct tfrc_tx_hist_entry), 0, +SLAB_HWCACHE_ALIGN, NULL); + return tfrc_tx_hist == NULL ? -ENOBUFS : 0; +} + +void tx_packet_history_cleanup(void) +{ + if (tfrc_tx_hist != NULL) { + kmem_cache_destroy(tfrc_tx_hist); + tfrc_tx_hist = NULL; + } +} + static struct tfrc_tx_hist_entry * tfrc_tx_hist_find_entry(struct tfrc_tx_hist_entry *head, u64 seqno) { @@ -264,6 +279,49 @@ void dccp_rx_hist_add_packet(struct dccp_rx_hist *hist, EXPORT_SYMBOL_GPL(dccp_rx_hist_add_packet); +static struct kmem_cache *tfrc_rxh_cache; + +int __init rx_packet_history_init(void) +{ + tfrc_rxh_cache = kmem_cache_create(tfrc_rxh_cache, + sizeof(struct tfrc_rx_hist_entry), + 0, SLAB_HWCACHE_ALIGN, NULL); + return tfrc_rxh_cache == NULL ? -ENOBUFS : 0; +} + +void rx_packet_history_cleanup(void) +{ + if (tfrc_rxh_cache != NULL) { + kmem_cache_destroy(tfrc_rxh_cache); + tfrc_rxh_cache = NULL; + } +} + +int tfrc_rx_hist_init(struct tfrc_rx_hist *h) +{ + int i; + + for (i = 0; i = NDUPACK; i++) { + h-ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC); + if (h-ring[i] == NULL) + return 1; + } + h-loss_count = 0; + h-loss_start = 0; + return 0; +} +EXPORT_SYMBOL_GPL(tfrc_rx_hist_init); + +void tfrc_rx_hist_cleanup(struct tfrc_rx_hist *h) +{ + int i; + + for (i=0; i = NDUPACK; i++) + if (h-ring[i] != NULL) + kmem_cache_free(tfrc_rxh_cache, h-ring[i]); +} +EXPORT_SYMBOL_GPL(tfrc_rx_hist_cleanup); + void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list) { struct dccp_rx_hist_entry *entry, *next; @@ -276,18 +334,56 @@ void dccp_rx_hist_purge(struct dccp_rx_hist *hist, struct list_head *list) EXPORT_SYMBOL_GPL(dccp_rx_hist_purge); -int __init packet_history_init(void) +/** + * tfrc_rx_sample_rtt - Sample RTT from timestamp / CCVal + * Based on ideas presented in RFC 4342, 8.1. Returns 0 if it was not able + * to compute a sample with given data - calling function should check this. + */ +u32 tfrc_rx_sample_rtt(struct tfrc_rx_hist *h, struct sk_buff *skb) { - tfrc_tx_hist = kmem_cache_create(tfrc_tx_hist, -sizeof(struct tfrc_tx_hist_entry), 0, -SLAB_HWCACHE_ALIGN, NULL); - return tfrc_tx_hist == NULL ? -ENOBUFS : 0; -} + u32 sample = 0, + delta_v = SUB16(dccp_hdr(skb)-dccph_ccval, rtt_last_s(h)-ccval); + + if (delta_v 1 || delta_v 4) { /* unsuitable CCVal delta */ + + if (h-rtt_sample_prev == 2) { /* previous candidate stored */ + sample = SUB16(rtt_prev_s(h)-ccval, + rtt_last_s(h)-ccval); + if (sample) + sample = 4 / sample + * ktime_us_delta(rtt_prev_s(h)-stamp, + rtt_last_s(h)-stamp); + else/* +* FIXME: This condition is in principle not +* possible but occurs when CCID is used for +
[PATCH 3/6] [CCID3]: Hook up with new RX history interface
In addition, it makes two corrections too the code: 1. The receiver of a half-connection does not set window counter values; only the sender sets window counters [RFC 4342, sections 5 and 8.1]. 2. The computation of X_recv does currently not conform to TFRC/RFC 3448, since this specification requires that X_recv be computed over the last R_m seconds (sec. 6.2). The patch tackles this problem as it - explicitly distinguishes the types of feedback (using an enum); - uses previous value of X_recv when sending feedback due to a parameter change; - makes all state changes local to ccid3_hc_tx_packet_recv; - assigns feedback type according to incident (previously only used flag `do_feedback'). Further and detailed information is at http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/#8._Computing_X_recv_ Signed-off-by: Gerrit Renker [EMAIL PROTECTED] Signed-off-by: Ian McDonald [EMAIL PROTECTED] --- net/dccp/ccids/ccid3.c | 292 ++-- net/dccp/ccids/ccid3.h | 38 --- 2 files changed, 102 insertions(+), 228 deletions(-) diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 5dea690..ef243dc 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -1,6 +1,7 @@ /* * net/dccp/ccids/ccid3.c * + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK * Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand. * Copyright (c) 2005-7 Ian McDonald [EMAIL PROTECTED] * @@ -49,8 +50,6 @@ static int ccid3_debug; #define ccid3_pr_debug(format, a...) #endif -static struct dccp_rx_hist *ccid3_rx_hist; - /* * Transmitter Half-Connection Routines */ @@ -675,52 +674,53 @@ static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) hcrx-ccid3hcrx_s = tfrc_ewma(hcrx-ccid3hcrx_s, len, 9); } -static void ccid3_hc_rx_send_feedback(struct sock *sk) +static void ccid3_hc_rx_send_feedback(struct sock *sk, struct sk_buff *skb, + enum ccid3_fback_type fbtype) { struct ccid3_hc_rx_sock *hcrx = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - struct dccp_rx_hist_entry *packet; - ktime_t now; - suseconds_t delta; - - ccid3_pr_debug(%s(%p) - entry \n, dccp_role(sk), sk); + ktime_t now = ktime_get_real(); + s64 delta = 0; - now = ktime_get_real(); + if (unlikely(hcrx-ccid3hcrx_state == TFRC_RSTATE_TERM)) + return; - switch (hcrx-ccid3hcrx_state) { - case TFRC_RSTATE_NO_DATA: + switch (fbtype) { + case FBACK_INITIAL: hcrx-ccid3hcrx_x_recv = 0; + hcrx-ccid3hcrx_pinv = ~0U; /* see RFC 4342, 8.5 */ break; - case TFRC_RSTATE_DATA: - delta = ktime_us_delta(now, - hcrx-ccid3hcrx_tstamp_last_feedback); - DCCP_BUG_ON(delta 0); - hcrx-ccid3hcrx_x_recv = - scaled_div32(hcrx-ccid3hcrx_bytes_recv, delta); + case FBACK_PARAM_CHANGE: + /* +* When parameters change (new loss or p p_prev), we do not +* have a reliable estimate for R_m of [RFC 3448, 6.2] and so +* need to reuse the previous value of X_recv. However, when +* X_recv was 0 (due to early loss), this would kill X down to +* s/t_mbi (i.e. one packet in 64 seconds). +* To avoid such drastic reduction, we approximate X_recv as +* the number of bytes since last feedback. +* This is a safe fallback, since X is bounded above by X_calc. +*/ + if (hcrx-ccid3hcrx_x_recv 0) + break; + /* fall through */ + case FBACK_PERIODIC: + delta = ktime_us_delta(now, hcrx-ccid3hcrx_last_feedback); + if (delta = 0) + DCCP_BUG(delta (%ld) = 0, (long)delta); + else + hcrx-ccid3hcrx_x_recv = + scaled_div32(hcrx-ccid3hcrx_bytes_recv, delta); break; - case TFRC_RSTATE_TERM: - DCCP_BUG(%s(%p) - Illegal state TERM, dccp_role(sk), sk); - return; - } - - packet = dccp_rx_hist_find_data_packet(hcrx-ccid3hcrx_hist); - if (unlikely(packet == NULL)) { - DCCP_WARN(%s(%p), no data packet in history!\n, - dccp_role(sk), sk); + default: return; } + ccid3_pr_debug(Interval %ldusec, X_recv=%u, 1/p=%u\n, (long)delta, + hcrx-ccid3hcrx_x_recv, hcrx-ccid3hcrx_pinv); - hcrx-ccid3hcrx_tstamp_last_feedback = now; - hcrx-ccid3hcrx_ccval_last_counter = packet-dccphrx_ccval; - hcrx-ccid3hcrx_bytes_recv
[PATCH 5/6] [TFRC]: Ringbuffer to track loss interval history
A ringbuffer-based implementation of loss interval history is easier to maintain, allocate, and update. Details: * access to the Loss Interval Records via macro wrappers (with safety checks); * simplified, on-demand allocation of entries (no extra memory consumption on lossless links); cache allocation is local to the module / exported as service; * provision of RFC-compliant algorithm to re-compute average loss interval; * provision of comprehensive, new loss detection algorithm - support for all cases of loss, including re-ordered/duplicate packets; - waiting for NDUPACK=3 packets to fill the hole; - updating loss records when a late-arriving packet fills a hole. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] Signed-off-by: Ian McDonald [EMAIL PROTECTED] --- net/dccp/ccids/lib/loss_interval.c | 158 +- net/dccp/ccids/lib/loss_interval.h | 58 +++- net/dccp/ccids/lib/packet_history.c | 183 +++ net/dccp/ccids/lib/packet_history.h |4 + net/dccp/ccids/lib/tfrc.h |3 + 5 files changed, 400 insertions(+), 6 deletions(-) diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c index 0ebdc67..cde5c7f 100644 --- a/net/dccp/ccids/lib/loss_interval.c +++ b/net/dccp/ccids/lib/loss_interval.c @@ -1,6 +1,7 @@ /* * net/dccp/ccids/lib/loss_interval.c * + * Copyright (c) 2007 The University of Aberdeen, Scotland, UK * Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand. * Copyright (c) 2005-7 Ian McDonald [EMAIL PROTECTED] * Copyright (c) 2005 Arnaldo Carvalho de Melo [EMAIL PROTECTED] @@ -10,12 +11,7 @@ * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. */ - -#include linux/module.h #include net/sock.h -#include ../../dccp.h -#include loss_interval.h -#include packet_history.h #include tfrc.h #define DCCP_LI_HIST_IVAL_F_LENGTH 8 @@ -27,6 +23,51 @@ struct dccp_li_hist_entry { u32 dccplih_interval; }; +static struct kmem_cache *tfrc_lh_slab __read_mostly; +/* Loss Interval weights from [RFC 3448, 5.4], scaled by 10 */ +static const int tfrc_lh_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 }; + +/* + * Access macros: These require that at least one entry is present in lh, + * and implement array semantics (0 is first, n-1 is the last of n entries). + */ +#define __lih_index(lh, n) LIH_INDEX((lh)-counter - (n) - 1) +#define __lih_entry(lh, n) (lh)-ring[__lih_index(lh, n)] +#define __curr_entry(lh) (lh)-ring[LIH_INDEX((lh)-counter - 1)] +#define __next_entry(lh) (lh)-ring[LIH_INDEX((lh)-counter)] + +/* given i with 0 = i = k, return I_i as per the rfc3448bis notation */ +static inline u32 tfrc_lh_get_interval(struct tfrc_loss_hist *lh, u8 i) +{ + BUG_ON(i = lh-counter); + return __lih_entry(lh, i)-li_length; +} + +static inline struct tfrc_loss_interval *tfrc_lh_peek(struct tfrc_loss_hist *lh) +{ + return lh-counter ? __curr_entry(lh) : NULL; +} + +/* + * On-demand allocation and de-allocation of entries + */ +static struct tfrc_loss_interval *tfrc_lh_demand_next(struct tfrc_loss_hist *lh) +{ + if (__next_entry(lh) == NULL) + __next_entry(lh) = kmem_cache_alloc(tfrc_lh_slab, GFP_ATOMIC); + + return __next_entry(lh); +} + +void tfrc_lh_cleanup(struct tfrc_loss_hist *lh) +{ + if (tfrc_lh_is_initialised(lh)) + for (lh-counter = 0; lh-counter LIH_SIZE; lh-counter++) + if (__next_entry(lh) != NULL) + kmem_cache_free(tfrc_lh_slab, __next_entry(lh)); +} +EXPORT_SYMBOL_GPL(tfrc_lh_cleanup); + static struct kmem_cache *dccp_li_cachep __read_mostly; static inline struct dccp_li_hist_entry *dccp_li_hist_entry_new(const gfp_t prio) @@ -98,6 +139,65 @@ u32 dccp_li_hist_calc_i_mean(struct list_head *list) EXPORT_SYMBOL_GPL(dccp_li_hist_calc_i_mean); +static void tfrc_lh_calc_i_mean(struct tfrc_loss_hist *lh) +{ + u32 i_i, i_tot0 = 0, i_tot1 = 0, w_tot = 0; + int i, k = tfrc_lh_length(lh) - 1; /* k is as in rfc3448bis, 5.4 */ + + for (i=0; i = k; i++) { + i_i = tfrc_lh_get_interval(lh, i); + + if (i k) { + i_tot0 += i_i * tfrc_lh_weights[i]; + w_tot += tfrc_lh_weights[i]; + } + if (i 0) + i_tot1 += i_i * tfrc_lh_weights[i-1]; + } + + BUG_ON(w_tot == 0); + lh-i_mean = max(i_tot0, i_tot1) / w_tot; +} + +/** + * tfrc_lh_update_i_mean - Update the `open' loss interval I_0 + * For recomputing p: returns `true' if p p_prev = 1/p 1/p_prev + */ +u8 tfrc_lh_update_i_mean(struct tfrc_loss_hist *lh, struct sk_buff *skb) +{ + struct tfrc_loss_interval *cur = tfrc_lh_peek(lh); + u32 old_i_mean = lh-i_mean; + s64 length; + +
Re: [PATCHv7 1/5] Remove unnecessary locks from rtnetlink (in do_setlink)
Jarek Poplawski írta: Laszlo Attila Toth wrote, On 11/29/2007 05:11 PM: The do_setlink function is protected by rtnl, additional locks are unnecessary, and the set_operstate() function is called from protected parts. Locks removed from both functions. It doesn't look like in accordance with a comment to dev_base_lock in dev.c. And it makes eg. rfc2863_policy() locking from link_watch.c looking strange. Isn't there needed some additional comment to this? I modified do_setlink(), but set_operstate() is also called from rtnl_create_link() and from no other places. In rtnl_create_link() none of the changes is protected by set_lock_bh() except inside set_operstate(), different locking scheme is not necessary for the operstate. Also two solution can be made, one is locking everything and one is locking nothing (to unify the changes made by these parts). The second one is better if it is protected. I tried to figure out how it is protected but I couldn't. But Patrick said it is protected by rtnl. And he suggested this patch. Attila -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Mon, Dec 03, 2007 at 10:39:35PM +1100, Herbert Xu wrote: So we need to fix this, and whatever the fix is will probably render the BH/USER distinction obsolete. Hmm, I would think opposite. USER (or generic) is expensive variant, BH is lite. No? Alexey -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.25] net: removes unnecessary dependencies for net_namespace.h
you right, how about this? Eric W. Biederman wrote: Denis V. Lunev [EMAIL PROTECTED] writes: This patch removes some unneeded includes for net_namespace.h to speed up compilation. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index f285de6..28b7f25 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -2,7 +2,6 @@ #define __NET_PKT_CLS_H #include linux/pkt_cls.h -#include net/net_namespace.h #include net/sch_generic.h #include net/act_api.h What of tcf_match_indev? I guess linux/netdevice.h brings it in for us but still. diff --git a/include/net/sock.h b/include/net/sock.h index 43e3cd9..a04e361 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -57,7 +57,6 @@ #include asm/atomic.h #include net/dst.h #include net/checksum.h -#include net/net_namespace.h /* * This structure really needs to be cleaned up. @@ -95,6 +94,7 @@ typedef struct { struct sock; struct proto; +struct net; /** * struct sock_common - minimal network layer representation of sockets This hunk definitely looks sane. Eric diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index f285de6..2eaf204 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -2,7 +2,6 @@ #define __NET_PKT_CLS_H #include linux/pkt_cls.h -#include net/net_namespace.h #include net/sch_generic.h #include net/act_api.h @@ -336,6 +335,8 @@ static inline int tcf_valid_offset(const struct sk_buff *skb, } #ifdef CONFIG_NET_CLS_IND +#include net/net_namespace.h + static inline int tcf_change_indev(struct tcf_proto *tp, char *indev, struct rtattr *indev_tlv) { diff --git a/include/net/sock.h b/include/net/sock.h index 43e3cd9..a04e361 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -57,7 +57,6 @@ #include asm/atomic.h #include net/dst.h #include net/checksum.h -#include net/net_namespace.h /* * This structure really needs to be cleaned up. @@ -95,6 +94,7 @@ typedef struct { struct sock; struct proto; +struct net; /** * struct sock_common - minimal network layer representation of sockets
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Mon, Dec 03, 2007 at 02:49:12PM +0300, Alexey Kuznetsov wrote: On Mon, Dec 03, 2007 at 10:39:35PM +1100, Herbert Xu wrote: So we need to fix this, and whatever the fix is will probably render the BH/USER distinction obsolete. Hmm, I would think opposite. USER (or generic) is expensive variant, BH is lite. No? Yes that would certainly be the obvious fix. In other words, just use smp_processor_id instead of the raw version. I suppose the only issue would be that disabling/enabling preemption isn't exactly cost-free and we're going to be doing that for every single increment. Hmm, wasn't someone else talking about a non-atomic version of atomic ops lately (i.e., atomic with respect to the local CPU only)? Perhaps this is the killer app for it :) Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.25] net: removes unnecessary dependencies for net_namespace.h
Denis V. Lunev [EMAIL PROTECTED] writes: This patch removes some unneeded includes for net_namespace.h to speed up compilation. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index f285de6..28b7f25 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -2,7 +2,6 @@ #define __NET_PKT_CLS_H #include linux/pkt_cls.h -#include net/net_namespace.h #include net/sch_generic.h #include net/act_api.h What of tcf_match_indev? I guess linux/netdevice.h brings it in for us but still. diff --git a/include/net/sock.h b/include/net/sock.h index 43e3cd9..a04e361 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -57,7 +57,6 @@ #include asm/atomic.h #include net/dst.h #include net/checksum.h -#include net/net_namespace.h /* * This structure really needs to be cleaned up. @@ -95,6 +94,7 @@ typedef struct { struct sock; struct proto; +struct net; /** * struct sock_common - minimal network layer representation of sockets This hunk definitely looks sane. Eric -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Mon, Dec 03, 2007 at 03:19:35PM +0800, Wang Chen wrote: System calls should be USER. So change the BH to USER for UDP*_INC_STATS_BH(). Signed-off-by: Wang Chen [EMAIL PROTECTED] I've rearragned it so that we the new INC_STATS call is USER in the first patch. Otherwise it's all in net-2.6.25 now. BTW, thanks to Gerrit's note I took a look at the underlying code for the _BH/_USER stuff. It's in fact totally broken when PREEMPT is on. It relies on the fact that process-context kernel code does not get preempted by other process-context kernel code or for that matter migrate to other CPUs, neither of which is true with PREEMPT on. So we need to fix this, and whatever the fix is will probably render the BH/USER distinction obsolete. Any takers? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 2/5] rtnetlink: send a single notification on device state changes
Jarek Poplawski írta: Laszlo Attila Toth wrote, On 11/29/2007 05:11 PM: In do_setlink() a single ntification is sent at the end of the function if any modification occured. If the address has been changed, another notification is sent. ... @@ -858,6 +859,7 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm, if (tb[IFLA_BROADCAST]) { nla_memcpy(dev-broadcast, tb[IFLA_BROADCAST], dev-addr_len); send_addr_notify = 1; + modified = 1; } .. if (send_addr_notify) call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); + + if (modified) + netdev_state_change(dev); + The subject suggests there might be less notifications. The patch actually adds a little. Any additional comment why they are necessary? The actual state of a device contains its address(es), also address change implies state change, but these are different netlink messages also the NETDEV_CHANGEADDR cannot be dropped because the other one is used. Attila -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] [IPSEC]: Reinject packet instead of calling netfilter directly on input
On Mon, 2007-03-12 at 20:21 +1100, Herbert Xu wrote: Sorry for the late response Jamal. I've been too busy to give this issue proper thought. It's still in my mailbox so I will respond to it once things quiten down a little. I totaly empathize - take your time. The point brought up on v6 extensions needs to be addressed. I thought about it a little - and it is valid as well for ipv4 options; they will be processed twice. To build up on what Patrick said, I noticed a bit still available in the bag right after skb-nf_trace that i could use to signal options/extensions already processed. If people think think this is a sane use of that very lonely bit, I will post patches. CCing Yoshfuji. cheers, jamal -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
Em Mon, Dec 03, 2007 at 08:35:12AM +, Gerrit Renker escreveu: Hi Arnaldo, hank you for going through this. I have just backported your recent patches of 2.6.25 to the DCCP/CCID4/Faster Restart test tree at git://eden-feed.erg.abdn.ac.uk/dccp_exp {dccp,ccid4,dccp_fr} as per subsequent message. | do, so please consider moving DCCP discussion to netdev@vger.kernel.org, | where lots of smart networking folks are present and can help our efforts | on turning RFCs to code. Are you suggesting using netdev exclusively or in addition to [EMAIL PROTECTED] Well, since at least one person that has contributed significantly in the past has said he can't cope with traffic on netdev, we can CC [EMAIL PROTECTED] | Please take a look at this patch series where I reorganized your work on the new | TFRC rx history handling code. I'll wait for your considerations and then do as many | interactions as reasonable to get your work merged. | | It should be completely equivalent, plus some fixes and optimizations, such as: It will be necessary to address these points one-by-one. Before diving into making fixes and `optimisations', have you tested your code? The patches you are referring to I have performed basic tests, and when doing so noticed a bug in inet_diag, that I commented with Herbert Xu and he has since provided a fix. have been posted and re-posted for a period of over 9 months on [EMAIL PROTECTED], and Being posted and re-posted does not guarantee that the patch is OK or that is in a form that is acceptable to all tree maintainers. DaveM is subscribed to [EMAIL PROTECTED] and could have picked this if he had the time to do the review. I didn't, but now I'm trying to spend my time on reviewing your patches and in the process doing modifications I find appropriate while trying hard not to introduce bugs in your hard work to get it merged. there are regression tests which show that this code improves on the existing Linux implementation. These are labelled as `test tree' on http://www.linux-foundation.org/en/Net:DCCP_Testing#Regression_testing So if you are making changes to the code, I would like to ask if you have run similar regression tests, to avoid having to step back later. Fair enough, I will do that before asking Herbert or Dave to pull from my tree. | . The code that allocates the RX ring deals with failures when one of the entries in | the ring buffer is not successfully allocated, the original code was leaking the | successfully allocated entries. Sorry for not point out exactly this, here it goes: Your original patch: +int tfrc_rx_hist_init(struct tfrc_rx_hist *h) +{ + int i; + + for (i = 0; i = NDUPACK; i++) { + h-ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC); + if (h-ring[i] == NULL) + return 1; + } + h-loss_count = 0; + h-loss_start = 0; + return 0; +} +EXPORT_SYMBOL_GPL(tfrc_rx_hist_init); Then, in ccid3_hc_rx_init you do: static int ccid3_hc_rx_init(struct ccid *ccid, struct sock *sk) { struct ccid3_hc_rx_sock *hcrx = ccid_priv(ccid); ccid3_pr_debug(entry\n); hcrx-ccid3hcrx_state = TFRC_RSTATE_NO_DATA; tfrc_lh_init(hcrx-ccid3hcrx_li_hist); return tfrc_rx_hist_init(hcrx-ccid3hcrx_hist); } So if tfrc_rx_hist_init fail to allocate, say, the last entry in the ring, it will return 1, and from looking at how you initialize h-loss_{count,start} to zero I assumed that the whole tfrc_rx_hist is not zeroed when tfrc_rx_hist_init is called, so one can also assume that the ring entries are not initialized to NULL and that if the error recovery is to assume that later on tfrc_rx_hist_cleanup is called we would not have a leak, but an OOPS when tfrc_rx_hist_cleanup tries to call kfree_cache_free on the uninitialized ring entries. But if you look at ccid_new(), that calls ccid3_hc_rx_init(), you'll see that when the ccid rx or hx routine fails, it just frees the struct ccid with the area where h-ring is, so, what was not cleaned up by the ccid init routine is effectively forgot, leaked. I first did the cleanup at tfrc_rx_hist_init (that I renamed to tfrc_rx_hist_alloc, since it doesn't just initializes things, but allocates entries from slab), but then I just made the rx history slab have arrays of tfrc_rx_hist_entry objects, not individual ones as your code always allocates them as arrays. | . We do just one allocation for the ring buffer, as the number of entries is fixed we | should just do one allocation and not TFRC_NDUPACK times. Will look at the first point in the patch; with regard to the second point I agree, this will make the code simpler, which is good. good | . I haven't checked if all the code was commited, as I tried to introduce just what was | immediatelly used, probably we'll need to do some changes when working on the merge
Re: [PATCH net-2.6.25] Add packet filtering based on process's securitycontext.
Hello. Patrick McHardy wrote: No news on that. I'm also a bit sceptical if adding all this complexity and overhead would really be worth it (considering only netfilter) just to use the owner match and UID/GID matching. It wouldn't even be accurate because there is not 1:1 mapping of sockets and processes. Considering only LSM, socket_post_accept()/socket_post_recv_datagram() hooks are not complicated at all. A socket may be mapped to multiple processes, but at the moment of picking up (i.e. accept()/recvmsg()), I think it is accurate 1:1 mapping. I'm more interested in Who picks this connection/datagram up? than Which socket enqueues this connection/datagram? It may be indifferent for netfilter, but it is region of interest for me. I actually like Samir Bellabes' approach, which doesn't suffer from these problems IIRC. Oh, I found him at http://nfws.inl.fr/en/?p=50 . (Sorry, I didn't know.) He is the person who was discussing with me a few days ago. From memory, one approach under discussion was to add netfilter hooks to the transport layer, which could be invoked correctly by each type of protocol when the target process is selected. We can only invoke the hooks after the socket lookup, but we don't know which process is going to call recvmsg() for that socket. Right. Thus, I'm proposing LSM hooks at accept()/recvmsg() time. Regards. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv7 2/5] rtnetlink: send a single notification on device state changes
On 03-12-2007 12:40, Laszlo Attila Toth wrote: Jarek Poplawski írta: Laszlo Attila Toth wrote, On 11/29/2007 05:11 PM: In do_setlink() a single ntification is sent at the end of the function if any modification occured. If the address has been changed, another notification is sent. ... @@ -858,6 +859,7 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm, if (tb[IFLA_BROADCAST]) { nla_memcpy(dev-broadcast, tb[IFLA_BROADCAST], dev-addr_len); send_addr_notify = 1; +modified = 1; } .. if (send_addr_notify) call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); + +if (modified) +netdev_state_change(dev); + The subject suggests there might be less notifications. The patch actually adds a little. Any additional comment why they are necessary? The actual state of a device contains its address(es), also address change implies state change, but these are different netlink messages also the NETDEV_CHANGEADDR cannot be dropped because the other one is used. OK. But, since until this patch it seemed to be enough, it would be nice to know from the changelog why exactly it's nececessary to add this now, because it doesn't look like it was omitted here by mistake. (Or to say that it was omitted by mistake.) Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Mon, Dec 03, 2007 at 10:54:35PM +1100, Herbert Xu wrote: Hmm, wasn't someone else talking about a non-atomic version of atomic ops lately (i.e., atomic with respect to the local CPU only)? Perhaps this is the killer app for it :) Never mind, we already have that in local_t and as Alexey correctly points out, USER is still going to be the expensive variant with the preempt_disable (well until BH gets threaded). So how about this patch? Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff --git a/include/net/snmp.h b/include/net/snmp.h index ea206bf..9444b54 100644 --- a/include/net/snmp.h +++ b/include/net/snmp.h @@ -23,6 +23,7 @@ #include linux/cache.h #include linux/snmp.h +#include linux/smp.h /* * Mibs are stored in array of unsigned long. @@ -137,7 +138,10 @@ struct linux_mib { #define SNMP_INC_STATS_OFFSET_BH(mib, field, offset) \ (per_cpu_ptr(mib[0], raw_smp_processor_id())-mibs[field + (offset)]++) #define SNMP_INC_STATS_USER(mib, field) \ - (per_cpu_ptr(mib[1], raw_smp_processor_id())-mibs[field]++) + do { \ + per_cpu_ptr(mib[1], get_cpu())-mibs[field]++; \ + put_cpu(); \ + } while (0) #define SNMP_INC_STATS(mib, field) \ (per_cpu_ptr(mib[!in_softirq()], raw_smp_processor_id())-mibs[field]++) #define SNMP_DEC_STATS(mib, field) \ @@ -145,6 +149,9 @@ struct linux_mib { #define SNMP_ADD_STATS_BH(mib, field, addend) \ (per_cpu_ptr(mib[0], raw_smp_processor_id())-mibs[field] += addend) #define SNMP_ADD_STATS_USER(mib, field, addend)\ - (per_cpu_ptr(mib[1], raw_smp_processor_id())-mibs[field] += addend) + do { \ + per_cpu_ptr(mib[1], get_cpu())-mibs[field] += addend; \ + put_cpu(); \ + } while (0) #endif -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel 2.6.23.8: KERNEL: assertion in net/ipv4/tcp_input.c
On Mon, 3 Dec 2007, Wolfgang Walter wrote: with kernel 2.6.23.8 we saw a KERNEL: assertion ((int)tcp_packets_in_flight(tp) = 0) failed at net/ipv4/tcp_input.c (1292) Is this the only message? Are there any Leak printouts? Any tweaking done to TCP related sysctls? And for completeness, is GSO enabled (ethtool -k)? Most likely I broke the manual synchronization for left_out in sacktag by skipping over it when packets_out == 0 but so far I haven't been able to figure out how such state could develop in the first place... Ie., I couldn't find a case where tcp_fastretrans_alert wouldn't be called if left_out was non-zero (and it did the sync_left_out after modifying either sacked_out or lost_out, IIRC). ...If you can reproduce it, you could try if this patch below changes anything (should silence the assert and trigger earlier a WARN_ON or two :-)). ...If this triggers, then I'm sure we can pollute TCP code by a larger number of more costly checks to catch it in early. This might reveal a long-standing inconsistency of left_out in some case I just couldn't come up with by code review. Left_out will be (is) anyway dropped as unnecessary in 2.6.24. In 2.6.23 sync for left_out occurs quite soon after that BUG_TRAP anyway so the effect won't be too dramatic, prior_in_flight would be once stale, won't lead to big problems (either missed cnwd or cwnd_cnt increment, or failure to do application limited check at that particular ACK). Thanks anyway for the report. ...If I figure something out here, I'll let you know. -- diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c9298a7..0c5194d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1012,8 +1012,12 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ if (before(TCP_SKB_CB(ack_skb)-ack_seq, prior_snd_una - tp-max_window)) return 0; - if (!tp-packets_out) + if (!tp-packets_out) { + WARN_ON(tp-sacked_out); + WARN_ON(tp-lost_out); + WARN_ON(tp-left_out); goto out; + } /* SACK fastpath: * if the only SACK change is the increase of the end_seq of @@ -1277,14 +1281,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ } } +out: + tp-left_out = tp-sacked_out + tp-lost_out; if ((reord tp-fackets_out) icsk-icsk_ca_state != TCP_CA_Loss (!tp-frto_highmark || after(tp-snd_una, tp-frto_highmark))) tcp_update_reordering(sk, ((tp-fackets_out + 1) - reord), 0); -out: - #if FASTRETRANS_DEBUG 0 BUG_TRAP((int)tp-sacked_out = 0); BUG_TRAP((int)tp-lost_out = 0); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] SMC911X: Fix using of dereferenced skb after netif_rx
Wang == Wang Chen [EMAIL PROTECTED] writes: Hi, Wang +len = skb-len; Wang netif_rx(skb); dev- stats.rx_packets++; Wang -dev-stats.rx_bytes += skb-len; Wang +dev-stats.rx_bytes += len; Why not simply update the stats before calling netif_rx as the return value isn't checked anyway? -- Bye, Peter Korsgaard -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
| Are you suggesting using netdev exclusively or in addition to [EMAIL PROTECTED] | | Well, since at least one person that has contributed significantly in | the past has said he can't cope with traffic on netdev, we can CC | [EMAIL PROTECTED] I have a similar problem with the traffic but agree and will copy as well. | have been posted and re-posted for a period of over 9 months on [EMAIL PROTECTED], and | | Being posted and re-posted does not guarantee that the patch is OK or | that is in a form that is acceptable to all tree maintainers. With the first point I couldn't agree more, but this is not really what I meant - the point was that despite posting and re-posting there was often silence. And now there is feedback, in form of a patchset made by you; and all that I am asking for is just to be given the time to look that through. Last time a RFC patch appeared at 3pm and was checked in the same evening (only found out next morning). With your experience and long background as a maintainer maybe you expect quicker turn-around times, but I also had waited patiently until you had had a chance to review the stuff I sent. | | . The code that allocates the RX ring deals with failures when one of the entries in | | the ring buffer is not successfully allocated, the original code was leaking the | | successfully allocated entries. | | Sorry for not point out exactly this, here it goes: | | Your original patch: | | +int tfrc_rx_hist_init(struct tfrc_rx_hist *h) | +{ | + int i; | + | + for (i = 0; i = NDUPACK; i++) { | + h-ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC); | + if (h-ring[i] == NULL) | + return 1; | + } | + h-loss_count = 0; | + h-loss_start = 0; | + return 0; | +} | +EXPORT_SYMBOL_GPL(tfrc_rx_hist_init); | I expected this and it actually was very clear from your original message. I fully back up your solution in this point, see below. But my question above was rather: are there any other bugs rather than the above leakage, which is what the previous email seemed to indicate? With regard to your solution - you are using int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h) { h-ring = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC); h-loss_count = h-loss_start = 0; return h-ring == NULL; } which is better not only for one but for two reasons. It solves the leakage and in addition makes the entire code simpler. Fully agreed. | | | . I haven't checked if all the code was commited, as I tried to introduce just what was | | immediatelly used, probably we'll need to do some changes when working on the merge | | of your loss intervals code. | Sorry I don't understand this point. | | Let me check now and tell you for sure: | | tfrc_rx_hist_delta_seqno and tfrc_rx_hist_swap were not included, as | they were not used, we should introduce them later, when getting to the | working on the loss interval code. Ah thanks, that was really not clear. Just beginning to work through the set. | static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) | { | // ... | u32 sample, payload_size = skb-len - dccp_hdr(skb)-dccph_doff * 4; | | if (unlikely(hcrx-ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) { | if (is_data_packet) { | do_feedback = FBACK_INITIAL; | ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); | ccid3_hc_rx_update_s(hcrx, payload_size); | } | goto update_records; | } | | == Non-data packets are ignored for the purposes of computing s (this is in the RFC), | consequently update_s() is only called for data packets; using the two following | functions: | | | static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight) | { | return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval; | } | | I hadn't considered that tfrc_ewma would for every packet check if the | avg was 0 and I find it suboptimal now that I look at it, we are just | feeding data packets, no? Yes exactly, only data packets are used for s. | static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) | { | if (likely(len 0))/* don't update on empty packets (e.g. ACKs) */ | hcrx-ccid3hcrx_s = tfrc_ewma(hcrx-ccid3hcrx_s, len, 9); | } | | And we also just do test for len 0 in update_s, that looks like | also excessive, no? Hm, I think we need to make it robust against API bugs and/or zero-sized data packets. The check `len 0' may seem redundant but it catches such a condition. For a moving average an accidental zero value will have quite an impact on the overall value. In CCID3 it
[PATCH net-2.6.25 1/2][INET] Merge sys.net.ipv4.ip_forward and sys.net.ipv4.conf.all.forwarding
AFAIS these two entries should do the same thing - change the forwarding state on ipv4_devconf and on all the devices. I propose to merge the handlers together using ctl paths. The inet_forward_change() is static after this and I move it higher to make code organized like helpers (including this) proc/sysctl handlers tables rather than mixing it altogether. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h index d83fee2..dd093ea 100644 --- a/include/linux/inetdevice.h +++ b/include/linux/inetdevice.h @@ -135,7 +135,6 @@ extern struct in_device *inetdev_by_index(int); extern __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope); extern __be32 inet_confirm_addr(const struct net_device *dev, __be32 dst, __be32 local, int scope); extern struct in_ifaddr *inet_ifa_byprefix(struct in_device *in_dev, __be32 prefix, __be32 mask); -extern voidinet_forward_change(void); static __inline__ int inet_ifa_match(__be32 addr, struct in_ifaddr *ifa) { diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index c19c8db..0b5f042 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1264,6 +1264,28 @@ static void devinet_copy_dflt_conf(int i) read_unlock(dev_base_lock); } +static void inet_forward_change(void) +{ + struct net_device *dev; + int on = IPV4_DEVCONF_ALL(FORWARDING); + + IPV4_DEVCONF_ALL(ACCEPT_REDIRECTS) = !on; + IPV4_DEVCONF_DFLT(FORWARDING) = on; + + read_lock(dev_base_lock); + for_each_netdev(init_net, dev) { + struct in_device *in_dev; + rcu_read_lock(); + in_dev = __in_dev_get_rcu(dev); + if (in_dev) + IN_DEV_CONF_SET(in_dev, FORWARDING, on); + rcu_read_unlock(); + } + read_unlock(dev_base_lock); + + rt_cache_flush(0); +} + static int devinet_conf_proc(ctl_table *ctl, int write, struct file* filp, void __user *buffer, size_t *lenp, loff_t *ppos) @@ -1333,28 +1355,6 @@ static int devinet_conf_sysctl(ctl_table *table, int __user *name, int nlen, return 1; } -void inet_forward_change(void) -{ - struct net_device *dev; - int on = IPV4_DEVCONF_ALL(FORWARDING); - - IPV4_DEVCONF_ALL(ACCEPT_REDIRECTS) = !on; - IPV4_DEVCONF_DFLT(FORWARDING) = on; - - read_lock(dev_base_lock); - for_each_netdev(init_net, dev) { - struct in_device *in_dev; - rcu_read_lock(); - in_dev = __in_dev_get_rcu(dev); - if (in_dev) - IN_DEV_CONF_SET(in_dev, FORWARDING, on); - rcu_read_unlock(); - } - read_unlock(dev_base_lock); - - rt_cache_flush(0); -} - static int devinet_sysctl_forward(ctl_table *ctl, int write, struct file* filp, void __user *buffer, size_t *lenp, loff_t *ppos) @@ -1537,6 +1537,27 @@ static void devinet_sysctl_unregister(struct ipv4_devconf *p) } #endif +static struct ctl_table ctl_forward_entry[] = { + { + .ctl_name = NET_IPV4_FORWARD, + .procname = ip_forward, + .data = ipv4_devconf.data[ + NET_IPV4_CONF_FORWARDING - 1], + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = devinet_sysctl_forward, + .strategy = devinet_conf_sysctl, + .extra1 = ipv4_devconf, + }, + { }, +}; + +static __initdata struct ctl_path net_ipv4_path[] = { + { .procname = net, .ctl_name = CTL_NET, }, + { .procname = ipv4, .ctl_name = NET_IPV4, }, + { }, +}; + void __init devinet_init(void) { register_gifconf(PF_INET, inet_gifconf); @@ -1550,6 +1571,7 @@ void __init devinet_init(void) ipv4_devconf); __devinet_sysctl_register(default, NET_PROTO_CONF_DEFAULT, ipv4_devconf_dflt); + register_sysctl_paths(net_ipv4_path, ctl_forward_entry); #endif } diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index bec6fe8..8c16077 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -35,62 +35,6 @@ struct ipv4_config ipv4_config; #ifdef CONFIG_SYSCTL -static -int ipv4_sysctl_forward(ctl_table *ctl, int write, struct file * filp, - void __user *buffer, size_t *lenp, loff_t *ppos) -{ - int val = IPV4_DEVCONF_ALL(FORWARDING); - int ret; - - ret = proc_dointvec(ctl, write, filp, buffer, lenp, ppos); - - if (write IPV4_DEVCONF_ALL(FORWARDING) != val) - inet_forward_change(); - - return ret; -} - -static int ipv4_sysctl_forward_strategy(ctl_table *table, -
[PATCH net-2.6.25 2/2][INET] Eliminate difference in actions of sysctl and proc handler for conf.all.forwarding
AFAIS the net.ipv4.conf. dev, all and default sysctls should work like this when changed (besides changing the value itself): dev : optionally do smth else all : walk devices default : walk devices The proc handler for net.ipv4.conf.all works like this: dev : flush rt cache all : walk devices and flush rt cache default : nothing while the sysctl handler works like this: dev : nothing all : nothing default : walk devices but don't flush the cache All this looks strange. Am I right that regardless of whatever handler (proc or syscall) is called the behavior should be: dev : flush rt cache all : walk the devices and flush the cache default : walk the devices and flush the cache ? If yes, then this patch should fix it up. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] --- diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 0b5f042..1934a06 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1282,6 +1282,17 @@ static void inet_forward_change(void) rcu_read_unlock(); } read_unlock(dev_base_lock); +} + +static void fixup_forward_change(struct ctl_table *table) +{ + struct ipv4_devconf *conf; + + conf = table-extra1; + if (conf == ipv4_devconf) + inet_forward_change(); + else if (conf == ipv4_devconf_dflt) + devinet_copy_dflt_conf(NET_IPV4_CONF_FORWARDING - 1); rt_cache_flush(0); } @@ -1305,9 +1316,9 @@ static int devinet_conf_proc(ctl_table *ctl, int write, return ret; } -static int devinet_conf_sysctl(ctl_table *table, int __user *name, int nlen, +static int __devinet_conf_sysctl(ctl_table *table, int __user *name, int nlen, void __user *oldval, size_t __user *oldlenp, - void __user *newval, size_t newlen) + void __user *newval, size_t newlen, int *idx) { struct ipv4_devconf *cnf; int *valp = table-data; @@ -1346,16 +1357,27 @@ static int devinet_conf_sysctl(ctl_table *table, int __user *name, int nlen, cnf = table-extra1; i = (int *)table-data - cnf-data; - set_bit(i, cnf-state); + *idx = i; + return 1; +} + +static int devinet_conf_sysctl(ctl_table *table, int __user *name, int nlen, + void __user *oldval, size_t __user *oldlenp, + void __user *newval, size_t newlen) +{ + int ret, i; - if (cnf == ipv4_devconf_dflt) + ret = __devinet_conf_sysctl(table, name, nlen, oldval, oldlenp, + newval, newlen, i); + + if (ret == 1 table-extra1 == ipv4_devconf_dflt) devinet_copy_dflt_conf(i); - return 1; + return ret; } -static int devinet_sysctl_forward(ctl_table *ctl, int write, +static int devinet_forward_proc(ctl_table *ctl, int write, struct file* filp, void __user *buffer, size_t *lenp, loff_t *ppos) { @@ -1363,16 +1385,25 @@ static int devinet_sysctl_forward(ctl_table *ctl, int write, int val = *valp; int ret = proc_dointvec(ctl, write, filp, buffer, lenp, ppos); - if (write *valp != val) { - if (valp == IPV4_DEVCONF_ALL(FORWARDING)) - inet_forward_change(); - else if (valp != IPV4_DEVCONF_DFLT(FORWARDING)) - rt_cache_flush(0); - } + if (write *valp != val) + fixup_forward_change(ctl); return ret; } +static int devinet_forward_sysctl(ctl_table *table, int __user *name, int nlen, + void __user *oldval, size_t __user *oldlenp, + void __user *newval, size_t newlen) +{ + int ret, i; + + ret = __devinet_conf_sysctl(table, name, nlen, oldval, oldlenp, + newval, newlen, i); + if (ret == 1) + fixup_forward_change(table); + return ret; +} + int ipv4_doint_and_flush(ctl_table *ctl, int write, struct file* filp, void __user *buffer, size_t *lenp, loff_t *ppos) @@ -1436,8 +1467,8 @@ static struct devinet_sysctl_table { } devinet_sysctl = { .devinet_vars = { DEVINET_SYSCTL_COMPLEX_ENTRY(FORWARDING, forwarding, -devinet_sysctl_forward, -devinet_conf_sysctl), +devinet_forward_proc, +devinet_forward_sysctl), DEVINET_SYSCTL_RO_ENTRY(MC_FORWARDING, mc_forwarding), DEVINET_SYSCTL_RW_ENTRY(ACCEPT_REDIRECTS, accept_redirects), @@ -1545,8 +1576,8 @@ static struct ctl_table ctl_forward_entry[] = { NET_IPV4_CONF_FORWARDING - 1], .maxlen =
Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
Em Mon, Dec 03, 2007 at 01:49:47PM +, Gerrit Renker escreveu: | Are you suggesting using netdev exclusively or in addition to [EMAIL PROTECTED] | | Well, since at least one person that has contributed significantly in | the past has said he can't cope with traffic on netdev, we can CC | [EMAIL PROTECTED] I have a similar problem with the traffic but agree and will copy as well. | have been posted and re-posted for a period of over 9 months on [EMAIL PROTECTED], and | | Being posted and re-posted does not guarantee that the patch is OK or | that is in a form that is acceptable to all tree maintainers. With the first point I couldn't agree more, but this is not really what I meant - the point was that despite posting and re-posting there was often silence. And now there is feedback, in form of a patchset made by you; and all that I am asking for is just to be given the time to look that through. Last time a RFC patch appeared at 3pm and was checked in the same evening (only found out next morning). I was too optimistic about that one, feeling that it was safe, sorry about that, will avoid doing that in the future. With your experience and long background as a maintainer maybe you expect quicker turn-around times, but I also had waited patiently until you had had a chance to review the stuff I sent. Agreed, my bad, will be more patient with my side as you've been with yours :-) | | . The code that allocates the RX ring deals with failures when one of the entries in | | the ring buffer is not successfully allocated, the original code was leaking the | | successfully allocated entries. | | Sorry for not point out exactly this, here it goes: | | Your original patch: | | +int tfrc_rx_hist_init(struct tfrc_rx_hist *h) | +{ | + int i; | + | + for (i = 0; i = NDUPACK; i++) { | + h-ring[i] = kmem_cache_alloc(tfrc_rxh_cache, GFP_ATOMIC); | + if (h-ring[i] == NULL) | + return 1; | + } | + h-loss_count = 0; | + h-loss_start = 0; | + return 0; | +} | +EXPORT_SYMBOL_GPL(tfrc_rx_hist_init); | I expected this and it actually was very clear from your original message. I fully back up your solution in this point, see below. But my question above was rather: are there any other bugs rather than the above leakage, which is what the previous email seemed to indicate? With regard to your solution - you are using int tfrc_rx_hist_alloc(struct tfrc_rx_hist *h) { h-ring = kmem_cache_alloc(tfrc_rx_hist_slab, GFP_ATOMIC); h-loss_count = h-loss_start = 0; return h-ring == NULL; } which is better not only for one but for two reasons. It solves the leakage and in addition makes the entire code simpler. Fully agreed. good | | | . I haven't checked if all the code was commited, as I tried to introduce just what was | | immediatelly used, probably we'll need to do some changes when working on the merge | | of your loss intervals code. | Sorry I don't understand this point. | | Let me check now and tell you for sure: | | tfrc_rx_hist_delta_seqno and tfrc_rx_hist_swap were not included, as | they were not used, we should introduce them later, when getting to the | working on the loss interval code. Ah thanks, that was really not clear. Just beginning to work through the set. great |static void ccid3_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb) |{ |// ... |u32 sample, payload_size = skb-len - dccp_hdr(skb)-dccph_doff * 4; | |if (unlikely(hcrx-ccid3hcrx_state == TFRC_RSTATE_NO_DATA)) { |if (is_data_packet) { |do_feedback = FBACK_INITIAL; |ccid3_hc_rx_set_state(sk, TFRC_RSTATE_DATA); |ccid3_hc_rx_update_s(hcrx, payload_size); |} |goto update_records; |} | | == Non-data packets are ignored for the purposes of computing s (this is in the RFC), | consequently update_s() is only called for data packets; using the two following | functions: | | |static inline u32 tfrc_ewma(const u32 avg, const u32 newval, const u8 weight) |{ |return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval; |} | | I hadn't considered that tfrc_ewma would for every packet check if the | avg was 0 and I find it suboptimal now that I look at it, we are just | feeding data packets, no? Yes exactly, only data packets are used for s. |static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) |{ |if (likely(len 0))/* don't update on empty packets (e.g. ACKs) */ |hcrx-ccid3hcrx_s = tfrc_ewma(hcrx-ccid3hcrx_s, len, 9); |} | | And we also
Re: namespace support requires network modules to say GPL
On Mon, 03 Dec 2007 09:24:15 +0100 Romano Giannetti [EMAIL PROTECTED] wrote: On Sat, 2007-12-01 at 22:34 -0500, Mark Lord wrote: Stephen Hemminger wrote:. I spoke too soon earlier, ndiswrapper builds and loads against current 2.6.24-rc3. Vmware and proprietary VPN software probably do not. Once again I don't give a damn, but the enterprise distro vendors certainly care. ... Naw, enterprise (or any other) distro vendors shouldn't have any issues here, since they can just patch their kernels around any issues. Please pardon me for jumping in; What I think is that every time VMware or (worst) ndiswrapper breaks, if you had read the thread... ndiswrapper doesn't break, and vmware driver had some bugs that, once fixed, no no longer break either -- If you want to reach me at my work email, use [EMAIL PROTECTED] For development, discussion and tips for power savings, visit http://www.lesswatts.org -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e1000 driver problems
Lukas Hejtmanek wrote: On Tue, Nov 27, 2007 at 10:23:00AM -0800, Kok, Auke wrote: can you see if your problem goes away with this patch? I cannot test it right now but friend of mine has the same card with 2.6.23.1 kernel. it does not. he also tried module 7.6.12 from source fourge, your patch did not help. Moreover, it just hangs network connections after while. No message in dmesg about hangup. can you open up a ticket on e1000.sf.net and fill in a full bugreport including output of all of these? : - ethttool -i eth0 - ethtool -e eth0 - lspci -vv - full dmesg (not just the driver parts) - dmidecode - cat /proc/interrupts I'd like to see if maybe we can reproduce this in our lab. Thanks, Auke -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel 2.6.23.8: KERNEL: assertion in net/ipv4/tcp_input.c
Am Montag, 3. Dezember 2007 14:34 schrieb Ilpo Järvinen: On Mon, 3 Dec 2007, Wolfgang Walter wrote: with kernel 2.6.23.8 we saw a KERNEL: assertion ((int)tcp_packets_in_flight(tp) = 0) failed at net/ipv4/tcp_input.c (1292) Is this the only message? Are there any Leak printouts? No. 4 days earlier there were 3 messages: TCP: Treason uncloaked! Peer a.b.c.d:80/56532 shrinks window 3535507131:3535513869. Repaired. Any tweaking done to TCP related sysctls? And for completeness, is GSO enabled (ethtool -k)? rx-checksumming: on tx-checksumming: on scatter-gather: on tcp segmentation offload: off udp fragmentation offload: off generic segmentation offload: off Most likely I broke the manual synchronization for left_out in sacktag by skipping over it when packets_out == 0 but so far I haven't been able to figure out how such state could develop in the first place... Ie., I couldn't find a case where tcp_fastretrans_alert wouldn't be called if left_out was non-zero (and it did the sync_left_out after modifying either sacked_out or lost_out, IIRC). ...If you can reproduce it, you could try if this patch below changes I don't know how to reproduce it - we never saw the message before. I'll aply the patch. Let see if the WARN_ON triggers before we update to a newer kernel :-). anything (should silence the assert and trigger earlier a WARN_ON or two :-)). ...If this triggers, then I'm sure we can pollute TCP code by a larger number of more costly checks to catch it in early. This might reveal a long-standing inconsistency of left_out in some case I just couldn't come up with by code review. Left_out will be (is) anyway dropped as unnecessary in 2.6.24. In 2.6.23 sync for left_out occurs quite soon after that BUG_TRAP anyway so the effect won't be too dramatic, prior_in_flight would be once stale, won't lead to big problems (either missed cnwd or cwnd_cnt increment, or failure to do application limited check at that particular ACK). Thanks anyway for the report. ...If I figure something out here, I'll let you know. -- diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c9298a7..0c5194d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1012,8 +1012,12 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ if (before(TCP_SKB_CB(ack_skb)-ack_seq, prior_snd_una - tp-max_window)) return 0; - if (!tp-packets_out) + if (!tp-packets_out) { + WARN_ON(tp-sacked_out); + WARN_ON(tp-lost_out); + WARN_ON(tp-left_out); goto out; + } /* SACK fastpath: * if the only SACK change is the increase of the end_seq of @@ -1277,14 +1281,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_ } } +out: + tp-left_out = tp-sacked_out + tp-lost_out; if ((reord tp-fackets_out) icsk-icsk_ca_state != TCP_CA_Loss (!tp-frto_highmark || after(tp-snd_una, tp-frto_highmark))) tcp_update_reordering(sk, ((tp-fackets_out + 1) - reord), 0); -out: - #if FASTRETRANS_DEBUG 0 BUG_TRAP((int)tp-sacked_out = 0); BUG_TRAP((int)tp-lost_out = 0); Thanks and regards, -- Wolfgang Walter Studentenwerk München Anstalt des öffentlichen Rechts -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCHES 0/7]: Reorganization of RX history patches
| | static inline void ccid3_hc_rx_update_s(struct ccid3_hc_rx_sock *hcrx, int len) | | { | | if (likely(len 0))/* don't update on empty packets (e.g. ACKs) */ | | hcrx-ccid3hcrx_s = tfrc_ewma(hcrx-ccid3hcrx_s, len, 9); | | } | | | | And we also just do test for len 0 in update_s, that looks like | | also excessive, no? | Hm, I think we need to make it robust against API bugs and/or zero-sized | data packets. The check `len 0' may seem redundant but it catches such | a condition. For a moving average an accidental zero value will have | quite an impact on the overall value. In CCID3 it is | | x_new = 0.9 * x_old + 0.1 * len | | So if len is accidentally 0 where it should not be, then each time the | moving average is reduced to 90%. | | So we must make it a BUG_ON, not something that is to be always present. | I think it should be a warning condition since it can be triggered when the remote party sends zero-sized packets. It may be good to log this into the syslog to warn about possibly misbehaving apps/peers/remote stacks. | As a comparison - the entire patch set took about a full month to do. | But that meant I am reasonably sure the algorithm is sound and can cope | with problematic conditions. | | And from what I saw so far that is my impression too, if you look at | what I'm doing it is: | | 1. go thru the whole patch trying to understand hunk by hunk You are doing a great job - in particular as it really is a lot of material. | 2. do consistency changes (add namespace prefixes) | 3. reorganize the code to look more like what is already there, we |both have different backgrounds and tastes about how code should |be written, so its only normal that if we want to keep the code |consistent, and I want that, I jump into things I think should be |reworded, while trying to keep the algorithm expressed by you. | Agree, that is not always easy to get right. I try to stick as close as possible to existing conventions but of course that is my interpretation, so I am already anticipating such changes/comments here. | think about further automatization on regression testing. | If it is of any use, some scripts and setups are at the bottom of the page at http://www.erg.abdn.ac.uk/users/gerrit/dccp/testing_dccp/ -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2.6.25] net: removes unnecessary dependencies for net_namespace.h
Looks good to me. Acked-by: Eric W. Biederman [EMAIL PROTECTED] Eric Denis V. Lunev [EMAIL PROTECTED] writes: you right, how about this? Eric W. Biederman wrote: Denis V. Lunev [EMAIL PROTECTED] writes: This patch removes some unneeded includes for net_namespace.h to speed up compilation. Signed-off-by: Denis V. Lunev [EMAIL PROTECTED] diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index f285de6..28b7f25 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -2,7 +2,6 @@ #define __NET_PKT_CLS_H #include linux/pkt_cls.h -#include net/net_namespace.h #include net/sch_generic.h #include net/act_api.h What of tcf_match_indev? I guess linux/netdevice.h brings it in for us but still. diff --git a/include/net/sock.h b/include/net/sock.h index 43e3cd9..a04e361 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -57,7 +57,6 @@ #include asm/atomic.h #include net/dst.h #include net/checksum.h -#include net/net_namespace.h /* * This structure really needs to be cleaned up. @@ -95,6 +94,7 @@ typedef struct { struct sock; struct proto; +struct net; /** * struct sock_common - minimal network layer representation of sockets This hunk definitely looks sane. Eric diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index f285de6..2eaf204 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -2,7 +2,6 @@ #define __NET_PKT_CLS_H #include linux/pkt_cls.h -#include net/net_namespace.h #include net/sch_generic.h #include net/act_api.h @@ -336,6 +335,8 @@ static inline int tcf_valid_offset(const struct sk_buff *skb, } #ifdef CONFIG_NET_CLS_IND +#include net/net_namespace.h + static inline int tcf_change_indev(struct tcf_proto *tp, char *indev, struct rtattr *indev_tlv) { diff --git a/include/net/sock.h b/include/net/sock.h index 43e3cd9..a04e361 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -57,7 +57,6 @@ #include asm/atomic.h #include net/dst.h #include net/checksum.h -#include net/net_namespace.h /* * This structure really needs to be cleaned up. @@ -95,6 +94,7 @@ typedef struct { struct sock; struct proto; +struct net; /** * struct sock_common - minimal network layer representation of sockets -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] netem: trace enhancement
Ariane Keller wrote: Patrick McHardy wrote: I dislike using anything besides rtnetlink for qdisc configuration. The only way to transfer arbitary amounts of data over netlink would be to spread the data over multiple messages. But then again, you're using kmalloc and only seem to allocate 4k, so how large are these traces in practice? For each packet to be processed there is 32bit of data, which encodes delay and drop, duplicate etc. The size of the actual trace file can therefore reach any length, depending on for how many packets the information is encoded (up to several GB). Therefore we send the trace file in chunks of 4000bytes to the kernel. In order to have always a packet-delay-value ready, we maintain two delay queues in the kernel (each of 4k). In a first step, both queues are filled, and the values are read from the first queue, if this queue is finished, we read values from the second queue and fill the first queue with new values from the trace file etc. Therefore we have a user space process running, which reads the values from the trace file, and sends them to the kernel. That sounds like it would also be possible using rtnetlink. You could send out a notification whenever you switch the active buffer and have userspace listen to these and replace the inactive one. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: namespace support requires network modules to say GPL
Patrick McHardy [EMAIL PROTECTED] writes: Ben Greear wrote: I have a binary module that uses dev_get_by_name...it's sort of a bridge-like thing and needs user-space to tell it which device to listen for packets on... This code doesn't need or care about name-spaces, so I don't see how it could really be infringing on the author's code (any worse than loading a binary driver into the kernel ever does). Regardless of infringement it is incompatible with a complete network namespace implementation. Further it sounds like the module you are describing defines a kernel ABI without being merged and hopes that ABI will still be supportable in the future. Honestly I think doing so is horrible code maintenance policy. I would certainly prefer to not have to patch around any problems with calling dev_get_by_name from a non-gpl module, but if required, I can probably figure something out... For all I care binary modules can break, but frankly I don't see how encapsulating a couple of structures and pointers in a new structure and adding a new argument to existing functions shifts the decision about how a function should be usable to the namespace guys. IMO all functions should continue to be usable as before, as decided by whoever actually wrote them. The only exception might be stuff where an existing EXPORT_SYMBOL is clearly wrong, but that would be a seperate discussion. I don't think we have actually shifted the decision. Further from a namespace perspective if I had to support out of tree modules and the current in kernel API the implementation would be impossible short of loading kernel modules multiple times once for each namespace. I totally refuse to give out of tree modules that power whatever their license. Right now the network namespace code that has been merged isn't that interesting as it does not include ipv4 and ipv6 support which everyone uses. One of the tests for completion of the network namespace work is grepping for init_net and making certain we have cleanly removed all references to except in a handful of cases like the boot code. Once things are largely complete it makes sense to argue with out of tree module authors that because they don't have network namespace support in their modules, their modules are broken. Right now I suspect to many developers even of in-tree modules I have just shifted code around in an annoying looking way. I can completely see other developers not getting the point. Eric -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: namespace support requires network modules to say GPL
Romano Giannetti [EMAIL PROTECTED] writes: Please pardon me for jumping in; I am not a kernel developer, but I try to help with debugging whenever I can (and it's not just hand-waving, I helped to track down a couple of nasty bugs on MMC or ACPI EC, recently). And I am an engineer and IANAL, so I wouldn't speak about laws here. But I think it's not just a distribution's problem. Unfortunately, I need VMware and ndiswrapper to get work done with my laptop. It's not the perfect world, but the only alternative is to boot in XP. So I normally stick with vendors kernels and, when I have time to play with new kernel, I go for it. If ndiswrapper and VMware work, perfect, I can test extensively the new kernel; if I find problems, I *know* I have to restart without proprietary modules, try to reproduce, report back. I did it a lot of times. What I think is that every time VMware or (worst) ndiswrapper breaks, the kernel loose an awful lot of testers. In the span of time before Giri and the VMware team post a patch (-rc1 and -rc4, tipically), my testing activity is just occasional. And I guess a lot of people is in the same situation. These are just my 2cents. I will continue to test new kernels every time I can, and to use native solutions as often as I can (go, ath5k, go!; and LabWindows/CVI for Linux, anyone?). But maybe a bit of tolerance can help everyone... As a kernel developer let me say thank you for doing what testing you can. I think a bit of tolerance for others can help the conversation. At the same time since out of tree modules (even GPL'd ones) have not chosen to play with us we have to move forward as best we can without their input. It isn't possible to do anything else. Right now I have made some changes for good technical reasons, and some out of tree modules have broken. Regardless of the flavor of EXPORT_SYMBOL they would have broken. Based on my experience with in-tree code and the few glimpses I have gotten of out of tree code the reason the out of tree code broke is because it is doing very questionable things. So the best I can say at this point, is my apologies that we have not served you better and made it possible to do what you need to do without relying on code of questionable character. Hopefully this situation will be better in the future. Eric -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: namespace support requires network modules to say GPL
Eric W. Biederman wrote: Patrick McHardy [EMAIL PROTECTED] writes: Ben Greear wrote: I have a binary module that uses dev_get_by_name...it's sort of a bridge-like thing and needs user-space to tell it which device to listen for packets on... This code doesn't need or care about name-spaces, so I don't see how it could really be infringing on the author's code (any worse than loading a binary driver into the kernel ever does). Regardless of infringement it is incompatible with a complete network namespace implementation. Further it sounds like the module you are describing defines a kernel ABI without being merged and hopes that ABI will still be supportable in the future. Honestly I think doing so is horrible code maintenance policy. I don't mind if the ABI changes, so long as I can still use something similar. The namespace logic is interesting to me in general, but at this point I can't think of a way that it actually helps this particular module. All I really need is a way to grab every frame from eth0 and then transmit it to eth1. I'm currently doing this by finding the netdevice and registering a raw-packet protocol (ie, like tcpdump would do). At least up to 2.6.23, this does not require any hacks to the kernel and uses only non GPL exported symbols. Based on my understanding of the namespace logic, if I never add any namespaces, the general network layout should look similar to how it does today, so I should have no logical problem with my module. Once things are largely complete it makes sense to argue with out of tree module authors that because they don't have network namespace support in their modules, their modules are broken. Does this imply that every module that accesses the network code *must* become GPL simply because it must interact with namespace logic that is exported as GPL only symbols? Thanks, Ben -- Ben Greear [EMAIL PROTECTED] Candela Technologies Inc http://www.candelatech.com -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] netem: trace enhancement
Patrick McHardy wrote: That sounds like it would also be possible using rtnetlink. You could send out a notification whenever you switch the active buffer and have userspace listen to these and replace the inactive one. Also, I think you will need a larger cache than 4-8k if you are running higher speeds (100,000 pps, etc), as you probably can't rely on user-space responding reliably every 10ms (or even less time for faster speeds.) Thanks, Ben -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Ben Greear [EMAIL PROTECTED] Candela Technologies Inc http://www.candelatech.com -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: namespace support requires network modules to say GPL
From: [EMAIL PROTECTED] (Eric W. Biederman) Date: Mon, 03 Dec 2007 11:03:38 -0700 Based on my experience with in-tree code and the few glimpses I have gotten of out of tree code the reason the out of tree code broke is because it is doing very questionable things. Calling dev_get_by_foo() was never ever a very questionable thing. Stop saying bullshit, because that's all that is coming out of your mouth in this thread. The fact is, these modules called perfectly fine interfaces and by adding namespaces YOU BROKE THEM. That by itself is OK, they can make the code changes to adapt and use the init namespace. Enforcing new licensing restrictions on them for existing interfaces just because you add a new freaking argument that is practically speaking a constant and always the same right now, on the other hand, IS NOT FINE and you must fix this now. I don't care how you do it. If you don't want them to get at the init namespace symbol, fine, revert all the dev_get_by_*() interfaces to not take the namespace symbol and make them internally use the init namespace albeit invisibly to the caller. Then you make all the existing call sites invoke new dev_get_by_*_ns() interfaces that take an explicit argument. But only do this where it is truly necessary, everything uses the init namespace practically speaking and it's clearer if you conver to the *_ns() variant when the code itself is converted. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: namespace support requires network modules to say GPL
Ben Greear wrote: Eric W. Biederman wrote: Patrick McHardy [EMAIL PROTECTED] writes: Ben Greear wrote: I have a binary module that uses dev_get_by_name...it's sort of a bridge-like thing and needs user-space to tell it which device to listen for packets on... This code doesn't need or care about name-spaces, so I don't see how it could really be infringing on the author's code (any worse than loading a binary driver into the kernel ever does). Regardless of infringement it is incompatible with a complete network namespace implementation. Further it sounds like the module you are describing defines a kernel ABI without being merged and hopes that ABI will still be supportable in the future. Honestly I think doing so is horrible code maintenance policy. I don't mind if the ABI changes, so long as I can still use something similar. The namespace logic is interesting to me in general, but at this point I can't think of a way that it actually helps this particular module. All I really need is a way to grab every frame from eth0 and then transmit it to eth1. I'm currently doing this by finding the netdevice and registering a raw-packet protocol (ie, like tcpdump would do). At least up to 2.6.23, this does not require any hacks to the kernel and uses only non GPL exported symbols. Based on my understanding of the namespace logic, if I never add any namespaces, the general network layout should look similar to how it does today, so I should have no logical problem with my module. Once things are largely complete it makes sense to argue with out of tree module authors that because they don't have network namespace support in their modules, their modules are broken. Does this imply that every module that accesses the network code *must* become GPL simply because it must interact with namespace logic that is exported as GPL only symbols? That's right, with init_net's EXPORT_SYMBOL_GPL and dev_get_xx, we enforce people to be GPL whatever they didn't asked to have the namespaces in their code. Eric, why can we simply change EXPORT_SYMBOL_GPL to EXPORT_SYMBOL for init_net ? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] ucc_geth: use rx-clock-name and tx-clock-name device tree properties
Updates the ucc_geth device driver to check the new rx-clock-name and tx-clock-name properties first. If present, it uses the new function qe_clock_source() to obtain the clock source. Otherwise, it checks the deprecated rx-clock and tx-clock properties. Update the device trees for 832x, 836x, and 8568 to contain the new property names only. Signed-off-by: Timur Tabi [EMAIL PROTECTED] --- This patch applies to Kumar's for-2.6.25 branch. ucc_geth will compile but not run if my other patch, qe: add function qe_clock_source has not also been applied. arch/powerpc/boot/dts/mpc832x_mds.dts |8 ++-- arch/powerpc/boot/dts/mpc832x_rdb.dts |8 ++-- arch/powerpc/boot/dts/mpc836x_mds.dts |8 ++-- arch/powerpc/boot/dts/mpc8568mds.dts |8 ++-- drivers/net/ucc_geth.c| 55 ++-- 5 files changed, 67 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/boot/dts/mpc832x_mds.dts b/arch/powerpc/boot/dts/mpc832x_mds.dts index c64f303..fe54489 100644 --- a/arch/powerpc/boot/dts/mpc832x_mds.dts +++ b/arch/powerpc/boot/dts/mpc832x_mds.dts @@ -223,8 +223,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = 19; - tx-clock = 1a; + rx-clock-name = clk9; + tx-clock-name = clk10; phy-handle = phy3 ; pio-handle = pio3 ; }; @@ -244,8 +244,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = 17; - tx-clock = 18; + rx-clock-name = clk7; + tx-clock-name = clk8; phy-handle = phy4 ; pio-handle = pio4 ; }; diff --git a/arch/powerpc/boot/dts/mpc832x_rdb.dts b/arch/powerpc/boot/dts/mpc832x_rdb.dts index 388c8a7..e68a08b 100644 --- a/arch/powerpc/boot/dts/mpc832x_rdb.dts +++ b/arch/powerpc/boot/dts/mpc832x_rdb.dts @@ -202,8 +202,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = 20; - tx-clock = 13; + rx-clock-name = clk16; + tx-clock-name = clk3; phy-handle = phy00; pio-handle = ucc2pio; }; @@ -223,8 +223,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = 19; - tx-clock = 1a; + rx-clock-name = clk9; + tx-clock-name = clk10; phy-handle = phy04; pio-handle = ucc3pio; }; diff --git a/arch/powerpc/boot/dts/mpc836x_mds.dts b/arch/powerpc/boot/dts/mpc836x_mds.dts index 0b2d2b5..bfd48d0 100644 --- a/arch/powerpc/boot/dts/mpc836x_mds.dts +++ b/arch/powerpc/boot/dts/mpc836x_mds.dts @@ -254,8 +254,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = 0; - tx-clock = 19; + rx-clock-name = none; + tx-clock-name = clk9; phy-handle = phy0 ; phy-connection-type = rgmii-id; pio-handle = pio1 ; @@ -276,8 +276,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = 0; - tx-clock = 14; + rx-clock-name = none; + tx-clock-name = clk4; phy-handle = phy1 ; phy-connection-type = rgmii-id; pio-handle = pio2 ; diff --git a/arch/powerpc/boot/dts/mpc8568mds.dts b/arch/powerpc/boot/dts/mpc8568mds.dts index 5439437..cf45aab 100644 --- a/arch/powerpc/boot/dts/mpc8568mds.dts +++ b/arch/powerpc/boot/dts/mpc8568mds.dts @@ -333,8 +333,8 @@ */ mac-address = [ 00 00 00 00 00 00 ]; local-mac-address = [ 00 00 00 00 00 00 ]; - rx-clock = 0; - tx-clock = 20; + rx-clock-name = none; + tx-clock-name = clk16; pio-handle = pio1; phy-handle = phy0;
[PATCH 0/2] QE clock source improvements
This patch set adds a new property to make specifying QE clock sources easier, adds a function to help parse the property, and updates the ucc_geth driver to take advantage of all this. Patch #1 is an arch/powerpc patch meant for Kumar's for-2.6.25 branch. Patch #2 is a netdev patch, so it's either for Jeff G and/or Kumar. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] qe: add function qe_clock_source()
Add function qe_clock_source() which takes a string containing the name of a QE clock source (as is typically found in device trees) and returns the matching enum qe_clock value. Update booting-without-of.txt to indicate that the UCC properties rx-clock and tx-clock are deprecated and replaced with rx-clock-name and tx-clock-name, which use strings instead of numbers to indicate QE clock sources. Signed-off-by: Timur Tabi [EMAIL PROTECTED] --- This patch applies to Kumar's for-2.6.25 branch. You may need to apply my other pending patch, qe: add ability to upload QE firmware, first. Documentation/powerpc/booting-without-of.txt | 13 ++ arch/powerpc/sysdev/qe_lib/qe.c | 32 ++ include/asm-powerpc/qe.h |1 + 3 files changed, 46 insertions(+), 0 deletions(-) diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt index e9a3cb1..c4342d3 100644 --- a/Documentation/powerpc/booting-without-of.txt +++ b/Documentation/powerpc/booting-without-of.txt @@ -1626,6 +1626,19 @@ platforms are moved over to use the flattened-device-tree model. - interrupt-parent : the phandle for the interrupt controller that services interrupts for this device. - pio-handle : The phandle for the Parallel I/O port configuration. + - rx-clock-name: the UCC receive clock source + none: clock source is disabled + brg1 through brg16: clock source is BRG1-BRG16, respectively + clk1 through clk24: clock source is CLK1-CLK24, respectively + - tx-clock-name: the UCC transmit clock source + none: clock source is disabled + brg1 through brg16: clock source is BRG1-BRG16, respectively + clk1 through clk24: clock source is CLK1-CLK24, respectively + The following two properties are deprecated. rx-clock has been replaced + with rx-clock-name, and tx-clock has been replaced with tx-clock-name. + Drivers that currently use the deprecated properties should continue to + do so, in order to support older device trees, but they should be updated + to check for the new properties first. - rx-clock : represents the UCC receive clock source. 0x00 : clock source is disabled; 0x1~0x10 : clock source is BRG1~BRG16 respectively; diff --git a/arch/powerpc/sysdev/qe_lib/qe.c b/arch/powerpc/sysdev/qe_lib/qe.c index 865277b..5df8530 100644 --- a/arch/powerpc/sysdev/qe_lib/qe.c +++ b/arch/powerpc/sysdev/qe_lib/qe.c @@ -204,6 +204,38 @@ int qe_setbrg(enum qe_clock brg, unsigned int rate, unsigned int multiplier) } EXPORT_SYMBOL(qe_setbrg); +/* Convert a string to a QE clock source enum + * + * This function takes a string, typically from a property in the device + * tree, and returns the corresponding enum qe_clock value. +*/ +enum qe_clock qe_clock_source(const char *source) +{ + unsigned int i; + + if (strcasecmp(source, none) == 0) + return QE_CLK_NONE; + + if (strncasecmp(source, brg, 3) == 0) { + i = simple_strtoul(source + 3, NULL, 10); + if ((i = 1) (i = 16)) + return (QE_BRG1 - 1) + i; + else + return QE_CLK_DUMMY; + } + + if (strncasecmp(source, clk, 3) == 0) { + i = simple_strtoul(source + 3, NULL, 10); + if ((i = 1) (i = 24)) + return (QE_CLK1 - 1) + i; + else + return QE_CLK_DUMMY; + } + + return QE_CLK_DUMMY; +} +EXPORT_SYMBOL(qe_clock_source); + /* Initialize SNUMs (thread serial numbers) according to * QE Module Control chapter, SNUM table */ diff --git a/include/asm-powerpc/qe.h b/include/asm-powerpc/qe.h index 35c7b8d..430dc77 100644 --- a/include/asm-powerpc/qe.h +++ b/include/asm-powerpc/qe.h @@ -84,6 +84,7 @@ extern int par_io_data_set(u8 port, u8 pin, u8 val); /* QE internal API */ int qe_issue_cmd(u32 cmd, u32 device, u8 mcn_protocol, u32 cmd_input); +enum qe_clock qe_clock_source(const char *source); int qe_setbrg(enum qe_clock brg, unsigned int rate, unsigned int multiplier); int qe_get_snum(void); void qe_put_snum(u8 snum); -- 1.5.2.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [README] away until Dec 3rd
On Thu, Nov 22, 2007 at 12:27:47PM +0800, Herbert Xu wrote: On Tue, Nov 20, 2007 at 08:29:21PM -0800, David Miller wrote: During this time Herbert Xu (CC:'d) will take care of both the net-2.6 stable tree and the net-2.6.25 devel tree. For this duration please use the net-2.6.25 tree at this location for basing your patches: OK, David is back now. If I haven't taken your patches yet, then please resend them to [EMAIL PROTECTED] because all previous submissions may have been deleted. Andrew, you can stop pulling my tree and start pulling David's now. git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25.git/ Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix memory corruption in fec_mpc52xx
From: Jon Smirl [EMAIL PROTECTED] The mpc5200 fec driver is corrupting memory. This patch fixes two bugs where the wrong skb was being referenced. Signed-off-by: Jon Smirl [EMAIL PROTECTED] Acked-by: Domen Puncer [EMAIL PROTECTED] Signed-off-by: Grant Likely [EMAIL PROTECTED] Signed-off-by: David Woodhouse [EMAIL PROTECTED] --- a/drivers/net/fec_mpc52xx.c +++ b/drivers/net/fec_mpc52xx.c @@ -422,7 +422,7 @@ static irqreturn_t mpc52xx_fec_rx_interrupt(int irq, void *dev_id) rskb = bcom_retrieve_buffer(priv-rx_dmatsk, status, (struct bcom_bd **)bd); - dma_unmap_single(dev-dev, bd-skb_pa, skb-len, DMA_FROM_DEVICE); + dma_unmap_single(dev-dev, bd-skb_pa, rskb-len, DMA_FROM_DEVICE); /* Test for errors in received frame */ if (status BCOM_FEC_RX_BD_ERRORS) { @@ -467,7 +467,7 @@ static irqreturn_t mpc52xx_fec_rx_interrupt(int irq, void *dev_id) bcom_prepare_next_buffer(priv-rx_dmatsk); bd-status = FEC_RX_BUFFER_SIZE; - bd-skb_pa = dma_map_single(dev-dev, rskb-data, + bd-skb_pa = dma_map_single(dev-dev, skb-data, FEC_RX_BUFFER_SIZE, DMA_FROM_DEVICE); bcom_submit_next_buffer(priv-rx_dmatsk, skb); -- dwmw2 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix memory corruption in fec_mpc52xx
On 12/3/07, David Woodhouse [EMAIL PROTECTED] wrote: From: Jon Smirl [EMAIL PROTECTED] The mpc5200 fec driver is corrupting memory. This patch fixes two bugs where the wrong skb was being referenced. What does it take to get this patch into the kernel? I've been posting it since Nov 9. It is an obvious bug. I can't even boot one of my boxes without it. -- Jon Smirl [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] TCP illinois max rtt aging
Greetings Stephen, Thanks. We'll have to play with the rate of ageing. I used the slower ageing if (ca-cnt_rtt 3) { u64 mean_rtt = ca-sum_rtt; do_div (mean_rtt, ca-cnt_rtt); if (ca-max_rtt mean_rtt) ca-max_rtt -= (ca-max_rtt - mean_rtt) 9; } and still found that the max_rtt drops considerably within a congestion epoch. What would also really help would be getting rid of RTT outliers somehow. I ignore RTT measurements when SACK is active: if (ca-max_rtt rtt) { struct tcp_sock *tp = tcp_sk(sk); if (! tp-sacked_out ) // SACKs cause hi-CPU/hi-RTT. ignore ca-max_rtt = rtt; } which helps a lot, but still gets some outliers. Would it be possible to time-stamp packets in the hardware interrupt handler, instead of waiting for the post-processing stage? Cheers, Lachlan On 03/12/2007, Stephen Hemminger [EMAIL PROTECTED] wrote: On Wed, 28 Nov 2007 21:26:12 -0800 Shao Liu [EMAIL PROTECTED] wrote: Hi Stephen and Lachlan, Thanks for pointing out and fixing this bug. For the max RTT problem, I have considered it also and I have some idea on improve it. I also have some other places to improve. I will summarize all my new ideas and send you an update. For me to change it, could you please give me a link to download to latest source codes for the whole congestion control module in Linux implementation, including the general module for all algorithms, and the implementation for specific algorithms like TCP-Illinois and H-TCP? Thanks for the help! -Shao -Original Message- From: Stephen Hemminger [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 28, 2007 4:44 PM To: Lachlan Andrew Cc: David S. Miller; Herbert Xu; [EMAIL PROTECTED]; Douglas Leith; Robert Shorten; netdev@vger.kernel.org Subject: Re: [PATCH] tcp-illinois: incorrect beta usage Lachlan Andrew wrote: Thanks Stephen. A related problem (largely due to the published algorithm itself) is that Illinois is very aggressive when it over-estimates the maximum RTT. At high load (say 200Mbps and 200ms RTT), a backlog of packets builds up just after a loss, causing the RTT estimate to become large. This makes Illinois think that *all* losses are due to corruption not congestion, and so only back off by 1/8 instead of 1/2. I can't think how to fix this except by better RTT estimation, or changes to Illinois itself. Currently, I ignore RTT measurements when sacked_out != 0and have a heuristic RTT aging mechanism, but that's pretty ugly. Cheers, Lachlan Ageing the RTT estimates needs to be done anyway. Maybe something can be reused from H-TCP. The two are closely related. The following adds gradual aging of max RTT. --- a/net/ipv4/tcp_illinois.c 2007-11-29 08:58:35.0 -0800 +++ b/net/ipv4/tcp_illinois.c 2007-11-29 09:37:33.0 -0800 @@ -63,7 +63,10 @@ static void rtt_reset(struct sock *sk) ca-cnt_rtt = 0; ca-sum_rtt = 0; - /* TODO: age max_rtt? */ + /* add slowly fading memory for maxRTT to accommodate routing changes */ + if (ca-max_rtt ca-base_rtt) + ca-max_rtt = ca-base_rtt + + (((ca-max_rtt - ca-base_rtt) * 31) 5); } static void tcp_illinois_init(struct sock *sk) -- Lachlan Andrew Dept of Computer Science, Caltech 1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA Ph: +1 (626) 395-8820Fax: +1 (626) 568-3603 http://netlab.caltech.edu/~lachlan -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC] TCP illinois max rtt aging
On Wed, 28 Nov 2007 21:26:12 -0800 Shao Liu [EMAIL PROTECTED] wrote: Hi Stephen and Lachlan, Thanks for pointing out and fixing this bug. For the max RTT problem, I have considered it also and I have some idea on improve it. I also have some other places to improve. I will summarize all my new ideas and send you an update. For me to change it, could you please give me a link to download to latest source codes for the whole congestion control module in Linux implementation, including the general module for all algorithms, and the implementation for specific algorithms like TCP-Illinois and H-TCP? Thanks for the help! -Shao -Original Message- From: Stephen Hemminger [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 28, 2007 4:44 PM To: Lachlan Andrew Cc: David S. Miller; Herbert Xu; [EMAIL PROTECTED]; Douglas Leith; Robert Shorten; netdev@vger.kernel.org Subject: Re: [PATCH] tcp-illinois: incorrect beta usage Lachlan Andrew wrote: Thanks Stephen. A related problem (largely due to the published algorithm itself) is that Illinois is very aggressive when it over-estimates the maximum RTT. At high load (say 200Mbps and 200ms RTT), a backlog of packets builds up just after a loss, causing the RTT estimate to become large. This makes Illinois think that *all* losses are due to corruption not congestion, and so only back off by 1/8 instead of 1/2. I can't think how to fix this except by better RTT estimation, or changes to Illinois itself. Currently, I ignore RTT measurements when sacked_out != 0and have a heuristic RTT aging mechanism, but that's pretty ugly. Cheers, Lachlan Ageing the RTT estimates needs to be done anyway. Maybe something can be reused from H-TCP. The two are closely related. The following adds gradual aging of max RTT. --- a/net/ipv4/tcp_illinois.c 2007-11-29 08:58:35.0 -0800 +++ b/net/ipv4/tcp_illinois.c 2007-11-29 09:37:33.0 -0800 @@ -63,7 +63,10 @@ static void rtt_reset(struct sock *sk) ca-cnt_rtt = 0; ca-sum_rtt = 0; - /* TODO: age max_rtt? */ + /* add slowly fading memory for maxRTT to accommodate routing changes */ + if (ca-max_rtt ca-base_rtt) + ca-max_rtt = ca-base_rtt + + (((ca-max_rtt - ca-base_rtt) * 31) 5); } static void tcp_illinois_init(struct sock *sk) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Fw: [Bug 9493] New: routing table bug or feature adding/replacing/deleting routes
Begin forwarded message: Date: Mon, 3 Dec 2007 11:23:27 -0800 (PST) From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: [Bug 9493] New: routing table bug or feature adding/replacing/deleting routes http://bugzilla.kernel.org/show_bug.cgi?id=9493 Summary: routing table bug or feature adding/replacing/deleting routes Product: Networking Version: 2.5 KernelVersion: 2.6.23 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: IPV4 AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] Most recent kernel where this bug did not occur: unknown Distribution: Ubuntu (kernel 2.6.22-14), Gentoo (kernel 2.6.23-gentoo-r3) Hardware Environment: x86 Software Environment: Steps to reproduce: (eth0 is 172.16.32.0/22): $ ip route add 172.17.0.1 via 172.16.32.1 $ ip route append 172.17.0.1 via 172.16.35.1 $ ip ro ... 172.17.0.1 via 172.16.32.1 dev eth0 172.17.0.1 via 172.16.35.1 dev eth0 ... $ ip ro replace 172.17.0.1 via 172.16.35.1 $ ip ro ... 172.17.0.1 via 172.16.35.1 dev eth0 172.17.0.1 via 172.16.35.1 dev eth0 ... $ ip ro del 172.17.0.1 $ ip ro ... 172.17.0.1 via 172.16.35.1 dev eth0 ... Problem Description: Is it OK: - only 1 route is deleted? - only 1 route is replaced? - replacing route results in dublicate routing table entry? Is this a bug or feature? -- Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee. -- Stephen Hemminger [EMAIL PROTECTED] -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC] TCP illinois max rtt aging
Hi Stephen and Lachlan, Thanks for the discussion. The gradual aging is surely an option. And another possibility is that, we compute the RTT just before congestion notification, which ideally, represent the full queueing delay + propagation delay. We can compute the average of the last M such values, and either use the average as maxRTT, or use it as a benchmark to judge whether a sample is outlier. How do you think of this idea? And also a question, why the samples when SACK is active are outliers? For the accuracy of time stamping, I am not very familiar with the implementation details. But I can think of two ways, 1) do time stamp in as low layer as possible; 2) use as high priority thread to do it as possible. For 2), we can use separate threads to do time stamp and to process packets. Thanks and I will let you know more of my thoughts after I go over the entire code space! -Shao -Original Message- From: Lachlan Andrew [mailto:[EMAIL PROTECTED] Sent: Monday, December 03, 2007 3:06 PM To: Stephen Hemminger Cc: [EMAIL PROTECTED]; David S. Miller; Herbert Xu; Douglas Leith; Robert Shorten; netdev@vger.kernel.org Subject: Re: [RFC] TCP illinois max rtt aging Greetings Stephen, Thanks. We'll have to play with the rate of ageing. I used the slower ageing if (ca-cnt_rtt 3) { u64 mean_rtt = ca-sum_rtt; do_div (mean_rtt, ca-cnt_rtt); if (ca-max_rtt mean_rtt) ca-max_rtt -= (ca-max_rtt - mean_rtt) 9; } and still found that the max_rtt drops considerably within a congestion epoch. What would also really help would be getting rid of RTT outliers somehow. I ignore RTT measurements when SACK is active: if (ca-max_rtt rtt) { struct tcp_sock *tp = tcp_sk(sk); if (! tp-sacked_out ) // SACKs cause hi-CPU/hi-RTT. ignore ca-max_rtt = rtt; } which helps a lot, but still gets some outliers. Would it be possible to time-stamp packets in the hardware interrupt handler, instead of waiting for the post-processing stage? Cheers, Lachlan On 03/12/2007, Stephen Hemminger [EMAIL PROTECTED] wrote: On Wed, 28 Nov 2007 21:26:12 -0800 Shao Liu [EMAIL PROTECTED] wrote: Hi Stephen and Lachlan, Thanks for pointing out and fixing this bug. For the max RTT problem, I have considered it also and I have some idea on improve it. I also have some other places to improve. I will summarize all my new ideas and send you an update. For me to change it, could you please give me a link to download to latest source codes for the whole congestion control module in Linux implementation, including the general module for all algorithms, and the implementation for specific algorithms like TCP-Illinois and H-TCP? Thanks for the help! -Shao -Original Message- From: Stephen Hemminger [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 28, 2007 4:44 PM To: Lachlan Andrew Cc: David S. Miller; Herbert Xu; [EMAIL PROTECTED]; Douglas Leith; Robert Shorten; netdev@vger.kernel.org Subject: Re: [PATCH] tcp-illinois: incorrect beta usage Lachlan Andrew wrote: Thanks Stephen. A related problem (largely due to the published algorithm itself) is that Illinois is very aggressive when it over-estimates the maximum RTT. At high load (say 200Mbps and 200ms RTT), a backlog of packets builds up just after a loss, causing the RTT estimate to become large. This makes Illinois think that *all* losses are due to corruption not congestion, and so only back off by 1/8 instead of 1/2. I can't think how to fix this except by better RTT estimation, or changes to Illinois itself. Currently, I ignore RTT measurements when sacked_out != 0and have a heuristic RTT aging mechanism, but that's pretty ugly. Cheers, Lachlan Ageing the RTT estimates needs to be done anyway. Maybe something can be reused from H-TCP. The two are closely related. The following adds gradual aging of max RTT. --- a/net/ipv4/tcp_illinois.c 2007-11-29 08:58:35.0 -0800 +++ b/net/ipv4/tcp_illinois.c 2007-11-29 09:37:33.0 -0800 @@ -63,7 +63,10 @@ static void rtt_reset(struct sock *sk) ca-cnt_rtt = 0; ca-sum_rtt = 0; - /* TODO: age max_rtt? */ + /* add slowly fading memory for maxRTT to accommodate routing changes */ + if (ca-max_rtt ca-base_rtt) + ca-max_rtt = ca-base_rtt + + (((ca-max_rtt - ca-base_rtt) * 31) 5); } static void tcp_illinois_init(struct sock *sk) -- Lachlan Andrew Dept of Computer Science, Caltech 1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA Ph: +1 (626) 395-8820Fax: +1 (626) 568-3603 http://netlab.caltech.edu/~lachlan -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at
Re: [PATCH 2/4] datagram: mem_scheudle functions
Herbert Xu wrote: On Wed, Nov 28, 2007 at 01:52:59PM -0500, Hideo AOKI wrote: +static inline int sk_wmem_schedule(struct sock *sk, int size) +{ +if (sk-sk_type == SOCK_STREAM) +return sk_stream_wmem_schedule(sk, size); +else if (sk-sk_type == SOCK_DGRAM) +return sk_datagram_wmem_schedule(sk, size); +else +return 1; +} Why do we need this function? As far as I can see we always know whether it's a stream or datagram socket at compile time so doing a run-time test is pointless. Because we have to call wmem_schedule function in ip_append_data() which is used by several protocols both stream and datagram. I just thought adding the sk_wmem_schedule() was only way to call proper function from ip_append_data(). Please let me know if I misunderstand or there is better way to call wmem_schedule functions. Best regards, Hideo -- Hitachi Computer Products (America) Inc. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] udp: memory accounting in IPv4
Eric Dumazet wrote: Herbert Xu a écrit : However, I'm still a little concerned about the effect of two more atomic op's per packet that we're adding here. Hang on a sec, that should've been Dave's line since atomic ops are cheap on x86 :) But seriously, it's not so much that we have two more atomic op's per packet, but we have two more writes to a single global counter for each packet. This is going to really suck on SMP. So what I'd like to see is a scheme that's similar to sk_forward_alloc. The idea is that each socket allocates memory using mem_schedule and then stores it in sk_forward_alloc. Each packet then only has to add to/subtract from sk_forward_alloc. There is one big problem with this though, UDP is not serialised like TCP. So you can't just use sk_forward_alloc since it's not an atomic_t. We'll need to think about this one a bit more. I agree adding yet another atomics ops is a big problem. Another idea, coupled with recent work on percpu storage done by Christoph Lameter, would be to use kind of a percpu_counter : We dont really need strong and precise memory accounting (UDP , but TCP as well), just some kind of limit to avoid memory to be too much used. That is, updating a percpu variable, and doing some updates to a global counter only when this percpu variable escapes from a given range. Lot of contended cache lines could benefit from this relaxing (count of sockets...) I would wait first that Christoph work is done, so that we dont need atomic ops on local cpu storage (and no need to disable preemption too). Thank you for your comments. I understood your concern of atomic operations. Let me try to use sk_forward_alloc at first, while percpu storage is an interesting idea. Many thanks, Hideo -- Hitachi Computer Products (America) Inc. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] udp: memory accounting in IPv4
On Mon, Dec 03, 2007 at 07:14:26PM -0500, Hideo AOKI wrote: Let me try to use sk_forward_alloc at first, while percpu storage is an interesting idea. Actually I don't think sk_forward_alloc would work for UDP because it runs lockless (unlike TCP which is run under a the socket lock). So it's either going to be the atomic op or per-cpu counters. For me the atomic op isn't the issue, it's the SMP cache-line bouncing that's more important so having something that did atomic ops on a socket counter which then feeds into the global counter would solve my concerns. But let's wait and see what Dave has to say about this too. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] TCP illinois max rtt aging
On Mon, 3 Dec 2007 15:59:23 -0800 Shao Liu [EMAIL PROTECTED] wrote: Hi Stephen and Lachlan, Thanks for the discussion. The gradual aging is surely an option. And another possibility is that, we compute the RTT just before congestion notification, which ideally, represent the full queueing delay + propagation delay. We can compute the average of the last M such values, and either use the average as maxRTT, or use it as a benchmark to judge whether a sample is outlier. How do you think of this idea? The problem with an average like that would be storing enough values to be useful and choosing how many to store. Perhaps some form of weighted sliding average which favors recent values heavily would work best. Remember that RTT's have a huge noise component and you are fighting against the long tail distribution trying to see the queue effects. And also a question, why the samples when SACK is active are outliers? Any sample with SACK is going to mean a loss or reordering has occurred. So shouldn't the SACK values be useful, but RTT values from retransmits are not useful. For the accuracy of time stamping, I am not very familiar with the implementation details. But I can think of two ways, 1) do time stamp in as low layer as possible; 2) use as high priority thread to do it as possible. For 2), we can use separate threads to do time stamp and to process packets. Right now the resolution is in microseconds using the hardware clock. The clock usage costs a little bit, but makes the math more accurate. It would be worth exploring sensitivity by taking out RTT_STAMP from the flags field and varying HZ. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] sky2: recovery deadlock fix
Prevent deadlock in sky2 recovery logic. sky2_down calls napi_synchronize which gets stuck if napi was already disabled. Fix by rearranging slightly and not calling napi_disable until after both ports are stopped. The napi_disable probably is being overly paranoid, but it is safe now. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- Please apply for 2.6.24 (upstream-fixes) --- a/drivers/net/sky2.c2007-11-30 16:51:50.0 -0800 +++ b/drivers/net/sky2.c2007-11-30 16:54:52.0 -0800 @@ -2906,16 +2906,14 @@ static void sky2_restart(struct work_str int i, err; rtnl_lock(); - sky2_write32(hw, B0_IMSK, 0); - sky2_read32(hw, B0_IMSK); - napi_disable(hw-napi); - for (i = 0; i hw-ports; i++) { dev = hw-dev[i]; if (netif_running(dev)) sky2_down(dev); } + napi_disable(hw-napi); + sky2_write32(hw, B0_IMSK, 0); sky2_reset(hw); sky2_write32(hw, B0_IMSK, Y2_IS_BASE); napi_enable(hw-napi); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] TCP illinois max rtt aging
Greetings, On 03/12/2007, Stephen Hemminger [EMAIL PROTECTED] wrote: On Mon, 3 Dec 2007 15:59:23 -0800 Shao Liu [EMAIL PROTECTED] wrote: And also a question, why the samples when SACK is active are outliers? Any sample with SACK is going to mean a loss or reordering has occurred. So shouldn't the SACK values be useful, but RTT values from retransmits are not useful. When SACK is active, the per-packet processing becomes more involved, tracking the list of lost/SACKed packets. This causes a CPU spike just after a loss, which increases the RTTs, at least in my experience. This is a separate issue from the fact that it is hard to get RTT measurements from lost/retransmitted packets themselves. Cheers, Lachlan -- Lachlan Andrew Dept of Computer Science, Caltech 1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA Ph: +1 (626) 395-8820Fax: +1 (626) 568-3603 http://netlab.caltech.edu/~lachlan -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][IPVS] Don't leak sysctl tables if the scheduler registration fails
On Mon, Dec 03, 2007 at 01:04:53PM +0300, Pavel Emelyanov wrote: In case we load lblc or lblcr module we can leak some sysctl tables if the call to register_ip_vs_scheduler() fails. This looks correct to me. I've looked at the register_ip_vs_scheduler() code and saw, that the only reason to fail is the name collision, so I think that with some 3rd party schedulers this becomes a relevant issue. No? I guess so. Though presumably register_ip_vs_scheduler() could have other modes of failure in the future. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] Acked-by: Simon Horman [EMAIL PROTECTED] --- diff --git a/net/ipv4/ipvs/ip_vs_lblc.c b/net/ipv4/ipvs/ip_vs_lblc.c index b843a11..ad89644 100644 --- a/net/ipv4/ipvs/ip_vs_lblc.c +++ b/net/ipv4/ipvs/ip_vs_lblc.c @@ -580,9 +580,14 @@ static struct ip_vs_scheduler ip_vs_lblc_scheduler = static int __init ip_vs_lblc_init(void) { + int ret; + INIT_LIST_HEAD(ip_vs_lblc_scheduler.n_list); sysctl_header = register_sysctl_table(lblc_root_table); - return register_ip_vs_scheduler(ip_vs_lblc_scheduler); + ret = register_ip_vs_scheduler(ip_vs_lblc_scheduler); + if (ret) + unregister_sysctl_table(sysctl_header); + return ret; } diff --git a/net/ipv4/ipvs/ip_vs_lblcr.c b/net/ipv4/ipvs/ip_vs_lblcr.c index e5b323a..2a5ed85 100644 --- a/net/ipv4/ipvs/ip_vs_lblcr.c +++ b/net/ipv4/ipvs/ip_vs_lblcr.c @@ -769,9 +769,14 @@ static struct ip_vs_scheduler ip_vs_lblcr_scheduler = static int __init ip_vs_lblcr_init(void) { + int ret; + INIT_LIST_HEAD(ip_vs_lblcr_scheduler.n_list); sysctl_header = register_sysctl_table(lblcr_root_table); - return register_ip_vs_scheduler(ip_vs_lblcr_scheduler); + ret = register_ip_vs_scheduler(ip_vs_lblcr_scheduler); + if (ret) + unregister_sysctl_table(sysctl_header); + return ret; } -- Horms -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] SMC911X: Fix using of dereferenced skb after netif_rx
Peter Korsgaard said the following on 2007-12-3 21:47: Wang == Wang Chen [EMAIL PROTECTED] writes: Hi, Wang + len = skb-len; Wangnetif_rx(skb); dev- stats.rx_packets++; Wang - dev-stats.rx_bytes += skb-len; Wang + dev-stats.rx_bytes += len; Why not simply update the stats before calling netif_rx as the return value isn't checked anyway? Even the return value of netif_rx isn't checked, dev-stats maybe changed in netif_rx. But fortunately dev-stats isn't changed in netif_rx. So, I agree. Here is the new patch. Signed-off-by: Wang Chen [EMAIL PROTECTED] --- smc911x.c |2 +- 1 files changed, 1 insertion(+), 1 deletion(-) --- linux-2.6.24.rc3.org/drivers/net/smc911x.c 2007-11-19 12:38:05.0 +0800 +++ linux-2.6.24.rc3/drivers/net/smc911x.c 2007-12-04 09:59:06.0 +0800 @@ -1299,9 +1299,9 @@ smc911x_rx_dma_irq(int dma, void *data) PRINT_PKT(skb-data, skb-len); dev-last_rx = jiffies; skb-protocol = eth_type_trans(skb, dev); - netif_rx(skb); dev-stats.rx_packets++; dev-stats.rx_bytes += skb-len; + netif_rx(skb); spin_lock_irqsave(lp-lock, flags); pkts = (SMC_GET_RX_FIFO_INF() RX_FIFO_INF_RXSUSED_) 16; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[BUG] open network driver bugs (100)
ID AssigneeSummary 2283[EMAIL PROTECTED] (net forcedeth) NETDEV WATCHDOG: eth1: transmit timed out... 2776[EMAIL PROTECTED] (net dmfe) Davicom 9102AF only works in 10 Mbps 3156[EMAIL PROTECTED] (net de2104x) Kernel panic with de2104x tulip driver on boot 3526[EMAIL PROTECTED] (net 8139too) CardBus NIC locks up system with PIO 3661[EMAIL PROTECTED] (net 8139too) gives strange errors, drops and overruns in... 3765[EMAIL PROTECTED] (net b44) Network link down, when writting to sata disk A... 3801[EMAIL PROTECTED] (net 8139too) does not set PME-Enable upon setting WOL 3938[EMAIL PROTECTED] (net tun) device driver doesn't fill in interface name on... 4186[EMAIL PROTECTED] (wireless airo) Aironet 340 PCMCIA does not support WPA 4420[EMAIL PROTECTED] (net tulip) Problem with 6 bit addressing in tulip_read_e... 4434[EMAIL PROTECTED] (net Tulip) NIC card causes hard lock up of PC 4451[EMAIL PROTECTED] (net via-rhine) VIA Rhine II: media detection fails on re... 4566[EMAIL PROTECTED] (net B44) Randomly driver starts sending garbage and stop... 4701[EMAIL PROTECTED] (net tulip) No driver works with Asant�FAST 10/100 PWA ... 4755[EMAIL PROTECTED] (net b44) At system startup b44 doesn't report correct mi... 4849[EMAIL PROTECTED] (net via-rhine) WAKE ON LAN not working. 4883[EMAIL PROTECTED] (net tg3) doesn't send ARP reply on 8021q VLAN interfaces 5149[EMAIL PROTECTED] Wake-on-lan broken using Intel e100 driver 5569[EMAIL PROTECTED] (net xirc2ps_cs) stopped working in 2.6.14 5624[EMAIL PROTECTED] (net b44) Lockup on 'mii-tool ethX' when 'ifconfig ethX u... 5839[EMAIL PROTECTED] uli526x partially recognizing interface 5870[EMAIL PROTECTED] SiS 190 doesn't download files 5979[EMAIL PROTECTED] (net dmfe) Davicom DM9102 Network Card cuts out every 60 ... 6108[EMAIL PROTECTED] Failure of r8169 ethernet i/f after resume from S3 sleep 6171[EMAIL PROTECTED] 3c59x: netlink only updates online status each 60s 6444[EMAIL PROTECTED] (net 3c59x) transmit timed out 6690[EMAIL PROTECTED] (net forcedeth) 0.57 problems at gigabit speeds 6807[EMAIL PROTECTED] r8169: freeze at higher speeds 6986[EMAIL PROTECTED] e1000 doesn't update trafic counters frequently enough 7051[EMAIL PROTECTED] prism54 does not respect carrier 7071[EMAIL PROTECTED] (net cassini) Can't bring Sun Quad GigaSwift Ethernet int... 7085[EMAIL PROTECTED] System freezes when load the module sis190 7133[EMAIL PROTECTED] (net ibmtr_cs) seams working as fine with 64bit kernel th... 7226[EMAIL PROTECTED] (net 8139too) Problem in forcing RTL8139 into 100Mbps ful... 7440[EMAIL PROTECTED] (net 3c59x) suddenly receives no more packets 7443[EMAIL PROTECTED] (net 8139too) transmit timeouts with edge triggered PCI irq 7487[EMAIL PROTECTED] (net sundance) driver fails to recognize carrier status 7588[EMAIL PROTECTED] (net wan z85230) Race: Lock is not acquired before callin... 7617[EMAIL PROTECTED] sky2 driver crashes 7633[EMAIL PROTECTED] (net bonding) Race: Caller of function alb_swap_mac_addr(... 7659[EMAIL PROTECTED] (net 3c589_cs) Race: a lock must be held before entering ... 7660[EMAIL PROTECTED] (net wan z85230) Race: lock must be held before entering ... 7682[EMAIL PROTECTED] bcm43xx: iwlist scan: no scan results with 2.6.19.1 7696[EMAIL PROTECTED] (net b44) driver doesn't work under heavy load 7718[EMAIL PROTECTED] (net xircom_cb, xircom_tulip_cb) When card inserted, dmes... 7752[EMAIL PROTECTED] drivers/net/wireless/zd1211rw/zd_chip.c:1461 ASSERT r = ... 7821[EMAIL PROTECTED] sundance driver not activating D-Link DFE-580TX adapter 7853[EMAIL PROTECTED] (net ppp) Get mppe_decompress: osize too small while fo... 7856[EMAIL PROTECTED] (net b44) WOL problem 7946[EMAIL PROTECTED] ipw2200 driver + wpa_supplicant (wpa-psk) = fail to send ... 8007[EMAIL PROTECTED] (net 3c59x) 3c905 Tornado dosen't work (eeprom mac addres... 8009[EMAIL PROTECTED] (net ppp) PPPoE+mppe Server fail with Win Client 8043[EMAIL PROTECTED] curious communication breakage with e1000 and NBT 8061[EMAIL PROTECTED] is VIA Network device statistics calculation wrong? missi... 8084[EMAIL PROTECTED] phy_mii_ioctl(...) forgets to return phy_device's speed s... 8106[EMAIL PROTECTED] tx_errors and tx_fifo_errors not updated consistantly 8132[EMAIL PROTECTED] pptp server lockup in ppp_asynctty_receive() 8252[EMAIL PROTECTED]
Re: [PATCH][IPVS] Fix sched registration race when checking for name collision
On Mon, Dec 03, 2007 at 01:10:57PM +0300, Pavel Emelyanov wrote: The register_ip_vs_scheduler() checks for the scheduler with the same name under the read-locked __ip_vs_sched_lock, then drops, takes it for writing and puts the scheduler in list. This is racy, since we can have a race window between the lock being re-locked for writing. The fix is to search the scheduler with the given name right under the write-locked __ip_vs_sched_lock. This looks correct to me. Signed-off-by: Pavel Emelyanov [EMAIL PROTECTED] Acked-by: Simon Horman [EMAIL PROTECTED] --- diff --git a/net/ipv4/ipvs/ip_vs_sched.c b/net/ipv4/ipvs/ip_vs_sched.c index 1602304..4322358 100644 --- a/net/ipv4/ipvs/ip_vs_sched.c +++ b/net/ipv4/ipvs/ip_vs_sched.c @@ -183,19 +183,6 @@ int register_ip_vs_scheduler(struct ip_vs_scheduler *scheduler) /* increase the module use count */ ip_vs_use_count_inc(); - /* - * Make sure that the scheduler with this name doesn't exist - * in the scheduler list. - */ - sched = ip_vs_sched_getbyname(scheduler-name); - if (sched) { - ip_vs_scheduler_put(sched); - ip_vs_use_count_dec(); - IP_VS_ERR(register_ip_vs_scheduler(): [%s] scheduler - already existed in the system\n, scheduler-name); - return -EINVAL; - } - write_lock_bh(__ip_vs_sched_lock); if (scheduler-n_list.next != scheduler-n_list) { @@ -207,6 +194,20 @@ int register_ip_vs_scheduler(struct ip_vs_scheduler *scheduler) } /* + * Make sure that the scheduler with this name doesn't exist + * in the scheduler list. + */ + list_for_each_entry(sched, ip_vs_schedulers, n_list) { + if (strcmp(scheduler-name, sched-name) == 0) { + write_unlock_bh(__ip_vs_sched_lock); + ip_vs_use_count_dec(); + IP_VS_ERR(register_ip_vs_scheduler(): [%s] scheduler + already existed in the system\n, + scheduler-name); + return -EINVAL; + } + } + /* * Add it into the d-linked scheduler list */ list_add(scheduler-n_list, ip_vs_schedulers); -- Horms -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[BUG] Open networking bugs (73)
Some of these are old and maybe already fixed? ID Owner Description 2803[EMAIL PROTECTED] isa modem detected but does not initialize 4206[EMAIL PROTECTED] NAT/Masquerade not working 4595[EMAIL PROTECTED] Error inserting ebt_ulog 4809[EMAIL PROTECTED] AF_PACKET sockets ignore the SO_TIMESTAMP sockopt 4885[EMAIL PROTECTED] IPv6 Route entry being wrongly removed 5088[EMAIL PROTECTED] disable ECN per connection 5091[EMAIL PROTECTED] connection tracking can give abnormal throughput result 5131[EMAIL PROTECTED] Computer hangs when default-gw becomes unreachable 5248[EMAIL PROTECTED] Rapid loading and unloading of iptables modules gives oop... 5999[EMAIL PROTECTED] Iptables modules fail to load on Alpha arch 6036[EMAIL PROTECTED] mmap'ed write to socket hangs when connection remote end ... 6161[EMAIL PROTECTED] Modem Slow down speed 50% after kernel upgrade 6187[EMAIL PROTECTED] netlink: possible use after free in netlink_recvmsg 6197[EMAIL PROTECTED] unregister_netdevice: waiting for ppp9 to become free. Us... 6319[EMAIL PROTECTED] ipsec tunnel asymmetrical mtu 6322[EMAIL PROTECTED] Kernel Panic (tc filter delete panic) 6339[EMAIL PROTECTED] Wish: /proc or /sys-access to counters 6548[EMAIL PROTECTED] MPPE Encrypt Decrypt module bug. 6681[EMAIL PROTECTED] TC crash and rule freeze 6682[EMAIL PROTECTED] BUG: soft lockup detected on CPU#0! / ksoftirqd takse 100... 6830[EMAIL PROTECTED] Need a /proc to view IF-MIB counters not in /proc/net/dev 6917[EMAIL PROTECTED] BUG: warning at net/core/dev.c:1171/skb_checksum_help() 6998[EMAIL PROTECTED] rp_filter missing for ipv6 7058[EMAIL PROTECTED] CONFIG_IP_ROUTE_FWMARK breaks rp_filter checks 7198[EMAIL PROTECTED] balance-alb bonding oops when disconnecting primary slave... 7202[EMAIL PROTECTED] sun happy meal ethernet driver problem 7512[EMAIL PROTECTED] irda: sock_error in af_irda.c:irda_recvmsg_stream 7554[EMAIL PROTECTED] oops after insmod rfcomm and rmmod rfcomm 7709[EMAIL PROTECTED] Exposing the string field lengths of the ethtool_drvinfo ... 7732[EMAIL PROTECTED] System freeze every 10 days 7846[EMAIL PROTECTED] Strange trouble with samba and 2.6.19 kernel 7952[EMAIL PROTECTED] slattach only works every other time 7974[EMAIL PROTECTED] (bonding): scheduling while atomic 7983[EMAIL PROTECTED] kernel BUG at kernel/workqueue.c:150! 8054[EMAIL PROTECTED] tipc_ref_discard tipc_deleteport locking dependency 8085networking_netfilter-iptabl... performance drop in 2.6.20 (CONFIG_NF_CONNTRACK_SUPPORT) 8107[EMAIL PROTECTED] dev-header_cache_update has a random value 8203[EMAIL PROTECTED] Race: a lock is expected before calling llc_conn_state_pr... 8215[EMAIL PROTECTED] A lock is expected before calling zero_fw_chain, but it ... 8218[EMAIL PROTECTED] 8021q - Vlan - Tag lost/missing on base interface when sn... 8253[EMAIL PROTECTED] BUG: unable to handle kernel paging request at virtual ad... 8325networking_netfilter-iptabl... -j REDIRECT --to-ports 1000-1009, always first choosen 8338networking_netfilter-iptabl... NAT of TCP connections broken 8382[EMAIL PROTECTED] 2.6.20 cannot route packets outside tunnel 8474[EMAIL PROTECTED] regression failure, can't even ping modem 8536[EMAIL PROTECTED] Kernel drops UDP packets silently when reading from certa... 8561[EMAIL PROTECTED] list_add corruption. prev-next should be next (f7d28794)... 8654[EMAIL PROTECTED] possible connect() bug 8659[EMAIL PROTECTED] Patch for bug #8635 breaks TCPv6 8726[EMAIL PROTECTED] MSG_TRUNC not regarded in unix_dgram_recvmsg() 8732[EMAIL PROTECTED] Samba - very slow -one way- speed on AMD 690 8736[EMAIL PROTECTED] New TC deadlock scenario 8754[EMAIL PROTECTED] Kernel addrconf modifies MTU of non-kernel routes 8755[EMAIL PROTECTED] ip -6 route change behaves like ip -6 route add 8766[EMAIL PROTECTED] 802.1q VLAN stacking + REORDER_HDR is broken 8786[EMAIL PROTECTED] Dell 350 Bluetooth device (usb id 413c:8103) not found 8891[EMAIL PROTECTED] in-kernel rpc generates broken RPCBPROC_GETVERSADDR v4 re... 8895[EMAIL PROTECTED] An ioctl to delete an ipv6 tunnel leads to a kernel panic 8961[EMAIL PROTECTED] BUG triggered by oidentd in netlink code 8971[EMAIL PROTECTED] htb class delete causes kernelpanic and other htb bugs. 8975[EMAIL PROTECTED] Broadcasting fails when default gateway can't be reached. 9079[EMAIL
[PATCH 2.6.24] pasemi_mac: Fix reuse of free'd skb
Turns out we're freeing the skb when we detect CRC error, but we're not clearing out info-skb. We could either clear it and have the stack reallocate it, or just leave it and the rx ring refill code will reuse the one that was allocated. Reusing a freed skb obviously caused some nasty crashes of various kind, as reported by Brent Baude and David Woodhouse. Signed-off-by: Olof Johansson [EMAIL PROTECTED] --- Jeff, I'd like to see this in 2.6.24, it's causing some real problems out there. It's not needed in the 2.6.25 queue since the other changes there have already covered these cases. My test network at home is quiet enough to not cause CRC errors, we mainly get those during interface bringup before speed is configured. diff --git a/drivers/net/pasemi_mac.c b/drivers/net/pasemi_mac.c index 09b4fde..6617e24 100644 --- a/drivers/net/pasemi_mac.c +++ b/drivers/net/pasemi_mac.c @@ -586,7 +586,7 @@ static int pasemi_mac_clean_rx(struct pasemi_mac *mac, int limit) /* CRC error flagged */ mac-netdev-stats.rx_errors++; mac-netdev-stats.rx_crc_errors++; - dev_kfree_skb_irq(skb); + /* No need to free skb, it'll be reused */ goto next; } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
IPv6 IPsec input
Hello Herbert, We have a trouble of IPv6 IPsec input process with your net-2.6.25 tree. We can not receive IPsec packets. I found that you directly pass the address family (AF_INET) to xfrm_state_lookup in xfrm_input. I guess it may result the bug. Is it correct? Best regards, -- Kazunori Miyazawa -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
Herbert Xu said the following on 2007-12-3 21:17: On Mon, Dec 03, 2007 at 10:54:35PM +1100, Herbert Xu wrote: diff --git a/include/net/snmp.h b/include/net/snmp.h index ea206bf..9444b54 100644 --- a/include/net/snmp.h +++ b/include/net/snmp.h @@ -23,6 +23,7 @@ #include linux/cache.h #include linux/snmp.h +#include linux/smp.h It's no need to include smp.h? -- WCN -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] [UDP6]: Counter increment on BH mode
On Tue, Dec 04, 2007 at 11:50:55AM +0800, Wang Chen wrote: #include linux/cache.h #include linux/snmp.h +#include linux/smp.h It's no need to include smp.h? We need it for smp_processor_id. Well we needed it before too but there's probably some implicit inclusion which has made it work. It's better to declare these inclusions explicitly as otherwise they may break on less common architectures later. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][IPv6][IPsec] fix the address family for xfrm_state_lookup in xfrm_input
Hi Herbert, This is the patch to fix IPv6 IPsec input. Signed-off-by: Kazunori MIYAZAWA [EMAIL PROTECTED] --- net/xfrm/xfrm_input.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 96f42c1..da3c963 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -136,7 +136,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) if (skb-sp-len == XFRM_MAX_DEPTH) goto drop; - x = xfrm_state_lookup(daddr, spi, nexthdr, AF_INET); + x = xfrm_state_lookup(daddr, spi, nexthdr, skb-dst-ops-family); if (x == NULL) goto drop; -- 1.4.4.2 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RESEND] PHY: Add the phy_device_release device method.
On Mon, 3 Dec 2007 09:35:11 +0100 Thierry Reding [EMAIL PROTECTED] wrote: In cases where more than a single PHY is found on the MDIO bus, the kernel will print a warning that this method is missing for each PHY device that is not attached to a networking device. Signed-off-by: Thierry Reding [EMAIL PROTECTED] --- drivers/net/phy/mdio_bus.c | 19 ++- 1 files changed, 18 insertions(+), 1 deletions(-) diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index fc2f0e6..cb7fb47 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -36,6 +36,23 @@ #include asm/uaccess.h /** + * phy_device_release - free a phy_device structure when all users of it are + * finished. + * + * @dev: device that's been disconnected + * + * Will be called only by the device core when all users of this phy_device + * are done. + */ +static void phy_device_release(struct device *dev) +{ + struct phy_device *phy; + + phy = to_phy_device(dev); + kfree(phy); +} + +/** * mdiobus_register - bring up all the PHYs on a given bus and attach them to bus * @bus: target mii_bus * @@ -83,6 +100,7 @@ int mdiobus_register(struct mii_bus *bus) if (phydev) { phydev-irq = bus-irq[i]; + phydev-dev.release = phy_device_release; phydev-dev.parent = bus-dev; phydev-dev.bus = mdio_bus_type; snprintf(phydev-dev.bus_id, BUS_ID_SIZE, PHY_ID_FMT, bus-id, i); @@ -112,7 +130,6 @@ void mdiobus_unregister(struct mii_bus *bus) for (i = 0; i PHY_MAX_ADDR; i++) { if (bus-phy_map[i]) { device_unregister(bus-phy_map[i]-dev); - kfree(bus-phy_map[i]); } } } I've been sitting on ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm2/broken-out/phy-implement-release-function.patch for a few weeks. For some reason I have it in my nacked netdev patches section but I think that was a mistake and it has not (yet ;)) been nacked. Anyway, Anton's patch looks somewhat different from yours. Please compare notes. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][IPv6][IPsec] fix the address family for xfrm_state_lookup in xfrm_input
On Tue, Dec 04, 2007 at 02:28:36PM +0900, Kazunori MIYAZAWA wrote: Hi Herbert, This is the patch to fix IPv6 IPsec input. Signed-off-by: Kazunori MIYAZAWA [EMAIL PROTECTED] Good catch! Thanks. diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 96f42c1..da3c963 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -136,7 +136,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) if (skb-sp-len == XFRM_MAX_DEPTH) goto drop; - x = xfrm_state_lookup(daddr, spi, nexthdr, AF_INET); + x = xfrm_state_lookup(daddr, spi, nexthdr, skb-dst-ops-family); I'd prefer to put this in XFRM_SPI_SKB_CB instead because the xfrm layer shouldn't really rely on dst-ops structures except on entry and exit, like this: [IPSEC]: Use the correct family for input state lookup When merging the input paths of IPsec I accidentally left a hard-coded AF_INET for the state lookup call. This broke IPv6 obviously. This patch fixes by getting the input callers to specify the family through skb-cb. Credit goes to Kazunori Miyazawa for diagnosing this and providing an initial patch. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 99251b7..3f86cd8 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -534,6 +534,7 @@ struct xfrm_spi_skb_cb { } header; unsigned int daddroff; + unsigned int family; }; #define XFRM_SPI_SKB_CB(__skb) ((struct xfrm_spi_skb_cb *)((__skb)-cb[0])) diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c index 0c377a6..33f990d 100644 --- a/net/ipv4/xfrm4_input.c +++ b/net/ipv4/xfrm4_input.c @@ -39,6 +39,7 @@ drop: int xfrm4_rcv_encap(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) { + XFRM_SPI_SKB_CB(skb)-family = AF_INET; XFRM_SPI_SKB_CB(skb)-daddroff = offsetof(struct iphdr, daddr); return xfrm_input(skb, nexthdr, spi, encap_type); } diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c index e2c3efd..74f3aac 100644 --- a/net/ipv6/xfrm6_input.c +++ b/net/ipv6/xfrm6_input.c @@ -23,6 +23,7 @@ int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb) int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi) { + XFRM_SPI_SKB_CB(skb)-family = AF_INET6; XFRM_SPI_SKB_CB(skb)-daddroff = offsetof(struct ipv6hdr, daddr); return xfrm_input(skb, nexthdr, spi, 0); } diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 96f42c1..8b2b1b5 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -102,6 +102,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) __be32 seq; struct xfrm_state *x; xfrm_address_t *daddr; + unsigned int family; int decaps = 0; int async = 0; @@ -127,6 +128,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) daddr = (xfrm_address_t *)(skb_network_header(skb) + XFRM_SPI_SKB_CB(skb)-daddroff); + family = XFRM_SPI_SKB_CB(skb)-family; seq = 0; if (!spi (err = xfrm_parse_spi(skb, nexthdr, spi, seq)) != 0) @@ -136,7 +138,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) if (skb-sp-len == XFRM_MAX_DEPTH) goto drop; - x = xfrm_state_lookup(daddr, spi, nexthdr, AF_INET); + x = xfrm_state_lookup(daddr, spi, nexthdr, family); if (x == NULL) goto drop; @@ -198,6 +200,7 @@ resume: * transport mode so the outer address is identical. */ daddr = x-id.daddr; + family = x-outer_mode-afinfo-family; err = xfrm_parse_spi(skb, nexthdr, spi, seq); if (err 0) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: IPv6 IPsec input
On Tue, Dec 04, 2007 at 12:39:43PM +0900, Kazunori MIYAZAWA wrote: I found that you directly pass the address family (AF_INET) to xfrm_state_lookup in xfrm_input. I guess it may result the bug. Is it correct? Yes sorry, I didn't test IPv6. Ironic because all this work is for the benefit of IPv6 :) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][IPv6][IPsec] fix the address family for xfrm_state_lookup in xfrm_input
Herbert Xu wrote: On Tue, Dec 04, 2007 at 02:28:36PM +0900, Kazunori MIYAZAWA wrote: Hi Herbert, This is the patch to fix IPv6 IPsec input. Signed-off-by: Kazunori MIYAZAWA [EMAIL PROTECTED] Good catch! Thanks. diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c index 96f42c1..da3c963 100644 --- a/net/xfrm/xfrm_input.c +++ b/net/xfrm/xfrm_input.c @@ -136,7 +136,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type) if (skb-sp-len == XFRM_MAX_DEPTH) goto drop; - x = xfrm_state_lookup(daddr, spi, nexthdr, AF_INET); + x = xfrm_state_lookup(daddr, spi, nexthdr, skb-dst-ops-family); I'd prefer to put this in XFRM_SPI_SKB_CB instead because the xfrm layer shouldn't really rely on dst-ops structures except on entry and exit, like this: [IPSEC]: Use the correct family for input state lookup When merging the input paths of IPsec I accidentally left a hard-coded AF_INET for the state lookup call. This broke IPv6 obviously. This patch fixes by getting the input callers to specify the family through skb-cb. Credit goes to Kazunori Miyazawa for diagnosing this and providing an initial patch. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Cheers, It works fine. Thank you!! -- Kazunori Miyazawa -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][IPv6][IPsec] fix the address family for xfrm_state_lookup in xfrm_input
From: Kazunori MIYAZAWA [EMAIL PROTECTED] Date: Tue, 04 Dec 2007 15:23:01 +0900 Herbert Xu wrote: I'd prefer to put this in XFRM_SPI_SKB_CB instead because the xfrm layer shouldn't really rely on dst-ops structures except on entry and exit, like this: [IPSEC]: Use the correct family for input state lookup When merging the input paths of IPsec I accidentally left a hard-coded AF_INET for the state lookup call. This broke IPv6 obviously. This patch fixes by getting the input callers to specify the family through skb-cb. Credit goes to Kazunori Miyazawa for diagnosing this and providing an initial patch. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Cheers, It works fine. Patch applied, thanks everyone. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/7] [TFRC] New rx history code
NAK. You have changed the control flow of the algorithm and the underlying data structure. Originally it had been an array of pointers, and this had been a design decision right from the first submissions in March. From I of http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3_packet_reception/loss_detection/loss_detection_algorithm_notes.txt 1. 'ring' is an array of pointers rather than a contiguous area in memory. The reason is that, when having to swap adjacent entries to sort the history, it is easier to swap pointers than to copy (heavy-weight) history-entry data structs. But in your algorithm it becomes a normal linear array. As a consequence all subsequent code needs to be rewritten to use copying instead of swapping pointers. And there is a lot of that, since the loss detection code makes heavy use of swapping entries in order to cope with reordered and/or delayed packets. This is not what I would call entirely equivalent as you claim. Your algorithm is fundamentally different, the control flow is not the same, nor are the underlying data structures. | Credit here goes to Gerrit Renker, that provided the initial implementation for | this new codebase. | | I modified it just to try to make it closer to the existing API, hide details from | the CCIDs and fix a couple bugs found in the initial implementation. What is a couple bugs? So far you have pointed out only one problem and that was agreed, it can be fixed. But it remains a side issue, it is not a failure of the algorithm per se. What is worse here is that you take this single occurrence as a kind of carte blanche to mess around with the code as you please. Where please are the other bugs you are referring to? I am not buying into this and am not convinced that you are understanding what you are doing. It is the third time that you take patches which have been submitted on [EMAIL PROTECTED] for over half a year, for which you have offered no more than a sentence of feedback, release them under your name, and introduce fundamental changes which alter the behaviour. The first instance was the set of ktime patches which I had developed for the test tree and which you extended to DCCP. In this case it were in fact three bugs that you added in migrating my patches. I know this because it messed up the way CCID3 behaved and since I spent several hours chasing these. And, in contrast to the above, it is not a mere claim: that is recorded in the netdev mail archives. The second instance was when you released the TX history patches under your name. Pro forma there was an RFC patch at 3pm, de facto it was checked in a few hours later: input not welcome. The third instance is now where you change in essence the floor underneath this work. Since you are using a different basis, the algorithm is (in addition to the changes in control flow) necessarily different. I have provided documentation, written test modules, and am able to prove that the algorithm as submitted works reasonably correct. In addition, the behaviour has been verified using regression tests. I am not prepared to take your claims and expressions of deepest respect at face value since your actions - not your words - show that you are, in fact, not dealing respectfully with work which has taken months to complete and verify. During 9 months on [EMAIL PROTECTED] you did not provide so much as a paragraph of feedback. Instead you mopped up code from the test tree, modified it according to your own whims and now, in the circle of your invitation-only buddies, start to talk about having discussions and iterations. The only iteration I can see is in fixing up the things you introduced, so it is not you who has to pay the price. Sorry, unless you can offer a more mature model of collaboration, consider me out of this and the patches summarily withdrawn. I am not prepared to throw away several months of work just because you feel inspired to do as you please with the work of others. Gerrit | Original changeset comment from Gerrit: | --- | This provides a new, self-contained and generic RX history service for TFRC | based protocols. | | Details: | * new data structure, initialisation and cleanup routines; | * allocation of dccp_rx_hist entries local to packet_history.c, |as a service exported by the dccp_tfrc_lib module. | * interface to automatically track highest-received seqno; | * receiver-based RTT estimation (needed for instance by RFC 3448, 6.3.1); | * a generic function to test for `data packets' as per RFC 4340, sec. 7.7. | --- | | Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED] | --- | net/dccp/ccids/ccid3.c | 255 | net/dccp/ccids/ccid3.h | 14 +- | net/dccp/ccids/lib/loss_interval.c | 14 +- | net/dccp/ccids/lib/packet_history.c | 277 +++--- | net/dccp/ccids/lib/packet_history.h
Re: [PATCH][RESEND] PHY: Add the phy_device_release device method.
* Andrew Morton wrote: On Mon, 3 Dec 2007 09:35:11 +0100 Thierry Reding [EMAIL PROTECTED] wrote: In cases where more than a single PHY is found on the MDIO bus, the kernel will print a warning that this method is missing for each PHY device that is not attached to a networking device. Signed-off-by: Thierry Reding [EMAIL PROTECTED] --- drivers/net/phy/mdio_bus.c | 19 ++- 1 files changed, 18 insertions(+), 1 deletions(-) diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c index fc2f0e6..cb7fb47 100644 --- a/drivers/net/phy/mdio_bus.c +++ b/drivers/net/phy/mdio_bus.c @@ -36,6 +36,23 @@ #include asm/uaccess.h /** + * phy_device_release - free a phy_device structure when all users of it are + * finished. + * + * @dev: device that's been disconnected + * + * Will be called only by the device core when all users of this phy_device + * are done. + */ +static void phy_device_release(struct device *dev) +{ + struct phy_device *phy; + + phy = to_phy_device(dev); + kfree(phy); +} + +/** * mdiobus_register - bring up all the PHYs on a given bus and attach them to bus * @bus: target mii_bus * @@ -83,6 +100,7 @@ int mdiobus_register(struct mii_bus *bus) if (phydev) { phydev-irq = bus-irq[i]; + phydev-dev.release = phy_device_release; phydev-dev.parent = bus-dev; phydev-dev.bus = mdio_bus_type; snprintf(phydev-dev.bus_id, BUS_ID_SIZE, PHY_ID_FMT, bus-id, i); @@ -112,7 +130,6 @@ void mdiobus_unregister(struct mii_bus *bus) for (i = 0; i PHY_MAX_ADDR; i++) { if (bus-phy_map[i]) { device_unregister(bus-phy_map[i]-dev); - kfree(bus-phy_map[i]); } } } I've been sitting on ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm2/broken-out/phy-implement-release-function.patch for a few weeks. For some reason I have it in my nacked netdev patches section but I think that was a mistake and it has not (yet ;)) been nacked. Anyway, Anton's patch looks somewhat different from yours. Please compare notes. FWIW, I like Anton's patch better, especially since it plugs a possible memory leak. I'm not sure it's useful or necessary to export the phy_device_free symbol, though. The only other difference that I can see is that Anton's patch sets the release function in a different place, when the device is created, whereas my patch only sets it before registering the device. Both should be identical since the release function won't be needed unless the device is registered. Thierry signature.asc Description: Digital signature