[PATCH] IPV6: Remove bogus WARN_ON() in Proxy-NA handling.
[IPV6]: Remove bogus WARN_ON in Proxy-NA handling.

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

---
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 0304b5f..41a8a5f 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -967,8 +967,6 @@ static void ndisc_recv_na(struct sk_buff
 		    ipv6_devconf.forwarding && ipv6_devconf.proxy_ndp &&
 		    pneigh_lookup(&nd_tbl, &msg->target, dev, 0)) {
 			/* XXX: idev->cnf.prixy_ndp */
-			WARN_ON(skb->dst != NULL &&
-				((struct rt6_info *)skb->dst)->rt6i_idev);
 			goto out;
 		}

-- 
YOSHIFUJI Hideaki @ USAGI Project <[EMAIL PROTECTED]>
GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Suppress / delay SYN-ACK
On Friday, October 13, 2006 7:42 AM, Stephen J. Bevan wrote:
> Say you are writing a transparent proxy, i.e. when a TCP connection is
> made through the box, rather than forwarding the TCP SYN, it is
> delivered locally where it is accepted and then the proxy makes a
> separate TCP connection to the original IP address. Thus all traffic
> flows through a user-space proxy that can cache, log, virus scan,
> etc. the traffic. Say also that the proxy is for a protocol that can
> mediate peer<->peer connections via a server (e.g. most IM
> protocols). Further still, assume that the client has the property
> that while trying to establish a peer<->peer connection it will
> back off and use the server if it does not manage to establish the
> peer<->peer TCP connection, but if it does establish the connection
> then it will not back off to use the server. Thus if a client is
> behind the transparent proxy, the proxy terminates the TCP connection
> locally and at that point the client thinks it has connected to the
> peer even though the proxy has yet to establish a connection to the
> peer.
>
> Should the proxy fail to do so, all it can do is drop the
> client<->proxy connection, at which point the client does not connect
> via the server and the user of the client is not happy, since if the
> proxy wasn't there everything would have worked just fine. So, if
> the proxy could delay the SYN/ACK until it has determined whether it
> can really connect to the IP address in the SYN, then it can decide
> whether to SYN/ACK or just not respond.
>
> Of course, the much simpler solution is to fix the client program so
> that it will still back off to the server even if it does manage to
> make a TCP connection. However, fixing other people's software is
> easier said than done. So if you are trying to $ell a transparent
> proxy solution, you need to handle it somehow. Delayed SYN/ACK is
> one such way, though not necessarily the best way.

That's nearly exactly what our situation is:

The machine on which the SYN-ACK feature should be implemented is a
TCP-to-X.25 gateway. There are really stupid TCP terminals out there
which connect to the gateway and simply start sending their data as
soon as the connection between them and the gateway is established.

The gateway, on the other hand, has to check its internal routing table
to find which X.25 number should be called for the requesting TCP
terminal, and establish this X.25 connection.

And now here is the point: if, for whatever reason, the X.25 connection
can't be established, the TCP connection to the terminal has to be
closed, or even better: NOT be established at all, so that the terminal
can't send any data.

So if you ask me how often the connections should be rejected, I have
to say: "Hopefully never". But as long as these stupid terminals get
very confused when the connection is first established and then
suddenly closed, I don't think I can do without this feature.
Re: Dropping NETIF_F_SG since no checksum feature.
Quoting r. David Miller <[EMAIL PROTECTED]>:
> Subject: Re: Dropping NETIF_F_SG since no checksum feature.
>
> From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
> Date: Thu, 12 Oct 2006 21:12:06 +0200
>
> > Quoting r. David Miller <[EMAIL PROTECTED]>:
> > > Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> > >
> > > Numbers?
> >
> > I created two subnets on top of the same pair infiniband HCAs:
>
> I was asking for SG vs. non-SG numbers so I could see proof
> that it really does help like you say it will.

Dave, thanks for the clarification.
Please note that ib0 is a non-SG device with MTU 2K, sorry that I
forgot to mention that.

So, to summarize my previous mail:

interface  flags      mtu    bandwidth
ib0        linear(0)  2044   286.45
ibc0       _F_SG      65484  782.55

If I set both ib0 and ibc0 to 64K MTU, then in benchmark-mode SG is
somewhat slower than non-SG at the same MTU (by some 10% when I tested
this at some point; I don't have the numbers at the moment, do you want
to see them?).

I did not claim that SG is faster at the same MTU, and I think it is
clear why linear should be faster for copy *with the same MTU*.

But do you really think that we will be able to allocate even a single
64K linear skb after the machine has been active for a while?

My assumption is that if I want to reliably get MTU > PAGE_SIZE, I must
support SG. Is that assumption wrong?

If this assumption is correct, then below is my line of thinking:
- with infiniband we provably get a 2.5x speedup with an MTU of 64K
  vs. 2K.
- to get packets of that size reliably we must declare S/G support
- infiniband verbs do not support IP checksumming
- per network algorithmics, it is better to piggyback checksum
  calculation on copying if copying takes place

For this reason, I would like to define the meaning of S/G set when
checksum bits are all clear as "we support S/G but not checksum, please
checksum for us if you copy data anyway".

Alternatively, add a new NETIF_F_??_CSUM bit to mean this capability.
Does this make sense?

Thanks,

-- 
MST
Re: Suppress / delay SYN-ACK
Caitlin Bestler writes:
> More to the point, on what basis would the application be rejecting a
> connection request based solely on the SYN?

Perhaps not the reason that Martin is interested in but ...

Say you are writing a transparent proxy, i.e. when a TCP connection is
made through the box, rather than forwarding the TCP SYN, it is
delivered locally where it is accepted and then the proxy makes a
separate TCP connection to the original IP address. Thus all traffic
flows through a user-space proxy that can cache, log, virus scan,
etc. the traffic. Say also that the proxy is for a protocol that can
mediate peer<->peer connections via a server (e.g. most IM
protocols). Further still, assume that the client has the property
that while trying to establish a peer<->peer connection it will
back off and use the server if it does not manage to establish the
peer<->peer TCP connection, but if it does establish the connection
then it will not back off to use the server. Thus if a client is
behind the transparent proxy, the proxy terminates the TCP connection
locally and at that point the client thinks it has connected to the
peer even though the proxy has yet to establish a connection to the
peer.

Should the proxy fail to do so, all it can do is drop the
client<->proxy connection, at which point the client does not connect
via the server and the user of the client is not happy, since if the
proxy wasn't there everything would have worked just fine. So, if
the proxy could delay the SYN/ACK until it has determined whether it
can really connect to the IP address in the SYN, then it can decide
whether to SYN/ACK or just not respond.

Of course, the much simpler solution is to fix the client program so
that it will still back off to the server even if it does manage to
make a TCP connection. However, fixing other people's software is
easier said than done. So if you are trying to $ell a transparent
proxy solution, you need to handle it somehow. Delayed SYN/ACK is
one such way, though not necessarily the best way.
[FYI]: Introduction of the support for RFC4312(The Camellia Cipher Algorithm)
Hi all,

This is Takamiya, from NTT Software.

NTT has released the code of the new cipher algorithm, which is
specified in RFC4312 (The Camellia Cipher Algorithm).

Please see http://info.isl.ntt.co.jp/crypt/eng/camellia/source_s.html .

The above patch is available for version 2.6.18.

We have started to prepare the patch against the cryptodev-2.6 tree,
and will submit it in the next few weeks.

Best regards.

P.S. The patches to use the camellia algorithm from ipsec-tools are
also available at the above URL, and they will be merged into
ipsec-tools, too.

-- 
Noriaki TAKAMIYA
Re: [PATCH] [NET] reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()
David Miller wrote:
> From: Eric Dumazet <[EMAIL PROTECTED]>
> Date: Fri, 13 Oct 2006 05:56:43 +0200
>
> > 2^31 is 2147483648
> >
> > Thats a *lot* of timer ticks, an inet_peer entry should not stay in
> > unused_list for more than 10 minutes.
>
> My bad, I thought the time was compared to the creation time
> not the time at which it was added to the unused list.
>
> I like your patch and I'll apply it. Thanks Eric.

Thank you David

(Re-reading my previous mail, I forgot to say that on ia32, unsigned
long is already 32 bits, so if a 32-bit timestamp is OK (for delta
computing) for such platforms, it must be OK for other platforms as
well)

Eric
Re: Dropping NETIF_F_SG since no checksum feature.
From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 21:12:06 +0200

> Quoting r. David Miller <[EMAIL PROTECTED]>:
> > Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> >
> > Numbers?
>
> I created two subnets on top of the same pair infiniband HCAs:

I was asking for SG vs. non-SG numbers so I could see proof
that it really does help like you say it will.
Re: [PATCH] [NET] reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Fri, 13 Oct 2006 05:56:43 +0200

> 2^31 is 2147483648
>
> Thats a *lot* of timer ticks, an inet_peer entry should not stay in
> unused_list for more than 10 minutes.

My bad, I thought the time was compared to the creation time
not the time at which it was added to the unused list.

I like your patch and I'll apply it. Thanks Eric.
Re: Suppress / delay SYN-ACK
Rick Jones wrote:
> > More to the point, on what basis would the application be rejecting
> > a connection request based solely on the SYN?
>
> True, it isn't like there would suddenly be any call user data as in
> XTI/TLI.

DATA payload could be included in the SYN packet. TCP specs allow this
AFAIK.

As for iptables rules added on the fly by an application that wants to
protect its listen queue from random sources of 'blacklisted' peers:
this has the limitation that sufficient rights must be granted to the
user running the application.

Eric
Re: [PATCH] [NET] reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()
David Miller wrote:
> From: Eric Dumazet <[EMAIL PROTECTED]>
> Date: Thu, 12 Oct 2006 22:14:12 +0200
>
> > 1) shrink struct inet_peer on 64 bits platforms.
> >
> > I noticed sizeof(struct inet_peer) was 64+8 on x86_64
> >
> > As we dont really need 64 bits timestamps (we only care for garbage
> > collection), we can use 32bits ones and reduce sizeof(struct
> > inet_peer) to 64 bytes : Because of SLAB_HWCACHE_ALIGN constraint,
> > final allocation is 64 bytes instead of 128 bytes per inet_peer
> > structure.
>
> I'm not convinced this is 100% correct. There are wrapping
> cases that I think aren't covered.
>
> Consider an entry that lives long enough for the lower 32-bits of
> jiffies to wrap, then we kill it, but we won't purge it properly if
> the wrapped jiffie is close to dtime.
>
> I'm sure there are other similar cases as well.

Hum, if it was incorrect, I urge you to grep for tcp_time_stamp, and
correct this as soon as possible :)

2^31 is 2147483648

That's a *lot* of timer ticks; an inet_peer entry should not stay in
unused_list for more than 10 minutes.

Even if the system is under stress for more than 30 days, and some
entries stay that long in the unused list, they won't leak: either they
are re-used, or they are purged by the cleanup_once(0) call done in
inet_getpeer().

Eric
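
The "delta computing" argument in this thread can be illustrated with a small user-space sketch. It is not taken from the patch: the helper names `ticks_since` and `entry_expired` are hypothetical, and plain `uint32_t` values stand in for the truncated jiffies timestamp. It only shows why an unsigned 32-bit difference stays correct across one wrap of the counter, provided the real elapsed time is below 2^32 ticks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ticks elapsed since `then`. Unsigned subtraction is performed
 * modulo 2^32, so the result is still right if `now` has wrapped
 * past zero once since `then` was recorded. */
static uint32_t ticks_since(uint32_t now, uint32_t then)
{
	return (uint32_t)(now - then);
}

/* Hypothetical helper (not from the patch): has the entry been
 * sitting unused for at least `ttl` ticks? */
static bool entry_expired(uint32_t now, uint32_t dtime, uint32_t ttl)
{
	return ticks_since(now, dtime) >= ttl;
}
```

For example, with `dtime = 0xFFFFFFF0` and `now = 0x00000010` the counter has wrapped, yet the computed delta is 32 ticks. The comparison only goes wrong if an entry stays untouched for longer than the counter's full period, which is the case the thread argues does not arise in practice.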
Re: [IPv6] rules: Use RT6_LOOKUP_F_HAS_SADDR and fix source based selectors
From: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 18:55:51 +0900 (JST)

> I tend to agree. Ville, do you agree?

I'll wait for Ville's response before applying this.
Otherwise, I think the change looks fine.
Re: What is current sundance.c status
Ok, I will generate those again with descriptions.

Thank you!

Best Regards,
Jesse Huang.

----- Original Message -----
From: "Andrew Morton" <[EMAIL PROTECTED]>
To: "Jesse Huang" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Thursday, October 12, 2006 10:55 AM
Subject: Re: What is current sundance.c status

On Thu, 12 Oct 2006 10:29:37 +0800
"Jesse Huang" <[EMAIL PROTECTED]> wrote:

> Would you tell me what is the current IP100A status? Should I
> re-generate patches again. Would it put into kernel or not?

I'm sitting on a copy of them. I didn't send them to Jeff last time
because:

sundance-remove-txstartthresh-and-rxearlythresh.patch

  There's no description of what this patent issue is.

sundance-fix-tx-pause-bug-reset_tx-intr_handler.patch

  There's no description of the bug which got fixed, nor how this
  patch fixes it.

sundance-change-phy-address-search-from-phy=1-to-phy=0.patch

  There's a (small) possibility that this will break on hardware which
  _doesn't_ have a phy at address 0.

sundance-correct-initial-and-close-hardware-step.patch

  There's no real description of the bug which is being fixed, nor of
  how this patch fixes it.

sundance-solve-host-error-problem-in-low-performance-embedded.patch

  No description of what the "host error problem" is, nor of what
  causes it, nor of how this patch fixes it.

So generally these patches are a bit worrying, and it is hard to gauge
what their risk factor is.
Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly
On Thu, 12 Oct 2006, Andrew Morton wrote:

> > 	pci_set_power_state(pdev, PCI_D0);
> > 	pci_restore_state(pdev);
> > -	pci_enable_device(pdev);
> > +	ret = pci_enable_device(pdev);
> > +	if (ret) {
> > +		printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during resume\n",
> > +			dev->name);
> > +		unregister_netdev(dev);
> This looks rather wrong - skge_exit() will run unregister_netdev() again.

You are of course right (the problem was also spotted by Russell King).
This I believe is the correct one for the sk98lin case.

[PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()

add check of return value to _resume() function of sk98lin driver.

Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>

---

 drivers/net/sk98lin/skge.c |   20 +++++++++++++++-----
 1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index d4913c3..3a9323d 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5070,7 +5070,12 @@ static int skge_resume(struct pci_dev *p

 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
-	pci_enable_device(pdev);
+	ret = pci_enable_device(pdev);
+	if (ret) {
+		printk(KERN_WARNING "sk98lin: unable to enable device %s in resume\n",
+			dev->name);
+		goto out_err;
+	}
 	pci_set_master(pdev);
 	if (pAC->GIni.GIMacsFound == 2)
 		ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", dev);
@@ -5078,10 +5083,8 @@ static int skge_resume(struct pci_dev *p
 		ret = request_irq(dev->irq, SkGeIsrOnePort, IRQF_SHARED, "sk98lin", dev);
 	if (ret) {
 		printk(KERN_WARNING "sk98lin: unable to acquire IRQ %d\n", dev->irq);
-		pAC->AllocFlag &= ~SK_ALLOC_IRQ;
-		dev->irq = 0;
-		pci_disable_device(pdev);
-		return -EBUSY;
+		ret = -EBUSY;
+		goto out_err;
 	}

 	netif_device_attach(dev);
@@ -5098,6 +5101,13 @@ static int skge_resume(struct pci_dev *p
 	}

 	return 0;
+out_err:
+	pAC->AllocFlag &= ~SK_ALLOC_IRQ;
+	dev->irq = 0;
+	pci_disable_device(pdev);
+
+	return ret;
+
 }
 #else
 #define skge_suspend NULL

-- 
Jiri Kosina
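
The shape of this revision, where both failure cases jump to one shared cleanup label, can be sketched outside the kernel. Everything below is a hypothetical stand-in: `struct fake_dev`, `fake_enable()`, `fake_request_irq()` and `fake_resume()` are invented for illustration and do not model the sk98lin driver or the PCI API. The sketch only shows the control flow of a single error exit.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; the two *_fails flags let a caller
 * inject a failure at either step. */
struct fake_dev {
	bool enable_fails;
	bool irq_fails;
	bool enabled;
	bool irq_held;
	bool attached;
};

static int fake_enable(struct fake_dev *d)
{
	if (d->enable_fails)
		return -EIO;
	d->enabled = true;
	return 0;
}

static int fake_request_irq(struct fake_dev *d)
{
	if (d->irq_fails)
		return -EINVAL;
	d->irq_held = true;
	return 0;
}

/* Resume path with one shared error exit, mirroring the structure
 * (not the details) of the patch above. */
static int fake_resume(struct fake_dev *d)
{
	int ret;

	ret = fake_enable(d);
	if (ret)
		goto out_err;

	ret = fake_request_irq(d);
	if (ret) {
		ret = -EBUSY;	/* report IRQ failure as "busy" */
		goto out_err;
	}

	d->attached = true;
	return 0;

out_err:
	/* Common cleanup: leave the device released and disabled. */
	d->irq_held = false;
	d->enabled = false;
	return ret;
}
```

Setting `enable_fails` or `irq_fails` plays the role of the ad-hoc fault injection mentioned elsewhere in the thread: each failure can be forced and the resulting state checked, without the cleanup code being duplicated per failure site.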
Re: Suppress / delay SYN-ACK
On Thu, 12 Oct 2006 15:54:49 -0700
Rick Jones <[EMAIL PROTECTED]> wrote:

> > More to the point, on what basis would the application be rejecting a
> > connection request based solely on the SYN?
>
> True, it isn't like there would suddenly be any call user data as in
> XTI/TLI.
>
> > There are only two pieces of information available: the remote IP
> > address and port, and the total number of pending requests. The
> > latter is already addressed through the backlog size, and netfilter
> > rules can already be used to reject based on IP address.
>
> It would though allow an application to have an even more restricted
> set of allowed IP's than was set in netfilter. Rather like allowing
> the application to set socket buffer sizes rather than relying on the
> system's default.
>

Some version of BSD sockets had this behaviour; perhaps you should use
the same model. It was some socket option, I can't remember which;
whatever it was, it wasn't widely adopted.

Nothing says you can't just use shutdown() to force a RST on the
addresses you don't want to talk to.
Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly
On Fri, 13 Oct 2006 00:57:18 +0200 (CEST)
Jiri Kosina <[EMAIL PROTECTED]> wrote:

> @@ -5070,7 +5070,13 @@ static int skge_resume(struct pci_dev *p
>
> 	pci_set_power_state(pdev, PCI_D0);
> 	pci_restore_state(pdev);
> -	pci_enable_device(pdev);
> +	ret = pci_enable_device(pdev);
> +	if (ret) {
> +		printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during resume\n",
> +			dev->name);
> +		unregister_netdev(dev);

This looks rather wrong - skge_exit() will run unregister_netdev() again.

Look a few lines down, to where this function already handles
request_irq() failure, reuse that code path. Hopefully it has been
tested..

(Once we have an easy-to-use fault-injection framework we'll be able to
test all these things more easily)

(But it's possible to test them already, with a bit of ad-hoc testing
code)
FW: [patch] Performance enhancement patches for SB1250 MAC
-----Original Message-----
From: Yang, Steve
Sent: Thursday, October 12, 2006 5:46 PM
To: 'Stephen Hemminger'
Cc: netdev@vger.kernel.org
Subject: RE: [patch] Performance enhancement patches for SB1250 MAC

Stephen,

I assume the "expense" you referred to is the reserved SK cache
buffers.

1. The SKB_CACHE does hold on to buffers which would otherwise be
returned to the system (although the number it holds on to is limited
and configurable). These buffers are only returned with certainty at
module unload time, although with normal traffic most of them would be
recycled pretty quickly. I think the cache was implemented as a stack,
rather than a FIFO, which could cause a few buffers to be held for
quite a while under light loads.

2. SKB_CACHE, just like NAPI, is also a configurable option. Systems
that need the performance have the option of turning this on, at the
expense of a small number of buffers; other systems which don't care
much about networking performance can leave this option off.

3. Can you elaborate on the other possible issues that you touch upon
(memory starvation/race, etc.)?

Regards,
Steve Yang

-----Original Message-----
From: Stephen Hemminger [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 12, 2006 3:14 PM
To: Yang, Steve
Cc: netdev@vger.kernel.org
Subject: Re: [patch] Performance enhancement patches for SB1250 MAC

On Thu, 12 Oct 2006 14:54:33 -0700
"Yang, Steve" <[EMAIL PROTECTED]> wrote:

> FYI ...
>
> Regards,
> Steve Yang
>
> -----Original Message-----
> From: Yang, Steve
> Sent: Monday, September 25, 2006 3:50 PM
> To: '[EMAIL PROTECTED]'
> Cc: '[EMAIL PROTECTED]'; 'Mark E Mason'
> Subject: Performance enhancement patches for SB1250 MAC
>
> Hi,
>
> The attached are two network performance enhancement patches for
> SB1250 MAC. The NAPI patch applies first. Followed by the "skb cache"
> patch.
> They applied and builds cleanly on 2.6.18 kernel for the following
> kernel option combinations:
>
> SBMAC_NAPI   no   yes   yes
> SKB_CACHE    no   no    yes
>
> Regards,
> Steve Yang
>

NAK on the SKB_CACHE; it is an idea that just ends up favoring your
driver at the expense of the rest of the system. Also, there are
resource/memory starvation issues and probably other races as well.

I bet it makes your benchmark run faster, but it doesn't belong in a
normal kernel.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
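
Point 1 of the reply above, that a stack-shaped cache can pin a few buffers for a long time under light load, can be seen in a toy model. This is not the SB1250 driver's code: `struct buf_cache`, `cache_put()`, `cache_get()` and the `CACHE_MAX` limit are all hypothetical names chosen for the sketch.

```c
#include <stddef.h>

#define CACHE_MAX 4	/* hypothetical, stands in for the configurable limit */

/* Bounded LIFO cache of free buffers. */
struct buf_cache {
	void *slot[CACHE_MAX];
	size_t top;	/* number of buffers currently held */
};

/* Returns 1 if the buffer was kept, 0 if the cache is full and the
 * caller should give the buffer back to the system instead. */
static int cache_put(struct buf_cache *c, void *buf)
{
	if (c->top == CACHE_MAX)
		return 0;
	c->slot[c->top++] = buf;
	return 1;
}

/* Pops the most recently cached buffer, or NULL if the cache is empty. */
static void *cache_get(struct buf_cache *c)
{
	return c->top ? c->slot[--c->top] : NULL;
}
```

If three buffers are cached and the traffic then only ever needs one buffer at a time, every get/put pair touches the top slot; the buffers underneath are never handed out and stay held until the cache is drained. A FIFO would rotate through all of them, at the cost of worse cache locality for the most recently freed buffer.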
Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly
On Thu, 12 Oct 2006, Stephen Hemminger wrote:

> > > Having the device unregister seems harsh.
> > What would be the proper way? As the initialization failed, accessing
> > the device would not make sense any more (therefore I don't think that
> > calling skge_remove_one() would be OK, as it issues calls to
> > SkEventQueue() and SkEventDispatcher(), trying to send something to
> > the card).
> I guess it's just not clear what the state of the machine is anyway;
> if you can't enable the device, something is hosed (or the device was
> hot removed).

Well, it depends on the definition of 'hot'. What would, for example,
happen in the case suspend-to-disk -> remove the card while the machine
is switched off -> resume-from-disk? I guess that exactly this
pci_enable_device() will fail, so we definitely have to handle this
case, as it can easily happen.

> > > Why put conditional on same line?
> > Pardon me?
> I prefer:
> 	ret = pci_enable_device(pdev);

As you wish.

[PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()

add check of return value to _resume() function of sk98lin driver.

Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>

---

 drivers/net/sk98lin/skge.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index d4913c3..d691811 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5070,7 +5070,13 @@ static int skge_resume(struct pci_dev *p

 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
-	pci_enable_device(pdev);
+	ret = pci_enable_device(pdev);
+	if (ret) {
+		printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during resume\n",
+			dev->name);
+		unregister_netdev(dev);
+		return ret;
+	}
 	pci_set_master(pdev);
 	if (pAC->GIni.GIMacsFound == 2)
 		ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", dev);

-- 
Jiri Kosina
Re: Suppress / delay SYN-ACK
> More to the point, on what basis would the application be rejecting a
> connection request based solely on the SYN?

True, it isn't like there would suddenly be any call user data as in
XTI/TLI.

> There are only two pieces of information available: the remote IP
> address and port, and the total number of pending requests. The
> latter is already addressed through the backlog size, and netfilter
> rules can already be used to reject based on IP address.

It would though allow an application to have an even more restricted
set of allowed IP's than was set in netfilter. Rather like allowing the
application to set socket buffer sizes rather than relying on the
system's default.

rick jones
Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly
On Fri, 13 Oct 2006 00:38:20 +0200 (CEST)
Jiri Kosina <[EMAIL PROTECTED]> wrote:

> On Thu, 12 Oct 2006, Stephen Hemminger wrote:
>
> > > 	pci_set_power_state(pdev, PCI_D0);
> > > 	pci_restore_state(pdev);
> > > -	pci_enable_device(pdev);
> > > +	if ((ret = pci_enable_device(pdev))) {
> > > +		printk(KERN_ERR "sk98lin: Cannot enable PCI device during resume\n");
> > > +		unregister_netdev(dev);
> > >
> > Having the device unregister seems harsh.
>
> What would be the proper way? As the initialization failed, accessing the
> device would not make sense any more (therefore I don't think that calling
> skge_remove_one() would be OK, as it issues calls to SkEventQueue() and
> SkEventDispatcher(), trying to send something to the card).

I guess it's just not clear what the state of the machine is anyway;
if you can't enable the device, something is hosed (or the device was
hot removed).

> > Why put conditional on same line?
>
> Pardon me?

I prefer:
	ret = pci_enable_device(pdev);
	if (ret) {

> > Why not print device name dev->name.
>
> Thanks.
>
> [PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()
>
> add check of return value to _resume() function of sk98lin driver.
>
> Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>
>
> ---
>
>  drivers/net/sk98lin/skge.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
> index d4913c3..1f03cf8 100644
> --- a/drivers/net/sk98lin/skge.c
> +++ b/drivers/net/sk98lin/skge.c
> @@ -5070,7 +5070,11 @@ static int skge_resume(struct pci_dev *p
>
> 	pci_set_power_state(pdev, PCI_D0);
> 	pci_restore_state(pdev);
> -	pci_enable_device(pdev);
> +	if ((ret = pci_enable_device(pdev))) {
> +		printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during resume\n",
> +			dev->name);
> +		return ret;
> +	}
> 	pci_set_master(pdev);
> 	if (pAC->GIni.GIMacsFound == 2)
> 		ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", dev);
>

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly
On Thu, 12 Oct 2006, Stephen Hemminger wrote:

> > 	pci_set_power_state(pdev, PCI_D0);
> > 	pci_restore_state(pdev);
> > -	pci_enable_device(pdev);
> > +	if ((ret = pci_enable_device(pdev))) {
> > +		printk(KERN_ERR "sk98lin: Cannot enable PCI device during resume\n");
> > +		unregister_netdev(dev);
> >
> Having the device unregister seems harsh.

What would be the proper way? As the initialization failed, accessing
the device would not make sense any more (therefore I don't think that
calling skge_remove_one() would be OK, as it issues calls to
SkEventQueue() and SkEventDispatcher(), trying to send something to the
card).

> Why put conditional on same line?

Pardon me?

> Why not print device name dev->name.

Thanks.

[PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()

add check of return value to _resume() function of sk98lin driver.

Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>

---

 drivers/net/sk98lin/skge.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index d4913c3..1f03cf8 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5070,7 +5070,11 @@ static int skge_resume(struct pci_dev *p

 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
-	pci_enable_device(pdev);
+	if ((ret = pci_enable_device(pdev))) {
+		printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during resume\n",
+			dev->name);
+		return ret;
+	}
 	pci_set_master(pdev);
 	if (pAC->GIni.GIMacsFound == 2)
 		ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", dev);

-- 
Jiri Kosina
Re: [PATCH 2/7] d80211: add support for SIOCSIWRATE and SIOCGIWRATE
I am sorry for the late response. Please read my comment below.

Jiri Benc wrote:
> On Thu, 21 Sep 2006 09:59:39 -0700, mabbas wrote:
> > I can not see how does it break per-STA TX rate limit, especially
> > PRISM2_HOSTAPD_SET_RATE_SETS almost doing the same thing. I am not
> > saying the patch is correct I just want to know how to fix it to
> > get it in.
>
> As Jouni wrote, it's not useful to change the per-radio rate table.
> You want to limit the rates you are using to communicate with the
> current AP while not limiting other virtual interfaces. (Imagine you
> have the card that is capable to associate to two APs at the same
> time. You don't want to limit rates for both APs with SIOCSIWRATE.)
>
> To do that I think the following is needed:
>
> 1. Add 'allowed_rates' field to struct sta_info. It defaults to 0x.
>    (Or perhaps call it 'disabled_rates' and make it default to 0.)

Should I add the new field to sta_info or to ieee80211_sub_if_data? If
we add it to sta_info then it won't be persistent: we will lose the
SIOCSIWRATE restriction once we associate with a new AP. Then in 3 we
bitmask sta->curr_rates with ieee80211_sub_if_data::allowed_rates, and
this will solve the problem for IBSS as well.

> 2. The SIOCSIWRATE handler: If the interface is not in a STA mode,
>    return -EOPNOTSUPP. Otherwise, modify the allowed_rates field of
>    the sta entry belonging to the current AP.
>
> 3. Bitmask sta->curr_rates with sta->allowed_rates (or
>    ~sta->disabled_rates) in various places (ieee80211_ioctl_add_sta,
>    ieee80211_rx_mgmt_assoc_resp, ieee80211_rx_bss_info; please check
>    for other places).
>
> In IBSS and AP mode setting this (per-STA, of course, which is not
> supported by WE, btw.) can be useful as well but it can be done later.
>
> Jiri

Thanks
Mohamed
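
Step 3 of the quoted proposal, masking the current rate set with an allowed (or disabled) set, amounts to a single bitwise AND. The sketch below is only an illustration under stated assumptions: `struct sketch_sta`, its `supp_rates` field and both helper functions are hypothetical and are not the d80211 `struct sta_info`. The all-ones mask is used here as this sketch's own "no restriction" value; the default in the quoted text is truncated and is not reconstructed.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a station entry: one bit per
 * rate index. Not the real d80211 structure. */
struct sketch_sta {
	uint32_t supp_rates;	/* rates the peer supports */
	uint32_t allowed_rates;	/* rates the user permits */
};

/* 'allowed_rates' variant: keep only rates that are both supported
 * and permitted. */
static uint32_t sketch_curr_rates(const struct sketch_sta *sta)
{
	return sta->supp_rates & sta->allowed_rates;
}

/* 'disabled_rates' variant: a zero mask means nothing is disabled. */
static uint32_t sketch_curr_rates_disabled(uint32_t supp_rates,
					   uint32_t disabled_rates)
{
	return supp_rates & ~disabled_rates;
}
```

Whether the mask lives in the per-station entry or in the per-interface data, as Mohamed asks, does not change this computation; it only changes where the second operand is read from and how long it persists across associations.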
Re: [PATCH 0/3] Collection of small NetLabel bugfixes
On Wed, 11 Oct 2006, [EMAIL PROTECTED] wrote: > When doing some more testing today I ran into a few bugs, this patchset > addresses those bugs. This patchset is backed against today's net-2.6 git > tree. > > Please apply these patches for 2.6.19, thanks. Applied to git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-net-2.6.git -- James Morris <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly
On Fri, 13 Oct 2006 00:17:50 +0200 (CEST) Jiri Kosina <[EMAIL PROTECTED]> wrote: > [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() > properly > > Fix missing handling of pci_enable_device() return value in skge_resume() > > Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]> > > --- > > drivers/net/sk98lin/skge.c |6 +- > 1 files changed, 5 insertions(+), 1 deletions(-) > > diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c > index 99e9262..e12fb62 100644 > --- a/drivers/net/sk98lin/skge.c > +++ b/drivers/net/sk98lin/skge.c > @@ -5070,7 +5070,11 @@ static int skge_resume(struct pci_dev *p > > pci_set_power_state(pdev, PCI_D0); > pci_restore_state(pdev); > - pci_enable_device(pdev); > + if ((ret = pci_enable_device(pdev))) { > + printk(KERN_ERR "sk98lin: Cannot enable PCI device during > resume\n"); > + unregister_netdev(dev); > Having the device unregister seems harsh. Why put condtional on same line? Why not print device name dev->name. -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly
[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly

Fix missing handling of pci_enable_device() return value in skge_resume()

Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>

---

 drivers/net/sk98lin/skge.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index 99e9262..e12fb62 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5070,7 +5070,11 @@ static int skge_resume(struct pci_dev *p
 
 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
-	pci_enable_device(pdev);
+	if ((ret = pci_enable_device(pdev))) {
+		printk(KERN_ERR "sk98lin: Cannot enable PCI device during resume\n");
+		unregister_netdev(dev);
+		return ret;
+	}
 	pci_set_master(pdev);
 	if (pAC->GIni.GIMacsFound == 2)
 		ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", dev);

-- 
Jiri Kosina
Re: [PATCH] [NET] reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 22:14:12 +0200

> 1) shrink struct inet_peer on 64 bits platforms.
>
> I noticed sizeof(struct inet_peer) was 64+8 on x86_64
>
> As we dont really need 64 bits timestamps (we only care for garbage
> collection), we can use 32bits ones and reduce sizeof(struct inet_peer) to 64
> bytes : Because of SLAB_HWCACHE_ALIGN constraint, final allocation is 64 bytes
> instead of 128 bytes per inet_peer structure.

I'm not convinced this is 100% correct. There are wrapping cases that I think aren't covered.

Consider an entry that lives long enough for the lower 32 bits of jiffies to wrap, then we kill it, but we won't purge it properly if the wrapped jiffies value is close to dtime. I'm sure there are other similar cases as well.

> 2) Cleanup
> --
> inet_putpeer() is not anymore inlined in inetpeer.h, as this is not called
> in fast paths, to reduce text size. Some exports are not anymore needed
> (inet_peer_unused_lock, inet_peer_unused_tailp) and can be declared static.
>
> 3) No more hard limit (PEER_MAX_CLEANUP_WORK = 30)
> --
> peer_check_expire() try to delete entries for at most one timer tick. CPUS
> are going faster, hard limits are becoming useless... Similar thing is done in
> net/ipv4/route.c garbage collector.

These parts are fine.
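One reading of the wrapping concern above can be reproduced in user space. The sketch below is an assumption-laden illustration, not kernel code: `is_fresh32()` mirrors the `delta < ttl` test from the patch using a truncated 32-bit timestamp, and `is_fresh64()` is a hypothetical full-width comparison added only for contrast. An entry whose real age is a little more than 2^32 ticks looks freshly used to the 32-bit test. At a tick rate of 1000 per second (an assumed HZ), 2^32 ticks is roughly 49.7 days.

```c
#include <stdint.h>

/* Patched-style test: timestamps truncated to 32 bits.
 * Nonzero means "too fresh to prune". */
static int is_fresh32(uint64_t now, uint64_t dtime, uint32_t ttl)
{
    uint32_t delta = (uint32_t)now - (uint32_t)dtime; /* modulo 2^32 */
    return delta < ttl;
}

/* Full-width comparison, for contrast only (hypothetical helper). */
static int is_fresh64(uint64_t now, uint64_t dtime, uint32_t ttl)
{
    return (now - dtime) < ttl;
}
```

A single wrap of the low 32 bits is handled correctly by the unsigned subtraction; the two tests only disagree once the true age exceeds 2^32 ticks.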
Re: [patch] Performance enhancement patches for SB1250 MAC
On Thu, 12 Oct 2006 14:54:33 -0700 "Yang, Steve" <[EMAIL PROTECTED]> wrote:

> FYI ...
>
> Regards,
> Steve Yang
>
> -----Original Message-----
> From: Yang, Steve
> Sent: Monday, September 25, 2006 3:50 PM
> To: '[EMAIL PROTECTED]'
> Cc: '[EMAIL PROTECTED]'; 'Mark E Mason'
> Subject: Performance enhancement patches for SB1250 MAC
>
> Hi,
>
> The attached are two network performance enhancement patches for SB1250
> MAC. The NAPI patch applies first, followed by the "skb cache" patch.
> They apply and build cleanly on the 2.6.18 kernel for the following
> kernel option combinations:
>
> SBMAC_NAPI   no   yes   yes
> SKB_CACHE    no   no    yes
>
> Regards,
> Steve Yang

NAK on the SKB_CACHE patch. It is an idea that just ends up favoring your driver at the expense of the rest of the system. Also, there are resource/memory starvation issues and probably other races as well.

I bet it makes your benchmark run faster, but it doesn't belong in a normal kernel.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
Re: Suppress / delay SYN-ACK
On Thu, 2006-12-10 at 14:58 -0700, Caitlin Bestler wrote:

> That would seem to limit the usefulness to scenarios where a given
> remote IP address *might* be accepted based on total traffic load,
> number of other connections from the same IP address, etc. If
> *all* requests from that IP address are going to be rejected, why
> not use netfilter?

Netfilter or ingress tc may both work; I have a feeling that the poster needs to consult some policy+state in the application first which is more complex than what rate control or number of connections provide (DoS detection?), in which case they'd have to write a netfilter target.

cheers,
jamal
Re: Suppress / delay SYN-ACK
On 10/12/06, Rick Jones <[EMAIL PROTECTED]> wrote: Martin Schiller wrote: > Hi! > > I'm searching for a solution to suppress / delay the SYN-ACK packet of a > listening server (-application) until he has decided (e.g. analysed the > requesting ip-address or checked if the corresponding other end of a > connection is available) if he wants to accept the connect request of the > client. If not, it should be possible to reject the connect request. How often do you expect the incomming call to be rejected? I suspect that would have a significant effect on whether the whole thing is worthwhile. rick jones More to the point, on what basis would the application be rejecting a connection request based solely on the SYN? There are only two pieces of information available: the remote IP address and port, and the total number of pending requests. The latter is already addressed through the backlog size, and netfilter rules can already be used to reject based on IP address. That would seem to limit the usefullness to scenarios where a given remote IP address *might* be accepted based on total traffic load, number of other connections from the same IP address, etc. If *all* requests from that IP address are going to be rejected, why not use netfilter? - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] Performance enhancement patches for SB1250 MAC
FYI ...

Regards,
Steve Yang

-----Original Message-----
From: Yang, Steve
Sent: Monday, September 25, 2006 3:50 PM
To: '[EMAIL PROTECTED]'
Cc: '[EMAIL PROTECTED]'; 'Mark E Mason'
Subject: Performance enhancement patches for SB1250 MAC

Hi,

The attached are two network performance enhancement patches for SB1250 MAC. The NAPI patch applies first, followed by the "skb cache" patch. They apply and build cleanly on the 2.6.18 kernel for the following kernel option combinations:

SBMAC_NAPI   no   yes   yes
SKB_CACHE    no   no    yes

Regards,
Steve Yang

Attachments:
mips-sb1250-mac-NAPI.patch
sb1250mac_skb_cache.patch
Re: [PATCH] bridge: flush forwarding table when device carrier off
From: Stephen Hemminger <[EMAIL PROTECTED]> Date: Thu, 12 Oct 2006 11:24:31 -0700 > Flush the forwarding table when carrier is lost. This helps for > availability because we don't want to forward to a downed device and > new packets may come in on other links. > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> ... > + if (f->is_static & !do_all) > + continue; Applied with "&" changed to "&&" as mentioned elsewhere :) - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] bridge: flush forwarding table when device carrier off
On Thu, 2006-12-10 at 14:32 -0700, Stephen Hemminger wrote: > > I am on the other extreme - this is problematic if you have a large > > table already learnt. Agrevate that with an unstable link and it gets a > > lot worse. Both of which dont sound unrealistic in say a wireless AP. > > We don't support bridging wireless, that requires some NDS stuff that > isn't supported, and requires more softmac than the stack has. > I was more thinking of wireless-to-ethernet bridging; that should still work, no? i.e say eth1 on wireless with eth0 on the wired side? In any case, that may be a bad example (and a digression) of something that learns large tables. I have however seen 1K entries in bridging. > > A more sane policy i have seen is a timer that flushes the table after a > > programmed period; this way you counter a flipflop-ing link. > > That's already there. > ah, ok. So the patch is in an alternative to this then? > > IOW, the best place is to have this in some user space daemon. If it has > > to be in the kernel, can you add a systcl to disable it? > > > > When RSTP is in userspace, it will do the flushing. Cool. And that makes a lot of sense. cheers, jamal - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] bridge: flush forwarding table when device carrier off
On Thu, 12 Oct 2006 17:30:33 -0400 jamal <[EMAIL PROTECTED]> wrote: > On Thu, 2006-12-10 at 16:10 -0400, Andy Gospodarek wrote: > > On Thu, Oct 12, 2006 at 11:24:31AM -0700, Stephen Hemminger wrote: > > > Flush the forwarding table when carrier is lost. This helps for > > > availability because we don't want to forward to a downed device and > > > new packets may come in on other links. > > > > > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> > > > > > > > Stephen, > > > > This is an excellent idea > > I am on the other extreme - this is problematic if you have a large > table already learnt. Agrevate that with an unstable link and it gets a > lot worse. Both of which dont sound unrealistic in say a wireless AP. We don't support bridging wireless, that requires some NDS stuff that isn't supported, and requires more softmac than the stack has. > A more sane policy i have seen is a timer that flushes the table after a > programmed period; this way you counter a flipflop-ing link. That's already there. > IOW, the best place is to have this in some user space daemon. If it has > to be in the kernel, can you add a systcl to disable it? > When RSTP is in userspace, it will do the flushing. -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] bridge: flush forwarding table when device carrier off
On Thu, 2006-12-10 at 16:10 -0400, Andy Gospodarek wrote:

> On Thu, Oct 12, 2006 at 11:24:31AM -0700, Stephen Hemminger wrote:
> > Flush the forwarding table when carrier is lost. This helps for
> > availability because we don't want to forward to a downed device and
> > new packets may come in on other links.
> >
> > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> >
>
> Stephen,
>
> This is an excellent idea

I am on the other extreme: this is problematic if you have a large table already learnt. Aggravate that with an unstable link and it gets a lot worse. Neither of these sounds unrealistic in, say, a wireless AP.

A more sane policy I have seen is a timer that flushes the table after a programmed period; this way you counter a flip-flopping link.

IOW, the best place to have this is in some user-space daemon. If it has to be in the kernel, can you add a sysctl to disable it?

cheers,
jamal
Re: [PATCH] e1000: Real time packets and bytes statistics
On Thu, 12 Oct 2006 23:02:52 +0200 Jean Delvare <[EMAIL PROTECTED]> wrote: > Hi Stephen, > > On 10/11/06, Stephen Hemminger wrote: > > On Wed, 11 Oct 2006, Jesse Brandeburg wrote: > > > On 10/11/06, Jean Delvare wrote: > > > > Let the e1000 driver report the most important statistics (rx/tx_bytes > > > > and rx/tx_packets) in real time, rather than every other second. This > > > > is similar to what the e100 driver is doing. > > > > (...) > > > > --- linux-2.6.19-rc1.orig/drivers/net/e1000/e1000_main.c > > > > 2006-10-11 10:53:49.0 +0200 > > > > +++ linux-2.6.19-rc1/drivers/net/e1000/e1000_main.c 2006-10-11 > > > > 11:34:41.0 +0200 > > > > @@ -3118,6 +3118,8 @@ > > > >e1000_tx_map(adapter, tx_ring, skb, first, > > > > max_per_txd, nr_frags, mss)); > > > > > > > > + adapter->net_stats.tx_packets++; > > > > + adapter->net_stats.tx_bytes += skb->len; > > > > netdev->trans_start = jiffies; > > > > > > this is the part I'm most worried about. as I believe it to be > > > incorrect for TSO packets. Maybe something like? > > > + if (skb_shinfo(skb)->gso_segs) > > > + adapter->net_stats.tx_packets += skb_shinfo(skb)->gso_segs; > > > + else > > > + adapter->net_stats.tx_packets++; > > > + adapter->net_stats.tx_bytes += skb->len; > > > netdev->trans_start = jiffies; > > > > > > skb len will still be off by some amount, because the skb->data > > > (header) is replicated across each gso segment but only counted once > > > this way, but hopefully someone will pipe up with a good way to > > > compute that. > > > > You might want to put the tx values in a per-cpu structure and > > sum later. Incrementing statistics can actually be a performance > > bottleneck on SMP tests, because it causes lots of cache thrashing. > > I don't really see how this would be implemented. Can you please point > me to other drivers which do it that way? > > Thanks, Loopback (drivers/net/loopback.c) does it, but it is simpler since it doesn't have to support multiple interfaces. 
In a normal driver you would have to use indirection and alloc_percpu() like af_inet.c does. -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
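The per-CPU idea suggested above can be sketched in user-space C as follows. Everything here is hypothetical illustration: `NR_CPUS_SKETCH`, `struct dev_stats`, `tx_account()` and `tx_sum()` are made-up names, and a plain array stands in for what a kernel driver would obtain from alloc_percpu(). The array does not give each CPU its own cache line, so it shows the sum-on-read structure rather than the actual cache behaviour.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4 /* hypothetical fixed CPU count for the sketch */

struct tx_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* One slot per CPU; a stand-in for per-CPU storage in a real driver. */
struct dev_stats {
    struct tx_stats cpu[NR_CPUS_SKETCH];
};

/* Hot path: update only the slot of the CPU doing the transmit. */
static void tx_account(struct dev_stats *st, unsigned int cpu, uint32_t len)
{
    st->cpu[cpu].packets++;
    st->cpu[cpu].bytes += len;
}

/* Slow path: fold all slots together when statistics are read. */
static struct tx_stats tx_sum(const struct dev_stats *st)
{
    struct tx_stats total = { 0, 0 };
    unsigned int i;

    for (i = 0; i < NR_CPUS_SKETCH; i++) {
        total.packets += st->cpu[i].packets;
        total.bytes += st->cpu[i].bytes;
    }
    return total;
}
```

The design trade is that the transmit path never writes a counter another CPU is writing, while the reader pays for a short loop each time statistics are requested.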
Re: [PATCH] e1000: Real time packets and bytes statistics
Hi Jesse, On 10/11/06, Jesse Brandeburg wrote: > On 10/11/06, Jean Delvare wrote: > > Let the e1000 driver report the most important statistics (rx/tx_bytes > > and rx/tx_packets) in real time, rather than every other second. This > > is similar to what the e100 driver is doing. > > > > The current asynchronous statistics refresh model makes it impossible > > to monitor the network traffic with an interval which isn't a multiple > > of 2 seconds. For example, an interval of 5 seconds would result in a > > sawtooth diagram (+20%, -20%) for a constant transfer rate. With a 1 > > second interval it's even worse (0, 200%) of course. This has been > > annoying users for years, but was never actually fixed: > > I think the idea is good, however, see below. Good news :) > > I additionally noted a difference of 6 bytes on some TX frames, which > > I am not able to explain. It's probably small and rare enough not to > > be considered a problem, but if someone can explain it, I would be > > grateful. > > now, that sounds odd, however, once again, see below. What you say below about TSO can't possibly explain this difference, as your fix is about tx_packets while the difference I observed was on tx_bytes only, the packet count was always correct. I'll investigate tomorrow, if I can find a pattern for these differences I might discover what these bytes are. For now the only idea I have is that 6 bytes is ETH_ALEN, the size of an ethernet MAC address - but that doesn't explain anything per se. 
> > drivers/net/e1000/e1000_main.c | 14 ++ > > 1 file changed, 10 insertions(+), 4 deletions(-) > > > > --- linux-2.6.19-rc1.orig/drivers/net/e1000/e1000_main.c2006-10-11 > > 10:53:49.0 +0200 > > +++ linux-2.6.19-rc1/drivers/net/e1000/e1000_main.c 2006-10-11 > > 11:34:41.0 +0200 > > @@ -3118,6 +3118,8 @@ > >e1000_tx_map(adapter, tx_ring, skb, first, > > max_per_txd, nr_frags, mss)); > > > > + adapter->net_stats.tx_packets++; > > + adapter->net_stats.tx_bytes += skb->len; > > netdev->trans_start = jiffies; > > this is the part I'm most worried about. as I believe it to be > incorrect for TSO packets. Maybe something like? I have to admit I have very little experience with network drivers and I didn't have the slightest idea what TSO was until I looked into wikipedia two minutes ago. So I seem to understand that the skb structure in the code above could correspond to several packets sent on the wire by the ethernet adapter when TSO is used? Seems to be very recent, 2.6.16 didn't have that. > + if (skb_shinfo(skb)->gso_segs) > + adapter->net_stats.tx_packets += skb_shinfo(skb)->gso_segs; > + else > + adapter->net_stats.tx_packets++; > + adapter->net_stats.tx_bytes += skb->len; > netdev->trans_start = jiffies; My comparisons between hardware and software-computed statistics did not reveal any difference with regards to tx_packets, while there should have been one if the change above is needed. This suggests that my tests (run on 2.6.18) did not trigger any TSO packet? Can you suggest a way to generate such packets so that I am sure I exercise this code path? I found an inline function in include/net/tcp.h, tcp_skb_pcount(), which evaluates to skb_shinfo(skb)->gso_segs. I guess I should use that instead of the above? > skb len will still be off by some amount, because the skb->data > (header) is replicated across each gso segment but only counted once > this way, but hopefully someone will pipe up with a good way to > compute that. 
And nearby there is another inline function, tcp_skb_mss(), which evaluates to skb_shinfo(skb)->gso_size... I can't experiment with that until I know a way to trigger TSO packets, but could it be the size we're after? Thanks a lot for your guidance. -- Jean Delvare - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
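The accounting question in this exchange can be made concrete with a small user-space sketch. It is not e1000 code: `struct pkt`, `hdr_len` and `account_tx()` are hypothetical, and the sketch simply assumes that every segment on the wire repeats the same header bytes. Under that assumption the byte count needs one extra header copy per additional segment, which is the correction the thread is looking for; whether the hardware counters behave this way is not established here.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the packet fields discussed above. */
struct pkt {
    uint32_t len;      /* total length: one header copy plus all payload */
    uint32_t hdr_len;  /* header bytes repeated in every segment (assumed) */
    uint16_t gso_segs; /* 0 when the packet is not segmented */
};

struct counters {
    uint64_t tx_packets;
    uint64_t tx_bytes;
};

static void account_tx(struct counters *c, const struct pkt *p)
{
    uint32_t segs = p->gso_segs ? p->gso_segs : 1;

    c->tx_packets += segs;
    /* each segment after the first carries one more copy of the headers */
    c->tx_bytes += p->len + (uint64_t)(segs - 1) * p->hdr_len;
}
```

For example, three segments of 1460 payload bytes with a 54-byte header give `len` = 4434 in the single large packet, and the sketch accounts 4434 + 2 * 54 = 4542 bytes, the same as three separate 1514-byte frames.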
Re: [PATCH] e1000: Real time packets and bytes statistics
Hi Stephen, On 10/11/06, Stephen Hemminger wrote: > On Wed, 11 Oct 2006, Jesse Brandeburg wrote: > > On 10/11/06, Jean Delvare wrote: > > > Let the e1000 driver report the most important statistics (rx/tx_bytes > > > and rx/tx_packets) in real time, rather than every other second. This > > > is similar to what the e100 driver is doing. > > > (...) > > > --- linux-2.6.19-rc1.orig/drivers/net/e1000/e1000_main.c > > > 2006-10-11 10:53:49.0 +0200 > > > +++ linux-2.6.19-rc1/drivers/net/e1000/e1000_main.c 2006-10-11 > > > 11:34:41.0 +0200 > > > @@ -3118,6 +3118,8 @@ > > >e1000_tx_map(adapter, tx_ring, skb, first, > > > max_per_txd, nr_frags, mss)); > > > > > > + adapter->net_stats.tx_packets++; > > > + adapter->net_stats.tx_bytes += skb->len; > > > netdev->trans_start = jiffies; > > > > this is the part I'm most worried about. as I believe it to be > > incorrect for TSO packets. Maybe something like? > > + if (skb_shinfo(skb)->gso_segs) > > + adapter->net_stats.tx_packets += skb_shinfo(skb)->gso_segs; > > + else > > + adapter->net_stats.tx_packets++; > > + adapter->net_stats.tx_bytes += skb->len; > > netdev->trans_start = jiffies; > > > > skb len will still be off by some amount, because the skb->data > > (header) is replicated across each gso segment but only counted once > > this way, but hopefully someone will pipe up with a good way to > > compute that. > > You might want to put the tx values in a per-cpu structure and > sum later. Incrementing statistics can actually be a performance > bottleneck on SMP tests, because it causes lots of cache thrashing. I don't really see how this would be implemented. Can you please point me to other drivers which do it that way? Thanks, -- Jean Delvare - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] bridge: flush forwarding table when device carrier off
On Thu, 12 Oct 2006 16:10:44 -0400 Andy Gospodarek <[EMAIL PROTECTED]> wrote:

> On Thu, Oct 12, 2006 at 11:24:31AM -0700, Stephen Hemminger wrote:
> > Flush the forwarding table when carrier is lost. This helps for
> > availability because we don't want to forward to a downed device and
> > new packets may come in on other links.
> >
> > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> >
>
> Stephen,
>
> This is an excellent idea and all looks good except this check
>
> +	if (f->is_static & !do_all)
> +		continue;
>
> should be this:
>
> +	if (f->is_static && !do_all)
> +		continue;
>

Agreed, but it probably worked during testing because both flags are strict booleans (i.e. 0/1).

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
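The point about strict booleans can be checked with a small stand-alone sketch. The two helpers below are hypothetical and take plain ints; the real types of `is_static` and `do_all` in the bridge code are not shown in this thread. When both inputs are 0 or 1 the bitwise and logical forms agree, and they diverge as soon as the left operand is a nonzero value other than 1.

```c
/* Both forms of the skip test from the patch, on plain ints. */
static int skip_bitwise(int is_static, int do_all)
{
    return is_static & !do_all;   /* bit 0 of is_static only */
}

static int skip_logical(int is_static, int do_all)
{
    return is_static && !do_all;  /* any nonzero is_static counts */
}
```

With `is_static` equal to 2 and `do_all` equal to 0, the bitwise form yields 0 (entry would be flushed) while the logical form yields 1 (entry would be skipped), which is why `&&` is the safer spelling.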
Pull request for 'jg-20061012-00' tag
Please pull from tag 'jg-20061012-00' in repository git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6.git jg-20061012-00 to get the changes below. Distance from 'upstream-fixes' - 733b736c91dd2c556f35dffdcf77e667cf10cefc 73f5e28b336772c4b08ee82e5bf28ab872898ee1 Diffstat drivers/net/r8169.c |7 ++- 1 files changed, 6 insertions(+), 1 deletions(-) Shortlog Andrew Morton: r8169: PCI ID for Corega Gigabit network card Arnaud Patard: r8169: fix infinite loop during hotplug Patch - diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c index 4c47c5b..c2c9a86 100644 --- a/drivers/net/r8169.c +++ b/drivers/net/r8169.c @@ -214,6 +214,7 @@ static struct pci_device_id rtl8169_pci_ { PCI_DEVICE(PCI_VENDOR_ID_REALTEK, 0x8168), 0, 0, RTL_CFG_2 }, { PCI_DEVICE(PCI_VENDOR_ID_REALTEK, 0x8169), 0, 0, RTL_CFG_0 }, { PCI_DEVICE(PCI_VENDOR_ID_DLINK, 0x4300), 0, 0, RTL_CFG_0 }, + { PCI_DEVICE(0x1259,0xc107), 0, 0, RTL_CFG_0 }, { PCI_DEVICE(0x16ec,0x0116), 0, 0, RTL_CFG_0 }, { PCI_VENDOR_ID_LINKSYS,0x1032, PCI_ANY_ID, 0x0024, 0, 0, RTL_CFG_0 }, @@ -2701,6 +2702,7 @@ static void rtl8169_down(struct net_devi struct rtl8169_private *tp = netdev_priv(dev); void __iomem *ioaddr = tp->mmio_addr; unsigned int poll_locked = 0; + unsigned int intrmask; rtl8169_delete_timer(dev); @@ -2739,8 +2741,11 @@ core_down: * 2) dev->change_mtu *-> rtl8169_poll can not be issued again and re-enable the * interruptions. Let's simply issue the IRQ down sequence again. +* +* No loop if hotpluged or major error (0x). */ - if (RTL_R16(IntrMask)) + intrmask = RTL_R16(IntrMask); + if (intrmask && (intrmask != 0x)) goto core_down; rtl8169_tx_clear(tp); -- Ueimor - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [NET] reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()
Hi David Please find this patch against include/net/inetpeer.h and net/ipv4/inetpeer.c 1) shrink struct inet_peer on 64 bits platforms. I noticed sizeof(struct inet_peer) was 64+8 on x86_64 As we dont really need 64 bits timestamps (we only care for garbage collection), we can use 32bits ones and reduce sizeof(struct inet_peer) to 64 bytes : Because of SLAB_HWCACHE_ALIGN constraint, final allocation is 64 bytes instead of 128 bytes per inet_peer structure. 2) Cleanup -- inet_putpeer() is not anymore inlined in inetpeer.h, as this is not called in fast paths, to reduce text size. Some exports are not anymore needed (inet_peer_unused_lock, inet_peer_unused_tailp) and can be declared static. 3) No more hard limit (PEER_MAX_CLEANUP_WORK = 30) -- peer_check_expire() try to delete entries for at most one timer tick. CPUS are going faster, hard limits are becoming useless... Similar thing is done in net/ipv4/route.c garbage collector. Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]> --- linux-2.6.18/include/net/inetpeer.h Wed Sep 20 05:42:06 2006 +++ linux-2.6.18-ed/include/net/inetpeer.h Thu Oct 12 21:40:28 2006 @@ -19,7 +19,7 @@ { struct inet_peer*avl_left, *avl_right; struct inet_peer*unused_next, **unused_prevp; - unsigned long dtime; /* the time of last use of not + __u32 dtime; /* the time of last use of not * referenced entries */ atomic_trefcnt; __u32 v4daddr;/* peer's address */ @@ -35,21 +35,8 @@ /* can be called with or without local BH being disabled */ struct inet_peer *inet_getpeer(__u32 daddr, int create); -extern spinlock_t inet_peer_unused_lock; -extern struct inet_peer **inet_peer_unused_tailp; /* can be called from BH context or outside */ -static inline void inet_putpeer(struct inet_peer *p) -{ - spin_lock_bh(&inet_peer_unused_lock); - if (atomic_dec_and_test(&p->refcnt)) { - p->unused_prevp = inet_peer_unused_tailp; - p->unused_next = NULL; - *inet_peer_unused_tailp = p; - inet_peer_unused_tailp = &p->unused_next; - p->dtime = jiffies; - } - 
spin_unlock_bh(&inet_peer_unused_lock); -} +extern void inet_putpeer(struct inet_peer *p); extern spinlock_t inet_peer_idlock; /* can be called with or without local BH being disabled */ --- linux-2.6.18/net/ipv4/inetpeer.cWed Sep 20 05:42:06 2006 +++ linux-2.6.18-ed/net/ipv4/inetpeer.c Thu Oct 12 21:55:23 2006 @@ -94,10 +94,8 @@ int inet_peer_maxttl = 10 * 60 * HZ; /* usual time to live: 10 min */ static struct inet_peer *inet_peer_unused_head; -/* Exported for inet_putpeer inline function. */ -struct inet_peer **inet_peer_unused_tailp = &inet_peer_unused_head; -DEFINE_SPINLOCK(inet_peer_unused_lock); -#define PEER_MAX_CLEANUP_WORK 30 +static struct inet_peer **inet_peer_unused_tailp = &inet_peer_unused_head; +static DEFINE_SPINLOCK(inet_peer_unused_lock); static void peer_check_expire(unsigned long dummy); static DEFINE_TIMER(peer_periodic_timer, peer_check_expire, 0, 0); @@ -343,7 +341,8 @@ spin_lock_bh(&inet_peer_unused_lock); p = inet_peer_unused_head; if (p != NULL) { - if (time_after(p->dtime + ttl, jiffies)) { + __u32 delta = (__u32)jiffies - p->dtime; + if (delta < ttl) { /* Do not prune fresh entries. */ spin_unlock_bh(&inet_peer_unused_lock); return -1; @@ -435,7 +434,7 @@ /* Called with local BH disabled. */ static void peer_check_expire(unsigned long dummy) { - int i; + unsigned long now = jiffies; int ttl; if (peer_total >= inet_peer_threshold) @@ -444,7 +443,10 @@ ttl = inet_peer_maxttl - (inet_peer_maxttl - inet_peer_minttl) / HZ * peer_total / inet_peer_threshold * HZ; - for (i = 0; i < PEER_MAX_CLEANUP_WORK && !cleanup_once(ttl); i++); + while (!cleanup_once(ttl)) { + if (jiffies != now) + break; + } /* Trigger the timer after inet_peer_gc_mintime .. 
inet_peer_gc_maxtime * interval depending on the total number of entries (more entries, @@ -458,3 +460,16 @@ peer_total / inet_peer_threshold * HZ; add_timer(&peer_periodic_timer); } + +void inet_putpeer(struct inet_peer *p) +{ + spin_lock_bh(&inet_peer_unused_lock); + if (atomic_dec_and_test(&p->refcnt)) { + p->unused_prevp = inet_peer_unused_tailp; + p->unused_next = NULL; + *inet_peer_unused_tailp = p; + inet_
Re: [PATCH] bridge: flush forwarding table when device carrier off
On Thu, Oct 12, 2006 at 11:24:31AM -0700, Stephen Hemminger wrote: > Flush the forwarding table when carrier is lost. This helps for > availability because we don't want to forward to a downed device and > new packets may come in on other links. > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> > Stephen, This is an excellent idea and all looks good except this check + if (f->is_static & !do_all) + continue; should be this: + if (f->is_static && !do_all) + continue; I'll ACK a repost with that change. -andy - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: user of the jiffies rounding code: Networking
Arjan van de Ven wrote: From: Arjan van de Ven <[EMAIL PROTECTED]> Subject: round_jiffies users CC: [EMAIL PROTECTED] CC: netdev@vger.kernel.org This patch introduces users of the round_jiffies() function in the networking code. These timers all were of the "about once a second" or "about once every X seconds" variety and several showed up in the "what wakes the cpu up" profiles that the tickless patches provide. Some timers are highly dynamic based on network load; but even on low activity systems they still show up so the rounding is done only in cases of low activity, allowing higher frequency timers in the high activity case. The various hardware watchdogs are an obvious case; they run every 2 seconds but aren't otherwise specific of exactly when they need to run. Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]> Index: linux-2.6.19-rc1-git6/net/core/dst.c === --- linux-2.6.19-rc1-git6.orig/net/core/dst.c +++ linux-2.6.19-rc1-git6/net/core/dst.c @@ -99,7 +99,14 @@ static void dst_run_gc(unsigned long dum printk("dst_total: %d/%d %ld\n", atomic_read(&dst_total), delayed, dst_gc_timer_expires); #endif - mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires); + /* if the next desired timer is more than 4 seconds in the future +* then round the timer to whole seconds +*/ + if (dst_gc_timer_expires > 4*HZ) + mod_timer(&dst_gc_timer, + round_jiffies(jiffies + dst_gc_timer_expires)); + else + mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires); out: spin_unlock(&dst_lock); Index: linux-2.6.19-rc1-git6/net/core/neighbour.c === --- linux-2.6.19-rc1-git6.orig/net/core/neighbour.c +++ linux-2.6.19-rc1-git6/net/core/neighbour.c @@ -695,7 +695,10 @@ next_elt: if (!expire) expire = 1; - mod_timer(&tbl->gc_timer, now + expire); + if (expire>HZ) + mod_timer(&tbl->gc_timer, round_jiffies(now + expire)); + else + mod_timer(&tbl->gc_timer, now + expire); write_unlock(&tbl->lock); } Index: linux-2.6.19-rc1-git6/net/sched/sch_generic.c === --- 
linux-2.6.19-rc1-git6.orig/net/sched/sch_generic.c +++ linux-2.6.19-rc1-git6/net/sched/sch_generic.c @@ -209,7 +209,7 @@ static void dev_watchdog(unsigned long a dev->name); dev->tx_timeout(dev); } - if (!mod_timer(&dev->watchdog_timer, jiffies + dev->watchdog_timeo)) + if (!mod_timer(&dev->watchdog_timer, round_jiffies(jiffies + dev->watchdog_timeo))) dev_hold(dev); } } Index: linux-2.6.19-rc1-git6/drivers/net/e1000/e1000_main.c === --- linux-2.6.19-rc1-git6.orig/drivers/net/e1000/e1000_main.c +++ linux-2.6.19-rc1-git6/drivers/net/e1000/e1000_main.c @@ -483,7 +483,7 @@ e1000_up(struct e1000_adapter *adapter) clear_bit(__E1000_DOWN, &adapter->flags); - mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ); + mod_timer(&adapter->watchdog_timer, round_jiffies(jiffies + 2 * HZ)); return 0; } @@ -2493,7 +2493,7 @@ e1000_watchdog(unsigned long data) netif_carrier_on(netdev); netif_wake_queue(netdev); - mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ); + mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ)); adapter->smartspeed = 0; } } else { @@ -2503,7 +2503,7 @@ e1000_watchdog(unsigned long data) DPRINTK(LINK, INFO, "NIC Link is Down\n"); netif_carrier_off(netdev); netif_stop_queue(netdev); - mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ); + mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ)); /* 80003ES2LAN workaround-- * For packet buffer work-around on link down event; @@ -2568,7 +2568,7 @@ e1000_watchdog(unsigned long data) e1000_rar_set(&adapter->hw, adapter->hw.mac_addr, 0); /* Reset the timer */ - mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ); + mod_timer(&adapter->watchdog_timer, round_jiffies(jiffies + 2 * HZ)); } #define E1000_TX_FLAGS_CSUM 0x0001 - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html For the e1000 parts, but in general too: Acked-by: Auke Kok <[EMAIL PROTECTED]> 
Cheers
Re: Dropping NETIF_F_SG since no checksum feature.
Quoting r. David Miller <[EMAIL PROTECTED]>:
> Subject: Re: Dropping NETIF_F_SG since no checksum feature.
>
> From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
> Date: Wed, 11 Oct 2006 23:23:39 +0200
>
> > With my patch, there is a huge performance gain by increasing MTU to 64K.
> > And it seems the only way to do this is by S/G.
>
> Numbers?

I created two subnets on top of the same pair of InfiniBand HCAs:

[EMAIL PROTECTED] ~]# ifconfig ib0
ib0       Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:12.4.3.69  Bcast:12.255.255.255  Mask:255.0.0.0
          inet6 addr: fe80::202:c902:20:ee45/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:1382531 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2725206 errors:0 dropped:5 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:71892772 (68.5 MiB)  TX bytes:5290011992 (4.9 GiB)

[EMAIL PROTECTED] ~]# ifconfig ibc0
ibc0      Link encap:UNSPEC  HWaddr 00-03-04-06-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:11.4.3.69  Bcast:11.255.255.255  Mask:255.0.0.0
          inet6 addr: fe80::202:c902:20:ee45/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65484  Metric:1
          RX packets:115647 errors:0 dropped:0 overruns:0 frame:0
          TX packets:253403 errors:0 dropped:4 overruns:0 carrier:0
          collisions:0 txqueuelen:128
          RX bytes:6014720 (5.7 MiB)  TX bytes:16589589008 (15.4 GiB)

The other side was configured with 12.4.3.68 for MTU 2044 and 11.4.3.68
for MTU 65484.

And then I just ran netperf:

[EMAIL PROTECTED] ~]#
[EMAIL PROTECTED] ~]# /mswg/work/mst/netperf-2.4.2/src/netperf -f M -H 12.4.3.68 -c -C
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 12.4.3.68 (12.4.3.68) port 0 AF_INET
Recv   Send   Send                          Utilization       Service Demand
Socket Socket Message Elapsed               Send     Recv     Send    Recv
Size   Size   Size    Time     Throughput   local    remote   local   remote
bytes  bytes  bytes   secs.    MBytes  /s   % S      % S      us/KB   us/KB

 87380  16384  16384    10.00       286.45   40.20    25.28    5.482   3.448

[EMAIL PROTECTED] ~]# /mswg/work/mst/netperf-2.4.2/src/netperf -f M -H 11.4.3.68
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68) port 0 AF_INET
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time     Throughput
bytes  bytes  bytes   secs.    MBytes/sec

 87380  16384  16384    10.01     782.55

This is all very preliminary, but I hope you get the idea: increasing the
MTU is very helpful for InfiniBand, and InfiniBand adapters handle large
S/G lists without problems, but the verbs do not include support for IP
checksums, so these must be done in software.

So what we would like is for the InfiniBand network device to say
"I don't support checksums, I only support S/G" and then for the network
layer to do the checksumming for us, piggybacking on the data copy at
least in the cases where it does perform the copy.

Does this make sense now?

--
MST
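To make the "checksum piggybacking on the data copy" idea concrete, here is a minimal user-space sketch, assuming the standard 16-bit Internet checksum from RFC 1071. The helper name copy_and_csum() is hypothetical and this is not the kernel's own copy-and-checksum routine; it only shows that the sum can be accumulated in the same pass that moves the bytes, so no second walk over the payload is needed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst and return the 16-bit Internet checksum
 * (one's-complement sum of big-endian 16-bit words, then complemented)
 * of the copied data. Hypothetical helper, for illustration only.
 */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    /* Copy two bytes at a time and add them to the sum as one word. */
    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }

    /* An odd trailing byte is treated as if padded with a zero byte. */
    if (i < len) {
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

With the eight bytes 00 01 f2 03 f4 f5 f6 f7 (the worked example in RFC 1071) the folded sum is 0xddf2, so the function returns 0x220d.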
user of the jiffies rounding code: Networking
From: Arjan van de Ven <[EMAIL PROTECTED]> Subject: round_jiffies users CC: [EMAIL PROTECTED] CC: netdev@vger.kernel.org This patch introduces users of the round_jiffies() function in the networking code. These timers all were of the "about once a second" or "about once every X seconds" variety and several showed up in the "what wakes the cpu up" profiles that the tickless patches provide. Some timers are highly dynamic based on network load; but even on low activity systems they still show up so the rounding is done only in cases of low activity, allowing higher frequency timers in the high activity case. The various hardware watchdogs are an obvious case; they run every 2 seconds but aren't otherwise specific of exactly when they need to run. Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]> Index: linux-2.6.19-rc1-git6/net/core/dst.c === --- linux-2.6.19-rc1-git6.orig/net/core/dst.c +++ linux-2.6.19-rc1-git6/net/core/dst.c @@ -99,7 +99,14 @@ static void dst_run_gc(unsigned long dum printk("dst_total: %d/%d %ld\n", atomic_read(&dst_total), delayed, dst_gc_timer_expires); #endif - mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires); + /* if the next desired timer is more than 4 seconds in the future +* then round the timer to whole seconds +*/ + if (dst_gc_timer_expires > 4*HZ) + mod_timer(&dst_gc_timer, + round_jiffies(jiffies + dst_gc_timer_expires)); + else + mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires); out: spin_unlock(&dst_lock); Index: linux-2.6.19-rc1-git6/net/core/neighbour.c === --- linux-2.6.19-rc1-git6.orig/net/core/neighbour.c +++ linux-2.6.19-rc1-git6/net/core/neighbour.c @@ -695,7 +695,10 @@ next_elt: if (!expire) expire = 1; - mod_timer(&tbl->gc_timer, now + expire); + if (expire>HZ) + mod_timer(&tbl->gc_timer, round_jiffies(now + expire)); + else + mod_timer(&tbl->gc_timer, now + expire); write_unlock(&tbl->lock); } Index: linux-2.6.19-rc1-git6/net/sched/sch_generic.c === --- linux-2.6.19-rc1-git6.orig/net/sched/sch_generic.c 
+++ linux-2.6.19-rc1-git6/net/sched/sch_generic.c @@ -209,7 +209,7 @@ static void dev_watchdog(unsigned long a dev->name); dev->tx_timeout(dev); } - if (!mod_timer(&dev->watchdog_timer, jiffies + dev->watchdog_timeo)) + if (!mod_timer(&dev->watchdog_timer, round_jiffies(jiffies + dev->watchdog_timeo))) dev_hold(dev); } } Index: linux-2.6.19-rc1-git6/drivers/net/e1000/e1000_main.c === --- linux-2.6.19-rc1-git6.orig/drivers/net/e1000/e1000_main.c +++ linux-2.6.19-rc1-git6/drivers/net/e1000/e1000_main.c @@ -483,7 +483,7 @@ e1000_up(struct e1000_adapter *adapter) clear_bit(__E1000_DOWN, &adapter->flags); - mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ); + mod_timer(&adapter->watchdog_timer, round_jiffies(jiffies + 2 * HZ)); return 0; } @@ -2493,7 +2493,7 @@ e1000_watchdog(unsigned long data) netif_carrier_on(netdev); netif_wake_queue(netdev); - mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ); + mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ)); adapter->smartspeed = 0; } } else { @@ -2503,7 +2503,7 @@ e1000_watchdog(unsigned long data) DPRINTK(LINK, INFO, "NIC Link is Down\n"); netif_carrier_off(netdev); netif_stop_queue(netdev); - mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ); + mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ)); /* 80003ES2LAN workaround-- * For packet buffer work-around on link down event; @@ -2568,7 +2568,7 @@ e1000_watchdog(unsigned long data) e1000_rar_set(&adapter->hw, adapter->hw.mac_addr, 0); /* Reset the timer */ - mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ); + mod_timer(&adapter->watchdog_timer, round_jiffies(jiffies + 2 * HZ)); } #define E1000_TX_FLAGS_CSUM0x0001 - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
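The patch only shows callers of round_jiffies(), not its body, so the following is a simplified user-space model of what "round the timer to whole seconds" can mean. The rounding policy (round down when the expiry falls in the first quarter of a second, otherwise round up, and never move an expiry into the past), the HZ value and the function names are assumptions made for this sketch. Counter wraparound is ignored.

```c
#define HZ 1000UL   /* assumed tick rate for this sketch */

/*
 * Model of rounding an absolute expiry (in ticks) to a whole second.
 * `now` is the current tick count. Assumed policy, not the kernel code.
 */
static unsigned long round_to_second(unsigned long expiry, unsigned long now)
{
    unsigned long rem = expiry % HZ;
    unsigned long rounded;

    if (rem < HZ / 4)
        rounded = expiry - rem;        /* just past a second: round down */
    else
        rounded = expiry - rem + HZ;   /* otherwise round up */

    /* Never return an expiry that is not in the future. */
    return rounded > now ? rounded : expiry;
}

/*
 * Mirrors the shape of the dst.c hunk: long delays are rounded so that
 * wakeups can be batched, short delays are left exact.
 */
static unsigned long next_gc_expiry(unsigned long now, unsigned long delay)
{
    if (delay > 4 * HZ)
        return round_to_second(now + delay, now);
    return now + delay;
}
```

The point of the threshold is visible in next_gc_expiry(): under load the delay is short and the timer keeps its precision, while on an idle system several once-a-second style timers end up expiring on the same tick.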
[PATCH] bridge: flush forwarding table when device carrier off
Flush the forwarding table when carrier is lost. This helps for availability because we don't want to forward to a downed device and new packets may come in on other links. Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]> --- net/bridge/br_fdb.c |7 ++- net/bridge/br_if.c |4 ++-- net/bridge/br_private.h |2 +- net/bridge/br_stp_if.c |2 ++ 4 files changed, 11 insertions(+), 4 deletions(-) --- bridge.orig/net/bridge/br_fdb.c +++ bridge/net/bridge/br_fdb.c @@ -128,7 +128,10 @@ void br_fdb_cleanup(unsigned long _data) mod_timer(&br->gc_timer, jiffies + HZ/10); } -void br_fdb_delete_by_port(struct net_bridge *br, struct net_bridge_port *p) + +void br_fdb_delete_by_port(struct net_bridge *br, + const struct net_bridge_port *p, + int do_all) { int i; @@ -142,6 +145,8 @@ void br_fdb_delete_by_port(struct net_br if (f->dst != p) continue; + if (f->is_static & !do_all) + continue; /* * if multiple ports all have the same device address * then when one port is deleted, assign --- bridge.orig/net/bridge/br_if.c +++ bridge/net/bridge/br_if.c @@ -163,7 +163,7 @@ static void del_nbp(struct net_bridge_po br_stp_disable_port(p); spin_unlock_bh(&br->lock); - br_fdb_delete_by_port(br, p); + br_fdb_delete_by_port(br, p, 1); list_del_rcu(&p->list); @@ -448,7 +448,7 @@ int br_add_if(struct net_bridge *br, str return 0; err2: - br_fdb_delete_by_port(br, p); + br_fdb_delete_by_port(br, p, 1); err1: kobject_del(&p->kobj); err0: --- bridge.orig/net/bridge/br_private.h +++ bridge/net/bridge/br_private.h @@ -143,7 +143,7 @@ extern void br_fdb_changeaddr(struct net const unsigned char *newaddr); extern void br_fdb_cleanup(unsigned long arg); extern void br_fdb_delete_by_port(struct net_bridge *br, - struct net_bridge_port *p); + const struct net_bridge_port *p, int do_all); extern struct net_bridge_fdb_entry *__br_fdb_get(struct net_bridge *br, const unsigned char *addr); extern struct net_bridge_fdb_entry *br_fdb_get(struct net_bridge *br, --- bridge.orig/net/bridge/br_stp_if.c +++ 
bridge/net/bridge/br_stp_if.c @@ -113,6 +113,8 @@ void br_stp_disable_port(struct net_brid del_timer(&p->forward_delay_timer); del_timer(&p->hold_timer); + br_fdb_delete_by_port(br, p, 0); + br_configuration_update(br); br_port_state_selection(br); - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
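As a rough illustration of the new do_all argument, here is a small user-space model of the flush. The struct layout and the array-based table are assumptions made for the sketch (the bridge code uses hash chains and locking that are not modelled here). The sketch writes the test with logical &&; the posted hunk uses `f->is_static & !do_all`, which gives the same result only as long as is_static holds 0 or 1.

```c
#include <stddef.h>

/* Simplified forwarding-table entry; field names are illustrative. */
struct fdb_entry {
    int port;       /* port the address was learned on */
    int is_static;  /* 1 for static/local entries, 0 for learned ones */
    int in_use;     /* cleared once the entry has been flushed */
};

/*
 * Flush the entries that point at `port`.
 * do_all == 0: carrier loss, drop learned entries but keep static ones.
 * do_all == 1: port removal, drop everything for that port.
 * Returns the number of entries removed.
 */
static size_t fdb_delete_by_port(struct fdb_entry *tbl, size_t n,
                                 int port, int do_all)
{
    size_t removed = 0;

    for (size_t i = 0; i < n; i++) {
        struct fdb_entry *f = &tbl[i];

        if (!f->in_use || f->port != port)
            continue;
        if (f->is_static && !do_all)
            continue;

        f->in_use = 0;
        removed++;
    }
    return removed;
}
```

This matches the call sites in the patch: del_nbp() and the br_add_if() error path pass 1, while br_stp_disable_port() passes 0.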
Re: [PATCH] e1000: Real time packets and bytes statistics
On Wed, 11 Oct 2006 10:44:12 -0700 "Jesse Brandeburg" <[EMAIL PROTECTED]> wrote: > On 10/11/06, Jean Delvare <[EMAIL PROTECTED]> wrote: > > Hi all, > > > > This patch is posted for review and comments. > > > > Let the e1000 driver report the most important statistics (rx/tx_bytes > > and rx/tx_packets) in real time, rather than every other second. This > > is similar to what the e100 driver is doing. > > > > The current asynchronous statistics refresh model makes it impossible > > to monitor the network traffic with an interval which isn't a multiple > > of 2 seconds. For example, an interval of 5 seconds would result in a > > sawtooth diagram (+20%, -20%) for a constant transfer rate. With a 1 > > second interval it's even worse (0, 200%) of course. This has been > > annoying users for years, but was never actually fixed: > > I think the idea is good, however, see below. > > > rx/tx_bytes will show slightly lower values than before, because the > > hardware appears to include the 4-byte ethernet frame CRC into the > > frame length, while the driver doesn't. It's probably OK as the > > e100, 3c59x and 8139too drivers don't include it either. > > this is okay. > > > I additionally noted a difference of 6 bytes on some TX frames, which > > I am not able to explain. It's probably small and rare enough not to > > be considered a problem, but if someone can explain it, I would be > > grateful. > > now, that sounds odd, however, once again, see below. 
> > > Signed-off-by: Jean Delvare <[EMAIL PROTECTED]> > > --- > > drivers/net/e1000/e1000_main.c | 14 ++ > > 1 file changed, 10 insertions(+), 4 deletions(-) > > > > --- linux-2.6.19-rc1.orig/drivers/net/e1000/e1000_main.c2006-10-11 > > 10:53:49.0 +0200 > > +++ linux-2.6.19-rc1/drivers/net/e1000/e1000_main.c 2006-10-11 > > 11:34:41.0 +0200 > > @@ -3118,6 +3118,8 @@ > >e1000_tx_map(adapter, tx_ring, skb, first, > > max_per_txd, nr_frags, mss)); > > > > + adapter->net_stats.tx_packets++; > > + adapter->net_stats.tx_bytes += skb->len; > > netdev->trans_start = jiffies; > > this is the part I'm most worried about. as I believe it to be > incorrect for TSO packets. Maybe something like? > + if (skb_shinfo(skb)->gso_segs) > + adapter->net_stats.tx_packets += skb_shinfo(skb)->gso_segs; > + else > + adapter->net_stats.tx_packets++; > + adapter->net_stats.tx_bytes += skb->len; > netdev->trans_start = jiffies; > > skb len will still be off by some amount, because the skb->data > (header) is replicated across each gso segment but only counted once > this way, but hopefully someone will pipe up with a good way to > compute that. > > The rest of the patch seems fine, barring any other comments. > > Jesse You might want to put the tx values in a per-cpu structure and sum later. Incrementing statistics can actually be a performance bottleneck on SMP tests, because it causes lots of cache thrashing. -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
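The TSO accounting question above can be sketched in user space as follows. The packet count follows Jesse's suggestion (one packet per GSO segment). The byte correction is only one possible answer to the question the thread leaves open: it assumes the caller knows the header length that gets replicated in front of every segment, and the hdr_len parameter and all names here are hypothetical.

```c
/* Minimal stand-in for the driver's counters. */
struct tx_stats {
    unsigned long packets;
    unsigned long bytes;
};

/*
 * Account one transmitted buffer.
 * len      - length of the buffer as handed to the driver (headers once)
 * gso_segs - number of segments the hardware will emit, 0 if not TSO
 * hdr_len  - assumed size of the headers replicated per segment
 */
static void account_tx(struct tx_stats *s, unsigned int len,
                       unsigned int gso_segs, unsigned int hdr_len)
{
    unsigned int segs = gso_segs ? gso_segs : 1;

    s->packets += segs;
    /* The headers are in len once but go on the wire once per segment. */
    s->bytes += len + (unsigned long)(segs - 1) * hdr_len;
}
```

For a non-TSO frame the function reduces to packets++ and bytes += len, which is what the original patch did.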
Re: Suppress / delay SYN-ACK
Martin Schiller wrote:
> Hi!
>
> I'm searching for a solution to suppress / delay the SYN-ACK packet of a
> listening server (-application) until he has decided (e.g. analysed the
> requesting ip-address or checked if the corresponding other end of a
> connection is available) if he wants to accept the connect request of the
> client. If not, it should be possible to reject the connect request.

How often do you expect the incoming call to be rejected? I suspect that
would have a significant effect on whether the whole thing is worthwhile.

rick jones
Re: [PATCH] Customizable TCP backoff patch
YOSHIFUJI Hideaki / 吉藤英明 wrote: + .data = &sysctl_tcp_rto_max, + .maxlen = sizeof(unsigned), sizeof(unsigned long) Good catch. That would have corrupted things badly on some 64b platforms. With all the flux in the area I forgot to change the size of that but would have been OK on the ia32 boxes. diff -ru linux-2.6.18/net/ipv4/tcp.c linux-2.6.18.new/net/ipv4/tcp.c --- linux-2.6.18/net/ipv4/tcp.c 2006-09-19 20:42:06.0 -0700 +++ linux-2.6.18.new/net/ipv4/tcp.c 2006-10-11 16:00:37.0 -0700 @@ -2110,6 +2126,12 @@ if (copy_to_user(optval, icsk->icsk_ca_ops->name, len)) return -EFAULT; return 0; + case TCP_BACKOFF_MAX: + val = jiffies_to_msecs(tcp_rto_max(tp)); + break; tp->rto_max + case TCP_BACKOFF_INIT: + val = jiffies_to_msecs(tcp_rto_init(tp)); + break; default: tp->rto_init OK I get it now. --yoshfuji diff -ru linux-2.6.18/include/linux/sysctl.h linux-2.6.18.new/include/linux/sysctl.h --- linux-2.6.18/include/linux/sysctl.h 2006-09-19 20:42:06.0 -0700 +++ linux-2.6.18.new/include/linux/sysctl.h 2006-10-11 10:27:52.0 -0700 @@ -411,6 +411,8 @@ NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS=115, NET_TCP_DMA_COPYBREAK=116, NET_TCP_SLOW_START_AFTER_IDLE=117, + NET_TCP_RTO_MAX=118, + NET_TCP_RTO_INIT=119, }; enum { diff -ru linux-2.6.18/include/linux/tcp.h linux-2.6.18.new/include/linux/tcp.h --- linux-2.6.18/include/linux/tcp.h 2006-09-19 20:42:06.0 -0700 +++ linux-2.6.18.new/include/linux/tcp.h 2006-10-12 08:28:52.645411000 -0700 @@ -94,6 +94,8 @@ #define TCP_INFO 11 /* Information about this connection. */ #define TCP_QUICKACK 12 /* Block/reenable quick acks */ #define TCP_CONGESTION 13 /* Congestion control algorithm */ +#define TCP_BACKOFF_MAX 14 /* Maximum backoff value */ +#define TCP_BACKOFF_INIT 15 /* Initial backoff value */ #define TCPI_OPT_TIMESTAMPS 1 #define TCPI_OPT_SACK 2 @@ -257,6 +259,8 @@ __u8 frto_counter; /* Number of new acks after RTO */ __u8 nonagle; /* Disable Nagle algorithm? 
*/ __u8 keepalive_probes; /* num of allowed keep alive probes */ + __u16 rto_max; /* Maximum backoff value in ms */ + __u16 rto_init; /* Initial backoff value in ms */ /* RTT measurement */ __u32 srtt; /* smoothed round trip time << 3 */ Only in linux-2.6.18.new/include/linux: tcp.h~ diff -ru linux-2.6.18/include/net/tcp.h linux-2.6.18.new/include/net/tcp.h --- linux-2.6.18/include/net/tcp.h 2006-09-19 20:42:06.0 -0700 +++ linux-2.6.18.new/include/net/tcp.h 2006-10-11 17:43:23.091431000 -0700 @@ -227,11 +227,23 @@ extern int sysctl_tcp_base_mss; extern int sysctl_tcp_workaround_signed_windows; extern int sysctl_tcp_slow_start_after_idle; +extern unsigned long sysctl_tcp_rto_max; +extern unsigned long sysctl_tcp_rto_init; extern atomic_t tcp_memory_allocated; extern atomic_t tcp_sockets_allocated; extern int tcp_memory_pressure; +static inline unsigned long tcp_rto_max(struct tcp_sock *tp) +{ + return tp->rto_max ? msecs_to_jiffies(tp->rto_max) : sysctl_tcp_rto_max; +} + +static inline unsigned long tcp_rto_init(struct tcp_sock *tp) +{ + return tp->rto_init ? msecs_to_jiffies(tp->rto_init) : sysctl_tcp_rto_init; +} + /* * The next routines deal with comparing 32 bit unsigned ints * and worry about wraparound (automatic with unsigned arithmetic). 
Only in linux-2.6.18.new/include/net: tcp.h~ diff -ru linux-2.6.18/net/ipv4/sysctl_net_ipv4.c linux-2.6.18.new/net/ipv4/sysctl_net_ipv4.c --- linux-2.6.18/net/ipv4/sysctl_net_ipv4.c 2006-09-19 20:42:06.0 -0700 +++ linux-2.6.18.new/net/ipv4/sysctl_net_ipv4.c 2006-10-12 07:14:41.87191 -0700 @@ -128,6 +128,8 @@ return ret; } +static unsigned long tcp_rto_min=0; +static unsigned long tcp_rto_max=65535; ctl_table ipv4_table[] = { { @@ -697,6 +699,26 @@ .mode = 0644, .proc_handler = &proc_dointvec }, + { + .ctl_name = NET_TCP_RTO_MAX, + .procname = "tcp_rto_max", + .data = &sysctl_tcp_rto_max, + .maxlen = sizeof(unsigned long), + .mode = 0644, + .proc_handler = &proc_doulongvec_ms_jiffies_minmax, + .extra1 = &tcp_rto_min_constant, + .extra2 = &tcp_rto_max_constant, + }, + { + .ctl_name = NET_TCP_RTO_INIT, + .procname = "tcp_rto_init", + .data = &sysctl_tcp_rto_init, + .maxlen = sizeof(unsigned long), + .mode = 0644, + .proc_handler = &proc_doulongvec_ms_jiffies_minmax, + .extra1 = &tcp_rto_min_constant, + .extra2 = &tcp_rto_max_constant, + }, { .ctl_name = 0 } }; Only in linux-2.6.18.new/net/ipv4: sysctl_net_ipv4.c~ diff -ru linux-2.6.18/net/ipv4/tcp.c linux-2.6.18.new/net/ipv4/tcp.c --- linux-2.6.18/net/ipv4/tcp.c 2006-09-19 20:42:06.0 -0700 +++ linux-2.6.18.new/net/ipv4/tcp.c 2006-10-12 07:18:01.193083000 -0700 @@ -1764,6 +1764,8 @@ return err; } +#define TCP_BACKOFF_MAXVAL 65535 + /* * Socket
Re: Suppress / delay SYN-ACK
On Thu, Oct 12, 2006 at 12:39:30PM +0200, Eric Dumazet ([EMAIL PROTECTED]) wrote: > > You should break your decision into per state change transformations. > > I think it is possible with either conntrack or netlink module Samir > > Bellabes creates (Network Events Connector > > subject) or even using syncookie algo changes. > > Hum.. they are some cases where conntrack is not an option (way too expensive > if your server handle XXX.XXX concurrent tcp streams) I think any netlink related work here can not be used for any kind of high performance setup - it will be too slow to send/receive one or more messages per state change for each new connection... > > But it will drastically change your server performance... > > Sure, at least its capacity to answer to SYN packets (session establishment > should be slower, unless the thread receiving/handling SYN packets has > realtime scheduling) Maybe it will be better to create some more complex protocol which will collect data before sending netlink message, or just use a procfs file or syscall/ioctl/socket option. > Eric -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Suppress / delay SYN-ACK
On Thursday 12 October 2006 12:31, Evgeniy Polyakov wrote:
> On Thu, Oct 12, 2006 at 12:13:26PM +0200, Martin Schiller ([EMAIL PROTECTED]) wrote:
> > On Thursday, October 12, 2006 10:38 AM, Eric Dumazet wrote:
> > > Well, it is already possible to delay the 'third packet' of an
> > > outgoing connection with a little hack. But AFAIK not the SYNACK of
> > > incoming connection. It could be cool. Maybe some new syscalls are
> > > needed:
> > >
> > > int syn_recv(int socklisten, ...);
> > > /* give to user app the SYN packet */
> > > int syn_ack(int socklisten, ...);
> > > /* User app has the ability to ask kernel tcp stack to :
> > > DROP this packet.
> > > REJECT the attempt
> > > ACCEPT the attempt (sending a SYN/ACK) */
> >
> > So, when do you mean the user-space application should run these syscalls?
> > After the call to listen()?
> >
> > Another problem with this solution might be, that I don't want to block
> > the listening socket with the processing of one request, because there
> > could be a lot of simultaneous requests.
>
> You should break your decision into per state change transformations.
> I think it is possible with either conntrack or netlink module Samir
> Bellabes creates (Network Events Connector
> subject) or even using syncookie algo changes.

Hum.. there are some cases where conntrack is not an option (way too
expensive if your server handles XXX.XXX concurrent tcp streams)

>
> But it will drastically change your server performance...

Sure, at least its capacity to answer SYN packets (session establishment
should be slower, unless the thread receiving/handling SYN packets has
realtime scheduling)

Eric
Re: Suppress / delay SYN-ACK
On Thursday 12 October 2006 12:13, Martin Schiller wrote:
> On Thursday, October 12, 2006 10:38 AM, Eric Dumazet wrote:
> > Well, it is already possible to delay the 'third packet' of an
> > outgoing connection with a little hack. But AFAIK not the SYNACK of
> > incoming connection. It could be cool. Maybe some new syscalls are
> > needed:
> >
> > int syn_recv(int socklisten, ...);
> > /* give to user app the SYN packet */
> > int syn_ack(int socklisten, ...);
> > /* User app has the ability to ask kernel tcp stack to :
> > DROP this packet.
> > REJECT the attempt
> > ACCEPT the attempt (sending a SYN/ACK) */
>
> So, when do you mean the user-space application should run these syscalls?
> After the call to listen()?
>

Exactly like when you call accept() on a non-blocking listening socket.

If your application asked to receive notification of SYN packets, it
should be prepared to call accept() (to be notified of fully established
connections) and/or syn_recv() (to be notified of SYN packets).

So when poll()/select()/epoll() tells you that socklisten has available
events, your application would have to call both accept() and syn_recv()
in a loop to drain all pending events.

> Another problem with this solution might be, that I don't want to block the
> listening socket with the processing of one request, because there could be
> a lot of simultaneous requests.

Yes I can imagine.
Re: Suppress / delay SYN-ACK
On Thu, Oct 12, 2006 at 12:13:26PM +0200, Martin Schiller ([EMAIL PROTECTED]) wrote: > On Thursday, October 12, 2006 10:38 AM, Eric Dumazet wrote: > > > > Well, it is already possible to delay the 'third packet' of an > > outgoing connection with a litle hack. But AFAIK not the SYNACK of > > incoming connection. It could be cool. Maybe some new syscalls are > > needed: > > > > int syn_recv(int socklisten, ...); > > /* give to user app the SYN packet */ > > int syn_ack(int socklisten, ...); > > /* User app has the ability to ask kernel tcp stack to : > > DROP this packet. > > REJECT the attempt > > ACCEPT the attempt (sending a SYN/ACK) */ > > > > So, when do you mean the user-space application should run this syscalls? > After the call to listen()? > > Another problem with this solution might be, that I don't want to block the > listening socket with the processing of one request, because there could be > a lot of simultaneous requests. You should break your decision into per state change transformations. I think it is possible with either conntrack or netlink module Samir Bellabes creates (Network Events Connector subject) or even using syncookie algo changes. But it will drastically change your server performance... -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [DECNET]: Use correct config option for routing by fwmark in compare_keys()
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 12:23:37 +0200

> Small bugfix to the compare_keys fix.

Damn cut&paste :-)

I'll add this fix tomorrow, thanks.
[DECNET]: Use correct config option for routing by fwmark in compare_keys()
Small bugfix to the compare_keys fix. [DECNET]: Use correct config option for routing by fwmark in compare_keys() Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]> --- commit 8302c73a668852de1b1527038bc4c432cf757a7f tree d4e9be4f4bfc87b56bf4756a12a2538f2211fc84 parent 22c4cae48af19e83f31bb88a98970166beacc4fd author Patrick McHardy <[EMAIL PROTECTED]> Thu, 12 Oct 2006 12:22:57 +0200 committer Patrick McHardy <[EMAIL PROTECTED]> Thu, 12 Oct 2006 12:22:57 +0200 net/decnet/dn_route.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c index a2a43d8..491429c 100644 --- a/net/decnet/dn_route.c +++ b/net/decnet/dn_route.c @@ -269,7 +269,7 @@ static inline int compare_keys(struct fl { return ((fl1->nl_u.dn_u.daddr ^ fl2->nl_u.dn_u.daddr) | (fl1->nl_u.dn_u.saddr ^ fl2->nl_u.dn_u.saddr) | -#ifdef CONFIG_IP_ROUTE_FWMARK +#ifdef CONFIG_DECNET_ROUTE_FWMARK (fl1->nl_u.dn_u.fwmark ^ fl2->nl_u.dn_u.fwmark) | #endif (fl1->nl_u.dn_u.scope ^ fl2->nl_u.dn_u.scope) |
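For readers unfamiliar with the XOR/OR idiom in compare_keys(), here is a small stand-alone model. The struct, the field widths and the SKETCH_ROUTE_FWMARK switch are assumptions for illustration, not the DECnet flow structures. It also shows why the config symbol matters: if the guard names an option that is never defined for this code, fwmark silently drops out of the comparison.

```c
#include <stdint.h>

/* Stand-in for the kernel config option; set to 0 to see the effect. */
#define SKETCH_ROUTE_FWMARK 1

/* Simplified flow key; layout is illustrative only. */
struct flow_key {
    uint16_t daddr;
    uint16_t saddr;
    uint32_t fwmark;
    uint8_t  scope;
};

/*
 * XOR of two equal fields is zero, so OR-ing all the XORs together
 * yields zero only when every field matches. This compares the whole
 * key without a chain of branches.
 */
static int keys_equal(const struct flow_key *a, const struct flow_key *b)
{
    uint32_t diff = (uint32_t)(a->daddr ^ b->daddr) |
                    (uint32_t)(a->saddr ^ b->saddr) |
#if SKETCH_ROUTE_FWMARK
                    (a->fwmark ^ b->fwmark) |
#endif
                    (uint32_t)(a->scope ^ b->scope);

    return diff == 0;
}
```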
RE: Suppress / delay SYN-ACK
On Thursday, October 12, 2006 10:38 AM, Eric Dumazet wrote:
>
> Well, it is already possible to delay the 'third packet' of an
> outgoing connection with a little hack. But AFAIK not the SYNACK of
> incoming connection. It could be cool. Maybe some new syscalls are
> needed:
>
> int syn_recv(int socklisten, ...);
> /* give to user app the SYN packet */
> int syn_ack(int socklisten, ...);
> /* User app has the ability to ask kernel tcp stack to :
> DROP this packet.
> REJECT the attempt
> ACCEPT the attempt (sending a SYN/ACK) */
>

So, when do you mean the user-space application should run these syscalls?
After the call to listen()?

Another problem with this solution might be that I don't want to block the
listening socket with the processing of one request, because there could be
a lot of simultaneous requests.
Re: [IPv6] rules: Use RT6_LOOKUP_F_HAS_SADDR and fix source based selectors
In article <[EMAIL PROTECTED]> (at Thu, 12 Oct 2006 11:41:24 +0200), Thomas Graf <[EMAIL PROTECTED]> says:

> Fixes rt6_lookup() to provide the source address in the flow
> and sets RT6_LOOKUP_F_HAS_SADDR whenever it is present in
> the flow.
>
> Avoids unnecessary prefix comparisons by checking for a prefix
> length first.
>
> Fixes the rule logic to not match packets if a source selector
> has been specified but no source address is available.
>
> Thanks to Kim Nordlund <[EMAIL PROTECTED]> for working
> on this patch with me.
>
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

I tend to agree. Ville, do you agree?

--yoshfuji
[IPv6] rules: Use RT6_LOOKUP_F_HAS_SADDR and fix source based selectors
Fixes rt6_lookup() to provide the source address in the flow and sets RT6_LOOKUP_F_HAS_SADDR whenever it is present in the flow. Avoids unnecessary prefix comparisons by checking for a prefix length first. Fixes the rule logic to not match packets if a source selector has been specified but no source address is available. Thanks to Kim Nordlund <[EMAIL PROTECTED]> for working on this patch with me. Signed-off-by: Thomas Graf <[EMAIL PROTECTED]> Index: net-2.6/net/ipv6/fib6_rules.c === --- net-2.6.orig/net/ipv6/fib6_rules.c 2006-10-11 22:29:50.0 +0200 +++ net-2.6/net/ipv6/fib6_rules.c 2006-10-12 11:01:00.0 +0200 @@ -117,12 +117,15 @@ { struct fib6_rule *r = (struct fib6_rule *) rule; - if (!ipv6_prefix_equal(&fl->fl6_dst, &r->dst.addr, r->dst.plen)) + if (r->dst.plen && + !ipv6_prefix_equal(&fl->fl6_dst, &r->dst.addr, r->dst.plen)) return 0; - if ((flags & RT6_LOOKUP_F_HAS_SADDR) && - !ipv6_prefix_equal(&fl->fl6_src, &r->src.addr, r->src.plen)) - return 0; + if (r->src.plen) { + if (!(flags & RT6_LOOKUP_F_HAS_SADDR) || + !ipv6_prefix_equal(&fl->fl6_src, &r->src.addr, r->src.plen)) + return 0; + } if (r->tclass && r->tclass != ((ntohl(fl->fl6_flowlabel) >> 20) & 0xff)) return 0; Index: net-2.6/net/ipv6/route.c === --- net-2.6.orig/net/ipv6/route.c 2006-10-11 22:29:50.0 +0200 +++ net-2.6/net/ipv6/route.c2006-10-12 10:59:13.0 +0200 @@ -529,13 +529,17 @@ .nl_u = { .ip6_u = { .daddr = *daddr, - /* TODO: saddr */ }, }, }; struct dst_entry *dst; int flags = strict ? 
RT6_LOOKUP_F_IFACE : 0; + if (saddr) { + memcpy(&fl.fl6_src, saddr, sizeof(*saddr)); + flags |= RT6_LOOKUP_F_HAS_SADDR; + } + dst = fib6_rule_lookup(&fl, flags, ip6_pol_route_lookup); if (dst->error == 0) return (struct rt6_info *) dst; @@ -697,6 +701,7 @@ void ip6_route_input(struct sk_buff *skb) { struct ipv6hdr *iph = skb->nh.ipv6h; + int flags = RT6_LOOKUP_F_HAS_SADDR; struct flowi fl = { .iif = skb->dev->ifindex, .nl_u = { @@ -711,7 +716,9 @@ }, .proto = iph->nexthdr, }; - int flags = rt6_need_strict(&iph->daddr) ? RT6_LOOKUP_F_IFACE : 0; + + if (rt6_need_strict(&iph->daddr)) + flags |= RT6_LOOKUP_F_IFACE; skb->dst = fib6_rule_lookup(&fl, flags, ip6_pol_route_input); } @@ -794,6 +801,9 @@ if (rt6_need_strict(&fl->fl6_dst)) flags |= RT6_LOOKUP_F_IFACE; + if (!ipv6_addr_any(&fl->fl6_src)) + flags |= RT6_LOOKUP_F_HAS_SADDR; + return fib6_rule_lookup(fl, flags, ip6_pol_route_output); } @@ -1345,6 +1355,7 @@ struct in6_addr *gateway, struct net_device *dev) { + int flags = RT6_LOOKUP_F_HAS_SADDR; struct ip6rd_flowi rdfl = { .fl = { .oif = dev->ifindex, @@ -1357,7 +1368,9 @@ }, .gateway = *gateway, }; - int flags = rt6_need_strict(dest) ? RT6_LOOKUP_F_IFACE : 0; + + if (rt6_need_strict(dest)) + flags |= RT6_LOOKUP_F_IFACE; return (struct rt6_info *)fib6_rule_lookup((struct flowi *)&rdfl, flags, __ip6_route_redirect); } - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
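The rule-matching change in fib6_rules.c can be modelled in user space as below. This is a sketch under stated assumptions: addresses are plain 16-byte arrays, a NULL saddr stands in for "RT6_LOOKUP_F_HAS_SADDR not set", prefix lengths are assumed to be at most 128, and the names prefix_equal() and rule_match() are local to the example (the tclass check from the patch is left out).

```c
#include <stdint.h>
#include <string.h>

/* A selector: address plus prefix length in bits (0 means "any"). */
struct prefix {
    uint8_t addr[16];
    unsigned int plen;
};

/* Return 1 if the first plen bits of a and b are identical. */
static int prefix_equal(const uint8_t *a, const uint8_t *b, unsigned int plen)
{
    unsigned int bytes = plen / 8;
    unsigned int bits = plen % 8;

    if (bytes && memcmp(a, b, bytes) != 0)
        return 0;
    if (bits) {
        uint8_t mask = (uint8_t)(0xff << (8 - bits));

        if ((a[bytes] ^ b[bytes]) & mask)
            return 0;
    }
    return 1;
}

/*
 * Match a flow against a rule.
 * - A zero prefix length is a wildcard, so no comparison is made.
 * - A source selector can only match when a source address is known;
 *   saddr == NULL models the missing address.
 */
static int rule_match(const struct prefix *dst_sel,
                      const struct prefix *src_sel,
                      const uint8_t *daddr, const uint8_t *saddr)
{
    if (dst_sel->plen &&
        !prefix_equal(daddr, dst_sel->addr, dst_sel->plen))
        return 0;

    if (src_sel->plen) {
        if (!saddr ||
            !prefix_equal(saddr, src_sel->addr, src_sel->plen))
            return 0;
    }
    return 1;
}
```

The second branch is the behavioural fix: before the patch, a rule with a source selector would match any lookup that carried no source address.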
Re: [PATCH-update][RFC] net: consolidated UDP / UDP-Lite code
From: Gerrit Renker <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 08:49:19 +0100

> please find attached the updated UDP-Lite patch - I have removed the
> statistics corrections you pointed out to me.
>
> Can you please indicate whether you are ok, by and large, with the
> changes performed by the patch? Even if it is some time ago, I
> have implemented in this patch the architectural suggestions you
> gave me a while earlier.

The patch looks pretty good. I have no problems with how you
implemented this at all.

I think we'll have no problem getting this into 2.6.20.
Re: sfuzz hanging on 2.6.18
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 09:46:47 +0200

> Looks like unbalanced locking.
>
> Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

Applied and pushed to -stable, thanks Patrick.
Re: [RTNETLINK]: Fix use of wrong skb in do_getlink()
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 08:40:57 +0200

> [RTNETLINK]: Fix use of wrong skb in do_getlink()
>
> skb is the netlink query, nskb is the reply message.
>
> Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

Applied, thanks Patrick.
Re: Suppress / delay SYN-ACK
On Thursday 12 October 2006 10:08, Martin Schiller wrote:
> Hi!
>
> I'm searching for a solution to suppress / delay the SYN-ACK packet of a
> listening server (-application) until he has decided (e.g. analysed the
> requesting ip-address or checked if the corresponding other end of a
> connection is available) if he wants to accept the connect request of the
> client. If not, it should be possible to reject the connect request.
>
> My idea is to add two ioctl's:
> - One to set the listening socket into "delay_synack" mode.
> - And one to send the synack packet, if the connection should be
> accepted.
>
> If the "delay_synack" mode is not enabled, the connection should just work
> as usual.
>
> I had a look at the tcp/ipv4 stack for a while and have found out, that
> this three-way-handshake is already done before anything comes up to
> user-space when I am doing a call to accept(). So I think it wouldn't be
> possible to add this feature with "a little hack".
>

Well, it is already possible to delay the 'third packet' of an outgoing connection with a little hack. But AFAIK not the SYN-ACK of an incoming connection. It could be cool.

Maybe some new syscalls are needed:

int syn_recv(int socklisten, ...); /* give to user app the SYN packet */
int syn_ack(int socklisten, ...);
/* User app has the ability to ask kernel tcp stack to:
   DROP this packet.
   REJECT the attempt
   ACCEPT the attempt (sending a SYN/ACK)
*/

Maybe NETLINK (netfilter) is able to meet your need.

> Does anybody have any hints for me where I should start to work?
>
> Regards,
> Martin Schiller
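The three outcomes suggested above (DROP, REJECT, ACCEPT) could be driven by a small user-space policy function. The sketch below is purely hypothetical: no such syscall or interface exists in the thread beyond the proposal, and enum syn_verdict, struct syn_request and decide_syn() are invented names. Mapping REJECT to "answer with a reset" and DROP to "stay silent" is likewise an assumption about what the proposal intends.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts, named after the three outcomes in the proposal. */
enum syn_verdict { SYN_DROP, SYN_REJECT, SYN_ACCEPT };

/* Hypothetical summary of what the application knows when a SYN arrives. */
struct syn_request {
	int source_allowed;	/* 1 if the client address passed the policy check */
	int peer_reachable;	/* 1 if the far end of the connection is available */
};

static enum syn_verdict decide_syn(const struct syn_request *req)
{
	assert(req != NULL);

	if (!req->source_allowed)
		return SYN_REJECT;	/* refuse actively (assumed: a reset) */

	if (!req->peer_reachable)
		return SYN_DROP;	/* stay silent so the client may fall back */

	return SYN_ACCEPT;		/* let the stack send the SYN/ACK */
}
```

In the transparent-proxy case discussed in this thread, peer_reachable would only be known after the proxy has tried its own outgoing connection, which is exactly why the SYN-ACK would have to be delayed.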
Suppress / delay SYN-ACK
Hi!

I'm searching for a solution to suppress / delay the SYN-ACK packet of a listening server (-application) until he has decided (e.g. analysed the requesting ip-address or checked if the corresponding other end of a connection is available) if he wants to accept the connect request of the client. If not, it should be possible to reject the connect request.

My idea is to add two ioctl's:
- One to set the listening socket into "delay_synack" mode.
- And one to send the synack packet, if the connection should be accepted.

If the "delay_synack" mode is not enabled, the connection should just work as usual.

I had a look at the tcp/ipv4 stack for a while and have found out, that this three-way-handshake is already done before anything comes up to user-space when I am doing a call to accept(). So I think it wouldn't be possible to add this feature with "a little hack".

Does anybody have any hints for me where I should start to work?

Regards,
Martin Schiller
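The observation that the handshake is finished before accept() is ever called can be checked from user space with a loopback test. The sketch below uses only standard socket calls; the function name connect_succeeds_without_accept() is made up for this illustration, and the result assumes a stack with the usual listen backlog behaviour.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if connect() succeeds although the server side never calls
 * accept(), 0 if connect() fails, and -1 if the setup itself failed.
 */
static int connect_succeeds_without_accept(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int result = -1;
	int lfd = socket(AF_INET, SOCK_STREAM, 0);
	int cfd = socket(AF_INET, SOCK_STREAM, 0);

	if (lfd < 0 || cfd < 0)
		goto out;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
		goto out;
	assert(len == sizeof(addr));

	/* No accept() anywhere: the kernel answers the SYN on its own. */
	result = connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == 0;
out:
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return result;
}
```

A return value of 1 means the client saw a completed connection before the application had any chance to inspect it, which is the behaviour the proposed "delay_synack" mode would change.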
Re: [RFC] Question about potential problem in net/ipv4/route.c
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 08:35:47 +0200

> How about avoiding the fwmark thing if !CONFIG_IP_ROUTE_FWMARK

I've added that, good idea.
Re: sfuzz hanging on 2.6.18
Dave Jones wrote:
> sfuzz D 724EF62A 2828 28717 28691 (NOTLB)
>cd69fe98 0082 012d 724ef62a 0001971a 0010 0007
> df6d22b0
>dfd81080 725bbc5e 0001971a 000cc634 0001 df6d23bc c140e260
> 0202
>de1d5ba0 cd69fea0 de1d5ba0 de1d5b60 de1d5b8c
> de1d5ba0
> Call Trace:
> [] lock_sock+0x75/0xa6
> [] dn_getname+0x18/0x5f [decnet]
> [] sys_getsockname+0x5c/0xb0
> [] sys_socketcall+0xef/0x261
> [] syscall_call+0x7/0xb
> DWARF2 unwinder stuck at syscall_call+0x7/0xb
>
> I wonder if the plethora of lockdep related changes inadvertently broke
> something?

Looks like unbalanced locking.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 70e0273..3456cd3 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -1178,8 +1178,10 @@ static int dn_getname(struct socket *soc
 	if (peer) {
 		if ((sock->state != SS_CONNECTED &&
 		     sock->state != SS_CONNECTING) &&
-		    scp->accept_mode == ACC_IMMED)
+		    scp->accept_mode == ACC_IMMED) {
+			release_sock(sk);
 			return -ENOTCONN;
+		}

 		memcpy(sa, &scp->peer, sizeof(struct sockaddr_dn));
 	} else {
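The bug pattern fixed by this patch, an error return that skips the unlock, can be modelled in user space with a toy lock that merely counts holders. Everything here (struct toy_sock, toy_lock(), toy_release() and the two getname_* functions) is a hypothetical stand-in for illustration and not the decnet code.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for a socket with a lock; lock_depth counts current holders. */
struct toy_sock {
	int lock_depth;
	int connected;
};

static void toy_lock(struct toy_sock *sk)
{
	sk->lock_depth++;
}

static void toy_release(struct toy_sock *sk)
{
	assert(sk->lock_depth > 0);
	sk->lock_depth--;
}

/* Shape of the code before the patch: the error path leaks the lock. */
static int getname_unbalanced(struct toy_sock *sk)
{
	toy_lock(sk);
	if (!sk->connected)
		return -ENOTCONN;	/* bug: returns with the lock still held */
	toy_release(sk);
	return 0;
}

/* Shape of the code after the patch: every path releases the lock. */
static int getname_balanced(struct toy_sock *sk)
{
	int err = 0;

	toy_lock(sk);
	if (!sk->connected)
		err = -ENOTCONN;
	toy_release(sk);
	return err;
}
```

With a real lock, the next lock_sock() on a socket left in the unbalanced state would block forever, which matches the task stuck in D state in the trace quoted above.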
sfuzz hanging on 2.6.18
sfuzz.c (google for it if you don't have it already) used to run forever (or until I got bored and ctrl-c'd it) as long as it didn't trigger an oops or the like in 2.6.17.

Running it against 2.6.18, I notice that it runs for a while, and then gets totally wedged. It doesn't respond to any signals, can't be ptraced, and even strace subsequently gets wedged. The machine responds, and is still interactive, but that process is hosed.

sysrq-t shows it stuck here..

sfuzz D 724EF62A 2828 28717 28691 (NOTLB)
cd69fe98 0082 012d 724ef62a 0001971a 0010 0007 df6d22b0
dfd81080 725bbc5e 0001971a 000cc634 0001 df6d23bc c140e260 0202
de1d5ba0 cd69fea0 de1d5ba0 de1d5b60 de1d5b8c de1d5ba0
Call Trace:
 [] lock_sock+0x75/0xa6
 [] dn_getname+0x18/0x5f [decnet]
 [] sys_getsockname+0x5c/0xb0
 [] sys_socketcall+0xef/0x261
 [] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb

I wonder if the plethora of lockdep related changes inadvertently broke something?

Dave

--
http://www.codemonkey.org.uk
Re: [PATCH] NET/bluetooth: handle sysfs errors
Hi Jeff,

thanks for the patch, but I already have one that fixes this and it will go to David Miller for inclusion soon.

Regards

Marcel
RE: [PATCH 00/11] The _entire_ secid reconciliation patchset (tada!)
On Wed, 11 Oct 2006, Venkat Yekkirala wrote:

> > Outstanding items include resolving the igmp skb hook issue generally,
> > testing to verify both the design and implementation, and ensuring that
> > all the related policy changes are merged upstream first.
> >
> Regarding the igmp hook issue, we could do a generic hook
> like Paul suggested. Would that be more palatable you think?

It needs to be investigated to see if anything else in the kernel is doing the same thing, and then most likely, a generic hook for classifying non-socket packets (you could pass the protocol as a hook parameter).

--
James Morris <[EMAIL PROTECTED]>