[PATCH] IPV6: Remove bogus WARN_ON() in Proxy-NA handling.

2006-10-12 Thread YOSHIFUJI Hideaki / 吉藤英明
[IPV6]: Remove bogus WARN_ON in Proxy-NA handling.

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

---
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 0304b5f..41a8a5f 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -967,8 +967,6 @@ static void ndisc_recv_na(struct sk_buff
 	    ipv6_devconf.forwarding && ipv6_devconf.proxy_ndp &&
 	    pneigh_lookup(&nd_tbl, &msg->target, dev, 0)) {
 		/* XXX: idev->cnf.prixy_ndp */
-		WARN_ON(skb->dst != NULL &&
-			((struct rt6_info *)skb->dst)->rt6i_idev);
 		goto out;
 	}
 

-- 
YOSHIFUJI Hideaki @ USAGI Project  <[EMAIL PROTECTED]>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Suppress / delay SYN-ACK

2006-10-12 Thread Martin Schiller
On Friday, October 13, 2006 7:42 AM, Stephen J. Bevan wrote:

> Say you are writing a transparent proxy i.e. when a TCP connection is
> made through the box, rather than forwarding the TCP SYN, it is
> delivered locally where it is accepted and then the proxy makes a
> separate TCP connection to original IP address.  Thus all traffic
> flows through a user-space proxy that can cache, log, virus scan, ...
> etc. the traffic.  Say also that the proxy is for a protocol that can
> mediate peer<->peer connections via a server (e.g. most IM
> protocols).  Further still assume that the client has the property
> that if while trying to establish a peer<->peer connection it will
> back off and use the server if it does not manage to establish the
> peer<->peer TCP connection but if it does establish the connection
> then it will not back off to use the server.  Thus if a client is
> behind the transparent proxy the proxy terminates the TCP connection
> locally and at that point the client thinks it has connected to the
> peer even though the proxy has yet to establish a connection to the
> peer.   
> 
> Should the proxy fail to do so all it can do is drop the
> client<->proxy connection at which point the client does not connect
> via the server and the user of the client is not happy since if the
> proxy wasn't there everything would have worked just fine.  So, if
> the proxy could delay the SYN/ACK until it has determined whether it
> can really connect to the IP address in the SYN, then it can decide
> whether to SYN/ACK or just not respond.  
> 
> Of course, the much simpler solution is to fix the client program so
> that it will still back off to the server even if it does manage to
> make a TCP connection.  However, fixing other people's software is
> easier said than done.  So if you are trying to $ell a transparent
> proxy solution, you need to handle it somehow.  Delayed SYN/ACK is
> one such way, though not necessarily the best way.

That's nearly exactly what our situation is:

The machine on which the SYN-ACK feature should be implemented is a
TCP-to-X.25 Gateway. There are really stupid TCP terminals out there which
connect to the Gateway and simply start sending their data as soon as the
connection between them and the Gateway is established.

The Gateway, on the other hand, has to check its internal routing table to
find which X.25 number should be called for the requesting TCP terminal, and
then establish this X.25 connection.

And now here is the point:
If, for whatever reason, the X.25 connection can't be established, the TCP
connection to the terminal has to be closed, or even better: NOT be
established at all, so that the terminal can't send any data.

So if you ask me how often connections should be rejected, I have to
say: "Hopefully never". But as long as these stupid terminals get very
confused when the connection is first established and then suddenly closed,
I don't think I can give up this feature.





Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-12 Thread Michael S. Tsirkin
Quoting r. David Miller <[EMAIL PROTECTED]>:
> Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> 
> From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
> Date: Thu, 12 Oct 2006 21:12:06 +0200
> 
> > Quoting r. David Miller <[EMAIL PROTECTED]>:
> > > Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> > > 
> > > Numbers?
> > 
> > I created two subnets on top of the same pair of infiniband HCAs:
> 
> I was asking for SG vs. non-SG numbers so I could see proof
> that it really does help like you say it will.
> 

Dave, thanks for the clarification.
Please note that ib0 is a non-SG device with MTU 2K,
sorry that I forgot to mention that.


so, to summarize my previous mail:


interface   flags       mtu     bandwidth
ib0         linear(0)   2044    286.45
ibc0        _F_SG       65484   782.55



If I set both ib0 and ibc0 to a 64K MTU, then in benchmark mode SG is
somewhat slower than non-SG at the same MTU (by some 10%; I tested this at
some point but don't have the numbers at the moment. Do you want to see
them?).  I did not claim it is faster to do SG with the same MTU, and I think
it is clear why linear should be faster for copy *with the same MTU*.
But do you really think that we will be able to allocate
even a single 64K linear skb after the machine has been active for a while?

My assumption is that if I want to reliably get MTU > PAGE_SIZE I must
support SG.
Is this assumption wrong?

If this assumption is correct, then below is my line of thinking:
- with infiniband we provably get a 2.5x speedup with MTU of 64K vs. 2K.
- to get packets of that size reliably we must declare S/G support
- infiniband verbs do not support IP checksumming
- per network algorithmics, it is better to piggyback checksum calculation
  on copying if copying takes place

For this reason, I would like to define the meaning of S/G set when checksum
bits are all clear as "we support S/G but not checksum, please checksum
for us if you copy data anyway". Alternatively, add a new
NETIF_F_??_CSUM bit to mean this capability.
Does this make sense?
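
To make the "piggyback checksum calculation on copying" point concrete, here is a minimal user-space sketch in C. It is not kernel code and not the proposed patch: it only shows a copy loop that accumulates the 16-bit ones'-complement Internet checksum in the same pass over the data. The name copy_and_csum is made up for this illustration; the kernel has its own copy-and-checksum helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): copy len bytes from src to dst
 * and return the 16-bit Internet (ones'-complement) checksum of the data,
 * computed in the same pass that performs the copy. */
uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	/* Treat the data as a sequence of big-endian 16-bit words. */
	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		sum += ((uint32_t)src[i] << 8) | src[i + 1];
	}

	/* An odd trailing byte is padded with a zero low byte. */
	if (i < len) {
		dst[i] = src[i];
		sum += (uint32_t)src[i] << 8;
	}

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

The data is read once and each byte feeds both the destination buffer and the running sum, which is why checksumming during a copy that has to happen anyway costs little extra memory traffic.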

Thanks,

-- 
MST


Re: Suppress / delay SYN-ACK

2006-10-12 Thread Stephen J. Bevan
Caitlin Bestler writes:
 > More to the point, on what basis would the application be rejecting a
 > connection request based solely on the SYN?

Perhaps not the reason that Martin is interested in but ...

Say you are writing a transparent proxy i.e. when a TCP connection is
made through the box, rather than forwarding the TCP SYN, it is
delivered locally where it is accepted and then the proxy makes a
separate TCP connection to original IP address.  Thus all traffic
flows through a user-space proxy that can cache, log, virus scan,
... etc. the traffic.  Say also that the proxy is for a protocol that
can mediate peer<->peer connections via a server (e.g. most IM
protocols).  Further still assume that the client has the property that
if while trying to establish a peer<->peer connection it will back off
and use the server if it does not manage to establish the peer<->peer
TCP connection but if it does establish the connection then it will
not back off to use the server.  Thus if a client is behind the
transparent proxy the proxy terminates the TCP connection locally and
at that point the client thinks it has connected to the peer even
though the proxy has yet to establish a connection to the peer.

Should the proxy fail to do so all it can do is drop the
client<->proxy connection at which point the client does not connect
via the server and the user of the client is not happy since if the
proxy wasn't there everything would have worked just fine.  So, if the
proxy could delay the SYN/ACK until it has determined whether it can
really connect to the IP address in the SYN, then it can decide
whether to SYN/ACK or just not respond.

Of course, the much simpler solution is to fix the client program so
that it will still back off to the server even if it does manage to
make a TCP connection.  However, fixing other people's software is
easier said than done.  So if you are trying to $ell a transparent
proxy solution, you need to handle it somehow.  Delayed SYN/ACK is one
such way, though not necessarily the best way.


[FYI]: Introduction of the support for RFC4312(The Camellia Cipher Algorithm)

2006-10-12 Thread Noriaki TAKAMIYA
Hi all,

  This is Takamiya, from NTT Software.

  NTT has released the code of the new cipher algorithm, which is
  specified in RFC4312 (The Camellia Cipher Algorithm).

  Please see
  http://info.isl.ntt.co.jp/crypt/eng/camellia/source_s.html .

  The patch available there is for kernel version 2.6.18.

  We have started to prepare a patch against the cryptodev-2.6 tree, and
  will submit it in a few weeks.

  Best regards.

P.S.
  The patches for using the camellia algorithm from ipsec-tools are also
  available at the above URL, and they will be merged into
  ipsec-tools, too.

-- 
Noriaki TAKAMIYA


Re: [PATCH] [NET] reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()

2006-10-12 Thread Eric Dumazet

David Miller wrote:

> From: Eric Dumazet <[EMAIL PROTECTED]>
> Date: Fri, 13 Oct 2006 05:56:43 +0200
>
> > 2^31 is 2147483648
> >
> > That's a *lot* of timer ticks, an inet_peer entry should not stay in
> > unused_list for more than 10 minutes.
>
> My bad, I thought the time was compared to the creation time
> not the time at which it was added to the unused list.
>
> I like your patch and I'll apply it.
>
> Thanks Eric.

Thank you David

(Re-reading my previous mail, I forgot to say that on ia32, unsigned long is
already 32 bits, so if a 32-bit timestamp is OK (for computing deltas) on such
platforms, it must be OK on other platforms as well)


Eric



Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-12 Thread David Miller
From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 21:12:06 +0200

> Quoting r. David Miller <[EMAIL PROTECTED]>:
> > Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> > 
> > Numbers?
> 
> I created two subnets on top of the same pair of infiniband HCAs:

I was asking for SG vs. non-SG numbers so I could see proof
that it really does help like you say it will.


Re: [PATCH] [NET] reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()

2006-10-12 Thread David Miller
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Fri, 13 Oct 2006 05:56:43 +0200

> 2^31 is 2147483648
> 
> That's a *lot* of timer ticks, an inet_peer entry should not stay in 
> unused_list for more than 10 minutes.

My bad, I thought the time was compared to the creation time
not the time at which it was added to the unused list.

I like your patch and I'll apply it.

Thanks Eric.


Re: Suppress / delay SYN-ACK

2006-10-12 Thread Eric Dumazet

Rick Jones wrote:

> > More to the point, on what basis would the application be rejecting a
> > connection request based solely on the SYN?
>
> True, it isn't like there would suddenly be any call user data as in
> XTI/TLI.

A data payload could be included in the SYN packet. The TCP specs allow this,
AFAIK.

As for iptables rules added on the fly by an application that wants to protect
its listen queue from random 'blacklisted' peers: this requires granting
sufficient rights to the user running the application, which is a limitation.


Eric



Re: [PATCH] [NET] reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()

2006-10-12 Thread Eric Dumazet

David Miller wrote:

> From: Eric Dumazet <[EMAIL PROTECTED]>
> Date: Thu, 12 Oct 2006 22:14:12 +0200
>
> > 1) shrink struct inet_peer on 64-bit platforms.
> >
> > I noticed sizeof(struct inet_peer) was 64+8 on x86_64
> >
> > As we don't really need 64-bit timestamps (we only care for garbage
> > collection), we can use 32-bit ones and reduce sizeof(struct inet_peer)
> > to 64 bytes: because of the SLAB_HWCACHE_ALIGN constraint, the final
> > allocation is 64 bytes instead of 128 bytes per inet_peer structure.
>
> I'm not convinced this is 100% correct.  There are wrapping
> cases that I think aren't covered.
>
> Consider an entry that lives long enough for the lower 32-bits
> of jiffies to wrap, then we kill it, but we won't purge it
> properly if the wrapped jiffie is close to dtime.
>
> I'm sure there are other similar cases as well.

Hum, if it was incorrect, I urge you to grep for tcp_time_stamp, and correct
this as soon as possible :)

2^31 is 2147483648

That's a *lot* of timer ticks, an inet_peer entry should not stay in
unused_list for more than 10 minutes.

Even if the system is under stress for more than 30 days, and some entries
stay that long in the unused list, they won't leak: either they are re-used,
or they are purged by cleanup_once(0); done in inet_getpeer().
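
The wrap argument above can be illustrated with a small user-space sketch in C. This is not the inet_peer code; the helper names and the plain uint32_t tick counter are assumptions made for the example. The point is only that unsigned 32-bit subtraction is done modulo 2^32, so the age of an entry comes out right even when the counter has wrapped between the stamp and now. (The kernel's time_after()-style macros use a signed comparison instead, which is where the 2^31 bound comes from.)

```c
#include <stdint.h>

/* Hypothetical helper: ticks elapsed between a stored 32-bit timestamp
 * and the current 32-bit tick counter. The subtraction is modulo 2^32,
 * so a single wrap of the counter does not disturb the result. */
uint32_t ticks_since(uint32_t now, uint32_t stamp)
{
	return now - stamp;
}

/* Nonzero when at least ttl ticks have passed since stamp.
 * Only meaningful while the true age is below 2^32 ticks. */
int entry_expired(uint32_t now, uint32_t stamp, uint32_t ttl)
{
	return ticks_since(now, stamp) >= ttl;
}
```

For instance, a stamp taken 16 ticks before the counter wraps and read back 5 ticks after the wrap yields an age of 21 ticks, not a huge bogus value.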


Eric



Re: [IPv6] rules: Use RT6_LOOKUP_F_HAS_SADDR and fix source based selectors

2006-10-12 Thread David Miller
From: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 18:55:51 +0900 (JST)

> I tend to agree.  Ville, do you agree?

I'll wait for Ville's response before applying this.
Otherwise, I think the change looks fine.


Re: What is current sundance.c status

2006-10-12 Thread Jesse Huang
Ok, I will generate those again with descriptions.

Thank you!

Best Regards,
Jesse Huang.

- Original Message - 
From: "Andrew Morton" <[EMAIL PROTECTED]>
To: "Jesse Huang" <[EMAIL PROTECTED]>
Cc: ; ;
<[EMAIL PROTECTED]>
Sent: Thursday, October 12, 2006 10:55 AM
Subject: Re: What is current sundance.c status


On Thu, 12 Oct 2006 10:29:37 +0800
"Jesse Huang" <[EMAIL PROTECTED]> wrote:

> Could you tell me the current IP100A status? Should I
> regenerate the patches? Will they be put into the kernel or not?

I'm sitting on a copy of them.  I didn't send them to Jeff last time
because:

sundance-remove-txstartthresh-and-rxearlythresh.patch

 There's no description of what this patent issue is.

sundance-fix-tx-pause-bug-reset_tx-intr_handler.patch

 There's no description of the bug which got fixed, nor how this patch
 fixes it.

sundance-change-phy-address-search-from-phy=1-to-phy=0.patch

 There's a (small) possibility that this will break on hardware which
 _doesn't_ have a phy at address 0.

sundance-correct-initial-and-close-hardware-step.patch

 There's no real description of the bug which is being fixed, nor of how
 this patch fixes it.

sundance-solve-host-error-problem-in-low-performance-embedded.patch

 No description of what the "host error problem" is, nor of what causes
 it, nor of how this patch fixes it.


So generally these patches are a bit worrying, and it is hard to gauge what
their risk factor is.




Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly

2006-10-12 Thread Jiri Kosina
On Thu, 12 Oct 2006, Andrew Morton wrote:

> > pci_set_power_state(pdev, PCI_D0);
> > pci_restore_state(pdev);
> > -   pci_enable_device(pdev);
> > +   ret = pci_enable_device(pdev);
> > +   if (ret) {
> > +   printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during resume\n",
> > +   dev->name);
> > +   unregister_netdev(dev);
> This looks rather wrong - skge_exit() will run unregister_netdev() again.

You are of course right (the problem was also spotted by Russell King). 
This I believe is the correct one for the sk98lin case.

[PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()

add check of return value to _resume() function of sk98lin driver.

Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>

---

 drivers/net/sk98lin/skge.c |   20 +++-
 1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index d4913c3..3a9323d 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5070,7 +5070,12 @@ static int skge_resume(struct pci_dev *p
 
 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
-	pci_enable_device(pdev);
+	ret = pci_enable_device(pdev);
+	if (ret) {
+		printk(KERN_WARNING "sk98lin: unable to enable device %s in resume\n",
+			dev->name);
+		goto out_err;
+	}
 	pci_set_master(pdev);
 	if (pAC->GIni.GIMacsFound == 2)
 		ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", dev);
@@ -5078,10 +5083,8 @@ static int skge_resume(struct pci_dev *p
 		ret = request_irq(dev->irq, SkGeIsrOnePort, IRQF_SHARED, "sk98lin", dev);
 	if (ret) {
 		printk(KERN_WARNING "sk98lin: unable to acquire IRQ %d\n", dev->irq);
-		pAC->AllocFlag &= ~SK_ALLOC_IRQ;
-		dev->irq = 0;
-		pci_disable_device(pdev);
-		return -EBUSY;
+		ret = -EBUSY;
+		goto out_err;
 	}
 
 	netif_device_attach(dev);
@@ -5098,6 +5101,13 @@ static int skge_resume(struct pci_dev *p
 	}
 
 	return 0;
+out_err:
+	pAC->AllocFlag &= ~SK_ALLOC_IRQ;
+	dev->irq = 0;
+	pci_disable_device(pdev);
+
+	return ret;
+
 }
 #else
 #define skge_suspend NULL
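
The shape of this version of the fix, where both failure points jump to one shared cleanup label instead of each carrying its own copy of the teardown, can be sketched in plain user-space C. Everything named fake_* below is a stand-in invented for the illustration (for pci_enable_device(), request_irq() and pci_disable_device()); it is not driver code.

```c
#include <stdio.h>

/* Hypothetical stand-ins: each "step" returns 0 on success or a negative
 * value on failure, controlled by these variables. */
int fake_enable_rc;
int fake_irq_rc;
int cleanup_calls;

int fake_enable_device(void) { return fake_enable_rc; }
int fake_request_irq(void) { return fake_irq_rc; }
void fake_disable_device(void) { cleanup_calls++; }

/* Resume-like function: both failure paths share one cleanup block. */
int resume_sketch(void)
{
	int ret;

	ret = fake_enable_device();
	if (ret) {
		fprintf(stderr, "sketch: unable to enable device\n");
		goto out_err;
	}

	ret = fake_request_irq();
	if (ret) {
		fprintf(stderr, "sketch: unable to acquire IRQ\n");
		goto out_err;
	}

	return 0;

out_err:
	/* Single place for teardown, reached from either failure. */
	fake_disable_device();
	return ret;
}
```

Keeping the teardown in one place means a later change to the cleanup only has to be made once, and the caller always sees the error code of the step that actually failed.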

-- 
Jiri Kosina


Re: Suppress / delay SYN-ACK

2006-10-12 Thread Stephen Hemminger
On Thu, 12 Oct 2006 15:54:49 -0700
Rick Jones <[EMAIL PROTECTED]> wrote:

> > More to the point, on what basis would the application be rejecting a
> > connection request based solely on the SYN?
> 
> True, it isn't like there would suddenly be any call user data as in XTI/TLI.
> 
> > There are only two pieces of information available: the remote IP
> > address and port, and the total number of pending requests. The
> > latter is already addressed through the backlog size, and netfilter
> > rules can already be used to reject based on IP address.
> 
> It would though allow an application to have an even more restricted set of 
> allowed IP's than was set in netfilter.  Rather like allowing the application 
> to set socket buffer sizes rather than relying on the system's default.
>

Some version of BSD sockets had this behaviour; perhaps you should use
the same model.  It was some socket option, I can't remember which; whatever
it was, it wasn't widely adopted. Nothing says you can't just use shutdown() to
force a RST on the addresses you don't want to talk to.
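
As a hedged illustration of rejecting a peer from user space after accept(): to my knowledge shutdown() by itself normally results in a FIN, and a commonly used way to make close() abort the connection with a RST is to set SO_LINGER with a zero timeout first. The sketch below shows that variant; it is an assumption about what is meant here, and the helper name close_with_reset is invented for the example.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: close fd so that the connection is aborted rather
 * than shut down in an orderly way. With l_onoff = 1 and l_linger = 0,
 * close() on a connected TCP socket typically sends a RST.
 * Returns 0 on success, -1 on failure. */
int close_with_reset(int fd)
{
	struct linger lg = { .l_onoff = 1, .l_linger = 0 };

	if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

Note that this only applies after the three-way handshake has completed, so the peer still sees the connection established and then reset; it does not suppress or delay the SYN-ACK itself.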


Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly

2006-10-12 Thread Andrew Morton
On Fri, 13 Oct 2006 00:57:18 +0200 (CEST)
Jiri Kosina <[EMAIL PROTECTED]> wrote:

> @@ -5070,7 +5070,13 @@ static int skge_resume(struct pci_dev *p
>  
>   pci_set_power_state(pdev, PCI_D0);
>   pci_restore_state(pdev);
> - pci_enable_device(pdev);
> + ret = pci_enable_device(pdev);
> + if (ret) {
> + printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during resume\n",
> + dev->name);
> + unregister_netdev(dev);

This looks rather wrong - skge_exit() will run unregister_netdev() again.

Look a few lines down, to where this function already handles request_irq()
failure, reuse that code path.  Hopefully it has been tested..

(Once we have an easy-to-use fault-injection framework we'll be able to
test all these things more easily)

(But it's possible to test them already, with a bit of ad-hoc testing code)


FW: [patch] Performance enhancement patches for SB1250 MAC

2006-10-12 Thread Yang, Steve
-Original Message-
From: Yang, Steve 
Sent: Thursday, October 12, 2006 5:46 PM
To: 'Stephen Hemminger'
Cc: netdev@vger.kernel.org
Subject: RE: [patch] Performance enhancement patches for SB1250 MAC

Stephen,

I assume the "expense" you referred to is the reserved SK cache buffers.


1. The SKB_CACHE does hold on to buffers which would
   otherwise be returned to the system (although the
   number it holds on to is limited and configurable).
   These buffers are only returned with certainty
   at module unload time, although with normal traffic
   most of them would be recycled pretty quick.  I think
   the cache was implemented as a stack, rather than a
   FIFO, which could cause a few buffers to be held for
   quite a while under light loads.

2. SKB_CACHE, just like NAPI, is also a configurable
   option. Systems that need the performance have the
   option of turning this on, at the expense of small
   number of buffers; other systems which don't care
   much about networking performance can leave this
   option off.

3. Can you elaborate on the other possible issues that you
   touched upon (memory starvation/race, etc.)?

Regards,
Steve Yang


-Original Message-
From: Stephen Hemminger [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 12, 2006 3:14 PM
To: Yang, Steve
Cc: netdev@vger.kernel.org
Subject: Re: [patch] Performance enhancement patches for SB1250 MAC

On Thu, 12 Oct 2006 14:54:33 -0700
"Yang, Steve" <[EMAIL PROTECTED]> wrote:

> FYI ...
> 
> Regards,
> Steve Yang
> 
> -Original Message-
> From: Yang, Steve
> Sent: Monday, September 25, 2006 3:50 PM
> To: '[EMAIL PROTECTED]'
> Cc: '[EMAIL PROTECTED]'; 'Mark E Mason'
> Subject: Performance enhancement patches for SB1250 MAC
> 
> Hi,
> 
> Attached are two network performance enhancement patches for the
> SB1250 MAC. The NAPI patch applies first, followed by the "skb cache"
> patch. They apply and build cleanly on the 2.6.18 kernel for the following
> kernel option combinations:
> 
> SBMAC_NAPI   no   yes  yes
> SKB_CACHE    no   no   yes
>  
> Regards,
> Steve Yang
> 

NAK on the SKB_CACHE; it is an idea that just ends up favoring your driver
at the expense of the rest of the system. Also, there are
resource/memory starvation issues and probably other races as well.

I bet it makes your benchmark run faster, but it doesn't belong in
the normal kernel.

--
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly

2006-10-12 Thread Jiri Kosina
On Thu, 12 Oct 2006, Stephen Hemminger wrote:

> > > Having the device unregister seems harsh.
> > What would be the proper way? As the initialization failed, accessing 
> > the device would not make sense any more (therefore I don't think that 
> > calling skge_remove_one() would be OK, as it issues calls to 
> > SkEventQueue() and SkEventDispatcher(), trying to send something to 
> > the card).
> I guess it's just not clear what the state of the machine is anyway;
> if you can't enable the device, something is hosed (or the device was
> hot removed).

Well, it depends on definition of 'hot'. What would for example happen in 
the case suspend-to-disk -> remove the card when the machine is switched 
off -> resume-from-disk? I guess that exactly this pci_enable_device() 
will fail, so we definitely have to handle this case, as it can easily 
happen.

> > > Why put conditional on same line?
> > Pardon me?
> I prefer:
>   ret = pci_enable_device(pdev);

As you wish. 

[PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()

add check of return value to _resume() function of sk98lin driver.

Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>

---

 drivers/net/sk98lin/skge.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index d4913c3..d691811 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5070,7 +5070,13 @@ static int skge_resume(struct pci_dev *p
 
 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
-	pci_enable_device(pdev);
+	ret = pci_enable_device(pdev);
+	if (ret) {
+		printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during resume\n",
+			dev->name);
+		unregister_netdev(dev);
+		return ret;
+	}
 	pci_set_master(pdev);
 	if (pAC->GIni.GIMacsFound == 2)
 		ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", dev);

-- 
Jiri Kosina


Re: Suppress / delay SYN-ACK

2006-10-12 Thread Rick Jones

> More to the point, on what basis would the application be rejecting a
> connection request based solely on the SYN?


True, it isn't like there would suddenly be any call user data as in XTI/TLI.


> There are only two pieces of information available: the remote IP
> address and port, and the total number of pending requests. The
> latter is already addressed through the backlog size, and netfilter
> rules can already be used to reject based on IP address.


It would though allow an application to have an even more restricted set of 
allowed IP's than was set in netfilter.  Rather like allowing the application to 
set socket buffer sizes rather than relying on the system's default.


rick jones


Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly

2006-10-12 Thread Stephen Hemminger
On Fri, 13 Oct 2006 00:38:20 +0200 (CEST)
Jiri Kosina <[EMAIL PROTECTED]> wrote:

> On Thu, 12 Oct 2006, Stephen Hemminger wrote:
> 
> > >   pci_set_power_state(pdev, PCI_D0);
> > >   pci_restore_state(pdev);
> > > - pci_enable_device(pdev);
> > > + if ((ret = pci_enable_device(pdev))) {
> > > + printk(KERN_ERR "sk98lin: Cannot enable PCI device during resume\n");
> > > + unregister_netdev(dev);
> > >
> > Having the device unregister seems harsh.
> 
> What would be the proper way? As the initialization failed, accessing the 
> device would not make sense any more (therefore I don't think that calling 
> skge_remove_one() would be OK, as it issues calls to SkEventQueue() and 
> SkEventDispatcher(), trying to send something to the card).

I guess it's just not clear what the state of the machine is anyway;
if you can't enable the device, something is hosed (or the device was
hot removed).

> > Why put conditional on same line?
> 
> Pardon me?

I prefer:
ret = pci_enable_device(pdev);
if (ret) {



> 
> > Why not print device name dev->name.
> 
> Thanks.
> 
> [PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()
> 
> add check of return value to _resume() function of sk98lin driver.
> 
> Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>
> 
> --- 
> 
>  drivers/net/sk98lin/skge.c |6 +-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
> index d4913c3..1f03cf8 100644
> --- a/drivers/net/sk98lin/skge.c
> +++ b/drivers/net/sk98lin/skge.c
> @@ -5070,7 +5070,11 @@ static int skge_resume(struct pci_dev *p
>  
>   pci_set_power_state(pdev, PCI_D0);
>   pci_restore_state(pdev);
> - pci_enable_device(pdev);
> + if ((ret = pci_enable_device(pdev))) {
> + printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during resume\n",
> + dev->name);
> + return ret;
> + }
>   pci_set_master(pdev);
>   if (pAC->GIni.GIMacsFound == 2)
>   ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", dev);
> 


-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly

2006-10-12 Thread Jiri Kosina
On Thu, 12 Oct 2006, Stephen Hemminger wrote:

> > pci_set_power_state(pdev, PCI_D0);
> > pci_restore_state(pdev);
> > -   pci_enable_device(pdev);
> > +   if ((ret = pci_enable_device(pdev))) {
> > +   printk(KERN_ERR "sk98lin: Cannot enable PCI device during resume\n");
> > +   unregister_netdev(dev);
> >
> Having the device unregister seems harsh.

What would be the proper way? As the initialization failed, accessing the 
device would not make sense any more (therefore I don't think that calling 
skge_remove_one() would be OK, as it issues calls to SkEventQueue() and 
SkEventDispatcher(), trying to send something to the card).

> Why put conditional on same line?

Pardon me?

> Why not print device name dev->name.

Thanks.

[PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()

add check of return value to _resume() function of sk98lin driver.

Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>

--- 

 drivers/net/sk98lin/skge.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index d4913c3..1f03cf8 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5070,7 +5070,11 @@ static int skge_resume(struct pci_dev *p
 
 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
-	pci_enable_device(pdev);
+	if ((ret = pci_enable_device(pdev))) {
+		printk(KERN_ERR "sk98lin: Cannot enable PCI device %s during resume\n",
+			dev->name);
+		return ret;
+	}
 	pci_set_master(pdev);
 	if (pAC->GIni.GIMacsFound == 2)
 		ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", dev);

-- 
Jiri Kosina


Re: [PATCH 2/7] d80211: add support for SIOCSIWRATE and SIOCGIWRATE

2006-10-12 Thread mabbas

I am sorry for the late response. Please read my comments below.

Jiri Benc wrote:

> On Thu, 21 Sep 2006 09:59:39 -0700, mabbas wrote:
>
> > I cannot see how it breaks the per-STA TX rate limit, especially as
> > PRISM2_HOSTAPD_SET_RATE_SETS does almost the same thing. I am not saying
> > the patch is correct; I just want to know how to fix it to get it in.
>
> As Jouni wrote, it's not useful to change the per-radio rate table. You
> want to limit the rates you are using to communicate with the current AP
> while not limiting other virtual interfaces. (Imagine you have the card
> that is capable to associate to two APs at the same time. You don't want to
> limit rates for both APs with SIOCSIWRATE.)
>
> To do that I think the following is needed:
>
> 1. Add 'allowed_rates' field to struct sta_info. It defaults to 0x.
> (Or perhaps call it 'disabled_rates' and make it default to 0.)

Should I add the new field to sta_info or to ieee80211_sub_if_data? If
we add it to sta_info then it won't be persistent: we will lose the
SIOCSIWRATE restriction once we associate with a new AP.
Then in step 3 we would bitmask sta->curr_rates with
ieee80211_sub_if_data::allowed_rates, and this would solve the problem for
IBSS as well.

> 2. The SIOCSIWRATE handler: If the interface is not in a STA mode, return
> -EOPNOTSUPP. Otherwise, modify the allowed_rates field of the sta entry
> belonging to the current AP.
>
> 3. Bitmask sta->curr_rates with sta->allowed_rates (or
> ~sta->disabled_rates) in various places (ieee80211_ioctl_add_sta,
> ieee80211_rx_mgmt_assoc_resp, ieee80211_rx_bss_info; please check for other
> places).
>
> In IBSS and AP mode setting this (per-STA, of course, which is not
> supported by WE, btw.) can be useful as well but it can be done later.
>
>  Jiri

  


Thanks
Mohamed


Re: [PATCH 0/3] Collection of small NetLabel bugfixes

2006-10-12 Thread James Morris
On Wed, 11 Oct 2006, [EMAIL PROTECTED] wrote:

> When doing some more testing today I ran into a few bugs, this patchset
> addresses those bugs.  This patchset is backed against today's net-2.6 git
> tree.
> 
> Please apply these patches for 2.6.19, thanks.

Applied to
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-net-2.6.git


-- 
James Morris
<[EMAIL PROTECTED]>


Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly

2006-10-12 Thread Stephen Hemminger
On Fri, 13 Oct 2006 00:17:50 +0200 (CEST)
Jiri Kosina <[EMAIL PROTECTED]> wrote:

> [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() 
> properly
> 
> Fix missing handling of pci_enable_device() return value in skge_resume() 
> 
> Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>
> 
> --- 
> 
>  drivers/net/sk98lin/skge.c |6 +-
>  1 files changed, 5 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
> index 99e9262..e12fb62 100644
> --- a/drivers/net/sk98lin/skge.c
> +++ b/drivers/net/sk98lin/skge.c
> @@ -5070,7 +5070,11 @@ static int skge_resume(struct pci_dev *p
>  
>   pci_set_power_state(pdev, PCI_D0);
>   pci_restore_state(pdev);
> - pci_enable_device(pdev);
> + if ((ret = pci_enable_device(pdev))) {
> + printk(KERN_ERR "sk98lin: Cannot enable PCI device during resume\n");
> + unregister_netdev(dev);
>

Having the device unregister seems harsh.
Why put the conditional on the same line?
Why not print the device name, dev->name?


-- 
Stephen Hemminger <[EMAIL PROTECTED]>


[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly

2006-10-12 Thread Jiri Kosina
[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() 
properly

Fix missing handling of pci_enable_device() return value in skge_resume() 

Signed-off-by: Jiri Kosina <[EMAIL PROTECTED]>

--- 

 drivers/net/sk98lin/skge.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index 99e9262..e12fb62 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5070,7 +5070,11 @@ static int skge_resume(struct pci_dev *p
 
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
-   pci_enable_device(pdev);
+   if ((ret = pci_enable_device(pdev))) {
+   printk(KERN_ERR "sk98lin: Cannot enable PCI device during resume\n");
+   unregister_netdev(dev);
+   return ret;
+   }
pci_set_master(pdev);
if (pAC->GIni.GIMacsFound == 2)
ret = request_irq(dev->irq, SkGeIsr, IRQF_SHARED, "sk98lin", dev);

-- 
Jiri Kosina


Re: [PATCH] [NET] reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()

2006-10-12 Thread David Miller
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 22:14:12 +0200

> 1) shrink struct inet_peer on 64 bits platforms.
> 
> I noticed sizeof(struct inet_peer) was 64+8 on x86_64
> 
> As we dont really need 64 bits timestamps (we only care for garbage 
> collection), we can use 32bits ones and reduce sizeof(struct inet_peer) to 64 
> bytes : Because of SLAB_HWCACHE_ALIGN constraint, final allocation is 64 
> bytes 
> instead of 128 bytes per inet_peer structure.

I'm not convinced this is 100% correct.  There are wrapping
cases that I think aren't covered.

Consider an entry that lives long enough for the lower 32 bits
of jiffies to wrap, then we kill it, but we won't purge it
properly if the wrapped jiffies value is close to dtime.

I'm sure there are other similar cases as well.
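
One way to read the wrapping concern is sketched below as a small
user-space model. It assumes a 64-bit tick counter `now` standing in
for jiffies on a 64-bit platform and a 32-bit stored timestamp; the
function names age32 and looks_fresh are hypothetical. A single wrap of
the low 32 bits is handled by the unsigned subtraction, but an entry
that sits unused for about 2^32 ticks (roughly 49.7 days at HZ=1000)
appears fresh again.

```c
#include <assert.h>
#include <stdint.h>

/* Age of an entry when only the low 32 bits of the clock are stored.
 * Unsigned subtraction is modulo 2^32, so one wrap of the counter
 * between the two samples still yields the right age. */
uint32_t age32(uint64_t now, uint32_t stamp)
{
    return (uint32_t)now - stamp;
}

/* Mirrors the "delta < ttl" test from the patch: nonzero means the
 * entry is considered too fresh to prune. */
int looks_fresh(uint64_t now, uint32_t stamp, uint32_t ttl)
{
    assert(ttl > 0);   /* a zero TTL would make every entry stale */
    return age32(now, stamp) < ttl;
}
```

With a stamp taken at tick 0xFFFFFF00, a check 0x200 ticks later
reports an age of 0x200 even though the low word wrapped. A check
2^32 + 5 ticks later reports an age of only 5, which is the kind of
case the 32-bit timestamp cannot distinguish.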

> 2) Cleanup
> --
>   inet_putpeer() is not anymore inlined in inetpeer.h, as this is not called 
> in fast paths, to reduce text size. Some exports are not anymore needed 
> (inet_peer_unused_lock, inet_peer_unused_tailp) and can be declared static.
> 
> 3) No more hard limit (PEER_MAX_CLEANUP_WORK = 30)
> --
>   peer_check_expire() try to delete entries for at most one timer tick. CPUS 
> are going faster, hard limits are becoming useless... Similar thing is done 
> in 
> net/ipv4/route.c garbage collector.

These parts are fine.


Re: [patch] Performance enhancement patches for SB1250 MAC

2006-10-12 Thread Stephen Hemminger
On Thu, 12 Oct 2006 14:54:33 -0700
"Yang, Steve" <[EMAIL PROTECTED]> wrote:

> FYI ...
> 
> Regards, 
> Steve Yang
> 
> -Original Message-
> From: Yang, Steve 
> Sent: Monday, September 25, 2006 3:50 PM
> To: '[EMAIL PROTECTED]'
> Cc: '[EMAIL PROTECTED]'; 'Mark E Mason'
> Subject: Performance enhancement patches for SB1250 MAC
> 
> Hi,
> 
> The attached are two network performance enhancement patches for SB1250
> MAC. The NAPI patch applies first. Followed by the "skb cache" patch.
> They applied and builds cleanly on 2.6.18 kernel for the following
> kernel option combinations:
> 
> SBMAC_NAPI  no  yes yes
> SKB_CACHE   no  no  yes
>  
> Regards,
> Steve Yang
> 

NAK on the SKB_CACHE; it is an idea that just ends up favoring your
driver at the expense of the rest of the system. Also, there are
resource/memory starvation issues and probably other races as well.

I bet it makes your benchmark run faster, but it doesn't belong in
the normal kernel.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: Suppress / delay SYN-ACK

2006-10-12 Thread jamal
On Thu, 2006-12-10 at 14:58 -0700, Caitlin Bestler wrote:

> That would seem to limit the usefulness to scenarios where a given
> remote IP address *might* be accepted based on total traffic load,
> number of other connections from the same IP address, etc.  If
> *all* requests from that IP address are going to be rejected, why
> not use netfilter?

Netfilter or ingress tc may both work.
I have a feeling that the poster first needs to consult some policy+state in
the application which is more complex than what rate control or a limit on
the number of connections provides (DoS detection?), in which case they'd
have to write a netfilter target.

cheers,
jamal




Re: Suppress / delay SYN-ACK

2006-10-12 Thread Caitlin Bestler

On 10/12/06, Rick Jones <[EMAIL PROTECTED]> wrote:
> Martin Schiller wrote:
> > Hi!
> >
> > I'm searching for a solution to suppress / delay the SYN-ACK packet of a
> > listening server (-application) until he has decided (e.g. analysed the
> > requesting ip-address or checked if the corresponding other end of a
> > connection is available) if he wants to accept the connect request of the
> > client. If not, it should be possible to reject the connect request.
>
> How often do you expect the incoming call to be rejected?  I suspect that would
> have a significant effect on whether the whole thing is worthwhile.
>
> rick jones

More to the point, on what basis would the application be rejecting a
connection request based solely on the SYN?

There are only two pieces of information available: the remote IP
address and port, and the total number of pending requests. The
latter is already addressed through the backlog size, and netfilter
rules can already be used to reject based on IP address.

That would seem to limit the usefulness to scenarios where a given
remote IP address *might* be accepted based on total traffic load,
number of other connections from the same IP address, etc.  If
*all* requests from that IP address are going to be rejected, why
not use netfilter?


[patch] Performance enhancement patches for SB1250 MAC

2006-10-12 Thread Yang, Steve
FYI ...

Regards, 
Steve Yang

-Original Message-
From: Yang, Steve 
Sent: Monday, September 25, 2006 3:50 PM
To: '[EMAIL PROTECTED]'
Cc: '[EMAIL PROTECTED]'; 'Mark E Mason'
Subject: Performance enhancement patches for SB1250 MAC

Hi,

Attached are two network performance enhancement patches for the SB1250
MAC. The NAPI patch applies first, followed by the "skb cache" patch.
They apply and build cleanly on the 2.6.18 kernel for the following
kernel option combinations:

SBMAC_NAPI  no  yes yes
SKB_CACHE   no  no  yes
 
Regards,
Steve Yang



mips-sb1250-mac-NAPI.patch
Description: mips-sb1250-mac-NAPI.patch


sb1250mac_skb_cache.patch
Description: sb1250mac_skb_cache.patch


Re: [PATCH] bridge: flush forwarding table when device carrier off

2006-10-12 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 11:24:31 -0700

> Flush the forwarding table when carrier is lost. This helps for
> availability because we don't want to forward to a downed device and
> new packets may come in on other links.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
 ...
> + if (f->is_static & !do_all)
> + continue;

Applied with "&" changed to "&&" as mentioned elsewhere :)


Re: [PATCH] bridge: flush forwarding table when device carrier off

2006-10-12 Thread jamal
On Thu, 2006-12-10 at 14:32 -0700, Stephen Hemminger wrote:

> > I am on the other extreme - this is problematic if you have a large
> > table already learnt. Aggravate that with an unstable link and it gets a
> > lot worse. Both of which don't sound unrealistic in say a wireless AP.
> 
> We don't support bridging wireless, that requires some NDS stuff that
> isn't supported, and requires more softmac than the stack has.
> 

I was more thinking of wireless-to-ethernet bridging; that should still
work, no? i.e say eth1 on wireless with eth0 on the wired side? 
In any case, that may be a bad example (and a digression) of something
that learns large tables. I have however seen 1K entries in bridging.

> > A more sane policy i have seen is a timer that flushes the table after a
> > programmed period; this way you counter a flipflop-ing link.
> 
> That's already there.
> 

ah, ok. So the patch is an alternative to this then?

> > IOW, the best place is to have this in some user space daemon. If it has
> > to be in the kernel, can you add a sysctl to disable it?
> > 
> 
> When RSTP is in userspace, it will do the flushing.

Cool. And that makes a lot of sense.

cheers,
jamal



Re: [PATCH] bridge: flush forwarding table when device carrier off

2006-10-12 Thread Stephen Hemminger
On Thu, 12 Oct 2006 17:30:33 -0400
jamal <[EMAIL PROTECTED]> wrote:

> On Thu, 2006-12-10 at 16:10 -0400, Andy Gospodarek wrote:
> > On Thu, Oct 12, 2006 at 11:24:31AM -0700, Stephen Hemminger wrote:
> > > Flush the forwarding table when carrier is lost. This helps for
> > > availability because we don't want to forward to a downed device and
> > > new packets may come in on other links.
> > > 
> > > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> > > 
> > 
> > Stephen,
> > 
> > This is an excellent idea 
> 
> I am on the other extreme - this is problematic if you have a large
> table already learnt. Aggravate that with an unstable link and it gets a
> lot worse. Both of which don't sound unrealistic in say a wireless AP.

We don't support bridging wireless, that requires some NDS stuff that
isn't supported, and requires more softmac than the stack has.

> A more sane policy i have seen is a timer that flushes the table after a
> programmed period; this way you counter a flipflop-ing link.

That's already there.

> IOW, the best place is to have this in some user space daemon. If it has
> to be in the kernel, can you add a sysctl to disable it?
> 

When RSTP is in userspace, it will do the flushing.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [PATCH] bridge: flush forwarding table when device carrier off

2006-10-12 Thread jamal
On Thu, 2006-12-10 at 16:10 -0400, Andy Gospodarek wrote:
> On Thu, Oct 12, 2006 at 11:24:31AM -0700, Stephen Hemminger wrote:
> > Flush the forwarding table when carrier is lost. This helps for
> > availability because we don't want to forward to a downed device and
> > new packets may come in on other links.
> > 
> > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> > 
> 
> Stephen,
> 
> This is an excellent idea 

I am on the other extreme - this is problematic if you have a large
table already learnt. Aggravate that with an unstable link and it gets a
lot worse. Both of which don't sound unrealistic in, say, a wireless AP.

A more sane policy I have seen is a timer that flushes the table after a
programmed period; this way you counter a flip-flopping link.
IOW, the best place to have this is in some user space daemon. If it has
to be in the kernel, can you add a sysctl to disable it?

cheers,
jamal





Re: [PATCH] e1000: Real time packets and bytes statistics

2006-10-12 Thread Stephen Hemminger
On Thu, 12 Oct 2006 23:02:52 +0200
Jean Delvare <[EMAIL PROTECTED]> wrote:

> Hi Stephen,
> 
> On 10/11/06, Stephen Hemminger wrote:
> > On Wed, 11 Oct 2006, Jesse Brandeburg wrote:
> > > On 10/11/06, Jean Delvare wrote:
> > > > Let the e1000 driver report the most important statistics (rx/tx_bytes
> > > > and rx/tx_packets) in real time, rather than every other second. This
> > > > is similar to what the e100 driver is doing.
> > > > (...)
> > > > --- linux-2.6.19-rc1.orig/drivers/net/e1000/e1000_main.c
> > > > 2006-10-11 10:53:49.0 +0200
> > > > +++ linux-2.6.19-rc1/drivers/net/e1000/e1000_main.c 2006-10-11 
> > > > 11:34:41.0 +0200
> > > > @@ -3118,6 +3118,8 @@
> > > >e1000_tx_map(adapter, tx_ring, skb, first,
> > > > max_per_txd, nr_frags, mss));
> > > >
> > > > +   adapter->net_stats.tx_packets++;
> > > > +   adapter->net_stats.tx_bytes += skb->len;
> > > > netdev->trans_start = jiffies;
> > > 
> > > this is the part I'm most worried about.  as I believe it to be
> > > incorrect for TSO packets.  Maybe something like?
> > > +   if (skb_shinfo(skb)->gso_segs)
> > > +  adapter->net_stats.tx_packets += skb_shinfo(skb)->gso_segs;
> > > +   else
> > > +  adapter->net_stats.tx_packets++;
> > > +   adapter->net_stats.tx_bytes += skb->len;
> > > netdev->trans_start = jiffies;
> > > 
> > > skb len will still be off by some amount, because the skb->data
> > > (header) is replicated across each gso segment but only counted once
> > > this way, but hopefully someone will pipe up with a good way to
> > > compute that.
> > 
> > You might want to put the tx values in a per-cpu structure and
> > sum later.  Incrementing statistics can actually be a performance
> > bottleneck on SMP tests, because it causes lots of cache thrashing.
> 
> I don't really see how this would be implemented. Can you please point
> me to other drivers which do it that way?
> 
> Thanks,

Loopback (drivers/net/loopback.c) does it, but it is simpler since it doesn't
have to support multiple interfaces. In a normal driver you would have to use
indirection and alloc_percpu() like af_inet.c does.
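
As a rough user-space model of the per-CPU counting idea (not driver
code), the sketch below keeps one counter slot per CPU and sums the
slots only when statistics are read. MODEL_NR_CPUS, tx_stats_model,
tx_account and tx_total are hypothetical names, and a fixed array
stands in for what alloc_percpu() would provide in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4   /* hypothetical fixed CPU count for the model */

struct tx_counters {
    uint64_t packets;
    uint64_t bytes;
};

/* One slot per CPU; in a kernel driver each slot would live in per-CPU
 * memory so that writers on different CPUs never share a cache line. */
struct tx_stats_model {
    struct tx_counters cpu[MODEL_NR_CPUS];
};

/* Hot path: touch only the slot of the CPU doing the transmit. */
void tx_account(struct tx_stats_model *s, unsigned int cpu, uint32_t len)
{
    assert(cpu < MODEL_NR_CPUS);
    s->cpu[cpu].packets++;
    s->cpu[cpu].bytes += len;
}

/* Slow path: sum all slots when the statistics are requested. */
struct tx_counters tx_total(const struct tx_stats_model *s)
{
    struct tx_counters sum = { 0, 0 };

    for (size_t i = 0; i < MODEL_NR_CPUS; i++) {
        sum.packets += s->cpu[i].packets;
        sum.bytes += s->cpu[i].bytes;
    }
    return sum;
}
```

The trade-off in this sketch is that reads become more expensive (one
pass over all slots) in exchange for writes that stay local to a CPU.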

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [PATCH] e1000: Real time packets and bytes statistics

2006-10-12 Thread Jean Delvare
Hi Jesse,

On 10/11/06, Jesse Brandeburg wrote:
> On 10/11/06, Jean Delvare wrote:
> > Let the e1000 driver report the most important statistics (rx/tx_bytes
> > and rx/tx_packets) in real time, rather than every other second. This
> > is similar to what the e100 driver is doing.
> >
> > The current asynchronous statistics refresh model makes it impossible
> > to monitor the network traffic with an interval which isn't a multiple
> > of 2 seconds. For example, an interval of 5 seconds would result in a
> > sawtooth diagram (+20%, -20%) for a constant transfer rate. With a 1
> > second interval it's even worse (0, 200%) of course. This has been
> > annoying users for years, but was never actually fixed:
> 
> I think the idea is good, however, see below.

Good news :)
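
The sawtooth from the quoted description can be reproduced with a small
model, assuming a constant transfer rate and a counter that becomes
visible to user space only at multiples of the refresh period. The
function names visible_bytes and observed_delta are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Counter value visible to user space at time t (in seconds) when the
 * driver refreshes it only every `refresh` seconds and traffic flows at
 * a constant `rate` bytes per second. */
uint64_t visible_bytes(uint64_t t, uint64_t refresh, uint64_t rate)
{
    assert(refresh > 0);
    return rate * (t - t % refresh);
}

/* Bytes a monitoring tool observes in the sampling window ending at t. */
uint64_t observed_delta(uint64_t t, uint64_t interval,
                        uint64_t refresh, uint64_t rate)
{
    assert(t >= interval);
    return visible_bytes(t, refresh, rate) -
           visible_bytes(t - interval, refresh, rate);
}
```

With a 2-second refresh and a 5-second sampling interval at 100
bytes/s, consecutive windows see 400, 600, 400 bytes instead of 500
each, i.e. the -20%/+20% pattern. With a 1-second interval the windows
alternate between 0 and 200 bytes.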

> > I additionally noted a difference of 6 bytes on some TX frames, which
> > I am not able to explain. It's probably small and rare enough not to
> > be considered a problem, but if someone can explain it, I would be
> > grateful.
> 
> now, that sounds odd, however, once again, see below.

What you say below about TSO can't possibly explain this difference, as
your fix is about tx_packets while the difference I observed was on
tx_bytes only, the packet count was always correct. I'll investigate
tomorrow, if I can find a pattern for these differences I might discover
what these bytes are. For now the only idea I have is that 6 bytes is
ETH_ALEN, the size of an ethernet MAC address - but that doesn't
explain anything per se.

> >  drivers/net/e1000/e1000_main.c |   14 ++
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > --- linux-2.6.19-rc1.orig/drivers/net/e1000/e1000_main.c2006-10-11 
> > 10:53:49.0 +0200
> > +++ linux-2.6.19-rc1/drivers/net/e1000/e1000_main.c 2006-10-11 
> > 11:34:41.0 +0200
> > @@ -3118,6 +3118,8 @@
> >e1000_tx_map(adapter, tx_ring, skb, first,
> > max_per_txd, nr_frags, mss));
> >
> > +   adapter->net_stats.tx_packets++;
> > +   adapter->net_stats.tx_bytes += skb->len;
> > netdev->trans_start = jiffies;
> 
> this is the part I'm most worried about.  as I believe it to be
> incorrect for TSO packets.  Maybe something like?

I have to admit I have very little experience with network drivers and
I didn't have the slightest idea what TSO was until I looked into
wikipedia two minutes ago. So I seem to understand that the skb
structure in the code above could correspond to several packets sent on
the wire by the ethernet adapter when TSO is used? Seems to be very
recent, 2.6.16 didn't have that.

> +   if (skb_shinfo(skb)->gso_segs)
> +  adapter->net_stats.tx_packets += skb_shinfo(skb)->gso_segs;
> +   else
> +  adapter->net_stats.tx_packets++;
> +   adapter->net_stats.tx_bytes += skb->len;
> netdev->trans_start = jiffies;

My comparisons between hardware and software-computed statistics did
not reveal any difference with regards to tx_packets, while there
should have been one if the change above is needed. This suggests that
my tests (run on 2.6.18) did not trigger any TSO packet? Can you
suggest a way to generate such packets so that I am sure I exercise
this code path?

I found an inline function in include/net/tcp.h, tcp_skb_pcount(),
which evaluates to skb_shinfo(skb)->gso_segs. I guess I should use that
instead of the above?

> skb len will still be off by some amount, because the skb->data
> (header) is replicated across each gso segment but only counted once
> this way, but hopefully someone will pipe up with a good way to
> compute that.

And nearby there is another inline function, tcp_skb_mss(), which
evaluates to skb_shinfo(skb)->gso_size... I can't experiment with that
until I know a way to trigger TSO packets, but could it be the size
we're after?
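
To make the accounting question concrete, here is a user-space sketch
of one possible way to estimate on-wire packets and bytes for a
segmented buffer. It assumes every segment carries an identical copy of
the headers and ignores padding and the Ethernet FCS. The names
wire_count and tx_wire_count are hypothetical; hdr_len is an assumed
input that the driver would have to determine itself.

```c
#include <assert.h>
#include <stdint.h>

struct wire_count {
    uint32_t packets;
    uint64_t bytes;
};

/* skb_len:  length of the buffer handed to the driver (headers counted once)
 * hdr_len:  length of the headers replicated into every segment
 * gso_segs: number of segments, 0 meaning "not a segmented buffer" */
struct wire_count tx_wire_count(uint32_t skb_len, uint32_t hdr_len,
                                uint32_t gso_segs)
{
    struct wire_count wc;

    assert(hdr_len <= skb_len);
    wc.packets = gso_segs ? gso_segs : 1;
    /* Every segment after the first carries one extra copy of the headers. */
    wc.bytes = (uint64_t)skb_len + (uint64_t)(wc.packets - 1) * hdr_len;
    return wc;
}
```

For example, a 5894-byte buffer (54 bytes of headers plus 4 x 1460
bytes of payload) split into 4 segments gives 4 packets and
5894 + 3 * 54 = 6056 bytes, which equals 4 frames of 1514 bytes.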

Thanks a lot for your guidance.

-- 
Jean Delvare


Re: [PATCH] e1000: Real time packets and bytes statistics

2006-10-12 Thread Jean Delvare
Hi Stephen,

On 10/11/06, Stephen Hemminger wrote:
> On Wed, 11 Oct 2006, Jesse Brandeburg wrote:
> > On 10/11/06, Jean Delvare wrote:
> > > Let the e1000 driver report the most important statistics (rx/tx_bytes
> > > and rx/tx_packets) in real time, rather than every other second. This
> > > is similar to what the e100 driver is doing.
> > > (...)
> > > --- linux-2.6.19-rc1.orig/drivers/net/e1000/e1000_main.c
> > > 2006-10-11 10:53:49.0 +0200
> > > +++ linux-2.6.19-rc1/drivers/net/e1000/e1000_main.c 2006-10-11 
> > > 11:34:41.0 +0200
> > > @@ -3118,6 +3118,8 @@
> > >e1000_tx_map(adapter, tx_ring, skb, first,
> > > max_per_txd, nr_frags, mss));
> > >
> > > +   adapter->net_stats.tx_packets++;
> > > +   adapter->net_stats.tx_bytes += skb->len;
> > > netdev->trans_start = jiffies;
> > 
> > this is the part I'm most worried about.  as I believe it to be
> > incorrect for TSO packets.  Maybe something like?
> > +   if (skb_shinfo(skb)->gso_segs)
> > +  adapter->net_stats.tx_packets += skb_shinfo(skb)->gso_segs;
> > +   else
> > +  adapter->net_stats.tx_packets++;
> > +   adapter->net_stats.tx_bytes += skb->len;
> > netdev->trans_start = jiffies;
> > 
> > skb len will still be off by some amount, because the skb->data
> > (header) is replicated across each gso segment but only counted once
> > this way, but hopefully someone will pipe up with a good way to
> > compute that.
> 
> You might want to put the tx values in a per-cpu structure and
> sum later.  Incrementing statistics can actually be a performance
> bottleneck on SMP tests, because it causes lots of cache thrashing.

I don't really see how this would be implemented. Can you please point
me to other drivers which do it that way?

Thanks,
-- 
Jean Delvare


Re: [PATCH] bridge: flush forwarding table when device carrier off

2006-10-12 Thread Stephen Hemminger
On Thu, 12 Oct 2006 16:10:44 -0400
Andy Gospodarek <[EMAIL PROTECTED]> wrote:

> On Thu, Oct 12, 2006 at 11:24:31AM -0700, Stephen Hemminger wrote:
> > Flush the forwarding table when carrier is lost. This helps for
> > availability because we don't want to forward to a downed device and
> > new packets may come in on other links.
> > 
> > Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> > 
> 
> Stephen,
> 
> This is an excellent idea and all looks good except this check
> 
> + if (f->is_static & !do_all)
> + continue;
> 
> should be this: 
> 
> + if (f->is_static && !do_all)
> + continue;
> 

Agreed, but it probably worked during testing because both flags
are strict booleans (ie 0/1)
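
A small stand-alone comparison of the two forms, for illustration. The
function names are hypothetical and plain int is assumed for both
flags. For values restricted to 0/1 the results agree; they differ as
soon as is_static holds any other nonzero value.

```c
#include <assert.h>

/* Bitwise form, as in the originally posted patch. */
int skip_bitwise(int is_static, int do_all)
{
    return is_static & !do_all;
}

/* Logical form, as suggested in the review. */
int skip_logical(int is_static, int do_all)
{
    return is_static && !do_all;
}

/* For strict booleans (0 or 1) the two forms always agree. */
void check_boolean_inputs(void)
{
    for (int s = 0; s <= 1; s++)
        for (int a = 0; a <= 1; a++)
            assert(skip_bitwise(s, a) == skip_logical(s, a));
}
```

With is_static = 2 and do_all = 0, the bitwise form yields 2 & 1 = 0
while the logical form yields 1, so the "&&" version is the one that
does not depend on how the flag is encoded.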


-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Pull request for 'jg-20061012-00' tag

2006-10-12 Thread Francois Romieu
Please pull from tag 'jg-20061012-00' in repository

git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6.git jg-20061012-00

to get the changes below.

Distance from 'upstream-fixes'
------------------------------

733b736c91dd2c556f35dffdcf77e667cf10cefc
73f5e28b336772c4b08ee82e5bf28ab872898ee1

Diffstat
--------

 drivers/net/r8169.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

Shortlog
--------

Andrew Morton:
  r8169: PCI ID for Corega Gigabit network card

Arnaud Patard:
  r8169: fix infinite loop during hotplug

Patch
-----

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 4c47c5b..c2c9a86 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -214,6 +214,7 @@ static struct pci_device_id rtl8169_pci_
{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK, 0x8168), 0, 0, RTL_CFG_2 },
{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK, 0x8169), 0, 0, RTL_CFG_0 },
{ PCI_DEVICE(PCI_VENDOR_ID_DLINK,   0x4300), 0, 0, RTL_CFG_0 },
+   { PCI_DEVICE(0x1259,0xc107), 0, 0, RTL_CFG_0 },
{ PCI_DEVICE(0x16ec,0x0116), 0, 0, RTL_CFG_0 },
{ PCI_VENDOR_ID_LINKSYS,0x1032,
PCI_ANY_ID, 0x0024, 0, 0, RTL_CFG_0 },
@@ -2701,6 +2702,7 @@ static void rtl8169_down(struct net_devi
struct rtl8169_private *tp = netdev_priv(dev);
void __iomem *ioaddr = tp->mmio_addr;
unsigned int poll_locked = 0;
+   unsigned int intrmask;
 
rtl8169_delete_timer(dev);
 
@@ -2739,8 +2741,11 @@ core_down:
 * 2) dev->change_mtu
 *-> rtl8169_poll can not be issued again and re-enable the
 *   interruptions. Let's simply issue the IRQ down sequence again.
+*
+* No loop if hotpluged or major error (0x).
 */
-   if (RTL_R16(IntrMask))
+   intrmask = RTL_R16(IntrMask);
+   if (intrmask && (intrmask != 0x))
goto core_down;
 
rtl8169_tx_clear(tp);
-- 
Ueimor


[PATCH] [NET] reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire()

2006-10-12 Thread Eric Dumazet

Hi David

Please find this patch against include/net/inetpeer.h and net/ipv4/inetpeer.c

1) Shrink struct inet_peer on 64-bit platforms
----------------------------------------------
I noticed sizeof(struct inet_peer) was 64+8 on x86_64.

As we don't really need 64-bit timestamps (we only care about garbage
collection), we can use 32-bit ones and reduce sizeof(struct inet_peer) to 64
bytes. Because of the SLAB_HWCACHE_ALIGN constraint, the final allocation is
64 bytes instead of 128 bytes per inet_peer structure.


2) Cleanup
----------
inet_putpeer() is no longer inlined in inetpeer.h, to reduce text size, as it
is not called in fast paths. Some exports are no longer needed
(inet_peer_unused_lock, inet_peer_unused_tailp) and can be declared static.


3) No more hard limit (PEER_MAX_CLEANUP_WORK = 30)
--------------------------------------------------
peer_check_expire() now tries to delete entries for at most one timer tick.
CPUs are getting faster, so hard limits are becoming useless. A similar thing
is done in the net/ipv4/route.c garbage collector.


Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>

--- linux-2.6.18/include/net/inetpeer.h Wed Sep 20 05:42:06 2006
+++ linux-2.6.18-ed/include/net/inetpeer.h  Thu Oct 12 21:40:28 2006
@@ -19,7 +19,7 @@
 {
struct inet_peer *avl_left, *avl_right;
struct inet_peer *unused_next, **unused_prevp;
-   unsigned long   dtime;  /* the time of last use of not
+   __u32   dtime;  /* the time of last use of not
 * referenced entries */
atomic_t refcnt;
__u32   v4daddr;    /* peer's address */
@@ -35,21 +35,8 @@
 /* can be called with or without local BH being disabled */
 struct inet_peer   *inet_getpeer(__u32 daddr, int create);
 
-extern spinlock_t inet_peer_unused_lock;
-extern struct inet_peer **inet_peer_unused_tailp;
 /* can be called from BH context or outside */
-static inline void inet_putpeer(struct inet_peer *p)
-{
-   spin_lock_bh(&inet_peer_unused_lock);
-   if (atomic_dec_and_test(&p->refcnt)) {
-   p->unused_prevp = inet_peer_unused_tailp;
-   p->unused_next = NULL;
-   *inet_peer_unused_tailp = p;
-   inet_peer_unused_tailp = &p->unused_next;
-   p->dtime = jiffies;
-   }
-   spin_unlock_bh(&inet_peer_unused_lock);
-}
+extern void inet_putpeer(struct inet_peer *p);
 
 extern spinlock_t inet_peer_idlock;
 /* can be called with or without local BH being disabled */
--- linux-2.6.18/net/ipv4/inetpeer.c    Wed Sep 20 05:42:06 2006
+++ linux-2.6.18-ed/net/ipv4/inetpeer.c Thu Oct 12 21:55:23 2006
@@ -94,10 +94,8 @@
 int inet_peer_maxttl = 10 * 60 * HZ;   /* usual time to live: 10 min */
 
 static struct inet_peer *inet_peer_unused_head;
-/* Exported for inet_putpeer inline function.  */
-struct inet_peer **inet_peer_unused_tailp = &inet_peer_unused_head;
-DEFINE_SPINLOCK(inet_peer_unused_lock);
-#define PEER_MAX_CLEANUP_WORK 30
+static struct inet_peer **inet_peer_unused_tailp = &inet_peer_unused_head;
+static DEFINE_SPINLOCK(inet_peer_unused_lock);
 
 static void peer_check_expire(unsigned long dummy);
 static DEFINE_TIMER(peer_periodic_timer, peer_check_expire, 0, 0);
@@ -343,7 +341,8 @@
spin_lock_bh(&inet_peer_unused_lock);
p = inet_peer_unused_head;
if (p != NULL) {
-   if (time_after(p->dtime + ttl, jiffies)) {
+   __u32 delta = (__u32)jiffies - p->dtime;
+   if (delta < ttl) {
/* Do not prune fresh entries. */
spin_unlock_bh(&inet_peer_unused_lock);
return -1;
@@ -435,7 +434,7 @@
 /* Called with local BH disabled. */
 static void peer_check_expire(unsigned long dummy)
 {
-   int i;
+   unsigned long now = jiffies;
int ttl;
 
if (peer_total >= inet_peer_threshold)
@@ -444,7 +443,10 @@
ttl = inet_peer_maxttl
- (inet_peer_maxttl - inet_peer_minttl) / HZ *
peer_total / inet_peer_threshold * HZ;
-   for (i = 0; i < PEER_MAX_CLEANUP_WORK && !cleanup_once(ttl); i++);
+   while (!cleanup_once(ttl)) {
+   if (jiffies != now)
+   break;
+   }
 
/* Trigger the timer after inet_peer_gc_mintime .. inet_peer_gc_maxtime
 * interval depending on the total number of entries (more entries,
@@ -458,3 +460,16 @@
peer_total / inet_peer_threshold * HZ;
add_timer(&peer_periodic_timer);
 }
+
+void inet_putpeer(struct inet_peer *p)
+{
+   spin_lock_bh(&inet_peer_unused_lock);
+   if (atomic_dec_and_test(&p->refcnt)) {
+   p->unused_prevp = inet_peer_unused_tailp;
+   p->unused_next = NULL;
+   *inet_peer_unused_tailp = p;
+   inet_

Re: [PATCH] bridge: flush forwarding table when device carrier off

2006-10-12 Thread Andy Gospodarek
On Thu, Oct 12, 2006 at 11:24:31AM -0700, Stephen Hemminger wrote:
> Flush the forwarding table when carrier is lost. This helps for
> availability because we don't want to forward to a downed device and
> new packets may come in on other links.
> 
> Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
> 

Stephen,

This is an excellent idea and all looks good except this check

+   if (f->is_static & !do_all)
+   continue;

should be this: 

+   if (f->is_static && !do_all)
+   continue;

I'll ACK a repost with that change.
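
For reference, a minimal sketch of why the two forms are not interchangeable. The helper names are hypothetical, and whether is_static can ever hold a value other than 0 or 1 is an assumption, not something stated in the patch. If it only ever holds 0 or 1 the two checks agree; the logical form simply does not depend on that.

```c
#include <stdbool.h>

/* Bitwise form from the posted patch: ANDs the raw value with 0 or 1. */
static bool skip_bitwise(int is_static, int do_all)
{
    return is_static & !do_all;
}

/* Logical form suggested in the review: any non-zero is_static counts. */
static bool skip_logical(int is_static, int do_all)
{
    return is_static && !do_all;
}
```

With is_static == 2 and do_all == 0, the bitwise form evaluates 2 & 1, which is 0, so the static entry would not be skipped; the logical form skips it.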

-andy




Re: user of the jiffies rounding code: Networking

2006-10-12 Thread Auke Kok

Arjan van de Ven wrote:

From: Arjan van de Ven <[EMAIL PROTECTED]>
Subject: round_jiffies users
CC: [EMAIL PROTECTED]
CC: netdev@vger.kernel.org

This patch introduces users of the round_jiffies() function in the
networking code.

These timers all were of the "about once a second" or "about once every
X seconds" variety and several showed up in the "what wakes the cpu up"
profiles that the tickless patches provide. Some timers are highly
dynamic based on network load; but even on low activity systems they
still show up, so the rounding is done only in cases of low activity,
allowing higher frequency timers in the high activity case.

The various hardware watchdogs are an obvious case; they run every 2
seconds but aren't otherwise particular about exactly when they need to
run.

Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>

Index: linux-2.6.19-rc1-git6/net/core/dst.c
===
--- linux-2.6.19-rc1-git6.orig/net/core/dst.c
+++ linux-2.6.19-rc1-git6/net/core/dst.c
@@ -99,7 +99,14 @@ static void dst_run_gc(unsigned long dum
printk("dst_total: %d/%d %ld\n",
   atomic_read(&dst_total), delayed,  dst_gc_timer_expires);
 #endif
-   mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires);
+   /* if the next desired timer is more than 4 seconds in the future
+* then round the timer to whole seconds
+*/
+   if (dst_gc_timer_expires > 4*HZ)
+   mod_timer(&dst_gc_timer,
+   round_jiffies(jiffies + dst_gc_timer_expires));
+   else
+   mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires);
 
 out:
spin_unlock(&dst_lock);
Index: linux-2.6.19-rc1-git6/net/core/neighbour.c
===
--- linux-2.6.19-rc1-git6.orig/net/core/neighbour.c
+++ linux-2.6.19-rc1-git6/net/core/neighbour.c
@@ -695,7 +695,10 @@ next_elt:
if (!expire)
expire = 1;
 
-   mod_timer(&tbl->gc_timer, now + expire);
+   if (expire>HZ)
+   mod_timer(&tbl->gc_timer, round_jiffies(now + expire));
+   else
+   mod_timer(&tbl->gc_timer, now + expire);
 
 	write_unlock(&tbl->lock);
 }
Index: linux-2.6.19-rc1-git6/net/sched/sch_generic.c
===
--- linux-2.6.19-rc1-git6.orig/net/sched/sch_generic.c
+++ linux-2.6.19-rc1-git6/net/sched/sch_generic.c
@@ -209,7 +209,7 @@ static void dev_watchdog(unsigned long a
   dev->name);
dev->tx_timeout(dev);
}
-   if (!mod_timer(&dev->watchdog_timer, jiffies + dev->watchdog_timeo))
+   if (!mod_timer(&dev->watchdog_timer, round_jiffies(jiffies + dev->watchdog_timeo)))
dev_hold(dev);
}
}
Index: linux-2.6.19-rc1-git6/drivers/net/e1000/e1000_main.c
===
--- linux-2.6.19-rc1-git6.orig/drivers/net/e1000/e1000_main.c
+++ linux-2.6.19-rc1-git6/drivers/net/e1000/e1000_main.c
@@ -483,7 +483,7 @@ e1000_up(struct e1000_adapter *adapter)
 
 	clear_bit(__E1000_DOWN, &adapter->flags);
 
-   mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ);
+   mod_timer(&adapter->watchdog_timer, round_jiffies(jiffies + 2 * HZ));
return 0;
 }
 
@@ -2493,7 +2493,7 @@ e1000_watchdog(unsigned long data)
 
 			netif_carrier_on(netdev);
netif_wake_queue(netdev);
-   mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ);
+   mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ));
adapter->smartspeed = 0;
}
} else {
@@ -2503,7 +2503,7 @@ e1000_watchdog(unsigned long data)
DPRINTK(LINK, INFO, "NIC Link is Down\n");
netif_carrier_off(netdev);
netif_stop_queue(netdev);
-   mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ);
+   mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ));
 
 			/* 80003ES2LAN workaround--
 * For packet buffer work-around on link down event;
@@ -2568,7 +2568,7 @@ e1000_watchdog(unsigned long data)
e1000_rar_set(&adapter->hw, adapter->hw.mac_addr, 0);
 
 	/* Reset the timer */
-   mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ);
+   mod_timer(&adapter->watchdog_timer, round_jiffies(jiffies + 2 * HZ));
 }
 
 #define E1000_TX_FLAGS_CSUM		0x0001





For the e1000 parts, but in general too:

Acked-by: Auke Kok <[EMAIL PROTECTED]>

Cheers

Re: Dropping NETIF_F_SG since no checksum feature.

2006-10-12 Thread Michael S. Tsirkin
Quoting r. David Miller <[EMAIL PROTECTED]>:
> Subject: Re: Dropping NETIF_F_SG since no checksum feature.
> 
> From: "Michael S. Tsirkin" <[EMAIL PROTECTED]>
> Date: Wed, 11 Oct 2006 23:23:39 +0200
> 
> > With my patch, there is a huge performance gain by increasing MTU to 64K.
> > And it seems the only way to do this is by S/G.
> 
> Numbers?
> 

I created two subnets on top of the same pair of InfiniBand HCAs:

[EMAIL PROTECTED] ~]# ifconfig ib0
ib0   Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
  inet addr:12.4.3.69  Bcast:12.255.255.255  Mask:255.0.0.0
  inet6 addr: fe80::202:c902:20:ee45/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
  RX packets:1382531 errors:0 dropped:0 overruns:0 frame:0
  TX packets:2725206 errors:0 dropped:5 overruns:0 carrier:0
  collisions:0 txqueuelen:128
  RX bytes:71892772 (68.5 MiB)  TX bytes:5290011992 (4.9 GiB)

[EMAIL PROTECTED] ~]# ifconfig ibc0
ibc0  Link encap:UNSPEC  HWaddr 00-03-04-06-FE-80-00-00-00-00-00-00-00-00-00-00
  inet addr:11.4.3.69  Bcast:11.255.255.255  Mask:255.0.0.0
  inet6 addr: fe80::202:c902:20:ee45/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:65484  Metric:1
  RX packets:115647 errors:0 dropped:0 overruns:0 frame:0
  TX packets:253403 errors:0 dropped:4 overruns:0 carrier:0
  collisions:0 txqueuelen:128
  RX bytes:6014720 (5.7 MiB)  TX bytes:16589589008 (15.4 GiB)

The other side was configured with 12.4.3.68 for MTU 2044
and 11.4.3.68 for MTU 65484.

And then I just run netperf:
[EMAIL PROTECTED] ~]#
[EMAIL PROTECTED] ~]# /mswg/work/mst/netperf-2.4.2/src/netperf -f M -H 12.4.3.68 -c -C
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 12.4.3.68 (12.4.3.68) port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    MBytes  /s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00       286.45   40.20    25.28    5.482   3.448

[EMAIL PROTECTED] ~]# /mswg/work/mst/netperf-2.4.2/src/netperf -f M -H 11.4.3.68
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 11.4.3.68 (11.4.3.68) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    MBytes/sec

 87380  16384  16384    10.01     782.55

This is all very preliminary - but I hope you get the idea -
increasing MTU is very helpful for infiniband, and infiniband adapters
handle large S/G lists without problems, but the verbs
do not include support for IP checksums, so these must be done in software.

So what we would like, is for the infiniband network device to say
"I don't support checksums, I only support S/G" and then for
network layer to do the checksumming for us piggybacking on data copy
at least for cases where it does perform the copy.

Does this make sense now?

-- 
MST


user of the jiffies rounding code: Networking

2006-10-12 Thread Arjan van de Ven
From: Arjan van de Ven <[EMAIL PROTECTED]>
Subject: round_jiffies users
CC: [EMAIL PROTECTED]
CC: netdev@vger.kernel.org

This patch introduces users of the round_jiffies() function in the
networking code.

These timers all were of the "about once a second" or "about once every
X seconds" variety and several showed up in the "what wakes the cpu up"
profiles that the tickless patches provide. Some timers are highly
dynamic based on network load; but even on low activity systems they
still show up, so the rounding is done only in cases of low activity,
allowing higher frequency timers in the high activity case.

The various hardware watchdogs are an obvious case; they run every 2
seconds but aren't otherwise particular about exactly when they need to
run.
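
To illustrate the "round only when the timer is far enough out" idea described above, here is a simplified userspace model. The helper names, the HZ value and the quarter-second cutoff are assumptions made for illustration; this is a sketch of the concept, not the kernel's round_jiffies() implementation.

```c
#define HZ 250UL  /* assumed tick rate, for illustration only */

/*
 * Hypothetical model of rounding a tick count to a whole second:
 * round down when just past a second boundary, otherwise round up,
 * and never return a value that is not in the future.
 */
static unsigned long round_to_second(unsigned long j, unsigned long now)
{
    unsigned long rem = j % HZ;
    unsigned long rounded = (rem < HZ / 4) ? j - rem : j - rem + HZ;

    return rounded > now ? rounded : j;
}

/*
 * Mirrors the dst.c hunk: only long delays (more than 4 seconds) are
 * rounded, short ones keep their exact expiry.
 */
static unsigned long next_expiry(unsigned long now, unsigned long delay)
{
    unsigned long expires = now + delay;

    if (delay > 4 * HZ)
        return round_to_second(expires, now);
    return expires;
}
```

With now = 1003 and a 5 second delay, the exact expiry of 2253 ticks becomes 2250, so timers armed at slightly different moments land on the same tick and can share one wakeup.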

Signed-off-by: Arjan van de Ven <[EMAIL PROTECTED]>

Index: linux-2.6.19-rc1-git6/net/core/dst.c
===
--- linux-2.6.19-rc1-git6.orig/net/core/dst.c
+++ linux-2.6.19-rc1-git6/net/core/dst.c
@@ -99,7 +99,14 @@ static void dst_run_gc(unsigned long dum
printk("dst_total: %d/%d %ld\n",
   atomic_read(&dst_total), delayed,  dst_gc_timer_expires);
 #endif
-   mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires);
+   /* if the next desired timer is more than 4 seconds in the future
+* then round the timer to whole seconds
+*/
+   if (dst_gc_timer_expires > 4*HZ)
+   mod_timer(&dst_gc_timer,
+   round_jiffies(jiffies + dst_gc_timer_expires));
+   else
+   mod_timer(&dst_gc_timer, jiffies + dst_gc_timer_expires);
 
 out:
spin_unlock(&dst_lock);
Index: linux-2.6.19-rc1-git6/net/core/neighbour.c
===
--- linux-2.6.19-rc1-git6.orig/net/core/neighbour.c
+++ linux-2.6.19-rc1-git6/net/core/neighbour.c
@@ -695,7 +695,10 @@ next_elt:
if (!expire)
expire = 1;
 
-   mod_timer(&tbl->gc_timer, now + expire);
+   if (expire>HZ)
+   mod_timer(&tbl->gc_timer, round_jiffies(now + expire));
+   else
+   mod_timer(&tbl->gc_timer, now + expire);
 
write_unlock(&tbl->lock);
 }
Index: linux-2.6.19-rc1-git6/net/sched/sch_generic.c
===
--- linux-2.6.19-rc1-git6.orig/net/sched/sch_generic.c
+++ linux-2.6.19-rc1-git6/net/sched/sch_generic.c
@@ -209,7 +209,7 @@ static void dev_watchdog(unsigned long a
   dev->name);
dev->tx_timeout(dev);
}
-   if (!mod_timer(&dev->watchdog_timer, jiffies + dev->watchdog_timeo))
+   if (!mod_timer(&dev->watchdog_timer, round_jiffies(jiffies + dev->watchdog_timeo)))
dev_hold(dev);
}
}
Index: linux-2.6.19-rc1-git6/drivers/net/e1000/e1000_main.c
===
--- linux-2.6.19-rc1-git6.orig/drivers/net/e1000/e1000_main.c
+++ linux-2.6.19-rc1-git6/drivers/net/e1000/e1000_main.c
@@ -483,7 +483,7 @@ e1000_up(struct e1000_adapter *adapter)
 
clear_bit(__E1000_DOWN, &adapter->flags);
 
-   mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ);
+   mod_timer(&adapter->watchdog_timer, round_jiffies(jiffies + 2 * HZ));
return 0;
 }
 
@@ -2493,7 +2493,7 @@ e1000_watchdog(unsigned long data)
 
netif_carrier_on(netdev);
netif_wake_queue(netdev);
-   mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ);
+   mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ));
adapter->smartspeed = 0;
}
} else {
@@ -2503,7 +2503,7 @@ e1000_watchdog(unsigned long data)
DPRINTK(LINK, INFO, "NIC Link is Down\n");
netif_carrier_off(netdev);
netif_stop_queue(netdev);
-   mod_timer(&adapter->phy_info_timer, jiffies + 2 * HZ);
+   mod_timer(&adapter->phy_info_timer, round_jiffies(jiffies + 2 * HZ));
 
/* 80003ES2LAN workaround--
 * For packet buffer work-around on link down event;
@@ -2568,7 +2568,7 @@ e1000_watchdog(unsigned long data)
e1000_rar_set(&adapter->hw, adapter->hw.mac_addr, 0);
 
/* Reset the timer */
-   mod_timer(&adapter->watchdog_timer, jiffies + 2 * HZ);
+   mod_timer(&adapter->watchdog_timer, round_jiffies(jiffies + 2 * HZ));
 }
 
 #define E1000_TX_FLAGS_CSUM0x0001



[PATCH] bridge: flush forwarding table when device carrier off

2006-10-12 Thread Stephen Hemminger
Flush the forwarding table when carrier is lost. This helps for
availability because we don't want to forward to a downed device and
new packets may come in on other links.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>
---
 net/bridge/br_fdb.c |7 ++-
 net/bridge/br_if.c  |4 ++--
 net/bridge/br_private.h |2 +-
 net/bridge/br_stp_if.c  |2 ++
 4 files changed, 11 insertions(+), 4 deletions(-)

--- bridge.orig/net/bridge/br_fdb.c
+++ bridge/net/bridge/br_fdb.c
@@ -128,7 +128,10 @@ void br_fdb_cleanup(unsigned long _data)
mod_timer(&br->gc_timer, jiffies + HZ/10);
 }
 
-void br_fdb_delete_by_port(struct net_bridge *br, struct net_bridge_port *p)
+
+void br_fdb_delete_by_port(struct net_bridge *br,
+  const struct net_bridge_port *p,
+  int do_all)
 {
int i;
 
@@ -142,6 +145,8 @@ void br_fdb_delete_by_port(struct net_br
if (f->dst != p) 
continue;
 
+   if (f->is_static & !do_all)
+   continue;
/*
 * if multiple ports all have the same device address
 * then when one port is deleted, assign
--- bridge.orig/net/bridge/br_if.c
+++ bridge/net/bridge/br_if.c
@@ -163,7 +163,7 @@ static void del_nbp(struct net_bridge_po
br_stp_disable_port(p);
spin_unlock_bh(&br->lock);
 
-   br_fdb_delete_by_port(br, p);
+   br_fdb_delete_by_port(br, p, 1);
 
list_del_rcu(&p->list);
 
@@ -448,7 +448,7 @@ int br_add_if(struct net_bridge *br, str
 
return 0;
 err2:
-   br_fdb_delete_by_port(br, p);
+   br_fdb_delete_by_port(br, p, 1);
 err1:
kobject_del(&p->kobj);
 err0:
--- bridge.orig/net/bridge/br_private.h
+++ bridge/net/bridge/br_private.h
@@ -143,7 +143,7 @@ extern void br_fdb_changeaddr(struct net
  const unsigned char *newaddr);
 extern void br_fdb_cleanup(unsigned long arg);
 extern void br_fdb_delete_by_port(struct net_bridge *br,
-  struct net_bridge_port *p);
+ const struct net_bridge_port *p, int do_all);
 extern struct net_bridge_fdb_entry *__br_fdb_get(struct net_bridge *br,
 const unsigned char *addr);
 extern struct net_bridge_fdb_entry *br_fdb_get(struct net_bridge *br,
--- bridge.orig/net/bridge/br_stp_if.c
+++ bridge/net/bridge/br_stp_if.c
@@ -113,6 +113,8 @@ void br_stp_disable_port(struct net_brid
del_timer(&p->forward_delay_timer);
del_timer(&p->hold_timer);
 
+   br_fdb_delete_by_port(br, p, 0);
+
br_configuration_update(br);
 
br_port_state_selection(br);


Re: [PATCH] e1000: Real time packets and bytes statistics

2006-10-12 Thread Stephen Hemminger
On Wed, 11 Oct 2006 10:44:12 -0700
"Jesse Brandeburg" <[EMAIL PROTECTED]> wrote:

> On 10/11/06, Jean Delvare <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > This patch is posted for review and comments.
> >
> > Let the e1000 driver report the most important statistics (rx/tx_bytes
> > and rx/tx_packets) in real time, rather than every other second. This
> > is similar to what the e100 driver is doing.
> >
> > The current asynchronous statistics refresh model makes it impossible
> > to monitor the network traffic with an interval which isn't a multiple
> > of 2 seconds. For example, an interval of 5 seconds would result in a
> > sawtooth diagram (+20%, -20%) for a constant transfer rate. With a 1
> > second interval it's even worse (0, 200%) of course. This has been
> > annoying users for years, but was never actually fixed:
> 
> I think the idea is good, however, see below.
> 
> > rx/tx_bytes will show slightly lower values than before, because the
> > hardware appears to include the 4-byte ethernet frame CRC into the
> > frame length, while the driver doesn't. It's probably OK as the
> > e100, 3c59x and 8139too drivers don't include it either.
> 
> this is okay.
> 
> > I additionally noted a difference of 6 bytes on some TX frames, which
> > I am not able to explain. It's probably small and rare enough not to
> > be considered a problem, but if someone can explain it, I would be
> > grateful.
> 
> now, that sounds odd, however, once again, see below.
> 
> > Signed-off-by: Jean Delvare <[EMAIL PROTECTED]>
> > ---
> >  drivers/net/e1000/e1000_main.c |   14 ++
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > --- linux-2.6.19-rc1.orig/drivers/net/e1000/e1000_main.c 2006-10-11 10:53:49.0 +0200
> > +++ linux-2.6.19-rc1/drivers/net/e1000/e1000_main.c 2006-10-11 11:34:41.0 +0200
> > @@ -3118,6 +3118,8 @@
> >e1000_tx_map(adapter, tx_ring, skb, first,
> > max_per_txd, nr_frags, mss));
> >
> > +   adapter->net_stats.tx_packets++;
> > +   adapter->net_stats.tx_bytes += skb->len;
> > netdev->trans_start = jiffies;
> 
> this is the part I'm most worried about.  as I believe it to be
> incorrect for TSO packets.  Maybe something like?
> +   if (skb_shinfo(skb)->gso_segs)
> +  adapter->net_stats.tx_packets += skb_shinfo(skb)->gso_segs;
> +   else
> +  adapter->net_stats.tx_packets++;
> +   adapter->net_stats.tx_bytes += skb->len;
> netdev->trans_start = jiffies;
> 
> skb len will still be off by some amount, because the skb->data
> (header) is replicated across each gso segment but only counted once
> this way, but hopefully someone will pipe up with a good way to
> compute that.
> 
> The rest of the patch seems fine, barring any other comments.
> 
> Jesse

You might want to put the tx values in a per-cpu structure and
sum later.  Incrementing statistics can actually be a performance
bottleneck on SMP tests, because it causes lots of cache thrashing.
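
A rough userspace sketch of that idea, assuming a fixed slot count and a 64-byte cache line (both made-up values; the struct and function names are hypothetical and not part of the e1000 driver). Each CPU updates only its own padded slot, and the totals are computed when somebody reads the statistics.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_SLOTS   4   /* assumed number of CPUs, illustration only */
#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* One slot per CPU; the padding makes each slot one cache line long. */
struct tx_slot {
    uint64_t packets;
    uint64_t bytes;
    char pad[CACHE_LINE - 2 * sizeof(uint64_t)];
};

struct tx_stats {
    struct tx_slot slot[NR_SLOTS];
};

/* Hot path: touch only the slot that belongs to the calling CPU. */
static void tx_account(struct tx_stats *s, unsigned int cpu, uint64_t len)
{
    struct tx_slot *slot = &s->slot[cpu % NR_SLOTS];

    slot->packets++;
    slot->bytes += len;
}

/* Slow path: walk all slots and add them up when stats are requested. */
static void tx_sum(const struct tx_stats *s, uint64_t *packets, uint64_t *bytes)
{
    size_t i;

    *packets = 0;
    *bytes = 0;
    for (i = 0; i < NR_SLOTS; i++) {
        *packets += s->slot[i].packets;
        *bytes += s->slot[i].bytes;
    }
}
```

The trade-off is that a read becomes a short loop instead of a single load, which is usually fine because statistics are read far less often than they are updated.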


-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: Suppress / delay SYN-ACK

2006-10-12 Thread Rick Jones

Martin Schiller wrote:

Hi!

I'm searching for a solution to suppress / delay the SYN-ACK packet of a
listening server (-application) until he has decided (e.g. analysed the
requesting ip-address or checked if the corresponding other end of a
connection is available) if he wants to accept the connect request of the
client. If not, it should be possible to reject the connect request.


How often do you expect the incoming call to be rejected?  I suspect that would 
have a significant effect on whether the whole thing is worthwhile.


rick jones


Re: [PATCH] Customizable TCP backoff patch

2006-10-12 Thread Ben Woodard

YOSHIFUJI Hideaki / 吉藤英明 wrote:

+   .data   = &sysctl_tcp_rto_max,
+   .maxlen = sizeof(unsigned),


sizeof(unsigned long)


Good catch. That would have corrupted things badly on some 64-bit 
platforms. With all the flux in that area I forgot to change the size, 
though it would have been OK on the ia32 boxes.
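
A small sketch of the size mismatch, under the assumption that a handler for unsigned long values derives its element count by dividing maxlen by sizeof(unsigned long). That detail is my assumption for illustration, and ulong_slots() is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

/* How many unsigned long values fit into a buffer of maxlen bytes. */
static size_t ulong_slots(size_t maxlen)
{
    return maxlen / sizeof(unsigned long);
}
```

Where unsigned and unsigned long are the same width (ia32), ulong_slots(sizeof(unsigned)) is 1 and nothing looks wrong. Where unsigned long is wider (typical 64-bit platforms), the same expression yields 0, so the declared length no longer describes the variable.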



diff -ru linux-2.6.18/net/ipv4/tcp.c linux-2.6.18.new/net/ipv4/tcp.c
--- linux-2.6.18/net/ipv4/tcp.c 2006-09-19 20:42:06.0 -0700
+++ linux-2.6.18.new/net/ipv4/tcp.c 2006-10-11 16:00:37.0 -0700
@@ -2110,6 +2126,12 @@
if (copy_to_user(optval, icsk->icsk_ca_ops->name, len))
return -EFAULT;
return 0;
+   case TCP_BACKOFF_MAX:
+   val = jiffies_to_msecs(tcp_rto_max(tp));
+   break;


tp->rto_max


+   case TCP_BACKOFF_INIT:
+   val = jiffies_to_msecs(tcp_rto_init(tp));
+   break;
default:


tp->rto_init


OK I get it now.



--yoshfuji


diff -ru linux-2.6.18/include/linux/sysctl.h linux-2.6.18.new/include/linux/sysctl.h
--- linux-2.6.18/include/linux/sysctl.h	2006-09-19 20:42:06.0 -0700
+++ linux-2.6.18.new/include/linux/sysctl.h	2006-10-11 10:27:52.0 -0700
@@ -411,6 +411,8 @@
 	NET_IPV4_TCP_WORKAROUND_SIGNED_WINDOWS=115,
 	NET_TCP_DMA_COPYBREAK=116,
 	NET_TCP_SLOW_START_AFTER_IDLE=117,
+	NET_TCP_RTO_MAX=118,
+	NET_TCP_RTO_INIT=119,
 };
 
 enum {
diff -ru linux-2.6.18/include/linux/tcp.h linux-2.6.18.new/include/linux/tcp.h
--- linux-2.6.18/include/linux/tcp.h	2006-09-19 20:42:06.0 -0700
+++ linux-2.6.18.new/include/linux/tcp.h	2006-10-12 08:28:52.645411000 -0700
@@ -94,6 +94,8 @@
 #define TCP_INFO		11	/* Information about this connection. */
 #define TCP_QUICKACK		12	/* Block/reenable quick acks */
 #define TCP_CONGESTION		13	/* Congestion control algorithm */
+#define TCP_BACKOFF_MAX		14	/* Maximum backoff value */
+#define TCP_BACKOFF_INIT	15	/* Initial backoff value */
 
 #define TCPI_OPT_TIMESTAMPS	1
 #define TCPI_OPT_SACK		2
@@ -257,6 +259,8 @@
 	__u8	frto_counter;	/* Number of new acks after RTO */
 	__u8	nonagle;	/* Disable Nagle algorithm? */
 	__u8	keepalive_probes; /* num of allowed keep alive probes	*/
+	__u16	rto_max;	/* Maximum backoff value in ms		*/
+	__u16	rto_init;	/* Initial backoff value in ms		*/
 
 /* RTT measurement */
 	__u32	srtt;		/* smoothed round trip time << 3	*/
Only in linux-2.6.18.new/include/linux: tcp.h~
diff -ru linux-2.6.18/include/net/tcp.h linux-2.6.18.new/include/net/tcp.h
--- linux-2.6.18/include/net/tcp.h	2006-09-19 20:42:06.0 -0700
+++ linux-2.6.18.new/include/net/tcp.h	2006-10-11 17:43:23.091431000 -0700
@@ -227,11 +227,23 @@
 extern int sysctl_tcp_base_mss;
 extern int sysctl_tcp_workaround_signed_windows;
 extern int sysctl_tcp_slow_start_after_idle;
+extern unsigned long sysctl_tcp_rto_max;
+extern unsigned long sysctl_tcp_rto_init;
 
 extern atomic_t tcp_memory_allocated;
 extern atomic_t tcp_sockets_allocated;
 extern int tcp_memory_pressure;
 
+static inline unsigned long tcp_rto_max(struct tcp_sock *tp)
+{
+	return tp->rto_max ? msecs_to_jiffies(tp->rto_max) : sysctl_tcp_rto_max;
+}
+
+static inline unsigned long tcp_rto_init(struct tcp_sock *tp)
+{
+	return tp->rto_init ? msecs_to_jiffies(tp->rto_init) : sysctl_tcp_rto_init;
+}
+
 /*
  * The next routines deal with comparing 32 bit unsigned ints
  * and worry about wraparound (automatic with unsigned arithmetic).
Only in linux-2.6.18.new/include/net: tcp.h~
diff -ru linux-2.6.18/net/ipv4/sysctl_net_ipv4.c linux-2.6.18.new/net/ipv4/sysctl_net_ipv4.c
--- linux-2.6.18/net/ipv4/sysctl_net_ipv4.c	2006-09-19 20:42:06.0 -0700
+++ linux-2.6.18.new/net/ipv4/sysctl_net_ipv4.c	2006-10-12 07:14:41.87191 -0700
@@ -128,6 +128,8 @@
 	return ret;
 }
 
+static unsigned long tcp_rto_min_constant = 0;
+static unsigned long tcp_rto_max_constant = 65535;
 
 ctl_table ipv4_table[] = {
 {
@@ -697,6 +699,26 @@
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
 	},
+	{
+		.ctl_name	= NET_TCP_RTO_MAX,
+		.procname	= "tcp_rto_max",
+		.data		= &sysctl_tcp_rto_max,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= &proc_doulongvec_ms_jiffies_minmax,
+		.extra1		= &tcp_rto_min_constant,
+		.extra2		= &tcp_rto_max_constant,
+	},
+	{
+		.ctl_name	= NET_TCP_RTO_INIT,
+		.procname	= "tcp_rto_init",
+		.data		= &sysctl_tcp_rto_init,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= &proc_doulongvec_ms_jiffies_minmax,
+		.extra1		= &tcp_rto_min_constant,
+		.extra2		= &tcp_rto_max_constant,
+	},
 	{ .ctl_name = 0 }
 };
 
Only in linux-2.6.18.new/net/ipv4: sysctl_net_ipv4.c~
diff -ru linux-2.6.18/net/ipv4/tcp.c linux-2.6.18.new/net/ipv4/tcp.c
--- linux-2.6.18/net/ipv4/tcp.c	2006-09-19 20:42:06.0 -0700
+++ linux-2.6.18.new/net/ipv4/tcp.c	2006-10-12 07:18:01.193083000 -0700
@@ -1764,6 +1764,8 @@
 	return err;
 }
 
+#define TCP_BACKOFF_MAXVAL 65535
+
 /*
  *	Socket 

Re: Suppress / delay SYN-ACK

2006-10-12 Thread Evgeniy Polyakov
On Thu, Oct 12, 2006 at 12:39:30PM +0200, Eric Dumazet ([EMAIL PROTECTED]) wrote:
> > You should break your decision into per state change transformations.
> > I think it is possible with either conntrack or netlink module Samir
> > Bellabes  creates (Network Events Connector
> > subject) or even using syncookie algo changes.
> 
> Hmm, there are some cases where conntrack is not an option (way too expensive 
> if your server handles XXX.XXX concurrent TCP streams)

I think any netlink related work here can not be used for any kind of
high performance setup - it will be too slow to send/receive one or more
messages per state change for each new connection...

> > But it will drastically change your server performance...
> 
> Sure, at least its capacity to answer to SYN packets (session establishment 
> should be slower, unless the thread receiving/handling SYN packets has 
> realtime scheduling)

Maybe it will be better to create some more complex protocol which will
collect data before sending netlink message, or just use a procfs file
or syscall/ioctl/socket option.

> Eric

-- 
Evgeniy Polyakov


Re: Suppress / delay SYN-ACK

2006-10-12 Thread Eric Dumazet
On Thursday 12 October 2006 12:31, Evgeniy Polyakov wrote:
> On Thu, Oct 12, 2006 at 12:13:26PM +0200, Martin Schiller ([EMAIL PROTECTED]) wrote:
> > On Thursday, October 12, 2006 10:38 AM, Eric Dumazet wrote:
> > > Well, it is already possible to delay the 'third packet' of an
> > > outgoing connection with a little hack. But AFAIK not the SYNACK of
> > > incoming connection. It could be cool. Maybe some new syscalls are
> > > needed:
> > >
> > > int syn_recv(int socklisten, ...);
> > > /* give to user app the SYN packet */
> > > int syn_ack(int socklisten, ...);
> > > /* User app has the ability to ask kernel tcp stack to :
> > > DROP this packet.
> > > REJECT the attempt
> > > ACCEPT the attempt (sending a SYN/ACK) */
> >
> > So, when do you mean the user-space application should run this syscalls?
> > After the call to listen()?
> >
> > Another problem with this solution might be, that I don't want to block
> > the listening socket with the processing of one request, because there
> > could be a lot of simultaneous requests.
>
> You should break your decision into per state change transformations.
> I think it is possible with either conntrack or netlink module Samir
> Bellabes  creates (Network Events Connector
> subject) or even using syncookie algo changes.

Hmm, there are some cases where conntrack is not an option (way too expensive 
if your server handles XXX.XXX concurrent TCP streams)

>
> But it will drastically change your server performance...

Sure, at least its capacity to answer to SYN packets (session establishment 
should be slower, unless the thread receiving/handling SYN packets has 
realtime scheduling)

Eric


Re: Suppress / delay SYN-ACK

2006-10-12 Thread Eric Dumazet
On Thursday 12 October 2006 12:13, Martin Schiller wrote:
> On Thursday, October 12, 2006 10:38 AM, Eric Dumazet wrote:
> > Well, it is already possible to delay the 'third packet' of an
> > outgoing connection with a little hack. But AFAIK not the SYNACK of
> > incoming connection. It could be cool. Maybe some new syscalls are
> > needed:
> >
> > int syn_recv(int socklisten, ...);
> > /* give to user app the SYN packet */
> > int syn_ack(int socklisten, ...);
> > /* User app has the ability to ask kernel tcp stack to :
> > DROP this packet.
> > REJECT the attempt
> > ACCEPT the attempt (sending a SYN/ACK) */
>
> So, when do you mean the user-space application should run this syscalls?
> After the call to listen()?
>

Exactly like when you call accept() on a non blocking listening socket.

If your application asked to receive notification of SYN packets, it 
should be prepared to call accept() (to be notified of fully established 
connections) and/or syn_recv() (to be notified of SYN packets).

So when poll()/select()/epoll() tells your socklisten has available events, 
your application would have to call both accept() and syn_recv() in a loop to 
empty all awaiting events.

> Another problem with this solution might be, that I don't want to block the
> listening socket with the processing of one request, because there could be
> a lot of simultaneous requests.

Yes I can imagine.


Re: Suppress / delay SYN-ACK

2006-10-12 Thread Evgeniy Polyakov
On Thu, Oct 12, 2006 at 12:13:26PM +0200, Martin Schiller ([EMAIL PROTECTED]) wrote:
> On Thursday, October 12, 2006 10:38 AM, Eric Dumazet wrote:
> >
> > Well, it is already possible to delay the 'third packet' of an
> > outgoing connection with a little hack. But AFAIK not the SYNACK of
> > incoming connection. It could be cool. Maybe some new syscalls are
> > needed:   
> > 
> > int syn_recv(int socklisten, ...);
> > /* give to user app the SYN packet */
> > int syn_ack(int socklisten, ...);
> > /* User app has the ability to ask kernel tcp stack to :
> > DROP this packet.
> > REJECT the attempt
> > ACCEPT the attempt (sending a SYN/ACK) */
> > 
> 
> So, when do you mean the user-space application should run this syscalls?
> After the call to listen()?
> 
> Another problem with this solution might be, that I don't want to block the
> listening socket with the processing of one request, because there could be
> a lot of simultaneous requests. 

You should break your decision into per state change transformations.
I think it is possible with either conntrack or netlink module Samir
Bellabes  creates (Network Events Connector
subject) or even using syncookie algo changes.

But it will drastically change your server performance...

-- 
Evgeniy Polyakov


Re: [DECNET]: Use correct config option for routing by fwmark in compare_keys()

2006-10-12 Thread David Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 12:23:37 +0200

> Small bugfix to the compare_keys fix.

Damn cut&paste :-)  I'll add this fix tomorrow, thanks.


[DECNET]: Use correct config option for routing by fwmark in compare_keys()

2006-10-12 Thread Patrick McHardy
Small bugfix to the compare_keys fix.

[DECNET]: Use correct config option for routing by fwmark in compare_keys()

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

---
commit 8302c73a668852de1b1527038bc4c432cf757a7f
tree d4e9be4f4bfc87b56bf4756a12a2538f2211fc84
parent 22c4cae48af19e83f31bb88a98970166beacc4fd
author Patrick McHardy <[EMAIL PROTECTED]> Thu, 12 Oct 2006 12:22:57 +0200
committer Patrick McHardy <[EMAIL PROTECTED]> Thu, 12 Oct 2006 12:22:57 +0200

 net/decnet/dn_route.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index a2a43d8..491429c 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -269,7 +269,7 @@ static inline int compare_keys(struct fl
 {
return ((fl1->nl_u.dn_u.daddr ^ fl2->nl_u.dn_u.daddr) |
(fl1->nl_u.dn_u.saddr ^ fl2->nl_u.dn_u.saddr) |
-#ifdef CONFIG_IP_ROUTE_FWMARK
+#ifdef CONFIG_DECNET_ROUTE_FWMARK
(fl1->nl_u.dn_u.fwmark ^ fl2->nl_u.dn_u.fwmark) |
 #endif
(fl1->nl_u.dn_u.scope ^ fl2->nl_u.dn_u.scope) |


RE: Suppress / delay SYN-ACK

2006-10-12 Thread Martin Schiller
On Thursday, October 12, 2006 10:38 AM, Eric Dumazet wrote:
>
> Well, it is already possible to delay the 'third packet' of an
> outgoing connection with a little hack. But AFAIK not the SYNACK of
> incoming connection. It could be cool. Maybe some new syscalls are
> needed:   
> 
> int syn_recv(int socklisten, ...);
> /* give to user app the SYN packet */
> int syn_ack(int socklisten, ...);
> /* User app has the ability to ask kernel tcp stack to :
> DROP this packet.
> REJECT the attempt
> ACCEPT the attempt (sending a SYN/ACK) */
> 

So, when do you mean the user-space application should run these syscalls?
After the call to listen()?

Another problem with this solution might be, that I don't want to block the
listening socket with the processing of one request, because there could be
a lot of simultaneous requests. 




Re: [IPv6] rules: Use RT6_LOOKUP_F_HAS_SADDR and fix source based selectors

2006-10-12 Thread YOSHIFUJI Hideaki / 吉藤英明
In article <[EMAIL PROTECTED]> (at Thu, 12 Oct 2006 11:41:24 +0200), Thomas 
Graf <[EMAIL PROTECTED]> says:

> Fixes rt6_lookup() to provide the source address in the flow
> and sets RT6_LOOKUP_F_HAS_SADDR whenever it is present in
> the flow.
> 
> Avoids unnecessary prefix comparisons by checking for a prefix
> length first.
> 
> Fixes the rule logic to not match packets if a source selector
> has been specified but no source address is available.
> 
> Thanks to Kim Nordlund <[EMAIL PROTECTED]> for working
> on this patch with me.
> 
> Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

I tend to agree.  Ville, do you agree?

--yoshfuji


[IPv6] rules: Use RT6_LOOKUP_F_HAS_SADDR and fix source based selectors

2006-10-12 Thread Thomas Graf
Fixes rt6_lookup() to provide the source address in the flow
and sets RT6_LOOKUP_F_HAS_SADDR whenever it is present in
the flow.

Avoids unnecessary prefix comparisons by checking for a prefix
length first.

Fixes the rule logic to not match packets if a source selector
has been specified but no source address is available.

Thanks to Kim Nordlund <[EMAIL PROTECTED]> for working
on this patch with me.

Signed-off-by: Thomas Graf <[EMAIL PROTECTED]>

Index: net-2.6/net/ipv6/fib6_rules.c
===
--- net-2.6.orig/net/ipv6/fib6_rules.c  2006-10-11 22:29:50.0 +0200
+++ net-2.6/net/ipv6/fib6_rules.c   2006-10-12 11:01:00.0 +0200
@@ -117,12 +117,15 @@
 {
 	struct fib6_rule *r = (struct fib6_rule *) rule;
 
-	if (!ipv6_prefix_equal(&fl->fl6_dst, &r->dst.addr, r->dst.plen))
+	if (r->dst.plen &&
+	    !ipv6_prefix_equal(&fl->fl6_dst, &r->dst.addr, r->dst.plen))
 		return 0;
 
-	if ((flags & RT6_LOOKUP_F_HAS_SADDR) &&
-	    !ipv6_prefix_equal(&fl->fl6_src, &r->src.addr, r->src.plen))
-		return 0;
+	if (r->src.plen) {
+		if (!(flags & RT6_LOOKUP_F_HAS_SADDR) ||
+		    !ipv6_prefix_equal(&fl->fl6_src, &r->src.addr, r->src.plen))
+			return 0;
+	}
 
 	if (r->tclass && r->tclass != ((ntohl(fl->fl6_flowlabel) >> 20) & 0xff))
 		return 0;
Index: net-2.6/net/ipv6/route.c
===
--- net-2.6.orig/net/ipv6/route.c   2006-10-11 22:29:50.0 +0200
+++ net-2.6/net/ipv6/route.c2006-10-12 10:59:13.0 +0200
@@ -529,13 +529,17 @@
 		.nl_u = {
 			.ip6_u = {
 				.daddr = *daddr,
-				/* TODO: saddr */
 			},
 		},
 	};
 	struct dst_entry *dst;
 	int flags = strict ? RT6_LOOKUP_F_IFACE : 0;
 
+	if (saddr) {
+		memcpy(&fl.fl6_src, saddr, sizeof(*saddr));
+		flags |= RT6_LOOKUP_F_HAS_SADDR;
+	}
+
 	dst = fib6_rule_lookup(&fl, flags, ip6_pol_route_lookup);
 	if (dst->error == 0)
 		return (struct rt6_info *) dst;
@@ -697,6 +701,7 @@
 void ip6_route_input(struct sk_buff *skb)
 {
 	struct ipv6hdr *iph = skb->nh.ipv6h;
+	int flags = RT6_LOOKUP_F_HAS_SADDR;
 	struct flowi fl = {
 		.iif = skb->dev->ifindex,
 		.nl_u = {
@@ -711,7 +716,9 @@
 		},
 		.proto = iph->nexthdr,
 	};
-	int flags = rt6_need_strict(&iph->daddr) ? RT6_LOOKUP_F_IFACE : 0;
+
+	if (rt6_need_strict(&iph->daddr))
+		flags |= RT6_LOOKUP_F_IFACE;
 
 	skb->dst = fib6_rule_lookup(&fl, flags, ip6_pol_route_input);
 }
@@ -794,6 +801,9 @@
 	if (rt6_need_strict(&fl->fl6_dst))
 		flags |= RT6_LOOKUP_F_IFACE;
 
+	if (!ipv6_addr_any(&fl->fl6_src))
+		flags |= RT6_LOOKUP_F_HAS_SADDR;
+
 	return fib6_rule_lookup(fl, flags, ip6_pol_route_output);
 }
 
@@ -1345,6 +1355,7 @@
 					   struct in6_addr *gateway,
 					   struct net_device *dev)
 {
+	int flags = RT6_LOOKUP_F_HAS_SADDR;
 	struct ip6rd_flowi rdfl = {
 		.fl = {
 			.oif = dev->ifindex,
@@ -1357,7 +1368,9 @@
 		},
 		.gateway = *gateway,
 	};
-	int flags = rt6_need_strict(dest) ? RT6_LOOKUP_F_IFACE : 0;
+
+	if (rt6_need_strict(dest))
+		flags |= RT6_LOOKUP_F_IFACE;
 
 	return (struct rt6_info *)fib6_rule_lookup((struct flowi *)&rdfl,
 						flags, __ip6_route_redirect);
 }
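
The source-selector logic from the fib6_rules.c hunk can be sketched outside the kernel roughly as follows. This is only an illustration under assumptions: demo_prefix_equal() is a simplified stand-in for ipv6_prefix_equal(), DEMO_F_HAS_SADDR stands in for RT6_LOOKUP_F_HAS_SADDR, and struct demo_selector is invented for the example.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_F_HAS_SADDR 0x1	/* stand-in for RT6_LOOKUP_F_HAS_SADDR */

/* Invented for this sketch: one address selector of a rule. */
struct demo_selector {
	uint8_t addr[16];	/* IPv6 prefix */
	unsigned int plen;	/* prefix length in bits; 0 means "no selector" */
};

/* Simplified stand-in for ipv6_prefix_equal(): compare the first plen bits. */
static int demo_prefix_equal(const uint8_t *a, const uint8_t *b,
			     unsigned int plen)
{
	unsigned int bytes = plen / 8, bits = plen % 8;

	if (bytes && memcmp(a, b, bytes) != 0)
		return 0;
	if (bits) {
		uint8_t mask = (uint8_t)(0xff << (8 - bits));

		if ((a[bytes] ^ b[bytes]) & mask)
			return 0;
	}
	return 1;
}

/* Returns 1 if the source selector matches. A rule with a source selector
 * never matches when the caller has no source address to offer, and a rule
 * without one skips the prefix comparison entirely. */
static int demo_src_match(const struct demo_selector *src,
			  const uint8_t *saddr, int flags)
{
	if (src->plen) {
		if (!(flags & DEMO_F_HAS_SADDR) ||
		    !demo_prefix_equal(saddr, src->addr, src->plen))
			return 0;
	}
	return 1;
}
```

With a /32 selector, for instance, an address inside the prefix matches only when DEMO_F_HAS_SADDR is set; without the flag the rule is skipped rather than matched by accident.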


Re: [PATCH-update][RFC] net: consolidated UDP / UDP-Lite code

2006-10-12 Thread David Miller
From: Gerrit Renker <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 08:49:19 +0100

> please find attached the updated UDP-Lite patch - I have removed the 
> statistics corrections you pointed out to me.
> 
> Can you please indicate whether you are ok, by and large, with the
> changes performed by the patch? Even if it is some time ago, I
> have implemented in this patch the architectural suggestions you
> gave me a while earlier. 

The patch looks pretty good.  I have no problems with how
you implemented this at all.

I think we'll have no problem getting this into 2.6.20.


Re: sfuzz hanging on 2.6.18

2006-10-12 Thread David Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 09:46:47 +0200

> Looks like unbalanced locking.
> 
> Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

Applied and pushed to -stable, thanks Patrick.


Re: [RTNETLINK]: Fix use of wrong skb in do_getlink()

2006-10-12 Thread David Miller
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 08:40:57 +0200

> [RTNETLINK]: Fix use of wrong skb in do_getlink()
> 
> skb is the netlink query, nskb is the reply message.
> 
> Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>

Applied, thanks Patrick.


Re: Suppress / delay SYN-ACK

2006-10-12 Thread Eric Dumazet
On Thursday 12 October 2006 10:08, Martin Schiller wrote:
> Hi!
>
> I'm searching for a solution to suppress / delay the SYN-ACK packet of a
> listening server (application) until it has decided (e.g. by analysing the
> requesting IP address or checking whether the corresponding other end of a
> connection is available) whether it wants to accept the connect request of
> the client. If not, it should be possible to reject the connect request.
>
> My idea is to add two ioctl's:
>   - One to set the listening socket into "delay_synack" mode.
>   - And one to send the synack packet, if the connection should be
> accepted.
>
> If the "delay_synack" mode is not enabled, the connection should just work
> as usual.
>
> I had a look at the tcp/ipv4 stack for a while and found out that the
> three-way handshake is already completed before anything comes up to
> user-space when I call accept(). So I think it wouldn't be possible to
> add this feature with "a little hack".
>

Well, it is already possible to delay the 'third packet' of an outgoing
connection with a little hack. But AFAIK not the SYNACK of an incoming
connection. It could be cool. Maybe some new syscalls are needed:

int syn_recv(int socklisten, ...);
/* give to user app the SYN packet */
int syn_ack(int socklisten, ...);
/* User app has the ability to ask kernel tcp stack to :
DROP this packet.
REJECT the attempt
ACCEPT the attempt (sending a SYN/ACK)
*/

Maybe NETLINK (netfilter) is able to meet your need.

> Does anybody have any hints for me where I should start to work?
>
> Regards,
> Martin Schiller
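
The three verdicts proposed above (DROP, REJECT, ACCEPT) amount to a per-SYN policy decision made in user space. A rough sketch of what such a policy function could look like is given here. Everything in it is hypothetical: syn_recv() and syn_ack() do not exist, and enum demo_syn_verdict, struct demo_syn_info and demo_decide() are names invented only for this illustration.

```c
#include <stdint.h>

/* Hypothetical verdicts mirroring the DROP / REJECT / ACCEPT options
 * proposed above; no such kernel interface exists. */
enum demo_syn_verdict {
	DEMO_SYN_DROP,		/* silently discard the SYN */
	DEMO_SYN_REJECT,	/* refuse the attempt */
	DEMO_SYN_ACCEPT		/* let the stack send the SYN/ACK */
};

/* Invented summary of what a syn_recv()-style call might hand to the app. */
struct demo_syn_info {
	uint32_t saddr;		/* client IPv4 address, host byte order */
	uint16_t sport;		/* client port, host byte order */
};

/* Example policy: for clients in 10.0.0.0/8, accept if the other end of
 * the connection is available and reject otherwise; drop everyone else. */
static enum demo_syn_verdict demo_decide(const struct demo_syn_info *syn,
					 int backend_available)
{
	if ((syn->saddr & 0xff000000u) != 0x0a000000u)
		return DEMO_SYN_DROP;
	return backend_available ? DEMO_SYN_ACCEPT : DEMO_SYN_REJECT;
}
```

Keeping the decision in a pure function like this separates the policy from whatever mechanism (new syscalls, ioctls or netfilter) would eventually deliver the SYN to user space.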


Suppress / delay SYN-ACK

2006-10-12 Thread Martin Schiller
Hi!

I'm searching for a solution to suppress / delay the SYN-ACK packet of a
listening server (application) until it has decided (e.g. by analysing the
requesting IP address or checking whether the corresponding other end of a
connection is available) whether it wants to accept the connect request of
the client. If not, it should be possible to reject the connect request.

My idea is to add two ioctl's:
- One to set the listening socket into "delay_synack" mode.
- And one to send the synack packet, if the connection should be
accepted.

If the "delay_synack" mode is not enabled, the connection should just work
as usual.

I had a look at the tcp/ipv4 stack for a while and found out that the
three-way handshake is already completed before anything comes up to
user-space when I call accept(). So I think it wouldn't be possible to
add this feature with "a little hack".

Does anybody have any hints for me where I should start to work?

Regards,
Martin Schiller




Re: [RFC] Question about potential problem in net/ipv4/route.c

2006-10-12 Thread David Miller
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Thu, 12 Oct 2006 08:35:47 +0200

> How about avoiding the fwmark thing if !CONFIG_IP_ROUTE_FWMARK

I've added that, good idea.


Re: sfuzz hanging on 2.6.18

2006-10-12 Thread Patrick McHardy
Dave Jones wrote:
> sfuzz D 724EF62A  2828 28717  28691 (NOTLB)
>    cd69fe98 0082 012d 724ef62a 0001971a 0010 0007 df6d22b0
>    dfd81080 725bbc5e 0001971a 000cc634 0001 df6d23bc c140e260 0202
>    de1d5ba0 cd69fea0 de1d5ba0   de1d5b60 de1d5b8c de1d5ba0
> Call Trace:
>  [] lock_sock+0x75/0xa6
>  [] dn_getname+0x18/0x5f [decnet]
>  [] sys_getsockname+0x5c/0xb0
>  [] sys_socketcall+0xef/0x261
>  [] syscall_call+0x7/0xb
> DWARF2 unwinder stuck at syscall_call+0x7/0xb
> 
> I wonder if the plethora of lockdep related changes inadvertently broke
> something?

Looks like unbalanced locking.

Signed-off-by: Patrick McHardy <[EMAIL PROTECTED]>
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 70e0273..3456cd3 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -1178,8 +1178,10 @@ static int dn_getname(struct socket *soc
 	if (peer) {
 		if ((sock->state != SS_CONNECTED &&
 		     sock->state != SS_CONNECTING) &&
-		    scp->accept_mode == ACC_IMMED)
+		    scp->accept_mode == ACC_IMMED) {
+			release_sock(sk);
 			return -ENOTCONN;
+		}
 
 		memcpy(sa, &scp->peer, sizeof(struct sockaddr_dn));
 	} else {
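
The general rule behind this fix can be shown with a small user-space analogue. This is a sketch only: it uses a POSIX mutex in place of lock_sock()/release_sock(), and struct demo_sock and demo_getname() are invented names, not the DECnet code.

```c
#include <errno.h>
#include <pthread.h>

/* Illustrative object; stands in for a socket protected by lock_sock(). */
struct demo_sock {
	pthread_mutex_t lock;
	int connected;
	int peer_addr;
};

/* Every return path after pthread_mutex_lock() must unlock again.
 * Returning early without unlocking leaves the mutex held, so the next
 * caller blocks forever, which matches the hang in the trace above. */
static int demo_getname(struct demo_sock *sk, int *out)
{
	pthread_mutex_lock(&sk->lock);

	if (!sk->connected) {
		/* the release that the early return was missing */
		pthread_mutex_unlock(&sk->lock);
		return -ENOTCONN;
	}

	*out = sk->peer_addr;
	pthread_mutex_unlock(&sk->lock);
	return 0;
}
```

A common alternative is a single exit label that performs the unlock, so that new early returns cannot forget it.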


sfuzz hanging on 2.6.18

2006-10-12 Thread Dave Jones
sfuzz.c (google for it if you don't have it already) used to
run forever (or until I got bored and ctrl-c'd it) as long
as it didn't trigger an oops or the like in 2.6.17.

Running it against 2.6.18, I notice that it runs for a while,
and then gets totally wedged.  It doesn't respond to any signals,
can't be ptraced, and even strace subsequently gets wedged.
The machine responds, and is still interactive, but that process
is hosed.

sysrq-t shows it stuck here..

sfuzz D 724EF62A  2828 28717  28691 (NOTLB)
   cd69fe98 0082 012d 724ef62a 0001971a 0010 0007 df6d22b0 
   dfd81080 725bbc5e 0001971a 000cc634 0001 df6d23bc c140e260 0202 
   de1d5ba0 cd69fea0 de1d5ba0   de1d5b60 de1d5b8c de1d5ba0 
Call Trace:
 [] lock_sock+0x75/0xa6
 [] dn_getname+0x18/0x5f [decnet]
 [] sys_getsockname+0x5c/0xb0
 [] sys_socketcall+0xef/0x261
 [] syscall_call+0x7/0xb
DWARF2 unwinder stuck at syscall_call+0x7/0xb

I wonder if the plethora of lockdep related changes inadvertently broke
something?

Dave

-- 
http://www.codemonkey.org.uk


Re: [PATCH] NET/bluetooth: handle sysfs errors

2006-10-12 Thread Marcel Holtmann
Hi Jeff,

thanks for the patch, but I already have one that fixes this and it will
go to David Miller for inclusion soon.

Regards

Marcel




RE: [PATCH 00/11] The _entire_ secid reconciliation patchset (tada!)

2006-10-12 Thread James Morris
On Wed, 11 Oct 2006, Venkat Yekkirala wrote:

> > Outstanding items include resolving the igmp skb hook issue generally,
> > testing to verify both the design and implementation, and ensuring that
> > all the related policy changes are merged upstream first.
> > 
> Regarding the igmp hook issue, we could do a generic hook
> like Paul suggested. Would that be more palatable you think?

It needs to be investigated to see if anything else in the kernel is doing
the same thing, and then, most likely, a generic hook should be added for
classifying non-socket packets (you could pass the protocol as a hook
parameter).


-- 
James Morris
<[EMAIL PROTECTED]>