Re: TSO trimming question

2007-12-21 Thread Bill Fink
On Thu, 20 Dec 2007, David Miller wrote:

 From: Ilpo_Järvinen [EMAIL PROTECTED]
 Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)
 
  [PATCH] [TCP]: Fix TSO deferring
  
  I'd say that most of what tcp_tso_should_defer had in between
  there was dead code because of this.
  
  Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
 
 Yikes!
 
 John, we've been living a lie for more than a year. :-/
 
 On the bright side this explains a lot of small TSO frames I've been
 seeing in traces over the past year but never got a chance to
 investigate.
 
  diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
  index 8dafda9..693b9f6 100644
  --- a/net/ipv4/tcp_output.c
  +++ b/net/ipv4/tcp_output.c
  @@ -1217,7 +1217,8 @@ static int tcp_tso_should_defer(struct sock *sk, 
  struct sk_buff *skb)
  goto send_now;
   
  /* Defer for less than two clock ticks. */
  -   if (!tp-tso_deferred  ((jiffies1)1) - (tp-tso_deferred1)  1)
  +   if (tp-tso_deferred 
  +   ((jiffies  1)  1) - (tp-tso_deferred  1)  1)
  goto send_now;
   
  in_flight = tcp_packets_in_flight(tp);

I meant to ask about this a while back but then got distracted by
other things.  But now since the subject has come up, I had a couple
of more questions about this code.

What's with all the shifting back and forth?  Here with:

((jiffies1)1) - (tp-tso_deferred1)

and later with:

/* Ok, it looks like it is advisable to defer.  */
tp-tso_deferred = 1 | (jiffies1);

Is this just done to try and avoid the special case of jiffies==0 
when the jiffies wrap?  If so it seems like a lot of unnecessary
work just to avoid a 1 in 4 billion event, since it's my understanding
that the whole tcp_tso_should_defer function is just an optimization
and not a criticality to the proper functioning of TCP, especially
considering it hasn't even been executing at all up to now.

My second question is more basic and if I'm not mistaken actually
relates to a remaining bug in the (corrected) test:

/* Defer for less than two clock ticks. */
if (tp-tso_deferred 
((jiffies  1)  1) - (tp-tso_deferred  1)  1)

Since jiffies is an unsigned long, which is 64-bits on a 64-bit system,
whereas tp-tso_deferred is a u32, once jiffies exceeds 31-bits, which
will happen in about 25 days if HZ=1000, won't the second part of the
test always be true after that?  Or am I missing something obvious?

-Bill
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TCP] IPV6 : Change a divide into a right shift in tcp_v6_send_ack()

2007-12-21 Thread Eric Dumazet

YOSHIFUJI Hideaki / 吉藤英明 a écrit :

In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 08:06:32 +0100), Eric Dumazet 
[EMAIL PROTECTED] says:


YOSHIFUJI Hideaki / 吉藤英明 a écrit :

In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 07:03:58 +0100), Eric Dumazet 
[EMAIL PROTECTED] says:


Because tot_len is signed in tcp_v6_send_ack(), tot_len/4 forces compiler
to emit an integer divide, while we can help it to use a right shift,
less expensive.

Are you really sure?
At least, gcc-4.1.2-20061115 (debian) does not make any difference.

And, IMHO, because shift for signed variable is fragile, so we should
avoid using it.


Yes I am sure, but maybe you are on x86_64 ?

gcc-4.2.2 on x86


I'm on gcc-4.1.2 20061115 (prerelease) (Debian 4.1.1-21), on x86 (i686).
Maybe compiler difference?!


hum, more probably a .config difference. Try with

CONFIG_CC_OPTIMIZE_FOR_SIZE=Y

(default config)


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[INET] Avoid an integer divide in rt_garbage_collect()

2007-12-21 Thread Eric Dumazet

Since 'goal' is a signed int, compiler may emit an integer divide
to compute goal/2.

Using a right shift is OK here and less expensive.

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e35076e..10915bb 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -851,14 +851,14 @@ static int rt_garbage_collect(void)
equilibrium = ipv4_dst_ops.gc_thresh;
goal = atomic_read(ipv4_dst_ops.entries) - equilibrium;
if (goal  0) {
-   equilibrium += min_t(unsigned int, goal / 2, 
rt_hash_mask + 1);
+   equilibrium += min_t(unsigned int, goal  1, 
rt_hash_mask + 1);
goal = atomic_read(ipv4_dst_ops.entries) - equilibrium;
}
} else {
/* We are in dangerous area. Try to reduce cache really
 * aggressively.
 */
-   goal = max_t(unsigned int, goal / 2, rt_hash_mask + 1);
+   goal = max_t(unsigned int, goal  1, rt_hash_mask + 1);
equilibrium = atomic_read(ipv4_dst_ops.entries) - goal;
}
 


Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly

2007-12-21 Thread Jarek Poplawski
On 21-12-2007 03:24, Satoru SATOH wrote:
 2007/12/21, Jarek Poplawski [EMAIL PROTECTED]:
 Jarek Poplawski wrote, On 12/20/2007 09:24 PM:
 ...

 but since it's your patch, I hope you do some additional checking
 if it's always like this...

 ...or maybe only changing this all a little bit will make it look safer!

 Jarek P.
 
 
 OK, how about this?
 
 Signed-off-by: Satoru SATOH [EMAIL PROTECTED]
 
  ip/iproute.c |   12 
  1 files changed, 8 insertions(+), 4 deletions(-)
 
 diff --git a/ip/iproute.c b/ip/iproute.c
 index f4200ae..c771b34 100644
 --- a/ip/iproute.c
 +++ b/ip/iproute.c
 @@ -510,16 +510,20 @@ int print_route(const struct sockaddr_nl *who,
 struct nlmsghdr *n, void *arg)
   fprintf(fp,  %u, 
 *(unsigned*)RTA_DATA(mxrta[i]));
   else {
   unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
 + unsigned hz1 = hz;
 + if (hz1  1000)

Looks OK (safe) to me: it's compatible both with old and new way.
I'd only suggest to maybe change this to '(hz1  1024)', because
it's the biggest HZ currently in the kernel, so this compatibility
should be 100%. I think, you could leave 1 empty line before this
'if', as well. (Btw., aren't these overflows connected with
CONFIG_HIGH_RES_TIMERS?)

On the other hand this 'hz' still looks 'strange' here - I don't
understand, why, a bit earlier it's:

if (!hz)
hz = get_hz();

while 'else' would use: hz == get_user_hz();
So, probably I miss something, but even after your patch, there
could be different outputs here...

Thanks,
Jarek P.

PS: did you CC Stephen Hemminger on this?

 + hz1 /= 1000;
 + else
 + val *= 1000;
 
 - val *= 1000;
   if (i == RTAX_RTT)
   val /= 8;
   else if (i == RTAX_RTTVAR)
   val /= 4;
 - if (val = hz)
 - fprintf(fp,  %ums, val/hz);
 + if (val = hz1)
 + fprintf(fp,  %ums, val/hz1);
   else
 - fprintf(fp,  %.2fms, (float)val/hz);
 + fprintf(fp,  %.2fms, (float)val/hz1);
   }
   }
   }
 
 
 Thanks,
 Satoru SATOH
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly

2007-12-21 Thread YOSHIFUJI Hideaki / 吉藤英明
In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 11:24:54 +0900), Satoru 
SATOH [EMAIL PROTECTED] says:

 2007/12/21, Jarek Poplawski [EMAIL PROTECTED]:
  Jarek Poplawski wrote, On 12/20/2007 09:24 PM:
  ...
 
   but since it's your patch, I hope you do some additional checking
   if it's always like this...
 
 
  ...or maybe only changing this all a little bit will make it look safer!
 
  Jarek P.
 
 
 OK, how about this?
 
 Signed-off-by: Satoru SATOH [EMAIL PROTECTED]
 
  ip/iproute.c |   12 
  1 files changed, 8 insertions(+), 4 deletions(-)
 
 diff --git a/ip/iproute.c b/ip/iproute.c
 index f4200ae..c771b34 100644
 --- a/ip/iproute.c
 +++ b/ip/iproute.c
 @@ -510,16 +510,20 @@ int print_route(const struct sockaddr_nl *who,
 struct nlmsghdr *n, void *arg)
   fprintf(fp,  %u, 
 *(unsigned*)RTA_DATA(mxrta[i]));
   else {
   unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
 + unsigned hz1 = hz;
 + if (hz1  1000)

Why don't you simply use unsigned long long (or maybe uint64_t) here?

Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

--- 
diff --git a/ip/iproute.c b/ip/iproute.c
index f4200ae..db9a3b6 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -509,16 +509,21 @@ int print_route(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
i != RTAX_RTO_MIN)
fprintf(fp,  %u, 
*(unsigned*)RTA_DATA(mxrta[i]));
else {
-   unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
+   unsigned long long val = 
*(unsigned*)RTA_DATA(mxrta[i]);
+   unsigned div = 1;
 
-   val *= 1000;
if (i == RTAX_RTT)
-   val /= 8;
+   div = 8;
else if (i == RTAX_RTTVAR)
-   val /= 4;
-   if (val = hz)
-   fprintf(fp,  %ums, val/hz);
+   div = 4;
else
+   div = 1;
+
+   val = val * 1000ULL / div;
+
+   if (val = hz) {
+   fprintf(fp,  %llums, val/hz);
+   } else
fprintf(fp,  %.2fms, (float)val/hz);
}
}

--yoshfuji
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] OOPS with NETLINK_FIB_LOOKUP netlink socket

2007-12-21 Thread Denis V. Lunev
nl_fib_input re-reuses incoming skb to send the reply. This means that this
packet will be freed twice, namely in:
- netlink_unicast_kernel
- on receive path
Use clone to send as a cure, the caller is responsible for kfree_skb on error.

Thanks to Alexey Dobryan, who originally found the problem.

Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
---
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index b03c4c6..ac6238a 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -832,10 +832,13 @@ static void nl_fib_input(struct sk_buff *skb)
 
nlh = nlmsg_hdr(skb);
if (skb-len  NLMSG_SPACE(0) || skb-len  nlh-nlmsg_len ||
-   nlh-nlmsg_len  NLMSG_LENGTH(sizeof(*frn))) {
-   kfree_skb(skb);
+   nlh-nlmsg_len  NLMSG_LENGTH(sizeof(*frn)))
return;
-   }
+
+   skb = skb_clone(skb, GFP_KERNEL);
+   if (skb == NULL)
+   return;
+   nlh = nlmsg_hdr(skb);
 
frn = (struct fib_result_nl *) NLMSG_DATA(nlh);
tb = fib_get_table(frn-tb_id_in);
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Ilpo Järvinen
On Fri, 21 Dec 2007, Bill Fink wrote:

 I meant to ask about this a while back but then got distracted by
 other things.  But now since the subject has come up, I had a couple
 of more questions about this code.
 
 What's with all the shifting back and forth?  Here with:
 
   ((jiffies1)1) - (tp-tso_deferred1)
 
 and later with:
 
   /* Ok, it looks like it is advisable to defer.  */
   tp-tso_deferred = 1 | (jiffies1);
 
 Is this just done to try and avoid the special case of jiffies==0 
 when the jiffies wrap? 

I thought that it must be the reason. I couldn't figure out any other 
explination while thinking the same thing but since I saw no problem in 
that rather weird approach, I didn't want to touch that in a patch which 
had net-2.6 (or stable) potential.

 If so it seems like a lot of unnecessary
 work just to avoid a 1 in 4 billion event, since it's my understanding
 that the whole tcp_tso_should_defer function is just an optimization
 and not a criticality to the proper functioning of TCP, especially
 considering it hasn't even been executing at all up to now.

It would still be good to not return 1 in that case we didn't flag the 
deferral, how about adding one additional tick (+comment) in the zero 
case making tso_deferred == 0 again unambiguously defined, e.g.:

tp-tso_deferred = min_t(u32, jiffies, 1);

...I'm relatively sure that nobody would ever notice that tick :-) and we 
kept return value consistent with tso_deferred state invariant.

 My second question is more basic and if I'm not mistaken actually
 relates to a remaining bug in the (corrected) test:
 
   /* Defer for less than two clock ticks. */
   if (tp-tso_deferred 
   ((jiffies  1)  1) - (tp-tso_deferred  1)  1)
 
 Since jiffies is an unsigned long, which is 64-bits on a 64-bit system,
 whereas tp-tso_deferred is a u32, once jiffies exceeds 31-bits, which
 will happen in about 25 days if HZ=1000, won't the second part of the
 test always be true after that?  Or am I missing something obvious?

I didn't notice that earlier but I think you're right though my knowledge 
about jiffies and such is quite limited.

...Feel free to submit a patch to correct these.


-- 
 i.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] PS3: gelic: Add wireless support for PS3

2007-12-21 Thread Masakazu Mokuno
Hi David,

On Fri, 14 Dec 2007 01:48:52 -0500
David Woodhouse [EMAIL PROTECTED] wrote:

 --- linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.c~   2007-12-14 
 01:31:50.0 -0500
 +++ linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.c2007-12-14 
 01:39:25.0 -0500
 @@ -1139,7 +1139,7 @@ static irqreturn_t gelic_card_interrupt(
   *
   * see Documentation/networking/netconsole.txt
   */
 -static void gelic_net_poll_controller(struct net_device *netdev)
 +void gelic_net_poll_controller(struct net_device *netdev)
  {
   struct gelic_card *card = netdev_card(netdev);
  
 --- linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.h~   2007-12-14 
 01:31:50.0 -0500
 +++ linux-2.6.23.ppc64/drivers/net/ps3_gelic_net.h2007-12-14 
 01:39:47.0 -0500
 @@ -346,6 +346,7 @@ extern void gelic_net_tx_timeout(struct 
  extern int gelic_net_change_mtu(struct net_device *netdev, int new_mtu);
  extern int gelic_net_setup_netdev(struct net_device *netdev,
 struct gelic_card *card);
 +extern void gelic_net_poll_controller(struct net_device *netdev);
  
  /* shared ethtool ops */
  extern void gelic_net_get_drvinfo(struct net_device *netdev,


Thank you for your reviewing.  I'll fix it.

-- 
Masakazu MOKUNO

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread David Miller
From: Bill Fink [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 03:06:48 -0500

 What's with all the shifting back and forth?  Here with:
 
   ((jiffies1)1) - (tp-tso_deferred1)
 
 and later with:
 
   /* Ok, it looks like it is advisable to defer.  */
   tp-tso_deferred = 1 | (jiffies1);
 
 Is this just done to try and avoid the special case of jiffies==0 
 when the jiffies wrap?  If so it seems like a lot of unnecessary
 work just to avoid a 1 in 4 billion event, since it's my understanding
 that the whole tcp_tso_should_defer function is just an optimization
 and not a criticality to the proper functioning of TCP, especially
 considering it hasn't even been executing at all up to now.

How else would you avoid the incorrect result when jiffies is
indeed zero?

It's two shifts, and this gets scheduled along with the other
instructions on many cpus so it's effectively free.

I don't see why this is even worth mentioning and discussing.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Ilpo Järvinen
On Fri, 21 Dec 2007, Ilpo Järvinen wrote:

 On Fri, 21 Dec 2007, Bill Fink wrote:
 
  If so it seems like a lot of unnecessary
  work just to avoid a 1 in 4 billion event, since it's my understanding
  that the whole tcp_tso_should_defer function is just an optimization
  and not a criticality to the proper functioning of TCP, especially
  considering it hasn't even been executing at all up to now.
 
 It would still be good to not return 1 in that case we didn't flag the 
 deferral, how about adding one additional tick (+comment) in the zero 
 case making tso_deferred == 0 again unambiguously defined, e.g.:
 
   tp-tso_deferred = min_t(u32, jiffies, 1);

Blah, max_t of course :-).


-- 
 i.

Re: TSO trimming question

2007-12-21 Thread Herbert Xu
On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote:

 It's two shifts, and this gets scheduled along with the other
 instructions on many cpus so it's effectively free.
 
 I don't see why this is even worth mentioning and discussing.

I totally agree.  Two shifts are way better than a branch.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] OOPS with NETLINK_FIB_LOOKUP netlink socket

2007-12-21 Thread David Miller
From: Denis V. Lunev [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 12:00:43 +0300

 nl_fib_input re-reuses incoming skb to send the reply. This means that this
 packet will be freed twice, namely in:
 - netlink_unicast_kernel
 - on receive path
 Use clone to send as a cure, the caller is responsible for kfree_skb on error.
 
 Thanks to Alexey Dobryan, who originally found the problem.
 
 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]

What introduced this bug?  This code didn't have this
problem previously.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] OOPS with NETLINK_FIB_LOOKUP netlink socket

2007-12-21 Thread Denis V. Lunev
David Miller wrote:
 From: Denis V. Lunev [EMAIL PROTECTED]
 Date: Fri, 21 Dec 2007 12:00:43 +0300
 
 nl_fib_input re-reuses incoming skb to send the reply. This means that this
 packet will be freed twice, namely in:
 - netlink_unicast_kernel
 - on receive path
 Use clone to send as a cure, the caller is responsible for kfree_skb on 
 error.

 Thanks to Alexey Dobryan, who originally found the problem.

 Signed-off-by: Denis V. Lunev [EMAIL PROTECTED]
 
 What introduced this bug?  This code didn't have this
 problem previously.
 

commit cd40b7d3983c708aabe3d3008ec64ffce56d33b0
Author: Denis V. Lunev [EMAIL PROTECTED]
Date:   Wed Oct 10 21:15:29 2007 -0700
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TCP] IPV6 : Change a divide into a right shift in tcp_v6_send_ack()

2007-12-21 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 08:39:24 +0100

 I didnt chose this path, because David was against changing some
 fields from 'int' to 'unsigned'. If you look in other parts of
 networking, we have many  1 or  2 already there.

I don't remember making this statement, where did I say this?
I'm genuinely curious :-)

But I am indeed against it in cases where the variable is compared
against signed quantities.

I think the shifts are more pretty and more closely match what the
programmer wants to come out of the compiler.  Getting all of these
divides is an awful surprise for me.

I've learned over the years to never explicitly code divides
or multiplies by powers of two and to always use shifts.  As
a result I am never surprised.  In fact I've been burnt every
time I mistakedly didn't use a shift.

Nevertheless, this tcplen arg is always assigned to and used
with unsigned quantities so I'll apply Yoshifuji's version of
the fix.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TCP]: Convert several length variable to unsigned.

2007-12-21 Thread David Miller
From: YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 16:48:11 +0900 (JST)

 Several length variables cannot be negative, so convert int to
 unsigned int.  This also allows us to do sane shift operations
 on those variables.
 
 Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

Applied to net-2.6.25, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] OOPS with NETLINK_FIB_LOOKUP netlink socket

2007-12-21 Thread David Miller
From: Denis V. Lunev [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 12:39:36 +0300

 David Miller wrote:
  What introduced this bug?  This code didn't have this
  problem previously.
 
 commit cd40b7d3983c708aabe3d3008ec64ffce56d33b0
 Author: Denis V. Lunev [EMAIL PROTECTED]
 Date:   Wed Oct 10 21:15:29 2007 -0700

Thank you, I'm going to add this reference to the changeset
comments so that it will help others who look at this fix.

Consider your patch applied, thanks! :-)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SOCK] Avoid integer divides where not necessary in include/net/sock.h

2007-12-21 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 07:18:40 +0100

 Because sk_wmem_queued, sk_sndbuf are signed, a divide per two
 forces compiler to use an integer divide. We can instead use
 a right shift.
 
 SK_STREAM_MEM_QUANTUM deserves to be declared as an unsigned
 quantity, so that sk_stream_pages() and __sk_stream_mem_reclaim()
 can use right shifts instead of integer divides.
 
 Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

Everything that works with SK_STREAM_MEM_QUANTUM is an int.

We do a DIV_ROUND_UP() using SK_STREAM_MEM_QUANTUM and an int.

We do sk-sk_forward_alloc modifications using multiplies on
SK_STREAM_MEM_QUANTUM and an int, sk_forward_alloc is an int
too.

Changing the type of SK_STREAM_MEM_QUANTUM does nothing more than
create an exception in the typing used in these operations for no real
gain, in fact it makes this code's correctness harder to verify.

I'm fine with the rest of your change, so please resubmit this patch
with the type change removed.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fwd: Increasing initial cwnd size

2007-12-21 Thread Sourav Chakraborty
-- Forwarded message --
From: Sourav Chakraborty [EMAIL PROTECTED]
Date: Dec 21, 2007 1:28 PM
Subject: Increasing initial cwnd size
To: [EMAIL PROTECTED]


Hello,

We are trying to test with kernel version 2.6.20,the RFC 3390
implementation of increased initial congestion window size(~4 when
MSS=1460 bytes).We are not getting consistent results
meaning,sometimes we see 4 data pkts sent without an ACK received and
sometimes we see 3.Also we would like to test by setting the initial
cwnd to a value more than 4.Can you guys advise us on how to do
that(where in the code should we change and what)?Also mention if
using linux 2.6.23 kernel instead of  the one we are using,would help
in this work.

Thanks and Regards
Sourav
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [INET] Avoid an integer divide in rt_garbage_collect()

2007-12-21 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 09:20:50 +0100

 Since 'goal' is a signed int, compiler may emit an integer divide
 to compute goal/2.
 
 Using a right shift is OK here and less expensive.
 
 Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

Applied, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 17:29:27 +0800

 On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote:
 
  It's two shifts, and this gets scheduled along with the other
  instructions on many cpus so it's effectively free.
  
  I don't see why this is even worth mentioning and discussing.
 
 I totally agree.  Two shifts are way better than a branch.

We take probably a thousand+ 100+ cycle cache misses in the TCP stack
on big window openning ACKs.

Instead of discussing ways to solve that huge performance killer we're
wanking about two friggin' integer shifts.

It's hilarious isn't it? :-)

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] XFRM: RFC4303 compliant auditing

2007-12-21 Thread David Miller
From: Paul Moore [EMAIL PROTECTED]
Date: Thu, 20 Dec 2007 16:42:25 -0500

 This patch adds a number of new IPsec audit events to meet the auditing
 requirements of RFC4303.  This includes audit hooks for the following events:
 
  * Could not find a valid SA [sections 2.1, 3.4.2]
. xfrm_audit_state_notfound()
. xfrm_audit_state_notfound_simple()
 
  * Sequence number overflow [section 3.3.3]
. xfrm_audit_state_replay_overflow()
 
  * Replayed packet [section 3.4.3]
. xfrm_audit_state_replay()
 
  * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
. xfrm_audit_state_icvfail()
 
 While RFC4304 deals only with ESP most of the changes in this patch apply to
 IPsec in general, i.e. both AH and ESP.  The one case, integrity check
 failure, where ESP specific code had to be modified the same was done to the
 AH code for the sake of consistency.
 
 Signed-off-by: Paul Moore [EMAIL PROTECTED]

This doesn't apply at all to net-2.6.25, in particular
xfrm6_input_addr() doesn't even have a local variable
named xfrm_vec_one let alone the conditional where you're
adding the state notfound audit hook.

Please respin this and the third patch, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TCP] tcp_write_timeout.c cleanup

2007-12-21 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 06:56:17 +0100

 Before submiting a patch to change a divide to a right shift, I felt
 necessary to create a helper function tcp_mtu_probing() to reduce length of 
 lines exceeding 100 chars in tcp_write_timeout().
 
 Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

Thanks for not mixing code cleanups with functional changes,
it makes my life so much easier and make me so much want to
review and apply your patches.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[DCCP] [Patch 1/1] [Bug-Fix]: Need to ignore Ack Vectors on request sockets

2007-12-21 Thread Gerrit Renker
[ACKVEC]: Need to ignore Ack Vectors on request sockets

This fixes an oversight (mine) from an earlier patch and is in principle a 
bug-fix, although the bug will with the current code not become visible.

The issue is that Ack Vectors must not be parsed on request sockets, since
the Ack Vector feature depends on the selection of the (TX) CCID. During the
initial handshake the CCIDs are undefined, and so RFC 4340, 10.3 applies:

 Using CCID-specific options and feature options during a negotiation
  for the corresponding CCID feature is NOT RECOMMENDED [...]

Worse, it is not even possible: when the server receives the Request from the 
client, the CCID and Ack vector features are undefined; when the Ack finalising
the 3-way hanshake arrives, the request socket has not been cloned yet into a
new (full) socket. This order is necessary, since otherwise the newly created
socket would have to be destroyed whenever an option error occurred (a malicious
hacker could simply send garbage options and exploit this).
As a result, it is not feasible to parse Ack Vectors on request sockets, since
their destination (the CCIDs interested in using Ack Vector information) is
undefined. Hence disabled by this patch.

Two more changes suggested itself:
 * replaced magic numbers for CCID-specific options with symbolic constants;
 * replaced local variables `idx' with computation in argument list.

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
---
 include/linux/dccp.h |6 --
 net/dccp/options.c   |   19 +++
 2 files changed, 11 insertions(+), 14 deletions(-)

--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -104,9 +104,10 @@ int dccp_parse_options(struct sock *sk, 
 *
 * CCID-specific options are ignored during connection setup, as
 * negotiation may still be in progress (see RFC 4340, 10.3).
-*
+* The same applies to Ack Vectors, as these depend on the CCID.
 */
-   if (dreq != NULL  opt = 128)
+   if (dreq != NULL  (opt = DCCPO_MIN_RX_CCID_SPECIFIC ||
+   opt == DCCPO_ACK_VECTOR_0 || opt == DCCPO_ACK_VECTOR_1))
goto ignore_option;
 
switch (opt) {
@@ -222,23 +223,17 @@ int dccp_parse_options(struct sock *sk, 
dccp_pr_debug(%s rx opt: ELAPSED_TIME=%d\n,
  dccp_role(sk), elapsed_time);
break;
-   case 128 ... 191: {
-   const u16 idx = value - options;
-
+   case DCCPO_MIN_RX_CCID_SPECIFIC ... DCCPO_MAX_RX_CCID_SPECIFIC:
if (ccid_hc_rx_parse_options(dp-dccps_hc_rx_ccid, sk,
-opt, len, idx,
+opt, len, value - options,
 value) != 0)
goto out_invalid_option;
-   }
break;
-   case 192 ... 255: {
-   const u16 idx = value - options;
-
+   case DCCPO_MIN_TX_CCID_SPECIFIC ... DCCPO_MAX_TX_CCID_SPECIFIC:
if (ccid_hc_tx_parse_options(dp-dccps_hc_tx_ccid, sk,
-opt, len, idx,
+opt, len, value - options,
 value) != 0)
goto out_invalid_option;
-   }
break;
default:
DCCP_CRIT(DCCP(%p): option %d(len=%d) not 
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -165,8 +165,10 @@ enum {
DCCPO_TIMESTAMP_ECHO = 42,
DCCPO_ELAPSED_TIME = 43,
DCCPO_MAX = 45,
-   DCCPO_MIN_CCID_SPECIFIC = 128,
-   DCCPO_MAX_CCID_SPECIFIC = 255,
+   DCCPO_MIN_RX_CCID_SPECIFIC = 128,   /* from sender to receiver */
+   DCCPO_MAX_RX_CCID_SPECIFIC = 191,
+   DCCPO_MIN_TX_CCID_SPECIFIC = 192,   /* from receiver to sender */
+   DCCPO_MAX_TX_CCID_SPECIFIC = 255,
 };
 
 /* DCCP CCIDS */
-- 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Bill Fink
On Fri, 21 Dec 2007, David Miller wrote:

 From: Herbert Xu [EMAIL PROTECTED]
 Date: Fri, 21 Dec 2007 17:29:27 +0800
 
  On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote:
  
   It's two shifts, and this gets scheduled along with the other
   instructions on many cpus so it's effectively free.
   
   I don't see why this is even worth mentioning and discussing.
  
  I totally agree.  Two shifts are way better than a branch.
 
 We take probably a thousand+ 100+ cycle cache misses in the TCP stack
 on big window openning ACKs.
 
 Instead of discussing ways to solve that huge performance killer we're
 wanking about two friggin' integer shifts.
 
 It's hilarious isn't it? :-)

I don't think obfuscated code is hilarious.  Instead of the convoluted
and dense code:

/* Defer for less than two clock ticks. */
if (tp-tso_deferred 
((jiffies  1)  1) - (tp-tso_deferred  1)  1)

You can have the much simpler and more easily understandable:

/* Defer for less than two clock ticks. */
if (tp-tso_deferred  (jiffies - tp-tso_deferred)  1)

And instead of:

/* Ok, it looks like it is advisable to defer.  */
tp-tso_deferred = 1 | (jiffies1);

return 1;

You could do as Ilpo suggested:

/* Ok, it looks like it is advisable to defer.  */
tp-tso_deferred = max_t(u32, jiffies, 1);

return 1;

Or perhaps more efficiently:

/* Ok, it looks like it is advisable to defer.  */
tp-tso_deferred = jiffies;
if (unlikely(jiffies == 0))
tp-tso_deferred = 1;

return 1;

Or perhaps even:

/* Ok, it looks like it is advisable to defer.  */
tp-tso_deferred = jiffies;

/* need to return a non-zero value to defer, which means won't
 * defer if jiffies == 0 but it's only a 1 in 4 billion event
 * (and avoids a compare/branch by not checking jiffies)
 /
return jiffies;

Since it really only needs a non-zero return value to defer.

See, no branches needed and much clearer code.  That seems worthwhile
to me from a code maintenance standpoint, even if it isn't any speed
improvement.

And what about the 64-bit jiffies versus 32-bit tp-tso_deferred issue?
Should tso_deferred be made unsigned long to match jiffies?

-Bill
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Evgeniy Polyakov

2007-12-21 Thread David Miller
From: Evgeniy Polyakov [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 13:34:12 +0300

 Yep, I saw him couple of times and will try to contact.

:-)

 Do not unsubscribe me :)

Ok!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Evgeniy Polyakov

2007-12-21 Thread Evgeniy Polyakov
On Thu, Dec 20, 2007 at 08:34:59PM -0800, David Miller ([EMAIL PROTECTED]) 
wrote:
 
 If someone has a way other than email to contact Evgeniy, could
 you please let him know that his email is bouncing in strange
 ways.

Yep, I saw him couple of times and will try to contact.

 I'll have to unsubscribe him if this goes on much longer, which
 I don't want to do.
 
 Thanks.
 
 Here is some example bounce text:
 
 451 4.0.0 readqf: cannot open ./dflBL48UH3032179: No such file or directory
 552 5.3.4 Message is too large; 1500 bytes max
 554 5.0.0 Service unavailable

This looks really strange for me - I will forward it to system admin of
the university server where I have a mail.
Likely is is because of some troubles with the mail queue or FS...

Do not unsubscribe me :)

-- 
Evgeniy Polyakov
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SOCK] Avoid integer divides where not necessary in include/net/sock.h

2007-12-21 Thread Eric Dumazet
On Fri, 21 Dec 2007 01:55:43 -0800 (PST)
David Miller [EMAIL PROTECTED] wrote:

 From: Eric Dumazet [EMAIL PROTECTED]
 Date: Fri, 21 Dec 2007 07:18:40 +0100
 
  Because sk_wmem_queued, sk_sndbuf are signed, a divide per two
  forces compiler to use an integer divide. We can instead use
  a right shift.
  
  SK_STREAM_MEM_QUANTUM deserves to be declared as an unsigned
  quantity, so that sk_stream_pages() and __sk_stream_mem_reclaim()
  can use right shifts instead of integer divides.
  
  Signed-off-by: Eric Dumazet [EMAIL PROTECTED]
 
 Everything that works with SK_STREAM_MEM_QUANTUM is an int.
 
 We do a DIV_ROUND_UP() using SK_STREAM_MEM_QUANTUM and an int.
 
 We do sk-sk_forward_alloc modifications using multiplies on
 SK_STREAM_MEM_QUANTUM and an int, sk_forward_alloc is an int
 too.
 
 Changing the type of SK_STREAM_MEM_QUANTUM does nothing more than
 create an exception in the typing used in these operations for no real
 gain, in fact it makes this code's correctness harder to verify.
 
 I'm fine with the rest of your change, so please resubmit this patch
 with the type change removed.

I cannot remove underlying divide without telling compiler *something* is 
unsigned,
or adding a new _SHIFT macro

We currently do

 int_value / int_constant

I was suggesting 

 int_value / uint_constant

Since you prefer to keep sk_forward_alloc as signed, 
the only clean (without casts) way is to do :

 int_value  SK_STREAM_MEM_QUANTUM_SHIFT

Please tell me if you are OK with this solution, or if you prefer
I change sk_forward_alloc to be unsigned :)

Thank you 

Here is the patch handling the change on sk_wmem_queued, sk_sndbuf.
Keeping small patches may help future bisection anyway...

[SOCK] Avoid integer divides where not necessary in include/net/sock.h

Because sk_wmem_queued, sk_sndbuf are signed, a divide per two
may force compiler to use an integer divide.

We can instead use a right shift.

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

diff --git a/include/net/sock.h b/include/net/sock.h
index 803d8f2..4456453 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -445,7 +445,7 @@ static inline int sk_acceptq_is_full(struct sock *sk)
  */
 static inline int sk_stream_min_wspace(struct sock *sk)
 {
-   return sk-sk_wmem_queued / 2;
+   return sk-sk_wmem_queued  1;
 }
 
 static inline int sk_stream_wspace(struct sock *sk)
@@ -1187,7 +1187,7 @@ static inline void sk_wake_async(struct sock *sk, int 
how, int band)
 static inline void sk_stream_moderate_sndbuf(struct sock *sk)
 {
if (!(sk-sk_userlocks  SOCK_SNDBUF_LOCK)) {
-   sk-sk_sndbuf = min(sk-sk_sndbuf, sk-sk_wmem_queued / 2);
+   sk-sk_sndbuf = min(sk-sk_sndbuf, sk-sk_wmem_queued  1);
sk-sk_sndbuf = max(sk-sk_sndbuf, SOCK_MIN_SNDBUF);
}
 }
@@ -1211,7 +1211,7 @@ static inline struct page *sk_stream_alloc_page(struct 
sock *sk)
  */
 static inline int sock_writeable(const struct sock *sk) 
 {
-   return atomic_read(sk-sk_wmem_alloc)  (sk-sk_sndbuf / 2);
+   return atomic_read(sk-sk_wmem_alloc)  (sk-sk_sndbuf  1);
 }
 
 static inline gfp_t gfp_any(void)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly

2007-12-21 Thread Bill Fink
On Fri, 21 Dec 2007, YOSHIFUJI Hideaki wrote:

 In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 11:24:54 +0900), Satoru 
 SATOH [EMAIL PROTECTED] says:
 
  2007/12/21, Jarek Poplawski [EMAIL PROTECTED]:
   Jarek Poplawski wrote, On 12/20/2007 09:24 PM:
   ...
  
but since it's your patch, I hope you do some additional checking
if it's always like this...
  
  
   ...or maybe only changing this all a little bit will make it look safer!
  
   Jarek P.
  
  
  OK, how about this?
  
  Signed-off-by: Satoru SATOH [EMAIL PROTECTED]
  
   ip/iproute.c |   12 
   1 files changed, 8 insertions(+), 4 deletions(-)
  
  diff --git a/ip/iproute.c b/ip/iproute.c
  index f4200ae..c771b34 100644
  --- a/ip/iproute.c
  +++ b/ip/iproute.c
  @@ -510,16 +510,20 @@ int print_route(const struct sockaddr_nl *who,
  struct nlmsghdr *n, void *arg)
  fprintf(fp,  %u, 
  *(unsigned*)RTA_DATA(mxrta[i]));
  else {
  unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
  +   unsigned hz1 = hz;
  +   if (hz1  1000)
 
 Why don't you simply use unsigned long long (or maybe uint64_t) here?

I was wondering that too.  And maybe change the (float) cast
to (double) in the fprintf.

-Bill



 Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]
 
 --- 
 diff --git a/ip/iproute.c b/ip/iproute.c
 index f4200ae..db9a3b6 100644
 --- a/ip/iproute.c
 +++ b/ip/iproute.c
 @@ -509,16 +509,21 @@ int print_route(const struct sockaddr_nl *who, struct 
 nlmsghdr *n, void *arg)
   i != RTAX_RTO_MIN)
   fprintf(fp,  %u, 
 *(unsigned*)RTA_DATA(mxrta[i]));
   else {
 - unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
 + unsigned long long val = 
 *(unsigned*)RTA_DATA(mxrta[i]);
 + unsigned div = 1;
  
 - val *= 1000;
   if (i == RTAX_RTT)
 - val /= 8;
 + div = 8;
   else if (i == RTAX_RTTVAR)
 - val /= 4;
 - if (val = hz)
 - fprintf(fp,  %ums, val/hz);
 + div = 4;
   else
 + div = 1;
 +
 + val = val * 1000ULL / div;
 +
 + if (val = hz) {
 + fprintf(fp,  %llums, val/hz);
 + } else
   fprintf(fp,  %.2fms, (float)val/hz);
   }
   }
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT

2007-12-21 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Wed, 19 Dec 2007 15:30:03 +0200 (EET)

 On Wed, 19 Dec 2007, Gavin McCullagh wrote:
 
  Will do.   I gather I should use the latest net- tree in future when
  submitting patches.
 
 Doh, I owe you apology as I was probably too hasty to point you towards 
 net-2.6.25. I suppose this could by considered as fix as well and 
 therefore could probably be accepted to net-2.6 as well, which is for 
 bugfixes only after merge window is closed. But it's Dave how will make 
 such decisions, not me :-), and it's he who gets to deal with all 
 the resulting conflicts ;-) (I added Cc to him).

When Gavin respins the patch I'll look at in the context of submitting
it as a bug fix.  So Gavin please generate the patch against Linus's
vanilla GIT tree or net-2.6, your choise.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SOCK] Avoid integer divides where not necessary in include/net/sock.h

2007-12-21 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 11:40:51 +0100

 On Fri, 21 Dec 2007 01:55:43 -0800 (PST)
 David Miller [EMAIL PROTECTED] wrote:
 
 Please tell me if you are OK with this solution, or if you prefer
 I change sk_forward_alloc to be unsigned :)

When I was playing with this crap a long time ago I
think I remember that sk-sk_forward_alloc can become
negative in some circumstances.

Or maybe that was just a bug :-)

 Here is the patch handling the change on sk_wmem_queued, sk_sndbuf.
 Keeping small patches may help future bisection anyway...
 
 [SOCK] Avoid integer divides where not necessary in include/net/sock.h
 
 Because sk_wmem_queued, sk_sndbuf are signed, a divide per two
 may force compiler to use an integer divide.
 
 We can instead use a right shift.
 
 Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

I'll apply this, thanks Eric.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TCP] IPV6 : Change a divide into a right shift in tcp_v6_send_ack()

2007-12-21 Thread David Miller
From: Jeff Garzik [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 06:26:48 -0500

 YOSHIFUJI Hideaki / 吉藤英明 wrote:
  In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 07:03:58 +0100), Eric 
  Dumazet [EMAIL PROTECTED] says:
  
  Because tot_len is signed in tcp_v6_send_ack(), tot_len/4 forces compiler
  to emit an integer divide, while we can help it to use a right shift,
  less expensive.
  
  Are you really sure?
  At least, gcc-4.1.2-20061115 (debian) does not make any difference.
 
 Quite true -- thus it is a matter of taste to the programmer.

Not true, the code output does change, check your optimize-for-size
kernel config setting.

This was discussed and explained later in this thread, and I also
explained it to you on IRC Jeff ;-)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/10] sysfs network namespace support

2007-12-21 Thread Eric W. Biederman
Greg KH [EMAIL PROTECTED] writes:

 On Sat, Dec 01, 2007 at 02:06:58AM -0700, Eric W. Biederman wrote:
 
 Now that we have network namespace support merged it is time to
 revisit the sysfs support so we can remove the dependency on !SYSFS.

 snip

 Oops, I forgot to apply this to my tree.  Eric, you still want this
 submitted, right?

Yes.

I'm am just about to head out of town to visit my parents over Christmas.
So I'm not going to be very responsive until I after the New Year.

Eric
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] XFRM: RFC4303 compliant auditing

2007-12-21 Thread Paul Moore
On Friday 21 December 2007 4:43:10 am David Miller wrote:
 From: Paul Moore [EMAIL PROTECTED]
 Date: Thu, 20 Dec 2007 16:42:25 -0500

  This patch adds a number of new IPsec audit events to meet the auditing
  requirements of RFC4303.  This includes audit hooks for the following
  events:
 
   * Could not find a valid SA [sections 2.1, 3.4.2]
 . xfrm_audit_state_notfound()
 . xfrm_audit_state_notfound_simple()
 
   * Sequence number overflow [section 3.3.3]
 . xfrm_audit_state_replay_overflow()
 
   * Replayed packet [section 3.4.3]
 . xfrm_audit_state_replay()
 
   * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
 . xfrm_audit_state_icvfail()
 
  While RFC4304 deals only with ESP most of the changes in this patch apply
  to IPsec in general, i.e. both AH and ESP.  The one case, integrity check
  failure, where ESP specific code had to be modified the same was done to
  the AH code for the sake of consistency.
 
  Signed-off-by: Paul Moore [EMAIL PROTECTED]

 This doesn't apply at all to net-2.6.25, in particular
 xfrm6_input_addr() doesn't even have a local variable
 named xfrm_vec_one let alone the conditional where you're
 adding the state notfound audit hook.

 Please respin this and the third patch, thanks.

Sorry about that, I must have missed something (or probably just updated the 
wrong tree on accident).  I'll respin the patches and send them out today.

-- 
paul moore
linux security @ hp
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TCP] tcp_write_timeout.c cleanup

2007-12-21 Thread Arnaldo Carvalho de Melo
Em Fri, Dec 21, 2007 at 04:30:26AM -0800, David Miller escreveu:
 From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
 Date: Fri, 21 Dec 2007 10:23:16 -0200
 
  Em Fri, Dec 21, 2007 at 06:56:17AM +0100, Eric Dumazet escreveu:
   +static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct 
   sock *sk)
   +{
   + int mss;
  -   int mss;
  
   +
   + /* Black hole detection */
   + if (sysctl_tcp_mtu_probing) {
   + if (!icsk-icsk_mtup.enabled) {
   + icsk-icsk_mtup.enabled = 1;
   + tcp_sync_mss(sk, icsk-icsk_pmtu_cookie);
   + } else {
   + struct tcp_sock *tp = tcp_sk(sk);
  +   int mss;
   + mss = tcp_mtu_to_mss(sk, icsk-icsk_mtup.search_low)/2;
 
 I've checked in this, ummm... patch :-)

Thanks!

- Arnaldo
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SOCK] Avoid integer divides where not necessary in include/net/sock.h

2007-12-21 Thread Herbert Xu
David Miller [EMAIL PROTECTED] wrote:

 When I was playing with this crap a long time ago I
 think I remember that sk-sk_forward_alloc can become
 negative in some circumstances.
 
 Or maybe that was just a bug :-)

Yeah we had a few bugs there in the early days of TSO but it's
been quiet for the last 18 months.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TCP] Avoid a divide in tcp_mtu_probing()

2007-12-21 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 14:32:41 +0100

 tcp_mtu_to_mss() being signed, compiler might emit an integer divide
 to compute tcp_mtu_to_mss()/2 .
 
 Using a right shift is OK here and less expensive.
 
 Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

Applied, thanks Eric.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Top 10 kernel oopses/warnings for the week of December 21st 2007

2007-12-21 Thread Andi Kleen
Arjan van de Ven [EMAIL PROTECTED] writes:

   
 Rank 8: __change_page_attr
   BUG at arch/x86/mm/pageattr_64.c:176
   Reported 2 times
   Reported this week for 2.6.24-rc5; history goes back to 2.6.15

There is no BUG on this line on 2.6.24-rc5.   Since there are many
BUG_ONs in this file it is unclear which you mean.

Could you always include the version of the kernel where the actual
oops in the line came from?

Anyways there are a lot of third party modules who do strange 
things with c_p_a(), not always legal, so you might look up out for that 
pattern too. Perhaps report the out of tree modules loaded in the 
summary too?

-Andi

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: After many hours all outbound connections get stuck in SYN_SENT

2007-12-21 Thread James Nichols
 Huh?  Do you mean a PIX blade in a Cisco switch-router chassis? It
 would be very useful if you could be less vague about the
 equipment in use.

Right it's a PIX blade in Cisco chassis.  The PIX is running ASA version 7.0(6)



 That depends more on your customers' networking attributes
 then you are sharing or perhaps even know.  Perhaps your customer
 base is very Window-skewed and you simply aren't seeing any Sack
 Permitted negotiations for the first 37.999 hours. Or
 perhaps you've had a network glitch, and all of your
 connections have done a Selective Ack, which the firewall
 has trashed, leaving all the connections in a wacko state,
 not just a few which you haven't noticed.

I definitely see SACKs over the course of the 38 hours.  I don't have
any Windows hosts, they are all running Linux except for a very small
number that run a proprietary OS and webserver.  If the firewall were
to get trashed and hose the currently active connections, I would
expect that newly initiated connections would work.


 The actual failure mode needs a packet trace to determine,
 but you should be able to do this yourself (or ask your
 local network engineering staff).

 If your firewall is trashing the Sack field, then it needs
 to be fixed.  Time to raise a case with the Cisco TAC and
 ask them directly if your PIX version has bug CSCse14419.
 You can't expect Sack to work when it's being fed trash,
 so it is important to make sure that is not happening.

I've pinged our dude that handles the PIX stuff to see about getting
an upgrade to 7.0(7).  I should be able to get a packet trace, but it
may take some time.  At this point I'm getting a lot of resistance and
people here telling me to just turn SACK off and not worry about what
is causing this issue, but I'd really like to get to the bottom of it.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] XFRM: RFC4303 compliant auditing

2007-12-21 Thread Paul Moore
On Friday 21 December 2007 8:27:23 am Paul Moore wrote:
 On Friday 21 December 2007 4:43:10 am David Miller wrote:
  From: Paul Moore [EMAIL PROTECTED]
  Date: Thu, 20 Dec 2007 16:42:25 -0500
 
   This patch adds a number of new IPsec audit events to meet the auditing
   requirements of RFC4303.  This includes audit hooks for the following
   events:
  
* Could not find a valid SA [sections 2.1, 3.4.2]
  . xfrm_audit_state_notfound()
  . xfrm_audit_state_notfound_simple()
  
* Sequence number overflow [section 3.3.3]
  . xfrm_audit_state_replay_overflow()
  
* Replayed packet [section 3.4.3]
  . xfrm_audit_state_replay()
  
* Integrity check failure [sections 3.4.4.1, 3.4.4.2]
  . xfrm_audit_state_icvfail()
  
   While RFC4304 deals only with ESP most of the changes in this patch
   apply to IPsec in general, i.e. both AH and ESP.  The one case,
   integrity check failure, where ESP specific code had to be modified the
   same was done to the AH code for the sake of consistency.
  
   Signed-off-by: Paul Moore [EMAIL PROTECTED]
 
  This doesn't apply at all to net-2.6.25, in particular
  xfrm6_input_addr() doesn't even have a local variable
  named xfrm_vec_one let alone the conditional where you're
  adding the state notfound audit hook.
 
  Please respin this and the third patch, thanks.

 Sorry about that, I must have missed something (or probably just updated
 the wrong tree on accident).  I'll respin the patches and send them out
 today.

Ah, looks like I may not be crazy after all!  It looks like the XFRM patches 
from Masahide NAKAMURA were pulled into net-2.6.25 just before mine last 
night which caused my patches to conflict ...

-- 
paul moore
linux security @ hp
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly

2007-12-21 Thread Satoru SATOH
I agree.

I mistakenly thought hz in that context must be larger than 1000..
As it's uncertain, your's looks much simpler and better.

(btw, the lines else  div = 1 is not needed, is it?)

Thanks,
Satoru SATOH

2007/12/21, YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED]:
(snip)
 Why don't you simply use unsigned long long (or maybe uint64_t) here?

 Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

 ---
 diff --git a/ip/iproute.c b/ip/iproute.c
 index f4200ae..db9a3b6 100644
 --- a/ip/iproute.c
 +++ b/ip/iproute.c
 @@ -509,16 +509,21 @@ int print_route(const struct sockaddr_nl *who, struct 
 nlmsghdr *n, void *arg)
 i != RTAX_RTO_MIN)
 fprintf(fp,  %u, 
 *(unsigned*)RTA_DATA(mxrta[i]));
 else {
 -   unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
 +   unsigned long long val = 
 *(unsigned*)RTA_DATA(mxrta[i]);
 +   unsigned div = 1;

 -   val *= 1000;
 if (i == RTAX_RTT)
 -   val /= 8;
 +   div = 8;
 else if (i == RTAX_RTTVAR)
 -   val /= 4;
 -   if (val = hz)
 -   fprintf(fp,  %ums, val/hz);
 +   div = 4;
 else
 +   div = 1;
 +
 +   val = val * 1000ULL / div;
 +
 +   if (val = hz) {
 +   fprintf(fp,  %llums, val/hz);
 +   } else
 fprintf(fp,  %.2fms, (float)val/hz);
 }
 }

 --yoshfuji
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[TCP] Avoid a divide in tcp_mtu_probing()

2007-12-21 Thread Eric Dumazet
tcp_mtu_to_mss() being signed, compiler might emit an integer divide
to compute tcp_mtu_to_mss()/2 .

Using a right shift is OK here and less expensive.

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index ea111e9..ea85bc0 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -125,7 +125,7 @@ static void tcp_mtu_probing(struct inet_connection_sock 
*icsk, struct sock *sk)
struct tcp_sock *tp = tcp_sk(sk);
int mss;
 
-   mss = tcp_mtu_to_mss(sk, icsk-icsk_mtup.search_low)/2;
+   mss = tcp_mtu_to_mss(sk, icsk-icsk_mtup.search_low)  
1;
mss = min(sysctl_tcp_base_mss, mss);
mss = max(mss, 68 - tp-tcp_header_len);
icsk-icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TCP] tcp_write_timeout.c cleanup

2007-12-21 Thread David Miller
From: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 10:23:16 -0200

 Em Fri, Dec 21, 2007 at 06:56:17AM +0100, Eric Dumazet escreveu:
  +static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock 
  *sk)
  +{
  +   int mss;
 -   int mss;
   
  +
  +   /* Black hole detection */
  +   if (sysctl_tcp_mtu_probing) {
  +   if (!icsk-icsk_mtup.enabled) {
  +   icsk-icsk_mtup.enabled = 1;
  +   tcp_sync_mss(sk, icsk-icsk_pmtu_cookie);
  +   } else {
  +   struct tcp_sock *tp = tcp_sk(sk);
 + int mss;
  +   mss = tcp_mtu_to_mss(sk, icsk-icsk_mtup.search_low)/2;

I've checked in this, ummm... patch :-)

commit 323f3f2f31840f94e540ec5a0ce33593d05dd8d9
Author: David S. Miller [EMAIL PROTECTED]
Date:   Fri Dec 21 04:29:16 2007 -0800

[TCP]: Move mss variable in tcp_mtu_probing()

Down into the only scope where it is used.

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 8f14808..ea111e9 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -116,8 +116,6 @@ static int tcp_orphan_retries(struct sock *sk, int alive)
 
 static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
 {
-   int mss;
-
/* Black hole detection */
if (sysctl_tcp_mtu_probing) {
if (!icsk-icsk_mtup.enabled) {
@@ -125,6 +123,8 @@ static void tcp_mtu_probing(struct inet_connection_sock 
*icsk, struct sock *sk)
tcp_sync_mss(sk, icsk-icsk_pmtu_cookie);
} else {
struct tcp_sock *tp = tcp_sk(sk);
+   int mss;
+
mss = tcp_mtu_to_mss(sk, icsk-icsk_mtup.search_low)/2;
mss = min(sysctl_tcp_base_mss, mss);
mss = max(mss, 68 - tp-tcp_header_len);
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TCP] tcp_write_timeout.c cleanup

2007-12-21 Thread Arnaldo Carvalho de Melo
Em Fri, Dec 21, 2007 at 06:56:17AM +0100, Eric Dumazet escreveu:
 Before submiting a patch to change a divide to a right shift, I felt
 necessary to create a helper function tcp_mtu_probing() to reduce length of 
 lines exceeding 100 chars in tcp_write_timeout().

 Signed-off-by: Eric Dumazet [EMAIL PROTECTED]


 diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
 index d8970ec..8f14808 100644
 --- a/net/ipv4/tcp_timer.c
 +++ b/net/ipv4/tcp_timer.c
 @@ -114,13 +114,31 @@ static int tcp_orphan_retries(struct sock *sk, int 
 alive)
   return retries;
  }
  
 +static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock 
 *sk)
 +{
 + int mss;
-   int mss;

 +
 + /* Black hole detection */
 + if (sysctl_tcp_mtu_probing) {
 + if (!icsk-icsk_mtup.enabled) {
 + icsk-icsk_mtup.enabled = 1;
 + tcp_sync_mss(sk, icsk-icsk_pmtu_cookie);
 + } else {
 + struct tcp_sock *tp = tcp_sk(sk);
+   int mss;
 + mss = tcp_mtu_to_mss(sk, icsk-icsk_mtup.search_low)/2;
 + mss = min(sysctl_tcp_base_mss, mss);
 + mss = max(mss, 68 - tp-tcp_header_len);
 + icsk-icsk_mtup.search_low = tcp_mss_to_mtu(sk, mss);
 + tcp_sync_mss(sk, icsk-icsk_pmtu_cookie);
 + }
 + }
 +}

:-)

- Arnaldo
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TCP] IPV6 : Change a divide into a right shift in tcp_v6_send_ack()

2007-12-21 Thread Jeff Garzik

David Miller wrote:

From: Jeff Garzik [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 06:26:48 -0500


YOSHIFUJI Hideaki / 吉藤英明 wrote:

In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 07:03:58 +0100), Eric Dumazet 
[EMAIL PROTECTED] says:


Because tot_len is signed in tcp_v6_send_ack(), tot_len/4 forces compiler
to emit an integer divide, while we can help it to use a right shift,
less expensive.

Are you really sure?
At least, gcc-4.1.2-20061115 (debian) does not make any difference.

Quite true -- thus it is a matter of taste to the programmer.


Not true, the code output does change, check your optimize-for-size
kernel config setting.

This was discussed and explained later in this thread, and I also
explained it to you on IRC Jeff ;-)


[looks]

For signed integers, this is true.

For unsigned integers, the code output is the same, regardless of 
optimization setting.


Jeff


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [TCP]: Remove seq_rtt ptr from clean_rtx_queue args

2007-12-21 Thread Ilpo Järvinen

While checking Gavin's patch I noticed that the returned seq_rtt
is not used by the caller.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
I think it shouldn't introduce new conflicts between net-2.6.25 and 
net-2.6 (tested with git-am -3 to both).

 net/ipv4/tcp_input.c |7 ++-
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0f0c1c9..a020e83 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2638,8 +2638,7 @@ static u32 tcp_tso_acked(struct sock *sk, struct sk_buff 
*skb)
  * is before the ack sequence we can discard it as it's confirmed to have
  * arrived at the other end.
  */
-static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
-  int prior_fackets)
+static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets)
 {
struct tcp_sock *tp = tcp_sk(sk);
const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -2803,7 +2802,6 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 
*seq_rtt_p,
}
}
 #endif
-   *seq_rtt_p = seq_rtt;
return flag;
 }
 
@@ -3044,7 +3042,6 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, 
int flag)
u32 ack = TCP_SKB_CB(skb)-ack_seq;
u32 prior_in_flight;
u32 prior_fackets;
-   s32 seq_rtt;
int prior_packets;
int frto_cwnd = 0;
 
@@ -3111,7 +3108,7 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, 
int flag)
prior_in_flight = tcp_packets_in_flight(tp);
 
/* See if we can take anything off of the retransmit queue. */
-   flag |= tcp_clean_rtx_queue(sk, seq_rtt, prior_fackets);
+   flag |= tcp_clean_rtx_queue(sk, prior_fackets);
 
if (tp-frto_counter)
frto_cwnd = tcp_process_frto(sk, flag);
-- 
1.5.0.6


Re: [PATCH 2/3] XFRM: RFC4303 compliant auditing

2007-12-21 Thread Paul Moore
On Friday 21 December 2007 9:02:41 am David Miller wrote:
 From: Paul Moore [EMAIL PROTECTED]
 Date: Fri, 21 Dec 2007 08:51:22 -0500

  Ah, looks like I may not be crazy after all!  It looks like the XFRM
  patches from Masahide NAKAMURA were pulled into net-2.6.25 just before
  mine last night which caused my patches to conflict ...

 Sorry.  I had double checked and was going to rework your patches
 by hand if that was the case, but for some reason I didn't see
 his patches as being the culprit causing the rejects when I
 checked things out.

 I must have looked over the history correctly.

No harm, no foul.  I've just about finished respining the patches, expect them 
shortly ...

-- 
paul moore
linux security @ hp
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch net-2.6.25 0/2][NETNS] prepare the neigh subsys to handle network namespaces v2

2007-12-21 Thread Daniel Lezcano
These two patches are coming from Eric Biederman patchset
to make the neighbouring aware of the namespaces.

The description in the patch headers is big enough to add
more comments here  :) 

Changelog:
 v2 :
fixed bad coding style
 v1 : 
initial post

-- 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch net-2.6.25 1/2][NETNS] net: Modify the neighbour table code so it handles multiple network namespaces

2007-12-21 Thread Daniel Lezcano
I'm actually surprised at how much was involved.  At first glance it appears
that the neighbour table data structures are already split by network device
so all that should be needed is to modify the user interface commands
to filter the set of neighbours by the network namespace of their devices.

However a couple things turned up while I was reading through the code.
The proxy neighbour table allows entries with no network device, and
the neighbour parms are per network device (except for the defaults)
so they now need a per network namespace default.

So I updated the two structures (which surprised me) with their very
own network namespace parameter.  Updated the relevant lookup and
destroy routines with a network namespace parameter and modified
the code that interacts with users to filter out neighbour
table entries for devices of other namespaces.

I'm a little concerned that we can modify and display the global
table configuration and from all network namespaces.  But
this appears good enough for now.

I keep thinking modifying the neighbour table to have per network
namespace instances of each table type would should be cleaner.  The
hash table is already dynamically sized so there are it is not a limiter.
The default parameter would be straight forward to take care of.  However
when I look at the how the network table is built and used I still find
some assumptions that there is only a single neighbour table for each
type of table in the kernel.  The netlink operations, neigh_seq_start,
the non-core network users that call neigh_lookup.  So while it might
be doable it would require more refactoring than my current approach
of just doing a little extra filtering in the code.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/net/neighbour.h |   10 ++--
 net/atm/clip.c  |   15 +-
 net/core/neighbour.c|  116 +---
 net/decnet/dn_neigh.c   |6 +-
 net/decnet/dn_route.c   |2 
 net/ipv4/arp.c  |   12 ++--
 net/ipv6/ip6_output.c   |2 
 net/ipv6/ndisc.c|4 -
 8 files changed, 106 insertions(+), 61 deletions(-)

Index: net-2.6.25/include/net/neighbour.h
===
--- net-2.6.25.orig/include/net/neighbour.h
+++ net-2.6.25/include/net/neighbour.h
@@ -34,6 +34,7 @@ struct neighbour;
 
 struct neigh_parms
 {
+   struct net *net;
struct net_device *dev;
struct neigh_parms *next;
int (*neigh_setup)(struct neighbour *);
@@ -126,7 +127,8 @@ struct neigh_ops
 struct pneigh_entry
 {
struct pneigh_entry *next;
-   struct net_device   *dev;
+   struct net  *net;
+   struct net_device   *dev;
u8  flags;
u8  key[0];
 };
@@ -187,6 +189,7 @@ extern struct neighbour *   neigh_lookup(s
 const void *pkey,
 struct net_device *dev);
 extern struct neighbour *  neigh_lookup_nodev(struct neigh_table *tbl,
+  struct net *net,
   const void *pkey);
 extern struct neighbour *  neigh_create(struct neigh_table *tbl,
 const void *pkey,
@@ -211,8 +214,8 @@ extern unsigned longneigh_rand_reach_t
 
 extern voidpneigh_enqueue(struct neigh_table *tbl, struct 
neigh_parms *p,
   struct sk_buff *skb);
-extern struct pneigh_entry *pneigh_lookup(struct neigh_table *tbl, const 
void *key, struct net_device *dev, int creat);
-extern int pneigh_delete(struct neigh_table *tbl, const 
void *key, struct net_device *dev);
+extern struct pneigh_entry *pneigh_lookup(struct neigh_table *tbl, struct 
net *net, const void *key, struct net_device *dev, int creat);
+extern int pneigh_delete(struct neigh_table *tbl, struct 
net *net, const void *key, struct net_device *dev);
 
 extern void neigh_app_ns(struct neighbour *n);
 extern void neigh_for_each(struct neigh_table *tbl, void (*cb)(struct 
neighbour *, void *), void *cookie);
@@ -220,6 +223,7 @@ extern void __neigh_for_each_release(str
 extern void pneigh_for_each(struct neigh_table *tbl, void (*cb)(struct 
pneigh_entry *));
 
 struct neigh_seq_state {
+   struct net *net;
struct neigh_table *tbl;
void *(*neigh_sub_iter)(struct neigh_seq_state *state,
struct neighbour *n, loff_t *pos);
Index: net-2.6.25/net/atm/clip.c
===
--- net-2.6.25.orig/net/atm/clip.c
+++ net-2.6.25/net/atm/clip.c
@@ -949,6 +949,11 @@ static int arp_seq_open(struct inode *in
 
seq = file-private_data;

[SOCK] Avoid divides in sk_stream_pages() and __sk_stream_mem_reclaim()

2007-12-21 Thread Eric Dumazet
Hi David

This is the last one of this boring patch suite.

I prefered to not change sk_forward_alloc type, I prefer to wait 
18 months more before taking this responsibility :)

Merry Christmas everybody

Eric

[SOCK] Avoid divides in sk_stream_pages() and __sk_stream_mem_reclaim()

sk_forward_alloc being signed, we should take care of divides by
SK_STREAM_MEM_QUANTUM we do in sk_stream_pages() and 
__sk_stream_mem_reclaim()

This patchs introduces SK_STREAM_MEM_QUANTUM_SHIFT, defined
as ilog2(SK_STREAM_MEM_QUANTUM), to be able to use right
shifts instead of plain divides.

This should help compiler to choose right shifts instead of
expensive divides (as seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y on x86)

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

diff --git a/include/net/sock.h b/include/net/sock.h
index 4456453..a134be1 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -716,10 +716,11 @@ extern void __sk_stream_mem_reclaim(struct sock *sk);
 extern int sk_stream_mem_schedule(struct sock *sk, int size, int kind);
 
 #define SK_STREAM_MEM_QUANTUM ((int)PAGE_SIZE)
+#define SK_STREAM_MEM_QUANTUM_SHIFT ilog2(SK_STREAM_MEM_QUANTUM)
 
 static inline int sk_stream_pages(int amt)
 {
-   return DIV_ROUND_UP(amt, SK_STREAM_MEM_QUANTUM);
+   return (amt + SK_STREAM_MEM_QUANTUM - 1)  SK_STREAM_MEM_QUANTUM_SHIFT;
 }
 
 static inline void sk_stream_mem_reclaim(struct sock *sk)
diff --git a/net/core/stream.c b/net/core/stream.c
index 5586879..bf188ff 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -196,7 +196,7 @@ EXPORT_SYMBOL(sk_stream_error);
 
 void __sk_stream_mem_reclaim(struct sock *sk)
 {
-   atomic_sub(sk-sk_forward_alloc / SK_STREAM_MEM_QUANTUM,
+   atomic_sub(sk-sk_forward_alloc  SK_STREAM_MEM_QUANTUM_SHIFT,
   sk-sk_prot-memory_allocated);
sk-sk_forward_alloc = SK_STREAM_MEM_QUANTUM - 1;
if (*sk-sk_prot-memory_pressure 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] IPv4: Fix a typo in tcp_v4_do_calc_md5_hash() which prevents compilation

2007-12-21 Thread Paul Moore
It's spelled unsigned.

Signed-off-by: Paul Moore [EMAIL PROTECTED]
---

 net/ipv4/tcp_ipv4.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 9cb92b8..9aea88b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1020,7 +1020,7 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char 
__user *optval,
 static int tcp_v4_do_calc_md5_hash(char *md5_hash, struct tcp_md5sig_key *key,
   __be32 saddr, __be32 daddr,
   struct tcphdr *th, int protocol,
-  unsinged int tcplen)
+  unsigned int tcplen)
 {
struct scatterlist sg[4];
__u16 data_len;

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly

2007-12-21 Thread YOSHIFUJI Hideaki / 吉藤英明
In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 22:49:59 +0900), Satoru 
SATOH [EMAIL PROTECTED] says:

 I agree.
 
 I mistakenly thought hz in that context must be larger than 1000..
 As it's uncertain, your's looks much simpler and better.
 
 (btw, the lines else  div = 1 is not needed, is it?)

Simplest fix is as follows:

Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]
--
diff --git a/ip/iproute.c b/ip/iproute.c
index f4200ae..7a885b0 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -509,7 +509,7 @@ int print_route(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
i != RTAX_RTO_MIN)
fprintf(fp,  %u, 
*(unsigned*)RTA_DATA(mxrta[i]));
else {
-   unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
+   unsigned long long val = 
*(unsigned*)RTA_DATA(mxrta[i]);
 
val *= 1000;
if (i == RTAX_RTT)
@@ -517,7 +517,7 @@ int print_route(const struct sockaddr_nl *who, struct 
nlmsghdr *n, void *arg)
else if (i == RTAX_RTTVAR)
val /= 4;
if (val = hz)
-   fprintf(fp,  %ums, val/hz);
+   fprintf(fp,  %llums, val/hz);
else
fprintf(fp,  %.2fms, (float)val/hz);
}

-- 
YOSHIFUJI Hideaki @ USAGI Project  [EMAIL PROTECTED]
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT

2007-12-21 Thread Gavin McCullagh
On Fri, 21 Dec 2007, David Miller wrote:

 When Gavin respins the patch I'll look at in the context of submitting
 it as a bug fix.  So Gavin please generate the patch against Linus's
 vanilla GIT tree or net-2.6, your choise.

The existing patch was against Linus' linux-2.6.git from a few days ago so
I've updated my tree and regenerated the patch (below).  Is that the right
one?

I'm just checking through the existing CA modules.  I don't see the rtt
used for RTO anywhere.  This is what I gather they're each using rtt for.

tcp_highspeed.c doesn't implement .pkts_acked
tcp_hybla.c doesn't implement .pkts_acked
tcp_scalable.c  doesn't implement .pkts_acked
tcp_bic.c   ignores rtt value from .pkts_acked

tcp_lp.cseems to ignore rtt value from .pkts_acked (despite setting
TCP_CONG_RTT_STAMP for high res rtts -- why?)
tcp_vegas.c uses high res rtt to measure congestion signal, increase,
backoff -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt
tcp_veno.c  uses high res rtt to measure congestion signal, increase,
backoff -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt
tcp_yeah.c  uses high res rtt to measure congestion signal, increase,
backoff -- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt
tcp_illinois.c  uses rtt to scale increase, backoff 
-- TCP_CONG_RTT_STAMP set so doesn't use seq_rtt

tcp_htcp.c  uses rtt to scale increase, backoff 
tcp_cubic.c uses rtt to scale increase, backoff
tcp_westwood.c  scales backoff using rtt

So as far as I can tell, timeout stuff is not ever altered using
pkts_acked() so I guess this fix only affects westwood, htcp and cubic just
now.

I need to re-read properly, but I think the same problem affects the
microsecond values where TCP_CONG_RTT_STAMP is set (used by vegas, veno,
yeah, illinois).  I might follow up with another patch which changes the
behaviour where TCP_CONG_RTT_STAMP when I'm more sure of that.

Thanks,
Gavin


Signed-off-by: Gavin McCullagh [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 889c893..6fb7989 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2651,6 +2651,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 
*seq_rtt_p,
u32 cnt = 0;
u32 reord = tp-packets_out;
s32 seq_rtt = -1;
+   s32 ca_seq_rtt = -1;
ktime_t last_ackt = net_invalid_timestamp();
 
while ((skb = tcp_write_queue_head(sk))  skb != tcp_send_head(sk)) {
@@ -2686,13 +2687,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 
*seq_rtt_p,
if (sacked  TCPCB_SACKED_RETRANS)
tp-retrans_out -= packets_acked;
flag |= FLAG_RETRANS_DATA_ACKED;
+   ca_seq_rtt = -1;
seq_rtt = -1;
if ((flag  FLAG_DATA_ACKED) ||
(packets_acked  1))
flag |= FLAG_NONHEAD_RETRANS_ACKED;
} else {
+   ca_seq_rtt = now - scb-when;
if (seq_rtt  0) {
-   seq_rtt = now - scb-when;
+   seq_rtt = ca_seq_rtt;
if (fully_acked)
last_ackt = skb-tstamp;
}
@@ -2709,8 +2712,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 
*seq_rtt_p,
!before(end_seq, tp-snd_up))
tp-urg_mode = 0;
} else {
+   ca_seq_rtt = now - scb-when;
if (seq_rtt  0) {
-   seq_rtt = now - scb-when;
+   seq_rtt = ca_seq_rtt;
if (fully_acked)
last_ackt = skb-tstamp;
}
@@ -2772,8 +2776,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 
*seq_rtt_p,
 net_invalid_timestamp()))
rtt_us = 
ktime_us_delta(ktime_get_real(),
last_ackt);
-   else if (seq_rtt  0)
-   rtt_us = jiffies_to_usecs(seq_rtt);
+   else if (ca_seq_rtt  0)
+   rtt_us = jiffies_to_usecs(ca_seq_rtt);
}
 
ca_ops-pkts_acked(sk, pkts_acked, rtt_us);
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] XFRM: RFC4303 compliant auditing

2007-12-21 Thread Paul Moore
This patch adds a number of new IPsec audit events to meet the auditing
requirements of RFC4303.  This includes audit hooks for the following events:

 * Could not find a valid SA [sections 2.1, 3.4.2]
   . xfrm_audit_state_notfound()
   . xfrm_audit_state_notfound_simple()

 * Sequence number overflow [section 3.3.3]
   . xfrm_audit_state_replay_overflow()

 * Replayed packet [section 3.4.3]
   . xfrm_audit_state_replay()

 * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
   . xfrm_audit_state_icvfail()

While RFC4304 deals only with ESP most of the changes in this patch apply to
IPsec in general, i.e. both AH and ESP.  The one case, integrity check
failure, where ESP specific code had to be modified the same was done to the
AH code for the sake of consistency.

Signed-off-by: Paul Moore [EMAIL PROTECTED]
---

 include/net/xfrm.h |   33 --
 net/ipv4/ah4.c |4 +
 net/ipv4/esp4.c|1 
 net/ipv6/ah6.c |2 -
 net/ipv6/esp6.c|1 
 net/ipv6/xfrm6_input.c |1 
 net/xfrm/xfrm_input.c  |3 +
 net/xfrm/xfrm_output.c |2 +
 net/xfrm/xfrm_policy.c |   14 ++--
 net/xfrm/xfrm_state.c  |  153 +++-
 10 files changed, 180 insertions(+), 34 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 2ed25e7..9141bbd 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -565,26 +565,33 @@ struct xfrm_audit
 };
 
 #ifdef CONFIG_AUDITSYSCALL
-static inline struct audit_buffer *xfrm_audit_start(u32 auid, u32 secid)
+static inline struct audit_buffer *xfrm_audit_start(const char *op)
 {
struct audit_buffer *audit_buf = NULL;
-   char *secctx;
-   u32 secctx_len;
 
+   if (audit_enabled == 0)
+   return NULL;
audit_buf = audit_log_start(current-audit_context, GFP_ATOMIC,
- AUDIT_MAC_IPSEC_EVENT);
+   AUDIT_MAC_IPSEC_EVENT);
if (audit_buf == NULL)
return NULL;
+   audit_log_format(audit_buf, op=%s, op);
+   return audit_buf;
+}
 
-   audit_log_format(audit_buf, auid=%u, auid);
+static inline void xfrm_audit_helper_usrinfo(u32 auid, u32 secid,
+struct audit_buffer *audit_buf)
+{
+   char *secctx;
+   u32 secctx_len;
 
+   audit_log_format(audit_buf,  auid=%u, auid);
if (secid != 0 
security_secid_to_secctx(secid, secctx, secctx_len) == 0) {
audit_log_format(audit_buf,  subj=%s, secctx);
security_release_secctx(secctx, secctx_len);
} else
audit_log_task_context(audit_buf);
-   return audit_buf;
 }
 
 extern void xfrm_audit_policy_add(struct xfrm_policy *xp, int result,
@@ -595,11 +602,22 @@ extern void xfrm_audit_state_add(struct xfrm_state *x, 
int result,
 u32 auid, u32 secid);
 extern void xfrm_audit_state_delete(struct xfrm_state *x, int result,
u32 auid, u32 secid);
+extern void xfrm_audit_state_replay_overflow(struct xfrm_state *x,
+struct sk_buff *skb);
+extern void xfrm_audit_state_notfound_simple(struct sk_buff *skb, u16 family);
+extern void xfrm_audit_state_notfound(struct sk_buff *skb, u16 family,
+ __be32 net_spi, __be32 net_seq);
+extern void xfrm_audit_state_icvfail(struct xfrm_state *x,
+struct sk_buff *skb, u8 proto);
 #else
 #define xfrm_audit_policy_add(x, r, a, s)  do { ; } while (0)
 #define xfrm_audit_policy_delete(x, r, a, s)   do { ; } while (0)
 #define xfrm_audit_state_add(x, r, a, s)   do { ; } while (0)
 #define xfrm_audit_state_delete(x, r, a, s)do { ; } while (0)
+#define xfrm_audit_state_replay_overflow(x, s) do { ; } while (0)
+#define xfrm_audit_state_notfound_simple(s, f) do { ; } while (0)
+#define xfrm_audit_state_notfound(s, f, sp, sq)do { ; } while (0)
+#define xfrm_audit_state_icvfail(x, s, p)  do { ; } while (0)
 #endif /* CONFIG_AUDITSYSCALL */
 
 static inline void xfrm_pol_hold(struct xfrm_policy *policy)
@@ -1214,7 +1232,8 @@ extern int xfrm_state_delete(struct xfrm_state *x);
 extern int xfrm_state_flush(u8 proto, struct xfrm_audit *audit_info);
 extern void xfrm_sad_getinfo(struct xfrmk_sadinfo *si);
 extern void xfrm_spd_getinfo(struct xfrmk_spdinfo *si);
-extern int xfrm_replay_check(struct xfrm_state *x, __be32 seq);
+extern int xfrm_replay_check(struct xfrm_state *x,
+struct sk_buff *skb, __be32 seq);
 extern void xfrm_replay_advance(struct xfrm_state *x, __be32 seq);
 extern void xfrm_replay_notify(struct xfrm_state *x, int event);
 extern int xfrm_state_mtu(struct xfrm_state *x, int mtu);
diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index d76803a..ec8de0a 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -179,8 +179,10 @@ static int ah_input(struct 

[PATCH 0/2] XFRM auditing patch rebased

2007-12-21 Thread Paul Moore
Based on the net-2.6.25 tree from about an hour ago.  The first patch was
dropped because it is already applied.

-- 
paul moore
linux security @ hp

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[XFRM]: Fix outbound statistics.

2007-12-21 Thread Masahide NAKAMURA
David, I failed to include this statistics codes since I didn't notice
the conflict with the latest Herbert XFRM fix.

Please apply this, too.

Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
---
 net/xfrm/xfrm_output.c |   12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 867484a..8dee031 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -43,17 +43,23 @@ static int xfrm_output_one(struct sk_buff *skb, int err)
 
do {
err = xfrm_state_check_space(x, skb);
-   if (err)
+   if (err) {
goto error_nolock;
+   XFRM_INC_STATS(LINUX_MIB_XFRMOUTERROR);
+   }
 
err = x-outer_mode-output(x, skb);
-   if (err)
+   if (err) {
+   XFRM_INC_STATS(LINUX_MIB_XFRMOUTSTATEMODEERROR);
goto error_nolock;
+   }
 
spin_lock_bh(x-lock);
err = xfrm_state_check_expire(x);
-   if (err)
+   if (err) {
+   XFRM_INC_STATS(LINUX_MIB_XFRMOUTSTATEEXPIRED);
goto error;
+   }
 
if (x-type-flags  XFRM_TYPE_REPLAY_PROT) {
XFRM_SKB_CB(skb)-seq = ++x-replay.oseq;
-- 
1.4.4.2

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IPv4: Fix a typo in tcp_v4_do_calc_md5_hash() which prevents compilation

2007-12-21 Thread David Miller
From: Paul Moore [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 08:47:13 -0500

 It's spelled unsigned.
 
 Signed-off-by: Paul Moore [EMAIL PROTECTED]

Applied, thanks.

I swore I did an allmodconfig test build with that change
added, looks like I didn't :-/
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] XFRM: RFC4303 compliant auditing

2007-12-21 Thread David Miller
From: Paul Moore [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 08:51:22 -0500

 Ah, looks like I may not be crazy after all!  It looks like the XFRM patches 
 from Masahide NAKAMURA were pulled into net-2.6.25 just before mine last 
 night which caused my patches to conflict ...

Sorry.  I had double checked and was going to rework your patches
by hand if that was the case, but for some reason I didn't see
his patches as being the culprit causing the rejects when I
checked things out.

I must have looked over the history correctly.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TCP] Avoid two divides in __tcp_grow_window()

2007-12-21 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 15:05:51 +0100

 tcp_win_from_space() being signed, compiler might emit an integer divide
 to compute tcp_win_from_space()/2 .
 
 Using right shifts is OK here and less expensive.
 
 Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

Applied, thanks Eric.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT

2007-12-21 Thread David Miller
From: Gavin McCullagh [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 13:31:06 +

 On Fri, 21 Dec 2007, David Miller wrote:
 
  When Gavin respins the patch I'll look at in the context of submitting
  it as a bug fix.  So Gavin please generate the patch against Linus's
  vanilla GIT tree or net-2.6, your choise.
 
 The existing patch was against Linus' linux-2.6.git from a few days ago so
 I've updated my tree and regenerated the patch (below).  Is that the right
 one?

Yep, it is.

 I need to re-read properly, but I think the same problem affects the
 microsecond values where TCP_CONG_RTT_STAMP is set (used by vegas, veno,
 yeah, illinois).  I might follow up with another patch which changes the
 behaviour where TCP_CONG_RTT_STAMP when I'm more sure of that.

Ok, let us know what you find.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [XFRM]: Fix outbound statistics.

2007-12-21 Thread Herbert Xu
On Fri, Dec 21, 2007 at 11:25:00PM +0900, Masahide NAKAMURA wrote:

   do {
   err = xfrm_state_check_space(x, skb);
 - if (err)
 + if (err) {
 + XFRM_INC_STATS(LINUX_MIB_XFRMOUTERROR);
   goto error_nolock;
 + }
  
   err = x-outer_mode-output(x, skb);
 - if (err)
 + if (err) {
 + XFRM_INC_STATS(LINUX_MIB_XFRMOUTSTATEMODEERROR);

BTW, none of our existing mode output functions actually return
an error.  I noticed that the description for this field is actually
Transformation mode specific error, e.g. Outer header space is not
enough.  This is slightly misleading as output header space is
checked by xfrm_state_check_space so if there's an error that's
where it'll show up.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[TCP] Avoid two divides in __tcp_grow_window()

2007-12-21 Thread Eric Dumazet
tcp_win_from_space() being signed, compiler might emit an integer divide
to compute tcp_win_from_space()/2 .

Using right shifts is OK here and less expensive.

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6931946..145b51a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -289,8 +289,8 @@ static int __tcp_grow_window(const struct sock *sk, const 
struct sk_buff *skb)
 {
struct tcp_sock *tp = tcp_sk(sk);
/* Optimize this! */
-   int truesize = tcp_win_from_space(skb-truesize)/2;
-   int window = tcp_win_from_space(sysctl_tcp_rmem[2])/2;
+   int truesize = tcp_win_from_space(skb-truesize)  1;
+   int window = tcp_win_from_space(sysctl_tcp_rmem[2])  1;
 
while (tp-rcv_ssthresh = window) {
if (truesize = skb-len)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT

2007-12-21 Thread Ilpo Järvinen
On Fri, 21 Dec 2007, Gavin McCullagh wrote:

 On Fri, 21 Dec 2007, David Miller wrote:
 
  When Gavin respins the patch I'll look at in the context of submitting
  it as a bug fix.  So Gavin please generate the patch against Linus's
  vanilla GIT tree or net-2.6, your choise.
 
 The existing patch was against Linus' linux-2.6.git from a few days ago so
 I've updated my tree and regenerated the patch (below).  Is that the right
 one?
 
 I'm just checking through the existing CA modules.  I don't see the rtt
 used for RTO anywhere.  This is what I gather they're each using rtt for.

I meant more timeout like fashion (e.g., to timeout some internal 
phase but I don't find that too likely)...

Thanks for checking them.

 So as far as I can tell, timeout stuff is not ever altered using
 pkts_acked() so I guess this fix only affects westwood, htcp and cubic just
 now.
 
 I need to re-read properly, but I think the same problem affects the
 microsecond values where TCP_CONG_RTT_STAMP is set (used by vegas, veno,
 yeah, illinois).  I might follow up with another patch which changes the
 behaviour where TCP_CONG_RTT_STAMP when I'm more sure of that.

Please do, you might have to remove fully_acked checks to do that right 
though so it won't be as straight-forward change as this one and requires 
some amount of thinking to result in a right thing.

 Signed-off-by: Gavin McCullagh [EMAIL PROTECTED]
 
 diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
 index 889c893..6fb7989 100644
 --- a/net/ipv4/tcp_input.c
 +++ b/net/ipv4/tcp_input.c
 @@ -2651,6 +2651,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 
 *seq_rtt_p,
   u32 cnt = 0;
   u32 reord = tp-packets_out;
   s32 seq_rtt = -1;
 + s32 ca_seq_rtt = -1;
   ktime_t last_ackt = net_invalid_timestamp();
  
   while ((skb = tcp_write_queue_head(sk))  skb != tcp_send_head(sk)) {
 @@ -2686,13 +2687,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 
 *seq_rtt_p,
   if (sacked  TCPCB_SACKED_RETRANS)
   tp-retrans_out -= packets_acked;
   flag |= FLAG_RETRANS_DATA_ACKED;
 + ca_seq_rtt = -1;
   seq_rtt = -1;
   if ((flag  FLAG_DATA_ACKED) ||
   (packets_acked  1))
   flag |= FLAG_NONHEAD_RETRANS_ACKED;
   } else {
 + ca_seq_rtt = now - scb-when;
   if (seq_rtt  0) {
 - seq_rtt = now - scb-when;
 + seq_rtt = ca_seq_rtt;
   if (fully_acked)
   last_ackt = skb-tstamp;
   }
 @@ -2709,8 +2712,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 
 *seq_rtt_p,
   !before(end_seq, tp-snd_up))
   tp-urg_mode = 0;
   } else {
 + ca_seq_rtt = now - scb-when;
   if (seq_rtt  0) {
 - seq_rtt = now - scb-when;
 + seq_rtt = ca_seq_rtt;
   if (fully_acked)
   last_ackt = skb-tstamp;
   }
 @@ -2772,8 +2776,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 
 *seq_rtt_p,
net_invalid_timestamp()))
   rtt_us = 
 ktime_us_delta(ktime_get_real(),
   last_ackt);
 - else if (seq_rtt  0)
 - rtt_us = jiffies_to_usecs(seq_rtt);
 + else if (ca_seq_rtt  0)
 + rtt_us = jiffies_to_usecs(ca_seq_rtt);
   }
  
   ca_ops-pkts_acked(sk, pkts_acked, rtt_us);
 

Acked-by: Ilpo Järvinen [EMAIL PROTECTED]

/ Reviewed-by... whatever, I don't know if they really started to use it
or not... :-)

-- 
 i.

Re: [PATCH/RFC] [v2] TCP: use non-delayed ACK for congestion control RTT

2007-12-21 Thread Ilpo Järvinen
On Fri, 21 Dec 2007, Ilpo Järvinen wrote:

 On Fri, 21 Dec 2007, Gavin McCullagh wrote:
 
  I'm just checking through the existing CA modules.  I don't see the rtt
  used for RTO anywhere.  This is what I gather they're each using rtt for.
 
 I meant more timeout like fashion (e.g., to timeout some internal 
 phase but I don't find that too likely)...

I meant didn't, no need to recheck them due to that... :-)

-- 
 i.

Re: Top 10 kernel oopses/warnings for the week of December 21st 2007

2007-12-21 Thread Alan Cox
 Rank 2: uart_flush_buffer
   Warning at drivers/serial/serial_core.c:544 in uart_flush_buffer()
   Reported 16 times
   No specific version information reported; bug present in 2.6.24-rc5
   Caused by a bug in the Bluetooth line discipline/tty code

 Rank 7: uart_write
   Warning at drivers/serial/serial_core.c:490 in uart_write()
   Reported 2 times

Seems to be more of the Bluetooth stuf


Alan
--
DASD is not really IBM - it has a vowel
-Arjan van de Ven
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [TCP]: Convert several length variable to unsigned.

2007-12-21 Thread Joe Perches
On Fri, 2007-12-21 at 16:48 +0900, YOSHIFUJI Hideaki / 吉藤英明 wrote:
 diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
 index 93980c3..3b4169c 100644
 --- a/net/ipv6/tcp_ipv6.c
 +++ b/net/ipv6/tcp_ipv6.c
 @@ -985,7 +985,7 @@ static void tcp_v6_send_reset(struct sock *sk, struct 
 sk_buff *skb)
   struct tcphdr *th = tcp_hdr(skb), *t1;
   struct sk_buff *buff;
   struct flowi fl;
 - int tot_len = sizeof(*th);
 + unsigned int tot_len = sizeof(*th);
  #ifdef CONFIG_TCP_MD5SIG
   struct tcp_md5sig_key *key;
  #endif
 @@ -1085,7 +1085,7 @@ static void tcp_v6_send_ack(struct tcp_timewait_sock 
 *tw,
   struct tcphdr *th = tcp_hdr(skb), *t1;
   struct sk_buff *buff;
   struct flowi fl;
 - int tot_len = sizeof(struct tcphdr);
 + unsigned int tot_len = sizeof(struct tcphdr);
   __be32 *topt;
  #ifdef CONFIG_TCP_MD5SIG
   struct tcp_md5sig_key *key;

Unrelated, but perhaps tot_len should be initialized in
a consistent fashion?

I think unsigned int tot_len = sizeof(struct tcphdr); is more readable.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Top 10 kernel oopses/warnings for the week of December 21st 2007

2007-12-21 Thread Arjan van de Ven

Andi Kleen wrote:

Arjan van de Ven [EMAIL PROTECTED] writes:



Rank 8: __change_page_attr
BUG at arch/x86/mm/pageattr_64.c:176
Reported 2 times
Reported this week for 2.6.24-rc5; history goes back to 2.6.15


There is no BUG on this line on 2.6.24-rc5.   Since there are many
BUG_ONs in this file it is unclear which you mean.

Could you always include the version of the kernel where the actual
oops in the line came from?


in this case this is really all the version information available ;(
it seems to be a patched kernel without patched EXTRAVERSION.
But in the future if I have more specific information (eg it's only 1 kernel 
version) I'll mention it in more detail.
It gets unwieldy if there's 500 reports for an oops of course ;)



Anyways there are a lot of third party modules who do strange 
things with c_p_a(), not always legal, so you might look up out for that 
pattern too. Perhaps report the out of tree modules loaded in the 
summary too?


I already always will mention if the oops is tainted or not (that I track 
specifically);
I'll keep an eye out for other non-tainting out of tree modules as well.

Thanks for the suggestions.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] phylib: add module owner to the mii_bus structure

2007-12-21 Thread Ionut Nicu
Prevent unloading mii bus driver module when other modules have references to 
some
phydevs on that bus. Added a new member (module owner) to struct mii_bus and 
added
code to increment the mii bus owner module usage count on phy_connect and 
decrement
it on phy_disconnect

Set the module owner in the ucc_geth_mdio driver.

Signed-off-by: Ionut Nicu [EMAIL PROTECTED]
---
 drivers/net/phy/phy_device.c |9 -
 drivers/net/ucc_geth_mii.c   |3 +++
 include/linux/phy.h  |4 
 3 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 5b9e175..7dc5480 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -178,6 +178,10 @@ struct phy_device * phy_connect(struct net_device *dev, 
const char *phy_id,
if (phydev-irq  0)
phy_start_interrupts(phydev);
 
+   /* Increment the usage count of the mii bus owner */
+   if (!try_module_get(phydev-bus-owner))
+   return ERR_PTR(-EFAULT);
+
return phydev;
 }
 EXPORT_SYMBOL(phy_connect);
@@ -192,9 +196,12 @@ void phy_disconnect(struct phy_device *phydev)
phy_stop_interrupts(phydev);
 
phy_stop_machine(phydev);
-   
+
phydev-adjust_link = NULL;
 
+   /* Decrement the reference count for the mii bus owner */
+   module_put(phydev-bus-owner);
+
phy_detach(phydev);
 }
 EXPORT_SYMBOL(phy_disconnect);
diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c
index a3af4ea..84c7295 100644
--- a/drivers/net/ucc_geth_mii.c
+++ b/drivers/net/ucc_geth_mii.c
@@ -217,6 +217,9 @@ static int uec_mdio_probe(struct of_device *ofdev, const 
struct of_device_id *ma
}
}
 
+   /* register ourselves as the owner of this bus */
+   new_bus-owner = THIS_MODULE;
+
err = mdiobus_register(new_bus);
if (0 != err) {
printk(KERN_ERR %s: Cannot register as MDIO bus\n,
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 554836e..04ff6a5 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -82,6 +82,10 @@ struct mii_bus {
const char *name;
int id;
void *priv;
+
+   /* The module that owns this bus */
+   struct module *owner;
+
int (*read)(struct mii_bus *bus, int phy_id, int regnum);
int (*write)(struct mii_bus *bus, int phy_id, int regnum, u16 val);
int (*reset)(struct mii_bus *bus);
-- 
1.5.4.rc0

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[XFRM]: Fix outbound statistics.

2007-12-21 Thread Masahide NAKAMURA
Hello David,

I'm sorry, the previous mail contains wrong code:


 David, I failed to include this statistics codes since I didn't notice
 the conflict with the latest Herbert XFRM fix.
 
 Please apply this, too.
 
 Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
 ---
[snip]
  
   do {
  err = xfrm_state_check_space(x, skb);
 - if (err)
 + if (err) {
   goto error_nolock;
 + XFRM_INC_STATS(LINUX_MIB_XFRMOUTERROR);
 + }


Please apply the below patch instead of the previous.

Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
---
 net/xfrm/xfrm_output.c |   12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 867484a..f8e5961 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -43,17 +43,23 @@ static int xfrm_output_one(struct sk_buff *skb, int err)
 
do {
err = xfrm_state_check_space(x, skb);
-   if (err)
+   if (err) {
+   XFRM_INC_STATS(LINUX_MIB_XFRMOUTERROR);
goto error_nolock;
+   }
 
err = x-outer_mode-output(x, skb);
-   if (err)
+   if (err) {
+   XFRM_INC_STATS(LINUX_MIB_XFRMOUTSTATEMODEERROR);
goto error_nolock;
+   }
 
spin_lock_bh(x-lock);
err = xfrm_state_check_expire(x);
-   if (err)
+   if (err) {
+   XFRM_INC_STATS(LINUX_MIB_XFRMOUTSTATEEXPIRED);
goto error;
+   }
 
if (x-type-flags  XFRM_TYPE_REPLAY_PROT) {
XFRM_SKB_CB(skb)-seq = ++x-replay.oseq;
-- 
1.4.4.2

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] XFRM: Drop packets when replay counter would overflow

2007-12-21 Thread Paul Moore
According to RFC4303, section 3.3.3 we need to drop outgoing packets which
cause the replay counter to overflow:

   3.3.3.  Sequence Number Generation

   The sender's counter is initialized to 0 when an SA is established.
   The sender increments the sequence number (or ESN) counter for this
   SA and inserts the low-order 32 bits of the value into the Sequence
   Number field.  Thus, the first packet sent using a given SA will
   contain a sequence number of 1.

   If anti-replay is enabled (the default), the sender checks to ensure
   that the counter has not cycled before inserting the new value in the
   Sequence Number field.  In other words, the sender MUST NOT send a
   packet on an SA if doing so would cause the sequence number to cycle.
   An attempt to transmit a packet that would result in sequence number
   overflow is an auditable event.  The audit log entry for this event
   SHOULD include the SPI value, current date/time, Source Address,
   Destination Address, and (in IPv6) the cleartext Flow ID.

Signed-off-by: Paul Moore [EMAIL PROTECTED]
---

 net/xfrm/xfrm_output.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 0951444..d73003c 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -57,8 +57,11 @@ static int xfrm_output_one(struct sk_buff *skb, int err)
 
if (x-type-flags  XFRM_TYPE_REPLAY_PROT) {
XFRM_SKB_CB(skb)-seq = ++x-replay.oseq;
-   if (unlikely(x-replay.oseq == 0))
+   if (unlikely(x-replay.oseq == 0)) {
+   x-replay.oseq--;
xfrm_audit_state_replay_overflow(x, skb);
+   goto error;
+   }
if (xfrm_aevent_is_on())
xfrm_replay_notify(x, XFRM_REPLAY_UPDATE);
}

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] ucc_geth: split ucc_geth into two modules

2007-12-21 Thread Ionut Nicu
Split ucc_geth_driver into 2 modules:
- one module for the mii bus (phy devices register to this bus).
- one module for the ethernet driver (uses phy_connect to get a phydev 
from the mii bus)

Updated Makefile, Kconfig files and defconfigs (mpc836x, mpc832x_mds, 
mpc832x_rdb).

Signed-off-by: Ionut Nicu [EMAIL PROTECTED]
---
 arch/powerpc/configs/mpc832x_mds_defconfig |1 +
 arch/powerpc/configs/mpc832x_rdb_defconfig |1 +
 arch/powerpc/configs/mpc836x_mds_defconfig |1 +
 drivers/net/Kconfig|8 
 drivers/net/Makefile   |5 -
 drivers/net/ucc_geth.c |   18 +-
 drivers/net/ucc_geth_mii.c |9 +
 7 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/configs/mpc832x_mds_defconfig 
b/arch/powerpc/configs/mpc832x_mds_defconfig
index 2d8951b..1c51739 100644
--- a/arch/powerpc/configs/mpc832x_mds_defconfig
+++ b/arch/powerpc/configs/mpc832x_mds_defconfig
@@ -500,6 +500,7 @@ CONFIG_NETDEV_1000=y
 # CONFIG_TIGON3 is not set
 # CONFIG_BNX2 is not set
 # CONFIG_GIANFAR is not set
+CONFIG_UCC_MDIO=y
 CONFIG_UCC_GETH=y
 # CONFIG_UGETH_NAPI is not set
 # CONFIG_UGETH_MAGIC_PACKET is not set
diff --git a/arch/powerpc/configs/mpc832x_rdb_defconfig 
b/arch/powerpc/configs/mpc832x_rdb_defconfig
index 761718a..cb4d076 100644
--- a/arch/powerpc/configs/mpc832x_rdb_defconfig
+++ b/arch/powerpc/configs/mpc832x_rdb_defconfig
@@ -503,6 +503,7 @@ CONFIG_E1000=y
 # CONFIG_TIGON3 is not set
 # CONFIG_BNX2 is not set
 # CONFIG_GIANFAR is not set
+CONFIG_UCC_MDIO=y
 CONFIG_UCC_GETH=y
 CONFIG_UGETH_NAPI=y
 # CONFIG_UGETH_MAGIC_PACKET is not set
diff --git a/arch/powerpc/configs/mpc836x_mds_defconfig 
b/arch/powerpc/configs/mpc836x_mds_defconfig
index c44fc56..92166e9 100644
--- a/arch/powerpc/configs/mpc836x_mds_defconfig
+++ b/arch/powerpc/configs/mpc836x_mds_defconfig
@@ -499,6 +499,7 @@ CONFIG_NETDEV_1000=y
 # CONFIG_TIGON3 is not set
 # CONFIG_BNX2 is not set
 # CONFIG_GIANFAR is not set
+CONFIG_UCC_MDIO=y
 CONFIG_UCC_GETH=y
 # CONFIG_UGETH_NAPI is not set
 # CONFIG_UGETH_MAGIC_PACKET is not set
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index d9107e5..7314802 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2315,9 +2315,17 @@ config GFAR_NAPI
bool Use Rx Polling (NAPI)
depends on GIANFAR
 
+config UCC_MDIO
+   tristate Freescale QE UCC MDIO Bus
+   depends on QUICC_ENGINE
+   select PHYLIB
+   help
+ Provides Bus interface for MII Management regs in the UCC register 
space.
+
 config UCC_GETH
tristate Freescale QE Gigabit Ethernet
depends on QUICC_ENGINE
+   select UCC_MDIO
select PHYLIB
help
  This driver supports the Gigabit Ethernet mode of the QUICC Engine,
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 0e5fde4..97843a3 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -22,8 +22,11 @@ gianfar_driver-objs := gianfar.o \
gianfar_mii.o \
gianfar_sysfs.o
 
+obj-$(CONFIG_UCC_MDIO) += ucc_geth_mdio.o
+ucc_geth_mdio-objs := ucc_geth_mii.o
+
 obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o
-ucc_geth_driver-objs := ucc_geth.o ucc_geth_mii.o ucc_geth_ethtool.o
+ucc_geth_driver-objs := ucc_geth.o ucc_geth_ethtool.o
 
 #
 # link order important here
diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index c6a1902..c33a4cb 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -1612,9 +1612,12 @@ static int init_phy(struct net_device *dev)
priv-oldspeed = 0;
priv-oldduplex = -1;
 
+   request_module(ucc_geth_mdio);
+
snprintf(phy_id, BUS_ID_SIZE, PHY_ID_FMT, priv-ug_info-mdio_bus,
priv-ug_info-phy_address);
 
+
phydev = phy_connect(dev, phy_id, adjust_link, 0, priv-phy_interface);
 
if (IS_ERR(phydev)) {
@@ -4025,12 +4028,7 @@ static struct of_platform_driver ucc_geth_driver = {
 
 static int __init ucc_geth_init(void)
 {
-   int i, ret;
-
-   ret = uec_mdio_init();
-
-   if (ret)
-   return ret;
+   int i;
 
if (netif_msg_drv(debug))
printk(KERN_INFO ucc_geth:  DRV_DESC \n);
@@ -4038,18 +4036,12 @@ static int __init ucc_geth_init(void)
memcpy((ugeth_info[i]), ugeth_primary_info,
   sizeof(ugeth_primary_info));
 
-   ret = of_register_platform_driver(ucc_geth_driver);
-
-   if (ret)
-   uec_mdio_exit();
-
-   return ret;
+   return of_register_platform_driver(ucc_geth_driver);
 }
 
 static void __exit ucc_geth_exit(void)
 {
of_unregister_platform_driver(ucc_geth_driver);
-   uec_mdio_exit();
 }
 
 module_init(ucc_geth_init);
diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c
index df884f0..a3af4ea 100644
--- a/drivers/net/ucc_geth_mii.c
+++ b/drivers/net/ucc_geth_mii.c

[patch net-2.6.25 2/2][NETNS] net: Add a helper function neigh_param_default_alloc

2007-12-21 Thread Daniel Lezcano
In the presence of multiple network namespaces the logic needed
to allocate the a default parameter table is just barely non-trivial.
So add a function to automate it to make everyone's life easier.

Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]
Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/net/neighbour.h |1 +
 net/core/neighbour.c|   15 +++
 2 files changed, 16 insertions(+)

Index: net-2.6.25/include/net/neighbour.h
===
--- net-2.6.25.orig/include/net/neighbour.h
+++ net-2.6.25/include/net/neighbour.h
@@ -208,6 +208,7 @@ extern struct neighbour *neigh_event_ns
struct net_device *dev);
 
 extern struct neigh_parms  *neigh_parms_alloc(struct net_device *dev, 
struct neigh_table *tbl);
+extern struct neigh_parms  *neigh_parms_alloc_default(struct neigh_table 
*tbl, struct net *net);
 extern voidneigh_parms_release(struct neigh_table *tbl, 
struct neigh_parms *parms);
 extern voidneigh_parms_destroy(struct neigh_parms *parms);
 extern unsigned long   neigh_rand_reach_time(unsigned long base);
Index: net-2.6.25/net/core/neighbour.c
===
--- net-2.6.25.orig/net/core/neighbour.c
+++ net-2.6.25/net/core/neighbour.c
@@ -1325,6 +1325,20 @@ struct neigh_parms *neigh_parms_alloc(st
return p;
 }
 
+struct neigh_parms *neigh_parms_alloc_default(struct neigh_table *tbl,
+   struct net *net)
+{
+   struct neigh_parms *parms;
+   if (net != init_net) {
+   parms = neigh_parms_alloc(NULL, tbl);
+   release_net(parms-net);
+   parms-net = hold_net(net);
+   }
+   else
+   parms = neigh_parms_clone(tbl-parms);
+   return parms;
+}
+
 static void neigh_rcu_free_parms(struct rcu_head *head)
 {
struct neigh_parms *parms =
@@ -2787,6 +2801,7 @@ EXPORT_SYMBOL(neigh_ifdown);
 EXPORT_SYMBOL(neigh_lookup);
 EXPORT_SYMBOL(neigh_lookup_nodev);
 EXPORT_SYMBOL(neigh_parms_alloc);
+EXPORT_SYMBOL(neigh_parms_alloc_default);
 EXPORT_SYMBOL(neigh_parms_release);
 EXPORT_SYMBOL(neigh_rand_reach_time);
 EXPORT_SYMBOL(neigh_resolve_output);

-- 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Top 10 kernel oopses/warnings for the week of December 21st 2007

2007-12-21 Thread Andi Kleen
 in this case this is really all the version information available ;(
 it seems to be a patched kernel without patched EXTRAVERSION.
 But in the future if I have more specific information (eg it's only 1 
 kernel version) I'll mention it in more detail.
 It gets unwieldy if there's 500 reports for an oops of course ;)

Hmm would there be an automatic way to check out the file of the 
kernel version and then check if the BUG_ON/WARN_ON is on that line?
Maybe it could be done using git.

 
 
 Anyways there are a lot of third party modules who do strange 
 things with c_p_a(), not always legal, so you might look up out for that 
 pattern too. Perhaps report the out of tree modules loaded in the 
 summary too?
 
 I already always will mention if the oops is tainted or not (that I track 
 specifically);

I don't necessarily mean tainted, just out of tree modules in general.
There are some GPL modules who do strange things too. Not saying
that these oopses should be all ignored -- they might be legitimate
kernel bugs that they just trigger -- just it should be visible somehow
in the summary in case there is a pattern. 

Especially for c_p_a() i'm quite suspicious
because it depends a lot on what the caller did.

One way perhaps would be also to check if there is an out of tree
module inside the backtrace. I suppose you could keep a list 
of in tree modules and do this automatically. Of course there could
be false positives too with the standard inexact backtrace.

-Andi

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] New driver sfc for Solarstorm SFC4000 controller - 3nd try

2007-12-21 Thread Robert Stonehouse
This is a resubmission of a new driver for Solarflare network controllers.

The driver supports several types of PHY (10Gbase-T, XFP, CX4) on six
different 10G and 1G boards.

The previous thread was:
  [PATCH] [RFC] New driver sfc for Solarstorm SFC4000 controller
  http://marc.info/?l=linux-netdevm=119757352830103w=2

Thanks to the people who looked at the previous patches. We have addressed
the following from received comments after the 2nd submission:
  - Use kzalloc where appropriate
  - Remove deprecated fastcall attributes
  - Use kernel routines for hex dumps and MAC address printing
  - Changes to logging MACROs

The last two patches were marked with RFC but I now think that this driver
is ready (withstanding any further review comments) and I would like to ask
that this driver is considered for merging.


The patch (against net-2.6.25) is at:
 https://support.solarflare.com/netdev/3/net-2.6.25-sfc-2.2.0029.patch

The new files may also be downloaded as a tarball:
 https://support.solarflare.com/netdev/3/net-2.6.25-sfc-2.2.0029.tgz

And for verification there is:
 https://support.solarflare.com/netdev/3/MD5SUMS


Happy Christmas

-- 
Robert Stonehouse
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [IPROUTE]: A workaround to make larger rto_min printed correctly

2007-12-21 Thread Stephen Hemminger
On Fri, 21 Dec 2007 22:58:04 +0900 (JST)
YOSHIFUJI Hideaki / 吉藤英明 [EMAIL PROTECTED] wrote:

 In article [EMAIL PROTECTED] (at Fri, 21 Dec 2007 22:49:59 +0900), Satoru 
 SATOH [EMAIL PROTECTED] says:
 
  I agree.
  
  I mistakenly thought hz in that context must be larger than 1000..
  As it's uncertain, your's looks much simpler and better.
  
  (btw, the lines else  div = 1 is not needed, is it?)
 
 Simplest fix is as follows:
 
 Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]
 --
 diff --git a/ip/iproute.c b/ip/iproute.c
 index f4200ae..7a885b0 100644
 --- a/ip/iproute.c
 +++ b/ip/iproute.c
 @@ -509,7 +509,7 @@ int print_route(const struct sockaddr_nl *who, struct 
 nlmsghdr *n, void *arg)
   i != RTAX_RTO_MIN)
   fprintf(fp,  %u, 
 *(unsigned*)RTA_DATA(mxrta[i]));
   else {
 - unsigned val = *(unsigned*)RTA_DATA(mxrta[i]);
 + unsigned long long val = 
 *(unsigned*)RTA_DATA(mxrta[i]);
  
   val *= 1000;
   if (i == RTAX_RTT)
 @@ -517,7 +517,7 @@ int print_route(const struct sockaddr_nl *who, struct 
 nlmsghdr *n, void *arg)
   else if (i == RTAX_RTTVAR)
   val /= 4;
   if (val = hz)
 - fprintf(fp,  %ums, val/hz);
 + fprintf(fp,  %llums, val/hz);
   else
   fprintf(fp,  %.2fms, (float)val/hz);
   }
 

applied thanks.

-- 
Stephen Hemminger [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix lost export-dynamic

2007-12-21 Thread Stephen Hemminger
On Mon, 17 Dec 2007 16:06:38 +0300
Vitaliy Gusev [EMAIL PROTECTED] wrote:

 get_link_kind() fails for statically linked modules (vlan, veth, etc.) if 
 ip 
 was linked without export-dynamic.
 
 Signed-off-by: Vitaliy Gusev [EMAIL PROTECTED]
 

applied thanks

-- 
Stephen Hemminger [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] Module for ip utility to support veth device

2007-12-21 Thread Stephen Hemminger
On Tue, 18 Dec 2007 15:15:38 +0300
Vitaliy Gusev [EMAIL PROTECTED] wrote:

 
 module link_veth
 
 Signed-off-by: Vitaliy Gusev [EMAIL PROTECTED]
 
 ---
 

Applied both patches, and moved veth.h from ip/veth.h to include/net/veth.h
so that the header file will be correctly updated if ABI changes.

-- 
Stephen Hemminger [EMAIL PROTECTED]
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] New driver sfc for Solarstorm SFC4000 controller - 3nd try

2007-12-21 Thread Randy Dunlap
On Fri, 21 Dec 2007 16:53:40 + Robert Stonehouse wrote:

 This is a resubmission of a new driver for Solarflare network controllers.
 
 The driver supports several types of PHY (10Gbase-T, XFP, CX4) on six
 different 10G and 1G boards.
 
 The previous thread was:
   [PATCH] [RFC] New driver sfc for Solarstorm SFC4000 controller
   http://marc.info/?l=linux-netdevm=119757352830103w=2
 
 Thanks to the people who looked at the previous patches. We have addressed
 the following from received comments after the 2nd submission:
   - Use kzalloc where appropriate
   - Remove deprecated fastcall attributes
   - Use kernel routines for hex dumps and MAC address printing
   - Changes to logging MACROs
 
 The last two patches were marked with RFC but I now think that this driver
 is ready (withstanding any further review comments) and I would like to ask
 that this driver is considered for merging.
 
 
 The patch (against net-2.6.25) is at:
  https://support.solarflare.com/netdev/3/net-2.6.25-sfc-2.2.0029.patch

wow, 750+ KB

How many drivers is this?


 The new files may also be downloaded as a tarball:
  https://support.solarflare.com/netdev/3/net-2.6.25-sfc-2.2.0029.tgz
 
 And for verification there is:
  https://support.solarflare.com/netdev/3/MD5SUMS


Hi,
I have just a few comments, mostly related to infrastructure.

a.  Don't use kernel-doc comment flags (i.e., /** to begin a comment
block) when the comment is not in kernel-doc format.  Or use
kernel-doc format.  :)
(as documented in Documentation/kernel-doc-nano-HOWTO.txt)

E.g.:
+/** @file
+ *
+ * Efx driverlink
+ *
+ * This header file defines the portions of the Efx driverlink
+ * interface that are used only by the kernel net driver, and are not
+ * part of the public interface specification.
+ */

b.  Kconfig file:

Use 1 tab (not 8 spaces) to indent below config items.
Use 1 tab + 2 spaces to indent help text below help lines.


c.  Driver contains MTD, SPI,  I2C (at least) code and needs to be
reviewed by people in those areas as well (IMO).

I see an MTD dependency in the Kconfig file.
What about the SPI and I2C parts?  Are they conditional or
how is that handled?  or does the driver not use the kernel
infrastructure for these?


---
~Randy
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BNX2 warning

2007-12-21 Thread Michael Chan
On Thu, 2007-12-20 at 20:39 -0800, David Miller wrote:

 Michael, please fix this, thanks :-)
 
 drivers/net/bnx2.c: In function 'bnx2_init_napi':
 drivers/net/bnx2.c:7329: warning: no return statement in function returning 
 non-void
 

[BNX2]: Fix compiler warning.

Change bnx2_init_napi() to void.

Warning was noted by DaveM.

Signed-off-by: Michael Chan [EMAIL PROTECTED]

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index aef2a9a..94d1857 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -7313,7 +7313,7 @@ bnx2_bus_string(struct bnx2 *bp, char *str)
return str;
 }
 
-static int __devinit
+static void __devinit
 bnx2_init_napi(struct bnx2 *bp)
 {
int i;


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-2.6.25 3/3] Uninline the inet_twsk_put function

2007-12-21 Thread Ingo Oeser
Hi David,

David Miller schrieb:
 inet_timewait_sock begins with a struct sock_common
 which is where the atomic_t is, and:
 
 #define tw_refcnt __tw_common.skc_refcnt
 
 So you would have to change struct sock_common over to kref, and thus
 the entire networking, in order to make such a change.

Ok, that sounds too much. Many thanks for following up and taking the time
to explain it.
 
 But you would have seen this instantly if you had spent 5 seconds
 looking at how these datastructures are defined.  Instead you choose
 to make me do it and explain it to you instead.

Sorry, just matched the wrong pattern here :-)


Best Regards

Ingo Oeser
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] New driver sfc for Solarstorm SFC4000 controller - 3nd try

2007-12-21 Thread Ben Hutchings
Randy Dunlap wrote:
 On Fri, 21 Dec 2007 16:53:40 + Robert Stonehouse wrote:
 
  This is a resubmission of a new driver for Solarflare network controllers.
snip
  The last two patches were marked with RFC but I now think that this driver
  is ready (withstanding any further review comments) and I would like to ask
  that this driver is considered for merging.
  
  
  The patch (against net-2.6.25) is at:
   https://support.solarflare.com/netdev/3/net-2.6.25-sfc-2.2.0029.patch
 
 wow, 750+ KB
 
 How many drivers is this?

Just two: sfc (net) and sfc_mtd (MTD).  As Robert said, the net driver
supports a variety of PHYs.  It also has a fair amount of self-test
and (conditional) debug code, and comments.

I've noted your comments on kernel-doc and Kconfig format and will
address those in the next version.

snip
 c.  Driver contains MTD, SPI,  I2C (at least) code and needs to be
 reviewed by people in those areas as well (IMO).
 
 I see an MTD dependency in the Kconfig file.
 What about the SPI and I2C parts?  Are they conditional or
 how is that handled?  or does the driver not use the kernel
 infrastructure for these?

We are not currently using the kernel infrastructure for those.
I'm not sure whether we could do.

I think everyone working on the net driver here will be away until
the new year, so please forgive our silence in the mean time.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] net: santize headers for iproute2

2007-12-21 Thread Stephen Hemminger
iproute2 source uses headers that result from make headers_install.
The header files net/tcp_states.h and net/veth.h are both being used
but are not processed. 

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
This patch is not urgent but it would be good to get it in 2.6.24.

--- a/include/Kbuild2007-12-21 09:33:36.0 -0800
+++ b/include/Kbuild2007-12-21 09:33:42.0 -0800
@@ -4,5 +4,6 @@ header-y += sound/
 header-y += mtd/
 header-y += rdma/
 header-y += video/
+header-y += net/
 
 header-y += asm-$(ARCH)/
--- /dev/null   1970-01-01 00:00:00.0 +
+++ b/include/net/Kbuild2007-12-21 09:32:20.0 -0800
@@ -0,0 +1,2 @@
+header-y += tcp_states.h
+header-y += veth.h
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Bill Fink
On Fri, 21 Dec 2007, Bill Fink wrote:

 Or perhaps even:
 
   /* Ok, it looks like it is advisable to defer.  */
   tp-tso_deferred = jiffies;
 
   /* need to return a non-zero value to defer, which means won't
* defer if jiffies == 0 but it's only a 1 in 4 billion event
* (and avoids a compare/branch by not checking jiffies)
/
 return jiffies;

Ack.  I introduced my own 64-bit to 32-bit issue (too late at night).
How about:

/* Ok, it looks like it is advisable to defer.  */
tp-tso_deferred = jiffies;

/* this won't defer if jiffies == 0 but it's only a 1 in
 * 4 billion event (and avoids a branch)
 */
return (jiffies != 0);

-Bill
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [TCP]: Force TSO splits to MSS boundaries

2007-12-21 Thread Ilpo Järvinen
On Thu, 20 Dec 2007, David Miller wrote:

 From: Herbert Xu [EMAIL PROTECTED]
 Date: Thu, 20 Dec 2007 22:00:12 +0800
 
  On Thu, Dec 20, 2007 at 04:00:37AM -0800, David Miller wrote:
  
   In the most ideal sense, tcp_window_allows() should probably
   be changed to only return MSS multiples.
   
   Unfortunately this would add an expensive modulo operation,
   however I think it would elimiate this problem case.
  
  Well you only have to divide in the unlikely case of us being
  limited by the receiver window.  In that case speed is probably
  not of the essence anyway.
 
 Agreed, to some extent.
 
 I say to some extent because it might be realistic, with
 lots (millions) of sockets to hit this case a lot.
 
 There are so many things that are a don't care performance
 wise until you have a lot of stinky connections over crappy
 links.

How about this, I had to use another approach due to reasons
outlined in the commit message:

--
[PATCH] [TCP]: Force TSO splits to MSS boundaries

If snd_wnd - snd_nxt wasn't multiple of MSS, skb was split on
odd boundary by the callers of tcp_window_allows.

We try really hard to avoid unnecessary modulos. Therefore the
old caller side check if (skb-len  limit) was too wide as
well because limit is not bound in any way to skb-len and can
cause spurious testing for trimming in the middle of the queue
while we only wanted that to happen at the tail of the queue.
A simple additional caller side check for tcp_write_queue_tail
would likely have resulted 2 x modulos because the limit would
have to be first calculated from window, however, doing that
unnecessary modulo is not mandatory. After a minor change to
the algorithm, simply determine first if the modulo is needed
at all and at that point immediately decide also from which
value it should be calculated from.

This approach also kills some duplicated code.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_output.c |   51 -
 1 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 1852698..ea92a1b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1008,13 +1008,29 @@ static void tcp_cwnd_validate(struct sock *sk)
}
 }
 
-static unsigned int tcp_window_allows(struct tcp_sock *tp, struct sk_buff 
*skb, unsigned int mss_now, unsigned int cwnd)
+/* Returns the portion of skb which can be sent right away without
+ * introducing MSS oddities to segment boundaries. In rare cases where
+ * mss_now != mss_cache, we will request caller to create a small skb
+ * per input skb which could be mostly avoided here (if desired).
+ */
+static unsigned int tcp_mss_split_point(struct sock *sk, struct sk_buff *skb,
+   unsigned int mss_now,
+   unsigned int cwnd)
 {
-   u32 window, cwnd_len;
+   struct tcp_sock *tp = tcp_sk(sk);
+   u32 needed, window, cwnd_len;
 
window = (tp-snd_una + tp-snd_wnd - TCP_SKB_CB(skb)-seq);
cwnd_len = mss_now * cwnd;
-   return min(window, cwnd_len);
+
+   if (likely(cwnd_len = window  skb != tcp_write_queue_tail(sk)))
+   return cwnd_len;
+
+   if (skb == tcp_write_queue_tail(sk)  cwnd_len = skb-len)
+   return cwnd_len;
+
+   needed = min(skb-len, window);
+   return needed - needed % mss_now;
 }
 
 /* Can at least one segment of SKB be sent right now, according to the
@@ -1445,17 +1461,9 @@ static int tcp_write_xmit(struct sock *sk, unsigned int 
mss_now, int nonagle)
}
 
limit = mss_now;
-   if (tso_segs  1) {
-   limit = tcp_window_allows(tp, skb,
- mss_now, cwnd_quota);
-
-   if (skb-len  limit) {
-   unsigned int trim = skb-len % mss_now;
-
-   if (trim)
-   limit = skb-len - trim;
-   }
-   }
+   if (tso_segs  1)
+   limit = tcp_mss_split_point(sk, skb, mss_now,
+   cwnd_quota);
 
if (skb-len  limit 
unlikely(tso_fragment(sk, skb, limit, mss_now)))
@@ -1502,7 +1510,6 @@ void __tcp_push_pending_frames(struct sock *sk, unsigned 
int cur_mss,
  */
 void tcp_push_one(struct sock *sk, unsigned int mss_now)
 {
-   struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb = tcp_send_head(sk);
unsigned int tso_segs, cwnd_quota;
 
@@ -1517,17 +1524,9 @@ void tcp_push_one(struct sock *sk, unsigned int mss_now)
BUG_ON(!tso_segs);
 
limit = mss_now;
-   if (tso_segs  1) {
-   limit = tcp_window_allows(tp, skb,
- mss_now, cwnd_quota);
-
-  

Re: TSO trimming question

2007-12-21 Thread Ilpo Järvinen
On Fri, 21 Dec 2007, Bill Fink wrote:

 On Fri, 21 Dec 2007, Bill Fink wrote:
 
  Or perhaps even:
  
  /* Ok, it looks like it is advisable to defer.  */
  tp-tso_deferred = jiffies;
  
  /* need to return a non-zero value to defer, which means won't
   * defer if jiffies == 0 but it's only a 1 in 4 billion event
   * (and avoids a compare/branch by not checking jiffies)
   /
  return jiffies;
 
 Ack.  I introduced my own 64-bit to 32-bit issue (too late at night).
 How about:
 
   /* Ok, it looks like it is advisable to defer.  */
   tp-tso_deferred = jiffies;
 
   /* this won't defer if jiffies == 0 but it's only a 1 in
* 4 billion event (and avoids a branch)
*/
   return (jiffies != 0);

I'm not sure how the jiffies work but is this racy as well?

Simple return tp-tso_deferred; should work, shouldn't it? :-)


-- 
 i.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [TCP]: Force TSO splits to MSS boundaries

2007-12-21 Thread Ilpo Järvinen
On Fri, 21 Dec 2007, Ilpo Järvinen wrote:

 How about this, I had to use another approach due to reasons
 outlined in the commit message:
 
 --
 [PATCH] [TCP]: Force TSO splits to MSS boundaries
 
 If snd_wnd - snd_nxt wasn't multiple of MSS, skb was split on
 odd boundary by the callers of tcp_window_allows.
 
 We try really hard to avoid unnecessary modulos. Therefore the
 old caller side check if (skb-len  limit) was too wide as
 well because limit is not bound in any way to skb-len and can
 cause spurious testing for trimming in the middle of the queue
 while we only wanted that to happen at the tail of the queue.
 A simple additional caller side check for tcp_write_queue_tail
 would likely have resulted 2 x modulos because the limit would
 have to be first calculated from window, however, doing that
 unnecessary modulo is not mandatory. After a minor change to
 the algorithm, simply determine first if the modulo is needed
 at all and at that point immediately decide also from which
 value it should be calculated from.
 
 This approach also kills some duplicated code.
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
 ---
  net/ipv4/tcp_output.c |   51 
 -
  1 files changed, 25 insertions(+), 26 deletions(-)
 
 diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
 index 1852698..ea92a1b 100644
 --- a/net/ipv4/tcp_output.c
 +++ b/net/ipv4/tcp_output.c
 @@ -1008,13 +1008,29 @@ static void tcp_cwnd_validate(struct sock *sk)
   }
  }
  
 -static unsigned int tcp_window_allows(struct tcp_sock *tp, struct sk_buff 
 *skb, unsigned int mss_now, unsigned int cwnd)
 +/* Returns the portion of skb which can be sent right away without
 + * introducing MSS oddities to segment boundaries. In rare cases where
 + * mss_now != mss_cache, we will request caller to create a small skb
 + * per input skb which could be mostly avoided here (if desired).
 + */
 +static unsigned int tcp_mss_split_point(struct sock *sk, struct sk_buff *skb,
 + unsigned int mss_now,
 + unsigned int cwnd)
  {
 - u32 window, cwnd_len;
 + struct tcp_sock *tp = tcp_sk(sk);
 + u32 needed, window, cwnd_len;
  
   window = (tp-snd_una + tp-snd_wnd - TCP_SKB_CB(skb)-seq);
   cwnd_len = mss_now * cwnd;
 - return min(window, cwnd_len);
 +
 + if (likely(cwnd_len = window  skb != tcp_write_queue_tail(sk)))
 + return cwnd_len;
 +
 + if (skb == tcp_write_queue_tail(sk)  cwnd_len = skb-len)
 + return cwnd_len;

...if desired that this won't cause small skbs in the middle of the queue 
(except in case where just the sub-MSS portion is left out above window), 
something like this could be added here (again, avoiding more modulos):

if (skb != tcp_write_queue_tail(sk)  window  skb-len)
return skb-len;

 + needed = min(skb-len, window);
 + return needed - needed % mss_now;
  }
  
  /* Can at least one segment of SKB be sent right now, according to the

-- 
 i.

Re: TSO trimming question

2007-12-21 Thread Bill Fink
On Fri, 21 Dec 2007, Ilpo Järvinen wrote:

 On Fri, 21 Dec 2007, Bill Fink wrote:
 
  On Fri, 21 Dec 2007, Bill Fink wrote:
  
   Or perhaps even:
   
 /* Ok, it looks like it is advisable to defer.  */
 tp-tso_deferred = jiffies;
   
 /* need to return a non-zero value to defer, which means won't
  * defer if jiffies == 0 but it's only a 1 in 4 billion event
  * (and avoids a compare/branch by not checking jiffies)
  /
   return jiffies;
  
  Ack.  I introduced my own 64-bit to 32-bit issue (too late at night).
  How about:
  
  /* Ok, it looks like it is advisable to defer.  */
  tp-tso_deferred = jiffies;
  
  /* this won't defer if jiffies == 0 but it's only a 1 in
   * 4 billion event (and avoids a branch)
   */
  return (jiffies != 0);
 
 I'm not sure how the jiffies work but is this racy as well?
 
 Simple return tp-tso_deferred; should work, shouldn't it? :-)

As long as tp-tso_deferred remains u32, pending the other issue.

-Bill
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/2] ucc_geth: split ucc_geth into two modules

2007-12-21 Thread Medve Emilian
Tested-by: Emil Medve [EMAIL PROTECTED]


 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Nicu Ioan Petru
 Sent: Friday, December 21, 2007 7:57 AM
 To: netdev@vger.kernel.org
 Cc: Nicu Ioan Petru
 Subject: [PATCH 1/2] ucc_geth: split ucc_geth into two modules
 
 Split ucc_geth_driver into 2 modules:
   - one module for the mii bus (phy devices register to this bus).
   - one module for the ethernet driver (uses phy_connect 
 to get a phydev from the mii bus)
 
 Updated Makefile, Kconfig files and defconfigs (mpc836x, 
 mpc832x_mds, mpc832x_rdb).
 
 Signed-off-by: Ionut Nicu [EMAIL PROTECTED]
 ---
  arch/powerpc/configs/mpc832x_mds_defconfig |1 +
  arch/powerpc/configs/mpc832x_rdb_defconfig |1 +
  arch/powerpc/configs/mpc836x_mds_defconfig |1 +
  drivers/net/Kconfig|8 
  drivers/net/Makefile   |5 -
  drivers/net/ucc_geth.c |   18 +-
  drivers/net/ucc_geth_mii.c |9 +
  7 files changed, 29 insertions(+), 14 deletions(-)
 
 diff --git a/arch/powerpc/configs/mpc832x_mds_defconfig 
 b/arch/powerpc/configs/mpc832x_mds_defconfig
 index 2d8951b..1c51739 100644
 --- a/arch/powerpc/configs/mpc832x_mds_defconfig
 +++ b/arch/powerpc/configs/mpc832x_mds_defconfig
 @@ -500,6 +500,7 @@ CONFIG_NETDEV_1000=y
  # CONFIG_TIGON3 is not set
  # CONFIG_BNX2 is not set
  # CONFIG_GIANFAR is not set
 +CONFIG_UCC_MDIO=y
  CONFIG_UCC_GETH=y
  # CONFIG_UGETH_NAPI is not set
  # CONFIG_UGETH_MAGIC_PACKET is not set
 diff --git a/arch/powerpc/configs/mpc832x_rdb_defconfig 
 b/arch/powerpc/configs/mpc832x_rdb_defconfig
 index 761718a..cb4d076 100644
 --- a/arch/powerpc/configs/mpc832x_rdb_defconfig
 +++ b/arch/powerpc/configs/mpc832x_rdb_defconfig
 @@ -503,6 +503,7 @@ CONFIG_E1000=y
  # CONFIG_TIGON3 is not set
  # CONFIG_BNX2 is not set
  # CONFIG_GIANFAR is not set
 +CONFIG_UCC_MDIO=y
  CONFIG_UCC_GETH=y
  CONFIG_UGETH_NAPI=y
  # CONFIG_UGETH_MAGIC_PACKET is not set
 diff --git a/arch/powerpc/configs/mpc836x_mds_defconfig 
 b/arch/powerpc/configs/mpc836x_mds_defconfig
 index c44fc56..92166e9 100644
 --- a/arch/powerpc/configs/mpc836x_mds_defconfig
 +++ b/arch/powerpc/configs/mpc836x_mds_defconfig
 @@ -499,6 +499,7 @@ CONFIG_NETDEV_1000=y
  # CONFIG_TIGON3 is not set
  # CONFIG_BNX2 is not set
  # CONFIG_GIANFAR is not set
 +CONFIG_UCC_MDIO=y
  CONFIG_UCC_GETH=y
  # CONFIG_UGETH_NAPI is not set
  # CONFIG_UGETH_MAGIC_PACKET is not set
 diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
 index d9107e5..7314802 100644
 --- a/drivers/net/Kconfig
 +++ b/drivers/net/Kconfig
 @@ -2315,9 +2315,17 @@ config GFAR_NAPI
   bool Use Rx Polling (NAPI)
   depends on GIANFAR
  
 +config UCC_MDIO
 + tristate Freescale QE UCC MDIO Bus
 + depends on QUICC_ENGINE
 + select PHYLIB
 + help
 +   Provides Bus interface for MII Management regs in the 
 UCC register space.
 +
  config UCC_GETH
   tristate Freescale QE Gigabit Ethernet
   depends on QUICC_ENGINE
 + select UCC_MDIO
   select PHYLIB
   help
 This driver supports the Gigabit Ethernet mode of the 
 QUICC Engine,
 diff --git a/drivers/net/Makefile b/drivers/net/Makefile
 index 0e5fde4..97843a3 100644
 --- a/drivers/net/Makefile
 +++ b/drivers/net/Makefile
 @@ -22,8 +22,11 @@ gianfar_driver-objs := gianfar.o \
   gianfar_mii.o \
   gianfar_sysfs.o
  
 +obj-$(CONFIG_UCC_MDIO) += ucc_geth_mdio.o
 +ucc_geth_mdio-objs := ucc_geth_mii.o
 +
  obj-$(CONFIG_UCC_GETH) += ucc_geth_driver.o
 -ucc_geth_driver-objs := ucc_geth.o ucc_geth_mii.o ucc_geth_ethtool.o
 +ucc_geth_driver-objs := ucc_geth.o ucc_geth_ethtool.o
  
  #
  # link order important here
 diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
 index c6a1902..c33a4cb 100644
 --- a/drivers/net/ucc_geth.c
 +++ b/drivers/net/ucc_geth.c
 @@ -1612,9 +1612,12 @@ static int init_phy(struct net_device *dev)
   priv-oldspeed = 0;
   priv-oldduplex = -1;
  
 + request_module(ucc_geth_mdio);
 +
   snprintf(phy_id, BUS_ID_SIZE, PHY_ID_FMT, 
 priv-ug_info-mdio_bus,
   priv-ug_info-phy_address);
  
 +
   phydev = phy_connect(dev, phy_id, adjust_link, 0, 
 priv-phy_interface);
  
   if (IS_ERR(phydev)) {
 @@ -4025,12 +4028,7 @@ static struct of_platform_driver 
 ucc_geth_driver = {
  
  static int __init ucc_geth_init(void)
  {
 - int i, ret;
 -
 - ret = uec_mdio_init();
 -
 - if (ret)
 - return ret;
 + int i;
  
   if (netif_msg_drv(debug))
   printk(KERN_INFO ucc_geth:  DRV_DESC \n);
 @@ -4038,18 +4036,12 @@ static int __init ucc_geth_init(void)
   memcpy((ugeth_info[i]), ugeth_primary_info,
  sizeof(ugeth_primary_info));
  
 - ret = of_register_platform_driver(ucc_geth_driver);
 -
 - if (ret)
 - uec_mdio_exit();
 -
 - return ret;
 + 

RE: [PATCH 2/2] phylib: add module owner to the mii_bus structure

2007-12-21 Thread Medve Emilian
Tested-by: Emil Medve [EMAIL PROTECTED]


 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Nicu Ioan Petru
 Sent: Friday, December 21, 2007 7:58 AM
 To: netdev@vger.kernel.org
 Cc: Nicu Ioan Petru
 Subject: [PATCH 2/2] phylib: add module owner to the mii_bus structure
 
 Prevent unloading mii bus driver module when other modules 
 have references to some
 phydevs on that bus. Added a new member (module owner) to 
 struct mii_bus and added
 code to increment the mii bus owner module usage count on 
 phy_connect and decrement
 it on phy_disconnect
 
 Set the module owner in the ucc_geth_mdio driver.
 
 Signed-off-by: Ionut Nicu [EMAIL PROTECTED]
 ---
  drivers/net/phy/phy_device.c |9 -
  drivers/net/ucc_geth_mii.c   |3 +++
  include/linux/phy.h  |4 
  3 files changed, 15 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/net/phy/phy_device.c 
 b/drivers/net/phy/phy_device.c
 index 5b9e175..7dc5480 100644
 --- a/drivers/net/phy/phy_device.c
 +++ b/drivers/net/phy/phy_device.c
 @@ -178,6 +178,10 @@ struct phy_device * phy_connect(struct 
 net_device *dev, const char *phy_id,
   if (phydev-irq  0)
   phy_start_interrupts(phydev);
  
 + /* Increment the usage count of the mii bus owner */
 + if (!try_module_get(phydev-bus-owner))
 + return ERR_PTR(-EFAULT);
 +
   return phydev;
  }
  EXPORT_SYMBOL(phy_connect);
 @@ -192,9 +196,12 @@ void phy_disconnect(struct phy_device *phydev)
   phy_stop_interrupts(phydev);
  
   phy_stop_machine(phydev);
 - 
 +
   phydev-adjust_link = NULL;
  
 + /* Decrement the reference count for the mii bus owner */
 + module_put(phydev-bus-owner);
 +
   phy_detach(phydev);
  }
  EXPORT_SYMBOL(phy_disconnect);
 diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c
 index a3af4ea..84c7295 100644
 --- a/drivers/net/ucc_geth_mii.c
 +++ b/drivers/net/ucc_geth_mii.c
 @@ -217,6 +217,9 @@ static int uec_mdio_probe(struct 
 of_device *ofdev, const struct of_device_id *ma
   }
   }
  
 + /* register ourselves as the owner of this bus */
 + new_bus-owner = THIS_MODULE;
 +
   err = mdiobus_register(new_bus);
   if (0 != err) {
   printk(KERN_ERR %s: Cannot register as MDIO bus\n,
 diff --git a/include/linux/phy.h b/include/linux/phy.h
 index 554836e..04ff6a5 100644
 --- a/include/linux/phy.h
 +++ b/include/linux/phy.h
 @@ -82,6 +82,10 @@ struct mii_bus {
   const char *name;
   int id;
   void *priv;
 +
 + /* The module that owns this bus */
 + struct module *owner;
 +
   int (*read)(struct mii_bus *bus, int phy_id, int regnum);
   int (*write)(struct mii_bus *bus, int phy_id, int 
 regnum, u16 val);
   int (*reset)(struct mii_bus *bus);
 -- 
 1.5.4.rc0
 
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ETH]: Combine format_addr() with print_mac().

2007-12-21 Thread Michael Chan
[ETH]: Combine format_addr() with print_mac().

print_mac() used by most net drivers and format_addr() used by
net-sysfs.c are very similar and they can be integrated.

format_addr() is also identically redefined in the qla4xxx iscsi
driver.

Export a new function format_mac_addr() to be used by net-sysfs,
qla4xxx and others in the future.  Both print_mac() and
format_mac_addr() call _format_mac_addr() to do the formatting.

Signed-off-by: Michael Chan [EMAIL PROTECTED]
Cc: Joe Perches [EMAIL PROTECTED]
Cc: Mike Christie [EMAIL PROTECTED]
Cc: David Somayajulu [EMAIL PROTECTED]

diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
index 89460d2..0f4562b 100644
--- a/drivers/scsi/qla4xxx/ql4_os.c
+++ b/drivers/scsi/qla4xxx/ql4_os.c
@@ -173,18 +173,6 @@ static void qla4xxx_conn_stop(struct iscsi_cls_conn *conn, 
int flag)
printk(KERN_ERR iscsi: invalid stop flag %d\n, flag);
 }
 
-static ssize_t format_addr(char *buf, const unsigned char *addr, int len)
-{
-   int i;
-   char *cp = buf;
-
-   for (i = 0; i  len; i++)
-   cp += sprintf(cp, %02x%c, addr[i],
- i == (len - 1) ? '\n' : ':');
-   return cp - buf;
-}
-
-
 static int qla4xxx_host_get_param(struct Scsi_Host *shost,
  enum iscsi_host_param param, char *buf)
 {
@@ -193,7 +181,7 @@ static int qla4xxx_host_get_param(struct Scsi_Host *shost,
 
switch (param) {
case ISCSI_HOST_PARAM_HWADDRESS:
-   len = format_addr(buf, ha-my_mac, MAC_ADDR_LEN);
+   len = format_mac_addr(buf, ha-my_mac, MAC_ADDR_LEN);
break;
case ISCSI_HOST_PARAM_IPADDRESS:
len = sprintf(buf, %d.%d.%d.%d\n, ha-ip_address[0],
diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
index cc002cb..d20512c 100644
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -124,10 +124,11 @@ int eth_header_parse(const struct sk_buff *skb, unsigned 
char *haddr);
 extern struct ctl_table ether_table[];
 #endif
 
+extern ssize_t format_mac_addr(char *buf, const unsigned char *addr, int len);
+
 /*
  * Display a 6 byte device address (MAC) in a readable format.
  */
-#define MAC_FMT %02x:%02x:%02x:%02x:%02x:%02x
 extern char *print_mac(char *buf, const u8 *addr);
 #define DECLARE_MAC_BUF(var) char var[18] __maybe_unused
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index e41f4b9..e72993b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -95,17 +95,6 @@ NETDEVICE_SHOW(type, fmt_dec);
 NETDEVICE_SHOW(link_mode, fmt_dec);
 
 /* use same locking rules as GIFHWADDR ioctl's */
-static ssize_t format_addr(char *buf, const unsigned char *addr, int len)
-{
-   int i;
-   char *cp = buf;
-
-   for (i = 0; i  len; i++)
-   cp += sprintf(cp, %02x%c, addr[i],
- i == (len - 1) ? '\n' : ':');
-   return cp - buf;
-}
-
 static ssize_t show_address(struct device *dev, struct device_attribute *attr,
char *buf)
 {
@@ -114,7 +103,7 @@ static ssize_t show_address(struct device *dev, struct 
device_attribute *attr,
 
read_lock(dev_base_lock);
if (dev_isalive(net))
-   ret = format_addr(buf, net-dev_addr, net-addr_len);
+   ret = format_mac_addr(buf, net-dev_addr, net-addr_len);
read_unlock(dev_base_lock);
return ret;
 }
@@ -124,7 +113,7 @@ static ssize_t show_broadcast(struct device *dev,
 {
struct net_device *net = to_net_dev(dev);
if (dev_isalive(net))
-   return format_addr(buf, net-broadcast, net-addr_len);
+   return format_mac_addr(buf, net-broadcast, net-addr_len);
return -EINVAL;
 }
 
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 6b2e454..f760d41 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -359,10 +359,33 @@ struct net_device *alloc_etherdev_mq(int sizeof_priv, 
unsigned int queue_count)
 }
 EXPORT_SYMBOL(alloc_etherdev_mq);
 
+static ssize_t _format_mac_addr(char *buf, const unsigned char *addr, int len)
+{
+   int i;
+   char *cp = buf;
+
+   for (i = 0; i  len; i++) {
+   cp += sprintf(cp, %02x, addr[i]);
+   if (i == len - 1)
+   break;
+   *cp++ = ':';
+   }
+   return cp - buf;
+}
+
+ssize_t format_mac_addr(char *buf, const unsigned char *addr, int len)
+{
+   ssize_t l;
+
+   l = _format_mac_addr(buf, addr, len);
+   strcpy(buf + l, \n);
+   return l + 1;
+}
+EXPORT_SYMBOL(format_mac_addr);
+
 char *print_mac(char *buf, const u8 *addr)
 {
-   sprintf(buf, MAC_FMT,
-   addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
+   _format_mac_addr(buf, addr, ETH_ALEN);
return buf;
 }
 EXPORT_SYMBOL(print_mac);


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More 

Re: [PATCH 2/2] XFRM: Drop packets when replay counter would overflow

2007-12-21 Thread David Miller
From: Paul Moore [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 09:15:00 -0500

 According to RFC4303, section 3.3.3 we need to drop outgoing packets which
 cause the replay counter to overflow:
 
3.3.3.  Sequence Number Generation
 
The sender's counter is initialized to 0 when an SA is established.
The sender increments the sequence number (or ESN) counter for this
SA and inserts the low-order 32 bits of the value into the Sequence
Number field.  Thus, the first packet sent using a given SA will
contain a sequence number of 1.
 
If anti-replay is enabled (the default), the sender checks to ensure
that the counter has not cycled before inserting the new value in the
Sequence Number field.  In other words, the sender MUST NOT send a
packet on an SA if doing so would cause the sequence number to cycle.
An attempt to transmit a packet that would result in sequence number
overflow is an auditable event.  The audit log entry for this event
SHOULD include the SPI value, current date/time, Source Address,
Destination Address, and (in IPv6) the cleartext Flow ID.
 
 Signed-off-by: Paul Moore [EMAIL PROTECTED]

Applied.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] XFRM: RFC4303 compliant auditing

2007-12-21 Thread David Miller
From: Paul Moore [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 09:14:55 -0500

 This patch adds a number of new IPsec audit events to meet the auditing
 requirements of RFC4303.  This includes audit hooks for the following events:
 
  * Could not find a valid SA [sections 2.1, 3.4.2]
. xfrm_audit_state_notfound()
. xfrm_audit_state_notfound_simple()
 
  * Sequence number overflow [section 3.3.3]
. xfrm_audit_state_replay_overflow()
 
  * Replayed packet [section 3.4.3]
. xfrm_audit_state_replay()
 
  * Integrity check failure [sections 3.4.4.1, 3.4.4.2]
. xfrm_audit_state_icvfail()
 
 While RFC4304 deals only with ESP most of the changes in this patch apply to
 IPsec in general, i.e. both AH and ESP.  The one case, integrity check
 failure, where ESP specific code had to be modified the same was done to the
 AH code for the sake of consistency.
 
 Signed-off-by: Paul Moore [EMAIL PROTECTED]

Applied.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ETH]: Combine format_addr() with print_mac().

2007-12-21 Thread Joe Perches
On Fri, 2007-12-21 at 14:05 -0800, Michael Chan wrote:
 [ETH]: Combine format_addr() with print_mac().
 print_mac() used by most net drivers and format_addr() used by
 net-sysfs.c are very similar and they can be integrated.
 format_addr() is also identically redefined in the qla4xxx iscsi
 driver.
 diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
 index 6b2e454..f760d41 100644
 --- a/net/ethernet/eth.c
 +++ b/net/ethernet/eth.c
 @@ -359,10 +359,33 @@ struct net_device *alloc_etherdev_mq(int sizeof_priv, 
 unsigned int queue_count)
  }
  EXPORT_SYMBOL(alloc_etherdev_mq);
  
 +static ssize_t _format_mac_addr(char *buf, const unsigned char *addr, int 
 len)
 +{
 + int i;
 + char *cp = buf;
 +
 + for (i = 0; i  len; i++) {
 + cp += sprintf(cp, %02x, addr[i]);
 + if (i == len - 1)
 + break;
 + *cp++ = ':';
 + }
 + return cp - buf;
 +}
 +
 +ssize_t format_mac_addr(char *buf, const unsigned char *addr, int len)
 +{
 + ssize_t l;
 +
 + l = _format_mac_addr(buf, addr, len);
 + strcpy(buf + l, \n);
 + return l + 1;
 +}
 +EXPORT_SYMBOL(format_mac_addr);
 +
  char *print_mac(char *buf, const u8 *addr)
  {
 - sprintf(buf, MAC_FMT,
 - addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
 + _format_mac_addr(buf, addr, ETH_ALEN);
   return buf;
  }
  EXPORT_SYMBOL(print_mac);

I think const unsigned char *addr should be const u8 *addr
ssize_t? shouldn't it be size_t?
Indexing buf by int len is unchecked.
That could lead to unintended buffer overruns.
Maybe add a buflen argument and use snprintf?

I had a patch that added some type-safety to print_mac
and prevented this unintended buffer overrun.

It seems it wasn't applied.

---

Subject: Re: [PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG
Date: Mon, 24 Sep 2007 10:28:36 -0700

Here is a patch that adds some type safety to print_mac
by using a struct print_mac_buf * instead of char *.

It also reduces the defconfig vmlinux size by 8 bytes.

Signed-off-by:  Joe Perches [EMAIL PROTECTED]

--

 include/linux/if_ether.h |   12 ++--
 net/ethernet/eth.c   |6 +++---
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
index 57abca1..620d6b1 100644
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -126,7 +126,15 @@ extern struct ctl_table ether_table[];
  * Display a 6 byte device address (MAC) in a readable format.
  */
 #define MAC_FMT %02x:%02x:%02x:%02x:%02x:%02x
-extern char *print_mac(char *buf, const u8 *addr);
-#define DECLARE_MAC_BUF(var) char var[18] __maybe_unused
+
+struct print_mac_buf {
+   char formatted_mac_addr[18];
+};
+
+#define DECLARE_MAC_BUF(var) \
+   struct print_mac_buf __maybe_unused _##var; \
+   struct print_mac_buf __maybe_unused *var = _##var
+
+extern char *print_mac(struct print_mac_buf *buf, const u8 *addr);
 
 #endif /* _LINUX_IF_ETHER_H */
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 2aaf6fa..ad82613 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -338,10 +338,10 @@ struct net_device *alloc_etherdev_mq(int
sizeof_priv, unsigned int queue_count)
 }
 EXPORT_SYMBOL(alloc_etherdev_mq);
 
-char *print_mac(char *buf, const u8 *addr)
+char *print_mac(struct print_mac_buf *buf, const u8 *addr)
 {
-   sprintf(buf, MAC_FMT,
+   sprintf(buf-formatted_mac_addr, MAC_FMT,
addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
-   return buf;
+   return buf-formatted_mac_addr;
 }
 EXPORT_SYMBOL(print_mac);



--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BNX2 warning

2007-12-21 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 10:34:15 -0800

 [BNX2]: Fix compiler warning.
 
 Change bnx2_init_napi() to void.
 
 Warning was noted by DaveM.
 
 Signed-off-by: Michael Chan [EMAIL PROTECTED]

Applied, thanks!
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] phylib: add module owner to the mii_bus structure

2007-12-21 Thread Stephen Hemminger
On Fri, 21 Dec 2007 15:57:31 +0200
Ionut Nicu [EMAIL PROTECTED] wrote:

 Prevent unloading mii bus driver module when other modules have references to 
 some
 phydevs on that bus. Added a new member (module owner) to struct mii_bus and 
 added
 code to increment the mii bus owner module usage count on phy_connect and 
 decrement
 it on phy_disconnect
 
 Set the module owner in the ucc_geth_mdio driver.
 
 Signed-off-by: Ionut Nicu [EMAIL PROTECTED]
 ---
  drivers/net/phy/phy_device.c |9 -
  drivers/net/ucc_geth_mii.c   |3 +++
  include/linux/phy.h  |4 
  3 files changed, 15 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
 index 5b9e175..7dc5480 100644
 --- a/drivers/net/phy/phy_device.c
 +++ b/drivers/net/phy/phy_device.c
 @@ -178,6 +178,10 @@ struct phy_device * phy_connect(struct net_device *dev, 
 const char *phy_id,
   if (phydev-irq  0)
   phy_start_interrupts(phydev);
  
 + /* Increment the usage count of the mii bus owner */
 + if (!try_module_get(phydev-bus-owner))
 + return ERR_PTR(-EFAULT);

Shouldn't you get a handle before the first usage (start_interrupts)?

 +
   return phydev;
  }
  EXPORT_SYMBOL(phy_connect);
 @@ -192,9 +196,12 @@ void phy_disconnect(struct phy_device *phydev)
   phy_stop_interrupts(phydev);
  
   phy_stop_machine(phydev);
 - 
 +
   phydev-adjust_link = NULL;
  
 + /* Decrement the reference count for the mii bus owner */
 + module_put(phydev-bus-owner);
 +
   phy_detach(phydev);
  }
  EXPORT_SYMBOL(phy_disconnect);
 diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c
 index a3af4ea..84c7295 100644
 --- a/drivers/net/ucc_geth_mii.c
 +++ b/drivers/net/ucc_geth_mii.c
 @@ -217,6 +217,9 @@ static int uec_mdio_probe(struct of_device *ofdev, const 
 struct of_device_id *ma
   }
   }
  
 + /* register ourselves as the owner of this bus */
 + new_bus-owner = THIS_MODULE;
 +
   err = mdiobus_register(new_bus);
   if (0 != err) {
   printk(KERN_ERR %s: Cannot register as MDIO bus\n,
 diff --git a/include/linux/phy.h b/include/linux/phy.h
 index 554836e..04ff6a5 100644
 --- a/include/linux/phy.h
 +++ b/include/linux/phy.h
 @@ -82,6 +82,10 @@ struct mii_bus {
   const char *name;
   int id;
   void *priv;
 +
 + /* The module that owns this bus */
 + struct module *owner;
 +
   int (*read)(struct mii_bus *bus, int phy_id, int regnum);
   int (*write)(struct mii_bus *bus, int phy_id, int regnum, u16 val);
   int (*reset)(struct mii_bus *bus);


-- 
Stephen Hemminger [EMAIL PROTECTED]

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] [CORE]: datagram: basic memory accounting functions

2007-12-21 Thread Hideo AOKI
David Miller wrote:
 From: Hideo AOKI [EMAIL PROTECTED]
 Date: Thu, 20 Dec 2007 23:18:54 -0500
 
 Also, the memory accounting is done at different parts in
 the socket code paths for stream vs. datagram.  This is why
 everything is inconsistent, and, a mess.
 Could you tell me more detailed information?
 
 I think the core thing is that TCP and INET protocols call into
 the memory accounting internally, either inside their own code
 paths or with inet_*() helpers.
 
 This is versus what we really want is everything happening via generic
 sk_foo() helpers.
 
 If that's what's happening already, great, just consolidate the
 datagram vs. stream stuff and it should be good.

Thank you for the explanation.

I'll do my best.

Regards,
Hideo

-- 
Hitachi Computer Products (America) Inc.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH v9 06/18] LSM: Add inet_sys_snd_skb() LSM hook (fwd)

2007-12-21 Thread James Morris
This is part of a large patchset which finally fixes labeled networking, 
which we're hoping to get into 2.6.25.

Thread @ http://thread.gmane.org/gmane.linux.kernel.lsm/4894

The patch below is the only one which is not self-contained  impacts on 
core networking code.

If anyone has any objections or comments on this patch, please let us 
know.

-- Forwarded message --
Date: Fri, 21 Dec 2007 12:09:28 -0500
From: Paul Moore [EMAIL PROTECTED]
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: [RFC PATCH v9 06/18] LSM: Add inet_sys_snd_skb() LSM hook

Add an inet_sys_snd_skb() LSM hook to allow the LSM to provide packet level
access control for all outbound packets.  Using the existing postroute_last
netfilter hook turns out to be problematic as it is can be invoked multiple
times for a single packet, e.g. individual IPsec transforms, adding unwanted
overhead and complicating the security policy.

Signed-off-by: Paul Moore [EMAIL PROTECTED]
---

 include/linux/security.h |   11 +++
 net/ipv4/ip_output.c |7 +++
 net/ipv6/ip6_output.c|5 +
 security/dummy.c |8 +++-
 security/security.c  |6 ++
 5 files changed, 36 insertions(+), 1 deletions(-)

diff --git a/include/linux/security.h b/include/linux/security.h
index db19c92..1b8d332 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -876,6 +876,10 @@ struct request_sock;
  * Sets the connection's peersid to the secmark on skb.
  * @req_classify_flow:
  * Sets the flow's sid to the openreq sid.
+ * @inet_sys_snd_skb:
+ * Check permissions on outgoing network packets.
+ * @skb is the packet to check
+ * @family is the packet's address family
  *
  * Security hooks for XFRM operations.
  *
@@ -1416,6 +1420,7 @@ struct security_operations {
void (*inet_csk_clone)(struct sock *newsk, const struct request_sock 
*req);
void (*inet_conn_established)(struct sock *sk, struct sk_buff *skb);
void (*req_classify_flow)(const struct request_sock *req, struct flowi 
*fl);
+   int (*inet_sys_snd_skb)(struct sk_buff *skb, int family);
 #endif /* CONFIG_SECURITY_NETWORK */
 
 #ifdef CONFIG_SECURITY_NETWORK_XFRM
@@ -2328,6 +2333,7 @@ void security_sk_free(struct sock *sk);
 void security_sk_clone(const struct sock *sk, struct sock *newsk);
 void security_sk_classify_flow(struct sock *sk, struct flowi *fl);
 void security_req_classify_flow(const struct request_sock *req, struct flowi 
*fl);
+int security_inet_sys_snd_skb(struct sk_buff *skb, int family);
 void security_sock_graft(struct sock*sk, struct socket *parent);
 int security_inet_conn_request(struct sock *sk,
struct sk_buff *skb, struct request_sock *req);
@@ -2471,6 +2477,11 @@ static inline void security_req_classify_flow(const 
struct request_sock *req, st
 {
 }
 
+static inline int security_inet_sys_snd_skb(struct sk_buff *skb, int family)
+{
+   return 0;
+}
+
 static inline void security_sock_graft(struct sock* sk, struct socket *parent)
 {
 }
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index fd99fbd..82a7297 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -204,6 +204,8 @@ static inline int ip_skb_dst_mtu(struct sk_buff *skb)
 
 static int ip_finish_output(struct sk_buff *skb)
 {
+   int err;
+
 #if defined(CONFIG_NETFILTER)  defined(CONFIG_XFRM)
/* Policy lookup after SNAT yielded a new policy */
if (skb-dst-xfrm != NULL) {
@@ -211,6 +213,11 @@ static int ip_finish_output(struct sk_buff *skb)
return dst_output(skb);
}
 #endif
+
+   err = security_inet_sys_snd_skb(skb, AF_INET);
+   if (err)
+   return err;
+
if (skb-len  ip_skb_dst_mtu(skb)  !skb_is_gso(skb))
return ip_fragment(skb, ip_finish_output2);
else
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6338a9c..44ddf32 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -72,8 +72,13 @@ static __inline__ void ipv6_select_ident(struct sk_buff 
*skb, struct frag_hdr *f
 
 static int ip6_output_finish(struct sk_buff *skb)
 {
+   int err;
struct dst_entry *dst = skb-dst;
 
+   err = security_inet_sys_snd_skb(skb, AF_INET6);
+   if (err)
+   return err;
+
if (dst-hh)
return neigh_hh_output(dst-hh, skb);
else if (dst-neighbour)
diff --git a/security/dummy.c b/security/dummy.c
index 0b62f95..384979a 100644
--- a/security/dummy.c
+++ b/security/dummy.c
@@ -848,6 +848,11 @@ static inline void dummy_req_classify_flow(const struct 
request_sock *req,
struct flowi *fl)
 {
 }
+
+static inline int dummy_inet_sys_snd_skb(struct sk_buff *skb, int family)
+{
+   return 0;
+}
 #endif /* CONFIG_SECURITY_NETWORK */
 
 #ifdef CONFIG_SECURITY_NETWORK_XFRM
@@ -1122,7 +1127,8 @@ void security_fixup_ops (struct security_operations *ops)
   

[PATCH] Reduce locking in TX path of forcedth driver

2007-12-21 Thread Tom Herbert
Reduce the amount of locking in the TX path.  Instead of using both 
netif_tx_lock and dev-priv-lock on transmitting, a single private lock 
(dev-priv-tx_lock) is used.  This method is similar to that of the e1000 
driver, including the logic to stop the queue in the start xmit functions, and 
the logic to wake the queue in the TX done functions.  We see some performance 
improvement with this patch.

Signed-off-by: Tom Herbert [EMAIL PROTECTED]

--- linux-2.6/drivers/net/forcedeth.c.orig  2007-12-21 16:26:15.743639000 
-0800
+++ linux-2.6/drivers/net/forcedeth.c   2007-12-21 16:51:19.001325000 -0800
@@ -525,6 +525,12 @@ union ring_type {
 #define RING_MAX_DESC_VER_11024
 #define RING_MAX_DESC_VER_2_3  16384
 
+/*
+ * Maxmimum number of fragments that a single packet could need in
+ * transmit.
+ */
+#defineMAX_TX_FRAGS(MAX_SKB_FRAGS + (65535  
NV_TX2_TSO_MAX_SHIFT))
+
 /* rx/tx mac addr + type + vlan + align + slack*/
 #define NV_RX_HEADERS  (64)
 /* even more slack. */
@@ -738,9 +744,8 @@ struct nv_skb_map {
  * critical parts:
  * - rx is (pseudo-) lockless: it relies on the single-threading provided
  * by the arch code for interrupts.
- * - tx setup is lockless: it relies on netif_tx_lock. Actual submission
- * needs dev-priv-lock :-(
- * - set_multicast_list: preparation lockless, relies on netif_tx_lock.
+ * - tx uses dev-priv-tx_lock and not netif_tx_lock.  TX done processing
+ *   only acquires dev-priv-tx_lock when the queue needs to be awoken.
  */
 
 /* in dev: base, irq */
@@ -806,6 +811,7 @@ struct fe_priv {
/*
 * tx specific fields.
 */
+   spinlock_t tx_lock;
union ring_type get_tx, put_tx, first_tx, last_tx;
struct nv_skb_map *get_tx_ctx, *put_tx_ctx;
struct nv_skb_map *first_tx_ctx, *last_tx_ctx;
@@ -814,7 +820,6 @@ struct fe_priv {
union ring_type tx_ring;
u32 tx_flags;
int tx_ring_size;
-   int tx_stop;
 
/* vlan fields */
struct vlan_group *vlangrp;
@@ -1775,9 +1780,16 @@ static inline u32 nv_get_empty_tx_slots(
return (u32)(np-tx_ring_size - ((np-tx_ring_size + (np-put_tx_ctx - 
np-get_tx_ctx)) % np-tx_ring_size));
 }
 
+static inline u32 nv_get_used_tx_slots(struct fe_priv *np)
+{
+   return (u32)((np-tx_ring_size + (np-put_tx_ctx - np-get_tx_ctx)) %
+np-tx_ring_size);
+}
+
+#defineTX_WAKE_THRESHOLD(np) ((np)-tx_ring_size / 4)
+
 /*
  * nv_start_xmit: dev-hard_start_xmit function
- * Called with netif_tx_lock held.
  */
 static int nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
@@ -1790,7 +1802,7 @@ static int nv_start_xmit(struct sk_buff 
u32 bcnt;
u32 size = skb-len-skb-data_len;
u32 entries = (size  NV_TX2_TSO_MAX_SHIFT) + ((size  
(NV_TX2_TSO_MAX_SIZE-1)) ? 1 : 0);
-   u32 empty_slots;
+   unsigned long irq_flags;
struct ring_desc* put_tx;
struct ring_desc* start_tx;
struct ring_desc* prev_tx;
@@ -1802,12 +1814,22 @@ static int nv_start_xmit(struct sk_buff 
   ((skb_shinfo(skb)-frags[i].size  
(NV_TX2_TSO_MAX_SIZE-1)) ? 1 : 0);
}
 
-   empty_slots = nv_get_empty_tx_slots(np);
-   if (unlikely(empty_slots = entries)) {
-   spin_lock_irq(np-lock);
-   netif_stop_queue(dev);
-   np-tx_stop = 1;
-   spin_unlock_irq(np-lock);
+   local_irq_save(irq_flags);
+   if (!spin_trylock(np-tx_lock)) {
+   /* Collision - tell upper layer to requeue */
+   local_irq_restore(irq_flags);
+   return NETDEV_TX_LOCKED;
+   }
+
+   if (unlikely(nv_get_empty_tx_slots(np) = entries)) {
+   if (!netif_queue_stopped(dev)) {
+   netif_stop_queue(dev);
+
+   /* This is a hard error, log it. */
+   printk(KERN_ERR %s: BUG! Tx Ring full when 
+   queue awake\n, dev-name);
+   }
+   spin_unlock_irqrestore(np-tx_lock, irq_flags);
return NETDEV_TX_BUSY;
}
 
@@ -1858,6 +1880,12 @@ static int nv_start_xmit(struct sk_buff 
} while (size);
}
 
+   /*
+* Make put_tx_ctx visible to nx_tx_done as soon as possible,
+* this might avoid an unnecessary queue wakeup.
+*/
+   smp_mb();
+
/* set last fragment flag  */
prev_tx-flaglen |= cpu_to_le32(tx_flags_extra);
 
@@ -1870,14 +1898,10 @@ static int nv_start_xmit(struct sk_buff 
tx_flags_extra = skb-ip_summed == CHECKSUM_PARTIAL ?
 NV_TX2_CHECKSUM_L3 | NV_TX2_CHECKSUM_L4 : 0;
 
-   spin_lock_irq(np-lock);
-
/* set tx flags */
start_tx-flaglen |= cpu_to_le32(tx_flags | tx_flags_extra);
np-put_tx.orig = put_tx;
 
-   spin_unlock_irq(np-lock);
-
dprintk(KERN_DEBUG %s: nv_start_xmit: entries %d queued for 

[no subject]

2007-12-21 Thread Masahide NAKAMURA
Subject: [XFRM] Documentaion: Fix error example at XFRMOUTSTATEMODEERROR. (Re: 
[XFRM]: Fix outbound statistics.)

Hello,

On Fri, 21 Dec 2007 23:11:11 +0800
Herbert Xu [EMAIL PROTECTED] wrote:

 On Fri, Dec 21, 2007 at 11:25:00PM +0900, Masahide NAKAMURA wrote:
 
  do {
  err = xfrm_state_check_space(x, skb);
  -   if (err)
  +   if (err) {
  +   XFRM_INC_STATS(LINUX_MIB_XFRMOUTERROR);
  goto error_nolock;
  +   }
   
  err = x-outer_mode-output(x, skb);
  -   if (err)
  +   if (err) {
  +   XFRM_INC_STATS(LINUX_MIB_XFRMOUTSTATEMODEERROR);
 
 BTW, none of our existing mode output functions actually return
 an error.  I noticed that the description for this field is actually
 Transformation mode specific error, e.g. Outer header space is not
 enough.  This is slightly misleading as output header space is
 checked by xfrm_state_check_space so if there's an error that's
 where it'll show up.

Thanks for comment, Herbert.

I fix the documentation to remove e.g. Outer header space is not enough
from XFRMSTATEMODEERROR.
About error code from xfrm_state_check_space(), I still map it XFRMOUTERROR
(other errors) this time because I think the error here is not a length
error by protocol (e.g MTU related things) but an internal buffer management.

Any comments for the statistics are still welcomed.

David, please apply the following patch, too.


[XFRM] Documentaion: Fix error example at XFRMOUTSTATEMODEERROR.

Signed-off-by: Masahide NAKAMURA [EMAIL PROTECTED]
---
 Documentation/networking/xfrm_proc.txt |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/Documentation/networking/xfrm_proc.txt 
b/Documentation/networking/xfrm_proc.txt
index ec9045b..53c1a58 100644
--- a/Documentation/networking/xfrm_proc.txt
+++ b/Documentation/networking/xfrm_proc.txt
@@ -60,7 +60,6 @@ XfrmOutStateProtoError:
Transformation protocol specific error
 XfrmOutStateModeError:
Transformation mode specific error
-   e.g. Outer header space is not enough
 XfrmOutStateExpired:
State is expired
 XfrmOutPolBlock:
-- 
1.4.4.2

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Reduce locking in TX path of forcedth driver

2007-12-21 Thread Stephen Hemminger
On Fri, 21 Dec 2007 17:41:34 -0800 (PST)
[EMAIL PROTECTED] (Tom Herbert) wrote:

 Reduce the amount of locking in the TX path.  Instead of using both 
 netif_tx_lock and dev-priv-lock on transmitting, a single private lock 
 (dev-priv-tx_lock) is used.  This method is similar to that of the e1000 
 driver, including the logic to stop the queue in the start xmit functions, 
 and the logic to wake the queue in the TX done functions.  We see some 
 performance improvement with this patch.
 
 Signed-off-by: Tom Herbert [EMAIL PROTECTED]


  
 + spin_lock_init(np-tx_lock);
 +
 + dev-features |= NETIF_F_LLTX;
 +

NAK - lockless transmit is not desirable for real devices.

use netif_tx_lock() instead of your private lock

-- 
Stephen Hemminger [EMAIL PROTECTED]

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ETH]: Combine format_addr() with print_mac().

2007-12-21 Thread Michael Chan
On Fri, 2007-12-21 at 14:36 -0800, Joe Perches wrote:
 On Fri, 2007-12-21 at 14:05 -0800, Michael Chan wrote:
  diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
  index 6b2e454..f760d41 100644
  --- a/net/ethernet/eth.c
  +++ b/net/ethernet/eth.c
  @@ -359,10 +359,33 @@ struct net_device *alloc_etherdev_mq(int sizeof_priv, 
  unsigned int queue_count)
   }
   EXPORT_SYMBOL(alloc_etherdev_mq);
   
  +static ssize_t _format_mac_addr(char *buf, const unsigned char *addr, int 
  len)
  +{
  +   int i;
  +   char *cp = buf;
  +
  +   for (i = 0; i  len; i++) {
  +   cp += sprintf(cp, %02x, addr[i]);
  +   if (i == len - 1)
  +   break;
  +   *cp++ = ':';
  +   }
  +   return cp - buf;
  +}
  +
  +ssize_t format_mac_addr(char *buf, const unsigned char *addr, int len)
  +{
  +   ssize_t l;
  +
  +   l = _format_mac_addr(buf, addr, len);
  +   strcpy(buf + l, \n);
  +   return l + 1;
  +}
  +EXPORT_SYMBOL(format_mac_addr);
  +
   char *print_mac(char *buf, const u8 *addr)
   {
  -   sprintf(buf, MAC_FMT,
  -   addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
  +   _format_mac_addr(buf, addr, ETH_ALEN);
  return buf;
   }
   EXPORT_SYMBOL(print_mac);
 
 I think const unsigned char *addr should be const u8 *addr

The dev_addr is declared as unsigned char* in struct net_device.  To be
consistent, can we change print_mac() and MAC_BUF to use unsigned char*?
They are really the same.

 ssize_t? shouldn't it be size_t?

I'm just keeping the prototype unchanged as originally defined in net-
sysfs.c

 Indexing buf by int len is unchecked.
 That could lead to unintended buffer overruns.
 Maybe add a buflen argument and use snprintf?

Again, I kept the semantics the same as the original, but will be happy
to add a buflen for better checking.

 
 I had a patch that added some type-safety to print_mac
 and prevented this unintended buffer overrun.
 
 It seems it wasn't applied.
 
 ---
 
 Subject: Re: [PATCH net-2.6.24] introduce MAC_FMT/MAC_ARG
 Date: Mon, 24 Sep 2007 10:28:36 -0700
 
 Here is a patch that adds some type safety to print_mac
 by using a struct print_mac_buf * instead of char *.
 
 It also reduces the defconfig vmlinux size by 8 bytes.
 
 Signed-off-by:  Joe Perches [EMAIL PROTECTED]
 
 --
 
  include/linux/if_ether.h |   12 ++--
  net/ethernet/eth.c   |6 +++---
  2 files changed, 13 insertions(+), 5 deletions(-)
 
 diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
 index 57abca1..620d6b1 100644
 --- a/include/linux/if_ether.h
 +++ b/include/linux/if_ether.h
 @@ -126,7 +126,15 @@ extern struct ctl_table ether_table[];
   * Display a 6 byte device address (MAC) in a readable format.
   */
  #define MAC_FMT %02x:%02x:%02x:%02x:%02x:%02x
 -extern char *print_mac(char *buf, const u8 *addr);
 -#define DECLARE_MAC_BUF(var) char var[18] __maybe_unused
 +
 +struct print_mac_buf {
 +   char formatted_mac_addr[18];
 +};
 +
 +#define DECLARE_MAC_BUF(var) \
 +   struct print_mac_buf __maybe_unused _##var; \
 +   struct print_mac_buf __maybe_unused *var = _##var
 +
 +extern char *print_mac(struct print_mac_buf *buf, const u8 *addr);
  
  #endif /* _LINUX_IF_ETHER_H */
 diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
 index 2aaf6fa..ad82613 100644
 --- a/net/ethernet/eth.c
 +++ b/net/ethernet/eth.c
 @@ -338,10 +338,10 @@ struct net_device *alloc_etherdev_mq(int
 sizeof_priv, unsigned int queue_count)
  }
  EXPORT_SYMBOL(alloc_etherdev_mq);
  
 -char *print_mac(char *buf, const u8 *addr)
 +char *print_mac(struct print_mac_buf *buf, const u8 *addr)
  {
 -   sprintf(buf, MAC_FMT,
 +   sprintf(buf-formatted_mac_addr, MAC_FMT,
 addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
 -   return buf;
 +   return buf-formatted_mac_addr;
  }
  EXPORT_SYMBOL(print_mac);
 
 
 
 

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >