Re: [ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching

2007-10-11 Thread Krishna Kumar2
Hi Dave,

David Miller wrote on 10/10/2007 02:13:31 AM:

  Hopefully that new qdisc will just use the TX rings of the hardware
  directly. They are typically large enough these days. That might avoid
  some locking in this critical path.

 Indeed, I also realized last night that for the default qdiscs
 we do a lot of stupid useless work.  If the queue is a FIFO
 and the device can take packets, we should send it directly
 and not stick it into the qdisc at all.

Since you are talking of how it should be done in the *current* code,
I feel LLTX drivers will not work nicely with this.

Actually I was trying this change a couple of weeks back, but felt
that doin go would result in out of order packets (skbs present in
q which were not sent out for LLTX failure will be sent out only at
next net_tx_action, while other skbs are sent ahead).

One option is to first call qdisc_run() and then process this skb,
but that is ugly (requeue handling).

However I guess this can be done cleanly once LLTX is removed.

Thanks,

- KK

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-11 Thread Eric W. Biederman
David Miller [EMAIL PROTECTED] writes:

 From: [EMAIL PROTECTED] (Eric W. Biederman)
 Date: Fri, 28 Sep 2007 18:59:08 -0600

 
 Currently we have the call path:
 macvlan_open - dev_unicast_add - __dev_set_rx_mode -
  __dev_set_promiscuity - ASSERT_RTNL - mutex_trylock
 
 When mutex debugging is on taking a mutex complains if we are not
 allowed to sleep.  At that point we have called netif_tx_lock_bh
 so we are clearly not allowed to sleep.  Arguably this is not a
 problem for mutex_trylock.
 
 However we can avoid the complaint and make the ASSERT_RTNL code
 cheaper, faster and more obvious by simply calling mutex_is_locked.
 
 So this patch adds rtnl_is_locked (which does mutex_is_locked on
 the rtnl_mutex) and changes ASSERT_RTNL to use that.
 
 Signed-off-by: Eric W. Biederman [EMAIL PROTECTED]

 There was a lot of discussion about how to do this right and
 therefore you'll need to resubmit all of this with this
 discussioned issues addressed.

Thanks for the update on where this patch sits.

My understanding is that the patch as I submitted it is correct.

The code path that sparked this patch has been seen to be extremely
convoluted from a locking perspective.  Herbert and Patrick have been
discussing that code path and it was my impression they were working
together to figure out how to refactor that code path to make the
locking simpler.  There are enough convoluted details that I
do not feel comfortable refactoring that code path.

There was a practical suggestion by Herbert that ASSERT_RTNL have a
might_sleep() added.  That suggestion will currently result in
ASSERT_RTNL firing unnecessarily from the macvlan_open code path.

So I do not feel comfortable adding a might_sleep() into ASSERT_RTNL,
as it appears that it will warn on currently correct code.  Even if
that code has code has nearly incomprehensible locking.

Eric
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-11 Thread Herbert Xu
On Thu, Oct 11, 2007 at 12:57:31AM -0600, Eric W. Biederman wrote:

 There was a practical suggestion by Herbert that ASSERT_RTNL have a
 might_sleep() added.  That suggestion will currently result in
 ASSERT_RTNL firing unnecessarily from the macvlan_open code path.

As I've already said we should change the macvlan and core
netdev code so that this doesn't happen in the first place.
In gernal checking for the RTNL while holding a spin lock is
a sign of a bug.

So I would object to a patch that caused the RTNL_ASSERT to not
warn about being called in an atomic context.

I don't have a problem with your patch per se, it's the fact
that the patch is removing the warning when it's called in an
atomic context that I don't like.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PROBLEM: skb_clone SMP race?

2007-10-11 Thread Santiago Font Arquer
Thank you.
I was thinking about something like that when I said not find
explicit checks, but I was confused because the sk_buff code is
plenty of explicit checks and I understood those checks as a try to
make the code self-protected against wrong uses: I, sk_buff, offer
you this interface. Don´t worry too much about my internals: I will
complain on your mistakes

The fact the name of the functionality is generic fast_clone and not
specific tcp_fast_clone drove me to think: Ok, the allocation of
fast-clonable skbs is NOW only in ONE place inside TCP code and sure
that they have the implicit checks to avoid this race when they call
skb_clone, but -NOW- and -ONE- are not a big guarantee...

A newbie's thought. :)



2007/10/11, Herbert Xu [EMAIL PROTECTED]:
 Santiago Font Arquer [EMAIL PROTECTED] wrote:
 
If an skb with fast clone available (first if true) has
  references in different CPUs (skb-users1) (I do not find explicit
  checks for this to be impossible), if skb_clone is called
  simultaneously over that skb, both callers can get the same clone (the
  fast clone) and different problems follow: wrong clone_skb-users
  (1 as expected by the caller, but it should be, to be true, 2),
  fclone_ref set to 3 involving further problems, ...

 Fast clones are only used by TCP where the original skb is
 never given to the outside world.  This plus the fact that
 a given TCP socket is single-threaded makes it safe.

 Cheers,
 --
 Visit Openswan at http://www.openswan.org/
 Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
 Home Page: http://gondor.apana.org.au/~herbert/
 PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-11 Thread Eric W. Biederman
Herbert Xu [EMAIL PROTECTED] writes:

 On Thu, Oct 11, 2007 at 12:57:31AM -0600, Eric W. Biederman wrote:

 There was a practical suggestion by Herbert that ASSERT_RTNL have a
 might_sleep() added.  That suggestion will currently result in
 ASSERT_RTNL firing unnecessarily from the macvlan_open code path.

 As I've already said we should change the macvlan and core
 netdev code so that this doesn't happen in the first place.

Agreed.  Until that is done I am reluctant to add a warning
to ASSERT_RTNL.  Last I looked at that part of the thread
it looked like you and Patrick were making good progress
towards unraveling that, so I have no problem adding
an extra warning when we don't expect it to ever trigger.

 In gernal checking for the RTNL while holding a spin lock is
 a sign of a bug.

 So I would object to a patch that caused the RTNL_ASSERT to not
 warn about being called in an atomic context.

ASSERT_RTNL does not warn about being called in an atomic context
today!

 I don't have a problem with your patch per se, it's the fact
 that the patch is removing the warning when it's called in an
 atomic context that I don't like.

No my patch does not remove a warning.

Way way deep in mutex debugging on the slowpath there is a unreadable
and incomprehensible WARN_ON in muxtex_trylock that will trigger if
you have 10 tons of debugging turned on, and you are in,
interrupt context, and you manage to hit the slow path.  I think that
is a pretty unlikely scenario.

Eric
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-11 Thread Herbert Xu
On Thu, Oct 11, 2007 at 02:23:35AM -0600, Eric W. Biederman wrote:

  So I would object to a patch that caused the RTNL_ASSERT to not
  warn about being called in an atomic context.
 
 ASSERT_RTNL does not warn about being called in an atomic context
 today!

Well it did didn't it or we wouldn't be having this thread :)
 
 Way way deep in mutex debugging on the slowpath there is a unreadable
 and incomprehensible WARN_ON in muxtex_trylock that will trigger if
 you have 10 tons of debugging turned on, and you are in,
 interrupt context, and you manage to hit the slow path.  I think that
 is a pretty unlikely scenario.

Well thanks to that warning we're on our way of improving the
code that triggered it in such a way that this warning will soon
go silent.

That's precisely the reason why I object to having this warning
removed.  Now you have a good point that this warning doesn't
trigger all the time.  The fix to that is to *make* it trigger
always, not removing it.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-2.6.23-rc5] ipsec interfamily route handling fix

2007-10-11 Thread Joakim Koskela
On Friday 14 September 2007 23:42:52 David Miller wrote:
 From: Joakim Koskela [EMAIL PROTECTED]
 Date: Thu, 6 Sep 2007 19:00:10 +0300

  This patch addresses a couple of issues related to interfamily ipsec
  modes. The problem is that the structure of the routing info changes
  with the family during the __xfrmX_bundle_create, which hasn't been
  taken properly into account. Seems that by coincidence it hasn't

 Since nobody else found time to review this, I did :-)


Thanks for taking the time, and sorry for not getting back on this until
now..

 It sets encap_type in the inner loop, but what if we find multiple
 entries some ipv4 and some ipv6?  This logic can't be right.

 Instead, we need to treat these objects on an individual basis, I
 think, and that requires a bit more changes.

Yes, this is what I was worried about. But as I'm not that familiar with 
how these dst_entries are used down the line or with subpolicy 
transformations I didn't feel comfortable rewriting that completely.
I'm trying to get this thing solved (any help is of course appreciated :),
but in the meantime I think the following bit could actually be separated as,
although related, fixes an issue not directly tied to the original problem
(..and would be great to get applied as it makes interfamily work quite ok
for a number of setups). What do you think?

..and for a short description:

This patch resets the ipv4-related flags in the new flow as their content
will otherwise depend on the bits of the ipv6 addresses the struct was 
previously used for. For example, fl4_tos might have RTO_ONLINK set, which
usually prevents the right route from being found.

--
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 15aa4c5..b4a0b54 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -185,6 +185,8 @@ __xfrm6_bundle_create(struct xfrm_policy *policy, struct 
xfrm_state **xfrm, int
case AF_INET:
fl_tunnel.fl4_dst = xfrm[i]-id.daddr.a4;
fl_tunnel.fl4_src = xfrm[i]-props.saddr.a4;
+   fl_tunnel.fl4_tos = 0;
+   fl_tunnel.fl4_scope = 0;
break;
case AF_INET6:
ipv6_addr_copy(fl_tunnel.fl6_dst, 
__xfrm6_bundle_addr_remote(xfrm[i], fl-fl6_dst));
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][NETNS] Make ifindex generation per-namespace

2007-10-11 Thread Johannes Berg
On Wed, 2007-10-10 at 13:51 -0600, Eric W. Biederman wrote:

 Yes.  Netlink sockets are per-namespace and you can use the namespace
 of a netlink socket to look up a netdev.

Ok, thanks. I still haven't really looked into the wireless vs. net
namespaces problem but this will probably help.

johannes


signature.asc
Description: This is a digitally signed message part


Re: [RFC PATCH] [TCP]: Fix lost_retrans loop vs fastpath problems

2007-10-11 Thread Ilpo Järvinen
On Thu, 11 Oct 2007, TAKANO Ryousei wrote:

 From: Ilpo Järvinen [EMAIL PROTECTED]
 Subject: [RFC PATCH] [TCP]: Fix lost_retrans loop vs fastpath problems
 Date: Tue,  9 Oct 2007 15:20:01 +0300
 
 Thanks Ilpo!  I am trying to evaluate this patch.

There's a minor problem in this 2nd patch, it's just preventing the 
cnt == tp-retrans_out short-circuit from working, not a correctness 
problem though it could affect the performance. I'll post larger patch 
series among which is a fixed version (hopefully today).

 But, I got 
 a kernel panic at net_rx_action() in our experimental setting.
 I am using the net-2.6.24 kernel _without_ this patch. 
 (I will post a bug report separately).

...Please do. :-)

 Anyway, I will report the result as soon as possible.

Thanks. ...It's very interesting to see because it's not that clear
cut how the extra processing that is necessary affects high-speed 
performance, it could add yet another source of RTOs due
processing latency.

-- 
 i.

Re: authenc compile warnings in current net-2.6.24

2007-10-11 Thread Sebastian Siewior
* David Miller | 2007-10-10 16:25:28 [-0700]:

From: Sebastian Siewior [EMAIL PROTECTED]
Date: Wed, 10 Oct 2007 21:53:37 +0200

 * Oliver Hartkopp | 2007-10-10 19:53:53 [+0200]:
 
  CC [M] crypto/authenc.o
  crypto/authenc.c: In function ?crypto_authenc_hash?:
  crypto/authenc.c:88: warning: ?cryptlen? may be used uninitialized in this 
  function
  crypto/authenc.c:87: warning: ?dst? may be used uninitialized in this 
  function
  crypto/authenc.c: In function ?crypto_authenc_decrypt?:
  crypto/authenc.c:163: warning: ?cryptlen? may be used uninitialized in 
  this 
  function
  crypto/authenc.c:163: note: ?cryptlen? was declared here
  crypto/authenc.c:162: warning: ?src? may be used uninitialized in this 
  function
  crypto/authenc.c:162: note: ?src? was declared here
 
  do you already know these warnings?
 
 Those warnings are looking like a compiler bug to me.

To be honest I don't know of any compiler which commits enough
flow variable analysis to support doing %100 accurate warnings
in situations like this.

gcc (GCC) 4.1.2 (Gentoo 4.1.2) did not produce any warnings in this
case.

Since the compiler is unlikely to do so, I think we should fix
it somehow because useless warnings just distract.
sure.

Sebastian
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] [TCP]: Fix lost_retrans loop vs fastpath problems

2007-10-11 Thread Ilpo Järvinen
Detection implemented with lost_retrans must work also when
fastpath is taken, yet most of the queue is skipped including
(very likely) those retransmitted skb's we're interested in.
This problem appeared when the hints got added, which removed
a need to always walk over the whole write queue head.
Therefore decicion for the lost_retrans worker loop entry must
be separated from the sacktag processing more than it was
necessary before.

It turns out to be problematic to optimize the worker loop
very heavily because ack_seqs of skb may have a number of
discontinuity points. Maybe similar approach as currently is
implemented could be attempted but that's becoming more and
more complex because the trend is towards less skb walking
in sacktag marker. Trying a simple work until all rexmitted
skbs heve been processed approach.

Maybe after(highest_sack_end_seq, tp-high_seq) checking is not
sufficiently accurate and causes entry too often in no-work-to-do
cases. Since that's not known, I've separated solution to that
from this patch.

Noticed because of report against a related problem from TAKANO
Ryousei [EMAIL PROTECTED]. He also provided a patch to
that part of the problem. This patch includes solution to it
(though this patch has to use somewhat different placement).
TAKANO's description and patch is available here:

  http://marc.info/?l=linux-netdevm=119149311913288w=2

...In short, TAKANO's problem is that end_seq the loop is using
not necessarily the largest SACK block's end_seq because the
current ACK may still have higher SACK blocks which are later
by the loop.

Cc: TAKANO Ryousei [EMAIL PROTECTED]
Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   37 ++---
 1 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a066039..d5e0fcc 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1109,27 +1109,34 @@ static int tcp_is_sackblock_valid(struct tcp_sock *tp, 
int is_dsack,
 /* Check for lost retransmit. This superb idea is borrowed from ratehalving.
  * Event C. Later note: FACK people cheated me again 8), we have to account
  * for reordering! Ugly, but should help.
+ *
+ * Search retransmitted skbs from write_queue that were sent when snd_nxt was
+ * less than what is now known to be received by the other end (derived from
+ * SACK blocks by the caller).
  */
-static int tcp_mark_lost_retrans(struct sock *sk, u32 lost_retrans)
+static int tcp_mark_lost_retrans(struct sock *sk, u32 received_upto)
 {
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb;
int flag = 0;
+   int cnt = 0;
 
tcp_for_write_queue(skb, sk) {
u32 ack_seq = TCP_SKB_CB(skb)-ack_seq;
 
if (skb == tcp_send_head(sk))
break;
-   if (after(TCP_SKB_CB(skb)-seq, lost_retrans))
+   if (cnt == tp-retrans_out)
break;
if (!after(TCP_SKB_CB(skb)-end_seq, tp-snd_una))
continue;
 
-   if ((TCP_SKB_CB(skb)-sacked  TCPCB_SACKED_RETRANS) 
-   after(lost_retrans, ack_seq) 
+   if (!(TCP_SKB_CB(skb)-sacked  TCPCB_SACKED_RETRANS))
+   continue;
+
+   if (after(received_upto, ack_seq) 
(tcp_is_fack(tp) ||
-!before(lost_retrans,
+!before(received_upto,
 ack_seq + tp-reordering * tp-mss_cache))) {
TCP_SKB_CB(skb)-sacked = ~TCPCB_SACKED_RETRANS;
tp-retrans_out -= tcp_skb_pcount(skb);
@@ -1143,6 +1150,8 @@ static int tcp_mark_lost_retrans(struct sock *sk, u32 
lost_retrans)
flag |= FLAG_DATA_SACKED;
NET_INC_STATS_BH(LINUX_MIB_TCPLOSTRETRANSMIT);
}
+   } else {
+   cnt += tcp_skb_pcount(skb);
}
}
return flag;
@@ -1225,7 +1234,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)3;
int reord = tp-packets_out;
int prior_fackets;
-   u32 lost_retrans = 0;
+   u32 highest_sack_end_seq = 0;
int flag = 0;
int found_dup_sack = 0;
int cached_fack_count;
@@ -1396,11 +1405,6 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
continue;
}
 
-   if ((sackedTCPCB_SACKED_RETRANS) 
-   after(end_seq, TCP_SKB_CB(skb)-ack_seq) 
-   (!lost_retrans || after(end_seq, lost_retrans)))
-   lost_retrans = end_seq;
-
if (!in_sack)
continue;
 
@@ -1454,9 

[PATCH 7/7] [TCP]: Limit processing lost_retrans loop to work-to-do cases

2007-10-11 Thread Ilpo Järvinen
This addition of lost_retrans_low to tcp_sock might be
unnecessary, it's not clear how often lost_retrans worker is
executed when there wasn't work to do.

Cc: TAKANO Ryousei [EMAIL PROTECTED]
Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/linux/tcp.h   |2 ++
 net/ipv4/tcp_input.c  |   14 +++---
 net/ipv4/tcp_output.c |2 ++
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 9ff456e..c5b94c1 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -348,6 +348,8 @@ struct tcp_sock {
int lost_cnt_hint;
int retransmit_cnt_hint;
 
+   u32 lost_retrans_low;   /* Sent seq after any rxmit (lowest) */
+
u16 advmss; /* Advertised MSS   */
u16 prior_ssthresh; /* ssthresh saved at recovery start */
u32 lost_out;   /* Lost packets */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d5e0fcc..0a42e93 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1112,7 +1112,8 @@ static int tcp_is_sackblock_valid(struct tcp_sock *tp, 
int is_dsack,
  *
  * Search retransmitted skbs from write_queue that were sent when snd_nxt was
  * less than what is now known to be received by the other end (derived from
- * SACK blocks by the caller).
+ * SACK blocks by the caller). Also calculate the lowest snd_nxt among the
+ * remaining retransmitted skbs to avoid some costly processing per ACKs.
  */
 static int tcp_mark_lost_retrans(struct sock *sk, u32 received_upto)
 {
@@ -1120,6 +1121,7 @@ static int tcp_mark_lost_retrans(struct sock *sk, u32 
received_upto)
struct sk_buff *skb;
int flag = 0;
int cnt = 0;
+   u32 new_low_seq = 0;
 
tcp_for_write_queue(skb, sk) {
u32 ack_seq = TCP_SKB_CB(skb)-ack_seq;
@@ -1151,9 +1153,15 @@ static int tcp_mark_lost_retrans(struct sock *sk, u32 
received_upto)
NET_INC_STATS_BH(LINUX_MIB_TCPLOSTRETRANSMIT);
}
} else {
+   if (!new_low_seq || before(ack_seq, new_low_seq))
+   new_low_seq = ack_seq;
cnt += tcp_skb_pcount(skb);
}
}
+
+   if (tp-retrans_out)
+   tp-lost_retrans_low = new_low_seq;
+
return flag;
 }
 
@@ -1481,8 +1489,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
}
}
 
-   if (tp-retrans_out  highest_sack_end_seq 
-   after(highest_sack_end_seq, tp-high_seq) 
+   if (tp-retrans_out 
+   after(highest_sack_end_seq, tp-lost_retrans_low) 
icsk-icsk_ca_state == TCP_CA_Recovery)
flag |= tcp_mark_lost_retrans(sk, highest_sack_end_seq);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5329675..324b420 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1914,6 +1914,8 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff 
*skb)
printk(KERN_DEBUG retrans_out leaked.\n);
}
 #endif
+   if (!tp-retrans_out)
+   tp-lost_retrans_low = tp-snd_nxt;
TCP_SKB_CB(skb)-sacked |= TCPCB_RETRANS;
tp-retrans_out += tcp_skb_pcount(skb);
 
-- 
1.5.0.6

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-2.6 0/7] TCP: small changes/fixes + lost_retrans brokeness fix

2007-10-11 Thread Ilpo Järvinen
Some small and simple changes. Minor testing done, until I got
one LostRetrans counted, which turned out to be hard task than
I thought (without specifically arranged scenario, just some  
basic netem delay  drops).  
 
Lost_retrans handling fix is now tested, I fixed one incorrectly
placed if from the lost_retrans_low patch after reading it
through.  
 
...Recount removal patch could be more elegant in the way it
plays around with the tcp_clear_retrans, but I just wasn't able
to figure out how to do that nicely enough...
 
Those cleanups are preparation for DSACK fix, which is yet to  
be done but IMHO they won't hurt if applied alone either.  
 

Dave, please apply at will. FYI, I'll post also the SACK rewrite
patchset rebased after this (built on top of these).

--
 i.


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/7] [TCP]: Add bytes_acked (ABC) clearing to FRTO too

2007-10-11 Thread Ilpo Järvinen
I was reading tcp_enter_loss while looking for Cedric's bug and
noticed bytes_acked adjustment is missing from FRTO side.

Since bytes_acked will only be used in tcp_cong_avoid, I think
it's safe to assume RTO would be spurious. During FRTO cwnd
will be not controlled by tcp_cong_avoid and if FRTO calls for
conventional recovery, cwnd is adjusted and the result of wrong
assumption is cleared from bytes_acked. If RTO was in fact
spurious, we did normal ABC already and can continue without
any additional adjustments.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 101054c..e40857e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1687,6 +1687,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int 
allowed_segments, int flag)
tp-snd_cwnd_cnt = 0;
tp-snd_cwnd_stamp = tcp_time_stamp;
tp-frto_counter = 0;
+   tp-bytes_acked = 0;
 
tp-reordering = min_t(unsigned int, tp-reordering,
 sysctl_tcp_reordering);
@@ -2824,6 +2825,7 @@ static void tcp_conservative_spur_to_response(struct 
tcp_sock *tp)
 {
tp-snd_cwnd = min(tp-snd_cwnd, tp-snd_ssthresh);
tp-snd_cwnd_cnt = 0;
+   tp-bytes_acked = 0;
TCP_ECN_queue_cwr(tp);
tcp_moderate_cwnd(tp);
 }
-- 
1.5.0.6

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/7] [TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L

2007-10-11 Thread Ilpo Järvinen
This condition (plain R) can arise at least in recovery that
is triggered after tcp_undo_loss. There isn't any reason why
they should not be marked as lost, not marking makes in_flight
estimator to return too large values.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e40857e..c827285 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1987,7 +1987,7 @@ static void tcp_mark_head_lost(struct sock *sk,
cnt += tcp_skb_pcount(skb);
if (cnt  packets || after(TCP_SKB_CB(skb)-end_seq, high_seq))
break;
-   if (!(TCP_SKB_CB(skb)-sackedTCPCB_TAGBITS)) {
+   if (!(TCP_SKB_CB(skb)-sacked  
(TCPCB_SACKED_ACKED|TCPCB_LOST))) {
TCP_SKB_CB(skb)-sacked |= TCPCB_LOST;
tp-lost_out += tcp_skb_pcount(skb);
tcp_verify_retransmit_hint(tp, skb);
-- 
1.5.0.6

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/7] [TCP]: Extract tcp_match_queue_to_sack from sacktag code

2007-10-11 Thread Ilpo Järvinen
This is necessary for upcoming DSACK bugfix. Reduces sacktag
length which is not very sad thing at all... :-)

Notice that there's a need to handle out-of-mem at caller's
place.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   54 -
 1 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5004704..bd18c25 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1181,6 +1181,38 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct 
sk_buff *ack_skb,
return dup_sack;
 }
 
+/* Check if skb is fully within the SACK block. In presence of GSO skbs,
+ * the incoming SACK may not exactly match but we can find smaller MSS
+ * aligned portion of it that matches. Therefore we might need to fragment
+ * which may fail and creates some hassle (caller must handle error case
+ * returns).
+ */
+int tcp_match_skb_to_sack(struct sock *sk, struct sk_buff *skb,
+ u32 start_seq, u32 end_seq)
+{
+   int in_sack, err;
+   unsigned int pkt_len;
+
+   in_sack = !after(start_seq, TCP_SKB_CB(skb)-seq) 
+ !before(end_seq, TCP_SKB_CB(skb)-end_seq);
+
+   if (tcp_skb_pcount(skb)  1  !in_sack 
+   after(TCP_SKB_CB(skb)-end_seq, start_seq)) {
+
+   in_sack = !after(start_seq, TCP_SKB_CB(skb)-seq);
+
+   if (!in_sack)
+   pkt_len = start_seq - TCP_SKB_CB(skb)-seq;
+   else
+   pkt_len = end_seq - TCP_SKB_CB(skb)-seq;
+   err = tcp_fragment(sk, skb, pkt_len, skb_shinfo(skb)-gso_size);
+   if (err  0)
+   return err;
+   }
+
+   return in_sack;
+}
+
 static int
 tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 
prior_snd_una)
 {
@@ -1333,25 +1365,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
if (!before(TCP_SKB_CB(skb)-seq, end_seq))
break;
 
-   in_sack = !after(start_seq, TCP_SKB_CB(skb)-seq) 
-   !before(end_seq, TCP_SKB_CB(skb)-end_seq);
-
-   if (tcp_skb_pcount(skb)  1  !in_sack 
-   after(TCP_SKB_CB(skb)-end_seq, start_seq)) {
-   unsigned int pkt_len;
-
-   in_sack = !after(start_seq,
-TCP_SKB_CB(skb)-seq);
-
-   if (!in_sack)
-   pkt_len = (start_seq -
-  TCP_SKB_CB(skb)-seq);
-   else
-   pkt_len = (end_seq -
-  TCP_SKB_CB(skb)-seq);
-   if (tcp_fragment(sk, skb, pkt_len, 
skb_shinfo(skb)-gso_size))
-   break;
-   }
+   in_sack = tcp_match_skb_to_sack(sk, skb, start_seq, 
end_seq);
+   if (in_sack  0)
+   break;
 
fack_count += tcp_skb_pcount(skb);
 
-- 
1.5.0.6

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/7] [TCP]: Kill almost unused variable pcount from sacktag

2007-10-11 Thread Ilpo Järvinen
It's on the way for future cutting of that function.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |9 +++--
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c827285..5004704 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1314,7 +1314,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
flag |= FLAG_DATA_LOST;
 
tcp_for_write_queue_from(skb, sk) {
-   int in_sack, pcount;
+   int in_sack;
u8 sacked;
 
if (skb == tcp_send_head(sk))
@@ -1336,9 +1336,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
in_sack = !after(start_seq, TCP_SKB_CB(skb)-seq) 
!before(end_seq, TCP_SKB_CB(skb)-end_seq);
 
-   pcount = tcp_skb_pcount(skb);
-
-   if (pcount  1  !in_sack 
+   if (tcp_skb_pcount(skb)  1  !in_sack 
after(TCP_SKB_CB(skb)-end_seq, start_seq)) {
unsigned int pkt_len;
 
@@ -1353,10 +1351,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
   TCP_SKB_CB(skb)-seq);
if (tcp_fragment(sk, skb, pkt_len, 
skb_shinfo(skb)-gso_size))
break;
-   pcount = tcp_skb_pcount(skb);
}
 
-   fack_count += pcount;
+   fack_count += tcp_skb_pcount(skb);
 
sacked = TCP_SKB_CB(skb)-sacked;
 
-- 
1.5.0.6

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/7] [TCP]: No need to re-count fackets_out/sacked_out at RTO

2007-10-11 Thread Ilpo Järvinen
Both sacked_out and fackets_out are directly known from how
parameter. Since fackets_out is accurate, there's no need for
recounting (sacked_out was previously unnecessarily counted
in the loop anyway).

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   26 --
 1 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bd18c25..a066039 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1711,18 +1711,23 @@ static void tcp_enter_frto_loss(struct sock *sk, int 
allowed_segments, int flag)
tcp_clear_retrans_hints_partial(tp);
 }
 
-void tcp_clear_retrans(struct tcp_sock *tp)
+static void tcp_clear_retrans_partial(struct tcp_sock *tp)
 {
tp-retrans_out = 0;
-
-   tp-fackets_out = 0;
-   tp-sacked_out = 0;
tp-lost_out = 0;
 
tp-undo_marker = 0;
tp-undo_retrans = 0;
 }
 
+void tcp_clear_retrans(struct tcp_sock *tp)
+{
+   tcp_clear_retrans_partial(tp);
+
+   tp-fackets_out = 0;
+   tp-sacked_out = 0;
+}
+
 /* Enter Loss state. If how is not zero, forget all SACK information
  * and reset tags completely, otherwise preserve SACKs. If receiver
  * dropped its ofo queue, we will know this due to reneging detection.
@@ -1732,7 +1737,6 @@ void tcp_enter_loss(struct sock *sk, int how)
const struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb;
-   int cnt = 0;
 
/* Reduce ssthresh if it has not yet been made inside this window. */
if (icsk-icsk_ca_state = TCP_CA_Disorder || tp-snd_una == 
tp-high_seq ||
@@ -1746,7 +1750,10 @@ void tcp_enter_loss(struct sock *sk, int how)
tp-snd_cwnd_stamp = tcp_time_stamp;
 
tp-bytes_acked = 0;
-   tcp_clear_retrans(tp);
+   tcp_clear_retrans_partial(tp);
+
+   if (tcp_is_reno(tp))
+   tcp_reset_reno_sack(tp);
 
if (!how) {
/* Push undo marker, if it was plain RTO and nothing
@@ -1754,13 +1761,15 @@ void tcp_enter_loss(struct sock *sk, int how)
tp-undo_marker = tp-snd_una;
tcp_clear_retrans_hints_partial(tp);
} else {
+   tp-sacked_out = 0;
+   tp-fackets_out = 0;
tcp_clear_all_retrans_hints(tp);
}
 
tcp_for_write_queue(skb, sk) {
if (skb == tcp_send_head(sk))
break;
-   cnt += tcp_skb_pcount(skb);
+
if (TCP_SKB_CB(skb)-sackedTCPCB_RETRANS)
tp-undo_marker = 0;
TCP_SKB_CB(skb)-sacked = (~TCPCB_TAGBITS)|TCPCB_SACKED_ACKED;
@@ -1768,9 +1777,6 @@ void tcp_enter_loss(struct sock *sk, int how)
TCP_SKB_CB(skb)-sacked = ~TCPCB_SACKED_ACKED;
TCP_SKB_CB(skb)-sacked |= TCPCB_LOST;
tp-lost_out += tcp_skb_pcount(skb);
-   } else {
-   tp-sacked_out += tcp_skb_pcount(skb);
-   tp-fackets_out = cnt;
}
}
tcp_verify_left_out(tp);
-- 
1.5.0.6

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH net-2.6] TCP: SACKtag rewrite

2007-10-11 Thread Ilpo Järvinen
Dave ( others?),

Here's the rebased SACKtag rewrite series rebased for you,
applies on top of the seven patch series I sent earlier
today (or on top of net-2.6 as soon as Dave picks them up
and pushes out).

Some trivial helper patches first (two first ones from
tcp-2.6), the most interesting bits are in the fifth patch.

I'll recode the included DSACK handling, it will be much
cleaner then so please ignore that mess for now...

This time only compile tested but should be still runnable
because I tested an earlier version and diff against that
looked sane.

-- 
 i.



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] [TCP]: Convert highest_sack to sk_buff to allow direct access

2007-10-11 Thread Ilpo Järvinen
It is going to replace the sack fastpath hint quite soon... :-)

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/linux/tcp.h   |6 --
 include/net/tcp.h |   13 +
 net/ipv4/tcp_input.c  |   11 ++-
 net/ipv4/tcp_output.c |   19 ++-
 4 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index c5b94c1..0ec6bb6 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -332,8 +332,10 @@ struct tcp_sock {
 
struct tcp_sack_block_wire recv_sack_cache[4];
 
-   u32 highest_sack;   /* Start seq of globally highest revd SACK
-* (validity guaranteed only if sacked_out  0) 
*/
+   struct sk_buff *highest_sack;   /* highest skb with SACK received
+* (validity guaranteed only if
+* sacked_out  0)
+*/
 
/* from STCP, retrans queue hinting */
struct sk_buff* lost_skb_hint;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 92049e6..aead90a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1306,6 +1306,19 @@ static inline int tcp_write_queue_empty(struct sock *sk)
return skb_queue_empty(sk-sk_write_queue);
 }
 
+/* Start sequence of the highest skb with SACKed bit, valid only if
+ * sacked  0 or when the caller has ensured validity by itself.
+ */
+static inline u32 tcp_highest_sack_seq(struct sock *sk)
+{
+   struct tcp_sock *tp = tcp_sk(sk);
+
+   if (WARN_ON(!tp-sacked_out 
+   tp-highest_sack != tcp_write_queue_head(sk)))
+   return tp-snd_una;
+   return TCP_SKB_CB(tp-highest_sack)-seq;
+}
+
 /* /proc */
 enum tcp_seq_states {
TCP_SEQ_STATE_LISTENING,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9d3d390..39d6a6a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1238,10 +1238,11 @@ int tcp_match_skb_to_sack(struct sock *sk, struct 
sk_buff *skb,
return in_sack;
 }
 
-static void tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp,
+static void tcp_sacktag_one(struct sk_buff *skb, struct sock *sk,
struct tcp_sacktag_state *state, int in_sack,
int dup_sack, int fack_count, u32 end_seq)
 {
+   struct tcp_sock *tp = tcp_sk(sk);
u8 sacked = TCP_SKB_CB(skb)-sacked;
 
/* Account D-SACK for retransmitted packet. */
@@ -1321,8 +1322,8 @@ static void tcp_sacktag_one(struct sk_buff *skb, struct 
tcp_sock *tp,
if (fack_count  tp-fackets_out)
tp-fackets_out = fack_count;
 
-   if (after(TCP_SKB_CB(skb)-seq, tp-highest_sack)) {
-   tp-highest_sack = TCP_SKB_CB(skb)-seq;
+   if (after(TCP_SKB_CB(skb)-seq, tcp_highest_sack_seq(sk))) {
+   tp-highest_sack = skb;
state-highest_sack_end_seq = TCP_SKB_CB(skb)-end_seq;
}
} else {
@@ -1363,7 +1364,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
if (!tp-sacked_out) {
if (WARN_ON(tp-fackets_out))
tp-fackets_out = 0;
-   tp-highest_sack = tp-snd_una;
+   tp-highest_sack = tcp_write_queue_head(sk);
}
 
found_dup_sack = tcp_check_dsack(tp, ack_skb, sp, num_sacks, 
prior_snd_una);
@@ -1497,7 +1498,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
 
fack_count += tcp_skb_pcount(skb);
 
-   tcp_sacktag_one(skb, tp, state, in_sack,
+   tcp_sacktag_one(skb, sk, state, in_sack,
dup_sack, fack_count, end_seq);
}
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 324b420..9603de8 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -657,13 +657,15 @@ static void tcp_set_skb_tso_segs(struct sock *sk, struct 
sk_buff *skb, unsigned
  * tweak SACK fastpath hint too as it would overwrite all changes unless
  * hint is also changed.
  */
-static void tcp_adjust_fackets_out(struct tcp_sock *tp, struct sk_buff *skb,
+static void tcp_adjust_fackets_out(struct sock *sk, struct sk_buff *skb,
   int decr)
 {
+   struct tcp_sock *tp = tcp_sk(sk);
+
if (!tp-sacked_out || tcp_is_reno(tp))
return;
 
-   if (!before(tp-highest_sack, TCP_SKB_CB(skb)-seq))
+   if (!before(tcp_highest_sack_seq(sk), TCP_SKB_CB(skb)-seq))
tp-fackets_out -= decr;
 
/* cnt_hint is off-by-one compared with fackets_out (see sacktag) */
@@ -712,9 +714,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 
len, unsigned int mss
TCP_SKB_CB(buff)-end_seq = TCP_SKB_CB(skb)-end_seq;

[PATCH 1/6] [TCP]: Create tcp_sacktag_state.

2007-10-11 Thread Ilpo Järvinen
From: David S. Miller [EMAIL PROTECTED]

It is difficult to break out the inner-logic of
tcp_sacktag_write_queue() into worker functions because
so many local variables get updated in-place.

Start to overcome this by creating a structure block
of state variables that can be passed around into
worker routines.

[I made minor tweaks due to rebase/reordering of stuff
in tcp-2.6 tree, and dropped found_dup_sack  dup_sack
from state -ij]

Signed-off-by: David S. Miller [EMAIL PROTECTED]
Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   82 +++---
 1 files changed, 44 insertions(+), 38 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 0a42e93..1f6cbce 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1165,6 +1165,14 @@ static int tcp_mark_lost_retrans(struct sock *sk, u32 
received_upto)
return flag;
 }
 
+struct tcp_sacktag_state {
+   unsigned int flag;
+   int reord;
+   int prior_fackets;
+   u32 highest_sack_end_seq;
+   int first_sack_index;
+};
+
 static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
   struct tcp_sack_block_wire *sp, int num_sacks,
   u32 prior_snd_una)
@@ -1240,26 +1248,23 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
struct sk_buff *cached_skb;
int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)3;
-   int reord = tp-packets_out;
-   int prior_fackets;
-   u32 highest_sack_end_seq = 0;
-   int flag = 0;
-   int found_dup_sack = 0;
+   struct tcp_sacktag_state state;
+   int found_dup_sack;
int cached_fack_count;
int i;
-   int first_sack_index;
+   int force_one_sack;
+
+   state.flag = 0;
 
if (!tp-sacked_out) {
if (WARN_ON(tp-fackets_out))
tp-fackets_out = 0;
tp-highest_sack = tp-snd_una;
}
-   prior_fackets = tp-fackets_out;
 
-   found_dup_sack = tcp_check_dsack(tp, ack_skb, sp,
-num_sacks, prior_snd_una);
+   found_dup_sack = tcp_check_dsack(tp, ack_skb, sp, num_sacks, 
prior_snd_una);
if (found_dup_sack)
-   flag |= FLAG_DSACKING_ACK;
+   state.flag |= FLAG_DSACKING_ACK;
 
/* Eliminate too old ACKs, but take into
 * account more or less fresh ones, they can
@@ -1272,18 +1277,18 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
 * if the only SACK change is the increase of the end_seq of
 * the first block then only apply that SACK block
 * and use retrans queue hinting otherwise slowpath */
-   flag = 1;
+   force_one_sack = 1;
for (i = 0; i  num_sacks; i++) {
__be32 start_seq = sp[i].start_seq;
__be32 end_seq = sp[i].end_seq;
 
if (i == 0) {
if (tp-recv_sack_cache[i].start_seq != start_seq)
-   flag = 0;
+   force_one_sack = 0;
} else {
if ((tp-recv_sack_cache[i].start_seq != start_seq) ||
(tp-recv_sack_cache[i].end_seq != end_seq))
-   flag = 0;
+   force_one_sack = 0;
}
tp-recv_sack_cache[i].start_seq = start_seq;
tp-recv_sack_cache[i].end_seq = end_seq;
@@ -1294,8 +1299,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
tp-recv_sack_cache[i].end_seq = 0;
}
 
-   first_sack_index = 0;
-   if (flag)
+   state.first_sack_index = 0;
+   if (force_one_sack)
num_sacks = 1;
else {
int j;
@@ -1313,17 +1318,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
sp[j+1] = tmp;
 
/* Track where the first SACK block 
goes to */
-   if (j == first_sack_index)
-   first_sack_index = j+1;
+   if (j == state.first_sack_index)
+   state.first_sack_index = j+1;
}
 
}
}
}
 
-   /* clear flag as used for different purpose in following code */
-   flag = 0;
-
/* Use SACK fastpath hint if valid */
cached_skb = tp-fastpath_skb_hint;
cached_fack_count = tp-fastpath_cnt_hint;
@@ -1332,12 +1334,16 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
 

[PATCH 2/6] [TCP]: Create tcp_sacktag_one().

2007-10-11 Thread Ilpo Järvinen
From: David S. Miller [EMAIL PROTECTED]

Worker function that implements the main logic of
the inner-most loop of tcp_sacktag_write_queue().

Signed-off-by: David S. Miller [EMAIL PROTECTED]
Acked-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |  205 ++
 1 files changed, 106 insertions(+), 99 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1f6cbce..9d3d390 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1238,6 +1238,110 @@ int tcp_match_skb_to_sack(struct sock *sk, struct 
sk_buff *skb,
return in_sack;
 }
 
+static void tcp_sacktag_one(struct sk_buff *skb, struct tcp_sock *tp,
+   struct tcp_sacktag_state *state, int in_sack,
+   int dup_sack, int fack_count, u32 end_seq)
+{
+   u8 sacked = TCP_SKB_CB(skb)-sacked;
+
+   /* Account D-SACK for retransmitted packet. */
+   if ((dup_sack  in_sack) 
+   (sacked  TCPCB_RETRANS) 
+   after(TCP_SKB_CB(skb)-end_seq, tp-undo_marker))
+   tp-undo_retrans--;
+
+   /* The frame is ACKed. */
+   if (!after(TCP_SKB_CB(skb)-end_seq, tp-snd_una)) {
+   if (sacked  TCPCB_RETRANS) {
+   if ((dup_sack  in_sack) 
+   (sacked  TCPCB_SACKED_ACKED))
+   state-reord = min(fack_count, state-reord);
+   } else {
+   /* If it was in a hole, we detected reordering. */
+   if (fack_count  state-prior_fackets 
+   !(sacked  TCPCB_SACKED_ACKED))
+   state-reord = min(fack_count, state-reord);
+   }
+
+   /* Nothing to do; acked frame is about to be dropped. */
+   return;
+   }
+
+   if (!in_sack)
+   return;
+
+   if (!(sacked  TCPCB_SACKED_ACKED)) {
+   if (sacked  TCPCB_SACKED_RETRANS) {
+   /* If the segment is not tagged as lost,
+* we do not clear RETRANS, believing
+* that retransmission is still in flight.
+*/
+   if (sacked  TCPCB_LOST) {
+   TCP_SKB_CB(skb)-sacked =
+   ~(TCPCB_LOST|TCPCB_SACKED_RETRANS);
+   tp-lost_out -= tcp_skb_pcount(skb);
+   tp-retrans_out -= tcp_skb_pcount(skb);
+
+   /* clear lost hint */
+   tp-retransmit_skb_hint = NULL;
+   }
+   } else {
+   /* New sack for not retransmitted frame,
+* which was in hole. It is reordering.
+*/
+   if (!(sacked  TCPCB_RETRANS) 
+   fack_count  state-prior_fackets)
+   state-reord = min(fack_count, state-reord);
+
+   if (sacked  TCPCB_LOST) {
+   TCP_SKB_CB(skb)-sacked = ~TCPCB_LOST;
+   tp-lost_out -= tcp_skb_pcount(skb);
+
+   /* clear lost hint */
+   tp-retransmit_skb_hint = NULL;
+   }
+   /* SACK enhanced F-RTO detection.
+* Set flag if and only if non-rexmitted
+* segments below frto_highmark are
+* SACKed (RFC4138; Appendix B).
+* Clearing correct due to in-order walk
+*/
+   if (after(end_seq, tp-frto_highmark)) {
+   state-flag = ~FLAG_ONLY_ORIG_SACKED;
+   } else {
+   if (!(sacked  TCPCB_RETRANS))
+   state-flag |= FLAG_ONLY_ORIG_SACKED;
+   }
+   }
+
+   TCP_SKB_CB(skb)-sacked |= TCPCB_SACKED_ACKED;
+   state-flag |= FLAG_DATA_SACKED;
+   tp-sacked_out += tcp_skb_pcount(skb);
+
+   if (fack_count  tp-fackets_out)
+   tp-fackets_out = fack_count;
+
+   if (after(TCP_SKB_CB(skb)-seq, tp-highest_sack)) {
+   tp-highest_sack = TCP_SKB_CB(skb)-seq;
+   state-highest_sack_end_seq = TCP_SKB_CB(skb)-end_seq;
+   }
+   } else {
+   if (dup_sack  (sackedTCPCB_RETRANS))
+   state-reord = min(fack_count, state-reord);
+   }
+
+   /* D-SACK. We can detect redundant retransmission
+* in S|R and plain R frames and clear it.
+* undo_retrans is decreased above, L|R frames
+* are accounted above as well.
+*/
+   if (dup_sack  

[PATCH 4/6] [TCP]: Earlier SACK block verification simplify access to them

2007-10-11 Thread Ilpo Järvinen
Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/linux/tcp.h  |2 +-
 net/ipv4/tcp_input.c |   75 ++---
 2 files changed, 47 insertions(+), 30 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 0ec6bb6..3e412f2 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -330,7 +330,7 @@ struct tcp_sock {
struct tcp_sack_block duplicate_sack[1]; /* D-SACK block */
struct tcp_sack_block selective_acks[4]; /* The SACKS themselves*/
 
-   struct tcp_sack_block_wire recv_sack_cache[4];
+   struct tcp_sack_block recv_sack_cache[4];
 
struct sk_buff *highest_sack;   /* highest skb with SACK received
 * (validity guaranteed only if
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 39d6a6a..c260642 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1350,9 +1350,11 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
struct tcp_sock *tp = tcp_sk(sk);
unsigned char *ptr = (skb_transport_header(ack_skb) +
  TCP_SKB_CB(ack_skb)-sacked);
-   struct tcp_sack_block_wire *sp = (struct tcp_sack_block_wire *)(ptr+2);
+   struct tcp_sack_block_wire *sp_wire = (struct tcp_sack_block_wire 
*)(ptr+2);
+   struct tcp_sack_block sp[4];
struct sk_buff *cached_skb;
int num_sacks = (ptr[1] - TCPOLEN_SACK_BASE)3;
+   int used_sacks;
struct tcp_sacktag_state state;
int found_dup_sack;
int cached_fack_count;
@@ -1367,7 +1369,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
tp-highest_sack = tcp_write_queue_head(sk);
}
 
-   found_dup_sack = tcp_check_dsack(tp, ack_skb, sp, num_sacks, 
prior_snd_una);
+   found_dup_sack = tcp_check_dsack(tp, ack_skb, sp_wire, num_sacks, 
prior_snd_una);
if (found_dup_sack)
state.flag |= FLAG_DSACKING_ACK;
 
@@ -1378,14 +1380,46 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
if (before(TCP_SKB_CB(ack_skb)-ack_seq, prior_snd_una - 
tp-max_window))
return 0;
 
+   used_sacks = 0;
+   for (i = 0; i  num_sacks; i++) {
+   int dup_sack = !i  found_dup_sack;
+
+   sp[used_sacks].start_seq = 
ntohl(get_unaligned(sp_wire[i].start_seq));
+   sp[used_sacks].end_seq = 
ntohl(get_unaligned(sp_wire[i].end_seq));
+
+   if (!tcp_is_sackblock_valid(tp, dup_sack,
+   sp[used_sacks].start_seq,
+   sp[used_sacks].end_seq)) {
+   if (dup_sack) {
+   if (!tp-undo_marker)
+   
NET_INC_STATS_BH(LINUX_MIB_TCPDSACKIGNOREDNOUNDO);
+   else
+   
NET_INC_STATS_BH(LINUX_MIB_TCPDSACKIGNOREDOLD);
+   } else {
+   /* Don't count olds caused by ACK reordering */
+   if ((TCP_SKB_CB(ack_skb)-ack_seq != 
tp-snd_una) 
+   !after(sp[used_sacks].end_seq, tp-snd_una))
+   continue;
+   NET_INC_STATS_BH(LINUX_MIB_TCPSACKDISCARD);
+   }
+   continue;
+   }
+
+   /* Ignore very old stuff early */
+   if (!after(sp[used_sacks].end_seq, prior_snd_una))
+   continue;
+
+   used_sacks++;
+   }
+
/* SACK fastpath:
 * if the only SACK change is the increase of the end_seq of
 * the first block then only apply that SACK block
 * and use retrans queue hinting otherwise slowpath */
force_one_sack = 1;
-   for (i = 0; i  num_sacks; i++) {
-   __be32 start_seq = sp[i].start_seq;
-   __be32 end_seq = sp[i].end_seq;
+   for (i = 0; i  used_sacks; i++) {
+   u32 start_seq = sp[i].start_seq;
+   u32 end_seq = sp[i].end_seq;
 
if (i == 0) {
if (tp-recv_sack_cache[i].start_seq != start_seq)
@@ -1406,17 +1440,16 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
 
state.first_sack_index = 0;
if (force_one_sack)
-   num_sacks = 1;
+   used_sacks = 1;
else {
int j;
tp-fastpath_skb_hint = NULL;
 
/* order SACK blocks to allow in order walk of the retrans 
queue */
-   for (i = num_sacks-1; i  0; i--) {
+   for (i = used_sacks-1; i  0; i--) {
for (j = 0; j  i; j++){
-   if (after(ntohl(sp[j].start_seq),
- 

[RFC PATCH 5/6] [TCP]: Rewrite sack_recv_cache (WIP)

2007-10-11 Thread Ilpo Järvinen
Key points of this patch are:

  - In case new SACK information is advance only type, no skb
processing below previously discovered highest point is done
  - Optimize cases below highest point too since there's no need
to always go up to highest point (which is very likely still
present in that SACK), this is not entirely true though
because I'm dropping the fastpath_skb_hint which could
previously optimize those cases even better. Whether that
significant, I'm not too sure.

Corrently it will provide skipping by walking. Combined with
RB-tree, all skipping would become fast too regardless of window
size (can be done incrementally later).

Previously a number of cases in TCP SACK processing fails to
take advantage of costly stored information in sack_recv_cache.
Most importantly expected events such as cumulative ACK and new
hole ACKs. Processing on such ACKs result in rather long walks
building up latencies (which easily gets nasty when window is
large). Those latencies are often completely unnecessary
compared with the amount of _new_ information received, usually
for cumulative ACK there's no new information at all, yet TCP
walked whole queue unnecessary potentially taking a number of
costly cache misses on the way, etc.!

Since the inclusion of highest_sack, there's a lot information
that is very likely redundant (SACK fastpath hint stuff,
fackets_out, highest_sack), though there's no ultimate guarantee
that they'll remain the same whole the time (in all unearthly
scenarios). Take advantage of this too and drop fastpath hint
and use direct access to highest SACKed skb as replacement.

Effectively special cased fastpath is dropped. This change
adds some complexity to introduce better coveraged fastpath,
though the added complexity should make TCP behave more cache
friendly.

The current ACK's SACK blocks are compared against each cached
block individially and only ranges that are new are then scanned
by the high constant walk. For other parts of write queue, even
when in previously known part of the SACK blocks, a faster skip
function is used (if necessary at all). In addition, whenever
possible, TCP fast-forwards to highest_sack skb that was made
available by an earlier patch. In typical case, no other things
but this fast-forward and mandatory markings after that occur
making the access pattern quite similar to the former fastpath
special case.

DSACKs are special case that must always be walked.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/linux/tcp.h   |3 -
 include/net/tcp.h |1 -
 net/ipv4/tcp_input.c  |  253 ++--
 net/ipv4/tcp_output.c |   14 +---
 4 files changed, 159 insertions(+), 112 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 3e412f2..91c4430 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -343,10 +343,7 @@ struct tcp_sock {
struct sk_buff *scoreboard_skb_hint;
struct sk_buff *retransmit_skb_hint;
struct sk_buff *forward_skb_hint;
-   struct sk_buff *fastpath_skb_hint;
 
-   int fastpath_cnt_hint;  /* Lags behind by current skb's pcount
-* compared to respective fackets_out */
int lost_cnt_hint;
int retransmit_cnt_hint;
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index aead90a..cff2d86 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1078,7 +1078,6 @@ static inline void tcp_clear_retrans_hints_partial(struct 
tcp_sock *tp)
 static inline void tcp_clear_all_retrans_hints(struct tcp_sock *tp)
 {
tcp_clear_retrans_hints_partial(tp);
-   tp-fastpath_skb_hint = NULL;
 }
 
 /* MD5 Signature */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c260642..c13f1af 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1170,10 +1170,14 @@ struct tcp_sacktag_state {
int reord;
int prior_fackets;
u32 highest_sack_end_seq;
-   int first_sack_index;
+   int fack_count;
+   u32 dup_start;
+   u32 dup_end;
 };
 
-static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
+static int tcp_check_dsack(struct tcp_sock *tp,
+  struct tcp_sacktag_state *state,
+  struct sk_buff *ack_skb,
   struct tcp_sack_block_wire *sp, int num_sacks,
   u32 prior_snd_una)
 {
@@ -1183,6 +1187,8 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct 
sk_buff *ack_skb,
 
if (before(start_seq_0, TCP_SKB_CB(ack_skb)-ack_seq)) {
dup_sack = 1;
+   state-dup_start = start_seq_0;
+   state-dup_end = end_seq_0;
tcp_dsack_seen(tp);
NET_INC_STATS_BH(LINUX_MIB_TCPDSACKRECV);
} else if (num_sacks  1) {
@@ -1192,6 +1198,8 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct 
sk_buff *ack_skb,

[PATCH 6/6] [TCP]: Track sacktag (DEVEL PATCH)

2007-10-11 Thread Ilpo Järvinen
This is not intented to go to mainline, provided just for those
who are interested enough about the algorithm internals during
a test.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/linux/snmp.h |   20 +++
 net/ipv4/proc.c  |   20 +++
 net/ipv4/tcp_input.c |   52 +++--
 3 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/include/linux/snmp.h b/include/linux/snmp.h
index 89f0c2b..42b8c07 100644
--- a/include/linux/snmp.h
+++ b/include/linux/snmp.h
@@ -214,6 +214,26 @@ enum
LINUX_MIB_TCPDSACKIGNOREDOLD,   /* TCPSACKIgnoredOld */
LINUX_MIB_TCPDSACKIGNOREDNOUNDO,/* TCPSACKIgnoredNoUndo */
LINUX_MIB_TCPSPURIOUSRTOS,  /* TCPSpuriousRTOs */
+   LINUX_MIB_TCP_SACKTAG,
+   LINUX_MIB_TCP_SACK0,
+   LINUX_MIB_TCP_SACK1,
+   LINUX_MIB_TCP_SACK2,
+   LINUX_MIB_TCP_SACK3,
+   LINUX_MIB_TCP_SACK4,
+   LINUX_MIB_TCP_WALKEDSKBS,
+   LINUX_MIB_TCP_WALKEDDSACKS,
+   LINUX_MIB_TCP_SKIPPEDSKBS,
+   LINUX_MIB_TCP_NOCACHE,
+   LINUX_MIB_TCP_FULLWALK,
+   LINUX_MIB_TCP_HEADWALK,
+   LINUX_MIB_TCP_TAILWALK,
+   LINUX_MIB_TCP_FULLSKIP,
+   LINUX_MIB_TCP_TAILSKIP,
+   LINUX_MIB_TCP_HEADSKIP,
+   LINUX_MIB_TCP_FULLSKIP_TOHIGH,
+   LINUX_MIB_TCP_TAILSKIP_TOHIGH,
+   LINUX_MIB_TCP_NEWSKIP,
+   LINUX_MIB_TCP_CACHEREMAINING,
__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index e5b05b0..9909178 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -246,6 +246,26 @@ static const struct snmp_mib snmp4_net_list[] = {
SNMP_MIB_ITEM(TCPDSACKIgnoredOld, LINUX_MIB_TCPDSACKIGNOREDOLD),
SNMP_MIB_ITEM(TCPDSACKIgnoredNoUndo, LINUX_MIB_TCPDSACKIGNOREDNOUNDO),
SNMP_MIB_ITEM(TCPSpuriousRTOs, LINUX_MIB_TCPSPURIOUSRTOS),
+   SNMP_MIB_ITEM(TCP_SACKTAG, LINUX_MIB_TCP_SACKTAG),
+   SNMP_MIB_ITEM(TCP_SACK0, LINUX_MIB_TCP_SACK0),
+   SNMP_MIB_ITEM(TCP_SACK1, LINUX_MIB_TCP_SACK1),
+   SNMP_MIB_ITEM(TCP_SACK2, LINUX_MIB_TCP_SACK2),
+   SNMP_MIB_ITEM(TCP_SACK3, LINUX_MIB_TCP_SACK3),
+   SNMP_MIB_ITEM(TCP_SACK4, LINUX_MIB_TCP_SACK4),
+   SNMP_MIB_ITEM(TCP_WALKEDSKBS, LINUX_MIB_TCP_WALKEDSKBS),
+   SNMP_MIB_ITEM(TCP_WALKEDDSACKS, LINUX_MIB_TCP_WALKEDDSACKS),
+   SNMP_MIB_ITEM(TCP_SKIPPEDSKBS, LINUX_MIB_TCP_SKIPPEDSKBS),
+   SNMP_MIB_ITEM(TCP_NOCACHE, LINUX_MIB_TCP_NOCACHE),
+   SNMP_MIB_ITEM(TCP_FULLWALK, LINUX_MIB_TCP_FULLWALK),
+   SNMP_MIB_ITEM(TCP_HEADWALK, LINUX_MIB_TCP_HEADWALK),
+   SNMP_MIB_ITEM(TCP_TAILWALK, LINUX_MIB_TCP_TAILWALK),
+   SNMP_MIB_ITEM(TCP_FULLSKIP, LINUX_MIB_TCP_FULLSKIP),
+   SNMP_MIB_ITEM(TCP_TAILSKIP, LINUX_MIB_TCP_TAILSKIP),
+   SNMP_MIB_ITEM(TCP_HEADSKIP, LINUX_MIB_TCP_HEADSKIP),
+   SNMP_MIB_ITEM(TCP_FULLSKIP_TOHIGH, LINUX_MIB_TCP_FULLSKIP_TOHIGH),
+   SNMP_MIB_ITEM(TCP_TAILSKIP_TOHIGH, LINUX_MIB_TCP_TAILSKIP_TOHIGH),
+   SNMP_MIB_ITEM(TCP_NEWSKIP, LINUX_MIB_TCP_NEWSKIP),
+   SNMP_MIB_ITEM(TCP_CACHEREMAINING, LINUX_MIB_TCP_CACHEREMAINING),
SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c13f1af..02b34a5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1382,6 +1382,10 @@ static struct sk_buff *tcp_sacktag_walk(struct sk_buff 
*skb, struct sock *sk,
 
tcp_sacktag_one(skb, sk, state, in_sack, dup_sack,
state-fack_count, end_seq);
+
+   NET_INC_STATS_BH(LINUX_MIB_TCP_WALKEDSKBS);
+   if (dup_sack)
+   NET_INC_STATS_BH(LINUX_MIB_TCP_WALKEDDSACKS);
}
return skb;
 }
@@ -1405,6 +1409,7 @@ static struct sk_buff *tcp_sacktag_skip(struct sk_buff 
*skb, struct sock *sk,
skb = tcp_sacktag_walk(skb, sk, state, state-dup_start,
   state-dup_end, 1);
}
+   NET_INC_STATS_BH(LINUX_MIB_TCP_SKIPPEDSKBS);
}
return skb;
 }
@@ -1535,9 +1540,22 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff 
*ack_skb, u32 prior_snd_
cache++;
}
 
+   NET_INC_STATS_BH(LINUX_MIB_TCP_SACKTAG);
+   switch (used_sacks) {
+   case 0: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK0); break;
+   case 1: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK1); break;
+   case 2: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK2); break;
+   case 3: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK3); break;
+   case 4: NET_INC_STATS_BH(LINUX_MIB_TCP_SACK4); break;
+   }
+
+   if (!tcp_sack_cache_ok(tp, cache))
+   NET_INC_STATS_BH(LINUX_MIB_TCP_NOCACHE);
+
while (i  used_sacks) {
u32 start_seq = sp[i].start_seq;
u32 end_seq = sp[i].end_seq;
+   int fullwalk = 0;
 
/* Event B in the comment above. */

[PATCH respin] ucc_geth: fix module removal

2007-10-11 Thread Anton Vorontsov
- uccf should be set to NULL to not double-free memory on
  subsequent calls;
- ind_hash_q and group_hash_q lists should be initialized in the
  probe() function, instead of struct_init() (called by open()),
  otherwise there will be an oops if ucc_geth_driver removed
  prior 'ifconfig ethX up';
- add unregister_netdev();
- reorder geth_remove() steps.

Signed-off-by: Anton Vorontsov [EMAIL PROTECTED]
---
 drivers/net/ucc_geth.c |   17 ++---
 1 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 7dedc96..18a6f48 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -2080,8 +2080,10 @@ static void ucc_geth_memclean(struct ucc_geth_private 
*ugeth)
if (!ugeth)
return;
 
-   if (ugeth-uccf)
+   if (ugeth-uccf) {
ucc_fast_free(ugeth-uccf);
+   ugeth-uccf = NULL;
+   }
 
if (ugeth-p_thread_data_tx) {
qe_muram_free(ugeth-thread_dat_tx_offset);
@@ -2312,10 +2314,6 @@ static int ucc_struct_init(struct ucc_geth_private 
*ugeth)
ug_info = ugeth-ug_info;
uf_info = ug_info-uf_info;
 
-   /* Create CQs for hash tables */
-   INIT_LIST_HEAD(ugeth-group_hash_q);
-   INIT_LIST_HEAD(ugeth-ind_hash_q);
-
if (!((uf_info-bd_mem_part == MEM_PART_SYSTEM) ||
  (uf_info-bd_mem_part == MEM_PART_MURAM))) {
if (netif_msg_probe(ugeth))
@@ -3949,6 +3947,10 @@ static int ucc_geth_probe(struct of_device* ofdev, const 
struct of_device_id *ma
ugeth = netdev_priv(dev);
spin_lock_init(ugeth-lock);
 
+   /* Create CQs for hash tables */
+   INIT_LIST_HEAD(ugeth-group_hash_q);
+   INIT_LIST_HEAD(ugeth-ind_hash_q);
+
dev_set_drvdata(device, dev);
 
/* Set the dev-base_addr to the gfar reg region */
@@ -4002,9 +4004,10 @@ static int ucc_geth_remove(struct of_device* ofdev)
struct net_device *dev = dev_get_drvdata(device);
struct ucc_geth_private *ugeth = netdev_priv(dev);
 
-   dev_set_drvdata(device, NULL);
-   ucc_geth_memclean(ugeth);
+   unregister_netdev(dev);
free_netdev(dev);
+   ucc_geth_memclean(ugeth);
+   dev_set_drvdata(device, NULL);
 
return 0;
 }
-- 
1.5.0.6

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ucc_geth: add support for netpoll

2007-10-11 Thread Anton Vorontsov
This patch adds netpoll support for the QE UCC Gigabit Ethernet
driver. The approach is very similar to the gianfar driver.

Tested using netconsole.

Signed-off-by: Anton Vorontsov [EMAIL PROTECTED]
---
 drivers/net/ucc_geth.c |   19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ucc_geth.c b/drivers/net/ucc_geth.c
index 18a6f48..06807ce 100644
--- a/drivers/net/ucc_geth.c
+++ b/drivers/net/ucc_geth.c
@@ -3691,6 +3691,22 @@ static irqreturn_t ucc_geth_irq_handler(int irq, void 
*info)
return IRQ_HANDLED;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+/*
+ * Polling 'interrupt' - used by things like netconsole to send skbs
+ * without having to re-enable interrupts. It's not called while
+ * the interrupt routine is executing.
+ */
+static void ucc_netpoll(struct net_device *dev)
+{
+   struct ucc_geth_private *ugeth = netdev_priv(dev);
+
+   disable_irq(ugeth-ug_info-uf_info.irq);
+   ucc_geth_irq_handler(ugeth-ug_info-uf_info.irq, dev);
+   enable_irq(ugeth-ug_info-uf_info.irq);
+}
+#endif /* CONFIG_NET_POLL_CONTROLLER */
+
 /* Called when something needs to use the ethernet device */
 /* Returns 0 for success. */
 static int ucc_geth_open(struct net_device *dev)
@@ -3969,6 +3985,9 @@ static int ucc_geth_probe(struct of_device* ofdev, const 
struct of_device_id *ma
dev-poll = ucc_geth_poll;
dev-weight = UCC_GETH_DEV_WEIGHT;
 #endif /* CONFIG_UGETH_NAPI */
+#ifdef CONFIG_NET_POLL_CONTROLLER
+   dev-poll_controller = ucc_netpoll;
+#endif
dev-stop = ucc_geth_close;
dev-get_stats = ucc_geth_get_stats;
 //dev-change_mtu = ucc_geth_change_mtu;
-- 
1.5.0.6

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH respin] phy: implement release function

2007-10-11 Thread Anton Vorontsov
Lately I've got this nice badness on mdio bus removal:

Device 'e0103120:06' does not have a release() function, it is broken and must 
be fixed.
[ cut here ]
Badness at drivers/base/core.c:107
NIP: c015c1a8 LR: c015c1a8 CTR: c0157488
REGS: c34bdcf0 TRAP: 0700   Not tainted  (2.6.23-rc5-g9ebadfbb-dirty)
MSR: 00029032 EE,ME,IR,DR  CR: 24088422  XER: 
...
[c34bdda0] [c015c1a8] device_release+0x78/0x80 (unreliable)
[c34bddb0] [c01354cc] kobject_cleanup+0x80/0xbc
[c34bddd0] [c01365f0] kref_put+0x54/0x6c
[c34bdde0] [c013543c] kobject_put+0x24/0x34
[c34bddf0] [c015c384] put_device+0x1c/0x2c
[c34bde00] [c0180e84] mdiobus_unregister+0x2c/0x58
...

Though actually there is nothing broken, it just device
subsystem core expects another pattern of resource managment.

This patch implement phy device's release function, thus
we're getting rid of this badness.

Also small hidden bug fixed, hope none other introduced. ;-)

Signed-off-by: Anton Vorontsov [EMAIL PROTECTED]
Acked-by: Andy Fleming [EMAIL PROTECTED]
---
 drivers/net/phy/mdio_bus.c   |9 +
 drivers/net/phy/phy_device.c |   13 +
 include/linux/phy.h  |1 +
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index fc2f0e6..c30196d 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -91,9 +91,12 @@ int mdiobus_register(struct mii_bus *bus)
 
err = device_register(phydev-dev);
 
-   if (err)
+   if (err) {
printk(KERN_ERR phy %d failed to register\n,
i);
+   phy_device_free(phydev);
+   phydev = NULL;
+   }
}
 
bus-phy_map[i] = phydev;
@@ -110,10 +113,8 @@ void mdiobus_unregister(struct mii_bus *bus)
int i;
 
for (i = 0; i  PHY_MAX_ADDR; i++) {
-   if (bus-phy_map[i]) {
+   if (bus-phy_map[i])
device_unregister(bus-phy_map[i]-dev);
-   kfree(bus-phy_map[i]);
-   }
}
 }
 EXPORT_SYMBOL(mdiobus_unregister);
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 49328e0..8e25bf7 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -44,6 +44,17 @@ static struct phy_driver genphy_driver;
 extern int mdio_bus_init(void);
 extern void mdio_bus_exit(void);
 
+void phy_device_free(struct phy_device *phydev)
+{
+   kfree(phydev);
+}
+EXPORT_SYMBOL(phy_device_free);
+
+static void phy_device_release(struct device *dev)
+{
+   phy_device_free(to_phy_device(dev));
+}
+
 struct phy_device* phy_device_create(struct mii_bus *bus, int addr, int phy_id)
 {
struct phy_device *dev;
@@ -54,6 +65,8 @@ struct phy_device* phy_device_create(struct mii_bus *bus, int 
addr, int phy_id)
if (NULL == dev)
return (struct phy_device*) PTR_ERR((void*)-ENOMEM);
 
+   dev-dev.release = phy_device_release;
+
dev-speed = 0;
dev-duplex = -1;
dev-pause = dev-asym_pause = 0;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 2a65978..9ec1363 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -398,6 +398,7 @@ int phy_mii_ioctl(struct phy_device *phydev,
 int phy_start_interrupts(struct phy_device *phydev);
 void phy_print_status(struct phy_device *phydev);
 struct phy_device* phy_device_create(struct mii_bus *bus, int addr, int 
phy_id);
+void phy_device_free(struct phy_device *phydev);
 
 extern struct bus_type mdio_bus_type;
 #endif /* __PHY_H */
-- 
1.5.0.6
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 4/4] UDP memory usage accounting (take 4): memory limitation

2007-10-11 Thread Satoshi OSHIMA
Hi Stephen,

Thank you for your comment.

  {
 +.ctl_name   = NET_UDP_MEM,
 +.procname   = udp_mem,
 +.data   = sysctl_udp_mem,
 +.maxlen = sizeof(sysctl_udp_mem),
 +.mode   = 0644,
 +.proc_handler   = proc_dointvec
 +},
 +{
  .ctl_name   = NET_TCP_APP_WIN,
  .procname   = tcp_app_win,
  .data   = sysctl_tcp_app_win,
 
 if you use proc_dointvec_minmax, then you could inforce min/max
 values for udp_mem for the sysctl

udp_mem has two meanings:
* turn off this limitation function (currently udp_mem=4096)
* limit udp memory (currently udp_mem4096)

To realize this,  udp_mem is evaluated whether udp_mem equals
4096 or smaller in UDP and IP layers.

If udp_mem has proc_dointvec_minmax or dedicated proc handler,
turn off check must be done in UDP and IP layers. This means
there is no reduction of the check in UDP and IP layers.


If you pointed out that minus value of udp_mem is strange,
I agree. I'll fix it.

How about this?
min=4096 (and turn off limitation)
udp_mem4096 (and turn on limitation)


Satoshi Oshima

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Regression in net-2.6.24?

2007-10-11 Thread TAKANO Ryousei
From: Ilpo Järvinen [EMAIL PROTECTED]
Subject: Re: [RFC PATCH] [TCP]: Fix lost_retrans loop vs fastpath problems
Date: Thu, 11 Oct 2007 13:12:49 +0300 (EEST)

  But, I got 
  a kernel panic at net_rx_action() in our experimental setting.
  I am using the net-2.6.24 kernel _without_ this patch. 
  (I will post a bug report separately).
 
 ...Please do. :-)
 
I have got a kernel panic, when I run the Iperf benchmark in the following
setting. Each node has x86_64 dual processors and 2 tg3 NICs (BCM95704A6).
If the router does not regulate the bandwidth, a kernel panic does not occur.

Node A  Router --- Delay --- Node B
(Policing rate:  emulator
 500Mbps)(RTT: 20ms)


Ggit-bisect told me that the following commit causes a regression:

commit bea3348eef27e6044b6161fd04c3152215f96411
Author: Stephen Hemminger [EMAIL PROTECTED]
Date:   Wed Oct 3 16:41:36 2007 -0700

[NET]: Make NAPI polling independent of struct net_device objects.


Here is Oops message:

Unable to handle kernel paging request at 00100108 RIP: 
 [80421d59] net_rx_action+0x169/0x1c0
PGD f384d067 PUD 0 
Oops: 0002 [1] SMP 
CPU 0 
Modules linked in: 8021q tcp_bic netconsole evdev joydev sg st sr_mod ohci_hcd 
i2c_amd756 i2c_amd8111 i2c_core ipv6 tg3 usbhid usbcore ff_memless dm_mod ext3 
jbd sata_sil libata sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.23-rc9 #2
RIP: 0010:[80421d59]  [80421d59] net_rx_action+0x169/0x1c0
RSP: 0018:80847eb0  EFLAGS: 00010046
RAX: 00200200 RBX: 8100f63ef750 RCX: 0246
RDX: 00100100 RSI: c20002e60204 RDI: 8100f63ef6c0
RBP: 80847ef0 R08: 810178b72fd8 R09: 0001
R10: 0003 R11:  R12: 0040
R13: 0040 R14: 810003f51640 R15: 
FS:  40331960() GS:805d7000() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 00100108 CR3: f45b6000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process swapper (pid: 0, threadinfo 806de000, task 8058b540)
Stack:  80847eb0 00ec80847eb0 0001000b17ec 0009
 805e10b0 8083eee0  0009
 80847f30 8023d935  0046
Call Trace:
 IRQ  [8023d935] __do_softirq+0x75/0x100
 [8020cb8c] call_softirq+0x1c/0x30
 [8020dd89] do_softirq+0x49/0xb0
 [8023d4e5] irq_exit+0x45/0x50
 [8020e053] do_IRQ+0x83/0x100
 [8020a310] default_idle+0x0/0x50
 [8020bf11] ret_from_intr+0x0/0xb
 EOI  [8020a33d] default_idle+0x2d/0x50
 [8020aa72] enter_idle+0x22/0x30
 [8020ab0c] cpu_idle+0x8c/0xd0
 [804af6ec] rest_init+0x5c/0x70
 [806e8baf] start_kernel+0x28f/0x300
 [806e8140] _sinittext+0x140/0x180


Code: 48 89 42 08 48 89 10 49 8b 46 08 4c 89 33 49 89 5e 08 48 89 
RIP  [80421d59] net_rx_action+0x169/0x1c0
 RSP 80847eb0
CR2: 00100108
Kernel panic - not syncing: Aiee, killing interrupt handler!


RIP points at __list_del (net_rx_action - list_move_tail - __list_del).

$ objdump -S net/core/dev.o | less
  snip
static inline void __list_del(struct list_head * prev, struct list_head * next)
{
 b42:   48 8b 43 08 mov0x8(%rbx),%rax
 b46:   48 8b 13mov(%rbx),%rdx
next-prev = prev;
 b49:   48 89 42 08 mov%rax,0x8(%rdx)  
prev-next = next;
 b4d:   48 89 10mov%rdx,(%rax)
 b50:   49 8b 46 08 mov0x8(%r14),%rax
 b54:   4c 89 33mov%r14,(%rbx)
 b57:   49 89 5e 08 mov%rbx,0x8(%r14)
 b5b:   48 89 18mov%rbx,(%rax)


What is happen and what information do you need to fix it?

Thanks in advance,
Ryousei Takano
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue

2007-10-11 Thread Moni Shoua
Roland Dreier wrote:
   I also ran a test for the code in the branch of 2.6.24 and found a problem.
   I see that ifconfig down doesn't return (for IPoIB interfaces) and it's 
 stuck in napi_disable() in the kernel (any idea why?)
 
 For what it's worth, I took the upstream 2.6.23 git tree and merged in
 Dave's latest net-2.6.24 tree and my latest for-2.6.24 tree and tried
 that.  I brought up an IPoIB interface, sent a few pings, and did
 ifconfig down, and it worked fine.
 
 Can you try the same thing without the bonding patches to see if your
 setup works OK too?
 
 Also can you give more details about what you do to get ifconfig down stuck?
 
  - R.

Without bonding ifconfig down works fine. 
It happens only when ib interfaces are slaves of a bonding device.
I thought before that the stuck is in napi_disable() but it's almost right.
I put prints before and after call to napi_disable and see that it is called 
twice.
I'll try to investigate in this direction.

ib0: stopping interface
ib0: before napi_disable
ib0: after napi_disable
ib0: downing ib_dev
ib0: All sends and receives done.
ib0: stopping interface
ib0: before napi_disable



There is also a dump of the kernel log after 'echo t  /proc/sysrq-trigger' 
(for ifconfig)

SysRq : Show State

ifconfig  S  0  6311   6099
 810034f49d18 0086  
 810037e747c0 810037e747c0 00013481e000 81003a851a78
 81003a851840 3b0c8c00  802358ee
Call Trace:
 [8023cc89] lock_timer_base+0x24/0x49
 [80403754] schedule_timeout+0x8a/0xad
 [8023d241] process_timeout+0x0/0x5
 [8023d6ec] msleep_interruptible+0x11/0x39
 [884081a7] :ib_ipoib:ipoib_stop+0x64/0x12c
 [8039fc07] dev_close+0x3e/0x56
 [803a1c31] dev_change_flags+0xa7/0x15f
 [803e5bee] devinet_ioctl+0x293/0x5ed
 [803e775b] inet_ioctl+0x7f/0x9d
 [80395b2e] sock_ioctl+0x0/0x1fe
 [80395d08] sock_ioctl+0x1da/0x1fe
 [802947d9] do_ioctl+0x29/0x6f
 [80294a75] vfs_ioctl+0x256/0x267
 [80294adf] sys_ioctl+0x59/0x7a
 [8020bc0e] system_call+0x7e/0x83


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]race between open and disconnect in irda-usb

2007-10-11 Thread Oliver Neukum
Hi,

it seems to me that irda_usb_net_open() must set self-netopen
under spinlock or disconnect() may fail to kill all URBs.

Signed-off-by: Oliver Neukum [EMAIL PROTECTED]

Regards
Oliver



--- a/drivers/net/irda/irda-usb.c   2007-10-11 17:24:35.0 +0200
+++ b/drivers/net/irda/irda-usb.c   2007-10-11 17:30:30.0 +0200
@@ -1168,6 +1168,7 @@ static int stir421x_patch_device(struct 
 static int irda_usb_net_open(struct net_device *netdev)
 {
struct irda_usb_cb *self;
+   unsigned long flags;
charhwname[16];
int i;

@@ -1177,13 +1178,16 @@ static int irda_usb_net_open(struct net_
self = (struct irda_usb_cb *) netdev-priv;
IRDA_ASSERT(self != NULL, return -1;);
 
+   spin_lock_irqsave(self-lock, flags);
/* Can only open the device if it's there */
if(!self-present) {
+   spin_unlock_irqrestore(self-lock, flags);
IRDA_WARNING(%s(), device not present!\n, __FUNCTION__);
return -1;
}
 
if(self-needspatch) {
+   spin_unlock_irqrestore(self-lock, flags);
IRDA_WARNING(%s(), device needs patch\n, __FUNCTION__) ;
return -EIO ;
}
@@ -1198,6 +1202,7 @@ static int irda_usb_net_open(struct net_
/* To do *before* submitting Rx urbs and starting net Tx queue
 * Jean II */
self-netopen = 1;
+   spin_unlock_irqrestore(self-lock, flags);
 
/* 
 * Now that everything should be initialized properly,

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e100 problems in .23rc8 ?

2007-10-11 Thread Kok, Auke
Herbert Xu wrote:
 On Wed, Oct 10, 2007 at 08:36:38PM -0400, Dave Jones wrote:
 The e1000 changes you reference above, is this the changeset you mean?

 commit 416b5d10afdc797c21c457ade3714e8f2f75edd9
 Author: Auke Kok [EMAIL PROTECTED]
 Date:   Fri Jun 1 10:22:39 2007 -0700

 e1000: disable polling before registering netdevice
 
 Yep.

this patch actually called napi_disable() in the probe routine which was wrong,
but e100 does not do that. Nonetheless e100 doesn't call netif_carrier_off() and
netif_stop_queue(), so to make e100 the same as e1000 we should probably do 
this,
see below.

Dave, can you see if this resolves the issue for you? If so then we might want 
to
push this to -stable.

Auke


---
e100: disable netdevice explicitly to avoid rx irq oops

Several reported OOPS messages suggest that e100 has a race that was fixed in
e1000 before where incoming interrupts trigger an OOPS immediately after probe()
finishes.

Signed-off-by: Auke Kok [EMAIL PROTECTED]

diff --git a/drivers/net/e100.c b/drivers/net/e100.c
index 280313b..ded5f68 100644
--- a/drivers/net/e100.c
+++ b/drivers/net/e100.c
@@ -2682,6 +2682,10 @@ static int __devinit e100_probe(struct pci_dev *pdev,
if (err)
DPRINTK(PROBE, ERR, Error clearing wake event\n);

+   /* tell the stack to leave us alone until e100_open() is called */
+   netif_carrier_off(netdev);
+   netif_stop_queue(netdev);
+
strcpy(netdev-name, eth%d);
if((err = register_netdev(netdev))) {
DPRINTK(PROBE, ERR, Cannot register net device, aborting.\n);
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-11 Thread Eric W. Biederman
Herbert Xu [EMAIL PROTECTED] writes:

 Well thanks to that warning we're on our way of improving the
 code that triggered it in such a way that this warning will soon
 go silent.

 That's precisely the reason why I object to having this warning
 removed.  Now you have a good point that this warning doesn't
 trigger all the time.  The fix to that is to *make* it trigger
 always, not removing it.

I'm almost convinced but.

Where people deliberately use convoluted locking is where we
most need things like ASSERT_RTNL.

Having ASSERT_RTNL warn if you were sleeping does not seem
intuitive from the name.

This instance of convoluted locking seems like a complete
one off to me, and if it will warn about other constructs
currently in the kernel it seems wrong.

Frankly I don't feel comfortable adding the check because I can't
defend the presence of might_sleep() in ASSERT_RTNL.  If I can't
understand a change well enough to defend it I will not take
responsibility for it, and I will not add my Signed-off-by to it.

The patch I wrote was trivial a trivial optimization and obviously
correct.  Adding the might_sleep() and the patch becomes the start
of a crusade for better code that I don't believe in.

So I would rather forget this patch then make that one line addition.

Thanks,
Eric



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: authenc compile warnings in current net-2.6.24

2007-10-11 Thread Oliver Hartkopp

Sebastian Siewior wrote:

* David Miller | 2007-10-10 16:25:28 [-0700]:

  

From: Sebastian Siewior [EMAIL PROTECTED]
Date: Wed, 10 Oct 2007 21:53:37 +0200



* Oliver Hartkopp | 2007-10-10 19:53:53 [+0200]:

  

CC [M] crypto/authenc.o
crypto/authenc.c: In function ?crypto_authenc_hash?:
crypto/authenc.c:88: warning: ?cryptlen? may be used uninitialized in this 
function
crypto/authenc.c:87: warning: ?dst? may be used uninitialized in this 
function

crypto/authenc.c: In function ?crypto_authenc_decrypt?:
crypto/authenc.c:163: warning: ?cryptlen? may be used uninitialized in this 
function

crypto/authenc.c:163: note: ?cryptlen? was declared here
crypto/authenc.c:162: warning: ?src? may be used uninitialized in this 
function

crypto/authenc.c:162: note: ?src? was declared here

do you already know these warnings?


Those warnings are looking like a compiler bug to me.
  

To be honest I don't know of any compiler which commits enough
flow variable analysis to support doing %100 accurate warnings
in situations like this.



gcc (GCC) 4.1.2 (Gentoo 4.1.2) did not produce any warnings in this
case.

  


Hi Sebasian,

my gcc was the lastest Debian unstable one:

gcc -v

Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v 
--enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr 
--enable-shared --with-system-zlib --libexecdir=/usr/lib 
--without-included-gettext --enable-threads=posix --enable-nls 
--with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 
--enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr 
--enable-targets=all --disable-werror --enable-checking=release 
--build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu

Thread model: posix
gcc version 4.2.1 (Debian 4.2.1-5)

Regards,
Oliver


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]race between open and disconnect in irda-usb

2007-10-11 Thread Jean Tourrilhes
On Thu, Oct 11, 2007 at 05:45:28PM +0200, Oliver Neukum wrote:
 Hi,
 

Hi,

You need to cc Samuel Ortiz and the IrDA mailing list if you
want any meaningful response.

 it seems to me that irda_usb_net_open() must set self-netopen
 under spinlock or disconnect() may fail to kill all URBs.

I believe to patch is correct, but in practice I fail to see
how it could happen. Anyway, Samuel will have the final word.

 Signed-off-by: Oliver Neukum [EMAIL PROTECTED]
 
   Regards
   Oliver

Have fun...

Jean

 
 
 --- a/drivers/net/irda/irda-usb.c 2007-10-11 17:24:35.0 +0200
 +++ b/drivers/net/irda/irda-usb.c 2007-10-11 17:30:30.0 +0200
 @@ -1168,6 +1168,7 @@ static int stir421x_patch_device(struct 
  static int irda_usb_net_open(struct net_device *netdev)
  {
   struct irda_usb_cb *self;
 + unsigned long flags;
   charhwname[16];
   int i;
   
 @@ -1177,13 +1178,16 @@ static int irda_usb_net_open(struct net_
   self = (struct irda_usb_cb *) netdev-priv;
   IRDA_ASSERT(self != NULL, return -1;);
  
 + spin_lock_irqsave(self-lock, flags);
   /* Can only open the device if it's there */
   if(!self-present) {
 + spin_unlock_irqrestore(self-lock, flags);
   IRDA_WARNING(%s(), device not present!\n, __FUNCTION__);
   return -1;
   }
  
   if(self-needspatch) {
 + spin_unlock_irqrestore(self-lock, flags);
   IRDA_WARNING(%s(), device needs patch\n, __FUNCTION__) ;
   return -EIO ;
   }
 @@ -1198,6 +1202,7 @@ static int irda_usb_net_open(struct net_
   /* To do *before* submitting Rx urbs and starting net Tx queue
* Jean II */
   self-netopen = 1;
 + spin_unlock_irqrestore(self-lock, flags);
  
   /* 
* Now that everything should be initialized properly,
 
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fwd: [PATCH#2 3/4] [PPC] Compile fix for 8xx CPM Ehernet driver

2007-10-11 Thread Jochen Friedrich

Hi Jeff,


Kumar Gala wrote:
  

Begin forwarded message:



From: Jochen Friedrich [EMAIL PROTECTED]
Date: September 24, 2007 12:15:35 PM CDT
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED], Marcelo Tosatti [EMAIL PROTECTED]
Subject: [PATCH#2 3/4] [PPC] Compile fix for 8xx CPM Ehernet driver
  

Jeff,

Please pick up for 2.6.23 if you don't mind.



Please send an apply-able patch...

Jeff, off to bed
  


I rebased git://git.bocc.de/dbox2.git ppc-fixes against 2.6.23. This 
patch is the only remaining one in this head.

I'll resend.

Thanks,
Jochen
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [PPC] Compile fix for 8xx CPM Ehernet driver

2007-10-11 Thread Jochen Friedrich


Add #include asm/cacheflush.h for flush_dcache_range
to make the driver compile again.

 CC  arch/ppc/8xx_io/enet.o
arch/ppc/8xx_io/enet.c: In function 'scc_enet_start_xmit':
arch/ppc/8xx_io/enet.c:240: error: implicit declaration of function
'flush_dcache_range'
make[1]: *** [arch/ppc/8xx_io/enet.o] Error 1
make: *** [arch/ppc/8xx_io] Error 2

Signed-off-by: Jochen Friedrich [EMAIL PROTECTED]
---
arch/ppc/8xx_io/enet.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/ppc/8xx_io/enet.c b/arch/ppc/8xx_io/enet.c
index 703d47e..eace3bc 100644
--- a/arch/ppc/8xx_io/enet.c
+++ b/arch/ppc/8xx_io/enet.c
@@ -44,6 +44,7 @@
 #include asm/mpc8xx.h
 #include asm/uaccess.h
 #include asm/commproc.h
+#include asm/cacheflush.h
 
 /*
  *Theory of Operation



Re: [PATCH][BNX2X] round three

2007-10-11 Thread Eliezer Tamir

[added Andy Whitcroft, who is listed as a CHECKPATCH maintainer]

Stephen Hemminger wrote:

Minor formatting nits reported by checkpatch.pl script:


Thanks, I Will fix them.


...


WARNING: no space between function name and open parenthesis '('
#777: FILE: drivers/net/bnx2x.c:722:
+   case (RAMROD_CMD_ID_ETH_PORT_SETUP | BNX2X_STATE_OPENING_WAIT4_PORT):

WARNING: no space between function name and open parenthesis '('
#782: FILE: drivers/net/bnx2x.c:727:
+   case (RAMROD_CMD_ID_ETH_HALT | BNX2X_STATE_CLOSING_WAIT4_HALT):

WARNING: no space between function name and open parenthesis '('
#788: FILE: drivers/net/bnx2x.c:733:
+   case (RAMROD_CMD_ID_ETH_PORT_DEL | BNX2X_STATE_CLOSING_WAIT4_DELETE):

WARNING: no space between function name and open parenthesis '('
#793: FILE: drivers/net/bnx2x.c:738:
+   case (RAMROD_CMD_ID_ETH_SET_MAC | BNX2X_STATE_OPEN):



These look like false positives

...



CHECK: spinlock_t definition without comment
#9486: FILE: drivers/net/bnx2x.h:508:
+   spinlock_t  spq_lock;  /* Used to serialize slowpath

This one too

...


ERROR: Macros with complex values should be enclosed in parenthesis
#21196: FILE: drivers/net/bnx2x_init.h:138:
+#define INIT_INTERNAL0_MEM_WR(block_bar, block, reg, \
+ part, hw, value, off, len) \


Not sure, I think this one is too, but I'm going to rewrite bnx2x_init.h anyway,
this code is going away.

...

Thanks
Eliezer



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e100 problems in .23rc8 ?

2007-10-11 Thread Dave Jones
On Thu, Oct 11, 2007 at 09:10:34AM -0700, Kok, Auke wrote:
  Herbert Xu wrote:
   On Wed, Oct 10, 2007 at 08:36:38PM -0400, Dave Jones wrote:
   The e1000 changes you reference above, is this the changeset you mean?
  
   commit 416b5d10afdc797c21c457ade3714e8f2f75edd9
   Author: Auke Kok [EMAIL PROTECTED]
   Date:   Fri Jun 1 10:22:39 2007 -0700
  
   e1000: disable polling before registering netdevice
   
   Yep.
  
  this patch actually called napi_disable() in the probe routine which was 
  wrong,
  but e100 does not do that. Nonetheless e100 doesn't call netif_carrier_off() 
  and
  netif_stop_queue(), so to make e100 the same as e1000 we should probably do 
  this,
  see below.
  
  Dave, can you see if this resolves the issue for you? If so then we might 
  want to
  push this to -stable.
 
Will do, thanks Auke.

Eric/David, the Fedora 8 RPM version 2.6.23-6.fc8 will have this if you
want to give it a shot too.  It'll be at
http://people.redhat.com/davej/kernels/Fedora/f7.92/ when it's done
building in an hour or so.

Dave

-- 
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[IPv6] Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2

2007-10-11 Thread Brian Haley

Hi,

From RFC 3493, Section 5.2:

  IPV6_MULTICAST_IF

 Set the interface to use for outgoing multicast packets.  The
 argument is the index of the interface to use.  If the
 interface index is specified as zero, the system selects the
 interface (for example, by looking up the address in a routing
 table and using the resulting interface).

This patch adds support for (index == 0) to reset the value to it's 
original state, allowing the system to choose the best interface.  IPv4 
already behaves this way.


-Brian


Signed-off-by: Brian Haley [EMAIL PROTECTED]
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 532425d..1334fc1 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -539,12 +539,15 @@ done:
 	case IPV6_MULTICAST_IF:
 		if (sk-sk_type == SOCK_STREAM)
 			goto e_inval;
-		if (sk-sk_bound_dev_if  sk-sk_bound_dev_if != val)
-			goto e_inval;
 
-		if (__dev_get_by_index(init_net, val) == NULL) {
-			retv = -ENODEV;
-			break;
+		if (val) {
+			if (sk-sk_bound_dev_if  sk-sk_bound_dev_if != val)
+goto e_inval;
+
+			if (__dev_get_by_index(init_net, val) == NULL) {
+retv = -ENODEV;
+break;
+			}
 		}
 		np-mcast_oif = val;
 		retv = 0;


Re: [IPv6] Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2

2007-10-11 Thread David Stevens
Acked-by: David L Stevens [EMAIL PROTECTED]

 
 Signed-off-by: Brian Haley [EMAIL PROTECTED]
 diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
 index 532425d..1334fc1 100644
 --- a/net/ipv6/ipv6_sockglue.c
 +++ b/net/ipv6/ipv6_sockglue.c
 @@ -539,12 +539,15 @@ done:
 case IPV6_MULTICAST_IF:
if (sk-sk_type == SOCK_STREAM)
   goto e_inval;
 -  if (sk-sk_bound_dev_if  sk-sk_bound_dev_if != val)
 - goto e_inval;
 
 -  if (__dev_get_by_index(init_net, val) == NULL) {
 - retv = -ENODEV;
 - break;
 +  if (val) {
 + if (sk-sk_bound_dev_if  sk-sk_bound_dev_if != val)
 +goto e_inval;
 +
 + if (__dev_get_by_index(init_net, val) == NULL) {
 +retv = -ENODEV;
 +break;
 + }
}
np-mcast_oif = val;
retv = 0;

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][NETNS] Make ifindex generation per-namespace

2007-10-11 Thread Eric W. Biederman
Johannes Berg [EMAIL PROTECTED] writes:

 On Wed, 2007-10-10 at 13:51 -0600, Eric W. Biederman wrote:

 Yes.  Netlink sockets are per-namespace and you can use the namespace
 of a netlink socket to look up a netdev.

 Ok, thanks. I still haven't really looked into the wireless vs. net
 namespaces problem but this will probably help.

I think I may even have some patches in my proof of concept tree that
address some of the wireless issues.  Especially rtnetlink ones.
Generally those cases haven't been hard to spot.

Having hash tables and the like that hash and do key compares
on an ifindex instead of a net_device * are the in kernel places that
make it very hard to have duplicate ifindexes.

Thinking about it probably the biggest challenge to deal with
is iff in struct sk_buff.

Eric
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][BNX2X] round three - sparse warnings.

2007-10-11 Thread Eliezer Tamir
On Wed, 2007-10-10 at 18:54 -0700, Stephen Hemminger wrote:
 Please fix these...
 

Thanks, 
All that code is going to be rewritten, as per Dave's request,
but I will make sure to test the new code with sparse.

Eliezer



-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][BNX2X] round three

2007-10-11 Thread Eliezer Tamir
On Wed, 2007-10-10 at 17:59 -0700, David Miller wrote:

...

 I was going to add this to the tree for 2.6.24 but there is simply
 too much super-ugly stuff in this driver for me to do so.

We will rewrite bnx2x_hsi.h bnx2x_init.h and bnx2x_init_vlaues.h.

Does bnx2x_asm.h look OK?


 Look, there is zero way I'm adding a driver that's written like this
 to the tree.
 
...

 This huge header file full of register programming magic goes way
 beyond the limits of reasonable and has to go.

Will it be OK if we replace it with 4-5 headers so we can maintain them
better manually. (I'm asking what is the upper bound to the number of
header files you would consider reasonable for this driver.)

Thanks
Eliezer


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


iproute2: resend of patches from Debian.

2007-10-11 Thread Andreas Henriksson
Hello all!

Posting this set of patches again, since there where no reaction last time I 
sent them.

Regards,
Andreas Henriksson




Hello Stephen Hemminger and the rest of the people on the nevdev list!

I'm posting a bunch of patches for iproute2. A few I've written myself
and have a Signed-off-by with my name in them, the others I've picked up
from the iproute package in Debian. I tried my best to add a decent
description, references to debian bug number and when possible a From
header suggesting who the original author might be.
They should all apply with some offset and sometimes a bit fuzz to the
current iproute2 git tree.


new patches posted to the debian bug tracking system:
add-mpath-docs.diff
doublefree.diff
fix-layer-syntax.diff
new-manpages.diff
tick2time-fix.diff
wrandom-fix.diff

fixes in the debian iproute package:
empty_linkname.diff
ip_address_flush_loop.diff
lartc.diff
netbug_fix.diff
okey-ikey.diff
remove_tc_filters_reference.diff

patches updating help text syntax:
ip_route_usage.diff
ip_rule_usage.diff

patches fixing documentation typos:
fix_ss_typo.diff
ip.8-typo.diff
ip_address.diff
libnetlink_typo.diff
manpages-typo.diff
tcb_htb_typo.diff
tc_cbq_details_typo.diff


-- 
Regards,
Andreas Henriksson
Disallow empty link name.

From: Alexander Wirt [EMAIL PROTECTED]

Patch from debian iproute package.

diff -urNad iproute-20060323~/ip/iplink.c iproute-20060323/ip/iplink.c
--- iproute-20060323~/ip/iplink.c	2006-03-22 00:57:50.0 +0100
+++ iproute-20060323/ip/iplink.c	2006-09-08 21:07:14.0 +0200
@@ -384,6 +384,10 @@
 	}
 
 	if (newname  strcmp(dev, newname)) {
+		if (strlen(newname) == 0) {
+		printf(\\ is not valid device identifier\n,dev);
+		return -1;
+		}
 		if (do_changename(dev, newname)  0)
 			return -1;
 		dev = newname;
Abort flush after 10 seconds.

From: Alexander Wirt [EMAIL PROTECTED]

Patch from Debian iproute package.

diff -urNad iproute-20060323~/ip/ipaddress.c iproute-20060323/ip/ipaddress.c
--- iproute-20060323~/ip/ipaddress.c	2006-09-08 17:02:03.0 +0200
+++ iproute-20060323/ip/ipaddress.c	2006-09-08 17:03:01.0 +0200
@@ -15,6 +15,7 @@
 #include stdio.h
 #include stdlib.h
 #include unistd.h
+#include time.h
 #include syslog.h
 #include fcntl.h
 #include sys/ioctl.h
@@ -589,6 +590,7 @@
 	if (flush) {
 		int round = 0;
 		char flushb[4096-512];
+		time_t start = time(0);
 
 		filter.flushb = flushb;
 		filter.flushp = 0;
@@ -617,6 +619,12 @@
 printf(Warum?\n);
 return 1;
 
+if (time(0) - start  10) {
+printf(\n*** Flush not completed after %ld seconds, %d entries remain ***\n,
+   time(0) - start, filter.flushed);
+exit(1);
+}
+
 			if (show_stats) {
 printf(\n*** Round %d, deleting %d addresses ***\n, round, filter.flushed);
 fflush(stdout);
Add references to lartc, also drop bogus reference to tc-filters

Patch from Debian iproute package.

diff -Nur old/man/man8/ip.8 new/man/man8/ip.8
--- old/man/man8/ip.8	2005-01-05 22:40:29.0 +
+++ new/man/man8/ip.8	2005-01-05 22:47:03.0 +
@@ -1803,6 +1803,8 @@
 .RB IP Command reference  ip-cref.ps
 .br
 .RB IP tunnels  ip-cref.ps
+.br
+.RB http://lartc.org/
 
 .SH AUTHOR
 
diff -Nur old/man/man8/tc.8 new/man/man8/tc.8
--- old/man/man8/tc.8	2004-10-19 20:49:02.0 +
+++ new/man/man8/tc.8	2005-01-05 22:46:15.0 +
@@ -341,7 +341,7 @@
 .BR tc-pfifo (8),
 .BR tc-bfifo (8),
 .BR tc-pfifo_fast (8),
-.BR tc-filters (8)
+.BR http://lartc.org/
 
 .SH AUTHOR
 Manpage maintained by bert hubert ([EMAIL PROTECTED])
Fix misc/netbug script.

See the following debian bugs:
http://bugs.debian.org/289541
http://bugs.debian.org/313540
http://bugs.debian.org/313541
http://bugs.debian.org/313544
http://bugs.debian.org/347699

Changes from Debian iproute package rediffed to apply against current
iproute2 git tree.

--- iproute2/misc/netbug	2007-09-09 17:36:19.0 +0200
+++ iproute-20070313/misc/netbug	2007-09-09 20:42:01.0 +0200
@@ -1,23 +1,16 @@
 #! /bin/bash
 
+set -e
+
 echo -n Send network configuration summary to [ENTER means [EMAIL PROTECTED] 
 IFS= read mail || exit 1
 [ -z $mail ]  [EMAIL PROTECTED]
 
+netbug=`mktemp -d -t netbug.XX` || (echo $0: Cannot create temporary directory 2; exit 1; )
+netbugtar=`tempfile -d $netbug --suffix=tar.gz` || (echo $0: Cannot create temporary file 2; exit 1; )
+tmppath=$netbug
+trap /bin/rm -rf $netbug $netbugtar 0 1 2 3 13 15
 
-netbug=
-while [ $netbug =  ]; do
-	netbug=`echo netbug.$$.$RANDOM`
-	if [ -e /tmp/$netbug ]; then
-		netbug=
-	fi
-done
-
-tmppath=/tmp/$netbug
-
-trap rm -rf $tmppath $tmppath.tar.gz 0 SIGINT
-
-mkdir $tmppath
 mkdir $tmppath/net
 
 cat /proc/slabinfo  $tmppath/slabinfo
@@ -44,9 +37,8 @@
 fi
 
 cd /tmp
-tar c $netbug | gzip -9c  $netbug.tar.gz
-
-uuencode $netbug.tar.gz $netbug.tar.gz | mail -s $netbug $mail
+tar c 

Re: e100 problems in .23rc8 ?

2007-10-11 Thread Eric Sandeen

Eric/David, the Fedora 8 RPM version 2.6.23-6.fc8 will have this if you
want to give it a shot too.  It'll be at
http://people.redhat.com/davej/kernels/Fedora/f7.92/ when it's done
building in an hour or so.

Dave



Thanks, I'll give it a whirl this evening.  I put a new net card in that 
box 'cause I got tired of resetting it :)


-Eric
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: iproute2: resend of patches from Debian.

2007-10-11 Thread Lennart Sorensen
On Thu, Oct 11, 2007 at 08:25:32PM +0200, Andreas Henriksson wrote:
 Patch from debian iproute package.
 
 diff -urNad iproute-20060323~/ip/iplink.c iproute-20060323/ip/iplink.c
 --- iproute-20060323~/ip/iplink.c 2006-03-22 00:57:50.0 +0100
 +++ iproute-20060323/ip/iplink.c  2006-09-08 21:07:14.0 +0200
 @@ -384,6 +384,10 @@
   }
  
   if (newname  strcmp(dev, newname)) {
 + if (strlen(newname) == 0) {
 + printf(\\ is not valid device identifier\n,dev);
 + return -1;
 + }
   if (do_changename(dev, newname)  0)
   return -1;
   dev = newname;

Isn't that printf missing somewhere for the 'dev' argument to go?

--
Len Sorensen
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH 4/4] UDP memory usage accounting (take 4): memory limitation

2007-10-11 Thread Stephen Hemminger
On Thu, 11 Oct 2007 21:51:14 +0900
Satoshi OSHIMA [EMAIL PROTECTED] wrote:

 Hi Stephen,
 
 Thank you for your comment.
 
 {
  +  .ctl_name   = NET_UDP_MEM,
  +  .procname   = udp_mem,
  +  .data   = sysctl_udp_mem,
  +  .maxlen = sizeof(sysctl_udp_mem),
  +  .mode   = 0644,
  +  .proc_handler   = proc_dointvec
  +  },
  +  {
 .ctl_name   = NET_TCP_APP_WIN,
 .procname   = tcp_app_win,
 .data   = sysctl_tcp_app_win,
  
  if you use proc_dointvec_minmax, then you could inforce min/max
  values for udp_mem for the sysctl

One other comment. Sysctl value indexes are deprecated at this point
so all new values should use CTL_UNNUMBERED.  Therefore unless NET_UDP_MEM
already exists, please don't add it.

-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: iproute2: resend of patches from Debian.

2007-10-11 Thread Andreas Henriksson
On tor, 2007-10-11 at 15:25 -0400, Lennart Sorensen wrote:
 On Thu, Oct 11, 2007 at 08:25:32PM +0200, Andreas Henriksson wrote:
  Patch from debian iproute package.
  
  diff -urNad iproute-20060323~/ip/iplink.c iproute-20060323/ip/iplink.c
  --- iproute-20060323~/ip/iplink.c   2006-03-22 00:57:50.0 +0100
  +++ iproute-20060323/ip/iplink.c2006-09-08 21:07:14.0 +0200
  @@ -384,6 +384,10 @@
  }
   
  if (newname  strcmp(dev, newname)) {
  +   if (strlen(newname) == 0) {
  +   printf(\\ is not valid device identifier\n,dev);
  +   return -1;
  +   }
  if (do_changename(dev, newname)  0)
  return -1;
  dev = newname;
 
 Isn't that printf missing somewhere for the 'dev' argument to go?

This is not my patch so I don't know what the original intention was,
but now that you point it out I'd say that the dev argument should just
be removed.
The new check is for an empty new name old name (dev) probably isn't
interesting.

I think Alexander Wirt is the original author, lets CC and see if he has
a comment...

-- 
Regards,
Andreas Henriksson

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue

2007-10-11 Thread Roland Dreier
  It happens only when ib interfaces are slaves of a bonding device.
  I thought before that the stuck is in napi_disable() but it's almost right.
  I put prints before and after call to napi_disable and see that it is called 
  twice.
  I'll try to investigate in this direction.
  
  ib0: stopping interface
  ib0: before napi_disable
  ib0: after napi_disable
  ib0: downing ib_dev
  ib0: All sends and receives done.
  ib0: stopping interface
  ib0: before napi_disable

Yes, two napi_disable()s in a row without a matching napi_enable()
will deadlock.  I guess the question is why the ipoib interface is
being stopped twice.

If you just take the net-2.6.24 tree (without bonding patches), does
bonding for ethernet interfaces work OK, or is there a similar problem
with double napi_disable()?  How about bonding of ethernet after this
batch of bonding patches?

 - R.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: iproute2: resend of patches from Debian.

2007-10-11 Thread Stephen Hemminger
On Thu, 11 Oct 2007 20:25:32 +0200
Andreas Henriksson [EMAIL PROTECTED] wrote:

 Hello all!
 
 Posting this set of patches again, since there where no reaction last time I 
 sent them.
 
 Regards,
 Andreas Henriksson
 


Since there are so many. Please send them as git separate emails to me and
make sure they are in the proper format for importing into git. If you really
want to make it easy, just put them in git yourself and then use 'git 
formatpatch origin'
so all the formatting is correct.

The current iproute2 git repository is:
git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
(or symlink git://git.kernel.org/pub/network/iproute2/iproute2.git)


-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: iproute2: resend of patches from Debian.

2007-10-11 Thread Patrick McHardy
Andreas Henriksson wrote:
 Abort flush after 10 seconds.

This should be optional IMO.

 Add references to lartc, also drop bogus reference to tc-filters

There's a bad habit of reporting bugs on lartc that belong to netdev,
please also make it perfectly clear that lartc is not for bug-reports.

 Fix overflow in time2tick / tick2time.
 
 The helper functions gets passed an unsigned int, which gets cast to long
 and overflows.
 
 See Debian bug #175462 - http://bugs.debian.org/175462
 
 Signed-off-by: Andreas Henriksson [EMAIL PROTECTED]
 
 diff -uri iproute-20070313.orig/tc/tc_core.c iproute-20070313/tc/tc_core.c
 --- iproute-20070313.orig/tc/tc_core.c2007-03-13 22:50:56.0 
 +0100
 +++ iproute-20070313/tc/tc_core.c 2007-08-15 00:41:30.0 +0200
 @@ -35,12 +35,12 @@
  }
  
  
 -long tc_core_time2tick(long time)
 +unsigned tc_core_time2tick(unsigned time)
  {
   return time*tick_in_usec;
  }
  
 -long tc_core_tick2time(long tick)
 +unsigned tc_core_tick2time(unsigned tick)
  {
   return tick/tick_in_usec;
  }

What about the other functions returning long? I think all of these
should be unsigned.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: iproute2: resend of patches from Debian.

2007-10-11 Thread Andreas Henriksson

On tor, 2007-10-11 at 13:11 -0700, Stephen Hemminger wrote:
 Since there are so many. Please send them as git separate emails to me and
 make sure they are in the proper format for importing into git. If you really
 want to make it easy, just put them in git yourself and then use 'git 
 formatpatch origin'
 so all the formatting is correct.

I'm not so great with git, but I'll read up on how to do this and
hopefully resend properly tomorrow.
(I'll make sure the patch Len commented on also gets fixed by then.)

-- 
Regards,
Andreas Henriksson

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: iproute2: resend of patches from Debian.

2007-10-11 Thread Patrick McHardy
Andreas Henriksson wrote:
 Add mpath to ip manpage.
 
 From: Norbert Buchmuller [EMAIL PROTECTED]
 
 The 'mpath' parameter of 'ip route' is not documented in the manual page
 nor in ip-cref.tex.
 
 ...huge part of the text in the patch was taken from the net/ipv4/Kconfig
 file in the Linux kernel source (2.6.18). I suppose that's OK because both
 Linux and iproute2 are GPL'd, but I let you know anyway.
 
 See Debian bug #428442 - http://bugs.debian.org/428442
 
 diff -Naur iproute-20061002/doc/ip-cref.tex 
 iproute-20061002.fixed/doc/ip-cref.tex
 --- iproute-20061002/doc/ip-cref.tex  2007-06-11 19:26:52.0 +0200
 +++ iproute-20061002.fixed/doc/ip-cref.tex2007-06-11 20:34:09.0 
 +0200
 @@ -1348,6 +1348,28 @@
  route reflecting its relative bandwidth or quality.
  \end{itemize}
  
 +\item \verb|mpath MP_ALGO|


This has been removed from the kernel again because of complete
brokeness, so this patch shouldn't be merged.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: iproute2: resend of patches from Debian.

2007-10-11 Thread Andreas Henriksson

On tor, 2007-10-11 at 22:30 +0200, Patrick McHardy wrote:
 Andreas Henriksson wrote:
  Abort flush after 10 seconds.
 
 This should be optional IMO.

Copying Alexander Wirt here again, as I think this is his patch.
I'll skip this patch in my next resend...

 
  Add references to lartc, also drop bogus reference to tc-filters
 
 There's a bad habit of reporting bugs on lartc that belong to netdev,
 please also make it perfectly clear that lartc is not for bug-reports.

I'll update the patch to mention this.

 
  Fix overflow in time2tick / tick2time.
  
  The helper functions gets passed an unsigned int, which gets cast to long
  and overflows.
  
  See Debian bug #175462 - http://bugs.debian.org/175462
  
  Signed-off-by: Andreas Henriksson [EMAIL PROTECTED]
  
  diff -uri iproute-20070313.orig/tc/tc_core.c iproute-20070313/tc/tc_core.c
  --- iproute-20070313.orig/tc/tc_core.c  2007-03-13 22:50:56.0 
  +0100
  +++ iproute-20070313/tc/tc_core.c   2007-08-15 00:41:30.0 +0200
  @@ -35,12 +35,12 @@
   }
   
   
  -long tc_core_time2tick(long time)
  +unsigned tc_core_time2tick(unsigned time)
   {
  return time*tick_in_usec;
   }
   
  -long tc_core_tick2time(long tick)
  +unsigned tc_core_tick2time(unsigned tick)
   {
  return tick/tick_in_usec;
   }
 
 What about the other functions returning long? I think all of these
 should be unsigned.

Since I'm no superhacker, I was afraid to touch anything else then the
things I could test and this change is what's needed to fix the reported
bug. (Actually only one of them was needed IIRC, but it would be stupid
to only change one.)
I'll have a look at the other helper functions as well, probably adding
a separate patch so that it can be easily dropped if I screw something
up there.. :)

I'll also drop the mpath stuff you mentioned in the other mail...

Thanks for reviewing the patches!


-- 
Regards,
Andreas Henriksson

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFD] iptables: mangle table obsoletes filter table

2007-10-11 Thread Al Boldi
With the existence of the mangle table, how useful is the filter table?

Other than requiring the REJECT target to be ported to the mangle table, is 
the filter table faster than the mangle table?

If not, then shouldn't the filter table be obsoleted to avoid confusion?


Thanks!

--
Al

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [IPv6] Update setsockopt(IPV6_MULTICAST_IF) to support RFC 3493, try2

2007-10-11 Thread David Miller
From: David Stevens [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 10:49:14 -0700

 Acked-by: David L Stevens [EMAIL PROTECTED]

Applied, thanks everyone!
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][BNX2X] round three

2007-10-11 Thread David Miller
From: Eliezer Tamir [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 19:53:39 +0200

 Will it be OK if we replace it with 4-5 headers so we can maintain them
 better manually. (I'm asking what is the upper bound to the number of
 header files you would consider reasonable for this driver.)

It's not the count, it's what's in them that's the problem.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: iproute2: resend of patches from Debian.

2007-10-11 Thread David Miller
From: Andreas Henriksson [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 23:22:12 +0200

 Copying Alexander Wirt here again, as I think this is his patch.

It is important that whoever is the package maintainer for an
upstream piece of code in any distribution:

1) Completely understands the patches he is applying.

and therefore:

2) When he submits upstream, he doesn't have to wake up
   2,000 patch submitters from the dead just to figure out
   why a change was made.

Neither seems to be occuring here.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue

2007-10-11 Thread Jay Vosburgh
Roland Dreier [EMAIL PROTECTED] wrote:
[...]
Yes, two napi_disable()s in a row without a matching napi_enable()
will deadlock.  I guess the question is why the ipoib interface is
being stopped twice.

If you just take the net-2.6.24 tree (without bonding patches), does
bonding for ethernet interfaces work OK, or is there a similar problem
with double napi_disable()?  How about bonding of ethernet after this
batch of bonding patches?

I just checked this on an x86 box.  The bonding in stock net-2.6
pulled this morning or last night works ok (I did some basic tests,
including ifconfig down / up, with e100).  This remains true with the
IPoIB bonding patches applied.  I do not have hardware available to test
IPoIB.

I did get a whammy from tg3, but I think this is unrelated to
bonding (as it happens when tg3 comes up, before bonding is involved):

BUG: unable to handle kernel paging request at virtual address 4214
 printing eip:
e0828017
*pde = 
Oops: 0002 [#1]
SMP
Modules linked in: thermal processor fan button loop e1000 sg evdev tg3 e100 rtb
CPU:0
EIP:0060:[e0828017]Not tainted VLI
EFLAGS: 00010206   (2.6.23-ipv6 #1)
EIP is at tg3_ape_write32+0x7/0x10 [tg3]
eax: de9304c0   ebx: dde8fe18   ecx:    edx: 4214
esi: de9304c0   edi:    ebp: dde8fe28   esp: dde8fdd4
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process ip (pid: 2817, ti=dde8e000 task=dff4e0b0 task.ti=dde8e000)
Stack: e082fb2e  dde8fdf4 c01ece3e dde8fdf8 03fe  5400
   0800 1aa0 e083b340 08001aa0 0060 e083ce00 08001b20 0030
   e083ce80 0101 de9304c0 0001 dde56800 dde8fe38 e0830178 dff69000
Call Trace:
 [c010536a] show_trace_log_lvl+0x1a/0x30
 [c0105429] show_stack_log_lvl+0xa9/0xd0
 [c0105639] show_registers+0x1e9/0x2f0
 [c0105851] die+0x111/0x260
 [c011c5dc] do_page_fault+0x18c/0x6a0
 [c0319bea] error_code+0x72/0x78
 [e0830178] tg3_init_hw+0x38/0x50 [tg3]
 [e0838886] tg3_open+0x276/0x5d0 [tg3]
 [c02aead8] dev_open+0x38/0x80
 [c02ad5cd] dev_change_flags+0x7d/0x1a0
 [c02f63d8] devinet_ioctl+0x4c8/0x660
 [c02f698b] inet_ioctl+0x6b/0x90
 [c02a0e5a] sock_ioctl+0x5a/0x210
 [c017cd98] do_ioctl+0x28/0x80
 [c017ce47] vfs_ioctl+0x57/0x290
 [c017d0b9] sys_ioctl+0x39/0x60
 [c01042a2] sysenter_past_esp+0x5f/0x99
 ===
Code: 89 0a c3 8d b6 00 00 00 00 55 8b 48 50 89 e5 5d 01 ca 8b 02 c3 8d
EIP: [e0828017] tg3_ape_write32+0x7/0x10 [tg3] SS:ESP 0068:dde8fdd4
Kernel panic - not syncing: Fatal exception in interrupt

I haven't investigated this further.  I'm using a BCM5704 card;
if this isn't a known problem and anyone is curious, I can supply
additional info.

-J

---
-Jay Vosburgh, IBM Linux Technology Center, [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: iproute2: resend of patches from Debian.

2007-10-11 Thread Andreas Henriksson
On tor, 2007-10-11 at 14:48 -0700, David Miller wrote:
 From: Andreas Henriksson [EMAIL PROTECTED]
 Date: Thu, 11 Oct 2007 23:22:12 +0200
 
  Copying Alexander Wirt here again, as I think this is his patch.
 
 It is important that whoever is the package maintainer for an
 upstream piece of code in any distribution:
 
 1) Completely understands the patches he is applying.
 
 and therefore:
 
 2) When he submits upstream, he doesn't have to wake up
2,000 patch submitters from the dead just to figure out
why a change was made.
 
 Neither seems to be occuring here.

Alexander Wirt is both the Debian package maintainer and (probably) the
author of both of the two discussed patches (and only a single person as
far as I know). I don't consider it rude to notify him of the comments
made about the patches he's been carrying for a long time.
While doing a couple of fixes to iproute I thought it would be best to
not only submit those but all the patches that has been carried in
debian for review upstream and possibly inclusion. The extra review
seemed to have been a success so far and I will fix up the patches
myself if I have to.

If you want me to STFU please just say so and I'll give you all my
sincere appologies and go back to the status quo of no fixes shared with
others.

-- 
Regards,
Andreas Henriksson

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: iproute2: resend of patches from Debian.

2007-10-11 Thread David Miller
From: Andreas Henriksson [EMAIL PROTECTED]
Date: Fri, 12 Oct 2007 00:35:07 +0200

 If you want me to STFU please just say so and I'll give you all my
 sincere appologies and go back to the status quo of no fixes shared
 with others.

I applaud your efforts, it's not what the problem is.

I just wonder, therefore, when Alexander planned on submitting
this sizable backlog of local iproute2 patches.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


bytes received from recvmsg doesn't match FIONREAD

2007-10-11 Thread Steven Dake
I wanted to verify that the size of a multicast UDP message received
with recvmsg matches the size of the message the kernel thinks the
message is.

So I went about using the FIONREAD ioctl as follows:

res = ioctl (fd, FIONCREAD, value);
assert (res != -1);
bytes_received = recvmsg (fd, msg_recv, MSG_NOSIGNAL | MSG_DONTWAIT);
assert (bytes_received == value);

It appears the value and bytes_received do not match after many
thousands of runs with a UDP multicast protocol (www.openais.org).
(neither is -1)

Am I using this ioctl improperly?  Shouldn't I expect that the full
datagram is returned by recvmsg no matter the state of the blocking or
nonblocking mode of the file descriptor?

pls copy me on responses - i am not onlist.

Regards
-steve

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: e100 problems in .23rc8 ?

2007-10-11 Thread Herbert Xu
On Thu, Oct 11, 2007 at 09:10:34AM -0700, Kok, Auke wrote:
 
  commit 416b5d10afdc797c21c457ade3714e8f2f75edd9
  Author: Auke Kok [EMAIL PROTECTED]
  Date:   Fri Jun 1 10:22:39 2007 -0700
 
  e1000: disable polling before registering netdevice
 
 this patch actually called napi_disable() in the probe routine which was 
 wrong,
 but e100 does not do that. Nonetheless e100 doesn't call netif_carrier_off() 
 and
 netif_stop_queue(), so to make e100 the same as e1000 we should probably do 
 this,
 see below.

Back then we didn't have napi_disable at all.  That patch calls
netif_poll_disable which has different semantics.

 Dave, can you see if this resolves the issue for you? If so then we might 
 want to
 push this to -stable.

The problem is with netif_poll so this patch probably won't help.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Question on TSO maximum segment sizes.

2007-10-11 Thread Waskiewicz Jr, Peter P
I'm having an issue with TSO right vs. hardware that can't take the
maximum segment size sent from the stack.  I've been told that the
maximum packet size that can be sent to the hardware today is 64k, but
my hardware can only take 32k in certain modes per queue due to hardware
limitations.  I have two questions regarding this: 1) where is this
value set in the TCP code, and 2) Is this something that can be
configured on the fly?  If the answer to 2 is no, I will try and put
something together to allow this to happen.

Thanks in advance,
 
PJ Waskiewicz
Intel Corporation
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bytes received from recvmsg doesn't match FIONREAD

2007-10-11 Thread David Miller
From: Steven Dake [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 16:08:15 -0700

 I wanted to verify that the size of a multicast UDP message received
 with recvmsg matches the size of the message the kernel thinks the
 message is.
 
 So I went about using the FIONREAD ioctl as follows:
 
 res = ioctl (fd, FIONCREAD, value);
   ^ (typo? should be FIONREAD not FION_C_READ)
 assert (res != -1);
 bytes_received = recvmsg (fd, msg_recv, MSG_NOSIGNAL | MSG_DONTWAIT);
 assert (bytes_received == value);

I think you want to use SIOCINQ, FIONREAD is only valid on files.

I'm surprised you didn't get an error return from ioctl() as the
VFS code seems to enforce this.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bytes received from recvmsg doesn't match FIONREAD

2007-10-11 Thread David Stevens
Questions of this sort should normally be directed to linux-net mailing 
list.

From the code you quoted, I see at least one case where it
will fail -- when the allocated buffer you pass to recvmsg is smaller than
value (ie. the datagram is too big for the read buffer).

If that's not the problem, I suggest you reproduce it with a small test
case and post real code and results when it happens.

+-DLS 


-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question on TSO maximum segment sizes.

2007-10-11 Thread David Miller
From: Waskiewicz Jr, Peter P [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 16:27:14 -0700

 I'm having an issue with TSO right vs. hardware that can't take the
 maximum segment size sent from the stack.  I've been told that the
 maximum packet size that can be sent to the hardware today is 64k, but
 my hardware can only take 32k in certain modes per queue due to hardware
 limitations.  I have two questions regarding this: 1) where is this
 value set in the TCP code, and 2) Is this something that can be
 configured on the fly?  If the answer to 2 is no, I will try and put
 something together to allow this to happen.

The TCP code just builds the maximum possible for the underlying
protocol, be it ipv4 or ipv6.  It takes the underlying protocol
maximum packet length, subtracts the amount of header space it
knows will be used, and uses that.

You'll need to use GSO sw segmentation to split the TSO packets
which are too big for your HW to handle.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression in net-2.6.24?

2007-10-11 Thread David Miller
From: TAKANO Ryousei [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 22:51:46 +0900 (JST)

 Modules linked in: 8021q tcp_bic netconsole evdev joydev sg st sr_mod 
 ohci_hcd i2c_amd756 i2c_amd8111 i2c_core ipv6 tg3 usbhid usbcore ff_memless 
 dm_mod ext3 jbd sata_sil libata sd_mod scsi_mod
 ...
 RIP points at __list_del (net_rx_action - list_move_tail - __list_del).

There is a contract between the driver's -poll() method and
net_rx_action() in that it is assumed that if the entire quota has
been used up, the driver will not perform a netif_rx_complete().

It seems that in a corner case the tg3 driver, which you appear to be
using, will not abide by this rule.  That corner case is when the card
has exactly budget receive packets pending.  In such a case
tg3_has_work() will be false, and we will RX complete when work_done
= budget, which violates the above mentioned rule.

Can you see if the following test patch makes the crash go away?

Michael, I know you're not pleased with this patch and neither am
I.  It might be better to just strictly RX complete when
(work  budget) and if tg3_has_work(), trigger a HW interrupt.

Alternatively we could loop in tg3_poll() until either budget
is exhausted or tg3_has_work() returns false.  Actually, this sounds
like a cleaner scheme the more I think about it.

BNX2 likely has a similar issue.

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index d2b30fb..8c27962 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -3599,7 +3599,7 @@ static int tg3_poll(struct napi_struct *napi, int budget)
sblk-status = ~SD_STATUS_UPDATED;
 
/* if no more work, tell net stack and NIC we're done */
-   if (!tg3_has_work(tp)) {
+   if ((work_done  budget)  !tg3_has_work(tp)) {
netif_rx_complete(netdev, napi);
tg3_restart_ints(tp);
}

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Non-linear SKBs

2007-10-11 Thread Kristian Evensen

Hello,

I have developed a small patch for the TCP code in 2.6.19 and it works 
flawlessly. A couple of days ago I decided to make it compatible with 
2.6.22.5 and have stumbled upon a problem I cannot solve.


In 2.6.19 it seems that all packets (at least the ones my patch work 
with) are linear, while they are non-linear in 2.6.22.5. I have searched 
through the code (focusing on tcp_sendmsg) to try to figure out what 
happens, but can't find any differences that would explain this. Does 
anyone know what might be the cause and if there is an easy way to 
return to linear skbs (unless that is totally stupid)? I would also like 
the benefits offered by the collapsing when retransmitting (which 
requires number of fragments to be 0).


Currently I us e the skb_linearize(skb) on all fragmentet packets, which 
is not nice :) Both kernels are vanilla kernels with the patch applied, 
and I used OpenSuse with 2.6.19 and Ubuntu with 2.6.22.5 (but I guess 
that shouldn't matter in this case).


The reason this is a problem is that I copy data between SKBs, and in 
2.6.19 I have used memcpy for this purpose. I have looked at the way the 
kernel copies data into a non-linear skb in the else-part of the large 
if-test in tcp_sendmsg and the skb_copy_bits-function, but I have to 
admit that I think the first one is a bit advanced and I don't quite 
understand how to use the last one. Does anyone have any easier to 
understand examples or explanations on how to copy data from one 
non-linear skb to another?


Thanks.

-Kristian
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: iproute2: resend of patches from Debian.

2007-10-11 Thread Stephen Hemminger
On Thu, 11 Oct 2007 14:48:43 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

 From: Andreas Henriksson [EMAIL PROTECTED]
 Date: Thu, 11 Oct 2007 23:22:12 +0200
 
  Copying Alexander Wirt here again, as I think this is his patch.
 
 It is important that whoever is the package maintainer for an
 upstream piece of code in any distribution:
 
 1) Completely understands the patches he is applying.
 
 and therefore:
 
 2) When he submits upstream, he doesn't have to wake up
2,000 patch submitters from the dead just to figure out
why a change was made.
 
 Neither seems to be occuring here.

Also, I want the correct author name for legal and technical reasons.

-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question on TSO maximum segment sizes.

2007-10-11 Thread Rick Jones

David Miller wrote:

From: Waskiewicz Jr, Peter P [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 16:27:14 -0700



I'm having an issue with TSO right vs. hardware that can't take the
maximum segment size sent from the stack.  I've been told that the
maximum packet size that can be sent to the hardware today is 64k, but
my hardware can only take 32k in certain modes per queue due to hardware
limitations. 


Bletch.


I have two questions regarding this: 1) where is this
value set in the TCP code, and 2) Is this something that can be
configured on the fly?  If the answer to 2 is no, I will try and put
something together to allow this to happen.



The TCP code just builds the maximum possible for the underlying
protocol, be it ipv4 or ipv6.  It takes the underlying protocol
maximum packet length, subtracts the amount of header space it
knows will be used, and uses that.

You'll need to use GSO sw segmentation to split the TSO packets
which are too big for your HW to handle.


For just messing about, might it be possible to tweak the socket buffer sizes 
and tcp_tso_win_divisor to kludge things for a short while?  Couldn't ship that 
way certainly, but assuming Peter's going to get his broken hardware fixed it 
might let him limp along until then.


rick jones
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression in net-2.6.24?

2007-10-11 Thread Stephen Hemminger
On Thu, 11 Oct 2007 16:48:27 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

 From: TAKANO Ryousei [EMAIL PROTECTED]
 Date: Thu, 11 Oct 2007 22:51:46 +0900 (JST)
 
  Modules linked in: 8021q tcp_bic netconsole evdev joydev sg st sr_mod 
  ohci_hcd i2c_amd756 i2c_amd8111 i2c_core ipv6 tg3 usbhid usbcore ff_memless 
  dm_mod ext3 jbd sata_sil libata sd_mod scsi_mod
  ...
  RIP points at __list_del (net_rx_action - list_move_tail - __list_del).
 
 There is a contract between the driver's -poll() method and
 net_rx_action() in that it is assumed that if the entire quota has
 been used up, the driver will not perform a netif_rx_complete().
 
 It seems that in a corner case the tg3 driver, which you appear to be
 using, will not abide by this rule.  That corner case is when the card
 has exactly budget receive packets pending.  In such a case
 tg3_has_work() will be false, and we will RX complete when work_done
 = budget, which violates the above mentioned rule.
 
 Can you see if the following test patch makes the crash go away?
 
 Michael, I know you're not pleased with this patch and neither am
 I.  It might be better to just strictly RX complete when
 (work  budget) and if tg3_has_work(), trigger a HW interrupt.
 
 Alternatively we could loop in tg3_poll() until either budget
 is exhausted or tg3_has_work() returns false.  Actually, this sounds
 like a cleaner scheme the more I think about it.
 
 BNX2 likely has a similar issue.

sky2 as well.


-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Non-linear SKBs

2007-10-11 Thread David Miller
From: Kristian Evensen [EMAIL PROTECTED]
Date: Fri, 12 Oct 2007 00:54:37 +0200

 I have developed a small patch for the TCP code in 2.6.19 and it works 
 flawlessly. A couple of days ago I decided to make it compatible with 
 2.6.22.5 and have stumbled upon a problem I cannot solve.
 
 In 2.6.19 it seems that all packets (at least the ones my patch work 
 with) are linear, while they are non-linear in 2.6.22.5. I have searched 
 through the code (focusing on tcp_sendmsg) to try to figure out what 
 happens, but can't find any differences that would explain this. Does 
 anyone know what might be the cause and if there is an easy way to 
 return to linear skbs (unless that is totally stupid)? I would also like 
 the benefits offered by the collapsing when retransmitting (which 
 requires number of fragments to be 0).

If the underlying device can do scatter-gather and checksumming,
the TCP code builds outgoing packets by copying user date into
full system pages, and then attaching those pages into the SKB.
The protocol headers sit under the skb-data linear area, and
the user data mostly sits in the user pages under
skb_shinfo(skb)-frags[]

This increases the density of data packed into the memory allocated
compared to using skb-data for it.  It also enormously simplifies
the logic necessary to support TSO.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question on TSO maximum segment sizes.

2007-10-11 Thread David Miller
From: Rick Jones [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 16:50:46 -0700

 For just messing about, might it be possible to tweak the socket buffer sizes 
 and tcp_tso_win_divisor to kludge things for a short while?  Couldn't ship 
 that 
 way certainly, but assuming Peter's going to get his broken hardware fixed it 
 might let him limp along until then.

TCP dynamically grows the socket buffer sizes unless the application
explicitly sets them via setsockopt() and the limits imposed in those
cases are controlled by tcp_{,r,w}mem[] sysctls.  Decreasing those
will kill performance exactly for the cases this person cares about.

No, the only way to deal with this is to GSO segment incoming frames
inside of the driver when they exceed the HW limits.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Question on TSO maximum segment sizes.

2007-10-11 Thread Waskiewicz Jr, Peter P
 From: Rick Jones [EMAIL PROTECTED]
 Date: Thu, 11 Oct 2007 16:50:46 -0700
 
  For just messing about, might it be possible to tweak the socket 
  buffer sizes and tcp_tso_win_divisor to kludge things for a short 
  while?  Couldn't ship that way certainly, but assuming 
 Peter's going 
  to get his broken hardware fixed it might let him limp 
 along until then.
 
 TCP dynamically grows the socket buffer sizes unless the 
 application explicitly sets them via setsockopt() and the 
 limits imposed in those cases are controlled by 
 tcp_{,r,w}mem[] sysctls.  Decreasing those will kill 
 performance exactly for the cases this person cares about.
 
 No, the only way to deal with this is to GSO segment incoming 
 frames inside of the driver when they exceed the HW limits.

Thanks Dave and Rick.  I'll mess with this and make my driver happy
again.

Cheers,
-PJ Waskiewicz
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression in net-2.6.24?

2007-10-11 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 16:55:48 -0700

 On Thu, 11 Oct 2007 16:48:27 -0700 (PDT)
 David Miller [EMAIL PROTECTED] wrote:
 
  Alternatively we could loop in tg3_poll() until either budget
  is exhausted or tg3_has_work() returns false.  Actually, this sounds
  like a cleaner scheme the more I think about it.
  
  BNX2 likely has a similar issue.
 
 sky2 as well.

Thanks for the heads up Stephen.

Here is a patch that implements the looping idea in tg3, bnx2, and
sky2.

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index bbfbdaf..b015d52 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -2633,15 +2633,11 @@ bnx2_has_work(struct bnx2 *bp)
return 0;
 }
 
-static int
-bnx2_poll(struct napi_struct *napi, int budget)
+static int bnx2_poll_work(struct bnx2 *bp, int work_done, int budget)
 {
-   struct bnx2 *bp = container_of(napi, struct bnx2, napi);
-   struct net_device *dev = bp-dev;
struct status_block *sblk = bp-status_blk;
u32 status_attn_bits = sblk-status_attn_bits;
u32 status_attn_bits_ack = sblk-status_attn_bits_ack;
-   int work_done = 0;
 
if ((status_attn_bits  STATUS_ATTN_EVENTS) !=
(status_attn_bits_ack  STATUS_ATTN_EVENTS)) {
@@ -2660,27 +2656,43 @@ bnx2_poll(struct napi_struct *napi, int budget)
bnx2_tx_int(bp);
 
if (bp-status_blk-status_rx_quick_consumer_index0 != bp-hw_rx_cons)
-   work_done = bnx2_rx_int(bp, budget);
+   work_done += bnx2_rx_int(bp, budget - work_done);
 
-   bp-last_status_idx = bp-status_blk-status_idx;
-   rmb();
+   return work_done;
+}
+
+static int bnx2_poll(struct napi_struct *napi, int budget)
+{
+   struct bnx2 *bp = container_of(napi, struct bnx2, napi);
+   int work_done = 0;
+
+   while (1) {
+   work_done += bnx2_poll_work(bp, work_done, budget);
 
-   if (!bnx2_has_work(bp)) {
-   netif_rx_complete(dev, napi);
-   if (likely(bp-flags  USING_MSI_FLAG)) {
+   if (unlikely(work_done = budget))
+   break;
+
+   if (likely(!bnx2_has_work(bp))) {
+   bp-last_status_idx = bp-status_blk-status_idx;
+   rmb();
+
+   netif_rx_complete(bp-dev, napi);
+   if (likely(bp-flags  USING_MSI_FLAG)) {
+   REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
+  BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
+  bp-last_status_idx);
+   return 0;
+   }
REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
   BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
+  BNX2_PCICFG_INT_ACK_CMD_MASK_INT |
   bp-last_status_idx);
-   return 0;
-   }
-   REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
-  BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
-  BNX2_PCICFG_INT_ACK_CMD_MASK_INT |
-  bp-last_status_idx);
 
-   REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
-  BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
-  bp-last_status_idx);
+   REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
+  BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
+  bp-last_status_idx);
+   break;
+   }
}
 
return work_done;
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index fe0e756..25da238 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -2605,33 +2605,41 @@ static void sky2_err_intr(struct sky2_hw *hw, u32 
status)
 static int sky2_poll(struct napi_struct *napi, int work_limit)
 {
struct sky2_hw *hw = container_of(napi, struct sky2_hw, napi);
-   u32 status = sky2_read32(hw, B0_Y2_SP_EISR);
-   int work_done;
+   int work_done = 0;
 
-   if (unlikely(status  Y2_IS_ERROR))
-   sky2_err_intr(hw, status);
+   while (1) {
+   u32 status = sky2_read32(hw, B0_Y2_SP_EISR);
 
-   if (status  Y2_IS_IRQ_PHY1)
-   sky2_phy_intr(hw, 0);
+   if (unlikely(status  Y2_IS_ERROR))
+   sky2_err_intr(hw, status);
 
-   if (status  Y2_IS_IRQ_PHY2)
-   sky2_phy_intr(hw, 1);
+   if (status  Y2_IS_IRQ_PHY1)
+   sky2_phy_intr(hw, 0);
 
-   work_done = sky2_status_intr(hw, work_limit);
+   if (status  Y2_IS_IRQ_PHY2)
+   sky2_phy_intr(hw, 1);
 
-   /* More work? */
-   if (hw-st_idx == sky2_read16(hw, STAT_PUT_IDX)) {
-   /* Bug/Errata workaround?
-* Need to kick the TX irq moderation timer.
-*/
-   if (sky2_read8(hw, 

Re: Question on TSO maximum segment sizes.

2007-10-11 Thread Ben Greear

Waskiewicz Jr, Peter P wrote:

From: Rick Jones [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 16:50:46 -0700

For just messing about, might it be possible to tweak the socket 
buffer sizes and tcp_tso_win_divisor to kludge things for a short 
while?  Couldn't ship that way certainly, but assuming 
Peter's going 
to get his broken hardware fixed it might let him limp 

along until then.

TCP dynamically grows the socket buffer sizes unless the 
application explicitly sets them via setsockopt() and the 
limits imposed in those cases are controlled by 
tcp_{,r,w}mem[] sysctls.  Decreasing those will kill 
performance exactly for the cases this person cares about.


I just tried turning off my explicit SO_SNDBUF/SO_RCVBUG settings in my app,
and the connection ran very poorly through a link with even a small
bit of latency (~2-4ms I believe).

It ran near gige line speed through a cross-over cable.

I have the sysctl max values set very generous, though the min
and default are fairly small.

This was with kernel 2.6.20.

Was the auto-tuning put in after 2.6.20?  If not, has this
been tested through a higher latency link?  Or, am I confused
and you are talking about some other setsockopt?

Thanks,
Ben


--
Ben Greear [EMAIL PROTECTED]
Candela Technologies Inc  http://www.candelatech.com

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Question on TSO maximum segment sizes.

2007-10-11 Thread David Miller
From: Ben Greear [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 17:17:48 -0700

 Was the auto-tuning put in after 2.6.20?  If not, has this
 been tested through a higher latency link?  Or, am I confused
 and you are talking about some other setsockopt?

It's been there for quite some time.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rtnl: Simplify ASSERT_RTNL

2007-10-11 Thread David Miller
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Thu, 11 Oct 2007 10:33:39 -0600

 Having ASSERT_RTNL warn if you were sleeping does not seem
 intuitive from the name.
 
 This instance of convoluted locking seems like a complete
 one off to me, and if it will warn about other constructs
 currently in the kernel it seems wrong.

RTNL is a semaphore, therefore it sleeps.

Therefore anything that requires RTNL is held can also assume that it
can do things like GFP_KERNEL allocations and other sleeping actions.

This is why any code path that runs with spinlocks held or interrupts
disabled, and hits an RTNL assertion, is buggy.  It is the chain of
dependencies of what is allowed in such contexts.

ASSERT_RTNL();
page = alloc_page(GFP_KERNEL);

That sequence above is absolutely always legal.

The might_sleep() check is just letting us know the problem exists.

If spinlocks or interrupt disabling is needed to implement something
deeper in the call chain, that's fine, it just cannot call back into
the RTNL asserted domain with those spinlocks held or interrupts
disabled.

If the mac_vlan driver, or whichever one has the problem, does things
like this it must be fixed and putting or not putting a proper
might_sleep() check here doesn't change that.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] [TCP]: Add bytes_acked (ABC) clearing to FRTO too

2007-10-11 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 14:41:01 +0300

 I was reading tcp_enter_loss while looking for Cedric's bug and
 noticed bytes_acked adjustment is missing from FRTO side.
 
 Since bytes_acked will only be used in tcp_cong_avoid, I think
 it's safe to assume RTO would be spurious. During FRTO cwnd
 will be not controlled by tcp_cong_avoid and if FRTO calls for
 conventional recovery, cwnd is adjusted and the result of wrong
 assumption is cleared from bytes_acked. If RTO was in fact
 spurious, we did normal ABC already and can continue without
 any additional adjustments.
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression in net-2.6.24?

2007-10-11 Thread Stephen Hemminger
On Thu, 11 Oct 2007 17:17:43 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

 From: Stephen Hemminger [EMAIL PROTECTED]
 Date: Thu, 11 Oct 2007 16:55:48 -0700
 
  On Thu, 11 Oct 2007 16:48:27 -0700 (PDT)
  David Miller [EMAIL PROTECTED] wrote:
  
   Alternatively we could loop in tg3_poll() until either budget
   is exhausted or tg3_has_work() returns false.  Actually, this sounds
   like a cleaner scheme the more I think about it.
   
   BNX2 likely has a similar issue.
  
  sky2 as well.
 
 Thanks for the heads up Stephen.
 
 Here is a patch that implements the looping idea in tg3, bnx2, and
 sky2.
 
 diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
 index bbfbdaf..b015d52 100644
 --- a/drivers/net/bnx2.c
 +++ b/drivers/net/bnx2.c
 @@ -2633,15 +2633,11 @@ bnx2_has_work(struct bnx2 *bp)
   return 0;
  }
  
 -static int
 -bnx2_poll(struct napi_struct *napi, int budget)
 +static int bnx2_poll_work(struct bnx2 *bp, int work_done, int budget)
  {
 - struct bnx2 *bp = container_of(napi, struct bnx2, napi);
 - struct net_device *dev = bp-dev;
   struct status_block *sblk = bp-status_blk;
   u32 status_attn_bits = sblk-status_attn_bits;
   u32 status_attn_bits_ack = sblk-status_attn_bits_ack;
 - int work_done = 0;
  
   if ((status_attn_bits  STATUS_ATTN_EVENTS) !=
   (status_attn_bits_ack  STATUS_ATTN_EVENTS)) {
 @@ -2660,27 +2656,43 @@ bnx2_poll(struct napi_struct *napi, int budget)
   bnx2_tx_int(bp);
  
   if (bp-status_blk-status_rx_quick_consumer_index0 != bp-hw_rx_cons)
 - work_done = bnx2_rx_int(bp, budget);
 + work_done += bnx2_rx_int(bp, budget - work_done);
  
 - bp-last_status_idx = bp-status_blk-status_idx;
 - rmb();
 + return work_done;
 +}
 +
 +static int bnx2_poll(struct napi_struct *napi, int budget)
 +{
 + struct bnx2 *bp = container_of(napi, struct bnx2, napi);
 + int work_done = 0;
 +
 + while (1) {
 + work_done += bnx2_poll_work(bp, work_done, budget);
  
 - if (!bnx2_has_work(bp)) {
 - netif_rx_complete(dev, napi);
 - if (likely(bp-flags  USING_MSI_FLAG)) {
 + if (unlikely(work_done = budget))
 + break;
 +
 + if (likely(!bnx2_has_work(bp))) {
 + bp-last_status_idx = bp-status_blk-status_idx;
 + rmb();
 +
 + netif_rx_complete(bp-dev, napi);
 + if (likely(bp-flags  USING_MSI_FLAG)) {
 + REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
 +BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
 +bp-last_status_idx);
 + return 0;
 + }
   REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
  BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
 +BNX2_PCICFG_INT_ACK_CMD_MASK_INT |
  bp-last_status_idx);
 - return 0;
 - }
 - REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
 -BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
 -BNX2_PCICFG_INT_ACK_CMD_MASK_INT |
 -bp-last_status_idx);
  
 - REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
 -BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
 -bp-last_status_idx);
 + REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
 +BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
 +bp-last_status_idx);
 + break;
 + }
   }
  
   return work_done;
 diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
 index fe0e756..25da238 100644
 --- a/drivers/net/sky2.c
 +++ b/drivers/net/sky2.c
 @@ -2605,33 +2605,41 @@ static void sky2_err_intr(struct sky2_hw *hw, u32 
 status)
  static int sky2_poll(struct napi_struct *napi, int work_limit)
  {
   struct sky2_hw *hw = container_of(napi, struct sky2_hw, napi);
 - u32 status = sky2_read32(hw, B0_Y2_SP_EISR);
 - int work_done;
 + int work_done = 0;
  
 - if (unlikely(status  Y2_IS_ERROR))
 - sky2_err_intr(hw, status);
 + while (1) {
 + u32 status = sky2_read32(hw, B0_Y2_SP_EISR);
  
 - if (status  Y2_IS_IRQ_PHY1)
 - sky2_phy_intr(hw, 0);
 + if (unlikely(status  Y2_IS_ERROR))
 + sky2_err_intr(hw, status);
  
 - if (status  Y2_IS_IRQ_PHY2)
 - sky2_phy_intr(hw, 1);
 + if (status  Y2_IS_IRQ_PHY1)
 + sky2_phy_intr(hw, 0);
  
 - work_done = sky2_status_intr(hw, work_limit);
 + if (status  Y2_IS_IRQ_PHY2)
 + sky2_phy_intr(hw, 1);
  
 - /* More work? */
 - if (hw-st_idx == sky2_read16(hw, STAT_PUT_IDX)) {
 - /* Bug/Errata workaround?
 -  * Need to kick the TX irq moderation timer.
 

Re: [PATCH 2/7] [TCP]: Fix mark_head_lost to ignore R-bit when trying to mark L

2007-10-11 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 14:41:02 +0300

 This condition (plain R) can arise at least in recovery that
 is triggered after tcp_undo_loss. There isn't any reason why
 they should not be marked as lost, not marking makes in_flight
 estimator to return too large values.
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Nice observation, applied, thanks Ilpo.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/7] [TCP]: Kill almost unused variable pcount from sacktag

2007-10-11 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 14:41:03 +0300

 It's on the way for future cutting of that function.
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7] [TCP]: Extract tcp_match_queue_to_sack from sacktag code

2007-10-11 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 14:41:04 +0300

 This is necessary for upcoming DSACK bugfix. Reduces sacktag
 length which is not very sad thing at all... :-)
 
 Notice that there's a need to handle out-of-mem at caller's
 place.
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/7] [TCP]: No need to re-count fackets_out/sacked_out at RTO

2007-10-11 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 14:41:05 +0300

 Both sacked_out and fackets_out are directly known from how
 parameter. Since fackets_out is accurate, there's no need for
 recounting (sacked_out was previously unnecessarily counted
 in the loop anyway).
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] [TCP]: Fix lost_retrans loop vs fastpath problems

2007-10-11 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 14:41:06 +0300

 Detection implemented with lost_retrans must work also when
 fastpath is taken, yet most of the queue is skipped including
 (very likely) those retransmitted skb's we're interested in.
 This problem appeared when the hints got added, which removed
 a need to always walk over the whole write queue head.
 Therefore decicion for the lost_retrans worker loop entry must
 be separated from the sacktag processing more than it was
 necessary before.
 
 It turns out to be problematic to optimize the worker loop
 very heavily because ack_seqs of skb may have a number of
 discontinuity points. Maybe similar approach as currently is
 implemented could be attempted but that's becoming more and
 more complex because the trend is towards less skb walking
 in sacktag marker. Trying a simple work until all rexmitted
 skbs heve been processed approach.
 
 Maybe after(highest_sack_end_seq, tp-high_seq) checking is not
 sufficiently accurate and causes entry too often in no-work-to-do
 cases. Since that's not known, I've separated solution to that
 from this patch.
 
 Noticed because of report against a related problem from TAKANO
 Ryousei [EMAIL PROTECTED]. He also provided a patch to
 that part of the problem. This patch includes solution to it
 (though this patch has to use somewhat different placement).
 TAKANO's description and patch is available here:
 
   http://marc.info/?l=linux-netdevm=119149311913288w=2
 
 ...In short, TAKANO's problem is that end_seq the loop is using
 not necessarily the largest SACK block's end_seq because the
 current ACK may still have higher SACK blocks which are later
 by the loop.
 
 Cc: TAKANO Ryousei [EMAIL PROTECTED]
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] [TCP]: Limit processing lost_retrans loop to work-to-do cases

2007-10-11 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 14:41:07 +0300

 This addition of lost_retrans_low to tcp_sock might be
 unnecessary, it's not clear how often lost_retrans worker is
 executed when there wasn't work to do.
 
 Cc: TAKANO Ryousei [EMAIL PROTECTED]
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-2.6 0/7] TCP: small changes/fixes + lost_retrans brokeness fix

2007-10-11 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 14:41:00 +0300

 Dave, please apply at will.

Done.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression in net-2.6.24?

2007-10-11 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 17:31:49 -0700

 You don't need to re-read the status register and process the PHY irq's 
 inside loop.
 Try this:

Are you sure?  What if a PHY interrupt comes in during the loop?

I'm just preserving the semantics of the driver when -poll()
is invoked multiple times per interrupt.

And I think preserving that makes sense while we're purely
trying to fix this bug, so we don't add some new ones.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression in net-2.6.24?

2007-10-11 Thread Stephen Hemminger
On Thu, 11 Oct 2007 17:40:26 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

 From: Stephen Hemminger [EMAIL PROTECTED]
 Date: Thu, 11 Oct 2007 17:31:49 -0700
 
  You don't need to re-read the status register and process the PHY irq's 
  inside loop.
  Try this:
 
 Are you sure?  What if a PHY interrupt comes in during the loop?

The interrupt is level triggered, and will rearm.

 
 I'm just preserving the semantics of the driver when -poll()
 is invoked multiple times per interrupt.
 
 And I think preserving that makes sense while we're purely
 trying to fix this bug, so we don't add some new ones.


-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


APE changes cause tg3 regressions...

2007-10-11 Thread David Miller

On my onboard SunBlade1500 5703 chips, which lack firmware, the APE
changes introduce bus timeouts while bringing the chip up,
specifically I get crashes in tg3_write_sig_post_reset() such as:

[   36.066603] eth1: Tigon3 [partno(BCM95703A30U) rev 1002 PHY(5703)] 
(PCI:33MHz:64-bit) 10/100/1000Base-T Ethernet 00:10:18:04:13:02
[   36.076040] eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] WireSpeed[1] 
TSOcap[1]
[   36.085393] eth1: dma_rwctrl[763f] dma_mask[32-bit]
[   36.095909] PCI: Enabling device: (:00:08.0), cmd 3
[   37.541359] tg3: eth1: No firmware running.
[   37.588662] ERROR(0): Cheetah error trap taken afsr[4000] 
afar[4210] TL1(0)
[   37.598106] ERROR(0): TPC[1004a700] TNPC[1004a704] O7[42f488] 
TSTATE[80009602]
[   37.607555] ERROR(0): TPCtg3_write_sig_post_reset+0x68/0x78 [tg3]
[   37.617040] ERROR(0): M_SYND(0),  E_SYND(0)
[   37.626422] ERROR(0): Highest priority error (4000) Error due 
to unsupported store
[   37.636046] ERROR(0): AFAR E-syndrome [???]
[   37.645619] ERROR(0): D-cache idx[0] tag[] 
utag[] stag[]
[   37.655394] ERROR(0): D-cache data0[] 
data1[] data2[] data3[]
[   37.665254] ERROR(0): I-cache idx[0] tag[] 
utag[] stag[] u[] 
l[]
[   37.675194] ERROR(0): I-cache INSN0[] 
INSN1[] INSN2[] INSN3[]
[   37.685138] ERROR(0): I-cache INSN4[] 
INSN5[] INSN6[] INSN7[]
[   37.695069] ERROR(0): E-cache idx[4200] tag[02000fb1]
[   37.704920] ERROR(0): E-cache data0[b0102000a6922000] 
data1[02800057a004c010] data2[9410001192100019] data3[a2042250b0100013]
[   37.714978] /[EMAIL PROTECTED],70: Safari/JBUS interrupt, UNMAPPED 
error, interrogating IOMMUs.

My initial suspicion is that TG3_FLG3_ENABLE_APE isn't being set right
on these firmware-less cards, or some APE programming is not being
guarded properly with tests on that feature bit.

Please fix this, thanks!
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression in net-2.6.24?

2007-10-11 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 17:50:59 -0700

 On Thu, 11 Oct 2007 17:40:26 -0700 (PDT)
 David Miller [EMAIL PROTECTED] wrote:
 
  From: Stephen Hemminger [EMAIL PROTECTED]
  Date: Thu, 11 Oct 2007 17:31:49 -0700
  
   You don't need to re-read the status register and process the PHY irq's 
   inside loop.
   Try this:
  
  Are you sure?  What if a PHY interrupt comes in during the loop?
 
 The interrupt is level triggered, and will rearm.

Fair enough.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression in net-2.6.24?

2007-10-11 Thread David Miller
From: David Miller [EMAIL PROTECTED]
Date: Thu, 11 Oct 2007 18:00:31 -0700 (PDT)

 From: Stephen Hemminger [EMAIL PROTECTED]
 Date: Thu, 11 Oct 2007 17:50:59 -0700
 
  On Thu, 11 Oct 2007 17:40:26 -0700 (PDT)
  David Miller [EMAIL PROTECTED] wrote:
  
   From: Stephen Hemminger [EMAIL PROTECTED]
   Date: Thu, 11 Oct 2007 17:31:49 -0700
   
You don't need to re-read the status register and process the PHY irq's 
inside loop.
Try this:
   
   Are you sure?  What if a PHY interrupt comes in during the loop?
  
  The interrupt is level triggered, and will rearm.
 
 Fair enough.

Actually, your change isn't right for another reason.

You missed the necessary budget reducing logic that I used
in the original changes.  You need to adjust work_limit
like this:

work_done += sky2_status_intr(hw, work_limit - work_done);

Otherwise if you just pass plain work_limit, and we do loop, the
driver uses more quota than it really should.

And I've made that above correction to your patch in my tree.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Regression in net-2.6.24?

2007-10-11 Thread David Miller

Here is what I'm checking into net-2.6 for now:

commit 6f535763165331bb91277d7519b507fed22034e5
Author: David S. Miller [EMAIL PROTECTED]
Date:   Thu Oct 11 18:08:29 2007 -0700

[NET]: Fix NAPI completion handling in some drivers.

In order for the list handling in net_rx_action() to be
correct, drivers must follow certain rules as stated by
this comment in net_rx_action():

/* Drivers must not modify the NAPI state if they
 * consume the entire weight.  In such cases this code
 * still owns the NAPI instance and therefore can
 * move the instance around on the list at-will.
 */

A few drivers do not do this because they mix the budget checks
with reading hardware state, resulting in crashes like the one
reported by [EMAIL PROTECTED]

BNX2 and TG3 are taken care of here, SKY2 fix is from Stephen
Hemminger.

Signed-off-by: David S. Miller [EMAIL PROTECTED]

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index bbfbdaf..d68acce 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -2633,15 +2633,11 @@ bnx2_has_work(struct bnx2 *bp)
return 0;
 }
 
-static int
-bnx2_poll(struct napi_struct *napi, int budget)
+static int bnx2_poll_work(struct bnx2 *bp, int work_done, int budget)
 {
-   struct bnx2 *bp = container_of(napi, struct bnx2, napi);
-   struct net_device *dev = bp-dev;
struct status_block *sblk = bp-status_blk;
u32 status_attn_bits = sblk-status_attn_bits;
u32 status_attn_bits_ack = sblk-status_attn_bits_ack;
-   int work_done = 0;
 
if ((status_attn_bits  STATUS_ATTN_EVENTS) !=
(status_attn_bits_ack  STATUS_ATTN_EVENTS)) {
@@ -2660,27 +2656,43 @@ bnx2_poll(struct napi_struct *napi, int budget)
bnx2_tx_int(bp);
 
if (bp-status_blk-status_rx_quick_consumer_index0 != bp-hw_rx_cons)
-   work_done = bnx2_rx_int(bp, budget);
+   work_done += bnx2_rx_int(bp, budget - work_done);
 
-   bp-last_status_idx = bp-status_blk-status_idx;
-   rmb();
+   return work_done;
+}
+
+static int bnx2_poll(struct napi_struct *napi, int budget)
+{
+   struct bnx2 *bp = container_of(napi, struct bnx2, napi);
+   int work_done = 0;
+
+   while (1) {
+   work_done = bnx2_poll_work(bp, work_done, budget);
 
-   if (!bnx2_has_work(bp)) {
-   netif_rx_complete(dev, napi);
-   if (likely(bp-flags  USING_MSI_FLAG)) {
+   if (unlikely(work_done = budget))
+   break;
+
+   if (likely(!bnx2_has_work(bp))) {
+   bp-last_status_idx = bp-status_blk-status_idx;
+   rmb();
+
+   netif_rx_complete(bp-dev, napi);
+   if (likely(bp-flags  USING_MSI_FLAG)) {
+   REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
+  BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
+  bp-last_status_idx);
+   return 0;
+   }
REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
   BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
+  BNX2_PCICFG_INT_ACK_CMD_MASK_INT |
   bp-last_status_idx);
-   return 0;
-   }
-   REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
-  BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
-  BNX2_PCICFG_INT_ACK_CMD_MASK_INT |
-  bp-last_status_idx);
 
-   REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
-  BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
-  bp-last_status_idx);
+   REG_WR(bp, BNX2_PCICFG_INT_ACK_CMD,
+  BNX2_PCICFG_INT_ACK_CMD_INDEX_VALID |
+  bp-last_status_idx);
+   break;
+   }
}
 
return work_done;
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index fe0e756..4e569fa 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -2606,7 +2606,7 @@ static int sky2_poll(struct napi_struct *napi, int 
work_limit)
 {
struct sky2_hw *hw = container_of(napi, struct sky2_hw, napi);
u32 status = sky2_read32(hw, B0_Y2_SP_EISR);
-   int work_done;
+   int work_done = 0;
 
if (unlikely(status  Y2_IS_ERROR))
sky2_err_intr(hw, status);
@@ -2617,10 +2617,16 @@ static int sky2_poll(struct napi_struct *napi, int 
work_limit)
if (status  Y2_IS_IRQ_PHY2)
sky2_phy_intr(hw, 1);
 
-   work_done = sky2_status_intr(hw, work_limit);
+   for(;;) {
+   work_done += sky2_status_intr(hw, work_limit);
+
+   if (work_done = work_limit)
+   break;
+
+  

[PATCH 1/6] sky2: status polling loop

2007-10-11 Thread Stephen Hemminger
Handle the corner case where budget is exhausted correctly.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/drivers/net/sky2.c2007-10-11 18:11:44.0 -0700
+++ b/drivers/net/sky2.c2007-10-11 18:12:50.0 -0700
@@ -2245,15 +2245,13 @@ static inline void sky2_tx_done(struct n
 }
 
 /* Process status response ring */
-static int sky2_status_intr(struct sky2_hw *hw, int to_do)
+static int sky2_status_intr(struct sky2_hw *hw, int to_do, u16 idx)
 {
int work_done = 0;
unsigned rx[2] = { 0, 0 };
-   u16 hwidx = sky2_read16(hw, STAT_PUT_IDX);
 
rmb();
-
-   while (hw-st_idx != hwidx) {
+   do {
struct sky2_port *sky2;
struct sky2_status_le *le  = hw-st_le + hw-st_idx;
unsigned port = le-css  CSS_LINK_BIT;
@@ -2364,7 +2362,7 @@ static int sky2_status_intr(struct sky2_
printk(KERN_WARNING PFX
   unknown status opcode 0x%x\n, 
le-opcode);
}
-   }
+   } while (hw-st_idx != idx);
 
/* Fully processed status ring so clear irq */
sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
@@ -2606,7 +2604,8 @@ static int sky2_poll(struct napi_struct 
 {
struct sky2_hw *hw = container_of(napi, struct sky2_hw, napi);
u32 status = sky2_read32(hw, B0_Y2_SP_EISR);
-   int work_done;
+   int work_done = 0;
+   u16 idx;
 
if (unlikely(status  Y2_IS_ERROR))
sky2_err_intr(hw, status);
@@ -2617,21 +2616,24 @@ static int sky2_poll(struct napi_struct 
if (status  Y2_IS_IRQ_PHY2)
sky2_phy_intr(hw, 1);
 
-   work_done = sky2_status_intr(hw, work_limit);
+   while ((idx = sky2_read16(hw, STAT_PUT_IDX)) != hw-st_idx) {
+   work_done += sky2_status_intr(hw, work_limit - work_done, idx);
 
-   /* More work? */
-   if (hw-st_idx == sky2_read16(hw, STAT_PUT_IDX)) {
-   /* Bug/Errata workaround?
-* Need to kick the TX irq moderation timer.
-*/
-   if (sky2_read8(hw, STAT_TX_TIMER_CTRL) == TIM_START) {
-   sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
-   sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
-   }
+   if (work_done = work_limit)
+   goto done;
+   }
 
-   napi_complete(napi);
-   sky2_read32(hw, B0_Y2_SP_LISR);
+   /* Bug/Errata workaround?
+* Need to kick the TX irq moderation timer.
+*/
+   if (sky2_read8(hw, STAT_TX_TIMER_CTRL) == TIM_START) {
+   sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
+   sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
}
+   napi_complete(napi);
+   sky2_read32(hw, B0_Y2_SP_LISR);
+done:
+
return work_done;
 }
 

-- 
Stephen Hemminger [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] sky2: fix power settings on Yukon XL

2007-10-11 Thread Stephen Hemminger
Make sure PCI register for PHY power gets set correctly.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- a/drivers/net/sky2.c2007-10-11 18:12:56.0 -0700
+++ b/drivers/net/sky2.c2007-10-11 18:13:00.0 -0700
@@ -606,20 +606,19 @@ static void sky2_phy_power(struct sky2_h
 {
struct pci_dev *pdev = hw-pdev;
u32 reg1;
-   static const u32 phy_power[]
-   = { PCI_Y2_PHY1_POWD, PCI_Y2_PHY2_POWD };
-
-   /* looks like this XL is back asswards .. */
-   if (hw-chip_id == CHIP_ID_YUKON_XL  hw-chip_rev  1)
-   onoff = !onoff;
+   static const u32 phy_power[] = { PCI_Y2_PHY1_POWD, PCI_Y2_PHY2_POWD };
+   static const u32 coma_mode[] = { PCI_Y2_PHY1_COMA, PCI_Y2_PHY2_COMA };
 
pci_read_config_dword(pdev, PCI_DEV_REG1, reg1);
+   /* Turn on/off phy power saving */
if (onoff)
-   /* Turn off phy power saving */
reg1 = ~phy_power[port];
else
reg1 |= phy_power[port];
 
+   if (onoff  hw-chip_id == CHIP_ID_YUKON_XL  hw-chip_rev  1)
+   reg1 |= coma_mode[port];
+
pci_write_config_dword(pdev, PCI_DEV_REG1, reg1);
pci_read_config_dword(pdev, PCI_DEV_REG1, reg1);
 

-- 
Stephen Hemminger [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/6] sky2 patches for net-2.6

2007-10-11 Thread Stephen Hemminger
Includes:
   * fix for new NAPI status loop
   * use internal net_stats
   * fixes for fiber PHY power

-- 
Stephen Hemminger [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/6] sky2: fiber advertise bits initialization (trivial)

2007-10-11 Thread Stephen Hemminger
Put initialization in sequential order (same as other constants).

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- a/drivers/net/sky2.c2007-10-11 18:13:00.0 -0700
+++ b/drivers/net/sky2.c2007-10-11 18:13:01.0 -0700
@@ -296,10 +296,10 @@ static const u16 copper_fc_adv[] = {
 
 /* flow control to advertise bits when using 1000BaseX */
 static const u16 fiber_fc_adv[] = {
-   [FC_BOTH] = PHY_M_P_BOTH_MD_X,
+   [FC_NONE] = PHY_M_P_NO_PAUSE_X,
[FC_TX]   = PHY_M_P_ASYM_MD_X,
[FC_RX]   = PHY_M_P_SYM_MD_X,
-   [FC_NONE] = PHY_M_P_NO_PAUSE_X,
+   [FC_BOTH] = PHY_M_P_BOTH_MD_X,
 };
 
 /* flow control to GMA disable bits */

-- 
Stephen Hemminger [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >