Re: [PATCH] netdev: lockdep classes in register_netdevice Re: [patch 04/13] ppp_generic: fix lockdep warning

2007-05-15 Thread Yuriy N. Shkandybin

I've patched 2.6.22-rc1 and there were no warnings from the lock debugger.

Jura

- Original Message - 
From: Jarek Poplawski [EMAIL PROTECTED]

To: David Miller [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; 
netdev@vger.kernel.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]

Sent: Tuesday, May 15, 2007 9:31 AM
Subject: [PATCH] netdev: lockdep classes in register_netdevice Re: [patch 
04/13] ppp_generic: fix lockdep warning




On Sun, May 13, 2007 at 11:39:37PM -0700, David Miller wrote:

From: Jarek Poplawski [EMAIL PROTECTED]
Date: Mon, 14 May 2007 08:07:00 +0200

 After sending this patch I was a little confused when the next
 lockdep warning report appeared, and I thought: since this is
 not enough, this patch could be dumped. But now I've changed my
 mind: there are so many possible strange connections
 between locks taken from vlans, ppp (with pppoe), multicasts etc.
 that every possibility removed is a gain here.
 ...
 Of course, later, if somebody finds a better solution, they could
 be removed,

I already suggested a better fix, you ignored it.

For each unique netdev type, use a different locking class.

That will fix this forever, anything else is a situation specific
band-aid (but then again isn't that what every lockdep annotation is
:-).


So, I guess you were thinking of something like this, plus
additional annotations in specific situations like vlan
(but some hint is needed as to how much of this should be
considered).

Jarek P.

---

After initializing dev->_xmit_lock, register_netdevice()
sets the lockdep class according to dev->type.

Idea of this patch - by David Miller.


Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]

---

diff -Nurp 2.6.22-/net/core/dev.c 2.6.22/net/core/dev.c
--- 2.6.22-/net/core/dev.c 2007-05-14 20:26:16.0 +0200
+++ 2.6.22/net/core/dev.c 2007-05-14 21:22:10.0 +0200
@@ -116,6 +116,7 @@
#include <linux/dmaengine.h>
#include <linux/err.h>
#include <linux/ctype.h>
+#include <linux/if_arp.h>

/*
 * The list of packet types we will receive (as opposed to discard)
@@ -217,6 +218,73 @@ extern void netdev_unregister_sysfs(stru
#define netdev_unregister_sysfs(dev) do { } while(0)
#endif

+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+/*
+ * register_netdevice() inits dev->_xmit_lock and sets lockdep class
+ * according to dev->type
+ */
+static const unsigned short netdev_lock_type[] =
+ {ARPHRD_NETROM, ARPHRD_ETHER, ARPHRD_EETHER, ARPHRD_AX25,
+ ARPHRD_PRONET, ARPHRD_CHAOS, ARPHRD_IEEE802, ARPHRD_ARCNET,
+ ARPHRD_APPLETLK, ARPHRD_DLCI, ARPHRD_ATM, ARPHRD_METRICOM,
+ ARPHRD_IEEE1394, ARPHRD_EUI64, ARPHRD_INFINIBAND, ARPHRD_SLIP,
+ ARPHRD_CSLIP, ARPHRD_SLIP6, ARPHRD_CSLIP6, ARPHRD_RSRVD,
+ ARPHRD_ADAPT, ARPHRD_ROSE, ARPHRD_X25, ARPHRD_HWX25,
+ ARPHRD_PPP, ARPHRD_CISCO, ARPHRD_LAPB, ARPHRD_DDCMP,
+ ARPHRD_RAWHDLC, ARPHRD_TUNNEL, ARPHRD_TUNNEL6, ARPHRD_FRAD,
+ ARPHRD_SKIP, ARPHRD_LOOPBACK, ARPHRD_LOCALTLK, ARPHRD_FDDI,
+ ARPHRD_BIF, ARPHRD_SIT, ARPHRD_IPDDP, ARPHRD_IPGRE,
+ ARPHRD_PIMREG, ARPHRD_HIPPI, ARPHRD_ASH, ARPHRD_ECONET,
+ ARPHRD_IRDA, ARPHRD_FCPP, ARPHRD_FCAL, ARPHRD_FCPL,
+ ARPHRD_FCFABRIC, ARPHRD_IEEE802_TR, ARPHRD_IEEE80211,
+ ARPHRD_IEEE80211_PRISM, ARPHRD_IEEE80211_RADIOTAP, ARPHRD_VOID,
+ ARPHRD_NONE};
+
+static const char *netdev_lock_name[] =
+ {"_xmit_NETROM", "_xmit_ETHER", "_xmit_EETHER", "_xmit_AX25",
+ "_xmit_PRONET", "_xmit_CHAOS", "_xmit_IEEE802", "_xmit_ARCNET",
+ "_xmit_APPLETLK", "_xmit_DLCI", "_xmit_ATM", "_xmit_METRICOM",
+ "_xmit_IEEE1394", "_xmit_EUI64", "_xmit_INFINIBAND", "_xmit_SLIP",
+ "_xmit_CSLIP", "_xmit_SLIP6", "_xmit_CSLIP6", "_xmit_RSRVD",
+ "_xmit_ADAPT", "_xmit_ROSE", "_xmit_X25", "_xmit_HWX25",
+ "_xmit_PPP", "_xmit_CISCO", "_xmit_LAPB", "_xmit_DDCMP",
+ "_xmit_RAWHDLC", "_xmit_TUNNEL", "_xmit_TUNNEL6", "_xmit_FRAD",
+ "_xmit_SKIP", "_xmit_LOOPBACK", "_xmit_LOCALTLK", "_xmit_FDDI",
+ "_xmit_BIF", "_xmit_SIT", "_xmit_IPDDP", "_xmit_IPGRE",
+ "_xmit_PIMREG", "_xmit_HIPPI", "_xmit_ASH", "_xmit_ECONET",
+ "_xmit_IRDA", "_xmit_FCPP", "_xmit_FCAL", "_xmit_FCPL",
+ "_xmit_FCFABRIC", "_xmit_IEEE802_TR", "_xmit_IEEE80211",
+ "_xmit_IEEE80211_PRISM", "_xmit_IEEE80211_RADIOTAP", "_xmit_VOID",
+ "_xmit_NONE"};
+
+static struct lock_class_key netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)];
+
+static inline unsigned short netdev_lock_pos(unsigned short dev_type)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(netdev_lock_type); i++)
+ if (netdev_lock_type[i] == dev_type)
+ return i;
+ /* the last key is used by default */
+ return --i;
+}
+
+static inline void netdev_set_lockdep_class(spinlock_t *lock,
+ unsigned short dev_type)
+{
+ int i;
+
+ i = netdev_lock_pos(dev_type);
+ lockdep_set_class_and_name(lock, &netdev_xmit_lock_key[i],
+    netdev_lock_name[i]);
+}
+#else
+static inline void netdev_set_lockdep_class(spinlock_t *lock,
+ unsigned short dev_type)
+{
+}
+#endif

/***

@@ -3001,6 +3069,7 @@ int register_netdevice(struct net_device

 spin_lock_init(&dev->queue_lock);
 spin_lock_init(&dev->_xmit_lock);
+ netdev_set_lockdep_class(&dev->_xmit_lock, dev->type);
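
The table lookup that the patch adds can be tried outside the kernel. What follows is only a userspace sketch under stated assumptions: the type table is shortened to four entries, the HW_* constants are illustrative stand-ins for the ARPHRD_* values, and lock_pos() and lock_class_name() are hypothetical names rather than the functions in the patch. As in the patch, a type that is not listed shares the last slot.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Shortened, illustrative subset of device types (stand-ins for ARPHRD_*). */
enum { HW_ETHER = 1, HW_PPP = 512, HW_LOOPBACK = 772, HW_NONE = 0xFFFE };

static const unsigned short lock_type[] = {
	HW_ETHER, HW_PPP, HW_LOOPBACK, HW_NONE
};

/* One name per entry of lock_type[], in the same order. */
static const char *const lock_name[] = {
	"_xmit_ETHER", "_xmit_PPP", "_xmit_LOOPBACK", "_xmit_NONE"
};

/* Return the table slot for dev_type; unknown types share the last slot. */
static size_t lock_pos(unsigned short dev_type)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(lock_type); i++)
		if (lock_type[i] == dev_type)
			return i;
	return ARRAY_SIZE(lock_type) - 1;
}

/* Name of the lock class that a device of this type would be given. */
static const char *lock_class_name(unsigned short dev_type)
{
	return lock_name[lock_pos(dev_type)];
}
```

Because the key array is sized from the type table, keeping the name table in the same order is what ties a name to its key; a new device type needs one new entry in each.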
 

Re: [IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed

2007-05-15 Thread Janusz Krzysztofik

Simon Horman wrote:

On Mon, May 14, 2007 at 07:41:48PM +0200, Patrick McHardy wrote:

So you're adding a local route for non-local destination and the
address selection in icmp_send() uses the original destination
address as source because the route has RTCF_LOCAL set, resulting
in an error in ip_route_output_slow().


I'm not entirely sure that adding a local route is the right
terminology, but then again, perhaps I'm misunderstanding exactly
what that means.


What I do exactly is:
  ip rule add prio 1000 fwmark $IF_MARK_LVS lookup lvs
  ip route replace table lvs local default dev lo


My understanding of the problem is that IPVS likes to send icmp to notify
end-users when real-servers are down. 


Yes, there is one such place in IPVS code too, inside ip_vs_leave(),
used for notifying clients on service overload.


The source ip of such icmp is the
VIP, that is the IP address associated with the virtual service.
However, it is quite valid for this VIP not to be configured on the
machine that is running IPVS. Thus the machine in question wants to send
icmp packets with a non-local source address.

http://archive.linuxvirtualserver.org/html/lvs-users/2007-01/msg00109.html


If that's correct, then this patch should also work: it changes
icmp_send() to check if the original destination address is
non-local when deciding whether to pick a new address (and
reverts the routing changes).

I think that your patch looks good, assuming that inet_addr_type(VIP)
is going to return RTN_LOCAL (except in the unlikely case that VIP is
multicast or something silly like that).


For now, I have no place other than my production firewall cluster to
verify this patch. I will do it as soon as possible and give you some
feedback.


However, I wonder if, for efficiency or safety reasons, it might
be better for IPVS to pass some sort of OK_ITS_SUPPSED_TO_BE_NON_LOCAL
flag into ip_route(). 


Do you mean packets that are passed through ip_vs_in()? If not, please
remember that the current IPVS code does not send any ICMP port unreachable
messages except for this rare overload case. I still have no idea how to
solve the more common problem of notifying clients of a dead real server
inside the IPVS code itself, to avoid my complicated tricks of marking
based on connection tracking.

On the other hand, I have to state that even if I can now send
notifications to clients using my method, this does not solve my real
problem of broken ipsec connections going through the LVS director. The
Openswan clients I use do not care about ICMP port unreachable messages and
insist on using connections that are invalid due to a switched real server.
So maybe we should first verify whether there are any real cases where
notifying udp clients with ICMP port unreachable is really useful, and then
decide whether we need this functionality.

Janusz


P.S. Simon, sorry for duplicated message.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netdev: lockdep classes in register_netdevice Re: [patch 04/13] ppp_generic: fix lockdep warning

2007-05-15 Thread Jarek Poplawski
On Tue, May 15, 2007 at 12:49:47PM +0400, Yuriy N. Shkandybin wrote:
 I've patched 2.6.22-rc1 and there were no warnings from the lock debugger.
 
 Jura

Many thanks, Jura!

It seems reality is sometimes merciful...

On the other hand, I wonder how all this could have stayed unnoticed
for so long: a configuration similar to yours should be quite common.
It looks like some people probably don't care, and so your help
is even more precious!

Best regards,
Jarek P.


[PATCH 1/2] tbf scheduler: TSO support (update 2)

2007-05-15 Thread Hirokazu Takahashi
Hi,

 Hirokazu Takahashi [EMAIL PROTECTED] wrote:
 
  Uhh, you are right.
  skb_shinfo(skb)->gso_segs and skb_shinfo(skb)->gso_size should be used.
 
 Actually forget about gso_segs, it's only filled in for TCP.

I realized it is really hard to determine the actual size of each
packet that will be generated from a TSO packet, which is the size that
should be used to account for the traffic accurately.
There isn't enough information in the socket buffer to determine
the size of the headers: gso_size only gives the maximum length of
a segment without any headers, and the other members don't help
either.

                 split into
 TSO packet     ------------>  packets after being split
+----------+                   +----------+
| headers  |                   | headers  |
+----------+                   +----------+
| segment1 |                   | segment1 |  A
|          |                   |          |  | gso_size
|          |                   |          |  V
+----------+                   +----------+
| segment2 |
|          |                   +----------+
|          |                   | headers  |
+----------+                   +----------+
| segment3 |                   | segment2 |
|          |                   |          |
+----------+                   |          |
                               +----------+

                               +----------+
                               | headers  |
                               +----------+
                               | segment3 |
                               |          |
                               +----------+

So I decided to keep the traffic calculation simple:
   - assume all packets generated from the same TSO packet have
     the same length.
   - ignore the length of the additional headers, which will be
     applied automatically.

It seems to work pretty well for controlling bandwidth, as I expected,
but I'm not sure everybody will be satisfied with it.
Do you think this approximate calculation is enough?
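
The approximation can be worked through with plain integers. The sketch below only mirrors the two expressions used in the patch that follows; est_segs() and est_seg_len() are hypothetical helper names, the arguments stand in for skb->len, gso_segs and gso_size, and skb_len is assumed to be greater than zero.

```c
/* Estimated number of segments, following the expression in the patch. */
static unsigned int est_segs(unsigned int skb_len, unsigned int gso_segs,
			     unsigned int gso_size)
{
	if (gso_segs)			/* already filled in (TCP) */
		return gso_segs;
	if (gso_size)			/* derive it from the segment size */
		return skb_len / gso_size + 1;
	return 1;			/* not a GSO packet */
}

/* Per-segment length: skb_len divided by segs, rounded up (skb_len > 0). */
static unsigned int est_seg_len(unsigned int skb_len, unsigned int segs)
{
	return (skb_len - 1) / segs + 1;
}
```

For a 4500-byte skb with gso_size 1448 and gso_segs unset, this gives 4 segments of 1125 bytes each. A length that is an exact multiple of gso_size, such as 2896, gives 3 segments, which is one more than a ceiling division would give; that is part of what makes the calculation approximate.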


I also realized the CBQ scheduler has to be fixed to handle large
TSO packets, or it may cause an Oops. The next mail contains
the patch for CBQ.



--- linux-2.6.21/net/sched/sch_tbf.c.ORG 2007-05-08 20:59:28.0 +0900
+++ linux-2.6.21/net/sched/sch_tbf.c 2007-05-15 19:59:34.0 +0900
@@ -9,7 +9,8 @@
  * Authors:Alexey Kuznetsov, [EMAIL PROTECTED]
  * Dmitry Torokhov [EMAIL PROTECTED] - allow attaching inner qdiscs -
  *  original idea by Martin Devera
- *
+ * Fixes:
+ * Hirokazu Takahashi [EMAIL PROTECTED] : TSO support
  */
 
 #include <linux/module.h>
@@ -138,8 +139,12 @@ static int tbf_enqueue(struct sk_buff *s
 {
struct tbf_sched_data *q = qdisc_priv(sch);
int ret;
+   //unsigned int segs = skb_shinfo(skb)->gso_segs ? : 1;
+   unsigned int segs = skb_shinfo(skb)->gso_segs ? :
+ skb_shinfo(skb)->gso_size ? skb->len/skb_shinfo(skb)->gso_size + 1 : 1;
+   unsigned int len = (skb->len - 1)/segs + 1;
 
-   if (skb->len > q->max_size) {
+   if (len > q->max_size) {
sch->qstats.drops++;
 #ifdef CONFIG_NET_CLS_POLICE
if (sch->reshape_fail == NULL || sch->reshape_fail(skb, sch))
@@ -204,22 +209,41 @@ static struct sk_buff *tbf_dequeue(struc
psched_time_t now;
long toks, delay;
long ptoks = 0;
-   unsigned int len = skb->len;
+   /*
+* Note: TSO packets will be larger than its actual mtu.
+* These packets should be treated as packets including
+* several ordinary ones. In this case, tokens should
+* be held until it reaches the length of them.
+*
+* To simplify, we assume each segment in a TSO packet
+* has the same length though it may probably not be true.
+* And ignore the length of headers which will be applied
+* to each segment when splitting TSO packets.
+* 
+* The number of segments are calculated from the segment
+* size of TSO packets temporarily if it isn't set.
+*/
+   unsigned int segs = skb_shinfo(skb)->gso_segs ? :
+ skb_shinfo(skb)->gso_size ? skb->len/skb_shinfo(skb)->gso_size + 1 : 1;
+   unsigned int len = (skb->len - 1)/segs + 1;
+   unsigned int expect = L2T(q, len) * segs;
+   long max_toks = max(expect, q->buffer);
+
 
PSCHED_GET_TIME(now);
 
-   toks = PSCHED_TDIFF_SAFE(now, q->t_c, q->buffer);
+   toks = PSCHED_TDIFF_SAFE(now, q->t_c, max_toks);
 
if (q->P_tab) {
ptoks = toks + q->ptokens;
-   if (ptoks > (long)q->mtu)
-   ptoks = q->mtu;
-   ptoks -= L2T_P(q, len);
+   if (ptoks > (long)(q->mtu * segs))
+   

[PATCH 2/2] tbf scheduler: TSO support (update 2)

2007-05-15 Thread Hirokazu Takahashi
Hi,

Without this patch, the CBQ scheduler may cause an Oops with some TSO
packets because the tables passed to L2T() aren't big enough to handle
such huge packets.

Thanks,
Hirokazu Takahashi.
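
To make the failure mode concrete, here is a small userspace model of a length-to-time table lookup. It is a sketch under assumptions, not the kernel's code: the 256-slot table size, the cell_log shift and the names struct rate_table, l2t_slot() and l2t_in_range() are all illustrative, chosen to match how I understand the rate table to be indexed (length shifted right by cell_log).

```c
#include <stddef.h>

#define RTAB_SLOTS 256	/* assumed number of entries in the rate table */

/* Hypothetical stand-in for a qdisc rate table. */
struct rate_table {
	unsigned int cell_log;		/* shift applied to the packet length */
	unsigned int data[RTAB_SLOTS];	/* transmission time per length cell */
};

/* Slot that a length-to-time lookup would index for this length. */
static size_t l2t_slot(const struct rate_table *rt, unsigned int len)
{
	return len >> rt->cell_log;
}

/* Non-zero when the lookup would stay inside the table. */
static int l2t_in_range(const struct rate_table *rt, unsigned int len)
{
	return l2t_slot(rt, len) < RTAB_SLOTS;
}
```

With cell_log set to 3, a 1500-byte packet lands in slot 187, while a 65536-byte TSO packet would index slot 8192, far past the end of the table. Looking up the estimated per-segment length and multiplying by the segment count, as the patch below does, keeps the index in range.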


--- linux-2.6.21/net/sched/sch_cbq.c.ORG 2007-05-14 20:53:06.0 +0900
+++ linux-2.6.21/net/sched/sch_cbq.c 2007-05-15 18:10:43.0 +0900
@@ -176,6 +176,7 @@ struct cbq_sched_data
struct cbq_class *tx_class;
struct cbq_class *tx_borrowed;
int tx_len;
+   unsigned int tx_segs;
psched_time_t now; /* Cached timestamp */
psched_time_t now_rt; /* Cached real time */
unsigned pmask;
@@ -753,6 +754,7 @@ cbq_update(struct cbq_sched_data *q)
struct cbq_class *this = q->tx_class;
struct cbq_class *cl = this;
int len = q->tx_len;
+   unsigned int segs = q->tx_segs;
 
q->tx_class = NULL;
 
@@ -761,7 +763,7 @@ cbq_update(struct cbq_sched_data *q)
long idle;
 
cl->bstats.packets++;
-   cl->bstats.bytes += len;
+   cl->bstats.bytes += len*segs;
 
/*
   (now - last) is total time between packet right edges.
@@ -774,7 +776,7 @@ cbq_update(struct cbq_sched_data *q)
if ((unsigned long)idle > 128*1024*1024) {
avgidle = cl->maxidle;
} else {
-   idle -= L2T(cl, len);
+   idle -= L2T(cl, len) * segs;
 
/* true_avgidle := (1-W)*true_avgidle + W*idle,
   where W=2^{-ewma_log}. But cl->avgidle is scaled:
@@ -811,8 +813,8 @@ cbq_update(struct cbq_sched_data *q)
   to the moment of cbq_update)
 */
 
-   idle -= L2T(q->link, len);
-   idle += L2T(cl, len);
+   idle -= L2T(q->link, len) * segs;
+   idle += L2T(cl, len) * segs;
 
PSCHED_AUDIT_TDIFF(idle);
 
@@ -924,7 +926,9 @@ cbq_dequeue_prio(struct Qdisc *sch, int 
cl->xstats.borrows += skb->len;
 #endif
}
-   q->tx_len = skb->len;
+   q->tx_segs = skb_shinfo(skb)->gso_segs ? :
+ skb_shinfo(skb)->gso_size ? skb->len/skb_shinfo(skb)->gso_size + 1 : 1;
+   q->tx_len = (skb->len - 1)/q->tx_segs + 1;
 
if (cl->deficit <= 0) {
q->active[prio] = cl;
@@ -1013,7 +1017,7 @@ cbq_dequeue(struct Qdisc *sch)
 
   cbq_time = max(real_time, work);
 */
-   incr2 = L2T(q->link, q->tx_len);
+   incr2 = L2T(q->link, q->tx_len) * q->tx_segs;
PSCHED_TADD(q->now, incr2);
cbq_update(q);
if ((incr -= incr2) < 0)



Panic in ieee_80211_ibss_add_sta when trying to join ad-hoc network (rt2500pci)

2007-05-15 Thread David LAMPARTER
Hello,


while trying to get my wireless to work (a Ralink RT2560, as
sold in a Fujitsu-Siemens Amilo A 1630), I've been hitting the
following Panic twice:

BUG: unable to handle kernel NULL pointer derference at virtual address 0218
[...]
EIP is at ieee80211_ibss_add_sta+0xae/0x130
[...]
EIP: [c05773fe] ieee_80211_ibss_add_sta+0xae/0x130 SS:ESP 0068:f641dc38
Kernel panic - not syncing: Fatal exception in interrupt

The bug seems to be triggered as soon as the stack tries to
join my router's ad-hoc network; it happens both directly when
doing "ip l s wlan0 up" and when doing
"iwconfig wlan0 essid equinox" (when it did not immediately
find the network).

Kernel version is 2.6.21-ge42d23f4 (git checkout from
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-dev,
about a few hours old.)

Full information set available at http://celeste.diac24.net/rtpanic/
(includes pictures of the panics, in case I have a typo somewhere)
Requests for more information / patches welcome, but expect delayed
response.

More information attached.


Greetings,

David Lamparter



BUG: unable to handle kernel NULL pointer derference at virtual address 0218
 printing eip:
c05773fe
*pde = 
Oops:  [#1]
PREEMPT
Modules linked in: rt2500pci rt2x00pci rt2x00lib radeon drm
CPU:0
EIP:0060:[c05773fe]Not tainted VLI
EFLAGS: 0010286(2.6.21-ge42d23f3 #8)
EIP is at ieee80211_ibss_add_sta+0xae/0x130
eax: f76292c0   ebx: f78c381c   ecx:    edx: 0102
esi: f6a091a0   edi: f76292c0   ebp: f6bb8000   esp: f641dc38
ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
Process ip (pid: 1621, ti=f641c000 task=f78f8c30 task.ti=f641c000)
Stack: 0020 f782f800  0001 00e3 000f 00ae 00eb
   f6bb8000 f6a091a0 c193e8c0 f78c3822 f6bb83a0 0002 f78c3812 c0569b8a
   f78c381c df82f5ea f7e90458 f7cfea10 0001 f7cfea28 0018 
Call Trace:
 [c0569b8a] __ieee80211_rx+0xa5a/0xc10
 [c0180b5a] dentry_iput+0xda/0x120
 [c056c34f] ieee80211_tasklet_handler+0xaf/0xe0
 [c02f3c9f] _atomic_dec_and_loc+0x2f/0x50
 [c0121c43] tasklet_action+0x33/0x70
 [c0121b72] __do_softirq+0x52/0xa0
 [c0121c05] do_softirq+0x45/0x50
 [c0122023] local_bh_enable+0x53/0xa0
 [c047f74b] dev_mc_upload+0x3b/0x50
 [c047d16c] dev_open+0x5c/0x80
 [c056d017] ieee80211_open+0x317/0x420
 [c0121b86] __do_softirq+0x66/0xa0
 [c047d149] dev_open+0x39/0x80
 [c047b8cc] dev_change_flags+0x5c/0x140
 [c04c3613] devinet_ioctl+0x563/0x6e0
 [c0472310] sock_ioctl+0x0/0x1c0
 [c04723bf] sock_ioctl+0xaf/0x1c0
 [c0472310] sock_ioctl+0x0/0x1c0
 [c017b81b] do_ioctl+0x2b/0x90
 [c017b8dc] vfs_ioctl+0x5c/0x2b0
 [c017bb6d] sys_ioctl+0x3d/0x70
 [c010406e] sysenter_past_esp+0x5f/0x85
 ===
Code: 00 00 00 c7 04 24 5c 09 6d c0 89 44 24 04 e8 fa 5f ba ff 89 d9 89 ea 89 
f0 c7 04 24 20 00 00 00 e8 48 d1 ff ff 85 c0 89 c7 74 95 a1 18 02 00 00 8b 97 
8c 00 00 00 89 f1 89 47 64 8b 87 88 00 00
EIP: [c05773fe] ieee_80211_ibss_add_sta+0xae/0x130 SS:ESP 0068:f641dc38
Kernel panic - not syncing: Fatal exception in interrupt

Linux version 2.6.21-ge42d23f4 ([EMAIL PROTECTED]) (gcc version 4.1.2) #8 
PREEMPT Tue May 15 14:08:04 CEST 2007
00:00.0 Host bridge: Silicon Integrated Systems [SiS] 755 Host (rev 01)
Subsystem: Silicon Integrated Systems [SiS] 755 Host
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium TAbort- 
TAbort- MAbort+ SERR- PERR-
Latency: 32
Region 0: Memory at e000 (32-bit, non-prefetchable) [size=128M]
Capabilities: [a0] AGP version 3.0
Status: RQ=32 Iso- ArqSz=2 Cal=3 SBA+ ITACoh- GART64- HTrans- 
64bit- FW- AGP3+ Rate=x4,x8
Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x8
Capabilities: [d0] HyperTransport: Slave or Primary Interface
!!! Possibly incomplete decoding
Command: BaseUnitID=0 UnitCnt=9 MastHost- DefDir-
Link Control 0: CFlE- CST- CFE- LkFail- Init+ EOC+ TXO- 
CRCErr=0
Link Config 0: MLWI=16bit MLWO=16bit LWI=16bit LWO=16bit
Link Control 1: CFlE- CST- CFE- LkFail+ Init- EOC+ TXO+ 
CRCErr=0
Link Config 1: MLWI=N/C MLWO=N/C LWI=N/C LWO=N/C
Revision ID: 1.02
Capabilities: [f0] HyperTransport: Interrupt Discovery and Configuration

00:01.0 PCI bridge: Silicon Integrated Systems [SiS] SG86C202 (prog-if 00 
[Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B-
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium TAbort- 
TAbort- MAbort- SERR- PERR-
Latency: 64
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
I/O behind bridge: c000-cfff
Memory behind bridge: fea0-feaf
Prefetchable memory behind bridge: ee90-fe8f

Re: [IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed

2007-05-15 Thread Patrick McHardy
Simon Horman wrote:
 On Mon, May 14, 2007 at 07:41:48PM +0200, Patrick McHardy wrote:
 
So you're adding a local route for non-local destination and the
address selection in icmp_send() uses the original destination
address as source because the route has RTCF_LOCAL set, resulting
in an error in ip_route_output_slow().
 
 
 I'm not entirely sure that adding a local route is the right
 terminology, but then again, perhaps I'm misunderstanding exactly
 what that means.

It means adding a route to the local table, which causes the
resulting dst_entry to be marked with RTCF_LOCAL.

 My understanding of the problem is that IPVS likes to send icmp to notify
 end-users when real-servers are down. The source ip of such icmp is the
 VIP, that is the IP address associated with the virtual service.
 However, it is quite valid for this VIP not to be configured on the
 machine that is running IPVS. Thus the machine in question wants to send
 icmp packets with a non-local source address.
 
 http://archive.linuxvirtualserver.org/html/lvs-users/2007-01/msg00109.html
 
 I think that your patch looks good, assuming that inet_addr_type(VIP)
 is going to return RTN_LOCAL (except in the unlikely case that VIP is
 multicast or something silly like that).

I'm not familiar with the IPVS terms, but as far as I understand,
it is _not_ going to return RTN_LOCAL, so we get the desired
behaviour of selecting a local address as source.

 However, I wonder if, for efficiency or safety reasons, it might
 be better for IPVS to pass some sort of OK_ITS_SUPPSED_TO_BE_NON_LOCAL
 flag into ip_route(). 
 
 Just a thought.

I'm not too thrilled about adding a route flag when it really is
ICMP address selection that is the problem here.

The patch should be completely safe since multicast and broadcast
packets are already filtered out earlier and the RTN_LOCAL test
matches exactly what ip_route_output_slow does.
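
The decision being discussed can be modelled in a few lines. This is only a sketch of the behaviour as described in this thread, not the patch itself: enum addr_type and icmp_pick_saddr() are hypothetical names, and the address type is passed in rather than looked up.

```c
#include <stdint.h>

/* Hypothetical model of an address-type lookup result (compare RTN_*). */
enum addr_type { ADDR_UNICAST, ADDR_LOCAL };

/*
 * Choose the source address for an ICMP error.
 * orig_daddr is the destination address of the packet that triggered
 * the error. It is reused as the source only when it is a local
 * address; otherwise 0 is returned, meaning "let the output route
 * lookup pick a local source address".
 */
static uint32_t icmp_pick_saddr(uint32_t orig_daddr, enum addr_type daddr_type)
{
	return daddr_type == ADDR_LOCAL ? orig_daddr : 0;
}
```

The point of testing the address type instead of a flag on the route is that a "local default" route in a policy table marks every destination as local, while the address itself still is not configured on the machine.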


Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread Roland Dreier
   As I said before, getting multiple packets in one call to xmit would
   be nice for amortizing per-xmit overhead in IPoIB.  So it would be
   nice if the cases where the stack does GSO ended up passing all the
   segments into the driver in one go.
  
  Well TCP does up to 64k -- that is what GSO is about.

I see... the plan would be to add NETIF_F_GSO_SOFTWARE to the device
features and use skb_gso_segment() in the netdevice driver?  (I just
studied GSO more carefully -- I hadn't realized that was possible)

I'll have to think about implementing that for IPoIB.  One issue I see
is if I have, say, 4 free entries in my send queue and skb_gso_segment()
gives me back 5 packets to send.  It's not clear I can recover at that
point -- I guess I have to check against gso_segs in the xmit routine
before actually doing the segmentation.
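
The check described above could look roughly like the following. This is a sketch with made-up names (struct send_queue, reserve_for_gso(), XMIT_OK and XMIT_BUSY are all hypothetical, not the IPoIB driver's structures), and it assumes gso_segs is filled in for the packet being sent.

```c
/* Hypothetical send-queue state; not the IPoIB driver's real structure. */
struct send_queue {
	unsigned int size;	/* total entries */
	unsigned int used;	/* entries currently posted */
};

enum xmit_result { XMIT_OK = 0, XMIT_BUSY = 1 };

/*
 * Reserve one entry per segment before any segmentation is done.
 * If the packet does not fit, nothing is reserved, so the caller can
 * stop the queue and retry later without having to undo anything.
 */
static enum xmit_result reserve_for_gso(struct send_queue *q,
					unsigned int gso_segs)
{
	unsigned int needed = gso_segs ? gso_segs : 1;

	if (q->size - q->used < needed)
		return XMIT_BUSY;
	q->used += needed;
	return XMIT_OK;
}
```

With 4 free entries, a 5-segment packet is refused up front and the queue state is left untouched, which avoids the unrecoverable case of being handed 5 packets after segmentation.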

 - R.


Re: [patch 1/1] icom: Add new sub-device-id to support new adapter

2007-05-15 Thread wendy xiong
Hi Andrew,

I have tested this with the new adapter on our systems. I haven't received
any comments since I sent it out last Wednesday.

Could you help me with this patch?

Thank you very much!
Wendy

On Wed, 2007-05-09 at 17:36 -0500, wendy xiong wrote:
 This patch adds a new sub-device-id to support a new adapter and changes the
 interrupt irq number from unsigned char to unsigned int.
 
 Signed-off-by: Wendy Xiong [EMAIL PROTECTED]
 
 
 diff -Nuar linux-2.6.21-rc7.orig/drivers/serial/icom.c linux-2.6.21-rc7.new/drivers/serial/icom.c
 --- linux-2.6.21-rc7.orig/drivers/serial/icom.c 2008-01-10 23:53:59.0 -0600
 +++ linux-2.6.21-rc7.new/drivers/serial/icom.c  2008-01-10 23:58:30.0 -0600
 @@ -97,6 +97,13 @@
   .subdevice = PCI_DEVICE_ID_IBM_ICOM_FOUR_PORT_MODEL,
   .driver_data = ADAPTER_V2,
  },
 +   {
 + .vendor = PCI_VENDOR_ID_IBM,
 + .device = PCI_DEVICE_ID_IBM_ICOM_DEV_ID_2,
 + .subvendor = PCI_VENDOR_ID_IBM,
 + .subdevice = PCI_DEVICE_ID_IBM_ICOM_V2_ONE_PORT_RVX_ONE_PORT_MDM_PCIE,
 + .driver_data = ADAPTER_V2,
 +},
 {}
  };
 
 diff -Nuar linux-2.6.21-rc7.orig/drivers/serial/icom.h linux-2.6.21-rc7.new/drivers/serial/icom.h
 --- linux-2.6.21-rc7.orig/drivers/serial/icom.h 2008-01-10 23:53:59.0 -0600
 +++ linux-2.6.21-rc7.new/drivers/serial/icom.h  2008-01-10 23:55:42.0 -0600
 @@ -258,7 +258,7 @@
  struct icom_adapter {
 void __iomem * base_addr;
 unsigned long base_addr_pci;
 -   unsigned char irq_number;
 +   unsigned int irq_number;
 struct pci_dev *pci_dev;
 struct icom_port port_info[4];
 int index;
 diff -Nuar linux-2.6.21-rc7.orig/include/linux/pci_ids.h linux-2.6.21-rc7.new/include/linux/pci_ids.h
 --- linux-2.6.21-rc7.orig/include/linux/pci_ids.h   2008-01-10 23:54:13.0 -0600
 +++ linux-2.6.21-rc7.new/include/linux/pci_ids.h    2008-01-10 23:59:08.0 -0600
 @@ -471,6 +471,7 @@
  #define PCI_DEVICE_ID_IBM_ICOM_DEV_ID_2 0x0219
  #define PCI_DEVICE_ID_IBM_ICOM_V2_TWO_PORTS_RVX 0x021A
  #define PCI_DEVICE_ID_IBM_ICOM_V2_ONE_PORT_RVX_ONE_PORT_MDM 0x0251
 +#define PCI_DEVICE_ID_IBM_ICOM_V2_ONE_PORT_RVX_ONE_PORT_MDM_PCIE   0x0361
  #define PCI_DEVICE_ID_IBM_ICOM_FOUR_PORT_MODEL 0x252
 
 


Re: [patch 1/1] icom: Add new sub-device-id to support new adapter

2007-05-15 Thread Andrew Morton
On Tue, 15 May 2007 11:29:15 -0500 wendy xiong [EMAIL PROTECTED] wrote:

 I have tested this with the new adapter on our systems. I haven't received
 any comments since I sent it out last Wednesday.
 
 Could you help me with this patch?

You sent it to the wrong mailing list: netdev doesn't handle serial drivers.
I don't normally troll netdev for missed patches.

Please send miscellaneous patches to linux-kernel.

Your email client is replacing tabs with spaces - I fixed that up.

The undersized irq number bug was already fixed.

Thanks for the patch.


select(0, ..) is valid ?

2007-05-15 Thread Badari Pulavarty
Hi,

Is select(0, ..) a valid operation?

I see that there is no check to prevent this or return
success early, without doing any work. Do we need one ?

slub code is complaining that we are doing kmalloc(0).
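
For context, userspace does issue this call: select() with nfds set to 0 and no descriptor sets is a long-standing portable way to sleep with sub-second resolution. A minimal sketch follows; sleep_with_select() is a hypothetical helper name.

```c
#include <stddef.h>
#include <sys/select.h>

/* Sleep for usec microseconds using select() with no descriptors. */
static int sleep_with_select(long usec)
{
	struct timeval tv;

	tv.tv_sec = usec / 1000000;
	tv.tv_usec = usec % 1000000;
	/* nfds == 0 and all three sets NULL: only the timeout is used. */
	return select(0, NULL, NULL, NULL, &tv);
}
```

The call returns 0 once the timeout expires, or -1 with errno set if it is interrupted by a signal, so whatever the kernel does for nfds == 0 has to keep succeeding.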

Thanks,
Badari

[ cut here ]
Badness at include/linux/slub_def.h:88
Call Trace:
[c001e4eb7640] [c000e650] .show_stack+0x68/0x1b0 (unreliable)
[c001e4eb76e0] [c029b854] .report_bug+0x94/0xe8
[c001e4eb7770] [c00219f0] .program_check_exception+0x12c/0x568
[c001e4eb77f0] [c0004a84] program_check_common+0x104/0x180
--- Exception: 700 at .get_slab+0x4c/0x234
LR = .__kmalloc+0x24/0xc4
[c001e4eb7ae0] [c001e4eb7b80] 0xc001e4eb7b80 (unreliable)
[c001e4eb7b80] [c00a7ff0] .__kmalloc+0x24/0xc4
[c001e4eb7c10] [c00ea720] .compat_core_sys_select+0x90/0x240
[c001e4eb7d00] [c00ec3a4] .compat_sys_select+0xb0/0x190
[c001e4eb7dc0] [c0014944] .ppc32_select+0x14/0x28
[c001e4eb7e30] [c000872c] syscall_exit+0x0/0x40




Re: Panic in ieee_80211_ibss_add_sta when trying to join ad-hoc network (rt2500pci)

2007-05-15 Thread John W. Linville
On Tue, May 15, 2007 at 05:28:42PM +0200, David LAMPARTER wrote:

 BUG: unable to handle kernel NULL pointer dereference at virtual address 0218
 [...]
 EIP is at ieee80211_ibss_add_sta+0xae/0x130
 [...]
 EIP: [c05773fe] ieee80211_ibss_add_sta+0xae/0x130 SS:ESP 0068:f641dc38
 Kernel panic - not syncing: Fatal exception in interrupt
 
 The bug seems to be triggered as soon as the stack tries to
 join my router's ad-hoc network; it happens both directly when
 doing ip l s wlan0 up and when doing
 iwconfig wlan0 essid equinox (when it did not immediately
 find the network).

Probably because of this:

struct ieee80211_sub_if_data *sdata = NULL;
...
sta->supp_rates = sdata->u.sta.supp_rates_bits;

Patch below...does this work better?  Looks like upstream needs
it too...

John

---
Avoid sdata null pointer dereference in ieee80211_ibss_add_sta.

Signed-off-by: John W. Linville [EMAIL PROTECTED]
---

 net/mac80211/ieee80211_sta.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/mac80211/ieee80211_sta.c b/net/mac80211/ieee80211_sta.c
index a36c6f3..dd36cc6 100644
--- a/net/mac80211/ieee80211_sta.c
+++ b/net/mac80211/ieee80211_sta.c
@@ -3154,7 +3154,7 @@ struct sta_info * ieee80211_ibss_add_sta(struct net_device *dev,
 {
struct ieee80211_local *local = wdev_priv(dev->ieee80211_ptr);
struct sta_info *sta;
-   struct ieee80211_sub_if_data *sdata = NULL;
+   struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
 
/* TODO: Could consider removing the least recently used entry and
 * allow new one to be added. */

-- 
John W. Linville
[EMAIL PROTECTED]


Re: select(0, ..) is valid ?

2007-05-15 Thread Mark Glines
On Tue, 15 May 2007 10:29:18 -0700
Badari Pulavarty [EMAIL PROTECTED] wrote:

 Hi,
 
 Is select(0, ..) a valid operation?

select(0, ..) is rather commonly used as a portable sleep() with
microsecond granularity.  Disabling it will break lots of things.

Mark
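
For illustration, the sleep idiom described above can be sketched in user space roughly as follows. This is a minimal sketch, not code from the thread: the helper name sleep_us is hypothetical, and it assumes a POSIX system where select() accepts nfds == 0 with all three descriptor sets NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Sleep for roughly `usec` microseconds using select() with no descriptor
 * sets: nfds is 0 and all three fd_set pointers are NULL, so only the
 * timeout (or a signal) can end the call.
 * Returns 0 when the timeout expired, -1 on error (errno is set, for
 * example to EINTR when a signal interrupted the wait).
 */
static int sleep_us(long usec)
{
	struct timeval tv;

	assert(usec >= 0);
	tv.tv_sec = usec / 1000000L;
	tv.tv_usec = usec % 1000000L;
	return select(0, NULL, NULL, NULL, &tv);
}
```

A caller that must sleep for the full interval would typically loop on EINTR; that is left out here to keep the sketch short.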


Re: select(0, ..) is valid ?

2007-05-15 Thread Jiri Slaby
Badari Pulavarty napsal(a):
 Hi,
 
 Is select(0, ..) a valid operation?

Yes, it was (is) sometimes used for measuring (sleeping for) short time slices.

regards,
-- 
http://www.fi.muni.cz/~xslaby/Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
 B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E


Re: select(0, ..) is valid ?

2007-05-15 Thread Alan Cox
On Tue, 15 May 2007 10:29:18 -0700
Badari Pulavarty [EMAIL PROTECTED] wrote:

 Hi,
 
 Is select(0, ..) a valid operation?

Yes. It's a fairly classic old BSD way to do timeouts

Alan


Re: select(0, ..) is valid ?

2007-05-15 Thread Andrew Morton
On Tue, 15 May 2007 10:29:18 -0700
Badari Pulavarty [EMAIL PROTECTED] wrote:

 Hi,
 
 Is select(0, ..) a valid operation?

Probably - it becomes an elaborate way of doing a sleep.  Whatever - we
used to permit it without error, so we should continue to do so.

 I see that there is no check to prevent this or return
 success early, without doing any work. Do we need one ?
 
 slub code is complaining that we are doing kmalloc(0).
 
 [ cut here ]
 Badness at include/linux/slub_def.h:88
 Call Trace:
 [c001e4eb7640] [c000e650] .show_stack+0x68/0x1b0 (unreliable)
 [c001e4eb76e0] [c029b854] .report_bug+0x94/0xe8
 [c001e4eb7770] [c00219f0] .program_check_exception+0x12c/0x568
 [c001e4eb77f0] [c0004a84] program_check_common+0x104/0x180
 --- Exception: 700 at .get_slab+0x4c/0x234
 LR = .__kmalloc+0x24/0xc4
 [c001e4eb7ae0] [c001e4eb7b80] 0xc001e4eb7b80 (unreliable)
 [c001e4eb7b80] [c00a7ff0] .__kmalloc+0x24/0xc4
 [c001e4eb7c10] [c00ea720] .compat_core_sys_select+0x90/0x240
 [c001e4eb7d00] [c00ec3a4] .compat_sys_select+0xb0/0x190
 [c001e4eb7dc0] [c0014944] .ppc32_select+0x14/0x28
 [c001e4eb7e30] [c000872c] syscall_exit+0x0/0x40


I _think_ we can just do

--- a/fs/compat.c~a
+++ a/fs/compat.c
@@ -1566,9 +1566,13 @@ int compat_core_sys_select(int n, compat
 */
ret = -ENOMEM;
size = FDS_BYTES(n);
-   bits = kmalloc(6 * size, GFP_KERNEL);
-   if (!bits)
-   goto out_nofds;
+   if (likely(size)) {
+   bits = kmalloc(6 * size, GFP_KERNEL);
+   if (!bits)
+   goto out_nofds;
+   } else {
+   bits = NULL;
+   }
fds.in  = (unsigned long *)  bits;
fds.out = (unsigned long *) (bits +   size);
fds.ex  = (unsigned long *) (bits + 2*size);
_

I mean, if that oopses then I'd be very interested in finding out why.

But I'm starting to suspect that it would be better to permit kmalloc(0) in
slub.  It depends on how many more of these things need fixing.

otoh, a kmalloc(0) could be a sign of some buggy/inefficient/weird code, so
there's some value in forcing us to go look at all the callsites.
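
The idea in the patch above (skip the allocation entirely when the computed size is zero, and treat that as success rather than as -ENOMEM) can be modelled in user space as below. This is a hedged sketch only: malloc() stands in for kmalloc(), the SK_* macros and sk_alloc_fdsets() are hypothetical names, and only three of the six bitmaps are wired up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative stand-ins for the kernel's FDS_* size macros. */
#define SK_BITS_PER_LONG (8 * sizeof(long))
#define SK_FDS_LONGS(nr) (((nr) + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)
#define SK_FDS_BYTES(nr) (SK_FDS_LONGS(nr) * sizeof(long))

struct sk_fdsets {
	unsigned long *in, *out, *ex;
};

/*
 * Set up bitmap pointers for n descriptors.
 * Returns 0 on success and -ENOMEM if the allocation failed.
 * For n == 0 nothing is allocated: every pointer is NULL and the call
 * still succeeds, so "no bits needed" is not confused with "no memory".
 */
static int sk_alloc_fdsets(int n, struct sk_fdsets *fds, void **bits_out)
{
	size_t size;
	char *bits;

	assert(n >= 0);
	size = SK_FDS_BYTES((size_t)n);
	if (size == 0) {
		fds->in = fds->out = fds->ex = NULL;
		*bits_out = NULL;
		return 0;
	}

	bits = malloc(6 * size);	/* room for six bitmaps, as in the patch */
	if (!bits)
		return -ENOMEM;

	fds->in  = (unsigned long *)bits;
	fds->out = (unsigned long *)(bits + size);
	fds->ex  = (unsigned long *)(bits + 2 * size);
	*bits_out = bits;		/* caller frees this */
	return 0;
}
```

Unlike the patch, the sketch returns early for size zero instead of doing pointer arithmetic on a NULL base, which keeps it strictly within what ISO C defines.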



Re: select(0, ..) is valid ?

2007-05-15 Thread H. Peter Anvin
Badari Pulavarty wrote:
 Hi,
 
 Is select(0, ..) a valid operation?
 
 I see that there is no check to prevent this or return
 success early, without doing any work. Do we need one ?
 
 slub code is complaining that we are doing kmalloc(0).
 

select(0, ...) is valid, and is functionally equivalent to
select(..., NULL, NULL, NULL, ...); except that any nonzero fdsets get
zeroed on return.  As such, the only thing that can interrupt it is the
timeout, or a signal.

-hpa


Re: select(0, ..) is valid ?

2007-05-15 Thread Badari Pulavarty
On Tue, 2007-05-15 at 10:44 -0700, Andrew Morton wrote:
 On Tue, 15 May 2007 10:29:18 -0700
 Badari Pulavarty [EMAIL PROTECTED] wrote:
 
  Hi,
  
  Is select(0, ..) a valid operation?
 
 Probably - it becomes an elaborate way of doing a sleep.  Whatever - we
 used to permit it without error, so we should continue to do so.

Okay.

 
  I see that there is no check to prevent this or return
  success early, without doing any work. Do we need one ?
  
  slub code is complaining that we are doing kmalloc(0).
  
  [ cut here ]
  Badness at include/linux/slub_def.h:88
  Call Trace:
  [c001e4eb7640] [c000e650] .show_stack+0x68/0x1b0 (unreliable)
  [c001e4eb76e0] [c029b854] .report_bug+0x94/0xe8
  [c001e4eb7770] [c00219f0] .program_check_exception+0x12c/0x568
  [c001e4eb77f0] [c0004a84] program_check_common+0x104/0x180
  --- Exception: 700 at .get_slab+0x4c/0x234
  LR = .__kmalloc+0x24/0xc4
  [c001e4eb7ae0] [c001e4eb7b80] 0xc001e4eb7b80 (unreliable)
  [c001e4eb7b80] [c00a7ff0] .__kmalloc+0x24/0xc4
  [c001e4eb7c10] [c00ea720] .compat_core_sys_select+0x90/0x240
  [c001e4eb7d00] [c00ec3a4] .compat_sys_select+0xb0/0x190
  [c001e4eb7dc0] [c0014944] .ppc32_select+0x14/0x28
  [c001e4eb7e30] [c000872c] syscall_exit+0x0/0x40
 
 
 I _think_ we can just do
 
 --- a/fs/compat.c~a
 +++ a/fs/compat.c
 @@ -1566,9 +1566,13 @@ int compat_core_sys_select(int n, compat
*/
   ret = -ENOMEM;
   size = FDS_BYTES(n);
 - bits = kmalloc(6 * size, GFP_KERNEL);
 - if (!bits)
 - goto out_nofds;
 + if (likely(size)) {
 + bits = kmalloc(6 * size, GFP_KERNEL);
 + if (!bits)
 + goto out_nofds;
 + } else {
 + bits = NULL;
 + }
   fds.in  = (unsigned long *)  bits;
   fds.out = (unsigned long *) (bits +   size);
   fds.ex  = (unsigned long *) (bits + 2*size);
 _


Yes. This is what I did earlier, but then I was wondering if I
could skip the whole operation and bail out early (if n == 0). 
I guess not.

 I mean, if that oopses then I'd be very interested in finding out why.
 
 But I'm starting to suspect that it would be better to permit kmalloc(0) in
 slub.  It depends on how many more of these things need fixing.
 
 otoh, a kmalloc(0) could be a sign of some buggy/inefficient/weird code, so
 there's some value in forcing us to go look at all the callsites.

So far, I haven't found any others. Let's leave the check.

Thanks,
Badari



Re: select(0, ..) is valid ?

2007-05-15 Thread Christoph Lameter
On Tue, 15 May 2007, Andrew Morton wrote:

 I _think_ we can just do
 
 --- a/fs/compat.c~a
 +++ a/fs/compat.c
 @@ -1566,9 +1566,13 @@ int compat_core_sys_select(int n, compat
*/
   ret = -ENOMEM;
   size = FDS_BYTES(n);
 - bits = kmalloc(6 * size, GFP_KERNEL);
 - if (!bits)
 - goto out_nofds;
 + if (likely(size)) {
 + bits = kmalloc(6 * size, GFP_KERNEL);
 + if (!bits)
 + goto out_nofds;
 + } else {
 + bits = NULL;
 + }
   fds.in  = (unsigned long *)  bits;
   fds.out = (unsigned long *) (bits +   size);
   fds.ex  = (unsigned long *) (bits + 2*size);
 _
 
 I mean, if that oopses then I'd be very interested in finding out why.
 
 But I'm starting to suspect that it would be better to permit kmalloc(0) in
 slub.  It depends on how many more of these things need fixing.
 
 otoh, a kmalloc(0) could be a sign of some buggy/inefficient/weird code, so
 there's some value in forcing us to go look at all the callsites.
 
Hmmm... We could have kmalloc(0) return a pointer to the zero page? That 
would catch any writers?


Re: select(0, ..) is valid ?

2007-05-15 Thread Andrew Morton
On Tue, 15 May 2007 11:10:22 -0700 (PDT)
Christoph Lameter [EMAIL PROTECTED] wrote:

 On Tue, 15 May 2007, Andrew Morton wrote:
 
  I _think_ we can just do
  
  --- a/fs/compat.c~a
  +++ a/fs/compat.c
  @@ -1566,9 +1566,13 @@ int compat_core_sys_select(int n, compat
   */
  ret = -ENOMEM;
  size = FDS_BYTES(n);
  -   bits = kmalloc(6 * size, GFP_KERNEL);
  -   if (!bits)
  -   goto out_nofds;
  +   if (likely(size)) {
  +   bits = kmalloc(6 * size, GFP_KERNEL);
  +   if (!bits)
  +   goto out_nofds;
  +   } else {
  +   bits = NULL;
  +   }
  fds.in  = (unsigned long *)  bits;
  fds.out = (unsigned long *) (bits +   size);
  fds.ex  = (unsigned long *) (bits + 2*size);
  _
  
  I mean, if that oopses then I'd be very interested in finding out why.
  
  But I'm starting to suspect that it would be better to permit kmalloc(0) in
  slub.  It depends on how many more of these things need fixing.
  
  otoh, a kmalloc(0) could be a sign of some buggy/inefficient/weird code, so
  there's some value in forcing us to go look at all the callsites.
  
 Hmmm... We could have kmalloc(0) return a pointer to the zero page? That 
 would catch any writers?

Returning NULL would have the same effect..

But the problem is that we won't get 100% coverage of all codepaths
for ages, so any oopses we added won't get found.

otoh, any code which does dereference that pointer is buggy anyway.

The problem here is that code which does

kmalloc(some-expression-which-returns-0)

will go and assume that the kmalloc(0) got an ENOMEM and it'll take the
error path.

Oh well, let's persist with things as they now are.

Perhaps putting a size=0 detector into slab also would speed this
process up.


Re: select(0, ..) is valid ?

2007-05-15 Thread Christoph Lameter
On Tue, 15 May 2007, Andrew Morton wrote:

 Perhaps putting a size=0 detector into slab also would speed this
 process up.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

Index: linux-2.6/mm/slab.c
===
--- linux-2.6.orig/mm/slab.c2007-05-15 11:32:25.0 -0700
+++ linux-2.6/mm/slab.c 2007-05-15 11:35:55.0 -0700
@@ -792,6 +792,7 @@ static inline struct kmem_cache *__find_
 */
BUG_ON(malloc_sizes[INDEX_AC].cs_cachep == NULL);
 #endif
+   WARN_ON_ONCE(size == 0);
while (size > csizep->cs_size)
csizep++;
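
As a rough user-space analogue of the warn-once detector in this patch, one might write something like the following. It is a sketch under stated assumptions: checked_malloc() and zero_size_warnings are hypothetical names, the warning goes to stderr instead of the kernel log, and the once-only flag is not thread-safe.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Number of warnings actually printed; exposed so callers can inspect it. */
static int zero_size_warnings;

/*
 * malloc() wrapper that reports a zero-size request, but only the first
 * time one is seen, so a hot call site cannot flood the log.
 */
static void *checked_malloc(size_t size)
{
	static int warned;

	if (size == 0 && !warned) {
		warned = 1;
		zero_size_warnings++;
		fprintf(stderr, "warning: zero-size allocation requested\n");
	}
	return malloc(size);	/* malloc(0) may return NULL or a unique pointer */
}
```

Warning once trades completeness for noise: it points at the first offending call site, which is enough to start an audit, but later offenders stay silent until the first one is fixed.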
 


Re: Panic in ieee_80211_ibss_add_sta when trying to join ad-hoc network (rt2500pci)

2007-05-15 Thread Michael Wu
On Tuesday 15 May 2007 13:12, John W. Linville wrote:
 Patch below...does this work better?  Looks like upstream needs
 it too...

ACK. Looks like I forgot to set sdata after removing the code that set it.

Thanks,
-Michael Wu




mac80211 ad-hoc: carrier not set up [was: Panic in ieee_80211_ibss_add_sta]

2007-05-15 Thread David Lamparter
On Tue, May 15, 2007 at 01:12:02PM -0400, John W. Linville wrote:
 Patch below...does this work better?  Looks like upstream needs
 it too...

Yup, this fixes it. Thanks for the quick fix.

However, ad-hoc still does not work, since the network device's
carrier status does not seem to be properly set. (It remains
in NO-CARRIER even after "wlan0: Selected IBSS BSSID
92:68:a2:db:de:45 based on configured SSID".) I dirtily hacked
around that with the following two-liner:

--- wireless-dev/net/mac80211/ieee80211_sta.c.orig  2007-05-15 20:19:55.0 +0200
+++ wireless-dev/net/mac80211/ieee80211_sta.c   2007-05-15 21:19:38.362587215 +0200
@@ -2448,6 +2448,7 @@
mod_timer(&ifsta->timer, jiffies + IEEE80211_IBSS_MERGE_INTERVAL);
 
ieee80211_rx_bss_put(dev, bss);
+   netif_carrier_on(dev);
 
return res;
 }
@@ -2648,6 +2649,7 @@
 
ifsta->ssid_set = len ? 1 : 0;
if (sdata->type == IEEE80211_IF_TYPE_IBSS && !ifsta->bssid_set) {
+   netif_carrier_off(dev);
ifsta->ibss_join_req = jiffies;
ifsta->state = IEEE80211_IBSS_SEARCH;
return ieee80211_sta_find_ibss(dev, ifsta);


However, I have NO CLUE WHAT I'M DOING THERE! Make a proper fix!
(Especially, I think it needs more netif_carrier_off calls in
different places.)


Anyway, thanks for my now-working wireless,

David Lamparter



Re: [PATCH 15/17] sky2: only disable 88e8056 on some boards

2007-05-15 Thread kernel
Hello, Stephen,

 RE: sky2 88e8056 Gigabyte GA-965GM-S2 uATX motherboard

A couple of days ago I posted my observation that this was a hardware problem. My
boards are now working!

I contacted GIGABYTE motherboard support through their web page. They replied with
an EEPROM program for the Marvell Yukon ethernet chip.
Once the EEPROM was reprogrammed, the diskless system on my bench started
working and has for the past 24 hours taken a beating on the NFS mount as well
as receiving a ping flood from another node.

I am not using the sky2 driver, but rather sk98lin from Marvell's website. It
was a last gasp before buying PCI ethernet cards. Marvell's sk98lin was no
better than sky2 before the EEPROM reprogramming. My observation is that
sky2/v1.10 still does not work after the reprogramming.
I will grab your latest sky2 patches and report results.

Here are the details:
-
Suse 10.0 
2.6.21 i386
sk98lin: Network Device Driver v10.0.5.3
(download from Marvell's webpage)

Marvell Yukon chip 88E8056 reprogrammed with files provided by GIGABYTE 
MOTHERBOARD manufacturer:
1024 Feb  9 08:51 GBT5614n.raw
  24 May 16 00:23 VPD.BAT
  93 Mar  2 13:08 eep.bat
  155729 Oct 19  2006 mac.exe
  224533 Oct 31  2006 yukonvpd.exe

Contents of the eep.bat file:
-
@echo off
@del vpd.bat
mac
yukonvpd -P GBT5614n.raw 
yukonvpd -u 1458E000
call vpd.bat

Contents of the VPD.BAT file:
-
YUKONVPD -M 0016E6FF
(I replace the full MAC with FF FF FF)

More as it happens,
-Rob



 RE: sky2 88e8056 Gigabyte GA-965GM-S2 uATX motherboard

 I have now too many of these Gigabyte mobos, GA-965GM-S2 with the Marvell 
 88e8056
 www.gigabyte.com.tw / Products / Motherboard / 
 Products_Overview.aspx?ProductID=2388
 
 Observations.
 -
 The problem may not be sky2 specific. I have diskless nodes where too many 
 can NOT 
 1) reliably DHCP an IP number from the server 
 2) and if the pxelinux.cfg/default loads, it is not certain if the kernel or 
 the initrd would tftp transfer.
 
 There are two of 12 diskless nodes that work, ... always work.
 All mobos are the same, all CMOS is set identically
 



On Mon, May 14, 2007 at 12:55:38PM -0700, Stephen Hemminger wrote:
 On Mon, 14 May 2007 00:53:42 -0400
 Florin Malita [EMAIL PROTECTED] wrote:
 
  Hi Stephen,
  
  Stephen Hemminger wrote:
   Use DMI to add a blacklist of broken board. For now only one is known
   bad. Gentoo users report driver works on other motherboards (strange). 
  [snip]
   +   .ident = "Gigabyte 965P-S3",
   +   .matches = {
   +   DMI_MATCH(DMI_SYS_VENDOR, "Gigabyte Technology Co., Ltd."),
   +   DMI_MATCH(DMI_PRODUCT_NAME, "965P-S3"),
  
  Actually, I've been using sky2 with a 965P-S3 for a couple of months 
  (x86_64 kernel) and as far as I can tell it works like a charm. Recently 
  I had to hack around the blacklisting but other than that I haven't 
  noticed anything strange.
  
  What failures are you trying to prevent? Would a warning (instead of 
  blacklisting) be acceptable?
  
  Thanks,
  Florin
 
 What happens on my system is that the chip is accessing some unknown
 memory location when it reads the descriptors. This leads to:
   * Transmit descriptor errors because the transmit descriptor doesn't
 have the Owner bit set. The list is fine, and all the barriers
 are there; it seems like the chip's read of memory is getting crap.
   * TSO errors (probably same problem as before)
   * Receive packets with no data. The stack ends up ignoring the 
 garbage; but since we reuse the memory the DMA can/will happen
 later and cause random memory corruption.
 
 Overall it looks like a PCI synchronization problem. Possible differences
 between working/non-working are:
   * BIOS, tried up to the latest beta version with no change
   * Memory, switched to name brand DDR2 800 (2G)
   * MSI
   * AHCI/SATA, I am using Raptor with AHCI when booted with i386 on old
 IDE drive saw no problems
  
 


[PATCH] don't put multicasts with mc_ttl=0 on the wire

2007-05-15 Thread akepner


A colleague of mine found that multicasts with a ttl of 0
can be sent on the wire. This happens if the sender doesn't
belong to the destination multicast group.

With the following the multicast ttl is respected whether
or not the sender belongs to the destination multicast group.

Signed-off-by: Arthur Kepner [EMAIL PROTECTED]

---

 net/ipv4/route.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index cb76e3c..bf25cf5 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2249,8 +2249,7 @@ static inline int __mkroute_output(struct rtable **result,
}
if (flags & (RTCF_BROADCAST | RTCF_MULTICAST)) {
rth->rt_spec_dst = fl->fl4_src;
-   if (flags & RTCF_LOCAL &&
-   !(dev_out->flags & IFF_LOOPBACK)) {
+   if (!(dev_out->flags & IFF_LOOPBACK)) {
rth->u.dst.output = ip_mc_output;
RT_CACHE_STAT_INC(out_slow_mc);
}

-- 
Arthur



Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread David Miller
From: Roland Dreier [EMAIL PROTECTED]
Date: Tue, 15 May 2007 09:25:28 -0700

 I'll have to think about implementing that for IPoIB.  One issue I see
 is if I have, say, 4 free entries in my send queue and skb_gso_segment()
 gives me back 5 packets to send.  It's not clear I can recover at that
 point -- I guess I have to check against gso_segs in the xmit routine
 before actually doing the segmentation.

I'd suggest adding a fudge factor to your free TX space, which
is advisable anyways so that when TX is woken up, more of
the transfer from queue to device can happen in a batch-like
fashion.


[PATCH] mac80211: avoid null ptr deref in ieee80211_ibss_add_sta

2007-05-15 Thread John W. Linville
avoid sdata null pointer dereference in ieee80211_ibss_add_sta.

Signed-off-by: John W. Linville [EMAIL PROTECTED]
---
 net/mac80211/ieee80211_sta.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/mac80211/ieee80211_sta.c b/net/mac80211/ieee80211_sta.c
index a36c6f3..dd36cc6 100644
--- a/net/mac80211/ieee80211_sta.c
+++ b/net/mac80211/ieee80211_sta.c
@@ -3154,7 +3154,7 @@ struct sta_info * ieee80211_ibss_add_sta(struct net_device *dev,
 {
struct ieee80211_local *local = wdev_priv(dev->ieee80211_ptr);
struct sta_info *sta;
-   struct ieee80211_sub_if_data *sdata = NULL;
+   struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
 
/* TODO: Could consider removing the least recently used entry and
 * allow new one to be added. */
-- 
1.5.0.6

-- 
John W. Linville
[EMAIL PROTECTED]


Re: [PATCH] don't put multicasts with mc_ttl=0 on the wire

2007-05-15 Thread David Miller
From: [EMAIL PROTECTED]
Date: Tue, 15 May 2007 12:56:02 -0700

 A colleague of mine found that multicasts with a ttl of 0
 can be sent on the wire. This happens if the sender doesn't
 belong to the destination multicast group.
 
 With the following the multicast ttl is respected whether
 or not the sender belongs to the destination multicast group.
 
 Signed-off-by: Arthur Kepner [EMAIL PROTECTED]

This is actually used by some things if I remember correctly.
See this comment and code in net/ipv4/route.c:

/* Special hack: user can direct multicasts
   and limited broadcast via necessary interface
   without fiddling with IP_MULTICAST_IF or IP_PKTINFO.
   This hack is not just for fun, it allows
   vic,vat and friends to work.
   They bind socket to loopback, set ttl to zero
   and expect that it will work.
   From the viewpoint of routing cache they are broken,
   because we are not allowed to build multicast path
   with loopback source addr (look, routing cache
   cannot know, that ttl is zero, so that packet
   will not leave this host and route is valid).
   Luckily, this hack is good workaround.
 */

fl.oif = dev_out->ifindex;
goto make_route;
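
For readers unfamiliar with the application side of this, the zero-TTL setup mentioned in the comment can be sketched from user space as below. This is only an illustration under assumptions: the helper names are hypothetical, error handling is minimal, and it shows just the socket option, not the bind to loopback or the actual send.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a UDP socket and set its multicast TTL.
 * A TTL of 0 asks that multicast datagrams never leave the sending host.
 * Returns the descriptor, or -1 on failure.
 */
static int open_mcast_socket(unsigned char ttl)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Read the multicast TTL back; returns it as an int, or -1 on failure. */
static int get_mcast_ttl(int fd)
{
	unsigned char ttl = 0xff;
	socklen_t len = sizeof(ttl);

	if (getsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, &len) < 0)
		return -1;
	return ttl;
}
```

The option value is passed as an unsigned char because that form is accepted both on Linux and on BSD-derived stacks; Linux also accepts an int.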


Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread Roland Dreier
   I'll have to think about implementing that for IPoIB.  One issue I see
   is if I have, say, 4 free entries in my send queue and skb_gso_segment()
   gives me back 5 packets to send.  It's not clear I can recover at that
   point -- I guess I have to check against gso_segs in the xmit routine
   before actually doing the segmentation.
  
  I'd suggest adding a fudge factor to your free TX space, which
  is advisable anyways so that when TX is woken up, more of
  the transfer from queue to device can happen in a batch-like
  fashion.

Well, IPoIB doesn't do netif_wake_queue() until half the device's TX
queue is free, so we should get batching.  However, I'm not sure that
I can count on a fudge factor ensuring that there's enough space to
handle everything skb_gso_segment() gives me -- is there any reliable
way to get an upper bound on how many segments a given gso skb will
use when it's segmented?

 - R.
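
The wake-at-half-free behaviour described above can be modelled with a toy ring like the one below. It is a sketch with hypothetical names (struct tx_ring, tx_post, tx_complete), not the IPoIB code: the queue stops when the ring fills and is woken only once at least half of the entries are free again, which is what produces the batching.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a transmit ring with a stop/wake hysteresis. */
struct tx_ring {
	unsigned size;	/* total entries */
	unsigned used;	/* entries currently posted to the device */
	bool stopped;	/* true while the upper layer must not transmit */
};

static void tx_ring_init(struct tx_ring *r, unsigned size)
{
	r->size = size;
	r->used = 0;
	r->stopped = false;
}

/* Post one packet. Returns false if the queue is stopped or full. */
static bool tx_post(struct tx_ring *r)
{
	if (r->stopped || r->used == r->size)
		return false;
	r->used++;
	if (r->used == r->size)
		r->stopped = true;	/* ring is full: stop the queue */
	return true;
}

/* Complete n sends; wake the queue only when half the ring is free. */
static void tx_complete(struct tx_ring *r, unsigned n)
{
	assert(n <= r->used);
	r->used -= n;
	if (r->stopped && r->size - r->used >= r->size / 2)
		r->stopped = false;
}
```

Note that the hysteresis guarantees a burst of at least size/2 entries after a wake-up, but it says nothing about how many entries a single GSO skb needs, which is the open question in the message.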


Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread Michael Chan
On Tue, 2007-05-15 at 13:52 -0700, Roland Dreier wrote:

 Well, IPoIB doesn't do netif_wake_queue() until half the device's TX
 queue is free, so we should get batching.  However, I'm not sure that
 I can count on a fudge factor ensuring that there's enough space to
 handle everything skb_gso_segment() gives me -- is there any reliable
 way to get an upper bound on how many segments a given gso skb will
 use when it's segmented?

Take a look at tg3.c.  I use (gso_segs * 3) as the upper bound.



Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread Roland Dreier
   Well, IPoIB doesn't do netif_wake_queue() until half the device's TX
   queue is free, so we should get batching.  However, I'm not sure that
   I can count on a fudge factor ensuring that there's enough space to
   handle everything skb_gso_segment() gives me -- is there any reliable
   way to get an upper bound on how many segments a given gso skb will
   use when it's segmented?
  
  Take a look at tg3.c.  I use (gso_segs * 3) as the upper bound.

Thanks for the pointer... I noticed that code, but could you tell me
where the * 3 comes from?

 - R.


Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread Roland Dreier
I thought that to enable GSO, the device driver actually does nothing other
  than enabling the flag. GSO moved TCP offloading to the interface layer before
  device xmit. It's a different idea from multiple packets per xmit. GSO
  still queues the packets one by one in QDISC and xmits them one by one. The
  multiple packets per xmit will xmit N packets when N packets are in the QDISC
  queue. Please correct me if wrong.

Current use of GSO does segmentation just above the netdevice driver.
However there's nothing that prevents it from being used to push
segmentation into the driver -- as I described, just set NETIF_F_GSO_SOFTWARE
and do skb_gso_segment() within the driver.

As Michael Chan pointed out, tg3.c already does this to work around a
rare bug in the HW TSO implementation.  For IPoIB we could do it all
the time to get multiple send work requests from one call to the xmit
method.

 - R.


Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread Michael Chan
On Tue, 2007-05-15 at 14:08 -0700, Roland Dreier wrote:
Well, IPoIB doesn't do netif_wake_queue() until half the device's TX
queue is free, so we should get batching.  However, I'm not sure that
I can count on a fudge factor ensuring that there's enough space to
handle everything skb_gso_segment() gives me -- is there any reliable
way to get an upper bound on how many segments a given gso skb will
use when it's segmented?
   
   Take a look at tg3.c.  I use (gso_segs * 3) as the upper bound.
 
 Thanks for the pointer... I noticed that code, but could you tell me
 where the * 3 comes from?
 
For each gso_seg, there will be a header and the payload may span 2
pages for 1500-byte packets.  We always assume 1500-byte packets because
the buggy chips do not support jumbo frames.
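
The arithmetic behind that bound can be written out as a small check. This is a sketch only, with hypothetical names (DESC_PER_SEG, gso_worst_case_descs, tx_has_room); it encodes the reasoning given here, one descriptor for the header plus up to two for a payload that straddles a page boundary, and is not the tg3 code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Per segment: 1 header descriptor + up to 2 payload page fragments. */
#define DESC_PER_SEG 3u

/* Worst-case number of TX descriptors for a GSO skb with gso_segs segments. */
static unsigned gso_worst_case_descs(unsigned gso_segs)
{
	return gso_segs * DESC_PER_SEG;
}

/*
 * Decide, before segmenting, whether the ring can take every resulting
 * packet. Checking first avoids being left with segments that cannot be
 * queued after skb_gso_segment() has already split the skb.
 */
static bool tx_has_room(unsigned free_descs, unsigned gso_segs)
{
	return free_descs >= gso_worst_case_descs(gso_segs);
}
```

With the numbers from earlier in the thread, 4 free entries and 5 segments, the check fails up front; the skb can then be left queued until the ring drains.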



Re: [PATCH] Intel IXP4xx network drivers v.2 - Ethernet and HSS

2007-05-15 Thread Lennert Buytenhek
On Wed, May 09, 2007 at 03:45:53PM +0100, Michael-Luke Jones wrote:

 No-one is saying that this driver should not be mainlined before it  
 has LE support. All that I said was:
 
  Personally I'd like LE ethernet tested and working before we push.
 
 The alternative would be to explicitly state in Kconfig that LE arm  
 is broken with this driver, so that this could be fixed later.

The driver does bomb out during compile if __ARMEB__ isn't defined,
but that apparently wasn't good enough.


 Please can we not blow this out of proportion, it really isn't that  
 big a deal. The irony is that fixing Krzysztof's driver to work on LE  
 will probably be quite easy, given that we already have a working LE  
 driver from Christian.

I'm looking forward to your patch.


Re: select(0, ..) is valid ?

2007-05-15 Thread Hugh Dickins
On Tue, 15 May 2007, Christoph Lameter wrote:
 On Tue, 15 May 2007, Andrew Morton wrote:
 
  I _think_ we can just do
  
  --- a/fs/compat.c~a
  +++ a/fs/compat.c
  @@ -1566,9 +1566,13 @@ int compat_core_sys_select(int n, compat
   */
  ret = -ENOMEM;
  size = FDS_BYTES(n);
  -   bits = kmalloc(6 * size, GFP_KERNEL);
  -   if (!bits)
  -   goto out_nofds;
  +   if (likely(size)) {
  +   bits = kmalloc(6 * size, GFP_KERNEL);
  +   if (!bits)
  +   goto out_nofds;
  +   } else {
  +   bits = NULL;
  +   }

It's interesting that compat_core_sys_select() shows this kmalloc(0)
failure but core_sys_select() does not.  That's because core_sys_select()
avoids kmalloc by using a buffer on the stack for small allocations (and
0 sure is small).  Shouldn't compat_core_sys_select() do just the same?
Or is SLUB going to be so efficient that doing so is a waste of time?
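
The stack-buffer approach mentioned here follows a common pattern, sketched below in user space. Names and the 256-byte threshold are hypothetical and malloc() stands in for kmalloc(); the point is only that small requests, including a request for zero bytes, never reach the allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STACK_BUF_BYTES 256

/*
 * Do some work in a scratch area of `need` bytes.
 * Small requests use an on-stack buffer; only larger ones are allocated.
 * *used_heap reports which path was taken.
 * Returns 0 on success, -1 if the allocation failed.
 */
static int with_scratch(size_t need, int *used_heap)
{
	long stack_buf[STACK_BUF_BYTES / sizeof(long)];
	void *buf = stack_buf;

	*used_heap = 0;
	if (need > sizeof(stack_buf)) {
		buf = malloc(need);
		if (!buf)
			return -1;
		*used_heap = 1;
	}

	memset(buf, 0, need);	/* placeholder for the real work */

	if (buf != stack_buf)
		free(buf);
	return 0;
}
```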

  fds.in  = (unsigned long *)  bits;
  fds.out = (unsigned long *) (bits +   size);
  fds.ex  = (unsigned long *) (bits + 2*size);
  _
  
  I mean, if that oopses then I'd be very interested in finding out why.
  
  But I'm starting to suspect that it would be better to permit kmalloc(0) in
  slub.  It depends on how many more of these things need fixing.
  
  otoh, a kmalloc(0) could be a sign of some buggy/inefficient/weird code, so
  there's some value in forcing us to go look at all the callsites.
  
 Hmmm... We could have kmalloc(0) return a pointer to the zero page? That 
 would catch any writers?

I don't think using the zero page that way would be at all safe:
there's probably configurations/architectures in which it is write
protected, but I don't believe that's a given at all.

But the principle is good: ERR_PTR(-MAX_ERRNO) should work,
that area up the top should always give a fault.
Hmm, but perhaps there are architectures on which it does not?

Hugh
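
The ERR_PTR(-MAX_ERRNO) idea can be modelled in user space as a non-NULL sentinel that is never dereferenced. This is a sketch, not a proposal for the real allocator: model_alloc(), model_free(), is_zero_alloc() and ZERO_ALLOC_PTR are hypothetical names, and the claim that the top page faults is exactly the assumption questioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: mirrors the kernel's MAX_ERRNO (4095) for illustration. */
#define MODEL_MAX_ERRNO 4095

/*
 * Hypothetical sentinel in the style of ERR_PTR(-MAX_ERRNO): an address
 * in the last page of the address space, assumed never to be mapped.
 */
#define ZERO_ALLOC_PTR ((void *)(uintptr_t)(-(intptr_t)MODEL_MAX_ERRNO))

static int is_zero_alloc(const void *p)
{
	return p == ZERO_ALLOC_PTR;
}

/* Returns the sentinel for size 0, otherwise defers to malloc(). */
static void *model_alloc(size_t size)
{
	if (size == 0)
		return ZERO_ALLOC_PTR;
	return malloc(size);
}

/* Accepts NULL, the sentinel, or a real allocation. */
static void model_free(void *p)
{
	if (p == NULL || is_zero_alloc(p))
		return;
	free(p);
}

/* The sentinel is non-NULL, so "if (!p) goto fail" paths stay unchanged. */
static void zero_alloc_demo(void)
{
	void *p = model_alloc(0);

	assert(p != NULL);
	assert(is_zero_alloc(p));
	model_free(p);
}
```

The design point is that callers keep their existing NULL checks, while any write through the returned pointer would hit an address that is expected to fault.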


Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread David Miller
From: Michael Chan [EMAIL PROTECTED]
Date: Tue, 15 May 2007 15:05:28 -0700

 On Tue, 2007-05-15 at 14:08 -0700, Roland Dreier wrote:
 Well, IPoIB doesn't do netif_wake_queue() until half the device's TX
 queue is free, so we should get batching.  However, I'm not sure that
 I can count on a fudge factor ensuring that there's enough space to
 handle everything skb_gso_segment() gives me -- is there any reliable
 way to get an upper bound on how many segments a given gso skb will
 use when it's segmented?

Take a look at tg3.c.  I use (gso_segs * 3) as the upper bound.
  
  Thanks for the pointer... I noticed that code, but could you tell me
  where the * 3 comes from?
  
 For each gso_seg, there will be a header and the payload may span 2
 pages for 1500-byte packets.  We always assume 1500-byte packets because
 the buggy chips do not support jumbo frames.

Correct.

I think there may be a case where you could see up to 4 segments.
If the user corks the TCP socket, does a sendmsg() (which puts
the data in the per-socket page) then does a sendfile() you'll
see something like:

skb->data   IP, TCP, ethernet headers, etc.
page0       sendmsg() data
page1       sendfile
page2       sendfile

Ie. this can happen if the sendfile() part starts near the
end of a page, so it would get split even for a 1500 MTU
frame.

Even more complex variants are possible if the user does
tiny sendfile() requests to different pages within the file.

So in fact it can span up to N pages.

But there is an upper limit defined by the original GSO
frame, and that is controlled by MAX_SKB_FRAGS, so at most
you would see MAX_SKB_FRAGS plus some small constant.
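
The two estimates in this exchange can be compared with a small user-space sketch. The function names are hypothetical, MODEL_MAX_SKB_FRAGS is assumed to be 18 (the value for 4 KB pages), and the "+ 1" is only this sketch's pick for the "small constant"; none of this is taken from tg3.c.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: 18 matches MAX_SKB_FRAGS with 4 KB pages (65536/4096 + 2). */
#define MODEL_MAX_SKB_FRAGS 18u

/* tg3-style estimate: one header plus a payload spanning at most two pages. */
static unsigned int tg3_style_bound(unsigned int gso_segs)
{
	assert(gso_segs <= UINT_MAX / 3);
	return gso_segs * 3;
}

/*
 * Pessimistic estimate: every segment carries one header descriptor plus
 * up to MODEL_MAX_SKB_FRAGS page fragments. The "+ 1" is this sketch's
 * choice for the "small constant" mentioned in the thread.
 */
static unsigned int worst_case_bound(unsigned int gso_segs)
{
	assert(gso_segs <= UINT_MAX / (MODEL_MAX_SKB_FRAGS + 1));
	return gso_segs * (MODEL_MAX_SKB_FRAGS + 1);
}

/* A driver would only segment and queue if the ring can take the bound. */
static int ring_has_room(unsigned int free_descs, unsigned int needed)
{
	return free_descs >= needed;
}
```

For ten segments the tg3-style estimate asks for 30 descriptors while the pessimistic one asks for 190, which shows how much ring space a fully safe fudge factor would cost.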


Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread David Miller
From: Shirley Ma [EMAIL PROTECTED]
Date: Tue, 15 May 2007 14:22:57 -0700

   I just wonder without TSO support in HW, how much benefit we
 can get by pushing GSO from interface layer to device layer besides
 we can do multiple packets in IPoIB.

I bet the gain is non-trivial.

I'd say about half of the gain from TSO comes from only calling down
into the driver from TCP one time as opposed to N times.  That's
the majority of the CPU work involved in TCP sending.

The rest of the gain comes from only transmitting the packet headers
once rather than N times, which conserves I/O bus bandwidth.

GSO will not help the case of lots of UDP applications sending
small packets, or something like that.  An efficient qdisc--driver
transfer during netif_wake_queue() could help solve some of that,
as is being discussed here.


Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread Roland Dreier
Shirley> I just wonder without TSO support in HW, how much
Shirley> benefit we can get by pushing GSO from interface layer to
Shirley> device layer besides we can do multiple packets in IPoIB.

The entire benefit comes from having multiple packets to queue in one
call to the xmit method.

 - R.


Re: [PATCH] don't put multicasts with mc_ttl=0 on the wire

2007-05-15 Thread David Stevens
Arthur,
I assume you're making use of the hack mentioned in route.c:

 ... This hack is not just for fun, it allows vic, vat and friends to work.
 They bind socket to loopback, set ttl to zero and expect that it will work.
I don't know the details of the intent for this hack, but did you
test that it won't break them?
A multicast application that relies on the (unicast) routing table
at all is broken IMHO, as is an app that sets the TTL to 0, but shouldn't
all ttl 0 packets about to be sent to the wire be dropped? Wouldn't that
be a better way to handle this case?

+-DLS

[EMAIL PROTECTED] wrote on 05/15/2007 12:56:02 PM:

 A colleague of mine found that multicasts with a ttl of 0
 can be sent on the wire. This happens if the sender doesn't
 belong to the destination multicast group.
 
 With the following the multicast ttl is respected whether
 or not the sender belongs to the destination multicast group.
 
 Signed-off-by: Arthur Kepner [EMAIL PROTECTED]
 
 ---
 
  net/ipv4/route.c |3 +--
  1 files changed, 1 insertions(+), 2 deletions(-)
 
 diff --git a/net/ipv4/route.c b/net/ipv4/route.c
 index cb76e3c..bf25cf5 100644
 --- a/net/ipv4/route.c
 +++ b/net/ipv4/route.c
 @@ -2249,8 +2249,7 @@ static inline int __mkroute_output(struct rtable **result,
  	}
  	if (flags & (RTCF_BROADCAST | RTCF_MULTICAST)) {
  		rth->rt_spec_dst = fl->fl4_src;
 -		if (flags & RTCF_LOCAL &&
 -		    !(dev_out->flags & IFF_LOOPBACK)) {
 +		if (!(dev_out->flags & IFF_LOOPBACK)) {
  			rth->u.dst.output = ip_mc_output;
  			RT_CACHE_STAT_INC(out_slow_mc);
  		}
 
 -- 
 Arthur
 


Re: [PATCH 15/17] sky2: only disable 88e8056 on some boards

2007-05-15 Thread Stephen Hemminger
On Tue, 15 May 2007 15:43:39 -0400
[EMAIL PROTECTED] wrote:

 Hello, Stephen,
 
  RE: sky2 88e8056 Gigabyte GA-965GM-S2 uATX motherboard
 
 A couple days ago I posted my observations that this was a hardware problem. 
 My boards are now working!
 
 I used the GIGABYTE MOTHERBOARD webpage for SUPPORT. They replied with a 
 EEPROM program for the Marvell Yukon ethernet chip.
 Once the EEPROM was reprogrammed the diskless system on my bench started 
 working and has for the past 24 hours taken a beating on the NFS mount as 
 well as receiving a ping flood from another node.


Any chance of sending me the stuff so I can verify whether it fixes my system?


Re: [patch 1/4] pasemi_mac: Fix register defines

2007-05-15 Thread Jeff Garzik

[EMAIL PROTECTED] wrote:

Some shift values were obviously wrong. Fix them to correspond with
the masks.

Signed-off-by: Olof Johansson [EMAIL PROTECTED]


applied 1-4 to #upstream-fixes




Re: [patch/resend] smc911x: fix compilation breakage

2007-05-15 Thread Jeff Garzik

Vitaly Wool wrote:

Looks like the new version of this patch has been overlooked,
so I'm resending it.

It just adapts the driver to the new IRQ API
according to what Russell has pointed out.

drivers/net/smc911x.c |6 ++
1 files changed, 2 insertions(+), 4 deletions(-)

Signed-off-by: Vitaly Wool [EMAIL PROTECTED]


applied to #upstream-fixes




Re: [PATCH 3/5] ucc_geth: eliminate max-speed, change interface-type to phy-connection-type

2007-05-15 Thread Jeff Garzik

Kim Phillips wrote:

It was agreed that phy-connection-type was a better name for
the interface-type property, so this patch renames it.

Also, the max-speed property name was determined too generic,
and is therefore eliminated in favour of phy-connection-type
derivation logic.

includes corrections to copyright text.

Signed-off-by: Kim Phillips [EMAIL PROTECTED]
---
 drivers/net/ucc_geth.c |   40 
 drivers/net/ucc_geth_mii.c |9 +
 drivers/net/ucc_geth_mii.h |   10 +-
 3 files changed, 26 insertions(+), 33 deletions(-)


applied to #upstream




Re: mac80211 ad-hoc: carrier not set up [was: Panic in ieee_80211_ibss_add_sta]

2007-05-15 Thread Ivo van Doorn
Hi,

 However, ad-hoc still does not work, since the network device's
 carrier status does not seem to be properly set. (It remains
 in NO-CARRIER even after wlan0: Selected IBSS BSSID
 92:68:a2:db:de:45 based on configured SSID. I dirtily hacked
 around that with the following two-liner:

I was aware of the recent rt2x00 adhoc breakage but hadn't looked into it yet.
The suggestion below about netif_carrier does make sense though: the last
report that it was working came before rt2x00 removed the ieee80211_netif
calls, and the first report of the breakage came some time after the removal.
(Since a lot of code has been moved around in between, ieee80211_netif wasn't
the first thing that I would have thought of as a probable cause. ;) )

 --- wireless-dev/net/mac80211/ieee80211_sta.c.orig	2007-05-15 20:19:55.0 +0200
 +++ wireless-dev/net/mac80211/ieee80211_sta.c	2007-05-15 21:19:38.362587215 +0200
 @@ -2448,6 +2448,7 @@
  	mod_timer(&ifsta->timer, jiffies + IEEE80211_IBSS_MERGE_INTERVAL);
  
  	ieee80211_rx_bss_put(dev, bss);
 +	netif_carrier_on(dev);
  
  	return res;
  }
 @@ -2648,6 +2649,7 @@
  
  	ifsta->ssid_set = len ? 1 : 0;
  	if (sdata->type == IEEE80211_IF_TYPE_IBSS && !ifsta->bssid_set) {
 +		netif_carrier_off(dev);
  		ifsta->ibss_join_req = jiffies;
  		ifsta->state = IEEE80211_IBSS_SEARCH;
  		return ieee80211_sta_find_ibss(dev, ifsta);
 
 
 However, I have NO CLUE WHAT I'M DOING THERE! Make a proper fix!
 (Especially, I think it needs more netif_carrier_off calls in
 different places.)
 
 
 Anyway, thanks for my now-working wireless,

Ivo


[WIP] [PATCH] WAS Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread jamal
On Tue, 2007-15-05 at 14:32 -0700, David Miller wrote:
  An efficient qdisc--driver
 transfer during netif_wake_queue() could help solve some of that,
 as is being discussed here.

Ok, here's the approach i discussed at netconf.
It needs net-2.6 and the patch i posted earlier to clean up
qdisc_restart() [1].
I haven't ported over all the bits from 2.6.18, but this works.
Krishna and i have colluded privately on working together. I just need
to reproduce the patches, so here is the core.
A lot of the code in the core could be aggregated later - right now i am
worried about correctness.
I will post a patch for the tun device in a few minutes
that i use to test on my laptop (i need to remove some debugs) to show
an example.
I also plan to post a patch for e1000 - but that will take more
than a few minutes.
The e1000 driver has changed quite a bit since 2.6.18, so it is
time consuming.

What does a driver need to do to get batched-to?

1) On initialization (probably in probe):
 a) set NETIF_F_BTX in dev->features,
    i.e. dev->features |= NETIF_F_BTX
 b) initialize the batch queue, i.e. something like
    skb_queue_head_init(&dev->blist);
 c) set dev->xmit_win to something reasonable, like
    maybe half the DMA ring size or tx_queue_len

2) Create a new method for batch transmit.
This loops on dev->blist and stashes the packets onto the hardware.
All return codes like NETDEV_TX_OK etc. still apply.

3) Update dev->xmit_win, which provides a hint on how much
data to send from the core to the driver. Some suggestions:
 a) on doing a netif_stop_queue(), set it to 1
 b) on netif_wake_queue(), set it to the max available space

Of course, to work, all this requires the driver to have a
threshold for waking up the tx path, like drivers such as e1000 or tg3 do
in order to invoke netif_wake_queue() (for example, look at TX_WAKE_THRESHOLD
usage in e1000).
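
The three steps above can be walked through with a small user-space model. This is a sketch under stated assumptions, not driver code: struct model_dev, the model_* functions, the ring size of 8 and the wake threshold of 4 are all hypothetical, and plain counters stand in for the skb queues.

```c
#include <assert.h>

#define MODEL_RING_SIZE      8	/* hypothetical TX ring size */
#define MODEL_WAKE_THRESHOLD 4	/* hypothetical: wake once this many slots are free */

struct model_dev {
	int ring_used;	/* descriptors currently owned by the "hardware" */
	int blist_len;	/* packets waiting on the batch list */
	int xmit_win;	/* hint: how many packets the core may hand over */
	int stopped;	/* set by the model's netif_stop_queue() */
};

/* Step 1: at probe time, start with half the ring as the window. */
static void model_init(struct model_dev *d)
{
	d->ring_used = 0;
	d->blist_len = 0;
	d->xmit_win = MODEL_RING_SIZE / 2;
	d->stopped = 0;
}

/* Core side: move at most xmit_win packets from the backlog to blist. */
static int model_core_fill(struct model_dev *d, int *backlog)
{
	int moved = 0;

	if (d->stopped)
		return 0;
	while (*backlog > 0 && moved < d->xmit_win) {
		(*backlog)--;
		d->blist_len++;
		moved++;
	}
	return moved;
}

/* Step 2: the batch xmit method drains blist onto the ring. */
static int model_batch_xmit(struct model_dev *d)
{
	int sent = 0;

	while (d->blist_len > 0 && d->ring_used < MODEL_RING_SIZE) {
		d->blist_len--;
		d->ring_used++;
		sent++;
	}
	/* Step 3a: on stop, shrink the window to 1. */
	if (d->ring_used == MODEL_RING_SIZE) {
		d->stopped = 1;
		d->xmit_win = 1;
	} else {
		d->xmit_win = MODEL_RING_SIZE - d->ring_used;
	}
	return sent;
}

/* TX completion frees slots; step 3b: on wake, open the window again. */
static void model_tx_complete(struct model_dev *d, int done)
{
	int free_slots;

	assert(done >= 0 && done <= d->ring_used);
	d->ring_used -= done;
	free_slots = MODEL_RING_SIZE - d->ring_used;
	if (d->stopped && free_slots >= MODEL_WAKE_THRESHOLD) {
		d->xmit_win = free_slots;
		d->stopped = 0;
	}
}
```

Because the queue is only woken once the threshold is reached, each wakeup hands the driver a batch of several packets instead of one packet per freed slot.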


feedback welcome (preferably in the form of patches).

Anyone with a really nice tool to measure CPU improvement will help
a great deal in quantifying things. As i have said earlier, I never saw
any throughput improvement. But like T/GSO it may be just CPU savings
(as was suggested at netconf).

cheers,
jamal
[1] http://marc.info/?l=linux-netdev&m=117914954911959&w=2

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f671cd2..7205748 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -325,6 +325,7 @@ struct net_device
 #define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
 #define NETIF_F_GSO		2048	/* Enable software GSO. */
 #define NETIF_F_LLTX		4096	/* LockLess TX */
+#define NETIF_F_BTX		8192	/* Capable of batch tx */
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
@@ -450,6 +451,11 @@ struct net_device
 	void			*priv;	/* pointer to private data	*/
 	int			(*hard_start_xmit) (struct sk_buff *skb,
 		struct net_device *dev);
+	int			(*hard_batch_xmit) (struct sk_buff_head *list,
+		struct net_device *dev);
+	int			(*hard_prep_xmit) (struct sk_buff *skb,
+		struct net_device *dev);
+	int			xmit_win;
 	/* These may be needed for future network-power-down code. */
 	unsigned long		trans_start;	/* Time (in jiffies) of last Tx	*/
 
@@ -466,6 +472,10 @@ struct net_device
 	struct list_head	todo_list;
 	/* device index hash chain */
 	struct hlist_node	index_hlist;
+	/*XXX: Fix eventually to not allocate if device not
+	 *batch capable
+	*/
+	struct sk_buff_head	blist;
 
 	struct net_device	*link_watch_next;
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index ed80054..61fa301 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -85,10 +85,12 @@ static inline int
 do_dev_requeue(struct sk_buff *skb, struct net_device *dev, struct Qdisc *q)
 {
 
-	if (unlikely(skb->next))
-		dev->gso_skb = skb;
-	else
-		q->ops->requeue(skb, q);
+	if (skb) {
+		if (unlikely(skb->next))
+			dev->gso_skb = skb;
+		else
+			q->ops->requeue(skb, q);
+	}
 	/* XXX: Could netif_schedule fail? Or is that fact we are
 	 * requeueing imply the hardware path is closed
 	 * and even if we fail, some interupt will wake us
@@ -116,7 +118,10 @@ tx_islocked(struct sk_buff *skb, struct net_device *dev, struct Qdisc *q)
 	int ret = handle_dev_cpu_collision(dev);
 
 	if (ret == SCHED_TX_DROP) {
-		kfree_skb(skb);
+		if (skb) /* we are not batching */
+			kfree_skb(skb);
+		else if (!skb_queue_empty(&dev->blist))
+			skb_queue_purge(&dev->blist);
 		return qdisc_qlen(q);
 	}
 
@@ -195,10 +200,99 @@ static inline int qdisc_restart(struct net_device *dev)
 	return do_dev_requeue(skb, dev, q);
 }
 
+static int try_get_tx_pkts(struct net_device *dev, struct Qdisc *q, int count)
+{
+	struct sk_buff *skb;
+	struct sk_buff_head *skbs = &dev->blist;
+	int tdq = count;
+
+	/*
+	 * very unlikely, but who knows ..
+	 * If this happens we dont try to grab more pkts
+	 */
+	if (!skb_queue_empty(&dev->blist))
+		return skb_queue_len(&dev->blist);
+
+	if (dev->gso_skb) {
+		count--;
+		__skb_queue_head(skbs, dev->gso_skb);
+		dev->gso_skb = NULL;
+	}
+
+	

[PATCH 0/6] skge/sky2 patches for 2.6.21.2

2007-05-15 Thread Stephen Hemminger
This is a backport of all the bugfixes in 2.6.22-rc1 (or later)
to 2.6.21.y

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 4/6] sky2: 88e8071 support not ready

2007-05-15 Thread Stephen Hemminger
The driver is not ready to support 88e8071 chip, it requires several
more changes (not done yet). If this chip is present, system will hang on boot.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/sky2.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.21.y.orig/drivers/net/sky2.c  2007-05-15 09:07:11.0 
-0700
+++ linux-2.6.21.y/drivers/net/sky2.c   2007-05-15 09:07:14.0 -0700
@@ -129,7 +129,7 @@ static const struct pci_device_id sky2_i
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4368) }, /* 88EC034 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4369) }, /* 88EC042 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x436A) }, /* 88E8058 */
-   { PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x436B) }, /* 88E8071 */
+// { PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x436B) }, /* 88E8071 */
{ 0 }
 };
 

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 1/6] skge: allow WOL except for known broken chips

2007-05-15 Thread Stephen Hemminger
Wake On Lan works correctly on Yukon-FE and other variants.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/skge.c |9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- linux-2.6.21.y.orig/drivers/net/skge.c  2007-05-10 12:10:38.0 
-0700
+++ linux-2.6.21.y/drivers/net/skge.c   2007-05-10 13:26:26.0 -0700
@@ -135,10 +135,13 @@ static void skge_get_regs(struct net_dev
 /* Wake on Lan only supported on Yukon chips with rev 1 or above */
 static u32 wol_supported(const struct skge_hw *hw)
 {
-	if (hw->chip_id == CHIP_ID_YUKON && hw->chip_rev != 0)
-		return WAKE_MAGIC | WAKE_PHY;
-	else
+	if (hw->chip_id == CHIP_ID_GENESIS)
 		return 0;
+
+	if (hw->chip_id == CHIP_ID_YUKON && hw->chip_rev == 0)
+		return 0;
+
+	return WAKE_MAGIC | WAKE_PHY;
 }
 
 static u32 pci_wake_enabled(struct pci_dev *dev)

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 3/6] sky2: allow 88E8056

2007-05-15 Thread Stephen Hemminger
It looks like the problems of Gigabyte 88E8056 are unique to that chip
motherboard and maybe fixable by EEPROM update.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/sky2.c |3 ---
 1 file changed, 3 deletions(-)

--- linux-2.6.21.y.orig/drivers/net/sky2.c  2007-05-15 09:06:58.0 
-0700
+++ linux-2.6.21.y/drivers/net/sky2.c   2007-05-15 09:07:11.0 -0700
@@ -123,10 +123,7 @@ static const struct pci_device_id sky2_i
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4361) }, /* 88E8050 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4362) }, /* 88E8053 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4363) }, /* 88E8055 */
-#ifdef broken
-   /* This device causes data corruption problems that are not resolved */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4364) }, /* 88E8056 */
-#endif
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4366) }, /* 88EC036 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4367) }, /* 88EC032 */
{ PCI_DEVICE(PCI_VENDOR_ID_MARVELL, 0x4368) }, /* 88EC034 */

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 6/6] sky2: fix oops on shutdown

2007-05-15 Thread Stephen Hemminger
If the device fails during module startup for some reason like unsupported chip
version then the driver would crash dereferencing a null pointer, on shutdown
or suspend/resume.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/sky2.c |   10 ++
 1 file changed, 10 insertions(+)

--- linux-2.6.21.y.orig/drivers/net/sky2.c  2007-05-15 09:07:14.0 
-0700
+++ linux-2.6.21.y/drivers/net/sky2.c   2007-05-15 09:07:32.0 -0700
@@ -3719,6 +3719,7 @@ err_out_free_regions:
pci_release_regions(pdev);
pci_disable_device(pdev);
 err_out:
+   pci_set_drvdata(pdev, NULL);
return err;
 }
 
@@ -3771,6 +3772,9 @@ static int sky2_suspend(struct pci_dev *
struct sky2_hw *hw = pci_get_drvdata(pdev);
int i, wol = 0;
 
+   if (!hw)
+   return 0;
+
 	del_timer_sync(&hw->idle_timer);
 	netif_poll_disable(hw->dev[0]);
 
@@ -3802,6 +3806,9 @@ static int sky2_resume(struct pci_dev *p
struct sky2_hw *hw = pci_get_drvdata(pdev);
int i, err;
 
+   if (!hw)
+   return 0;
+
err = pci_set_power_state(pdev, PCI_D0);
if (err)
goto out;
@@ -3848,6 +3855,9 @@ static void sky2_shutdown(struct pci_dev
struct sky2_hw *hw = pci_get_drvdata(pdev);
int i, wol = 0;
 
+   if (!hw)
+   return;
+
 	del_timer_sync(&hw->idle_timer);
 	netif_poll_disable(hw->dev[0]);
 

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 5/6] skge: crash on shutdown/suspend

2007-05-15 Thread Stephen Hemminger
If device fails during module startup for some reason (like unsupported chip
version) then driver would crash dereferencing a null pointer, on shutdown
or suspend/resume.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/skge.c |9 +
 1 file changed, 9 insertions(+)

--- linux-2.6.21.y.orig/drivers/net/skge.c  2007-05-15 09:06:30.0 
-0700
+++ linux-2.6.21.y/drivers/net/skge.c   2007-05-15 09:07:20.0 -0700
@@ -3794,6 +3794,9 @@ static int skge_suspend(struct pci_dev *
struct skge_hw *hw  = pci_get_drvdata(pdev);
int i, err, wol = 0;
 
+   if (!hw)
+   return 0;
+
err = pci_save_state(pdev);
if (err)
return err;
@@ -3822,6 +3825,9 @@ static int skge_resume(struct pci_dev *p
struct skge_hw *hw  = pci_get_drvdata(pdev);
int i, err;
 
+   if (!hw)
+   return 0;
+
err = pci_set_power_state(pdev, PCI_D0);
if (err)
goto out;
@@ -3860,6 +3866,9 @@ static void skge_shutdown(struct pci_dev
struct skge_hw *hw  = pci_get_drvdata(pdev);
int i, wol = 0;
 
+   if (!hw)
+   return;
+
 	for (i = 0; i < hw->ports; i++) {
 		struct net_device *dev = hw->dev[i];
 		struct skge_port *skge = netdev_priv(dev);

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 2/6] skge: default WOL should be magic only

2007-05-15 Thread Stephen Hemminger
By default, the skge driver now enables wake on magic and wake on PHY.
This is a bad default (bug): wake on PHY means the machine will never shut
down if connected to a switch.
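
The intended default can be sketched as a capability mask that is narrowed to the magic-packet bit. This is a user-space illustration only: the MODEL_* constants, the enum and both functions are hypothetical stand-ins, with the bit positions assumed to match WAKE_PHY and WAKE_MAGIC in the ethtool header.

```c
#include <assert.h>

/* Assumption: same bit positions as WAKE_PHY and WAKE_MAGIC in <linux/ethtool.h>. */
#define MODEL_WAKE_PHY   (1u << 0)
#define MODEL_WAKE_MAGIC (1u << 5)

/* Hypothetical chip identifiers; the real values live in the driver. */
enum model_chip { MODEL_CHIP_GENESIS, MODEL_CHIP_YUKON, MODEL_CHIP_OTHER };

/* Everything except Genesis and revision 0 Yukon can wake the machine. */
static unsigned int model_wol_supported(enum model_chip chip, int rev)
{
	if (chip == MODEL_CHIP_GENESIS)
		return 0;
	if (chip == MODEL_CHIP_YUKON && rev == 0)
		return 0;
	return MODEL_WAKE_MAGIC | MODEL_WAKE_PHY;
}

/* Default setting: magic packet only, and only if PCI wake is enabled. */
static unsigned int model_default_wol(int pci_wake_enabled,
				      enum model_chip chip, int rev)
{
	if (!pci_wake_enabled)
		return 0;
	return model_wol_supported(chip, rev) & MODEL_WAKE_MAGIC;
}

/* Wake on PHY stays available, but is no longer on by default. */
static void model_wol_demo(void)
{
	assert(model_wol_supported(MODEL_CHIP_YUKON, 1) & MODEL_WAKE_PHY);
	assert(model_default_wol(1, MODEL_CHIP_YUKON, 1) == MODEL_WAKE_MAGIC);
}
```

Masking the supported set rather than hard-coding the magic bit keeps unsupported chips at 0, so the default can never enable something the hardware cannot do.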

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 drivers/net/skge.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- linux-2.6.21.y.orig/drivers/net/skge.c  2007-05-10 13:26:26.0 
-0700
+++ linux-2.6.21.y/drivers/net/skge.c   2007-05-10 13:26:31.0 -0700
@@ -3586,7 +3586,9 @@ static struct net_device *skge_devinit(s
 	skge->duplex = -1;
 	skge->speed = -1;
 	skge->advertising = skge_supported_modes(hw);
-	skge->wol = pci_wake_enabled(hw->pdev) ? wol_supported(hw) : 0;
+
+	if (pci_wake_enabled(hw->pdev))
+		skge->wol = wol_supported(hw) & WAKE_MAGIC;
 
 	hw->dev[port] = dev;
 

--
Stephen Hemminger [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [NET] Add constant for FCS/CRC length (frame check sequence)

2007-05-15 Thread Auke Kok
From: Auke Kok [EMAIL PROTECTED]

About a dozen drivers that have some form of crc checksumming or offloading
use this constant, warranting a global define for it.

Signed-off-by: Auke Kok [EMAIL PROTECTED]
---

 include/linux/if_ether.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
index 1db774c..3213f6f 100644
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -33,6 +33,7 @@
 #define ETH_ZLEN	60		/* Min. octets in frame sans FCS */
 #define ETH_DATA_LEN	1500		/* Max. octets in payload	 */
 #define ETH_FRAME_LEN	1514		/* Max. octets in frame sans FCS */
+#define ETH_FCS_LEN	4		/* Octets in the FCS		 */
 
 /*
  * These are the defined Ethernet Protocol ID's.


Re: [WIP] [PATCH] WAS Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread jamal
On Tue, 2007-15-05 at 18:17 -0400, jamal wrote:

 I will post a patch for tun device in a few minutes
 that i use to test on my laptop (i need to remove some debugs) to show
 an example.

Ok, here it is.
The way i test is to point packets at a tun device. [One way i do it
is to attach an ingress qdisc on lo, attach a u32 filter to match all,
and on match redirect to the tun device].
The user space program doing the reading sleeps for about a second every 20
packets or so. This forces things to accumulate in the driver's queue.
Backpressure builds up and the throttling effect is really nice to see
working.

I will try to post the e1000 patch tonight or tomorrow morning.

cheers,
jamal


diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a2c6caa..076f794 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -70,6 +70,7 @@
 static int debug;
 #endif
 
+#define NETDEV_LTT 4 /* the low threshold to open up the tx path */
 /* Network device part of the driver */
 
 static LIST_HEAD(tun_dev_list);
@@ -86,9 +87,53 @@ static int tun_net_open(struct net_device *dev)
 static int tun_net_close(struct net_device *dev)
 {
 	netif_stop_queue(dev);
+	//skb_queue_purge(&dev->blist);
 	return 0;
 }
 
+/* Batch Net device start xmit
+ * combine with non-batching version
+ * */
+static int tun_net_bxmit(struct sk_buff_head *skbs, struct net_device *dev)
+{
+	struct sk_buff *skb;
+	struct tun_struct *tun = netdev_priv(dev);
+	u32 qlen = skb_queue_len(&tun->readq);
+
+	/* Drop packet if interface is not attached */
+	if (!tun->attached) {
+		tun->stats.tx_dropped+=skb_queue_len(&dev->blist);
+		skb_queue_purge(&dev->blist);
+		return NETDEV_TX_OK;
+	}
+
+	while (skb_queue_len(&dev->blist)) {
+		skb = __skb_dequeue(skbs);
+		if (!skb)
+			break;
+		skb_queue_tail(&tun->readq, skb);
+	}
+
+	qlen = skb_queue_len(&tun->readq);
+	if (qlen >= dev->tx_queue_len) {
+		netif_stop_queue(dev);
+		tun->stats.tx_fifo_errors++;
+		dev->xmit_win = 1;
+	} else {
+		dev->xmit_win = dev->tx_queue_len - qlen;
+	}
+
+	/* Queue packet */
+	dev->trans_start = jiffies;
+
+	/* Notify and wake up reader process */
+	if (tun->flags & TUN_FASYNC)
+		kill_fasync(&tun->fasync, SIGIO, POLL_IN);
+	wake_up_interruptible(&tun->read_wait);
+
+	return NETDEV_TX_OK;
+}
+
 /* Net device start xmit */
 static int tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 {
@@ -207,6 +252,7 @@ static void tun_net_init(struct net_device *dev)
 		dev->tx_queue_len = TUN_READQ_SIZE;  /* We prefer our own queue length */
 		break;
 	}
+	dev->xmit_win = dev->tx_queue_len>>1; /* handwave, handwave */
 }
 
 /* Character device part */
@@ -382,7 +428,13 @@ static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,
 			schedule();
 			continue;
 		}
-		netif_wake_queue(tun->dev);
+		{
+			u32 t = skb_queue_len(&tun->readq);
+			if (netif_queue_stopped(tun->dev) && t < NETDEV_LTT) {
+				tun->dev->xmit_win = tun->dev->tx_queue_len;
+				netif_wake_queue(tun->dev);
+			}
+		}
 
 		/** Decide whether to accept this packet. This code is designed to
 		 * behave identically to an Ethernet interface. Accept the packet if
@@ -429,6 +481,7 @@ static void tun_setup(struct net_device *dev)
 	struct tun_struct *tun = netdev_priv(dev);
 
 	skb_queue_head_init(&tun->readq);
+	skb_queue_head_init(&dev->blist);
 	init_waitqueue_head(&tun->read_wait);
 
 	tun->owner = -1;
@@ -436,6 +489,8 @@ static void tun_setup(struct net_device *dev)
 	SET_MODULE_OWNER(dev);
 	dev->open = tun_net_open;
 	dev->hard_start_xmit = tun_net_xmit;
+	dev->hard_prep_xmit = NULL;
+	dev->hard_batch_xmit = tun_net_bxmit;
 	dev->stop = tun_net_close;
 	dev->get_stats = tun_net_stats;
 	dev->ethtool_ops = &tun_ethtool_ops;
@@ -458,7 +513,7 @@ static struct tun_struct *tun_get_by_name(const char *name)
 static int tun_set_iff(struct file *file, struct ifreq *ifr)
 {
 	struct tun_struct *tun;
-	struct net_device *dev;
+	struct net_device *dev = NULL;
 	int err;
 
 	tun = tun_get_by_name(ifr->ifr_name);
@@ -528,12 +583,15 @@ static int tun_set_iff(struct file *file, struct ifreq *ifr)
 	}
 
 	DBG(KERN_INFO "%s: tun_set_iff\n", tun->dev->name);
+	dev->features |= NETIF_F_BTX;
 
 	if (ifr->ifr_flags & IFF_NO_PI)
 		tun->flags |= TUN_NO_PI;
 
-	if (ifr->ifr_flags & IFF_ONE_QUEUE)
+	if (ifr->ifr_flags & IFF_ONE_QUEUE) {
 		tun->flags |= TUN_ONE_QUEUE;
+		dev->features &= ~NETIF_F_BTX;
+	}
 
 	file->private_data = tun;
 	tun->attached = 1;


[PATCH] use default 32768-61000 outgoing port range in all cases

2007-05-15 Thread Mark Glines
Hi,

I noticed I had chopped off a whole comment, when I meant to only remove
part of it.  So I've fixed that.

This is a reissued use-high-ports-for-local-stuff.diff, with a comment
fix.  Does anyone have a problem with this patch in its current form?
Any chance of applying it?

This diff changes the default port range used for outgoing connections,
from "use 32768-61000 in most cases, but use N-4999 on small boxes
(where N is a multiple of 1024, depending on just *how* small the box
is)" to just "use 32768-61000 in all cases".

I don't believe there are any drawbacks to this change, and it keeps
outgoing connection ports farther away from the mess of IANA-registered
ports.
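
The before and after behaviour can be compared with a short user-space sketch. It is based only on the lines this patch removes; struct port_range and the three functions are hypothetical names, and `order` stands for the memory-derived value computed in tcp_init().

```c
#include <assert.h>

struct port_range {
	int low;
	int high;
};

/* Old behaviour as read from the removed lines of the patch. */
static struct port_range old_default_range(int order)
{
	struct port_range r = { 1024, 4999 };

	assert(order >= 0);
	if (order >= 4) {
		r.low = 32768;
		r.high = 61000;
	} else if (order < 3) {
		r.low = 1024 * (3 - order);
	}
	return r;
}

/* New behaviour: one range regardless of machine size. */
static struct port_range new_default_range(void)
{
	struct port_range r = { 32768, 61000 };

	return r;
}

/* Number of ports in an inclusive range. */
static int range_size(struct port_range r)
{
	return r.high - r.low + 1;
}
```

Under the old rule the smallest boxes (order 0) got 3072-4999, which is 1928 ports; the new default gives every machine 28233 ports.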

Thanks,

Signed-off-by: Mark Glines [EMAIL PROTECTED]

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 43fb160..fbe7714 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -31,10 +31,8 @@ EXPORT_SYMBOL(inet_csk_timer_bug_msg);
 
 /*
  * This array holds the first and last local port number.
- * For high-usage systems, use sysctl to change this to
- * 32768-61000
  */
-int sysctl_local_port_range[2] = { 1024, 4999 };
+int sysctl_local_port_range[2] = { 32768, 61000 };
 
 int inet_csk_bind_conflict(const struct sock *sk,
   const struct inet_bind_bucket *tb)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bd4c295..33ef0e7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2465,13 +2465,10 @@ void __init tcp_init(void)
 			order++)
 		;
 	if (order >= 4) {
-		sysctl_local_port_range[0] = 32768;
-		sysctl_local_port_range[1] = 61000;
 		tcp_death_row.sysctl_max_tw_buckets = 180000;
 		sysctl_tcp_max_orphans = 4096 << (order - 4);
 		sysctl_max_syn_backlog = 1024;
 	} else if (order < 3) {
-		sysctl_local_port_range[0] = 1024 * (3 - order);
 		tcp_death_row.sysctl_max_tw_buckets >>= (3 - order);
 		sysctl_tcp_max_orphans >>= (3 - order);
 		sysctl_max_syn_backlog = 128;


Re: smp_affinity, MSI-X and 2.6.21.1

2007-05-15 Thread Rick Jones

Andi Kleen wrote:

That's true, but we are talking about software state so in some sense
it might be better that the affinity-to-be is reported to the user in
this case.

Delayed register updates are an implementation detail the user does
not need to know about here.



This patch should fix it.


And it seems to when I apply it against the 2.6.21.1 kernel I'm messing about
with:

hpcpc106:~/s2io-2.0.19-8893# cat /proc/irq/69/smp_affinity
00000000,00000000
hpcpc106:~/s2io-2.0.19-8893# echo 4 > /proc/irq/69/smp_affinity
hpcpc106:~/s2io-2.0.19-8893# cat /proc/irq/69/smp_affinity
00000000,00000004
hpcpc106:~/s2io-2.0.19-8893# cat /proc/interrupts | grep 69
 69:  0  0  0  0 PCI-MSI  eth2:MSI-X-6-RX

It would be nice if this could find its way into the kernel at some point - 
2.6.23 or 2.6.24 perhaps?


rick jones


-Andi

Report the pending irq if available in smp_affinity

Otherwise smp_affinity would only update after the next interrupt
on x86 systems.

Cc: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]

Signed-off-by: Andi Kleen [EMAIL PROTECTED]

Index: linux/kernel/irq/proc.c
===
--- linux.orig/kernel/irq/proc.c
+++ linux/kernel/irq/proc.c
@@ -19,7 +19,14 @@ static struct proc_dir_entry *root_irq_d
 static int irq_affinity_read_proc(char *page, char **start, off_t off,
 				  int count, int *eof, void *data)
 {
-	int len = cpumask_scnprintf(page, count, irq_desc[(long)data].affinity);
+	struct irq_desc *desc = irq_desc + (long)data;
+	cpumask_t *mask = &desc->affinity;
+	int len;
+#ifdef CONFIG_GENERIC_PENDING_IRQ
+	if (desc->status & IRQ_MOVE_PENDING)
+		mask = &desc->pending_mask;
+#endif
+	len = cpumask_scnprintf(page, count, *mask);
 
 	if (count - len < 2)
 		return -EINVAL;
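
The selection logic in the patch above can be modelled outside the kernel. The following is only a sketch under simplifying assumptions: struct irq_model, MODEL_MOVE_PENDING and both helpers are stand-ins invented for this example, and a single unsigned long replaces cpumask_t.

```c
#include <stdio.h>

#define MODEL_MOVE_PENDING 0x1u	/* stand-in for IRQ_MOVE_PENDING */

/* Simplified stand-in for struct irq_desc; masks are plain bitmaps. */
struct irq_model {
	unsigned int status;
	unsigned long affinity;		/* mask currently in effect */
	unsigned long pending_mask;	/* mask requested, not yet applied */
};

/* Report the pending mask while a move is outstanding, else the current one. */
static unsigned long reported_affinity(const struct irq_model *d)
{
	if (d->status & MODEL_MOVE_PENDING)
		return d->pending_mask;
	return d->affinity;
}

/* Format the reported mask as one 32-bit hex chunk, e.g. "00000004". */
static int format_affinity(char *buf, size_t len, const struct irq_model *d)
{
	return snprintf(buf, len, "%08lx", reported_affinity(d));
}
```

In this model a write of 4 that has not yet been applied to the hardware is still what the user reads back, which matches the behaviour Rick reports after applying the patch.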




Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread David Miller
From: Shirley Ma [EMAIL PROTECTED]
Date: Tue, 15 May 2007 16:33:22 -0700

   That's interesting. So a generic LRO in interface layer will benefit
 the performance more, right? Receiving path TCP N times is more expensive
 than sending, I think.

If you look at some of the drivers doing LRO, the bulk of the
implementation is in software, so yes :-)


[git patches] net driver fixes

2007-05-15 Thread Jeff Garzik

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream-linus

to receive the following updates:

 drivers/net/pasemi_mac.c   |   45 +++
 drivers/net/pasemi_mac.h   |4 +-
 drivers/net/smc911x.c  |6 +---
 drivers/net/ucc_geth.c |   40 +++---
 drivers/net/ucc_geth_mii.c |9 ---
 drivers/net/ucc_geth_mii.h |   10 
 6 files changed, 58 insertions(+), 56 deletions(-)

Kim Phillips (1):
  ucc_geth: eliminate max-speed, change interface-type to phy-connection-type

Olof Johansson (1):
  pasemi_mac: Interrupt ack fixes

Vitaly Wool (1):
  smc911x: fix compilation breakage

[EMAIL PROTECTED] (3):
  pasemi_mac: Fix register defines
  pasemi_mac: Terminate PCI ID list
  pasemi_mac: Fix local-mac-address parsing

diff --git a/drivers/net/pasemi_mac.c b/drivers/net/pasemi_mac.c
index bc7f3de..8d38425 100644
--- a/drivers/net/pasemi_mac.c
+++ b/drivers/net/pasemi_mac.c
@@ -85,6 +85,7 @@ static int pasemi_get_mac_addr(struct pasemi_mac *mac)
 {
 	struct pci_dev *pdev = mac->pdev;
 	struct device_node *dn = pci_device_to_OF_node(pdev);
+	int len;
 	const u8 *maddr;
 	u8 addr[6];
 
@@ -94,9 +95,17 @@ static int pasemi_get_mac_addr(struct pasemi_mac *mac)
 		return -ENOENT;
 	}
 
-	maddr = of_get_property(dn, "local-mac-address", NULL);
+	maddr = of_get_property(dn, "local-mac-address", &len);
+
+	if (maddr && len == 6) {
+		memcpy(mac->mac_addr, maddr, 6);
+		return 0;
+	}
+
+	/* Some old versions of firmware mistakenly uses mac-address
+	 * (and as a string) instead of a byte array in local-mac-address.
+	 */
 
-	/* Fall back to mac-address for older firmware */
 	if (maddr == NULL)
 		maddr = of_get_property(dn, "mac-address", NULL);
 
@@ -106,6 +115,7 @@ static int pasemi_get_mac_addr(struct pasemi_mac *mac)
 		return -ENOENT;
 	}
 
+
 	if (sscanf(maddr, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx", &addr[0],
 		   &addr[1], &addr[2], &addr[3], &addr[4], &addr[5]) != 6) {
 		dev_warn(&pdev->dev,
@@ -113,7 +123,8 @@ static int pasemi_get_mac_addr(struct pasemi_mac *mac)
 		return -EINVAL;
 	}
 
-	memcpy(mac->mac_addr, addr, sizeof(addr));
+	memcpy(mac->mac_addr, addr, 6);
+
 	return 0;
 }
 
@@ -384,17 +395,14 @@ static void pasemi_mac_replenish_rx_ring(struct net_device *dev)
 
 static void pasemi_mac_restart_rx_intr(struct pasemi_mac *mac)
 {
-	unsigned int reg, stat;
+	unsigned int reg, pcnt;
 	/* Re-enable packet count interrupts: finally
 	 * ack the packet count interrupt we got in rx_intr.
 	 */
 
-	pci_read_config_dword(mac->iob_pdev,
-			      PAS_IOB_DMA_RXCH_STAT(mac->dma_rxch),
-			      &stat);
+	pcnt = *mac->rx_status & PAS_STATUS_PCNT_M;
 
-	reg = PAS_IOB_DMA_RXCH_RESET_PCNT(stat & PAS_IOB_DMA_RXCH_STAT_CNTDEL_M)
-		| PAS_IOB_DMA_RXCH_RESET_PINTC;
+	reg = PAS_IOB_DMA_RXCH_RESET_PCNT(pcnt) | PAS_IOB_DMA_RXCH_RESET_PINTC;
 
 	pci_write_config_dword(mac->iob_pdev,
 			       PAS_IOB_DMA_RXCH_RESET(mac->dma_rxch),
@@ -403,14 +411,12 @@ static void pasemi_mac_restart_rx_intr(struct pasemi_mac *mac)
 
 static void pasemi_mac_restart_tx_intr(struct pasemi_mac *mac)
 {
-	unsigned int reg, stat;
+	unsigned int reg, pcnt;
 
 	/* Re-enable packet count interrupts */
-	pci_read_config_dword(mac->iob_pdev,
-			      PAS_IOB_DMA_TXCH_STAT(mac->dma_txch), &stat);
+	pcnt = *mac->tx_status & PAS_STATUS_PCNT_M;
 
-	reg = PAS_IOB_DMA_TXCH_RESET_PCNT(stat & PAS_IOB_DMA_TXCH_STAT_CNTDEL_M)
-		| PAS_IOB_DMA_TXCH_RESET_PINTC;
+	reg = PAS_IOB_DMA_TXCH_RESET_PCNT(pcnt) | PAS_IOB_DMA_TXCH_RESET_PINTC;
 
 	pci_write_config_dword(mac->iob_pdev,
 			       PAS_IOB_DMA_TXCH_RESET(mac->dma_txch), reg);
@@ -591,21 +597,24 @@ static irqreturn_t pasemi_mac_tx_intr(int irq, void *data)
 {
 	struct net_device *dev = data;
 	struct pasemi_mac *mac = netdev_priv(dev);
-	unsigned int reg;
+	unsigned int reg, pcnt;
 
 	if (!(*mac->tx_status & PAS_STATUS_CAUSE_M))
 		return IRQ_NONE;
 
 	pasemi_mac_clean_tx(mac);
 
-	reg = PAS_IOB_DMA_TXCH_RESET_PINTC;
+	pcnt = *mac->tx_status & PAS_STATUS_PCNT_M;
+
+	reg = PAS_IOB_DMA_TXCH_RESET_PCNT(pcnt) | PAS_IOB_DMA_TXCH_RESET_PINTC;
 
 	if (*mac->tx_status & PAS_STATUS_SOFT)
 		reg |= PAS_IOB_DMA_TXCH_RESET_SINTC;
 	if (*mac->tx_status & PAS_STATUS_ERROR)
 		reg |= PAS_IOB_DMA_TXCH_RESET_DINTC;
 
-	pci_write_config_dword(mac->iob_pdev, PAS_IOB_DMA_TXCH_RESET(mac->dma_txch),
+   

Re: [IPV4] LVS: Allow to send ICMP unreachable responses when real-servers are removed

2007-05-15 Thread Julian Anastasov

Hello,

On Tue, 15 May 2007, Patrick McHardy wrote:

 Simon Horman wrote:
  On Mon, May 14, 2007 at 07:41:48PM +0200, Patrick McHardy wrote:
  
 So you're adding a local route for non-local destination and the
 address selection in icmp_send() uses the original destination
 address as source because the route has RTCF_LOCAL set, resulting
 in an error in ip_route_output_slow().
  
  I'm not entirely sure that adding a local route is the right
  terminology, but then again, perhaps I'm misunderstanding exactly
  what that means.
 
 It means adding a route to the local table, which causes the
 resulting dst_entry to be marked with RTCF_LOCAL.

IPVS users add local route to user-defined (not local) routing
table to deliver locally only selected traffic, eg. by fwmark. The use
is just like for transparent proxy, deliver port X for VIP locally,
other traffic for VIP is forwarded (hits unicast route).

And it seems the idea is for ICMP to still work in such scenarios:
icmp_send should know that this is an authorized operation that can
avoid the source address check (saddr=VIP) in ip_route_output_slow.
If icmp_send is changed to use inet_addr_type() then ICMP will leave
with saddr != VIP and that is not nice.

I think, icmp_send is fine as is, the main problem is how
to differentiate the ip_route_output_slow users to ones that need
the check for valid source address and others (eg. NAT) that expect
and allow source address to be non-local.

It is interesting what happens when some NAT rule maps
multiple internal addresses (one to one) to multiple non-local public
addresses: maybe, if a packet from the world is rejected, we don't send
ICMP from the right public address? For example, if we are a NAT router
with such rules:

internal 192.168.0.1 is mapped to public 10.0.0.1
internal 192.168.0.2 is mapped to public 10.0.0.2
...

where 10.0.0.0/16 are not configured as local IPs on the NAT router,
eg. when we don't want to add thousands of IPs.

World sends to 10.0.0.1 but the NAT router wants to send ICMP with
saddr=10.0.0.1 which is not local IP.

So, the rule here is that NAT allows ICMP to use 10.0.0.0/16
as source address, even if such IPs are not configured. Of course,
one may use ip addr add 10.0.0.0/16 dev lo for such a case, and maybe all
such examples can be solved somehow; I simply haven't tried such setups.

 To summarize, what can help is a flag (eg. RT_ANYSRC) to
ip_route_output* that all special users can provide to skip the
check, for example:
- RTCF_LOCAL packets in icmp_send() can avoid the check
- NAT can avoid the check (ip_route_me_harder can be simplified?)

 Currently, all callers use the check, so maybe the goal can be
to start with a small set of callers that can set the new flag. It looks
like we can save some CPU cycles too; ip_route_me_harder looks too
overloaded.
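
The flag idea can be sketched as a small user-space model. Everything here is hypothetical: MODEL_RT_ANYSRC only mirrors the RT_ANYSRC name suggested above, and check_source() stands in for the source address validation done during the output route lookup; none of this is existing kernel API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_RT_ANYSRC 0x1u	/* hypothetical flag, after the RT_ANYSRC idea */

/* Addresses configured as local on the model router (192.168.0.254 only). */
static const uint32_t local_addrs[] = { 0xC0A800FEu };

static bool addr_is_local(uint32_t addr)
{
	size_t i;

	for (i = 0; i < sizeof(local_addrs) / sizeof(local_addrs[0]); i++)
		if (local_addrs[i] == addr)
			return true;
	return false;
}

/*
 * Model of the source address check: returns 0 if the lookup may go
 * ahead with saddr, -1 otherwise. Callers that legitimately use a
 * non-local source (ICMP for a VIP, NAT) pass MODEL_RT_ANYSRC to skip
 * the check; saddr == 0 means "let the router choose".
 */
static int check_source(uint32_t saddr, unsigned int flags)
{
	if (saddr == 0)
		return 0;
	if (flags & MODEL_RT_ANYSRC)
		return 0;
	return addr_is_local(saddr) ? 0 : -1;
}
```

In the NAT example, 10.0.0.1 (0x0A000001) is rejected without the flag and accepted with it, while ordinary callers keep the existing check.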

  I think that your patch looks good, assuming that inet_addr_type(VIP)
  is going to return RTN_LOCAL (except in the unlikely case that VIP is
  multicast or something silly like that).
 
 I'm not familiar with the IPVS terms, but as far as I understand,
 it is _not_ going to return RTN_LOCAL, so we get the desired
 behaviour of selecting a local address as source.

But what is preferred is to use VIP in ICMP.

ip route add local VIP dev lo table user_defined

returns RTCF_LOCAL but inet_addr_type() does not return RTN_LOCAL,
we fix one thing but break another :)

Regards

--
Julian Anastasov [EMAIL PROTECTED]


[PATCH] [e1000] Lower the MSI unavailable message to INFO priority

2007-05-15 Thread H. Peter Anvin
Currently, if MSI is enabled but unavailable the e1000 prints an error
message "Unable to allocate MSI interrupt Error" with ERR priority.
This is confusing to users since this is not a functionality error;
the driver will immediately afterwards try to acquire a conventional
PIC/APIC interrupt and will print another message if that fails.

Accordingly, lower the priority of this message to INFO priority, since
it does not reflect any sort of loss of functionality, but rather just a
limitation of the configuration of the runtime system.

Signed-off-by: H. Peter Anvin [EMAIL PROTECTED]
---
 drivers/net/e1000/e1000_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 637ae8f..089ae3f 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -307,7 +307,7 @@ static int e1000_request_irq(struct e1000_adapter *adapter)
 	if (adapter->hw.mac_type >= e1000_82571) {
 		adapter->have_msi = TRUE;
 		if ((err = pci_enable_msi(adapter->pdev))) {
-			DPRINTK(PROBE, ERR,
+			DPRINTK(PROBE, INFO,
 			 "Unable to allocate MSI interrupt Error: %d\n", err);
 			adapter->have_msi = FALSE;
 		}
-- 
1.5.1.4
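
The fallback behaviour described in the commit message can be illustrated with a minimal user-space model. This is a sketch only: pick_irq_mode(), the enum and the two stub callbacks are invented for the example and do not correspond to e1000 or PCI core functions.

```c
/* Interrupt delivery mode chosen by the model driver. */
enum irq_mode { IRQ_MODE_MSI, IRQ_MODE_LEGACY, IRQ_MODE_NONE };

typedef int (*enable_fn)(void);	/* returns 0 on success, nonzero on failure */

/* Stub callbacks standing in for "try MSI" and "try legacy IRQ". */
static int always_ok(void)   { return 0; }
static int always_fail(void) { return -1; }

/*
 * Try MSI first. A failure there is not an error by itself: the
 * legacy interrupt is tried next, and only if that also fails does
 * the device end up without an interrupt.
 */
static enum irq_mode pick_irq_mode(enable_fn try_msi, enable_fn try_legacy)
{
	if (try_msi() == 0)
		return IRQ_MODE_MSI;
	if (try_legacy() == 0)
		return IRQ_MODE_LEGACY;
	return IRQ_MODE_NONE;
}
```

Only the IRQ_MODE_NONE outcome represents a loss of functionality, which is the argument for not reporting the MSI failure at ERR priority.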



[PATCH 1/1] [TIPC]: Fixed erroneous introduction of for_each_netdev

2007-05-15 Thread Jon Paul Maloy

Signed-off-by: Jon Paul Maloy [EMAIL PROTECTED]
---
 net/tipc/eth_media.c |   10 ++
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
index 0ee6ded..c73c206 100644
--- a/net/tipc/eth_media.c
+++ b/net/tipc/eth_media.c
@@ -120,18 +120,20 @@ static int recv_msg(struct sk_buff *buf, struct net_device *dev,
 
 static int enable_bearer(struct tipc_bearer *tb_ptr)
 {
-	struct net_device *dev, *pdev;
+	struct net_device *dev = NULL;
+	struct net_device *pdev = NULL;
 	struct eth_bearer *eb_ptr = &eth_bearers[0];
 	struct eth_bearer *stop = &eth_bearers[MAX_ETH_BEARERS];
 	char *driver_name = strchr((const char *)tb_ptr->name, ':') + 1;
 
 	/* Find device with specified name */
-	dev = NULL;
-	for_each_netdev(pdev)
-		if (!strncmp(dev->name, driver_name, IFNAMSIZ)) {
+
+	for_each_netdev(pdev){
+		if (!strncmp(pdev->name, driver_name, IFNAMSIZ)) {
 			dev = pdev;
 			break;
 		}
+}
 	if (!dev)
 		return -ENODEV;
 
-- 
1.5.0.5
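
The bug fixed above is easy to reproduce in a user-space model of the lookup loop. The sketch below assumes a hand-rolled singly linked list; struct model_netdev, MODEL_IFNAMSIZ and find_dev() are illustrative names and not the kernel's for_each_netdev() machinery.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_IFNAMSIZ 16	/* stand-in for IFNAMSIZ */

/* Minimal stand-in for struct net_device: a name and a list link. */
struct model_netdev {
	char name[MODEL_IFNAMSIZ];
	struct model_netdev *next;
};

/* Return the device called name, or NULL if the list has no such entry. */
static struct model_netdev *find_dev(struct model_netdev *head,
				     const char *name)
{
	struct model_netdev *dev = NULL;	/* result */
	struct model_netdev *pdev;		/* iterator */

	for (pdev = head; pdev != NULL; pdev = pdev->next) {
		/*
		 * Compare against the iterator. Using dev here, as the
		 * broken version did, would dereference a NULL pointer.
		 */
		if (!strncmp(pdev->name, name, MODEL_IFNAMSIZ)) {
			dev = pdev;
			break;
		}
	}
	return dev;
}
```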



Re: [PATCH 1/1] [TIPC]: Fixed erroneous introduction of for_each_netdev

2007-05-15 Thread David Miller
From: Jon Paul Maloy [EMAIL PROTECTED]
Date: Tue, 15 May 2007 20:21:14 -0400

 
 Signed-off-by: Jon Paul Maloy [EMAIL PROTECTED]

Sorry about that Jon, I thought the new code was correct :-/


Re: [PATCH] [e1000] Lower the MSI unavailable message to INFO priority

2007-05-15 Thread Jeff Garzik

H. Peter Anvin wrote:

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 637ae8f..089ae3f 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -307,7 +307,7 @@ static int e1000_request_irq(struct e1000_adapter *adapter)
 	if (adapter->hw.mac_type >= e1000_82571) {
 		adapter->have_msi = TRUE;
 		if ((err = pci_enable_msi(adapter->pdev))) {
-			DPRINTK(PROBE, ERR,
+			DPRINTK(PROBE, INFO,
 			 "Unable to allocate MSI interrupt Error: %d\n", err);
 			adapter->have_msi = FALSE;



Actually, it should not print any message at all.

pci_enable_msi() failure is a normal event (as you point out).  Even at 
KERN_INFO level, the message is still misleading.


Jeff




Re: [WIP] [PATCH] WAS Re: [RFC] New driver API to speed up small packets xmits

2007-05-15 Thread jamal
On Tue, 2007-15-05 at 18:48 -0400, jamal wrote:

 I will try to post the e1000 patch tonight or tomorrow morning.

I have the e1000 path done; a few features from the 2.6.18 are missing
(mainly the one mucking with tx ring pruning on the tx path).
While it compiles and looks right, I haven't tested it and won't have
time for another day or so. However, if anyone wants it in its current
form, let me know and I will email you privately.

cheers,
jamal 



[PATCH (take 2)] netdev: lockdep classes in register_netdevice Re: [patch 04/13] ppp_generic: fix lockdep warning

2007-05-15 Thread Jarek Poplawski
Sorry - I've forgotten about something very important!
(Plus a small change in the diff.)

Jarek P.

--- (take 2)

After initializing dev->_xmit_lock register_netdevice()
sets lockdep class according to dev->type.

Idea of this patch - by David Miller.

Reported & tested by: Yuriy N. Shkandybin [EMAIL PROTECTED]
Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]

---


diff -Nurp 2.6.22-/net/core/dev.c 2.6.22/net/core/dev.c
--- 2.6.22-/net/core/dev.c	2007-05-14 20:26:16.000000000 +0200
+++ 2.6.22/net/core/dev.c	2007-05-16 07:35:22.000000000 +0200
@@ -116,6 +116,7 @@
 #include <linux/dmaengine.h>
 #include <linux/err.h>
 #include <linux/ctype.h>
+#include <linux/if_arp.h>
 
 /*
  * The list of packet types we will receive (as opposed to discard)
@@ -217,6 +218,73 @@ extern void netdev_unregister_sysfs(stru
 #define	netdev_unregister_sysfs(dev)	do { } while(0)
 #endif
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+/*
+ * register_netdevice() inits dev->_xmit_lock and sets lockdep class
+ * according to dev->type
+ */
+static const unsigned short netdev_lock_type[] =
+   {ARPHRD_NETROM, ARPHRD_ETHER, ARPHRD_EETHER, ARPHRD_AX25,
+ARPHRD_PRONET, ARPHRD_CHAOS, ARPHRD_IEEE802, ARPHRD_ARCNET,
+ARPHRD_APPLETLK, ARPHRD_DLCI, ARPHRD_ATM, ARPHRD_METRICOM,
+ARPHRD_IEEE1394, ARPHRD_EUI64, ARPHRD_INFINIBAND, ARPHRD_SLIP,
+ARPHRD_CSLIP, ARPHRD_SLIP6, ARPHRD_CSLIP6, ARPHRD_RSRVD,
+ARPHRD_ADAPT, ARPHRD_ROSE, ARPHRD_X25, ARPHRD_HWX25,
+ARPHRD_PPP, ARPHRD_CISCO, ARPHRD_LAPB, ARPHRD_DDCMP,
+ARPHRD_RAWHDLC, ARPHRD_TUNNEL, ARPHRD_TUNNEL6, ARPHRD_FRAD,
+ARPHRD_SKIP, ARPHRD_LOOPBACK, ARPHRD_LOCALTLK, ARPHRD_FDDI,
+ARPHRD_BIF, ARPHRD_SIT, ARPHRD_IPDDP, ARPHRD_IPGRE,
+ARPHRD_PIMREG, ARPHRD_HIPPI, ARPHRD_ASH, ARPHRD_ECONET,
+ARPHRD_IRDA, ARPHRD_FCPP, ARPHRD_FCAL, ARPHRD_FCPL,
+ARPHRD_FCFABRIC, ARPHRD_IEEE802_TR, ARPHRD_IEEE80211,
+ARPHRD_IEEE80211_PRISM, ARPHRD_IEEE80211_RADIOTAP, ARPHRD_VOID,
+ARPHRD_NONE};
+
+static const char *netdev_lock_name[] =
+	{"_xmit_NETROM", "_xmit_ETHER", "_xmit_EETHER", "_xmit_AX25",
+	 "_xmit_PRONET", "_xmit_CHAOS", "_xmit_IEEE802", "_xmit_ARCNET",
+	 "_xmit_APPLETLK", "_xmit_DLCI", "_xmit_ATM", "_xmit_METRICOM",
+	 "_xmit_IEEE1394", "_xmit_EUI64", "_xmit_INFINIBAND", "_xmit_SLIP",
+	 "_xmit_CSLIP", "_xmit_SLIP6", "_xmit_CSLIP6", "_xmit_RSRVD",
+	 "_xmit_ADAPT", "_xmit_ROSE", "_xmit_X25", "_xmit_HWX25",
+	 "_xmit_PPP", "_xmit_CISCO", "_xmit_LAPB", "_xmit_DDCMP",
+	 "_xmit_RAWHDLC", "_xmit_TUNNEL", "_xmit_TUNNEL6", "_xmit_FRAD",
+	 "_xmit_SKIP", "_xmit_LOOPBACK", "_xmit_LOCALTLK", "_xmit_FDDI",
+	 "_xmit_BIF", "_xmit_SIT", "_xmit_IPDDP", "_xmit_IPGRE",
+	 "_xmit_PIMREG", "_xmit_HIPPI", "_xmit_ASH", "_xmit_ECONET",
+	 "_xmit_IRDA", "_xmit_FCPP", "_xmit_FCAL", "_xmit_FCPL",
+	 "_xmit_FCFABRIC", "_xmit_IEEE802_TR", "_xmit_IEEE80211",
+	 "_xmit_IEEE80211_PRISM", "_xmit_IEEE80211_RADIOTAP", "_xmit_VOID",
+	 "_xmit_NONE"};
+
+static struct lock_class_key netdev_xmit_lock_key[ARRAY_SIZE(netdev_lock_type)];
+
+static inline unsigned short netdev_lock_pos(unsigned short dev_type)
+{
+   int i;
+
+	for (i = 0; i < ARRAY_SIZE(netdev_lock_type); i++)
+   if (netdev_lock_type[i] == dev_type)
+   return i;
+   /* the last key is used by default */
+   return ARRAY_SIZE(netdev_lock_type) - 1;
+}
+
+static inline void netdev_set_lockdep_class(spinlock_t *lock,
+   unsigned short dev_type)
+{
+   int i;
+
+   i = netdev_lock_pos(dev_type);
+	lockdep_set_class_and_name(lock, &netdev_xmit_lock_key[i],
+  netdev_lock_name[i]);
+}
+#else
+static inline void netdev_set_lockdep_class(spinlock_t *lock,
+   unsigned short dev_type)
+{
+}
+#endif
 
 
/***
 
@@ -3001,6 +3069,7 @@ int register_netdevice(struct net_device
 
 	spin_lock_init(&dev->queue_lock);
 	spin_lock_init(&dev->_xmit_lock);
+	netdev_set_lockdep_class(&dev->_xmit_lock, dev->type);
 	dev->xmit_lock_owner = -1;
 	spin_lock_init(&dev->ingress_lock);
 
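The type-to-class lookup used by the patch, including the "last entry is the default" rule, can be tried out in user space. This is a reduced sketch: it keeps only four of the ARPHRD_* values (copied here as plain constants rather than taken from <linux/if_arp.h>), and lock_pos() and lock_class_name() are illustrative names, not the functions added by the patch.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* A few ARPHRD_* values, copied as constants for the model. */
enum { T_ETHER = 1, T_PPP = 512, T_LOOPBACK = 772, T_NONE = 0xFFFE };

/* Parallel tables: device type and the lock class name used for it. */
static const unsigned short lock_type[] =
	{ T_ETHER, T_PPP, T_LOOPBACK, T_NONE };

static const char *const lock_name[] =
	{ "_xmit_ETHER", "_xmit_PPP", "_xmit_LOOPBACK", "_xmit_NONE" };

/* Index of dev_type in lock_type[]; unknown types map to the last entry. */
static size_t lock_pos(unsigned short dev_type)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(lock_type); i++)
		if (lock_type[i] == dev_type)
			return i;
	/* the last key is used by default */
	return ARRAY_SIZE(lock_type) - 1;
}

/* Name of the lock class a device of this type would be assigned. */
static const char *lock_class_name(unsigned short dev_type)
{
	return lock_name[lock_pos(dev_type)];
}
```

Because every type resolves to some index, devices with a type missing from the table share the final class instead of being left without one.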


Re: [PATCH] netdev: lockdep classes in register_netdevice Re: [patch 04/13] ppp_generic: fix lockdep warning

2007-05-15 Thread Jarek Poplawski
On Tue, May 15, 2007 at 12:49:47PM +0400, Yuriy N. Shkandybin wrote:
 I've patched 2.6.22-rc1 and there was no warnings from lock debugger.
 

So, you mean only this one patch - without previous vlan patch?
Very interesting...

Thanks once more,
Jarek P.


Re: [PATCH (take 2)] netdev: lockdep classes in register_netdevice Re: [patch 04/13] ppp_generic: fix lockdep warning

2007-05-15 Thread David Miller
From: Jarek Poplawski [EMAIL PROTECTED]
Date: Wed, 16 May 2007 07:40:00 +0200

 After initializing dev->_xmit_lock register_netdevice()
 sets lockdep class according to dev->type.
 
 Idea of this patch - by David Miller.
 
 Reported & tested by: Yuriy N. Shkandybin [EMAIL PROTECTED]
 Signed-off-by: Jarek Poplawski [EMAIL PROTECTED]

Patch applied, dziekuje bardzo.