Re: [PATCH][RFC] network splice receive

2007-06-09 Thread Jens Axboe
On Fri, Jun 08 2007, Evgeniy Polyakov wrote:
 On Fri, Jun 08, 2007 at 06:57:25PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
  I will try some things for the next 30-60 minutes, and then will leave on a
  canoe trip until Tuesday, so I will not be able to work on this idea.
 
 Ok, replacing in fs/splice.c every page_cache_release() with
 static void splice_page_release(struct page *p)
 {
 	if (!PageSlab(p))
 		page_cache_release(p);
 }

Ehm, I don't see why that should be necessary. Except in
splice_to_pipe(), I have considered that we need to pass in a release
function if mapping fails at some point. But it's probably best to do
that in the caller, since they have the knowledge of how to release the
pages.

The rest of the PageSlab() tests are bogus.

 and putting a cloned skb into the private field instead of the
 original one in spd_fill_page() no longer hangs the kernel.

Why? Seems pointless to allocate a clone just to hold on to the skb, a
reference should be equally good. I would not be opposed to doing it
this way, I just don't see what a clone buys us as compared to just
holding that reference to the skb.

 I'm not sure it is correct, that page can be released in fs/splice.c
 without calling any callback from network code, when network data is
 being processed.

Please explain!

 The received file is bigger than the file sent, and it sometimes contains
 repeated blocks of data. Using a cloned skb is likely too much overhead,
 although on receive the fast clone is unused in most cases, so there
 might be some gain.
 
 Attached your patch with above changes.

Thanks, I'll fiddle with this on Monday.

-- 
Jens Axboe
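
The suggestion above, that the caller should supply the release function
because it knows how its pages must be released, can be illustrated with a
small userspace sketch. Everything here is hypothetical (mock_page,
mock_splice_desc, the two hooks); it models the idea only and is not the
fs/splice.c interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; only a reference count is modelled. */
struct mock_page {
	int refcount;
	int is_slab;	/* models PageSlab(): memory owned by an skb */
};

/* Hypothetical descriptor: the caller supplies the release hook. */
struct mock_splice_desc {
	struct mock_page **pages;
	size_t nr_pages;
	void (*release)(struct mock_page *page);
};

/* Hook a page-cache caller might pass: drop one reference. */
static void release_page_cache(struct mock_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Hook a network caller might pass: skb-owned memory is left alone here. */
static void release_skb_backed(struct mock_page *page)
{
	(void)page;
}

/*
 * Generic code never tests the page type. On failure it hands every page
 * from index 'filled' onwards back through the caller's hook and returns
 * the number of pages it handed back.
 */
static size_t mock_release_unused(struct mock_splice_desc *spd, size_t filled)
{
	size_t i, released = 0;

	for (i = filled; i < spd->nr_pages; i++) {
		spd->release(spd->pages[i]);
		released++;
	}
	return released;
}
```

With this shape the PageSlab() test disappears from the generic path; each
caller encodes its own ownership rule in the hook it passes.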

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] RFC: have tcp_recvmsg() check kthread_should_stop() and treat it as if it were signalled

2007-06-09 Thread Jeff Layton
On Sat, 09 Jun 2007 11:30:04 +1000
Herbert Xu [EMAIL PROTECTED] wrote:

 Please cc networking patches to [EMAIL PROTECTED]
 
 Jeff Layton [EMAIL PROTECTED] wrote:
  
  The following patch is a first stab at removing this need. It makes it
  so that in tcp_recvmsg() we also check kthread_should_stop() at any
  point where we currently check to see if the task was signalled. If
  that returns true, then it acts as if it were signalled and returns to
  the calling function.
 
 This just doesn't seem to fit.  Why should networking care about kthreads?
 
 Perhaps you can get kthread_stop to send a signal instead?
 

The problem there is that we still have to make the kthread let signals
through. The nice thing about this approach is that we can make the
kthread ignore signals, but still allow it to break out of kernel_recvmsg
when a kthread_stop is done.

Though I will confess that you have a point about this feeling like a
layering violation...

-- 
Jeff Layton [EMAIL PROTECTED]
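
The behaviour described in this thread, treating a pending stop request
like a pending signal inside the receive wait loop, can be modelled in
userspace as below. This is only a sketch: mock_task and mock_recv are
hypothetical names, the loop stands in for the wait inside tcp_recvmsg(),
and -EINTR is used as a stand-in for whatever error the real code returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two conditions the patch treats alike. */
struct mock_task {
	bool signal_pending;	/* models signal_pending(current) */
	bool should_stop;	/* models kthread_should_stop() */
};

/*
 * Models a blocking receive: 'ready_after' is the number of wait
 * iterations before data arrives, 'stop_at' the iteration at which a
 * stop request is posted (negative for never).
 * Returns the byte count on success or -EINTR when interrupted.
 */
static int mock_recv(struct mock_task *t, int ready_after, int stop_at,
		     int nbytes)
{
	int i;

	assert(ready_after >= 0);
	for (i = 0; i < ready_after; i++) {
		if (i == stop_at)
			t->should_stop = true;
		/* The second test is added wherever the first is made. */
		if (t->signal_pending || t->should_stop)
			return -EINTR;
	}
	return nbytes;
}
```

The thread can keep all signals blocked (signal_pending stays false) and
still leave the loop once a stop is requested, which is the property the
message argues for.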


Re: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread Herbert Xu
On Fri, Jun 08, 2007 at 09:12:52AM -0400, jamal wrote:
 
 To mimic that behavior in LLTX, a driver needs to use the same lock on
 both tx and receive. e1000 holds a different lock on the tx path from the rx
 path. Maybe there's something clever I am missing; but it seems to be a
 bug in e1000.

It's both actually :)

It takes the tx_lock in the xmit routine as well as in the clean-up
routine.  However, the lock is only taken when it updates the queue
status.

Thanks to the ring buffer structure the rest of the clean-up/xmit code
will run concurrently just fine.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
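
The claim in the message above is that the ring-buffer layout lets the
xmit and clean-up paths touch different indices, so the lock is needed
only when the queue status changes. A single-threaded userspace model of
that claim follows. All names (mock_ring, mock_xmit, mock_clean,
RING_SIZE) are hypothetical, a counter stands in for the spinlock, and the
sketch does not reproduce the e1000 driver code.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8	/* hypothetical descriptor count */

struct mock_ring {
	unsigned int next_to_use;	/* written only by the xmit path */
	unsigned int next_to_clean;	/* written only by the clean-up path */
	bool queue_stopped;		/* shared: the only state under the lock */
	unsigned int lock_taken;	/* counts acquisitions, for illustration */
};

/* Free descriptors, keeping one slot empty to tell full from empty. */
static unsigned int ring_unused(const struct mock_ring *r)
{
	return (r->next_to_clean + RING_SIZE - r->next_to_use - 1) % RING_SIZE;
}

/* xmit path: claims a descriptor; locks only when it must stop the queue. */
static bool mock_xmit(struct mock_ring *r)
{
	if (ring_unused(r) == 0) {
		r->lock_taken++;	/* stands in for taking tx_lock */
		r->queue_stopped = true;
		return false;
	}
	r->next_to_use = (r->next_to_use + 1) % RING_SIZE;
	return true;
}

/* clean-up path: reclaims descriptors; locks only to wake the queue. */
static unsigned int mock_clean(struct mock_ring *r, unsigned int done)
{
	unsigned int n = 0;

	while (n < done && r->next_to_clean != r->next_to_use) {
		r->next_to_clean = (r->next_to_clean + 1) % RING_SIZE;
		n++;
	}
	if (n && r->queue_stopped) {
		r->lock_taken++;	/* stands in for taking tx_lock */
		r->queue_stopped = false;
	}
	return n;
}
```

In this model seven packets can be queued without the lock ever being
taken; it is acquired once to stop the queue and once to wake it.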


[PATCH] iproute2: Format IPv6 tunnels endpoints nicely.

2007-06-09 Thread David Lamparter
Change formatting of IPv6 tunnel endpoints from hex chain to standard IPv6
representation.

Signed-off-by: David Lamparter [EMAIL PROTECTED]

---
 lib/ll_addr.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/lib/ll_addr.c b/lib/ll_addr.c
index 581487d..f558050 100644
--- a/lib/ll_addr.c
+++ b/lib/ll_addr.c
@@ -38,6 +38,9 @@ const char *ll_addr_n2a(unsigned char *addr, int alen, int type, char *buf, int
 	    (type == ARPHRD_TUNNEL || type == ARPHRD_SIT || type == ARPHRD_IPGRE)) {
 		return inet_ntop(AF_INET, addr, buf, blen);
 	}
+	if (alen == 16 && type == ARPHRD_TUNNEL6) {
+		return inet_ntop(AF_INET6, addr, buf, blen);
+	}
 	l = 0;
 	for (i=0; i<alen; i++) {
 		if (i==0) {
-- 
1.5.0.1
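
The effect of the change can be seen in a small standalone program that
formats the same 16 bytes both ways. hex_chain() and tunnel6_n2a() are
hypothetical helpers written for this illustration (they are not the
iproute2 functions); inet_ntop() is the standard POSIX call the patch
uses.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: colon-separated hex bytes, the old presentation. */
static const char *hex_chain(const unsigned char *addr, int alen,
			     char *buf, int blen)
{
	int i, l = 0;

	assert(blen > 0);
	buf[0] = '\0';
	for (i = 0; i < alen && l < blen - 1; i++) {
		int n = snprintf(buf + l, blen - l, i ? ":%02x" : "%02x",
				 addr[i]);
		if (n < 0 || n >= blen - l)
			break;	/* out of room: stop at what fits */
		l += n;
	}
	return buf;
}

/* Hypothetical helper: 16-byte addresses get the standard IPv6 notation. */
static const char *tunnel6_n2a(const unsigned char *addr, int alen,
			       char *buf, int blen)
{
	if (alen == 16)
		return inet_ntop(AF_INET6, addr, buf, blen);
	return hex_chain(addr, alen, buf, blen);
}
```

For the bytes of 2001:db8::1, the hex chain is sixteen colon-separated
byte values, while inet_ntop() returns the compressed form "2001:db8::1".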



[RFC] [PATCHES] pktgen IPSEC 0/4

2007-06-09 Thread jamal

This is a set of patches that add ipsec functionality to pktgen. I have
lost these patches before - but they are now fully recovered and well
tested. Robert has glanced at the patches and seems to have no qualms
with them. I am soliciting feedback because I would like to push
them for 2.6.23 when Dave opens his tree.

[I think I may have figured out how the cool cats send their series of
patches using git format-patch but I am not sure I can trust my lil
soldier's config to do the right thing. So I will do them manually.
I am climbing up the git ladder folks, one step at a time!].


cheers,
jamal



[PKTGEN] Centralize packet overhead tracking

2007-06-09 Thread jamal
1 of 4.

cheers,
jamal

commit f7da845f37e3cd47be46697491210c126b37c8fc
Author: Jamal Hadi Salim [EMAIL PROTECTED]
Date:   Sat Jun 9 09:11:16 2007 -0400

[PKTGEN] Centralize packet overhead tracking
Track the extra packet overhead for VLAN tags, MPLS, IPSEC etc

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 9cd3a1c..1352316 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -228,6 +228,7 @@ struct pktgen_dev {
 
 	int min_pkt_size;	/* = ETH_ZLEN; */
 	int max_pkt_size;	/* = ETH_ZLEN; */
+	int pkt_overhead;	/* overhead for MPLS, VLANs, IPSEC etc */
 	int nfrags;
 	__u32 delay_us;	/* Default delay */
 	__u32 delay_ns;
@@ -2075,6 +2076,13 @@ static void spin(struct pktgen_dev *pkt_dev, __u64 spin_until_us)
 	pkt_dev->idle_acc += now - start;
 }
 
+static inline void set_pkt_overhead(struct pktgen_dev *pkt_dev)
+{
+	pkt_dev->pkt_overhead += pkt_dev->nr_labels*sizeof(u32);
+	pkt_dev->pkt_overhead += VLAN_TAG_SIZE(pkt_dev);
+	pkt_dev->pkt_overhead += SVLAN_TAG_SIZE(pkt_dev);
+}
+
+
 /* Increment/randomize headers according to flags and current values
  * for IP src/dest, UDP src/dst port, MAC-Addr src/dst
  */
@@ -2323,9 +2331,7 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 
 	datalen = (odev->hard_header_len + 16) & ~0xf;
 	skb = alloc_skb(pkt_dev->cur_pkt_size + 64 + datalen +
-			pkt_dev->nr_labels*sizeof(u32) +
-			VLAN_TAG_SIZE(pkt_dev) + SVLAN_TAG_SIZE(pkt_dev),
-			GFP_ATOMIC);
+			pkt_dev->pkt_overhead, GFP_ATOMIC);
 	if (!skb) {
 		sprintf(pkt_dev->result, "No memory");
 		return NULL;
@@ -2368,7 +2374,7 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 
 	/* Eth + IPh + UDPh + mpls */
 	datalen = pkt_dev->cur_pkt_size - 14 - 20 - 8 -
-		  pkt_dev->nr_labels*sizeof(u32) - VLAN_TAG_SIZE(pkt_dev) - SVLAN_TAG_SIZE(pkt_dev);
+		  pkt_dev->pkt_overhead;
 	if (datalen < sizeof(struct pktgen_hdr))
 		datalen = sizeof(struct pktgen_hdr);
 
@@ -2391,8 +2397,7 @@ static struct sk_buff *fill_packet_ipv4(struct net_device *odev,
 	iph->check = ip_fast_csum((void *)iph, iph->ihl);
 	skb->protocol = protocol;
 	skb->mac_header = (skb->network_header - ETH_HLEN -
-			   pkt_dev->nr_labels * sizeof(u32) -
-			   VLAN_TAG_SIZE(pkt_dev) - SVLAN_TAG_SIZE(pkt_dev));
+			   pkt_dev->pkt_overhead);
 	skb->dev = odev;
 	skb->pkt_type = PACKET_HOST;
 
@@ -2662,9 +2667,7 @@ static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	mod_cur_headers(pkt_dev);
 
 	skb = alloc_skb(pkt_dev->cur_pkt_size + 64 + 16 +
-			pkt_dev->nr_labels*sizeof(u32) +
-			VLAN_TAG_SIZE(pkt_dev) + SVLAN_TAG_SIZE(pkt_dev),
-			GFP_ATOMIC);
+			pkt_dev->pkt_overhead, GFP_ATOMIC);
 	if (!skb) {
 		sprintf(pkt_dev->result, "No memory");
 		return NULL;
@@ -2708,7 +2711,7 @@ static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	/* Eth + IPh + UDPh + mpls */
 	datalen = pkt_dev->cur_pkt_size - 14 -
 		  sizeof(struct ipv6hdr) - sizeof(struct udphdr) -
-		  pkt_dev->nr_labels*sizeof(u32) - VLAN_TAG_SIZE(pkt_dev) - SVLAN_TAG_SIZE(pkt_dev);
+		  pkt_dev->pkt_overhead;
 
 	if (datalen < sizeof(struct pktgen_hdr)) {
 		datalen = sizeof(struct pktgen_hdr);
@@ -2738,8 +2741,7 @@ static struct sk_buff *fill_packet_ipv6(struct net_device *odev,
 	ipv6_addr_copy(&iph->saddr, &pkt_dev->cur_in6_saddr);
 
 	skb->mac_header = (skb->network_header - ETH_HLEN -
-			   pkt_dev->nr_labels * sizeof(u32) -
-			   VLAN_TAG_SIZE(pkt_dev) - SVLAN_TAG_SIZE(pkt_dev));
+			   pkt_dev->pkt_overhead);
 	skb->protocol = protocol;
 	skb->dev = odev;
 	skb->pkt_type = PACKET_HOST;
@@ -2857,6 +2859,7 @@ static void pktgen_run(struct pktgen_thread *t)
 			pkt_dev->started_at = getCurUs();
 			pkt_dev->next_tx_us = getCurUs();	/* Transmit immediately */
 			pkt_dev->next_tx_ns = 0;
+			set_pkt_overhead(pkt_dev);
 
 			strcpy(pkt_dev->result, "Starting");
 			started++;
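
The idea of the patch, computing the encapsulation overhead once and
reusing the cached value in every size calculation, can be tried out in
userspace with the sketch below. The structure and helper names are
hypothetical, a negative VLAN id is used here to mean "no tag" (an
assumption of this sketch, not the pktgen convention), and the helper
zeroes the field before summing so that repeated calls do not accumulate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_TAG_SIZE 4		/* one 802.1Q tag, in bytes */

/* Hypothetical cut-down device state. */
struct mock_pktgen_dev {
	unsigned int nr_labels;	/* MPLS labels, 4 bytes each */
	int vlan_id;		/* negative: no VLAN tag (sketch convention) */
	int svlan_id;		/* negative: no service VLAN tag */
	size_t pkt_overhead;	/* cached total, as in the patch */
};

/* Recompute the cached overhead from scratch. */
static void mock_set_pkt_overhead(struct mock_pktgen_dev *d)
{
	assert(d != NULL);
	d->pkt_overhead = 0;
	d->pkt_overhead += d->nr_labels * sizeof(uint32_t);
	d->pkt_overhead += d->vlan_id >= 0 ? MOCK_TAG_SIZE : 0;
	d->pkt_overhead += d->svlan_id >= 0 ? MOCK_TAG_SIZE : 0;
}

/*
 * Allocation size in the style of fill_packet_ipv4(): payload, 64 bytes
 * of slack, link header room rounded to 16 bytes, plus cached overhead.
 */
static size_t mock_alloc_size(const struct mock_pktgen_dev *d,
			      size_t cur_pkt_size, size_t hard_header_len)
{
	size_t datalen = (hard_header_len + 16) & ~(size_t)0xf;

	return cur_pkt_size + 64 + datalen + d->pkt_overhead;
}
```

With two MPLS labels and one VLAN tag the cached overhead is 12 bytes, and
it stays 12 however often the helper is called.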


[PKTGEN] Introduce sequential flows

2007-06-09 Thread jamal
2 of 4.

cheers,
jamal

commit d0d2c0c2e5539a54d66f07d2fa99bb52c19cc698
Author: Jamal Hadi Salim [EMAIL PROTECTED]
Date:   Sat Jun 9 09:12:21 2007 -0400

[PKTGEN] Introduce sequential flows

By default all flows in pktgen are randomly selected.
This patch introduces ability to have all defined flows to
be sent sequentially. It also cleans the small piece of code
associated with the change for readability.

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 1352316..2e861d2 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -181,6 +181,7 @@
 #define F_MPLS_RND    (1<<8)	/* Random MPLS labels */
 #define F_VID_RND     (1<<9)	/* Random VLAN ID */
 #define F_SVID_RND    (1<<10)	/* Random SVLAN ID */
+#define F_FLOW_RND    (1<<11)	/* Random flows */
 
 /* Thread control flag bits */
 #define T_TERMINATE   (1<<0)
@@ -207,8 +208,12 @@ static struct proc_dir_entry *pg_proc_dir = NULL;
 struct flow_state {
 	__be32 cur_daddr;
 	int count;
+	__u32 flags;
 };
 
+/* flow flag bits */
+#define F_INIT   (1<<0)		/* flow has been initialized */
+
 struct pktgen_dev {
 	/*
 	 * Try to keep frequent/infrequent used vars. separated.
@@ -342,6 +347,7 @@ struct pktgen_dev {
 	unsigned cflows;	/* Concurrent flows (config) */
 	unsigned lflow;		/* Flow length  (config) */
 	unsigned nflows;	/* accumulated flows (stats) */
+	unsigned curfl;		/* current sequenced flow (state)*/
 
 	char result[512];
 };
@@ -691,6 +697,11 @@ static int pktgen_if_show(struct seq_file *seq, void *v)
 	if (pkt_dev->flags & F_MPLS_RND)
 		seq_printf(seq,  "MPLS_RND  ");
 
+	if (pkt_dev->flags & F_FLOW_RND)
+		seq_printf(seq,  "FLOW_RND  ");
+	else
+		seq_printf(seq,  "FLOW_SEQ  "); /*in sequence flows*/
+
 	if (pkt_dev->flags & F_MACSRC_RND)
 		seq_printf(seq, "MACSRC_RND  ");
 
@@ -1182,6 +1193,9 @@ static ssize_t pktgen_if_write(struct file *file,
 		else if (strcmp(f, "!SVID_RND") == 0)
 			pkt_dev->flags &= ~F_SVID_RND;
 
+		else if (strcmp(f, "FLOW_RND") == 0)
+			pkt_dev->flags |= F_FLOW_RND;
+
 		else if (strcmp(f, "!IPV6") == 0)
 			pkt_dev->flags &= ~F_IPV6;
 
@@ -1190,7 +1204,7 @@ static ssize_t pktgen_if_write(struct file *file,
 				"Flag -:%s:- unknown\nAvailable flags, (prepend ! to un-set flag):\n%s",
 				f,
 				"IPSRC_RND, IPDST_RND, UDPSRC_RND, UDPDST_RND, "
-				"MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, MPLS_RND, VID_RND, SVID_RND\n");
+				"MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, MPLS_RND, VID_RND, SVID_RND, FLOW_RND\n");
 			return count;
 		}
 		sprintf(pg_result, "OK: flags=0x%x", pkt_dev->flags);
@@ -2083,6 +2097,37 @@ static inline void set_pkt_overhead(struct pktgen_dev *pkt_dev)
 	pkt_dev->pkt_overhead += SVLAN_TAG_SIZE(pkt_dev);
 }
 
+static inline int f_seen(struct pktgen_dev *pkt_dev, int flow)
+{
+
+	if (pkt_dev->flows[flow].flags & F_INIT)
+		return 1;
+	else
+		return 0;
+}
+
+static inline int f_pick(struct pktgen_dev *pkt_dev)
+{
+	int flow = pkt_dev->curfl;
+
+	if (pkt_dev->flags & F_FLOW_RND) {
+		flow = random32() % pkt_dev->cflows;
+
+		if (pkt_dev->flows[flow].count > pkt_dev->lflow)
+			pkt_dev->flows[flow].count = 0;
+	} else {
+		if (pkt_dev->flows[flow].count >= pkt_dev->lflow) {
+			/* reset time */
+			pkt_dev->flows[flow].count = 0;
+			pkt_dev->curfl += 1;
+			if (pkt_dev->curfl >= pkt_dev->cflows)
+				pkt_dev->curfl = 0; /*reset */
+		}
+	}
+
+	return pkt_dev->curfl;
+}
+
 /* Increment/randomize headers according to flags and current values
  * for IP src/dest, UDP src/dst port, MAC-Addr src/dst
  */
@@ -2092,12 +2137,8 @@ static void mod_cur_headers(struct pktgen_dev *pkt_dev)
 	__u32 imx;
 	int flow = 0;
 
-	if (pkt_dev->cflows) {
-		flow = random32() % pkt_dev->cflows;
-
-		if (pkt_dev->flows[flow].count > pkt_dev->lflow)
-			pkt_dev->flows[flow].count = 0;
-	}
+	if (pkt_dev->cflows)
+		flow = f_pick(pkt_dev);
 
 	/*  Deal with source MAC */
 	if (pkt_dev->src_mac_count > 1) {
@@ -2213,7 +2254,7 @@ static void mod_cur_headers(struct pktgen_dev *pkt_dev)
 			pkt_dev->cur_saddr = htonl(t);
 		}
 
-		if (pkt_dev->cflows && pkt_dev->flows[flow].count != 0) {
+		if (pkt_dev->cflows && f_seen(pkt_dev, flow)) {
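
The flow selection logic can be exercised in userspace with the sketch
below. The types and names are hypothetical and rand() replaces
random32(). One deliberate difference: f_pick() as quoted above returns
pkt_dev->curfl in both modes, so the random pick does not reach the
caller; the sketch stores the pick in curfl before returning, which is an
assumption about the intent rather than what the patch does.

```c
#include <assert.h>
#include <stdlib.h>

#define MOCK_F_FLOW_RND (1u << 11)	/* same bit the patch assigns */

struct mock_flow {
	int count;		/* packets sent on this flow so far */
};

/* Hypothetical cut-down device state. */
struct mock_dev {
	unsigned int flags;
	unsigned int cflows;	/* number of concurrent flows */
	int lflow;		/* packets per flow before moving on */
	unsigned int curfl;	/* current flow */
	struct mock_flow *flows;
};

/* Pick the flow for the next packet; the caller increments its count. */
static unsigned int mock_f_pick(struct mock_dev *d)
{
	unsigned int flow = d->curfl;

	assert(d->cflows > 0);
	if (d->flags & MOCK_F_FLOW_RND) {
		flow = (unsigned int)rand() % d->cflows;
		if (d->flows[flow].count > d->lflow)
			d->flows[flow].count = 0;
		d->curfl = flow;	/* assumption: remember the random pick */
	} else if (d->flows[flow].count >= d->lflow) {
		/* flow exhausted: reset it and advance, wrapping at the end */
		d->flows[flow].count = 0;
		d->curfl = (d->curfl + 1) % d->cflows;
	}
	return d->curfl;
}
```

With three flows and a flow length of two, sequential mode yields flows
0, 0, 1, 1, 2, 2 and then wraps back to 0.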
  

[XFRM] Introduce standalone SAD lookup

2007-06-09 Thread jamal
3 of 4.

cheers,
jamal
commit 923d6c49f9f513da41e4bfd8188304787a5c8093
Author: Jamal Hadi Salim [EMAIL PROTECTED]
Date:   Sat Jun 9 09:16:12 2007 -0400

[XFRM] Introduce standalone SAD lookup
This allows other in-kernel functions to do SAD lookups.
The only known user at the moment is pktgen.

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 311f25a..79d2c37 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -920,6 +920,10 @@ extern struct xfrm_state *xfrm_state_find(xfrm_address_t *daddr, xfrm_address_t
 					  struct flowi *fl, struct xfrm_tmpl *tmpl,
 					  struct xfrm_policy *pol, int *err,
 					  unsigned short family);
+extern struct xfrm_state * xfrm_stateonly_find(xfrm_address_t *daddr,
+  xfrm_address_t *saddr,
+  unsigned short family,
+  u8 mode, u8 proto, u32 reqid);
 extern int xfrm_state_check_expire(struct xfrm_state *x);
 extern void xfrm_state_insert(struct xfrm_state *x);
 extern int xfrm_state_add(struct xfrm_state *x);
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 85f3f43..b8562e4 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -686,6 +686,41 @@ out:
 	return x;
 }
 
+struct xfrm_state *
+xfrm_stateonly_find(xfrm_address_t *daddr, xfrm_address_t *saddr,
+		    unsigned short family, u8 mode, u8 proto, u32 reqid)
+{
+	unsigned int h = xfrm_dst_hash(daddr, saddr, reqid, family);
+	struct xfrm_state *rx = NULL, *x = NULL;
+	struct hlist_node *entry;
+
+	spin_lock(&xfrm_state_lock);
+	hlist_for_each_entry(x, entry, xfrm_state_bydst+h, bydst) {
+		if (x->props.family == family &&
+		    x->props.reqid == reqid &&
+		    !(x->props.flags & XFRM_STATE_WILDRECV) &&
+		    xfrm_state_addr_check(x, daddr, saddr, family) &&
+		    mode == x->props.mode &&
+		    proto == x->id.proto)  {
+
+			if (x->km.state != XFRM_STATE_VALID)
+				continue;
+			else {
+				rx = x;
+				break;
+			}
+		}
+	}
+
+	if (rx)
+		xfrm_state_hold(rx);
+	spin_unlock(&xfrm_state_lock);
+
+
+	return rx;
+}
+EXPORT_SYMBOL(xfrm_stateonly_find);
+
 static void __xfrm_state_insert(struct xfrm_state *x)
 {
 	unsigned int h;
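
The lookup pattern in this patch (walk one hash bucket, skip entries that
match the keys but are not in the VALID state, and take a reference on the
hit before the lock is dropped) can be modelled in userspace as follows.
All types and names are hypothetical, addresses are reduced to 32-bit
integers, and locking is indicated only by a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mock_km_state { MOCK_STATE_ACQ, MOCK_STATE_VALID, MOCK_STATE_DEAD };

/* Hypothetical cut-down state entry chained in one hash bucket. */
struct mock_state {
	struct mock_state *next;
	uint32_t daddr, saddr, reqid;
	uint8_t mode, proto;
	enum mock_km_state km_state;
	int refcnt;
};

/* Return the first VALID entry matching all keys, with a reference taken. */
static struct mock_state *
mock_stateonly_find(struct mock_state *bucket, uint32_t daddr, uint32_t saddr,
		    uint8_t mode, uint8_t proto, uint32_t reqid)
{
	struct mock_state *x;

	/* In the kernel, the walk and the hold both run under the lock. */
	for (x = bucket; x != NULL; x = x->next) {
		if (x->daddr != daddr || x->saddr != saddr ||
		    x->reqid != reqid || x->mode != mode || x->proto != proto)
			continue;
		if (x->km_state != MOCK_STATE_VALID)
			continue;	/* matching but unusable: keep looking */
		assert(x->refcnt > 0);
		x->refcnt++;		/* caller must drop this reference */
		return x;
	}
	return NULL;
}
```

Taking the reference before releasing the lock is what keeps the returned
entry from being freed between the lookup and its first use.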


[PKTGEN] IPSEC support

2007-06-09 Thread jamal
4 of 4

cheers,
jamal
commit d1d8ea490a517df484e6774c4f41123ccde52434
Author: Jamal Hadi Salim [EMAIL PROTECTED]
Date:   Sat Jun 9 09:46:52 2007 -0400

[PKTGEN] IPSEC support
Added transport mode ESP support for starters.
I will send more of these modes and types once I have resolved
the tunnel mode issues.

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 2e861d2..2ef80aa 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -152,6 +152,9 @@
 #include <net/checksum.h>
 #include <net/ipv6.h>
 #include <net/addrconf.h>
+#ifdef CONFIG_XFRM
+#include <net/xfrm.h>
+#endif
 #include <asm/byteorder.h>
 #include <linux/rcupdate.h>
 #include <asm/bitops.h>
@@ -182,6 +185,7 @@
 #define F_VID_RND     (1<<9)	/* Random VLAN ID */
 #define F_SVID_RND    (1<<10)	/* Random SVLAN ID */
 #define F_FLOW_RND    (1<<11)	/* Random flows */
+#define F_IPSEC_ON    (1<<12)	/* ipsec on for flows */
 
 /* Thread control flag bits */
 #define T_TERMINATE   (1<<0)
@@ -208,6 +212,9 @@ static struct proc_dir_entry *pg_proc_dir = NULL;
 struct flow_state {
 	__be32 cur_daddr;
 	int count;
+#ifdef CONFIG_XFRM
+	struct xfrm_state *x;
+#endif
 	__u32 flags;
 };
 
@@ -348,7 +355,10 @@ struct pktgen_dev {
 	unsigned lflow;		/* Flow length  (config) */
 	unsigned nflows;	/* accumulated flows (stats) */
 	unsigned curfl;		/* current sequenced flow (state)*/
-
+#ifdef CONFIG_XFRM
+	__u8	ipsmode;	/* IPSEC mode (config) */
+	__u8	ipsproto;	/* IPSEC type (config) */
+#endif
 	char result[512];
 };
 
@@ -702,6 +712,9 @@ static int pktgen_if_show(struct seq_file *seq, void *v)
 	else
 		seq_printf(seq,  "FLOW_SEQ  "); /*in sequence flows*/
 
+	if (pkt_dev->flags & F_IPSEC_ON)
+		seq_printf(seq,  "IPSEC  ");
+
 	if (pkt_dev->flags & F_MACSRC_RND)
 		seq_printf(seq, "MACSRC_RND  ");
 
@@ -1196,6 +1209,11 @@ static ssize_t pktgen_if_write(struct file *file,
 		else if (strcmp(f, "FLOW_RND") == 0)
 			pkt_dev->flags |= F_FLOW_RND;
 
+#ifdef CONFIG_XFRM
+		else if (strcmp(f, "IPSEC") == 0)
+			pkt_dev->flags |= F_IPSEC_ON;
+#endif
+
 		else if (strcmp(f, "!IPV6") == 0)
 			pkt_dev->flags &= ~F_IPV6;
 
@@ -1204,7 +1222,7 @@ static ssize_t pktgen_if_write(struct file *file,
 				"Flag -:%s:- unknown\nAvailable flags, (prepend ! to un-set flag):\n%s",
 				f,
 				"IPSRC_RND, IPDST_RND, UDPSRC_RND, UDPDST_RND, "
-				"MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, MPLS_RND, VID_RND, SVID_RND, FLOW_RND\n");
+				"MACSRC_RND, MACDST_RND, TXSIZE_RND, IPV6, MPLS_RND, VID_RND, SVID_RND, FLOW_RND, IPSEC\n");
 			return count;
 		}
 		sprintf(pg_result, "OK: flags=0x%x", pkt_dev->flags);
@@ -2092,6 +2110,7 @@ static void spin(struct pktgen_dev *pkt_dev, __u64 spin_until_us)
 
 static inline void set_pkt_overhead(struct pktgen_dev *pkt_dev)
 {
+	pkt_dev->pkt_overhead = 0;
 	pkt_dev->pkt_overhead += pkt_dev->nr_labels*sizeof(u32);
 	pkt_dev->pkt_overhead += VLAN_TAG_SIZE(pkt_dev);
 	pkt_dev->pkt_overhead += SVLAN_TAG_SIZE(pkt_dev);
@@ -2128,6 +2147,31 @@ static inline int f_pick(struct pktgen_dev *pkt_dev)
 	return pkt_dev->curfl;
 }
 
+
+#ifdef CONFIG_XFRM
+/* If there was already an IPSEC SA, we keep it as is, else
+ * we go look for it ...
+*/
+inline
+void get_ipsec_sa(struct pktgen_dev *pkt_dev, int flow)
+{
+	struct xfrm_state *x = pkt_dev->flows[flow].x;
+	if (!x) {
+		/*slow path: we dont already have xfrm_state*/
+		x = xfrm_stateonly_find((xfrm_address_t *)&pkt_dev->cur_daddr,
+					(xfrm_address_t *)&pkt_dev->cur_saddr,
+					AF_INET,
+					pkt_dev->ipsmode,
+					pkt_dev->ipsproto, 0);
+		if (x) {
+			pkt_dev->flows[flow].x = x;
+			set_pkt_overhead(pkt_dev);
+			pkt_dev->pkt_overhead+=x->props.header_len;
+		}
+
+	}
+}
+#endif
 /* Increment/randomize headers according to flags and current values
  * for IP src/dest, UDP src/dst port, MAC-Addr src/dst
  */
@@ -2287,6 +2331,10 @@ static void mod_cur_headers(struct pktgen_dev *pkt_dev)
 			pkt_dev->flows[flow].flags |= F_INIT;
 			pkt_dev->flows[flow].cur_daddr =
 			    pkt_dev->cur_daddr;
+#ifdef CONFIG_XFRM
+			if (pkt_dev->flags & F_IPSEC_ON)
+				get_ipsec_sa(pkt_dev, flow);
+#endif

Re: [XFRM] Introduce standalone SAD lookup

2007-06-09 Thread jamal
Sorry, meant to cc Herbert and James since they commented two
generations ago.
Gents, if you manage to have the cycles please look at this specific
one. Herbert, for tunnel mode i think i will agree with you and
introduce a dst struct; but i will defer that to some later patch.

cheers,
jamal

On Sat, 2007-09-06 at 10:18 -0400, jamal wrote:
 3 of 4.
 
 cheers,
 jamal



Re: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread jamal
On Sat, 2007-09-06 at 21:08 +1000, Herbert Xu wrote:

 It takes the tx_lock in the xmit routine as well as in the clean-up
 routine.  However, the lock is only taken when it updates the queue
 status.
 
 Thanks to the ring buffer structure the rest of the clean-up/xmit code
 will run concurrently just fine.

I know you are a patient man Herbert - so please explain slowly (if that
doesn't make sense on email, then bear with me as usual) ;->

- it seems the cleverness is that some parts of the ring description are
written to on tx but not rx (and vice-versa), correct? For example, the
next_to_watch/use bits. If that's a yes - there at least should have been
a big fat comment in the code so nobody changes it;
- and even if that's the case,
a) then the tx_lock sounds unneeded, correct? (given the RUNNING
atomicity).
b) do you even need the adapter lock? ;-> given the nature of the NAPI
poll only one CPU can prune the descriptors.

I have tested with just getting rid of tx_lock and it worked fine. I
haven't tried removing the adapter lock.

cheers,
jamal





Re: networking busted in current -git ???

2007-06-09 Thread Trond Myklebust
On Fri, 2007-06-08 at 19:06 -0700, David Miller wrote:
 From: Trond Myklebust [EMAIL PROTECTED]
 Date: Fri, 08 Jun 2007 17:43:27 -0400
 
  It is not dhcp. I'm seeing the same bug with bog-standard ifup with a
  static address on an FC-6 machine.
  
  It appears to be something in the latest dump from davem to Linus, but I
  haven't yet had time to identify what.
 
 Linus's current tree should have this fixed.
 
 Let us know if this is not the case.

It appears to be working for me again.

Trond



RE: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread Leonid Grossman


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:netdev-
 [EMAIL PROTECTED] On Behalf Of Waskiewicz Jr, Peter P
 Sent: Wednesday, June 06, 2007 3:31 PM
 To: [EMAIL PROTECTED]; Patrick McHardy
 Cc: [EMAIL PROTECTED]; netdev@vger.kernel.org; [EMAIL PROTECTED]; Kok,
 Auke-jan H
 Subject: RE: [PATCH] NET: Multiqueue network device support.
 
  [Which of course leads to the complexity (and not optimizing
  for the common - which is single ring NICs)].
 
 The common for 100 Mbit and older 1Gbit is single ring NICs.  Newer
 PCI-X and PCIe NICs from 1Gbit to 10Gbit support multiple rings in the
 hardware, and it's all headed in that direction, so it's becoming the
 common case.

IMHO, in addition to current Intel and Neterion NICs, some/most upcoming
NICs are likely to be multiqueue, since virtualization is emerging as a
major driver for hw designs (there are other things of course that drive
hw, but these are complementary to multiqueue).

PCI-SIG IOV extensions for pci spec are almost done, and a typical NIC
(at least, typical 10GbE NIC that supports some subset of IOV) in the
near future is likely to have at least 8  independent channels with its
own tx/rx queue, MAC address, msi-x vector(s), reset that doesn't affect
other channels, etc.

Basically, each channel could be used as an independent NIC that just
happens to share pci bus and 10GbE PHY with other channels (but has
per-channel QoS and throughput guarantees).

In a non-virtualized system, such NICs could be used in a mode in which each
channel runs on one core; this may eliminate some locking...  This mode
will, btw, require deterministic session steering; the current hashing
approach in the patch is not sufficient. This is something we can
contribute once Peter's code is in.
In general, a consensus on kernel support for multiqueue NICs will be
beneficial since multiqueue HW is here and other stacks are already taking
advantage of it.
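
The distinction drawn above between hashing and deterministic session
steering can be sketched as a lookup table consulted before a hash
fallback. Everything below is hypothetical (the key layout, the table, the
hash function, the queue count) and is not taken from the multiqueue patch
under discussion; it only shows how a pinned session always reaches the
same queue, while other traffic lands wherever the hash puts it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_NUM_QUEUES 8	/* hypothetical number of hardware queues */

struct mock_flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* One explicit steering rule: this session goes to this queue. */
struct mock_steer_entry {
	struct mock_flow_key key;
	unsigned int queue;
	int in_use;
};

/* Hash fallback: spreads flows, but the queue is not chosen by anyone. */
static unsigned int mock_hash_queue(const struct mock_flow_key *k)
{
	uint32_t h = k->saddr ^ k->daddr ^
		     (((uint32_t)k->sport << 16) | k->dport);

	h ^= h >> 16;
	h *= 0x45d9f3bu;
	h ^= h >> 16;
	return h % MOCK_NUM_QUEUES;
}

static int mock_key_equal(const struct mock_flow_key *a,
			  const struct mock_flow_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Deterministic steering first, hash only for sessions without a rule. */
static unsigned int mock_select_queue(const struct mock_steer_entry *table,
				      size_t n, const struct mock_flow_key *k)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (table[i].in_use && mock_key_equal(&table[i].key, k)) {
			assert(table[i].queue < MOCK_NUM_QUEUES);
			return table[i].queue;
		}
	}
	return mock_hash_queue(k);
}
```

A linear table is used only to keep the sketch short; a real design would
need a faster lookup and a way to install and expire rules.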


[PATCH] PHY fixed driver: rework release path and update phy_id notation

2007-06-09 Thread Vitaly Bordug

The handling of the device_bind_driver() return code has been fixed.
A release() function has been added so that resources are freed
correctly; the release path is now clean.

Before the rework, it used to cause
 Device '[EMAIL PROTECTED]:1' does not have a release() function, it is broken
 and must be fixed.
 BUG: at drivers/base/core.c:104 device_release()
 
 Call Trace:  
  [802ec380] kobject_cleanup+0x53/0x7e
  [802ec3ab] kobject_release+0x0/0x9
  [802ecf3f] kref_put+0x74/0x81
  [8035493b] fixed_mdio_register_device+0x230/0x265
  [80564d31] fixed_init+0x1f/0x35
  [802071a4] init+0x147/0x2fb
  [80223b6e] schedule_tail+0x36/0x92
  [8020a678] child_rip+0xa/0x12
  [80311714] acpi_ds_init_one_object+0x0/0x83
  [8020705d] init+0x0/0x2fb
  [8020a66e] child_rip+0x0/0x12  


Also changed the notation of the fixed phy definition on the
mdio bus to the form speed+duplex, so that it can be used by
gianfar and ucc_geth, which define phy_id strictly as %d:%d
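
The new notation can be tried out in userspace with the sketch below. Both
helpers are hypothetical. MOCK_BUS_ID_SIZE is an assumed buffer size for
the illustration, and the decoding rule (duplex is the low bit of phy_id)
is an inference from the fact that 10, 100 and 1000 are all even; it is
not something the patch states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MOCK_BUS_ID_SIZE 20	/* assumed size, for this sketch only */

/* Format bus_id:phy_id with phy_id = speed + duplex, as the patch does. */
static int mock_fixed_bus_id(char *buf, size_t len, int number, int speed,
			     int duplex)
{
	assert(duplex == 0 || duplex == 1);
	return snprintf(buf, len, "%d:%d", number, speed + duplex);
}

/* Split an id produced above back into its parts; returns 0 on success. */
static int mock_parse_bus_id(const char *id, int *number, int *speed,
			     int *duplex)
{
	int phy_id;

	if (sscanf(id, "%d:%d", number, &phy_id) != 2)
		return -1;
	*duplex = phy_id & 1;	/* speeds are even, so the low bit is duplex */
	*speed = phy_id - *duplex;
	return 0;
}
```

A 100 Mbit full-duplex PHY on bus 0 becomes "0:101", and a 1000 Mbit
half-duplex PHY on bus 2 becomes "2:1000".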

Signed-off-by: Vitaly Bordug [EMAIL PROTECTED]

---

 drivers/net/phy/Kconfig |4 ++
 drivers/net/phy/fixed.c |   93 +++
 2 files changed, 57 insertions(+), 40 deletions(-)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 09b6f25..a938c48 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -71,4 +71,8 @@ config FIXED_MII_100_FDX
bool Emulation for 100M Fdx fixed PHY behavior
depends on FIXED_PHY
 
+config FIXED_MII_1000_FDX
+   bool Emulation for 1000M Fdx fixed PHY behavior
+   depends on FIXED_PHY
+
 endif # PHYLIB
diff --git a/drivers/net/phy/fixed.c b/drivers/net/phy/fixed.c
index 68c99b4..34b9111 100644
--- a/drivers/net/phy/fixed.c
+++ b/drivers/net/phy/fixed.c
@@ -187,12 +187,29 @@ static struct phy_driver fixed_mdio_driver = {
 	.driver = { .owner = THIS_MODULE,},
 };
 
+static void fixed_mdio_release (struct device * dev)
+{
+	struct phy_device *phydev = container_of(dev, struct phy_device, dev);
+	struct mii_bus *bus = phydev->bus;
+	struct fixed_info *fixed = bus->priv;
+
+	kfree(phydev);
+	kfree(bus->dev);
+	kfree(bus);
+	kfree(fixed->regs);
+	kfree(fixed);
+}
+
 /*-
  *  This func is used to create all the necessary stuff, bind
  * the fixed phy driver and register all it on the mdio_bus_type.
- * speed is either 10 or 100, duplex is boolean.
+ * speed is either 10 or 100 or 1000, duplex is boolean.
  * number is used to create multiple fixed PHYs, so that several devices can
  * utilize them simultaneously.
+ *
+ * The device on mdio bus will look like bus_id:phy_id,
+ * bus_id = number 
+ * phy_id = speed+duplex.
  
*-*/
 static int fixed_mdio_register_device(int number, int speed, int duplex)
 {
@@ -221,6 +238,12 @@ static int fixed_mdio_register_device(int number, int speed, int duplex)
 	}
 
 	fixed->regs = kzalloc(MII_REGS_NUM*sizeof(int), GFP_KERNEL);
+	if (NULL == fixed->regs) {
+		kfree(dev);
+		kfree(new_bus);
+		kfree(fixed);
+		return -ENOMEM;
+	}
 	fixed->regs_num = MII_REGS_NUM;
 	fixed->phy_status.speed = speed;
 	fixed->phy_status.duplex = duplex;
@@ -249,57 +272,43 @@ static int fixed_mdio_register_device(int number, int speed, int duplex)
 	fixed->phydev = phydev;
 
 	if(NULL == phydev) {
-		err = -ENOMEM;
-		goto device_create_fail;
+		kfree(dev);
+		kfree(new_bus);
+		kfree(fixed->regs);
+		kfree(fixed);
+		return -ENOMEM;
 	}
 
 	phydev->irq = PHY_IGNORE_INTERRUPT;
 	phydev->dev.bus = &mdio_bus_type;
 
-	if(number)
-		snprintf(phydev->dev.bus_id, BUS_ID_SIZE,
-				[EMAIL PROTECTED]:%d, number, speed, duplex);
-	else
-		snprintf(phydev->dev.bus_id, BUS_ID_SIZE,
-				[EMAIL PROTECTED]:%d, speed, duplex);
+	snprintf(phydev->dev.bus_id, BUS_ID_SIZE,
+		"%d:%d", number, speed + duplex);
+
 	phydev->bus = new_bus;
 
+	phydev->dev.driver = &fixed_mdio_driver.driver;
+	phydev->dev.release = fixed_mdio_release;
+
+	err = phydev->dev.driver->probe(&phydev->dev);
+	if(err < 0) {
+		printk(KERN_ERR "Phy %s: problems with fixed driver\n",
+				phydev->dev.bus_id);
+		kfree(phydev);
+		kfree(dev);
+		kfree(new_bus);
+		kfree(fixed->regs);
+		kfree(fixed);
+		return err;
+	}
+
 	err = device_register(&phydev->dev);
 	if(err) {
printk(KERN_ERR Phy %s failed 

[1/2] 2.6.22-rc4: known regressions with patches v2

2007-06-09 Thread Michal Piotrowski

Hi all,

Here is a list of some known regressions in 2.6.22-rc4
with patches available.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions



Unclassified

Subject: kernel BUG at arch/i386/kernel/cpu/perfctr-watchdog.c:126!
References : http://lkml.org/lkml/2007/6/3/60
Submitter  : Udo A. Steinberg [EMAIL PROTECTED]
Handled-By : Björn Steinbrink [EMAIL PROTECTED]
Patch  : http://lkml.org/lkml/2007/6/8/23
Status : patch available



Memory management

Subject: bug in i386 MTRR initialization
References : http://lkml.org/lkml/2007/5/19/93
Submitter  : Andrea Righi [EMAIL PROTECTED]
Status : patch available



Networking

Subject: OOPS iproute2/tc/u32_destroy in 2.6.22-rc3-git6
References : http://lkml.org/lkml/2007/6/3/66
Submitter  : Strobl Anton [EMAIL PROTECTED]
Handled-By : Patrick McHardy [EMAIL PROTECTED]
Patch  : http://lkml.org/lkml/2007/6/3/137
Status : patch available

Subject: no irda0 interface (2.6.21 was OK), smsc does not find chip
References : http://lkml.org/lkml/2007/6/3/16
Submitter  : Andrey Borzenkov [EMAIL PROTECTED]
Handled-By : Samuel Ortiz [EMAIL PROTECTED]
Bjorn Helgaas [EMAIL PROTECTED]
Patch  : http://lkml.org/lkml/2007/6/7/237
Status : patch was suggested



Regards,
Michal

--
Najbardziej brakowało mi twojego milczenia.
-- Andrzej Sapkowski Coś więcej
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread jamal
On Sat, 2007-09-06 at 10:58 -0400, Leonid Grossman wrote:

 IMHO, in addition to current Intel and Neterion NICs, some/most upcoming
 NICs are likely to be multiqueue, since virtualization emerges as a
 major driver for hw designs (there are other things of course that drive
 hw, but these are complementary to multiqueue).
 
 PCI-SIG IOV extensions for pci spec are almost done, and a typical NIC
 (at least, typical 10GbE NIC that supports some subset of IOV) in the
 near future is likely to have at least 8  independent channels with its
 own tx/rx queue, MAC address, msi-x vector(s), reset that doesn't affect
 other channels, etc.

Leonid - any relation between that and data center ethernet? i.e
http://www.ieee802.org/3/ar/public/0503/wadekar_1_0503.pdf
It seems to desire to do virtualization as well. 
Is there any open spec for PCI-SIG IOV?

 Basically, each channel could be used as an independent NIC that just
 happens to share pci bus and 10GbE PHY with other channels (but has
 per-channel QoS and throughput guarantees).

Sounds very similar to data centre ethernet - except data centre
ethernet seems to map channels to rings; whereas the scheme you
describe maps a channel essentially to a virtual nic which seems to read
in the common case as a single tx, single rx ring. Is that right? If
yes, we should be able to do the virtual nics today without any changes
really since each one appears as a separate NIC. It will be a matter of
probably boot time partitioning and parametrization to create virtual
nics (ex of priorities of each virtual NIC etc).

 In a non-virtualized system, such NICs could be used in a mode when each
 channel runs on one core; this may eliminate some locking...  This mode
 will require btw deterministic session steering, current hashing
 approach in the patch is not sufficient; this is something we can
 contribute once Peter's code is in. 

I can actually see how the PCI-SIG approach using virtual NIC approach
could run on multiple CPUs (since each is no different from a NIC that
we have today). And our current Linux steering would also work just
fine.

In the case of non-virtual NICs, i am afraid i dont think it is as easy
as simple session steering - if you want to be generic that is; you may
wanna consider a more complex connection tracking i.e a grouping of
sessions as the basis for steering to a tx ring (and therefore tying to
a specific CPU).
If you are an ISP or a data center with customers partitioned based on
simple subnets, then i can see a simple classification based on subnets
being tied to a hw ring/CPU. And in such cases simple flow control on a
per ring basis makes sense.
Have you guys experimented on the non-virtual case? And are you
doing the virtual case as a pair of tx/rx being a single virtual nic?

 In general, a consensus on kernel support for multiqueue NICs will be
 beneficial since multiqueue HW is here and other stacks already taking
 advantage of it. 

My main contention with Peter's approach has been the propagation of
flow control back to the qdisc queues. However, if this PCI-SIG
standard also desires such an approach, then that sheds a different
light on it.

cheers,
jamal



RE: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread Leonid Grossman


 -Original Message-
 From: J Hadi Salim [mailto:[EMAIL PROTECTED] On Behalf Of jamal
 Sent: Saturday, June 09, 2007 12:23 PM
 To: Leonid Grossman
 Cc: Waskiewicz Jr, Peter P; Patrick McHardy; [EMAIL PROTECTED];
 netdev@vger.kernel.org; [EMAIL PROTECTED]; Kok, Auke-jan H; Ramkrishna
 Vepa; Alex Aizman
 Subject: RE: [PATCH] NET: Multiqueue network device support.
 
 On Sat, 2007-09-06 at 10:58 -0400, Leonid Grossman wrote:
 
  IMHO, in addition to current Intel and Neterion NICs, some/most upcoming
  NICs are likely to be multiqueue, since virtualization emerges as a
  major driver for hw designs (there are other things of course that drive
  hw, but these are complementary to multiqueue).
 
  PCI-SIG IOV extensions for pci spec are almost done, and a typical NIC
  (at least, typical 10GbE NIC that supports some subset of IOV) in the
  near future is likely to have at least 8 independent channels with its
  own tx/rx queue, MAC address, msi-x vector(s), reset that doesn't affect
  other channels, etc.
 
 Leonid - any relation between that and data center ethernet? i.e
 http://www.ieee802.org/3/ar/public/0503/wadekar_1_0503.pdf
 It seems to desire to do virtualization as well.

Not really. This is a very old presentation; you probably saw some newer
PR on Convergence Enhanced Ethernet, Congestion Free Ethernet etc. 
These efforts are in very early stages and arguably orthogonal to
virtualization, but in general having per channel QoS (flow control is
just a part of it) is a good thing. 

 Is there any open spec for PCI-SIG IOV?

I don't think so, the actual specs and event presentations at
www.pcisig.org are members-only, although there are many PRs about early
IOV support that may shed some light on the features.  

But my point was that while virtualization capabilities of upcoming NICs
may be not even relevant to Linux, the multi-channel hw designs (a side
effect of virtualization push, if you will) will be there and a
non-virtualized stack can take advantage of them.

Actually, our current 10GbE NICs have most of such multichannel
framework already shipping (in pre-IOV fashion), so the programming
manual on the website can probably give you a pretty good idea of what
multi-channel 10GbE NICs may look like. 

 
  Basically, each channel could be used as an independent NIC that just
  happens to share pci bus and 10GbE PHY with other channels (but has
  per-channel QoS and throughput guarantees).
 
 Sounds very similar to data centre ethernet - except data centre
 ethernet seems to map channels to rings; whereas the scheme you
 describe maps a channel essentially to a virtual nic which seems to read
 in the common case as a single tx, single rx ring. Is that right? If
 yes, we should be able to do the virtual nics today without any changes
 really since each one appears as a separate NIC. It will be a matter of
 probably boot time partitioning and parametrization to create virtual
 nics (ex of priorities of each virtual NIC etc).

Right, this is one deployment scenario for a multi-channel NIC, and it
will require very few changes in the stack (couple extra IOCTLS would be
nice).
There are two reasons why you still may want to have a generic
multi-channel support/awareness in the stack: 
1. Some users may want to have single ip interface with multiple
channels.
2. While multi-channel NICs are likely to be many, only best-in-class
will make the hw channels completely independent and able to operate
as a separate nic. Other implementations may have some limitations, and
will work as multi-channel API compliant devices but not necessarily as
independent mac devices.
I agree though that supporting multi-channel APIs is a bigger effort.

 
  In a non-virtualized system, such NICs could be used in a mode when each
  channel runs on one core; this may eliminate some locking...  This mode
  will require btw deterministic session steering, current hashing
  approach in the patch is not sufficient; this is something we can
  contribute once Peter's code is in.
 
 I can actually see how the PCI-SIG approach using virtual NIC approach
 could run on multiple CPUs (since each is no different from a NIC that
 we have today). And our current Linux steering would also work just
 fine.
 
 In the case of non-virtual NICs, i am afraid i dont think it is as easy
 as simple session steering - if you want to be generic that is; you may
 wanna consider a more complex connection tracking i.e a grouping of
 sessions as the basis for steering to a tx ring (and therefore tying to
 a specific CPU).
 If you are an ISP or a data center with customers partitioned based on
 simple subnets, then i can see a simple classification based on subnets
 being tied to a hw ring/CPU. And in such cases simple flow control on a
 per ring basis makes sense.
 Have you guys experimented on the non-virtual case? And are you
 doing the virtual case as a pair of tx/rx being a single virtual nic?

To a degree. We have quite a bit of 

Re: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread Jeff Garzik

Leonid Grossman wrote:

But my point was that while virtualization capabilities of upcoming NICs
may be not even relevant to Linux, the multi-channel hw designs (a side
effect of virtualization push, if you will) will be there and a
non-virtualized stack can take advantage of them.



I'm looking at the current hardware virtualization efforts, and often 
grimacing.  A lot of these efforts assume that virtual PCI devices 
will be wonderful virtualization solutions, without stopping to think 
about global events that affect all such devices, such as silicon resets 
or errata workarounds.  In the real world, you wind up having to 
un-virtualize to deal with certain exceptional events.


But as you point out, these hardware virt efforts can bestow benefits on 
non-virtualized stacks.


Jeff


Re: [PATCH 1/4] NetXen: Fix link status messages

2007-06-09 Thread Jeff Garzik

Mithlesh Thukral wrote:

-	if ((netif_running(netdev)) && !netif_carrier_ok(netdev)) {
-		printk(KERN_INFO "%s port %d, %s carrier is now ok\n",
-		       netxen_nic_driver_name, adapter->portnum, netdev->name);
+	if ((netdev->flags & IFF_UP) && !netif_carrier_ok(netdev) &&
+			netxen_nic_link_ok(adapter) ) {
+		printk(KERN_INFO "%s %s (port %d), Link is up\n",
+		       netxen_nic_driver_name, netdev->name, adapter->portnum);
 		netif_carrier_on(netdev);
-	}
-
-	if (netif_queue_stopped(netdev))
 		netif_wake_queue(netdev);
+	} else if(!(netdev->flags & IFF_UP) && netif_carrier_ok(netdev)) {
+		printk(KERN_ERR "%s %s Link is Down\n",
+				netxen_nic_driver_name, netdev->name);
+		netif_carrier_off(netdev);
+		netif_stop_queue(netdev);



Most of the patch is OK, but by substituting IFF_UP tests for 
netif_running(), you are removing race-free, correct tests and replacing 
them with incorrect, racy tests.


NAK the IFF_UP changes.  the rest looks OK.

Jeff




Re: [PATCH 2.6.21.3] bonding: Fix 802.3ad no carrier on no partner found instance

2007-06-09 Thread Jeff Garzik

Laurent Chavey wrote:

Remove the requirement to have at least one configured partner to
enable the operation of links. The latter is necessary to bring the code
into compliance with section 43.3.9 of IEEE 802.3.

Signed-off-by: Laurent Chavey [EMAIL PROTECTED]


Looks OK but patch is corrupted:

[EMAIL PROTECTED] netdev-2.6]$ git-am --signoff --utf8 /g/tmp/mbox

Applying 'bonding: Fix 802.3ad no carrier on no partner found instance'

fatal: patch fragment without header at line 7: @@ -2303,19 +2303,17 @@
Patch failed at 0001.
When you have resolved this problem run git-am --resolved.
If you would prefer to skip this patch, instead run git-am --skip.



Re: [PATCH 1/2] ibmveth: Fix h_free_logical_lan error on pool resize

2007-06-09 Thread Jeff Garzik

Brian King wrote:

When attempting to activate additional rx buffer pools on an ibmveth
interface that was not yet up, the error below was seen. The patch fixes
this by only closing and opening the interface to activate the resize if
the interface is already opened.


applied 1-2 to #upstream-fixes




Re: typo in via-velocity.c

2007-06-09 Thread Jeff Garzik

Dave Jones wrote:

http://bugzilla.kernel.org/show_bug.cgi?id=8160

Signed-off-by: Dave Jones [EMAIL PROTECTED]


applied




Re: [PATCH 2/4] NetXen: Fix ping issue after reboot on Blades with 3.4.19 firmware

2007-06-09 Thread Jeff Garzik

Mithlesh Thukral wrote:

NetXen: Fix initialization and subsequent ping issue on 3.4.19 firmware
This patch fixes the ping problem seen on X/PBlades after the adapter's
firmware was moved to 3.4.19. After the interface was configured up,
ping failed.
The NetXen adapter couldn't accept ARP broadcast packets. Manually
adding the MAC address to the ARP table made ping work.
The NetXen adapter should finish initialization after system boot, but
it looks like the adapter did not initialize correctly after boot.
So the firmware has to be re-loaded in the probe routine.
Also re-initialize the netxen_config_0 and netxen_config_1 registers.

Signed-off by: Wen Xiong [EMAIL PROTECTED]
Signed-off by: Mithlesh Thukral [EMAIL PROTECTED]


applied




Re: [PATCH 4/4] NetXen: Fix compile failure seen on PPC architecture

2007-06-09 Thread Jeff Garzik

Mithlesh Thukral wrote:

NetXen: Add NETXEN prefixes to macros to clean them up.
This is a cleanup patch which adds NETXEN prefix to some standalone
macro names.

These posed compile errors when NetXen driver was backported to 2.6.9
on PPC architecture as macros like USER_START are defined in file
arch/ppc64/mm/hash_utils.c

Signed-off-by: Andy Gospodarek [EMAIL PROTECTED]
Signed-off by: Wen Xiong [EMAIL PROTECTED]
Acked-off by: Mithlesh Thukral [EMAIL PROTECTED]


applied




Re: [PATCH 2.6.22-rc4] ehea: Fixed possible kernel panic on VLAN packet recv

2007-06-09 Thread Jeff Garzik

Thomas Klein wrote:

This patch fixes a possible kernel panic due to not checking the vlan group
when processing received VLAN packets and a malfunction in VLAN/hypervisor
registration.


Signed-off-by: Thomas Klein [EMAIL PROTECTED]


applied




Re: [PATCH] phylib: add RGMII-ID mode to the Marvell m88e1111 PHY to fix broken ucc_geth

2007-06-09 Thread Jeff Garzik

Li Yang wrote:

From: Kim Phillips [EMAIL PROTECTED]

Support for configuring RGMII-ID (RGMII with internal delay) mode on the
88e and 88e1145.  Ucc_geth on MPC8360EMDS (the main user of ucc_geth)
has been broken since it was changed to use phylib.  It is fixed by
adding this internal delay.

Also renamed 88es - 88e (no references to an 88es part were
found), and fixed some whitespace.

Signed-off-by: Kim Phillips [EMAIL PROTECTED]
Signed-off-by: Li Yang [EMAIL PROTECTED]
---
Please push this to Linus before 2.6.22 rc phase ends.  The regression
has caused serious breakage to ucc_geth driver.

drivers/net/phy/marvell.c |   62 +++--

1 files changed, 54 insertions(+), 8 deletions(-)


applied




[PATCH] net: fix typo in drivers/net/usb/Kconfig

2007-06-09 Thread Sam Ravnborg
Replace invisible character with a space.

The diff looks like this on my terminal:
-A0Choose this option if you're using a host-to-host cable
-A0with one of these chips.
+ Choose this option if you're using a host-to-host cable
+ with one of these chips.

Reported by: Massimo Maiurana [EMAIL PROTECTED]

Signed-off-by: Sam Ravnborg [EMAIL PROTECTED]
Cc: Massimo Maiurana [EMAIL PROTECTED]
---
diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig
index 3de564b..8dc09a3 100644
--- a/drivers/net/usb/Kconfig
+++ b/drivers/net/usb/Kconfig
@@ -313,8 +313,8 @@ config USB_KC2190
 	boolean "KT Technology KC2190 based cables (InstaNet)"
 	depends on USB_NET_CDC_SUBSET && EXPERIMENTAL
help
- Choose this option if you're using a host-to-host cable
- with one of these chips.
+ Choose this option if you're using a host-to-host cable
+ with one of these chips.
 
 config USB_NET_ZAURUS
 	tristate "Sharp Zaurus (stock ROMs) and compatible"


RE: [PATCH] NET: Multiqueue network device support.

2007-06-09 Thread jamal
On Sat, 2007-09-06 at 17:23 -0400, Leonid Grossman wrote:

 Not really. This is a very old presentation; you probably saw some newer
 PR on Convergence Enhanced Ethernet, Congestion Free Ethernet etc.

Not been keeping up to date in that area.

 These efforts are in very early stages and arguably orthogonal to
 virtualization, but in general having per channel QoS (flow control is
 just a part of it) is a good thing. 

our definition of channel on linux so far is a netdev
(not a DMA ring). A netdev is the entity that can be bound to a CPU.
Link layer flow control terminates (and emanates) from the netdev.

 But my point was that while virtualization capabilities of upcoming NICs
 may be not even relevant to Linux, the multi-channel hw designs (a side
 effect of virtualization push, if you will) will be there and a
 non-virtualized stack can take advantage of them.

Makes sense...

 Actually, our current 10GbE NICs have most of such multichannel
 framework already shipping (in pre-IOV fashion), so the programming
 manual on the website can probably give you a pretty good idea of what
 multi-channel 10GbE NICs may look like. 

Ok, thanks.

 Right, this is one deployment scenario for a multi-channel NIC, and it
 will require very few changes in the stack (couple extra IOCTLS would be
 nice).

Essentially a provisioning interface.

 There are two reasons why you still may want to have a generic
 multi-channel support/awareness in the stack: 
 1. Some users may want to have single ip interface with multiple
 channels.
 2. While multi-channel NICs are likely to be many, only best-in-class
 will make the hw channels completely independent and able to operate
 as a separate nic. Other implementations may have some limitations, and
 will work as multi-channel API compliant devices but not necessarily as
 independent mac devices.
 I agree though that supporting multi-channel APIs is a bigger effort.

IMO, the challenges you describe above are solvable via a parent
netdevice (similar to bonding) with children being the virtual NICs. The
IP address is attached to the parent. Of course the other model is not
to show the parent device at all.

 To a degree. We have quite a bit of testing done in non-virtual OS (not
 in Linux though), using channels with tx/rx rings, msi-x etc as
 independent NICs. Flow control was not a focus since the fabric
 typically was not congested in these tests, but in theory per-channel
 flow control should work reasonably well. Of course, flow control is
 only part of resource sharing problem. 

In the current model - flow control to the s/ware queueing level (qdisc)
is implicit. i.e hardware receives pause frames - stops sending; ring
becomes full as hardware sends, netdev tx path gets shut until things
open up when 

 This is not what I'm saying :-). The IEEE link you sent shows that
 per-link flow control is a separate effort, and it will likely to take
 time to become a standard. 

Ok, my impression was it was happening already or it will happen
tomorrow morning ;-

 Also, (besides the shared link) the channels will share pci bus.
 
 One solution could be to provide a generic API for QoS level to a
 channel 
 (and also to a generic NIC!). 
 Internally, device driver can translate QoS requirements into flow
 control, pci bus bandwidth, and whatever else is shared on the physical
 NIC between the channels.
 As always, as some of that code becomes common between the drivers it
 can migrate up.

indeed. 

cheers,
jamal




[git patches] net driver fixes

2007-06-09 Thread Jeff Garzik

A big batch of fixes for the newly added libertas wireless driver is
coming soon, too.

Please pull from 'upstream-linus' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream-linus

to receive the following updates:

 drivers/net/ehea/ehea.h |2 +-
 drivers/net/ehea/ehea_main.c|   12 ++---
 drivers/net/ibmveth.c   |   80 +--
 drivers/net/netxen/netxen_nic.h |   47 +-
 drivers/net/netxen/netxen_nic_ethtool.c |8 ++--
 drivers/net/netxen/netxen_nic_hw.c  |   12 ++--
 drivers/net/netxen/netxen_nic_init.c|   23 +
 drivers/net/netxen/netxen_nic_main.c|7 +++
 drivers/net/netxen/netxen_nic_niu.c |8 +--
 drivers/net/phy/marvell.c   |   62 +---
 drivers/net/usb/Kconfig |4 +-
 drivers/net/via-velocity.c  |2 +-
 12 files changed, 172 insertions(+), 95 deletions(-)

Brian King (2):
  ibmveth: Fix h_free_logical_lan error on pool resize
  ibmveth: Automatically enable larger rx buffer pools for larger mtu

Dave Jones (1):
  typo in via-velocity.c

Kim Phillips (1):
  phylib: add RGMII-ID mode to the Marvell m88e PHY to fix broken ucc_geth

Mithlesh Thukral (2):
  NetXen: Fix ping issue after reboot on Blades with 3.4.19 firmware
  NetXen: Fix compile failure seen on PPC architecture

Sam Ravnborg (1):
  net: fix typo in drivers/net/usb/Kconfig

Thomas Klein (1):
  ehea: Fixed possible kernel panic on VLAN packet recv

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index e85a933..c0f81b5 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,7 @@
 #include <asm/io.h>
 
 #define DRV_NAME	"ehea"
-#define DRV_VERSION	"EHEA_0061"
+#define DRV_VERSION	"EHEA_0064"
 
 #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
| NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 152bb20..9e13433 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -451,7 +451,8 @@ static struct ehea_cqe *ehea_proc_rwqes(struct net_device *dev,
 				processed_rq3++;
 			}
 
-			if (cqe->status & EHEA_CQE_VLAN_TAG_XTRACT)
+			if ((cqe->status & EHEA_CQE_VLAN_TAG_XTRACT)
+			    && port->vgrp)
 				vlan_hwaccel_receive_skb(skb, port->vgrp,
 							 cqe->vlan_tag);
 			else
@@ -1910,10 +1911,7 @@ static void ehea_vlan_rx_register(struct net_device *dev,
 		goto out;
 	}
 
-	if (grp)
-		memset(cb1->vlan_filter, 0, sizeof(cb1->vlan_filter));
-	else
-		memset(cb1->vlan_filter, 0xFF, sizeof(cb1->vlan_filter));
+	memset(cb1->vlan_filter, 0, sizeof(cb1->vlan_filter));
 
 	hret = ehea_h_modify_ehea_port(adapter->handle, port->logical_port_id,
 				       H_PORT_CB1, H_PORT_CB1_ALL, cb1);
@@ -1947,7 +1945,7 @@ static void ehea_vlan_rx_add_vid(struct net_device *dev, unsigned short vid)
 	}
 
 	index = (vid / 64);
-	cb1->vlan_filter[index] |= ((u64)(1 << (vid & 0x3F)));
+	cb1->vlan_filter[index] |= ((u64)(0x8000 >> (vid & 0x3F)));
 
 	hret = ehea_h_modify_ehea_port(adapter->handle, port->logical_port_id,
 				       H_PORT_CB1, H_PORT_CB1_ALL, cb1);
@@ -1982,7 +1980,7 @@ static void ehea_vlan_rx_kill_vid(struct net_device *dev, unsigned short vid)
 	}
 
 	index = (vid / 64);
-	cb1->vlan_filter[index] &= ~((u64)(1 << (vid & 0x3F)));
+	cb1->vlan_filter[index] &= ~((u64)(0x8000 >> (vid & 0x3F)));
 
 	hret = ehea_h_modify_ehea_port(adapter->handle, port->logical_port_id,
 				       H_PORT_CB1, H_PORT_CB1_ALL, cb1);
diff --git a/drivers/net/ibmveth.c b/drivers/net/ibmveth.c
index 3bec0f7..6ec3d50 100644
--- a/drivers/net/ibmveth.c
+++ b/drivers/net/ibmveth.c
@@ -915,17 +915,36 @@ static int ibmveth_change_mtu(struct net_device *dev, int new_mtu)
 {
 	struct ibmveth_adapter *adapter = dev->priv;
 	int new_mtu_oh = new_mtu + IBMVETH_BUFF_OH;
-	int i;
+	int reinit = 0;
+	int i, rc;
 
 	if (new_mtu > IBMVETH_MAX_MTU)
 		return -EINVAL;
 
+	for (i = 0; i < IbmVethNumBufferPools; i++)
+		if (new_mtu_oh < adapter->rx_buff_pool[i].buff_size)
+			break;
+
+	if (i == IbmVethNumBufferPools)
+		return -EINVAL;
+
 	/* Look for an active buffer pool that can hold the new MTU */
 	for(i = 0; i<IbmVethNumBufferPools; i++) {
-		if (!adapter->rx_buff_pool[i].active)
-			continue;
+   if