Re: [PATCH] Allow kfree_skb to be called with a NULL argument

2006-02-23 Thread Herbert Xu
On Thu, Feb 23, 2006 at 07:53:36AM +0100, Jörn Engel wrote:
 
 How is that argument special for kfree_skb?  Both libc free and kfree
 ignore NULL arguments and do so for good reasons.

Well with kfree there is actually a slight gain in that you are doing
the check in one place.

kfree_skb on the other hand is inlined so you're actually adding
bloat to many places that simply don't need it.
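
To illustrate the trade-off, here is a minimal userspace sketch, not the
kernel code. The names buf, buf_alloc and buf_put are hypothetical, and a
plain int stands in for the atomic reference count. Because buf_put is an
ordinary out-of-line function, the NULL test is compiled once in its body
instead of being copied into every call site, which is the case where
accepting NULL costs little:

```c
#include <stdlib.h>

/* Hypothetical reference-counted buffer; a stand-in for sk_buff. */
struct buf {
	int users;
};

/* Allocate a buffer holding one reference. Returns NULL on failure. */
static struct buf *buf_alloc(void)
{
	struct buf *b = malloc(sizeof(*b));

	if (b)
		b->users = 1;
	return b;
}

/*
 * Drop one reference and free the buffer when the count reaches zero.
 * NULL is accepted and ignored, like free() and kfree().
 * Returns 1 if the buffer was freed, 0 otherwise (for demonstration).
 */
static int buf_put(struct buf *b)
{
	if (!b)
		return 0;
	if (--b->users > 0)
		return 0;
	free(b);
	return 1;
}
```

With this shape a caller can write buf_put(ptr) without a preceding
"if (ptr)" test.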

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]IPv4 UDP does not discard the datagram with invalid checksum

2006-02-23 Thread David S. Miller
From: Wei Yongjun [EMAIL PROTECTED]
Date: Thu, 23 Feb 2006 16:03:18 -0500

 IPv4 UDP does not discard datagrams with an invalid checksum. UDP
 validates checksums correctly only when socket filtering instructions
 are set. If they are not set, a datagram with an invalid checksum is
 passed to the application.

We check the checksum later, in parallel with the copy of
the packet data into userspace.

See udp_recvmsg(), where we do this:

	if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
		err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
					      copied);
	} else if (msg->msg_flags&MSG_TRUNC) {
		if (__udp_checksum_complete(skb))
			goto csum_copy_err;
		err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov,
					      copied);
	} else {
		err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr), msg->msg_iov);

		if (err == -EINVAL)
			goto csum_copy_err;
	}
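
As a rough userspace illustration of checksumming during the copy, the
sketch below folds an RFC 1071 style one's-complement sum into the copy
loop. It is only a sketch: copy_and_sum and recv_copy are hypothetical
names, not kernel functions, and it assumes the data passed in already
contains its own checksum field so that a valid buffer sums to 0xffff.
The kernel helpers also cover the UDP pseudo-header, which is left out here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst and return the folded 16-bit
 * one's-complement sum of the copied bytes (big-endian 16-bit words,
 * odd trailing byte padded with zero).
 */
static uint16_t copy_and_sum(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		sum += ((uint32_t)src[i] << 8) | src[i + 1];
	}
	if (i < len) {
		dst[i] = src[i];
		sum += (uint32_t)src[i] << 8;
	}
	/* fold the carries back into the low 16 bits */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Copy a packet to the caller's buffer, verifying it on the way.
 * Returns 0 if the checksum verifies, -1 otherwise. As in the real
 * receive path, the data has already been copied when the error is found.
 */
static int recv_copy(uint8_t *user, const uint8_t *pkt, size_t len)
{
	return copy_and_sum(user, pkt, len) == 0xffff ? 0 : -1;
}
```

The point of the design is that the data is touched once: the bytes are
summed while they are being moved, so no separate verification pass is
needed before the copy.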



Re: [PATCH] prism54usb: compile fix

2006-02-23 Thread Pete Zaitcev
On Mon, 20 Feb 2006 20:39:16 +0100, Carlos Martin [EMAIL PROTECTED] wrote:

 diff --git a/drivers/net/wireless/prism54usb/isl_sm.h 
 b/drivers/net/wireless/prism54usb/isl_sm.h
 index 9e41587..c39bb48 100644
 --- a/drivers/net/wireless/prism54usb/isl_sm.h
 +++ b/drivers/net/wireless/prism54usb/isl_sm.h
 @@ -249,7 +249,7 @@ extern int  islsm_wait_timeo
  
  /* now the helper functions, for sending packets */
  int islsm_outofband_msg(struct net_device *netdev,
 - void *buf, unsigned int size);
 + void *buf, size_t size);

I have it in my tree already. Something is inconsistent somewhere. Weird.

-- Pete


Re: [RFC] Some infrastructure for interrupt-less TX

2006-02-23 Thread Lennert Buytenhek
On Thu, Feb 23, 2006 at 08:00:32AM +0100, Jörn Engel wrote:

  I am assuming the real goal is avoiding interrupts when
  transmit completions can be reported without them on a
  reasonably periodic basis.
 
 Not necessarily on a periodic basis.  For some network driver I once
 worked on, the hardware simply had a ring buffer of n frames.
 Whenever a n+1th frame was transmitted, the first would be checked for
 completion.  If it was completed, it was freed, else the new frame was
 dropped (and freed).

This breaks socket buffer accounting.


Re: [PATCH] Allow kfree_skb to be called with a NULL argument

2006-02-23 Thread Jörn Engel
On Thu, 23 February 2006 19:28:49 +1100, Herbert Xu wrote:
 On Thu, Feb 23, 2006 at 07:53:36AM +0100, Jörn Engel wrote:
  
  How is that argument special for kfree_skb?  Both libc free and kfree
  ignore NULL arguments and do so for good reasons.
 
 Well with kfree there is actually a slight gain in that you are doing
 the check in one place.
 
  kfree_skb on the other hand is inlined so you're actually adding
 bloat to many places that simply don't need it.

Wrt. the binary, you have a point.  For source code, my patch does not
add any new bloat and allows removal of the existing checks.  Lemme do a
quick measurement for the kernel I run on my machine:

-rwxr-xr-x  1 joern src   4836592 Feb 23 10:43 vmlinux
-rwxr-xr-x  1 joern src   4836727 Feb 23 10:19 vmlinux.kfree_null

135 bytes added by my patch.  Not that much.

Jörn

-- 
He who knows others is wise.
He who knows himself is enlightened.
-- Lao Tsu


Re: [PATCH] Allow kfree_skb to be called with a NULL argument

2006-02-23 Thread David S. Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 23 Feb 2006 21:55:43 +1100

 Now there's a good idea.  After all, the great majority of callers
 of kfree_skb expect to free the skb.  Dave, what do you think?

Absolutely.


Re: [PATCH] Allow kfree_skb to be called with a NULL argument

2006-02-23 Thread Jörn Engel
On Thu, 23 February 2006 03:11:12 -0800, David S. Miller wrote:
 
  Now there's a good idea.  After all, the great majority of callers
  of kfree_skb expect to free the skb.  Dave, what do you think?
 
 Absolutely.

Should I merge the two patches into one and resend?

Jörn

-- 
If you're willing to restrict the flexibility of your approach,
you can almost always do something better.
-- John Carmack


Re: [PATCH] Allow kfree_skb to be called with a NULL argument

2006-02-23 Thread Herbert Xu
On Thu, Feb 23, 2006 at 12:22:31PM +0100, Jörn Engel wrote:

 Should I merge the two patches into one and resend?

Sounds good.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH] Uninline kfree_skb and allow NULL argument

2006-02-23 Thread Jörn Engel
On Thu, 23 February 2006 22:26:01 +1100, Herbert Xu wrote:
 On Thu, Feb 23, 2006 at 12:22:31PM +0100, Jörn Engel wrote:
 
  Should I merge the two patches into one and resend?
 
 Sounds good.

Here it is.

Jörn

-- 
Fancy algorithms are buggier than simple ones, and they're much harder
to implement. Use simple algorithms as well as simple data structures.
-- Rob Pike


o Uninline kfree_skb, which saves some 15k of object code on my notebook.

o Allow kfree_skb to be called with a NULL argument.

  Subsequent patches can remove conditional from drivers and further
  reduce source and object size.

Signed-off-by: Jörn Engel [EMAIL PROTECTED]
---

 include/linux/skbuff.h |   17 +
 net/core/skbuff.c  |   18 ++
 2 files changed, 19 insertions(+), 16 deletions(-)

--- kfree_skb/include/linux/skbuff.h~kfree_skb_uninline_null	2006-02-23 13:35:05.0 +0100
+++ kfree_skb/include/linux/skbuff.h	2006-02-23 13:36:23.0 +0100
@@ -306,6 +306,7 @@ struct sk_buff {
 
 #include <asm/system.h>
 
+void kfree_skb(struct sk_buff *skb);
 extern void   __kfree_skb(struct sk_buff *skb);
 extern struct sk_buff *__alloc_skb(unsigned int size,
   gfp_t priority, int fclone);
@@ -406,22 +407,6 @@ static inline struct sk_buff *skb_get(st
  */
 
 /**
- * kfree_skb - free an sk_buff
- * @skb: buffer to free
- *
- * Drop a reference to the buffer and free it if the usage count has
- * hit zero.
- */
-static inline void kfree_skb(struct sk_buff *skb)
-{
-	if (likely(atomic_read(&skb->users) == 1))
-		smp_rmb();
-	else if (likely(!atomic_dec_and_test(&skb->users)))
-		return;
-	__kfree_skb(skb);
-}
-
-/**
  * skb_cloned - is the buffer a clone
  * @skb: buffer to check
  *
--- kfree_skb/net/core/skbuff.c~kfree_skb_uninline_null	2006-02-23 13:35:05.0 +0100
+++ kfree_skb/net/core/skbuff.c 2006-02-23 13:37:01.0 +0100
@@ -355,6 +355,24 @@ void __kfree_skb(struct sk_buff *skb)
 }
 
 /**
+ * kfree_skb - free an sk_buff
+ * @skb: buffer to free
+ *
+ * Drop a reference to the buffer and free it if the usage count has
+ * hit zero.
+ */
+void kfree_skb(struct sk_buff *skb)
+{
+	if (unlikely(!skb))
+		return;
+	if (likely(atomic_read(&skb->users) == 1))
+		smp_rmb();
+	else if (likely(!atomic_dec_and_test(&skb->users)))
+		return;
+	__kfree_skb(skb);
+}
+
+/**
  * skb_clone   -   duplicate an sk_buff
  * @skb: buffer to clone
  * @gfp_mask: allocation priority


Re: Problem with Ipsec transport mode over NAT

2006-02-23 Thread Chinh Nguyen
Patrick McHardy wrote:
 Chinh Nguyen wrote:
 
I discovered that the bug is in the function tcp_v4_rcv for kernel
2.6.16-rc1.

After the ESP packet is decapped and decrypted in xfrm4_rcv_encap_finish, the
unencrypted packet is pushed back through ip_local_deliver. For a UDP packet,
it goes (back) to function udp_queue_rcv_skb. The first thing this function
does is call xfrm4_policy_check. As noted previously, in xfrm4_policy_check,
if skb->sp != NULL, the esp_post_input function is called. The post input
function sets skb->ip_summed to CHECKSUM_UNNECESSARY if we are in transport
mode. Therefore, further down in udp_queue_rcv_skb, we skip the checksum check
and the packet is passed up the stack.

However, for a decrypted TCP packet, the packet goes to tcp_v4_rcv. This
function does the checksum check right away if skb->ip_summed !=
CHECKSUM_UNNECESSARY while xfrm4_policy_check is called a little later in the
function. Therefore, the esp post input has not yet set the ip_summed to
unnecessary. The decrypted packet fails the checksum and is discarded.

To confirm this, I added another call to xfrm4_policy_check before the
checksum check in tcp_v4_rcv (to call esp post input). Once patched, my
systems were able to initiate TCP connections using Transport Mode/NAT.
 
 
 What values does skb->ip_summed have before that?

the skb->ip_summed value before the checksum check in tcp_v4_rcv is
CHECKSUM_NONE. Hence tcp_v4_rcv checks its value, which is incorrect because the
checksum is with regards to the private IP but the NAT device has modified the
source IP. I believe that skb->ip_summed is set to CHECKSUM_NONE by esp_input
(net/ipv4/esp4.c:180) which is called by xfrm4_rcv_encap
(net/ipv4/xfrm4_input.c:101).



Re: RFC: fix first packet goes out with MAC 00:00:00:00:00:00

2006-02-23 Thread jamal
On Thu, 2006-23-02 at 17:41 +0300, Alexey Kuznetsov wrote:

 After some thinking I suspect the deletion of this chunk could change
 behaviour of some parts which do not use neighbour cache f.e. packet socket.
 

Thanks Alexey, this was what i was worried about ;->

 
 I think safer approach would be to move this chunk after if (daddr).
 And the possibility to remove this completely could be analyzed later.
 

Ok, patch attached. Dave this also is needed for 2.6.16-rcXX.

Tested against a standard eth device (e1000) and tuntap.

cheers,
jamal


For ethernet-like netdevices, don't overwrite the first packet's dst
MAC address when it is already resolved

Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]
---

diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 9890fd9..c971f14 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -95,6 +95,12 @@ int eth_header(struct sk_buff *skb, stru
 		saddr = dev->dev_addr;
 	memcpy(eth->h_source,saddr,dev->addr_len);
 
+	if(daddr)
+	{
+		memcpy(eth->h_dest,daddr,dev->addr_len);
+		return ETH_HLEN;
+	}
+	
 	/*
 	 *	Anyway, the loopback-device should never use this function... 
 	 */
@@ -105,12 +111,6 @@ int eth_header(struct sk_buff *skb, stru
 		return ETH_HLEN;
 	}
 
-	if(daddr)
-	{
-		memcpy(eth->h_dest,daddr,dev->addr_len);
-		return ETH_HLEN;
-	}
-	
 	return -ETH_HLEN;
 }
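
For readers who want to see the effect of the reordering outside the kernel,
here is a small userspace sketch. It is not the kernel function:
fill_header, hdr_sketch and the F_* / HLEN_SKETCH constants are hypothetical
stand-ins, and the source address is assumed to be always supplied. It only
shows the order of tests after the patch: a known destination address is
copied first, and the zero-fill for loopback/NOARP devices applies only when
no destination is known.

```c
#include <string.h>

#define ALEN_SKETCH	6	/* hardware address length (assumed) */
#define HLEN_SKETCH	14	/* header length returned on success (assumed) */
#define F_LOOPBACK	0x1	/* hypothetical device flags */
#define F_NOARP		0x2

struct hdr_sketch {
	unsigned char dest[ALEN_SKETCH];
	unsigned char source[ALEN_SKETCH];
};

/*
 * Returns HLEN_SKETCH when the header is complete, -HLEN_SKETCH when the
 * destination still has to be resolved.
 */
static int fill_header(struct hdr_sketch *h, unsigned int dev_flags,
		       const unsigned char *saddr, const unsigned char *daddr)
{
	memcpy(h->source, saddr, ALEN_SKETCH);

	/* a resolved destination wins, whatever the device flags say */
	if (daddr) {
		memcpy(h->dest, daddr, ALEN_SKETCH);
		return HLEN_SKETCH;
	}

	/* no destination known: such devices get an all-zero address */
	if (dev_flags & (F_LOOPBACK | F_NOARP)) {
		memset(h->dest, 0, ALEN_SKETCH);
		return HLEN_SKETCH;
	}

	return -HLEN_SKETCH;
}
```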
 


ipw2200 tester needed

2006-02-23 Thread Larry Finger
In reviewing the ieee80211 stack in order to add additional geographic support for wireless drivers, 
I have studied all the in-kernel wireless drivers for their interactions with the routines in 
ieee80211_geo.c. As clearly stated in the comments, ipw2200.c duplicates most of those routines, 
even though ieee80211 is required to use ipw2200. Obviously, this bloats both the source code and 
the binaries for any user of ipw2200. I am planning to develop a patch to have ipw2200 use the 
ieee80211 code; however, I do not have the necessary hardware to test the result.


Is anyone interested in testing this patch for me? Are there any comments 
regarding this change?

Thanks,

Larry


Re: Problem with Ipsec transport mode over NAT

2006-02-23 Thread Patrick McHardy
Chinh Nguyen wrote:
 Patrick McHardy wrote:
 
What values does skb->ip_summed have before that?
 
 
 the skb->ip_summed value before the checksum check in tcp_v4_rcv is
 CHECKSUM_NONE. Hence tcp_v4_rcv checks its value, which is incorrect because
 the checksum is with regards to the private IP but the NAT device has modified
 the source IP.

Netfilter recalculates the checksum when NATing it.

 I believe that skb->ip_summed is set to CHECKSUM_NONE by esp_input
 (net/ipv4/esp4.c:180) which is called by xfrm4_rcv_encap
 (net/ipv4/xfrm4_input.c:101).

The question is why the checksum is invalid. Please start by describing
what you're trying to do.


[PATCH 2.6.16-rc4] e1000: revert to single descriptor for legacy receive path

2006-02-23 Thread Jesse Brandeburg
A recent patch attempted to enable more efficient memory usage by using 
only 2kB descriptors for jumbo frames.  The method used to implement this 
has since been commented upon as illegal and in recent kernels even 
causes a BUG when receiving ip fragments while using jumbo frames. This 
patch simply goes back to the way things were.  We expect some complaints 
to reoccur due to order 3 allocations failing due to this change.


Signed-off-by: Jesse Brandeburg [EMAIL PROTECTED]

---

 drivers/net/e1000/e1000.h  |3 -
 drivers/net/e1000/e1000_main.c |  117 +++-
 2 files changed, 45 insertions(+), 75 deletions(-)

diff --git a/drivers/net/e1000/e1000.h b/drivers/net/e1000/e1000.h
index 27c7730..99baf0e 100644
--- a/drivers/net/e1000/e1000.h
+++ b/drivers/net/e1000/e1000.h
@@ -225,9 +225,6 @@ struct e1000_rx_ring {
struct e1000_ps_page *ps_page;
struct e1000_ps_page_dma *ps_page_dma;

-   struct sk_buff *rx_skb_top;
-   struct sk_buff *rx_skb_prev;
-
/* cpu for rx queue */
int cpu;

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 31e3329..5b7d0f4 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -103,7 +103,7 @@ static char e1000_driver_string[] = Int
 #else
 #define DRIVERNAPI "-NAPI"
 #endif
-#define DRV_VERSION "6.3.9-k2"DRIVERNAPI
+#define DRV_VERSION "6.3.9-k4"DRIVERNAPI
 char e1000_driver_version[] = DRV_VERSION;
 static char e1000_copyright[] = "Copyright (c) 1999-2005 Intel Corporation.";

@@ -1635,8 +1635,6 @@ setup_rx_desc_die:

 	rxdr->next_to_clean = 0;
 	rxdr->next_to_use = 0;
-	rxdr->rx_skb_top = NULL;
-	rxdr->rx_skb_prev = NULL;

return 0;
 }
@@ -1713,8 +1711,23 @@ e1000_setup_rctl(struct e1000_adapter *a
 		rctl |= adapter->rx_buffer_len << 0x11;
 	} else {
 		rctl &= ~E1000_RCTL_SZ_4096;
-		rctl &= ~E1000_RCTL_BSEX;
-		rctl |= E1000_RCTL_SZ_2048;
+		rctl |= E1000_RCTL_BSEX; 
+		switch (adapter->rx_buffer_len) {

+   case E1000_RXBUFFER_2048:
+   default:
+   rctl |= E1000_RCTL_SZ_2048;
+			rctl &= ~E1000_RCTL_BSEX;
+   break;
+   case E1000_RXBUFFER_4096:
+   rctl |= E1000_RCTL_SZ_4096;
+   break;
+   case E1000_RXBUFFER_8192:
+   rctl |= E1000_RCTL_SZ_8192;
+   break;
+   case E1000_RXBUFFER_16384:
+   rctl |= E1000_RCTL_SZ_16384;
+   break;
+   }
}

 #ifndef CONFIG_E1000_DISABLE_PACKET_SPLIT
@@ -2107,16 +2120,6 @@ e1000_clean_rx_ring(struct e1000_adapter
}
}

-   /* there also may be some cached data in our adapter */
-	if (rx_ring->rx_skb_top) {
-		dev_kfree_skb(rx_ring->rx_skb_top);
-
-		/* rx_skb_prev will be wiped out by rx_skb_top */
-		rx_ring->rx_skb_top = NULL;
-		rx_ring->rx_skb_prev = NULL;
-	}
-
-
 	size = sizeof(struct e1000_buffer) * rx_ring->count;
 	memset(rx_ring->buffer_info, 0, size);
 	size = sizeof(struct e1000_ps_page) * rx_ring->count;
@@ -3106,24 +3109,27 @@ e1000_change_mtu(struct net_device *netd
break;
}

-   /* since the driver code now supports splitting a packet across
-* multiple descriptors, most of the fifo related limitations on
-* jumbo frame traffic have gone away.
-* simply use 2k descriptors for everything.
-*
-* NOTE: dev_alloc_skb reserves 16 bytes, and typically NET_IP_ALIGN
-* means we reserve 2 more, this pushes us to allocate from the next
-* larger slab size
-	 * i.e. RXBUFFER_2048 --> size-4096 slab */

-   /* recent hardware supports 1KB granularity */
 	if (adapter->hw.mac_type > e1000_82547_rev_2) {
-		adapter->rx_buffer_len =
-		    ((max_frame < E1000_RXBUFFER_2048) ?
-		        max_frame : E1000_RXBUFFER_2048);
+		adapter->rx_buffer_len = max_frame;
 		E1000_ROUNDUP(adapter->rx_buffer_len, 1024);
-	} else
-		adapter->rx_buffer_len = E1000_RXBUFFER_2048;
+	} else {
+		if(unlikely((adapter->hw.mac_type < e1000_82543) &&
+		   (max_frame > MAXIMUM_ETHERNET_FRAME_SIZE))) {
+			DPRINTK(PROBE, ERR, "Jumbo Frames not supported "
+					    "on 82542\n");
+			return -EINVAL;
+		} else {
+			if(max_frame <= E1000_RXBUFFER_2048)
+				adapter->rx_buffer_len = E1000_RXBUFFER_2048;
+			else if(max_frame <= E1000_RXBUFFER_4096)
+				adapter->rx_buffer_len = E1000_RXBUFFER_4096;
+   else 

Fw: [Bugme-new] [Bug 6121] New: TCP_DEFER_ACCEPT is reset on listen() call

2006-02-23 Thread Andrew Morton


Begin forwarded message:

Date: Thu, 23 Feb 2006 07:26:28 -0800
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: [Bugme-new] [Bug 6121] New: TCP_DEFER_ACCEPT is reset on listen() call


http://bugzilla.kernel.org/show_bug.cgi?id=6121

   Summary: TCP_DEFER_ACCEPT is reset on listen() call
Kernel Version: 2.6.14, 2.6.15
Status: NEW
  Severity: normal
 Owner: [EMAIL PROTECTED]
 Submitter: [EMAIL PROTECTED]


Most recent kernel where this bug did not occur: 2.6.13
Distribution:
Hardware Environment:
Software Environment:
Problem Description:
Value of TCP_DEFER_ACCEPT socket option is reset to zero when listen() is
called.

Steps to reproduce:
Following program shows the problem:
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

int main(void)
{
	int s = socket(AF_INET, SOCK_STREAM, 0);
	int val = 1;
	socklen_t len = sizeof(val);

	/* request deferred accept, then see whether listen() keeps it */
	setsockopt(s, SOL_TCP, TCP_DEFER_ACCEPT, &val, len);
	listen(s, 1);
	getsockopt(s, SOL_TCP, TCP_DEFER_ACCEPT, &val, &len);
	printf("get TCP_DEFER_ACCEPT = %d\n", val);
	return 0;
}

On <=2.6.13 output is "get TCP_DEFER_ACCEPT = 3";
On >=2.6.14 output is "get TCP_DEFER_ACCEPT = 0".


Starting from 2.6.14, defer_accept is moved to request_sock_queue structure,
which is re-initialized in inet_csk_listen_start().

--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.


Re: Fw: [Bugme-new] [Bug 6121] New: TCP_DEFER_ACCEPT is reset on listen() call

2006-02-23 Thread Arnaldo Carvalho de Melo
On 2/23/06, Andrew Morton [EMAIL PROTECTED] wrote:

 Starting from 2.6.14, defer_accept is moved to request_sock_queue structure,
 which is re-initialized in inet_csk_listen_start().

Oops, looking into it...

- Arnaldo


Re: [PATCH 02/02] add mask options to fwmark masking code

2006-02-23 Thread Michael Richardson


 Patrick == Patrick McHardy [EMAIL PROTECTED] writes:
 #define RTA_FWMARK RTA_PROTOINFO
 +#define RTA_FWMARK_MASK RTA_CACHEINFO

Patrick Please introduce a new attribute for this instead of
Patrick overloading RTA_CACHEINFO.

  I would be happy to do that.
  Should I also un-overload FWMARK, with backwards compatibility?

 diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
 index de327b3..69eed89 100644
 --- a/net/ipv4/fib_rules.c
 +++ b/net/ipv4/fib_rules.c
 @@ -68,6 +68,7 @@ struct fib_rule
  u8 r_tos;
  #ifdef CONFIG_IP_ROUTE_FWMARK
  u32 r_fwmark;
 + u32 r_fwmark_mask;

Patrick Both patches have whitespace issues. You should also change

  uhm. okay.
  I'm surprised, since I produced it with git-format-patch. Maybe there
are tabs that emacs screwed up.

- -- 
]   ON HUMILITY: to err is human. To moo, bovine.   |  firewalls  [
]   Michael Richardson,Xelerance Corporation, Ottawa, ON|net architect[
] [EMAIL PROTECTED]  http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic(Just another Debian GNU/Linux using, kernel hacking, security guy); [




Re: [PATCH] iproute2 -- add fwmarkmask

2006-02-23 Thread Michael Richardson


 Patrick == Patrick McHardy [EMAIL PROTECTED] writes:
Patrick The normal way to display masks is with a /. Also I think
Patrick it shouldn't display the default mask to avoid breaking
Patrick scripts that parse the output.

  I generally dislike the /VALUE, since I expect /PREFIX-LEN.
  I agree that it shouldn't show if it is default.

Patrick ip should be able to parse its own output, and it would
Patrick also look nicer if I could just say fwmark
Patrick 0x1/32. fwmarkmask is really an incredible ugly expression
Patrick :)

  Sure. Is that a 32-bit long mask (0xfff), or is it a 0x0020?
  fwmark is not an address.

  Or would you like /32 to be a prefix-based mask, and value and/or
fwmarkmask to be a value? 
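
To make the ambiguity concrete, here is a small sketch of the two readings
of "/32". All names are hypothetical, and the matching rule in fwmark_match
(compare only the bits selected by the mask) is an assumption for
illustration, not taken from the patch.

```c
#include <stdint.h>

/* "/N" read as a prefix length: the N most significant bits are set. */
static uint32_t mask_from_prefix_len(unsigned int n)
{
	if (n == 0)
		return 0;
	if (n >= 32)
		return 0xffffffffu;
	return ~(uint32_t)0 << (32 - n);
}

/* "/N" read as a literal mask value: the number is the mask itself. */
static uint32_t mask_from_value(uint32_t v)
{
	return v;
}

/* Assumed matching rule: only the masked bits have to agree. */
static int fwmark_match(uint32_t pkt_mark, uint32_t rule_mark, uint32_t mask)
{
	return (pkt_mark & mask) == (rule_mark & mask);
}
```

Under the prefix reading "/32" selects every bit; under the value reading
it selects only bit 5 (0x20), so the same rule text matches very different
sets of marks.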

- -- 
]   ON HUMILITY: to err is human. To moo, bovine.   |  firewalls  [
]   Michael Richardson,Xelerance Corporation, Ottawa, ON|net architect[
] [EMAIL PROTECTED]  http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic(Just another Debian GNU/Linux using, kernel hacking, security guy); [


Re: Problem with Ipsec transport mode over NAT

2006-02-23 Thread Chinh Nguyen
Patrick McHardy wrote:
 Chinh Nguyen wrote:
 
Patrick McHardy wrote:


What values does skb->ip_summed have before that?


the skb->ip_summed value before the checksum check in tcp_v4_rcv is
CHECKSUM_NONE. Hence tcp_v4_rcv checks its value, which is incorrect because
the checksum is with regards to the private IP but the NAT device has modified
the source IP.
 
 
 Netfilter recalculates the checksum when NATing it.

The NATing is not done by netfilter but by the NAT device between the IPsec
peers.

 
  I believe that skb->ip_summed is set to CHECKSUM_NONE by esp_input
 
(net/ipv4/esp4.c:180) which is called by xfrm4_rcv_encap
(net/ipv4/xfrm4_input.c:101).
 
 
 The question is why the checksum is invalid. Please start by describing
 what you're trying to do.

[Linux ipsec client C] -- [NAT device] -- [Linux ipsec server S]

C negotiates IPsec Transport Mode with S. Because of Transport Mode/NAT-T, 2
things happen to an IPsec packet.

1. It is UDP-encapsulated, typically on port 4500/udp.
2. Transport Mode traffic leaves the original IP header alone whereas tunnel
mode wraps the entire traffic in a second IP header. As such, when the packet
passes through the NAT device, the source IP is N. However, the original
unencrypted packet had source IP C.

S rips off the UDP-encap header, decrypts the payload, and joins the content
back to the IP header. If the decrypted content is UDP or TCP, the UDP/TCP
checksum is now incorrect because the source IP is now N not C.

(In tunnel mode, we would ignore the NAT-ted outer IP header because the
decrypted content has an entire IP header + UDP/TCP etc)

This is a well-known problem with transport mode/NAT. One solution is to use
NAT-OA and NAT-OR to recalculate the checksum. The linux kernel does the simpler
thing of ignoring the UDP/TCP checksum altogether in this particular case:

function esp_post_input (net/ipv4/esp4.c)
290 		/*
291 		 * 2) ignore UDP/TCP checksums in case
292 		 *    of NAT-T in Transport Mode, or
293 		 *    perform other post-processing fixes
294 		 *    as per * draft-ietf-ipsec-udp-encaps-06,
295 		 *    section 3.1.2
296 		 */
297 		if (!x->props.mode)
298 			skb->ip_summed = CHECKSUM_UNNECESSARY;
299
300 		break;


As noted, esp_post_input is called in xfrm4_policy_check. Decrypted UDP traffic
through transport mode/nat also has bad checksums. However, since it is passed
through udp_queue_rcv_skb after decryption, and this function calls
xfrm4_policy_check before checking the UDP checksum, line 298 means the kernel
ignores the bad checksum.

Decrypted TCP traffic has bad checksums too. But since tcp_v4_rcv checks the TCP
checksum before calling xfrm4_policy_check, the bad checksum means the TCP
packet is dropped as a bad segment.

The end result is that UDP and other traffic (eg, ICMP) can pass through
transport mode/nat but not TCP.

I don't know what the correct fix is. Adding an extra call to xfrm4_policy_check in
tcp_v4_rcv before the checksum check fixes this problem and doesn't seem to
break anything else. On the other hand, moving some of the code in
esp_post_input into esp_input (especially line 298) will work, too.
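
The ordering difference can be modelled in a few lines of userspace C. This
is a toy model under stated assumptions, not kernel code: pkt, policy_check,
csum_ok, rcv_policy_first and rcv_csum_first are hypothetical names, and the
policy check is reduced to its one relevant side effect (marking the
checksum as unnecessary for NAT-T transport mode packets).

```c
#include <stdbool.h>

enum csum_state { CSUM_NONE, CSUM_UNNECESSARY };

struct pkt {
	enum csum_state ip_summed;
	bool csum_valid;	/* would the L4 checksum verify? */
	bool natt_transport;	/* decrypted NAT-T transport mode packet */
};

/* Stand-in for the policy check that triggers the ESP post-input hook. */
static void policy_check(struct pkt *p)
{
	if (p->natt_transport)
		p->ip_summed = CSUM_UNNECESSARY;
}

/* The checksum is accepted if it is marked unnecessary or verifies. */
static bool csum_ok(const struct pkt *p)
{
	return p->ip_summed == CSUM_UNNECESSARY || p->csum_valid;
}

/* UDP-like order: policy check first, checksum second. */
static bool rcv_policy_first(struct pkt p)
{
	policy_check(&p);
	return csum_ok(&p);
}

/* TCP-like order: checksum first, policy check second. */
static bool rcv_csum_first(struct pkt p)
{
	if (!csum_ok(&p))
		return false;
	policy_check(&p);
	return true;
}
```

For a packet with ip_summed == CSUM_NONE, an unverifiable checksum and
natt_transport set, rcv_policy_first accepts it and rcv_csum_first drops
it, which mirrors the UDP versus TCP behaviour described above.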



Re: [PATCH] Uninline kfree_skb and allow NULL argument

2006-02-23 Thread David S. Miller
From: Jörn Engel [EMAIL PROTECTED]
Date: Thu, 23 Feb 2006 13:52:59 +0100

 +void kfree_skb(struct sk_buff *skb);
  extern void __kfree_skb(struct sk_buff *skb);

If you wish to contribute to a software project, you should adhere to
the coding style and conventions of that project when submitting
changes.  It doesn't matter what the reasons are for those
conventions, you should follow them until the project decides to
change them.

If you wish to discuss the merits of putting extern there or not in
function declarations, you can start a thread about that and make
proposals on linux-kernel.

Patch submissions are not the place to do that.

So please add extern here, thanks a lot.


Re: tg3 losing promisc rx_mode bit

2006-02-23 Thread Michael Chan
On Thu, 2006-02-23 at 14:31 -0800, Jim Westfall wrote:

 I am seeing the following issue on only the first onboard nic on each of 
 the servers.  If the nic is put into promisc mode too soon after the nic 
 is brought up, the promisc bit in the rx_mode register is somehow getting 
 reset to 0;
 

This is a known problem caused by ASF or IPMI firmware overwriting the
promiscuous mode bit. I will have someone contact you to get the
firmware upgraded.

Thanks.



Re: tg3 losing promisc rx_mode bit

2006-02-23 Thread Ian McDonald
On 2/24/06, Michael Chan [EMAIL PROTECTED] wrote:
 This is a known problem caused by ASF or IPMI firmware overwriting the
 promiscuous mode bit. I will have someone contact you to get the
 firmware upgraded.

 Thanks.

Thinking out loud here without reading source... - can you check the
version of the firmware and make noise if they have a version like
this one?

Ian
--
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand


Re: [Patch 1/6] IPSEC: core updates

2006-02-23 Thread David S. Miller
From: jamal [EMAIL PROTECTED]
Date: Tue, 21 Feb 2006 08:31:49 -0500

 Ok. Patch attached against net-2617
 
 Yoshfuji-san you should probably write a little doc that should be
 available in the Doc/ directory.

If we write this, please ask Andi Kleen to review it.
His arch has the most problems in this area making him
an expert on this topic :-)

 struct xfrm_aevent_id needs to be 32-bit + 64-bit align friendly.
 
 Signed-off-by: Jamal Hadi Salim [EMAIL PROTECTED]

Applied, thanks everyone.


Re: Fw: [Bugme-new] [Bug 6121] New: TCP_DEFER_ACCEPT is reset on listen() call

2006-02-23 Thread Arnaldo Carvalho de Melo
On 2/23/06, Arnaldo Carvalho de Melo [EMAIL PROTECTED] wrote:
 On 2/23/06, Andrew Morton [EMAIL PROTECTED] wrote:

  Starting from 2.6.14, defer_accept is moved to request_sock_queue structure,
  which is re-initialized in inet_csk_listen_start().

 Oops, looking into it...

culprit:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=295f7324ff8d9ea58b4d3ec93b1aaa1d80e048a9

Alexandra, can you please test by just removing the zeroing from
reqsk_queue_alloc() in net/core/request_sock.c? Just remove this
line:

queue->rskq_defer_accept = 0;

icsk->icsk_accept_queue (that maps to the queue-> above) is zeroed
at sk alloc time, so just removing this one should restore the previous
behaviour.
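
A toy model of the problem and of the proposed fix, for illustration only.
The struct and function names below are hypothetical stand-ins for the
socket, its accept queue and the listen-time initialisation; the zero_defer
flag simply selects between keeping and removing the zeroing line.

```c
#include <string.h>

struct accept_queue_sketch {
	int defer_accept;
	int max_backlog;
};

struct sock_sketch {
	struct accept_queue_sketch queue;
};

/* Socket allocation zeroes the whole structure, queue included. */
static void sk_alloc_sketch(struct sock_sketch *sk)
{
	memset(sk, 0, sizeof(*sk));
}

/* What a setsockopt() before listen() would store. */
static void set_defer_accept(struct sock_sketch *sk, int val)
{
	sk->queue.defer_accept = val;
}

/*
 * Listen-time queue setup. With zero_defer != 0 it repeats the zeroing
 * and loses the option; with zero_defer == 0 the earlier value survives.
 */
static void queue_alloc_sketch(struct accept_queue_sketch *q, int backlog,
			       int zero_defer)
{
	q->max_backlog = backlog;
	if (zero_defer)
		q->defer_accept = 0;
}
```

Since the structure is already zeroed at allocation time, dropping the
second zeroing does not leave the field uninitialised for sockets that
never set the option.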

Thanks,

- Arnaldo


ip6_tunnel keeping dst_cache after change of params

2006-02-23 Thread Hugo Santos
Hi,

   ip6_tunnel keeps a cached dst (dst_cache in ip6_tnl) per tunnel
 instance. This cached dst is re-used while it's not marked obsolete. A
 change of the tunnel's parameters (via SIOCCHGTUNNEL) does not
 invalidate the dst_cache directly, which results in it being used by
 ip6ip6_tnl_xmit after the tunnel is configured with new parameters.
   Shouldn't ip6ip6_tnl_change dst_release() the cached dst and leave
 ip6ip6_tnl_xmit to pick a new one based on the new local/remote
 addresses etc? I can provide a patch to fix this, meanwhile just wanted
 to confirm the expected behaviour.
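
The behaviour proposed above can be modelled in a few lines of plain C.
This is only a toy sketch: the toy_* types and functions are invented for
illustration and are not the kernel's ip6_tnl or dst_entry code. It shows
the change path dropping the cached route so that the next transmit has to
look up a fresh one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the kernel types; none of these names are kernel APIs. */
struct toy_dst {
	char remote[40];		/* destination this route resolves */
	int refcnt;
};

struct toy_tnl {
	char remote[40];		/* configured remote endpoint */
	struct toy_dst *dst_cache;	/* cached route, NULL when invalid */
};

static void toy_dst_release(struct toy_dst *dst)
{
	if (dst)
		dst->refcnt--;
}

/* Parameter change: store the new remote and drop the cached route. */
void toy_tnl_change(struct toy_tnl *t, const char *remote)
{
	assert(t != NULL && remote != NULL);
	strncpy(t->remote, remote, sizeof(t->remote) - 1);
	t->remote[sizeof(t->remote) - 1] = '\0';
	toy_dst_release(t->dst_cache);
	t->dst_cache = NULL;
}

/* Transmit path: reuse the cache if present, else look up and cache. */
struct toy_dst *toy_tnl_route(struct toy_tnl *t, struct toy_dst *table,
			      size_t n)
{
	size_t i;

	if (t->dst_cache)
		return t->dst_cache;

	for (i = 0; i < n; i++) {
		if (strcmp(table[i].remote, t->remote) == 0) {
			table[i].refcnt++;
			t->dst_cache = &table[i];
			break;
		}
	}
	return t->dst_cache;
}
```

Without the two lines that release and clear dst_cache in toy_tnl_change(),
toy_tnl_route() would keep returning the route for the old remote, which is
the symptom described above.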

   Thanks,
  Hugo




Re: (usagi-users 03614) Re: IPv6 setsockopt software MTU patch

2006-02-23 Thread David S. Miller
From: YOSHIFUJI Hideaki [EMAIL PROTECTED]
Date: Fri, 24 Feb 2006 00:23:51 +0900 (JST)

 David, please apply.  Thank you.

Can you please resend the patch with a full changelog
entry and Signed-off-by lines for me?  Thank you.

This is for net-2.6 right?  Or net-2.6.17?

Thanks again.


Re: [PATCH] pktgen: fix races between control/worker threads

2006-02-23 Thread David S. Miller
From: Robert Olsson [EMAIL PROTECTED]
Date: Wed, 22 Feb 2006 19:47:13 +0100

 
 Jesse Brandeburg writes:
   
   I looked quickly at this on a couple different machines and wasn't
   able to reproduce, so don't let me block the patch.  I think it's a
   good patch FWIW
 
  OK! 
  We ask Dave to apply it.

Applied to net-2.6.17, thanks.


Re: [PATCH 00/01] pktgen: Lindent run.

2006-02-23 Thread David S. Miller
From: Luiz Fernando Capitulino [EMAIL PROTECTED]
Date: Mon, 23 Jan 2006 13:44:19 -0200

 
  This patch is not inlined because it's 120K bytes long; you can find it at:
 
 http://www.cpu.eti.br/patches/pktgen_lindent_1.patch

Not found:

[EMAIL PROTECTED]:~/src/GIT/net-2.6.17$ wget http://www.cpu.eti.br/patches/pktgen_lindent_1.patch
--17:16:50--  http://www.cpu.eti.br/patches/pktgen_lindent_1.patch
           => `pktgen_lindent_1.patch'
Resolving www.cpu.eti.br... 209.59.143.183
Connecting to www.cpu.eti.br|209.59.143.183|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
17:16:50 ERROR 404: Not Found.

Anyways, can you please regenerate these 4 patches against
net-2.6.17, as I put in Arthur's race fix and it will certainly
conflict with these.

Sorry for taking so long to get to this :-(


Re: [PATCH 0/3] skge: patches for 2.6.16

2006-02-23 Thread Jeff Garzik

Francois Romieu wrote:

Stephen Hemminger [EMAIL PROTECTED] :


Bug fix patches to skge driver that need to go in 2.6.16.
Some of them are in -mm and some have already been sent (and ignored).



#1..#3 Applied to branch 'for-jeff' at
git://electric-eye.fr.zoreil.com/home/romieu/linux-2.6.git

Shortlog

$ git rev-list --pretty master..HEAD | git shortlog

Francois Romieu:
  r8169: fix broken ring index handling in suspend/resume
  r8169: enable wake on lan

Stephen Hemminger:
  sky2: yukon-ec-u chipset initialization
  sky2: limit coalescing values to ring size
  sky2: poke coalescing timer to fix hang
  sky2: force early transmit status
  sky2: use device iomem to access PCI config
  sky2: close race on IRQ mask update.
  skge: NAPI/irq race fix
  skge: genesis phy initialzation
  skge: protect interrupt mask



pulled, thanks.  It definitely makes things easier, if the patches are 
rolled up like this.


Jeff




[PATCH] Some state changes not be counted to TCP_MIB_ATTEMPTFAILS

2006-02-23 Thread Wei Yongjun
According to RFC 2012, tcpAttemptFails is defined as follows:
  tcpAttemptFails OBJECT-TYPE
  SYNTAX  Counter32
  MAX-ACCESS  read-only
  STATUS  current
  DESCRIPTION
  The number of times TCP connections have made a direct
  transition to the CLOSED state from either the SYN-SENT
  state or the SYN-RCVD state, plus the number of times TCP
  connections have made a direct transition to the LISTEN
  state from the SYN-RCVD state.
  ::= { tcp 7 }

Transitions from SYN-RCVD to CLOSED, from SYN-SENT to CLOSED, and from
SYN-RCVD to LISTEN should therefore be counted in TCP_MIB_ATTEMPTFAILS.
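
The RFC 2012 rule can be written down as a small predicate. This is just
an illustrative sketch; the enum and function names are made up here and do
not correspond to kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of TCP states; names are local to this sketch. */
enum toy_tcp_state {
	ST_CLOSED,
	ST_LISTEN,
	ST_SYN_SENT,
	ST_SYN_RCVD,
	ST_ESTABLISHED
};

/*
 * True if a direct transition from -> to must bump tcpAttemptFails:
 *   SYN-SENT -> CLOSED, SYN-RCVD -> CLOSED, SYN-RCVD -> LISTEN.
 */
bool counts_as_attempt_fail(enum toy_tcp_state from, enum toy_tcp_state to)
{
	if (to == ST_CLOSED)
		return from == ST_SYN_SENT || from == ST_SYN_RCVD;
	if (to == ST_LISTEN)
		return from == ST_SYN_RCVD;
	return false;
}
```

For example, counts_as_attempt_fail(ST_SYN_RCVD, ST_LISTEN) is true, while
a close from ST_ESTABLISHED is not an attempt failure.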

The kernel does not count the following state changes in
TCP_MIB_ATTEMPTFAILS:

SYN-SENT state => CLOSED

    TCP A                                            TCP B

1.  LISTEN                                           CLOSED

2.       <-- <SEQ=Z><CTL=SYN>                   <--  SYN-SENT

3.       --> <SEQ=X><ACK=Z+1><CTL=RST>          -->  CLOSED

SYN-RECEIVED state (came from SYN-SENT state) => CLOSED

    TCP A                                            TCP B

1.  LISTEN                                           CLOSED

2.       <-- <SEQ=Z><CTL=SYN>                   <--  SYN-SENT

3.       --> <SEQ=X><ACK=Z+1><CTL=SYN>          -->  SYN-SENT

4.       <-- <SEQ=Z+1><ACK=X+1><CTL=ACK>        <--  SYN-RECEIVED

5.       --> <SEQ=X+1><ACK=Z+2><CTL=RST>        -->  CLOSED

SYN-RECEIVED state (came from SYN-SENT state) => CLOSED

    TCP A                                            TCP B

1.  LISTEN                                           CLOSED

2.       <-- <SEQ=Z><CTL=SYN>                   <--  SYN-SENT

3.       --> <SEQ=X><ACK=Z+1><CTL=SYN>          -->  SYN-SENT

4.       <-- <SEQ=Z+1><ACK=X+1><CTL=ACK>        <--  SYN-RECEIVED

5.       --> <SEQ=X+1><ACK=Z+2><CTL=SYN>        -->  CLOSED

SYN-RECEIVED state => LISTEN

    TCP A                                            TCP B

1.  LISTEN                                           LISTEN

2.       ... <SEQ=Z><CTL=SYN>                   -->  SYN-RECEIVED

3.  (??) <-- <SEQ=X><ACK=Z+1><CTL=SYN,ACK>      <--  SYN-RECEIVED

4.       --> <SEQ=Z+1><CTL=RST>                 -->  (return to LISTEN!)

5.  LISTEN                                           LISTEN

SYN-RECEIVED state => LISTEN

    TCP A                                            TCP B

1.  LISTEN                                           LISTEN

2.       ... <SEQ=Z><CTL=SYN>                   -->  SYN-RECEIVED

3.  (??) <-- <SEQ=X><ACK=Z+1><CTL=SYN,ACK>      <--  SYN-RECEIVED

4.       --> <SEQ=Z+1><CTL=SYN>                 -->  (return to LISTEN!)

5.  LISTEN                                           LISTEN

Patch against kernel 2.6.15.4 follows:

diff -Nur a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
--- a/net/ipv4/tcp_input.c  2006-02-23 09:20:24.659262056 +0900
+++ b/net/ipv4/tcp_input.c  2006-02-23 09:28:50.772321176 +0900
@@ -4003,6 +4003,7 @@
 */
 
if (th->rst) {
+   TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
tcp_reset(sk);
goto discard;
}
@@ -4290,6 +4291,8 @@
 
/* step 2: check RST bit */
if(th->rst) {
+   if(sk->sk_state == TCP_SYN_RECV)
+   TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
tcp_reset(sk);
goto discard;
}
@@ -4303,6 +4306,8 @@
 *  Check for a SYN in window.
 */
if (th->syn && !before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
+   if(sk->sk_state == TCP_SYN_RECV)
+   TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
NET_INC_STATS_BH(LINUX_MIB_TCPABORTONSYN);
tcp_reset(sk);
return 1;
diff -Nur a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
--- a/net/ipv4/tcp_minisocks.c  2006-02-23 09:20:24.660261904 +0900
+++ b/net/ipv4/tcp_minisocks.c  2006-02-23 09:26:07.432152656 +0900
@@ -591,8 +591,10 @@
/* RFC793: second check the RST bit and
 * fourth, check the SYN bit
 */
-   if (flg & (TCP_FLAG_RST|TCP_FLAG_SYN))
+   if (flg & (TCP_FLAG_RST|TCP_FLAG_SYN)) {
+   TCP_INC_STATS_BH(TCP_MIB_ATTEMPTFAILS);
goto embryonic_reset;
+   }
 
/* ACK sequence verified above, just make sure ACK is
 * set.  If ACK not set, just silently drop the packet.




Re: [PATCH]IPv4 UDP does not discard the datagram with invalid checksum

2006-02-23 Thread Wei Yongjun
Under IPv4, when I send a UDP packet with an invalid checksum, the kernel
uses udp_rcv() to pass the packet up to the UDP layer, and the application
uses udp_recvmsg() to receive the message. So if one UDP packet with an
invalid checksum arrives at the host, UDP_MIB_INDATAGRAMS will be
incremented by 1, and UDP_MIB_INERRORS should also be incremented by 1.
int udp_rcv(struct sk_buff *skb) {
...
udp_queue_rcv_skb();
...
}

static int udp_queue_rcv_skb(struct sock * sk, struct sk_buff *skb) {
...
if (sk->sk_filter && skb->ip_summed != CHECKSUM_UNNECESSARY) {
        if (__udp_checksum_complete(skb)) {
                UDP_INC_STATS_BH(UDP_MIB_INERRORS);
                kfree_skb(skb);
                return -1;
        }
        skb->ip_summed = CHECKSUM_UNNECESSARY;
}

UDP_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
...
}

static int udp_recvmsg(...) {
...
csum_copy_err:
UDP_INC_STATS_BH(UDP_MIB_INERRORS);
...
}

In my test, I sent an IPv4 UDP packet with an invalid checksum to echo-udp,
and found the following message in /var/log/messages:
xinetd[4468]: service echo-dgram, recvfrom: Resource temporarily
unavailable (errno = 11)
UDP_MIB_INDATAGRAMS increased by 1 and UDP_MIB_INERRORS increased by 0.
Does xinetd use some function other than udp_recvmsg() to receive the
message?

My other question is: why is a packet with an invalid checksum discarded
only when sk->sk_filter is set?

By the way, under IPv6 a packet with an invalid checksum is discarded in
udpv6_rcv(), so if one UDP packet with an invalid checksum arrives at an
IPv6 host, UDP_MIB_INDATAGRAMS will be incremented by 0 and
UDP_MIB_INERRORS should be incremented by 1.

static int udpv6_rcv(struct sk_buff **pskb, unsigned int *nhoffp) {
...
udpv6_queue_rcv_skb();
...
}

static inline int udpv6_queue_rcv_skb(struct sock * sk, struct sk_buff *skb) {
...
if (skb->ip_summed != CHECKSUM_UNNECESSARY) {
        if ((unsigned short)csum_fold(skb_checksum(skb, 0, skb->len,
                                                   skb->csum))) {
                UDP6_INC_STATS_BH(UDP_MIB_INERRORS);
                kfree_skb(skb);
                return 0;
        }
        skb->ip_summed = CHECKSUM_UNNECESSARY;
}
...
UDP6_INC_STATS_BH(UDP_MIB_INDATAGRAMS);
...
}

So when one packet with an invalid checksum arrives at an IPv4 host and at
an IPv6 host, UDP_MIB_INDATAGRAMS and UDP_MIB_INERRORS are incremented
differently. Do the definitions of the two counters differ between IPv4 and
IPv6?


  IPv4 UDP does not discard the datagram with invalid checksum. UDP can
  validate UDP checksums correctly only when socket filtering instructions
  is set. If socket filtering instructions is not set, datagram with
  invalid checksum will be passed to the application.
 
 We check the checksum later, in parallel with the copy of
 the packet data into userspace.
 
 See udp_recvmsg(), where we do this:
 
 if (skb->ip_summed==CHECKSUM_UNNECESSARY) {
         err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
                                       msg->msg_iov, copied);
 } else if (msg->msg_flags&MSG_TRUNC) {
         if (__udp_checksum_complete(skb))
                 goto csum_copy_err;
         err = skb_copy_datagram_iovec(skb, sizeof(struct udphdr),
                                       msg->msg_iov, copied);
 } else {
         err = skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
                                                msg->msg_iov);
 
         if (err == -EINVAL)
                 goto csum_copy_err;
 }




Re: Problem with Ipsec transport mode over NAT

2006-02-23 Thread Patrick McHardy
Chinh Nguyen wrote:
 Patrick McHardy wrote:
 
Netfilter recalculates the checksum when NATing it.
 
 
 The NATing is not done by netfilter but by the NAT device between the
 IPsec peers.

I see, so the TCP checksum includes the wrong IPs.

 [Linux ipsec client C] -- [NAT device] -- [Linux ipsec server S]
 
 C negotiates a IPsec Transport Mode with S. Because of Transport Mode/NAT-T, 2
 things happen to an IPsec packet.
 
 1. It is UDP-encapsulated, typically on port 4500/udp.
 2. Transport Mode traffic leaves the original IP header alone whereas tunnel
 mode wraps the entire traffic in a second IP header. As such, when the packet
 passes through the NAT device, the source IP is N. However, the original
 unencrypted packet had source IP C.
 
 S rips off the UDP-encap header, decrypts the payload, and joins the content
 back to the IP header. If the decrypted content is UDP or TCP, the UDP/TCP
 checksum is now incorrect because the source IP is now N not C.
 
 (In tunnel mode, we would ignore the NAT-ted outer IP header because the
 decrypted content has an entire IP header + UDP/TCP etc)
 
 This is a well-known problem with transport mode/NAT. One solution is to use
 NAT-OA and NAT-OR to recalculate the checksum. The linux kernel does the
 simpler thing of ignoring the UDP/TCP checksum altogether in this particular
 case:
 
 function esp_post_input (net/ipv4/esp4.c)
 290                 /*
 291                  * 2) ignore UDP/TCP checksums in case
 292                  *    of NAT-T in Transport Mode, or
 293                  *    perform other post-processing fixes
 294                  *    as per draft-ietf-ipsec-udp-encaps-06,
 295                  *    section 3.1.2
 296                  */
 297                 if (!x->props.mode)
 298                         skb->ip_summed = CHECKSUM_UNNECESSARY;
 299
 300                 break;
 
 
 As noted, esp_post_input is called in xfrm4_policy_check. Decrypted UDP
 traffic through transport mode/nat also has bad checksums. However, since it
 is passed through udp_queue_rcv_skb after decryption, and this function calls
 xfrm4_policy_check before checking the UDP checksum, line 298 means the kernel
 ignores the bad checksum.
 
 Decrypted TCP traffic has bad checksums too. But since tcp_v4_rcv checks the
 TCP checksum before calling xfrm4_policy_check, the bad checksum means the TCP
 packet is dropped as a bad segment.
 
 The end result is that UDP and other traffic (eg, ICMP) can pass through
 transport mode/nat but not TCP.
 
 I don't know what the correct fix is. Adding an extra call to
 xfrm4_policy_check in tcp_v4_rcv before the checksum check fixes this problem
 and doesn't seem to break anything else. On the other hand, moving some of the
 code in esp_post_input into esp_input (especially line 298) will work, too.

So we could either move checksum validation to after xfrm4_policy_check
or set ip_summed to CHECKSUM_UNNECESSARY already in esp_input. Setting
ip_summed in esp_input looks easier. But this still leaves one problem:
with netfilter and local NAT, a decapsulated transport mode packet might
be forwarded to another host, in which case the checksum contained in
the packet is invalid. Any ideas on how to fix this, anyone?
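
For reference, the reason the checksum breaks is that TCP and UDP checksums
cover a pseudo-header containing the source and destination addresses. A
userspace sketch of the standard Internet checksum follows (IPv4 only,
addresses as host-order integers, segment shorter than 64K); the function
names are made up for this example and are not the kernel's csum helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add a byte buffer to a running sum as big-endian 16-bit words. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t acc)
{
	while (len > 1) {
		acc += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len)			/* odd trailing byte, zero padded */
		acc += (uint32_t)p[0] << 8;
	return acc;
}

/*
 * Checksum of an L4 segment including the IPv4 pseudo-header.
 * With the checksum field zeroed this yields the value to store;
 * over a segment with a correct checksum in place it yields 0.
 */
uint16_t l4_checksum(uint32_t saddr, uint32_t daddr, uint8_t proto,
		     const uint8_t *seg, size_t len)
{
	uint32_t acc = 0;

	assert(len < 65536);

	acc += (saddr >> 16) + (saddr & 0xffff);
	acc += (daddr >> 16) + (daddr & 0xffff);
	acc += proto;
	acc += (uint32_t)len;
	acc = sum16(seg, len, acc);

	while (acc >> 16)		/* fold carries back in */
		acc = (acc & 0xffff) + (acc >> 16);

	return (uint16_t)~acc;
}
```

A segment checksummed by the client with source address C verifies to 0 on
the server only if the server also uses C in the pseudo-header. After NAT
the server sees N instead, so verification fails unless the checksum is
recomputed or skipped.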


Re: [PATCH 02/02] add mask options to fwmark masking code

2006-02-23 Thread Patrick McHardy
Michael Richardson wrote:
 
 
Patrick == Patrick McHardy [EMAIL PROTECTED] writes:
 
   #define RTA_FWMARK RTA_PROTOINFO
  +#define RTA_FWMARK_MASK RTA_CACHEINFO
 
 Patrick Please introduce a new attribute for this instead of
 Patrick overloading RTA_CACHEINFO.
 
   I would be happy to do that.
   Should I also un-overload FWMARK, with backwards compatibility?

No, that one is fine since it doesn't already have a different meaning.


Re: [PATCH] iproute2 -- add fwmarkmask

2006-02-23 Thread Patrick McHardy
Michael Richardson wrote:
 
 
Patrick == Patrick McHardy [EMAIL PROTECTED] writes:
 
 Patrick The normal way to display masks is with a /. Also I think
 Patrick it shouldn't display the default mask to avoid breaking
 Patrick scripts that parse the output.
 
   I generally dislike the /VALUE, since I expect /PREFIX-LEN.
   I agree that it shouldn't show if it is default.
 
 Patrick ip should be able to parse its own output, and it would
 Patrick also look nicer if I could just say fwmark
 Patrick 0x1/32. fwmarkmask is really an incredibly ugly expression
 Patrick :)
 
   Sure. Is that a 32-bit long mask (0xfff), or is it a 0x0020?
   fwmark is not an address.
 
   Or would you like /32 to be a prefix-based mask, and value and/or
 fwmarkmask to be a value? 

That was not the greatest example :) I think it should be a bitmask.
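
To make the two readings concrete, here is a small sketch of bitmask
matching next to a prefix-length conversion. Both helpers are hypothetical
and only illustrate the semantics under discussion; they are not taken from
the patch or from iproute2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bitmask reading: only the bits set in rule_mask are compared. */
bool fwmark_matches(uint32_t pkt_mark, uint32_t rule_mark, uint32_t rule_mask)
{
	return ((pkt_mark ^ rule_mark) & rule_mask) == 0;
}

/* Prefix-length reading: /len selects the top len bits, as for addresses. */
uint32_t prefix_to_mask(unsigned int len)
{
	assert(len <= 32);
	return len ? 0xffffffffu << (32 - len) : 0;
}
```

Under the bitmask reading "fwmark 0x1/0xf" compares only the low four bits,
whereas under the prefix reading "/32" would expand to 0xffffffff and "/8"
to 0xff000000, which is rarely what one wants for a mark.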


[Patch 1/1] updated: TCP/UDP getpeersec

2006-02-23 Thread Catherine Zhang
Hi,

Updated as per Herbert's comment.

Catherine

---

From: [EMAIL PROTECTED]

This patch implements an application of the LSM-IPSec networking
controls whereby an application can determine the label of the
security association its TCP or UDP sockets are currently connected to
via getsockopt and the auxiliary data mechanism of recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the
security context of an IPSec security association a particular TCP or
UDP socket is using.  The application can then use this security
context to determine the security context for processing on behalf of
the peer at the other end of this connection.  In the case of UDP, the
security context is for each individual packet.  An example
application is the inetd daemon, which could be modified to start
daemons running at security contexts dependent on the remote client.

Patch design approach:

- Design for TCP
The patch enables the SELinux LSM to set the peer security context for
a socket based on the security context of the IPSec security
association.  The application may retrieve this context using
getsockopt.  When called, the kernel determines if the socket is a
connected (TCP_ESTABLISHED) TCP socket and, if so, uses the dst_entry
cache on the socket to retrieve the security associations.  If a
security association has a security context, the context string is
returned, as for UNIX domain sockets.

- Design for UDP
Unlike TCP, UDP is connectionless.  This requires a somewhat different
API to retrieve the peer security context.  With TCP, the peer
security context stays the same throughout the connection, thus it can
be retrieved at any time between when the connection is established
and when it is torn down.  With UDP, each read/write can have
different peer and thus the security context might change every time.
As a result the security context retrieval must be done TOGETHER with
the packet retrieval.

The solution is to build upon the existing Unix domain socket API for
retrieving user credentials.  Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).

Patch implementation details: 

- Implementation for TCP
The security context can be retrieved by applications using getsockopt
with the existing SO_PEERSEC flag.  As an example (ignoring error
checking):

getsockopt(sockfd, SOL_SOCKET, SO_PEERSEC, optbuf, &optlen);
printf("Socket peer context is: %s\n", optbuf);

The SELinux function, selinux_socket_getpeersec, is extended to check
for labeled security associations for connected (TCP_ESTABLISHED ==
sk-sk_state) TCP sockets only.  If so, the socket has a dst_cache of
struct dst_entry values that may refer to security associations.  If
these have security associations with security contexts, the security
context is returned.  

getsockopt returns a buffer that contains a security context string or 
the buffer is unmodified. 

- Implementation for UDP
To retrieve the security context, the application first tells the
kernel that it wants it by setting the IP_PASSSEC option via
setsockopt.  Then the application retrieves the security context using
the auxiliary data mechanism.  

An example server application for UDP should look like this:

toggle = 1;
toggle_len = sizeof(toggle);

setsockopt(sockfd, SOL_IP, IP_PASSSEC, &toggle, toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
    cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
    if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
        cmsg_hdr->cmsg_level == SOL_IP &&
        cmsg_hdr->cmsg_type == SCM_SECURITY) {
        memcpy(scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
    }
}

ip_setsockopt is enhanced with a new socket option, IP_PASSSEC, to allow
a server socket to receive the security context of the peer.  A new
ancillary message type, SCM_SECURITY, is also defined.

When the packet is received we get the security context from the
sec_path pointer which is contained in the sk_buff, and copy it to the
ancillary message space.  An additional LSM hook,
selinux_socket_getpeersec_udp, is defined to retrieve the security
context from the SELinux space.  The existing function,
selinux_socket_getpeersec does not suit our purpose, because the
security context is copied directly to user space, rather than to
kernel space.


Testing:

We have tested the patch by setting up TCP and UDP connections between
applications on two machines using the IPSec policies that result in
labeled security associations being built.  For TCP, we can then
extract the peer security context using getsockopt on either end.  For
UDP, the receiving end can retrieve the security context using the
auxiliary data mechanism of recvmsg.


---

 include/linux/in.h  |1 
 include/linux/security.h|   25 +++---
 include/linux/socket.h  |1 
 net/core/sock.c |2 -
 net/ipv4/ip_sockglue.c  |   

[git patches] net driver fixes

2006-02-23 Thread Jeff Garzik

Please pull from 'upstream-fixes' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git

to receive the following updates:

 drivers/net/r8169.c |  189 
 drivers/net/skge.c  |   75 
 drivers/net/skge.h  |1 
 drivers/net/sky2.c  |  173 ---
 drivers/net/sky2.h  |   85 ---
 drivers/net/tlan.c  |2 
 6 files changed, 371 insertions(+), 154 deletions(-)

Adrian Bunk:
  drivers/net/tlan.c: #ifdef CONFIG_PCI the PCI specific code

Francois Romieu:
  r8169: fix broken ring index handling in suspend/resume
  r8169: enable wake on lan

Stephen Hemminger:
  sky2: yukon-ec-u chipset initialization
  sky2: limit coalescing values to ring size
  sky2: poke coalescing timer to fix hang
  sky2: force early transmit status
  sky2: use device iomem to access PCI config
  sky2: close race on IRQ mask update.
  skge: NAPI/irq race fix
  skge: genesis phy initialzation
  skge: protect interrupt mask

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 6e10184..8cc0d0b 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -287,6 +287,20 @@ enum RTL8169_register_content {
TxInterFrameGapShift = 24,
TxDMAShift = 8, /* DMA burst value (0-7) is shift this many bits */
 
+	/* Config1 register p.24 */
+	PMEnable	= (1 << 0),	/* Power Management Enable */
+
+	/* Config3 register p.25 */
+	MagicPacket	= (1 << 5),	/* Wake up when receives a Magic Packet */
+	LinkUp		= (1 << 4),	/* Wake up when the cable connection is re-established */
+
+	/* Config5 register p.27 */
+	BWF		= (1 << 6),	/* Accept Broadcast wakeup frame */
+	MWF		= (1 << 5),	/* Accept Multicast wakeup frame */
+	UWF		= (1 << 4),	/* Accept Unicast wakeup frame */
+	LanWake		= (1 << 1),	/* LanWake enable/disable */
+	PMEStatus	= (1 << 0),	/* PME status can be reset by PCI RST# */
+
/* TBICSR p.28 */
TBIReset= 0x8000,
TBILoopback = 0x4000,
@@ -433,6 +447,7 @@ struct rtl8169_private {
unsigned int (*phy_reset_pending)(void __iomem *);
unsigned int (*link_ok)(void __iomem *);
struct work_struct task;
+   unsigned wol_enabled : 1;
 };
 
 MODULE_AUTHOR("Realtek and the Linux r8169 crew <netdev@vger.kernel.org>");
@@ -607,6 +622,80 @@ static void rtl8169_link_option(int idx,
*duplex = p->duplex;
 }
 
+static void rtl8169_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+	struct rtl8169_private *tp = netdev_priv(dev);
+	void __iomem *ioaddr = tp->mmio_addr;
+	u8 options;
+
+	wol->wolopts = 0;
+
+#define WAKE_ANY (WAKE_PHY | WAKE_MAGIC | WAKE_UCAST | WAKE_BCAST | WAKE_MCAST)
+	wol->supported = WAKE_ANY;
+
+	spin_lock_irq(&tp->lock);
+
+	options = RTL_R8(Config1);
+	if (!(options & PMEnable))
+		goto out_unlock;
+
+	options = RTL_R8(Config3);
+	if (options & LinkUp)
+		wol->wolopts |= WAKE_PHY;
+	if (options & MagicPacket)
+		wol->wolopts |= WAKE_MAGIC;
+
+	options = RTL_R8(Config5);
+	if (options & UWF)
+		wol->wolopts |= WAKE_UCAST;
+	if (options & BWF)
+		wol->wolopts |= WAKE_BCAST;
+	if (options & MWF)
+		wol->wolopts |= WAKE_MCAST;
+
+out_unlock:
+	spin_unlock_irq(&tp->lock);
+}
+
+static int rtl8169_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+	struct rtl8169_private *tp = netdev_priv(dev);
+	void __iomem *ioaddr = tp->mmio_addr;
+	int i;
+	static struct {
+		u32 opt;
+		u16 reg;
+		u8  mask;
+	} cfg[] = {
+		{ WAKE_ANY,   Config1, PMEnable },
+		{ WAKE_PHY,   Config3, LinkUp },
+		{ WAKE_MAGIC, Config3, MagicPacket },
+		{ WAKE_UCAST, Config5, UWF },
+		{ WAKE_BCAST, Config5, BWF },
+		{ WAKE_MCAST, Config5, MWF },
+		{ WAKE_ANY,   Config5, LanWake }
+	};
+
+	spin_lock_irq(&tp->lock);
+
+	RTL_W8(Cfg9346, Cfg9346_Unlock);
+
+	for (i = 0; i < ARRAY_SIZE(cfg); i++) {
+		u8 options = RTL_R8(cfg[i].reg) & ~cfg[i].mask;
+		if (wol->wolopts & cfg[i].opt)
+			options |= cfg[i].mask;
+		RTL_W8(cfg[i].reg, options);
+	}
+
+	RTL_W8(Cfg9346, Cfg9346_Lock);
+
+	tp->wol_enabled = (wol->wolopts) ? 1 : 0;
+
+	spin_unlock_irq(&tp->lock);
+
+	return 0;
+}
+
 static void rtl8169_get_drvinfo(struct net_device *dev,
struct ethtool_drvinfo *info)
 {
@@ -1025,6 +1114,8 @@ static struct ethtool_ops rtl8169_ethtoo
.get_tso= 

Re: [git patches] net driver fixes

2006-02-23 Thread Wolfgang Hoffmann
On Friday 24 February 2006 06:22, Jeff Garzik wrote:
 Please pull from 'upstream-fixes' branch of
 master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git

 [...]
 Stephen Hemminger:
   sky2: yukon-ec-u chipset initialization
   sky2: limit coalescing values to ring size
   sky2: poke coalescing timer to fix hang
   sky2: force early transmit status
   sky2: use device iomem to access PCI config
   sky2: close race on IRQ mask update.
[...]

Thanks for the update.

I'm still seeing reproducible hangs with this version of sky2 (as reported
in bugzilla 6084 and discussed on netdev).

Stephen, if there is anything I can do to narrow down my hangs a bit more 
systematically, please let me know, I'd be happy to help.

Wolfgang