Re: myri10ge conversion to non-contiguous skb

2006-08-28 Thread Jesse Brandeburg

On 8/24/06, Brice Goglin [EMAIL PROTECTED] wrote:

During the submission of the myri10ge driver, some people raised the
question of using pages (or any kind of non-contiguous skb) instead of
our current 16kB contiguous skb. We are looking at this right now and it
is not clear what solution is the best. From what we understand, Linux
provides two mostly redundant mechanisms to handle discontinuous skb,
the skb->frags and the skb->frag_list, s2io using the latter while e1000
uses the former. Is one or the other recommended? What is the purpose of
having them both in the net core?


You really only have one option: use PAGE_SIZE pages and frags[]
with nr_frags.  e1000 tried the frag_list option, but frag_list is used by IP
reassembly and conflicts badly with a driver-generated frag_list.
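
To make the frags[] layout concrete, here is a minimal userspace sketch, not
kernel code. It models a received frame as an array of page-sized fragments
plus an nr_frags count, instead of one contiguous 16kB buffer. The names
rx_frag, rx_buf, rx_buf_fill, MODEL_PAGE_SIZE and MODEL_MAX_FRAGS are invented
for illustration, and the 4096-byte page size and the fragment cap are
assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this model */
#define MODEL_MAX_FRAGS 18u     /* hypothetical cap on fragments per buffer */

/* One fragment: which page it lives in, where it starts, how long it is. */
struct rx_frag {
	unsigned int page_index;
	unsigned int offset;
	unsigned int size;
};

/* A received frame described as a list of page fragments. */
struct rx_buf {
	unsigned int nr_frags;
	unsigned int len;
	struct rx_frag frags[MODEL_MAX_FRAGS];
};

/*
 * Describe a frame of frame_len bytes using page-sized fragments.
 * Returns 0 on success, -1 if the frame needs more than MODEL_MAX_FRAGS pages.
 */
static int rx_buf_fill(struct rx_buf *buf, unsigned int frame_len)
{
	unsigned int remaining = frame_len;
	unsigned int i = 0;

	assert(buf != NULL);
	buf->nr_frags = 0;
	buf->len = 0;

	while (remaining > 0) {
		unsigned int chunk = remaining < MODEL_PAGE_SIZE ?
				     remaining : MODEL_PAGE_SIZE;

		if (i == MODEL_MAX_FRAGS)
			return -1;

		buf->frags[i].page_index = i;
		buf->frags[i].offset = 0;
		buf->frags[i].size = chunk;
		buf->len += chunk;
		remaining -= chunk;
		i++;
	}
	buf->nr_frags = i;
	return 0;
}
```

In this model a 9000-byte jumbo frame takes three fragments: two full pages
and an 808-byte tail.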
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]d80211: fix iwconfig key [x] behavior

2006-08-28 Thread Hong Liu
iwconfig key [x] behavior is not correctly handled in the stack, also
modify the giwencode method to show the key info.

Thanks,
Hong
[PATCH]d80211: fix iwconfig key [x] behavior

Signed-off-by: Hong Liu [EMAIL PROTECTED]

diff --git a/net/d80211/ieee80211_ioctl.c b/net/d80211/ieee80211_ioctl.c
index dd52555..d3dc59c 100644
--- a/net/d80211/ieee80211_ioctl.c
+++ b/net/d80211/ieee80211_ioctl.c
@@ -2811,9 +2811,10 @@ static int ieee80211_ioctl_siwencode(str
 		if (sdata->default_key == NULL)
 			idx = 0;
 		else for (i = 0; i < NUM_DEFAULT_KEYS; i++) {
-			if (sdata->default_key == sdata->keys[i])
+			if (sdata->default_key == sdata->keys[i]) {
 				idx = i;
-			break;
+				break;
+			}
 		}
 		if (idx < 0)
 			return -EINVAL;
@@ -2824,16 +2825,22 @@ static int ieee80211_ioctl_siwencode(str
 		alg = ALG_NONE;
 	else if (erq->length == 0) {
 		/* No key data - just set the default TX key index */
-		sdata->default_key = sdata->keys[idx];
+		if (sdata->default_key != sdata->keys[idx]) {
+			if (sdata->default_key)
+				ieee80211_key_sysfs_remove_default(sdata);
+			sdata->default_key = sdata->keys[idx];
+			if (sdata->default_key)
+				ieee80211_key_sysfs_add_default(sdata);
+		}
+		return 0;
 	}
 
 	return ieee80211_set_encryption(
 		dev, bcaddr,
-		idx, erq->length == 0 ? ALG_NONE : ALG_WEP,
+		idx, alg,
 		sdata->default_key == NULL,
 		NULL, keybuf, erq->length);
 
-	return 0;
 }
 
 
@@ -2852,9 +2859,10 @@ static int ieee80211_ioctl_giwencode(str
 		if (sdata->default_key == NULL)
 			idx = 0;
 		else for (i = 0; i < NUM_DEFAULT_KEYS; i++) {
-			if (sdata->default_key == sdata->keys[i])
+			if (sdata->default_key == sdata->keys[i]) {
 				idx = i;
-			break;
+				break;
+			}
 		}
 		if (idx < 0)
 			return -EINVAL;
@@ -2869,7 +2877,8 @@ static int ieee80211_ioctl_giwencode(str
 		return 0;
 	}
 
-	erq->length = 0;
+	erq->length = sdata->keys[idx]->keylen;
+	memcpy(key, sdata->keys[idx]->key, erq->length);
 	erq->flags |= IW_ENCODE_ENABLED;
 
 	return 0;


Re: [PATCH] net/*: use SLAB_PANIC

2006-08-28 Thread Christoph Hellwig
On Sun, Aug 27, 2006 at 03:08:41AM +0400, Alexey Dobriyan wrote:
 Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
 ---
 
  Forgive me reformatting, in some cases making it fit in 80 columns was hard.
 
  net/core/flow.c|6 +-
  net/core/neighbour.c   |   12 
  net/core/skbuff.c  |9 ++---
  net/decnet/dn_route.c  |   11 +++
  net/ipv4/inetpeer.c|5 +
  net/ipv4/ipmr.c|5 +
  net/ipv4/route.c   |   10 +++---
  net/ipv4/tcp.c |4 +---
  net/ipv6/ip6_fib.c |4 +---
  net/ipv6/route.c   |   10 +++---

ipv6 can be modular, so panicking on an initialization failure is wrong.
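
A minimal userspace sketch of the alternative: a module-style init function
that reports an allocation failure to its caller instead of panicking, and
undoes any partial setup first. Everything here (model_cache,
model_cache_create, model_module_init, model_fail_alloc) is a hypothetical
stand-in and not the kernel slab API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slab cache; not the kernel's kmem_cache. */
struct model_cache {
	size_t obj_size;
};

/* Set to nonzero to simulate an allocation failure in the model. */
static int model_fail_alloc;

static struct model_cache *model_cache_create(size_t obj_size)
{
	struct model_cache *c;

	assert(obj_size > 0);
	if (model_fail_alloc)
		return NULL;
	c = malloc(sizeof(*c));
	if (c)
		c->obj_size = obj_size;
	return c;
}

static struct model_cache *route_cache;
static struct model_cache *fib_cache;

/*
 * Module-style init: on failure, undo what was set up and return an
 * error code so that the loader can refuse the module cleanly.
 */
static int model_module_init(void)
{
	route_cache = model_cache_create(128);
	if (!route_cache)
		return -ENOMEM;

	fib_cache = model_cache_create(64);
	if (!fib_cache) {
		free(route_cache);
		route_cache = NULL;
		return -ENOMEM;
	}
	return 0;
}

/* Release both caches and reset the pointers. */
static void model_module_exit(void)
{
	free(fib_cache);
	free(route_cache);
	fib_cache = route_cache = NULL;
}
```

With this shape a failed load leaves the rest of the system running, which is
the behaviour a loadable module needs.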



Re: [PATCH 2.6.17 2/9] NetXen: Hardware access routines

2006-08-28 Thread Amit S. Kale
On Monday 21 August 2006 19:33, Stephen Hemminger wrote:
 On Mon, 21 Aug 2006 13:57:23 +0530

 Amit S. Kale [EMAIL PROTECTED] wrote:
  We can certainly create a table for all error messages. It'll hurt
  readability of code in many of the other places where printks are used to
  indicate some hardware error.
  -Amit

 My suggestion was intended as a way to handle multiple driver versions
 all using the same firmware or vice versa. By locking the firmware and
 driver version together you might make maintenance more difficult.

Ah, I had missed that completely in your first email. Thanks for your 
suggestion.

The NetXen firmware will most probably keep changing. Its hardware is
flexible enough that the firmware changes may vary widely in nature.
Thinking about this further, it seems we should coalesce firmware-dependent
code into a few isolated functions. While this may be difficult, we should do
it anyway. Hopefully future changes will not cause these efforts to go to waste.
-Amit
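
One way to isolate firmware-dependent code is a small table of per-version
operations that the rest of the driver looks up once. The sketch below is a
userspace illustration under that assumption; fw_ops, fw_ops_lookup, the
doorbell_offset hook and the offsets 0x100 and 0x180 are all invented and do
not describe the NetXen hardware or firmware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-firmware operations; names are illustrative only. */
struct fw_ops {
	unsigned int fw_major;                  /* firmware major version handled */
	unsigned int (*doorbell_offset)(void);  /* a detail that differs per version */
};

/* Made-up offsets, one per firmware generation. */
static unsigned int doorbell_offset_v1(void) { return 0x100; }
static unsigned int doorbell_offset_v2(void) { return 0x180; }

static const struct fw_ops fw_ops_table[] = {
	{ 1, doorbell_offset_v1 },
	{ 2, doorbell_offset_v2 },
};

/* Look up the ops for a firmware version; NULL means unsupported. */
static const struct fw_ops *fw_ops_lookup(unsigned int fw_major)
{
	size_t i;

	for (i = 0; i < sizeof(fw_ops_table) / sizeof(fw_ops_table[0]); i++)
		if (fw_ops_table[i].fw_major == fw_major)
			return &fw_ops_table[i];
	return NULL;
}
```

Supporting a new firmware then means adding one table entry and its helpers,
while callers keep using the same lookup.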



Re: [PATCH 1/4] net: VM deadlock avoidance framework

2006-08-28 Thread Peter Zijlstra
On Sat, 2006-08-26 at 04:37 +0200, Indan Zupancic wrote:
 On Fri, August 25, 2006 17:39, Peter Zijlstra said:
  @@ -282,7 +282,8 @@ struct sk_buff {
  nfctinfo:3;
  __u8pkt_type:3,
  fclone:2,
  -   ipvs_property:1;
  +   ipvs_property:1,
  +   emerg:1;
  __be16  protocol;
 
 Why not 'emergency'? Looks like 'emerge' with a typo now. ;-)

hehe, me lazy, you gentoo ;-)
sed -i -e 's/emerg/emregency/g' -e 's/EMERG/EMERGENCY/g' *.patch

  @@ -391,6 +391,7 @@ enum sock_flags {
  SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */
  SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
  SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
  +   SOCK_VMIO, /* promise to never block on receive */
 
 It might be used for IO related to the VM, but that doesn't tell _what_ it 
 does.
 It also does much more than just not blocking on receive, so overall, aren't
 both the vmio name and the comment slightly misleading?

I'm so having trouble with this name; I had SOCK_NONBLOCKING for a
while, but that is a very bad name because nonblocking has this well
defined meaning when talking about sockets, and this is not that.

Hence I came up with the VMIO, because that is the only selecting
criteria for being special. - I'll fix up the comment.

  +static inline int emerg_rx_pages_try_inc(void)
  +{
  +   return atomic_read(&vmio_socks) &&
  +   atomic_add_unless(&emerg_rx_pages_used, 1, RX_RESERVE_PAGES);
  +}
 
 It looks cleaner to move that first check to the caller, as it is often
 redundant and in the other cases makes it more clear what the caller is
 really doing.

Yes, very good suggestion indeed, what was I thinking?!
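
A userspace sketch of the split being agreed on here, using C11 atomics
rather than the kernel's atomic_t: the helper only manages the page counter,
and the caller decides whether the reserve applies at all. All model_* names
and the reserve size of 4 are assumptions, and model_add_unless is a
hand-written stand-in for the "add unless" operation, not the kernel function.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_RX_RESERVE_PAGES 4   /* hypothetical reserve size */

static atomic_int model_vmio_socks;           /* sockets using the reserve */
static atomic_int model_emerg_rx_pages_used;  /* pages currently taken */

/*
 * Userspace equivalent of "add unless": add 'a' to *v unless *v == limit.
 * Returns true if the add happened.
 */
static bool model_add_unless(atomic_int *v, int a, int limit)
{
	int old = atomic_load(v);

	while (old != limit) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

/* The helper now only manages the page counter. */
static bool model_emerg_rx_pages_try_inc(void)
{
	return model_add_unless(&model_emerg_rx_pages_used, 1,
				MODEL_RX_RESERVE_PAGES);
}

static void model_emerg_rx_pages_dec(void)
{
	atomic_fetch_sub(&model_emerg_rx_pages_used, 1);
}

/* The caller decides whether the reserve applies at all. */
static bool model_alloc_emergency_page(void)
{
	if (atomic_load(&model_vmio_socks) == 0)
		return false;
	return model_emerg_rx_pages_try_inc();
}
```

Call sites that already know a VMIO socket is involved can then call the
counter helper directly and skip the redundant check.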

  @@ -82,6 +82,7 @@ EXPORT_SYMBOL(zone_table);
 
   static char *zone_names[MAX_NR_ZONES] = { "DMA", "DMA32", "Normal", "HighMem" };
   int min_free_kbytes = 1024;
  +int var_free_kbytes;
 
 Using var_free_pages makes the code slightly simpler, as all that needless
 conversion isn't needed anymore. Perhaps the same is true for 
 min_free_kbytes...

't seems I'm a bit puzzled as to what you mean here.

 
  +noskb:
  +   /* Attempt emergency allocation when RX skb. */
  +   if (!(flags & SKB_ALLOC_RX))
  +   goto out;
 
 So only incoming skb allocation is guaranteed? What about outgoing skbs?
 What am I missing? Or can we sleep then, and increasing var_free_kbytes is
 sufficient to guarantee it?

->sk_allocation |= __GFP_EMERGENCY - will take care of the outgoing
packets. Also, since one only sends a limited number of packets out and
then will wait for answers, we do not need to worry about fragmentation
issues that much in this case.

  +static void emerg_free_skb(struct kmem_cache *cache, void *objp)
  +{
  +   free_page((unsigned long)objp);
  +   emerg_rx_pages_dec();
  +}
  +
   /*
* Free an skbuff by memory without cleaning the state.
*/
  @@ -326,17 +373,21 @@ void kfree_skbmem(struct sk_buff *skb)
   {
  struct sk_buff *other;
  atomic_t *fclone_ref;
  +   void (*free_skb)(struct kmem_cache *, void *);
 
  skb_release_data(skb);
  +
  +   free_skb = skb->emerg ? emerg_free_skb : kmem_cache_free;
  +
  switch (skb->fclone) {
  case SKB_FCLONE_UNAVAILABLE:
  -   kmem_cache_free(skbuff_head_cache, skb);
  +   free_skb(skbuff_head_cache, skb);
  break;
 
  case SKB_FCLONE_ORIG:
  fclone_ref = (atomic_t *) (skb + 2);
  if (atomic_dec_and_test(fclone_ref))
  -   kmem_cache_free(skbuff_fclone_cache, skb);
  +   free_skb(skbuff_fclone_cache, skb);
  break;
 
  case SKB_FCLONE_CLONE:
  @@ -349,7 +400,7 @@ void kfree_skbmem(struct sk_buff *skb)
  skb->fclone = SKB_FCLONE_UNAVAILABLE;
 
  if (atomic_dec_and_test(fclone_ref))
  -   kmem_cache_free(skbuff_fclone_cache, other);
  +   free_skb(skbuff_fclone_cache, other);
  break;
  };
   }
 
 I don't have the original code in front of me, but isn't it possible to
 add a goto free which has all the freeing in one place? That would get
 rid of the function pointer stuff and emerg_free_skb.

perhaps, yes, however I prefer this one, it allows access to the size.

  @@ -435,6 +486,17 @@ struct sk_buff *skb_clone(struct sk_buff
  atomic_t *fclone_ref = (atomic_t *) (n + 1);
  n->fclone = SKB_FCLONE_CLONE;
  atomic_inc(fclone_ref);
  +   } else if (skb->emerg) {
  +   if (!emerg_rx_pages_try_inc())
  +   return NULL;
  +
  +   n = (void *)__get_free_page(gfp_mask | __GFP_EMERG);
  +   if (!n) {
  +   WARN_ON(1);
  +   emerg_rx_pages_dec();
  +   return NULL;
  +   }
  +   n->fclone = SKB_FCLONE_UNAVAILABLE;
  } else {
  n = 

Re: [RFC][PATCH 0/3] net: a lighter UDP-Lite (RFC 3828)

2006-08-28 Thread gerrit
[NET/IPv4]: update for udp.c only, to match 2.6.18-rc4-mm3

This is an update only, as the previous patch can not cope
with recent changes to udp.c (all other files remain the same).

Up-to-date, complete patches can always be taken from 
http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
---
  udp.c |  606 
--
 1 file changed, 410 insertions(+), 196 deletions(-)


diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 514c1e9..4ddd8e6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -92,10 +92,8 @@ #include <linux/errno.h>
 #include <linux/timer.h>
 #include <linux/mm.h>
 #include <linux/inet.h>
-#include <linux/ipv6.h>
 #include <linux/netdevice.h>
 #include <net/snmp.h>
-#include <net/ip.h>
 #include <net/tcp_states.h>
 #include <net/protocol.h>
 #include <linux/skbuff.h>
@@ -121,7 +119,19 @@ DEFINE_RWLOCK(udp_hash_lock);
 /* Shared by v4/v6 udp. */
 int udp_port_rover;
 
-static int udp_v4_get_port(struct sock *sk, unsigned short snum)
+/* the extensions for UDP-Lite (RFC 3828) */
+#include "udplite.c"
+
+/**
+ * __udp_get_port  -  find an unbound UDP(-Lite) port
+ *
+ * @sk: udp_sock
+ * @snum:   port number to look up
+ * @udptable:   hash list table, must be of UDP_HTABLE_SIZE
+ * @port_rover: pointer to record of last unallocated port
+ */
+int __udp_get_port(struct sock *sk, unsigned short snum,
+ struct hlist_head udptable[], int *port_rover)
 {
 	struct hlist_node *node;
 	struct sock *sk2;
@@ -131,16 +141,16 @@ static int udp_v4_get_port(struct sock *
 	if (snum == 0) {
 		int best_size_so_far, best, result, i;
 
-		if (udp_port_rover > sysctl_local_port_range[1] ||
-		    udp_port_rover < sysctl_local_port_range[0])
-			udp_port_rover = sysctl_local_port_range[0];
+		if (*port_rover > sysctl_local_port_range[1] ||
+		    *port_rover < sysctl_local_port_range[0])
+			*port_rover = sysctl_local_port_range[0];
 		best_size_so_far = 32767;
-		best = result = udp_port_rover;
+		best = result = *port_rover;
 		for (i = 0; i < UDP_HTABLE_SIZE; i++, result++) {
 			struct hlist_head *list;
 			int size;
 
-			list = &udp_hash[result & (UDP_HTABLE_SIZE - 1)];
+			list = &udptable[result & (UDP_HTABLE_SIZE - 1)];
 			if (hlist_empty(list)) {
 				if (result > sysctl_local_port_range[1])
 					result = sysctl_local_port_range[0] +
@@ -162,16 +172,16 @@ static int udp_v4_get_port(struct sock *
 				result = sysctl_local_port_range[0]
 					+ ((result - sysctl_local_port_range[0]) &
 					   (UDP_HTABLE_SIZE - 1));
-			if (!udp_lport_inuse(result))
+			if (! __udp_lport_inuse(result, udptable))
 				break;
 		}
 		if (i >= (1 << 16) / UDP_HTABLE_SIZE)
 			goto fail;
 gotit:
-		udp_port_rover = snum = result;
+		*port_rover = snum = result;
 	} else {
 		sk_for_each(sk2, node,
-			    &udp_hash[snum & (UDP_HTABLE_SIZE - 1)]) {
+			    &udptable[snum & (UDP_HTABLE_SIZE - 1)]) {
 			struct inet_sock *inet2 = inet_sk(sk2);
 
 			if (inet2->num == snum &&
@@ -189,7 +199,7 @@ gotit:
 	}
 	inet->num = snum;
 	if (sk_unhashed(sk)) {
-		struct hlist_head *h = &udp_hash[snum & (UDP_HTABLE_SIZE - 1)];
+		struct hlist_head *h = &udptable[snum & (UDP_HTABLE_SIZE - 1)];
 
 		sk_add_node(sk, h);
 		sock_prot_inc_use(sk->sk_prot);
@@ -202,6 +212,11 @@ fail:
 	return 1;
 }
 
+static __inline__ int udp_v4_get_port(struct sock *sk, unsigned short snum)
+{
+	return  __udp_get_port(sk, snum, udp_hash, &udp_port_rover);
+}
+
 static void udp_v4_hash(struct sock *sk)
 {
 	BUG();
@@ -217,18 +232,24 @@ static void udp_v4_unhash(struct sock *s
 	write_unlock_bh(&udp_hash_lock);
 }
 
-/* UDP is nearly always wildcards out the wazoo, it makes no sense to try
- * harder than this. -DaveM
+/**
+ * __udp_lookup  -  find UDP(-Lite) socket
+ *
+ * @udptable:   hash list table, must be of UDP_HTABLE_SIZE
+ *
+ * UDP nearly always wildcards out the wazoo, it makes no sense to try
+ * harder than this. -DaveM
  */
-static struct sock *udp_v4_lookup_longway(u32 saddr, u16 sport,
- u32 daddr, u16 dport, int dif)
+struct sock *__udp_lookup(u32 saddr, u16 sport, u32 daddr, u16 dport, int dif,
+

Re: [take14 0/3] kevent: Generic event handling mechanism.

2006-08-28 Thread Jari Sundell

On 8/28/06, Nicholas Miell [EMAIL PROTECTED] wrote:

Also complicated is the case where waiting threads have different
priorities, different timeouts, and different minimum event counts --
how do you decide which thread gets events first? What if the decisions
are different depending on whether you want to maximize throughput or
interactivity?


BTW, what is the intended use of the min event count parameter? The
obvious reason I can see, avoiding waking up a thread too often with
few queued events, would imo be handled cleaner by just passing a
parameter telling the kernel to try to queue more events.

With a min event count you'd have to use a rather low timeout to
ensure that events get handled within a reasonable time.

Rakshasa


Re: [RFC][PATCH 0/3] net: a lighter UDP-Lite (RFC 3828)

2006-08-28 Thread Arnaldo Carvalho de Melo

On 8/28/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

[NET/IPv4]: update for udp.c only, to match 2.6.18-rc4-mm3

This is an update only, as the previous patch can not cope
with recent changes to udp.c (all other files remain the same).

Up-to-date, complete patches can always be taken from
http://www.erg.abdn.ac.uk/users/gerrit/udp-lite/files/udplite_linux.tar.gz

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
---
  udp.c |  606 
--
 1 file changed, 410 insertions(+), 196 deletions(-)


diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 514c1e9..4ddd8e6 100644




@@ -731,12 +801,12 @@ out:
 }

 /*
- * IOCTL requests applicable to the UDP protocol
+ * IOCTL requests applicable to the UDP(-Lite) protocol
  */


Avoid these changes to reduce patch file size, please


-
+
 int udp_ioctl(struct sock *sk, int cmd, unsigned long arg)
 {
-   switch(cmd)
+   switch(cmd)


Ditto



-/*
- * This should be easy, if there is something there we
- * return it, otherwise we block.
+/**
+ * udp_recvmsg  -  generic UDP/-Lite receive processing
+ *
+ * This routine is udplite-aware and works for both protocols.




@@ -980,7 +1055,11 @@ #else
 #endif
 }

-/* returns:
+/**
+ * udp_queue_rcv_skb  -  receive queue processing
+ *
+ * This routine is udplite-aware and works on both sockets.




if (up->encap_type) {
@@ -1010,7 +1087,7 @@ static int udp_queue_rcv_skb(struct sock
 * If it's an encapsulateed packet, then pass it to the
 * IPsec xfrm input and return the response
 * appropriately.  Otherwise, just fall through and
-* pass this up the UDP socket.
+* pass this up the UDP/-Lite socket.
 */



-   /* FALLTHROUGH -- it's a UDP Packet */
+   /* FALLTHROUGH -- it's a UDP/-Lite Packet */
}




 /*
- * All we need to do is get the socket, and then do a checksum.
+ * All we need to do is get the socket, and then do a checksum.
  */
-


Huh, what was this one? trailing whitespace? Can you leave this for
another cset doing just the reformatting?


@@ -1219,7 +1363,7 @@ static int udp_destroy_sock(struct sock
 }

 /*
- * Socket option code for UDP
+ * Socket option code for UDP and UDP-Lite (shared).
  */



 #endif
+
 /**
- * udp_poll - wait for a UDP event.
+ * udp_poll  -  wait for a UDP(-Lite) event.


See next comment


  * @file - file struct
  * @sock - socket
  * @wait - poll table
@@ -1348,11 +1528,14 @@ #endif
  * then it could get return from select indicating data available
  * but then block when reading it. Add special case code
  * to work around these arguably broken applications.
+ *
+ * The routine is udplite-aware and works for both protocols.


I guess these comments can go as well, as one can quickly realise that the
function handles UDP-Lite from all the IS_UDPLITE(sk) calls and
is_{udp}lite variables :-)


  */
 unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
 {
 	unsigned int mask = datagram_poll(file, sock, wait);
 	struct sock *sk = sock->sk;
+	int is_lite = IS_UDPLITE(sk);


Regards,

- Arnaldo


Re: [RFC][PATCH 0/3] net: a lighter UDP-Lite (RFC 3828)

2006-08-28 Thread gerrit
Quoting Arnaldo Carvalho de Melo:
|  Avoid these changes to reduce patch file size, please

I apologize for the bad patch format - I am revising the entire
patch to improve readability and will resend.

- Gerrit


Re: e1000: operation without eeprom?

2006-08-28 Thread Brent Cook
On Sunday 27 August 2006 19:50, Lennert Buytenhek wrote:
 Hi,

 There are a couple of ARM boards out there with on-board e1000s but
 without any kind of eeprom.  The boot loader and kernel board support
 code have all the info necessary to configure the e1000, but the e1000
 driver bombs out because there isn't an eeprom connected -- how are
 we supposed to deal with this situation?


u-boot, which uses modified versions of the Linux e1000 drivers, handles these
special cases with a bunch of platform-specific #ifdefs:

http://www.denx.de/cgi-bin/gitweb.cgi?p=u-boot.git;a=blob;h=927acbb26737a20e02962f67047e192545a870a1;hb=16850919ff8666f20d047cb83b4ee77581336515;f=drivers/e1000.c

I fear that working in general across the e1000 product line without an eeprom
might not work so well. We've had 82545s work OK without an eeprom, but the
82572 did not work so well. Some chips appear to work OK with pure software
config, whereas others might need some special setup parameters that would
work best at chip power-up via the eeprom.

As a general solution (with fewer ifdefs than the u-boot solution), it might 
be nice to have a read_eeprom_virtual(..) method in the driver where one 
could supply a binary blob to the driver instead of having a real eeprom. All 
of the driver code that relies on the eeprom could work like normal. I've 
been toying with this under u-boot for a custom ARM board without an eeprom 
too, though it does have the side-effect of bloating the u-boot driver a bit 
with the fake eeprom data that's really useless after boot (it's mostly 
0x's though, so you could totally optimize it.)
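
A minimal userspace sketch of the idea, assuming the EEPROM is modelled as an
array of 16-bit words supplied by board code. The signature of
read_eeprom_virtual, the virt_eeprom structure and the image contents are all
hypothetical; this is not an interface of the e1000 driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blob-backed EEPROM; not part of any real e1000 driver. */
struct virt_eeprom {
	const uint16_t *words;   /* EEPROM image, one entry per 16-bit word */
	size_t nwords;
};

/*
 * Read 'count' words starting at word 'offset' into 'out'.
 * Returns 0 on success, -1 if the request runs past the image.
 */
static int read_eeprom_virtual(const struct virt_eeprom *ee, size_t offset,
			       size_t count, uint16_t *out)
{
	size_t i;

	assert(ee != NULL && out != NULL);
	if (offset > ee->nwords || count > ee->nwords - offset)
		return -1;
	for (i = 0; i < count; i++)
		out[i] = ee->words[offset + i];
	return 0;
}

/* Example image: three made-up words standing in for a MAC address. */
static const uint16_t board_eeprom_image[] = { 0x0200, 0x0000, 0x0100 };

static const struct virt_eeprom board_eeprom = {
	board_eeprom_image,
	sizeof(board_eeprom_image) / sizeof(board_eeprom_image[0]),
};
```

Driver code that expects EEPROM words could call such a reader unchanged,
with the board supplying the image.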

 - Brent


Re: [RFC][PATCH 0/3] net: a lighter UDP-Lite (RFC 3828)

2006-08-28 Thread Arnaldo Carvalho de Melo

On 8/28/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Quoting Arnaldo Carvalho de Melo:
|  Avoid these changes to reduce patch file size, please

I apologize for the bad patch format - I am revising the entire
patch to improve readability and will resend.


No need for apologies and thanks for taking my suggestions into account.

- Arnaldo


Re: 2.6.18-rc4-mm[1,2,3] -- Network card not getting assigned an eth device name

2006-08-28 Thread Benoit Boissinot

On 8/27/06, Jeremy Fitzhardinge [EMAIL PROTECTED] wrote:

Andrew Morton wrote:
 Jeremy reported that a while back too.  I do not know what is causing it
 and as far as I know no net developers have yet looked into it.


It went away with -rc4-mm[23] for me...


I just reproduced it with rc4-mm3, on ipw2200 after coming out of
suspend. I'll apply the patch from David Miller and see if anything
shows up in the log.

regards,

Benoit


Re: 2.6.18-rc4-mm[1,2,3] -- Network card not getting assigned an eth device name

2006-08-28 Thread Miles Lane

On 8/27/06, David Miller [EMAIL PROTECTED] wrote:

From: Andrew Morton [EMAIL PROTECTED]
Date: Sun, 27 Aug 2006 00:19:43 -0700

 Jeremy reported that a while back too.  I do not know what is causing it
 and as far as I know no net developers have yet looked into it.

A debugging patch like this one should help figure out the culprit.

If we don't see the gibberish netdevice name printed in the kernel
logs, then likely something is corrupting the netdevice structure or
the memory holding the name.

diff --git a/net/core/dev.c b/net/core/dev.c
index d4a1ec3..45f9b19 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -738,6 +738,11 @@ int dev_change_name(struct net_device *d

 	if (!dev_valid_name(newname))
 		return -EINVAL;
+#if 1
+	printk("[%s:%d]: Changing netdevice name from [%s] to [%s]\n",
+	       current->comm, current->pid,
+	       dev->name, newname);
+#endif

 	if (strchr(newname, '%')) {
 		err = dev_alloc_name(dev, newname);



Dan, do you have any idea why NetworkManager from Ubuntu 6.06.1
would be corrupting network device names on recent MM kernels?
I haven't seen this happening with Ubuntu's kernels.  If you like, I can
send you my kernel .config file.

Here's what I get:

[NetworkManager:5399]: Changing netdevice name from [eth0] to [��]
��: link down
ADDRCONF(NETDEV_UP): ��: link is not ready
[NetworkManager:5399]: Changing netdevice name from [eth1] to [7G*e]
7G*e: no IPv6 routers present

Here's the result of strace -f -F -v -a50 NetworkManager:

execve(./NetworkManager.bak, [./NetworkManager.bak],
[TERM=linux, SHELL=/bin/bash, HUSHLOGIN=FALSE,
OLDPWD=/home/miles, USER=root,
LS_COLORS=no=00:fi=00:di=01;34:l..., SUDO_USER=miles,
SUDO_UID=1000, PATH=/usr/local/sbin:/usr/local/...,
MAIL=/var/mail/miles, PWD=/usr/sbin, LANG=en_US.UTF-8,
HISTCONTROL=ignoredups, SUDO_COMMAND=/bin/bash,
HOME=/home/miles, SHLVL=2, LANGUAGE=en_US:en_GB:en,
LOGNAME=root, LESSOPEN=| /usr/bin/lesspipe %s, SUDO_GID=1000,
LESSCLOSE=/usr/bin/lesspipe %s %..., _=/usr/bin/strace]) = 0
uname({sysname=Linux, nodename=Dumbleedor,
release=2.6.18-rc4-mm3, version=#32 Sun Aug 27 01:01:35 PDT 2006,
machine=i686}) = 0
brk(0)= 0x808b000
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7f8a000
access(/etc/ld.so.nohwcap, F_OK)= -1 ENOENT (No such
file or directory)
old_mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7f88000
access(/etc/ld.so.preload, R_OK)= -1 ENOENT (No such
file or directory)
open(/etc/ld.so.cache, O_RDONLY)= 3
fstat64(3, {st_dev=makedev(3, 10), st_ino=195836,
st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096,
st_blocks=216, st_size=102666, st_atime=2006/08/28-00:34:02,
st_mtime=2006/08/25-22:58:56, st_ctime=2006/08/25-22:58:56}) = 0
old_mmap(NULL, 102666, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f6e000
close(3)  = 0
access(/etc/ld.so.nohwcap, F_OK)= -1 ENOENT (No such
file or directory)
open(/usr/lib/libhal.so.1, O_RDONLY)= 3
read(3, \177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\\36\0..., 512) = 512
fstat64(3, {st_dev=makedev(3, 10), st_ino=830757,
st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096,
st_blocks=64, st_size=30448, st_atime=2006/08/28-00:34:02,
st_mtime=2006/05/22-08:09:25, st_ctime=2006/07/05-21:10:31}) = 0
old_mmap(NULL, 33464, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f65000
old_mmap(0xb7f6d000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0xb7f6d000
close(3)  = 0
access(/etc/ld.so.nohwcap, F_OK)= -1 ENOENT (No such
file or directory)
open(/lib/libiw.so.28, O_RDONLY)= 3
read(3, \177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300\25..., 512) = 512
fstat64(3, {st_dev=makedev(3, 10), st_ino=814477,
st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096,
st_blocks=48, st_size=23228, st_atime=2006/08/28-00:34:02,
st_mtime=2006/02/09-15:38:09, st_ctime=2006/07/05-21:19:53}) = 0
old_mmap(NULL, 26188, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7f5e000
old_mmap(0xb7f64000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0xb7f64000
close(3)  = 0
access(/etc/ld.so.nohwcap, F_OK)= -1 ENOENT (No such
file or directory)
open(/usr/lib/libnl.so.1, O_RDONLY) = 3
read(3, \177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\\236..., 512) = 512
fstat64(3, {st_dev=makedev(3, 10), st_ino=831039,
st_mode=S_IFREG|0644, st_nlink=1, st_uid=0, st_gid=0, st_blksize=4096,
st_blocks=368, st_size=180452, st_atime=2006/08/28-00:34:03,
st_mtime=2006/03/22-05:46:12, st_ctime=2006/03/29-09:41:12}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, 

Re: [PATCH 1/4] net: VM deadlock avoidance framework

2006-08-28 Thread Indan Zupancic
On Mon, August 28, 2006 12:22, Peter Zijlstra said:
 On Sat, 2006-08-26 at 04:37 +0200, Indan Zupancic wrote:
 Why not 'emergency'? Looks like 'emerge' with a typo now. ;-)

 hehe, me lazy, you gentoo ;-)
 sed -i -e 's/emerg/emregency/g' -e 's/EMERG/EMERGENCY/g' *.patch

I used it for a while, long ago, until I figured out that there were better
alternatives. I didn't like the overly complex init and portage system though.

But if you say emerg it will sound as emerge, and all other fields in that
struct aren't abbreviated either and often longer, so it just makes more sense
to use the full name.


  @@ -391,6 +391,7 @@ enum sock_flags {
 SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */
 SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
 SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
  +  SOCK_VMIO, /* promise to never block on receive */

 It might be used for IO related to the VM, but that doesn't tell _what_ it 
 does.
 It also does much more than just not blocking on receive, so overall, aren't
 both the vmio name and the comment slightly misleading?

 I'm so having trouble with this name; I had SOCK_NONBLOCKING for a
 while, but that is a very bad name because nonblocking has this well
 defined meaning when talking about sockets, and this is not that.

 Hence I came up with the VMIO, because that is the only selecting
 criteria for being special. - I'll fix up the comment.

It's nice and short, but it might be weird if someone after a while finds
another way of using this stuff. And its relation to 'emergency' looks
unclear. So maybe calling both the same makes most sense, no matter how you
name it.


  @@ -82,6 +82,7 @@ EXPORT_SYMBOL(zone_table);
 
   static char *zone_names[MAX_NR_ZONES] = { "DMA", "DMA32", "Normal", "HighMem" };
   int min_free_kbytes = 1024;
  +int var_free_kbytes;

 Using var_free_pages makes the code slightly simpler, as all that needless
 conversion isn't needed anymore. Perhaps the same is true for 
 min_free_kbytes...

 't seems I'm a bit puzzled as to what you mean here.

I mean to store the variable reserve in pages instead of kilobytes. Currently
you're converting from the one to the other both when setting and when using
the value. That doesn't make much sense and can be avoided by storing the
value in pages from the start.
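
A small userspace sketch of the difference, assuming 4 kB pages. Variant A
keeps the reserve in kilobytes and converts on both the set and the use side;
variant B stores pages and needs no conversion. All model_* names are made up
for the illustration.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12   /* assumes 4 kB pages */

/* Convert a reserve expressed in kilobytes to pages (rounds down). */
static unsigned long model_kbytes_to_pages(unsigned long kbytes)
{
	return kbytes >> (MODEL_PAGE_SHIFT - 10);
}

/* Variant A: store kilobytes, convert on every set and every use. */
static unsigned long model_var_free_kbytes;

static void model_set_reserve_kbytes(unsigned long pages)
{
	model_var_free_kbytes = pages << (MODEL_PAGE_SHIFT - 10);
}

static unsigned long model_reserve_pages_from_kbytes(void)
{
	return model_kbytes_to_pages(model_var_free_kbytes);
}

/* Variant B: store pages directly, no conversion on either side. */
static unsigned long model_var_free_pages;

static void model_set_reserve_pages(unsigned long pages)
{
	model_var_free_pages = pages;
}
```

Both variants end up with the same number of pages; variant B simply drops
the two shifts.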


  +noskb:
  +  /* Attempt emergency allocation when RX skb. */
  +  if (!(flags & SKB_ALLOC_RX))
  +  goto out;

 So only incoming skb allocation is guaranteed? What about outgoing skbs?
 What am I missing? Or can we sleep then, and increasing var_free_kbytes is
 sufficient to guarantee it?

 ->sk_allocation |= __GFP_EMERGENCY - will take care of the outgoing
 packets. Also, since one only sends a limited number of packets out and
 then will wait for answers, we do not need to worry about fragmentation
 issues that much in this case.

Ah, missed that one. Didn't know that the alloc flags were stored in the sock.


  +static void emerg_free_skb(struct kmem_cache *cache, void *objp)
  +{
  +  free_page((unsigned long)objp);
  +  emerg_rx_pages_dec();
  +}
  +
   /*
*Free an skbuff by memory without cleaning the state.
*/
  @@ -326,17 +373,21 @@ void kfree_skbmem(struct sk_buff *skb)
   {
 struct sk_buff *other;
 atomic_t *fclone_ref;
  +  void (*free_skb)(struct kmem_cache *, void *);
 
 skb_release_data(skb);
  +
  +  free_skb = skb->emerg ? emerg_free_skb : kmem_cache_free;
  +
 switch (skb->fclone) {
 case SKB_FCLONE_UNAVAILABLE:
  -  kmem_cache_free(skbuff_head_cache, skb);
  +  free_skb(skbuff_head_cache, skb);
 break;
 
 case SKB_FCLONE_ORIG:
 fclone_ref = (atomic_t *) (skb + 2);
 if (atomic_dec_and_test(fclone_ref))
  -  kmem_cache_free(skbuff_fclone_cache, skb);
  +  free_skb(skbuff_fclone_cache, skb);
 break;
 
 case SKB_FCLONE_CLONE:
  @@ -349,7 +400,7 @@ void kfree_skbmem(struct sk_buff *skb)
 skb->fclone = SKB_FCLONE_UNAVAILABLE;
 
 if (atomic_dec_and_test(fclone_ref))
  -  kmem_cache_free(skbuff_fclone_cache, other);
  +  free_skb(skbuff_fclone_cache, other);
 break;
 };
   }

 I don't have the original code in front of me, but isn't it possible to
 add a goto free which has all the freeing in one place? That would get
 rid of the function pointer stuff and emerg_free_skb.

 perhaps, yes, however I prefer this one, it allows access to the size.

What size are you talking about? What I had in mind is probably less readable,
but it avoids a bunch of function calls and that indirect function call, so
with luck it has less overhead and smaller object size:

void kfree_skbmem(struct sk_buff *skb)
{
struct sk_buff *other;
atomic_t *fclone_ref;
struct kmem_cache *cache = skbuff_head_cache;
struct sk_buff *free = skb;

skb_release_data(skb);
switch 

Re: [PATCH 5/10] rt2x00: Register initialization fixes

2006-08-28 Thread John W. Linville
On Sun, Aug 27, 2006 at 05:39:14PM +0200, Ivo van Doorn wrote:
 Various register initialization fixes to make the device work properly.
 This will fix the RX/TX issue for rt61pci.
 
 Signed-off-by Ivo van Doorn [EMAIL PROTECTED]
 
 ---
 
 diff -rU3 
 wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2400pci.c 
 wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2400pci.c
 --- 
 wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2400pci.c  
 2006-08-27 16:11:40.0 +0200
 +++ 
 wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2400pci.c   
 2006-08-27 16:17:02.0 +0200
 @@ -1192,11 +1192,7 @@
   rt2x00_register_write(rt2x00dev, RXCSR0, reg);
  
   rt2x00_register_write(rt2x00dev, MACCSR0, cpu_to_le32(0x00217223));
 -
 - rt2x00_register_read(rt2x00dev, MACCSR1, reg);
 - rt2x00_set_field32(reg, MACCSR1_AUTO_TXBBP, 1);
 - rt2x00_set_field32(reg, MACCSR1_AUTO_RXBBP, 1);
 - rt2x00_register_write(rt2x00dev, MACCSR1, reg);
 + rt2x00_register_write(rt2x00dev, MACCSR1, cpu_to_le32(0x00235518));
  
   rt2x00_register_read(rt2x00dev, MACCSR2, reg);
   rt2x00_set_field32(reg, MACCSR2_DELAY, 64);
 diff -rU3 
 wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2500pci.c 
 wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2500pci.c
 --- 
 wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2500pci.c  
 2006-08-27 16:12:03.0 +0200
 +++ 
 wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2500pci.c   
 2006-08-27 16:17:56.0 +0200
 @@ -1249,6 +1249,7 @@
   return -EBUSY;
  
   rt2x00_register_write(rt2x00dev, PWRCSR0, cpu_to_le32(0x3f3b3100));
 + rt2x00_register_write(rt2x00dev, PCICSR, cpu_to_le32(0x03b8));
  
   rt2x00_register_write(rt2x00dev, PSCSR0, cpu_to_le32(0x00020002));
   rt2x00_register_write(rt2x00dev, PSCSR1, cpu_to_le32(0x0002));
 @@ -1272,12 +1273,11 @@
   rt2x00_set_field32(reg, RXCSR0_DISABLE_RX, 0);
   rt2x00_register_write(rt2x00dev, RXCSR0, reg);
  
 - rt2x00_register_write(rt2x00dev, MACCSR0, cpu_to_le32(0x00213223));
 + rt2x00_register_write(rt2x00dev, GPIOCSR, cpu_to_le32(0xff00));
 + rt2x00_register_write(rt2x00dev, TESTCSR, cpu_to_le32(0x00f0));
  
 - rt2x00_register_read(rt2x00dev, MACCSR1, reg);
 - rt2x00_set_field32(reg, MACCSR1_AUTO_TXBBP, 1);
 - rt2x00_set_field32(reg, MACCSR1_AUTO_RXBBP, 1);
 - rt2x00_register_write(rt2x00dev, MACCSR1, reg);
 + rt2x00_register_write(rt2x00dev, MACCSR0, cpu_to_le32(0x00213223));
 + rt2x00_register_write(rt2x00dev, MACCSR1, cpu_to_le32(0x00235518));
  
   rt2x00_register_read(rt2x00dev, MACCSR2, reg);
   rt2x00_set_field32(reg, MACCSR2_DELAY, 64);

Lots of magic numbers here...can we do something about that?

John
-- 
John W. Linville
[EMAIL PROTECTED]


[PATCH] [IPV6] ROUTE: Fix dst refcounting.

2006-08-28 Thread YOSHIFUJI Hideaki / 吉藤英明
[IPV6] ROUTE: Fix dst reference counting in ip6_pol_route_lookup().

In ip6_pol_route_lookup(), when we finish backtracking at the
top-level root entry, we need to hold it.

Bug noticed by Mitsuru Chinen [EMAIL PROTECTED].

Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

--- 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 626ff7a..9d61ae9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -510,8 +510,8 @@ restart:
        rt = fn->leaf;
        rt = rt6_device_match(rt, fl->oif, flags);
        BACKTRACK(&fl->fl6_src);
-       dst_hold(&rt->u.dst);
 out:
+       dst_hold(&rt->u.dst);
        read_unlock_bh(&table->tb6_lock);
 
        rt->u.dst.lastuse = jiffies;

-- 
YOSHIFUJI Hideaki @ USAGI Project  [EMAIL PROTECTED]
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA
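
The reasoning behind the fix can be sketched in plain C. This is a minimal
user-space illustration, not the kernel code: struct route, route_hold() and
both lookup functions are made-up names, and the "usable" flag merely stands
in for the BACKTRACK() logic. It shows why a hold placed before the out: label
is skipped when the lookup falls back to the root entry, and why a hold placed
after the label covers both paths.

```c
/* Hypothetical stand-in for a routing entry with a reference count. */
struct route {
	int refcnt;
	int usable;	/* 0 means: backtrack to the root entry */
};

static void route_hold(struct route *rt)
{
	rt->refcnt++;
}

/* Old shape: the fallback path jumps to "out" and never takes a reference. */
static struct route *lookup_before_fix(struct route *leaf, struct route *root)
{
	struct route *rt = leaf;

	if (!rt->usable) {
		rt = root;
		goto out;
	}
	route_hold(rt);
out:
	return rt;
}

/* Fixed shape: the hold sits after the label, so every path takes it. */
static struct route *lookup_after_fix(struct route *leaf, struct route *root)
{
	struct route *rt = leaf;

	if (!rt->usable) {
		rt = root;
		goto out;
	}
out:
	route_hold(rt);
	return rt;
}
```

With an unusable leaf, the first variant hands back the root entry with its
count unchanged, so a later release by the caller would drop a reference that
was never taken.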


Re: e1000 and 802.1ad/stacked vlan tagging

2006-08-28 Thread Stephan von Krawczynski
Hello Jesse,

thank you for answering anyway, though I think your answer covers only the
obvious half of the problem.
Indeed, one might think that this solves the issue, as long as only Linux
kernels are involved. Unfortunately my setup is a bit more complicated in
terms of hardware, so I should probably have put the question this way:
how do you configure the interface so that packets with a data length of
1500 get transferred, and not only 1496?
I tried enlarging the MTU of both the real device and the first VLAN
interface, but that does not work. I really thought that the visible device
setting of mtu=1500 should have worked, and that the driver (or some code in
between) would adjust the allowed frame size to reflect the actual setup.
Is that not so?

Regards,
Stephan

PS: crossposted to both lists, list-members keep in mind I am not subscribed
when answering! Thank you.



On Mon, 28 Aug 2006 10:23:09 -0700
Brandeburg, Jesse [EMAIL PROTECTED] wrote:

 Stephan von Krawczynski wrote:
  Hello Jesse,
  
  sorry to bother you directly, but since you did the patch for my e1000
  interrupt problem last time (February) I hope you have a short-hand
  idea for my current issue, too.
  
  I am trying to make stacked vlan tagging work under kernel 2.4 with
  e1000. Generally I do this on two boxes connected back-to-back:
  
  ifconfig eth0 up
  vconfig add eth0 4094
  ifconfig eth0.4094 up
  vconfig add eth0.4094 1
  ifconfig eth0.4094.1 192.168.1.1 netmask 255.255.255.0 broadcast
  192.168.1.255 up
  
  (of course the second box gets another ip, lets assume .2).
  
  if you do a
  
  ping -s 1472 192.168.1.2
  
  through the stacked vlan you see the packets vanish.
  With
  
  ping -s 1468 192.168.1.2
  
  everything seems ok.
  
  I have the impression that the stacked vlans show some problem with
  mtu handling inside the e1000 driver. Mtu is set to 1500 but because
  stacking tags uses 4 bytes more the packets cannot use the full mtu.
  Any ideas what happens here?
 
 The packet is being dropped because it is longer than the allowed frame
 size for 1500 MTU.  check ethtool -S eth0
 
 mine shows 
 rx_long_length_errors: 169
 
 which indicates that you need to change your mtu on the stacked
 interface to 1496, at which point after I did:
 
 ip l s eth1.4094.1 mtu 1496
 
 on both sides of my connection, everything was working.  I think in this
 case it is just a configuration problem.  When you stack vlans you have
 to account for the extra inserted length someplace and that place is by
 reducing the MTU.
 
 I'd appreciate it in the future if you could use
 [EMAIL PROTECTED] or netdev@vger.kernel.org for support questions
 like this because I'm not the only one who can answer questions (and I
 might have been on vacation! :-) )
 
 Jesse
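
The arithmetic in the quoted answer can be sketched as follows. This is a
minimal illustration, not e1000 code: the constants are the usual Ethernet,
IPv4 and ICMP header sizes, the function names are made up, and the receive
limit (room for a 1500-byte payload plus one 4-byte VLAN tag) is an assumption
inferred from the behaviour reported above, where 1472 bytes are dropped and
1468 bytes pass.

```c
#define EX_ETH_HDR_LEN   14	/* dst MAC + src MAC + EtherType */
#define EX_ETH_FCS_LEN    4	/* trailing CRC */
#define EX_VLAN_TAG_LEN   4	/* one 802.1Q tag */
#define EX_IPV4_HDR_LEN  20	/* IPv4 header without options */
#define EX_ICMP_HDR_LEN   8	/* ICMP echo header */

/* On-wire frame length of "ping -s payload" sent through nr_tags VLAN tags. */
static unsigned int ping_frame_len(unsigned int payload, unsigned int nr_tags)
{
	return EX_ETH_HDR_LEN + nr_tags * EX_VLAN_TAG_LEN +
	       EX_IPV4_HDR_LEN + EX_ICMP_HDR_LEN + payload + EX_ETH_FCS_LEN;
}

/* Assumed receive limit: an mtu-sized payload plus a single VLAN tag. */
static unsigned int max_rx_frame_len(unsigned int mtu)
{
	return EX_ETH_HDR_LEN + EX_VLAN_TAG_LEN + mtu + EX_ETH_FCS_LEN;
}

/* Non-zero if the doubly tagged ping fits into what the receiver accepts. */
static int ping_fits(unsigned int payload, unsigned int nr_tags,
		     unsigned int mtu)
{
	return ping_frame_len(payload, nr_tags) <= max_rx_frame_len(mtu);
}
```

Under these assumptions a 1472-byte ping through two tags needs a 1526-byte
frame against a 1522-byte limit, while a 1468-byte ping needs exactly 1522
bytes, which matches the 4-byte difference seen in the thread.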
 



Re: [PATCH 1/4] net: VM deadlock avoidance framework

2006-08-28 Thread Peter Zijlstra
On Mon, 2006-08-28 at 18:03 +0200, Indan Zupancic wrote:
 On Mon, August 28, 2006 12:22, Peter Zijlstra said:

   @@ -391,6 +391,7 @@ enum sock_flags {
SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */
SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
   +SOCK_VMIO, /* promise to never block on receive */
 
  It might be used for IO related to the VM, but that doesn't tell _what_ it
  does. It also does much more than just not blocking on receive, so overall,
  aren't both the vmio name and the comment slightly misleading?
 
  I'm so having trouble with this name; I had SOCK_NONBLOCKING for a
  while, but that is a very bad name because nonblocking has this well
  defined meaning when talking about sockets, and this is not that.
 
  Hence I came up with the VMIO, because that is the only selecting
  criteria for being special. - I'll fix up the comment.
 
 It's nice and short, but it might be weird if someone after a while finds
 another way of using this stuff. And its relation to 'emergency' looks
 unclear. So maybe calling both the same makes most sense, no matter how you
 name it.

I've tried to come up with another use-case, but failed (of course that
doesn't mean there is none). Also, I'm really past caring what the thing
is called ;-) But if people object I guess it's easy enough to run yet
another sed command over the patches.

   @@ -82,6 +82,7 @@ EXPORT_SYMBOL(zone_table);
  
    static char *zone_names[MAX_NR_ZONES] = { "DMA", "DMA32", "Normal", "HighMem" };
    int min_free_kbytes = 1024;
   +int var_free_kbytes;
 
  Using var_free_pages makes the code slightly simpler, as all that needless
  conversion isn't needed anymore. Perhaps the same is true for
  min_free_kbytes...
 
  It seems I'm a bit puzzled as to what you mean here.
 
 I mean to store the variable reserve in pages instead of kilobytes. Currently
 you're converting from the one to the other both when setting and when using
 the value. That doesn't make much sense and can be avoided by storing the
 value in pages from the start.

right, will have a peek.
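
A small sketch of the point about units, under stated assumptions: 4 KiB pages
(the EX_PAGE_SHIFT value), and a made-up struct reserve with a made-up
reserve_adjust() helper standing in for the variable reserve. None of these
names come from the patch. Keeping the value in pages means the shift is
needed only where a kilobyte figure enters or leaves, and it also avoids
silently dropping a sub-page remainder on every round trip.

```c
#define EX_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Kilobytes to whole pages; any sub-page remainder is dropped. */
static unsigned long kbytes_to_pages(unsigned long kbytes)
{
	return kbytes >> (EX_PAGE_SHIFT - 10);
}

static unsigned long pages_to_kbytes(unsigned long pages)
{
	return pages << (EX_PAGE_SHIFT - 10);
}

/* Hypothetical reserve kept in pages: users that think in pages
 * adjust it directly, with no conversion on either side. */
struct reserve {
	unsigned long pages;
};

static void reserve_adjust(struct reserve *r, long delta_pages)
{
	r->pages += delta_pages;
}
```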

 void kfree_skbmem(struct sk_buff *skb)
 {
         struct sk_buff *other;
         atomic_t *fclone_ref;
         struct kmem_cache *cache = skbuff_head_cache;
         struct sk_buff *free = skb;
 
         skb_release_data(skb);
         switch (skb->fclone) {
         case SKB_FCLONE_UNAVAILABLE:
                 goto free;
 
         case SKB_FCLONE_ORIG:
                 fclone_ref = (atomic_t *) (skb + 2);
                 if (atomic_dec_and_test(fclone_ref)) {
                         cache = skbuff_fclone_cache;
                         goto free;
                 }
                 break;
 
         case SKB_FCLONE_CLONE:
                 fclone_ref = (atomic_t *) (skb + 1);
                 other = skb - 1;
 
                 /* The clone portion is available for
                  * fast-cloning again.
                  */
                 skb->fclone = SKB_FCLONE_UNAVAILABLE;
 
                 if (atomic_dec_and_test(fclone_ref)) {
                         cache = skbuff_fclone_cache;
                         free = other;
                         goto free;
                 }
                 break;
         };
         return;
 free:
         if (!skb->emergency)
                 kmem_cache_free(cache, free);
         else
                 emergency_rx_free(free, kmem_cache_size(cache));
 }

Ah, like so, sure, that looks good.
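
The shape agreed on here (the switch only decides what to free and from which
pool, and a single label does the freeing) can be tried out in user space.
The following is a minimal sketch with made-up names: struct pool stands in
for a kmem_cache, struct obj for an sk_buff, and the counters only record
which pool a free was charged to. It is not the kernel implementation.

```c
#include <stddef.h>

struct pool { unsigned int frees; };	/* hypothetical kmem_cache stand-in */

enum kind { KIND_SINGLE, KIND_PAIR_ORIG, KIND_PAIR_CLONE };

struct obj {
	enum kind kind;
	int *refs;	/* shared count of a pair, unused for KIND_SINGLE */
	int emergency;	/* non-zero if it came from the emergency reserve */
};

static struct pool single_pool, pair_pool, emergency_pool;

static void release(struct obj *o)
{
	struct pool *pool = &single_pool;

	switch (o->kind) {
	case KIND_SINGLE:
		goto out_free;

	case KIND_PAIR_ORIG:
	case KIND_PAIR_CLONE:
		if (--*o->refs == 0) {
			pool = &pair_pool;
			goto out_free;
		}
		break;
	}
	return;		/* the other half of the pair is still in use */

out_free:
	/* The one place where the emergency decision is made. */
	if (o->emergency)
		emergency_pool.frees++;
	else
		pool->frees++;
}
```

The design choice is that adding a new kind of free, as the emergency path
does, touches only the code after the label instead of every case.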

  You can get rid of the memalloc_reserve and vmio_request_queues variables
  if you want, they aren't really needed for anything. If using them reduces
  the total code size I'd keep them though.
 
  I find my version easier to read, but that might just be the way my
  brain works.
 
 Maybe true, but I believe my version is more natural in the sense that it
 makes it clearer what the code is doing. Less bookkeeping, more real work, so
 to speak.

Ok, I'll have another look at it, perhaps my gray matter has shifted ;-)

 But after another look things seem a bit shaky, in the locking corner anyway.
 
 sk_adjust_memalloc() calls adjust_memalloc_reserve(), which changes
 var_free_kbytes and then calls setup_per_zone_pages_min(), which does the
 real work. But it reads min_free_kbytes without holding any locks. In
 mainline that's fine as the function is only called by the proc handler and
 in obscure memory hotplug stuff. But with your code it can also be called at
 any moment when a VMIO socket is made, which now races with the proc
 callback. More a theoretical than a real problem, but still slightly messy.

Knew about that, hadn't made up my mind on a fix yet. Good spot
nevertheless. Time to actually fix it I guess.

 adjust_memalloc_reserve() has no locking at all, while it might be called
 concurrently from different sources. Luckily sk_adjust_memalloc() is the only
 user, and it uses its own spinlock for synchronization, so things go well by
 accident now. It seems
 

[RFC][PATCHv2 2.6.18-rc4-mm3 2/3] net/ipv4: UDP and generic UDP(-Lite) processing

2006-08-28 Thread gerrit
[Net/IPv4]: REVISED Modifications to the UDP module and generic UDP/-Lite 
processing.


Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
---
 include/net/udp.h |   68 ++-
 net/ipv4/udp.c|  489 --
 2 files changed, 395 insertions(+), 162 deletions(-)


diff --git a/include/net/udp.h b/include/net/udp.h
index 766fba1..77c5fb9 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -26,9 +26,48 @@ #include <linux/list.h>
 #include <net/inet_sock.h>
 #include <net/sock.h>
 #include <net/snmp.h>
+#include <net/ip.h>
+#include <linux/ipv6.h>
 #include <linux/seq_file.h>
 
 #define UDP_HTABLE_SIZE                128
+#include <net/udplite.h>
+
+/**
+ * struct udp_skb_cb  -  UDP(-Lite) private variables
+ *
+ * @header:  private variables used by IPv4/IPv6
+ * @cscov:   checksum coverage length (UDP-Lite only)
+ * @partial_cov: if set indicates partial csum coverage
+ */
+struct udp_skb_cb {
+       union {
+               struct inet_skb_parm    h4;
+#if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
+               struct inet6_skb_parm   h6;
+#endif
+       } header;
+       __u16           cscov;
+       __u8            partial_cov;
+};
+#define UDP_SKB_CB(__skb)      ((struct udp_skb_cb *)((__skb)->cb))
+
+/*
+ * Generic checksumming routines for UDP(-Lite) v4 and v6
+ */
+static inline u16  __udp_checksum_complete(struct sk_buff *skb)
+{
+       if (! UDP_SKB_CB(skb)->partial_cov)
+               return __skb_checksum_complete(skb);
+       return  csum_fold(skb_checksum(skb, 0, UDP_SKB_CB(skb)->cscov,
+                         skb->csum));
+}
+
+static __inline__ int udp_checksum_complete(struct sk_buff *skb)
+{
+       return skb->ip_summed != CHECKSUM_UNNECESSARY &&
+               __udp_checksum_complete(skb);
+}
 
 /* udp.c: This needs to be shared by v4 and v6 because the lookup
  *and hashing code needs to work with different AF's yet
@@ -39,16 +78,24 @@ extern rwlock_t udp_hash_lock;
 
 extern int udp_port_rover;
 
-static inline int udp_lport_inuse(u16 num)
+/*
+ * XXX: since udp_v{4,6}_get_port use common code, these two functions
+ * will soon go
+ */
+static inline int __udp_lport_inuse(u16 num, struct hlist_head udptable[])
 {
        struct sock *sk;
        struct hlist_node *node;
 
-       sk_for_each(sk, node, &udp_hash[num & (UDP_HTABLE_SIZE - 1)])
+       sk_for_each(sk, node, &udptable[num & (UDP_HTABLE_SIZE - 1)])
                if (inet_sk(sk)->num == num)
                        return 1;
        return 0;
 }
+static __inline__ int udp_lport_inuse(u16 num)
+{
+       return __udp_lport_inuse(num, udp_hash);
+}
 
 /* Note: this must match 'valbool' in sock_setsockopt */
 #define UDP_CSUM_NOXMIT1
@@ -75,21 +122,32 @@ extern unsigned int udp_poll(struct file
 poll_table *wait);
 
 DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
-#define UDP_INC_STATS(field)           SNMP_INC_STATS(udp_statistics, field)
-#define UDP_INC_STATS_BH(field)        SNMP_INC_STATS_BH(udp_statistics, field)
-#define UDP_INC_STATS_USER(field)      SNMP_INC_STATS_USER(udp_statistics, field)
+/*
+ * SNMP statistics for UDP and UDP-Lite
+ */
+#define UDP_INC_STATS_USER(field, is_udplite)        do {                \
+       if (is_udplite) SNMP_INC_STATS_USER(udplite_statistics, field);   \
+       else            SNMP_INC_STATS_USER(udp_statistics, field);  }  while(0)
+#define UDP_INC_STATS_BH(field, is_udplite)          do  {               \
+       if (is_udplite) SNMP_INC_STATS_BH(udplite_statistics, field);     \
+       else            SNMP_INC_STATS_BH(udp_statistics, field);    }  while(0)
+#define UDP_DEC_STATS_BH(field, is_udplite)          do  {               \
+       if (is_udplite) SNMP_DEC_STATS_BH(udplite_statistics, field);     \
+       else            SNMP_DEC_STATS_BH(udp_statistics, field);    }  while(0)
 
 /* /proc */
 struct udp_seq_afinfo {
struct module   *owner;
char*name;
sa_family_t family;
+   struct hlist_head   *hashtable;
int (*seq_show) (struct seq_file *m, void *v);
struct file_operations  *seq_fops;
 };
 
 struct udp_iter_state {
sa_family_t family;
+   struct hlist_head   *hashtable;
int bucket;
struct seq_operations   seq_ops;
 };
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 514c1e9..5ca0db3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -92,10 +92,8 @@ #include <linux/errno.h>
 #include <linux/timer.h>
 #include <linux/mm.h>
 #include <linux/inet.h>
-#include <linux/ipv6.h>
 #include <linux/netdevice.h>
 #include <net/snmp.h>
-#include <net/ip.h>
 #include <net/tcp_states.h>
 #include <net/protocol.h>
 #include <linux/skbuff.h>
@@ -108,6 +106,8 @@ #include <net/route.h>
 #include <net/inet_common.h>
 #include <net/checksum.h>
 #include <net/xfrm.h>
+/* the extensions 

[RFC][PATCHv2 2.6.18-rc4-mm3 1/3] net/ipv4: UDP-Lite extensions

2006-08-28 Thread gerrit
[Net/IPv4]: REVISED UDP-Lite standalone support and shared UDP/-Lite socket 
structure.

This is in principle the same patch as posted earlier, with the difference that
all whitespace changes have been removed; in addition, statements have been 
re-ordered
so as to give a better-readable patch.


Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
---

 include/linux/udp.h   |   11 ++
 include/net/udplite.h |   35 
 net/ipv4/udplite.c|  209 ++
 3 files changed, 255 insertions(+)


diff --git a/include/linux/udp.h b/include/linux/udp.h
index 90223f0..1b7cf10 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -50,12 +50,23 @@ struct udp_sock {
 * when the socket is uncorked.
 */
        __u16            len;           /* total length of pending frames */
+       /*
+        * Fields specific to UDP-Lite.
+        */
+       __u16            pcslen;
+       __u16            pcrlen;
+/* indicator bits used by pcflag: */
+#define UDPLITE_BIT      0x1           /* set by udplite proto init function */
+#define UDPLITE_SEND_CC  0x2           /* set via udplite setsockopt */
+#define UDPLITE_RECV_CC  0x4           /* set via udplite setsockopt */
+       __u8             pcflag;        /* marks socket as UDP-Lite if > 0 */
 };
 
 static inline struct udp_sock *udp_sk(const struct sock *sk)
 {
return (struct udp_sock *)sk;
 }
+#define IS_UDPLITE(__sk) (udp_sk(__sk)->pcflag)
 
 #endif
 
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
new file mode 100644
index 000..3911403
--- /dev/null
+++ b/net/ipv4/udplite.c
@@ -0,0 +1,209 @@
+/*
+ *  UDPLITE An implementation of the UDP-Lite protocol (RFC 3828).
+ *
+ *  Version:$Id: udplite.c,v 1.22 2006/08/22 13:01:52 gerrit Exp gerrit $
+ *
+ *  Authors:Gerrit Renker   [EMAIL PROTECTED]
+ *
+ *  Changes:
+ *  Fixes:
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+struct hlist_head      udplite_hash[UDP_HTABLE_SIZE];
+int                    udplite_port_rover;
+DEFINE_SNMP_STAT(struct udp_mib, udplite_statistics)   __read_mostly;
+
+/* these functions are called by UDP-Lite with protocol-specific parameters */
+static int __udp_get_port(struct sock *, unsigned short,
+                          struct hlist_head *, int *);
+static struct sock *__udp_lookup(u32 , u16, u32, u16, int, struct hlist_head *);
+static int __udp_mcast_deliver(struct sk_buff *, struct udphdr *,
+                               u32, u32, struct hlist_head * );
+static int __udp_common_rcv(struct sk_buff *, int is_udplite);
+static void __udp_err(struct sk_buff *, u32, struct hlist_head *);
+#ifdef CONFIG_PROC_FS
+static int udp4_seq_show(struct seq_file *, void *);
+#endif
+
+/*
+ * Designate sk as UDP-Lite socket
+ */
+static inline int udplite_sk_init(struct sock *sk)
+{
+       udp_sk(sk)->pcflag = UDPLITE_BIT;
+   return 0;
+}
+
+static __inline__ int udplite_v4_get_port(struct sock *sk, unsigned short snum)
+{
+       return  __udp_get_port(sk, snum, udplite_hash, &udplite_port_rover);
+}
+
+static __inline__ struct sock *udplite_v4_lookup(u32 saddr, u16 sport,
+u32 daddr, u16 dport, int dif)
+{
+   return __udp_lookup(saddr, sport, daddr, dport, dif, udplite_hash);
+}
+
+static __inline__ int udplite_v4_mcast_deliver(struct sk_buff *skb,
+   struct udphdr *uh, u32 saddr, u32 daddr)
+{
+   return __udp_mcast_deliver(skb, uh, saddr, daddr, udplite_hash);
+}
+
+__inline__ int udplite_rcv(struct sk_buff *skb)
+{
+   return __udp_common_rcv(skb, 1);
+}
+
+__inline__ void udplite_err(struct sk_buff *skb, u32 info)
+{
+   return __udp_err(skb, info, udplite_hash);
+}
+
+static int udplite_checksum_init(struct sk_buff *skb, struct udphdr *uh,
+unsigned short len, u32 saddr, u32 daddr)
+{
+   u16 cscov;
+
+/* In UDPv4 a zero checksum means that the transmitter generated no
+ * checksum. UDP-Lite (like IPv6) mandates checksums, hence packets
+ * with a zero checksum field are illegal.
*/
+       if (uh->check == 0) {
+               LIMIT_NETDEBUG(KERN_DEBUG "UDPLITE: zeroed csum field "
+                       "(%d.%d.%d.%d:%d - %d.%d.%d.%d:%d)\n", NIPQUAD(saddr),
+                       ntohs(uh->source), NIPQUAD(daddr), ntohs(uh->dest));
+               return 0;
+       }
+
+       UDP_SKB_CB(skb)->partial_cov = 0;
+       cscov = ntohs(uh->len);
+
+       if (cscov == 0)  /* Indicates that full coverage is required. */
+   cscov = 
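
The patch text is cut off at this point in the archive. As a hedged
illustration of the rule the function is working towards, the coverage check
described in RFC 3828 can be written as a small stand-alone function. The name
udplite_coverage() and its return convention are assumptions made for this
sketch and are not taken from the patch: a coverage field of 0 means the whole
datagram is covered, values 1 to 7 are illegal because the 8-byte header must
always be covered, and a coverage larger than the datagram is illegal too.

```c
#define EX_UDP_HDR_LEN 8	/* size of the UDP-Lite header in bytes */

/*
 * Returns the number of bytes the checksum covers, or -1 if the
 * packet has to be discarded. "len" is the datagram length taken
 * from the IP layer, "cscov" the coverage field from the header.
 */
static int udplite_coverage(unsigned int cscov, unsigned int len)
{
	if (cscov == 0)			/* full coverage requested */
		return (int)len;
	if (cscov < EX_UDP_HDR_LEN)	/* header must always be covered */
		return -1;
	if (cscov > len)		/* claims more than was received */
		return -1;
	return (int)cscov;
}
```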

[PATCH 1/9] sky2: remove cloned/pskb_expand_head check

2006-08-28 Thread shemminger
The code to handle cloned skb overwriting is unnecessary in the
sky2 driver and is buggy. The bug is that pskb_expand_head can change the
skb but the driver has already mapped in the header.

Since the sky2 hardware doesn't need to overwrite memory, the buggy
code can just be removed; it was mistakenly copied from the tg3
driver.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
--- sky2.orig/drivers/net/sky2.c2006-08-25 16:00:28.0 -0700
+++ sky2/drivers/net/sky2.c 2006-08-25 16:01:33.0 -0700
@@ -1239,13 +1239,6 @@
        /* Check for TCP Segmentation Offload */
        mss = skb_shinfo(skb)->gso_size;
        if (mss != 0) {
-               /* just drop the packet if non-linear expansion fails */
-               if (skb_header_cloned(skb) &&
-                   pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) {
-                       dev_kfree_skb(skb);
-                       goto out_unlock;
-               }
-
                mss += ((skb->h.th->doff - 5) * 4); /* TCP options */
                mss += (skb->nh.iph->ihl * 4) + sizeof(struct tcphdr);
                mss += ETH_HLEN;
@@ -1341,7 +1334,6 @@
 
        sky2_put_idx(hw, txqaddr[sky2->port], sky2->tx_prod);
 
-out_unlock:
        spin_unlock(&sky2->tx_lock);
 
        dev->trans_start = jiffies;

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 9/9] sky2: version 1.7

2006-08-28 Thread shemminger
Change version number for this bundle.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c2006-08-22 14:55:42.0 -0700
+++ sky2/drivers/net/sky2.c 2006-08-22 14:55:46.0 -0700
@@ -50,7 +50,7 @@
 #include "sky2.h"
 
 #define DRV_NAME               "sky2"
-#define DRV_VERSION            "1.6"
+#define DRV_VERSION            "1.7"
 #define PFX                    DRV_NAME " "
 
 /*

--
Stephen Hemminger [EMAIL PROTECTED]




[PATCH 8/9] sky2: pci post bug

2006-08-28 Thread shemminger
Make sure that PCI write occurs before the delay.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- sky2.orig/drivers/net/sky2.c2006-08-28 10:00:17.0 -0700
+++ sky2/drivers/net/sky2.c 2006-08-28 10:00:20.0 -0700
@@ -531,6 +531,7 @@
reg1 |= phy_power[port];
 
sky2_pci_write32(hw, PCI_DEV_REG1, reg1);
+   sky2_pci_read32(hw, PCI_DEV_REG1);
udelay(100);
 }
 
@@ -766,9 +767,10 @@
 /* Update chip's next pointer */
 static inline void sky2_put_idx(struct sky2_hw *hw, unsigned q, u16 idx)
 {
+   q = Y2_QADDR(q, PREF_UNIT_PUT_IDX);
wmb();
-   sky2_write16(hw, Y2_QADDR(q, PREF_UNIT_PUT_IDX), idx);
-   mmiowb();
+   sky2_write16(hw, q, idx);
+   sky2_read16(hw, q);
 }
 
 

--
Stephen Hemminger [EMAIL PROTECTED]
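
Why a read-back helps can be shown with a toy model. The following is only a
simulation under loud assumptions: struct toy_bus, toy_write32() and
toy_read32() are invented for this sketch and model a bridge that may hold
("post") one write until a read from the same device forces it out. Real PCI
hardware is more involved, but the ordering argument is the same: a delay that
starts right after a posted write may begin before the device has seen the
value.

```c
#include <stdint.h>

/* Toy model of a device register behind a write-posting bridge. */
struct toy_bus {
	uint32_t device_reg;	/* value the device actually sees */
	uint32_t posted;	/* write still sitting in the bridge */
	int pending;		/* non-zero while a write is posted */
};

/* A write returns immediately; the device has not seen it yet. */
static void toy_write32(struct toy_bus *bus, uint32_t val)
{
	bus->posted = val;
	bus->pending = 1;
}

/* A read cannot complete before earlier posted writes are delivered. */
static uint32_t toy_read32(struct toy_bus *bus)
{
	if (bus->pending) {
		bus->device_reg = bus->posted;
		bus->pending = 0;
	}
	return bus->device_reg;
}
```

In this model, calling toy_read32() between the write and the delay plays the
role of the added sky2_pci_read32() and sky2_read16() calls.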



[PATCH 4/9] sky2: MSI test timing

2006-08-28 Thread shemminger
The test for MSI IRQ could have timing issues. The PCI write needs to be 
pushed out before waiting, and the wait queue should be initialized before
the IRQ.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c2006-08-25 16:05:10.0 -0700
+++ sky2/drivers/net/sky2.c 2006-08-25 16:05:14.0 -0700
@@ -3189,6 +3189,8 @@
        struct pci_dev *pdev = hw->pdev;
        int err;
 
+       init_waitqueue_head (&hw->msi_wait);
+
        sky2_write32(hw, B0_IMSK, Y2_IS_IRQ_SW);
 
        err = request_irq(pdev->irq, sky2_test_intr, IRQF_SHARED, DRV_NAME, hw);
@@ -3198,10 +3200,8 @@
                return err;
        }
 
-       init_waitqueue_head (&hw->msi_wait);
-
        sky2_write8(hw, B0_CTST, CS_ST_SW_IRQ);
-       wmb();
+       sky2_read8(hw, B0_CTST);
 
        wait_event_timeout(hw->msi_wait, hw->msi_detected, HZ/10);
 

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 6/9] sky2: optimize checksum offload information

2006-08-28 Thread shemminger
Since many packets have the same checksum starting offset and insertion
location, the driver can remember the last values it sent and only tell the
hardware when they change.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c2006-08-28 10:00:08.0 -0700
+++ sky2/drivers/net/sky2.c 2006-08-28 10:00:13.0 -0700
@@ -1280,12 +1280,17 @@
                if (skb->nh.iph->protocol == IPPROTO_UDP)
                        ctrl |= UDPTCP;
 
-               le = get_tx_le(sky2);
-               le->tx.csum.start = cpu_to_le16(hdr);
-               le->tx.csum.offset = cpu_to_le16(offset);
-               le->length = 0; /* initial checksum value */
-               le->ctrl = 1;   /* one packet */
-               le->opcode = OP_TCPLISW | HW_OWNER;
+               if (hdr != sky2->tx_csum_start || offset != sky2->tx_csum_offset) {
+                       sky2->tx_csum_start = hdr;
+                       sky2->tx_csum_offset = offset;
+
+                       le = get_tx_le(sky2);
+                       le->tx.csum.start = cpu_to_le16(hdr);
+                       le->tx.csum.offset = cpu_to_le16(offset);
+                       le->length = 0; /* initial checksum value */
+                       le->ctrl = 1;   /* one packet */
+                       le->opcode = OP_TCPLISW | HW_OWNER;
+               }
        }
 
        le = get_tx_le(sky2);
--- sky2.orig/drivers/net/sky2.h2006-08-28 09:59:36.0 -0700
+++ sky2/drivers/net/sky2.h 2006-08-28 10:00:13.0 -0700
@@ -1843,6 +1843,8 @@
u32  tx_addr64;
u16  tx_pending;
u16  tx_last_mss;
+   u16  tx_csum_start;
+   u16  tx_csum_offset;
 
struct ring_info *rx_ring cacheline_aligned_in_smp;
struct sky2_rx_le*rx_le;

--
Stephen Hemminger [EMAIL PROTECTED]
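
The caching idea in this patch can be sketched outside the driver. The names
below (struct tx_state, tx_set_csum() and the descriptor counter) are
hypothetical and exist only for this illustration; the counter stands in for
list elements handed to the hardware. The sketch also assumes that the cached
values start out matching the hardware state, which a real driver has to
ensure whenever the transmit engine is reset.

```c
#include <stdint.h>

/* Hypothetical per-port transmit state with the last programmed values. */
struct tx_state {
	uint16_t csum_start;
	uint16_t csum_offset;
	unsigned int descriptors;	/* control descriptors emitted so far */
};

/* Emit a checksum descriptor only when start or offset actually changed. */
static void tx_set_csum(struct tx_state *tx, uint16_t start, uint16_t offset)
{
	if (start == tx->csum_start && offset == tx->csum_offset)
		return;			/* hardware already has these values */

	tx->csum_start = start;
	tx->csum_offset = offset;
	tx->descriptors++;		/* stands in for one list element */
}
```

For a stream of packets with identical header layout this emits one descriptor
for the first packet and none afterwards.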



[PATCH 3/9] sky2: dont use force status bit

2006-08-28 Thread shemminger
Don't use force status bit. It was never implemented on all chips, or has
no impact.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c2006-08-25 16:02:27.0 -0700
+++ sky2/drivers/net/sky2.c 2006-08-25 16:05:10.0 -0700
@@ -1192,7 +1192,6 @@
struct sky2_tx_le *le = NULL;
struct tx_ring_info *re;
unsigned i, len;
-   int avail;
dma_addr_t mapping;
u32 addr64;
u16 mss;
@@ -1328,12 +1327,8 @@
        re->idx = sky2->tx_prod;
        le->ctrl |= EOP;
 
-       avail = tx_avail(sky2);
-       if (mss != 0 || avail < TX_MIN_PENDING) {
-               le->ctrl |= FRC_STAT;
-               if (avail <= MAX_SKB_TX_LE)
-                       netif_stop_queue(dev);
-       }
+       if (tx_avail(sky2) <= MAX_SKB_TX_LE)
+               netif_stop_queue(dev);
 
        sky2_put_idx(hw, txqaddr[sky2->port], sky2->tx_prod);
 
--- sky2.orig/drivers/net/sky2.h2006-08-25 16:00:28.0 -0700
+++ sky2/drivers/net/sky2.h 2006-08-25 16:05:10.0 -0700
@@ -1748,7 +1748,6 @@
        INIT_SUM= 1<<3,
        LOCK_SUM= 1<<4,
        INS_VLAN= 1<<5,
-       FRC_STAT= 1<<6,
        EOP     = 1<<7,
 };
 

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH]: Add to MAINTAINERS file

2006-08-28 Thread Jim Lewis

This patch adds Jim Lewis to the MAINTAINERS file for the Spidernet
network driver.

Signed-off-by: James K Lewis [EMAIL PROTECTED]


---
 MAINTAINERS |6 ++
 1 file changed, 6 insertions(+)

Index: linux-2.6.18-rc2/MAINTAINERS
===
--- linux-2.6.18-rc2.orig/MAINTAINERS   2006-08-21 16:59:25.0 -0500
+++ linux-2.6.18-rc2/MAINTAINERS        2006-08-21 17:19:14.0 -0500
@@ -2702,6 +2702,12 @@ M:   [EMAIL PROTECTED]
 L: linux-kernel@vger.kernel.org ?
 S: Supported
 
+SPIDERNET NETWORK DRIVER for CELL
+P: Jim Lewis
+M: [EMAIL PROTECTED]
+L: netdev@vger.kernel.org
+S: Supported
+
 SRM (Alpha) environment access
 P: Jan-Benedict Glaw
 M: [EMAIL PROTECTED]




[PATCH 0/9] sky2 1.7 non-critical bug fixes

2006-08-28 Thread shemminger
These are a set of non-critical fixes to the sky2 driver.
  1. cloned skb tso bug fix
  2. netdev_alloc_skb
  3. don't use force status on transmit
  4. MSI pci posting bug
  5. TSO segment size optimization
  6. checksum offload optimization
  7. power up PHY only on network open
  8. pci post bug before delay
  9. version 1.7



[PATCH 5/9] sky2: TSO mss optimization

2006-08-28 Thread shemminger
The MSS in the transmit engine only has to change if the TSO MTU changes. This
means fewer commands to the chip when mixing TSO and regular data.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
--- sky2.orig/drivers/net/sky2.c2006-08-28 10:00:07.0 -0700
+++ sky2/drivers/net/sky2.c 2006-08-28 10:00:08.0 -0700
@@ -1244,15 +1244,15 @@
                mss += ((skb->h.th->doff - 5) * 4); /* TCP options */
                mss += (skb->nh.iph->ihl * 4) + sizeof(struct tcphdr);
                mss += ETH_HLEN;
-       }
 
-       if (mss != sky2->tx_last_mss) {
-               le = get_tx_le(sky2);
-               le->tx.tso.size = cpu_to_le16(mss);
-               le->tx.tso.rsvd = 0;
-               le->opcode = OP_LRGLEN | HW_OWNER;
-               le->ctrl = 0;
-               sky2->tx_last_mss = mss;
+               if (mss != sky2->tx_last_mss) {
+                       le = get_tx_le(sky2);
+                       le->tx.tso.size = cpu_to_le16(mss);
+                       le->tx.tso.rsvd = 0;
+                       le->opcode = OP_LRGLEN | HW_OWNER;
+                       le->ctrl = 0;
+                       sky2->tx_last_mss = mss;
+               }
        }
 
        ctrl = 0;
@@ -1320,7 +1320,7 @@
                le->opcode = OP_BUFFER | HW_OWNER;
 
                fre = sky2->tx_ring
-                   + RING_NEXT((re - sky2->tx_ring) + i, TX_RING_SIZE);
+                       + RING_NEXT((re - sky2->tx_ring) + i, TX_RING_SIZE);
                pci_unmap_addr_set(fre, mapaddr, mapping);
        }
 

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 7/9] sky2: power down PHY when not up

2006-08-28 Thread shemminger
To save power, don't enable power to the PHY until device is brought up.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c2006-08-28 10:00:13.0 -0700
+++ sky2/drivers/net/sky2.c 2006-08-28 10:00:17.0 -0700
@@ -195,7 +195,6 @@
 static void sky2_set_power_state(struct sky2_hw *hw, pci_power_t state)
 {
u16 power_control;
-   u32 reg1;
int vaux;
 
        pr_debug("sky2_set_power_state %d\n", state);
@@ -228,20 +227,9 @@
else
sky2_write8(hw, B2_Y2_CLK_GATE, 0);
 
-               /* Turn off phy power saving */
-               reg1 = sky2_pci_read32(hw, PCI_DEV_REG1);
-               reg1 &= ~(PCI_Y2_PHY1_POWD | PCI_Y2_PHY2_POWD);
-
-               /* looks like this XL is back asswards .. */
-               if (hw->chip_id == CHIP_ID_YUKON_XL && hw->chip_rev > 1) {
-                       reg1 |= PCI_Y2_PHY1_COMA;
-                       if (hw->ports > 1)
-                               reg1 |= PCI_Y2_PHY2_COMA;
-               }
-               sky2_pci_write32(hw, PCI_DEV_REG1, reg1);
-               udelay(100);
-
                if (hw->chip_id == CHIP_ID_YUKON_EC_U) {
+                       u32 reg1;
+
                        sky2_pci_write32(hw, PCI_DEV_REG3, 0);
                        reg1 = sky2_pci_read32(hw, PCI_DEV_REG4);
                        reg1 &= P_ASPM_CONTROL_MSK;
@@ -253,15 +241,6 @@
 
case PCI_D3hot:
case PCI_D3cold:
-               /* Turn on phy power saving */
-               reg1 = sky2_pci_read32(hw, PCI_DEV_REG1);
-               if (hw->chip_id == CHIP_ID_YUKON_XL && hw->chip_rev > 1)
-                       reg1 &= ~(PCI_Y2_PHY1_POWD | PCI_Y2_PHY2_POWD);
-               else
-                       reg1 |= (PCI_Y2_PHY1_POWD | PCI_Y2_PHY2_POWD);
-               sky2_pci_write32(hw, PCI_DEV_REG1, reg1);
-               udelay(100);
-
                if (hw->chip_id == CHIP_ID_YUKON_XL && hw->chip_rev > 1)
                        sky2_write8(hw, B2_Y2_CLK_GATE, 0);
else
@@ -285,7 +264,7 @@
sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_OFF);
 }
 
-static void sky2_phy_reset(struct sky2_hw *hw, unsigned port)
+static void sky2_gmac_reset(struct sky2_hw *hw, unsigned port)
 {
u16 reg;
 
@@ -533,6 +512,28 @@
gm_phy_write(hw, port, PHY_MARV_INT_MASK, PHY_M_DEF_MSK);
 }
 
+static void sky2_phy_power(struct sky2_hw *hw, unsigned port, int onoff)
+{
+       u32 reg1;
+       static const u32 phy_power[]
+               = { PCI_Y2_PHY1_POWD, PCI_Y2_PHY2_POWD };
+
+       /* looks like this XL is back asswards .. */
+       if (hw->chip_id == CHIP_ID_YUKON_XL && hw->chip_rev > 1)
+               onoff = !onoff;
+
+       reg1 = sky2_pci_read32(hw, PCI_DEV_REG1);
+
+       if (onoff)
+               /* Turn off phy power saving */
+               reg1 &= ~phy_power[port];
+       else
+               reg1 |= phy_power[port];
+
+       sky2_pci_write32(hw, PCI_DEV_REG1, reg1);
+       udelay(100);
+}
+
 /* Force a renegotiation */
 static void sky2_phy_reinit(struct sky2_port *sky2)
 {
@@ -1088,6 +1089,8 @@
if (!sky2->rx_ring)
goto err_out;
 
+   sky2_phy_power(hw, port, 1);
+
sky2_mac_init(hw, port);
 
/* Determine available ram buffer space (in 4K blocks).
@@ -1421,7 +1424,7 @@
/* Stop more packets from being queued */
netif_stop_queue(dev);
 
-   sky2_phy_reset(hw, port);
+   sky2_gmac_reset(hw, port);
 
/* Stop transmitter */
sky2_write32(hw, Q_ADDR(txqaddr[port], Q_CSR), BMU_STOP);
@@ -1469,6 +1472,8 @@
imask &= ~portirq_msk[port];
sky2_write32(hw, B0_IMSK, imask);
 
+   sky2_phy_power(hw, port, 0);
+
/* turn off LED's */
sky2_write16(hw, B0_Y2LED, LED_STAT_OFF);
 
@@ -2403,7 +2408,7 @@
sky2_write32(hw, B0_HWE_IMSK, Y2_HWE_ALL_MASK);
 
for (i = 0; i < hw->ports; i++)
-   sky2_phy_reset(hw, i);
+   sky2_gmac_reset(hw, i);
 
memset(hw->st_le, 0, STATUS_LE_BYTES);
hw->st_idx = 0;

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 2/9] sky2: use netdev_alloc_skb

2006-08-28 Thread shemminger
Use netdev_alloc_skb for buffer allocation to allow for headroom.
This prevents bugs in code paths that assume extra space at the
front and makes sky2 behave like other drivers.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- sky2.orig/drivers/net/sky2.c2006-08-25 16:01:33.0 -0700
+++ sky2/drivers/net/sky2.c 2006-08-25 16:02:27.0 -0700
@@ -954,14 +954,16 @@
 /*
  * It appears the hardware has a bug in the FIFO logic that
  * cause it to hang if the FIFO gets overrun and the receive buffer
- * is not aligned. ALso alloc_skb() won't align properly if slab
- * debugging is enabled.
+ * is not 64 byte aligned. The buffer returned from netdev_alloc_skb is
+ * aligned except if slab debugging is enabled.
  */
-static inline struct sk_buff *sky2_alloc_skb(unsigned int size, gfp_t gfp_mask)
+static inline struct sk_buff *sky2_alloc_skb(struct net_device *dev,
+unsigned int length,
+gfp_t gfp_mask)
 {
struct sk_buff *skb;
 
-   skb = alloc_skb(size + RX_SKB_ALIGN, gfp_mask);
+   skb = __netdev_alloc_skb(dev, length + RX_SKB_ALIGN, gfp_mask);
if (likely(skb)) {
unsigned long p = (unsigned long) skb->data;
skb_reserve(skb, ALIGN(p, RX_SKB_ALIGN) - p);
@@ -997,7 +999,8 @@
for (i = 0; i < sky2->rx_pending; i++) {
struct ring_info *re = sky2->rx_ring + i;
 
-   re->skb = sky2_alloc_skb(sky2->rx_bufsize, GFP_KERNEL);
+   re->skb = sky2_alloc_skb(sky2->netdev, sky2->rx_bufsize,
+GFP_KERNEL);
if (!re->skb)
goto nomem;
 
@@ -1829,15 +1832,16 @@
  * For small packets or errors, just reuse existing skb.
  * For larger packets, get new buffer.
  */
-static struct sk_buff *sky2_receive(struct sky2_port *sky2,
+static struct sk_buff *sky2_receive(struct net_device *dev,
u16 length, u32 status)
 {
+   struct sky2_port *sky2 = netdev_priv(dev);
struct ring_info *re = sky2->rx_ring + sky2->rx_next;
struct sk_buff *skb = NULL;
 
if (unlikely(netif_msg_rx_status(sky2)))
printk(KERN_DEBUG PFX "%s: rx slot %u status 0x%x len %d\n",
-  sky2->netdev->name, sky2->rx_next, status, length);
+  dev->name, sky2->rx_next, status, length);
 
sky2->rx_next = (sky2->rx_next + 1) % sky2->rx_pending;
prefetch(sky2->rx_ring + sky2->rx_next);
@@ -1848,11 +1852,11 @@
if (!(status & GMR_FS_RX_OK))
goto resubmit;
 
-   if (length > sky2->netdev->mtu + ETH_HLEN)
+   if (length > dev->mtu + ETH_HLEN)
goto oversize;
 
if (length < copybreak) {
-   skb = alloc_skb(length + 2, GFP_ATOMIC);
+   skb = netdev_alloc_skb(dev, length + 2);
if (!skb)
goto resubmit;
 
@@ -1867,7 +1871,7 @@
} else {
struct sk_buff *nskb;
 
-   nskb = sky2_alloc_skb(sky2->rx_bufsize, GFP_ATOMIC);
+   nskb = sky2_alloc_skb(dev, sky2->rx_bufsize, GFP_ATOMIC);
if (!nskb)
goto resubmit;
 
@@ -1897,7 +1901,7 @@
 
if (netif_msg_rx_err(sky2) && net_ratelimit())
printk(KERN_INFO PFX "%s: rx error, status 0x%x length %d\n",
-  sky2->netdev->name, status, length);
+  dev->name, status, length);
 
if (status & (GMR_FS_LONG_ERR | GMR_FS_UN_SIZE))
sky2->net_stats.rx_length_errors++;
@@ -1951,11 +1955,10 @@
 
switch (le->opcode & ~HW_OWNER) {
case OP_RXSTAT:
-   skb = sky2_receive(sky2, length, status);
+   skb = sky2_receive(dev, length, status);
if (!skb)
break;
 
-   skb->dev = dev;
skb->protocol = eth_type_trans(skb, dev);
dev->last_rx = jiffies;
 

--
Stephen Hemminger [EMAIL PROTECTED]
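
A note on the alignment arithmetic in sky2_alloc_skb() above: the buffer
is over-allocated by RX_SKB_ALIGN bytes so that the data pointer can be
advanced to the next aligned address. What follows is a minimal
user-space sketch of that rounding, not the driver code itself. The
value 64 for RX_ALIGN is assumed from the comment in the patch, and
align_up() and rx_align_pad() are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment; the patch comment speaks of 64 byte alignment. */
#define RX_ALIGN 64u

/* Round addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* Bytes to skip at the front of a buffer so that the data starts aligned. */
static size_t rx_align_pad(uintptr_t data_addr)
{
	return (size_t)(align_up(data_addr, RX_ALIGN) - data_addr);
}
```

The pad is always smaller than the alignment, which is why reserving
RX_SKB_ALIGN extra bytes at allocation time is enough in the worst case.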




Re: [patch 5/5] d80211: add ioctl to stop data frame tx

2006-08-28 Thread Elliot Schwartz

Hi Johannes,

Johannes Berg [EMAIL PROTECTED] writes:
 On Tue, 2006-08-22 at 10:34 -0700, David Kimdon wrote:
  This ioctl is used when radar is detected on a channel.  Data
  frames must stop but management frames must be allowed to continue
  for some time to communicate the channel switch to stations.

 Which does lead to the question: How are you detecting radar in
 userspace in the first place??

I've been working on merging Devicescape's 802.11h / radar detection
implementation into the open source hostapd (and the wireless-dev kernel). 

Radar is initially detected by the low-level radio driver.  Userspace
gets notified of radar via calls to ieee80211_radar_status, which
generates a fake management frame with a struct ieee80211_radar_info
in it.  Userspace is then responsible for handling the resultant
activities, such as stopping transmission on that channel, selecting
another channel, sending out channel switch announcements, changing
channels, and remembering to block use of the old channel for the
required time.

I'll reply to your and Jiri's other question separately.

Thanks,

elliot




[PATCH 1/1][SCTP]: Fix sctp_primitive_ABORT() call in sctp_close()

2006-08-28 Thread Sridhar Samudrala
Dave,

The recent SCTP CVE fix that went into 2.6.18 changed 
sctp_primitive_ABORT() callers to create an ABORT chunk
and pass it as an arg instead of struct msghdr.
While submitting this fix, I missed the other location in
sctp_close() where this is called.

Please apply this patch to 2.6.18 and it should also go
into the stable series.

Thanks
Sridhar

[SCTP]: Fix sctp_primitive_ABORT() call in sctp_close().

With the recent fix, the callers of sctp_primitive_ABORT()
need to create an ABORT chunk and pass it as an argument rather
than msghdr that was passed earlier.

Signed-off-by: Sridhar Samudrala [EMAIL PROTECTED]

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index fde3f55..dab1594 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1289,9 +1289,13 @@ SCTP_STATIC void sctp_close(struct sock 
}
}
 
-   if (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime)
-   sctp_primitive_ABORT(asoc, NULL);
-   else
+   if (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime) {
+   struct sctp_chunk *chunk;
+
+   chunk = sctp_make_abort_user(asoc, NULL, 0);
+   if (chunk)
+   sctp_primitive_ABORT(asoc, chunk);
+   } else
sctp_primitive_SHUTDOWN(asoc, NULL);
}
 



Re: [PATCH 5/10] rt2x00: Register initialization fixes

2006-08-28 Thread Ivo van Doorn
On Monday 28 August 2006 18:08, John W. Linville wrote:
 On Sun, Aug 27, 2006 at 05:39:14PM +0200, Ivo van Doorn wrote:
  Various register initialization fixes to make the device work properly.
  This will fix the RX/TX issue for rt61pci.
  
  Signed-off-by Ivo van Doorn [EMAIL PROTECTED]
  
  ---
  
  diff -rU3 
  wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2400pci.c
   wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2400pci.c
  --- 
  wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2400pci.c
  2006-08-27 16:11:40.0 +0200
  +++ 
  wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2400pci.c 
  2006-08-27 16:17:02.0 +0200
  @@ -1192,11 +1192,7 @@
  rt2x00_register_write(rt2x00dev, RXCSR0, reg);
   
  rt2x00_register_write(rt2x00dev, MACCSR0, cpu_to_le32(0x00217223));
  -
  -   rt2x00_register_read(rt2x00dev, MACCSR1, reg);
  -   rt2x00_set_field32(reg, MACCSR1_AUTO_TXBBP, 1);
  -   rt2x00_set_field32(reg, MACCSR1_AUTO_RXBBP, 1);
  -   rt2x00_register_write(rt2x00dev, MACCSR1, reg);
  +   rt2x00_register_write(rt2x00dev, MACCSR1, cpu_to_le32(0x00235518));
   
  rt2x00_register_read(rt2x00dev, MACCSR2, reg);
  rt2x00_set_field32(reg, MACCSR2_DELAY, 64);
  diff -rU3 
  wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2500pci.c
   wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2500pci.c
  --- 
  wireless-dev-rt2x00-interface/drivers/net/wireless/d80211/rt2x00/rt2500pci.c
  2006-08-27 16:12:03.0 +0200
  +++ 
  wireless-dev-rt2x00-register/drivers/net/wireless/d80211/rt2x00/rt2500pci.c 
  2006-08-27 16:17:56.0 +0200
  @@ -1249,6 +1249,7 @@
  return -EBUSY;
   
  rt2x00_register_write(rt2x00dev, PWRCSR0, cpu_to_le32(0x3f3b3100));
  +   rt2x00_register_write(rt2x00dev, PCICSR, cpu_to_le32(0x03b8));
   
  rt2x00_register_write(rt2x00dev, PSCSR0, cpu_to_le32(0x00020002));
  rt2x00_register_write(rt2x00dev, PSCSR1, cpu_to_le32(0x0002));
  @@ -1272,12 +1273,11 @@
  rt2x00_set_field32(reg, RXCSR0_DISABLE_RX, 0);
  rt2x00_register_write(rt2x00dev, RXCSR0, reg);
   
  -   rt2x00_register_write(rt2x00dev, MACCSR0, cpu_to_le32(0x00213223));
  +   rt2x00_register_write(rt2x00dev, GPIOCSR, cpu_to_le32(0xff00));
  +   rt2x00_register_write(rt2x00dev, TESTCSR, cpu_to_le32(0x00f0));
   
  -   rt2x00_register_read(rt2x00dev, MACCSR1, reg);
  -   rt2x00_set_field32(reg, MACCSR1_AUTO_TXBBP, 1);
  -   rt2x00_set_field32(reg, MACCSR1_AUTO_RXBBP, 1);
  -   rt2x00_register_write(rt2x00dev, MACCSR1, reg);
  +   rt2x00_register_write(rt2x00dev, MACCSR0, cpu_to_le32(0x00213223));
  +   rt2x00_register_write(rt2x00dev, MACCSR1, cpu_to_le32(0x00235518));
   
  rt2x00_register_read(rt2x00dev, MACCSR2, reg);
  rt2x00_set_field32(reg, MACCSR2_DELAY, 64);
 
 Lots of magic numbers here...can we do something about that?

Only partially. There are currently a couple of problems:
 - Some of the above registers are documented, but the main problem with
   the current documentation from Ralink is that not all register maps
   are correct. The driver indicates a different function for a register
   than our documentation claims.
 - Some registers are not defined at all: the documentation gives no
   details and I cannot work out the meaning of the value from the
   original Ralink code. This is a general problem with the original
   code, since it contains a great deal of magic values. Most of them
   have been analysed and their meaning or origin determined, but some
   magic values are still left.
 - In some other situations it is indeed possible to use the bitmaps as
   defined in the header, but those would not always explain the meaning
   of the value any more clearly (GPIOCSR is, for example, just a list of
   BIT0, BIT1, BIT2, etc.).
 - Using the defines from the headers together with rt2x00_set_field32
   would result in quite a lot of cpu_to_le32 calls, which would be a
   waste on big endian machines when the particular register is only
   used once.

I will go over the values again and see which ones can be made clearer
with comments, and whether rt2x00_set/get_field32 can be used instead to
make things clearer.

Ivo
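
To illustrate the trade-off discussed above, here is a minimal sketch of
composing a register value from named fields instead of writing a magic
number. It is not the rt2x00 API: struct reg_field, set_field32(),
get_field32() and the EXAMPLE_* field layout are all hypothetical and
were chosen only for this example.

```c
#include <stdint.h>

/* Hypothetical field descriptor: bit mask plus the shift of its lowest bit. */
struct reg_field {
	uint32_t mask;
	unsigned int shift;
};

/* Return reg with the given field replaced by value (CPU byte order). */
static uint32_t set_field32(uint32_t reg, struct reg_field f, uint32_t value)
{
	return (reg & ~f.mask) | ((value << f.shift) & f.mask);
}

/* Extract the given field from reg. */
static uint32_t get_field32(uint32_t reg, struct reg_field f)
{
	return (reg & f.mask) >> f.shift;
}

/* Hypothetical layout of an example register, invented for this sketch. */
static const struct reg_field EXAMPLE_ENABLE = { 0x00000001u, 0 };
static const struct reg_field EXAMPLE_DELAY  = { 0x0000ff00u, 8 };

/* Build the value from named fields; equivalent to the literal 0x00004001. */
static uint32_t example_reg_value(void)
{
	uint32_t reg = 0;

	reg = set_field32(reg, EXAMPLE_ENABLE, 1);
	reg = set_field32(reg, EXAMPLE_DELAY, 64);
	return reg;
}
```

Composing the whole value in CPU byte order and converting it once just
before the write would keep the number of byte-swap calls at one per
register, which is one possible answer to the cpu_to_le32 concern.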


[RFC][PATCHv2 2.6.18-rc4-mm3 3/3] net/ipv4: misc. support files

2006-08-28 Thread gerrit
[Net/IPv4]: REVISED Miscellaneous changes which complete the 
v4 support for UDP-Lite.


Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
---

 include/linux/in.h |1 +
 include/linux/socket.h |1 +
 include/net/snmp.h |2 ++
 include/net/xfrm.h |2 ++
 net/ipv4/af_inet.c |   15 ++-
 net/ipv4/proc.c|   16 ++--
 net/ipv6/udp.c |1 +
 7 files changed, 35 insertions(+), 3 deletions(-)


diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 877e5b3..43faef2 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1223,10 +1223,14 @@ static int __init init_ipv4_mibs(void)
tcp_statistics[1] = alloc_percpu(struct tcp_mib);
udp_statistics[0] = alloc_percpu(struct udp_mib);
udp_statistics[1] = alloc_percpu(struct udp_mib);
+   udplite_statistics[0] = alloc_percpu(struct udp_mib);
+   udplite_statistics[1] = alloc_percpu(struct udp_mib);
+
if (!
(net_statistics[0] && net_statistics[1] && ip_statistics[0]
  && ip_statistics[1] && tcp_statistics[0] && tcp_statistics[1]
- && udp_statistics[0] && udp_statistics[1]))
+ && udp_statistics[0] && udp_statistics[1]
+ && udplite_statistics[0] && udplite_statistics[1] ) )
return -ENOMEM;
 
(void) tcp_mib_init();
@@ -1300,6 +1304,11 @@ #endif
inet_register_protosw(q);
 
/*
+*  Add UDP-Lite (RFC 3828)
+*/
+   udplite4_register();
+
+   /*
 *  Set the ARP module up
 */
 
@@ -1367,6 +1376,8 @@ static int __init ipv4_proc_init(void)
goto out_tcp;
if (udp4_proc_init())
goto out_udp;
+   if (udplite4_proc_init())
+   goto out_udplite;
if (fib_proc_init())
goto out_fib;
if (ip_misc_proc_init())
@@ -1376,6 +1387,8 @@ out:
 out_misc:
fib_proc_exit();
 out_fib:
+   udplite4_proc_exit();
+out_udplite:
udp4_proc_exit();
 out_udp:
tcp4_proc_exit();
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 9c6cbe3..608fe34 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -66,9 +66,10 @@ static int sockstat_seq_show(struct seq_
   tcp_death_row.tw_count, atomic_read(&tcp_sockets_allocated),
   atomic_read(&tcp_memory_allocated));
seq_printf(seq, "UDP: inuse %d\n", fold_prot_inuse(&udp_prot));
+   seq_printf(seq, "UDPLITE: inuse %d\n", fold_prot_inuse(&udplite_prot));
seq_printf(seq, "RAW: inuse %d\n", fold_prot_inuse(&raw_prot));
-   seq_printf(seq,  "FRAG: inuse %d memory %d\n", ip_frag_nqueues,
-  atomic_read(&ip_frag_mem));
+   seq_printf(seq, "FRAG: inuse %d memory %d\n", ip_frag_nqueues,
+atomic_read(&ip_frag_mem));
return 0;
 }
 
@@ -304,6 +305,17 @@ static int snmp_seq_show(struct seq_file
   fold_field((void **) udp_statistics, 
  snmp4_udp_list[i].entry));
 
+   /* the UDP and UDP-Lite MIBs are the same */
+   seq_puts(seq, "\nUdpLite:");
+   for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+   seq_printf(seq, " %s", snmp4_udp_list[i].name);
+
+   seq_puts(seq, "\nUdpLite:");
+   for (i = 0; snmp4_udp_list[i].name != NULL; i++)
+   seq_printf(seq, " %lu",
+  fold_field((void **) udplite_statistics,
+ snmp4_udp_list[i].entry) );
+
seq_putc(seq, '\n');
return 0;
 }
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index b9cc55c..b72540b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1073,6 +1073,7 @@ static struct udp_seq_afinfo udp6_seq_af
.owner  = THIS_MODULE,
.name   = "udp6",
.family = AF_INET6,
+   .hashtable  = udp_hash,
.seq_show   = udp6_seq_show,
.seq_fops   = &udp6_seq_fops,
 };
diff --git a/include/linux/in.h b/include/linux/in.h
index 94f557f..5ada82e 100644
--- a/include/linux/in.h
+++ b/include/linux/in.h
@@ -44,6 +44,7 @@ enum {
 
   IPPROTO_COMP   = 108,/* Compression Header protocol */
   IPPROTO_SCTP   = 132,/* Stream Control Transport Protocol
*/
+  IPPROTO_UDPLITE = 136,   /* UDP-Lite (RFC 3828)  */
 
   IPPROTO_RAW   = 255, /* Raw IP packets   */
   IPPROTO_MAX
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 3614090..592b666 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -264,6 +264,7 @@ #define SOL_UDP 17
 #define SOL_IPV6   41
 #define SOL_ICMPV6 58
 #define SOL_SCTP   132
+#define SOL_UDPLITE 136 /* UDP-Lite (RFC 3828) */
 #define SOL_RAW    255
 #define SOL_IPX    256
 #define SOL_AX25   257
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 

Re: [RFC][PATCHv2 2.6.18-rc4-mm3 3/3] net/ipv4: misc. support files

2006-08-28 Thread Patrick McHardy
[EMAIL PROTECTED] wrote:
 [Net/IPv4]: REVISED Miscellaneous changes which complete the 
 v4 support for UDP-Lite.
 

 --- a/include/net/xfrm.h
 +++ b/include/net/xfrm.h
 @@ -467,6 +467,7 @@ u16 xfrm_flowi_sport(struct flowi *fl)
   switch(fl->proto) {
   case IPPROTO_TCP:
   case IPPROTO_UDP:
 + case IPPROTO_UDPLITE:
   case IPPROTO_SCTP:
   port = fl->fl_ip_sport;
   break;
 @@ -492,6 +493,7 @@ u16 xfrm_flowi_dport(struct flowi *fl)
   switch(fl->proto) {
   case IPPROTO_TCP:
   case IPPROTO_UDP:
 + case IPPROTO_UDPLITE:
   case IPPROTO_SCTP:
   port = fl->fl_ip_dport;
   break;

You also need to adapt _decode_session[46] in xfrm[46]_policy.c for
IPsec. While you're at it you might consider adjusting xt_tcpudp,
xt_multiport, ipt_LOG and ip6t_LOG as well to get some basic
netfilter support. I'm going to take care of connection tracking
and NAT once this is in mainline.



Re: 2.6.18-rc4-mm[1,2,3] -- Network card not getting assigned an eth device name

2006-08-28 Thread Andrew Morton
On Mon, 28 Aug 2006 08:52:02 -0700
Miles Lane [EMAIL PROTECTED] wrote:

 On 8/27/06, David Miller [EMAIL PROTECTED] wrote:
  From: Andrew Morton [EMAIL PROTECTED]
  Date: Sun, 27 Aug 2006 00:19:43 -0700
 
   Jeremy reported that a while back too.  I do not know what is causing it
   and as far as I know no net developers have yet looked into it.
 
  A debugging patch like this one should help figure out the culprit.
 
  If we don't see the gibberish netdevice name printed in the kernel
  logs, then likely something is corrupting the netdevice structure or
  the memory holding the name.
 
  diff --git a/net/core/dev.c b/net/core/dev.c
  index d4a1ec3..45f9b19 100644
  --- a/net/core/dev.c
  +++ b/net/core/dev.c
  @@ -738,6 +738,11 @@ int dev_change_name(struct net_device *d
 
  if (!dev_valid_name(newname))
  return -EINVAL;
  +#if 1
  +   printk("[%s:%d]: Changing netdevice name from [%s] to [%s]\n",
  +  current->comm, current->pid,
  +  dev->name, newname);
  +#endif
 
  if (strchr(newname, '%')) {
  err = dev_alloc_name(dev, newname);
 
 
 Dan, do you have any idea why NetworkManager from Ubuntu 6.06.1
 would be corrupting network device names on recent MM kernels?
 I haven't seen this happening with Ubuntu's kernels.  If you like, I can
 send you my kernel .config file.
 
 Here's what I get:
 

grepping for `ioctl' gives:

ioctl(9, SIOCGIWNAME, 0xbfe38d8c) = -1 EINVAL (Invalid argument)
ioctl(9, SIOCETHTOOL, 0xbfe38d2c) = 0
ioctl(11, SIOCGIFHWADDR, {ifr_name="eth0", ???})  = -1 ENODEV (No such device)
ioctl(11, SIOCGIFFLAGS, {ifr_name="eth0", ???})   = -1 ENODEV (No such device)

Perhaps you could generate the strace output for 2.6.18-rc5, grep that for
ioctl, look for differences?  That initial SIOCGIWNAME failure is fishy.


Re: [PATCH] [IPV6] ROUTE: Fix dst refcounting.

2006-08-28 Thread David Miller
From: YOSHIFUJI Hideaki [EMAIL PROTECTED]
Date: Tue, 29 Aug 2006 01:46:49 +0900 (JST)

 [IPV6] ROUTE: Fix dst reference counting in ip6_pol_route_lookup().
 
 In ip6_pol_route_lookup(), when we finish backtracking at the
 top-level root entry, we need to hold it.
 
 Bug noticed by Mitsuru Chinen [EMAIL PROTECTED].
 
 Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED]

Applied, thank you.


[PATCH 0/7] d80211: support more wireless command

2006-08-28 Thread mabbas
The following patches are based on the earlier patches. I separated each
set of commands into its own patch, with enhancements based on your comments.



Re: [PATCH] net/*: use SLAB_PANIC

2006-08-28 Thread David Miller
From: Christoph Hellwig [EMAIL PROTECTED]
Date: Mon, 28 Aug 2006 10:36:51 +0100

 ipv6 can be modular, so panicking on an initialization failure is wrong.

That may be the case, but he merely translated the code
as it existed. He didn't change it to start panic()'ing;
it already did.

It would be a separate change to undo the panic() in
the ipv6 code.


[PATCH 3/7] d80211: report supported rates and channels in SIOCGIWRANGE

2006-08-28 Thread mabbas



This patch modifies d80211 to report more information, such as the
supported rates and channels, in the SIOCGIWRANGE command.
 
Signed-off-by: Mohamed Abbas [EMAIL PROTECTED]

diff --git a/net/d80211/ieee80211_ioctl.c b/net/d80211/ieee80211_ioctl.c
index 89a58e3..3d8156c 100644
--- a/net/d80211/ieee80211_ioctl.c
+++ b/net/d80211/ieee80211_ioctl.c
@@ -1566,6 +1566,10 @@ static int ieee80211_ioctl_giwrange(stru
  struct iw_point *data, char *extra)
 {
 	struct iw_range *range = (struct iw_range *) extra;
+	int i,j,c,n;
+	int skip = 0;
+	struct ieee80211_local *local = dev->ieee80211_ptr;
+	struct ieee80211_hw_modes *bg = NULL;
 
 	data->length = sizeof(struct iw_range);
 	memset(range, 0, sizeof(struct iw_range));
@@ -1581,6 +1585,55 @@ static int ieee80211_ioctl_giwrange(stru
 	range->min_frag = 256;
 	range->max_frag = 2346;
 
+	j = 0;
+	for (i = 0; i < local->num_curr_rates && j < IW_MAX_BITRATES; i++) {
+		struct ieee80211_rate *rate = &local->curr_rates[i];
+
+		if (rate->flags & IEEE80211_RATE_SUPPORTED) {
+			range->bitrate[j] = rate->rate * 10;
+			j++;
+		}
+	}
+	range->num_bitrates = j;
+
+	c = 0;
+	for (i = 0; i < local->hw->num_modes; i++) {
+		struct ieee80211_hw_modes *mode = &local->hw->modes[i];
+
+		for (j = 0;
+		     j < mode->num_channels && c < IW_MAX_FREQUENCIES; j++) {
+			struct ieee80211_channel *chan = &mode->channels[j];
+
+			/* skip any repeated bg channel */
+			skip = 0;
+			if (bg &&
+			    ((mode->mode == MODE_IEEE80211G) ||
+			     (mode->mode == MODE_IEEE80211B))) {
+
+				for (n = 0; n < bg->num_channels; n++) {
+					if (bg->channels[n].chan == chan->chan) {
+						skip = 1;
+						break;
+					}
+				}
+			}
+
+			if (skip)
+				continue;
+
+			range->freq[c].i = chan->chan;
+			range->freq[c].m = chan->freq * 10;
+			range->freq[c].e = 1;
+			c++;
+		}
+		if (!bg && ((mode->mode == MODE_IEEE80211G) ||
+			    (mode->mode == MODE_IEEE80211B)))
+			bg = mode;
+
+	}
+	range->num_channels = c;
+	range->num_frequency = c;
+
 	return 0;
 }
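
For readers unfamiliar with the freq[].m / freq[].e pair filled in above:
Wireless Extensions encode a frequency as a mantissa m and a decimal
exponent e, with the frequency in Hz being m * 10^e. The sketch below
shows that encoding for a channel frequency given in MHz. It is an
illustration only; struct freq_me, mhz_to_me() and me_to_hz() are
hypothetical names, and the fixed exponent of 1 is an assumption chosen
to mirror the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the m/e members of struct iw_freq. */
struct freq_me {
	int32_t m;	/* mantissa */
	int16_t e;	/* decimal exponent */
};

/* Encode a frequency in MHz as m * 10^e Hz with e fixed at 1. */
static struct freq_me mhz_to_me(int32_t mhz)
{
	struct freq_me f = { mhz * 100000, 1 };

	return f;
}

/* Decode back to Hz; 64-bit because 5 GHz does not fit in 32 bits. */
static int64_t me_to_hz(struct freq_me f)
{
	int64_t hz = f.m;
	int i;

	for (i = 0; i < f.e; i++)
		hz *= 10;
	return hz;
}
```

With this encoding, 2412 MHz is stored as m = 241200000, e = 1.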
 


RE: e1000 and 802.1ad/stacked vlan tagging

2006-08-28 Thread Brandeburg, Jesse
Stephan von Krawczynski wrote:
 thank you for answering anyway. Though I think your answer covers
 only the obvious half of the problem.
 Indeed one might think that this solves the issue - as long as there
 are only linux kernels involved. Unfortunately my setup is a bit more
 complicated in terms of hardware. So I should have probably clarified
 the question this way: how do you configure the interface in a manner
 that packets with data length of 1500 get transferred, and not only
 1496 ? 

I tried setting mtu 2000 and everything worked with the virtual device
(both) mtu at 1500.
If you are getting long_packet errors at the mtu settings you tried (you
didn't mention which ones) then the hardware is dropping the packets due
to being over 1522 bytes in length (including CRC).

 I tried enlarging both real-device and first vlan interface mtu but
 that does not work out. I really thought that the visible device
 setting of mtu=1500 should have worked out and that the driver (or
 some code in between) should have corrected the allowed frame size to
 reflect the actual setup, not? 

Unfortunately I believe that your hardware MTU on the base interface
MUST be adjusted in order to do stacked vlans because the vlan code
doesn't fragment packets, it just inserts tags.  The e1000 hardware is
capable of inserting/stripping 1 level of tags without dropping overlong
frames, but cannot seamlessly handle 1+n levels of inserted tags.
On transmit, we don't care how long the frames are that are given to us
(the driver doesn't enforce MTU on transmit), but on receive we have
limited space, so it is important that each frame fit into the allocated
buffer (including CRC).

Please try MTU 1508 on eth0 (base interface only), as that configuration
worked for me.

Jesse
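
The numbers in this thread can be checked with a little arithmetic. The
sketch below computes the on-wire frame length for a given payload and
number of VLAN tags, using the standard Ethernet sizes (14 byte header,
4 byte tag, 4 byte CRC). The function names are hypothetical, and
whether each tag counts against the base interface MTU depends on the
driver and on hardware tag acceleration, so base_mtu_for() is only an
assumption that matches the 1508 figure suggested above.

```c
/* Standard Ethernet sizes in bytes. */
#define ETH_HDR_LEN	14u	/* destination, source, ethertype */
#define ETH_FCS_LEN	4u	/* trailing CRC */
#define VLAN_TAG_LEN	4u	/* one 802.1Q/802.1ad tag */

/* Length of the frame on the wire, including CRC, for ntags stacked tags. */
static unsigned int wire_frame_len(unsigned int payload, unsigned int ntags)
{
	return ETH_HDR_LEN + ntags * VLAN_TAG_LEN + payload + ETH_FCS_LEN;
}

/* Base MTU needed if every one of ntags tags is counted as payload. */
static unsigned int base_mtu_for(unsigned int payload, unsigned int ntags)
{
	return payload + ntags * VLAN_TAG_LEN;
}
```

A 1500 byte payload with one tag gives 1522 bytes, the limit mentioned
above. With two tags the same 1522 bytes only leave room for a 1496 byte
payload, and a full 1500 byte payload needs 1526 bytes on the wire.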


[PATCH 6/7] d80211: diplay supported rates in readable format

2006-08-28 Thread mabbas


This patch modifies d80211 to report supported rates in a readable
format in the iwlist scan command.

Signed-off-by: Mohamed Abbas [EMAIL PROTECTED]

diff --git a/net/d80211/ieee80211_sta.c b/net/d80211/ieee80211_sta.c
index a933d92..b2e45a4 100644
--- a/net/d80211/ieee80211_sta.c
+++ b/net/d80211/ieee80211_sta.c
@@ -2714,15 +2714,21 @@ ieee80211_sta_scan_result(struct net_dev
 		current_ev = iwe_stream_add_point(current_ev, end_buf, &iwe,
 						  buf);
 
-		p = buf;
-		p += sprintf(p, "supp_rates=");
-		for (i = 0; i < bss->supp_rates_len; i++)
-			p+= sprintf(p, "%02x", bss->supp_rates[i]);
-		memset(&iwe, 0, sizeof(iwe));
-		iwe.cmd = IWEVCUSTOM;
-		iwe.u.data.length = strlen(buf);
-		current_ev = iwe_stream_add_point(current_ev, end_buf, &iwe,
-						  buf);
+		/* display all supported rates in readable format */
+		p = current_ev + IW_EV_LCP_LEN;
+		iwe.cmd = SIOCGIWRATE;
+		/* Those two flags are ignored... */
+		iwe.u.bitrate.fixed = iwe.u.bitrate.disabled = 0;
+
+		for (i = 0; i < bss->supp_rates_len; i++) {
+			iwe.u.bitrate.value = ((bss->supp_rates[i] &
+						0x7f) * 50);
+			p = iwe_stream_add_value(current_ev, p,
+					end_buf, &iwe, IW_EV_PARAM_LEN);
+		}
+		/* Check if we added any rate */
+		if((p - current_ev) > IW_EV_LCP_LEN)
+			current_ev = p;
 
 		kfree(buf);
 		break;
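
As background for the masking with 0x7f above: each octet of the 802.11
Supported Rates element carries the rate in units of 500 kb/s in its low
seven bits, and the top bit marks a basic rate. The following is a
minimal sketch of that decoding, not code from the patch;
supp_rate_to_bps() and supp_rate_is_basic() are hypothetical names.

```c
#include <stdint.h>

/* Rate in bit/s: low 7 bits of the octet, in units of 500 kb/s. */
static uint32_t supp_rate_to_bps(uint8_t octet)
{
	return (uint32_t)(octet & 0x7f) * 500000u;
}

/* Non-zero if the top bit marks this rate as a basic rate. */
static int supp_rate_is_basic(uint8_t octet)
{
	return (octet & 0x80) != 0;
}
```

For example, the octet 0x82 decodes to 1 Mb/s flagged as basic, and 0x6c
decodes to 54 Mb/s.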


[PATCH 5/7] d80211: indicate if unassociate/radio off status

2006-08-28 Thread mabbas



This patch indicates the unassociated and radio-off status
in the name field.

Signed-off-by: Mohamed Abbas [EMAIL PROTECTED]

diff --git a/net/d80211/ieee80211_ioctl.c b/net/d80211/ieee80211_ioctl.c
index 89a58e3..44b2698 100644
--- a/net/d80211/ieee80211_ioctl.c
+++ b/net/d80211/ieee80211_ioctl.c
@@ -1538,6 +1538,19 @@ static int ieee80211_ioctl_giwname(struc
    char *name, char *extra)
 {
 	struct ieee80211_local *local = dev->ieee80211_ptr;
+	struct ieee80211_sub_if_data *sdata;
+
+	sdata = IEEE80211_DEV_TO_SUB_IF(dev);
+	if (!local->conf.radio_enabled) {
+		strcpy(name, "radio off");
+		return 0;
+	} else if (sdata->type == IEEE80211_IF_TYPE_STA) {
+		if ((sdata->u.sta.state != IEEE80211_ASSOCIATED) ||
+		    (sdata->u.sta.probereq_poll)) {
+			strcpy(name, "unassociated");
+			return 0;
+		}
+	}
 
 	switch (local->conf.phymode) {
 	case MODE_IEEE80211A:


Re: [PATCH 1/1][SCTP]: Fix sctp_primitive_ABORT() call in sctp_close()

2006-08-28 Thread David Miller
From: Sridhar Samudrala [EMAIL PROTECTED]
Date: Mon, 28 Aug 2006 12:11:36 -0700

 The recent SCTP CVE fix that went into 2.6.18 changed 
 sctp_primitive_ABORT() callers to create an ABORT chunk
 and pass it as an arg instead of struct msghdr.
 While submitting this fix, I missed the other location in
 sctp_close() where this is called.
 
 Please apply this patch to 2.6.18 and it should also go
 into the stable series.

This shows why embargoes are detrimental to Linux kernel development.

There might have been a chance of this being caught earlier if the
original bug fix had been posted for review here on netdev.  Now we
have the situation where a -stable kernel release went out with the
bogus version of the fix because it got _ZERO_ public review, and it
is likely that several distributions have therefore shipped kernels
with this problem too.

This is unacceptable.

Please everyone, learn from this, and understand that I will always
ignore embargoed bug reports sent to me.  Discuss networking bugs,
no matter how severe, here on netdev from the beginning so we can
fix things correctly and not via some private group of individuals.

Thanks.


[PATCH 4/7] d80211: add support for SIOCSIWNICKN SIOCGIWNICKN

2006-08-28 Thread mabbas



This patch modifies d80211 to add the nick wireless commands.

Signed-off-by: Mohamed Abbas [EMAIL PROTECTED]

diff --git a/net/d80211/ieee80211_i.h b/net/d80211/ieee80211_i.h
index 0d2d79d..02242c6 100644
--- a/net/d80211/ieee80211_i.h
+++ b/net/d80211/ieee80211_i.h
@@ -241,6 +241,7 @@ struct ieee80211_if_sta {
 		IEEE80211_IBSS_SEARCH, IEEE80211_IBSS_JOINED
 	} state;
 	struct timer_list timer;
+	u8 nick[IW_ESSID_MAX_SIZE];
 	u8 bssid[ETH_ALEN], prev_bssid[ETH_ALEN];
 	u8 ssid[IEEE80211_MAX_SSID_LEN];
 	size_t ssid_len;
diff --git a/net/d80211/ieee80211_ioctl.c b/net/d80211/ieee80211_ioctl.c
index 89a58e3..956eabb 100644
--- a/net/d80211/ieee80211_ioctl.c
+++ b/net/d80211/ieee80211_ioctl.c
@@ -2153,6 +2153,39 @@ static void ieee80211_ioctl_unmask_chann
 }
 
 
+static int ieee80211_ioctl_siwnick(struct net_device *dev,
+				   struct iw_request_info *info,
+				   union iwreq_data *wrqu, char *extra)
+{
+	struct ieee80211_sub_if_data *sdata;
+	struct ieee80211_if_sta *ifsta;
+
+	sdata = IEEE80211_DEV_TO_SUB_IF(dev);
+	ifsta = &sdata->u.sta;
+	if (wrqu->data.length >= IW_ESSID_MAX_SIZE)
+		return -E2BIG;
+
+	memset(ifsta->nick, 0, sizeof(ifsta->nick));
+	memcpy(ifsta->nick, extra, wrqu->data.length);
+	return 0;
+}
+
+static int ieee80211_ioctl_giwnick(struct net_device *dev,
+				   struct iw_request_info *info,
+				   union iwreq_data *wrqu, char *extra)
+{
+	struct ieee80211_sub_if_data *sdata;
+	struct ieee80211_if_sta *ifsta;
+
+	sdata = IEEE80211_DEV_TO_SUB_IF(dev);
+	ifsta = &sdata->u.sta;
+
+	wrqu->data.length = strlen(ifsta->nick) + 1;
+	memcpy(extra, ifsta->nick, wrqu->data.length);
+	wrqu->data.flags = 1;   /* active */
+	return 0;
+}
+
 static int ieee80211_ioctl_test_mode(struct net_device *dev, int mode)
 {
 	struct ieee80211_local *local = dev->ieee80211_ptr;
@@ -3138,8 +3171,8 @@ static const iw_handler ieee80211_handle
 	(iw_handler) ieee80211_ioctl_giwscan,		/* SIOCGIWSCAN */
 	(iw_handler) ieee80211_ioctl_siwessid,		/* SIOCSIWESSID */
 	(iw_handler) ieee80211_ioctl_giwessid,		/* SIOCGIWESSID */
-	(iw_handler) NULL,/* SIOCSIWNICKN */
-	(iw_handler) NULL,/* SIOCGIWNICKN */
+	(iw_handler) ieee80211_ioctl_siwnick,		/* SIOCSIWNICKN */
+	(iw_handler) ieee80211_ioctl_giwnick,		/* SIOCGIWNICKN */
 	(iw_handler) NULL,/* -- hole -- */
 	(iw_handler) NULL,/* -- hole -- */
 	(iw_handler) NULL,/* SIOCSIWRATE */


[PATCH 7/7] d80211: getting wrong freq value if we did hardware scan

2006-08-28 Thread mabbas
In this patch we search all available A-band channels to get the right
frequency value. This might not be the right thing to do in beacon
parsing. Another approach is to keep a static array indexed by channel
number, up to the maximum A-band channel, so that we can map from channel
to frequency quickly. The values of this array can be set at run time.

This patch modifies d80211 to fix getting the wrong frequency value
when the scan is implemented in hardware. With a hardware scan we might
receive a beacon from a network that is on a different channel than the
one in local->conf.channel, which sets freq to the wrong value.
 
Signed-off-by: Mohamed Abbas [EMAIL PROTECTED]

diff --git a/net/d80211/ieee80211_sta.c b/net/d80211/ieee80211_sta.c
index a933d92..374193e 100644
--- a/net/d80211/ieee80211_sta.c
+++ b/net/d80211/ieee80211_sta.c
@@ -1543,8 +1543,6 @@ #endif
 	bss->channel = channel;
 	bss->freq = local->conf.freq;
 	if (channel != local->conf.channel &&
-	    (local->conf.phymode == MODE_IEEE80211G ||
-	     local->conf.phymode == MODE_IEEE80211B) &&
 	    channel >= 1 && channel <= 14) {
 		static const int freq_list[] = {
 			2412, 2417, 2422, 2427, 2432, 2437, 2442,
@@ -1553,6 +1551,32 @@ #endif
 		/* IEEE 802.11g/b mode can receive packets from neighboring
 		 * channels, so map the channel into frequency. */
 		bss->freq = freq_list[channel - 1];
+
+		if (bss->hw_mode != MODE_IEEE80211G &&
+		    bss->hw_mode != MODE_IEEE80211B)
+			bss->hw_mode = MODE_IEEE80211G;
+
+	} else if (channel != local->conf.channel) {
+		int j, i;
+		int b_found = 0;
+
+		/* not a bg channel search in other mode */
+		for (i = 0; i < local->hw->num_modes; i++) {
+			struct ieee80211_hw_modes *mode = &local->hw->modes[i];
+
+			if ((mode->mode != MODE_IEEE80211G) &&
+			    (mode->mode != MODE_IEEE80211B)) {
+				for (j = 0; j < mode->num_channels; j++)
+					if (mode->channels[j].chan == channel) {
+						bss->freq = mode->channels[j].freq;
+						b_found = 1;
+						bss->hw_mode = mode->mode;
+						break;
+					}
+			}
+			if (b_found)
+				break;
+		}
 	}
 	bss->timestamp = timestamp;
 	bss->last_update = jiffies;
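
The static-array alternative mentioned in the description could look roughly
like the sketch below. It is not part of the patch: A_BAND_MAX_CHAN,
struct chan_entry, a_band_table_fill() and channel_to_freq() are hypothetical
names, and the table is assumed to be filled once at run time from the
channels the hardware reports.

```c
#include <assert.h>
#include <stddef.h>

/* 2.4 GHz (b/g) channels 1..14, indexed by channel - 1 */
static const int bg_freq_list[14] = {
	2412, 2417, 2422, 2427, 2432, 2437, 2442,
	2447, 2452, 2457, 2462, 2467, 2472, 2484
};

/* Hypothetical upper bound on A-band channel numbers */
#define A_BAND_MAX_CHAN 200

/* Indexed by channel number; 0 means "frequency unknown" */
static int a_freq_table[A_BAND_MAX_CHAN + 1];

/* Hypothetical stand-in for one entry of the hardware channel list */
struct chan_entry {
	int chan;
	int freq;
};

/* Fill the lookup table once, e.g. when the hardware modes are registered */
void a_band_table_fill(const struct chan_entry *chans, size_t n)
{
	size_t i;

	assert(chans != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (chans[i].chan >= 1 && chans[i].chan <= A_BAND_MAX_CHAN)
			a_freq_table[chans[i].chan] = chans[i].freq;
}

/* Constant-time channel to frequency mapping; returns 0 if unknown */
int channel_to_freq(int channel)
{
	if (channel >= 1 && channel <= 14)
		return bg_freq_list[channel - 1];
	if (channel > 14 && channel <= A_BAND_MAX_CHAN)
		return a_freq_table[channel];
	return 0;
}
```

This trades a few hundred bytes of table for avoiding the nested search over
modes and channels on every received beacon.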


Re: [PATCH 5/7] d80211: indicate if unassociate/radio off status

2006-08-28 Thread Michael Wu
It would be helpful if you inlined your patches instead of attaching them next 
time.

I'm not comfortable with using the name for this purpose. Don't we just report 
00:00:00:00:00:00 when not associated? Also, for radio off, wasn't that being 
covered by the rfkill patches?

-Michael Wu




Re: e1000 and 802.1ad/stacked vlan tagging

2006-08-28 Thread Ben Greear

Stephan von Krawczynski wrote:

Hello Jesse,

thank you for answering anyway. Though I think your answer covers only the
obvious half of the problem.
Indeed one might think that this solves the issue - as long as there are only
linux kernels involved. Unfortunately my setup is a bit more complicated in
terms of hardware. So I should have probably clarified the question this way:
how do you configure the interface in a manner that packets with data length
of 1500 get transferred, and not only 1496 ?
I tried enlarging both real-device and first vlan interface mtu but that does
not work out. I really thought that the visible device setting of mtu=1500
should have worked out and that the driver (or some code in between) should
have corrected the allowed frame size to reflect the actual setup, not?


Unless you are patching the VLAN code, stacked VLANs are not going to
work anyway.  Search the archives of the VLAN mailing list for the
reasons why, and for at least a few patches that 'fix' the problem for
a few types of uses.


Ben
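
For reference, the 1496 versus 1500 figures in the question are consistent
with plain frame-size arithmetic, since each 802.1Q tag adds 4 bytes. A
sketch of that arithmetic, assuming a 14-byte Ethernet header and a 4-byte
FCS; the helper names are hypothetical, and whether a given NIC or driver
accepts the larger frames is a separate question this does not answer.

```c
#include <assert.h>

enum {
	ETH_HDR_BYTES  = 14,	/* dst MAC + src MAC + ethertype */
	VLAN_TAG_BYTES = 4,	/* one 802.1Q tag */
	ETH_FCS_BYTES  = 4	/* trailing checksum */
};

/* On-wire frame length for a given payload and number of stacked tags */
int frame_len(int payload, int ntags)
{
	assert(payload >= 0 && ntags >= 0);
	return ETH_HDR_BYTES + ntags * VLAN_TAG_BYTES + payload + ETH_FCS_BYTES;
}

/* Largest payload that fits when the device caps the frame length */
int max_payload(int max_frame, int ntags)
{
	assert(ntags >= 0);
	return max_frame - ETH_HDR_BYTES - ntags * VLAN_TAG_BYTES - ETH_FCS_BYTES;
}
```

With these assumptions, a 1500-byte payload needs 1518 bytes untagged, 1522
with one tag and 1526 with two, while a device that stops at 1522 bytes
leaves 1496 bytes of payload under two tags.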


Re: 2.6.18-rc4-mm[1,2,3] -- Network card not getting assigned an eth device name

2006-08-28 Thread David Miller
From: Andrew Morton [EMAIL PROTECTED]
Date: Mon, 28 Aug 2006 12:03:28 -0700

 grepping for `ioctl' gives:
 
 ioctl(9, SIOCGIWNAME, 0xbfe38d8c) = -1 EINVAL (Invalid argument)
 ioctl(9, SIOCETHTOOL, 0xbfe38d2c) = 0
 ioctl(11, SIOCGIFHWADDR, {ifr_name="eth0", ???})  = -1 ENODEV (No such device)
 ioctl(11, SIOCGIFFLAGS, {ifr_name="eth0", ???})   = -1 ENODEV (No such device)
 
 Perhaps you could generate the strace output for 2.6.18-rc5, grep that for
 ioctl, look for differences?  That initial SIOCGIWNAME failure is fishy.

That might help, but SIOCGIWNAME just gets a string that
says what wireless mode the device is in, not the device
name.  Although NetworkManager might use this for something
interesting.

All of the interesting config calls are probably happening
via netlink, which doesn't get decoded by strace.

But changes via netlink can get traced by using ip in monitor mode;
try "ip monitor all" as root during such a NetworkManager run.


Re: divide error: 0000 in fib6_rule_match [Re: 2.6.18-rc4-mm3]

2006-08-28 Thread Andrew Morton
On Mon, 28 Aug 2006 22:07:16 +0200
Mattia Dongili [EMAIL PROTECTED] wrote:

 On Sat, Aug 26, 2006 at 04:09:22PM -0700, Andrew Morton wrote:
  
  ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc4/2.6.18-rc4-mm3/
 [...]
   git-net.patch
 
 got this one when starting sshd:
 
 [   44.412000] divide error: 0000 [#1]
 [   44.412000] 4K_STACKS PREEMPT 
 [   44.412000] last sysfs file: 
 /devices/system/cpu/cpu0/cpufreq/ondemand/ignore_nice_load
 [   44.412000] Modules linked in: nfsd exportfs lockd sunrpc ipt_MASQUERADE 
 iptable_nat ip_nat xt_tcpudp xt_state ip_conntrack iptable_filter ip_tables 
 x_tables ipv6 jfs aes dm_crypt dm_mod rtc sony_acpi tun psmouse sonypi 
 speedstep_ich speedstep_lib cpufreq_conservative cpufreq_ondemand freq_table 
 cpufreq_powersave sd_mod usb_storage scsi_mod usbhid pcmcia snd_intel8x0 
 snd_ac97_codec snd_ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer 
 intel_agp agpgart i2c_i801 snd soundcore snd_page_alloc yenta_socket 
 rsrc_nonstatic pcmcia_core uhci_hcd usbcore evdev e100 mii pcspkr
 [   44.412000] CPU:0
 [   44.412000] EIP:0060:[d1516aca]Not tainted VLI
 [   44.412000] EFLAGS: 00210246   (2.6.18-rc4-mm3-1 #6) 
 [   44.412000] EIP is at fib6_rule_match+0x7a/0x150 [ipv6]
 [   44.412000] eax:    ebx: cd9d4e30   ecx: d15290e0   edx: 
 [   44.412000] esi: cd7d9e08   edi: cd9d4e30   ebp: cd9d4d34   esp: cd9d4d0c
 [   44.412000] ds: 007b   es: 007b   ss: 0068
 [   44.412000] Process sshd (pid: 3780, ti=cd9d4000 task=cf131590 
 task.ti=cd9d4000)
 [   44.412000] Stack: 0003 c018b200  ced9df60 cd9d4d6c  
 ced9df60 d15290e0 
 [   44.412000]cd7d9e08 cd9d4e30 cd9d4d58 c02c198e d15290e0 cd9d4e30 
  c123f380 
 [   44.412000]cd9d4e30 cd7d9e08 cd9d4e30 cd9d4d80 d15169dc d15290a0 
 cd9d4e30  
 [   44.412000] Call Trace:
 [   44.412000]  [c02c198e] fib_rules_lookup+0x5e/0xe0
 [   44.412000]  [d15169dc] fib6_rule_lookup+0x3c/0xb0 [ipv6]
 [   44.412000]  [d14f8702] ip6_route_output+0x32/0x40 [ipv6]
 [   44.412000]  [d14ed155] ip6_dst_lookup_tail+0x95/0xd0 [ipv6]
 [   44.412000]  [d14ed1a7] ip6_dst_lookup+0x17/0x20 [ipv6]
 [   44.412000]  [d15120ce] ip6_datagram_connect+0x36e/0x6c0 [ipv6]
 [   44.412000]  [c02f6829] inet_dgram_connect+0x39/0x80
 [   44.412000]  [c02a6ceb] sys_connect+0x6b/0x90
 [   44.412000]  [c02a846f] sys_socketcall+0x9f/0x260
 [   44.412000]  [c010325b] syscall_call+0x7/0xb
 [   44.412000]  [b7c7c93c] 0xb7c7c93c
 [   44.412000]  ===
 [   44.412000] Code: 00 00 00 89 d8 83 e0 1f 0f 85 9a 00 00 00 8b 5d 08 0f b6 
 53 68 84 d2 75 78 8b 55 08 8b 5d 0c 8b 4a 60 8b 43 28 31 c8 89 d1 31 d2 f7 
 71 64 85 c0 0f 94 c0 0f b6 c0 8b 5d f4 8b 75 f8 8b 7d fc 89 
 [   44.412000] EIP: [d1516aca] fib6_rule_match+0x7a/0x150 [ipv6] SS:ESP 
 0068:cd9d4d0c

I cannot work out how the heck you got a divide instruction in
fib6_rule_match().

Can you please do `make net/ipv6/fib6_rules.s', find the code which
implements fib6_rule_match() (line starting with fib6_rule_match:) and
send that plus the next 200-odd lines?  Or just stick fib6_rules.s on a
server somewhere?  Or mail me fib6_rules.s off-list.

Thanks.


Re: divide error: 0000 in fib6_rule_match [Re: 2.6.18-rc4-mm3]

2006-08-28 Thread Andi Kleen

 I cannot work out how the heck you got a divide instruction in
 fib6_rule_match().

This might be another symptom of the broken smp-alternatives patch.
It tended to randomly corrupt some instructions by inserting different
bytes which then crash in interesting ways.

I already sent a fix for that, but it's not in yet.

-Andi


Re: divide error: 0000 in fib6_rule_match [Re: 2.6.18-rc4-mm3]

2006-08-28 Thread Andrew Morton
On Mon, 28 Aug 2006 22:07:16 +0200
Mattia Dongili [EMAIL PROTECTED] wrote:

 [   44.412000]  ===
 [   44.412000] Code: 00 00 00 89 d8 83 e0 1f 0f 85 9a 00 00 00 8b 5d 08 0f b6 
 53 68 84 d2 75 78 8b 55 08 8b 5d 0c 8b 4a 60 8b 43 28 31 c8 89 d1 31 d2 f7 
 71 64 85 c0 0f 94 c0 0f b6 c0 8b 5d f4 8b 75 f8 8b 7d fc 89 
 [   44.412000] EIP: [d1516aca] fib6_rule_match+0x7a/0x150 [ipv6] SS:ESP 
 0068:cd9d4d0c
 [   44.412000]  6note: sshd[3780] exited with preempt_count 1
 
 config and full dmesg:
 http://oioio.altervista.org/linux/config-2.6.18-rc4-mm3-1
 http://oioio.altervista.org/linux/dmesg-2.6.18-rc4-mm3-1
 
 it's at fib6_rules.c:132 but since I can't tell why r->fwmask is 0 I'll
 avoid proposing a wrong patch :)

Oh.  It looks like this has already been fixed:

#ifdef CONFIG_IPV6_ROUTE_FWMARK
	if ((r->fwmark ^ fl->fl6_fwmark) & r->fwmask)
		return 0;
#endif

there's no divide in there now.
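
To spell out what the fixed test does: the rule matches when the packet mark
and the rule mark agree on every bit selected by the mask. A standalone
sketch follows; struct rule and mark_matches() are hypothetical names for
illustration, not the kernel's definitions. The opcode bytes in the oops
(31 c8 ... 31 d2 f7 71 64) look like an xor followed by a div, which would
fit a '/' where the '&' should have been, but that reading is an inference
from the dump rather than something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal rule: only the two fields the test needs */
struct rule {
	uint32_t fwmark;
	uint32_t fwmask;
};

/*
 * Returns 1 when pkt_mark matches the rule under its mask.
 * XOR leaves a 1 in every bit position where the marks differ;
 * ANDing with the mask keeps only the bits the rule cares about.
 */
int mark_matches(const struct rule *r, uint32_t pkt_mark)
{
	assert(r);
	return ((r->fwmark ^ pkt_mark) & r->fwmask) == 0;
}
```

Note that a zero mask is harmless here: it simply matches every mark, whereas
a division by the same field would trap.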


Re: divide error: 0000 in fib6_rule_match

2006-08-28 Thread David Miller
From: Andrew Morton [EMAIL PROTECTED]
Date: Mon, 28 Aug 2006 14:30:03 -0700

 Oh.  It looks like this has already been fixed:
 
 #ifdef CONFIG_IPV6_ROUTE_FWMARK
 	if ((r->fwmark ^ fl->fl6_fwmark) & r->fwmask)
 		return 0;
 #endif
 
 there's no divide in there now.

That's right, there used to be a typo there.


Re: IPSec kernel oops on ppc64

2006-08-28 Thread Joy Latten
Joy Latten [EMAIL PROTECTED] wrote:
 I installed 2.6.17 + patch-2.6.18-rc4 + 2.6.18-rc4-mm2
 onto two pSeries power 5 (ppc64 lpars) machines. I configured
 IPSec using the configuration listed below. 

Could you try straight 2.6.17? If that crashes too, then at least we
can be sure that it isn't something new.

A straight 2.6.17 kernel does not crash and my pings work.
A 2.6.17 + patch-2.6.18-rc4 does crash and my pings do not work.
The above tests were done on a ppc64. 
I can try patch-2.6.18-rc1, etc... to see which one it stops
working on to narrow it down.

Regards,
Joy



[PATCH 2/6] neighbour: convert neighbour hash table to hlist

2006-08-28 Thread Stephen Hemminger
Change the neighbour table hash list to hlist from list.h
to allow for easier later conversion to RCU.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 include/net/neighbour.h |6 -
 net/core/neighbour.c|  160 +---
 2 files changed, 88 insertions(+), 78 deletions(-)

--- net-2.6.19.orig/include/net/neighbour.h
+++ net-2.6.19/include/net/neighbour.h
@@ -88,10 +88,10 @@ struct neigh_statistics
 
 struct neighbour
 {
-	struct neighbour	*next;
+	struct hlist_node	hlist;
 	struct neigh_table	*tbl;
 	struct neigh_parms	*parms;
-	struct net_device		*dev;
+	struct net_device	*dev;
 	unsigned long		used;
 	unsigned long		confirmed;
 	unsigned long		updated;
@@ -161,7 +161,7 @@ struct neigh_table
 	unsigned long		last_rand;
 	kmem_cache_t		*kmem_cachep;
 	struct neigh_statistics	*stats;
-	struct neighbour	**hash_buckets;
+	struct hlist_head	*hash_buckets;
 	unsigned int		hash_mask;
 	__u32			hash_rnd;
 	unsigned int		hash_chain_gc;
--- net-2.6.19.orig/net/core/neighbour.c
+++ net-2.6.19/net/core/neighbour.c
@@ -126,10 +126,11 @@ static int neigh_forced_gc(struct neigh_
 
 	write_lock_bh(&tbl->lock);
 	for (i = 0; i <= tbl->hash_mask; i++) {
-		struct neighbour *n, **np;
+		struct neighbour *n;
+		struct hlist_node *node, *tmp;
 
-		np = &tbl->hash_buckets[i];
-		while ((n = *np) != NULL) {
+		hlist_for_each_entry_safe(n, node, tmp,
+					  &tbl->hash_buckets[i], hlist) {
 			/* Neighbour record may be discarded if:
 			 * - nobody refers to it.
 			 * - it is not permanent
@@ -137,7 +138,7 @@ static int neigh_forced_gc(struct neigh_
 			write_lock(&n->lock);
 			if (atomic_read(&n->refcnt) == 1 &&
 			    !(n->nud_state & NUD_PERMANENT)) {
-				*np = n->next;
+				hlist_del(&n->hlist);
 				n->dead = 1;
 				shrunk	= 1;
 				write_unlock(&n->lock);
@@ -145,7 +146,6 @@ static int neigh_forced_gc(struct neigh_
 				continue;
 			}
 			write_unlock(&n->lock);
-			np = &n->next;
 		}
 	}
 
@@ -181,14 +181,15 @@ static void neigh_flush_dev(struct neigh
 	int i;
 
 	for (i = 0; i <= tbl->hash_mask; i++) {
-		struct neighbour *n, **np = &tbl->hash_buckets[i];
+		struct hlist_node *node, *tmp;
+		struct neighbour *n;
 
-		while ((n = *np) != NULL) {
-			if (dev && n->dev != dev) {
-				np = &n->next;
+		hlist_for_each_entry_safe(n, node, tmp,
+					  &tbl->hash_buckets[i], hlist) {
+			if (dev && n->dev != dev)
 				continue;
-			}
-			*np = n->next;
+
+			hlist_del(&n->hlist);
 			write_lock(&n->lock);
 			neigh_del_timer(n);
 			n->dead = 1;
@@ -279,23 +280,20 @@ out_entries:
 	goto out;
 }
 
-static struct neighbour **neigh_hash_alloc(unsigned int entries)
+static struct hlist_head *neigh_hash_alloc(unsigned int entries)
 {
-	unsigned long size = entries * sizeof(struct neighbour *);
-	struct neighbour **ret;
+	unsigned long size = entries * sizeof(struct hlist_head);
 
-	if (size <= PAGE_SIZE) {
-		ret = kzalloc(size, GFP_ATOMIC);
-	} else {
-		ret = (struct neighbour **)
+	if (size <= PAGE_SIZE)
+		return kzalloc(size, GFP_ATOMIC);
+	else
+		return (struct hlist_head *)
 		      __get_free_pages(GFP_ATOMIC|__GFP_ZERO, get_order(size));
-	}
-	return ret;
 }
 
-static void neigh_hash_free(struct neighbour **hash, unsigned int entries)
+static void neigh_hash_free(struct hlist_head *hash, unsigned int entries)
 {
-	unsigned long size = entries * sizeof(struct neighbour *);
+	unsigned long size = entries * sizeof(struct hlist_head);
 
 	if (size <= PAGE_SIZE)
 		kfree(hash);
@@ -305,7 +303,7 @@ static void neigh_hash_free(struct neigh
 
 static void neigh_hash_grow(struct neigh_table *tbl, unsigned long new_entries)
 {
-	struct neighbour **new_hash, **old_hash;
+	struct hlist_head *new_hash, *old_hash;
 	unsigned int i, new_hash_mask, old_entries;
 
 	NEIGH_CACHE_STAT_INC(tbl, hash_grows);
@@ -321,16 
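
The core of this conversion is replacing the hand-rolled "pointer to the
previous next pointer" walk with an hlist node that carries its own back
link, so an entry can be unlinked without knowing where the walk came from.
A userspace sketch of that idea is below. It reimplements a minimal hlist
rather than using the kernel's list.h, and struct entry, bucket_prune() and
bucket_len() are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace copy of the hlist idea: singly linked forward,
 * with a back pointer to whatever pointer currently points at us. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

void hlist_del(struct hlist_node *n)
{
	assert(n->pprev != NULL);	/* node must currently be on a list */
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Hypothetical table entry embedding its list node */
struct entry {
	int key;
	int dead;
	struct hlist_node hlist;
};

static struct entry *entry_of(struct hlist_node *node)
{
	return (struct entry *)((char *)node - offsetof(struct entry, hlist));
}

/* Unlink every entry with the given key; safe against deletion
 * because the successor is saved before the node is unlinked. */
int bucket_prune(struct hlist_head *bucket, int key)
{
	struct hlist_node *node, *tmp;
	int removed = 0;

	for (node = bucket->first; node; node = tmp) {
		struct entry *e = entry_of(node);

		tmp = node->next;
		if (e->key == key) {
			hlist_del(node);
			e->dead = 1;
			removed++;
		}
	}
	return removed;
}

int bucket_len(const struct hlist_head *bucket)
{
	const struct hlist_node *node;
	int n = 0;

	for (node = bucket->first; node; node = node->next)
		n++;
	return n;
}
```

The saved successor plays the role of the extra cursor argument that
hlist_for_each_entry_safe() takes in the hunks above.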

[PATCH 6/6] neighbour: convert hard header cache to sequence number

2006-08-28 Thread Stephen Hemminger
The reading of the hard header cache in the output path can be
made lockless using seqlock.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 include/linux/netdevice.h |3 ++-
 include/net/neighbour.h   |2 ++
 net/core/neighbour.c  |   40 +++-
 net/ipv4/ip_output.c  |   13 +++--
 net/ipv6/ip6_output.c |   13 +++--
 5 files changed, 45 insertions(+), 26 deletions(-)

--- net-2.6.19.orig/include/linux/netdevice.h
+++ net-2.6.19/include/linux/netdevice.h
@@ -193,7 +193,7 @@ struct hh_cache
  */
 	int		hh_len;		/* length of header */
 	int		(*hh_output)(struct sk_buff *skb);
-	rwlock_t	hh_lock;
+	seqlock_t	hh_lock;
 
 	/* cached hardware header; allow for machine alignment needs.*/
 #define HH_DATA_MOD	16
@@ -217,6 +217,7 @@ struct hh_cache
 #define LL_RESERVED_SPACE_EXTRA(dev,extra) \
 	((((dev)->hard_header_len+extra)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
 
+
 /* These flag bits are private to the generic network queueing
  * layer, they may not be explicitly referenced by any other
  * code.
--- net-2.6.19.orig/net/core/neighbour.c
+++ net-2.6.19/net/core/neighbour.c
@@ -591,9 +591,11 @@ void neigh_destroy(struct neighbour *nei
 	while ((hh = neigh->hh) != NULL) {
 		neigh->hh = hh->hh_next;
 		hh->hh_next = NULL;
-		write_lock_bh(&hh->hh_lock);
+
+		write_seqlock_bh(&hh->hh_lock);
 		hh->hh_output = neigh_blackhole;
-		write_unlock_bh(&hh->hh_lock);
+		write_sequnlock_bh(&hh->hh_lock);
+
 		if (atomic_dec_and_test(&hh->hh_refcnt))
 			kfree(hh);
 	}
@@ -912,9 +914,9 @@ static void neigh_update_hhs(struct neig
 
 	if (update) {
 		for (hh = neigh->hh; hh; hh = hh->hh_next) {
-			write_lock_bh(&hh->hh_lock);
+			write_seqlock_bh(&hh->hh_lock);
 			update(hh, neigh->dev, neigh->ha);
-			write_unlock_bh(&hh->hh_lock);
+			write_sequnlock_bh(&hh->hh_lock);
 		}
 	}
 }
@@ -1105,7 +1107,7 @@ static void neigh_hh_init(struct neighbo
 			break;
 
 	if (!hh && (hh = kzalloc(sizeof(*hh), GFP_ATOMIC)) != NULL) {
-		rwlock_init(&hh->hh_lock);
+		seqlock_init(&hh->hh_lock);
 		hh->hh_type = protocol;
 		atomic_set(&hh->hh_refcnt, 0);
 		hh->hh_next = NULL;
@@ -1128,6 +1130,33 @@ static void neigh_hh_init(struct neighbo
}
 }
 
+
+/*
+ * Add header to skb from hard header cache
+ * Handle case where cache gets changed.
+ */
+int neigh_hh_output(const struct hh_cache *hh, struct sk_buff *skb)
+{
+	int len, alen;
+	unsigned seq;
+	int (*output)(struct sk_buff *);
+
+	for (;;) {
+		seq = read_seqbegin(&hh->hh_lock);
+		len = hh->hh_len;
+		alen = HH_DATA_ALIGN(len);
+		output = hh->hh_output;
+		memcpy(skb->data - alen, hh->hh_data, alen);
+		skb_push(skb, len);
+
+		if (likely(!read_seqretry(&hh->hh_lock, seq)))
+			return output(skb);
+
+		/* undo and try again */
+		__skb_pull(skb, len);
+	}
+}
+
 /* This function can be used in contexts, where only old dev_queue_xmit
worked, f.e. if you want to override normal output path (eql, shaper),
but resolution is not made yet.
@@ -2767,6 +2796,7 @@ EXPORT_SYMBOL(neigh_delete);
 EXPORT_SYMBOL(neigh_destroy);
 EXPORT_SYMBOL(neigh_dump_info);
 EXPORT_SYMBOL(neigh_event_ns);
+EXPORT_SYMBOL(neigh_hh_output);
 EXPORT_SYMBOL(neigh_ifdown);
 EXPORT_SYMBOL(neigh_lookup);
 EXPORT_SYMBOL(neigh_lookup_nodev);
--- net-2.6.19.orig/net/ipv4/ip_output.c
+++ net-2.6.19/net/ipv4/ip_output.c
@@ -182,16 +182,9 @@ static inline int ip_finish_output2(stru
 		skb = skb2;
 	}
 
-	if (hh) {
-		int hh_alen;
-
-		read_lock_bh(&hh->hh_lock);
-		hh_alen = HH_DATA_ALIGN(hh->hh_len);
-		memcpy(skb->data - hh_alen, hh->hh_data, hh_alen);
-		read_unlock_bh(&hh->hh_lock);
-		skb_push(skb, hh->hh_len);
-		return hh->hh_output(skb);
-	} else if (dst->neighbour)
+	if (hh)
+		return neigh_hh_output(hh, skb);
+	else if (dst->neighbour)
 		return dst->neighbour->output(skb);
 
 	if (net_ratelimit())
--- net-2.6.19.orig/net/ipv6/ip6_output.c
+++ net-2.6.19/net/ipv6/ip6_output.c
@@ -76,16 +76,9 @@ static inline int ip6_output_finish(stru
 	struct dst_entry *dst = skb->dst;
 	struct hh_cache *hh = dst->hh;
 
-	if (hh) {
-		int hh_alen;
-
-		read_lock_bh(&hh->hh_lock);
-		hh_alen = HH_DATA_ALIGN(hh->hh_len);
-   
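
The reader side of neigh_hh_output() above follows the usual sequence-lock
pattern: take a snapshot of the counter, copy the data, and retry if the
counter moved. A userspace sketch of that control flow using C11 atomics is
below. It is not the kernel's seqlock: struct seq_hdr and the hdr_*()
functions are hypothetical, writers are assumed to be serialized by the
caller, and the plain accesses to the payload would need the barriers the
kernel primitives provide before this could be used with real concurrency.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define HDR_MAX 16

/* Hypothetical cached header guarded by a sequence counter */
struct seq_hdr {
	atomic_uint seq;		/* odd while a writer is mid-update */
	int len;
	unsigned char data[HDR_MAX];
};

void hdr_init(struct seq_hdr *h)
{
	atomic_init(&h->seq, 0);
	h->len = 0;
	memset(h->data, 0, sizeof(h->data));
}

/* Writer: bump to odd, update, bump back to even */
void hdr_write(struct seq_hdr *h, const unsigned char *src, int len)
{
	assert(len >= 0 && len <= HDR_MAX);
	atomic_fetch_add(&h->seq, 1);
	h->len = len;
	memcpy(h->data, src, (size_t)len);
	atomic_fetch_add(&h->seq, 1);
}

/* Reader: never blocks the writer, retries on a torn snapshot */
int hdr_read(struct seq_hdr *h, unsigned char *dst)
{
	unsigned start;
	int len;

	for (;;) {
		start = atomic_load(&h->seq);
		if (start & 1)
			continue;	/* writer in progress */
		len = h->len;
		if (len < 0 || len > HDR_MAX)
			continue;	/* inconsistent snapshot */
		memcpy(dst, h->data, (size_t)len);
		if (atomic_load(&h->seq) == start)
			return len;	/* nothing changed while copying */
	}
}
```

As in the patch, the reader may do the copy more than once, which is
acceptable because updates to the cached header are expected to be rare.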

[PATCH 5/6] neighbour: convert lookup to sequence lock

2006-08-28 Thread Stephen Hemminger
The reading of neighbour table entries can be converted from a slow
reader/writer lock to a fast lockless sequence number check.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


---
 include/net/neighbour.h |2 
 net/core/neighbour.c|  117 +---
 net/ipv4/arp.c  |  101 +
 net/ipv6/ndisc.c|   16 +++---
 net/ipv6/route.c|   12 ++--
 net/sched/sch_teql.c|   11 +++-
 6 files changed, 155 insertions(+), 104 deletions(-)

--- net-2.6.19.orig/include/net/neighbour.h
+++ net-2.6.19/include/net/neighbour.h
@@ -100,7 +100,7 @@ struct neighbour
 	__u8			type;
 	__u8			dead;
 	atomic_t		probes;
-	rwlock_t		lock;
+	seqlock_t		lock;
 	unsigned char		ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];
 	struct hh_cache		*hh;
 	atomic_t		refcnt;
--- net-2.6.19.orig/net/core/neighbour.c
+++ net-2.6.19/net/core/neighbour.c
@@ -143,17 +143,17 @@ static int neigh_forced_gc(struct neigh_
 			 * - nobody refers to it.
 			 * - it is not permanent
 			 */
-			write_lock(&n->lock);
+			write_seqlock(&n->lock);
 			if (atomic_read(&n->refcnt) == 1 &&
 			    !(n->nud_state & NUD_PERMANENT)) {
 				hlist_del_rcu(&n->hlist);
 				n->dead = 1;
 				shrunk	= 1;
-				write_unlock(&n->lock);
+				write_sequnlock(&n->lock);
 				call_rcu(&n->rcu, neigh_rcu_release);
 				continue;
 			}
-			write_unlock(&n->lock);
+			write_sequnlock(&n->lock);
 		}
 	}
 
@@ -198,7 +198,7 @@ static void neigh_flush_dev(struct neigh
 				continue;
 
 			hlist_del_rcu(&n->hlist);
-			write_lock(&n->lock);
+			write_seqlock(&n->lock);
 			neigh_del_timer(n);
 			n->dead = 1;
 
@@ -220,7 +220,7 @@ static void neigh_flush_dev(struct neigh
 				n->nud_state = NUD_NONE;
 				NEIGH_PRINTK2("neigh %p is stray.\n", n);
 			}
-			write_unlock(&n->lock);
+			write_sequnlock(&n->lock);
 			neigh_release(n);
 		}
 	}
@@ -267,7 +267,7 @@ static struct neighbour *neigh_alloc(str
 	memset(n, 0, tbl->entry_size);
 
 	skb_queue_head_init(&n->arp_queue);
-	rwlock_init(&n->lock);
+	seqlock_init(&n->lock);
 	n->updated	  = n->used = now;
 	n->nud_state	  = NUD_NONE;
 	n->output	  = neigh_blackhole;
@@ -615,7 +615,7 @@ void neigh_destroy(struct neighbour *nei
 /* Neighbour state is suspicious;
    disable fast path.
 
-   Called with write_locked neigh.
+   Called with locked neigh.
  */
 static void neigh_suspect(struct neighbour *neigh)
 {
@@ -632,7 +632,7 @@ static void neigh_suspect(struct neighbo
 /* Neighbour state is OK;
    enable fast path.
 
-   Called with write_locked neigh.
+   Called with locked neigh.
  */
 static void neigh_connect(struct neighbour *neigh)
 {
@@ -676,7 +676,7 @@ static void neigh_periodic_timer(unsigne
 	hlist_for_each_entry_safe(n, node, tmp, head, hlist) {
 		unsigned int state;
 
-		write_lock(&n->lock);
+		write_seqlock(&n->lock);
 
 		state = n->nud_state;
 		if (state & (NUD_PERMANENT | NUD_IN_TIMER))
@@ -690,12 +690,12 @@ static void neigh_periodic_timer(unsigne
 		     time_after(now, n->used + n->parms->gc_staletime))) {
 			hlist_del_rcu(&n->hlist);
 			n->dead = 1;
-			write_unlock(&n->lock);
+			write_sequnlock(&n->lock);
 			neigh_release(n);
 			continue;
 		}
 next_elt:
-		write_unlock(&n->lock);
+		write_sequnlock(&n->lock);
 	}
 
 	/* Cycle through all hash buckets every base_reachable_time/2 ticks.
@@ -738,7 +738,7 @@ static void neigh_timer_handler(unsigned
 	unsigned state;
 	int notify = 0;
 
-	write_lock(&neigh->lock);
+	write_seqlock(&neigh->lock);
 
 	state = neigh->nud_state;
 	now = jiffies;
@@ -748,6 +748,7 @@ static void neigh_timer_handler(unsigned
 #ifndef CONFIG_SMP
 		printk(KERN_WARNING "neigh: timer & !nud_in_timer\n");
 #endif
+		write_sequnlock(&neigh->lock);
 		goto out;
 	}
 
@@ -808,9 +809,9 @@ static void neigh_timer_handler(unsigned
  

[PATCH 3/6] neighbour: convert pneigh hash table to hlist

2006-08-28 Thread Stephen Hemminger
Change the pneigh_entry table to hlist from list.h
to allow for easier later conversion to RCU.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 include/net/neighbour.h |6 ++--
 net/core/neighbour.c|   58 
 2 files changed, 33 insertions(+), 31 deletions(-)

--- net-2.6.19.orig/include/net/neighbour.h
+++ net-2.6.19/include/net/neighbour.h
@@ -124,8 +124,8 @@ struct neigh_ops
 
 struct pneigh_entry
 {
-	struct pneigh_entry	*next;
-	struct net_device		*dev;
+	struct hlist_node	hlist;
+	struct net_device	*dev;
 	u8			key[0];
 };
 
@@ -165,7 +165,7 @@ struct neigh_table
 	unsigned int		hash_mask;
 	__u32			hash_rnd;
 	unsigned int		hash_chain_gc;
-	struct pneigh_entry	**phash_buckets;
+	struct hlist_head	*phash_buckets;
 #ifdef CONFIG_PROC_FS
 	struct proc_dir_entry	*pde;
 #endif
--- net-2.6.19.orig/net/core/neighbour.c
+++ net-2.6.19/net/core/neighbour.c
@@ -455,6 +455,7 @@ struct pneigh_entry * pneigh_lookup(stru
 				    struct net_device *dev, int creat)
 {
 	struct pneigh_entry *n;
+	struct hlist_node *tmp;
 	int key_len = tbl->key_len;
 	u32 hash_val = *(u32 *)(pkey + key_len - 4);
 
@@ -465,7 +466,7 @@ struct pneigh_entry * pneigh_lookup(stru
 
 	read_lock_bh(&tbl->lock);
 
-	for (n = tbl->phash_buckets[hash_val]; n; n = n->next) {
+	hlist_for_each_entry(n, tmp, &tbl->phash_buckets[hash_val], hlist) {
 		if (!memcmp(n->key, pkey, key_len) &&
 		    (n->dev == dev || !n->dev)) {
 			read_unlock_bh(&tbl->lock);
@@ -495,8 +496,7 @@ struct pneigh_entry * pneigh_lookup(stru
 	}
 
 	write_lock_bh(&tbl->lock);
-	n->next = tbl->phash_buckets[hash_val];
-	tbl->phash_buckets[hash_val] = n;
+	hlist_add_head(&n->hlist, &tbl->phash_buckets[hash_val]);
 	write_unlock_bh(&tbl->lock);
 out:
 	return n;
@@ -506,7 +506,8 @@ out:
 int pneigh_delete(struct neigh_table *tbl, const void *pkey,
 		  struct net_device *dev)
 {
-	struct pneigh_entry *n, **np;
+	struct pneigh_entry *n;
+	struct hlist_node *tmp;
 	int key_len = tbl->key_len;
 	u32 hash_val = *(u32 *)(pkey + key_len - 4);
 
@@ -516,10 +517,9 @@ int pneigh_delete(struct neigh_table *tb
 	hash_val &= PNEIGH_HASHMASK;
 
 	write_lock_bh(&tbl->lock);
-	for (np = &tbl->phash_buckets[hash_val]; (n = *np) != NULL;
-	     np = &n->next) {
+	hlist_for_each_entry(n, tmp, &tbl->phash_buckets[hash_val], hlist) {
 		if (!memcmp(n->key, pkey, key_len) && n->dev == dev) {
-			*np = n->next;
+			hlist_del(&n->hlist);
 			write_unlock_bh(&tbl->lock);
 			if (tbl->pdestructor)
 				tbl->pdestructor(n);
@@ -535,22 +535,21 @@ int pneigh_delete(struct neigh_table *tb
 
 static int pneigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
 {
-	struct pneigh_entry *n, **np;
 	u32 h;
 
 	for (h = 0; h <= PNEIGH_HASHMASK; h++) {
-		np = &tbl->phash_buckets[h];
-		while ((n = *np) != NULL) {
+		struct pneigh_entry *n;
+		struct hlist_node *tmp, *nxt;
+
+		hlist_for_each_entry_safe(n, tmp, nxt, &tbl->phash_buckets[h], hlist) {
 			if (!dev || n->dev == dev) {
-				*np = n->next;
+				hlist_del(&n->hlist);
 				if (tbl->pdestructor)
 					tbl->pdestructor(n);
 				if (n->dev)
 					dev_put(n->dev);
 				kfree(n);
-				continue;
 			}
-			np = &n->next;
 		}
 	}
 	return -ENOENT;
@@ -1332,7 +1331,6 @@ void neigh_parms_destroy(struct neigh_pa
 void neigh_table_init_no_netlink(struct neigh_table *tbl)
 {
 	unsigned long now = jiffies;
-	unsigned long phsize;
 
 	atomic_set(&tbl->parms.refcnt, 1);
 	INIT_RCU_HEAD(&tbl->parms.rcu_head);
@@ -1359,8 +1357,8 @@ void neigh_table_init_no_netlink(struct 
 	tbl->hash_mask = 1;
 	tbl->hash_buckets = neigh_hash_alloc(tbl->hash_mask + 1);
 
-	phsize = (PNEIGH_HASHMASK + 1) * sizeof(struct pneigh_entry *);
-	tbl->phash_buckets = kzalloc(phsize, GFP_KERNEL);
+	tbl->phash_buckets = kcalloc(PNEIGH_HASHMASK + 1, sizeof(struct hlist_head),
+				     GFP_KERNEL);
 
 	if (!tbl->hash_buckets || !tbl->phash_buckets)
 		panic("cannot allocate neighbour cache hashes");
@@ -2188,18 +2186,18 @@ static struct pneigh_entry *pneigh_get_f
 {
struct neigh_seq_state *state = 

[PATCH 4/6] net neighbour: convert to RCU

2006-08-28 Thread Stephen Hemminger
Use RCU to allow for lockless access to the neighbour table.
This should speed up the send path because no atomic operations
will be needed to look up ARP entries, etc.


Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

---
 include/net/neighbour.h |4 -
 net/core/neighbour.c|  158 +---
 2 files changed, 87 insertions(+), 75 deletions(-)

--- net-2.6.19.orig/include/net/neighbour.h
+++ net-2.6.19/include/net/neighbour.h
@@ -108,6 +108,7 @@ struct neighbour
 	struct sk_buff_head	arp_queue;
 	struct timer_list	timer;
 	struct neigh_ops	*ops;
+	struct rcu_head		rcu;
 	u8			primary_key[0];
 };
 
@@ -126,6 +127,7 @@ struct pneigh_entry
 {
 	struct hlist_node	hlist;
 	struct net_device	*dev;
+	struct rcu_head		rcu;
 	u8			key[0];
 };
 
@@ -157,7 +159,7 @@ struct neigh_table
 	struct timer_list	proxy_timer;
 	struct sk_buff_head	proxy_queue;
 	atomic_t		entries;
-	rwlock_t		lock;
+	spinlock_t		lock;
 	unsigned long		last_rand;
 	kmem_cache_t		*kmem_cachep;
 	struct neigh_statistics	*stats;
struct neigh_statistics *stats;
--- net-2.6.19.orig/net/core/neighbour.c
+++ net-2.6.19/net/core/neighbour.c
@@ -67,9 +67,10 @@ static struct file_operations neigh_stat
 #endif
 
 /*
-   Neighbour hash table buckets are protected with rwlock tbl->lock.
+   Neighbour hash table buckets are protected with lock tbl->lock.
 
-   - All the scans/updates to hash buckets MUST be made under this lock.
+   - All the scans of hash buckets must be made with RCU read lock (nopreempt)
+   - updates to hash buckets MUST be made under this lock.
    - NOTHING clever should be made under this lock: no callbacks
      to protocol backends, no attempts to send something to network.
      It will result in deadlocks, if backend/driver wants to use neighbour
@@ -117,6 +118,13 @@ unsigned long neigh_rand_reach_time(unsi
 }
 
 
+static void neigh_rcu_release(struct rcu_head *head)
+{
+	struct neighbour *neigh = container_of(head, struct neighbour, rcu);
+
+	neigh_release(neigh);
+}
+
 static int neigh_forced_gc(struct neigh_table *tbl)
 {
 	int shrunk = 0;
@@ -124,7 +132,7 @@ static int neigh_forced_gc(struct neigh_
 
 	NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
 
-	write_lock_bh(&tbl->lock);
+	spin_lock_bh(&tbl->lock);
 	for (i = 0; i <= tbl->hash_mask; i++) {
 		struct neighbour *n;
 		struct hlist_node *node, *tmp;
@@ -138,11 +146,11 @@ static int neigh_forced_gc(struct neigh_
 			write_lock(&n->lock);
 			if (atomic_read(&n->refcnt) == 1 &&
 			    !(n->nud_state & NUD_PERMANENT)) {
-				hlist_del(&n->hlist);
+				hlist_del_rcu(&n->hlist);
 				n->dead = 1;
 				shrunk	= 1;
 				write_unlock(&n->lock);
-				neigh_release(n);
+				call_rcu(&n->rcu, neigh_rcu_release);
 				continue;
 			}
 			write_unlock(&n->lock);
@@ -151,7 +159,7 @@ static int neigh_forced_gc(struct neigh_
 
 	tbl->last_flush = jiffies;
 
-	write_unlock_bh(&tbl->lock);
+	spin_unlock_bh(&tbl->lock);
 
 	return shrunk;
 }
@@ -189,7 +197,7 @@ static void neigh_flush_dev(struct neigh
 			if (dev && n->dev != dev)
 				continue;
 
-			hlist_del(&n->hlist);
+			hlist_del_rcu(&n->hlist);
 			write_lock(&n->lock);
 			neigh_del_timer(n);
 			n->dead = 1;
@@ -220,17 +228,17 @@ static void neigh_flush_dev(struct neigh
 
 void neigh_changeaddr(struct neigh_table *tbl, struct net_device *dev)
 {
-	write_lock_bh(&tbl->lock);
+	spin_lock_bh(&tbl->lock);
 	neigh_flush_dev(tbl, dev);
-	write_unlock_bh(&tbl->lock);
+	spin_unlock_bh(&tbl->lock);
 }
 
 int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev)
 {
-	write_lock_bh(&tbl->lock);
+	spin_lock_bh(&tbl->lock);
 	neigh_flush_dev(tbl, dev);
 	pneigh_ifdown(tbl, dev);
-	write_unlock_bh(&tbl->lock);
+	spin_unlock_bh(&tbl->lock);
 
 	del_timer_sync(&tbl->proxy_timer);
 	pneigh_queue_purge(&tbl->proxy_queue);
@@ -326,8 +334,8 @@ static void neigh_hash_grow(struct neigh
 			unsigned int hash_val = tbl->hash(n->primary_key, n->dev);
 
 			hash_val &= new_hash_mask;
-			hlist_del(&n->hlist);
-			hlist_add_head(&n->hlist, &new_hash[hash_val]);
+			__hlist_del(&n->hlist);
+   hlist_add_head_rcu(n-hlist, 

[PATCH 0/5] skge update

2006-08-28 Thread Stephen Hemminger
Several non-critical bug fixes for skge driver.

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 2/5] skge: pci bus post fixes

2006-08-28 Thread Stephen Hemminger
At the end of a critical section, we need to force the PCI write
to complete by doing a read.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge-2.6.orig/drivers/net/skge.c
+++ skge-2.6/drivers/net/skge.c
@@ -2747,7 +2747,7 @@ static int skge_poll(struct net_device *
 	spin_lock_irq(&hw->hw_lock);
 	hw->intr_mask |= rxirqmask[skge->port];
 	skge_write32(hw, B0_IMSK, hw->intr_mask);
-	mmiowb();
+	skge_read32(hw, B0_IMSK);
 	spin_unlock_irq(&hw->hw_lock);
 
 	return 0;
@@ -2881,6 +2881,7 @@ static void skge_extirq(void *arg)
 	spin_lock_irq(&hw->hw_lock);
 	hw->intr_mask |= IS_EXT_REG;
 	skge_write32(hw, B0_IMSK, hw->intr_mask);
+	skge_read32(hw, B0_IMSK);
 	spin_unlock_irq(&hw->hw_lock);
 }
 
@@ -2955,6 +2956,7 @@ static irqreturn_t skge_intr(int irq, vo
 	skge_error_irq(hw);
 
 	skge_write32(hw, B0_IMSK, hw->intr_mask);
+	skge_read32(hw, B0_IMSK);
 	spin_unlock(&hw->hw_lock);
 
 	return IRQ_HANDLED;
@@ -3424,6 +3426,7 @@ static void __devexit skge_remove(struct
 	spin_lock_irq(&hw->hw_lock);
 	hw->intr_mask = 0;
 	skge_write32(hw, B0_IMSK, 0);
+	skge_read32(hw, B0_IMSK);
 	spin_unlock_irq(&hw->hw_lock);
 
 	skge_write16(hw, B0_LED, LED_STAT_OFF);

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 4/5] skge: use ethX for irq assigments

2006-08-28 Thread Stephen Hemminger
The user-level irq balance daemon uses "eth" as a way to distinguish
Ethernet devices. Also, by using the device name it is possible to
distinguish different boards.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge-2.6.orig/drivers/net/skge.c
+++ skge-2.6/drivers/net/skge.c
@@ -3343,23 +3343,16 @@ static int __devinit skge_probe(struct p
 		goto err_out_free_hw;
 	}
 
-	err = request_irq(pdev->irq, skge_intr, IRQF_SHARED, DRV_NAME, hw);
-	if (err) {
-		printk(KERN_ERR PFX "%s: cannot assign irq %d\n",
-		       pci_name(pdev), pdev->irq);
-		goto err_out_iounmap;
-	}
-	pci_set_drvdata(pdev, hw);
-
 	err = skge_reset(hw);
 	if (err)
-		goto err_out_free_irq;
+		goto err_out_iounmap;
 
 	printk(KERN_INFO PFX DRV_VERSION " addr 0x%llx irq %d chip %s rev %d\n",
 	       (unsigned long long)pci_resource_start(pdev, 0), pdev->irq,
 	       skge_board_name(hw), hw->chip_rev);
 
-	if ((dev = skge_devinit(hw, 0, using_dac)) == NULL)
+	dev = skge_devinit(hw, 0, using_dac);
+	if (!dev)
 		goto err_out_led_off;
 
 	if (!is_valid_ether_addr(dev->dev_addr)) {
@@ -3369,7 +3362,6 @@ static int __devinit skge_probe(struct p
 		goto err_out_free_netdev;
 	}
 
-
 	err = register_netdev(dev);
 	if (err) {
 		printk(KERN_ERR PFX "%s: cannot register net device\n",
@@ -3377,6 +3369,12 @@ static int __devinit skge_probe(struct p
 		goto err_out_free_netdev;
 	}
 
+	err = request_irq(pdev->irq, skge_intr, IRQF_SHARED, dev->name, hw);
+	if (err) {
+		printk(KERN_ERR PFX "%s: cannot assign irq %d\n",
+		       dev->name, pdev->irq);
+		goto err_out_unregister;
+	}
 	skge_show_addr(dev);
 
 	if (hw->ports > 1 && (dev1 = skge_devinit(hw, 1, using_dac))) {
@@ -3389,15 +3387,16 @@ static int __devinit skge_probe(struct p
 			free_netdev(dev1);
 		}
 	}
+	pci_set_drvdata(pdev, hw);
 
 	return 0;
 
+err_out_unregister:
+	unregister_netdev(dev);
 err_out_free_netdev:
 	free_netdev(dev);
 err_out_led_off:
 	skge_write16(hw, B0_LED, LED_STAT_OFF);
-err_out_free_irq:
-	free_irq(pdev->irq, hw);
 err_out_iounmap:
 	iounmap(hw->regs);
 err_out_free_hw:

--
Stephen Hemminger [EMAIL PROTECTED]
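
The reordering in this patch (request the irq only after register_netdev(), so that the ethX name exists) also changes the order of the error labels. A minimal user-space sketch of that unwind pattern follows; toy_probe, fail_at and the single-letter trace are hypothetical and only stand in for the real probe steps.

```c
#include <assert.h>
#include <string.h>

/* Records setup and teardown steps so the order can be inspected. */
static char trace[64];

static void step(const char *s)
{
	strcat(trace, s);
}

/*
 * Hypothetical probe. 'fail_at' selects the step that fails (0 = none).
 * M = map registers, N = register netdev, I = request irq.
 * Lower-case letters are the matching teardown steps.
 */
static int toy_probe(int fail_at)
{
	trace[0] = '\0';

	if (fail_at == 1)
		goto err_out;
	step("M");

	if (fail_at == 2)
		goto err_out_unmap;
	step("N");	/* from here on the interface has its ethX name */

	if (fail_at == 3)
		goto err_out_unregister;
	step("I");	/* irq requested under that name */

	return 0;

err_out_unregister:
	step("n");
err_out_unmap:
	step("m");
err_out:
	return -1;
}
```

Because the labels fall through in reverse order of setup, a failure at the irq step undoes the registration first and the mapping last.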



[PATCH 3/5] skge: use dev_alloc_skb

2006-08-28 Thread Stephen Hemminger
To avoid problems with buggy protocols that assume extra header space,
use dev_alloc_skb() when allocating receive buffers.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge-2.6.orig/drivers/net/skge.c
+++ skge-2.6/drivers/net/skge.c
@@ -827,7 +827,8 @@ static int skge_rx_fill(struct skge_port
 	do {
 		struct sk_buff *skb;
 
-		skb = alloc_skb(skge->rx_buf_size + NET_IP_ALIGN, GFP_KERNEL);
+		skb = __dev_alloc_skb(skge->rx_buf_size + NET_IP_ALIGN,
+				      GFP_KERNEL);
 		if (!skb)
 			return -ENOMEM;
 
@@ -2609,7 +2610,7 @@ static inline struct sk_buff *skge_rx_ge
 		goto error;
 
 	if (len < RX_COPY_THRESHOLD) {
-		skb = alloc_skb(len + 2, GFP_ATOMIC);
+		skb = dev_alloc_skb(len + 2);
 		if (!skb)
 			goto resubmit;
 
@@ -2624,7 +2625,7 @@ static inline struct sk_buff *skge_rx_ge
 		skge_rx_reuse(e, skge->rx_buf_size);
 	} else {
 		struct sk_buff *nskb;
-		nskb = alloc_skb(skge->rx_buf_size + NET_IP_ALIGN, GFP_ATOMIC);
+		nskb = dev_alloc_skb(skge->rx_buf_size + NET_IP_ALIGN);
 		if (!nskb)
 			goto resubmit;
 

--
Stephen Hemminger [EMAIL PROTECTED]
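
The difference between the two allocators is the headroom that dev_alloc_skb() reserves in front of the packet data. The user-space sketch below models only that idea: toy_buf, toy_alloc and toy_dev_alloc are hypothetical, and TOY_PAD is an assumed value standing in for the kernel's NET_SKB_PAD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAD 16	/* assumed headroom, standing in for NET_SKB_PAD */

struct toy_buf {
	unsigned char *head;	/* start of the allocation */
	unsigned char *data;	/* start of packet data */
	size_t size;		/* total bytes allocated */
};

/* Plain allocation: data starts at head, nothing can be prepended. */
static int toy_alloc(struct toy_buf *b, size_t len)
{
	b->head = malloc(len);
	if (!b->head)
		return -1;
	b->data = b->head;
	b->size = len;
	return 0;
}

/* Device-style allocation: over-allocate, then reserve TOY_PAD in front. */
static int toy_dev_alloc(struct toy_buf *b, size_t len)
{
	if (toy_alloc(b, len + TOY_PAD))
		return -1;
	b->data += TOY_PAD;
	return 0;
}

/* Bytes available in front of the data for headers added later. */
static size_t toy_headroom(const struct toy_buf *b)
{
	return (size_t)(b->data - b->head);
}
```

A protocol that pushes a header in front of the data without checking would write before the start of a toy_alloc() buffer, but stays inside a toy_dev_alloc() buffer as long as the header fits in TOY_PAD bytes.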



[PATCH] pcnet32: fix user visible typo

2006-08-28 Thread Alexey Dobriyan
Also, final dot removed and single form fixed. The cause of #6428 is
still to be found.

Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
---

 drivers/net/pcnet32.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/pcnet32.c
+++ b/drivers/net/pcnet32.c
@@ -2986,7 +2986,8 @@ static int __init pcnet32_init_module(vo
 		pcnet32_probe_vlbus(pcnet32_portlist);
 
 	if (cards_found && (pcnet32_debug & NETIF_MSG_PROBE))
-		printk(KERN_INFO PFX "%d cards_found.\n", cards_found);
+		printk(KERN_INFO PFX "%d card%s found\n",
+		       cards_found, cards_found > 1 ? "s" : "");
 
 	return (pcnet32_have_pci + cards_found) ? 0 : -ENODEV;
 }



[PATCH 5/5] skge: version 1.7

2006-08-28 Thread Stephen Hemminger
Increase version.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]

--- skge-2.6.orig/drivers/net/skge.c
+++ skge-2.6/drivers/net/skge.c
@@ -43,7 +43,7 @@
 #include "skge.h"
 
 #define DRV_NAME		"skge"
-#define DRV_VERSION		"1.6"
+#define DRV_VERSION		"1.7"
 #define PFX			DRV_NAME " "
 
 #define DEFAULT_TX_RING_SIZE	128

--
Stephen Hemminger [EMAIL PROTECTED]



[PATCH 1/5] skge: cleanup suspend/resume code

2006-08-28 Thread Stephen Hemminger
The code for suspend/resume needs several fixes. The hardware lock
should be setup in probe only, not in resume. Interrupts should be
disabled during suspend, etc.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- skge-2.6.orig/drivers/net/skge.c
+++ skge-2.6/drivers/net/skge.c
@@ -3106,7 +3106,6 @@ static int skge_reset(struct skge_hw *hw
 	else
 		hw->ram_size = t8 * 4096;
 
-	spin_lock_init(&hw->hw_lock);
 	hw->intr_mask = IS_HW_ERR | IS_EXT_REG | IS_PORT_1;
 	if (hw->ports > 1)
 		hw->intr_mask |= IS_PORT_2;
@@ -3332,6 +3331,7 @@ static int __devinit skge_probe(struct p
 	hw->pdev = pdev;
 	mutex_init(&hw->phy_mutex);
 	INIT_WORK(&hw->phy_work, skge_extirq, hw);
+	spin_lock_init(&hw->hw_lock);
 
 	hw->regs = ioremap_nocache(pci_resource_start(pdev, 0), 0x4000);
 	if (!hw->regs) {
@@ -3449,26 +3449,25 @@ static int skge_suspend(struct pci_dev *
 	struct skge_hw *hw  = pci_get_drvdata(pdev);
 	int i, wol = 0;
 
-	for (i = 0; i < 2; i++) {
+	pci_save_state(pdev);
+	for (i = 0; i < hw->ports; i++) {
 		struct net_device *dev = hw->dev[i];
 
-		if (dev) {
+		if (netif_running(dev)) {
 			struct skge_port *skge = netdev_priv(dev);
-			if (netif_running(dev)) {
-				netif_carrier_off(dev);
-				if (skge->wol)
-					netif_stop_queue(dev);
-				else
-					skge_down(dev);
-			}
-			netif_device_detach(dev);
+
+			netif_carrier_off(dev);
+			if (skge->wol)
+				netif_stop_queue(dev);
+			else
+				skge_down(dev);
 			wol |= skge->wol;
 		}
+		netif_device_detach(dev);
 	}
 
-	pci_save_state(pdev);
+	skge_write32(hw, B0_IMSK, 0);
 	pci_enable_wake(pdev, pci_choose_state(pdev, state), wol);
-	pci_disable_device(pdev);
 	pci_set_power_state(pdev, pci_choose_state(pdev, state));
 
 	return 0;
@@ -3477,23 +3476,33 @@ static int skge_suspend(struct pci_dev *
 static int skge_resume(struct pci_dev *pdev)
 {
 	struct skge_hw *hw  = pci_get_drvdata(pdev);
-	int i;
+	int i, err;
 
 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
 	pci_enable_wake(pdev, PCI_D0, 0);
 
-	skge_reset(hw);
+	err = skge_reset(hw);
+	if (err)
+		goto out;
 
-	for (i = 0; i < 2; i++) {
+	for (i = 0; i < hw->ports; i++) {
 		struct net_device *dev = hw->dev[i];
-		if (dev) {
-			netif_device_attach(dev);
-			if (netif_running(dev) && skge_up(dev))
+
+		netif_device_attach(dev);
+		if (netif_running(dev)) {
+			err = skge_up(dev);
+
+			if (err) {
+				printk(KERN_ERR PFX "%s: could not up: %d\n",
+				       dev->name, err);
 				dev_close(dev);
+				goto out;
+			}
 		}
 	}
-	return 0;
+out:
+	return err;
 }
 #endif
 

--
Stephen Hemminger [EMAIL PROTECTED]



Re: IPSec kernel oops on ppc64

2006-08-28 Thread David Miller
From: Joy Latten [EMAIL PROTECTED]
Date: Mon, 28 Aug 2006 17:25:15 -0500

 I can try patch-2.6.18-rc1, etc... to see which one it stops
 working on to narrow it down.

If you could do this in the meanwhile, it would help us out
a lot.

Thanks.


Re: [PATCH] pcnet32: fix user visible typo

2006-08-28 Thread Don Fry
The cause of #6428 has already been fixed in v1.32 of the pcnet32
driver.  To be correct, the printk should be:

printk(KERN_INFO PFX "%d card%s found\n",
	cards_found, cards_found != 1 ? "s" : "");

So that zero cards also says 'pcnet32: 0 cards found.'
Why delete the period from the end of the sentence?

On Tue, Aug 29, 2006 at 03:32:49AM +0400, Alexey Dobriyan wrote:
 Also, final dot removed and single form fixed. The cause of #6428 is
 still to be found.
 
 Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
 ---
 
  drivers/net/pcnet32.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 --- a/drivers/net/pcnet32.c
 +++ b/drivers/net/pcnet32.c
 @@ -2986,7 +2986,8 @@ static int __init pcnet32_init_module(vo
   pcnet32_probe_vlbus(pcnet32_portlist);
  
   if (cards_found && (pcnet32_debug & NETIF_MSG_PROBE))
 - printk(KERN_INFO PFX "%d cards_found.\n", cards_found);
 + printk(KERN_INFO PFX "%d card%s found\n",
 + cards_found, cards_found > 1 ? "s" : "");
  
   return (pcnet32_have_pci + cards_found) ? 0 : -ENODEV;
  }
 

-- 
Don Fry
[EMAIL PROTECTED]
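
The effect of testing != 1 instead of > 1 can be checked outside the kernel. The sketch below uses snprintf() in place of printk(); format_cards_found is a hypothetical helper, and the "pcnet32: " prefix and trailing period are taken from the example message quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats the probe summary; "s" is appended for every count except 1. */
static int format_cards_found(char *buf, size_t len, int cards_found)
{
	return snprintf(buf, len, "pcnet32: %d card%s found.",
			cards_found, cards_found != 1 ? "s" : "");
}
```

With this condition, counts of 0 and 2 both produce "cards" and only a count of 1 produces "card".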


Re: [PATCH 1/4] net: VM deadlock avoidance framework

2006-08-28 Thread Indan Zupancic
On Mon, August 28, 2006 19:32, Peter Zijlstra said:
 Also, I'm really past caring what the thing
 is called ;-) But if ppl object I guess its easy enough to run yet
 another sed command over the patches.

True, same here.

  You can get rid of the memalloc_reserve and vmio_request_queues variables
  if you want, they aren't really needed for anything. If using them reduces
  the total code size I'd keep them though.
 
  I find my version easier to read, but that might just be the way my
  brain works.

 Maybe true, but I believe my version is more natural in the sense that it
 makes more clear what the code is doing. Less bookkeeping, more real work,
 so to speak.

 Ok, I'll have another look at it, perhaps my gray matter has shifted ;-)

I don't care either way, just providing an alternative. I'd compile both and see
which one is smaller.


 Ah, no accident there, I'm fully aware that there would need to be a
 spinlock in adjust_memalloc_reserve() if there were another caller.
 (I even had it there for some time) - added comment.

Good that you're aware of it. Thing is, how much sense does the split-up into
adjust_memalloc_reserve() and sk_adjust_memalloc() make at this point? Why not
merge the code of adjust_memalloc_reserve() with sk_adjust_memalloc() and only
add adjust_memalloc_reserve() when it's really needed? It saves an export.

Feedback on the 28-Aug-2006 19:24 version from
programming.kicks-ass.net/kernel-patches/vm_deadlock/current/


 +void setup_per_zone_pages_min(void)
 +{
 + static DEFINE_SPINLOCK(lock);
 + unsigned long flags;
 +
 + spin_lock_irqsave(&lock, flags);
 + __setup_per_zone_pages_min();
 + spin_unlock_irqrestore(&lock, flags);
 +}

Better to put the lock next to min_free_kbytes, both for readability and
cache behaviour. It also satisfies the "lock data, not code" mantra.


 +static inline void * emergency_rx_alloc(size_t size, gfp_t gfp_mask)
 +{
 + void * page = NULL;
 +
 + if (size > PAGE_SIZE)
 + return page;
 +
 + if (atomic_add_unless(&emergency_rx_pages_used, 1, RX_RESERVE_PAGES)) {
 + page = (void *)__get_free_page(gfp_mask);
 + if (!page) {
 + WARN_ON(1);
 + atomic_dec(&emergency_rx_pages_used);
 + }
 + }
 +
 + return page;
 +}

If you prefer to avoid cmpxchg (which is often used in atomic_add_unless
and can be expensive) then you can use something like:

static inline void * emergency_rx_alloc(size_t size, gfp_t gfp_mask)
{
	void * page;

	if (size > PAGE_SIZE)
		return NULL;

	if (atomic_inc_return(&emergency_rx_pages_used) == RX_RESERVE_PAGES)
		goto out;
	page = (void *)__get_free_page(gfp_mask);
	if (page)
		return page;
	WARN_ON(1);
out:
	atomic_dec(&emergency_rx_pages_used);
	return NULL;
}

The tiny race should be totally harmless. Both versions are a bit big
to inline though.


 @@ -195,6 +196,86 @@ __u32 sysctl_rmem_default = SK_RMEM_MAX;
  /* Maximal space eaten by iovec or ancilliary data plus some space */
  int sysctl_optmem_max = sizeof(unsigned long)*(2*UIO_MAXIOV + 512);

 +static DEFINE_SPINLOCK(memalloc_lock);
 +static int memalloc_reserve;
 +static unsigned int vmio_request_queues;
 +
 +atomic_t vmio_socks;
 +atomic_t emergency_rx_pages_used;
 +EXPORT_SYMBOL_GPL(vmio_socks);

Is this export needed? It's only used in net/core/skbuff.c and net/core/sock.c,
which are compiled into one module.

 +EXPORT_SYMBOL_GPL(emergency_rx_pages_used);

Same here. It's only used by code in sock.c and skbuff.c, and no external
code calls emergency_rx_alloc(), nor emergency_rx_free().

--

I think I have exhausted my usefulness; there isn't much left for me to say.
It's up to the big guys to decide on the merit of this patch.
If Evgeniy's network allocator fixes all deadlocks and also has other
advantages, then great.

IMHO:

- This patch isn't really a framework, more a minimal fix for one specific,
though important problem. But it's small and doesn't have much impact
(numbers would be nice, e.g. vmlinux/modules size before and after, and
some network benchmark results).

- If Evgeniy's network allocator is as good as it looks, then why can't it
replace the existing one? Just adding private subsystem specific memory
allocators seems wrong. I might be missing the big picture, but it looks
like memory allocator things should be at least synchronized and discussed
with Christoph Lameter and his modular slab allocator patch.

All in all, it seems it will take a while until Evgeniy's code is merged,
so I think applying Peter's patch soonish and removing it again the moment it
becomes unnecessary is reasonable.

Greetings,

Indan
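
The two ways of bounding the emergency page counter discussed in this mail can be compared in user space with C11 atomics. This is a sketch, not the patch's code: reserve_cas, reserve_inc, release and RESERVE_LIMIT are hypothetical names, the compare-and-swap loop stands in for atomic_add_unless(), and the increment-first variant tests "greater than the limit" on the new value, which is an assumption of this sketch rather than the exact condition used in the mail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RESERVE_LIMIT 4	/* hypothetical stand-in for RX_RESERVE_PAGES */

static atomic_int pages_used;

/* Compare-and-swap loop: the counter never exceeds the limit. */
static bool reserve_cas(void)
{
	int old = atomic_load(&pages_used);

	while (old < RESERVE_LIMIT) {
		/* on failure 'old' is reloaded with the current value */
		if (atomic_compare_exchange_weak(&pages_used, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Increment first, back out on overshoot. No compare-and-swap, but the
 * counter can briefly sit above the limit while a caller backs out.
 */
static bool reserve_inc(void)
{
	if (atomic_fetch_add(&pages_used, 1) + 1 > RESERVE_LIMIT) {
		atomic_fetch_sub(&pages_used, 1);
		return false;
	}
	return true;
}

/* Gives one reserved page back. */
static void release(void)
{
	atomic_fetch_sub(&pages_used, 1);
}
```

The brief overshoot in reserve_inc() is the small race mentioned above: a concurrent caller may be refused even though a slot is about to be given back, but no caller is ever granted a page beyond the limit.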




Re: [PATCH v4 7/7] AMSO1100 Makefiles and Kconfig changes.

2006-08-28 Thread Roland Dreier
I'm finally getting around to merging this up, and:

  --- /dev/null
  +++ b/drivers/infiniband/hw/amso1100/README
  @@ -0,0 +1,11 @@
  +This is the OpenFabrics provider driver for the 
  +AMSO1100 1Gb RNIC adapter. 
  +
  +This adapter is available in limited quantities 
  +for development purposes from Open Grid Computing.
  +
  +This driver requires the IWCM and CMA mods necessary
  +to support iWARP.
  +
  +Contact [EMAIL PROTECTED] for more information.
  +

I don't think this belongs in the drivers directory.  In fact, is it
worth having this in the kernel at all?

How about if I just add a MAINTAINERS entry for amso1100 pointing at
[EMAIL PROTECTED] ?

 - R.


Re: [PATCH v4 7/7] AMSO1100 Makefiles and Kconfig changes.

2006-08-28 Thread Steve Wise

Sounds good to me.


- Original Message - 
From: Roland Dreier [EMAIL PROTECTED]

To: Steve Wise [EMAIL PROTECTED]
Cc: openib-general@openib.org; netdev@vger.kernel.org
Sent: Monday, August 28, 2006 6:07 PM
Subject: Re: [PATCH v4 7/7] AMSO1100 Makefiles and Kconfig changes.



I'm finally getting around to merging this up, and:

 --- /dev/null
 +++ b/drivers/infiniband/hw/amso1100/README
 @@ -0,0 +1,11 @@
 +This is the OpenFabrics provider driver for the 
 +AMSO1100 1Gb RNIC adapter. 
 +
 +This adapter is available in limited quantities 
 +for development purposes from Open Grid Computing.

 +
 +This driver requires the IWCM and CMA mods necessary
 +to support iWARP.
 +
 +Contact [EMAIL PROTECTED] for more information.
 +

I don't think this belongs in the drivers directory.  In fact, is it
worth having this in the kernel at all?

How about if I just add a MAINTAINERS entry for amso1100 pointing at
[EMAIL PROTECTED] ?

- R.




Re: IPSec kernel oops on ppc64

2006-08-28 Thread Herbert Xu
On Mon, Aug 28, 2006 at 05:25:15PM -0500, Joy Latten wrote:
 
 A straight 2.6.17 kernel does not crash and my pings work.
 A 2.6.17 + patch-2.6.18-rc4 does crash and my pings do not work.
 The above tests were done on a ppc64. 

Thanks for that info.  This does sound like a bug.

Could you please generate a dump of the stack/register contents and
a disassembly of the code around the crash?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] net/*: use SLAB_PANIC

2006-08-28 Thread Alexey Dobriyan
On Mon, Aug 28, 2006 at 01:36:37PM -0700, David Miller wrote:
  ipv6 can be modular, so panicking on an initialization failure is wrong.

 That may be the case, but he merely translated the code
 as it existed; he didn't change it to start panic()'ing,
 it already did.

 It would be a separate change to undo the panic() in
 the ipv6 code.

That separate change turned into a big cleanup of the IPv6 init/exit
code paths to fix the panic properly. It will be posted soon.



[PATCH]ethtool.c:fix buffer overflow when devname is too long

2006-08-28 Thread Zhao Yu Wang
The ifr_name field of struct ifreq is IFNAMSIZ (16) bytes long, as defined
in the header file /usr/include/net/if.h, so copying a longer devname into
it overflows the buffer. This patch changes strcpy to strncpy so that at
most IFNAMSIZ bytes are copied into struct ifreq, and adds a check to
parse_cmdline that rejects a devname that is too long.

diff -Nrup ethtool-4.orig/ethtool.c ethtool-4/ethtool.c
--- ethtool-4.orig/ethtool.c2006-07-18 21:21:38.0 -0500
+++ ethtool-4/ethtool.c 2006-08-27 22:32:12.0 -0500
@@ -626,6 +626,9 @@ static void parse_cmdline(int argc, char
 
if (devname == NULL) {
show_usage(1);
+   } else if (strlen(devname) > IFNAMSIZ) {
+   fprintf(stderr, "Device name is too long. Should be less than %d!\n", IFNAMSIZ);
+   show_usage(1);
}
 }
 
@@ -1139,7 +1142,7 @@ static int doit(void)
 
/* Setup our control structures. */
memset(&ifr, 0, sizeof(ifr));
-   strcpy(ifr.ifr_name, devname);
+   strncpy(ifr.ifr_name, devname, IFNAMSIZ);
 
/* Open control socket. */
fd = socket(AF_INET, SOCK_DGRAM, 0);





[PATCH -net-2.6.19] net/*: don't panic

2006-08-28 Thread Alexey Dobriyan
IPv6 can be modular and panicking on module loading is the last thing
you want.

Two SLAB_PANIC cases converted to error propagating as well as one
panic() call.

Signed-off-by: Alexey Dobriyan [EMAIL PROTECTED]
---

 I recall release is near, so error handling continues to suck. It needs
 big revamp anyway. init functions returning void. functions simply dropping
 -E... Partly shared with IPv4 :-(

 One more question: how can one unload ipv6? it seems to immediately get
 8 users here no matter what.

 include/net/ip6_fib.h   |2 +-
 include/net/ip6_route.h |2 +-
 include/net/transp_v6.h |2 +-
 net/ipv6/af_inet6.c |6 +-
 net/ipv6/ip6_fib.c  |8 +---
 net/ipv6/route.c|   14 +++---
 net/ipv6/tcp_ipv6.c |   19 ++-
 7 files changed, 38 insertions(+), 15 deletions(-)

--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -210,7 +210,7 @@ extern void fib6_run_gc(unsigned long 
 
 extern void	fib6_gc_cleanup(void);
 
-extern void	fib6_init(void);
+extern int	fib6_init(void);
 
 extern void	fib6_rules_init(void);
 extern void	fib6_rules_cleanup(void);
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -59,7 +59,7 @@ extern struct dst_entry * ip6_route_outp
 
 extern int	ip6_route_me_harder(struct sk_buff *skb);
 
-extern void	ip6_route_init(void);
+extern int	ip6_route_init(void);
 extern void	ip6_route_cleanup(void);
 
 extern int	ipv6_route_ioctl(unsigned int cmd, void __user *arg);
--- a/include/net/transp_v6.h
+++ b/include/net/transp_v6.h
@@ -24,7 +24,7 @@ extern void   ipv6_destopt_init(void);
 /* transport protocols */
 extern void	rawv6_init(void);
 extern void	udpv6_init(void);
-extern void	tcpv6_init(void);
+extern int	tcpv6_init(void);
 
 extern int	udpv6_connect(struct sock *sk,
 			      struct sockaddr *uaddr,
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -863,13 +863,17 @@ #endif
 
 	/* Init v6 transport protocols. */
 	udpv6_init();
-	tcpv6_init();
+	err = tcpv6_init();
+	if (err)
+		goto tcpv6_init_fail;
 
 	ipv6_packet_init();
 	err = 0;
 out:
 	return err;
 
+tcpv6_init_fail:
+	addrconf_cleanup();
 addrconf_fail:
 	ip6_flowlabel_cleanup();
 	ip6_route_cleanup();
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1468,14 +1468,16 @@ void fib6_run_gc(unsigned long dummy)
 	spin_unlock_bh(&fib6_gc_lock);
 }
 
-void __init fib6_init(void)
+int __init fib6_init(void)
 {
 	fib6_node_kmem = kmem_cache_create("fib6_nodes",
 					   sizeof(struct fib6_node),
-					   0, SLAB_HWCACHE_ALIGN|SLAB_PANIC,
+					   0, SLAB_HWCACHE_ALIGN,
 					   NULL, NULL);
-
+	if (!fib6_node_kmem)
+		return -ENOMEM;
 	fib6_tables_init();
+	return 0;
 }
 
 void fib6_gc_cleanup(void)
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2411,14 +2411,21 @@ ctl_table ipv6_route_table[] = {
 
 #endif
 
-void __init ip6_route_init(void)
+int __init ip6_route_init(void)
 {
 	struct proc_dir_entry *p;
+	int rv;
 
 	ip6_dst_ops.kmem_cachep =
 		kmem_cache_create("ip6_dst_cache", sizeof(struct rt6_info), 0,
-				  SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL, NULL);
-	fib6_init();
+				  SLAB_HWCACHE_ALIGN, NULL, NULL);
+	if (!ip6_dst_ops.kmem_cachep)
+		return -ENOMEM;
+	rv = fib6_init();
+	if (rv < 0) {
+		kmem_cache_destroy(ip6_dst_ops.kmem_cachep);
+		return rv;
+	}
 #ifdef CONFIG_PROC_FS
 	p = proc_net_create("ipv6_route", 0, rt6_proc_info);
 	if (p)
@@ -2432,6 +2439,7 @@ #endif
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
 	fib6_rules_init();
 #endif
+	return 0;
 }
 
 void ip6_route_cleanup(void)
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1644,14 +1644,23 @@ static struct inet_protosw tcpv6_protosw
 	INET_PROTOSW_ICSK,
 };
 
-void __init tcpv6_init(void)
+int __init tcpv6_init(void)
 {
+	int rv;
+
 	/* register inet6 protocol */
-	if (inet6_add_protocol(&tcpv6_protocol, IPPROTO_TCP) < 0)
+	rv = inet6_add_protocol(&tcpv6_protocol, IPPROTO_TCP);
+	if (rv < 0) {
 		printk(KERN_ERR "tcpv6_init: Could not register protocol\n");
+		return rv;
+	}
 	inet6_register_protosw(&tcpv6_protosw);
 
-   if 

Re: myri10ge conversion to non-contiguous skb

2006-08-28 Thread Brice Goglin
Jesse Brandeburg wrote:
 On 8/24/06, Brice Goglin [EMAIL PROTECTED] wrote:
 During the submission of the myri10ge driver, some people raised the
 question of using pages (or any kind of non-contiguous skb) instead of
 our current 16kB contiguous skb. We are looking at this right now and it
 is not clear what solution is the best. From what we understand, Linux
 provides two mostly redundant mechanisms to handle discontinuous skb,
 the skb-frags and the skb-frag_list, s2io using the latter while e1000
 uses the former. Is one or the other recommended? What is the purpose of
 having them both in the net core?

 you really only have one option, to use PAGE_SIZE pages and frags[]
 w/nr_frags.  e1000 tried the frag_list option but that is used by ip
 reassembly and badly conflicts with driver generated frag_list.

Ok, thanks for the clarification, we'll use frags then.

Is s2io going to be converted from frag_list to frags then?

Brice
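
As a rough illustration of what moving from one contiguous 16kB buffer to PAGE_SIZE pages in frags[] means, the arithmetic can be sketched as follows. This assumes a 4096-byte page; frags_needed and frag_size are hypothetical helpers, not kernel functions, and the sketch ignores any limit on the number of fragments per skb.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u	/* assumed page size */

/* Number of page-sized fragments needed to hold 'len' bytes. */
static unsigned int frags_needed(size_t len)
{
	return (unsigned int)((len + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE);
}

/* Bytes stored in fragment 'i' (0-based) of a 'len'-byte packet. */
static size_t frag_size(size_t len, unsigned int i)
{
	size_t off = (size_t)i * TOY_PAGE_SIZE;

	if (off >= len)
		return 0;
	return len - off < TOY_PAGE_SIZE ? len - off : TOY_PAGE_SIZE;
}
```

Under these assumptions a full 16kB buffer maps to four fragments, and a 9000-byte jumbo frame to three, the last one only partly filled.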

