[RFC] Net vm deadlock fix (take two)

2005-08-06 Thread Daniel Phillips
Hi,

This version does not do blatantly stupid things in hardware irq context, is
more efficient, and... wow the patch is smaller!  (That never happens.)

I don't mark skbs as being allocated from reserve any more.  That works, but
it is slightly bogus, because it doesn't matter which skb came from reserve,
it only matters that we put one back.  So I just count them and don't mark
them.

The tricky issue that had to be dealt with is the possibility that a massive 
number of skbs could in theory be queued by the hardware interrupt before the 
softnet softirq gets around to delivering them to the protocol.  But we can 
only allocate a limited number of skbs from reserve memory.  If we run out of 
reserve memory we have no choice but to fail skb allocations, and that can 
cause packets to be dropped.  Since we don't know which packets are blockio 
packets and which are not at that point, we could be so unlucky as to always 
drop block IO packets and always let other junk through.  The other junk 
won't get very far though, because those packets will be dropped as soon as 
the protocol headers are decoded, which will reveal that they do not belong 
to a memalloc socket.  This short circuit ought to help take away the cpu 
load that caused the softirq constipation in the first place.

What is actually going to happen is, a few block IO packets might be randomly 
dropped under such conditions, degrading the transport efficiency.  Block IO 
progress will continue, unless we manage to accidentally drop every block IO 
packet and our softirqs continue to stay comatose, probably indicating a 
scheduler bug.

OK, we want to allocate skbs from reserve, but we cannot go infinitely into 
reserve.  So we count how many packets a driver has allocated from reserve, 
in the net_device struct.  If this goes over some limit, the skb 
allocation fails and the device driver may drop a packet because of that.  
Note that the e1000 driver will just be trying to refill its rx-ring at this 
point, and it will try to refill it again as soon as the next packet arrives, 
so it is still some ways away from actually dropping a packet.  Other drivers 
may immediately drop a packet at this point, c'est la vie.  Remember, this 
can only happen if the softirqs are backed up a silly amount.
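
To make the accounting concrete, here is a minimal user-space sketch, not the patch itself. The names (toy_netdev, rx_reserve, rx_reserve_used, toy_dev_memalloc_skb, toy_low_memory) are made up for illustration, and malloc() stands in for the real skb allocator. It only shows the idea of counting reserve allocations per device and failing once a limit is hit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures. */
struct toy_netdev {
    int rx_reserve;       /* limit: how far this device may dip into reserve */
    int rx_reserve_used;  /* skbs currently charged against the reserve */
};

struct toy_skb {
    struct toy_netdev *dev;   /* plugged in by the allocator for accounting */
};

/* Simulates memory pressure: when true, only the reserve can satisfy us. */
static bool toy_low_memory;

/*
 * Allocate a receive buffer for dev.  Under memory pressure the allocation
 * is charged to the device's reserve budget; once the budget is used up the
 * allocation fails and the driver may have to drop a packet.
 */
static struct toy_skb *toy_dev_memalloc_skb(struct toy_netdev *dev)
{
    struct toy_skb *skb;
    bool charged = false;

    if (toy_low_memory) {
        if (dev->rx_reserve_used >= dev->rx_reserve)
            return NULL;              /* reserve budget exhausted */
        dev->rx_reserve_used++;       /* count it, do not mark the skb */
        charged = true;
    }

    skb = malloc(sizeof(*skb));
    if (!skb) {
        if (charged)
            dev->rx_reserve_used--;   /* undo the charge on failure */
        return NULL;
    }
    skb->dev = dev;
    assert(dev->rx_reserve_used <= dev->rx_reserve);
    return skb;
}
```

With a limit of 2, two allocations under pressure succeed and the third returns NULL, which is the point where a driver would skip refilling its rx-ring or drop a packet.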

The thing is, we have got our block IO traffic moving, by virtue of dipping 
into the reserve, and most likely moving at near-optimal speed.  Normal 
memory-consuming tasks are not moving because they are blocked on vm IO.  The 
things that can mess us up are cpu hogs - a scheduler problem - and tons of 
unhelpful traffic sharing the network wire, which we are killing off early as 
mentioned above.

What happens when a packet arrives at the protocol handler is a little subtle.  
At this point, if the interface is into reserve, we can always decrement the 
reserve count, regardless of what type of packet it is.  If it is a block IO 
packet, the packet is still accounted for within the block driver's 
throttling.  We are sure that the packet's resources will be returned to the 
common pool in an organized way.  If it is some other random kind of packet, 
we drop it right away, also returning the resources to the common pool.  
Either way, it is not the responsibility of the interface to account for it 
any more.
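
The delivery side can be sketched the same way. This is a user-space illustration under assumed names (toy_protocol_rcv, toy_sock.memalloc, rx_reserve_used), and it assumes the early drop only applies while the interface is into reserve:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_netdev { int rx_reserve_used; };   /* skbs charged to reserve */
struct toy_sock   { bool memalloc; };         /* true for a block IO socket */
struct toy_skb    { struct toy_netdev *dev; };

enum toy_verdict { TOY_DELIVER, TOY_DROP };

/*
 * Called once the protocol headers have been decoded and the destination
 * socket is known.
 */
static enum toy_verdict toy_protocol_rcv(struct toy_skb *skb,
                                         const struct toy_sock *sk)
{
    struct toy_netdev *dev = skb->dev;
    bool in_reserve = dev->rx_reserve_used > 0;

    /* Whatever the packet type, the interface stops accounting for it. */
    if (in_reserve)
        dev->rx_reserve_used--;
    assert(dev->rx_reserve_used >= 0);

    /*
     * A block IO packet is now covered by the block driver's throttling.
     * Anything else is dropped right away while we are into reserve.
     */
    if (in_reserve && !sk->memalloc)
        return TOY_DROP;
    return TOY_DELIVER;
}
```

Either branch decrements the count, which matches the point above that it does not matter which skb came from reserve, only that one is put back.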

I'll just reiterate what I'm trying to accomplish here:

   1) Guarantee network block io forward progress
   2) Block IO throughput should not degrade much under low memory

This iteration of the patch addresses those goals nicely, I think.  I have not 
yet shown how to drive this from the block IO layer, and I haven't shown how 
to be sure that all protocols on an interface (not just TCPv4, as here) can 
handle the reserve management semantics.  I have ignored all transports 
besides IP, though not much changes for other transports.  I have some 
accounting code that is very probably racy and needs to be rewritten with 
atomic_t.  I have ignored the many hooks that are possible in the protocol 
path.  I have assumed that all receive skbs are the same size, and haven't 
accounted for the possibility that that size (MTU) might change.  All these 
things need looking at, but the main point at the moment is to establish a 
solid sense of correctness and to get some real results on a vanilla delivery 
path.  That in itself will be useful for cluster work, where configuration
issues are kept under careful control.
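
On the racy accounting: a lock-free bounded counter is one way it could be rewritten. The sketch below uses C11 <stdatomic.h> in user space as a stand-in for atomic_t; the names (toy_reserve, toy_reserve_charge, toy_reserve_uncharge) are hypothetical and this is not what the patch currently does:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_reserve {
    atomic_int used;   /* skbs currently charged to the reserve */
    int limit;         /* maximum allowed value of used */
};

/* Charge one skb; succeeds only while used < limit, without taking a lock. */
static bool toy_reserve_charge(struct toy_reserve *r)
{
    int cur = atomic_load(&r->used);

    assert(r->limit >= 0);
    while (cur < r->limit) {
        /* On failure cur is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&r->used, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Give one skb's worth of reserve back; never drops below zero. */
static bool toy_reserve_uncharge(struct toy_reserve *r)
{
    int cur = atomic_load(&r->used);

    while (cur > 0) {
        if (atomic_compare_exchange_weak(&r->used, &cur, cur - 1))
            return true;
    }
    return false;
}
```

The compare-and-exchange loop keeps the limit check and the increment in one atomic step, so two contexts racing on the last free slot cannot both succeed.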

As far as drivers are concerned, the new interface is dev_memalloc_skb, which 
is straightforward.  It needs to know about the netdev for accounting 
purposes, so it takes it as a parameter and thoughtfully plugs it into the 
skb for you.

I am still using the global memory reserve, not mempool.  But notice, now I am 
explicitly accounting and throttling how deep a driver dips into the global 
reserve.  So GFP_MEMALLOC wins a point: the driver isn't just using the 
global reserve blindly, as has been traditional.  The jury is still out 

Re: [PATCH] netpoll can lock up on low memory.

2005-08-06 Thread Daniel Phillips
On Saturday 06 August 2005 12:32, Steven Rostedt wrote:
> > > If you need to really get the data out, then the design should be
> > > changed.  Have some return value showing the failure, check for
> > > oops_in_progress or whatever, and try again after turning interrupts
> > > back on, and getting to a point where the system can free up memory
> > > (write to swap, etc).  Just a busy loop without ever getting a skb is
> > > just bad.
> >
> > Why, pray tell, do you think there will be a second chance after
> > re-enabling interrupts? How does this work when we're panicking or
> > oopsing where we most care? How does this work when the netpoll client
> > is the kernel debugger and the machine is completely stopped because
> > we're tracing it?
>
> What I meant was to check for an oops and maybe then don't break out.
> Otherwise let the system try to reclaim memory. Since this is locked
> when the alloc_skb called with GFP_ATOMIC and fails.

You might want to take a look at my stupid little __GFP_MEMALLOC hack in the 
network block IO deadlock thread on netdev.  It will let you use the memalloc 
reserve from atomic context.  As long as you can be sure your usage will be 
bounded and you will eventually give it back, this should be ok.

Regards,

Daniel
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netpoll can lock up on low memory.

2005-08-06 Thread Ingo Molnar

* Andi Kleen [EMAIL PROTECTED] wrote:

> On Fri, Aug 05, 2005 at 01:01:57PM -0700, Matt Mackall wrote:
> > The netpoll philosophy is to assume that its traffic is an absolute
> > priority - it is better to potentially hang trying to deliver a panic
> > message than to give up and crash silently.
>
> That would be ok if netpoll was only used to deliver panics. But it is
> not. It delivers all messages, and you cannot hang the kernel during
> that. Actually even for panics it is wrong, because often it is more
> important to reboot in a panic than (with a panic timeout) to actually
> deliver the panic. That's needed e.g. in a failover cluster.

without going into the merits of this discussion, reliable failover 
clusters must include (and do include) an external ability to cut power.  
No amount of in-kernel logic will prevent the kernel from hanging, given 
a bad enough kernel bug.

So the right question is not 'can we prevent the kernel from hanging, 
ever' (we cannot), but 'which change makes it less likely for the kernel 
to hang'. (and, obviously: assuming all other kernel components are 
functioning per specification, netpoll itself must not hang :-)

even a plain printk to VGA can hang in certain kernel crashes. Netpoll 
is more complex and thus has more exposure to hangs. E.g. netpoll relies 
on the network driver to correctly recycle skbs within a bound amount of 
time. If the network driver leaks skbs, it's game over for netpoll.

[ i'd prefer a hang over nondeterministic behavior, and e.g. losing 
  console messages is sure nondeterministic behavior. What if the 
  console message is "WARNING: the box has just been broken into"? ]

we could do one thing (see the patch below): i think it would be useful 
to fill up the netlogging skb queue straight at initialization time.  
Especially if netpoll is used for dumping alone, the system might not be 
in a situation to fill up the queue at the point of crash, so better be 
a bit more prepared and keep the pipeline filled.
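
As a rough user-space illustration of the "fill the queue at setup time" idea (the names toy_pool, toy_refill, toy_setup, toy_get and the depth TOY_MAX_SKBS are assumptions for this sketch, not taken from netpoll):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_SKBS 32   /* assumed pool depth, chosen for illustration */

struct toy_pool {
    void *bufs[TOY_MAX_SKBS];
    size_t count;
};

/* Top the pool up to TOY_MAX_SKBS; stops quietly if allocation fails. */
static void toy_refill(struct toy_pool *p, size_t buf_size)
{
    while (p->count < TOY_MAX_SKBS) {
        void *b = malloc(buf_size);

        if (!b)
            break;
        p->bufs[p->count++] = b;
    }
    assert(p->count <= TOY_MAX_SKBS);
}

/* Setup fills the pool immediately, so buffers exist before any crash. */
static void toy_setup(struct toy_pool *p, size_t buf_size)
{
    p->count = 0;
    toy_refill(p, buf_size);
}

/* Take a pre-allocated buffer; needs no allocation at the time of use. */
static void *toy_get(struct toy_pool *p)
{
    if (p->count == 0)
        return NULL;
    return p->bufs[--p->count];
}
```

The point of calling the refill from setup is that toy_get() can still hand out buffers later, when the allocator may no longer be usable.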

Ingo

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]

--- net/core/netpoll.c.orig
+++ net/core/netpoll.c
@@ -720,6 +720,8 @@ int netpoll_setup(struct netpoll *np)
 	}
 	/* last thing to do is link it to the net device structure */
 	ndev->npinfo = npinfo;
+	/* fill up the skb queue */
+	refill_skbs();
 
 	return 0;
 


Re: critical section violation in tg3.c?

2005-08-06 Thread David S. Miller

Simply do the pci_save_state before the register_netdev()
call, no need to mess around with the locking.


Re: test

2005-08-06 Thread David S. Miller
From: Daniel Phillips [EMAIL PROTECTED]
Date: Sat, 6 Aug 2005 04:52:07 +1000

> So then there is no choice but to throttle the per-cpu ->input_pkt queues.

Make the driver support NAPI if you want device fairness.


Re: ICMP broken in 2.6.13-rc5

2005-08-06 Thread Harald Welte
On Sat, Aug 06, 2005 at 02:08:15AM +0400, Vladimir B. Savkin wrote:
> I found that it really is NOTRACK that causes the bogus ICMP errors.

Well, this means that your ICMP errors need to be NAT'ed but they
cannot, since the original connection causing the ICMP error did not go
through connection tracking.

Your not-correctly-NATed ICMP packets are the logical result of this
configuration.

Use of NOTRACK in combination with NAT is _extremely_ dangerous, and
unless you understand its full implications, I would not recommend
combining the two.

So it seems your use of NOTRACK is invalid in this setup, and thus this
looks like a configuration problem.

-- 
- Harald Welte [EMAIL PROTECTED]  http://gnumonks.org/

Privacy in residential applications is a desirable marketing option.
  (ETSI EN 300 175-7 Ch. A6)




[PATCH 5/6][INET] Generalise tcp_v4_hash & tcp_unhash

2005-08-06 Thread Arnaldo Carvalho de Melo
David,

First set of changesets, please consider pulling from:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git

- Arnaldo

tree 7095737bc15a06613ef809457f95847e88a66550
parent f48ce924d611ea239cc3527235c2d926715564bb
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1122957550 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1122957550 -0300

[INET] Generalise tcp_v4_hash & tcp_unhash

This just turns the existing code into helper functions that tcp_v4_hash
and tcp_unhash use, passing in the right inet_hashinfo, tcp_hashinfo.

One thing I'll investigate at some point is to have the inet_hashinfo pointer
in sk_prot, so that we get all the hashtable information from the sk pointer.
This can lead to some extra indirections that may well hurt performance/code
size, we'll see. The ultimate idea would be for sk_prot to provide _all_ the
information about a protocol implementation.
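
The shape of the refactoring, reduced to a user-space toy: the generic helper takes the hash table as a parameter and each protocol keeps a one-line wrapper. All toy_* names are hypothetical, the "table" is just a counter, and the is_hashed flag is an addition of this sketch:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_hashinfo { int hashed; };               /* stand-in for inet_hashinfo */
struct toy_sock { bool closed; bool is_hashed; };  /* stand-in for struct sock */

/* Generic helper: the table is a parameter, not a hard-wired global. */
static void toy_inet_hash(struct toy_hashinfo *hashinfo, struct toy_sock *sk)
{
    if (!sk->closed && !sk->is_hashed) {
        sk->is_hashed = true;
        hashinfo->hashed++;
    }
}

static void toy_inet_unhash(struct toy_hashinfo *hashinfo, struct toy_sock *sk)
{
    if (!sk->is_hashed)
        return;
    sk->is_hashed = false;
    hashinfo->hashed--;
    assert(hashinfo->hashed >= 0);
}

/* Per-protocol tables with thin wrappers around the generic helper. */
static struct toy_hashinfo toy_tcp_hashinfo;
static struct toy_hashinfo toy_dccp_hashinfo;

static void toy_tcp_hash(struct toy_sock *sk)
{
    toy_inet_hash(&toy_tcp_hashinfo, sk);
}

static void toy_dccp_hash(struct toy_sock *sk)
{
    toy_inet_hash(&toy_dccp_hashinfo, sk);
}
```

A second protocol then reuses the helper by supplying its own table, which is presumably the motivation with DCCP on the way.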

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]

--

 include/net/inet_hashtables.h |   34 ++
 net/ipv4/tcp_ipv4.c   |   29 ++---
 2 files changed, 36 insertions(+), 27 deletions(-)

--

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -240,4 +240,38 @@ static inline void __inet_hash(struct in
 	if (listen_possible && sk->sk_state == TCP_LISTEN)
 		wake_up(&hashinfo->lhash_wait);
 }
+
+static inline void inet_hash(struct inet_hashinfo *hashinfo, struct sock *sk)
+{
+	if (sk->sk_state != TCP_CLOSE) {
+		local_bh_disable();
+		__inet_hash(hashinfo, sk, 1);
+		local_bh_enable();
+	}
+}
+
+static inline void inet_unhash(struct inet_hashinfo *hashinfo, struct sock *sk)
+{
+	rwlock_t *lock;
+
+	if (sk_unhashed(sk))
+		goto out;
+
+	if (sk->sk_state == TCP_LISTEN) {
+		local_bh_disable();
+		inet_listen_wlock(hashinfo);
+		lock = &hashinfo->lhash_lock;
+	} else {
+		struct inet_ehash_bucket *head = &hashinfo->ehash[sk->sk_hashent];
+		lock = &head->lock;
+		write_lock_bh(&head->lock);
+	}
+
+	if (__sk_del_node_init(sk))
+		sock_prot_dec_use(sk->sk_prot);
+	write_unlock_bh(lock);
+out:
+	if (sk->sk_state == TCP_LISTEN)
+		wake_up(&hashinfo->lhash_wait);
+}
 #endif /* _INET_HASHTABLES_H */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -225,37 +225,12 @@ fail:
 
 static void tcp_v4_hash(struct sock *sk)
 {
-	if (sk->sk_state != TCP_CLOSE) {
-		local_bh_disable();
-		__inet_hash(&tcp_hashinfo, sk, 1);
-		local_bh_enable();
-	}
+	inet_hash(&tcp_hashinfo, sk);
 }
 
 void tcp_unhash(struct sock *sk)
 {
-	rwlock_t *lock;
-
-	if (sk_unhashed(sk))
-		goto ende;
-
-	if (sk->sk_state == TCP_LISTEN) {
-		local_bh_disable();
-		inet_listen_wlock(&tcp_hashinfo);
-		lock = &tcp_hashinfo.lhash_lock;
-	} else {
-		struct inet_ehash_bucket *head = &tcp_hashinfo.ehash[sk->sk_hashent];
-		lock = &head->lock;
-		write_lock_bh(&head->lock);
-	}
-
-	if (__sk_del_node_init(sk))
-		sock_prot_dec_use(sk->sk_prot);
-	write_unlock_bh(lock);
-
- ende:
-	if (sk->sk_state == TCP_LISTEN)
-		wake_up(&tcp_hashinfo.lhash_wait);
+	inet_unhash(&tcp_hashinfo, sk);
 }
 
 /* Don't inline this cruft.  Here are some nice properties to


[PATCH 6/6][INET] Generalise tcp_v4_lookup_listener

2005-08-06 Thread Arnaldo Carvalho de Melo
David,

First set of changesets, please consider pulling from:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git

- Arnaldo

tree 74a7900b3b8a414e7bd2703d46ab098cb3058c97
parent 31c00831e34dd1da084057326655a0a080ba5fb2
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1122962893 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1122962893 -0300

[INET] Generalise tcp_v4_lookup_listener

[EMAIL PROTECTED] net-2.6.14]$ grep built-in /tmp/before /tmp/after
/tmp/before: 282560   13122    9312  304994   4a762 net/ipv4/built-in.o
/tmp/after:  282560   13122    9312  304994   4a762 net/ipv4/built-in.o

Will be used in DCCP, not exporting it right now not to get in Adrian Bunk's
exported-but-not-used-on-modules radar 8)

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]

--

 include/net/inet_hashtables.h |   36 ++
 net/ipv4/inet_hashtables.c|   41 +
 net/ipv4/tcp_ipv4.c   |   81 ++
 3 files changed, 82 insertions(+), 76 deletions(-)

--

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -16,8 +16,10 @@
 
 #include <linux/interrupt.h>
 #include <linux/ip.h>
+#include <linux/ipv6.h>
 #include <linux/list.h>
 #include <linux/slab.h>
+#include <linux/socket.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/wait.h>
@@ -274,4 +276,38 @@ out:
 	if (sk->sk_state == TCP_LISTEN)
 		wake_up(&hashinfo->lhash_wait);
 }
+
+extern struct sock *__inet_lookup_listener(const struct hlist_head *head,
+					   const u32 daddr,
+					   const unsigned short hnum,
+					   const int dif);
+
+/* Optimize the common listener case. */
+static inline struct sock *inet_lookup_listener(struct inet_hashinfo *hashinfo,
+						const u32 daddr,
+						const unsigned short hnum,
+						const int dif)
+{
+	struct sock *sk = NULL;
+	struct hlist_head *head;
+
+	read_lock(&hashinfo->lhash_lock);
+	head = &hashinfo->listening_hash[inet_lhashfn(hnum)];
+	if (!hlist_empty(head)) {
+		const struct inet_sock *inet = inet_sk((sk = __sk_head(head)));
+
+		if (inet->num == hnum && !sk->sk_node.next &&
+		    (!inet->rcv_saddr || inet->rcv_saddr == daddr) &&
+		    (sk->sk_family == PF_INET || !ipv6_only_sock(sk)) &&
+		    !sk->sk_bound_dev_if)
+			goto sherry_cache;
+		sk = __inet_lookup_listener(head, daddr, hnum, dif);
+	}
+	if (sk) {
+sherry_cache:
+		sock_hold(sk);
+	}
+	read_unlock(&hashinfo->lhash_lock);
+	return sk;
+}
 #endif /* _INET_HASHTABLES_H */
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -121,3 +121,44 @@ void inet_listen_wlock(struct inet_hashi
 }
 
 EXPORT_SYMBOL(inet_listen_wlock);
+
+/*
+ * Don't inline this cruft. Here are some nice properties to exploit here. The
+ * BSD API does not allow a listening sock to specify the remote port nor the
+ * remote address for the connection. So always assume those are both
+ * wildcarded during the search since they can never be otherwise.
+ */
+struct sock *__inet_lookup_listener(const struct hlist_head *head, const u32 daddr,
+				    const unsigned short hnum, const int dif)
+{
+	struct sock *result = NULL, *sk;
+	const struct hlist_node *node;
+	int hiscore = -1;
+
+	sk_for_each(sk, node, head) {
+		const struct inet_sock *inet = inet_sk(sk);
+
+		if (inet->num == hnum && !ipv6_only_sock(sk)) {
+			const __u32 rcv_saddr = inet->rcv_saddr;
+			int score = sk->sk_family == PF_INET ? 1 : 0;
+
+			if (rcv_saddr) {
+				if (rcv_saddr != daddr)
+					continue;
+				score += 2;
+			}
+			if (sk->sk_bound_dev_if) {
+				if (sk->sk_bound_dev_if != dif)
+					continue;
+				score += 2;
+			}
+			if (score == 5)
+				return sk;
+			if (score > hiscore) {
+				hiscore = score;
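
The diff is cut off at this point in the archive. For readers who want the scoring idea in one piece, here is a self-contained user-space sketch over a plain array. The toy_* names and the struct layout are invented for illustration, and remembering the best candidate in result is an assumption about the part of the loop that is not shown:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_listener {
    uint16_t port;        /* local port the socket listens on */
    uint32_t rcv_saddr;   /* bound local address, 0 = wildcard */
    int bound_dev_if;     /* bound interface index, 0 = any */
    int is_inet;          /* 1 for a PF_INET socket */
};

/*
 * Pick the most specific listener for (daddr, hnum, dif):
 * +1 for PF_INET, +2 for a matching bound address, +2 for a matching
 * bound device.  A score of 5 cannot be beaten, so return it at once.
 */
static const struct toy_listener *
toy_lookup_listener(const struct toy_listener *tab, size_t n,
                    uint32_t daddr, uint16_t hnum, int dif)
{
    const struct toy_listener *result = NULL;
    int hiscore = -1;

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct toy_listener *sk = &tab[i];
        int score;

        if (sk->port != hnum)
            continue;
        score = sk->is_inet ? 1 : 0;
        if (sk->rcv_saddr) {
            if (sk->rcv_saddr != daddr)
                continue;       /* bound to a different address */
            score += 2;
        }
        if (sk->bound_dev_if) {
            if (sk->bound_dev_if != dif)
                continue;       /* bound to a different device */
            score += 2;
        }
        if (score == 5)
            return sk;
        if (score > hiscore) {
            hiscore = score;
            result = sk;
        }
    }
    return result;
}
```

Wildcard listeners still match, they simply lose to any listener that is bound more tightly to the incoming address or interface.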
+

Re: [PATCH 6/6][INET] Generalise tcp_v4_lookup_listener

2005-08-06 Thread David S. Miller

All pulled, as well as your dccp-2.6.14 tree, into net-2.6.14

It should show up on the kernel.org mirrors shortly.


Re: [PATCH 6/6][INET] Generalise tcp_v4_lookup_listener

2005-08-06 Thread Arnaldo Carvalho de Melo
On 8/6/05, David S. Miller [EMAIL PROTECTED] wrote:
 
> All pulled, as well as your dccp-2.6.14 tree, into net-2.6.14
>
> It should show up on the kernel.org mirrors shortly.


WOW, that was fast, thank you! I'll be just one e-mail away to work
right away on fixing any bug introduced by these changesets! But
there should be none 8-)

- Arnaldo


ANNOUNCE: Linux DCCP implementation merged

2005-08-06 Thread Arnaldo Carvalho de Melo
Hi Guys,

I'm very pleased to announce that the Linux 2.6 DCCP implementation
has been merged in David Miller's net-2.6.14.git tree, and should appear
shortly on Andrew Morton's 2.6.13-rcLATEST-mm tree and finally in mainline
when Linus starts 2.6.14.

There is still a lot of work to be done, but this is a milestone
to celebrate!

Now to work on:

1. Getting the DCCP CCID infrastructure closer to the TCP Congestion
   Avoidance one

2. Finish the generalisation of TCP TIMEWAIT minisockets and make DCCP
   use it properly

3. Fully generalise net/ipv4/tcp_diag.c into net/core/net_diag.c so that
   we have all of the iproute2/netlink functionality

4. Implement CCID2

5. Implement the remaining options processing

6. Implement feature negotiation so that the interop tests with Joacim's
   FreeBSD and Nishida-san NetBSD implementations move along faster

7. Polish CCID3 getting it up to the latest standards, probably moving
   the packet history handling to the core and make it selectable by the
   CCIDs, like we already have the initial support for ACK Vectors

8. Reimplement dccp_{sendmsg,recvmsg} more intelligently, closer to the
   TCP implementation (this is closely related to #1 above).

9. Implement all the CCIDs Sally, Eddie and others come up with! :-)

Thanks a lot to all involved!

- Arnaldo


Re: [PATCH] netpoll can lock up on low memory.

2005-08-06 Thread Steven Rostedt
On Sat, 2005-08-06 at 02:46 -0700, David S. Miller wrote:
> Can you guys stop peeing your pants over this, put aside
> your differences, and work on a mutually acceptable fix
> for these bugs?
>
> Much appreciated, thanks :-)

In my last email, I stated that this discussion seems to have
demonstrated that the e1000 driver's netpoll is indeed broken, and needs
to be fixed.  I submitted a patch for this earlier, but it's untested
and someone who owns an e1000 needs to try it.

As for all the netpoll issues, I'm satisfied with whatever you guys
decide.  But I've seen lots of problems posted over the netpoll and
e1000, where people send in patches that do everything but fix the
e1000, and that's where I chimed in.

Thank you, my pants are dry now :-)

-- Steve




Re: [PATCH] IPSec anti-replay sequence numbers

2005-08-06 Thread Ulrich Weber

KOVACS Krisztian wrote:
> Hi,
>
> On Friday 05 August 2005 12.50, Patrick McHardy wrote:
> > Is there already userspace code which uses this feature somewhere?
>
> AFAIK Ulrich has a patch for OpenSWAN, and we (Balabit) have a patch
> for racoon. Unfortunately this racoon version is available only as a
> commercial product.

The patch for openswan is nearly finished and will be released around 
the end of this year.
In my first post I split the patch into three pieces, two to get the 
sequence numbers with pf_key and netlink/xfrm, and one to set/inform 
about the sequence numbers over netlink/xfrm.


IMHO the first two are useful for everyone using ipsec under linux, so 
it would be great if these two would flow into the vanilla kernel.
For the latter one, it must be determined whether it's useful to add it to 
the vanilla kernel and, if so, in which form.


Best regards
Ulrich




Re: [PATCH] netpoll can lock up on low memory.

2005-08-06 Thread Andi Kleen
On Sat, Aug 06, 2005 at 09:45:03AM +0200, Ingo Molnar wrote:
 
> * Andi Kleen [EMAIL PROTECTED] wrote:
>
> > On Fri, Aug 05, 2005 at 01:01:57PM -0700, Matt Mackall wrote:
> > > The netpoll philosophy is to assume that its traffic is an absolute
> > > priority - it is better to potentially hang trying to deliver a panic
> > > message than to give up and crash silently.
> >
> > That would be ok if netpoll was only used to deliver panics. But it is
> > not. It delivers all messages, and you cannot hang the kernel during
> > that. Actually even for panics it is wrong, because often it is more
> > important to reboot in a panic than (with a panic timeout) to actually
> > deliver the panic. That's needed e.g. in a failover cluster.
>
> without going into the merits of this discussion, reliable failover
> clusters must include (and do include) an external ability to cut power.
> No amount of in-kernel logic will prevent the kernel from hanging, given
> a bad enough kernel bug.

Ok, true, but we should do a best effort.

 
> So the right question is not 'can we prevent the kernel from hanging,
> ever' (we cannot), but 'which change makes it less likely for the kernel
> to hang'. (and, obviously: assuming all other kernel components are
> functioning per specification, netpoll itself must not hang :-)
>
> even a plain printk to VGA can hang in certain kernel crashes. Netpoll
> is more complex and thus has more exposure to hangs. E.g. netpoll relies
> on the network driver to correctly recycle skbs within a bound amount of
> time. If the network driver leaks skbs, it's game over for netpoll.

I don't think we even need to think about such rare cases
until the easy cases (everything hangs when the cable is pulled)
are fixed.

> [ i'd prefer a hang over nondeterministic behavior, and e.g. losing
>   console messages is sure nondeterministic behavior. What if the
>   console message is "WARNING: the box has just been broken into"? ]

That just makes netconsole useless in production. If it causes frequent
hangs people will not use it.


 
> we could do one thing (see the patch below): i think it would be useful
> to fill up the netlogging skb queue straight at initialization time.
> Especially if netpoll is used for dumping alone, the system might not be
> in a situation to fill up the queue at the point of crash, so better be
> a bit more prepared and keep the pipeline filled.

You're solving a completely different issue here?

-Andi



Re: assertion (cnt <= tp->packets_out) failed

2005-08-06 Thread John Bäckstrand

> Hang on a second, the original poster mentioned rc5.  Is this really
> pristine rc5 with the one netpoll patch? If so then it can't be the
> patches we're talking about because they only went in days later.


Yes, I have no other patches in, so if it was not in -RC5, I was not 
running it.


---
John Bäckstrand


Re: [PATCH] netpoll can lock up on low memory.

2005-08-06 Thread John Bäckstrand

Steven Rostedt wrote:


> In my last email, I stated that this discussion seems to have
> demonstrated that the e1000 driver's netpoll is indeed broken, and needs
> to be fixed.  I submitted a patch for this earlier, but it's untested
> and someone who owns an e1000 needs to try it.


I can test this, but not right now: I'm trying, again, to find my hard 
lockup issue, and so I will try to run this machine until it locks up. 
It lasted 9 days at one time, so it could potentially take some time, 
I'm afraid.


---
John Bäckstrand


Re: [PATCH 1/1][INET] Make inet_create try to load protocol modules

2005-08-06 Thread David S. Miller
From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
Date: Sat, 6 Aug 2005 10:01:05 -0300

> +			/* Be more specific, e.g. net-pf-2-132-1 (net-pf-PF_INET-IPPROTO_SCTP-SOCK_STREAM) */
> +			if (++try_loading_module == 1)
> +				request_module("net-proto-%d-%d-%d", PF_INET, protocol, sock->type);

Your comments don't match the strings you are actually
building in request_module() ie. net-pf-* vs. net-proto-*.
Please make them be consistent.


Re: [PATCH 1/1][INET] Make inet_create try to load protocol modules

2005-08-06 Thread Arnaldo Carvalho de Melo
Em Sat, Aug 06, 2005 at 06:24:35AM -0700, David S. Miller escreveu:
> From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
> Date: Sat, 6 Aug 2005 10:01:05 -0300
>
> > +			/* Be more specific, e.g. net-pf-2-132-1 (net-pf-PF_INET-IPPROTO_SCTP-SOCK_STREAM) */
> > +			if (++try_loading_module == 1)
> > +				request_module("net-proto-%d-%d-%d", PF_INET, protocol, sock->type);
>
> Your comments don't match the strings you are actually
> building in request_module() ie. net-pf-* vs. net-proto-*.
> Please make them be consistent.

OK, I'll do this later, lack of sleep must be the reason for this mistake
:-\

-- 


- Arnaldo


Re: [PATCH 2.6 6/5] tg3: Fix bug in setting a tg3_flag

2005-08-06 Thread David S. Miller

Michael, I've added all 6 patches to my net-2.6.14 tree.
It should show up on the kernel.org GIT mirrors shortly.

I decided against sticking this into 2.6.13, as these changes
can introduce regressions and the space of users affected by
this problem is decidedly small compared to how many could be
affected by any error in these changes.

Thanks.


Re: assertion (cnt <= tp->packets_out) failed

2005-08-06 Thread David S. Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Sat, 6 Aug 2005 17:57:17 +1000

> Hang on a second, the original poster mentioned rc5.  Is this really
> pristine rc5 with the one netpoll patch? If so then it can't be the
> patches we're talking about because they only went in days later.

This seems to be confirmed now... so I'll hold off on the revert
for now.


[PATCH 1/1][INET] Make inet_create try to load protocol modules

2005-08-06 Thread Arnaldo Carvalho de Melo
Em Sat, Aug 06, 2005 at 06:24:35AM -0700, David S. Miller escreveu:
> From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
> Date: Sat, 6 Aug 2005 10:01:05 -0300
>
> > +			/* Be more specific, e.g. net-pf-2-132-1 (net-pf-PF_INET-IPPROTO_SCTP-SOCK_STREAM) */
> > +			if (++try_loading_module == 1)
> > +				request_module("net-proto-%d-%d-%d", PF_INET, protocol, sock->type);
>
> Your comments don't match the strings you are actually
> building in request_module() ie. net-pf-* vs. net-proto-*.
> Please make them be consistent.

Fixed:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git

I checked and the mirrors picked this one already.

- Arnaldo

tree 13278f7cf4453ec1bc5d9e2f45bd5cd250f7ce18
parent 16963c77a4472768f6c04d14681584a118f6a7f4
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123337601 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123337601 -0300

[INET] Make inet_create try to load protocol modules

Syntax is net-proto-PROTOCOL_FAMILY-PROTOCOL-SOCK_TYPE and if this fails
net-proto-PROTOCOL_FAMILY-PROTOCOL.
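
The two-step alias scheme can be tried out in user space. In this sketch the loader is a callback standing in for request_module(), and every toy_* name is hypothetical. Unlike the patch, which repeats the protocol lookup after each load attempt, the sketch simply checks the callback's return value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_PF_INET 2

/* Hypothetical loader: returns 0 if some module matches the alias. */
typedef int (*toy_loader_fn)(const char *alias);

/*
 * Try net-proto-FAMILY-PROTOCOL-SOCK_TYPE first, then fall back to
 * net-proto-FAMILY-PROTOCOL.  On success the alias that matched is
 * copied into used and 0 is returned; otherwise -1.
 */
static int toy_load_protocol(int family, int protocol, int sock_type,
                             toy_loader_fn load, char *used, size_t used_len)
{
    char alias[64];

    assert(load != NULL);
    snprintf(alias, sizeof(alias), "net-proto-%d-%d-%d",
             family, protocol, sock_type);
    if (load(alias) != 0) {
        /* Specific alias not found: retry with the generic form. */
        snprintf(alias, sizeof(alias), "net-proto-%d-%d", family, protocol);
        if (load(alias) != 0)
            return -1;
    }
    snprintf(used, used_len, "%s", alias);
    return 0;
}

/* Demo loader that only knows the generic SCTP alias (protocol 132). */
static int toy_demo_loader(const char *alias)
{
    return strcmp(alias, "net-proto-2-132") == 0 ? 0 : -1;
}
```

Calling toy_load_protocol(TOY_PF_INET, 132, 1, toy_demo_loader, ...) misses on net-proto-2-132-1 and then matches net-proto-2-132, the same string the SCTP MODULE_ALIAS in this patch produces.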

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]

--

 dccp/proto.c|9 +++--
 ipv4/af_inet.c  |   21 +
 sctp/protocol.c |4 
 3 files changed, 28 insertions(+), 6 deletions(-)

--

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -811,8 +811,13 @@ static void __exit dccp_fini(void)
 module_init(dccp_init);
 module_exit(dccp_fini);
 
-/* __stringify doesn't likes enums, so use SOCK_DCCP (6) value directly  */
-MODULE_ALIAS("net-pf-" __stringify(PF_INET) "-6");
+/*
+ * __stringify doesn't likes enums, so use SOCK_DCCP (6) and IPPROTO_DCCP (33)
+ * values directly, Also cover the case where the protocol is not specified,
+ * i.e. net-proto-PF_INET-0-SOCK_DCCP
+ */
+MODULE_ALIAS("net-proto-" __stringify(PF_INET) "-33-6");
+MODULE_ALIAS("net-proto-" __stringify(PF_INET) "-0-6");
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Arnaldo Carvalho de Melo [EMAIL PROTECTED]");
 MODULE_DESCRIPTION("DCCP - Datagram Congestion Controlled Protocol");
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -228,12 +228,14 @@ static int inet_create(struct socket *so
struct proto *answer_prot;
unsigned char answer_flags;
char answer_no_check;
-   int err;
+   int try_loading_module = 0;
+   int err = -ESOCKTNOSUPPORT;
 
sock->state = SS_UNCONNECTED;
 
/* Look for the requested type/protocol pair. */
answer = NULL;
+lookup_protocol:
rcu_read_lock();
list_for_each_rcu(p, &inetsw[sock->type]) {
answer = list_entry(p, struct inet_protosw, list);
@@ -254,9 +256,20 @@ static int inet_create(struct socket *so
answer = NULL;
}
 
-   err = -ESOCKTNOSUPPORT;
-   if (!answer)
-   goto out_rcu_unlock;
+   if (unlikely(answer == NULL)) {
+       if (try_loading_module < 2) {
+           rcu_read_unlock();
+           /* Be more specific, e.g. net-proto-2-132-1 (net-proto-PF_INET-IPPROTO_SCTP-SOCK_STREAM) */
+           if (++try_loading_module == 1)
+               request_module("net-proto-%d-%d-%d", PF_INET, protocol, sock->type);
+           /* Fall back to generic, e.g. net-proto-2-132 (net-proto-PF_INET-IPPROTO_SCTP) */
+           else
+               request_module("net-proto-%d-%d", PF_INET, protocol);
+           goto lookup_protocol;
+       } else
+           goto out_rcu_unlock;
+   }
+
err = -EPERM;
if (answer->capability > 0 && !capable(answer->capability))
goto out_rcu_unlock;
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1242,6 +1242,10 @@ SCTP_STATIC __exit void sctp_exit(void)
 module_init(sctp_init);
 module_exit(sctp_exit);
 
+/*
+ * __stringify doesn't likes enums, so use IPPROTO_SCTP value (132) directly.
+ */
+MODULE_ALIAS("net-proto-" __stringify(PF_INET) "-132");
 MODULE_AUTHOR("Linux Kernel SCTP developers <[EMAIL PROTECTED]>");
 MODULE_DESCRIPTION("Support for the SCTP protocol (RFC2960)");
 MODULE_LICENSE("GPL");
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
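
The alias naming scheme from the commit message can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel code: proto_alias() is a hypothetical helper, and snprintf() merely stands in for the formatting that request_module() does with its printf-style arguments. It only shows which alias string each of the two attempts would ask for.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): build the module alias
 * that the given attempt would request.
 *   attempt 1: net-proto-FAMILY-PROTOCOL-SOCKTYPE (most specific)
 *   attempt 2: net-proto-FAMILY-PROTOCOL          (generic fallback)
 * Returns the number of characters written, or -1 for an unknown
 * attempt or a buffer that is too small.
 */
static int proto_alias(char *buf, size_t len, int attempt,
                       int family, int protocol, int type)
{
    int n;

    if (attempt == 1)
        n = snprintf(buf, len, "net-proto-%d-%d-%d", family, protocol, type);
    else if (attempt == 2)
        n = snprintf(buf, len, "net-proto-%d-%d", family, protocol);
    else
        return -1;

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

With PF_INET = 2, IPPROTO_SCTP = 132 and SOCK_STREAM = 1, the first attempt asks for net-proto-2-132-1 and the second for net-proto-2-132, which is the alias the SCTP hunk registers.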


Re: [PATCH 1/2] LSM-IPSec Networking Hooks -- revised flow cache [resend]

2005-08-06 Thread Trent Jaeger
OK.  Thanks for the comments.  I'll get back soon.

Regards,
Trent.

Trent Jaeger
IBM T.J. Watson Research Center
19 Skyline Drive, Hawthorne, NY 10532
(914) 784-7225, FAX (914) 784-7225




Herbert Xu [EMAIL PROTECTED]
08/06/2005 03:45 AM
 
To: Trent Jaeger/Watson/[EMAIL PROTECTED]
cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], 
netdev@vger.kernel.org, [EMAIL PROTECTED], Serge E 
Hallyn/Austin/[EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject:Re: [PATCH 1/2] LSM-IPSec Networking Hooks -- 
revised flow cache [resend]


On Tue, Aug 02, 2005 at 02:04:41PM -0400, jaegert wrote:
 Resend of 20 July patch that repaired the flow_cache_lookup 
 authorization (now for 2.6.13-rc4-git4).

Thanks Trent.  I'm happy with the flow cache stuff now.
However, there are still some technical details to take
care of.

 diff -puN include/linux/xfrm.h~lsm-xfrm-nethooks include/linux/xfrm.h
 --- linux-2.6.13-rc4-xfrm/include/linux/xfrm.h~lsm-xfrm-nethooks  
2005-08-01 16:11:22.0 -0400
 +++ linux-2.6.13-rc4-xfrm-root/include/linux/xfrm.h 2005-08-01 
16:11:22.0 -0400
 @@ -173,6 +201,7 @@ enum xfrm_attr_type_t {
XFRMA_ALG_CRYPT,/* struct xfrm_algo */
XFRMA_ALG_COMP, /* struct 
xfrm_algo */
XFRMA_ENCAP,/* struct 
xfrm_algo + struct xfrm_encap_tmpl */
 +  XFRMA_SEC_CTX,  /* struct xfrm_sec_ctx */
XFRMA_TMPL, /* 1 or more 
struct xfrm_user_tmpl */
XFRMA_SA,
XFRMA_POLICY,

Please add it at the end of the enum as otherwise you may break
existing user-space applications.  In this particular case the
breakage isn't serious since those three XFRMA types are fairly
recent but still it's better to be safe than sorry :)
 
 diff -puN include/net/xfrm.h~lsm-xfrm-nethooks include/net/xfrm.h
 --- linux-2.6.13-rc4-xfrm/include/net/xfrm.h~lsm-xfrm-nethooks  
2005-08-01 16:11:22.0 -0400
 +++ linux-2.6.13-rc4-xfrm-root/include/net/xfrm.h 2005-08-01 
16:11:22.0 -0400
 @@ -510,6 +514,27 @@ xfrm_selector_match(struct xfrm_selector
return 0;
  }
 
 +/* If neither has a context -- match
 +   Otherwise, both must have a context and the sids, doi, alg must 
match */
 +static inline int xfrm_sec_ctx_match(struct xfrm_sec_ctx *s1, struct 
xfrm_sec_ctx *s2)
 +{
 +  return ((!s1 && !s2) ||
 +  (s1 && s2 &&
 +   (s1->ctx_sid == s2->ctx_sid) &&
 +   (s1->ctx_doi == s2->ctx_doi) &&
 +   (s1->ctx_alg == s2->ctx_alg)));
 +}

Would it be possible to make this conditional on CONFIG_SECURITY_NETWORK?

 +static inline struct xfrm_sec_ctx *xfrm_policy_security(struct 
xfrm_policy *xp)
 +{
 +  return (xp ? xp->security : NULL);
 +}
 +
 +static inline struct xfrm_sec_ctx *xfrm_state_security(struct 
xfrm_state *x)
 +{
 +  return (x ? x->security : NULL);
 +}
 +

Do you really need these NULL checks? If not I'd suggest getting rid
of these altogether.

A quick glance at all the users of xfrm_policy_security in Patch 1
seems to indicate that none of those places can have xp being NULL.

 diff -puN net/core/flow.c~lsm-xfrm-nethooks net/core/flow.c
 --- linux-2.6.13-rc4-xfrm/net/core/flow.c~lsm-xfrm-nethooks 2005-08-01 
16:11:22.0 -0400
 +++ linux-2.6.13-rc4-xfrm-root/net/core/flow.c 2005-08-01 
16:12:03.0 -0400
 @@ -23,6 +23,7 @@
  #include <net/flow.h>
  #include <asm/atomic.h>
  #include <asm/semaphore.h>
 +#include <linux/security.h>

This appears to be unnecessary.

 diff -puN net/ipv4/xfrm4_policy.c~lsm-xfrm-nethooks 
net/ipv4/xfrm4_policy.c
 --- linux-2.6.13-rc4-xfrm/net/ipv4/xfrm4_policy.c~lsm-xfrm-nethooks  
2005-08-01 16:11:22.0 -0400
 +++ linux-2.6.13-rc4-xfrm-root/net/ipv4/xfrm4_policy.c 2005-08-01 
16:11:22.0 -0400
 @@ -36,6 +36,8 @@ __xfrm4_find_bundle(struct flowi *fl, st
if (xdst->u.rt.fl.oif == fl->oif &&       /*XXX*/
    xdst->u.rt.fl.fl4_dst == fl->fl4_dst &&
    xdst->u.rt.fl.fl4_src == fl->fl4_src &&
 + xfrm_sec_ctx_match(xfrm_policy_security(policy),
 + xfrm_state_security(dst->xfrm)) &&

Is this necessary? The policy's context must've matched the state's
context at its creation time.  AFAIK there is no way for the security
context to change during their life-cycle.

 diff -puN net/ipv6/xfrm6_policy.c~lsm-xfrm-nethooks 
net/ipv6/xfrm6_policy.c
 --- linux-2.6.13-rc4-xfrm/net/ipv6/xfrm6_policy.c~lsm-xfrm-nethooks  
2005-08-01 16:11:22.0 -0400
 +++ linux-2.6.13-rc4-xfrm-root/net/ipv6/xfrm6_policy.c 2005-08-01 
16:11:22.0 -0400
 @@ -54,6 +54,8 @@ __xfrm6_find_bundle(struct flowi *fl, st
 xdst->u.rt6.rt6i_src.plen);

Re: ICMP broken in 2.6.13-rc5

2005-08-06 Thread Harald Welte
On Sat, Aug 06, 2005 at 01:25:43PM +0400, Vladimir B. Savkin wrote:
 On Sat, Aug 06, 2005 at 11:13:37AM +0200, Harald Welte wrote:
  On Sat, Aug 06, 2005 at 02:08:15AM +0400, Vladimir B. Savkin wrote:
   I found that it really is NOTRACK that causes bogus ICMP errors.
  
  Well, this means that your ICMP errors need to be NAT'ed but they
  cannot, since the original connection causing the ICMP error did not go
  through connection tracking.
 
 How so, when there are no NAT rules that can match either source packets
 or ICMP errors?

As soon as you load NAT, _all_ connections need to be tracked, since
those with no NAT configured need to allocate a null binding.

NAT needs to know about all connections, since otherwise it would not be
able to learn about all already-used port/ip tuples.

So independent of the specific ICMP problem you're observing, the
configuration seems broken to me in the first place.

Whether we should deal more gracefully with such a setup remains an
open question, though.

But discussions like this are one of the reasons why we thought very
hard about whether we should include the NOTRACK target in mainline at all.
It is dangerous, and a lot of people will use it in combination with NAT
and end up with a broken configuration.

I think we should make NOTRACK and NAT an XOR, i.e. only allow one of
them to be enabled at any given time.

-- 
- Harald Welte [EMAIL PROTECTED]  http://gnumonks.org/

Privacy in residential applications is a desirable marketing option.
  (ETSI EN 300 175-7 Ch. A6)




Re: ICMP broken in 2.6.13-rc5

2005-08-06 Thread Vladimir B. Savkin
On Sat, Aug 06, 2005 at 05:12:01PM +0200, Harald Welte wrote:
   Well, this means that your ICMP errors need to be NAT'ed but they
   cannot, since the original connection causing the ICMP error did not go
   through connection tracking.
  
  How so, when there are no NAT rules that can match either source packets
  or ICMP errors?
 
 As soon as you load NAT, _all_ connections need to be tracked, since
 those with no NAT configured need to allocate a null binding.
 
 NAT needs to know about all connections, since otherwise it would not be
 able to learn about all already-used port/ip tuples.
 
 So independent of the specific ICMP problem you're observing, the
 configuration seems broken to me in the first place.
 
 Whether we should deal more gracefully with such a setup remains an
 open question, though.

In my case, I have a local network and Internet access.
Local traffic (packets which have both src and dst IP belonging
to the local prefix) does not need to be NATed or statefully filtered.
So I wanted to use NOTRACK for maximum forwarding performance.
ICMP errors were matched by NOTRACK too (in the OUTPUT chain of the raw
table), as they also have local src and dst. IMO, this means that there
should be no NAT attempts for such ICMP packets...

I think of this as a valuable feature of Linux - using one box 
for two or more applications, in my case - local router (no NAT, no
stateful filtering, maximum performance) and Internet gateway (with NAT,
more filtering, maximum control).

 But discussions like this are one of the reasons why we thought very
 hard about whether we should include the NOTRACK target in mainline at all.
 It is dangerous, and a lot of people will use it in combination with NAT
 and end up with a broken configuration.
 
 I think we should make NOTRACK and NAT an XOR, i.e. only allow one of
 them to be enabled at any given time.
 

Well, this would break this feature which worked very well for me 
with older kernels.

~
:wq
With best regards, 
   Vladimir Savkin. 



Re: ICMP broken in 2.6.13-rc5

2005-08-06 Thread Vladimir B. Savkin
On Sat, Aug 06, 2005 at 04:58:46PM +0200, Patrick McHardy wrote:
 Harald Welte wrote:
 On Sat, Aug 06, 2005 at 02:08:15AM +0400, Vladimir B. Savkin wrote:
 
 I found that it really is NOTRACK that causes bogus ICMP errors.
 
 Good work tracking this down. I've seen reports of this before, but
 never found the reason.
 
 Well, this means that your ICMP errors need to be NAT'ed but they
 cannot, since the original connection causing the ICMP error did not go
 through connection tracking.
 
 Your not-correctly-NATed ICMP packets are the logical result of this
 configuration.
 
 Use of NOTRACK in combination with NAT is _extremely_ dangerous, and
 unless you understand its full implications, I would not recommend
 combining the two.
 
 So it seems your use of NOTRACK is invalid in this setup - and thus like
 a configuration problem.
 
 I disagree, NAT already ignores untracked connections in most places,
 just icmp_reply_translation is missing.
 
 Vladimir, can you please test the attached patch?

No success; it looks like with this patch no ICMP replies are generated (*),
no matter whether any NOTRACK rules exist.

(*) I only tested that no replies were received by the client (broken
tracepath) and that there were no bogus packets on loopback.

 diff --git a/net/ipv4/netfilter/ip_nat_core.c 
 b/net/ipv4/netfilter/ip_nat_core.c
 --- a/net/ipv4/netfilter/ip_nat_core.c
 +++ b/net/ipv4/netfilter/ip_nat_core.c
 @@ -430,6 +430,19 @@ int icmp_reply_translation(struct sk_buf
   } *inside;
   struct ip_conntrack_tuple inner, target;
   int hdrlen = (*pskb)->nh.iph->ihl * 4;
 + unsigned long statusbit;
 +
 + if (manip == IP_NAT_MANIP_SRC)
 + statusbit = IPS_SRC_NAT;
 + else
 + statusbit = IPS_DST_NAT;
 +
 + /* Invert if this is reply dir. */
 + if (dir == IP_CT_DIR_REPLY)
 + statusbit ^= IPS_NAT_MASK;
 +
 + if (!(ct->status & statusbit))
 + return 0;
  
   if (!skb_make_writable(pskb, hdrlen + sizeof(*inside)))
   return 0;

~
:wq
With best regards, 
   Vladimir Savkin. 



Re: atheros driver - desc

2005-08-06 Thread Kalle Valo
Mateusz Berezecki [EMAIL PROTECTED] writes:

 The driver is not yet fully working because I didn't finish kernel
 integration yet. Almost all
 driver I/O ops are reverse engineered independently of openbsd openhal
 which is missing just too much.


 Ok, enough talking. Most of the atheros 5212 hal is now open :)

This is great news. An open source Atheros driver which could be
included in Linux is really needed.

But how was the reverse engineering done? I noticed that forcedeth
driver was implemented using the clean room design[1] and Linux
Broadcom 4301 driver project[2] seems to be using the same method.

The reason I'm asking is that I wouldn't want to see the same thing
happen with this driver as happened during the reverse engineering
of the pwc Philips Webcam driver (some parts of the driver were removed
from the kernel, but I believe the situation is now solved).

Actually, what are the requirements to get a reverse engineered driver
included in Linux? Is clean room design an absolute must? It seems
that reverse engineering is needed if we want Linux support for most
of the WLAN cards on the market :(

[1] http://en.wikipedia.org/wiki/Clean_room_design
[2] http://linux-bcom4301.sourceforge.net/go/progress

-- 
Kalle Valo



Re: w1 netlink

2005-08-06 Thread Evgeniy Polyakov
On Sat, Aug 06, 2005 at 09:37:00PM +0200, Patrick McHardy ([EMAIL PROTECTED]) 
wrote:
 I'm working on extending netlink to work with an arbitrary number
 of groups and stumbled over this in the w1 driver:
 
 dev->groups = 23
 
 NETLINK_CB(skb).dst_group = dev->groups;
 netlink_broadcast(dev->nls, skb, 0, dev->groups, GFP_ATOMIC);
 
 Apparently it wants to send to multiple groups at once, is that correct?
 Why does it need to do so? One limitation introduced by my patches will
 be that broadcasting to multiple groups won't be possible anymore and
 this is the only code in the kernel that uses this feature of netlink.

23 was selected arbitrarily - w1 definitely can live without multicast.

As for complete removal of the multicast feature - it is quite
useful, maybe it would be better to make it per-socket.
And won't it break RTMGRP_* messages?
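
To see why a group value of 23 reaches several groups at once: as I read the thread, netlink at this point treats the value as a bitmask with one bit per group, and 23 is binary 10111, so four bits are set. The helper below is a hypothetical userspace illustration of that reading, not kernel code; the mapping of bit 0 to group 1 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: list the 1-based group numbers whose bits are set
 * in a 32-bit netlink group mask. Returns how many groups were found.
 */
static int groups_in_mask(uint32_t mask, int out[32])
{
    int count = 0;

    for (int bit = 0; bit < 32; bit++) {
        if (mask & (UINT32_C(1) << bit))
            out[count++] = bit + 1;   /* assumed: bit 0 is group 1 */
    }
    return count;
}
```

Called with 23, the helper reports groups 1, 2, 3 and 5, which is the multi-group broadcast Patrick is asking about.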

-- 
Evgeniy Polyakov


Re: kfree_skb questions

2005-08-06 Thread Patrick McHardy

Daniel Phillips wrote:

Hi,

The way I read this, __kfree_skb will sometimes be called with ->users = 1 and 
sometimes with -users = 0, is that right?  


Yes.


static inline void kfree_skb(struct sk_buff *skb)
{
	if (likely(atomic_read(&skb->users) == 1))
		smp_rmb();
	else if (likely(!atomic_dec_and_test(&skb->users)))
		return;
	__kfree_skb(skb);
}

If so, then why not just:

static inline void kfree_skb(struct sk_buff *skb)
{
	if (likely(atomic_read(&skb->users) == 1))
		smp_rmb();
	if (likely(!atomic_dec_and_test(&skb->users)))
		return;
	__kfree_skb(skb);
}

so __kfree_skb can BUG_ON(atomic_read(&skb->users))?  Perhaps this has
something to do with the smp_rmb, could somebody please explain to me why it 
is necessary here, and for which architectures?


The atomic_read is used as an optimization under the assumption that
an atomic_read is cheaper than an atomic_dec_and_test. The smp_rmb
is (was) needed to make sure the CPU didn't reorder things because
we used to have a BUG check in __kfree_skb which triggered if
skb->list was non-NULL.

Anyway, do we not want BUG_ON(!atomic_read(&skb->users)) at the beginning of 
kfree_skb, since we rely on it?


Why do you care if skb->users is 0 or 1 in __kfree_skb()?
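
The fast path described above can be modelled in userspace with C11 atomics. This is a sketch under assumptions, not the kernel implementation: struct buf, buf_put(), buf_release() and the freed flag are made up for illustration, and an acquire load stands in for the atomic_read() plus smp_rmb() pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct buf {
    atomic_int users;   /* reference count, playing the role of skb->users */
    bool freed;         /* set instead of actually freeing memory */
};

/* Stand-in for __kfree_skb(): just record that the buffer was released. */
static void buf_release(struct buf *b)
{
    b->freed = true;
}

/* Drop one reference; release the buffer when the last one goes away. */
static void buf_put(struct buf *b)
{
    if (atomic_load_explicit(&b->users, memory_order_acquire) == 1) {
        /* Sole owner: skip the more expensive atomic decrement. */
    } else if (atomic_fetch_sub(&b->users, 1) != 1) {
        return;   /* other references remain */
    }
    buf_release(b);
}
```

On the fast path the count is still 1 when buf_release() runs. It is 0 only when the decrement itself drops the last reference, which needs a second holder racing with this one, so the release function cannot assert a single value without changing the caller.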


Re: kfree_skb questions

2005-08-06 Thread Daniel Phillips
On Sunday 07 August 2005 06:26, Patrick McHardy wrote:
  Anyway, do we not want BUG_ON(!atomic_read(&skb->users)) at the beginning
  of kfree_skb, since we rely on it?

 Why do you care if skb->users is 0 or 1 in __kfree_skb()?

Because I am a neatness freak and I like to check things that inattentive 
coders can easily get wrong.  But the question above is not about that, it is 
about checking for possible calls where skb->users is already zero and 
thereby catching the double free early instead of letting it slide further 
into the innards of the machine.

Regards,

Daniel


[PATCH] reorganize include/linux/dccp.h

2005-08-06 Thread Harald Welte
Hi Arnaldo!

The protocol header files in linux/foo.h are usually structured in a
way to be included by userspace code.  The top section consists of
general protocol structure definitions, typedefs, enums - followed by an
#ifdef __KERNEL__ section.

Currently linux/dccp.h doesn't follow that convention and can
therefore not be used from userspace.  However, e.g. iptables'
libipt_dccp.c actually needs various definitions.

Below is a proposed patch to clean up dccp.h.  Please review and
consider applying it.  Thanks!

[the iptables ipt_dccp patch applies cleanly on top of this - but not
 the other way around]

Cheers,
Harald

-- 
- Harald Welte [EMAIL PROTECTED]  http://gnumonks.org/

Privacy in residential applications is a desirable marketing option.
  (ETSI EN 300 175-7 Ch. A6)
[DCCP] make linux/dccp.h include-able from userspace

The protocol header files in linux/foo.h are usually structured in a
way to be included by userspace code.  The top section consists of
general protocol structure definitions, typedefs, enums - followed by an
#ifdef __KERNEL__ section.

Currently linux/dccp.h doesn't follow that convention and can
therefore not be used from userspace.  However, for example iptables'
libipt_dccp.c actually needs various definitions from there.

Signed-off-by: Harald Welte [EMAIL PROTECTED]

---
commit 328f1df306bf5ae317d399d15146daae7bbd8477
tree 2d5da11ab69a35124755f95ef8f6a61ff492b935
parent 627c49af0423f8f48a2f467c8b69f746ef1891bc
author Harald Welte [EMAIL PROTECTED] Sa, 06 Aug 2005 23:17:00 +0200
committer Harald Welte [EMAIL PROTECTED] Sa, 06 Aug 2005 23:17:00 +0200

 include/linux/dccp.h |  238 +-
 1 files changed, 121 insertions(+), 117 deletions(-)

diff --git a/include/linux/dccp.h b/include/linux/dccp.h
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -1,16 +1,8 @@
 #ifndef _LINUX_DCCP_H
 #define _LINUX_DCCP_H
 
-#include <linux/in.h>
-#include <linux/list.h>
 #include <linux/types.h>
-#include <linux/uio.h>
-#include <linux/workqueue.h>
-
-#include <net/inet_connection_sock.h>
-#include <net/sock.h>
-#include <net/tcp_states.h>
-#include <net/tcp.h>
+#include <asm/byteorder.h>
 
 /* FIXME: this is utterly wrong */
 struct sockaddr_dccp {
@@ -18,40 +10,6 @@ struct sockaddr_dccp {
unsigned intservice;
 };
 
-enum dccp_state {
-   DCCP_OPEN   = TCP_ESTABLISHED,
-   DCCP_REQUESTING = TCP_SYN_SENT,
-   DCCP_PARTOPEN   = TCP_FIN_WAIT1, /* FIXME:
-   This mapping is horrible, but TCP 
has
-   no matching state for DCCP_PARTOPEN,
-   as TCP_SYN_RECV is already used by
-   DCCP_RESPOND, why don't stop using 
TCP
-   mapping of states? OK, now we don't 
use
-   sk_stream_sendmsg anymore, so 
doesn't
-   seem to exist any reason for us to
-   do the TCP mapping here */
-   DCCP_LISTEN = TCP_LISTEN,
-   DCCP_RESPOND= TCP_SYN_RECV,
-   DCCP_CLOSING= TCP_CLOSING,
-   DCCP_TIME_WAIT  = TCP_TIME_WAIT,
-   DCCP_CLOSED = TCP_CLOSE,
-   DCCP_MAX_STATES = TCP_MAX_STATES,
-};
-
-#define DCCP_STATE_MASK 0xf
-#define DCCP_ACTION_FIN (1<<7)
-
-enum {
-   DCCPF_OPEN   = TCPF_ESTABLISHED,
-   DCCPF_REQUESTING = TCPF_SYN_SENT,
-   DCCPF_PARTOPEN   = TCPF_FIN_WAIT1,
-   DCCPF_LISTEN = TCPF_LISTEN,
-   DCCPF_RESPOND= TCPF_SYN_RECV,
-   DCCPF_CLOSING= TCPF_CLOSING,
-   DCCPF_TIME_WAIT  = TCPF_TIME_WAIT,
-   DCCPF_CLOSED = TCPF_CLOSE,
-};
-
 /**
  * struct dccp_hdr - generic part of DCCP packet header
  *
@@ -94,11 +52,6 @@ struct dccp_hdr {
 #endif
 };
 
-static inline struct dccp_hdr *dccp_hdr(const struct sk_buff *skb)
-{
-   return (struct dccp_hdr *)skb->h.raw;
-}
-
 /**
  * struct dccp_hdr_ext - the low bits of a 48 bit seq packet
  *
@@ -108,34 +61,6 @@ struct dccp_hdr_ext {
__u32   dccph_seq_low;
 };
 
-static inline struct dccp_hdr_ext *dccp_hdrx(const struct sk_buff *skb)
-{
-   return (struct dccp_hdr_ext *)(skb->h.raw + sizeof(struct dccp_hdr));
-}
-
-static inline unsigned int dccp_basic_hdr_len(const struct sk_buff *skb)
-{
-   const struct dccp_hdr *dh = dccp_hdr(skb);
-   return sizeof(*dh) + (dh->dccph_x ? sizeof(struct dccp_hdr_ext) : 0);
-}
-
-static inline __u64 dccp_hdr_seq(const struct sk_buff *skb)
-{
-   const struct dccp_hdr *dh = dccp_hdr(skb);
-#if defined(__LITTLE_ENDIAN_BITFIELD)
-   __u64 seq_nr = ntohl(dh->dccph_seq << 8);
-#elif defined(__BIG_ENDIAN_BITFIELD)
-   __u64 seq_nr = ntohl(dh->dccph_seq);
-#else
-#error  Adjust your 

Re: [PATCH] reorganize include/linux/dccp.h

2005-08-06 Thread Arnaldo Carvalho de Melo
On 8/6/05, Harald Welte [EMAIL PROTECTED] wrote:
 Hi Arnaldo!
 
 The protocol header files in linux/foo.h are usually structured in a
 way to be included by userspace code.  The top section consists of
 general protocol structure definitions, typedefs, enums - followed by an
 #ifdef __KERNEL__ section.
 
 Currently linux/dccp.h doesn't follow that convention and can
 therefore not be used from userspace.  However, e.g. iptables'
 libipt_dccp.c actually needs various definitions.
 
 Below is a proposed patch to clean up dccp.h.  Please review and
 consider applying it.  Thanks!
 
 [the iptables ipt_dccp patch applies cleanly on top of this - but not
  the other way around]

OK, I'm applying both patches, just had to add an include for linux/in.h that
was missing, thanks!

- Arnaldo


Re: atheros driver - desc

2005-08-06 Thread Mateusz Berezecki
Kalle Valo [EMAIL PROTECTED] wrote:
| 
| This is great news. An open source Atheros driver which could be
| included in Linux is really needed.
| 
| But how was the reverse engineering done? I noticed that forcedeth
| driver was implemented using the clean room design[1] and Linux
| Broadcom 4301 driver project[2] seems to be using the same method.

  Reverse engineering was done by disassembling the binary HAL
  and, in the harder parts, by running it in userspace (yes, that is possible)
  and analysing the input and the output it produced. The crucial part was to
  discover the meaning of the hidden part of the structure describing
  device state. Now that this is done, it will be little or no problem for me
  to provide updates for this driver, unless the whole binary HAL
  changes dramatically. That's one of the reasons I do this work myself.
  
| 
| The reason I'm asking is that I wouldn't want to see the same thing
| happen with this driver as happened during the reverse engineering
| of the pwc Philips Webcam driver (some parts of the driver were removed
| from the kernel, but I believe the situation is now solved).

  If I get into trouble I will write documentation :-) I promise.


| Actually, what are the requirements to get a reverse engineered driver
| included in Linux? Is clean room design an absolute must? It seems
| that reverse engineering is needed if we want Linux support for most
| of the WLAN cards on the market :(
| 

   Sad but true. The problem is not on the vendors' side though.
   Look at FCC regulations... :/


   kind regards
   Mateusz




Re: ANNOUNCE: Linux DCCP implementation merged

2005-08-06 Thread Harald Welte
On Sat, Aug 06, 2005 at 06:57:15AM -0300, Arnaldo Carvalho de Melo wrote:
 Hi Guys,
 
   I'm very pleased to announce that the Linux 2.6 DCCP implementation
 has been merged in David Miller's net-2.6.14.git tree, and should appear
 shortly on Andrew Morton's 2.6.13-rcLATEST-mm tree and finally in mainline
 when Linus starts 2.6.14.

great ;)

   Now to work on:
 
 1. Getting the DCCP CCID infrastructure closer to the TCP Congestion
Avoidance one
 [...]

10. Implement iptables header matching for DCCP (see attached patch)

I've attached an (untested) patch for basic iptables support. Please
review (esp. the option matching part) and consider applying it to your
tree (or tell me to submit it to davem).  Current iptables from
svn.netfilter.org has the required userspace support (and even a manpage
snippet).

11. Implement connection tracking and NAT for DCCP in
netfilter/iptables.  To the best of my knowledge, we're the only
stateful packet filter that does SCTP so far... would be great to
have DCCP support, too.

Since you know the state transitions and other aspects of the DCCP
protocol well, it would be great to see ip_conntrack_proto_dccp.c (or even
better: nf_conntrack_proto_dccp.c) at some point :)

Cheers,
Harald
-- 
- Harald Welte [EMAIL PROTECTED]  http://gnumonks.org/

Privacy in residential applications is a desirable marketing option.
  (ETSI EN 300 175-7 Ch. A6)
[NETFILTER] New iptables DCCP protocol header match

Using this new iptables DCCP protocol header match, it is possible to
create simplistic stateless packet filtering rules for DCCP.  It permits
matching of port numbers, packet type and options.

Signed-off-by: Harald Welte [EMAIL PROTECTED]

---
commit 6e79d96f764001a225dea95bf84bcd9fef35476f
tree 5612cf3c9196b1e59bc0dcf8eb9e51c331f1aba3
parent c16fd4ffed6349d0888cd97a75d04394dac42021
author Harald Welte [EMAIL PROTECTED] Sa, 06 Aug 2005 20:48:01 +0200
committer Harald Welte [EMAIL PROTECTED] Sa, 06 Aug 2005 20:48:01 +0200

 include/linux/dccp.h|   16 ++-
 include/linux/netfilter_ipv4/ipt_dccp.h |   23 
 net/ipv4/netfilter/Kconfig  |   11 ++
 net/ipv4/netfilter/Makefile |1 
 net/ipv4/netfilter/ipt_dccp.c   |  176 +++
 5 files changed, 224 insertions(+), 3 deletions(-)

diff --git a/include/linux/dccp.h b/include/linux/dccp.h
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -113,10 +113,15 @@ static inline struct dccp_hdr_ext *dccp_
return (struct dccp_hdr_ext *)(skb->h.raw + sizeof(struct dccp_hdr));
 }
 
+static inline unsigned int __dccp_basic_hdr_len(const struct dccp_hdr *dh)
+{
+   return sizeof(*dh) + (dh->dccph_x ? sizeof(struct dccp_hdr_ext) : 0);
+}
+
 static inline unsigned int dccp_basic_hdr_len(const struct sk_buff *skb)
 {
const struct dccp_hdr *dh = dccp_hdr(skb);
-   return sizeof(*dh) + (dh->dccph_x ? sizeof(struct dccp_hdr_ext) : 0);
+   return __dccp_basic_hdr_len(dh);
 }
 
 static inline __u64 dccp_hdr_seq(const struct sk_buff *skb)
@@ -249,10 +254,15 @@ static inline unsigned int dccp_packet_h
return sizeof(struct dccp_hdr_reset);
 }
 
+static inline unsigned int __dccp_hdr_len(const struct dccp_hdr *dh)
+{
+   return __dccp_basic_hdr_len(dh) +
+  dccp_packet_hdr_len(dh->dccph_type);
+}
+
 static inline unsigned int dccp_hdr_len(const struct sk_buff *skb)
 {
-   return dccp_basic_hdr_len(skb) +
-  dccp_packet_hdr_len(dccp_hdr(skb)->dccph_type);
+   return __dccp_hdr_len(dccp_hdr(skb));
 }
 
 enum dccp_reset_codes {
diff --git a/include/linux/netfilter_ipv4/ipt_dccp.h 
b/include/linux/netfilter_ipv4/ipt_dccp.h
new file mode 100644
--- /dev/null
+++ b/include/linux/netfilter_ipv4/ipt_dccp.h
@@ -0,0 +1,23 @@
+#ifndef _IPT_DCCP_H_
+#define _IPT_DCCP_H_
+
+#define IPT_DCCP_SRC_PORTS 0x01
+#define IPT_DCCP_DEST_PORTS0x02
+#define IPT_DCCP_TYPE  0x04
+#define IPT_DCCP_OPTION0x08
+
+#define IPT_DCCP_VALID_FLAGS   0x0f
+
+struct ipt_dccp_info {
+   u_int16_t dpts[2];  /* Min, Max */
+   u_int16_t spts[2];  /* Min, Max */
+
+   u_int16_t flags;
+   u_int16_t invflags;
+
+   u_int16_t typemask;
+   u_int8_t option;
+};
+
+#endif /* _IPT_DCCP_H_ */
+
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -354,6 +354,17 @@ config IP_NF_MATCH_SCTP
  If you want to compile it as a module, say M here and read
  file:Documentation/modules.txt.  If unsure, say `N'.
 
+config IP_NF_MATCH_DCCP
+   tristate  'DCCP protocol match support'
+   depends on IP_NF_IPTABLES
+   help
+ With this option enabled, you will be able to use the iptables
+   
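
Because the listing breaks off before ipt_dccp.c, here is a minimal sketch of how a port-range and packet-type match over the fields of struct ipt_dccp_info could work. It is an assumption modelled on the usual iptables idiom (run a test only if its flag is set, then XOR the result with the invert flag), not the code from the patch. struct dccp_rule, check() and dccp_rule_matches() are hypothetical names, option matching is left out, and ports are taken in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IPT_DCCP_SRC_PORTS  0x01
#define IPT_DCCP_DEST_PORTS 0x02
#define IPT_DCCP_TYPE       0x04

/* Local copy of the fields used here, following ipt_dccp.h above. */
struct dccp_rule {
    uint16_t dpts[2];   /* min, max */
    uint16_t spts[2];   /* min, max */
    uint16_t flags;     /* which tests are enabled */
    uint16_t invflags;  /* which tests are inverted */
    uint16_t typemask;  /* assumed: bit N set means packet type N matches */
};

/* Run one test only if enabled in flags; invert it if set in invflags. */
static bool check(const struct dccp_rule *r, uint16_t flag, bool result)
{
    if (!(r->flags & flag))
        return true;
    return result ^ !!(r->invflags & flag);
}

static bool dccp_rule_matches(const struct dccp_rule *r,
                              uint16_t sport, uint16_t dport, unsigned type)
{
    return check(r, IPT_DCCP_SRC_PORTS,
                 sport >= r->spts[0] && sport <= r->spts[1]) &&
           check(r, IPT_DCCP_DEST_PORTS,
                 dport >= r->dpts[0] && dport <= r->dpts[1]) &&
           check(r, IPT_DCCP_TYPE,
                 type < 16 && (r->typemask & (1u << type)));
}
```

Keeping the enable and invert bits in two parallel masks lets one helper serve all three tests, which is why the header defines one flag value per test.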

[RFC] Net vm deadlock fix, version 4

2005-08-06 Thread Daniel Phillips
Hi,

This patch fills in some missing pieces:

   * Support v4 udp: same as v4 tcp, when in reserve, drop packets on
 noncritical sockets

   * Support v4 icmp: when in reserve, drop icmp traffic

   * Add reserve skb support to e1000 driver

   * API for dropping packets before delivery (dev_drop_skb)

   * Atomic_t for reserve accounting

Now ready for proof-of-concept testing.  High level API boilerplate will come
later.

Regards,

Daniel
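
The reserve accounting this series relies on (count the buffers taken from the reserve, refuse further ones past a per-device limit) can be tried out in userspace. The following is a minimal sketch under a simplified model: struct dev_model, model_alloc() and model_unreserve() are hypothetical names, and the normal_ok flag replaces a real allocator. It mirrors the counting only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct dev_model {
    int rx_reserve;              /* max buffers that may come from reserve */
    atomic_int rx_reserve_used;  /* buffers currently charged to reserve */
};

/*
 * Returns true if a receive buffer could be allocated. normal_ok says
 * whether the ordinary (non-reserve) allocation would have succeeded.
 */
static bool model_alloc(struct dev_model *dev, bool normal_ok)
{
    if (normal_ok)
        return true;                       /* reserve untouched */
    if (atomic_load(&dev->rx_reserve_used) >= dev->rx_reserve)
        return false;                      /* reserve limit reached */
    atomic_fetch_add(&dev->rx_reserve_used, 1);
    return true;
}

/* Give one buffer back to the reserve without going below zero. */
static void model_unreserve(struct dev_model *dev)
{
    /* fetch_sub returns the old value; undo if it was already <= 0. */
    if (atomic_fetch_sub(&dev->rx_reserve_used, 1) <= 0)
        atomic_fetch_add(&dev->rx_reserve_used, 1);
}
```

Because buffers are counted rather than marked, model_unreserve() may be called for a buffer that never came from the reserve; the clamp at zero keeps that harmless.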

diff -up --recursive 2.6.12.3.clean/drivers/net/e1000/e1000_main.c 
2.6.12.3/drivers/net/e1000/e1000_main.c
--- 2.6.12.3.clean/drivers/net/e1000/e1000_main.c   2005-07-15 
17:18:57.0 -0400
+++ 2.6.12.3/drivers/net/e1000/e1000_main.c 2005-08-06 16:46:13.0 
-0400
@@ -3242,7 +3242,7 @@ e1000_alloc_rx_buffers_ps(struct e1000_a
cpu_to_le64(ps_page_dma->ps_page_dma[j]);
}
 
-   skb = dev_alloc_skb(adapter->rx_ps_bsize0 + NET_IP_ALIGN);
+   skb = dev_memalloc_skb(netdev, adapter->rx_ps_bsize0 + NET_IP_ALIGN);
 
if(unlikely(!skb))
break;
@@ -3253,8 +3253,6 @@ e1000_alloc_rx_buffers_ps(struct e1000_a
 */
skb_reserve(skb, NET_IP_ALIGN);
 
-   skb->dev = netdev;
-
buffer_info->skb = skb;
buffer_info->length = adapter->rx_ps_bsize0;
buffer_info->dma = pci_map_single(pdev, skb->data,
diff -up --recursive 2.6.12.3.clean/include/linux/gfp.h 
2.6.12.3/include/linux/gfp.h
--- 2.6.12.3.clean/include/linux/gfp.h  2005-07-15 17:18:57.0 -0400
+++ 2.6.12.3/include/linux/gfp.h2005-08-05 21:53:09.0 -0400
@@ -39,6 +39,7 @@ struct vm_area_struct;
 #define __GFP_COMP 0x4000u /* Add compound page metadata */
 #define __GFP_ZERO 0x8000u /* Return zeroed page on success */
 #define __GFP_NOMEMALLOC 0x10000u /* Don't use emergency reserves */
+#define __GFP_MEMALLOC  0x20000u /* Use emergency reserves */
 
 #define __GFP_BITS_SHIFT 20/* Room for 20 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
diff -up --recursive 2.6.12.3.clean/include/linux/netdevice.h 
2.6.12.3/include/linux/netdevice.h
--- 2.6.12.3.clean/include/linux/netdevice.h2005-07-15 17:18:57.0 
-0400
+++ 2.6.12.3/include/linux/netdevice.h  2005-08-06 16:37:14.0 -0400
@@ -371,6 +371,8 @@ struct net_device
struct Qdisc*qdisc_ingress;
struct list_headqdisc_list;
unsigned long   tx_queue_len;   /* Max frames per queue allowed 
*/
+   int rx_reserve;
+   atomic_t rx_reserve_used;
 
/* ingress path synchronizer */
spinlock_t  ingress_lock;
@@ -662,6 +664,49 @@ static inline void dev_kfree_skb_any(str
dev_kfree_skb(skb);
 }
 
+/*
+ * Support for critical network IO under low memory conditions
+ */
+static inline int dev_reserve_used(struct net_device *dev)
+{
+	return atomic_read(&dev->rx_reserve_used);
+}
+
+static inline struct sk_buff *__dev_memalloc_skb(struct net_device *dev,
+	unsigned length, int gfp_mask)
+{
+	struct sk_buff *skb = __dev_alloc_skb(length, gfp_mask);
+	if (skb)
+		goto done;
+	if (dev_reserve_used(dev) >= dev->rx_reserve)
+		return NULL;
+	if (!(skb = __dev_alloc_skb(length, gfp_mask|__GFP_MEMALLOC)))
+		return NULL;
+	atomic_inc(&dev->rx_reserve_used);
+done:
+	skb->dev = dev;
+	return skb;
+}
+
+static inline struct sk_buff *dev_memalloc_skb(struct net_device *dev,
+   unsigned length)
+{
+   return __dev_memalloc_skb(dev, length, GFP_ATOMIC);
+}
+
+static inline void dev_unreserve(struct net_device *dev)
+{
+	if (atomic_dec_return(&dev->rx_reserve_used) < 0)
+		atomic_inc(&dev->rx_reserve_used);
+}
+
+static inline void dev_drop_skb(struct sk_buff *skb)
+{
+	struct net_device *dev = skb->dev;
+	__kfree_skb(skb);
+	dev_unreserve(dev);
+}
+
 #define HAVE_NETIF_RX 1
 extern int netif_rx(struct sk_buff *skb);
 extern int netif_rx_ni(struct sk_buff *skb);
diff -up --recursive 2.6.12.3.clean/include/net/sock.h 2.6.12.3/include/net/sock.h
--- 2.6.12.3.clean/include/net/sock.h   2005-07-15 17:18:57.0 -0400
+++ 2.6.12.3/include/net/sock.h 2005-08-05 21:53:09.0 -0400
@@ -382,6 +382,7 @@ enum sock_flags {
SOCK_NO_LARGESEND, /* whether to sent large segments or not */
SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
+   SOCK_MEMALLOC, /* protocol can use memalloc reserve */
 };
 
 static inline void sock_set_flag(struct sock *sk, enum sock_flags flag)
@@ -399,6 +400,11 @@ static inline int sock_flag(struct sock 
 	return test_bit(flag, &sk->sk_flags);
 }
 
+static inline int is_memalloc_sock(struct sock *sk)
+{
+   return 
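
The reserve accounting in the netdevice.h hunk above (try a normal allocation, fall back to the reserve only while the per-device count is under rx_reserve, and give one unit back whenever a packet is dropped) can be modelled in userspace. This is only a rough sketch, not kernel code: struct model_dev, struct model_buf, the model_* functions and the model_normal_pool_empty knob are all hypothetical names, malloc() stands in for both the normal and the reserve allocator, and C11 atomics stand in for atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net_device: only the two fields
 * added by the patch are modelled. */
struct model_dev {
	int rx_reserve;              /* limit on buffers taken from reserve */
	atomic_int rx_reserve_used;  /* units currently charged to reserve */
};

/* Hypothetical stand-in for an skb: it remembers its device, as skb->dev does. */
struct model_buf {
	struct model_dev *dev;
};

/* Test knob (an assumption of this sketch): when true, the "normal"
 * allocation fails, simulating memory pressure. */
static bool model_normal_pool_empty;

static struct model_buf *model_alloc(struct model_dev *dev)
{
	struct model_buf *buf = NULL;

	if (!model_normal_pool_empty)
		buf = malloc(sizeof(*buf));     /* ordinary allocation path */
	if (!buf) {
		/* Fall back to the reserve, but only up to the device limit.
		 * As in the patch, check and increment are separate steps,
		 * so the limit is approximate under concurrency. */
		if (atomic_load(&dev->rx_reserve_used) >= dev->rx_reserve)
			return NULL;
		buf = malloc(sizeof(*buf));     /* stands in for the reserve */
		if (!buf)
			return NULL;
		atomic_fetch_add(&dev->rx_reserve_used, 1);
	}
	buf->dev = dev;
	return buf;
}

/* Mirrors dev_unreserve(): decrement, but never leave the count negative. */
static void model_unreserve(struct model_dev *dev)
{
	if (atomic_fetch_sub(&dev->rx_reserve_used, 1) - 1 < 0)
		atomic_fetch_add(&dev->rx_reserve_used, 1);
}

/* Mirrors dev_drop_skb(): free the buffer, then give one unit back,
 * regardless of which pool this particular buffer came from. */
static void model_drop(struct model_buf *buf)
{
	struct model_dev *dev;

	assert(buf != NULL);
	dev = buf->dev;
	free(buf);
	model_unreserve(dev);
}
```

With rx_reserve set to 2 and the normal pool marked empty, the third model_alloc() call returns NULL until one of the first two buffers is dropped. Dropping a buffer that came from the normal pool leaves the count at zero because of the clamp in model_unreserve().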

Re: [PATCH] netpoll can lock up on low memory.

2005-08-06 Thread Matt Mackall
On Sat, Aug 06, 2005 at 09:58:27AM +0200, Ingo Molnar wrote:
 
> btw., the current NR_SKBS 32 in netpoll.c seems quite low, especially
> e1000 can have a whole lot more skbs queued at once. Might be more
> robust to increase it to 128 or 256?

Not sure that the card's queueing really makes a difference. It either
eventually releases the queued SKBs or it doesn't. What's more
important is that we be able to survive bursts like the output of
sysrq-t. This seems to work already.

-- 
Mathematics is the supreme nostalgia of our time.