[RFC] Net vm deadlock fix (take two)
Hi,

This version does not do blatantly stupid things in hardware irq context, is more efficient, and... wow the patch is smaller! (That never happens.)

I don't mark skbs as being allocated from reserve any more. That works, but it is slightly bogus, because it doesn't matter which skb came from reserve, it only matters that we put one back. So I just count them and don't mark them.

The tricky issue that had to be dealt with is the possibility that a massive number of skbs could in theory be queued by the hardware interrupt before the softnet softirq gets around to delivering them to the protocol. But we can only allocate a limited number of skbs from reserve memory. If we run out of reserve memory we have no choice but to fail skb allocations, and that can cause packets to be dropped. Since we don't know which packets are block IO packets and which are not at that point, we could be so unlucky as to always drop block IO packets and always let other junk through.

The other junk won't get very far though, because those packets will be dropped as soon as the protocol headers are decoded, which will reveal that they do not belong to a memalloc socket. This short circuit ought to help take away the cpu load that caused the softirq constipation in the first place.

What is actually going to happen is, a few block IO packets might be randomly dropped under such conditions, degrading the transport efficiency. Block IO progress will continue, unless we manage to accidentally drop every block IO packet and our softirqs continue to stay comatose, probably indicating a scheduler bug.

OK, we want to allocate skbs from reserve, but we cannot go infinitely into reserve. So we count how many packets a driver has allocated from reserve, in the net_device struct. If this goes over some limit, the skb allocation fails and the device driver may drop a packet because of that.
Note that the e1000 driver will just be trying to refill its rx-ring at this point, and it will try to refill it again as soon as the next packet arrives, so it is still some ways away from actually dropping a packet. Other drivers may immediately drop a packet at this point, c'est la vie. Remember, this can only happen if the softirqs are backed up a silly amount.

The thing is, we have got our block IO traffic moving, by virtue of dipping into the reserve, and most likely moving at near-optimal speed. Normal memory-consuming tasks are not moving because they are blocked on vm IO. The things that can mess us up are cpu hogs - a scheduler problem - and tons of unhelpful traffic sharing the network wire, which we are killing off early as mentioned above.

What happens when a packet arrives at the protocol handler is a little subtle. At this point, if the interface is into reserve, we can always decrement the reserve count, regardless of what type of packet it is. If it is a block IO packet, the packet is still accounted for within the block driver's throttling. We are sure that the packet's resources will be returned to the common pool in an organized way. If it is some other random kind of packet, we drop it right away, also returning the resources to the common pool. Either way, it is not the responsibility of the interface to account for it any more.

I'll just reiterate what I'm trying to accomplish here:

  1) Guarantee network block io forward progress
  2) Block IO throughput should not degrade much under low memory

This iteration of the patch addresses those goals nicely, I think.

I have not yet shown how to drive this from the block IO layer, and I haven't shown how to be sure that all protocols on an interface (not just TCPv4, as here) can handle the reserve management semantics. I have ignored all transports besides IP, though not much changes for other transports. I have some accounting code that is very probably racy and needs to be rewritten with atomic_t.
I have ignored the many hooks that are possible in the protocol path. I have assumed that all receive skbs are the same size, and haven't accounted for the possibility that that size (MTU) might change.

All these things need looking at, but the main point at the moment is to establish a solid sense of correctness and to get some real results on a vanilla delivery path. That in itself will be useful for cluster work, where configuration issues are kept under careful control.

As far as drivers are concerned, the new interface is dev_memalloc_skb, which is straightforward. It needs to know about the netdev for accounting purposes, so it takes it as a parameter and thoughtfully plugs it into the skb for you.

I am still using the global memory reserve, not mempool. But notice, now I am explicitly accounting and throttling how deep a driver dips into the global reserve. So GFP_MEMALLOC wins a point: the driver isn't just using the global reserve blindly, as has been traditional. The jury is still out
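The per-device reserve accounting described in this message can be modelled in user-space C. This is only a sketch under stated assumptions: `struct reserve_acct`, `reserve_try_get`, `reserve_put` and the limit value are hypothetical names chosen for illustration, not identifiers from the patch, and C11 atomics stand in for the kernel's atomic_t that the message says the accounting should move to.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-net_device reserve counter. */
struct reserve_acct {
	atomic_int in_use;	/* skbs currently charged to the reserve */
	int limit;		/* how deep this device may dip (assumed) */
};

/* Driver side: charge one skb to the reserve, or fail at the limit. */
static bool reserve_try_get(struct reserve_acct *r)
{
	int old = atomic_load(&r->in_use);

	/* Compare-exchange loop: concurrent callers never exceed limit. */
	while (old < r->limit) {
		if (atomic_compare_exchange_weak(&r->in_use, &old, old + 1))
			return true;
	}
	return false;	/* caller fails the allocation; a packet may drop */
}

/* Protocol side: uncharge one skb, whatever kind of packet it was. */
static void reserve_put(struct reserve_acct *r)
{
	int old = atomic_load(&r->in_use);

	/* Only decrement while the device is actually into reserve. */
	while (old > 0) {
		if (atomic_compare_exchange_weak(&r->in_use, &old, old - 1))
			return;
	}
}
```

The point of the model is that nothing is marked per skb: the driver side only needs to know whether it may dip once more, and the protocol side only needs to put one back.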
Re: [PATCH] netpoll can lock up on low memory.
On Saturday 06 August 2005 12:32, Steven Rostedt wrote:
> > > If you need to really get the data out, then the design should be changed. Have some return value showing the failure, check for oops_in_progress or whatever, and try again after turning interrupts back on, and getting to a point where the system can free up memory (write to swap, etc). Just a busy loop without ever getting a skb is just bad.
> >
> > Why, pray tell, do you think there will be a second chance after re-enabling interrupts? How does this work when we're panicking or oopsing where we most care? How does this work when the netpoll client is the kernel debugger and the machine is completely stopped because we're tracing it?
>
> What I meant was to check for an oops and maybe then don't break out. Otherwise let the system try to reclaim memory. Since this is locked when the alloc_skb called with GFP_ATOMIC and fails.

You might want to take a look at my stupid little __GFP_MEMALLOC hack in the network block IO deadlock thread on netdev. It will let you use the memalloc reserve from atomic context. As long as you can be sure your usage will be bounded and you will eventually give it back, this should be ok.

Regards,

Daniel
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netpoll can lock up on low memory.
* Andi Kleen [EMAIL PROTECTED] wrote:

> On Fri, Aug 05, 2005 at 01:01:57PM -0700, Matt Mackall wrote:
> > The netpoll philosophy is to assume that its traffic is an absolute priority - it is better to potentially hang trying to deliver a panic message than to give up and crash silently.
>
> That would be ok if netpoll was only used to deliver panics. But it is not. It delivers all messages, and you cannot hang the kernel during that. Actually even for panics it is wrong, because often it is more important to reboot in a panic than (with a panic timeout) to actually deliver the panic. That's needed e.g. in a failover cluster.

without going into the merits of this discussion, reliable failover clusters must include (and do include) an external ability to cut power. No amount of in-kernel logic will prevent the kernel from hanging, given a bad enough kernel bug.

So the right question is not 'can we prevent the kernel from hanging, ever' (we cannot), but 'which change makes it less likely for the kernel to hang'. (and, obviously: assuming all other kernel components are functioning per specification, netpoll itself must not hang :-)

even a plain printk to VGA can hang in certain kernel crashes. Netpoll is more complex and thus has more exposure to hangs. E.g. netpoll relies on the network driver to correctly recycle skbs within a bound amount of time. If the network driver leaks skbs, it's game over for netpoll.

[ i'd prefer a hang over nondeterministic behavior, and e.g. losing console messages is sure nondeterministic behavior. What if the console message is "WARNING: the box has just been broken into"? ]

we could do one thing (see the patch below): i think it would be useful to fill up the netlogging skb queue straight at initialization time. Especially if netpoll is used for dumping alone, the system might not be in a situation to fill up the queue at the point of crash, so better be a bit more prepared and keep the pipeline filled.
	Ingo

Signed-off-by: Ingo Molnar [EMAIL PROTECTED]

--- net/core/netpoll.c.orig
+++ net/core/netpoll.c
@@ -720,6 +720,8 @@ int netpoll_setup(struct netpoll *np)
 	}
 	/* last thing to do is link it to the net device structure */
 	ndev->npinfo = npinfo;
+	/* fill up the skb queue */
+	refill_skbs();
 	return 0;
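The idea behind the patch above, filling the emergency queue at setup time rather than at the point of a crash, can be illustrated with a small user-space pool. This is a sketch, not the netpoll code: `struct buf`, `struct pool`, `pool_setup`, `pool_refill`, `pool_get` and `POOL_MAX` are hypothetical names, and malloc() stands in for skb allocation.

```c
#include <stdlib.h>

#define POOL_MAX 4	/* hypothetical depth, not the value netpoll uses */

struct buf {
	struct buf *next;
	char data[256];
};

struct pool {
	struct buf *head;
	int count;
};

/* Top the pool up to POOL_MAX; stop quietly if malloc() fails. */
static void pool_refill(struct pool *p)
{
	while (p->count < POOL_MAX) {
		struct buf *b = malloc(sizeof(*b));

		if (!b)
			break;
		b->next = p->head;
		p->head = b;
		p->count++;
	}
}

/* Setup: fill the pool right away instead of waiting for first use. */
static void pool_setup(struct pool *p)
{
	p->head = NULL;
	p->count = 0;
	pool_refill(p);
}

/* Emergency path: hand out a pre-allocated buffer, or NULL if drained. */
static struct buf *pool_get(struct pool *p)
{
	struct buf *b = p->head;

	if (b) {
		p->head = b->next;
		b->next = NULL;
		p->count--;
	}
	return b;
}
```

With the refill done in `pool_setup`, a caller that only ever reaches `pool_get` once memory is exhausted still finds `POOL_MAX` buffers waiting.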
Re: critical section violation in tg3.c?
Simply do the pci_save_state before the register_netdev() call, no need to mess around with the locking.
Re: test
From: Daniel Phillips [EMAIL PROTECTED]
Date: Sat, 6 Aug 2005 04:52:07 +1000

> So then there is no choice but to throttle the per-cpu ->input_pkt queues.

Make the driver support NAPI if you want device fairness.
Re: ICMP broken in 2.6.13-rc5
On Sat, Aug 06, 2005 at 02:08:15AM +0400, Vladimir B. Savkin wrote:
> I found that it really is NOTRACK that causes bogus ICMP errors.

Well, this means that your ICMP errors need to be NAT'ed but they cannot, since the original connection causing the ICMP error did not go through connection tracking. Your not-correctly-NATed ICMP packets are the logical result of this configuration.

Use of NOTRACK in combination with NAT is _extremely_ dangerous, and unless you understand its full implications, I would not recommend combining the two.

So it seems your use of NOTRACK is invalid in this setup - and thus this looks like a configuration problem.

-- 
- Harald Welte [EMAIL PROTECTED] http://gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch. A6)
[PATCH 5/6][INET] Generalise tcp_v4_hash & tcp_unhash
David,

	First set of changesets, please consider pulling from:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git

- Arnaldo

tree 7095737bc15a06613ef809457f95847e88a66550
parent f48ce924d611ea239cc3527235c2d926715564bb
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1122957550 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1122957550 -0300

[INET] Generalise tcp_v4_hash & tcp_unhash

It really just makes the existing code be a helper function that tcp_v4_hash and tcp_unhash uses, specifying the right inet_hashinfo, tcp_hashinfo.

One thing I'll investigate at some point is to have the inet_hashinfo pointer in sk_prot, so that we get all the hashtable information from the sk pointer, this can lead to some extra indirections that may well hurt performance/code size, we'll see. Ultimate idea would be that sk_prot would provide _all_ the information about a protocol implementation.

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]

--
 include/net/inet_hashtables.h |   34 ++
 net/ipv4/tcp_ipv4.c           |   29 ++---
 2 files changed, 36 insertions(+), 27 deletions(-)
--

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -240,4 +240,38 @@ static inline void __inet_hash(struct in
 	if (listen_possible && sk->sk_state == TCP_LISTEN)
 		wake_up(&hashinfo->lhash_wait);
 }
+
+static inline void inet_hash(struct inet_hashinfo *hashinfo, struct sock *sk)
+{
+	if (sk->sk_state != TCP_CLOSE) {
+		local_bh_disable();
+		__inet_hash(hashinfo, sk, 1);
+		local_bh_enable();
+	}
+}
+
+static inline void inet_unhash(struct inet_hashinfo *hashinfo, struct sock *sk)
+{
+	rwlock_t *lock;
+
+	if (sk_unhashed(sk))
+		goto out;
+
+	if (sk->sk_state == TCP_LISTEN) {
+		local_bh_disable();
+		inet_listen_wlock(hashinfo);
+		lock = &hashinfo->lhash_lock;
+	} else {
+		struct inet_ehash_bucket *head = &hashinfo->ehash[sk->sk_hashent];
+		lock = &head->lock;
+		write_lock_bh(&head->lock);
+	}
+
+	if (__sk_del_node_init(sk))
+		sock_prot_dec_use(sk->sk_prot);
+	write_unlock_bh(lock);
+out:
+	if (sk->sk_state == TCP_LISTEN)
+		wake_up(&hashinfo->lhash_wait);
+}
 #endif /* _INET_HASHTABLES_H */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -225,37 +225,12 @@ fail:
 
 static void tcp_v4_hash(struct sock *sk)
 {
-	if (sk->sk_state != TCP_CLOSE) {
-		local_bh_disable();
-		__inet_hash(&tcp_hashinfo, sk, 1);
-		local_bh_enable();
-	}
+	inet_hash(&tcp_hashinfo, sk);
 }
 
 void tcp_unhash(struct sock *sk)
 {
-	rwlock_t *lock;
-
-	if (sk_unhashed(sk))
-		goto ende;
-
-	if (sk->sk_state == TCP_LISTEN) {
-		local_bh_disable();
-		inet_listen_wlock(&tcp_hashinfo);
-		lock = &tcp_hashinfo.lhash_lock;
-	} else {
-		struct inet_ehash_bucket *head = &tcp_hashinfo.ehash[sk->sk_hashent];
-		lock = &head->lock;
-		write_lock_bh(&head->lock);
-	}
-
-	if (__sk_del_node_init(sk))
-		sock_prot_dec_use(sk->sk_prot);
-	write_unlock_bh(lock);
-
- ende:
-	if (sk->sk_state == TCP_LISTEN)
-		wake_up(&tcp_hashinfo.lhash_wait);
+	inet_unhash(&tcp_hashinfo, sk);
 }
 
 /* Don't inline this cruft. Here are some nice properties to
[PATCH 6/6][INET] Generalise tcp_v4_lookup_listener
David,

	First set of changesets, please consider pulling from:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git

- Arnaldo

tree 74a7900b3b8a414e7bd2703d46ab098cb3058c97
parent 31c00831e34dd1da084057326655a0a080ba5fb2
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1122962893 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1122962893 -0300

[INET] Generalise tcp_v4_lookup_listener

[EMAIL PROTECTED] net-2.6.14]$ grep built-in /tmp/before /tmp/after
/tmp/before: 282560   13122    9312  304994   4a762 net/ipv4/built-in.o
/tmp/after:  282560   13122    9312  304994   4a762 net/ipv4/built-in.o

Will be used in DCCP, not exporting it right now not to get in Adrian Bunk's exported-but-not-used-on-modules radar 8)

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]

--
 include/net/inet_hashtables.h |   36 ++
 net/ipv4/inet_hashtables.c    |   41 +
 net/ipv4/tcp_ipv4.c           |   81 ++
 3 files changed, 82 insertions(+), 76 deletions(-)
--

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -16,8 +16,10 @@
 
 #include <linux/interrupt.h>
 #include <linux/ip.h>
+#include <linux/ipv6.h>
 #include <linux/list.h>
 #include <linux/slab.h>
+#include <linux/socket.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/wait.h>
@@ -274,4 +276,38 @@ out:
 	if (sk->sk_state == TCP_LISTEN)
 		wake_up(&hashinfo->lhash_wait);
 }
+
+extern struct sock *__inet_lookup_listener(const struct hlist_head *head,
+					   const u32 daddr,
+					   const unsigned short hnum,
+					   const int dif);
+
+/* Optimize the common listener case. */
+static inline struct sock *inet_lookup_listener(struct inet_hashinfo *hashinfo,
+						const u32 daddr,
+						const unsigned short hnum,
+						const int dif)
+{
+	struct sock *sk = NULL;
+	struct hlist_head *head;
+
+	read_lock(&hashinfo->lhash_lock);
+	head = &hashinfo->listening_hash[inet_lhashfn(hnum)];
+	if (!hlist_empty(head)) {
+		const struct inet_sock *inet = inet_sk((sk = __sk_head(head)));
+
+		if (inet->num == hnum && !sk->sk_node.next &&
+		    (!inet->rcv_saddr || inet->rcv_saddr == daddr) &&
+		    (sk->sk_family == PF_INET || !ipv6_only_sock(sk)) &&
+		    !sk->sk_bound_dev_if)
+			goto sherry_cache;
+		sk = __inet_lookup_listener(head, daddr, hnum, dif);
+	}
+	if (sk) {
+sherry_cache:
+		sock_hold(sk);
+	}
+	read_unlock(&hashinfo->lhash_lock);
+	return sk;
+}
 #endif /* _INET_HASHTABLES_H */
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -121,3 +121,44 @@ void inet_listen_wlock(struct inet_hashi
 }
 
 EXPORT_SYMBOL(inet_listen_wlock);
+
+/*
+ * Don't inline this cruft. Here are some nice properties to exploit here. The
+ * BSD API does not allow a listening sock to specify the remote port nor the
+ * remote address for the connection. So always assume those are both
+ * wildcarded during the search since they can never be otherwise.
+ */
+struct sock *__inet_lookup_listener(const struct hlist_head *head, const u32 daddr,
+				    const unsigned short hnum, const int dif)
+{
+	struct sock *result = NULL, *sk;
+	const struct hlist_node *node;
+	int hiscore = -1;
+
+	sk_for_each(sk, node, head) {
+		const struct inet_sock *inet = inet_sk(sk);
+
+		if (inet->num == hnum && !ipv6_only_sock(sk)) {
+			const __u32 rcv_saddr = inet->rcv_saddr;
+			int score = sk->sk_family == PF_INET ? 1 : 0;
+
+			if (rcv_saddr) {
+				if (rcv_saddr != daddr)
+					continue;
+				score += 2;
+			}
+			if (sk->sk_bound_dev_if) {
+				if (sk->sk_bound_dev_if != dif)
+					continue;
+				score += 2;
+			}
+			if (score == 5)
+				return sk;
+			if (score > hiscore) {
+				hiscore = score;
+
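The scoring walk in `__inet_lookup_listener` can be tried out in user space with a flat table instead of a hash chain. This is a sketch under assumptions: `struct listener` and `lookup_listener` are hypothetical names, the `ipv6_only_sock()` check is left out, and since the quoted patch is cut off right after `hiscore = score;`, the `result = l` assignment and the final return are my guess at how the loop ends.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened stand-in for a listening socket. */
struct listener {
	uint16_t port;		/* local port (inet->num in the patch) */
	uint32_t rcv_saddr;	/* bound local address, 0 = wildcard */
	int bound_dev_if;	/* bound interface index, 0 = any */
	int is_pf_inet;		/* 1 for PF_INET sockets */
};

/* Return the best-scoring listener for (daddr, hnum, dif), or NULL. */
static const struct listener *
lookup_listener(const struct listener *tbl, size_t n,
		uint32_t daddr, uint16_t hnum, int dif)
{
	const struct listener *result = NULL;
	int hiscore = -1;

	for (size_t i = 0; i < n; i++) {
		const struct listener *l = &tbl[i];
		int score;

		if (l->port != hnum)
			continue;
		score = l->is_pf_inet ? 1 : 0;
		if (l->rcv_saddr) {
			if (l->rcv_saddr != daddr)
				continue;	/* bound to another address */
			score += 2;
		}
		if (l->bound_dev_if) {
			if (l->bound_dev_if != dif)
				continue;	/* bound to another device */
			score += 2;
		}
		if (score == 5)
			return l;	/* family, address and device all match */
		if (score > hiscore) {
			hiscore = score;
			result = l;	/* assumed completion of the loop */
		}
	}
	return result;
}
```

A wildcard PF_INET listener scores 1, an address-bound one 3, and one bound to both address and device 5, which is why the walk can stop early on a 5.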
Re: [PATCH 6/6][INET] Generalise tcp_v4_lookup_listener
All pulled, as well as your dccp-2.6.14 tree, into net-2.6.14

It should show up on the kernel.org mirrors shortly.
Re: [PATCH 6/6][INET] Generalise tcp_v4_lookup_listener
On 8/6/05, David S. Miller [EMAIL PROTECTED] wrote:
> All pulled, as well as your dccp-2.6.14 tree, into net-2.6.14
>
> It should show up on the kernel.org mirrors shortly.

WOW, that was fast, thank you! I'll be just one e-mail away to work right away on fixing any bug introduced by these changesets! But there should be none 8-)

- Arnaldo
ANNOUNCE: Linux DCCP implementation merged
Hi Guys,

I'm very pleased to announce that the Linux 2.6 DCCP implementation has been merged in David Miller's net-2.6.14.git tree, and should appear shortly on Andrew Morton's 2.6.13-rcLATEST-mm tree and finally in mainline when Linus starts 2.6.14.

There is still a lot of work to be done, but this is a milestone to celebrate!

Now to work on:

1. Getting the DCCP CCID infrastructure closer to the TCP Congestion Avoidance one
2. Finish the generalisation of TCP TIMEWAIT minisockets and make DCCP use it properly
3. Fully generalise net/ipv4/tcp_diag.c into net/core/net_diag.c so that we have all of the iproute2/netlink functionality
4. Implement CCID2
5. Implement the remaining options processing
6. Implement feature negotiation so that the interop tests with Joacim's FreeBSD and Nishida-san NetBSD implementations move along faster
7. Polish CCID3 getting it up to the latest standards, probably moving the packet history handling to the core and make it selectable by the CCIDs, like we already have the initial support for ACK Vectors
8. Reimplement dccp_{sendmsg,recvmsg} more intelligently, closer to the TCP implementation (this is closely related to #1 above).
9. Implement all the CCIDs Sally, Eddie and others come up with! :-)

Thanks a lot to all involved!

- Arnaldo
Re: [PATCH] netpoll can lock up on low memory.
On Sat, 2005-08-06 at 02:46 -0700, David S. Miller wrote:
> Can you guys stop peeing your pants over this, put aside your differences, and work on a mutually acceptable fix for these bugs?
>
> Much appreciated, thanks :-)

In my last email, I stated that this discussion seems to have demonstrated that the e1000 driver's netpoll is indeed broken, and needs to be fixed. I submitted a patch for this earlier, but it's untested and someone who owns an e1000 needs to try it. As for all the netpoll issues, I'm satisfied with whatever you guys decide. But I've seen lots of problems posted over the netpoll and e1000, where people send in patches that do everything but fix the e1000, and that's where I chimed in.

Thank you, my pants are dry now :-)

-- Steve
Re: [PATCH] IPSec anti-replay sequence numbers
KOVACS Krisztian wrote:
> Hi,
>
> On Friday 05 August 2005 12.50, Patrick McHardy wrote:
> > Is there already userspace code which uses this feature somewhere?
>
> AFAIK Ulrich has a patch for OpenSWAN, and we (Balabit) have a patch for racoon. Unfortunately this racoon version is available only as a commercial product.

The patch for openswan is nearly finished and will be released around the end of this year.

In my first post I split the patch into three pieces, two to get the sequence numbers with pf_key and netlink/xfrm, and one to set/inform about the sequence numbers over netlink/xfrm. IMHO the first two are useful for everyone using ipsec under linux, so it would be great if these two would flow into the vanilla kernel. For the last one it must be determined whether it is useful to add it to the vanilla kernel and, if yes, in which form.

Best regards
Ulrich
Re: [PATCH] netpoll can lock up on low memory.
On Sat, Aug 06, 2005 at 09:45:03AM +0200, Ingo Molnar wrote:
>
> * Andi Kleen [EMAIL PROTECTED] wrote:
>
> > On Fri, Aug 05, 2005 at 01:01:57PM -0700, Matt Mackall wrote:
> > > The netpoll philosophy is to assume that its traffic is an absolute priority - it is better to potentially hang trying to deliver a panic message than to give up and crash silently.
> >
> > That would be ok if netpoll was only used to deliver panics. But it is not. It delivers all messages, and you cannot hang the kernel during that. Actually even for panics it is wrong, because often it is more important to reboot in a panic than (with a panic timeout) to actually deliver the panic. That's needed e.g. in a failover cluster.
>
> without going into the merits of this discussion, reliable failover clusters must include (and do include) an external ability to cut power. No amount of in-kernel logic will prevent the kernel from hanging, given a bad enough kernel bug.

Ok, true, but we should do a best effort.

> So the right question is not 'can we prevent the kernel from hanging, ever' (we cannot), but 'which change makes it less likely for the kernel to hang'. (and, obviously: assuming all other kernel components are functioning per specification, netpoll itself must not hang :-)
>
> even a plain printk to VGA can hang in certain kernel crashes. Netpoll is more complex and thus has more exposure to hangs. E.g. netpoll relies on the network driver to correctly recycle skbs within a bound amount of time. If the network driver leaks skbs, it's game over for netpoll.

I don't think we even need to think about such rare cases, until the easy cases (everything hangs when the cable is pulled) are fixed.

> [ i'd prefer a hang over nondeterministic behavior, and e.g. losing console messages is sure nondeterministic behavior. What if the console message is "WARNING: the box has just been broken into"? ]

That just makes netconsole useless in production. If it causes frequent hangs people will not use it.
> we could do one thing (see the patch below): i think it would be useful to fill up the netlogging skb queue straight at initialization time. Especially if netpoll is used for dumping alone, the system might not be in a situation to fill up the queue at the point of crash, so better be a bit more prepared and keep the pipeline filled.

You're solving a completely different issue here?

-Andi
Re: assertion (cnt <= tp->packets_out) failed
> Hang on a second, the original poster mentioned rc5. Is this really pristine rc5 with the one netpoll patch? If so then it can't be the patches we're talking about because they only went in days later.

Yes, I have no other patches in, so if it was not in -RC5, I was not running it.

---
John Bäckstrand
Re: [PATCH] netpoll can lock up on low memory.
Steven Rostedt wrote:
> In my last email, I stated that this discussion seems to have demonstrated that the e1000 driver's netpoll is indeed broken, and needs to be fixed. I submitted a patch for this earlier, but it's untested and someone who owns an e1000 needs to try it.

I can test this, but not right now: I'm trying, again, to find my hard lockup issue, and so I will try to run this machine until it locks up. It lasted 9 days at one time, so it could potentially take some time, I'm afraid.

---
John Bäckstrand
Re: [PATCH 1/1][INET] Make inet_create try to load protocol modules
From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
Date: Sat, 6 Aug 2005 10:01:05 -0300

> +			/* Be more specific, e.g. net-pf-2-132-1 (net-pf-PF_INET-IPPROTO_SCTP-SOCK_STREAM) */
> +			if (++try_loading_module == 1)
> +				request_module("net-proto-%d-%d-%d", PF_INET, protocol, sock->type);

Your comments don't match the strings you are actually building in request_module() ie. net-pf-* vs. net-proto-*.

Please make them be consistent.
Re: [PATCH 1/1][INET] Make inet_create try to load protocol modules
On Sat, Aug 06, 2005 at 06:24:35AM -0700, David S. Miller wrote:
> From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
> Date: Sat, 6 Aug 2005 10:01:05 -0300
>
> > +			/* Be more specific, e.g. net-pf-2-132-1 (net-pf-PF_INET-IPPROTO_SCTP-SOCK_STREAM) */
> > +			if (++try_loading_module == 1)
> > +				request_module("net-proto-%d-%d-%d", PF_INET, protocol, sock->type);
>
> Your comments don't match the strings you are actually building in request_module() ie. net-pf-* vs. net-proto-*.
>
> Please make them be consistent.

OK, I'll do this later, lack of sleep must be the reason for this mistake :-\

-- 

- Arnaldo
Re: [PATCH 2.6 6/5] tg3: Fix bug in setting a tg3_flag
Michael, I've added all 6 patches to my net-2.6.14 tree. It should show up on the kernel.org GIT mirrors shortly.

I decided against sticking this into 2.6.13, as these changes can introduce regressions and the space of users affected by this problem is decidedly small compared to how many could be affected by any error in these changes.

Thanks.
Re: assertion (cnt <= tp->packets_out) failed
From: Herbert Xu [EMAIL PROTECTED]
Date: Sat, 6 Aug 2005 17:57:17 +1000

> Hang on a second, the original poster mentioned rc5. Is this really pristine rc5 with the one netpoll patch? If so then it can't be the patches we're talking about because they only went in days later.

This seems to be confirmed now... so I'll hold off on the revert for now.
[PATCH 1/1][INET] Make inet_create try to load protocol modules
On Sat, Aug 06, 2005 at 06:24:35AM -0700, David S. Miller wrote:
> From: [EMAIL PROTECTED] (Arnaldo Carvalho de Melo)
> Date: Sat, 6 Aug 2005 10:01:05 -0300
>
> > +	/* Be more specific, e.g. net-pf-2-132-1 (net-pf-PF_INET-IPPROTO_SCTP-SOCK_STREAM) */
> > +	if (++try_loading_module == 1)
> > +		request_module("net-proto-%d-%d-%d", PF_INET, protocol, sock->type);
>
> Your comments don't match the strings you are actually building
> in request_module(), i.e. net-pf-* vs. net-proto-*. Please make
> them consistent.

Fixed:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/acme/net-2.6.14.git

I checked and the mirrors have picked this one up already.

- Arnaldo

tree 13278f7cf4453ec1bc5d9e2f45bd5cd250f7ce18
parent 16963c77a4472768f6c04d14681584a118f6a7f4
author Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123337601 -0300
committer Arnaldo Carvalho de Melo [EMAIL PROTECTED] 1123337601 -0300

[INET] Make inet_create try to load protocol modules

Syntax is net-proto-PROTOCOL_FAMILY-PROTOCOL-SOCK_TYPE and if this
fails net-proto-PROTOCOL_FAMILY-PROTOCOL.

Signed-off-by: Arnaldo Carvalho de Melo [EMAIL PROTECTED]
Signed-off-by: David S. Miller [EMAIL PROTECTED]

--
 dccp/proto.c    |    9 +++--
 ipv4/af_inet.c  |   21 +
 sctp/protocol.c |    4
 3 files changed, 28 insertions(+), 6 deletions(-)

--
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -811,8 +811,13 @@ static void __exit dccp_fini(void)
 module_init(dccp_init);
 module_exit(dccp_fini);
 
-/* __stringify doesn't likes enums, so use SOCK_DCCP (6) value directly */
-MODULE_ALIAS("net-pf-" __stringify(PF_INET) "-6");
+/*
+ * __stringify doesn't likes enums, so use SOCK_DCCP (6) and IPPROTO_DCCP (33)
+ * values directly, Also cover the case where the protocol is not specified,
+ * i.e. net-proto-PF_INET-0-SOCK_DCCP
+ */
+MODULE_ALIAS("net-proto-" __stringify(PF_INET) "-33-6");
+MODULE_ALIAS("net-proto-" __stringify(PF_INET) "-0-6");
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Arnaldo Carvalho de Melo [EMAIL PROTECTED]");
 MODULE_DESCRIPTION("DCCP - Datagram Congestion Controlled Protocol");
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -228,12 +228,14 @@ static int inet_create(struct socket *so
 	struct proto *answer_prot;
 	unsigned char answer_flags;
 	char answer_no_check;
-	int err;
+	int try_loading_module = 0;
+	int err = -ESOCKTNOSUPPORT;
 
 	sock->state = SS_UNCONNECTED;
 
 	/* Look for the requested type/protocol pair. */
 	answer = NULL;
+lookup_protocol:
 	rcu_read_lock();
 	list_for_each_rcu(p, &inetsw[sock->type]) {
 		answer = list_entry(p, struct inet_protosw, list);
@@ -254,9 +256,20 @@ static int inet_create(struct socket *so
 		answer = NULL;
 	}
 
-	err = -ESOCKTNOSUPPORT;
-	if (!answer)
-		goto out_rcu_unlock;
+	if (unlikely(answer == NULL)) {
+		if (try_loading_module < 2) {
+			rcu_read_unlock();
+			/* Be more specific, e.g. net-proto-2-132-1 (net-proto-PF_INET-IPPROTO_SCTP-SOCK_STREAM) */
+			if (++try_loading_module == 1)
+				request_module("net-proto-%d-%d-%d", PF_INET, protocol, sock->type);
+			/* Fall back to generic, e.g. net-proto-132-1 (net-proto-IPPROTO_SCTP) */
+			else
+				request_module("net-proto-%d-%d", PF_INET, protocol);
+			goto lookup_protocol;
+		} else
+			goto out_rcu_unlock;
+	}
+
 	err = -EPERM;
 	if (answer->capability > 0 && !capable(answer->capability))
 		goto out_rcu_unlock;
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1242,6 +1242,10 @@ SCTP_STATIC __exit void sctp_exit(void)
 module_init(sctp_init);
 module_exit(sctp_exit);
 
+/*
+ * __stringify doesn't likes enums, so use IPPROTO_SCTP value (132) directly.
+ */
+MODULE_ALIAS("net-proto-" __stringify(PF_INET) "-132");
 MODULE_AUTHOR("Linux Kernel SCTP developers [EMAIL PROTECTED]");
 MODULE_DESCRIPTION("Support for the SCTP protocol (RFC2960)");
 MODULE_LICENSE("GPL");
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
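[Editorial sketch, not part of the original mail.] The two-step lookup in the patch above can be illustrated in userspace by building the same alias strings with snprintf(). The helper name build_module_alias and the EX_* constants are hypothetical; the numeric values are the ones quoted in the patch comments (PF_INET = 2, IPPROTO_SCTP = 132, SOCK_STREAM = 1). Going by the format strings alone, the fallback alias for SCTP would come out as net-proto-2-132, which does not seem to match the example given in the patch's fallback comment.

```c
#include <stdio.h>
#include <stddef.h>

/* Values as quoted in the patch comments; defined locally so the sketch
 * does not depend on any particular system header. */
enum { EX_PF_INET = 2, EX_IPPROTO_SCTP = 132, EX_SOCK_STREAM = 1 };

/*
 * Hypothetical helper mirroring the two request_module() format strings:
 * attempt 1 asks for family-protocol-type, any later attempt falls back
 * to family-protocol. Returns whatever snprintf() returns.
 */
static int build_module_alias(char *buf, size_t len, int attempt,
                              int family, int protocol, int type)
{
	if (attempt == 1)
		return snprintf(buf, len, "net-proto-%d-%d-%d",
				family, protocol, type);
	return snprintf(buf, len, "net-proto-%d-%d", family, protocol);
}
```

A caller would try attempt 1, repeat the protocol lookup, and only then try attempt 2, which is the order the `try_loading_module` counter enforces in the patch.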
Re: [PATCH 1/2] LSM-IPSec Networking Hooks -- revised flow cache [resend]
OK. Thanks for the comments. I'll get back soon.

Regards,
Trent.

Trent Jaeger
IBM T.J. Watson Research Center
19 Skyline Drive, Hawthorne, NY 10532
(914) 784-7225, FAX (914) 784-7225

Herbert Xu [EMAIL PROTECTED]
08/06/2005 03:45 AM

To: Trent Jaeger/Watson/[EMAIL PROTECTED]
cc: [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], netdev@vger.kernel.org, [EMAIL PROTECTED], Serge E Hallyn/Austin/[EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: [PATCH 1/2] LSM-IPSec Networking Hooks -- revised flow cache [resend]

On Tue, Aug 02, 2005 at 02:04:41PM -0400, jaegert wrote:
> Resend of 20 July patch that repaired the flow_cache_lookup
> authorization (now for 2.6.13-rc4-git4).

Thanks Trent. I'm happy with the flow cache stuff now.

However, there are still some technical details to take care of.

> diff -puN include/linux/xfrm.h~lsm-xfrm-nethooks include/linux/xfrm.h
> --- linux-2.6.13-rc4-xfrm/include/linux/xfrm.h~lsm-xfrm-nethooks	2005-08-01 16:11:22.000000000 -0400
> +++ linux-2.6.13-rc4-xfrm-root/include/linux/xfrm.h	2005-08-01 16:11:22.000000000 -0400
> @@ -173,6 +201,7 @@ enum xfrm_attr_type_t {
>  	XFRMA_ALG_CRYPT,	/* struct xfrm_algo */
>  	XFRMA_ALG_COMP,		/* struct xfrm_algo */
>  	XFRMA_ENCAP,		/* struct xfrm_algo + struct xfrm_encap_tmpl */
> +	XFRMA_SEC_CTX,		/* struct xfrm_sec_ctx */
>  	XFRMA_TMPL,		/* 1 or more struct xfrm_user_tmpl */
>  	XFRMA_SA,
>  	XFRMA_POLICY,

Please add it at the end of the enum as otherwise you may break
existing user-space applications. In this particular case the
breakage isn't serious since those three XFRMA types are fairly
recent but still it's better to be safe than sorry :)

> diff -puN include/net/xfrm.h~lsm-xfrm-nethooks include/net/xfrm.h
> --- linux-2.6.13-rc4-xfrm/include/net/xfrm.h~lsm-xfrm-nethooks	2005-08-01 16:11:22.000000000 -0400
> +++ linux-2.6.13-rc4-xfrm-root/include/net/xfrm.h	2005-08-01 16:11:22.000000000 -0400
> @@ -510,6 +514,27 @@ xfrm_selector_match(struct xfrm_selector
>  	return 0;
>  }
>  
> +/* If neither has a context -- match
> +   Otherwise, both must have a context and the sids, doi, alg must match */
> +static inline int xfrm_sec_ctx_match(struct xfrm_sec_ctx *s1, struct xfrm_sec_ctx *s2)
> +{
> +	return ((!s1 && !s2) ||
> +		(s1 && s2 &&
> +		 (s1->ctx_sid == s2->ctx_sid) &&
> +		 (s1->ctx_doi == s2->ctx_doi) &&
> +		 (s1->ctx_alg == s2->ctx_alg)));
> +}

Would it be possible to make this conditional on CONFIG_SECURITY_NETWORK?

> +static inline struct xfrm_sec_ctx *xfrm_policy_security(struct xfrm_policy *xp)
> +{
> +	return (xp ? xp->security : NULL);
> +}
> +
> +static inline struct xfrm_sec_ctx *xfrm_state_security(struct xfrm_state *x)
> +{
> +	return (x ? x->security : NULL);
> +}
> +

Do you really need these NULL checks? If not I'd suggest getting rid
of these altogether. A quick glance at all the users of
xfrm_policy_security in Patch 1 seems to indicate that none of those
places can have xp being NULL.

> diff -puN net/core/flow.c~lsm-xfrm-nethooks net/core/flow.c
> --- linux-2.6.13-rc4-xfrm/net/core/flow.c~lsm-xfrm-nethooks	2005-08-01 16:11:22.000000000 -0400
> +++ linux-2.6.13-rc4-xfrm-root/net/core/flow.c	2005-08-01 16:12:03.000000000 -0400
> @@ -23,6 +23,7 @@
>  #include <net/flow.h>
>  #include <asm/atomic.h>
>  #include <asm/semaphore.h>
> +#include <linux/security.h>

This appears to be unnecessary.

> diff -puN net/ipv4/xfrm4_policy.c~lsm-xfrm-nethooks net/ipv4/xfrm4_policy.c
> --- linux-2.6.13-rc4-xfrm/net/ipv4/xfrm4_policy.c~lsm-xfrm-nethooks	2005-08-01 16:11:22.000000000 -0400
> +++ linux-2.6.13-rc4-xfrm-root/net/ipv4/xfrm4_policy.c	2005-08-01 16:11:22.000000000 -0400
> @@ -36,6 +36,8 @@ __xfrm4_find_bundle(struct flowi *fl, st
>  		if (xdst->u.rt.fl.oif == fl->oif &&	/*XXX*/
>  		    xdst->u.rt.fl.fl4_dst == fl->fl4_dst &&
>  		    xdst->u.rt.fl.fl4_src == fl->fl4_src &&
> +		    xfrm_sec_ctx_match(xfrm_policy_security(policy),
> +				       xfrm_state_security(dst->xfrm)) &&

Is this necessary? The policy's context must've matched the state's
context at its creation time. AFAIK there is no way for the security
context to change during their life-cycle.

> diff -puN net/ipv6/xfrm6_policy.c~lsm-xfrm-nethooks net/ipv6/xfrm6_policy.c
> --- linux-2.6.13-rc4-xfrm/net/ipv6/xfrm6_policy.c~lsm-xfrm-nethooks	2005-08-01 16:11:22.000000000 -0400
> +++ linux-2.6.13-rc4-xfrm-root/net/ipv6/xfrm6_policy.c	2005-08-01 16:11:22.000000000 -0400
> @@ -54,6 +54,8 @@ __xfrm6_find_bundle(struct flowi *fl, st
>  				 xdst->u.rt6.rt6i_src.plen);
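[Editorial sketch, not part of the original mail.] The point about adding XFRMA_SEC_CTX at the end of the enum can be seen with three toy enums. All names are hypothetical and the starting value of 1 is arbitrary; only the ordering mirrors the reviewed hunk. Inserting in the middle renumbers every later attribute, so a user-space binary built against the old header would send the old number for TMPL and have it read as the new attribute. Appending leaves the existing numbers alone.

```c
/* Attribute numbering before the change (illustrative subset). */
enum ex_attr_before {
	B_ALG_CRYPT = 1, B_ALG_COMP, B_ENCAP, B_TMPL, B_SA, B_POLICY
};

/* New attribute inserted mid-enum, as in the reviewed hunk. */
enum ex_attr_inserted {
	M_ALG_CRYPT = 1, M_ALG_COMP, M_ENCAP, M_SEC_CTX, M_TMPL, M_SA, M_POLICY
};

/* New attribute appended at the end, as the review asks. */
enum ex_attr_appended {
	E_ALG_CRYPT = 1, E_ALG_COMP, E_ENCAP, E_TMPL, E_SA, E_POLICY, E_SEC_CTX
};

/* Returns 1 if an old binary's TMPL number still means TMPL. */
static int ex_tmpl_stable_when_inserted(void) { return M_TMPL == B_TMPL; }
static int ex_tmpl_stable_when_appended(void) { return E_TMPL == B_TMPL; }
```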
Re: ICMP broken in 2.6.13-rc5
On Sat, Aug 06, 2005 at 01:25:43PM +0400, Vladimir B. Savkin wrote:
> On Sat, Aug 06, 2005 at 11:13:37AM +0200, Harald Welte wrote:
> > On Sat, Aug 06, 2005 at 02:08:15AM +0400, Vladimir B. Savkin wrote:
> > > I found that it really is NOTRACK that causes bogus ICMP errors.
> >
> > Well, this means that your ICMP errors need to be NAT'ed but they
> > cannot, since the original connection causing the ICMP error did not
> > go through connection tracking.
>
> How so, when there are no NAT rules that can match either source
> packets or ICMP errors?

As soon as you load NAT, _all_ connections need to be tracked, since
those with no NAT configured need to allocate a null binding. NAT needs
to know about all connections, since otherwise it would not be able to
learn about all already-used port/ip tuples.

So independent of the specific ICMP problem you're observing, the
configuration seems broken to me in the first place. It remains an open
question whether we should deal more gracefully with such a setup,
though.

But discussions like this are one of the reasons why we thought very
hard about whether we should include the NOTRACK target in mainline at
all. It is dangerous, and a lot of people will use it in combination
with NAT and end up with a broken configuration.

I think we should make NOTRACK and NAT an XOR, i.e. only allow one of
them to be enabled at any given time.

-- 
- Harald Welte [EMAIL PROTECTED] http://gnumonks.org/
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch. A6)
Re: ICMP broken in 2.6.13-rc5
On Sat, Aug 06, 2005 at 05:12:01PM +0200, Harald Welte wrote:
> > > Well, this means that your ICMP errors need to be NAT'ed but they
> > > cannot, since the original connection causing the ICMP error did not
> > > go through connection tracking.
> >
> > How so, when there are no NAT rules that can match either source
> > packets or ICMP errors?
>
> As soon as you load NAT, _all_ connections need to be tracked, since
> those with no NAT configured need to allocate a null binding. NAT needs
> to know about all connections, since otherwise it would not be able to
> learn about all already-used port/ip tuples.
>
> So independent of the specific ICMP problem you're observing, the
> configuration seems broken to me in the first place. It remains to be
> questioned, whether we should deal more gracefully with such a setup,
> though.

In my case, I have a local network and Internet access. Local traffic
(packets which have both src and dst IP belonging to the local prefix)
does not need to be NATed or statefully filtered. So I wanted to use
NOTRACK for maximum forwarding performance. The ICMP errors were matched
by NOTRACK too (in the OUTPUT chain of the raw table), as they also have
local src and dst. IMO, this means that there should be no NAT attempts
for this ICMP packet...

I think of this as a valuable feature of Linux - using one box for two
or more applications, in my case a local router (no NAT, no stateful
filtering, maximum performance) and an Internet gateway (with NAT, more
filtering, maximum control).

> But the discussion like this are one of the reasons why we thought very
> hard whether we should include the NOTRACK target into mainline at all.
> It is dangerous, and a lot of people will use it in combination and end
> up with broken configuration.
>
> I think we should make NOTRACK and NAT an XOR, i.e. only allow one of
> them to be enabled at any given time.

Well, this would break this feature, which worked very well for me with
older kernels.

~
:wq
With best regards,
Vladimir Savkin.
Re: ICMP broken in 2.6.13-rc5
On Sat, Aug 06, 2005 at 04:58:46PM +0200, Patrick McHardy wrote:
> Harald Welte wrote:
> > On Sat, Aug 06, 2005 at 02:08:15AM +0400, Vladimir B. Savkin wrote:
> > > I found that it really is NOTRACK that causes bogus ICMP errors.
>
> Good work tracking this down. I've seen reports of this before, but
> never found the reason.
>
> > Well, this means that your ICMP errors need to be NAT'ed but they
> > cannot, since the original connection causing the ICMP error did not
> > go through connection tracking.
> >
> > Your not-correctly-NATed ICMP packets are the logical result of this
> > configuration.
> >
> > Use of NOTRACK in combination with NAT is _extremely_ dangerous, and
> > unless you understand its full implications, I would not recommend
> > combining the two.
> >
> > So it seems your use of NOTRACK is invalid in this setup - and thus
> > looks like a configuration problem.
>
> I disagree, NAT already ignores untracked connections in most places,
> just icmp_reply_translation is missing. Vladimir, can you please test
> the attached patch?

No success. It looks like with this patch no ICMP replies are generated
(*), no matter whether any NOTRACK rules exist.

(*) I only tested that no replies were received by the client (broken
tracepath) and that there were no bogus packets on loopback.

> diff --git a/net/ipv4/netfilter/ip_nat_core.c b/net/ipv4/netfilter/ip_nat_core.c
> --- a/net/ipv4/netfilter/ip_nat_core.c
> +++ b/net/ipv4/netfilter/ip_nat_core.c
> @@ -430,6 +430,19 @@ int icmp_reply_translation(struct sk_buf
>  	} *inside;
>  	struct ip_conntrack_tuple inner, target;
>  	int hdrlen = (*pskb)->nh.iph->ihl * 4;
> +	unsigned long statusbit;
> +
> +	if (manip == IP_NAT_MANIP_SRC)
> +		statusbit = IPS_SRC_NAT;
> +	else
> +		statusbit = IPS_DST_NAT;
> +
> +	/* Invert if this is reply dir. */
> +	if (dir == IP_CT_DIR_REPLY)
> +		statusbit ^= IPS_NAT_MASK;
> +
> +	if (!(ct->status & statusbit))
> +		return 0;
>  
>  	if (!skb_make_writable(pskb, hdrlen + sizeof(*inside)))
>  		return 0;

~
:wq
With best regards,
Vladimir Savkin.
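[Editorial sketch, not part of the original mail.] The status-bit selection in the quoted hunk can be tried in isolation. The EX_* names are local stand-ins for the kernel identifiers in the patch, and the bit positions are assumptions made for illustration. The sketch shows only what the check computes: which NAT flag is consulted for a given manipulation type and direction. It makes no claim about whether the patch is correct, and the report above says it did not behave as intended.

```c
/* Local stand-ins for the identifiers in the quoted patch; the numeric
 * bit positions are assumed for this sketch. */
enum {
	EX_IPS_SRC_NAT  = 1 << 4,
	EX_IPS_DST_NAT  = 1 << 5,
	EX_IPS_NAT_MASK = EX_IPS_SRC_NAT | EX_IPS_DST_NAT
};
enum ex_manip { EX_MANIP_SRC, EX_MANIP_DST };
enum ex_dir   { EX_DIR_ORIGINAL, EX_DIR_REPLY };

/* Pick the flag for this manipulation, swapping SRC and DST for
 * packets travelling in the reply direction. */
static unsigned long ex_statusbit(enum ex_manip manip, enum ex_dir dir)
{
	unsigned long statusbit =
		(manip == EX_MANIP_SRC) ? EX_IPS_SRC_NAT : EX_IPS_DST_NAT;

	if (dir == EX_DIR_REPLY)
		statusbit ^= EX_IPS_NAT_MASK;	/* flips SRC <-> DST */
	return statusbit;
}

/* Returns nonzero if the connection status has the selected flag set;
 * the quoted hunk returns early from the function when it does not. */
static int ex_flag_set(unsigned long ct_status,
		       enum ex_manip manip, enum ex_dir dir)
{
	return (ct_status & ex_statusbit(manip, dir)) != 0;
}
```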
Re: atheros driver - desc
Mateusz Berezecki [EMAIL PROTECTED] writes:

> The driver is not yet fully working because I didn't finish kernel
> integration yet. Almost all driver I/O ops are reverse engineered
> independently of openbsd openhal which is missing just too much.
>
> Ok, enough talking. Most of the atheros 5212 hal is now open :)

This is great news. An open source Atheros driver which could be
included in Linux is really needed.

But how was the reverse engineering done? I noticed that the forcedeth
driver was implemented using clean room design[1], and the Linux
Broadcom 4301 driver project[2] seems to be using the same method.

The reason I'm asking is that I wouldn't want to see the same thing
happen with this driver as happened during the reverse engineering of
the pwc Philips Webcam driver (some parts of the driver were removed
from the kernel, but I believe the situation is now solved).

Actually, what are the requirements for getting a reverse engineered
driver included in Linux? Is clean room design an absolute must? It
seems that reverse engineering is needed if we want Linux support for
most of the WLAN cards on the market :(

[1] http://en.wikipedia.org/wiki/Clean_room_design
[2] http://linux-bcom4301.sourceforge.net/go/progress

-- 
Kalle Valo
Re: w1 netlink
On Sat, Aug 06, 2005 at 09:37:00PM +0200, Patrick McHardy ([EMAIL PROTECTED]) wrote:
> I'm working on extending netlink to work with an arbitrary number
> of groups and stumbled over this in the w1 driver:
>
> 	dev->groups = 23
> 	NETLINK_CB(skb).dst_group = dev->groups;
> 	netlink_broadcast(dev->nls, skb, 0, dev->groups, GFP_ATOMIC);
>
> Apparently it wants to send to multiple groups at once, is that
> correct? Why does it need to do so? One limitation introduced by my
> patches will be that broadcasting to multiple groups won't be possible
> anymore and this is the only code in the kernel that uses this feature
> of netlink.

23 was selected arbitrarily - w1 can definitely live without multicast.

As for removing the multicast feature completely - it is quite useful,
so maybe it would be better to make it per-socket. And won't it break
RTMGRP_* messages?

-- 
Evgeniy Polyakov
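[Editorial sketch, not part of the original mail.] The observation that the w1 code sends to multiple groups at once follows from reading 23 as a bitmask: 23 is binary 10111, so four bits are set. The hypothetical helper below lists the set bit positions of such a mask. How bit positions map to group numbers depends on the netlink API in use and is left out.

```c
/*
 * Hypothetical helper: store the zero-based positions of the set bits
 * of a legacy group bitmask in bits[] (at most max entries) and return
 * how many bits are set in total.
 */
static int ex_mask_bits(unsigned int mask, int *bits, int max)
{
	int count = 0;

	for (int pos = 0; mask != 0; pos++, mask >>= 1) {
		if (mask & 1u) {
			if (count < max)
				bits[count] = pos;
			count++;
		}
	}
	return count;
}
```

For the mask 23 this reports bits 0, 1, 2 and 4, i.e. four distinct groups addressed by a single broadcast.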
Re: kfree_skb questions
Daniel Phillips wrote:
> Hi,
>
> The way I read this, __kfree_skb will sometimes be called with
> ->users = 1 and sometimes with ->users = 0, is that right?

Yes.

> static inline void kfree_skb(struct sk_buff *skb)
> {
> 	if (likely(atomic_read(&skb->users) == 1))
> 		smp_rmb();
> 	else if (likely(!atomic_dec_and_test(&skb->users)))
> 		return;
> 	__kfree_skb(skb);
> }
>
> If so, then why not just:
>
> static inline void kfree_skb(struct sk_buff *skb)
> {
> 	if (likely(atomic_read(&skb->users) == 1))
> 		smp_rmb();
> 	if (likely(!atomic_dec_and_test(&skb->users)))
> 		return;
> 	__kfree_skb(skb);
> }
>
> so __kfree_skb can BUG_ON(atomic_read(&skb->users))? Perhaps this has
> something to do with the smp_rmb, could somebody please explain to me
> why it is necessary here, and for which architectures?

The atomic_read is used as an optimization under the assumption that
an atomic_read is cheaper than an atomic_dec_and_test. The smp_rmb
is (was) needed to make sure the CPU didn't reorder things because we
used to have a BUG check in __kfree_skb which triggered if skb->list
was non-NULL.

> Anyway, do we not want BUG_ON(!atomic_read(&skb->users)) at the
> beginning of kfree_skb, since we rely on it?

Why do you care if skb->users is 0 or 1 in __kfree_skb()?
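[Editorial sketch, not part of the original mail.] The fast path discussed above can be modelled in userspace with C11 atomics. The names ex_buf, ex_put and ex_free are hypothetical, and the acquire fence is only a rough stand-in for smp_rmb(). The sketch shows why the final free can observe either count. A sole owner skips the decrement and frees with the count still at 1. When the last reference is dropped through the decrement path instead, the free happens with the count at 0.

```c
#include <stdatomic.h>

/* Toy reference-counted buffer; 'freed' records that ex_free() ran. */
struct ex_buf {
	atomic_int users;
	int freed;
};

static void ex_free(struct ex_buf *b)
{
	b->freed = 1;
}

/* Mirrors the shape of the quoted kfree_skb(). */
static void ex_put(struct ex_buf *b)
{
	if (atomic_load(&b->users) == 1) {
		/* Sole owner: skip the atomic decrement, count stays 1. */
		atomic_thread_fence(memory_order_acquire);
	} else if (atomic_fetch_sub(&b->users, 1) != 1) {
		/* Old value was not 1, so other references remain. */
		return;
	}
	ex_free(b);
}
```

atomic_fetch_sub() returns the old value, so comparing it with 1 plays the role of atomic_dec_and_test() here.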
Re: kfree_skb questions
On Sunday 07 August 2005 06:26, Patrick McHardy wrote:
> > Anyway, do we not want BUG_ON(!atomic_read(&skb->users)) at the
> > beginning of kfree_skb, since we rely on it?
>
> Why do you care if skb->users is 0 or 1 in __kfree_skb()?

Because I am a neatness freak and I like to check things that
inattentive coders can easily get wrong.

But the question above is not about that, it is about checking for
possible calls where skb->users is already zero and thereby catching
the double free early instead of letting it slide further into the
innards of the machine.

Regards,

Daniel
[PATCH] reorganize include/linux/dccp.h
Hi Arnaldo!

The protocol header files in linux/foo.h are usually structured in a
way to be included by userspace code. The top section consists of
general protocol structure definitions, typedefs, enums - followed by
an #ifdef __KERNEL__ section.

Currently linux/dccp.h doesn't follow that convention and can therefore
not be used from userspace. However, e.g. iptables' libipt_dccp.c
actually needs various definitions.

Below is a proposed patch to clean up dccp.h. Please review and
consider applying it. Thanks!

[the iptables ipt_dccp patch applies cleanly on top of this - but not
the other way around]

Cheers,
Harald

-- 
- Harald Welte [EMAIL PROTECTED] http://gnumonks.org/
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch. A6)

[DCCP] make linux/dccp.h include-able from userspace

The protocol header files in linux/foo.h are usually structured in a
way to be included by userspace code. The top section consists of
general protocol structure definitions, typedefs, enums - followed by
an #ifdef __KERNEL__ section.

Currently linux/dccp.h doesn't follow that convention and can therefore
not be used from userspace. However, for example iptables'
libipt_dccp.c actually needs various definitions from there.

Signed-off-by: Harald Welte [EMAIL PROTECTED]

---
commit 328f1df306bf5ae317d399d15146daae7bbd8477
tree 2d5da11ab69a35124755f95ef8f6a61ff492b935
parent 627c49af0423f8f48a2f467c8b69f746ef1891bc
author Harald Welte [EMAIL PROTECTED] Sa, 06 Aug 2005 23:17:00 +0200
committer Harald Welte [EMAIL PROTECTED] Sa, 06 Aug 2005 23:17:00 +0200

 include/linux/dccp.h |  238 +-
 1 files changed, 121 insertions(+), 117 deletions(-)

diff --git a/include/linux/dccp.h b/include/linux/dccp.h
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -1,16 +1,8 @@
 #ifndef _LINUX_DCCP_H
 #define _LINUX_DCCP_H
 
-#include <linux/in.h>
-#include <linux/list.h>
 #include <linux/types.h>
-#include <linux/uio.h>
-#include <linux/workqueue.h>
-
-#include <net/inet_connection_sock.h>
-#include <net/sock.h>
-#include <net/tcp_states.h>
-#include <net/tcp.h>
+#include <asm/byteorder.h>
 
 /* FIXME: this is utterly wrong */
 struct sockaddr_dccp {
@@ -18,40 +10,6 @@ struct sockaddr_dccp {
 	unsigned int	service;
 };
 
-enum dccp_state {
-	DCCP_OPEN	= TCP_ESTABLISHED,
-	DCCP_REQUESTING	= TCP_SYN_SENT,
-	DCCP_PARTOPEN	= TCP_FIN_WAIT1, /* FIXME:
-					    This mapping is horrible, but TCP has
-					    no matching state for DCCP_PARTOPEN,
-					    as TCP_SYN_RECV is already used by
-					    DCCP_RESPOND, why don't stop using TCP
-					    mapping of states? OK, now we don't use
-					    sk_stream_sendmsg anymore, so doesn't
-					    seem to exist any reason for us to
-					    do the TCP mapping here */
-	DCCP_LISTEN	= TCP_LISTEN,
-	DCCP_RESPOND	= TCP_SYN_RECV,
-	DCCP_CLOSING	= TCP_CLOSING,
-	DCCP_TIME_WAIT	= TCP_TIME_WAIT,
-	DCCP_CLOSED	= TCP_CLOSE,
-	DCCP_MAX_STATES = TCP_MAX_STATES,
-};
-
-#define DCCP_STATE_MASK 0xf
-#define DCCP_ACTION_FIN (1<<7)
-
-enum {
-	DCCPF_OPEN	 = TCPF_ESTABLISHED,
-	DCCPF_REQUESTING = TCPF_SYN_SENT,
-	DCCPF_PARTOPEN	 = TCPF_FIN_WAIT1,
-	DCCPF_LISTEN	 = TCPF_LISTEN,
-	DCCPF_RESPOND	 = TCPF_SYN_RECV,
-	DCCPF_CLOSING	 = TCPF_CLOSING,
-	DCCPF_TIME_WAIT	 = TCPF_TIME_WAIT,
-	DCCPF_CLOSED	 = TCPF_CLOSE,
-};
-
 /**
  * struct dccp_hdr - generic part of DCCP packet header
  *
@@ -94,11 +52,6 @@ struct dccp_hdr {
 #endif
 };
 
-static inline struct dccp_hdr *dccp_hdr(const struct sk_buff *skb)
-{
-	return (struct dccp_hdr *)skb->h.raw;
-}
-
 /**
  * struct dccp_hdr_ext - the low bits of a 48 bit seq packet
  *
@@ -108,34 +61,6 @@ struct dccp_hdr_ext {
 	__u32	dccph_seq_low;
 };
 
-static inline struct dccp_hdr_ext *dccp_hdrx(const struct sk_buff *skb)
-{
-	return (struct dccp_hdr_ext *)(skb->h.raw + sizeof(struct dccp_hdr));
-}
-
-static inline unsigned int dccp_basic_hdr_len(const struct sk_buff *skb)
-{
-	const struct dccp_hdr *dh = dccp_hdr(skb);
-	return sizeof(*dh) + (dh->dccph_x ? sizeof(struct dccp_hdr_ext) : 0);
-}
-
-static inline __u64 dccp_hdr_seq(const struct sk_buff *skb)
-{
-	const struct dccp_hdr *dh = dccp_hdr(skb);
-#if defined(__LITTLE_ENDIAN_BITFIELD)
-	__u64 seq_nr = ntohl(dh->dccph_seq << 8);
-#elif defined(__BIG_ENDIAN_BITFIELD)
-	__u64 seq_nr = ntohl(dh->dccph_seq);
-#else
-#error Adjust your
Re: [PATCH] reorganize include/linux/dccp.h
On 8/6/05, Harald Welte [EMAIL PROTECTED] wrote:
> Hi Arnaldo!
>
> The protocol header files in linux/foo.h are usually structured in a
> way to be included by userspace code. The top section consists of
> general protocol structure definitions, typedefs, enums - followed by
> an #ifdef __KERNEL__ section.
>
> Currently linux/dccp.h doesn't follow that convention and can therefore
> not be used from userspace. However, e.g. iptables' libipt_dccp.c
> actually needs various definitions.
>
> Below is a proposed patch to clean up dccp.h. Please review and
> consider applying it. Thanks!
>
> [the iptables ipt_dccp patch applies cleanly on top of this - but not
> the other way around]

OK, I'm applying both patches, just had to add an include for
linux/in.h that was missing, thanks!

- Arnaldo
Re: atheros driver - desc
Kalle Valo [EMAIL PROTECTED] wrote:
|
| This is great news. An open source Atheros driver which could be
| included to Linux is really needed.
|
| But how was the reverse engineering done? I noticed that forcedeth
| driver was implemented using the clean room design[1] and Linux
| Broadcom 4301 driver project[2] seems to be using the same method.

Reverse engineering was done by disassembling the binary HAL and, in
the harder parts, by running it in userspace (yes, that is possible)
and analysing its input and the output it produced.

The crucial part was to discover the meaning of the hidden part of the
structure describing device state. Now that this is done, it will be
little or no problem for me to provide updates for this driver, unless
the whole binary HAL changes dramatically. That's one of the reasons I
do this work myself.

|
| The reason I'm asking this is that I just wouldn't want see the same
| happening this with this driver as happened during reverse engineering
| of pwc Philips Webcam driver (some parts of the driver were removed
| from kernel, but I believe the situation is now solved).

If I get into trouble I'll write documentation :-) I promise.

| Actually, what are requirements to get a reverse engineered driver
| included to Linux? Is clean room design an absolute must? It seems
| that reverse engineering is needed if we want Linux support for most
| of the WLAN cards on the market :(
|

Sad but true. The problem is not on the vendors' side though. Look at
the FCC regulations... :/

kind regards
Mateusz
Re: ANNOUNCE: Linux DCCP implementation merged
On Sat, Aug 06, 2005 at 06:57:15AM -0300, Arnaldo Carvalho de Melo wrote:
> Hi Guys,
>
> I'm very pleased to announce that the Linux 2.6 DCCP implementation
> has been merged in David Miller's net-2.6.14.git tree, and should
> appear shortly on Andrew Morton's 2.6.13-rcLATEST-mm tree and finally
> in mainline when Linus starts 2.6.14.

great ;)

> Now to work on:
>
> 1. Getting the DCCP CCID infrastructure closer to the TCP Congestion
>    Avoidance one
[...]
> 10. Implement iptables header matching for DCCP (see attached patch)

I've attached an (untested) patch for basic iptables support. Please
review (esp. the option matching part) and consider applying it to your
tree (or tell me to submit it to davem).

Current iptables from svn.netfilter.org has the required userspace
support (and even a manpage snippet).

> 11. Implement connection tracking and NAT for DCCP in
>     netfilter/iptables.

To the best of my knowledge, we're the only stateful packet filter that
does SCTP so far... would be great to have DCCP support, too.

Since you know the state transitions and other aspects of the DCCP
protocol well, it would be great to see ip_conntrack_proto_dccp.c (or
even better: nf_conntrack_proto_dccp.c) at some point :)

Cheers,
Harald

-- 
- Harald Welte [EMAIL PROTECTED] http://gnumonks.org/
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch. A6)

[NETFILTER] New iptables DCCP protocol header match

Using this new iptables DCCP protocol header match, it is possible to
create simplistic stateless packet filtering rules for DCCP. It permits
matching of port numbers, packet type and options.

Signed-off-by: Harald Welte [EMAIL PROTECTED]

---
commit 6e79d96f764001a225dea95bf84bcd9fef35476f
tree 5612cf3c9196b1e59bc0dcf8eb9e51c331f1aba3
parent c16fd4ffed6349d0888cd97a75d04394dac42021
author Harald Welte [EMAIL PROTECTED] Sa, 06 Aug 2005 20:48:01 +0200
committer Harald Welte [EMAIL PROTECTED] Sa, 06 Aug 2005 20:48:01 +0200

 include/linux/dccp.h                    |   16 ++-
 include/linux/netfilter_ipv4/ipt_dccp.h |   23
 net/ipv4/netfilter/Kconfig              |   11 ++
 net/ipv4/netfilter/Makefile             |    1
 net/ipv4/netfilter/ipt_dccp.c           |  176 +++
 5 files changed, 224 insertions(+), 3 deletions(-)

diff --git a/include/linux/dccp.h b/include/linux/dccp.h
--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -113,10 +113,15 @@ static inline struct dccp_hdr_ext *dccp_
 	return (struct dccp_hdr_ext *)(skb->h.raw + sizeof(struct dccp_hdr));
 }
 
+static inline unsigned int __dccp_basic_hdr_len(const struct dccp_hdr *dh)
+{
+	return sizeof(*dh) + (dh->dccph_x ? sizeof(struct dccp_hdr_ext) : 0);
+}
+
 static inline unsigned int dccp_basic_hdr_len(const struct sk_buff *skb)
 {
 	const struct dccp_hdr *dh = dccp_hdr(skb);
-	return sizeof(*dh) + (dh->dccph_x ? sizeof(struct dccp_hdr_ext) : 0);
+	return __dccp_basic_hdr_len(dh);
 }
 
 static inline __u64 dccp_hdr_seq(const struct sk_buff *skb)
@@ -249,10 +254,15 @@ static inline unsigned int dccp_packet_h
 	return sizeof(struct dccp_hdr_reset);
 }
 
+static inline unsigned int __dccp_hdr_len(const struct dccp_hdr *dh)
+{
+	return __dccp_basic_hdr_len(dh) +
+	       dccp_packet_hdr_len(dh->dccph_type);
+}
+
 static inline unsigned int dccp_hdr_len(const struct sk_buff *skb)
 {
-	return dccp_basic_hdr_len(skb) +
-	       dccp_packet_hdr_len(dccp_hdr(skb)->dccph_type);
+	return __dccp_hdr_len(dccp_hdr(skb));
 }
 
 enum dccp_reset_codes {
diff --git a/include/linux/netfilter_ipv4/ipt_dccp.h b/include/linux/netfilter_ipv4/ipt_dccp.h
new file mode 100644
--- /dev/null
+++ b/include/linux/netfilter_ipv4/ipt_dccp.h
@@ -0,0 +1,23 @@
+#ifndef _IPT_DCCP_H_
+#define _IPT_DCCP_H_
+
+#define IPT_DCCP_SRC_PORTS	0x01
+#define IPT_DCCP_DEST_PORTS	0x02
+#define IPT_DCCP_TYPE		0x04
+#define IPT_DCCP_OPTION		0x08
+
+#define IPT_DCCP_VALID_FLAGS	0x0f
+
+struct ipt_dccp_info {
+	u_int16_t dpts[2];	/* Min, Max */
+	u_int16_t spts[2];	/* Min, Max */
+
+	u_int16_t flags;
+	u_int16_t invflags;
+
+	u_int16_t typemask;
+	u_int8_t option;
+};
+
+#endif /* _IPT_DCCP_H_ */
+
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -354,6 +354,17 @@ config IP_NF_MATCH_SCTP
 	  If you want to compile it as a module, say M here and read
 	  <file:Documentation/modules.txt>. If unsure, say `N'.
 
+config IP_NF_MATCH_DCCP
+	tristate 'DCCP protocol match support'
+	depends on IP_NF_IPTABLES
+	help
+	  With this option enabled, you will be able to use the iptables
+
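[Editorial sketch, not part of the original mail.] The dccp.h part of the patch splits each length helper in two: a variant that takes a header pointer, and the existing skb-based one, which now delegates to it. Presumably this lets code that holds only a pointer to a (possibly copied) header reuse the calculation. The toy types below are hypothetical and do not reproduce the real DCCP header layout; they only show the delegation pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; field names and sizes are illustrative only. */
struct ex_dccp_hdr {
	uint16_t sport;
	uint16_t dport;
	uint8_t  x;		/* nonzero: extended sequence numbers follow */
	uint8_t  type;
};

struct ex_dccp_hdr_ext {
	uint32_t seq_low;
};

/* Toy packet buffer holding a pointer to its transport header. */
struct ex_skb {
	const struct ex_dccp_hdr *transport;
};

/* Header-pointer variant: usable without a packet buffer. */
static size_t ex_basic_hdr_len_hdr(const struct ex_dccp_hdr *dh)
{
	return sizeof(*dh) + (dh->x ? sizeof(struct ex_dccp_hdr_ext) : 0);
}

/* Buffer variant: thin wrapper, so the logic lives in one place. */
static size_t ex_basic_hdr_len(const struct ex_skb *skb)
{
	return ex_basic_hdr_len_hdr(skb->transport);
}
```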
[RFC] Net vm deadlock fix, version 4
Hi,

This patch fills in some missing pieces:

  * Support v4 udp: same as v4 tcp, when in reserve, drop packets on
    noncritical sockets

  * Support v4 icmp: when in reserve, drop icmp traffic

  * Add reserve skb support to e1000 driver

  * API for dropping packets before delivery (dev_drop_skb)

  * Atomic_t for reserve accounting

Now ready for proof-of-concept testing. High level API boilerplate
will come later.

Regards,

Daniel

diff -up --recursive 2.6.12.3.clean/drivers/net/e1000/e1000_main.c 2.6.12.3/drivers/net/e1000/e1000_main.c
--- 2.6.12.3.clean/drivers/net/e1000/e1000_main.c	2005-07-15 17:18:57.000000000 -0400
+++ 2.6.12.3/drivers/net/e1000/e1000_main.c	2005-08-06 16:46:13.000000000 -0400
@@ -3242,7 +3242,7 @@ e1000_alloc_rx_buffers_ps(struct e1000_a
 				cpu_to_le64(ps_page_dma->ps_page_dma[j]);
 		}
 
-		skb = dev_alloc_skb(adapter->rx_ps_bsize0 + NET_IP_ALIGN);
+		skb = dev_memalloc_skb(netdev, adapter->rx_ps_bsize0 + NET_IP_ALIGN);
 
 		if(unlikely(!skb))
 			break;
@@ -3253,8 +3253,6 @@ e1000_alloc_rx_buffers_ps(struct e1000_a
 		 */
 		skb_reserve(skb, NET_IP_ALIGN);
 
-		skb->dev = netdev;
-
 		buffer_info->skb = skb;
 		buffer_info->length = adapter->rx_ps_bsize0;
 		buffer_info->dma = pci_map_single(pdev, skb->data,
diff -up --recursive 2.6.12.3.clean/include/linux/gfp.h 2.6.12.3/include/linux/gfp.h
--- 2.6.12.3.clean/include/linux/gfp.h	2005-07-15 17:18:57.000000000 -0400
+++ 2.6.12.3/include/linux/gfp.h	2005-08-05 21:53:09.000000000 -0400
@@ -39,6 +39,7 @@ struct vm_area_struct;
 #define __GFP_COMP	0x4000u	/* Add compound page metadata */
 #define __GFP_ZERO	0x8000u	/* Return zeroed page on success */
 #define __GFP_NOMEMALLOC 0x10000u /* Don't use emergency reserves */
+#define __GFP_MEMALLOC	0x20000u /* Use emergency reserves */
 
 #define __GFP_BITS_SHIFT 20	/* Room for 20 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
diff -up --recursive 2.6.12.3.clean/include/linux/netdevice.h 2.6.12.3/include/linux/netdevice.h
--- 2.6.12.3.clean/include/linux/netdevice.h	2005-07-15 17:18:57.000000000 -0400
+++ 2.6.12.3/include/linux/netdevice.h	2005-08-06 16:37:14.000000000 -0400
@@ -371,6 +371,8 @@ struct net_device
 	struct Qdisc		*qdisc_ingress;
 	struct list_head	qdisc_list;
 	unsigned long		tx_queue_len;	/* Max frames per queue allowed */
+	int			rx_reserve;
+	atomic_t		rx_reserve_used;
 
 	/* ingress path synchronizer */
 	spinlock_t		ingress_lock;
@@ -662,6 +664,49 @@ static inline void dev_kfree_skb_any(str
 		dev_kfree_skb(skb);
 }
 
+/*
+ * Support for critical network IO under low memory conditions
+ */
+static inline int dev_reserve_used(struct net_device *dev)
+{
+	return atomic_read(&dev->rx_reserve_used);
+}
+
+static inline struct sk_buff *__dev_memalloc_skb(struct net_device *dev,
+	unsigned length, int gfp_mask)
+{
+	struct sk_buff *skb = __dev_alloc_skb(length, gfp_mask);
+	if (skb)
+		goto done;
+	if (dev_reserve_used(dev) >= dev->rx_reserve)
+		return NULL;
+	if (!__dev_alloc_skb(length, gfp_mask|__GFP_MEMALLOC))
+		return NULL;;
+	atomic_inc(&dev->rx_reserve_used);
+done:
+	skb->dev = dev;
+	return skb;
+}
+
+static inline struct sk_buff *dev_memalloc_skb(struct net_device *dev,
+	unsigned length)
+{
+	return __dev_memalloc_skb(dev, length, GFP_ATOMIC);
+}
+
+static inline void dev_unreserve(struct net_device *dev)
+{
+	if (atomic_dec_return(&dev->rx_reserve_used) < 0)
+		atomic_inc(&dev->rx_reserve_used);
+}
+
+static inline void dev_drop_skb(struct sk_buff *skb)
+{
+	struct net_device *dev = skb->dev;
+	__kfree_skb(skb);
+	dev_unreserve(dev);
+}
+
 #define HAVE_NETIF_RX 1
 extern int netif_rx(struct sk_buff *skb);
 extern int netif_rx_ni(struct sk_buff *skb);
diff -up --recursive 2.6.12.3.clean/include/net/sock.h 2.6.12.3/include/net/sock.h
--- 2.6.12.3.clean/include/net/sock.h	2005-07-15 17:18:57.000000000 -0400
+++ 2.6.12.3/include/net/sock.h	2005-08-05 21:53:09.000000000 -0400
@@ -382,6 +382,7 @@ enum sock_flags {
 	SOCK_NO_LARGESEND, /* whether to sent large segments or not */
 	SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
 	SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
+	SOCK_MEMALLOC, /* protocol can use memalloc reserve */
 };
 
 static inline void sock_set_flag(struct sock *sk, enum sock_flags flag)
@@ -399,6 +400,11 @@ static inline int sock_flag(struct sock
 	return test_bit(flag, &sk->sk_flags);
 }
 
+static inline int is_memalloc_sock(struct sock *sk)
+{
+	return
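[Editorial sketch, not part of the original mail.] The per-device reserve accounting can be modelled in userspace as below. All ex_* names are hypothetical, malloc() stands in for the skb allocator, and a global flag simulates memory pressure. One deliberate difference from the hunk as posted: the sketch stores the result of the second (reserve) allocation in skb. The posted __dev_memalloc_skb appears to test that result without keeping it, which would leave skb NULL at the `done:` label. As in the posted code, the limit check and the increment are separate steps, so the limit is approximate under concurrency.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct ex_dev {
	int rx_reserve;			/* max buffers taken from reserve */
	atomic_int rx_reserve_used;	/* buffers currently outstanding */
};

struct ex_skb {
	struct ex_dev *dev;
};

/* Simulated memory pressure: when set, only reserve allocations succeed. */
static int ex_normal_pool_empty;

static struct ex_skb *ex_alloc(int use_reserve)
{
	if (ex_normal_pool_empty && !use_reserve)
		return NULL;
	return malloc(sizeof(struct ex_skb));
}

/* Try a normal allocation first, then dip into the bounded reserve. */
static struct ex_skb *ex_memalloc_skb(struct ex_dev *dev)
{
	struct ex_skb *skb = ex_alloc(0);

	if (!skb) {
		if (atomic_load(&dev->rx_reserve_used) >= dev->rx_reserve)
			return NULL;	/* reserve quota exhausted */
		skb = ex_alloc(1);	/* keep the result (see note above) */
		if (!skb)
			return NULL;
		atomic_fetch_add(&dev->rx_reserve_used, 1);
	}
	skb->dev = dev;
	return skb;
}

/* Give one unit back, never letting the count drop below zero. */
static void ex_unreserve(struct ex_dev *dev)
{
	if (atomic_fetch_sub(&dev->rx_reserve_used, 1) <= 0)
		atomic_fetch_add(&dev->rx_reserve_used, 1);
}

/* Drop a buffer before delivery and return its unit to the device. */
static void ex_drop_skb(struct ex_skb *skb)
{
	struct ex_dev *dev = skb->dev;

	free(skb);
	ex_unreserve(dev);
}
```

Note that, like dev_drop_skb in the patch, ex_drop_skb decrements the count whether or not that particular buffer came from the reserve; the design only counts outstanding reserve allocations and does not mark individual buffers.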
Re: [PATCH] netpoll can lock up on low memory.
On Sat, Aug 06, 2005 at 09:58:27AM +0200, Ingo Molnar wrote:
> btw., the current NR_SKBS 32 in netpoll.c seems quite low, especially
> e1000 can have a whole lot more skbs queued at once. Might be more
> robust to increase it to 128 or 256?

Not sure that the card's queueing really makes a difference. It either
eventually releases the queued SKBs or it doesn't.

What's more important is that we be able to survive bursts like the
output of sysrq-t. This seems to work already.

-- 
Mathematics is the supreme nostalgia of our time.