[PATCH][UNIX] EOF on non-blocking SOCK_SEQPACKET
Hi, I am not absolutely sure whether this actually is a bug (as in: I've got no clue what the standards say or what other implementations do), but at least I was pretty surprised when I noticed that a recv() on a non-blocking unix domain socket of type SOCK_SEQPACKET (which is connection oriented, after all) where the remote end has closed the connection returned -1 (EAGAIN) rather than 0 to indicate end of file. This is a test case: | #include sys/types.h | #include unistd.h | #include sys/socket.h | #include sys/un.h | #include fcntl.h | #include string.h | #include stdlib.h | | int main(){ | int sock; | struct sockaddr_un addr; | char buf[4096]; | int pfds[2]; | | pipe(pfds); | sock=socket(PF_UNIX,SOCK_SEQPACKET,0); | addr.sun_family=AF_UNIX; | strcpy(addr.sun_path,/tmp/foobar_testsock); | bind(sock,(struct sockaddr *)addr,sizeof(addr)); | listen(sock,1); | if(fork()){ | close(sock); | sock=socket(PF_UNIX,SOCK_SEQPACKET,0); | connect(sock,(struct sockaddr *)addr,sizeof(addr)); | fcntl(sock,F_SETFL,fcntl(sock,F_GETFL)|O_NONBLOCK); | close(pfds[1]); | read(pfds[0],buf,sizeof(buf)); | recv(sock,buf,sizeof(buf),0); // -- this one | }else accept(sock,NULL,NULL); | exit(0); | } If you try it, make sure /tmp/foobar_testsock doesn't exist. The marked recv() returns -1 (EAGAIN) on 2.6.23.9. Below you find a patch that fixes that. Florian --- Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED] diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index a05c342..fa0aec5 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1632,8 +1632,13 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock, mutex_lock(u-readlock); skb = skb_recv_datagram(sk, flags, noblock, err); - if (!skb) + if (!skb) { + unix_state_lock(sk); + if (sk-sk_type==SOCK_SEQPACKET err==-EAGAIN (sk-sk_shutdownRCV_SHUTDOWN)) + err=0; /* signal EOF on disconnected non-blocking SEQPACKET socket */ + unix_state_unlock(sk); goto out_unlock; + } wake_up_interruptible(u-peer_wait); - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RESEND][PATCH 1/3] PPPoE: improved hashing routine
Hi, -static int hash_item(unsigned long sid, unsigned char *addr) +#if 8%PPPOE_HASH_BITS +#error 8 must be a multiple of PPPOE_HASH_BITS +#endif Since PPPOE_HASH_BITS is 4 I would think this check will break the build. :-) Erm, I thought that 8 was 4*2, but maybe I didn't quite understand that natural numbers business ;-) Anyways, I looked at this myself and half of the problem comes from the fact that sid is passed around as an unsigned long throughout this entire file yet the thing is just a __u16. So the first thing to fix is to use __u16 consistently for sid values. Then the sid hash looks obviously wrong and is easy to fix. Then you end up with a hash_item() that simply looks like: static unsigned int hash_item(__u16 sid, unsigned char *addr) { unsigned int hash; hash = (((unsigned int)addr[0] 24) | ((unsigned int)addr[1] 16) | ((unsigned int)addr[2] 8) | ((unsigned int)addr[3] 0)); hash ^= (((unsigned int)addr[4] 8) | ((unsigned int)addr[5] 0)); hash ^= sid; return ((hash ^ (hash 8) ^ (hash 16)) (PPPOE_HASH_SIZE - 1)); } which is what I've checked into my tree. Erm, I'd say this not only produces different results than the old version, but it also produces wrong results, in that it ignores quite a bit of the data that's supposed to be hashed. If I didn't overlook something, it only considers addr0x0f0f0f0f0f00 and sid0x0f0f, given the right endianness of addr and that PPPOE_HASH_SIZE stays 16. Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RESEND][PATCH 1/3] PPPoE: improved hashing routine
Hi, Erm, I'd say this not only produces different results than the old version, but it also produces wrong results, in that it ignores quite a bit of the data that's supposed to be hashed. If I didn't overlook something, it only considers addr0x0f0f0f0f0f00 and sid0x0f0f, given the right endianness of addr and that PPPOE_HASH_SIZE stays 16. You're right, I need to fix the shifts up. How does this version look? hash ^= (hash 4) ^ (hash 12) ^ (hash 20); return (head ^ (hash 8) ^ (hash 24)) (PPPOE_HASH_SIZE - 1); Assuming that it was supposed to read s/head/hash/: Same disclaimers apply, but I'd say this considers only addr0xff0fff0f000f and sid0x0fff, so, well, yes, it's better, but still not quite what I think it should be ;-) Actually it might be simpler and more efficient to just make PPPOE_HASH_SHIFT be 8. SHIFT? SIZE? BITS? Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RESEND][PATCH 1/3] PPPoE: improved hashing routine
Hi, Actually it might be simpler and more efficient to just make PPPOE_HASH_SHIFT be 8. SHIFT? SIZE? BITS? You know what I meant :-) PPPOE_HASH_BITS. Actually, I wasn't sure, for SHIFT looks more similar to SIZE than to BITS, plus numbers are somewhat same order of magnitude anyway, and I didn't quite see what you were up to, so ... whatever ;-) I guess otherwise we degenerate back to your original patch :) Well, as I don't quite see what you were up to and as my patch doesn't change anything about the semantics of the code, I'd at least prefer that over your other suggestions so far ;-) A few variations I tried back when I created the patch, using larger things than a char for accumulating the pieces and then folding down from that, turned out to be slower than what I finally submitted, at least on the machines I tested it on. I didn't do comprehensive testing, as it really doesn't matter, after all, but I think the version I submitted is pretty fast, plus it's quite readable, and it keeps the flexibility of different hash sizes, but still should allow the compiler to optimize away the loops that allow for this flexibility. Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RESEND][PATCH 1/3] PPPoE: improved hashing routine
Hi, I'm not sure whether this is really worth it, but it looked so extremely inefficient that I couldn't resist - so let's hope providers will keep PPPoE around for a while, at least until terabit dsl ;-) The new code produces the same results as the old version and is ~ 3 to 6 times faster for 4-bit hashes on the CPUs I tested. Florian --- Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED] diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 9e51fcc..954328c 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -108,19 +108,24 @@ static inline int cmp_addr(struct pppoe_addr *a, unsigned long sid, char *addr) (memcmp(a-remote,addr,ETH_ALEN) == 0)); } -static int hash_item(unsigned long sid, unsigned char *addr) +#if 8%PPPOE_HASH_BITS +#error 8 must be a multiple of PPPOE_HASH_BITS +#endif + +static int hash_item(unsigned int sid, unsigned char *addr) { - char hash = 0; - int i, j; + unsigned char hash = 0; + unsigned int i; - for (i = 0; i ETH_ALEN ; ++i) { - for (j = 0; j 8/PPPOE_HASH_BITS ; ++j) { - hash ^= addr[i] ( j * PPPOE_HASH_BITS ); - } + for (i = 0 ; i ETH_ALEN ; i++) { + hash ^= addr[i]; + } + for (i = 0 ; i sizeof(sid_t)*8 ; i += 8 ){ + hash ^= sidi; + } + for (i = 8 ; (i=1) = PPPOE_HASH_BITS ; ) { + hash ^= hashi; } - - for (i = 0; i (sizeof(unsigned long)*8) / PPPOE_HASH_BITS ; ++i) - hash ^= sid (i*PPPOE_HASH_BITS); return hash ( PPPOE_HASH_SIZE - 1 ); } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RESEND][PATCH 2/3] PPPoX/E: return ENOTTY on unknown ioctl requests
Hi, here another patch for the PPPoX/E code that makes sure that ENOTTY is returned for unknown ioctl requests rather than 0 (and removes another unneeded initializer which I didn't bother creating a separate patch for). Florian --- Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED] diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 954328c..9554924 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -669,8 +669,8 @@ static int pppoe_ioctl(struct socket *sock, unsigned int cmd, { struct sock *sk = sock-sk; struct pppox_sock *po = pppox_sk(sk); - int val = 0; - int err = 0; + int val; + int err; switch (cmd) { case PPPIOCGMRU: @@ -759,8 +759,9 @@ static int pppoe_ioctl(struct socket *sock, unsigned int cmd, err = 0; break; - default:; - }; + default: + err = -ENOTTY; + } return err; } diff --git a/drivers/net/pppox.c b/drivers/net/pppox.c index 3f8115d..51de561 100644 --- a/drivers/net/pppox.c +++ b/drivers/net/pppox.c @@ -72,7 +72,7 @@ int pppox_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { struct sock *sk = sock-sk; struct pppox_sock *po = pppox_sk(sk); - int rc = 0; + int rc; lock_sock(sk); @@ -93,12 +93,9 @@ int pppox_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) break; } default: - if (pppox_protos[sk-sk_protocol]-ioctl) - rc = pppox_protos[sk-sk_protocol]-ioctl(sock, cmd, - arg); - - break; - }; + rc = pppox_protos[sk-sk_protocol]-ioctl ? + pppox_protos[sk-sk_protocol]-ioctl(sock, cmd, arg) : -ENOTTY; + } release_sock(sk); return rc; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RESEND][PATCH 3/3] PPPoE: move lock_sock() in pppoe_sendmsg() to the right location
Hi, and the last one for now: Acquire the sock lock in pppoe_sendmsg() before accessing the sock - and in particular avoid releasing the lock even though it hasn't been acquired. Florian --- Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED] diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 9554924..eef8a5b 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -779,6 +779,7 @@ static int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock, struct net_device *dev; char *start; + lock_sock(sk); if (sock_flag(sk, SOCK_DEAD) || !(sk-sk_state PPPOX_CONNECTED)) { error = -ENOTCONN; goto end; @@ -789,8 +790,6 @@ static int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock, hdr.code = 0; hdr.sid = po-num; - lock_sock(sk); - dev = po-pppoe_dev; error = -EMSGSIZE; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] PPPoE: race between interface going down and release()
Hi, This patch doesn't apply becuase in the actual pppoe.c code: [...] So I'm having trouble figuring out what tree you generated that patch against :-) Perhaps there was an earlier patch I missed or something. But I won't second guess and leave it to you to let me know what I should actually apply. That one was against my first try of a fix. So, either you apply that one first and then remove it again by applying that patch ;-) - or just ignore those four patches completely and apply [EMAIL PROTECTED] ff instead. Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5 2.6.21-rc4] l2tp: pppol2tp core
Hi, Florian Zumbiehl wrote: Hi, + * 251003 :Copied from pppoe.c version 0.6.9. you might want to have a look at the patches to the PPPoE code that were posted to netdev recently, as some of them seem to apply to code that's left over from pppoe.c. Do you mean this change? * 070228 : Fix to allow multiple sessions with same remote MAC and same *session id by including the local device ifindex in the *tuple identifying a session. This also ensures packets can't * be injected into a session from interfaces other than the * one specified by userspace. Florian Zumbiehl [EMAIL PROTECTED] The pppoe session hash table isn't used in the pppol2tp code so the above change doesn't affect pppol2tp. I rechecked the latest pppoe.c and I don't see other changes that haven't already been incorporated in pppol2tp. Which specific code in pppol2tp do you think needs to be updated? I meant those that haven't been committed yet (AFAIK): http://www.spinics.net/lists/netdev/msg26926.html http://www.spinics.net/lists/netdev/msg26927.html http://www.spinics.net/lists/netdev/msg26928.html http://www.spinics.net/lists/netdev/msg26751.html http://www.spinics.net/lists/netdev/msg26752.html http://www.spinics.net/lists/netdev/msg26753.html http://www.spinics.net/lists/netdev/msg26754.html http://www.spinics.net/lists/netdev/msg26787.html (Plus the respective followup postings, of course.) Some probably don't apply. I didn't have a very detailed look at your code, but some certainly do apply. Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] PPPoE: improved hashing routine
Hi, I'm not sure whether this is really worth it, but it looked so extremely inefficient that I couldn't resist - so let's hope providers will keep PPPoE around for a while, at least until terabit dsl ;-) The new code produces the same results as the old version and is ~ 3 to 6 times faster for 4-bit hashes on the CPUs I tested. Florian --- Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED] diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 9e51fcc..954328c 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -108,19 +108,24 @@ static inline int cmp_addr(struct pppoe_addr *a, unsigned long sid, char *addr) (memcmp(a-remote,addr,ETH_ALEN) == 0)); } -static int hash_item(unsigned long sid, unsigned char *addr) +#if 8%PPPOE_HASH_BITS +#error 8 must be a multiple of PPPOE_HASH_BITS +#endif + +static int hash_item(unsigned int sid, unsigned char *addr) { - char hash = 0; - int i, j; + unsigned char hash = 0; + unsigned int i; - for (i = 0; i ETH_ALEN ; ++i) { - for (j = 0; j 8/PPPOE_HASH_BITS ; ++j) { - hash ^= addr[i] ( j * PPPOE_HASH_BITS ); - } + for (i = 0 ; i ETH_ALEN ; i++) { + hash ^= addr[i]; + } + for (i = 0 ; i sizeof(sid_t)*8 ; i += 8 ){ + hash ^= sidi; + } + for (i = 8 ; (i=1) = PPPOE_HASH_BITS ; ) { + hash ^= hashi; } - - for (i = 0; i (sizeof(unsigned long)*8) / PPPOE_HASH_BITS ; ++i) - hash ^= sid (i*PPPOE_HASH_BITS); return hash ( PPPOE_HASH_SIZE - 1 ); } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] PPPoX/E: return ENOTTY on unknown ioctl requests
Hi, here another patch for the PPPoX/E code that makes sure that ENOTTY is returned for unknown ioctl requests rather than 0 (and removes another unneeded initializer which I didn't bother creating a separate patch for). Florian --- Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED] diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 954328c..9554924 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -669,8 +669,8 @@ static int pppoe_ioctl(struct socket *sock, unsigned int cmd, { struct sock *sk = sock-sk; struct pppox_sock *po = pppox_sk(sk); - int val = 0; - int err = 0; + int val; + int err; switch (cmd) { case PPPIOCGMRU: @@ -759,8 +759,9 @@ static int pppoe_ioctl(struct socket *sock, unsigned int cmd, err = 0; break; - default:; - }; + default: + err = -ENOTTY; + } return err; } diff --git a/drivers/net/pppox.c b/drivers/net/pppox.c index 3f8115d..51de561 100644 --- a/drivers/net/pppox.c +++ b/drivers/net/pppox.c @@ -72,7 +72,7 @@ int pppox_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) { struct sock *sk = sock-sk; struct pppox_sock *po = pppox_sk(sk); - int rc = 0; + int rc; lock_sock(sk); @@ -93,12 +93,9 @@ int pppox_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) break; } default: - if (pppox_protos[sk-sk_protocol]-ioctl) - rc = pppox_protos[sk-sk_protocol]-ioctl(sock, cmd, - arg); - - break; - }; + rc = pppox_protos[sk-sk_protocol]-ioctl ? + pppox_protos[sk-sk_protocol]-ioctl(sock, cmd, arg) : -ENOTTY; + } release_sock(sk); return rc; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] PPPoE: move lock_sock() in pppoe_sendmsg() to the right location
Hi, and the last one for now: Acquire the sock lock in pppoe_sendmsg() before accessing the sock - and in particular avoid releasing the lock even though it hasn't been acquired. Florian --- Signed-off-by: Florian Zumbiehl [EMAIL PROTECTED] diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 9554924..eef8a5b 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -779,6 +779,7 @@ static int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock, struct net_device *dev; char *start; + lock_sock(sk); if (sock_flag(sk, SOCK_DEAD) || !(sk-sk_state PPPOX_CONNECTED)) { error = -ENOTCONN; goto end; @@ -789,8 +790,6 @@ static int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock, hdr.code = 0; hdr.sid = po-num; - lock_sock(sk); - dev = po-pppoe_dev; error = -EMSGSIZE; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] PPPoE: race between interface going down and release()
Hi, Attached below is my take on how to address this problem. This addresses any concerns you may have had about checking po-pppoe_dev==NULL, because accesses to this field are now synchronized with pppoe_hash_lock. That indeed looks like a much cleaner solution, so I'd certainly prefer it over mine. Testing so far didn't turn up any problems, either. Once we can settle on a fix for this, I'll deal with the SID==0 issue (trying to do that now would just cause patch conflicts). Well, yeah, this stuff probably is more important than support for SID 0 :-) Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] PPPoE: miscellaneous smaller cleanups
Hi, below is a patch that just removes dead code/initializers without any effect (first access is an assignment) that I stumbled accross while reading the source. Florian --- diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 13f5b22..18d1a4d 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -207,7 +207,7 @@ static inline struct pppox_sock *get_item(unsigned long sid, static inline struct pppox_sock *get_item_by_addr(struct sockaddr_pppox *sp) { - struct net_device *dev = NULL; + struct net_device *dev; int ifindex; dev = dev_get_by_name(sp-sa_addr.pppoe.dev); @@ -222,9 +222,6 @@ static inline int set_item(struct pppox_sock *po) { int i; - if (!po) - return -EINVAL; - write_lock_bh(pppoe_hash_lock); i = __set_item(po); write_unlock_bh(pppoe_hash_lock); @@ -344,7 +341,7 @@ static struct notifier_block pppoe_notifier = { static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb) { struct pppox_sock *po = pppox_sk(sk); - struct pppox_sock *relay_po = NULL; + struct pppox_sock *relay_po; if (sk-sk_state PPPOX_BOUND) { struct pppoe_hdr *ph = (struct pppoe_hdr *) skb-nh.raw; @@ -514,7 +511,6 @@ static int pppoe_release(struct socket *sock) { struct sock *sk = sock-sk; struct pppox_sock *po; - int error = 0; if (!sk) return 0; @@ -543,7 +539,7 @@ static int pppoe_release(struct socket *sock) skb_queue_purge(sk-sk_receive_queue); sock_put(sk); - return error; + return 0; } @@ -762,10 +758,10 @@ static int pppoe_ioctl(struct socket *sock, unsigned int cmd, static int pppoe_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, size_t total_len) { - struct sk_buff *skb = NULL; + struct sk_buff *skb; struct sock *sk = sock-sk; struct pppox_sock *po = pppox_sk(sk); - int error = 0; + int error; struct pppoe_hdr hdr; struct pppoe_hdr *ph; struct net_device *dev; @@ -929,10 +925,10 @@ static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *m, size_t total_len, int flags) { struct sock *sk = sock-sk; - struct sk_buff *skb = NULL; + struct sk_buff *skb; int error = 0; int len; - struct pppoe_hdr *ph = NULL; + struct pppoe_hdr *ph; if (sk-sk_state PPPOX_BOUND) { error = -EIO; @@ -949,7 +945,6 @@ static int pppoe_recvmsg(struct kiocb *iocb, struct socket *sock, m-msg_namelen = 0; if (skb) { - error = 0; ph = (struct pppoe_hdr *) skb-nh.raw; len = ntohs(ph-length); @@ -991,7 +986,7 @@ out: static __inline__ struct pppox_sock *pppoe_get_idx(loff_t pos) { - struct pppox_sock *po = NULL; + struct pppox_sock *po; int i = 0; for (; i PPPOE_HASH_SIZE; i++) { - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] PPPoE: race between interface going down and connect()
Hi, below you find a patch that (hopefully) fixes a race between an interface going down and a connect() to a peer on that interface. Before, connect() would determine that an interface is up, then the interface could go down and all entries referring to that interface in the item_hash_table would be marked as ZOMBIEs and their references to the device would be freed, and after that, connect() would put a new entry into the hash table referring to the device that meanwhile is down already - which also would cause unregister_netdevice() to wait until the socket has been release()d. This patch does not suffice if we are not allowed to accept connect()s referring to a device that we already acked a NETDEV_GOING_DOWN for (that is: all references are only guaranteed to be freed after NETDEV_DOWN has been acknowledged, not necessarily after the NETDEV_GOING_DOWN already). And if we are allowed to, we could avoid looking through the hash table upon NETDEV_GOING_DOWN completely and only do that once we get the NETDEV_DOWN ... Florian --- diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 18d1a4d..1aeac2c 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -218,17 +218,6 @@ static inline struct pppox_sock *get_item_by_addr(struct sockaddr_pppox *sp) return get_item(sp-sa_addr.pppoe.sid, sp-sa_addr.pppoe.remote, ifindex); } -static inline int set_item(struct pppox_sock *po) -{ - int i; - - write_lock_bh(pppoe_hash_lock); - i = __set_item(po); - write_unlock_bh(pppoe_hash_lock); - - return i; -} - static inline struct pppox_sock *delete_item(unsigned long sid, char *addr, int ifindex) { struct pppox_sock *ret; @@ -595,14 +584,18 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, po-pppoe_dev = dev; po-pppoe_ifindex = dev-ifindex; - if (!(dev-flags IFF_UP)) + write_lock_bh(pppoe_hash_lock); + if (!(dev-flags IFF_UP)){ + write_unlock_bh(pppoe_hash_lock); goto err_put; + } memcpy(po-pppoe_pa, sp-sa_addr.pppoe, sizeof(struct pppoe_addr)); - error = set_item(po); + error = __set_item(po); + write_unlock_bh(pppoe_hash_lock); if (error 0) goto err_put; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] PPPoE: memory leak when socket is release()d before PPPIOCGCHAN has been called on it
Hi, below you find a patch that fixes a memory leak when a PPPoE socket is release()d after it has been connect()ed, but before the PPPIOCGCHAN ioctl ever has been called on it. This is somewhat of a security problem, too, since PPPoE sockets can be created by any user, so any user can easily allocate all the machine's RAM to non-swappable address space and thus DoS the system. Is there any specific reason for PPPoE sockets being available to any unprivileged process, BTW? After all, you need a packet socket for the discovery stage anyway, so it's unlikely that any unprivileged process will ever need to create a PPPoE socket, no? Allocating all session IDs for a known AC is a kind of DoS, too, after all - with Juniper ERXes, this is really easy, actually, since they don't ever assign session ids above 8000 ... Florian --- diff --git a/drivers/net/pppox.c b/drivers/net/pppox.c index 9315046..3f8115d 100644 --- a/drivers/net/pppox.c +++ b/drivers/net/pppox.c @@ -58,7 +58,7 @@ void pppox_unbind_sock(struct sock *sk) { /* Clear connection to ppp device, if attached. */ - if (sk-sk_state (PPPOX_BOUND | PPPOX_ZOMBIE)) { + if (sk-sk_state (PPPOX_BOUND | PPPOX_CONNECTED | PPPOX_ZOMBIE)) { ppp_unregister_channel(pppox_sk(sk)-chan); sk-sk_state = PPPOX_DEAD; } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] PPPoE: race between interface going down and release()
Hi, below you find the last patch for now. It (hopefully) fixes a race between a socket being release()d and the interface it's using going down. As pppoe_release() didn't lock the socket, and pppoe_flush_dev() did the locking in the wrong place, pppoe_flush_dev() could set po-pppoe_dev to NULL, which would then cause pppoe_release() to not dev_put() the interface, but to still mark the socket as DEAD, which in turn would cause pppoe_flush_dev() to not dev_put() the interface, effectively leaking one reference to the device, thus making it impossible to remove (ignoring the possibility of overflowing the reference counter by repeated use of this race ;-). The thing I'm not quite sure about is whether the outer |if (po-pppoe_dev == dev) { is actually reliable this way on SMP systems, as far as cache consistency is concerned. I left it that way for now, as any alternative locking strategies that would lock the socket before doing this comparison seemed to be pretty complicated to implement because of the need to drop the hash table lock before trying to acquire the socket lock, so I'd rather be sure that this actually is a problem before I try to solve it ;-) Florian --- diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 1aeac2c..f5abfff 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -253,7 +253,6 @@ static void pppoe_flush_dev(struct net_device *dev) struct sock *sk = sk_pppox(po); sock_hold(sk); - po-pppoe_dev = NULL; /* We hold a reference to SK, now drop the * hash table lock so that we may attempt @@ -263,12 +262,15 @@ static void pppoe_flush_dev(struct net_device *dev) lock_sock(sk); - if (sk-sk_state - (PPPOX_CONNECTED | PPPOX_BOUND)) { - pppox_unbind_sock(sk); + if(po-pppoe_dev==dev){ dev_put(dev); - sk-sk_state = PPPOX_ZOMBIE; - sk-sk_state_change(sk); + po-pppoe_dev = NULL; + if (sk-sk_state + (PPPOX_CONNECTED | PPPOX_BOUND)) { + pppox_unbind_sock(sk); + sk-sk_state = PPPOX_ZOMBIE; + sk-sk_state_change(sk); + } } release_sock(sk); @@ -504,8 +506,11 @@ static int pppoe_release(struct socket *sock) if (!sk) return 0; - if (sock_flag(sk, SOCK_DEAD)) + lock_sock(sk); + if (sock_flag(sk, SOCK_DEAD)){ + release_sock(sk); return -EBADF; + } pppox_unbind_sock(sk); @@ -526,6 +531,7 @@ static int pppoe_release(struct socket *sock) sock-sk = NULL; skb_queue_purge(sk-sk_receive_queue); + release_sock(sk); sock_put(sk); return 0; - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Session ID 0 with PPPoE
Hi, In the current code SID 0 indicates that the socket is to be un-bound. That are the semantics used by the kernel code, yes - but even pppd uses different semantics (which can't quite work, of course ...). Supporting unbinding of the socket was intended to permit the PPPoE session to be reconnected without closing/reopening the socket; which would mean that you'd have to re-bind the PPPoE/PPP channel bindings. Thus it is conceivable to swap or renegotiate PPPoE connection underneath a PPP connection, hypothetically if anyone ever considered doing so. Is that worth it? I don't know. One could eliminate that disconnect behavior and I don't think anyone would care. Well, if _you_ don't even know ... =:-) Anyway, if it is not to be eliminated, it should be represented some way differently, I guess. Which probably would break backwards compatibility at the userspace API completely, of course, which is a bad thing[tm], so possibly simply changing the semantics to what is already assumed by most userspace applications might be the way to go. I'll conceed that a SID of 0 could appear from outer space. I've never seen that happening. Now, the question is, of course: How many samples is this based on (both, number of connection attempts and number of different peer implementations)? The only way I see this being an issue is if a PPPoE server insists on giving you SID 0 and only SID 0 repeatedly. And I've never seen *that* happening. So, what you are saying is that pppd/kernel-PPP/-PPPoE is so unstable that you shouldn't ever be using it without some cron job/wrapper that takes care of restarts anyway, so some additional bug that causes pppd to exit unexpectedly doesn't matter? That is to say: If I didn't overlook anything, pppd (the rp-pppoe plugin, that is) warns you if the PADS' session id field is 0 that the AC was somehow violating the RFC, but then goes on, using 0 as the session id anyway - thus calling connect() with a session id field of 0, obviously assuming that this will bind to session id 0 - which it doesn't. Now, when it invokes the PPPIOCGCHAN ioctl on that seemingly successfully connect()ed socket, it gets an ENOTCONN - and exits with a fatal error. And exiting with a fatal error isn't quite what I'd expect pppd to do when the peer is behaving correctly according to the RFC, since that will not just cause a retry that might be likely to give a different session id and thus would cause the connection to finally be established with some delay, but rather will break network connectivity until manual intervention. Of course, you could argue that this is somehow a bug in pppd, as it doesn't use the kernel API correctly. But I assume that fixing the kernel to support sid 0 rather than fixing pppd to support sid 0 (which would mean implementing an automatic retry when assigned sid 0 by the AC) is somehow the better solution. If you'd really like to pursue this, I'll be happy to review and ack patches in this regard. However, I don't see what there is to be actually gained by pursuing this. I'm open to being convinced; what is the motivation behind this? If there is a real problem here I'll be glad to get involved in fixing it myself. Now, what is a real problem? I haven't noticed this being a problem for me yet, no. But I have been using userspace RP-PPPOE until recently and I am using a cron job that takes care of restarts in case pppd exits for some reason, so it's rather unlikely that I'd notice if it did happen. But isn't a problem that does occur seldom, is difficult to reproduce, and has more than just temporary effect much worse a problem than something that causes a crash every half an hour and thus is simple to come by using debugging and doesn't surprise you the very moment you need things to simply work? Actually, IMO the major question is: Is there any application that does (intentionally) make use of this session rebinding/unbinding? Since simply dropping that probably would be the easiest fix to implement. Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Session ID 0 with PPPoE
Hi, This change can be made; the unbinding behavior can be removed and SID 0 can be made valid. I hope I was clear in my previous e-mail that I didn't object to this. Not quite. But now I think I got it ;-) PPPoE connections are unstable. Ethernet frames get dropped. Things die randomly. And yes, you typically want to have a cron job or script re-spawning pppd on failure. Making this change won't increase the reliability of these connections in any meaningful way, it won't break pppd either. I mean, you are right from my experience that you want to have that for pppd (which is why I _do_ have that cron job) - but your rationale doesn't look very convincing to me. IP networks are unstable, you know. IP packets get dropped. Things (read: TCP peers) die randomly. But no, I don't have a cron job that restarts the MTA or web server in case some TCP peer died - because they don't exit when some TCP connection times out or something. The very same thing I'd expect from pppd. If I do specify the persist and maxfail 0 options, I expect pppd not to exit even when it receives complete garbage from the network, much less when it receives completely valid packets as per the RFC. The only reasonable reasons for exiting IMO are lack of system resources or missing system features/invalid configuration. The only reason why I do have a cron job that takes care of restarts are all the bugs I encountered in pppd so far that caused it to exit because it used the kernel PPP API wrongly in case of reconnects or because it leaked ptys until there were none available anymore, not the fact that PPP(oE) is a potentially unreliable network protocol or because modems do lose their connection at times. The only question I have is why is this important to you? I'm simply curious as to what you are trying to accomplish; is this related to some other work you are doing or is it correctness as a virtue? It isn't extremely important to me as it doesn't affect me. But if for any reason, then because I hope that others will take care of bugs that they stumble across before they become a potentially difficult to solve problem for me, too. Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Session ID 0 with PPPoE
Hi, From the RFC: 5.4 The PPPoE Active Discovery Session-confirmation (PADS) packet When the Access Concentrator receives a PADR packet, it prepares to begin a PPP session. It generates a unique SESSION_ID for the PPPoE session and replies to the Host with a PADS packet. The DESTINATION_ADDR field is the unicast Ethernet address of the Host that sent the PADR. The CODE field is set to 0x65 and the SESSION_ID MUST be set to the unique value generated for this PPPoE session. The PADS packet contains exactly one TAG of TAG_TYPE Service-Name, indicating the service under which Access Concentrator has accepted the PPPoE session, and any number of other TAG types. If the Access Concentrator does not like the Service-Name in the PADR, then it MUST reply with a PADS containing a TAG of TAG_TYPE Service-Name-Error (and any number of other TAG types). In this case the SESSION_ID MUST be set to 0x. As you can see from the last paragraph, a session id of 0 implies a rejection of the PADR. Thus, you can't possibly get a PADS packet that completes and initiates a valid session if the session id is 0. Note that the RFC does not prohibit all other aspects of the PADS to be structured as if it were a valid success response; the only condition and requirement of a failure mode here is the session id. | [...] then it MUST reply with a PADS containing a TAG of TAG_TYPE | Service-Name-Error [...] !?! To my understanding, the indicator is the Service-Name-Error tag, and the RFC only states that if such a tag is present (indicating that the AC doesn't like the requested service name and thus rejects the session request), the session id field must be 0x - not that the session id field may not be 0x if this tag is not present (which would indicate that this is a valid session). Also 0x is reserved for future use. Thus it cannot be used as a sentinel value to indicate an invalid session id. Well, currently it could (IMO, a connect() specifying 0x as the session ID should fail anyway as of now as it is not a valid session id as per the RFC - and 0x in the session id field could be used to mean basically anything at the protocol level in the future) - however that probably wouldn't be a good choice for extensibility reasons: If at some point, a protocol session id field of 0x does somehow mean something that would sensibly be represented as 0x in the session id field of the internal data structure, one would have to change the code again. So I guess the session id simply shouldn't be overloaded, not even with an indication of its validity. Changing this code would require that the user-space component be synchronized with this change; as the socket interface implies that 0 is an invalid/unbound session id. Well, either that or the indication as to whether the session id is currently valid should be stored in some different way. Lots of badness will occur if 0 is allowed as a session id, and nothing will be gained because it can't possibly be a valid session id. Well, if that was the case, sure. But I still don't see any reason why it can't be. Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][BUG][SECURITY] Re: Weird problem with PPPoE on tap interface
Hi, From: Florian Zumbiehl [EMAIL PROTECTED] Date: Wed, 28 Feb 2007 13:38:44 +0100 As noone seems to have an opinion on this: Here is a patch that does work for me and that should solve the problem as far as that is easily possible. It is based on the assumption that an interface's ifindex is basically an alias for a local MAC address, so incoming packets now are matched to sockets based on remote MAC, session id, and ifindex of the interface the packet came in on/the socket was bound to by connect(). I agree with your analysis and have applied your patch. Below you find a slightly changed version of the patch that avoids a possible NULL pointer dereference in case pppoe_device_event()/ pppoe_flush_dev() dev_put()s dev and sets it to NULL before pppoe_connect() tries to unbind from the previous address, in which case it would dereference the NULL pointer in dev. It now saves the ifindex in the socket's data structure upon connect(), so that it's still available for finding the entry to remove from the hash table in case pppoe_device_event() should have dropped the socket's reference to dev. Another way to implement this would have been to store the pre-computed ifindex on the kernel side sockaddr. Well, that probably depends on the intended semantics. There isn't any documentation somewhere that specifies what the intended behaviour is, is there?! Florian --- linux-2.6.20/drivers/net/pppoe.c.orig 2007-02-25 19:23:51.0 +0100 +++ linux-2.6.20/drivers/net/pppoe.c2007-03-04 02:11:51.0 +0100 @@ -7,6 +7,12 @@ * * Version:0.7.0 * + * 070228 :Fix to allow multiple sessions with same remote MAC and same + * session id by including the local device ifindex in the + * tuple identifying a session. This also ensures packets can't + * be injected into a session from interfaces other than the one + * specified by userspace. Florian Zumbiehl [EMAIL PROTECTED] + * (Oh, BTW, this one is YYMMDD, in case you were wondering ...) * 220102 :Fix module use count on failure in pppoe_create, pppox_sk -acme * 030700 :Fixed connect logic to allow for disconnect. * 270700 :Fixed potential SMP problems; we must protect against @@ -127,14 +133,14 @@ * Set/get/delete/rehash items (internal versions) * **/ -static struct pppox_sock *__get_item(unsigned long sid, unsigned char *addr) +static struct pppox_sock *__get_item(unsigned long sid, unsigned char *addr, int ifindex) { int hash = hash_item(sid, addr); struct pppox_sock *ret; ret = item_hash_table[hash]; - while (ret !cmp_addr(ret-pppoe_pa, sid, addr)) + while (ret !(cmp_addr(ret-pppoe_pa, sid, addr) ret-pppoe_ifindex == ifindex)) ret = ret-next; return ret; @@ -147,21 +153,19 @@ ret = item_hash_table[hash]; while (ret) { - if (cmp_2_addr(ret-pppoe_pa, po-pppoe_pa)) + if (cmp_2_addr(ret-pppoe_pa, po-pppoe_pa) ret-pppoe_ifindex == po-pppoe_ifindex) return -EALREADY; ret = ret-next; } - if (!ret) { - po-next = item_hash_table[hash]; - item_hash_table[hash] = po; - } + po-next = item_hash_table[hash]; + item_hash_table[hash] = po; return 0; } -static struct pppox_sock *__delete_item(unsigned long sid, char *addr) +static struct pppox_sock *__delete_item(unsigned long sid, char *addr, int ifindex) { int hash = hash_item(sid, addr); struct pppox_sock *ret, **src; @@ -170,7 +174,7 @@ src = item_hash_table[hash]; while (ret) { - if (cmp_addr(ret-pppoe_pa, sid, addr)) { + if (cmp_addr(ret-pppoe_pa, sid, addr) ret-pppoe_ifindex == ifindex) { *src = ret-next; break; } @@ -188,12 +192,12 @@ * **/ static inline struct pppox_sock *get_item(unsigned long sid, -unsigned char *addr) +unsigned char *addr, int ifindex) { struct pppox_sock *po; read_lock_bh(pppoe_hash_lock); - po = __get_item(sid, addr); + po = __get_item(sid, addr, ifindex); if (po) sock_hold(sk_pppox(po)); read_unlock_bh(pppoe_hash_lock); @@ -203,7 +207,15 @@ static inline struct pppox_sock *get_item_by_addr(struct sockaddr_pppox *sp) { - return get_item(sp-sa_addr.pppoe.sid, sp-sa_addr.pppoe.remote); + struct net_device *dev = NULL; + int ifindex; + + dev = dev_get_by_name(sp-sa_addr.pppoe.dev); + if(!dev) + return NULL; + ifindex = dev-ifindex; + dev_put(dev); + return
Session ID 0 with PPPoE
Hi, I noticed that the PPPoE code doesn't allow session id 0x to be used for an actual session but rather considers 0 a special value denoting that the socket is unbound. Now, when reading RFC 2516, I couldn't really find anything that would forbid 0x as a session id. Only 0x is reserved for future use and MUST NOT be used, while 0x is specified as the only allowed value for the session id field on certain types of packets, but neither can I find any statement that forbids 0x as an ordinary session identifier, nor can I find any reasons that would prevent PPPoE from functioning properly with a session id of 0x. Does anyone of you see any reason why a server would not be allowed to select 0x as the session id for a PPPoE session? Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][BUG][SECURITY] Re: Weird problem with PPPoE on tap interface
Hi, Well, your opinions are welcome. Plus any hints as to how to fix this. I'd tend to simply(?) add some more fields to the {hash,get,set,delete}_item() functions in drivers/net/pppoe.c. But maybe there is some better way? As noone seems to have an opinion on this: Here is a patch that does work for me and that should solve the problem as far as that is easily possible. It is based on the assumption that an interface's ifindex is basically an alias for a local MAC address, so incoming packets now are matched to sockets based on remote MAC, session id, and ifindex of the interface the packet came in on/the socket was bound to by connect(). For relayed packets, the socket that's used for relaying is selected based on destination MAC, session ID and the interface index of the interface whose name currently matches the name requested by userspace as the relaying source interface. The relaying part of the patch is untested. Please note that I'd consider this a security fix for reasons outlined in previous mails. Florian --- linux-2.6.20/drivers/net/pppoe.c.orig 2007-02-25 19:23:51.0 +0100 +++ linux-2.6.20/drivers/net/pppoe.c2007-02-28 12:56:05.0 +0100 @@ -7,6 +7,12 @@ * * Version:0.7.0 * + * 070228 :Fix to allow multiple sessions with same remote MAC and same + * session id by including the local device ifindex in the + * tuple identifying a session. This also ensures packets can't + * be injected into a session from interfaces other than the one + * specified by userspace. Florian Zumbiehl [EMAIL PROTECTED] + * (Oh, BTW, this one is YYMMDD, in case you were wondering ...) * 220102 :Fix module use count on failure in pppoe_create, pppox_sk -acme * 030700 :Fixed connect logic to allow for disconnect. * 270700 :Fixed potential SMP problems; we must protect against @@ -127,14 +133,14 @@ * Set/get/delete/rehash items (internal versions) * **/ -static struct pppox_sock *__get_item(unsigned long sid, unsigned char *addr) +static struct pppox_sock *__get_item(unsigned long sid, unsigned char *addr, int ifindex) { int hash = hash_item(sid, addr); struct pppox_sock *ret; ret = item_hash_table[hash]; - while (ret !cmp_addr(ret-pppoe_pa, sid, addr)) + while (ret !(cmp_addr(ret-pppoe_pa, sid, addr) ret-pppoe_dev-ifindex == ifindex)) ret = ret-next; return ret; @@ -147,21 +153,19 @@ ret = item_hash_table[hash]; while (ret) { - if (cmp_2_addr(ret-pppoe_pa, po-pppoe_pa)) + if (cmp_2_addr(ret-pppoe_pa, po-pppoe_pa) ret-pppoe_dev-ifindex == po-pppoe_dev-ifindex) return -EALREADY; ret = ret-next; } - if (!ret) { - po-next = item_hash_table[hash]; - item_hash_table[hash] = po; - } + po-next = item_hash_table[hash]; + item_hash_table[hash] = po; return 0; } -static struct pppox_sock *__delete_item(unsigned long sid, char *addr) +static struct pppox_sock *__delete_item(unsigned long sid, char *addr, int ifindex) { int hash = hash_item(sid, addr); struct pppox_sock *ret, **src; @@ -170,7 +174,7 @@ src = item_hash_table[hash]; while (ret) { - if (cmp_addr(ret-pppoe_pa, sid, addr)) { + if (cmp_addr(ret-pppoe_pa, sid, addr) ret-pppoe_dev-ifindex == ifindex) { *src = ret-next; break; } @@ -188,12 +192,12 @@ * **/ static inline struct pppox_sock *get_item(unsigned long sid, -unsigned char *addr) +unsigned char *addr, int ifindex) { struct pppox_sock *po; read_lock_bh(pppoe_hash_lock); - po = __get_item(sid, addr); + po = __get_item(sid, addr, ifindex); if (po) sock_hold(sk_pppox(po)); read_unlock_bh(pppoe_hash_lock); @@ -203,7 +207,15 @@ static inline struct pppox_sock *get_item_by_addr(struct sockaddr_pppox *sp) { - return get_item(sp-sa_addr.pppoe.sid, sp-sa_addr.pppoe.remote); + struct net_device *dev = NULL; + int ifindex; + + dev = dev_get_by_name(sp-sa_addr.pppoe.dev); + if(!dev) + return NULL; + ifindex = dev-ifindex; + dev_put(dev); + return get_item(sp-sa_addr.pppoe.sid, sp-sa_addr.pppoe.remote, ifindex); } static inline int set_item(struct pppox_sock *po) @@ -220,12 +232,12 @@ return i; } -static inline struct pppox_sock *delete_item(unsigned long sid, char *addr) +static inline struct pppox_sock *delete_item(unsigned long sid, char *addr, int ifindex) { struct
Re: Weird problem with PPPoE on tap interface
Hi, I'm experiencing a pretty strange problem with kernel PPPoE on tap interfaces with a vanilla 2.6.20 kernel that prevents the PPP connection from being established: [...] Well, I guess I have found the source of the problem: The PPPoE code seems to match inbound packets to sockets based only on source address and session ID - so in the given scenario, packets are delivered to pppd once when they arrive at the real ethernet interface and a second time when they arrive at the tap interface. Now, this is kindof two problems, actually: 1. This is not in compliance with RFC 2516, which states in section 4: | The SESSION_ID field is sixteen bits. It is an unsigned value in | network byte order. It's value is defined below for Discovery | packets. The value is fixed for a given PPP session and, in fact, | defines a PPP session along with the Ethernet SOURCE_ADDR and | DESTINATION_ADDR. So, it would be legal for there to be multiple sessions with the same peer using the same session ID, as long as they are using different local MAC addresses - but the current PPPoE code would not be able to distinguish those. This probably is not a problem for me, but should be fixed anyway, IMO. (I say probably since I am actually using different local MAC addresses for multiple PPPoE sessions to the same DSL-AC, as T-Com doesn't allow for multiple sessions from the same MAC address - so they would be allowed by the RFC to use the same session ID for more than one of those sessions, even though I doubt that they're doing that.) 2. In addition to session ID, source and destination address, IMO, the incoming interface should be added to the key that's used for matching packets to sockets. This is probably somewhat of a design decision much like the weak ES vs. strong ES issue with IP, but AFAICS, the userspace-API works by binding to an interface index, not by binding to a MAC address (unlike IP socket binding), which is why I'd expect from pppd's perspective that only packets from that interface are delivered. Plus, from the end user's perspective, this might even be somewhat of a security problem. I for one would (well, I obviuosly _did_ :-) expect that if I do specify an interface name in pppd's config, pppd won't get into contact with anything that's going on on interfaces other than the one specified. As the involved MAC addresses probably usually aren't considered a secret, all you'd have to guess in order to interfere with/inject packets into PPPoE sessions on a different interface is the 16-bit session identifier, which might not even be random. Well, your opinions are welcome. Plus any hints as to how to fix this. I'd tend to simply(?) add some more fields to the {hash,get,set,delete}_item() functions in drivers/net/pppoe.c. But maybe there is some better way? Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Weird problem with PPPoE on tap interface
Hi, I'm experiencing a pretty strange problem with kernel PPPoE on tap interfaces with a vanilla 2.6.20 kernel that prevents the PPP connection from being established: Every PPPoE session packet (that is, LCP, since it never gets to a stage where any other session data is being exchanged) is delivered to pppd twice. This can be seen from pppd's logging messages when debugging is enabled, and strace confirmed that it indeed does read() the exact same data twice in a row from the same file descriptor - even though a tcpdump on the corresponding tap interface does show each packet only once. For confirming this, I used a program with a select() loop that simply moves packets unchanged and without reordering back and forth between a real ethernet interface in promiscuous mode and the tap interface used by pppd. What makes this even stranger, is, that the setup works perfectly (and only a single copy of packets is delivered to pppd) if I simply replace the tap interface in pppd's config with the real ethernet interface that the tap interface was previously bridged to (it's an ISA NE2K clone, BTW). Finally, it also works perfectly when I use userspace rp-pppoe through the tap interface. So far, I also confirmed that in kernel space, the call to ppp_input() in drivers/net/pppoe.c is executed as many times as pppd receives a packet, so the problem must be somewhere before that. Well, I'm gonna try to find out more - but if someone with some more knowledge of the involved kernel code would be willing to help with this in some way or another, that would certainly be appreciated ;-) - if you do need any additional information, let me know ... Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TCP not retransmitting supposedly lost segment
Hi, This tells us that the kernel was indeed retransmitting but the packet didn't make it out of the interface according to your tcpdump. Well, at least the tcp part of it, it seems, yep. The debugging I've done so far also seems to confirm that the problem is somewhere beyond the tcp code, but I don't have any real clue yet ... Do you have any firewall rules? Any tc rules? ip6tables rules, yes, but none in the filter/OUTPUT chain (and policy ACCEPT, of course) and only counting rules in mangle/OUTPUT (using only user-defined chains and target RETURN, with policy ACCEPT, once again). So i'd wonder if that'd be the cause of the problem. BTW, other TCP/IPv6 (IRC) connections active at the same time work without problems - but each of them fails in that way now and then ... Well, all in all that probably doesn't help you much - but any hints as to how/where to further debug are welcome :-) Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TCP not retransmitting supposedly lost segment
Hi, meanwhile I gathered some information from /proc/net/tcp6 on this problem - in case anyone is interested :-) | 1144560959: 8: :1223 :1A0B 01 : 02:00097F82 10000 3188619 8 c1409260 89 10 20 2 2 | 1144560969: 8: :1223 :1A0B 01 0016: 01:0074 0004 10000 3188619 9 c1409260 1424 10 18 1 2 | 1144560980: 8: :1223 :1A0B 01 0016: 01:00FA 0005 10000 3188619 9 c1409260 2848 10 18 1 2 | 1144560990: 8: :1223 :1A0B 01 0016: 01:05F2 0006 10000 3188619 9 c1409260 5696 10 18 1 2 | 1144561000: 8: :1223 :1A0B 01 0016: 01:0205 0006 10000 3188619 9 c1409260 5696 10 18 1 2 | 1144561010: 8: :1223 :1A0B 01 0016: 01:0FE4 0007 10000 3188619 9 c1409260 11392 10 18 1 2 | 1144561020: 8: :1223 :1A0B 01 0016: 01:0BF6 0007 10000 3188619 9 c1409260 11392 10 18 1 2 | 1144561030: 8: :1223 :1A0B 01 0016: 01:0809 0007 10000 3188619 9 c1409260 11392 10 20 1 2 | 1144561040: 8: :1223 :1A0B 01 0016: 01:041B 0007 10000 3188619 9 c1409260 11392 10 20 1 2 | 1144561050: 8: :1223 :1A0B 01 0016: 01:002E 0007 10000 3188619 9 c1409260 11392 10 20 1 2 | 1144561060: 8: :1223 :1A0B 01 0016: 01:1FDA 0008 10000 3188619 9 c1409260 22784 10 20 1 2 | 1144561070: 8: :1223 :1A0B 01 0016: 01:1BE7 0008 10000 3188619 9 c1409260 22784 10 20 1 2 | 1144561080: 8: :1223 :1A0B 01 0016: 01:17F9 0008 10000 3188619 9 c1409260 22784 10 20 1 2 | 1144561090: 8: :1223 :1A0B 01 0016: 01:140B 0008 10000 3188619 9 c1409260 22784 10 20 1 2 | 1144561100: 8: :1223 :1A0B 01 0016: 01:101E 0008 10000 3188619 9 c1409260 22784 10 20 1 2 | 1144561110: 8: :1223 :1A0B 01 0016: 01:0C30 0008 10000 3188619 9 c1409260 22784 10 20 1 2 | 1144561120: 8: :1223 :1A0B 01 0016: 01:0842 0008 10000 3188619 9 c1409260 22784 10 20 1 2 | 1144561130: 8: :1223 :1A0B 01 0016: 01:0455 0008 10000 3188619 9 c1409260 22784 10 20 1 2 | 1144561140: 8: :1223 :1A0B 01 0016: 01:0067 0008 10000 3188619 9 c1409260 22784 10 20 1 2 | 1144561150: 8: :1223 :1A0B 04 0017: 01:2B5A 0001 00 0 9 c1409260 3 10 18 1 2 | 1144561161: 8: :1223 :1A0B 04 0017: 01:276C 0001 00 0 9 c1409260 3 10 18 1 2 | 1144561171: 8: :1223 :1A0B 04 0017: 01:237E 0001 00 0 9 c1409260 3 10 18 1 2 | 1144561181: 8: :1223 :1A0B 04 0017: 01:1F91 0001 00 0 9 c1409260 3 10 18 1 2 | 1144561191: 8: :1223 :1A0B 04 0017: 01:1BA3 0001 00 0 9 c1409260 3 10 18 1 2 | 1144561201: 8: :1223 :1A0B 04 0017: 01:17B5 0001 00 0 9 c1409260 3 10 18 1 2 | 1144561211: 8: :1223 :1A0B 04 0017: 01:13C2 0001 00 0 9 c1409260 3 10 18 1 2 | 1144561221: 8:
TCP not retransmitting supposedly lost segment
Hi, I've been regularly losing my IRC connections going over my PPP/PPPoE/ADSL IPv6 connection with static addresses for quite some time now. I haven't found any obvious correlation with PPP reconnects or anything else one would expect to cause this - now, I've had a look at it using tcpdump and it looks to me pretty much like it's linux's TCP implementation that fails to retransmit lost segments on some occasions. The two occasions I captured so far look pretty similar, so here is one of them: | 03:34:56.061190 2002::1.6667 2001::1.2214: P 42984:43037(53) ack 17052 win 5712 nop,nop,timestamp 2372706322 98778470 | 03:34:56.061470 2001::1.2214 2002::1.6667: . ack 43037 win 6888 nop,nop,timestamp 98778484 2372706322 | 03:35:57.004547 2001::1.2214 2002::1.6667: P 17052:17073(21) ack 43037 win 6888 nop,nop,timestamp 98793722 2372706322 | 03:38:58.062458 2001::1.2214 2002::1.6667: F 17073:17073(0) ack 43037 win 6888 nop,nop,timestamp 98838992 2372706322 | 03:38:58.118258 2002::1.6667 2001::1.2214: . ack 17052 win 5712 nop,nop,timestamp 2372730527 98778484,nop,nop,sack sack 1 {17073:17074} | 03:38:58.657355 2002::1.6667 2001::1.2214: P 43037:43059(22) ack 17052 win 5712 nop,nop,timestamp 2372730581 98778484,nop,nop,sack sack 1 {17073:17074} | 03:38:58.657759 2001::1.2214 2002::1.6667: R 130103887:130103887(0) win 0 (The addresses are not the original ones, obviously.) Note in particular the third and the fourth lines (which is when irssi decides to close the connection because of a ping timeout): No retransmission takes place for (more than) three minutes even though the segment sent in the third line is not acknowledged. Tcpdump was running on 2001::1 and it reported no dropped packets, so I suppose that no packets are missing from this dump (unless there is a bug in there(, too), of course ...). 2001::1 is running a linux kernel version 2.6.16, 2002::1 is an IRC server i've got no control over - but I suppose that it's not at fault here anyway, so that probably doesn't matter ... I've previously been observing similar behaviour with 2.4 kernels already (that is, IRC connections on that internet connection getting lost, but I've never had a close look at it then), so I suppose it's not just a faulty compilation or something like that ... Well - in case there is anything I can provide you with that would help you with fixing this, let me know (nope, no clue yet how to reproduce this except for waiting for my IRC connections to time out ;-) - simply dropping some packets in ip6tables's filter/OUTPUT chain did not do the trick ...). Florian - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html