Re: 2.4.0-test6 network socket problems
Thanks Allen, you're exactly right. I'm charged with the task of finding lots of nasties like that in our old code base, where a number of things were just hacked in down and dirty. Our embedded environment moved from XINU on an SH2/SH3 with no MMU support, and a BSD protocol stack we hacked in ourselves, to the Linux kernel on MIPS with an MMU. Coding standards have truly had to step up a couple of levels due to the growing complexity of the environment, but eventually it will be worth the pain.

> alarm(1)
> [sudden swap frenzy]
> alarm is delivered.. do nothing
> read
> blocks forever.
>
> You need to make clever use of siglongjmp to avoid that one
> occurring or use select/poll.

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: 2.4.0-test6 network socket problems
> I've found the problem. This type of loop does not work:
>
> do {
>     alarm(t);
>     read(fd);
>     if (errno == EINTR)
>         exception();
>     else
>         alarm(0);
> } while (data);
>
> There are some semantics here that differ from other *nix where this
> works. The read() won't come out when the alarm comes, and the socket
> will effectively become broken.

The restart-or-continue behaviour is undefined unless you use sigaction() to control your signal behaviour (see POSIX.1 or SuS). Even then, your code is buggy on every OS I know. Suppose this happens:

alarm(1)
[sudden swap frenzy]
alarm is delivered.. do nothing
read
blocks forever.

You need to make clever use of siglongjmp to avoid that one occurring, or use select/poll.
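A minimal sketch of the siglongjmp approach, assuming a POSIX system. `read_timeout_longjmp()` and `on_alarm_jump()` are hypothetical names, and reporting the timeout as ETIMEDOUT is an arbitrary choice. The handler jumps out instead of returning, so an alarm that is delivered before read() has even started (the swap-frenzy case above) still ends the wait rather than leaving read() blocked forever.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf timeout_env;                /* where the handler jumps to */
static volatile sig_atomic_t jump_armed = 0;  /* only jump while timeout_env is valid */

static void on_alarm_jump(int sig)
{
    (void)sig;
    if (jump_armed)
        siglongjmp(timeout_env, 1);           /* abandon read(), blocked or not yet started */
}

/* Hypothetical helper: read up to len bytes from fd, giving up after secs
 * seconds. Returns read()'s result, or -1 with errno == ETIMEDOUT. */
static ssize_t read_timeout_longjmp(int fd, void *buf, size_t len, unsigned secs)
{
    struct sigaction sa, old;
    ssize_t n;
    int saved_errno;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm_jump;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    /* savemask = 1: siglongjmp restores the signal mask, so SIGALRM
     * (blocked while its handler runs) is unblocked again afterwards. */
    if (sigsetjmp(timeout_env, 1) != 0) {
        jump_armed = 0;
        sigaction(SIGALRM, &old, NULL);
        errno = ETIMEDOUT;
        return -1;
    }

    jump_armed = 1;
    alarm(secs);
    n = read(fd, buf, len);
    jump_armed = 0;                           /* disarm as early as possible */
    saved_errno = errno;
    alarm(0);
    sigaction(SIGALRM, &old, NULL);
    errno = saved_errno;
    return n;
}
```

A small window remains between read() returning and the handler being disarmed, in which a late alarm would throw away data that was already read; that fragility is part of why select()/poll() is usually the simpler fix.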
Re: 2.4.0-test6 network socket problems
On Fri, 13 Oct 2000, J. Scott Kasten wrote:

> I've found the problem. This type of loop does not work:
>
> do {
>     alarm(t);
>     read(fd);
>     if (errno == EINTR)
>         exception();
>     else
>         alarm(0);
> } while (data);
>
> There are some semantics here that differ from other *nix where this
> works. The read() won't come out when the alarm comes, and the socket
> will effectively become broken.
>
> Instead, it appears that I needed to use select(), which probably would
> have been better in the first place anyway.
>
> Thanks to anyone that took the time to look at this.

You can certainly use select(), and it's 'better' and more useful. However, the problem is the default way that signal() in the 'C' runtime library sets up the handler. You should use sigaction() without the SA_RESTART flag. This lets a signal interrupt a blocked system call, which then returns -1 with errno set to EINTR.

Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of course you get it all back when you reboot..."; Actual explanation obtained from the Micro$oft help desk.
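A minimal sketch of the sigaction() setup suggested above, assuming a POSIX system; `install_interrupting_alarm()` and `read_with_alarm()` are hypothetical names. Leaving SA_RESTART out of sa_flags means a read() that is blocked when SIGALRM arrives fails with -1 and errno set to EINTR instead of being silently restarted.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void on_alarm(int sig)
{
    (void)sig;   /* nothing to do: the point is only to interrupt read() */
}

/* Install a SIGALRM handler without SA_RESTART. Returns 0 on success. */
static int install_interrupting_alarm(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;             /* deliberately no SA_RESTART */
    return sigaction(SIGALRM, &sa, NULL);
}

/* Blocking read with an alarm-based timeout. On timeout it returns -1
 * with errno == EINTR; otherwise it returns whatever read() returned. */
static ssize_t read_with_alarm(int fd, void *buf, size_t len, unsigned secs)
{
    ssize_t n;
    int saved_errno;

    alarm(secs);
    n = read(fd, buf, len);
    saved_errno = errno;         /* keep read()'s errno across alarm(0) */
    alarm(0);
    errno = saved_errno;
    return n;
}
```

This addresses only the restart half of the problem: if the alarm fires before read() begins blocking, the swap-frenzy race described earlier in the thread still applies.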
Re: 2.4.0-test6 network socket problems
I've found the problem. This type of loop does not work:

do {
    alarm(t);
    read(fd);
    if (errno == EINTR)
        exception();
    else
        alarm(0);
} while (data);

There are some semantics here that differ from other *nix systems where this works. The read() won't return when the alarm fires, and the socket effectively becomes broken.

Instead, it appears that I needed to use select(), which probably would have been better in the first place anyway.

Thanks to anyone who took the time to look at this.

-S-

> I'm working with test6 on an embedded
> QED MIPS arch in big endian mode. I
> have run into some bizarre socket problems that appear to affect both
> udp and tcp transport. Applications actively using sockets (examples,
> ftp, tftp, others...) will unexpectedly stop receiving data on the
> socket, even though data is present. The process will be forever
> sleeping on the read even though data is queued up. To illustrate my
> point, I've dug deep into the udp code (net/ipv4/udp.c) and the
> datagram core (net/core/datagram.c) researching the simple tftp
> example.
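A minimal sketch of the select() approach, assuming a POSIX system; `read_select_timeout()` is a hypothetical helper and reporting a timeout as ETIMEDOUT is an arbitrary convention. No signals are involved, so there is nothing for the kernel to restart and no alarm race.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to secs seconds for fd to become readable, then read once.
 * Returns bytes read, 0 on EOF, -1 with errno == ETIMEDOUT on timeout,
 * or -1 with errno set by select()/read() on error. */
static ssize_t read_select_timeout(int fd, void *buf, size_t len, long secs)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    do {
        /* select() may modify both the set and the timeout: rebuild them. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = secs;
        tv.tv_usec = 0;
        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    return read(fd, buf, len);   /* fd reported readable */
}
```

On EINTR the loop simply restarts select() with the full timeout, which is a simplification. The Linux select(2) man page also warns that a socket reported readable can still block on the following read (for example, when an arriving datagram is discarded for a bad checksum), so setting O_NONBLOCK on the socket is a common extra precaution.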
2.4.0-test6 network socket problems
I'm working with test6 on an embedded QED MIPS arch in big-endian mode. I have run into some bizarre socket problems that appear to affect both UDP and TCP transport. Applications actively using sockets (for example ftp, tftp, and others) will unexpectedly stop receiving data on the socket, even though data is present. The process sleeps forever in the read even though data is queued up. To illustrate my point, I've dug deep into the UDP code (net/ipv4/udp.c) and the datagram core (net/core/datagram.c), researching the simple tftp example.

After much debugging, here is what I know:

I have followed the packets in from the network driver all the way to udp_rcv() in udp.c. I see it do the sk lookup and drop the packet off in udp_queue_rcv_skb(). Everything is fine on that end.

On the process end, I've been watching the udp_recvmsg() function, also in udp.c. Under normal operation, I see it pick up the data from the correct skb and return. When the rare condition that causes the failure occurs, skb_recv_datagram() returns NULL and err is set to -ERESTARTSYS. I only see this happen when the process gets hung on that socket. It never revisits this portion of the code again; however, until the sender gives up on ACK timeouts and stops transmitting, I see the packets continue to pile up on the udp_rcv() side without incident.

I further looked at datagram.c to see what skb_recv_datagram() was doing. It was spinning through the do {} while() loop, waiting for wait_for_packet() to hand it something. That routine is where the error code is generated: signal_pending() returns true and sock_intr_errno() returns -ERESTARTSYS, which gets passed back through the chain from there.

The tftp code that I'm working with does a generic blocking read() on the socket file handle and uses an alarm to wake up when the critical timeout is reached. Not the most glorious code, but it demonstrates a problem nonetheless.
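The -ERESTARTSYS traced above is not returned to user space as-is: when the signal is delivered, Linux either restarts the system call (handler installed with SA_RESTART) or turns the error into EINTR (handler installed without it). Per the signal(2) man page, glibc's signal() installs handlers with BSD semantics by default, which include SA_RESTART, so an alarm set up that way can look as if it did nothing. The sketch below illustrates this under stated assumptions: Linux, a pipe standing in for the socket, and a hypothetical `restart_demo()` whose forked child plays a sender that is about two seconds late.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t alarms = 0;

static void count_alarm(int sig)
{
    (void)sig;
    alarms++;   /* record the delivery; returning lets the kernel restart read() */
}

/* Hypothetical demo: block in read() on a pipe while SIGALRM, installed
 * WITH SA_RESTART, fires mid-read. Returns read()'s result and stores the
 * number of handler invocations in *alarms_seen. */
static ssize_t restart_demo(char *buf, size_t len, int *alarms_seen)
{
    struct sigaction sa;
    int fds[2];
    pid_t pid;
    ssize_t n;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;        /* BSD semantics, as with glibc's signal() */
    if (sigaction(SIGALRM, &sa, NULL) < 0 || pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                  /* child: the late sender */
        close(fds[0]);
        sleep(2);
        if (write(fds[1], "hello", 5) != 5)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);
    alarm(1);                        /* fires while the parent is blocked below */
    n = read(fds[0], buf, len);      /* restarted after the handler, no EINTR */
    alarm(0);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    *alarms_seen = alarms;
    return n;
}
```

The handler does run once, but read() keeps waiting and eventually returns the child's data; with sa_flags set to 0 instead, the same read() would fail with EINTR after one second.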
The read() never returns EINTR or EAGAIN or anything; it's just hung. I'm assuming that the signal_pending() result comes from my alarm(), which means that the process had already been sitting on that socket for a while without seeing the data that is clearly already present. Thus, there may be two problems here: the read() not returning on the signal, and data trapped in the skb.

I would appreciate it if anyone more familiar with this code could point me to what I should be looking at, or at least explain what should be happening that isn't.

TIA,
-S-

--
J. Scott Kasten
Email: jsk AT tetracon-eng DOT net

"In most cases, all an argument proves is that two people were present.."