Re: Lightwight socket IO wrapper
"Dennis Lee Bieber" wrote in message
news:mailman.12.1442794762.28679.python-l...@python.org...
> On Sun, 20 Sep 2015 23:36:30 +0100, "James Harris" declaimed the
> following:
>
>> There are a few things and more crop up as time goes on. For example,
>> over TCP it would be helpful to have a function to receive a specific
>> number of bytes or one to read bytes until reaching a certain
>> delimiter such as newline or zero or space etc. Even better would be
>> to be able to use the iteration protocol so you could just code
>> next() and get the next such chunk of read in a for loop. When
>> sending it would be good to just say to send a bunch of bytes but
>> know that you will get told how many were sent (or didn't get sent)
>> if it fails. Sock.sendall() doesn't do that.
>
> Note that the "buffer size" option on a TCP socket.recv() gives you
> your "specific number of bytes" -- if available at that time.

"If" is a big word! AIUI the buffer size is not guaranteed to relate to
the number of bytes returned except that you won't/shouldn't(!) get more
than the buffer size.

> I wouldn't want to use .recv(1) though to implement your "reaching a
> certain delimiter"... Much better to read as much as available and
> search it for the delimiter.

Yes, that's what I do at the moment. I keep a block of bytes, add any
new stuff to it and scan it for delimiters.

> I'll confess, adding a .readln() FOR TCP ONLY, might be a nice
> extension over BSD sockets (might need to allow option for whether
> line-ends are Internet standard or some other marker, and whether
> they should be converted upon reading to the native format for the
> host).

Akira Li pointed out that there is just such an extension: makefile.
Scanning to is what I do just now as that includes too and I leave them
on the string. IIRC file.readline works in the same way.

>> I thought UDP would deliver (or drop) a whole datagram but cannot
>> find anything in the Python documentation to guarantee that. In fact
>> documentation for the send() call says that apps are responsible for
>> checking that all data has been sent. They may mean that to apply to
>> stream protocols only but it doesn't state that. (Of course, UDP
>> datagrams are limited in size so the call may validly indicate
>> incomplete transmission even when the first part of a big message is
>> sent successfully.)
>
> Looking in the wrong documentation
>
> You probably should be looking at the UDP RFC. Or maybe just
>
> http://www.diffen.com/difference/TCP_vs_UDP
>
> """
> Packets are sent individually and are checked for integrity only if
> they arrive. Packets have definite boundaries which are honored upon
> receipt, meaning a read operation at the receiver socket will yield an
> entire message as it was originally sent.
> """

I would rather see it in the Python docs because we program to the
language standard and there can be - and often are, for good reason -
areas where Python does not work in the same way as underlying systems.

> Even if the IP layer has to fragment a UDP packet to meet limits of
> the transport media, it should put them back together on the other end
> before passing it up to the UDP layer. To my knowledge, UDP does not
> have a size limit on the message (well -- a 16-bit length field in the
> UDP header). But since it /is/ "got it all" or "dropped" with no
> inherent confirmation, one would have to embed their own protocol
> within it -- sequence numbers with ACK/NAK, for example. Problem: if
> using LARGE UDP packets, this protocol would mean having LARGE resends
> should packets be dropped or arrive out of sequence (and since the
> ACK/NAK could be dropped too, you may have to handle the case of a
> duplicated packet -- also large).

Yes, it was the 16-bit limitation that I was talking about.

> TCP is a stream protocol -- the protocol will ensure that all data
> arrives, and that it arrives in order, but does not enforce any
> boundaries on the data; what started as a relatively large packet at
> one end may arrive as lots of small packets due to intermediate
> transport limits (one can visualize a worst case: each TCP packet is
> broken up to fit Hollerith cards; 20bytes for header and 60 bytes of
> data -- then fed to a reader and sent on AS-IS). Boundaries are the
> end-user responsibility... line endings (look at SMTP, where an email
> message ends on a line containing just a ".") or embedded length
> counter (not the TCP packet length).

Yes.

>> Receiving no bytes is taken as indicating the end of the
>> communication. That's OK for TCP but not for UDP so there should be a
>> way to distinguish between the end of data and receiving an empty
>> datagram.
>
> I don't believe UDP supports a truly empty datagram (length of 0) --
> presuming a sending stack actually sends one, the receiving stack will
> probably drop it as there is no data to pass on to a client (there is
> a PR at work because we have a UDP driver that doesn't drop 0-length
> messages, but also can't deliver them -- so the circular
Re: Lightwight socket IO wrapper
"Marko Rauhamaa" wrote in message news:8737y6cgp6@elektro.pacujo.net...
> "James Harris" :
>
>> I agree with what you say. A zero-length UDP datagram should be
>> possible and not indicate end of input but is that guaranteed and
>> portable?
>
> The zero-length payload size shouldn't be an issue, but UDP doesn't
> make any guarantees about delivering the message. Your UDP application
> must be prepared for some, most or all of the messages disappearing
> without any error indication.
>
> In practice, you'd end up implementing your own TCP on top of UDP
> (retries, timeouts, acknowledgements, sequence numbers etc).

The unreliability of UDP was not the case in point here. Rather, it was
about whether different platforms could be relied upon to deliver
zero-length datagrams to the app if the datagrams got safely across the
network.

James
--
https://mail.python.org/mailman/listinfo/python-list
Re: Lightwight socket IO wrapper
Random832 wrote:
> Isn't this technically the same problem as pressing ctrl-d at a
> terminal - it's not _really_ the end of the input (you can continue
> reading after), but it sends the program something it will interpret
> as such?

Yes. There's no concept of "closing the connection" with UDP, because
there's no connection. So if a read returns 0 bytes, it must be because
someone sent you a 0-length datagram.

--
Greg
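One way to check this behaviour on a particular platform is a loopback test. The sketch below is not from the thread: the function name `send_and_receive_empty_datagram` and the 2-second timeout are assumptions chosen for illustration, and a passing run only shows what one machine does, not what is guaranteed everywhere.

```python
import socket


def send_and_receive_empty_datagram(timeout=2.0):
    """Send a zero-length UDP datagram over loopback and return what arrives.

    Hypothetical helper for illustration only.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(timeout)      # UDP may drop data; never block forever
        sender.sendto(b"", receiver.getsockname())
        # recvfrom() returning b"" here means "an empty datagram arrived",
        # not "end of input": there is no connection that could have closed.
        data, sender_address = receiver.recvfrom(2048)
        return data, sender_address
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    payload, peer = send_and_receive_empty_datagram()
    print(repr(payload), "from", peer)
```

If the stack dropped the empty datagram, `recvfrom()` would raise a timeout error instead of returning, so the two cases stay distinguishable.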
Re: Lightwight socket IO wrapper
"Akira Li" <4kir4...@gmail.com> wrote in message
news:mailman.18.1442804862.28679.python-l...@python.org...
> "James Harris" writes:
> ...
>> There are a few things and more crop up as time goes on. For example,
>> over TCP it would be helpful to have a function to receive a specific
>> number of bytes or one to read bytes until reaching a certain
>> delimiter such as newline or zero or space etc.
>
> The answer is sock.makefile('rb') then `file.read(nbytes)` returns a
> specific number of bytes.

Thanks, I hadn't seen that. Now I know of it I see references to it all
over the place but beforehand it was in hiding. It is exactly the type
of convenience wrapper I was expecting Python to have but expected it to
be in another module. It looks as though it will definitely cover some
of the issues I had.

> `file.readline()` reads until newline (b'\n'). There is Python Issue:
> "Add support for reading records with arbitrary separators to the
> standard IO stack"
> http://bugs.python.org/issue1152248
> See also
> http://bugs.python.org/issue17083
>
> Perhaps, it is easier to implement read_until(sep) that is best suited
> for a particular case.

OK.

> ...
>> When sending it would be good to just say to send a bunch of bytes
>> but know that you will get told how many were sent (or didn't get
>> sent) if it fails. Sock.sendall() doesn't do that.
>
> sock.send() returns the number of bytes sent that may be less than
> given. You could reimplement sock.sendall() to include the number of
> bytes successfully sent in case of an error.

I know. As mentioned, I wondered if there were already such functions to
save me using my own.

>> I thought UDP would deliver (or drop) a whole datagram but cannot
>> find anything in the Python documentation to guarantee that. In fact
>> documentation for the send() call says that apps are responsible for
>> checking that all data has been sent. They may mean that to apply to
>> stream protocols only but it doesn't state that. (Of course, UDP
>> datagrams are limited in size so the call may validly indicate
>> incomplete transmission even when the first part of a big message is
>> sent successfully.)
>>
>> Receiving no bytes is taken as indicating the end of the
>> communication. That's OK for TCP but not for UDP so there should be a
>> way to distinguish between the end of data and receiving an empty
>> datagram.
>
> There is no end of communication in UDP and therefore there is no end
> of data. If you've got zero bytes in return then it means that you've
> received a zero length datagram.
>
> sock.recvfrom() is a thin wrapper around the corresponding C function.
> You could read any docs you like about UDP sockets.
>
> http://stackoverflow.com/questions/5307031/how-to-detect-receipt-of-a-0-length-udp-datagram

As mentioned to Dennis just now, I would prefer to write code to conform
with the documented behaviour of Python and its libraries, as long as
they were known to be reliable implementations of what was documented,
of course.

I agree with what you say. A zero-length UDP datagram should be possible
and not indicate end of input but is that guaranteed and portable?
(Rhetorical.) It seems not. Even the Linux man page for recv says: "If
no messages are available at the socket, the receive calls wait for a
message to arrive, unless the socket is nonblocking". In that case, of
course, what it defines as a "message" - and whether it can be zero
length or not - is not stated.

>> The recv calls require a buffer size to be supplied which is a
>> technical detail. A Python wrapper could save the programmer dealing
>> with that.
>
> It is not just a buffer size. It is the maximum amount of data to be
> received at once i.e., sock.recv() may return less but never more.

My point was that we might want to request the entire next line or next
field of input and not know a maximum length. *C* programmers are used
to giving buffers fixed sizes often because then they can avoid fiddling
with memory management but Python normally does that for us. I was
suggesting that the thin wrapper around the socket recv() call is too
thin! The makefile() approach that you mentioned seems more Pythonesque,
though.

> You could use makefile() and read() if recv() is too low-level.

Yes.

James
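The makefile() approach discussed in this message can be tried without a network by using a connected socket pair. This is a minimal sketch rather than anything posted in the thread; the function name `read_line_then_bytes` and the sample payload are made up for the example.

```python
import socket


def read_line_then_bytes():
    """Read one line, then exactly four bytes, then the rest, via makefile()."""
    writer, reader = socket.socketpair()
    try:
        writer.sendall(b"HELLO\nabcdefghij")
        writer.shutdown(socket.SHUT_WR)    # signal end of stream to the reader
        with reader.makefile("rb") as stream:
            line = stream.readline()       # reads up to and including b"\n"
            exact = stream.read(4)         # blocks until 4 bytes or end of stream
            rest = stream.read()           # everything up to end of stream
        return line, exact, rest
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    print(read_line_then_bytes())
```

Note that the file object buffers internally, so once it is in use, reading from the same socket with recv() directly may miss bytes the file object has already pulled in.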
Re: Lightwight socket IO wrapper
"James Harris": > I agree with what you say. A zero-length UDP datagram should be > possible and not indicate end of input but is that guaranteed and > portable? The zero-length payload size shouldn't be an issue, but UDP doesn't make any guarantees about delivering the message. Your UDP application must be prepared for some, most or all of the messages disappearing without any error indication. In practice, you'd end up implementing your own TCP on top of UDP (retries, timeouts, acknowledgements, sequence numbers etc). Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Lightwight socket IO wrapper
On Tue, Sep 22, 2015, at 15:45, James Harris wrote: > "Dennis Lee Bieber"wrote in message > news:mailman.12.1442794762.28679.python-l...@python.org... > > On Sun, 20 Sep 2015 23:36:30 +0100, "James Harris" > > declaimed the following: > >>Receiving no bytes is taken as indicating the end of the > >>communication. > >>That's OK for TCP but not for UDP so there should be a way to > >>distinguish between the end of data and receiving an empty datagram. > >> > > I don't believe UDP supports a truly empty datagram (length of 0) -- > > presuming a sending stack actually sends one, the receiving stack will > > probably drop it as there is no data to pass on to a client (there is > > a PR > > at work because we have a UDP driver that doesn't drop 0-length > > messages, > > but also can't deliver them -- so the circular buffer might fill with > > undeliverable headers) > > As others have pointed out, UDP implementations do seem to work with > zero-byte datagrams properly. Again, I would rather see that in the > Python documentation which is what, effectively, forms a contract that > we should be able to rely on. Isn't this technically the same problem as pressing ctrl-d at a terminal - it's not _really_ the end of the input (you can continue reading after), but it sends the program something it will interpret as such? -- https://mail.python.org/mailman/listinfo/python-list
Re: Lightwight socket IO wrapper
On Mon, 2015-09-21, Cameron Simpson wrote:
> On 21Sep2015 10:34, Chris Angelico wrote:
>>If you're going to add sequencing and acknowledgements to UDP,
>>wouldn't it be easier to use TCP and simply prefix every message with
>>a two-byte length?
>
> Frankly, often yes. That's what I do. (different length encoding, but
> otherwise...)
>
> UDP's neat if you do not care if a packet fails to arrive and if you can
> guarantee that your data fits in a packet in the face of different MTUs.

There's also the impact on your application. With TCP you need to
consider that you may block when reading or writing, and you'll be using
threads and/or a state machine driven by select() or something. UDP is
more fire-and-forget.

> I like TCP myself, most of the time. Another nice thing about TCP is
> that with a little effort you get to pack multiple data packets (or
> partial data packets) into a network packet, etc.

That, and also (again) the impact on the application. With UDP you can
easily end up wasting a lot of time reading tiny datagrams one by one.
It has often been a performance bottleneck for me, with certain
UDP-based protocols which cannot pack multiple application-level
messages into one datagram.

Although perhaps you tend not to use Python in those situations.

/Jorgen

--
  // Jorgen Grahn O o .
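Chris Angelico's suggestion of a two-byte length prefix over TCP might look roughly like the following. The helper names `send_msg`, `recv_exact` and `recv_msg` are assumptions for this sketch, as is the choice of network byte order for the prefix.

```python
import socket
import struct


def send_msg(sock, payload):
    """Send one message preceded by its length as a 2-byte big-endian integer."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload does not fit a two-byte length prefix")
    sock.sendall(struct.pack("!H", len(payload)) + payload)


def recv_exact(sock, count):
    """Receive exactly `count` bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = count
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise EOFError("connection closed before the message was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Receive one length-prefixed message; raises EOFError at end of stream."""
    (length,) = struct.unpack("!H", recv_exact(sock, 2))
    return recv_exact(sock, length)


if __name__ == "__main__":
    left, right = socket.socketpair()
    send_msg(left, b"hello")
    send_msg(left, b"")            # an empty message is still a message
    print(recv_msg(right), recv_msg(right))
    left.close()
    right.close()
```

With framing like this, an empty message and the end of the stream are no longer confused: the former arrives as a zero length prefix, the latter raises `EOFError`.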
Re: Lightwight socket IO wrapper
On Mon, 2015-09-21, Chris Angelico wrote:
> On Mon, Sep 21, 2015 at 6:38 PM, Marko Rauhamaa wrote:
>> Chris Angelico :
>>
>>> On Mon, Sep 21, 2015 at 5:59 PM, Marko Rauhamaa wrote:
>>>> You can read a full buffer even if you have a variable-length length
>>>> encoding.
>>>
>>> Not sure what you mean there. Unless you can absolutely guarantee that
>>> you didn't read too much, or can absolutely guarantee that your
>>> buffering function will be the ONLY way anything reads from the
>>> socket, buffering is a problem.
>>
>> Only one reader can read a socket safely at any given time so mutual
>> exclusion is needed.
>>
>> If you read "too much," the excess can be put in the application's read
>> buffer where it is available for whoever wants to process the next
>> message.
>
> Oops, premature send - sorry! Trying again.
>
> Which works only if you have a single concept of "application's read
> buffer". That means that you have only one place that can ever read
> data. Imagine a protocol that mainly consists of lines of text
> terminated by CRLF, but allows binary data to be transmitted by
> sending "DATA N\r\n" followed by N arbitrary bytes. The simplest and
> most obvious way to handle the base protocol is to buffer your reads
> as much as possible, but that means potentially reading the beginning
> of the data stream along with its header. You therefore cannot use the
> basic read() method to read that data - you have to use something from
> your line-based wrapper, even though you are decidedly NOT using a
> line-based protocol at that point.
>
> That's what I mean by guaranteeing that your buffering function is the
> only way data gets read from the socket. Either that, or you need an
> underlying facility for un-reading a bunch of data - de-buffering and
> making it readable again.

The way it seems to me, reading a TCP socket always ends up as:

- keep an application buffer
- do one socket read and append to the buffer
- consume 0--more complete "entries" from the beginning of the buffer;
  keep the incomplete one which may exist at the end
- go back and read some more when there's a chance more data has arrived

So the buffer is a circular buffer of octets, which you chop up by
parsing it so you can see it as a circular buffer of complete and
incomplete entries or messages.

At that level, yes, the line-oriented data and the binary data would
coexist in the same application buffer.

/Jorgen

--
  // Jorgen Grahn O o .
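The loop described above, applied to the mixed "line or DATA N" protocol from the quoted text, could be sketched like this. The class name `LineDataParser` and its `feed()` method are hypothetical, and a plain `bytearray` stands in for the circular buffer; the socket read itself is left to the caller, who passes in whatever each recv() returned.

```python
class LineDataParser:
    """Application buffer that yields complete entries and keeps the remainder."""

    def __init__(self):
        self._buffer = bytearray()
        self._pending = None   # size of a binary block announced by "DATA N"

    def feed(self, data):
        """Append newly read bytes and return every entry that is now complete."""
        self._buffer += data
        entries = []
        while True:
            if self._pending is not None:
                if len(self._buffer) < self._pending:
                    break                     # binary block not complete yet
                block = bytes(self._buffer[:self._pending])
                del self._buffer[:self._pending]
                self._pending = None
                entries.append(("data", block))
                continue
            end = self._buffer.find(b"\r\n")
            if end < 0:
                break                         # incomplete line stays buffered
            line = bytes(self._buffer[:end])
            del self._buffer[:end + 2]
            if line.startswith(b"DATA "):
                self._pending = int(line[5:])  # switch to counted binary mode
            else:
                entries.append(("line", line))
        return entries


if __name__ == "__main__":
    parser = LineDataParser()
    for piece in (b"HELLO\r\nDATA 5\r\nab", b"cdeNEXT\r", b"\n"):
        print(parser.feed(piece))
```

Because both kinds of entry are cut from the same buffer, bytes read "too early" are never lost; they are simply consumed by a later pass through the loop.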
Re: Lightwight socket IO wrapper
On 21Sep2015 12:40, Chris Angelico wrote:
> On Mon, Sep 21, 2015 at 11:55 AM, Cameron Simpson wrote:
>> On 21Sep2015 10:34, Chris Angelico wrote:
>>> If you're going to add sequencing and acknowledgements to UDP,
>>> wouldn't it be easier to use TCP and simply prefix every message with
>>> a two-byte length?
>>
>> Frankly, often yes. That's what I do. (different length encoding, but
>> otherwise...)
>
> Out of interest, what encoding?

NB: this is for binary protocols.

I don't like embedding arbitrary size limits in protocols or data
formats if I can easily avoid it. So (for my home grown binary
protocols) I encode unsigned integers as big endian octets with the top
bit meaning "another octet follows" and the bottom 7 bits going to the
value. So my packets look like:

  encoded(length)data

For sizes below 128, one byte of length. For sizes 128-16383, two bytes.
And so on. Compact yet unbounded.

My new protocols are probably going to derive from the scheme
implemented in the code cited below. "New" means as of some weeks ago,
when I completely rewrote a painful ad hoc protocol of mine and pulled
out the general features into what follows.

The actual packet format is implemented by the Packet class at the
bottom of this:

  https://bitbucket.org/cameron_simpson/css/src/tip/lib/python/cs/serialise.py

Simple and flexible.

As for using that data format multiplexed with multiple channels, see
the PacketConnection class here:

  https://bitbucket.org/cameron_simpson/css/src/tip/lib/python/cs/stream.py

Broadly, the packets are length[tag,flags[,channel#],payload] and one
implements whatever semantics one needs on top of that.

You can see this exercised over UNIX pipes and TCP streams in the unit
tests here:

  https://bitbucket.org/cameron_simpson/css/src/tip/lib/python/cs/stream_tests.py

On the subject of packet stuffing, my preferred loop for that is visible
in the PacketConnection._send worker thread method, which goes:

    fp = self._send_fp
    Q = self._sendQ
    for P in Q:
      sig = (P.channel, P.tag, P.is_request)
      if sig in self.__sent:
        raise RuntimeError("second send of %s" % (P,))
      self.__sent.add(sig)
      write_Packet(fp, P)
      if Q.empty():
        fp.flush()
    fp.close()

In short: get packets from the queue and write them to the stream
buffer. If the queue gets empty, _only then_ flush the buffer. This
assures synchronicity in comms while giving the IO library a chance to
fill a buffer with several packets.

Cheers,
Cameron Simpson

ERROR 155 - You can't do that. - Data General S200 Fortran error code list
Re: Lightwight socket IO wrapper
Marko Rauhamaa wrote:
> I recommend using socket.TCP_CORK with socket.TCP_NODELAY where they are
> available (Linux).

If these options are not available are both option constants also not
available? Or does the implementation have to look into sys.platform?

Ciao, Michael.
Re: Lightwight socket IO wrapper
Chris Angelico: > On Mon, Sep 21, 2015 at 4:27 PM, Cameron Simpson wrote: >> For sizes below 128, one byte of length. For sizes 128-16383, two bytes. And >> so on. Compact yet unbounded. > > [...] > > It's generally a lot faster to do a read(2) than a loop with any > number of read(1), and you get some kind of bound on your allocations. > Whether that's important to you or not is another question, but > certainly your chosen encoding is a good way of allowing arbitrary > integer values. You can read a full buffer even if you have a variable-length length encoding. Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Lightwight socket IO wrapper
Michael Ströder:

> Marko Rauhamaa wrote:
>> I recommend using socket.TCP_CORK with socket.TCP_NODELAY where they
>> are available (Linux).
>
> If these options are not available are both option constants also not
> available? Or does the implementation have to look into sys.platform?

   >>> import socket
   >>> 'TCP_CORK' in dir(socket)
   True

The TCP_NODELAY option is available everywhere but has special semantics
with TCP_CORK.

Marko
Re: Lightwight socket IO wrapper
On Mon, Sep 21, 2015 at 5:59 PM, Marko Rauhamaawrote: > Chris Angelico : > >> On Mon, Sep 21, 2015 at 4:27 PM, Cameron Simpson wrote: >>> For sizes below 128, one byte of length. For sizes 128-16383, two bytes. And >>> so on. Compact yet unbounded. >> >> [...] >> >> It's generally a lot faster to do a read(2) than a loop with any >> number of read(1), and you get some kind of bound on your allocations. >> Whether that's important to you or not is another question, but >> certainly your chosen encoding is a good way of allowing arbitrary >> integer values. > > You can read a full buffer even if you have a variable-length length > encoding. Not sure what you mean there. Unless you can absolutely guarantee that you didn't read too much, or can absolutely guarantee that your buffering function will be the ONLY way anything reads from the socket, buffering is a problem. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
Re: Lightwight socket IO wrapper
Chris Angelico:

> On Mon, Sep 21, 2015 at 2:39 PM, Marko Rauhamaa wrote:
>> Chris Angelico :
>>
>>> If you write a packet of data, then write another one, and another,
>>> and another, and another, without waiting for responses, Nagling
>>> should combine them automatically. [...]
>>
>> Unfortunately, Nagle and delayed ACK, which are both defaults, don't go
>> well together (you get nasty 200-millisecond hiccups).
>
> Only in the write-write-read scenario.

Which is the case you brought up.

Ideally, application code should be oblivious to the inner heuristics of
the TCP implementation. IOW, write-write-read is perfectly valid and
shouldn't lead to performance degradation. Unfortunately, the socket API
doesn't provide a standard way for the application to tell the kernel
that it is done sending for now. Linux's TCP_CORK+TCP_NODELAY is a
nonstandard way but does the job quite nicely.

>> As for the topic, TCP doesn't need wrappers to abstract away the
>> difficult bits. That's a superficially good idea that leads to
>> trouble.
>
> Depends what you're doing - if you're working with a higher level
> protocol like HTTP, then abstracting away the difficult bits of TCP is
> part of abstracting away the difficult bits of HTTP, and something
> like 'requests' is superb.

Naturally, a higher-level protocol hides the lower-level protocol. It in
turn has intricacies of its own.

Unfortunately, Python's stdlib HTTP facilities are too naive (ie,
blocking, incompatible with asyncio) to be usable.

Marko
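As a rough illustration of the corking idea, the sketch below sets TCP_CORK around a group of writes and clears it afterwards so the kernel can send the pieces together. The function name `send_corked` is an assumption, the TCP_NODELAY half of the combination is deliberately left out, and on platforms where the `socket` module has no `TCP_CORK` attribute the function falls back to plain writes.

```python
import socket


def send_corked(sock, parts):
    """Write several pieces, asking the kernel to hold them until all are queued.

    Hypothetical helper: uses TCP_CORK where the socket module exposes it
    (Linux), otherwise just sends the parts one after another.
    """
    cork = getattr(socket, "TCP_CORK", None)   # None where the constant is absent
    if cork is not None:
        sock.setsockopt(socket.IPPROTO_TCP, cork, 1)
    try:
        for part in parts:
            sock.sendall(part)
    finally:
        if cork is not None:
            # Clearing the option releases whatever is still held back.
            sock.setsockopt(socket.IPPROTO_TCP, cork, 0)


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    send_corked(client, [b"header;", b"body"])
    client.close()
    print(server.recv(64))
    server.close()
    listener.close()
```

The `getattr()` lookup doubles as the availability check asked about elsewhere in the thread: it relies on the constant being missing from the module on platforms without the option, which the thread reports for Windows but does not establish in general.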
Re: Lightwight socket IO wrapper
On Mon, Sep 21, 2015 at 4:27 PM, Cameron Simpsonwrote: > I don't like embedding arbitrary size limits in protocols or data formats if > I can easily avoid it. So (for my home grown binary protocols) I encode > unsigned integers as big endian octets with the top bit meaning "another > octet follows" and the bottom 7 bits going to the value. So my packets look > like: > > encoded(length)data > > For sizes below 128, one byte of length. For sizes 128-16383, two bytes. And > so on. Compact yet unbounded. Ah, the MIDI Variable-Length Integer. Decent. It's generally a lot faster to do a read(2) than a loop with any number of read(1), and you get some kind of bound on your allocations. Whether that's important to you or not is another question, but certainly your chosen encoding is a good way of allowing arbitrary integer values. ChrisA -- https://mail.python.org/mailman/listinfo/python-list
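The integer encoding quoted above (big-endian groups of 7 bits, with the top bit set on every octet except the last) can be sketched as follows. The names `encode_uint` and `encode_packet` are made up for this example and are not taken from Cameron's library.

```python
def encode_uint(value):
    """Encode a non-negative integer, 7 bits per octet, most significant first.

    The top bit of an octet means "another octet follows".
    """
    if value < 0:
        raise ValueError("only unsigned integers can be encoded")
    octets = [value & 0x7F]            # final octet: top bit clear
    value >>= 7
    while value:
        octets.append(0x80 | (value & 0x7F))   # earlier octets: top bit set
        value >>= 7
    return bytes(reversed(octets))


def encode_packet(payload):
    """Build encoded(length) followed by the data itself."""
    return encode_uint(len(payload)) + payload


if __name__ == "__main__":
    for n in (0, 127, 128, 16383, 16384):
        print(n, encode_uint(n).hex())
```

Sizes below 128 take one octet and sizes up to 16383 take two, matching the description in the quoted text.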
Re: Lightwight socket IO wrapper
On 2015-09-21 09:47, Marko Rauhamaa wrote:
> Michael Ströder:
>
>> Marko Rauhamaa wrote:
>>> Michael Ströder :
>>>
>>>> Marko Rauhamaa wrote:
>>>>> I recommend using socket.TCP_CORK with socket.TCP_NODELAY where
>>>>> they are available (Linux).
>>>>
>>>> If these options are not available are both option constants also
>>>> not available? Or does the implementation have to look into
>>>> sys.platform?
>>>
>>>    >>> import socket
>>>    >>> 'TCP_CORK' in dir(socket)
>>>    True
>>
>> On which platform was this done?
>
> Python3 on Fedora 21. Python2 on RHEL4. Sorry, don't have non-Linux
> machines to try.
>
>> How to automagically detect whether TCP_CORK is really available on a
>> platform?
>
> I sure hope 'TCP_CORK' in dir(socket) evaluates to False on non-Linux
> machines.

On Windows 10:

Python 3.5.0 (v3.5.0:374f501f4567, Sep 13 2015, 02:27:37) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> 'TCP_CORK' in dir(socket)
False
>>>
Re: Lightwight socket IO wrapper
On Mon, 2015-09-21, Dennis Lee Bieber wrote:
> On Sun, 20 Sep 2015 23:36:30 +0100, "James Harris"
> declaimed the following:
...
>>I thought UDP would deliver (or drop) a whole datagram but cannot find
>>anything in the Python documentation to guarantee that. In fact
>>documentation for the send() call says that apps are responsible for
>>checking that all data has been sent. They may mean that to apply to
>>stream protocols only but it doesn't state that. (Of course, UDP
>>datagrams are limited in size so the call may validly indicate
>>incomplete transmission even when the first part of a big message is
>>sent successfully.)
>>
> Looking in the wrong documentation
>
> You probably should be looking at the UDP RFC. Or maybe just
>
> http://www.diffen.com/difference/TCP_vs_UDP
>
> """
> Packets are sent individually and are checked for integrity only if they
> arrive. Packets have definite boundaries which are honored upon receipt,
> meaning a read operation at the receiver socket will yield an entire
> message as it was originally sent.
> """
>
> Even if the IP layer has to fragment a UDP packet to meet limits of the
> transport media, it should put them back together on the other end before
> passing it up to the UDP layer. To my knowledge, UDP does not have a size
> limit on the message (well -- a 16-bit length field in the UDP header).

So they are "limited in size" like the OP wrote. (A TCP stream OTOH is
potentially infinite.)

But also, the IPv4 RFC says:

    All hosts must be prepared to accept datagrams of up to 576 octets
    (whether they arrive whole or in fragments). It is recommended
    that hosts only send datagrams larger than 576 octets if they have
    assurance that the destination is prepared to accept the larger
    datagrams.

As for "all or nothing" with UDP datagrams, you also have the socket
layer case where the user does read() into a 1000 octet buffer and the
datagram was 1200 octets. With BSD sockets you can (if you try) detect
this, but the extra 200 octets are lost forever.

> But since it /is/ "got it all" or "dropped" with no inherent
> confirmation, one would have to embed their own protocol within it --
> sequence numbers with ACK/NAK, for example. Problem: if using LARGE UDP
> packets, this protocol would mean having LARGE resends should packets be
> dropped or arrive out of sequence (and since the ACK/NAK could be
> dropped too, you may have to handle the case of a duplicated packet --
> also large).
>
> TCP is a stream protocol -- the protocol will ensure that all data
> arrives, and that it arrives in order, but does not enforce any boundaries
> on the data; what started as a relatively large packet at one end may
> arrive as lots of small packets due to intermediate transport limits (one
> can visualize a worst case: each TCP packet is broken up to fit Hollerith
> cards; 20bytes for header and 60 bytes of data -- then fed to a reader and
> sent on AS-IS).

The problem is IMO more this: the chunks of data that the application
writes don't map to what the other application reads. In the lower
layers, I don't expect TCP segments to be split, and IP fragmentation
(if it happens at all) operates at an even lower level.

However the end result is still just as you write:

> Boundaries are the end-user responsibility... line endings
> (look at SMTP, where an email message ends on a line containing just a ".")
> or embedded length counter (not the TCP packet length).
>
>>Receiving no bytes is taken as indicating the end of the communication.
>>That's OK for TCP but not for UDP so there should be a way to
>>distinguish between the end of data and receiving an empty datagram.
>>
> I don't believe UDP supports a truly empty datagram (length of 0) --
> presuming a sending stack actually sends one, the receiving stack will
> probably drop it as there is no data to pass on to a client

UDP datagrams of length 0 work (just tried it on Linux). There's nothing
special about it.

> (there is a PR
> at work because we have a UDP driver that doesn't drop 0-length messages,
> but also can't deliver them -- so the circular buffer might fill with
> undeliverable headers)

Those messages should be delivered to the receiving socket, in the sense
that they are sanity-checked, used to wake up the application and mark
the socket readable, fill up one entry in the read queue and so on ...

Of course your system at work may have the rights to be more
restrictive, if it's special-purpose.

/Jorgen

--
  // Jorgen Grahn O o .
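The truncation case mentioned above (a 1200-octet datagram read into a 1000-octet buffer) can be reproduced on loopback. This is only a sketch of what the author describes: the function name `receive_into_small_buffer` is invented here, detection relies on `recvmsg()` and the `MSG_TRUNC` flag, which are Unix facilities, and other platforms may report an error instead of silently truncating.

```python
import socket


def receive_into_small_buffer(datagram_size=1200, bufsize=1000, timeout=2.0):
    """Send an oversized datagram and a marker; read both with a small buffer.

    Returns (first, truncated, second). `truncated` is None where
    recvmsg() is not available and truncation cannot be detected this way.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(timeout)
        address = receiver.getsockname()
        sender.sendto(b"A" * datagram_size, address)
        sender.sendto(b"next", address)
        if hasattr(receiver, "recvmsg"):
            first, _ancdata, flags, _peer = receiver.recvmsg(bufsize)
            truncated = bool(flags & socket.MSG_TRUNC)
        else:
            first = receiver.recv(bufsize)
            truncated = None
        # The tail of the first datagram is gone: the next read returns
        # the next datagram, not the missing octets.
        second = receiver.recv(bufsize)
        return first, truncated, second
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    first, truncated, second = receive_into_small_buffer()
    print(len(first), truncated, second)
```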
Re: Lightwight socket IO wrapper
On 21Sep2015 18:07, Chris Angelicowrote: On Mon, Sep 21, 2015 at 5:59 PM, Marko Rauhamaa wrote: Chris Angelico : On Mon, Sep 21, 2015 at 4:27 PM, Cameron Simpson wrote: For sizes below 128, one byte of length. For sizes 128-16383, two bytes. And so on. Compact yet unbounded. [...] It's generally a lot faster to do a read(2) than a loop with any number of read(1), and you get some kind of bound on your allocations. Whether that's important to you or not is another question, but certainly your chosen encoding is a good way of allowing arbitrary integer values. You can read a full buffer even if you have a variable-length length encoding. Not sure what you mean there. Unless you can absolutely guarantee that you didn't read too much, or can absolutely guarantee that your buffering function will be the ONLY way anything reads from the socket, buffering is a problem. I'm using buffered io streams, so that layer will be reading in chunks. Pulling things from that buffer with fp.read(1) is cheap enough for my use. Cheers, Cameron Simpson -- https://mail.python.org/mailman/listinfo/python-list
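Reading the variable-length size one octet at a time from a buffered stream, as described above, might look like this. The function names `read_uint` and `read_packet` are assumptions for the sketch; `fp` can be any buffered binary file object, for example the result of `sock.makefile('rb')`, and an in-memory stream is used below so the example runs without a network.

```python
import io


def read_uint(fp):
    """Read one variable-length unsigned integer from a binary stream.

    Each octet carries 7 value bits; a set top bit means more octets follow.
    """
    value = 0
    while True:
        octet = fp.read(1)             # cheap when fp buffers its reads
        if not octet:
            raise EOFError("stream ended inside a length field")
        value = (value << 7) | (octet[0] & 0x7F)
        if not octet[0] & 0x80:
            return value


def read_packet(fp):
    """Read encoded(length) followed by that many bytes of data."""
    length = read_uint(fp)
    payload = fp.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside a packet payload")
    return payload


if __name__ == "__main__":
    stream = io.BytesIO(b"\x05hello" + b"\x81\x00" + b"x" * 128)
    print(read_packet(stream), len(read_packet(stream)))
```

The single-octet reads hit the stream's buffer rather than the socket, which is why the approach stays cheap, and no upper bound on the length has to be known in advance.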
Re: Lightwight socket IO wrapper
Chris Angelico: > On Mon, Sep 21, 2015 at 5:59 PM, Marko Rauhamaa wrote: >> You can read a full buffer even if you have a variable-length length >> encoding. > > Not sure what you mean there. Unless you can absolutely guarantee that > you didn't read too much, or can absolutely guarantee that your > buffering function will be the ONLY way anything reads from the > socket, buffering is a problem. Only one reader can read a socket safely at any given time so mutual exclusion is needed. If you read "too much," the excess can be put in the application's read buffer where it is available for whoever wants to process the next message. Marko -- https://mail.python.org/mailman/listinfo/python-list
Re: Lightwight socket IO wrapper
On Mon, Sep 21, 2015 at 6:38 PM, Marko Rauhamaawrote: > Chris Angelico : > >> On Mon, Sep 21, 2015 at 5:59 PM, Marko Rauhamaa wrote: >>> You can read a full buffer even if you have a variable-length length >>> encoding. >> >> Not sure what you mean there. Unless you can absolutely guarantee that >> you didn't read too much, or can absolutely guarantee that your >> buffering function will be the ONLY way anything reads from the >> socket, buffering is a problem. > > Only one reader can read a socket safely at any given time so mutual > exclusion is needed. > > If you read "too much," the excess can be put in the application's read > buffer where it is available for whoever wants to process the next > message. Which works only if you have a single concept of "application's read buffer". That means that you have only one place that can ever read data. Imagine a -- https://mail.python.org/mailman/listinfo/python-list
Re: Lightwight socket IO wrapper
Chris Angelico:
> On Mon, Sep 21, 2015 at 6:38 PM, Marko Rauhamaa wrote:
>> Only one reader can read a socket safely at any given time so mutual
>> exclusion is needed.
>>
>> If you read "too much," the excess can be put in the application's read
>> buffer where it is available for whoever wants to process the next
>> message.
>
> Which works only if you have a single concept of "application's read
> buffer".

Well, the socket's read buffer.

Marko
Re: Lightwight socket IO wrapper
Marko Rauhamaa wrote:
> Michael Ströder:
>
>> Marko Rauhamaa wrote:
>>> I recommend using socket.TCP_CORK with socket.TCP_NODELAY where they
>>> are available (Linux).
>>
>> If these options are not available are both option constants also not
>> available? Or does the implementation have to look into sys.platform?
>
> >>> import socket
> >>> 'TCP_CORK' in dir(socket)
> True

On which platform was this done?

To rephrase my question: How to automagically detect whether TCP_CORK is really available on a platform?

'TCP_CORK' in dir(socket) or catch AttributeError
sys.platform=='linux2', hoping that Linux 2.1 or prior is not around anymore...
...

Ciao, Michael.
Re: Lightwight socket IO wrapper
Michael Ströder:
> Marko Rauhamaa wrote:
>> Michael Ströder:
>>
>>> Marko Rauhamaa wrote:
>>>> I recommend using socket.TCP_CORK with socket.TCP_NODELAY where they
>>>> are available (Linux).
>>>
>>> If these options are not available are both option constants also not
>>> available? Or does the implementation have to look into sys.platform?
>>
>> >>> import socket
>> >>> 'TCP_CORK' in dir(socket)
>> True
>
> On which platform was this done?

Python3 on Fedora 21. Python2 on RHEL4. Sorry, don't have non-Linux machines to try.

> How to automagically detect whether TCP_CORK is really available on a
> platform?

I sure hope 'TCP_CORK' in dir(socket) evaluates to False on non-Linux machines.

Marko
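One way to make the detection discussed above tolerant of both a missing constant and a kernel that rejects the option is sketched below. It is written for Python 3, where socket errors are OSError; the helper names cork_supported and set_cork are hypothetical, and whether the constant is absent on any particular non-Linux build is not something this sketch can confirm.

```python
import socket

def cork_supported():
    """True if this Python build exposes the TCP_CORK constant."""
    return hasattr(socket, "TCP_CORK")

def set_cork(sock, enabled):
    """Try to cork or uncork a TCP socket.

    Returns False instead of raising when the constant is missing or
    the operating system refuses the option.
    """
    opt = getattr(socket, "TCP_CORK", None)
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, 1 if enabled else 0)
    except OSError:
        return False
    return True

# Usage: callers can fall back to plain writes when corking is unavailable.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    corked = set_cork(s, True)
finally:
    s.close()
```

Checking the attribute rather than sys.platform avoids guessing which platforms define the option, and the try/except covers the case where the name exists but the call still fails.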
Re: Lightwight socket IO wrapper
Marko Rauhamaa:
> Chris Angelico:
>
>> On Mon, Sep 21, 2015 at 6:38 PM, Marko Rauhamaa wrote:
>>> Only one reader can read a socket safely at any given time so mutual
>>> exclusion is needed.
>>>
>>> If you read "too much," the excess can be put in the application's read
>>> buffer where it is available for whoever wants to process the next
>>> message.
>>
>> Which works only if you have a single concept of "application's read
>> buffer".
>
> Well, the socket's read buffer.

To be exact, the application should associate a read buffer with each socket.

Marko
Re: Lightwight socket IO wrapper
On Mon, Sep 21, 2015 at 6:38 PM, Marko Rauhamaa wrote:
> Chris Angelico:
>
>> On Mon, Sep 21, 2015 at 5:59 PM, Marko Rauhamaa wrote:
>>> You can read a full buffer even if you have a variable-length length
>>> encoding.
>>
>> Not sure what you mean there. Unless you can absolutely guarantee that
>> you didn't read too much, or can absolutely guarantee that your
>> buffering function will be the ONLY way anything reads from the
>> socket, buffering is a problem.
>
> Only one reader can read a socket safely at any given time so mutual
> exclusion is needed.
>
> If you read "too much," the excess can be put in the application's read
> buffer where it is available for whoever wants to process the next
> message.

Oops, premature send - sorry! Trying again.

Which works only if you have a single concept of "application's read buffer". That means that you have only one place that can ever read data. Imagine a protocol that mainly consists of lines of text terminated by CRLF, but allows binary data to be transmitted by sending "DATA N\r\n" followed by N arbitrary bytes. The simplest and most obvious way to handle the base protocol is to buffer your reads as much as possible, but that means potentially reading the beginning of the data stream along with its header. You therefore cannot use the basic read() method to read that data - you have to use something from your line-based wrapper, even though you are decidedly NOT using a line-based protocol at that point.

That's what I mean by guaranteeing that your buffering function is the only way data gets read from the socket. Either that, or you need an underlying facility for un-reading a bunch of data - de-buffering and making it readable again.

ChrisA
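The two positions above meet in a wrapper that owns the only read buffer for a socket and offers both line reads and fixed-size reads from it. The sketch below handles the "DATA N\r\n" protocol from the message; BufferedReader and read_message are hypothetical names, and the design assumes nothing else ever calls recv() on the same socket.

```python
import socket

class BufferedReader:
    """Owns the single read buffer associated with one socket."""

    def __init__(self, sock, chunk_size=4096):
        self.sock = sock
        self.chunk_size = chunk_size
        self.buf = bytearray()

    def _fill(self):
        # Pull whatever is available; this may overshoot the current message.
        data = self.sock.recv(self.chunk_size)
        if not data:
            raise EOFError("connection closed")
        self.buf += data

    def read_line(self):
        """Return one line without its CRLF terminator."""
        while True:
            end = self.buf.find(b"\r\n")
            if end >= 0:
                line = bytes(self.buf[:end])
                del self.buf[:end + 2]
                return line
            self._fill()

    def read_exactly(self, n):
        """Return exactly n bytes, taking already-buffered bytes first."""
        while len(self.buf) < n:
            self._fill()
        data = bytes(self.buf[:n])
        del self.buf[:n]
        return data

def read_message(reader):
    """Return a text line, or the binary payload announced by 'DATA N'."""
    line = reader.read_line()
    if line.startswith(b"DATA "):
        return reader.read_exactly(int(line[5:]))
    return line
```

The binary payload is read through the same object as the lines, so bytes that were buffered while looking for a CRLF are never lost. That is the constraint Chris describes, expressed as a rule that all reads go through the reader.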
Lightwight socket IO wrapper
I guess there have been many attempts to make socket IO easier to handle and a good number of those have been in Python.

The trouble with trying to improve something which is already well designed (and consciously left as is) is that the so-called improvement can become much more complex and overly elaborate. That can apply to the initial idea, for sure, but when writing helper or convenience functions perhaps it applies more to the temptation to keep adding just a little bit extra. The end result can be overly elaborate such as a framework which is fine where such is needed but is overkill for simpler requirements.

Do you guys have any recommendations of some *lightweight* additions to Python socket IO before I write any more of my own? Something built in to Python would be much preferred over any modules which have to be added. I had in the back of my mind that there was a high-level socket-IO library - much as threading was added as a wrapper to the basic thread module - but I cannot find anything above socket. Is there any?

A current specific to illustrate where basic socket IO is limited: it normally provides no guarantees over how many bytes are transferred at a time (AFAICS that's true for both streams and datagrams) so the delimiting of messages/records needs to be handled by the sender and receiver. I do already handle some of this myself but I wondered if there was a prebuilt solution that I should be using instead - to save me adding just a little bit extra. ;-)

James
Re: Lightwight socket IO wrapper
"Akira Li" <4kir4...@gmail.com> wrote in message news:mailman.37.1442754893.21674.python-l...@python.org...
> "James Harris" writes:
>> I guess there have been many attempts to make socket IO easier to handle and a good number of those have been in Python.
>>
>> The trouble with trying to improve something which is already well designed (and consciously left as is) is that the so-called improvement can become much more complex and overly elaborate. That can apply to the initial idea, for sure, but when writing helper or convenience functions perhaps it applies more to the temptation to keep adding just a little bit extra. The end result can be overly elaborate such as a framework which is fine where such is needed but is overkill for simpler requirements.
>>
>> Do you guys have any recommendations of some *lightweight* additions to Python socket IO before I write any more of my own? Something built in to Python would be much preferred over any modules which have to be added. I had in the back of my mind that there was a high-level socket-IO library - much as threading was added as a wrapper to the basic thread module - but I cannot find anything above socket. Is there any?
>
> Does ØMQ qualify as lightweight?

It's certainly interesting. It's puzzling, too. For example,

http://zguide.zeromq.org/py:hwserver

The Python code there includes

message = socket.recv()

but given that this is a TCP socket it doesn't look like there is any way for the stack to know how many bytes to return. Either ZeroMQ layers another end-to-end protocol on top of TCP (which would be no good) or it will be guessing (which would not be good either).

There are probably answers to that query but there is a lot of documentation, including on reliable communication, and that in itself makes ZeroMQ seem overkill, even if it can be persuaded to do what I want.

I am impressed that they show code in many languages. I may come back to it but for the moment it doesn't seem to be what I was looking for. And it is not built in.

>> A current specific to illustrate where basic socket IO is limited: it normally provides no guarantees over how many bytes are transferred at a time (AFAICS that's true for both streams and datagrams) so the delimiting of messages/records needs to be handled by the sender and receiver. I do already handle some of this myself but I wondered if there was a prebuilt solution that I should be using instead - to save me adding just a little bit extra. ;-)
>
> There are already convenience functions in stdlib such as sock.sendall(), sock.sendfile(), socket.create_connection() in addition to BSD Sockets API.
>
> If you want to extend this list and have specific suggestions; see https://docs.python.org/devguide/stdlibchanges.html

That may be a bit overkill just now but it's a good suggestion.

> Or just describe your current specific issue in more detail here.

There are a few things and more crop up as time goes on. For example, over TCP it would be helpful to have a function to receive a specific number of bytes or one to read bytes until reaching a certain delimiter such as newline or zero or space etc. Even better would be to be able to use the iteration protocol so you could just code next() and get the next such chunk of read in a for loop.

When sending it would be good to just say to send a bunch of bytes but know that you will get told how many were sent (or didn't get sent) if it fails. sock.sendall() doesn't do that.

I thought UDP would deliver (or drop) a whole datagram but cannot find anything in the Python documentation to guarantee that. In fact documentation for the send() call says that apps are responsible for checking that all data has been sent. They may mean that to apply to stream protocols only but it doesn't state that. (Of course, UDP datagrams are limited in size so the call may validly indicate incomplete transmission even when the first part of a big message is sent successfully.)

Receiving no bytes is taken as indicating the end of the communication. That's OK for TCP but not for UDP so there should be a way to distinguish between the end of data and receiving an empty datagram.

The recv calls require a buffer size to be supplied which is a technical detail. A Python wrapper could save the programmer dealing with that.

Reminder to self: encoding issues.

None of the above is difficult to write and I have written the bits I need myself but, basically, there are things that would make socket IO easier and yet still compatible with more long-winded code. So I wondered if there were already some Python modules which were more convenient than what I found in the documentation.

James
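The wish above for a send that reports how much went out before a failure can be sketched as a loop over sock.send(), which returns the number of bytes it accepted. This is a minimal illustration for Python 3, not a stdlib facility: PartialSendError and send_all_counted are hypothetical names.

```python
import socket

class PartialSendError(OSError):
    """Raised when sending fails part-way; records how far it got."""

    def __init__(self, sent, total, cause):
        super().__init__("sent %d of %d bytes: %s" % (sent, total, cause))
        self.sent = sent
        self.total = total

def send_all_counted(sock, data):
    """Send all of data, or raise PartialSendError carrying the byte count."""
    view = memoryview(data)   # slicing a memoryview avoids copying the tail
    sent = 0
    while sent < len(view):
        try:
            sent += sock.send(view[sent:])
        except OSError as exc:
            raise PartialSendError(sent, len(view), exc) from exc
    return sent
```

A caller can catch PartialSendError and inspect its sent attribute to decide whether to resume or give up. Note that "sent" here means handed to the local network stack, not received by the peer.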
Re: Lightwight socket IO wrapper
On Mon, Sep 21, 2015 at 10:19 AM, Dennis Lee Bieber wrote:
> Even if the IP layer has to fragment a UDP packet to meet limits of the
> transport media, it should put them back together on the other end before
> passing it up to the UDP layer. To my knowledge, UDP does not have a size
> limit on the message (well -- a 16-bit length field in the UDP header). But
> since it /is/ "got it all" or "dropped" with no inherent confirmation, one
> would have to embed their own protocol within it -- sequence numbers with
> ACK/NAK, for example. Problem: if using LARGE UDP packets, this protocol
> would mean having LARGE resends should packets be dropped or arrive out of
> sequence (and since the ACK/NAK could be dropped too, you may have to
> handle the case of a duplicated packet -- also large).

If you're going to add sequencing and acknowledgements to UDP, wouldn't it be easier to use TCP and simply prefix every message with a two-byte length?

UDP is great when order doesn't matter and each packet stands entirely alone. DNS is a well-known example - the question "What is the IP address for www.rosuav.com?" doesn't in any way affect the question "What is the mail server for gmail.com?", so you fire off UDP packets for each one, and get responses whenever you get them. UDP's also perfect for a heartbeat system - you send out a packet every however-often, and if the monitor hasn't heard from you in X seconds, it starts alerting people. No need for responses of any kind there.

But for working with a stream, I usually find it's a lot easier to build on top of TCP than UDP.

ChrisA
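The two-byte length prefix suggested above might be sketched as follows. The byte order (network order, via struct format "!H") and the helper names send_msg, recv_exactly and recv_msg are assumptions made for illustration.

```python
import socket
import struct

def send_msg(sock, payload):
    """Send payload preceded by its length as a 2-byte big-endian integer."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a two-byte length")
    sock.sendall(struct.pack("!H", len(payload)) + payload)

def recv_exactly(sock, n):
    """Receive exactly n bytes, looping because recv() may return fewer."""
    parts = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise EOFError("connection closed mid-message")
        parts.append(chunk)
        n -= len(chunk)
    return b"".join(parts)

def recv_msg(sock):
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!H", recv_exactly(sock, 2))
    return recv_exactly(sock, length)
```

Because each recv() asks for no more than the bytes still missing from the current message, the receiver never reads into the next one, so no separate application buffer is needed.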
Re: Lightwight socket IO wrapper
"James Harris" writes:
> I guess there have been many attempts to make socket IO easier to
> handle and a good number of those have been in Python.
>
> The trouble with trying to improve something which is already well
> designed (and consciously left as is) is that the so-called improvement
> can become much more complex and overly elaborate. That can apply to
> the initial idea, for sure, but when writing helper or convenience
> functions perhaps it applies more to the temptation to keep adding
> just a little bit extra. The end result can be overly elaborate such
> as a framework which is fine where such is needed but is overkill for
> simpler requirements.
>
> Do you guys have any recommendations of some *lightweight* additions
> to Python socket IO before I write any more of my own? Something built
> in to Python would be much preferred over any modules which have to be
> added. I had in the back of my mind that there was a high-level
> socket-IO library - much as threading was added as a wrapper to the
> basic thread module - but I cannot find anything above socket. Is
> there any?

Does ØMQ qualify as lightweight?

> A current specific to illustrate where basic socket IO is limited: it
> normally provides no guarantees over how many bytes are transferred at
> a time (AFAICS that's true for both streams and datagrams) so the
> delimiting of messages/records needs to be handled by the sender and
> receiver. I do already handle some of this myself but I wondered if
> there was a prebuilt solution that I should be using instead - to save
> me adding just a little bit extra. ;-)

There are already convenience functions in stdlib such as sock.sendall(), sock.sendfile(), socket.create_connection() in addition to BSD Sockets API.

If you want to extend this list and have specific suggestions; see https://docs.python.org/devguide/stdlibchanges.html

Or just describe your current specific issue in more detail here.
Re: Lightwight socket IO wrapper
On 21Sep2015 10:34, Chris Angelico wrote:
> If you're going to add sequencing and acknowledgements to UDP, wouldn't it be easier to use TCP and simply prefix every message with a two-byte length?

Frankly, often yes. That's what I do. (different length encoding, but otherwise...)

UDP's neat if you do not care if a packet fails to arrive and if you can guarantee that your data fits in a packet in the face of different MTUs.

I like TCP myself, most of the time. Another nice thing about TCP is that with a little effort you get to pack multiple data packets (or partial data packets) into a network packet, etc.

Cheers, Cameron Simpson

If you lie to the compiler, it will get its revenge. - Henry Spencer
Re: Lightwight socket IO wrapper
"James Harris" writes:
...
> There are a few things and more crop up as time goes on. For example,
> over TCP it would be helpful to have a function to receive a specific
> number of bytes or one to read bytes until reaching a certain
> delimiter such as newline or zero or space etc.

The answer is sock.makefile('rb') then `file.read(nbytes)` returns a specific number of bytes.

`file.readline()` reads until newline (b'\n'). There is Python Issue: "Add support for reading records with arbitrary separators to the standard IO stack" http://bugs.python.org/issue1152248 See also http://bugs.python.org/issue17083

Perhaps, it is easier to implement read_until(sep) that is best suited for a particular case.

> Even better would be to be able to use the iteration protocol so you
> could just code next() and get the next such chunk of read in a for
> loop.

file is an iterator over lines i.e., next(file) works.

> When sending it would be good to just say to send a bunch of bytes but
> know that you will get told how many were sent (or didn't get sent) if
> it fails. Sock.sendall() doesn't do that.

sock.send() returns the number of bytes sent that may be less than given. You could reimplement sock.sendall() to include the number of bytes successfully sent in case of an error.

> I thought UDP would deliver (or drop) a whole datagram but cannot find
> anything in the Python documentation to guarantee that. In fact
> documentation for the send() call says that apps are responsible for
> checking that all data has been sent. They may mean that to apply to
> stream protocols only but it doesn't state that. (Of course, UDP
> datagrams are limited in size so the call may validly indicate
> incomplete transmission even when the first part of a big message is
> sent successfully.)
>
> Receiving no bytes is taken as indicating the end of the
> communication. That's OK for TCP but not for UDP so there should be a
> way to distinguish between the end of data and receiving an empty
> datagram.

There is no end of communication in UDP and therefore there is no end of data. If you get zero bytes in return, it means that you've received a zero-length datagram.

sock.recvfrom() is a thin wrapper around the corresponding C function. You could read any docs you like about UDP sockets.

http://stackoverflow.com/questions/5307031/how-to-detect-receipt-of-a-0-length-udp-datagram

> The recv calls require a buffer size to be supplied which is a
> technical detail. A Python wrapper could save the programmer dealing
> with that.

It is not just a buffer size. It is the maximum amount of data to be received at once i.e., sock.recv() may return less but never more. You could use makefile() and read() if recv() is too low-level.

> Reminder to self: encoding issues.
>
> None of the above is difficult to write and I have written the bits I
> need myself but, basically, there are things that would make socket IO
> easier and yet still compatible with more long-winded code. So I
> wondered if there were already some Python modules which were more
> convenient than what I found in the documentation.
>
> James
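The makefile() suggestion above can be tried without a network by using socket.socketpair(). The sketch below assumes Python 3, where makefile('rb') returns a buffered binary file object; the variable names are only illustrative.

```python
import socket

# A connected pair of stream sockets stands in for a TCP connection.
a, b = socket.socketpair()
a.sendall(b"first line\nsecond line\n12345tail")
a.close()                     # lets the final read() see end of stream

with b.makefile("rb") as f:
    line1 = f.readline()      # reads up to and including b"\n"
    line2 = next(f)           # the file object iterates over lines
    five = f.read(5)          # blocks until exactly 5 bytes or EOF
    rest = f.read()           # everything up to end of stream
b.close()
```

Once a file object wraps the socket, all further reads should go through it, since it may already hold bytes that a direct recv() would otherwise miss.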
Re: Lightwight socket IO wrapper
Dennis Lee Bieber wrote:
> worst case: each TCP packet is broken up to fit Hollerith cards;

Or printed on strips of paper and tied to pigeons:

https://en.wikipedia.org/wiki/IP_over_Avian_Carriers

-- Greg
Re: Lightwight socket IO wrapper
On Mon, Sep 21, 2015 at 11:55 AM, Cameron Simpson wrote:
> On 21Sep2015 10:34, Chris Angelico wrote:
>>
>> If you're going to add sequencing and acknowledgements to UDP,
>> wouldn't it be easier to use TCP and simply prefix every message with
>> a two-byte length?
>
> Frankly, often yes. That's what I do. (different length encoding, but
> otherwise...)

Out of interest, what encoding? With most protocols, I would prefer to encode in ASCII digits terminated by end-of-line, but for arbitrary content you're packaging up, it's usually easier to read 2 bytes (or 4 or whatever you want to specify), then read that many bytes, and that's your content. No buffering required - you'll never read past the end of a packet.

> UDP's neat if you do not care if a packet fails to arrive and if you can
> guarantee that your data fits in a packet in the face of different MTUs.
> I like TCP myself, most of the time. Another nice thing about TCP is that
> with a little effort you get to pack multiple data packets (or partial data
> packets) into a network packet, etc.

Emphatically - a little effort sometimes, and other times no effort at all! If you write a packet of data, then write another one, and another, and another, and another, without waiting for responses, Nagling should combine them automatically. And even if they're not deliberately queued by Nagle's Algorithm, packets can get combined for other reasons. So, yeah! Definitely can help a lot with packet counts on small writes.

ChrisA
Re: Lightwight socket IO wrapper
Chris Angelico:
> On Mon, Sep 21, 2015 at 11:55 AM, Cameron Simpson wrote:
>> Another nice thing about TCP is that with a little effort you get to
>> pack multiple data packets (or partial data packets) into a network
>> packet, etc.
>
> Emphatically - a little effort sometimes, and other times no effort at
> all! If you write a packet of data, then write another one, and
> another, and another, and another, without waiting for responses,
> Nagling should combine them automatically. And even if they're not
> deliberately queued by Nagle's Algorithm, packets can get combined for
> other reasons. So, yeah! Definitely can help a lot with packet counts
> on small writes.

Unfortunately, Nagle and delayed ACK, which are both defaults, don't go well together (you get nasty 200-millisecond hiccups).

I recommend using socket.TCP_CORK with socket.TCP_NODELAY where they are available (Linux). They give you Nagle without delayed ACK. See http://linux.die.net/man/7/tcp

As for the topic, TCP doesn't need wrappers to abstract away the difficult bits. That's a superficially good idea that leads to trouble.

Marko
Re: Lightwight socket IO wrapper
On Mon, Sep 21, 2015 at 2:39 PM, Marko Rauhamaa wrote:
> Chris Angelico:
>
>> On Mon, Sep 21, 2015 at 11:55 AM, Cameron Simpson wrote:
>>> Another nice thing about TCP is that with a little effort you get to
>>> pack multiple data packets (or partial data packets) into a network
>>> packet, etc.
>>
>> Emphatically - a little effort sometimes, and other times no effort at
>> all! If you write a packet of data, then write another one, and
>> another, and another, and another, without waiting for responses,
>> Nagling should combine them automatically. And even if they're not
>> deliberately queued by Nagle's Algorithm, packets can get combined for
>> other reasons. So, yeah! Definitely can help a lot with packet counts
>> on small writes.
>
> Unfortunately, Nagle and delayed ACK, which are both defaults, don't go
> well together (you get nasty 200-millisecond hiccups).

Only in the write-write-read scenario. If you write-read-write-read, or if your reads don't depend on your writes, then Nagle + delayed ACK works just fine. But if you write a bunch of stuff, then block waiting for the other end to respond, and then write multiple times, and wait for a response, _then_ the pair work badly together, yes.

> As for the topic, TCP doesn't need wrappers to abstract away the
> difficult bits. That's a superficially good idea that leads to trouble.

Depends what you're doing - if you're working with a higher level protocol like HTTP, then abstracting away the difficult bits of TCP is part of abstracting away the difficult bits of HTTP, and something like 'requests' is superb. But if you're inventing your own protocol, directly on top of a BSD socket, then I would agree - just call socket functions directly. Otherwise you risk nasty surprises when your file-like object has ridiculous performance problems.

ChrisA