Re: [Zope] Re: Running more than one instance on windows often block each other
[Sune B. Woeller]
...
> This is what I'm experiencing as well. I can narrow it down a bit: I *always* experience one out of two erroneous behaviours, as described below.

I see only one of the behaviors below (the second -- no problems), and don't agree it's in error.

> I tried to make an even simpler test situation, without binding sockets 'r' and 'w' to each other in the same process. I try to reproduce the problem in a 'standard' socket use case, where a client in one process binds to a server in another process. The following two scripts act as a server and a client.
>
> #***
> # sock_server_reader.py
> #***
> import socket
>
> a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)

Note that a = socket.socket() is an easier way to spell the same thing; the Medusa code is ancient.

> a.bind(('127.0.0.1', 1))
> print a.getsockname() # assigned (host, port) pair
> a.listen(1)
> print 'a accepting:'
> r, addr = a.accept() # r becomes asyncore's (self.)socket
> print 'a accepted:'
> print ' ' + str(r.getsockname()) + ', peer=' + str(r.getpeername())
> a.close()

Key point: no socket is _listening_ on address ('127.0.0.1', 1) after this close(). From what comes later, I guess you believe that no socket should be allowed to listen on that address again until all connections made with that `a` also close, but I don't think you'll find anything in socket documentation to support that belief.

In the world of socket connections, what needs to be unique is _the connection_, and that's a 4-tuple:

    (side 1 host, side 1 port, side 2 host, side 2 port)

There's no prohibition against seeing either side's address in any number of connections simultaneously; you just can't have two connections simultaneously that match in all 4 positions.

It so happens that Windows is happy to allow another socket to bind to a port the instant after a socket that had been listening on it closes (and regardless of whether connections made via the latter are still open), but I don't believe that's a bug.
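The 4-tuple point can be checked with a small sketch. It is written for Python 3 (the scripts quoted in this thread are Python 2), it binds to port 0 so the OS picks a free port rather than using the port from the thread, and the helper name `connection_tuples` is made up for illustration. Two clients connect to one listener; both accepted sockets report the listener's address as their own, yet the connections stay distinct because the client ports differ.

```python
import socket

def connection_tuples(host="127.0.0.1"):
    """Accept two connections on one listener and return their 4-tuples."""
    listener = socket.socket()
    listener.bind((host, 0))            # assumption: port 0 = any free port
    listen_addr = listener.getsockname()
    listener.listen(2)

    clients, accepted = [], []
    for _ in range(2):
        c = socket.socket()
        c.connect(listen_addr)
        r, _addr = listener.accept()
        clients.append(c)
        accepted.append(r)

    # Closing the listener does not affect the accepted connections.
    listener.close()

    # (local host, local port, peer host, peer port) for each accepted socket
    tuples = [r.getsockname() + r.getpeername() for r in accepted]
    for s in clients + accepted:
        s.close()
    return listen_addr, tuples

if __name__ == "__main__":
    addr, tuples = connection_tuples()
    print("listener was bound to", addr)
    for t in tuples:
        print("connection:", t)
```

The first two fields of each printed connection repeat the listener's address; only the last field (the client's port) tells the connections apart.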
What I appear to be seeing is that sometimes -- rarely -- Windows allows binding to a port by two sockets simultaneously, not serially as you're showing here. Simultaneous binding (in the absence of SO_REUSEADDR on Windows) is a bug.

> msg = r.recv(100)
> print 'msg received:', msg
>
> #***
> # sock_client_writer.py
> #***
> import socket, random
>
> w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
> w.setsockopt(socket.IPPROTO_TCP, 1, 1)
> print 'w connecting:'
> w.connect(('127.0.0.1', 1))
> print 'w connected:'
> print w.getsockname()
> print ' ' + str(w.getsockname()) + ', peer=' + str(w.getpeername())
> msg = str(random.randrange(100))
> print 'sending msg: ', msg
> w.send(msg)
>
> There are two possible outcomes [a) and b)] of running two instances of this client/server pair (that is, 4 processes in total like the following). (Numbers 1 to 4 are steps executed in chronological order.)
>
> 1) python -i sock_server_reader.py

So -i keeps the connection open -- these programs never finish.

> The server prints:
> ('127.0.0.1', 1)
> a accepting:
> and waits for a connection
>
> 2) python -i sock_client_writer.py
> The client prints:
> w connecting:
> w connected:
> ('127.0.0.1', 3774)
>  ('127.0.0.1', 3774), peer=('127.0.0.1', 1)
> sending msg: 903848
> and the server now accepts the connection and prints:
> a accepted:
>  ('127.0.0.1', 1), peer=('127.0.0.1', 3774)
> msg received: 903848
> This is like it should be.

Agreed so far <wink>.

> Then let's try to set up a second client/server pair, on the same port (1). The expected outcome of this is that the bind() call in sock_server_reader.py should fail with socket.error: (10048, 'Address already in use').

Sorry, I don't expect that. sock_server_reader is no longer listening on port 1, so there's no reason some other socket can't start listening on it.

> 3) python -i sock_server_reader.py
> The server prints:
> ('127.0.0.1', 1)
> a accepting:
>
> Already here the problem occurs, bind() is allowed to bind to a port that is in use, in this case by the client socket 'r'. [also on other windows ? Mikkel: yes. Diku:???]

I showed an example before of how you can get any number (well, up to 64K) of sockets simultaneously alive saying they're bound to the same address, on Windows or Linux. The socket returned by a.accept() always duplicates a's (hostname, port) address. That's so that if the peer asks for its peer, it gets back the address it originally connected to. It may be confusing, but that's how it works.

Windows and Linux seem to differ in how willing they are to reuse a port after a listening socket is closed, but dollars to doughnuts says Microsoft wouldn't accept a claim that their behavior is a bug.

> 4) python -i sock_client_writer.py
> Now one out of two things happen:
> a) The client prints:
> w connecting:
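How a given platform treats a port right after its listening socket closes can be probed directly. The following is only a sketch under stated assumptions: Python 3 syntax, an OS-chosen port (port 0) instead of the port used in the thread, and a made-up helper name `probe_rebind_after_close`. It reports what happened rather than asserting which outcome is correct, since the message above says the platforms differ.

```python
import socket

def probe_rebind_after_close(host="127.0.0.1"):
    """Close a listener while an accepted connection is still open,
    then try to bind a fresh socket to the same address."""
    listener = socket.socket()
    listener.bind((host, 0))            # assumption: let the OS pick the port
    addr = listener.getsockname()
    listener.listen(1)

    w = socket.socket()
    w.connect(addr)
    r, _peer = listener.accept()
    listener.close()                    # nothing is listening on addr now

    second = socket.socket()
    try:
        second.bind(addr)               # no SO_REUSEADDR is set here
        outcome = "rebind allowed"
    except OSError as exc:
        outcome = "rebind refused (errno %s)" % exc.errno
    finally:
        second.close()
        r.close()
        w.close()
    return addr, outcome

if __name__ == "__main__":
    print(probe_rebind_after_close())
```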
Re: [Zope] Re: Running more than one instance on windows often block each other
[Sune B. Woeller]
...
> But then I stumbled upon this flag in the WinSock documentation: SO_EXCLUSIVEADDRUSE
> See the description here:
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/winsock/using_so_exclusiveaddruse.asp

Right, I vaguely <wink> knew about that. Note that the documentation explicitly states that in the absence of SO_EXCLUSIVEADDRUSE, the port may be reused as soon as the socket on which bind was called (that is, the socket the connection was originated on or the listening socket) is closed. IOW, that's your case #b from earlier email, and Windows is just doing what's documented there.

Believe it or not, I haven't found any Linux-ish docs as clear as these MS docs about the behavior of its bind() in all cases.

There are problems with SO_EXCLUSIVEADDRUSE too, which Google will find. A big one is that many versions of Windows require admin privs to set this option, including many versions of Windows Server, and WinXP through SP1. That was a bug, but it's only recently been fixed (in SP2 for WinXP).

At this point, I wouldn't consider using it unless someone first took the tedious time it needs to demonstrate that when it is used, the thing that _I_ think is a bug here goes away in its presence: the seeming ability of Windows to sometimes permit more than one socket to bind to the same address simultaneously (not serially -- Windows does seem to prevent that reliably).

If you can, I would like you to try the ZODB 3.4 Windows socket dance code, and see if it works for you in practice. I know it's not bulletproof, but it's portable across all flavors of Windows and is much better-behaved in my tests so far than the Medusa Windows socket dance.

> It is very interesting reading, especially:
>
> An important caveat to using the SO_EXCLUSIVEADDRUSE option exists: If one or more connections originating from (or accepted on) a port bound with SO_EXCLUSIVEADDRUSE is active, all bind attempts to that port will fail.
Note too that they describe that as an important caveat (a warning), not as a feature. They go on to explain that "active" means all of the ESTABLISHED, FIN_WAIT, FIN_WAIT_2, and LAST_ACK states, meaning the port stays tied up (in reality) for minutes even after the `r` and `w` sockets are closed. That's a 50% increase then in the # of ports each trigger ties up for an arbitrarily long time.

...
> There is a python bugfix for this, but only for python 2.4:
> http://sourceforge.net/tracker/index.php?func=detail&aid=982665&group_id=5470&atid=305470
> (It is added to version 1.294 of socketmodule.c)

That's not a real problem; if needed this could easily be done under Python 2.3.5 too (the patch only adds a symbolic name for a fixed integer; the integer could be hard-coded when not hasattr(socket, "SO_EXCLUSIVEADDRUSE") -- much as the current Medusa dance hardcodes 1 instead of using socket.TCP_NODELAY).

___
Zope maillist - Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope-dev )
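The hard-coding idea from the message above might look roughly like this. It is a sketch, not code from the thread: it uses Python 3 syntax, the helper name `make_exclusive_listener` is hypothetical, and the fallback value -5 is an assumption taken from the Winsock headers, where SO_EXCLUSIVEADDRUSE is defined as ~SO_REUSEADDR and SO_REUSEADDR is 4. The option is applied only on Windows, because it has no meaning elsewhere.

```python
import socket
import sys

# Assumption: fall back to the Winsock value (~SO_REUSEADDR == ~4 == -5)
# when this Python build does not expose the symbolic name.
SO_EXCLUSIVEADDRUSE = getattr(socket, "SO_EXCLUSIVEADDRUSE", -5)

def make_exclusive_listener(host="127.0.0.1", port=0):
    """Return a listening socket, requesting exclusive use on Windows."""
    s = socket.socket()
    if sys.platform == "win32":
        # Must be set before bind() to have any effect.
        s.setsockopt(socket.SOL_SOCKET, SO_EXCLUSIVEADDRUSE, 1)
    s.bind((host, port))                # port 0 asks the OS for a free port
    s.listen(1)
    return s

if __name__ == "__main__":
    listener = make_exclusive_listener()
    print("listening on", listener.getsockname())
    listener.close()
```

As the message notes, older Windows versions may refuse this option without admin privileges, so a caller would likely want to catch the resulting error and decide whether to continue without it.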
Re: [Zope] Re: Running more than one instance on windows often block each other
[Tim]
...
> At this point, I wouldn't consider using it [SO_EXCLUSIVEADDRUSE] unless someone first took the tedious time it needs to demonstrate that when it is used, the thing that _I_ think is a bug here goes away in its presence: the seeming ability of Windows to sometimes permit more than one socket to bind to the same address simultaneously (not serially -- Windows does seem to prevent that reliably).

I started, but didn't get that far. The first time I ran a pair of processes with the attached (Python 2.4.1, WinXP Pro SP2), one fell over with

    ...
        w.connect((host, port))
      File "<string>", line 1, in connect
    socket.error: (10048, 'Address already in use')

after about 20 minutes.

So, on the face of it, playing with SO_EXCLUSIVEADDRUSE is no better than the ZODB 3.4 Windows socket dance. Both appear mounds better-behaved than the Medusa Windows socket dance without SO_EXCLUSIVEADDRUSE, though.

Since there are fewer other problems associated with the ZODB 3.4 version (see last email), I'd like to repeat this part:

> If you can, I would like you to try the ZODB 3.4 Windows socket dance code, and see if it works for you in practice. I know it's not bulletproof, but it's portable across all flavors of Windows and is much better-behaved in my tests so far than the Medusa Windows socket dance.

Bulletproof appears impossible due to what still look like race bugs in the Windows socket implementation.

Here's the code. Note that it changed to try (no more than) 10,000 ports, although I didn't see it need to go through more than 200:

    import socket, errno
    import time, random

    class BindError(Exception):
        pass

    def socktest15():
        """Like socktest1, but w/o pointless blocking games.
        Added SO_EXCLUSIVEADDRUSE to the server socket.
        """

        a = socket.socket()
        w = socket.socket()

        a.setsockopt(socket.SOL_SOCKET, socket.SO_EXCLUSIVEADDRUSE, 1)
        # set TCP_NODELAY to true to avoid buffering
        w.setsockopt(socket.IPPROTO_TCP, 1, 1)

        # tricky: get a pair of connected sockets
        host = '127.0.0.1'
        port = 1
        while 1:
            try:
                a.bind((host, port))
                break
            except:
                # the comparison operator is lost in the archived copy;
                # "<=" is assumed here
                if port <= 1:
                    raise BindError, 'Cannot bind trigger!'
                port -= 1

        port2count[port] = port2count.get(port, 0) + 1
        a.listen(1)
        w.connect((host, port))
        r, addr = a.accept()
        a.close()

        return (r, w)

    def close(r, w):
        for s in r, w:
            s.close()
        return
        # the fancy stuff below didn't help or hurt
        for s in w, r:
            s.shutdown(socket.SHUT_WR)
        for s in w, r:
            while 1:
                msg = s.recv(10)
                if msg == "":
                    break
                print "eh?!", repr(msg)
        for s in w, r:
            s.close()

    port2count = {}

    def dump():
        print
        items = port2count.items()
        items.sort()
        for pair in items:
            print "%5d %7d" % pair

    sofar = []
    i = 0
    try:
        while 1:
            if i % 1000 == 0:
                dump()
            i += 1
            print '.',
            try:
                stuff = socktest15()
            except RuntimeError:
                raise
            sofar.append(stuff)
            time.sleep(random.random()/10)
            if len(sofar) == 50:
                tup = sofar.pop(0)
                r, w = tup
                msg = str(random.randrange(100))
                w.send(msg)
                msg2 = r.recv(100)
                assert msg == msg2, (msg, msg2, r.getsockname(), w.getsockname())
                close(r, w)
    except KeyboardInterrupt:
        for tup in sofar:
            close(*tup)
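For comparison, here is a sketch of a different way to get a connected loopback pair, one that avoids hunting through port numbers by asking the OS for a free port (bind to port 0) and retrying a bounded number of times if the address turns out to be in use. This is an illustration only, not the ZODB 3.4 or Medusa code; it uses Python 3 syntax, and the name `loopback_pair` and the `max_tries` default are made up.

```python
import errno
import socket

def loopback_pair(max_tries=10, host="127.0.0.1"):
    """Return (r, w): two connected loopback sockets."""
    for _ in range(max_tries):
        a = socket.socket()             # temporary listening socket
        w = socket.socket()
        try:
            a.bind((host, 0))           # port 0: the OS picks a free port
            a.listen(1)
            w.connect(a.getsockname())
            r, _addr = a.accept()
            # TCP_NODELAY: send small writes immediately
            w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            return r, w
        except OSError as exc:
            w.close()
            if exc.errno != errno.EADDRINUSE:
                raise                   # unrelated failure: do not retry
        finally:
            a.close()                   # the listener is no longer needed
    raise RuntimeError("could not build a loopback socket pair")

if __name__ == "__main__":
    r, w = loopback_pair()
    w.sendall(b"ping")
    print(r.recv(100))
    r.close()
    w.close()
```

Retrying only on "address already in use" keeps the sketch from masking other errors; whether ten attempts is enough in the face of the Windows behaviour described above is not something this sketch can establish.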
Re: [Zope] Re: Running more than one instance on windows often block each other
[Sune B. Woeller]
> I will try to recreate the problem on other flavours of windows asap. I will get back to you later.

Cool! If you can, posting a self-contained program that demonstrates the problem is the best way to make progress.

> I guess my reporting was a bit too quick, sorry:

Not at all -- you did excellent detective work here! It's appreciated. The problem is that English descriptions are nearly always ambiguous, especially when trying to explain something complicated that other people haven't reported. Posting a program removes all that guesswork: it reproduces the problem for other people on other boxes, or it doesn't, and we learn something valuable either way; if it does fail for others, then they can help investigate _why_ it fails. At the start, thoroughly demonstrating a problem exists is more important than guessing at what might be needed to worm around it.

> I'm running python 2.3.5, (installed from windows binary).
> Zope 2.7.7 (not necessary for the test scripts)
> Windows XP Home SP2 (blush - my laptop came with that... ;) )

Good -- thanks. A pretty vanilla system, then. I've heard that XP Home has special limitations on network capabilities, but don't know more than that; it's at least possible they're relevant. I'm not sure that running multiple Zope instances on a laptop is a prime use case for Zope <wink>.