[Tim Peters]
...
> .... Ran that loop in two processes. No hangs, or any
> other oddities, for some minutes. It did _eventually_ hang-- and both
> processes at the same time --with netstat showing more than 4000
> sockets hanging around in TIME_WAIT state then. I assume I bashed
> into some internal Windows socket resource limit there, which Windows
> didn't handle gracefully. Attaching to the processes under the MSVC 6
> debugger, they were hung inside the MS socket libraries. Repeated
> this several times (everything appeared to work fine until > 4000
> sockets were sitting in TIME_WAIT, and then both processes hung at
> approximately the same time).
More info on that: since WinXP Pro supplies only about 4000 ephemeral ports by default, and the program kept hanging after about 4000 ephemeral ports were in use (albeit most in their 4-minute TIME_WAIT shutdown state), I tried boosting the # of ephemeral ports:

    http://support.microsoft.com/kb/q196271

After that, I never saw the processes hang again. BUT, I saw something worse: after about 20 minutes, both processes died with assert errors, in the code I added to verify that the sockets were communicating correctly. The random string created in process A was actually read by a socket in process B (instead of by its pair in process A), and vice versa: the random string created in process B was read in process A, at approximately the same time process B was reading process A's string.

I tried it again, and got a pair of similar assert failures after about 15 minutes. That's dreadful, and I don't see how it could be anything except a race bug in the Windows socket implementation.

The same program on Linux doesn't run long enough to say anything interesting -- it raises "BindError, 'Cannot bind trigger!'" very quickly every time, because Linux apparently keeps server port numbers (19999, 19998, ...) reserved for "a long time" after the server socket is closed (where "a long time" just means longer than the few seconds it takes for the program to die on Linux).

All of the above is wrt using socktest1() below. socktest2() below contains the Windows code I already changed ZODB 3.4 to use. I've been running socktest2() in two processes that way on Windows for more than 2 hours now, with no glitches. The same code is running fine on a Linux box too.

So best guess now is that there is a subtle, rare error in the Windows socket code that could cause the Medusa/ZODB3.2 Windows trigger code to screw up.
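For anyone who wants to try the port-0 idea on a current interpreter, here is a minimal Python 3 sketch of it. It is not the ZODB code itself: the names connected_pair() and roundtrip() are hypothetical, and the sketch assumes a loopback interface at 127.0.0.1. The point it illustrates is that the OS assigns the listening port, so there is no fixed range (19999, 19998, ...) to walk through or collide on.

```python
import socket


def connected_pair():
    """Return (r, w): two TCP sockets connected to each other over loopback.

    Hypothetical helper, for illustration only.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    w = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Disable Nagle's algorithm so small writes go out immediately.
        w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        # Port 0 asks the OS to pick a free port for us.
        listener.bind(("127.0.0.1", 0))
        address = listener.getsockname()  # the assigned (host, port) pair
        listener.listen(1)
        w.connect(address)
        r, _ = listener.accept()
    except OSError:
        w.close()
        raise
    finally:
        # The listening socket is only needed to set up the pair.
        listener.close()
    return r, w


def roundtrip(payload):
    """Send payload through a fresh pair and return what the reader got."""
    r, w = connected_pair()
    try:
        w.sendall(payload)
        received = b""
        while len(received) < len(payload):
            chunk = r.recv(len(payload) - len(received))
            if not chunk:  # peer closed early
                break
            received += chunk
        return received
    finally:
        r.close()
        w.close()
```

Calling roundtrip(b"123456") performs the same kind of check as the assert in the test loop: whatever is written to w should come back out of its own partner r, and nothing else.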
Complete code:

import socket, errno
import time, random

class BindError(Exception):
    pass

def socktest1():
    """blabla
    """

    address = ('127.9.9.9', 19999)
    a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
    w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)

    # set TCP_NODELAY to true to avoid buffering
    w.setsockopt(socket.IPPROTO_TCP, 1, 1)

    # tricky: get a pair of connected sockets
    host='127.0.0.1'
    port=19999
    while 1:
        if port < 19999:
            print port
        try:
            a.bind((host, port))
            break
        except:
            if port <= 19950:
                raise BindError, 'Cannot bind trigger!'
            port -= 1

    a.listen (1)
    w.setblocking (0)
    try:
        w.connect ((host, port))
    except:
        pass
    r, addr = a.accept()
    a.close()
    w.setblocking (1)

    #return (a, w, r)
    return (r, w)
    #return w

def socktest2():
    a = socket.socket()
    w = socket.socket()

    # set TCP_NODELAY to true to avoid buffering
    w.setsockopt(socket.IPPROTO_TCP, 1, 1)

    # Specifying port 0 tells Windows to pick a port for us.
    a.bind(("127.0.0.1", 0))
    connect_address = a.getsockname()  # assigned (host, port) pair
    a.listen(1)
    w.connect(connect_address)
    r, addr = a.accept()  # r becomes asyncore's (self.)socket
    a.close()

    #return (a, w, r)
    return (r, w)
    #return w

sofar = []
try:
    while 1:
        print '.',
        stuff = socktest1()
        sofar.append(stuff)
        time.sleep(random.random()/10)
        if len(sofar) == 50:
            tup = sofar.pop(0)
            r, w = tup
            msg = str(random.randrange(1000000))
            w.send(msg)
            msg2 = r.recv(100)
            assert msg == msg2, (msg, msg2)
            for s in tup:
                s.close()
except KeyboardInterrupt:
    for tup in sofar:
        for s in tup:
            s.close()

_______________________________________________
Zope maillist  -  Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope-dev )