Re: [Zope] Re: Running more than one instance on windows often block each other

2005-07-28 Thread Tim Peters
[Sune B. Woeller]
 ...
 This is what I'm experiencing as well.
 I can narrow it down a bit: I *always* experience one out of two
 erroneous behaviours, as described below.

I see only one of the behaviors below (the second -- no problems), and
don't agree it's in error.

 I tried to make an even simpler test situation, without binding
 sockets 'r' and 'w' to each other in the same process. I try to
 reproduce the problem in a 'standard' socket use case, where a client
 in one process binds to a server in another process.
 
 The following two scripts act as a server and a client.
 
 #***
 # sock_server_reader.py
 #***
 import socket

 a = socket.socket (socket.AF_INET, socket.SOCK_STREAM)

Note that

a = socket.socket()

is an easier way to spell the same thing; the Medusa code is ancient.

 a.bind(('127.0.0.1', 1))
 print a.getsockname()  # assigned (host, port) pair

 a.listen(1)

 print 'a accepting:'
 r, addr = a.accept()  # r becomes asyncore's (self.)socket
 print 'a accepted:'
 print ' ' + str(r.getsockname()) + ', peer=' + str(r.getpeername())
 
 a.close()

Key point:  no socket is _listening_ on address (127.0.0.1, 1)
after this close().  From what comes later, I guess you believe that
no socket should be allowed to listen on that address again until all
connections made with that `a` also close, but I don't think you'll
find anything in socket documentation to support that belief.  In the
world of socket connections, what needs to be unique is _the
connection_, and that's a 4-tuple:

(side 1 host, side 1 port, side 2 host, side 2 port)

There's no prohibition against seeing either side's address in any
number of connections simultaneously, you just can't have two
connections simultaneously that match in all 4 positions.  It so
happens that Windows is happy to allow another socket to bind to a
port the instant after a socket that had been listening on it closes
(and regardless of whether connections made via the latter are still
open), but I don't believe that's a bug.
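
The 4-tuple rule is easy to check. A minimal sketch follows, written in
current Python 3 rather than the Python 2 used elsewhere in this thread;
the helper name two_connections is made up for illustration, and port 0
is used so the OS picks a free port. It accepts two connections on one
listening socket and returns the (local, peer) address pair of each
accepted socket. Both accepted sockets report the listener's address as
their own; only the peer side differs.

```python
import socket

def two_connections():
    """Accept two clients on one listener.

    Returns the listening address and, for each accepted socket,
    its (local address, peer address) pair.
    """
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(2)
    address = listener.getsockname()

    clients, accepted = [], []
    for _ in range(2):
        c = socket.socket()
        c.connect(address)
        s, _peer = listener.accept()
        clients.append(c)
        accepted.append(s)

    tuples = [(s.getsockname(), s.getpeername()) for s in accepted]
    for s in clients + accepted + [listener]:
        s.close()
    return address, tuples

if __name__ == "__main__":
    address, tuples = two_connections()
    print("listening address:", address)
    for local, peer in tuples:
        print("  local=%s peer=%s" % (local, peer))
```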

What I appear to be seeing is that sometimes-- rarely --Windows allows
binding to a port by two sockets simultaneously, not serially as
you're showing here.  Simultaneous binding (in the absence of
SO_REUSEADDR on Windows) is a bug.
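
For contrast, here is a sketch of the simultaneous case, again in Python 3
with a made-up helper name (second_bind_errno). With no SO_REUSEADDR set,
a second bind() to an address that another socket is still bound to and
listening on should normally be refused with EADDRINUSE (10048 on
Windows). The function just reports what the second bind() did.

```python
import errno
import socket

def second_bind_errno():
    """Bind two sockets to one address at the same time.

    Returns the errno raised by the second bind(), or None if the
    second bind() was (unexpectedly) allowed.
    """
    first = socket.socket()
    second = socket.socket()
    try:
        first.bind(("127.0.0.1", 0))   # OS-assigned port
        first.listen(1)
        try:
            second.bind(first.getsockname())
        except OSError as e:
            return e.errno
        return None
    finally:
        first.close()
        second.close()

if __name__ == "__main__":
    code = second_bind_errno()
    print("second bind errno:", code,
          "(EADDRINUSE)" if code == errno.EADDRINUSE else "")
```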
  
 msg = r.recv(100)
 print 'msg recieved:', msg


 #***
 # sock_client_writer.py
 #***
 import socket, random
 
 w = socket.socket (socket.AF_INET, socket.SOCK_STREAM)
 w.setsockopt(socket.IPPROTO_TCP, 1, 1)

 print 'w connecting:'
 w.connect(('127.0.0.1', 1))
 print 'w connected:'
 print w.getsockname()
 print ' ' + str(w.getsockname()) + ', peer=' + str(w.getpeername())
 msg = str(random.randrange(100))
 print 'sending msg: ', msg
 w.send(msg)

 There are two possible outcomes [a) and b)] of running two instances
 of this client/server pair (that is, 4 processes in total like the
 following).
 (Numbers 1 to 4 are steps executed in chronological order.)

 1) python -i sock_server_reader.py

So -i keeps the connection open -- these programs never finish.

 The server prints:
 ('127.0.0.1', 1)
 a accepting:
 and waits for a connection

 2) python -i sock_client_writer.py
 The client prints:
 w connecting:
 w connected:
 ('127.0.0.1', 3774)
  ('127.0.0.1', 3774), peer=('127.0.0.1', 1)
 sending msg:  903848
 

 and the server now accepts the connection and prints:
 a accepted:
  ('127.0.0.1', 1), peer=('127.0.0.1', 3774)
 msg recieved: 903848
 

 This is like it should be.

Agreed so far <wink>.

 Then let's try to set up a second
 client/server pair, on the same port (1). The expected outcome of
 this is that the bind() call in sock_server_reader.py should fail with
 socket.error: (10048, 'Address already in use').

Sorry, I don't expect that.  sock_server_reader is no longer listening
on port 1, so there's no reason some other socket can't start
listening on it.

 3) python -i sock_server_reader.py
 The server prints:
 ('127.0.0.1', 1)
 a accepting:
 
 Already here the problem occurs, bind() is allowed to bind to a port
 that is in use, in this case by the client socket 'r'.
 [also on other windows ? Mikkel: yes. Diku:???]

I showed an example before of how you can get any number (well, up to
64K) of sockets simultaneously alive saying they're bound to the same
address, on Windows or Linux.  The socket returned by a.accept()
always duplicates a's (hostname, port) address.  That's so that if the
peer asks for its peer, it gets back the address it originally
connected to.  It may be confusing, but that's how it works.

Windows and Linux seem to differ in how willing they are to reuse a
port after a listening socket is closed, but dollars to doughnuts says
Microsoft wouldn't accept a claim that their behavior is a bug.
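
The serial case can be probed the same way. This is only a sketch
(Python 3, hypothetical helper name rebind_after_listener_close, port
chosen by the OS): it closes the listening socket while an accepted
connection is still open, then tries to bind a fresh socket to the same
address. The result depends on the operating system, so the function
reports the outcome rather than assuming one.

```python
import socket

def rebind_after_listener_close():
    """Close a listener while an accepted connection is still open,
    then try to bind a new socket to the same address.

    Returns True if the new bind() succeeded, False if it was refused.
    """
    a = socket.socket()
    a.bind(("127.0.0.1", 0))
    a.listen(1)
    address = a.getsockname()

    w = socket.socket()
    w.connect(address)
    r, _addr = a.accept()
    a.close()                 # nothing is listening on `address` now

    b = socket.socket()
    try:
        b.bind(address)       # r still reports `address` as its local name
        ok = True
    except OSError:
        ok = False
    finally:
        for s in (b, r, w):
            s.close()
    return ok

if __name__ == "__main__":
    print("rebind allowed:", rebind_after_listener_close())
```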

 4) python -i sock_client_writer.py
 Now one of two things happens:
 
 a) The client prints:
 w connecting:
 

Re: [Zope] Re: Running more than one instance on windows often block each other

2005-07-28 Thread Tim Peters
[Sune B. Woeller]
...
 But then I stumbled upon this flag in the WinSock documentation:
 SO_EXCLUSIVEADDRUSE
 See the description here:
 http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/winsock/using_so_exclusiveaddruse.asp

Right, I vaguely <wink> knew about that.  Note that the documentation
explicitly states that in the absence of SO_EXCLUSIVEADDRUSE,

the port may be reused as soon as the socket on which bind was called
(that is, the socket the connection was originated on or the listening
socket) is closed.

IOW, that's your case #b from earlier email, and Windows is just
doing what's documented there.  Believe it or not, I haven't found any
Linux-ish docs as clear as these MS docs about the behavior of its
bind() in all cases.

There are problems with SO_EXCLUSIVEADDRUSE too, which Google will
find.  A big one is that many versions of Windows require admin privs
to set this option, including many versions of Windows Server, and
WinXP through SP1.  That was a bug, but it's only recently been fixed
(in SP2 for WinXP).

At this point, I wouldn't consider using it unless someone first took
the tedious time it needs to demonstrate that when it is used, the
thing that _I_ think is a bug here goes away in its presence:  the
seeming ability of Windows to sometimes permit more than one socket to
bind to the same address simultaneously (not serially -- Windows does
seem to prevent that reliably).

If you can, I would like you to try the ZODB 3.4 Windows socket dance
code, and see if it works for you in practice.  I know it's not
bulletproof, but it's portable across all flavors of Windows and is
much better-behaved in my tests so far than the Medusa Windows socket
dance.

 It is very interesting reading, especially:
 An important caveat to using the SO_EXCLUSIVEADDRUSE option exists: If
 one or more connections originating from (or accepted on) a port bound
 with SO_EXCLUSIVEADDRUSE is active, all bind attempts to that port will
 fail.

Note too that they describe that as an important caveat (a warning),
not as a feature.  They go on to explain that active means all of
the ESTABLISHED, FIN_WAIT, FIN_WAIT_2, and LAST_ACK states, meaning
the port stays tied up (in reality) for minutes even after the `r` and
`w` sockets are closed.  That's a 50% increase then in the # of ports
each trigger ties up for an arbitrarily long time.

...

 There is a python bugfix for this, but only for python 2.4:
 http://sourceforge.net/tracker/index.php?func=detail&aid=982665&group_id=5470&atid=305470

 (It is added to version 1.294 of socketmodule.c)

That's not a real problem; if needed this could easily be done under
Python 2.3.5 too (the patch only adds a symbolic name for a fixed
integer; the integer could be hard-coded when not hasattr(socket,
'SO_EXCLUSIVEADDRUSE') -- much as the current Medusa dance hardcodes 1
instead of using socket.TCP_NODELAY).
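
A sketch of that fallback, in Python 3. The helper name set_exclusive
and the constant name are made up for illustration, and the numeric
value is an assumption: winsock2.h defines SO_EXCLUSIVEADDRUSE as
((int)(~SO_REUSEADDR)), and SO_REUSEADDR is 4 on Windows, which gives
-5. Check that against your own headers before relying on it. The
option only exists on Windows, so the helper does nothing elsewhere.

```python
import socket
import sys

# Assumption: ~SO_REUSEADDR with SO_REUSEADDR == 4 on Windows, i.e. -5.
FALLBACK_SO_EXCLUSIVEADDRUSE = ~4

def set_exclusive(sock):
    """Request exclusive use of the bound address on Windows.

    Uses the symbolic name when the socket module has it, else the
    hard-coded fallback.  Returns True if the option was set, False
    on platforms where it does not apply.
    """
    if sys.platform != "win32":
        return False
    option = getattr(socket, "SO_EXCLUSIVEADDRUSE",
                     FALLBACK_SO_EXCLUSIVEADDRUSE)
    sock.setsockopt(socket.SOL_SOCKET, option, 1)
    return True

if __name__ == "__main__":
    s = socket.socket()
    print("exclusive option set:", set_exclusive(s))
    s.close()
```

The option has to be set before bind() for it to have any effect.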
___
Zope maillist  -  Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope-dev )


Re: [Zope] Re: Running more than one instance on windows often block each other

2005-07-28 Thread Tim Peters
[Tim]
 ...
 At this point, I wouldn't consider using it [SO_EXCLUSIVEADDRUSE]
 unless someone first took the tedious time it needs to demonstrate that
 when it is used, the thing that _I_ think is a bug here goes away in its
 presence:  the seeming ability of Windows to sometimes permit more
 than one socket to bind to the same address simultaneously (not serially --
 Windows does seem to prevent that reliably).

I started, but didn't get that far.  The first time I ran a pair of
processes with the attached (Python 2.4.1, WinXP Pro SP2), one fell
over with

...
w.connect((host, port))
  File "<string>", line 1, in connect
socket.error: (10048, 'Address already in use')

after about 20 minutes.

So, on the face of it, playing with SO_EXCLUSIVEADDRUSE is no better
than the ZODB 3.4 Windows socket dance.  Both appear mounds
better-behaved than the Medusa Windows socket dance without
SO_EXCLUSIVEADDRUSE, though.  Since there are fewer other problems
associated with the ZODB 3.4 version (see last email), I'd like to
repeat this part:

 If you can, I would like you to try the ZODB 3.4 Windows socket dance
 code, and see if it works for you in practice.  I know it's not
 bulletproof, but it's portable across all flavors of Windows and is
 much better-behaved in my tests so far than the Medusa Windows socket
 dance.

Bulletproof appears impossible due to what still look like race bugs
in the Windows socket implementation.

Here's the code.  Note that it changed to try (no more than) 10,000
ports, although I didn't see it need to go through more than 200:

import socket, errno
import time, random

class BindError(Exception):
    pass

def socktest15():
    """Like socktest1, but w/o pointless blocking games.
    Added SO_EXCLUSIVEADDRUSE to the server socket.
    """

    a = socket.socket()
    w = socket.socket()

    a.setsockopt(socket.SOL_SOCKET, socket.SO_EXCLUSIVEADDRUSE, 1)
    # set TCP_NODELAY to true to avoid buffering
    w.setsockopt(socket.IPPROTO_TCP, 1, 1)
    # tricky: get a pair of connected sockets
    host = '127.0.0.1'
    port = 19999

    # walk downward from the starting port; give up after 10,000 ports
    while 1:
        try:
            a.bind((host, port))
            break
        except:
            if port <= 10000:
                raise BindError, 'Cannot bind trigger!'
            port -= 1

    # record how often each port ended up being used
    port2count[port] = port2count.get(port, 0) + 1
    a.listen(1)
    w.connect((host, port))
    r, addr = a.accept()
    a.close()

    return (r, w)

def close(r, w):
    for s in r, w:
        s.close()
    return # the fancy stuff below didn't help or hurt
    for s in w, r:
        s.shutdown(socket.SHUT_WR)
    for s in w, r:
        while 1:
            msg = s.recv(10)
            if msg == "":
                break
            print "eh?!", repr(msg)
    for s in w, r:
        s.close()

port2count = {}

def dump():
    # print a (port, use count) table, sorted by port
    print
    items = port2count.items()
    items.sort()
    for pair in items:
        print "%5d %7d" % pair

sofar = []
i = 0
try:
    while 1:
        if i % 1000 == 0:
            dump()
        i += 1
        print '.',
        try:
            stuff = socktest15()
        except RuntimeError:
            raise
        sofar.append(stuff)
        time.sleep(random.random()/10)
        # keep 50 pairs alive; test and close the oldest one
        if len(sofar) == 50:
            tup = sofar.pop(0)
            r, w = tup
            msg = str(random.randrange(100))
            w.send(msg)
            msg2 = r.recv(100)
            assert msg == msg2, (msg, msg2, r.getsockname(), w.getsockname())
            close(r, w)
except KeyboardInterrupt:
    for tup in sofar:
        close(*tup)
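
For readers on a current Python, here is a sketch of just the
connected-pair construction in Python 3. It is a simplified
illustration, not the Medusa or ZODB code: the function name
connected_pair is made up, it binds to port 0 so the OS chooses the
port instead of walking down a fixed range, and it leaves out
SO_EXCLUSIVEADDRUSE so that it runs on any platform.

```python
import socket

def connected_pair(host="127.0.0.1"):
    """Return (r, w): the two ends of one loopback TCP connection."""
    a = socket.socket()
    w = socket.socket()
    # avoid buffering small writes
    w.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    try:
        a.bind((host, 0))          # simplification: OS-assigned port
        a.listen(1)
        w.connect(a.getsockname())
        r, _addr = a.accept()
    except OSError:
        w.close()
        raise
    finally:
        a.close()                  # the listener is no longer needed
    return r, w

if __name__ == "__main__":
    r, w = connected_pair()
    w.sendall(b"x")
    print(r.recv(1))
    r.close()
    w.close()
```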


Re: [Zope] Re: Running more than one instance on windows often block each other

2005-07-27 Thread Tim Peters
[Sune B. Woeller]
 I will try to recreate the problem on other
 flavours of windows asap. I will get back to you
 later.

Cool!  If you can, posting a self-contained program that demonstrates
the problem is the best way to make progress.

 I guess my reporting was a bit too quick, sorry:

Not at all -- you did excellent detective work here!  It's
appreciated.  The problem is that English descriptions are nearly
always ambiguous, especially when trying to explain something
complicated that other people haven't reported.  Posting a program
removes all that guesswork:  it reproduces the problem for other
people on other boxes, or it doesn't, and we learn something valuable
either way; if it does fail for others, then they can help investigate
_why_ it fails.  At the start, thoroughly demonstrating a problem
exists is more important than guessing at what might be needed to worm
around it.

 I'm running python 2.3.5, (installed from windows binary).
 Zope 2.7.7 (not necessary for the test scripts)
 Windows XP Home SP2 (blush - my laptop came with that... ;) )

Good -- thanks.  A pretty vanilla system, then.  I've heard that XP
Home has special limitations on network capabilities, but don't know
more than that; it's at least possible they're relevant.  I'm not sure
that running multiple Zope instances on a laptop is a prime use case
for Zope <wink>.