Bugs item #539175, was opened at 2002-04-04 04:54
Message generated for change (Comment added) made by akuchling
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=539175&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Threads
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: dustin sallings (dustin)
Assigned to: Nobody/Anonymous (nobody)
Summary: resolver not thread safe

Initial Comment:
I've got an application that does SNMP monitoring and has a thread
listening with SimpleXMLRPCServer for remote control. I noticed the
XMLRPC listener logging an incorrect address while snmp jobs were
processing:

sw1.west.spy.net - - [04/Apr/2002 01:16:37] "POST /RPC2 HTTP/1.0" 200 -
localhost.west.spy.net - - [04/Apr/2002 01:16:43] "POST /RPC2 HTTP/1.0" 200 -

sw1 is one of the machines that is being queried, but the XMLRPC
requests are happening over localhost.

gethostbyname() and gethostbyaddr() both return static data, thus they
aren't reentrant.

As a workaround, I copied socket.py to my working directory and added
the following to it:

try:
    import threading
except ImportError, ie:
    sys.stderr.write(str(ie) + "\n")

# mutex for DNS lookups
__dns_mutex=None
try:
    __dns_mutex=threading.Lock()
except NameError:
    pass

def __lock():
    if __dns_mutex!=None:
        __dns_mutex.acquire()

def __unlock():
    if __dns_mutex!=None:
        __dns_mutex.release()

def gethostbyaddr(addr):
    """Override gethostbyaddr to try to get some thread safety."""
    rv=None
    try:
        __lock()
        rv=_socket.gethostbyaddr(addr)
    finally:
        __unlock()
    return rv

def gethostbyname(name):
    """Override gethostbyname to try to get some thread safety."""
    rv=None
    try:
        __lock()
        rv=_socket.gethostbyname(name)
    finally:
        __unlock()
    return rv

----------------------------------------------------------------------

>Comment By: A.M. Kuchling (akuchling)
Date: 2006-12-21 10:13

Message:
Logged In: YES
user_id=11375
Originator: NO

Attaching the test script.
The script now fails because some of the spy.net addresses are resolved
to hostnames such as adsl-69-230-8-158.dsl.pltn13.pacbell.net. When I
changed the test script to use python.org machine names and ran it with
Python 2.5 on Linux, no errors were reported.

Does this still fail on current OS X? If not, I suggest calling this a
platform C library bug and closing this report.

File Added: resolv-bug.py

----------------------------------------------------------------------

Comment By: dustin sallings (dustin)
Date: 2002-08-11 15:27

Message:
Logged In: YES
user_id=43919

No, unfortunately, I haven't been able to look at it in a while. Short
of locking it in python, I wasn't able to avoid the failure.

I'm sorry I haven't updated this at all. As far as I can tell, it's
still a problem, but I haven't been able to find a solution in the C
code. I suppose I spoke with too much haste when I said I was perfectly
capable of fixing the problem myself.

The locking in the C code did seem correct, but the memory was still
getting stomped.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-08-11 11:04

Message:
Logged In: YES
user_id=33168

Dustin, any progress on a patch or diagnosing this further?

----------------------------------------------------------------------

Comment By: dustin sallings (dustin)
Date: 2002-04-05 16:44

Message:
Logged In: YES
user_id=43919

I first noticed this problem on my OS X box.

Since it's affecting me, it's not obvious to anyone else, and I'm
perfectly capable of fixing it myself, I'll try to spend some time
figuring out what's going on this weekend. It seems like it might be
making a decision to not use the lock at compile time. I will
investigate further and submit a patch.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-04-05 16:31

Message:
Logged In: YES
user_id=31435

Just a reminder that the first thing to try on any SGI box is to
recompile Python with optimization disabled. I can't remember the last
time we had "a Python bug" on SGI that wasn't traced to a compiler -O
bug.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-04-05 03:56

Message:
Logged In: YES
user_id=21627

Can you spot the error in the Python socket module? I still fail to see
our bug, and I would assume it is a C library bug; I also cannot
reproduce the problem on any of my machines.

Can you please report the settings of the various HAVE_ defines for
irix?

----------------------------------------------------------------------

Comment By: dustin sallings (dustin)
Date: 2002-04-04 17:21

Message:
Logged In: YES
user_id=43919

Looking over the code a bit more, I see that my last message wasn't
entirely accurate. There does seem to be only one lock for both
gethostbyname and gethostbyaddr (gethostbyname_lock is used for both).

This is a pretty simple test that illustrates the problem I'm seeing.
My previous work was on my OS X machine, but this is Python 2.2 (#3,
Mar 6 2002, 18:30:37) [C] on irix6.

#!/usr/bin/env python
#
# Copyright (c) 2002 Dustin Sallings <dustin@spy.net>
# $Id$

import threading
import socket
import time

class ResolveMe(threading.Thread):

    hosts=['propaganda.spy.net', 'bleu.west.spy.net', 'mail.west.spy.net']

    def __init__(self):
        threading.Thread.__init__(self)
        self.setDaemon(1)

    def run(self):
        # Run 100 times
        for i in range(100):
            for h in self.hosts:
                nrv=socket.gethostbyname_ex(h)
                arv=socket.gethostbyaddr(nrv[2][0])

                try:
                    # Verify the hostname is correct
                    assert(h==nrv[0])
                    # Verify the two hostnames match
                    assert(nrv[0]==arv[0])
                    # Verify the two addresses match
                    assert(nrv[2]==arv[2])
                except AssertionError:
                    print "Failed! Checking " + `h` + " got, " \
                        + `nrv` + " and " + `arv`

if __name__=='__main__':
    for i in range(1,10):
        print "Starting " + `i` + " threads."
        threads=[]
        for n in range(i):
            rm=ResolveMe()
            rm.start()
            threads.append(rm)
        for t in threads:
            t.join()
        print `i` + " threads complete."
        time.sleep(60)

The output looks like this:

verde:/tmp 190> ./pytest.py
Starting 1 threads.
1 threads complete.
Starting 2 threads.
Failed! Checking 'propaganda.spy.net' got, ('mail.west.spy.net', [], ['66.149.231.226']) and ('mail.west.spy.net', [], ['66.149.231.226'])
Failed! Checking 'bleu.west.spy.net' got, ('mail.west.spy.net', [], ['66.149.231.226']) and ('mail.west.spy.net', [], ['66.149.231.226'])
[...]

----------------------------------------------------------------------

Comment By: dustin sallings (dustin)
Date: 2002-04-04 16:08

Message:
Logged In: YES
user_id=43919

The XMLRPC request is clearly being logged as coming from my cisco
switch when it was, in fact, coming from localhost.

I can't find any clear documentation, but it seems that on at least
some systems gethostbyname and gethostbyaddr reference the same static
variable, so having separate locks for each one (as seen in
socketmodule.c) doesn't help anything. It's not so much that they're
not reentrant, but you can't call any combination of the two of them at
the same time.

Here's some test code:

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <assert.h>

int main(int argc, char **argv) {
    struct hostent *byaddr, *byname;
    unsigned int addr;
    struct sockaddr *sa = (struct sockaddr *)&addr;

    addr=1117120483;

    byaddr=gethostbyaddr(sa, sizeof(addr), AF_INET);
    assert(byaddr);
    printf("byaddr: %s\n", byaddr->h_name);

    byname=gethostbyname("mail.west.spy.net");
    assert(byname);
    printf("byname: %s\n", byname->h_name);

    printf("\nReprinting:\n\n");

    printf("byaddr: %s\n", byaddr->h_name);
    printf("byname: %s\n", byname->h_name);
}

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-04-04 15:06

Message:
Logged In: YES
user_id=21627

I'm not sure what problem you are reporting. Python does not attempt to
invoke gethostbyname from two threads simultaneously; this is prevented
by the GIL. On some systems, gethostbyname is reentrant (in the
gethostbyname_r incarnation); Python uses that where available, and
releases the GIL before calling it. So I fail to see the bug.

----------------------------------------------------------------------