[jira] Created: (MODPYTHON-114) Problems with PythonPath directive.
Problems with PythonPath directive. --- Key: MODPYTHON-114 URL: http://issues.apache.org/jira/browse/MODPYTHON-114 Project: mod_python Type: Bug Components: core Versions: 3.2, 3.1.4 Reporter: Graham Dumpleton The PythonPath setting can be used to define the value of the Python sys.path variable. It is this variable which defines the list of directories that Python will search in when looking for a module to be imported. Although the actual reassignment of sys.path by mod_python does not in itself present a problem due to assignment in Python being thread safe by definition, the context in which the assignment occurs is not thread safe and a race condition exists. This exists as the top level mod_python dispatcher will consult the existing value of sys.path and the last value for the PythonPath setting encountered before then making a decision to modify sys.path. If multiple requests are being serviced in distinct threads within the context of the same interpreter instance, and each at the same time decide they want to modify the value of sys.path, only one might ultimately succeed in setting it to the value it wants and any modification required by the other may be lost. In the worst case scenario, this can result in the importation of any subsequent modules within that request failing due to a required directory not being present in sys.path. It is possible that this situation may resolve itself and go away on a subsequent request, but due to how mod_python caches the last value of PythonPath in a global variable this will be dependent on what other requests arrive. At the least, for mod_python to resolve the problem itself would require a request to arrive in the interim which targeted the URL which was not the last to cache its raw setting for PythonPath. This only works though due to a further issue whereby alternate requests against URLs with different PythonPath settings will cause sys.path to be extended everytime if the PythonPath setting references sys.path. This results in sys.path continually growing over time due to directories being added multiple times. The problematic code in apache.py is: if config.has_key(PythonPath): # we want to do as little evaling as possible, # so we remember the path in un-evaled form and # compare it global _path pathstring = config[PythonPath] if pathstring != _path: _path = pathstring newpath = eval(pathstring) if sys.path != newpath: sys.path[:] = newpath To fix the problems, the processing of PythonPath directive should be protected by a thread mutex lock and a dictionary should be used to store all seen values of PythonPath rather than a single variable. If a value for PythonPath has ever been seen, and not just from the last change, then no update would be necessary. if config.has_key(PythonPath): try: _path_cache_lock.acquire() pathstring = config[PythonPath] if not _path_cache.has_key(pathstring): newpath = eval(pathstring) _path_cache[pathstring] = None sys.path[:] = newpath finally: _path_cache_lock.release() There shouldn't really be a need to check whether the new path is different to the current value of sys.path as that scenario shouldn't occur at that point. The two parts of this problem were previously catalogued as ISSUE 15 and ISSUE 16 in my list of problems with the module importing system. The list can be found at: http://www.dscpl.com.au/articles/modpython-003.html -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] Created: (MODPYTHON-114) Problems with PythonPath directive.
On 26/01/2006, at 11:48 PM, Mike Looijmans wrote: Two comments: 1. (bug): The acquire() call should be *outside* the try ... finally block. You do not want to release a lock that you did not aquire. Whoops. Quite agree. One hopes that acquiring a simple mutex lock never fails though. If it does, you process is probably suffering much more serious issues. ;-( 2. (optimization): If you are not planning to change the path, you do not have to aquire the lock. Aquiring a lock is quite expensive, so the double-check will have much less impact on performance. if config.has_key(PythonPath): pathstring = config[PythonPath] if not _path_cache.has_key(pathstring): # acquire must be done outside the try block _path_cache_lock.acquire() try: # double-check, because two threads might come # to the same conclusion up until here. if not _path_cache.has_key(pathstring): newpath = eval(pathstring) _path_cache[pathstring] = None sys.path[:] = newpath finally: _path_cache_lock.release() You run the very very small risk that two different threads will both decide they need to update the sys.path value for the same value of PythonPath. The new entries will thus get added twice. This sort of thing was one of the things wrong with the old code, although now that a map is used to cache seen values of PythonPath, not so big a problem as simply means you get redundant entries rather than loss of potential directories from sys.path. Number of redundant entries will depend on how many threads hit that check at the same time. After the first time it should not happen again. Thus, trade off of speed versus correctness is probably reasonable as it will not cause a failure. In general I tend towards robustness and unexpected surprises and that is why the code was written as it was. That this trade off has been made will need to be clearly documented in code though. Graham
Re: [jira] Created: (MODPYTHON-114) Problems with PythonPath directive.
2006/1/26, Mike Looijmans [EMAIL PROTECTED]: Two comments: 1. (bug): The acquire() call should be *outside* the try ... finally block. You do not want to release a lock that you did not aquire. 2. (optimization): If you are not planning to change the path, you do not have to aquire the lock. Aquiring a lock is quite expensive, so the double-check will have much less impact on performance. if config.has_key(PythonPath): pathstring = config[PythonPath] if not _path_cache.has_key(pathstring): # acquire must be done outside the try block _path_cache_lock.acquire() try: # double-check, because two threads might come # to the same conclusion up until here. if not _path_cache.has_key(pathstring): newpath = eval(pathstring) _path_cache[pathstring] = None sys.path[:] = newpath finally: _path_cache_lock.release() Mike Looijmans Philips Natlab / Topic Automation Hi, Your optimisation is called double-checked locking, and it is now known to be broken, especially on multiple-CPU systems. The problem exists in numerous languages. Here is an explanation of the problem : http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html Now, maybe Python is not touched by this problem due to its relatively poor support for multithreading. The presence of the GIL guarantees that only one thread of Python bytecode interpreter runs at a time, but still, in a multiple CPU environment, you can get bitten by local caching issues. As this thing is a bit hairy, I think we should first strive for correctness, then care about performance later. And no, I won't bother you with one more quote about premature this and the root of that ;). Regards, Nicolas
Re: [jira] Created: (MODPYTHON-114) Problems with PythonPath directive.
Graham Dumpleton wrote: Thus, trade off of speed versus correctness is probably reasonable as it will not cause a failure. In general I tend towards robustness and unexpected surprises and that is why the code was written as it was. Personally I tend towards robustness and *away* from unexpected suprises, but if that's what it takes to keep your life interesting who am I to argue? :) Jim
3.2.6 test period - how long do we wait?
It seems like any 3.2.6 testing that is going to be done, has been done. How long do we wait before making a decision for an official release. If we don't get cracking on 3.3 soon Graham's gonna fill another couple of pages on JIRA and we'll never catch up. :) Jim
Re: [jira] Created: (MODPYTHON-114) Problems with PythonPath directive.
Jim Gallacher wrote .. Graham Dumpleton wrote: Thus, trade off of speed versus correctness is probably reasonable as it will not cause a failure. In general I tend towards robustness and unexpected surprises and that is why the code was written as it was. Personally I tend towards robustness and *away* from unexpected suprises, but if that's what it takes to keep your life interesting who am I to argue? :) Not not not again. I'm not doing very well at getting my not and un around the right way of late am I. It is a problem I always have, but that is twice now in the last day. :-( Graham
Re: 3.2.6 test period - how long do we wait?
Jim Gallacher wrote .. It seems like any 3.2.6 testing that is going to be done, has been done. How long do we wait before making a decision for an official release. If we don't get cracking on 3.3 soon Graham's gonna fill another couple of pages on JIRA and we'll never catch up. :) You aren't far wrong. I was actually thinking of sending an email to the list warning about an impending avalanche of JIRA bug reports. This is going to be caused by me going through my module importer issues list and logging separate bug reports for each problem where none already exists in JIRA. Also have potentially others as well I know of but aren't in my module importer issues list at the moment but which I will be adding. Also going to create new bugs or ask for some old bugs to be reopened because the issue was only addressed in version 3.2.6 for mod_python.publisher and not mod_python as a whole. Thus, consider yourself warned. Note that none of what I will be creating issues for should stop 3.2.6 (final) being released. Already accepted that they would be 3.3. Graham
Re: 3.2.6 test period - how long do we wait?
On Thu, 26 Jan 2006, Jim Gallacher wrote: It seems like any 3.2.6 testing that is going to be done, has been done. I've been kinda swamped with unrelated things past two weeks, so I wasn't paying much attention. Perhaps an e-mail summarizing the +1's so far and a quick vote of the core group on whether this is enough? Grisha
Re: 3.2.6 test period - how long do we wait?
Gregory (Grisha) Trubetskoy wrote: On Thu, 26 Jan 2006, Jim Gallacher wrote: It seems like any 3.2.6 testing that is going to be done, has been done. I've been kinda swamped with unrelated things past two weeks, so I wasn't paying much attention. Perhaps an e-mail summarizing the +1's so far and a quick vote of the core group on whether this is enough? I will gather the info later tonight. Jim
Re: 3.2.6 test period - how long do we wait?
Gregory (Grisha) Trubetskoy wrote: On Thu, 26 Jan 2006, Jim Gallacher wrote: It seems like any 3.2.6 testing that is going to be done, has been done. I've been kinda swamped with unrelated things past two weeks, so I wasn't paying much attention. Perhaps an e-mail summarizing the +1's so far and a quick vote of the core group on whether this is enough? Grisha Here is the summary of the 3.2.6 testing to date. Let me know if I've missed anyone's tests or you still want to toss you test on the pile. For the 3.2.6b tarball (same as 3.2.6 but with the messed up version string). +1 Gentoo [current], Apache-2.0.55 (mpm-prefork), Python-2.4, gcc-3.4.4 +1 Linux (amd64) Ubuntu Breezy 5.10, apache 2.0.54 mpm-worker python 2.4 +1 Linux Debian sid, apache 2.0.55 (mpm-prefork), python 2.3 +1 Linux Debian sarge, apache 2.0.54 (mpm-worker), python 2.3 +1 Mac OS X 10.3, Apache 2.0.55, python 2.3 +1 Slackware 10.1, Apache 2.0.55 (mpm-prefork), Python 2.4 +1 Windows XP, python 2.4 +1 Windows 2000, python 2.3 For 3.2.6 +1 Fedora Core 4 Apache 2.0.54 Python 2.4.1 +1 Linux Debian sid, apache 2.0.55 (mpm-prefork), python 2.3 +1 Linux Debian sarge, apache 2.0.54 (mpm-worker), python 2.3 +1 MacOS-10.4.4 Apache-2.0.55 mpm-prefork Python-2.4.2 +1 Windows 2000 Server SP4, Python 2.3 +1 Windows XP Pro SP2, Python 2.4 +1 Windows XP Pro SP2 Apache 2.0.54 Python 2.4.2 0 HP Tru64, mpm-worker On the bsd front, Barry Pederson reported a problem with FreeBSD 6.0 but didn't issue an actual -1. FreeBSD and I are still not getting along, but with a small manual tweak of the Makefile I was able to run all the tests successfully. I don't know if there is a configure problem or if it's just my BSD incompetence but I'm willing to issue a shaky +0 for FreeBSD 6.0. Jim
[jira] Commented: (MODPYTHON-114) Problems with PythonPath directive.
[ http://issues.apache.org/jira/browse/MODPYTHON-114?page=comments#action_12364176 ] Graham Dumpleton commented on MODPYTHON-114: Note that the call to acquire the mutex lock should be moved outside of the try block. if config.has_key(PythonPath): _path_cache_lock.acquire() try: pathstring = config[PythonPath] if not _path_cache.has_key(pathstring): newpath = eval(pathstring) _path_cache[pathstring] = None sys.path[:] = newpath finally: _path_cache_lock.release() See: http://www.mail-archive.com/python-dev@httpd.apache.org/msg01025.html for discussion about this point and other suggested modifications to the patch. The other suggested modifications would scarifice total correctness for reduced thread locking. Even though it wouldn't be totally correct, it wouldn't cause actual failure, just redundant entries in sys.path. Opinion seemed to be leaning though towards correctness. Problems with PythonPath directive. --- Key: MODPYTHON-114 URL: http://issues.apache.org/jira/browse/MODPYTHON-114 Project: mod_python Type: Bug Components: core Versions: 3.2, 3.1.4 Reporter: Graham Dumpleton The PythonPath setting can be used to define the value of the Python sys.path variable. It is this variable which defines the list of directories that Python will search in when looking for a module to be imported. Although the actual reassignment of sys.path by mod_python does not in itself present a problem due to assignment in Python being thread safe by definition, the context in which the assignment occurs is not thread safe and a race condition exists. This exists as the top level mod_python dispatcher will consult the existing value of sys.path and the last value for the PythonPath setting encountered before then making a decision to modify sys.path. If multiple requests are being serviced in distinct threads within the context of the same interpreter instance, and each at the same time decide they want to modify the value of sys.path, only one might ultimately succeed in setting it to the value it wants and any modification required by the other may be lost. In the worst case scenario, this can result in the importation of any subsequent modules within that request failing due to a required directory not being present in sys.path. It is possible that this situation may resolve itself and go away on a subsequent request, but due to how mod_python caches the last value of PythonPath in a global variable this will be dependent on what other requests arrive. At the least, for mod_python to resolve the problem itself would require a request to arrive in the interim which targeted the URL which was not the last to cache its raw setting for PythonPath. This only works though due to a further issue whereby alternate requests against URLs with different PythonPath settings will cause sys.path to be extended everytime if the PythonPath setting references sys.path. This results in sys.path continually growing over time due to directories being added multiple times. The problematic code in apache.py is: if config.has_key(PythonPath): # we want to do as little evaling as possible, # so we remember the path in un-evaled form and # compare it global _path pathstring = config[PythonPath] if pathstring != _path: _path = pathstring newpath = eval(pathstring) if sys.path != newpath: sys.path[:] = newpath To fix the problems, the processing of PythonPath directive should be protected by a thread mutex lock and a dictionary should be used to store all seen values of PythonPath rather than a single variable. If a value for PythonPath has ever been seen, and not just from the last change, then no update would be necessary. if config.has_key(PythonPath): try: _path_cache_lock.acquire() pathstring = config[PythonPath] if not _path_cache.has_key(pathstring): newpath = eval(pathstring) _path_cache[pathstring] = None sys.path[:] = newpath finally: _path_cache_lock.release() There shouldn't really be a need to check whether the new path is different to the current value of sys.path as that scenario shouldn't occur at that point. The two parts of this problem were previously catalogued as ISSUE 15 and ISSUE 16 in my list of problems with the module importing system. The list can be found at: http://www.dscpl.com.au/articles/modpython-003.html -- This message is automatically
[jira] Created: (MODPYTHON-115) import_module() and multiple modules of same name.
import_module() and multiple modules of same name. -- Key: MODPYTHON-115 URL: http://issues.apache.org/jira/browse/MODPYTHON-115 Project: mod_python Type: Bug Components: core Versions: 3.1.4, 3.2 Reporter: Graham Dumpleton The apache.import_module() function is a thin wrapper over the standard Python module importing system. This means that modules are still stored in sys.modules. As modules in sys.modules are keyed by their module name, this in turn means that there can only be one active instance of a module for a specific name. The import_module() function tries to work around this by checking the path name of the location of a module against that being requested and if it is different will reload the correct module. This check of the path though only occurs when the path argument is actually supplied to the import_module() function. The path is only supplied in this way when mod_python.publisher makes use of the import_module() function, it is not supplied when the Python*Handler directives are used because in that circumstance a module may actually be a system module and supplying path would prevent it from being found. Even though mod_python.publisher supplies the path argument to the import_module() function, the check of the path has bugs, with modules possibly becoming inaccessible as documented in JIRA as MODPYTHON-9. The check by mod_python of the path name to the actual code file for a module to determine if it should be reloaded, can also cause a continual cycle of module reloading even though the modules on disk may not have changed. This will occur when successive requests alternate between URLs related to the distinct modules having the same name. This cyclic reloading is documented in JIRA as MODPYTHON-10. That a module is reloaded into the same object space as the existing module when two modules of the same name are in different locations, can also cause namespace pollution and security issues if one location for the module was public and the other private. This cross contamination of modules is as documented in JIRA as MODPYTHON-11. In respect of the Python*Handler directives where the path argument was never supplied to the import_module() function, the result would be that the first module loaded under the specified name would be used. Thus, any subsequent module of the same name referred to by a Python*Handler directive found in a different directory but within the same interpreter would in effect be ignored. A caveat to this though is that such a Python*Handler directive would result in that handlers directory being inserted at the head of sys.path. If the first instance of the module loaded under that name were at some point modified, the module would be automatically reloaded, but it would load the version from the different directory. Now, although these problem as they relate to mod_python.publisher are addressed in mod_python 3.2.6, the underlying problems in 'import_module()' are not. As the bug reports as they relate to mod_python.publisher have been closed off as resolved, am creating this bug report so as to carry on a bug report for the underlying problem as it applies to Python*Handler directive and use of import_module() explicitly. To illustrate the issue as it applies to Python*Handler directive, create two separate directories with a .htaccess file containing: AddHandler mod_python .py PythonHandler index PythonDebug On In the index.py file in each separate directory put: import os from mod_python import apache def handler(req): req.content_type = 'text/plain' print req, os.getpid(), __file__ return apache.OK Assuming these are accessed as: /~grahamd/mod_python_9/subdir-1/index.py /~grahamd/mod_python_9/subdir-2/index.py access the first URL, and the result will be: 10665 /Users/grahamd/Sites/mod_python_9/subdir-1/index.py now access the second URL and we get: 10665 /Users/grahamd/Sites/mod_python_9/subdir-1/index.py Note this assumes the same child process got it, so fixing Apache to run one child process is required for this test. As one can see, it doesn't actually use the 'subdir-2/index.py module at all and still uses the subdir-1/index.py' module. If one modifies subdir-1/index.py' so its timestamp is updated and load the second URL again, we get: 10665 /Users/grahamd/Sites/mod_python_9/subdir-2/index.py This occurs because it detects the change in the first module loaded, but because sys.path had the second handler directory at the head of sys.path now, when reloaded it picked up the latter. These issues with same name module in multiple locations is listed as ISSUE 14 in my list of module importer problems. See: http://www.dscpl.com.au/articles/modpython-003.html -- This message is automatically generated by JIRA. - If you think it was sent incorrectly
[jira] Commented: (MODPYTHON-116) Attributes removed from module code file still accessible until restart.
[ http://issues.apache.org/jira/browse/MODPYTHON-116?page=comments#action_12364187 ] Graham Dumpleton commented on MODPYTHON-116: Whoops, forgot to reference back to my problem list for completeness. :-) This problem is listed as ISSUE 13 in my list of module importer problems. See: http://www.dscpl.com.au/articles/modpython-003.html Attributes removed from module code file still accessible until restart. Key: MODPYTHON-116 URL: http://issues.apache.org/jira/browse/MODPYTHON-116 Project: mod_python Type: Bug Components: core Versions: 3.2, 3.1.4 Reporter: Graham Dumpleton When using apache.import_module() directly, or when it is used indirectly by the Python*Handler directives, and automatic module reloading occurs, modules are reloaded on top of the existing module. One of the problems with this is that if an attribute such as a function or data value is removed from the code file on disk, when the module is reloaded, that attribute doesn't get removed from the in memory module held by mod_python. The only way to eliminate such an attributed from the in memory module currently is to restart Apache, automatic module reloading doesn't help. A good example of the problems this can cause is with mod_python.publisher. Because attributes can be arbitrarily mapped to by a URL, if you forget to prefix variables or functions with an underscore they will be visible to a request. If such a mistake was realised and you change the source code to add an underscore prefix and relied on automatic module reloading, it wouldn't actually get rid of the incorrectly named attribute in the module held in the mod_python cache. Thus, the restart of Apache is still required. As it stands for mod_python.publisher, the problem is fixed in 3.2.6, but it still exists for direct use of apache.import_module() and Python*Handler directives. The consequences outside of mod_python.publisher may not be as problematic, bu depends on specific user code. A scheme whereby new modules are reloaded into a new module instance would eliminate this problem. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira