[jira] Created: (MODPYTHON-114) Problems with PythonPath directive.

2006-01-26 Thread Graham Dumpleton (JIRA)
Problems with PythonPath directive.
---

 Key: MODPYTHON-114
 URL: http://issues.apache.org/jira/browse/MODPYTHON-114
 Project: mod_python
Type: Bug
  Components: core  
Versions: 3.2, 3.1.4
Reporter: Graham Dumpleton


The PythonPath setting can be used to define the value of the Python 
sys.path variable. It is this variable which defines the list of directories 
that Python will search in when looking for a module to be imported.

Although the actual reassignment of sys.path by mod_python does not in itself 
present a problem due to assignment in Python being thread safe by definition, 
the context in which the assignment occurs is not thread safe and a race 
condition exists.

This exists as the top level mod_python dispatcher will consult the existing 
value of sys.path and the last value for the PythonPath setting encountered 
before then making a decision to modify sys.path. If multiple requests are 
being serviced in distinct threads within the context of the same interpreter 
instance, and each at the same time decide they want to modify the value of 
sys.path, only one might ultimately succeed in setting it to the value it 
wants and any modification required by the other may be lost.

In the worst case scenario, this can result in the importation of any 
subsequent modules within that request failing due to a required directory not 
being present in sys.path. It is possible that this situation may resolve 
itself and go away on a subsequent request, but due to how mod_python caches 
the last value of PythonPath in a global variable this will be dependent on 
what other requests arrive.

At the least, for mod_python to resolve the problem itself would require a 
request to arrive in the interim which targeted the URL which was not the last 
to cache its raw setting for PythonPath. This only works though due to a 
further issue whereby alternate requests against URLs with different 
PythonPath settings will cause sys.path to be extended everytime if the 
PythonPath setting references sys.path. This results in sys.path 
continually growing over time due to directories being added multiple times.

The problematic code in apache.py is:

if config.has_key(PythonPath):
# we want to do as little evaling as possible,
# so we remember the path in un-evaled form and
# compare it
global _path
pathstring = config[PythonPath]
if pathstring != _path:
_path = pathstring
newpath = eval(pathstring)
if sys.path != newpath:
sys.path[:] = newpath

To fix the problems, the processing of PythonPath directive should be protected 
by a thread mutex lock and a dictionary should be used to store all seen values 
of PythonPath rather than a single variable. If a value for PythonPath has ever 
been seen, and not just from the last change, then no update would be necessary.

if config.has_key(PythonPath):

try:
_path_cache_lock.acquire()

pathstring = config[PythonPath]

if not _path_cache.has_key(pathstring):
newpath = eval(pathstring)
_path_cache[pathstring] = None
sys.path[:] = newpath

finally:
_path_cache_lock.release()

There shouldn't really be a need to check whether the new path is different to 
the current value of sys.path as that scenario shouldn't occur at that point.

The two parts of this problem were previously catalogued as ISSUE 15 and ISSUE 
16 in my list of problems with the module importing system. The list can be 
found at:

  http://www.dscpl.com.au/articles/modpython-003.html



-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



Re: [jira] Created: (MODPYTHON-114) Problems with PythonPath directive.

2006-01-26 Thread Graham Dumpleton


On 26/01/2006, at 11:48 PM, Mike Looijmans wrote:


Two comments:
1. (bug): The acquire() call should be *outside* the try ... finally 
block. You do not want to release a lock that you did not aquire.


Whoops. Quite agree. One hopes that acquiring a simple mutex lock never
fails though. If it does, you process is probably suffering much more 
serious

issues. ;-(

2. (optimization): If you are not planning to change the path, you do 
not have to aquire the lock. Aquiring a lock is quite expensive, so 
the double-check will have much less impact on performance.



if config.has_key(PythonPath):
pathstring = config[PythonPath]
if not _path_cache.has_key(pathstring):
# acquire must be done outside the try block
_path_cache_lock.acquire()
try:
# double-check, because two threads might come
# to the same conclusion up until here.
if not _path_cache.has_key(pathstring):
newpath = eval(pathstring)
_path_cache[pathstring] = None
sys.path[:] = newpath
finally:
_path_cache_lock.release()


You run the very very small risk that two different threads will both 
decide
they need to update the sys.path value for the same value of 
PythonPath. The

new entries will thus get added twice. This sort of thing was one of the
things wrong with the old code, although now that a map is used to cache
seen values of PythonPath, not so big a problem as simply means you get
redundant entries rather than loss of potential directories from 
sys.path.
Number of redundant entries will depend on how many threads hit that 
check

at the same time. After the first time it should not happen again.

Thus, trade off of speed versus correctness is probably reasonable as 
it will

not cause a failure. In general I tend towards robustness and unexpected
surprises and that is why the code was written as it was. That this 
trade

off has been made will need to be clearly documented in code though.

Graham



Re: [jira] Created: (MODPYTHON-114) Problems with PythonPath directive.

2006-01-26 Thread Nicolas Lehuen
2006/1/26, Mike Looijmans [EMAIL PROTECTED]:
 Two comments:
 1. (bug): The acquire() call should be *outside* the try ... finally
 block. You do not want to release a lock that you did not aquire.

 2. (optimization): If you are not planning to change the path, you do
 not have to aquire the lock. Aquiring a lock is quite expensive, so the
 double-check will have much less impact on performance.


  if config.has_key(PythonPath):
  pathstring = config[PythonPath]
  if not _path_cache.has_key(pathstring):
  # acquire must be done outside the try block
  _path_cache_lock.acquire()
  try:
  # double-check, because two threads might come
  # to the same conclusion up until here.
  if not _path_cache.has_key(pathstring):
  newpath = eval(pathstring)
  _path_cache[pathstring] = None
  sys.path[:] = newpath
  finally:
  _path_cache_lock.release()



 Mike Looijmans
 Philips Natlab / Topic Automation

Hi,

Your optimisation is called double-checked locking, and it is now
known to be broken, especially on multiple-CPU systems. The problem
exists in numerous languages. Here is an explanation of the problem :

http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html

Now, maybe Python is not touched by this problem due to its relatively
poor support for multithreading. The presence of the GIL guarantees
that only one thread of Python bytecode interpreter runs at a time,
but still, in a multiple CPU environment, you can get bitten by local
caching issues.

As this thing is a bit hairy, I think we should first strive for
correctness, then care about performance later. And no, I won't bother
you with one more quote about premature this and the root of that ;).

Regards,
Nicolas


Re: [jira] Created: (MODPYTHON-114) Problems with PythonPath directive.

2006-01-26 Thread Jim Gallacher

Graham Dumpleton wrote:
Thus, trade off of speed versus correctness is probably reasonable as it 
will

not cause a failure. In general I tend towards robustness and unexpected
surprises and that is why the code was written as it was.


Personally I tend towards robustness and *away* from unexpected 
suprises, but if that's what it takes to keep your life interesting who 
am I to argue? :)


Jim


3.2.6 test period - how long do we wait?

2006-01-26 Thread Jim Gallacher
It seems like any 3.2.6 testing that is going to be done, has been done. 
How long do we wait before making a decision for an official release.


If we don't get cracking on 3.3 soon Graham's gonna fill another couple 
of pages on JIRA and we'll never catch up. :)


Jim


Re: [jira] Created: (MODPYTHON-114) Problems with PythonPath directive.

2006-01-26 Thread Graham Dumpleton
Jim Gallacher wrote ..
 Graham Dumpleton wrote:
  Thus, trade off of speed versus correctness is probably reasonable as
 it 
  will
  not cause a failure. In general I tend towards robustness and unexpected
  surprises and that is why the code was written as it was.
 
 Personally I tend towards robustness and *away* from unexpected 
 suprises, but if that's what it takes to keep your life interesting who
 am I to argue? :)

Not not not again. I'm not doing very well at getting my not and un
around the right way of late am I. It is a problem I always have, but that
is twice now in the last day. :-(

Graham


Re: 3.2.6 test period - how long do we wait?

2006-01-26 Thread Graham Dumpleton
Jim Gallacher wrote ..
 It seems like any 3.2.6 testing that is going to be done, has been done.
 How long do we wait before making a decision for an official release.
 
 If we don't get cracking on 3.3 soon Graham's gonna fill another couple
 of pages on JIRA and we'll never catch up. :)

You aren't far wrong. I was actually thinking of sending an email to the
list warning about an impending avalanche of JIRA bug reports. This is
going to be caused by me going through my module importer issues
list and logging separate bug reports for each problem where none
already exists in JIRA. Also have potentially others as well I know of but
aren't in my module importer issues list at the moment but which I will
be adding. Also going to create new bugs or ask for some old bugs to
be reopened because the issue was only addressed in version 3.2.6 for
mod_python.publisher and not mod_python as a whole.

Thus, consider yourself warned. Note that none of what I will be creating
issues for should stop 3.2.6 (final) being released. Already accepted that
they would be 3.3.

Graham


Re: 3.2.6 test period - how long do we wait?

2006-01-26 Thread Gregory (Grisha) Trubetskoy


On Thu, 26 Jan 2006, Jim Gallacher wrote:


It seems like any 3.2.6 testing that is going to be done, has been done.


I've been kinda swamped with unrelated things past two weeks, so I wasn't 
paying much attention. Perhaps an e-mail summarizing the +1's so far and a 
quick vote of the core group on whether this is enough?


Grisha


Re: 3.2.6 test period - how long do we wait?

2006-01-26 Thread Jim Gallacher

Gregory (Grisha) Trubetskoy wrote:


On Thu, 26 Jan 2006, Jim Gallacher wrote:


It seems like any 3.2.6 testing that is going to be done, has been done.



I've been kinda swamped with unrelated things past two weeks, so I 
wasn't paying much attention. Perhaps an e-mail summarizing the +1's so 
far and a quick vote of the core group on whether this is enough?


I will gather the info later tonight.

Jim


Re: 3.2.6 test period - how long do we wait?

2006-01-26 Thread Jim Gallacher

Gregory (Grisha) Trubetskoy wrote:


On Thu, 26 Jan 2006, Jim Gallacher wrote:


It seems like any 3.2.6 testing that is going to be done, has been done.



I've been kinda swamped with unrelated things past two weeks, so I 
wasn't paying much attention. Perhaps an e-mail summarizing the +1's so 
far and a quick vote of the core group on whether this is enough?


Grisha


Here is the summary of the 3.2.6 testing to date. Let me know if I've 
missed anyone's tests or you still want to toss you test on the pile.


For the 3.2.6b tarball (same as 3.2.6 but with the messed up version 
string).


+1 Gentoo [current], Apache-2.0.55 (mpm-prefork), Python-2.4, gcc-3.4.4
+1 Linux (amd64) Ubuntu Breezy 5.10, apache 2.0.54 mpm-worker python 2.4
+1 Linux Debian sid, apache 2.0.55 (mpm-prefork), python 2.3
+1 Linux Debian sarge, apache 2.0.54 (mpm-worker), python 2.3
+1 Mac OS X 10.3, Apache 2.0.55, python 2.3
+1 Slackware 10.1, Apache 2.0.55 (mpm-prefork), Python 2.4
+1 Windows XP, python 2.4
+1 Windows 2000, python 2.3

For 3.2.6
+1 Fedora Core 4 Apache 2.0.54 Python 2.4.1
+1 Linux Debian sid, apache 2.0.55 (mpm-prefork), python 2.3
+1 Linux Debian sarge, apache 2.0.54 (mpm-worker), python 2.3
+1 MacOS-10.4.4 Apache-2.0.55 mpm-prefork Python-2.4.2
+1 Windows 2000 Server SP4, Python 2.3
+1 Windows XP Pro SP2, Python 2.4
+1 Windows XP Pro SP2 Apache 2.0.54 Python 2.4.2

 0 HP Tru64, mpm-worker

On the bsd front, Barry Pederson reported a problem with FreeBSD 6.0 but 
didn't issue an actual -1. FreeBSD and I are still not getting along, 
but with a small manual tweak of the Makefile I was able to run all the 
tests successfully. I don't know if there is a configure problem or if 
it's just my BSD incompetence but I'm willing to issue a shaky +0 for 
FreeBSD 6.0.


Jim


[jira] Commented: (MODPYTHON-114) Problems with PythonPath directive.

2006-01-26 Thread Graham Dumpleton (JIRA)
[ 
http://issues.apache.org/jira/browse/MODPYTHON-114?page=comments#action_12364176
 ] 

Graham Dumpleton commented on MODPYTHON-114:


Note that the call to acquire the mutex lock should be moved outside of the try 
block.

if config.has_key(PythonPath): 

_path_cache_lock.acquire() 

try: 
pathstring = config[PythonPath] 

if not _path_cache.has_key(pathstring): 
newpath = eval(pathstring) 
_path_cache[pathstring] = None 
sys.path[:] = newpath 

finally: 
_path_cache_lock.release() 

See:

  http://www.mail-archive.com/python-dev@httpd.apache.org/msg01025.html

for discussion about this point and other suggested modifications to the patch.

The other suggested modifications would scarifice total correctness for reduced 
thread locking. Even though it wouldn't be totally correct, it wouldn't cause 
actual failure, just redundant entries in sys.path. Opinion seemed to be 
leaning though towards correctness.

 Problems with PythonPath directive.
 ---

  Key: MODPYTHON-114
  URL: http://issues.apache.org/jira/browse/MODPYTHON-114
  Project: mod_python
 Type: Bug
   Components: core
 Versions: 3.2, 3.1.4
 Reporter: Graham Dumpleton


 The PythonPath setting can be used to define the value of the Python 
 sys.path variable. It is this variable which defines the list of 
 directories that Python will search in when looking for a module to be 
 imported.
 Although the actual reassignment of sys.path by mod_python does not in 
 itself present a problem due to assignment in Python being thread safe by 
 definition, the context in which the assignment occurs is not thread safe and 
 a race condition exists.
 This exists as the top level mod_python dispatcher will consult the existing 
 value of sys.path and the last value for the PythonPath setting 
 encountered before then making a decision to modify sys.path. If multiple 
 requests are being serviced in distinct threads within the context of the 
 same interpreter instance, and each at the same time decide they want to 
 modify the value of sys.path, only one might ultimately succeed in setting 
 it to the value it wants and any modification required by the other may be 
 lost.
 In the worst case scenario, this can result in the importation of any 
 subsequent modules within that request failing due to a required directory 
 not being present in sys.path. It is possible that this situation may 
 resolve itself and go away on a subsequent request, but due to how mod_python 
 caches the last value of PythonPath in a global variable this will be 
 dependent on what other requests arrive.
 At the least, for mod_python to resolve the problem itself would require a 
 request to arrive in the interim which targeted the URL which was not the 
 last to cache its raw setting for PythonPath. This only works though due to 
 a further issue whereby alternate requests against URLs with different 
 PythonPath settings will cause sys.path to be extended everytime if the 
 PythonPath setting references sys.path. This results in sys.path 
 continually growing over time due to directories being added multiple times.
 The problematic code in apache.py is:
 if config.has_key(PythonPath):
 # we want to do as little evaling as possible,
 # so we remember the path in un-evaled form and
 # compare it
 global _path
 pathstring = config[PythonPath]
 if pathstring != _path:
 _path = pathstring
 newpath = eval(pathstring)
 if sys.path != newpath:
 sys.path[:] = newpath
 To fix the problems, the processing of PythonPath directive should be 
 protected by a thread mutex lock and a dictionary should be used to store all 
 seen values of PythonPath rather than a single variable. If a value for 
 PythonPath has ever been seen, and not just from the last change, then no 
 update would be necessary.
 if config.has_key(PythonPath):
 try:
 _path_cache_lock.acquire()
 pathstring = config[PythonPath]
 if not _path_cache.has_key(pathstring):
 newpath = eval(pathstring)
 _path_cache[pathstring] = None
 sys.path[:] = newpath
 finally:
 _path_cache_lock.release()
 There shouldn't really be a need to check whether the new path is different 
 to the current value of sys.path as that scenario shouldn't occur at that 
 point.
 The two parts of this problem were previously catalogued as ISSUE 15 and 
 ISSUE 16 in my list of problems with the module importing system. The list 
 can be found at:
   http://www.dscpl.com.au/articles/modpython-003.html

-- 
This message is automatically 

[jira] Created: (MODPYTHON-115) import_module() and multiple modules of same name.

2006-01-26 Thread Graham Dumpleton (JIRA)
import_module() and multiple modules of same name.
--

 Key: MODPYTHON-115
 URL: http://issues.apache.org/jira/browse/MODPYTHON-115
 Project: mod_python
Type: Bug
  Components: core  
Versions: 3.1.4, 3.2
Reporter: Graham Dumpleton


The apache.import_module() function is a thin wrapper over the standard 
Python module importing system. This means that modules are still stored in 
sys.modules. As modules in sys.modules are keyed by their module name, this 
in turn means that there can only be one active instance of a module for a 
specific name.

The import_module() function tries to work around this by checking the path 
name of the location of a module against that being requested and if it is 
different will reload the correct module. This check of the path though only 
occurs when the path argument is actually supplied to the import_module() 
function. The path is only supplied in this way when mod_python.publisher 
makes use of the import_module() function, it is not supplied when the 
Python*Handler directives are used because in that circumstance a module may 
actually be a system module and supplying path would prevent it from being 
found.

Even though mod_python.publisher supplies the path argument to the 
import_module() function, the check of the path has bugs, with modules 
possibly becoming inaccessible as documented in JIRA as MODPYTHON-9. 

The check by mod_python of the path name to the actual code file for a module 
to determine if it should be reloaded, can also cause a continual cycle of 
module reloading even though the modules on disk may not have changed. This 
will occur when successive requests alternate between URLs related to the 
distinct modules having the same name. This cyclic reloading is documented in 
JIRA as MODPYTHON-10.

That a module is reloaded into the same object space as the existing module 
when two modules of the same name are in different locations, can also cause 
namespace pollution and security issues if one location for the module was 
public and the other private. This cross contamination of modules is as 
documented in JIRA as MODPYTHON-11.

In respect of the Python*Handler directives where the path argument was 
never supplied to the import_module() function, the result would be that the 
first module loaded under the specified name would be used. Thus, any 
subsequent module of the same name referred to by a Python*Handler directive 
found in a different directory but within the same interpreter would in effect 
be ignored.

A caveat to this though is that such a Python*Handler directive would result 
in that handlers directory being inserted at the head of sys.path. If the 
first instance of the module loaded under that name were at some point 
modified, the module would be automatically reloaded, but it would load the 
version from the different directory.

Now, although these problem as they relate to mod_python.publisher are 
addressed in mod_python 3.2.6, the underlying problems in 'import_module()' are 
not. As the bug reports as they relate to mod_python.publisher have been closed 
off as resolved, am creating this bug report so as to carry on a bug report for 
the underlying problem as it applies to Python*Handler directive and use of 
import_module() explicitly.

To illustrate the issue as it applies to Python*Handler directive, create two 
separate directories with a .htaccess file containing:

  AddHandler mod_python .py
  PythonHandler index
  PythonDebug On

In the index.py file in each separate directory put:

  import os
  from mod_python import apache

  def handler(req):
req.content_type = 'text/plain'
print  req, os.getpid(), __file__
return apache.OK

Assuming these are accessed as:

  /~grahamd/mod_python_9/subdir-1/index.py
  /~grahamd/mod_python_9/subdir-2/index.py

access the first URL, and the result will be:

  10665 /Users/grahamd/Sites/mod_python_9/subdir-1/index.py

now access the second URL and we get:

  10665 /Users/grahamd/Sites/mod_python_9/subdir-1/index.py

Note this assumes the same child process got it, so fixing Apache to run one 
child process is required for this test.

As one can see, it doesn't actually use the 'subdir-2/index.py module at all 
and still uses the subdir-1/index.py' module.

If one modifies subdir-1/index.py' so its timestamp is updated and load the 
second URL again, we get:

  10665 /Users/grahamd/Sites/mod_python_9/subdir-2/index.py

This occurs because it detects the change in the first module loaded, but 
because sys.path had the second handler directory at the head of sys.path now, 
when reloaded it picked up the latter.

These issues with same name module in multiple locations is listed as ISSUE 14 
in my list of module importer problems. See:

  http://www.dscpl.com.au/articles/modpython-003.html


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly 

[jira] Commented: (MODPYTHON-116) Attributes removed from module code file still accessible until restart.

2006-01-26 Thread Graham Dumpleton (JIRA)
[ 
http://issues.apache.org/jira/browse/MODPYTHON-116?page=comments#action_12364187
 ] 

Graham Dumpleton commented on MODPYTHON-116:


Whoops, forgot to reference back to my problem list for completeness. :-)

This problem is listed as ISSUE 13 in my list of module importer problems. See:

  http://www.dscpl.com.au/articles/modpython-003.html

 Attributes removed from module code file still accessible until restart.
 

  Key: MODPYTHON-116
  URL: http://issues.apache.org/jira/browse/MODPYTHON-116
  Project: mod_python
 Type: Bug
   Components: core
 Versions: 3.2, 3.1.4
 Reporter: Graham Dumpleton


 When using apache.import_module() directly, or when it is used indirectly 
 by the Python*Handler directives, and automatic module reloading occurs, 
 modules are reloaded on top of the existing module. One of the problems with 
 this is that if an attribute such as a function or data value is removed from 
 the code file on disk, when the module is reloaded, that attribute doesn't 
 get removed from the in memory module held by mod_python. The only way to 
 eliminate such an attributed from the in memory module currently is to 
 restart Apache, automatic module reloading doesn't help.
 A good example of the problems this can cause is with mod_python.publisher. 
 Because attributes can be arbitrarily mapped to by a URL, if you forget to 
 prefix variables or functions with an underscore they will be visible to a 
 request. If such a mistake was realised and you change the source code to add 
 an underscore prefix and relied on automatic module reloading, it wouldn't 
 actually get rid of the incorrectly named attribute in the module held in the 
 mod_python cache. Thus, the restart of Apache is still required.
 As it stands for mod_python.publisher, the problem is fixed in 3.2.6, but it 
 still exists for direct use of apache.import_module() and Python*Handler 
 directives. The consequences outside of mod_python.publisher may not be as 
 problematic, bu depends on specific user code.
 A scheme whereby new modules are reloaded into a new module instance would 
 eliminate this problem.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira