Re: [Python-Dev] Fwd: Problem with the API for str.rpartition()

2006-09-06 Thread Steve Holden
Raymond Hettinger wrote:
[...]
> That's fine with me.  I accept there will always be someone who stands 
> on their head [...]

You'd have to be some kind of contortionist to stand on your head.

willfully-misunderstanding-ly y'rs  - steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Fwd: Problem with the API for str.rpartition()

2006-09-06 Thread Nick Coghlan
Phillip J. Eby wrote:
> At 04:55 PM 9/5/2006 -0400, Barry Warsaw wrote:
>> On Sep 5, 2006, at 4:43 PM, Jim Jewett wrote:
>>
>>> I think I finally figured out where Raymond is coming from.
>>>
>>> For Raymond, "head" is where he started processing -- for rpartition,
>>> this is the .endswith part.
>>>
>>> For me, "head" is the start of the data structure -- always the
>>> .startswith part.
>>>
>>> We won't resolve that with anything suggesting a sequential order; we
>>> need something that makes it clear which part is the large leftover.
>> See, for me, it's all about the results of the operation, not how the
>> results are (supposedly) used.  The way I think about it is that I've
>> got some string and I'm looking for some split point within that
>> string.  That split point is clearly the "middle" (but "sep" works
>> too) and everything to the right of that split point gets returned in
>> "right" while everything to the left gets returned in "left".
> 
> +1 for left/sep/right for both operations.  It's easier to remember a 
> visual correlation (left,sep,right) than it is to try and think about an 
> abstraction in which the order of results has something to do with what 
> direction I found the separator in.

-1. The string docs are already lousy with left/right terminology that is
flatout wrong when dealing with a script that is displayed with a
right-to-left or vertical orientation*. In reality, strings are processed such
that index 0 is the first character and index -1 is the last character,
regardless of script orientation, but you could be forgiven for not realising
that after reading the current string docs. Let's not make that particular
problem any worse.

I don't see anything wrong with Raymond's 'head, sep, tail' and 'tail, sep,
head' terminology (although noting the common postcondition 'sep not in head'
in the docstrings might be useful).

However, if we're going to use the same result tuple for both, then I'd prefer
'before, sep, after', with the partition() postcondition being 'sep not in
before' and the rpartition() postcondition being 'sep not in after'. Those
terms are accurate regardless of script orientation.

Either way, I suggest putting the postcondition in the docstring to make the 
difference between the two methods explicit.
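For readers following along, here is a minimal sketch of the two postconditions, assuming the 'before, sep, after' naming suggested above. The helper names are made up for illustration and are not part of any proposed API.

```python
def checked_partition(text, sep):
    """Split at the FIRST occurrence of sep and check the postcondition."""
    before, found, after = text.partition(sep)
    # partition() postcondition: the separator never appears in 'before'.
    assert sep not in before
    # The three parts always reassemble into the original string.
    assert before + found + after == text
    return before, found, after


def checked_rpartition(text, sep):
    """Split at the LAST occurrence of sep and check the postcondition."""
    before, found, after = text.rpartition(sep)
    # rpartition() postcondition: the separator never appears in 'after'.
    assert sep not in after
    assert before + found + after == text
    return before, found, after


print(checked_partition("a.b.c", "."))   # ('a', '.', 'b.c')
print(checked_rpartition("a.b.c", "."))  # ('a.b', '.', 'c')
```

Both results are in string-index order (index 0 first), so the names hold whatever direction the script is displayed in. When the separator is missing, the whole string ends up in 'before' for partition() and in 'after' for rpartition().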

Regards,
Nick.

* I acknowledge that Python *code* is almost certainly going to be edited in a 
left-to-right text editor, because it's an English-based programming language. 
But the strings that string methods like partition() and rpartition() are used 
with are quite likely to be coming from or written to a or user interface that 
uses a native script orientation.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] Fwd: Problem with the API for str.rpartition()

2006-09-06 Thread Steve Holden
Nick Coghlan wrote:
> Phillip J. Eby wrote:
> 
>>At 04:55 PM 9/5/2006 -0400, Barry Warsaw wrote:
>>
>>>On Sep 5, 2006, at 4:43 PM, Jim Jewett wrote:
>>>
>>>
>>>>I think I finally figured out where Raymond is coming from.
>>>>
>>>>For Raymond, "head" is where he started processing -- for rpartition,
>>>>this is the .endswith part.
>>>>
>>>>For me, "head" is the start of the data structure -- always the
>>>>.startswith part.
>>>>
>>>>We won't resolve that with anything suggesting a sequential order; we
>>>>need something that makes it clear which part is the large leftover.
>>>
>>>See, for me, it's all about the results of the operation, not how the
>>>results are (supposedly) used.  The way I think about it is that I've
>>>got some string and I'm looking for some split point within that
>>>string.  That split point is clearly the "middle" (but "sep" works
>>>too) and everything to the right of that split point gets returned in
>>>"right" while everything to the left gets returned in "left".
>>
>>+1 for left/sep/right for both operations.  It's easier to remember a 
>>visual correlation (left,sep,right) than it is to try and think about an 
>>abstraction in which the order of results has something to do with what 
>>direction I found the separator in.
> 
> 
> -1. The string docs are already lousy with left/right terminology that is
> flatout wrong when dealing with a script that is displayed with a
> right-to-left or vertical orientation*. In reality, strings are processed such
> that index 0 is the first character and index -1 is the last character,
> regardless of script orientation, but you could be forgiven for not realising
> that after reading the current string docs. Let's not make that particular
> problem any worse.
> 
> I don't see anything wrong with Raymond's 'head, sep, tail' and 'tail, sep,
> head' terminology (although noting the common postcondition 'sep not in head'
> in the docstrings might be useful).
> 
> However, if we're going to use the same result tuple for both, then I'd prefer
> 'before, sep, after', with the partition() postcondition being 'sep not in
> before' and the rpartition() postcondition being 'sep not in after'. Those
> terms are accurate regardless of script orientation.
> 
> Either way, I suggest putting the postcondition in the docstring to make the 
> difference between the two methods explicit.
> 
> Regards,
> Nick.
> 
> * I acknowledge that Python *code* is almost certainly going to be edited in a
> left-to-right text editor, because it's an English-based programming language.
> But the strings that string methods like partition() and rpartition() are used
> with are quite likely to be coming from or written to a or user interface that
> uses a native script orientation.
> 
Perhaps we should be thinking "beginning" and "end" here, though it 
seems as though it won't be possible to find a terminology that will be 
intuitively obvious to everyone.

regards
  Steve
-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden



Re: [Python-Dev] Fwd: Problem with the API for str.rpartition()

2006-09-06 Thread Georg Brandl
Steve Holden wrote:

>> * I acknowledge that Python *code* is almost certainly going to be edited in a
>> left-to-right text editor, because it's an English-based programming language.
>> But the strings that string methods like partition() and rpartition() are used
>> with are quite likely to be coming from or written to a or user interface that
>> uses a native script orientation.
>> 
> Perhaps we should be thinking "beginning" and "end" here, though it 
> seems as though it won't be possible to find a terminology that will be 
> intuitively obvious to everyone.

Which is why an example is absolutely necessary and will make things clear for
everyone.

Georg



Re: [Python-Dev] inspect.py very slow under 2.5

2006-09-06 Thread Ralf Schmitt
Fernando Perez wrote:
> 
> These enormous numbers of calls are the origin of the slowdown, and the more
> modules have been imported, the worse it gets.


--- /exp/lib/python2.5/inspect.py   2006-08-28 11:53:36.0 +0200
+++ inspect.py  2006-09-06 12:10:45.0 +0200
@@ -444,7 +444,8 @@
     in the file and the line number indexes a line in that list.  An IOError
     is raised if the source code cannot be retrieved."""
     file = getsourcefile(object) or getfile(object)
-    module = getmodule(object)
+    #module = getmodule(object)
+    module = None
     if module:
         lines = linecache.getlines(file, module.__dict__)
     else:

The problem seems to originate from the module=getmodule(object) in 
findsource. If I comment out that code (or rather set module=None),
things seem to be back to normal. (linecache.getlines was called 
with a None module in Python 2.4's inspect.py.)

- Ralf



Re: [Python-Dev] Signals, threads, blocking C functions

2006-09-06 Thread Michael Hudson
"Gustavo Carneiro" <[EMAIL PROTECTED]> writes:

> On 9/4/06, Nick Maclaren <[EMAIL PROTECTED]> wrote:
>> "Gustavo Carneiro" <[EMAIL PROTECTED]> wrote:
>> >   I am now thinking of something along these lines:
>> > typedef void (*PyPendingCallNotify)(void *user_data);
>> > PyAPI_FUNC(void) Py_AddPendingCallNotify(PyPendingCallNotify callback,
>> > void *user_data);
>> > PyAPI_FUNC(void) Py_RemovePendingCallNotify(PyPendingCallNotify
>> > callback, void *user_data);
>>
>> Why would that help?  The problems are semantic, not syntactic.
>>
>> Anthony Baxter isn't exaggerating the problem, despite what you may
>> think from his posting.
>
>   You guys are tough customers to please. 

Yes.

> I am just trying to solve a problem here, not create a new one; you
> have to believe me.

We believe you, but you are stirring the ashes of old problems.

>  1. In PyGTK we have a gobject.MainLoop.run() method, which blocks
> essentially forever in a poll() system call, and only wakes if/when it
> has to process timeout or IO event;
>  2. When we only have one thread, we can guarantee that e.g.
> SIGINT will always be caught by the thread running the
> g_main_loop_run(), so we know poll() will be interrupted and a EINTR
> will be generated, giving us control temporarily back to check for
> python signals;
>  3. When we have multiple thread, we cannot make this assumption,
> so instead we install a timeout to periodically check for signals.
>
>   We want to get rid of timeouts.  Now my idea: add a Python API to say:
>  "dear Python, please call me when you start having pending calls,
> even if from a signal handler context, ok?"

This seems a reasonable proposal.  But it's totally a Python 2.6
thing, so how about taking a deep breath, working on a patch and
submitting it when it's ready?

Having to wake a process up a few times a second is ugly and annoying,
sure, but it is not a release delaying problem.
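For comparison only, and not the API proposed above: the periodic wake-ups can be avoided with a pipe whose write end is handed to signal.set_wakeup_fd(), so that any signal makes a blocked poll() return. The sketch below is POSIX-only, must run in the main thread, and uses a made-up function name; sending the signal to ourselves is just a stand-in for an external event.

```python
import os
import select
import signal


def wait_for_signal(signum, timeout_ms=2000):
    """Return True if poll() was woken by `signum` before the timeout."""
    read_fd, write_fd = os.pipe()
    # set_wakeup_fd() requires a non-blocking descriptor.
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)

    # A Python-level handler must be installed for the wakeup byte to be sent.
    old_handler = signal.signal(signum, lambda sig, frame: None)
    old_fd = signal.set_wakeup_fd(write_fd)
    try:
        poller = select.poll()
        poller.register(read_fd, select.POLLIN)
        # Stand-in for an outside event: deliver the signal to this process.
        os.kill(os.getpid(), signum)
        # No periodic timeout is needed; the pipe becomes readable on a signal.
        return bool(poller.poll(timeout_ms))
    finally:
        signal.set_wakeup_fd(old_fd)
        if old_handler is not None:
            signal.signal(signum, old_handler)
        os.close(read_fd)
        os.close(write_fd)
```

A main loop written this way only needs the read end of the pipe in its poll set; the timeout here exists purely so the sketch cannot hang.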

Cheers,
mwh

-- 
  It is never worth a first class man's time to express a majority
  opinion.  By definition, there are plenty of others to do that.
-- G. H. Hardy


Re: [Python-Dev] inspect.py very slow under 2.5

2006-09-06 Thread Nick Coghlan
Ralf Schmitt wrote:
> The problem seems to originate from the module=getmodule(object) in 
> findsource. If I outcomment that code (or rather do a module=None),
> things seem to be back as normal. (linecache.getlines has been called 
> with a None module in python 2.4's inspect.py).

It looks like the problem is the call to getabspath() in getmodule(). This 
happens every time, even if the file name is already in the modulesbyfile 
cache. This calls os.path.abspath() and os.path.normpath() every time that 
inspect.findsource() is called.

That can be fixed by having findsource() pass the filename argument to 
getmodule(), and adding a check of the modulesbyfile cache *before* the call 
to getabspath().

Can you try this patch and see if you get 2.4 level performance back on 
Fernando's test?:

http://www.python.org/sf/1553314

(Assigned to Neal in the hopes of making 2.5rc2)

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] inspect.py very slow under 2.5

2006-09-06 Thread Ralf Schmitt
Nick Coghlan wrote:
> 
> It looks like the problem is the call to getabspath() in getmodule(). This 
> happens every time, even if the file name is already in the modulesbyfile 
> cache. This calls os.path.abspath() and os.path.normpath() every time that 
> inspect.findsource() is called.
> 
> That can be fixed by having findsource() pass the filename argument to 
> getmodule(), and adding a check of the modulesbyfile cache *before* the call 
> to getabspath().
> 
> Can you try this patch and see if you get 2.4 level performance back on 
> Fernando's test?:

No, this doesn't work. getmodule always iterates over 
sys.modules.values() and only returns None afterwards.
One would have to cache the bad file value, or only inspect new/changed 
modules from sys.modules.

> 
> http://www.python.org/sf/1553314
> 
> (Assigned to Neal in the hopes of making 2.5rc2)
> 
> Cheers,
> Nick.
> 



[Python-Dev] Exception message for invalid with statement usage

2006-09-06 Thread Georg Brandl
Current trunk:

>>> with 1:
...  print "1"
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute '__exit__'

Isn't that a bit crude? For "for i in 1" there's a better
error message, so why shouldn't the above give a
TypeError: 'int' object is not a context manager

?

Georg



Re: [Python-Dev] inspect.py very slow under 2.5

2006-09-06 Thread Nick Coghlan
Ralf Schmitt wrote:
> Nick Coghlan wrote:
>>
>> It looks like the problem is the call to getabspath() in getmodule(). 
>> This happens every time, even if the file name is already in the 
>> modulesbyfile cache. This calls os.path.abspath() and 
>> os.path.normpath() every time that inspect.findsource() is called.
>>
>> That can be fixed by having findsource() pass the filename argument to 
>> getmodule(), and adding a check of the modulesbyfile cache *before* 
>> the call to getabspath().
>>
>> Can you try this patch and see if you get 2.4 level performance back 
>> on Fernando's test?:
> 
> no. this doesn't work. getmodule always iterates over 
> sys.modules.values() and only returns None afterwards.
> One would have to cache the bad file value, or only inspect new/changed 
> modules from sys.modules.

Good point. I modified the patch so it does the latter (it only calls 
getabspath() again for a module if the value of module.__file__ changes).

Cheers,
Nick.


-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] Exception message for invalid with statement usage

2006-09-06 Thread Nick Coghlan
Georg Brandl wrote:
> Current trunk:
> 
> >>> with 1:
> ...  print "1"
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'int' object has no attribute '__exit__'
> 
> Isn't that a bit crude? For "for i in 1" there's a better
> error message, so why shouldn't the above give a
> TypeError: 'int' object is not a context manager

The for loop has a nice error message because it starts with its own opcode, 
but the with statement translates pretty much to the code in PEP 343. There's 
a special opcode at the end to help with unwinding the stack, but at the start 
it's just normal attribute retrieval opcodes for __enter__ and __exit__.

 >>> def f():
 ...     with 1:
 ...         pass
 ...
 >>> dis.dis(f)
   2           0 LOAD_CONST               1 (1)
               3 DUP_TOP
               4 LOAD_ATTR                0 (__exit__)
               7 STORE_FAST               0 (_[1])
              10 LOAD_ATTR                1 (__enter__)
              13 CALL_FUNCTION            0
              16 POP_TOP
              17 SETUP_FINALLY            4 (to 24)

   3          20 POP_BLOCK
              21 LOAD_CONST               0 (None)
         >>   24 LOAD_FAST                0 (_[1])
              27 DELETE_FAST              0 (_[1])
              30 WITH_CLEANUP
              31 END_FINALLY
              32 LOAD_CONST               0 (None)
              35 RETURN_VALUE
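A rough Python rendering of that translation, loosely following the expansion in PEP 343, may make the point clearer. The function run_with and the Recorder class are invented for this sketch; the interpreter does this work in bytecode, not through a helper.

```python
import sys


def run_with(manager, body):
    """Approximate 'with manager as value: body(value)' by hand."""
    # Plain attribute lookups, in the same order as the LOAD_ATTR opcodes:
    # __exit__ is fetched first, so that is the name the error mentions.
    exit_method = manager.__exit__
    value = manager.__enter__()
    ok = True
    try:
        try:
            return body(value)
        except BaseException:
            ok = False
            # A true return value from __exit__ swallows the exception.
            if not exit_method(*sys.exc_info()):
                raise
    finally:
        if ok:
            exit_method(None, None, None)


class Recorder:
    """Tiny context manager that logs the order of events."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False
```

Calling run_with(1, ...) fails on the very first lookup with an AttributeError naming '__exit__', which is the message Georg quoted.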

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org


Re: [Python-Dev] inspect.py very slow under 2.5

2006-09-06 Thread Ralf Schmitt
Nick Coghlan wrote:
> Ralf Schmitt wrote:
>> Nick Coghlan wrote:
>>> It looks like the problem is the call to getabspath() in getmodule(). 
>>> This happens every time, even if the file name is already in the 
>>> modulesbyfile cache. This calls os.path.abspath() and 
>>> os.path.normpath() every time that inspect.findsource() is called.
>>>
>>> That can be fixed by having findsource() pass the filename argument to 
>>> getmodule(), and adding a check of the modulesbyfile cache *before* 
>>> the call to getabspath().
>>>
>>> Can you try this patch and see if you get 2.4 level performance back 
>>> on Fernando's test?:
>> no. this doesn't work. getmodule always iterates over 
>> sys.modules.values() and only returns None afterwards.
>> One would have to cache the bad file value, or only inspect new/changed 
>> modules from sys.modules.
> 
> Good point. I modified the patch so it does the latter (it only calls 
> getabspath() again for a module if the value of module.__file__ changes).

With _filesbymodname[modname] = file changed to 
_filesbymodname[modname] = f, it seems to work OK.

diff -r d41ffd2faa28 inspect.py
--- a/inspect.py  Wed Sep 06 13:01:12 2006 +0200
+++ b/inspect.py  Wed Sep 06 16:52:39 2006 +0200
@@ -403,6 +403,7 @@ def getabsfile(object, _filename=None):
     return os.path.normcase(os.path.abspath(_filename))

 modulesbyfile = {}
+_filesbymodname = {}

 def getmodule(object, _filename=None):
     """Return the module an object was defined in, or None if not found."""
@@ -410,17 +411,23 @@ def getmodule(object, _filename=None):
         return object
     if hasattr(object, '__module__'):
         return sys.modules.get(object.__module__)
+    if _filename is not None and _filename in modulesbyfile:
+        return sys.modules.get(modulesbyfile[_filename])
     try:
         file = getabsfile(object, _filename)
     except TypeError:
         return None
     if file in modulesbyfile:
         return sys.modules.get(modulesbyfile[file])
-    for module in sys.modules.values():
+    for modname, module in sys.modules.iteritems():
         if ismodule(module) and hasattr(module, '__file__'):
+            f = module.__file__
+            if f == _filesbymodname.get(modname, None):
+                continue
+            _filesbymodname[modname] = f
             f = getabsfile(module)
             modulesbyfile[f] = modulesbyfile[
-                os.path.realpath(f)] = module.__name__
+                os.path.realpath(f)] = modname
     if file in modulesbyfile:
         return sys.modules.get(modulesbyfile[file])
     main = sys.modules['__main__']
@@ -444,7 +451,7 @@ def findsource(object):
     in the file and the line number indexes a line in that list.  An IOError
     is raised if the source code cannot be retrieved."""
     file = getsourcefile(object) or getfile(object)
-    module = getmodule(object)
+    module = getmodule(object, file)
     if module:
         lines = linecache.getlines(file, module.__dict__)
     else:



Re: [Python-Dev] Exception message for invalid with statement usage

2006-09-06 Thread Guido van Rossum
IMO it's fine. The only time you'll see this in reality is when
someone passed you the wrong type of object by mistake, and then the
type mentioned in the message is plenty of help to debug it. Anyone with
even a slight understanding of 'with' knows it involves '__exit__',
and the line number should be a big fat hint, too.

On 9/6/06, Georg Brandl <[EMAIL PROTECTED]> wrote:
> Current trunk:
>
> >>> with 1:
> ...  print "1"
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'int' object has no attribute '__exit__'
>
> Isn't that a bit crude? For "for i in 1" there's a better
> error message, so why shouldn't the above give a
> TypeError: 'int' object is not a context manager
>
> ?
>
> Georg
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Cross-platform math functions?

2006-09-06 Thread Tim Peters
[Tim Peters]
>> Package a Python wrapper and see how popular it becomes.  Some reasons
>> against trying to standardize on fdlibm were explained here:
>>
>>http://mail.python.org/pipermail/python-list/2005-July/290164.html

[Andreas Raab]
> Thanks, these are good points. About speed, do you have any good
> benchmarks available?

Certainly not for "typical Python use" -- doubt such a benchmark
exists.  Some people use sqrt once in a blue moon, others make heavy
use of many libm functions over millions & millions of floats, and in
some apps extremely heavy use is made where speed is everything and
accuracy doesn't much matter at all (e.g., gross plotting).

I'd ask on numeric Python lists, and (e.g.) people working with visualization.

> In my experience fdlibm is quite reasonable for speed in the context of use
> by dynamic languages (i.e., counting allocation overheads, lookup and send
> performance etc)

"Reasonable" for which purpose(s), specifically?  Some people would
certainly care about a 5% slowdown, while most others wouldn't, but
one thing to avoid is pissing off the people who use a thing the most
;-)

> but since I'm not a Python expert I'd appreciate some help with realistic
> benchmarks.

As above, python-dev isn't a likely place to look for such answers.

> ...
> Agreed. Thus my question if someone had already done this ;-)

Not that I know of, although my understanding (which may be wrong) is
that glibc's current math functions started as a copy of fdlibm.


[Python-Dev] buildbot breakage

2006-09-06 Thread Gustavo Niemeyer
Some buildbots will fail because they got revision r51793, and it
has a change I made to fix a problem in the subprocess module.

Please do not rollback any changes. I'm handling the issue.

Also notice that there's no broken code there.  The problem is that
the issue in subprocess is related to stdout/stderr handling, and I'm
having trouble making buildbot happy while keeping the new tests
in place.

I apologise for any inconvenience this may cause.

-- 
Gustavo Niemeyer
http://niemeyer.net


Re: [Python-Dev] buildbot breakage

2006-09-06 Thread Gustavo Niemeyer
> Some buildbots will fail because they got revision r51793, and it
> has a change I made to fix a problem in the subprocess module.

I've removed the offending test in r51794 and buildbots should be
happy again.

One way to trigger the reported issue is to pass sys.stdout as the
stdout keyword, for example:

   subprocess.call([...], stdout=sys.stdout)

This breaks because it ends up closing one of the standard descriptors
of the subprocess.

Unfortunately we can't test it that way because buildbot uses a
StringIO in sys.stdout.

I kept the test which uses stdout=1, and removed the one expecting
sys.stdout to be a "normal" file.
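The difference between a "normal" file and a StringIO standing in for sys.stdout can be sketched as below. This is written for a current Python 3 and is not the removed test; the helper names are hypothetical.

```python
import subprocess
import sys


def has_real_fd(stream):
    """Return True if the stream is backed by an OS-level file descriptor."""
    try:
        stream.fileno()
    except (AttributeError, OSError, ValueError):
        # StringIO objects raise here: there is no descriptor to inherit.
        return False
    return True


def call_with_stream(cmd, stream):
    """Run cmd, sending its stdout to `stream` whatever kind of object it is."""
    if has_real_fd(stream):
        stream.flush()
        # The child inherits the descriptor directly.
        return subprocess.call(cmd, stdout=stream)
    # No descriptor: capture the output and copy it across by hand.
    completed = subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
    stream.write(completed.stdout)
    return completed.returncode
```

For example, call_with_stream([sys.executable, "-c", "print('hi')"], sys.stdout) takes the first branch in an ordinary terminal session and the second when sys.stdout has been replaced by an in-memory buffer.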

Sorry for the trouble,

-- 
Gustavo Niemeyer
http://niemeyer.net


Re: [Python-Dev] new security doc using object-capabilities

2006-09-06 Thread Ka-Ping Yee
Hi Brett,

Here are some comments on your proposal.  Sorry this took so long.
I apologize if any of these comments are out of date (but also look
forward to your answers to some of the questions, as they'll help
me understand some more of the details of your proposal).  Thanks!

> Introduction
> ////////////
[...]
> Throughout this document several terms are going to be used.  A
> "sandboxed interpreter" is one where the built-in namespace is not the
> same as that of an interpreter whose built-ins were unaltered, which
> is called an "unprotected interpreter".

Is this a definition or an implementation choice?  As in, are you
defining "sandboxed" to mean "with altered built-ins" or just
"restricted in some way", and does the above mean to imply that
altering the built-ins is what triggers other kinds of restrictions
(as it did in Python's old restricted execution mode)?

> A "bare interpreter" is one where the built-in namespace has been
> stripped down to the bare minimum needed to run any form of basic Python
> program.  This means that all atomic types (i.e., syntactically
> supported types), ``object``, and the exceptions provided by the
> ``exceptions`` module are considered in the built-in namespace.  There
> have also been no imports executed in the interpreter.

Is a "bare interpreter" just one example of a sandboxed interpreter,
or are all sandboxed interpreters in your design initially bare (i.e.
"sandboxed" = "bare" + zero or more granted authorities)?

> The "security domain" is the boundary at which security is cared
> about.  For this discussion, it is the interpreter.

It might be clearer to say (if i understand correctly) "Each interpreter
is a separate security domain."

Many interpreters can run within a single operating system process,
right?  Could you say a bit about what sort of concurrency model you
have in mind?  How would this interact (if at all) with use of the
existing threading functionality?

> The "powerbox" is the thing that possesses the ultimate power in the
> system.  In our case it is the Python process.

This could also be the application process, right?

> Rationale
> /////////
[...]
> For instance, think of an application that supports a plug-in system
> with Python as the language used for writing plug-ins.  You do not
> want to have to examine every plug-in you download to make sure that
> it does not alter your filesystem if you can help it.  With a proper
> security model and implementation in place this hindrance of having
> to examine all code you execute should be alleviated.

I'm glad to have this use case set out early in the document, so the
reader can keep it in mind as an example while reading about the model.

> Approaches to Security
> //////////////////////
>
> There are essentially two types of security: who-I-am
> (permissions-based) security and what-I-have (authority-based)
> security.

As Mark Miller mentioned in another message, your descriptions of
"who-I-am" security and "what-I-have" security make sense, but
they don't correspond to "permission" vs. "authority".  They
correspond to "identity-based" vs. "authority-based" security.

> Difficulties in Python for Object-Capabilities
> //////////////////////////////////////////////
[...]
> Three key requirements for providing a proper perimeter defence are
> private namespaces, immutable shared state across domains, and
> unforgeable references.

Nice summary.

> Problem of No Private Namespace
> ===============================
[...]
> The Python language has no such thing as a private namespace.

Don't local scopes count as private namespaces?  It seems clear
that they aren't designed with the intention of being exposed,
unlike other namespaces in Python.

> It also makes providing security at the object level using
> object-capabilities non-existent in pure Python code.

I don't think this is necessarily the case.  No Python code i've
ever seen expects to be able to invade the local scopes of other
functions, so you could use them as private namespaces.  There
are two ways i've seen to invade local scopes:

(a) Use gc.get_referents to get back from a cell object
to its contents.

(b) Compare the cell object to another cell object, thereby
causing __eq__ to be invoked to compare the contents of
the cells.

So you could protect local scopes by prohibiting these or by
simply turning off access to func_closure.  It's clear that hardly
any code depends on these introspection features, so it would be
reasonable to turn them off in a sandboxed interpreter.  (It seems
you would have to turn off some introspection features anyway in
order to have reliable import guards.)
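To make route (a) concrete, here is a small sketch, assuming a Python where func_closure is spelled __closure__. The function names are made up for illustration; the point is only that a value held in a local scope is unreachable by ordinary means but not by introspection.

```python
import gc


def make_vault(secret):
    """Return a checker that closes over `secret` without exposing it."""
    def check(guess):
        return guess == secret
    return check


def peek_via_gc(func):
    """Route (a): walk from each closure cell back to its contents."""
    found = []
    for cell in func.__closure__ or ():
        found.extend(gc.get_referents(cell))
    return found
```

The checker has no attribute that names the secret, yet peek_via_gc() hands back the very object that was closed over. Denying access to the closure cells (or to gc) in a sandboxed interpreter would close that route.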

> Problem of Mutable Shared State
> ===============================
[...]
> Regardless, sharing of state that can be influenced by another
> interpreter is not safe for object-capabilities.

Yup.

> Threat Model
> ////////////