Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-12 Thread P.J. Eby

At 02:02 PM 8/11/2011 -0400, Glyph Lefkowitz wrote:
Rather than a one-by-one ad-hoc consideration of which attribute 
should be set to None or empty strings or string or what have 
you, I'd really like to see a discussion in the PEP saying what a 
package really is vs. what a module is, and what one can reasonably 
expect from it from an API and tooling perspective.


The assumption I've been working from is the only guarantee I've ever 
seen the Python docs give: i.e., that a package is a module object 
with a __path__ attribute.  Modules aren't even required to have a 
__file__ attribute -- builtin modules don't, for example.  (And the 
contents of __file__ are not required to have any particular 
semantics: PEP 302 notes that it can be a dummy value like 
"<frozen>", for example.)


Technically, btw, PEP 302 requires __file__ to be a string, so making 
__file__ = None will be a backwards-incompatible change.  But any 
code that walks modules in sys.modules is going to break today if it 
expects a __file__ attribute to exist, because 'sys' itself doesn't have one!
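
Code that walks sys.modules therefore has to treat __file__ as optional. A minimal defensive sketch (the helper name is mine):

```python
import sys

# Builtin modules (like sys itself) have no __file__ attribute, so any
# code that walks sys.modules must not assume it exists.
def module_files(modules):
    """Map module names to their __file__, or None when absent."""
    return {name: getattr(mod, '__file__', None)
            for name, mod in modules.items()
            if mod is not None}  # sys.modules may contain None placeholders

files = module_files(dict(sys.modules))
assert files['sys'] is None   # 'sys' is builtin: no __file__
```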


So, my leaning is towards leaving off __file__, since today's code 
already has to deal with it being nonexistent, if it's working with 
arbitrary modules, and that'll produce breakage sooner rather than 
later -- the twisted.python.modules code, for example, would fail 
with a loud AttributeError, rather than going on to silently assume 
that a module with a dummy __file__ isn't a package.   (Which is NOT 
a valid assumption *now*, btw, as I'll explain below.)
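
Given the guarantee above, the safe test for package-ness is the __path__ attribute, not anything derived from __file__. A minimal sketch (the helper name and path entry are mine):

```python
import sys
import types

def is_package(mod):
    # The only documented guarantee: a package is a module object
    # with a __path__ attribute.
    return hasattr(mod, '__path__')

plain = types.ModuleType('plain')
assert not is_package(plain)

pkg = types.ModuleType('pkg')
pkg.__path__ = ['/some/dir']   # hypothetical path entry
assert is_package(pkg)

assert not is_package(sys)     # builtin module, not a package
```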


Anyway, if you have any suggestions for verbiage that should be added 
to the PEP to clarify these assumptions, I'd be happy to add 
them.  However, I think that the real problem you're encountering at 
the moment has more to do with making assumptions about the Python 
import ecosystem that aren't valid today, and haven't been valid 
since at least the introduction of PEP 302, if not earlier import 
hook systems as well.



But the whole "pure virtual" mechanism here seems to pile even 
more inconsistency on top of an already irritatingly inconsistent 
import mechanism.  I was reasonably happy with my attempt to paper 
over PEP 302's weirdnesses from a user perspective:


http://twistedmatrix.com/documents/11.0.0/api/twisted.python.modules.html

(or https://launchpad.net/modules if you are not a Twisted user)


Users of this API can traverse the module hierarchy with certain 
expectations; each module or package would have .pathEntry and 
.filePath attributes, each of which would refer to the appropriate 
place.  Of course __path__ complicates things a bit, but so it goes.


I don't mean to be critical, and no doubt what you've written works 
fine for your current requirements, but on my quick attempt to skim 
through the code I found many things which appear to me to be 
incompatible with PEP 302.


That is, the above code hardcodes a variety of assumptions about the 
import system that haven't been true since Python 2.3.  (For example, 
it assumes that the contents of sys.path strings have inspectable 
semantics, that the contents of __file__ can tell you things about 
the module-ness or package-ness of a module object, etc.)


If you want to fully support PEP 302, you might want to consider 
making this a wrapper over the corresponding pkgutil APIs (available 
since Python 2.5) that do roughly the same things, but which delegate 
all path string inspection to importer objects and allow extensible 
delegation for importers that don't support the optional methods involved.
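
For instance, pkgutil can already enumerate modules by asking the importer objects what they contain, rather than parsing path strings. A small sketch on a throwaway directory (the file names are hypothetical, and this uses the modern Python 3 pkgutil for illustration):

```python
import os
import pkgutil
import tempfile

# Build a throwaway path entry containing one module and one package,
# then let pkgutil ask the importer what it contains -- no path-string
# inspection on our part.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, 'mymod.py'), 'w').close()
    os.mkdir(os.path.join(d, 'mypkg'))
    open(os.path.join(d, 'mypkg', '__init__.py'), 'w').close()

    found = {name: ispkg for _, name, ispkg in pkgutil.iter_modules([d])}

assert found == {'mymod': False, 'mypkg': True}
```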


(Of course, if the pkgutil APIs are missing something you need, 
perhaps you could propose additions.)



Now it seems like pure virtual packages are going to introduce a new 
type of special case into the hierarchy which have neither 
.pathEntry nor .filePath objects.


The problem is that your API's notion that these things exist as 
coherent concepts was never really a valid assumption in the first 
place.  .pth files and namespace packages already meant that the idea 
of a package coming from a single path entry made no sense.  And 
namespace packages installed by setuptools' system packaging mode 
*don't have a __file__ attribute* today...  heck they don't have 
__init__ modules, either.


So, adding virtual packages isn't actually going to change anything, 
except perhaps by making these scenarios more common.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-12 Thread P.J. Eby

At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
Upon further reflection, PEP 402 _will_ make dealing with namespace 
packages from this code considerably easier: we won't need to do AST 
analysis to look for a __path__ attribute, or anything gross like 
that, to improve correctness; we can just look in various directories on 
sys.path and accurately predict what __path__ will be synthesized to be.


The flip side of that is that you can't always know whether a 
directory is a virtual package without deep inspection: one 
consequence of PEP 402 is that any directory that contains a Python 
module (of whatever type), however deeply nested, will be a valid 
package name.  So, you can't rule out that a given directory *might* 
be a package, without walking its entire reachable subtree.  (Within 
the subset of directory names that are valid Python identifiers, of course.)


However, you *can* quickly tell that a directory *might* be a package 
or is *probably* one: if it contains modules, or has the same name as 
an already-discovered module, it's a pretty safe bet that you can 
flag it as such.
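
That heuristic can be sketched directly (the function name and the .py-only test are mine; a real tool would also consult importers for zip files and other non-filesystem backends):

```python
import os
import tempfile

def might_be_package(dirpath):
    """Cheap 'probably a package' check: the directory name must be a
    valid identifier, and the directory must directly contain at least
    one Python module."""
    if not os.path.basename(dirpath).isidentifier():
        return False
    try:
        entries = os.listdir(dirpath)
    except OSError:
        return False
    return any(e.endswith('.py') for e in entries)

# Demo on a throwaway tree: mypkg/ holds a module, empty/ holds nothing.
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, 'mypkg'))
open(os.path.join(base, 'mypkg', 'mod.py'), 'w').close()
os.mkdir(os.path.join(base, 'empty'))

assert might_be_package(os.path.join(base, 'mypkg'))
assert not might_be_package(os.path.join(base, 'empty'))
```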


In any case, you probably should *not* do the building of a virtual 
path yourself; the protocols and APIs added by PEP 402 should allow 
you to simply ask for the path to be constructed on your 
behalf.  Otherwise, you are going to be back in the same business of 
second-guessing arbitrary importer backends again!


(E.g. note that PEP 402 does not say virtual package subpaths must be 
filesystem or zipfile subdirectories of their parents - an importer 
could just as easily allow you to treat subdirectories named 
'twisted.python' as part of a virtual package with that name!)


Anyway, pkgutil defines some extra methods that importers can 
implement to support module-walking, and part of the PEP 402 
implementation should be to make this support virtual packages as well.



This code still needs to support Python 2.4, but I will make a note 
of this for future reference.


A suggestion: just take the pkgutil code and bundle it for Python 2.4 
as something._pkgutil.  There's very little about it that's 2.5+ 
specific, at least when I wrote the bits that do the module walking.


Of course, the main disadvantage of pkgutil for your purposes is that 
it currently requires packages to be imported in order to walk their 
child modules.  (IIRC, it does *not*, however, require them to be 
imported in order to discover their existence.)



In that case, I guess it's a good thing; these bugs should be dealt 
with.  Thanks for pointing them out.  My opinion of PEP 402 has been 
completely reversed - although I'd still like to see a section about 
the module system from a library/tools author point of view rather 
than a time-traveling perl user's narrative :).


LOL.

If you will propose the wording you'd like to see, I'll be happy to 
check it for any current-and-or-future incorrect assumptions.  ;-)




Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-12 Thread P.J. Eby

At 05:03 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
Are there any rules about passing invalid identifiers to __import__ 
though, or is that just less likely? :)


I suppose you have a point there.  ;-)


I still like the idea of a 'marker' file.  It would be great if 
there were a new marker like __package__.py.


Having any required marker file makes separately-installable portions 
of a package impossible, since it would then be in conflict at 
installation time.


The (semi-)competing proposal, PEP 382, is based on allowing each 
portion to have a differently-named marker; we came up with PEP 402 
as a way to get rid of the need for any marker files (not to mention 
the bikeshedding involved.)




What do you mean by "building of a virtual path"?


Constructing the __path__-to-be of a not-yet-imported virtual 
package.  The PEP defines a protocol for constructing this, by asking 
the importer objects to provide __path__ entries, and it does not 
require anything to be imported.  So there's no reason to 
re-implement the algorithm yourself.



The more that this can focus on module-walking without executing 
code, the happier I'll be :).


Virtual packages actually improve on this situation, in that a 
virtual path can be computed without the need to import the 
package.  (Assuming a submodule or subpackage doesn't munge the 
__path__, of course.)




Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning

2011-08-11 Thread P.J. Eby

At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote:

Hi,

I've read PEP 402 and would like to offer comments.


Thanks.


Minor: I would reserve "packaging" for
packaging/distribution/installation/deployment matters, not Python
modules.  I suggest "Python package semantics".


Changing to "Python package import semantics" to hopefully be even 
clearer.  ;-)


(Nitpick: I was somewhat intentionally ambiguous because we are 
talking here about how a package is physically implemented in the 
filesystem, and that actually *is* kind of a packaging issue.  But 
it's not necessarily a *useful* intentional ambiguity, so I've no 
problem with removing it.)




Minor: In the UNIX world, or with version control tools, moving and
renaming are the same one thing (hg mv spam.py spam/__init__.py for
example).  Also, if you turn a module into a package, you may want to
move code around, change imports, etc., so I'm not sure the renaming
part is such a big step.  Anyway, if the import-sig people say that
users think it's a complex or costly operation, I can believe it.


It's not that it's complex or costly in anything other than *mental* 
overhead -- you have to remember to do it and it's not particularly 
obvious.  (But people on import-sig did mention this and other things 
covered by the PEP as being a frequent root cause of beginner 
inquiries on #python, Stackoverflow, et al.)




 (By the way, both of these additions to the import protocol (i.e. the
 dynamically-added ``__path__``, and dynamically-created modules)
 apply recursively to child packages, using the parent package's
 ``__path__`` in place of ``sys.path`` as a basis for generating a
 child ``__path__``.  This means that self-contained and virtual
 packages can contain each other without limitation, with the caveat
 that if you put a virtual package inside a self-contained one, it's
 gonna have a really short ``__path__``!)
I don't understand the caveat or its implications.


Since each package's __path__ is the same length or shorter than its 
parent's by default, then if you put a virtual package inside a 
self-contained one, it will be functionally speaking no different 
than a self-contained one, in that it will have only one path 
entry.  So, it's not really useful to put a virtual package inside a 
self-contained one, even though you can do it.  (Apart from its 
letting you avoid a superfluous __init__ module, assuming it's indeed 
superfluous.)




 In other words, we don't allow pure virtual packages to be imported
 directly, only modules and self-contained packages.  (This is an
 acceptable limitation, because there is no *functional* value to
 importing such a package by itself.  After all, the module object
 will have no *contents* until you import at least one of its
 subpackages or submodules!)

 Once ``zc.buildout`` has been successfully imported, though, there
 *will* be a ``zc`` module in ``sys.modules``, and trying to import it
 will of course succeed.  We are only preventing an *initial* import
 from succeeding, in order to prevent false-positive import successes
 when clashing subdirectories are present on ``sys.path``.
I find that limitation acceptable.  After all, there is no zc project,
and no zc module, just a zc namespace.  I'll just regret that it's not
possible to provide a module docstring to inform that this is a
namespace package used for X and Y.


It *is* possible - you'd just have to put it in a zc.py file.  IOW, 
this PEP still allows namespace-defining packages to exist, as was 
requested by early commenters on PEP 382.  It just doesn't *require* 
them to exist in order for the namespace contents to be importable.
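
Concretely, such a namespace-defining module could be nothing but a docstring. A throwaway demo (the module name and docstring are hypothetical stand-ins for a real zc.py):

```python
import importlib
import os
import sys
import tempfile

# A namespace-defining module: its only content is the docstring.
base = tempfile.mkdtemp()
with open(os.path.join(base, 'zc_pep402_demo.py'), 'w') as f:
    f.write('"""Namespace package used for X and Y."""\n')

sys.path.insert(0, base)
zc = importlib.import_module('zc_pep402_demo')
assert zc.__doc__ == 'Namespace package used for X and Y.'
```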




 The resulting list (whether empty or not) is then stored in a
 ``sys.virtual_package_paths`` dictionary, keyed by module name.
This was probably said on import-sig, but here I go: yet another import
artifact in the sys module!  I hope we get ImportEngine in 3.3 to clean
up all this.


Well, I rather *like* having them there, personally, vs. having to 
learn yet another API, but oh well, whatever.  AFAIK, ImportEngine 
isn't going to do away with the need for the global ones to live 
somewhere, at least not in 3.3.




 * A new ``extend_virtual_paths(path_entry)`` function, to extend
   existing, already-imported virtual packages' ``__path__`` attributes
   to include any portions found in a new ``sys.path`` entry.  This
   function should be called by applications extending ``sys.path``
   at runtime, e.g. when adding a plugin directory or an egg to the
   path.
Let's imagine my application Spam has a namespace spam.ext for plugins.
 To use a custom directory where plugins are stored, or a zip file with
plugins (I don't use eggs, so let me talk about zip files here), I'd
have to call sys.path.append *and* pkgutil.extend_virtual_paths?


As written in the current proposal, yes.  There was some discussion 
on Python-Dev about having this happen automatically, and I proposed 
that it could be done by making virtual 

Re: [Python-Dev] Import lock considered mysterious

2011-07-22 Thread P.J. Eby

At 02:48 PM 7/22/2011 +0200, Antoine Pitrou wrote:


See http://bugs.python.org/issue9260

There's a patch there but it needs additional sophistication to remove
deadlocks when doing concurrent circular imports.


I don't think that approach can work, as PEP 302 loaders can 
currently assume the global import lock is being held when they 
run...  and in general, there are too many global data structures in 
sys that need to be protected by code that uses such things.


A simpler solution to Greg's problem would be to have a timeout on 
attempts to acquire the import lock, and have it fail with a 
RuntimeError describing the problem.  (*Not* an ImportError, mind 
you, which might get ignored and trigger a different code path.)


The timeout would need to be on the order of seconds to prevent false 
positives, and there'd need to be a way to change or remove the 
timeout in the event somebody really needs to.  But it would 
eliminate the mysteriousness.  A unique and Google-able error message 
would let someone find a clear explanation of what's going on, as well.


A second thing that *could* be helpful would be to issue a warning 
when a new thread is started (or waited on) while the import lock is 
held.  This is already known to be a bad thing to do.


The tricky part is issuing the warning for the right caller level, 
but I suppose you could walk back up the call stack until you found 
some module-level code, and then fingered that line of code as the culprit.


Yes, that might do it: the code for starting or waiting on a thread 
could check to see if the import lock is held by the current thread, 
and if so, walk up the stack to find a module frame (one where 
f_globals is f_locals, '__name__' in f_locals, and 
sys.modules[f_locals['__name__']].__dict__ is f_locals), and if one is found, 
issue a warning about not starting or waiting on threads in module-level code.
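
The frame-walking part of that heuristic can be sketched as follows (the function name is mine, and the demo simulates module-level code with a throwaway module object rather than a real import):

```python
import sys
import types

def _module_level_frame():
    """Walk up the stack for a frame running module-level code: its
    f_globals is its f_locals, and those globals are the __dict__ of
    the module registered under that name in sys.modules."""
    f = sys._getframe(1)
    while f is not None:
        g = f.f_globals
        if g is f.f_locals and '__name__' in g:
            mod = sys.modules.get(g['__name__'])
            if mod is not None and getattr(mod, '__dict__', None) is g:
                return f
        f = f.f_back
    return None

# Demo: run code whose globals are a registered module's __dict__.
mod = types.ModuleType('fakemod')
sys.modules['fakemod'] = mod
mod.finder = _module_level_frame
exec("found = finder()", mod.__dict__)
assert mod.found.f_globals['__name__'] == 'fakemod'
```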


Between that and the timeout, the mysteriousness could be completely 
done away with, without throwing a monkey wrench into the current 
import mechanisms.




Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning

2011-07-21 Thread P.J. Eby

At 11:52 AM 7/21/2011 +1000, Nick Coghlan wrote:

Trying to change how packages are identified at the Python level makes
PEP 382 sound positively appealing. __path__ needs to stay :)


In which case, it should be a list, not a sentinel.  ;-)



Even better would be for these (and sys.path) to be list subclasses
that did the right thing under the hood as Glenn suggested. Code that
*replaces* rather than modifies these attributes would still
potentially break virtual packages, but code that modifies them in
place would do the right thing automatically. (Note that all code that
manipulates sys.path and __path__ attributes requires explicit calls
to correctly support current namespace package mechanisms, so this
would actually be an improvement on the status quo rather than making
anything worse).


I think the simplest thing, if we're keeping __path__ (and on 
reflection, I think we should), would be to simply call 
extend_virtual_paths() automatically on new path entries found in 
sys.path when an import is performed, relative to the previous value 
of sys.path.


That is, we save an old copy of sys.path somewhere, and whenever 
__import__() is called (well, once it gets past checking if the 
target is already in sys.modules, anyway), it checks the current 
sys.path against it, and calls extend_virtual_paths() on any sys.path 
entries that weren't in the old sys.path.
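
A rough sketch of that bookkeeping (the names are mine; the real check would live inside __import__ and would then call extend_virtual_paths() on the result):

```python
import sys

_seen_path = list(sys.path)   # snapshot saved when the hook is installed

def new_path_entries():
    """Return sys.path entries added since the last check -- the delta
    a __path__-updating __import__ would act on -- and refresh the
    snapshot."""
    global _seen_path
    added = [p for p in sys.path if p not in _seen_path]
    _seen_path = list(sys.path)
    return added

sys.path.append('/hypothetical/plugin/dir')
assert new_path_entries() == ['/hypothetical/plugin/dir']
assert new_path_entries() == []     # nothing new the second time
sys.path.remove('/hypothetical/plugin/dir')
```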


This is not the most efficient thing in the world, as it will cause a 
bunch of stat calls to happen against the new directories, in the 
middle of a possibly-entirely-unrelated import operation, but it 
would certainly address the issue in the Simplest Way That Could Possibly Work.


A stricter (safer) version of the same thing would be one where we 
only update __path__ values that are unchanged since we created them, 
and rather than only appending new entries, we replace the __path__ 
with a newly-computed one.


This version is safer because it avoids corner cases like "I imported 
foo.bar while foo.baz 1.1 was on my path, then I prepended a 
directory to sys.path that has foo.baz 1.2, but I still get foo.baz 
1.1 when I import."  But it loses in cases where people do direct 
__path__ manipulation.


On the other hand, it's a lot easier to say "you break it, you bought 
it" where __path__ manipulation is concerned, so I'm actually pretty 
inclined towards using the strict version.


Hey...  here's a crazy idea.  Suppose that a virtual package __path__ 
is a *tuple* instead of a list?  Now, in order to change it, you 
*have* to replace it.  And we can cache the tuple we initially set it 
to in sys.virtual_package_paths, so we can do an 'is' check before 
replacing it.
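
A sketch of how that could look (sys.virtual_package_paths is modeled with a plain dict here, and the helper names are mine):

```python
import types

virtual_package_paths = {}   # stand-in for the proposed sys.virtual_package_paths

def make_virtual(name, entries):
    """Create a virtual package whose __path__ is an immutable tuple,
    cached so later code can detect replacement with an 'is' check."""
    mod = types.ModuleType(name)
    mod.__path__ = tuple(entries)        # must be replaced, not mutated
    virtual_package_paths[name] = mod.__path__
    return mod

def path_untouched(mod):
    # 'is' comparison against the cached tuple detects replacement.
    return mod.__path__ is virtual_package_paths.get(mod.__name__)

pkg = make_virtual('zc', ['/eggs/zc1', '/eggs/zc2'])
assert path_untouched(pkg)

pkg.__path__ = pkg.__path__ + ('/plugins/zc',)   # explicit replacement
assert not path_untouched(pkg)                   # now the user maintains it
```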


Voila: __path__ still exists and is still a sequence for a virtual 
path, but you have to explicitly replace it if you want to do 
anything funky -- at which point you're responsible for maintaining it.


I'm tempted to say, "well, why not use a list-subclass proxy, then?", 
but that means more work for no real difference.  I just went through 
dozens of examples of __path__ usage (found via Google), and I found 
exactly two examples of code that modifies a __path__ that is not:


1. In the __init__.py whose __path__ it is (i.e., code that'll still 
have a list), or
2. Modifying the __path__ of an explicitly-named self-contained 
package that's part of the same distribution.


The two examples are from Twisted, and Google AppEngine.  In the 
Twisted case, it's some sort of namespace package-like plugin 
chicanery, and in the AppEngine case, well, I'm not sure what the 
heck it's doing, but it seems to be making sure that you can still 
import stuff that has the same name as stdlib stuff, or something.


The Twisted case (and an apparent copy of the same code in a project 
called flumotion) uses ihooks, though, so I'm not sure it'll even 
get executed for virtual packages.  The Google case loops over 
everything in sys.modules, in a function by the name of 
appengine.dist.fix_paths()...  but I wasn't able to find out who 
calls this function, when and why.


So, pretty much, except for these bits of nosy code, the vast 
majority of code out there seems to only mess with its own 
self-contained paths, making the use of tuples seem like a pretty safe choice.


(Oh, and all the code I found that reads paths without modifying them 
only use tuple-safe operations.)


So, if we implement automatic __path__ updates for virtual packages, 
I'm currently leaning towards the strict approach using tuples, but 
could possibly be persuaded towards read-only list-proxies instead.


Side note: it looks like a *lot* of code out there abuses __path__[0] 
to find data files, so I probably need to add a note to the PEP about 
not doing that when you convert a self-contained package to a virtual 
one.  Of course, I suppose using a sentinel could address *that* 
problem, or an iteration-only proxy.


The main concern here is that using __path__[0] will *seem* to work 
when you first use it with a 

Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning

2011-07-21 Thread P.J. Eby

At 12:59 PM 7/21/2011 -0700, Reliable Domains wrote:
I assume that the implicit extend_virtual_paths() would be smart 
enough to only do real work if there are virtual packages to do it 
in, so much of the performance costs (bunch of stats) are bounded by 
the existence of and number of virtual packages that have actually 
been imported, correct?


Yes - this is true even for an explicit call.  It only does this for 
imported virtual packages, and child virtual packages are only 
checked for if the parent package exists.  So, in the case of a 
directory being added that has no parent packages, then the cost in 
stats is equal to the number of top-level, *imported* virtual packages.


The __path__ wrapper scheme can do this even better, and defer doing 
any of the stat calls until/unless another import occurs for one of 
those packages.  So if you munge sys.path and then don't import 
anything from a virtual package, no extra stat calls would happen at all.




Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning

2011-07-21 Thread P.J. Eby

At 03:04 AM 7/22/2011 +0200, Antoine Pitrou wrote:
The additional confusion lies in the fact that a module can be 
shadowed by something which is not a module (a mere global 
variable). I find it rather baffling.


If you move x.py to x/__init__.py, it does *exactly the same thing* 
in current versions of Python:


Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit 
(Intel)] on win32

Type "help", "copyright", "credits" or "license" for more information.
>>> from x import y
>>> import x.y
>>> x.y
<module 'x.y' from 'x\y.py'>
>>> y
5

The PEP does nothing new or different here.  If something is baffling 
you, it's the behavior of "from ... import", not the actual importing process.


"from x import y" means "import x; y = x.y".  The PEP does not 
propose we change this.  ;-)




Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning

2011-07-20 Thread P.J. Eby

At 06:46 PM 7/20/2011 +1000, Nick Coghlan wrote:

On Wed, Jul 20, 2011 at 1:58 PM, P.J. Eby p...@telecommunity.com wrote:
 So, without further ado, here it is:

I pushed this version up to the PEPs repo, so it now has a number
(402) and can be read in prettier HTML format:
http://www.python.org/dev/peps/pep-0402/


Technically, shouldn't this be a 3XXX series PEP?  Or are we not 
doing those any more now that all PEPs would be 3XXX?




Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning

2011-07-20 Thread P.J. Eby

At 02:24 AM 7/20/2011 -0700, Glenn Linderman wrote:
When I read about creating __path__ from sys.path, I immediately 
thought of the issue of programs that extend sys.path, and the above 
is the workaround for such programs.  But it requires such 
programs to do work, and there are a lot of such programs (I, a 
relative newbie, have had to write some).  As it turns out, I can't 
think of a situation where I have extended sys.path that would 
result in a problem for fancy namespace packages, because so far 
I've only written modules, not packages, and only modules are on the 
paths that I add to sys.path.  But that does not make for a general solution.


Most programs extend sys.path in order to import things.  If those 
things aren't yet imported, they don't have a __path__ yet, and so 
don't need to be fixed.  Only programs that modify sys.path *after* 
importing something that has a dynamic __path__ would need to do 
anything about that.



Is there some way to create a new __path__ that would reflect the 
fact that it has been dynamically created, rather than set from 
__init__.py, and then when it is referenced, calculate (and cache?) 
a new value of __path__ to actually search?


That's what extend_virtual_paths() is for.  It updates the __path__ 
of all currently-imported virtual packages.  Where before you wrote:


 sys.path.append('foo')

You would now write:

 sys.path.append('foo')
 pkgutil.extend_virtual_paths('foo')

...assuming you have virtual packages you've already imported.  If 
you don't, there's no reason to call extend_virtual_paths().  But it 
doesn't hurt anything if you call it unnecessarily, because it uses 
sys.virtual_packages to find out what to update, and if you haven't 
imported any virtual packages, there's nothing to update and the call 
will be a quick no-op.




Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning

2011-07-20 Thread P.J. Eby

At 10:40 AM 7/20/2011 -0400, Neal Becker wrote:
I wonder if this fixes the long-standing issue in OS vendor's 
distributions.  In

Fedora, for example, there is both arch-specific and non-arch directories:
/usr/lib/python2.7 + /usr/lib64/python2.7, for example.  Pure python 
goes into
/usr/lib/python2.7, and code including binaries goes into 
/usr/lib64/python2.7.
But if a package has both, it all has to go into 
/usr/lib64/python2.7, because

the current loader can't find pieces in 2 different directories.

You can't have both /usr/lib/python2.7/site-packages/foo and
/usr/lib64/python2.7/site-packages/foo.

So if this PEP will allow pieces of foo to be found in 2 different 
places, that

would be helpful, IMO.


It's more of a long-term solution than a short-term one.  In order 
for it to work the way you want, 'foo' would need to have its main 
code in foo.py rather than foo/__init__.py.


You could of course make that change on the author's behalf for your 
distro, or remove it altogether if it doesn't contain any actual 
code.  However, if you're going to make changes, you could change its 
__init__.py right now to append extra directories to the module 
__path__...  and that's something you can do right now.
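
The append-to-__path__ workaround can be demonstrated end to end with a throwaway package (all names and directories below are hypothetical stand-ins for the /usr/lib vs. /usr/lib64 split):

```python
import importlib
import os
import sys
import tempfile

base = tempfile.mkdtemp()
extra = os.path.join(base, 'arch', 'foo_pep402_demo')
os.makedirs(os.path.join(base, 'foo_pep402_demo'))
os.makedirs(extra)

# The package's __init__.py appends the "arch-specific" directory to
# its own __path__, so submodules can live in either location.
with open(os.path.join(base, 'foo_pep402_demo', '__init__.py'), 'w') as f:
    f.write('__path__.append(%r)\n' % extra)
with open(os.path.join(extra, 'bar.py'), 'w') as f:
    f.write('answer = 42\n')

sys.path.insert(0, base)
bar = importlib.import_module('foo_pep402_demo.bar')
assert bar.answer == 42
```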




Re: [Python-Dev] [Python-checkins] peps: Restore whitespace characters lost via email transmission.

2011-07-20 Thread P.J. Eby

At 04:21 PM 7/20/2011 +0200, Éric Araujo wrote:

FYI, reST uses three-space indents, not four (so that blocks align
nicely under the leading two dots + one space), so I think the change
was intentional.  The “Documenting Python” guide tells this (in the
standard docs), and I think it applies to PEPs too.


PEP 12 prescribes four-space indents for PEPs, actually, wherever it 
mentions a specific indentation depth.  Also, a formfeed character 
was lost, not just the leading spaces.


Essentially, though, I was just merging my working copy, and those 
were the only differences that showed up (apart from the filled-in 
Post-History header), so I assumed it was just whitespace lost in transmission.


(I'm a bit surprised that three-space indents are mandated for 
anything involving documenting Python in reST, though, since that 
would mean you'd also have to indent your code samples by three 
spaces, or else have an editor that supports two different tab widths.)  




Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning

2011-07-20 Thread P.J. Eby

At 08:56 AM 7/20/2011 -0700, Jeff Hardy wrote:

On Tue, Jul 19, 2011 at 8:58 PM, P.J. Eby p...@telecommunity.com wrote:
 The biggest likely exception to the above would be when a piece of
 code tries to check whether some package is installed by importing
 it.  If this is done *only* by importing a top-level module (i.e., not
 checking for a ``__version__`` or some other attribute), *and* there
 is a directory of the same name as the sought-for package on
 ``sys.path`` somewhere, *and* the package is not actually installed,
 then such code could *perhaps* be fooled into thinking a package is
 installed that really isn't.

This part worries me slightly. Imagine a program as such:

datagen.py
json/foo.js
json/bar.js

datagen.py uses the files in json/ to generate sample data for a
database. In datagen.py is the following code:

try:
import json
except ImportError:
import simplejson as json

Currently, this works just fine, but it will break (as I understand
it) under the PEP because the json directory will become a virtual
package and no ImportError will be raised.


Well, it won't fail as long as there actually *is* a json module or 
package on the path.  ;-)  But I do see your point.




Is there a mitigation for this in the PEP that I've missed?


A possible mitigation would be to require that get_subpath() only 
return a directory name if that directory in fact contains importable 
modules somewhere.  This is actually discussed a bit later as an open 
issue under "Implementation Notes", indicating that iter_modules() 
has this issue as well.


The main open questions in doing this kind of checking have to do 
with recursion: it's perfectly valid to have say, a 'zc/' directory 
whose only content is a 'buildout/' subdirectory.


Of course, it still wouldn't help if the 'json/' subdirectory in your 
example did contain .py files.


There is another possibility, though:

What if we change the logic for pure-virtual package creation so that 
the parent module is created *if and only if* a child module is found?


In that case, trying to import a pure virtual 'zc' package would 
fail, but importing 'zc.buildout' would succeed as long as there was 
a zc/buildout.py or a zc/buildout/__init__.py somewhere.


And in your example, 'import json' would fail -- which is to say, succeed.  ;-)

This is a minor change to the spec, though perhaps a bit hairier to 
implement in practice.


The current import.c loop over the module name parts (iterating over 
say, 'zc', then 'buildout', and importing them in turn) would need to 
be reworked so that it could either roll back the virtual package 
creation in the event of sub-import failure or conversely delay 
creation of the parent package(s) until a sub-import finds a module.


I certainly think it's *doable*, mind you, but I'd hate to have to do 
it in C.  ;-)


Hm.  Here's another variant that might be easier to implement (even 
in C), and could offer some other advantages as well.


Suppose we replace the sys.virtual_packages set() with a 
sys.virtual_paths dict(): a dictionary that maps from module names to 
__path__ lists, and that's populated by the __path__ creation 
algorithm described in the PEP.  (An empty list would mean that 
__path__ creation failed for that module/package name.)
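(For concreteness, here is a toy sketch of that table and lookup.  The names and structure are taken from the proposal as described above, which was never adopted; the lookup function is purely illustrative, not import machinery.)

```python
# Hypothetical table, per the proposal: module name -> computed __path__.
# An empty list records that __path__ creation was attempted and failed.
virtual_paths = {
    'zc': ['/site-packages/zc'],   # __path__ creation succeeded
    'json': [],                    # creation failed: nothing importable found
}

def virtual_path_for(name):
    """Return the candidate __path__ for a virtual package, or fail."""
    path = virtual_paths.get(name)
    if not path:                   # missing entry or recorded failure
        raise ImportError('No module named {!r}'.format(name))
    return path
```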


Now, if a module doesn't have a __path__ (or doesn't exist), we look 
in sys.virtual_paths for the module name.  If the retrieved list is 
empty, we fail the import.  If it's not, we proceed...  but *don't* 
create a module or set the existing module's __path__.


Then, at the point where an import succeeds, and we're going to set 
an attribute on the parent module, we recursively construct parent 
modules and set their __path__ attributes from sys.virtual_paths, if 
a module doesn't exist in sys.path, or its __path__ isn't set.


Voila.  Now there are fewer introspection problems as well: trying to 
'import json.foo' when there's no 'foo.py' in any json/ directory 
will *not* create an empty 'json' package in sys.modules as a 
side-effect.  And it won't add a __path__ to the 'json' module if 
there were a json.py found, either.


What's more, since importing a pure virtual package now fails unless 
you've successfully imported something from it before, it makes more 
sense for it to not have a __file__, or a __file__ of None.


Actually, it's too bad that we have to have parent packages in 
sys.modules, or I'd suggest we just make pure virtual packages 
unimportable, period.


Technically, we *could* always create dummy parent modules for 
virtual packages and *not* put them in sys.modules, but I'm not sure 
if that's a good idea.  It would be more consistent in some ways with 
the idea that virtual packages are not directly importable, but an 
interesting side effect would be that if module A does:


  import foo.bar

and module B does:

  import foo.baz

Then module A's version of 'foo' has *only* a 'bar' attribute and B's 
version has *only* a 'baz' attribute

Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning

2011-07-20 Thread P.J. Eby

At 12:37 PM 7/20/2011 -0400, Erik wrote:

The best solution I can think of would be to have a way for a module
to mark itself as "finalized" (I'm not sure if that's the best
term--just the first that popped into my head).  This would prevent
its __path__ from being created or extended in any way.  For example,
if the json module contains `__finalized__ = True` or something of the
like, any `import json.foo` would immediately fail.


That wouldn't actually fix the problem Jeff brought up, which was the 
case where there *wasn't* a json.py.


In any case, we can fix this now by banning direct import of 
pure-virtual packages.




In that case there would need to be a way to mark a directory as not
containing importable code.  Not sure what the best approach to that
would be, especially since one of the goals of this PEP seems to be to
avoid marker files.


For this particular issue, we don't need it.  For tools that process 
Python code, or use pkgutil.walk_modules(), there may still be use 
cases, so we'll keep an eye open for relevant input.  Hopefully 
someone will say something that jars loose an idea or two, as 
happened with Jeff's issue above.


(Btw, as we speak, I am swiping Jeff's example and adding it into the 
PEP.  ;-)  It makes a great motivating example for banning 
pure-virtual package imports.)




Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning

2011-07-20 Thread P.J. Eby

At 01:35 PM 7/20/2011 -0600, Eric Snow wrote:

This is a really nice solution.  So a virtual package is not imported
until a submodule of the virtual package is successfully imported


Correct...


(except for direct import of pure virtual packages).


Not correct.  ;-)  What we do is avoid creating a parent module or 
altering its __path__ until a submodule/subpackage import is just 
about to be successfully completed.


See the change I just pushed to the PEP:

   http://hg.python.org/peps/rev/a6f02035c66c

Or read the revised Specification section here (which is a bit easier 
to read than the diff):


   http://www.python.org/dev/peps/pep-0402/#specification

The change is basically that we wait until a successful find_module() 
happens before creating or tweaking any parent modules.  This way, 
the load_module() part will still see an initialized parent package 
in sys.modules, and if it does any relative imports, they'll still work.


(It *does* mean that if an error happens during load_module(), then 
future imports of the virtual package will succeed, but I'm okay with 
that corner case.)




It seems like
sys.virtual_packages should be populated even during a failed
submodule import.  Is that right?


Yes.  In the actual draft, btw, I dubbed it 
``sys.virtual_package_paths`` and made it a dictionary.  This 
actually makes the pkgutil.extend_path() code more general: it'll be 
able to fix the paths of things you haven't actually imported yet.  ;-)




Also, it makes sense that the above applies to all virtual packages,
not just pure ones.


Well, if the package isn't pure then what you've imported is really 
just an ordinary module, not a package at all.  ;-)





When a pure virtual package is directly imported, a new [empty] module
is created and its __path__ is set to the matching value in
sys.virtual_packages.  However, an impure virtual package is not
created upon direct import, and its __path__ is not updated until a
submodule import is attempted.  Even the sys.virtual_packages entry is
not generated until the submodule attempt, since the virtual package
mechanism doesn't kick in until the point that an ImportError is
currently raised.

This isn't that big a deal, but it would be the one behavioral
difference between the two kinds of virtual packages.  So either leave
that one difference, disallow direct import of pure virtual packages,
or attempt to make virtual packages for all non-package imports.  That
last one would impose the virtual package overhead on many more
imports so it is probably too impractical.  I'm fine with leaving the
one difference.


At this point, I've updated the PEP to disallow direct imports of 
pure virtual packages.  AFAICT it's the only approach that ensures 
you can't get false positive imports by having 
unrelated-but-similarly-named directories floating around.


So, really, there's not a difference, except that you can't import a 
useless empty module that you have no real business importing in the 
first place...  and I'm fine with that.  ;-)




FYI, last night I started on an importlib-based implementation for the
PEP and the above solution would be really easy to incorporate.


Well, you might want to double-check that now that I've updated the 
spec.  ;-)  In the new approach, you cannot rely on parent modules 
existing before proceeding to the submodule import.


However, I've just glanced at the importlib trunk, and I think I see 
what you mean.  It's already using a recursive approach, rather than 
an iterative one, so the change should be a lot simpler there than in import.c.


There probably just needs to be a pair of functions like:

def _get_parent_path(parent):
    pmod = sys.modules.get(parent)
    if pmod is None:
        try:
            pmod = _gcd_import(parent)
        except ImportError:
            # Can't import parent, is it a virtual package?
            path = imp.get_virtual_path(parent)
            if not path:
                # no, allow the parent's import error to propagate
                raise
            return path
    if hasattr(pmod, '__path__'):
        return pmod.__path__
    else:
        return imp.get_virtual_path(parent)

def _get_parent_module(parent):
    pmod = sys.modules.get(parent)
    if pmod is None:
        # Create the parent lazily, wiring it onto *its* parent in turn
        pmod = sys.modules[parent] = imp.new_module(parent)
        if '.' in parent:
            head, _, tail = parent.rpartition('.')
            setattr(_get_parent_module(head), tail, pmod)
    if not hasattr(pmod, '__path__'):
        pmod.__path__ = imp.get_virtual_path(parent)
    return pmod

And then instead of hanging on to parent_module during the import 
process, you'd just grab a path from _get_parent_path(), and 
initialize parent_module a little later, i.e.:


if parent:
    path = _get_parent_path(parent)
    if not path:
        msg = (_ERR_MSG + '; {} is not a 

Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning

2011-07-20 Thread P.J. Eby

At 03:22 PM 7/20/2011 -0600, Eric Snow wrote:

On Wed, Jul 20, 2011 at 2:44 PM, P.J. Eby p...@telecommunity.com wrote:

 So, yeah, actually, that's looking pretty sweet.  Basically, we 
just have to

 throw a virtual_package_paths dict into the sys module, and do the above
 along with the get_virtual_path() function and add get_subpath() to the
 importer objects, in order to get the PEP's core functionality working.


Exactly.  That's part of why the importlib approach is so appealing to
me.


Actually, it turns out I was a little too optimistic -- the sketch I 
gave doesn't work right for anything but top-level virtual packages, 
because I didn't take into account the part where get_virtual_path() 
needs a parent path.


Fixing *that* error then leads to a really nasty bit of mutual 
recursion in which the parent module imports are attempted over and 
over again in something like O(N**2), I think.  In order to get rid 
of that, _gcd_import would have to grow some internal memoization so 
it doesn't retry the same imports repeatedly.


Ironically enough, this is because _gcd_import() is recursive, and 
thus attempts the imports in the opposite order (sort of) than 
import.c does, which means that you can't get hold of the parent's 
__path__ without recursing (again).  :-(


And trying to work around that with memoization, led me to the 
realization that you actually can't implement PEP 402 using that type 
of recursion.  That is, to implement the spec correctly, _gcd_import 
is going to have to be refactored to iterate left-to-right over 
module name parts, rather than recursing right-to-left.


That's because PEP 402 only allows for processing a virtual path if a 
module is not found, *not* if a module is found but can't be loaded.


But, with importlib currently being recursive, it only knows that a 
parent import failed via ImportError, not whether that error arose 
from failing to find the module, or failing to load the module!


So, the core part of the _gcd_import() function will need to be 
rewritten to iterate instead of recursing.
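(To make the shape of that left-to-right walk concrete, here is a toy model.  It is *not* importlib -- the function and table names are invented for illustration -- but it shows why only a *find* failure falls back to virtual-path handling, while a found module is final.)

```python
def walk_import(name, find, virtual_paths):
    """Walk 'a.b.c' outermost-first.  find() returns a module object or
    None; a find failure consults the virtual-path table, whereas a
    found module is final (so a load error would propagate normally)."""
    modules = {}
    prefix = None
    for part in name.split('.'):
        prefix = part if prefix is None else prefix + '.' + part
        mod = find(prefix)
        if mod is None:
            if not virtual_paths.get(prefix):
                raise ImportError(prefix)   # not even a virtual package
            continue                        # virtual: create parent lazily
        modules[prefix] = mod
    return modules

# 'zc' exists only as a directory (virtual); 'zc.buildout' is a real module.
find = {'zc.buildout': object()}.get
mods = walk_import('zc.buildout', find, {'zc': ['/path/zc']})
```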


(Still, it's probably not going to be *terribly* difficult.  I'll 
take a look at doing a sketch of that next, but if I do one I'll send 
it to Import-SIG instead of here; it's not a detail that matters to 
the general PEP discussion.)




Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning

2011-07-20 Thread P.J. Eby

At 03:09 PM 7/20/2011 -0700, Glenn Linderman wrote:

On 7/20/2011 6:05 AM, P.J. Eby wrote:

At 02:24 AM 7/20/2011 -0700, Glenn Linderman wrote:
When I read about creating __path__ from sys.path, I immediately 
thought of the issue of programs that extend sys.path, and the 
above is the workaround for such programs.  but it requires 
such programs to do work, and there are a lot of such programs (I, 
a relative newbie, have had to write some).  As it turns out, I 
can't think of a situation where I have extended sys.path that 
would result in a problem for fancy namespace packages, because so 
far I've only written modules, not packages, and only modules are 
on the paths that I add to sys.path.  But that does not make 
for a general solution.


Most programs extend sys.path in order to import things.  If those 
things aren't yet imported, they don't have a __path__ yet, and so 
don't need to be fixed.  Only programs that modify sys.path 
*after* importing something that has a dynamic __path__ would need 
to do anything about that.


Sure.  But there are a lot of things already imported by Python 
itself, and if this mechanism gets used in the stdlib, a program 
wouldn't know whether it is safe to skip the 
pkgutil.extend_virtual_paths() call or not.


I'm not sure I see how the mechanism could meaningfully be used in 
the stdlib, since IIUC we're not going for Perl-style package 
naming.  So, all stdlib packages would be self-contained.



Plus, that requires importing pkgutil, which isn't necessarily done 
by every program that extends sys.path (import sys is 
sufficient at present).


Plus, if some 3rd party packages are imported before sys.path is 
extended, the knowledge of how they are implemented is required to 
make a choice about whether it is needed to import pkgutil and call 
extend_virtual_paths or not.


I'd recommend *always* using it, outside of simple startup code.



So I am still left with my original question:

Is there some way to create a new __path__ that would reflect the 
fact that it has been dynamically created, rather than set from 
__init__.py, and then when it is referenced, calculate (and 
cache?) a new value of __path__ to actually search?


Hm.  Yes, there is a way to do something like that, but it would 
complicate things a bit.  We'd need to:


1. Leave __path__ off of the modules, and always pull them from 
sys.virtual_package_paths, and


2. Before using a value in sys.virtual_package_paths, we'd need to 
check whether sys.path had changed since we last cached anything, and 
if so, clear sys.virtual_package_paths first, to force a refresh.
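(A minimal sketch of that staleness check follows.  All names here are hypothetical -- a real version would live inside the import machinery rather than being called explicitly -- but it shows the snapshot-compare-and-clear idea.)

```python
import sys

virtual_package_paths = {}        # name -> cached __path__ (assumed structure)
_path_snapshot = list(sys.path)   # what sys.path looked like at last use

def get_cached_virtual_path(name, compute):
    """Return a cached __path__, recomputing everything if sys.path changed."""
    global _path_snapshot
    if sys.path != _path_snapshot:    # sys.path mutated since last call
        virtual_package_paths.clear() # force a full refresh
        _path_snapshot = list(sys.path)
    if name not in virtual_package_paths:
        virtual_package_paths[name] = compute(name)
    return virtual_package_paths[name]
```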


This doesn't sound particularly forbidding, but there are various 
unpleasant consequences, like being unable to tell whether a module 
is a package or not, and whether it's a virtual package or not.  We'd 
have to invent new ways to denote these things.


On the bright side, though, it *would* allow transparent live updates 
to virtual package paths, so it might be worth considering.


By the way, the reason we have to get rid of __path__ is that if we 
kept it, then code could change it, and then we wouldn't know if it 
was actually safe to change it automatically...  even if no code had 
actually changed it.


In principle, we could keep __path__ attributes around, and 
automatically update them in the case where sys.path has changed, so 
long as user code hasn't directly altered or replaced the 
__path__.  But it seems to me to be a dangerous corner case; I'd 
rather that code which touches __path__ be taking responsibility for 
that path's correctness from then on, rather than having it get 
updated (possibly incorrectly) behind its back.


So, I'd say that for this approach, we'd have to actually leave 
__path__ off of virtual packages' parent modules.


Anyway, it seems worth considering.  We just need to sort out what 
the downsides are for any current tools thinking that such modules 
aren't packages.  (But hey, at least it'll be consistent with what 
such tools would think of the on-disk representation!  That is, a 
tool that thinks foo.py alongside a foo/ subdirectory is just a 
module with no package, will also think that 'foo', once imported, is 
a module with no package.)



And, in the absence of knowing (because I didn't write them) whether 
any of the packages I imported before extending sys.path are virtual 
packages or not, I would have to do this every time I extend 
sys.path.  And so it becomes a burden on writing programs.


If the code is as boilerplate as you describe, should sys.path 
become an object that acts like a list, instead of a list, and have 
its append method automatically do the pkgutil.extend_virtual_paths 
for me?  Then I wouldn't have to worry about whether any of the 
packages I imported were virtual packages or not.


Well, then we'd have to worry about other mutation methods, and 
things like 'sys.path = [blah, blah]', as well.  So if we're going to 
ditch
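(For what it's worth, the "smart sys.path" idea can be sketched as a list subclass whose mutators fire a callback -- which also makes the objection concrete: every mutation method, plus outright rebinding of sys.path, would need the same treatment.  Everything below is illustrative, not a proposal.)

```python
class NotifyingList(list):
    """A list that calls a hook after append/extend/insert."""
    def __init__(self, iterable=(), on_change=lambda: None):
        super().__init__(iterable)
        self._on_change = on_change
    def append(self, item):
        super().append(item)
        self._on_change()
    def extend(self, items):
        super().extend(items)
        self._on_change()
    def insert(self, index, item):
        super().insert(index, item)
        self._on_change()
    # __setitem__, __delitem__, slice assignment, +=, remove, pop ...
    # would all need the same wrapping, and 'sys.path = [...]' replaces
    # the object entirely, bypassing all of it.
```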

[Python-Dev] Draft PEP: Simplified Package Layout and Partitioning

2011-07-19 Thread P.J. Eby
So, over on the Import-SIG, we were talking about the implementation 
and terminology for PEP 382, and it became increasingly obvious that 
things were, well, not entirely okay in the "implementation is easy 
to explain" department.


Anyway, to make a long story short, we came up with an alternative 
implementation plan that actually solves some other problems besides 
the one that PEP 382 sets out to solve, and whose implementation a 
bit is easier to explain.  (In fact, for users coming from various 
other languages, it hardly needs any explanation at all.)


However, for long-time users of Python, the approach may require a 
bit more justification, which is why roughly 2/3rds of the PEP 
consists of a detailed rationale, specification overview, rejected 
alternatives, and backwards-compatibility discussion...  which is 
still a lot less verbiage than reading through the lengthy Import-SIG 
threads that led up to the proposal.  ;-)  (The remaining 1/3rd of 
the PEP is the short, sweet, and easy-to-explain implementation detail.)


Anyway, the PEP has already been discussed on the Import-SIG, and is 
proposed as an alternative to PEP 382 (Namespace packages).  We 
expect, however, that many people will be interested in it for 
reasons having little to do with the namespace packaging use case.


So, we would like to submit this for discussion, hole-finding, and 
eventual Pronouncement.  As Barry put it, "I think it's certainly 
worthy of posting to python-dev to see if anybody else can shoot 
holes in it, or come up with useful solutions to open 
questions.  I'll be very interested to see Guido's reaction to it." :)


So, without further ado, here it is:

PEP: XXX
Title: Simplified Package Layout and Partitioning
Version: $Revision$
Last-Modified: $Date$
Author: P.J. Eby
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Jul-2011
Python-Version: 3.3
Post-History:
Replaces: 382

Abstract
========

This PEP proposes an enhancement to Python's package importing
to:

* Surprise users of other languages less,
* Make it easier to convert a module into a package, and
* Support dividing packages into separately installed components
  (ala namespace packages, as described in PEP 382)

The proposed enhancements do not change the semantics of any
currently-importable directory layouts, but make it possible for
packages to use a simplified directory layout (that is not importable
currently).

However, the proposed changes do NOT add any performance overhead to
the importing of existing modules or packages, and performance for the
new directory layout should be about the same as that of previous
namespace package solutions (such as ``pkgutil.extend_path()``).


The Problem
===========

.. epigraph::

Most packages are like modules.  Their contents are highly
interdependent and can't be pulled apart.  [However,] some
packages exist to provide a separate namespace. ...  It should
be possible to distribute sub-packages or submodules of these
[namespace packages] independently.

-- Jim Fulton, shortly before the release of Python 2.3 [1]_


When new users come to Python from other languages, they are often
confused by Python's packaging semantics.  At Google, for example,
Guido received complaints from a large crowd with pitchforks [2]_
that the requirement for packages to contain an ``__init__`` module
was a misfeature, and should be dropped.

In addition, users coming from languages like Java or Perl are
sometimes confused by a difference in Python's import path searching.

In most other languages that have a similar path mechanism to Python's
``sys.path``, a package is merely a namespace that contains modules
or classes, and can thus be spread across multiple directories in
the language's path.  In Perl, for instance, a ``Foo::Bar`` module
will be searched for in ``Foo/`` subdirectories all along the module
include path, not just in the first such subdirectory found.

Worse, this is not just a problem for new users: it prevents *anyone*
from easily splitting a package into separately-installable
components.  In Perl terms, it would be as if every possible ``Net::``
module on CPAN had to be bundled up and shipped in a single tarball!

For that reason, various workarounds for this latter limitation exist,
circulated under the term "namespace packages".  The Python standard 
library has provided one such workaround since Python 2.3 (via the
``pkgutil.extend_path()`` function), and the setuptools package
provides another (via ``pkg_resources.declare_namespace()``).
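(The stdlib workaround is a one-liner placed in the namespace package's ``__init__`` module -- ``__path__ = pkgutil.extend_path(__path__, __name__)`` -- which scans every ``sys.path`` entry for further portions of the package.  Its effect can be demonstrated directly; the temporary directories below stand in for installed distributions.)

```python
import os
import sys
import tempfile
from pkgutil import extend_path

# Build two sys.path entries that each contain an 'ns' subdirectory.
d1, d2 = tempfile.mkdtemp(), tempfile.mkdtemp()
os.mkdir(os.path.join(d1, 'ns'))
os.mkdir(os.path.join(d2, 'ns'))
sys.path[:0] = [d1, d2]

# Inside ns/__init__.py this would read: extend_path(__path__, __name__)
path = extend_path([os.path.join(d1, 'ns')], 'ns')
```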

The workarounds themselves, however, fall prey to a *third* issue with
Python's way of laying out packages in the filesystem.

Because a package *must* contain an ``__init__`` module, any attempt
to distribute modules for that package must necessarily include that
``__init__`` module, if those modules are to be importable.

However, the very fact that each distribution of modules for a package
must contain

Re: [Python-Dev] EuroPython Language Summit report

2011-06-26 Thread P.J. Eby

At 12:32 PM 6/25/2011 -0400, R. David Murray wrote:

So your proposed code would allow me, when writing a generator in
my code, do something that would allow me to yield up all the
values from an arbitrary generator I'm calling, over which I have
no control (ie: I can't modify its code)?


With a decorator on your own function, yes.  See:

  http://mail.python.org/pipermail/python-dev/2010-July/102320.html

for details.  Mostly, though, that proposal was a suggestion for how 
the optimized implementation would work - i.e., a suggestion that 
PEP 380 be implemented that way under the hood, by implicitly turning 
'yield from' into 'yield From()' and wrapping the generator itself 
with another From() instance.


(IOW, that was a proposal for how to avoid the extra overhead of 
recursive yielding in a series of nested yield-from's.)
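(The general shape of those framework tricks -- not the actual From() API being referenced, just the flavor -- is a trampoline that keeps nested generators on a stack, so only the outermost loop ever resumes them and no recursive re-yielding occurs:)

```python
def flatten(gen):
    """Drive a generator that may yield sub-generators, yielding the
    leaf values in order.  The stack replaces recursive 'yield from'."""
    stack = [gen]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()
            continue
        if hasattr(item, '__next__'):   # a sub-generator: descend into it
            stack.append(item)
        else:
            yield item

def inner():
    yield 1
    yield 2

def outer():
    yield 0
    yield inner()   # instead of 'yield from inner()'
    yield 3
```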




Re: [Python-Dev] EuroPython Language Summit report

2011-06-25 Thread P.J. Eby

At 10:46 AM 6/25/2011 +1000, Nick Coghlan wrote:

Indeed, PEP 380 is *really* hard to do properly without language
support.


No, it isn't.  You add a decorator, a 'from_' class, and a 'return_' 
function, and there you go.  (See my previous code sketches here in 
early PEP 380 discussions.)


Python frameworks have been doing variations of the same thing (with 
varying features and APIs) for at least 7 years now -- even on Python 
versions that lack decorators or the ability to return values from 
yield statements.


So the main benefit of a PEP for this functionality would be 
providing a common implementation/API - and that could be initially 
done in the stdlib, without any added syntax support.




Re: [Python-Dev] Python 3.x and bytes

2011-06-14 Thread P.J. Eby

At 01:56 AM 6/14/2011 +, exar...@twistedmatrix.com wrote:

On 12:35 am, ncogh...@gmail.com wrote:

On Tue, Jun 14, 2011 at 9:40 AM, P.J. Eby p...@telecommunity.com wrote:

You can still do it one at a time:

CHAR, = b'C'
INT,  = b'I'
...

etc.  I just tried it with Python 3.1 and it works there.


I almost mentioned that, although it does violate one of the
unwritten rules of the Zen (in this case, "syntax shall not look
like grit on Tim's monitor")


   [CHAR] = b'C'
   [INT]  = b'I'
   ...


Holy carpal tunnel time machine...  That works in 2.3.  (Without the 
'b' of course.)  Didn't know you could just use list syntax like 
that.  It's an extra character to type, and two more shift keyings, 
but brevity isn't always the soul of clarity.
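(All the single-target spellings work; note that in Python 3 unpacking a bytes object binds ints, not length-one bytes -- a quick demonstration:)

```python
# Three equivalent single-target unpackings of a one-byte bytes literal.
# In Python 3, iterating bytes yields ints, so these bind code points.
CHAR, = b'C'
[INT] = b'I'
(DATE,) = b'D'
```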




Re: [Python-Dev] Python 3.x and bytes

2011-06-13 Thread P.J. Eby

At 03:11 PM 6/13/2011 -0700, Ethan Furman wrote:

Nick Coghlan wrote:
 Agreed, but:

 EOH, CHAR, DATE, FLOAT, INT, LOGICAL, MEMO, NUMBER = b'\rCDFILMN'

 is a shorter way to write the same thing.

 Going two per line makes it easier to mentally map the characters:

 EOH, CHAR = b'\rC'
 DATE, FLOAT = b'DF'
 INT, LOGICAL = b'IL'
 MEMO, NUMBER = b'MN'

Wow.  I didn't realize that could be done.  That very nearly makes 
up for not being able to do it one char at a time.


You can still do it one at a time:

CHAR, = b'C'
INT,  = b'I'
...

etc.  I just tried it with Python 3.1 and it works there.



Re: [Python-Dev] Python jails

2011-06-10 Thread P.J. Eby

At 06:23 PM 6/10/2011 -0600, Sam Edwards wrote:

I have a couple remaining issues that I haven't quite sussed out:
[long list of questions deleted]


You might be able to answer some of them by looking at this project:

  http://pypi.python.org/pypi/RestrictedPython

Which implements the necessary ground machinery for doing that sort 
of thing, in the form of a specialized Python compiler (implemented 
in Python, for 2.3 through 2.7) that allows you to implement whatever 
sorts of guards and security policies you want on top of it.


Even if it doesn't answer all your questions in and of itself, it may 
prove a fruitful environment in which you can experiment with various 
approaches and see which ones you actually like, without first having 
to write a bunch of code yourself.


Discussing an official implementation of this sort of thing as a 
language feature is probably best left to python-ideas, though, until 
and unless you actually have a PEP to propose.




Re: [Python-Dev] python and super

2011-04-14 Thread P.J. Eby

At 03:55 PM 4/14/2011 +0100, Michael Foord wrote:
Ricardo isn't suggesting that Python should always call super for 
you, but when you *start* the chain by calling super then Python 
could ensure that all the methods are called for you. If an 
individual method doesn't call super then a theoretical 
implementation could skip the parents

methods (unless another child calls super).


That would break classes that deliberately don't call super.  I can 
think of examples in my own code that would break, especially in 
__init__() cases.


It's perfectly sensible and useful for there to be classes that 
intentionally fail to call super(), and yet have a subclass that 
wants to use super().  So, this change would expose an internal 
implementation detail of a class to its subclasses, and make fragile 
base class problems worse.  (i.e., where an internal change to a 
base class breaks a previously-working subclass).
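(A minimal illustration of the kind of class being described -- the names are invented for the example.  Base deliberately does *not* call super(), because it wants to replace rather than extend Root's initialization; Sub still uses super() successfully.  Auto-calling super for the whole chain would silently run Root.__init__ and change Base's intended behavior.)

```python
class Root:
    def __init__(self):
        self.inited = ['Root']

class Base(Root):
    def __init__(self):
        # Intentionally no super() call: Base replaces, not extends.
        self.inited = ['Base']

class Sub(Base):
    def __init__(self):
        super().__init__()          # runs Base.__init__, and stops there
        self.inited.append('Sub')
```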




Re: [Python-Dev] PEP 396, Module Version Numbers

2011-04-10 Thread P.J. Eby

At 03:24 PM 4/10/2011 +, exar...@twistedmatrix.com wrote:

On 04:02 am, p...@telecommunity.com wrote:

At 08:52 AM 4/10/2011 +1000, Ben Finney wrote:

This is an often-overlooked case, I think. The unspoken assumption is
often that ``setup.py`` is a suitable place for the overall version
string, but this is not the case when that string must be read by
non-Python programs.


If you haven't used the distutils a lot, you might not realize that 
you can do this:


$ python setup.py --version
0.6c12

(The --name option also works, and they can be used together -- the 
answers will be on two separate lines.)


This only works as long as setup.py is around - which it typically 
no longer is after installation is complete.


And though it's common and acceptable enough to launch a child 
process in a shell script in order to get some piece of information, 
it isn't as pleasant in a Python program.  Can you get this version 
information out of setup.py without running a child process and 
without monkey-patching sys.argv and sys.stdout?


I was replying to the part above about setup.py ...  must be read by 
non-Python programs.


In other words, I thought the question was, given a 
not-yet-installed source package, how can we find the version number 
without writing Python code.  Your question is a bit different.  ;-)


As it happens, if you have a source distribution of a package, you 
can expect to find a PKG-INFO file that contains version info anyway, 
generated from the source file.  This is true for both distutils and 
setuptools-built source distributions.  (It is not the case, alas, 
for simple revision control checkouts.)


Anyway, I was merely addressing the technical question of how to get 
information from the tools that already exist, rather than advocating 
any solutions.


And, along that same line, monkeypatching sys.argv and sys.stdout 
aren't technically necessary for you to get the information from a 
setup script, but a sandbox to keep the setup script from trying to 
do any installation steps is probably a good idea.  (Some people have 
written setup scripts that actually copy files or do other things 
before they even call setup().  Nasty -- and one of the reasons that 
easy_install has a sandboxing facility.)




Re: [Python-Dev] PEP 396, Module Version Numbers

2011-04-09 Thread P.J. Eby

At 08:52 AM 4/10/2011 +1000, Ben Finney wrote:

This is an often-overlooked case, I think. The unspoken assumption is
often that ``setup.py`` is a suitable place for the overall version
string, but this is not the case when that string must be read by
non-Python programs.


If you haven't used the distutils a lot, you might not realize that 
you can do this:


$ python setup.py --version
0.6c12

(The --name option also works, and they can be used together -- the 
answers will be on two separate lines.)




Re: [Python-Dev] The purpose of SETUP_LOOP, BREAK_LOOP, CONTINUE_LOOP

2011-03-12 Thread P.J. Eby

At 08:25 AM 3/12/2011 -0500, Eugene Toder wrote:

Right, I'm not suggesting to remove all blocks, only SETUP_LOOP
blocks. Do you see the problem in that case?


I think you guys are forgetting about FOR_ITER, listcomps, and the like.

That is, IIRC, the reason loops use the block stack is because they 
put things on the regular stack, that need to be cleared off the 
stack when the loop is exited (whether normally or via an exception).


In other words, just jumping out of a loop without popping the block 
stack would leave junk on the regular stack, thereby failing to 
deallocate the loop iterator.  In the case of a nested loop, this 
would also mean that the outer loop would start using the inner 
loop's iterator, and all sorts of hilarity would then ensue.
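
A quick way to see the iterator living on the value stack is to disassemble a loop. (Illustrative sketch: the exact opcode listing varies by CPython version -- SETUP_LOOP/BREAK_LOOP existed through 3.7 and were removed in 3.8 in favor of explicit jumps plus stack cleanup -- but FOR_ITER is present throughout.)

```python
import dis

def f():
    total = 0
    for i in range(3):   # GET_ITER pushes the iterator onto the value stack;
        if i == 2:       # FOR_ITER advances it on each pass
            break        # the compiler must ensure the iterator gets popped here
        total += i
    return total

# Collect the opcode names used by the function's bytecode.
ops = {ins.opname for ins in dis.get_instructions(f)}
print('FOR_ITER' in ops)   # -> True
```

The `break` is exactly the case discussed above: jumping out of the loop without cleaning up would leave the iterator stranded on the stack.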






Re: [Python-Dev] PEP 395: Module Aliasing

2011-03-09 Thread P.J. Eby

At 05:35 PM 3/4/2011 +, Michael Foord wrote:
That (below) is not distutils it is setuptools. distutils just uses 
`scripts=[...]`, which annoyingly *doesn't* work with setuptools.


Er, what?  That's news to me.  Could you file a bug report about what 
doesn't work, specifically?




Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-07 Thread P.J. Eby

At 09:43 AM 1/7/2011 -0500, James Y Knight wrote:

On Jan 7, 2011, at 6:51 AM, Victor Stinner wrote:
 I don't understand why you are attached to this horrible hack
 (bytes-in-unicode). It introduces more work and is more confusing than
 using raw bytes unchanged.

 It doesn't work and so something has to be changed.

It's gross but it does work. This has been discussed ad-nausium on 
web-sig over a period of years.


I'd like to reiterate that it is only even a potential issue for the 
PATH_INFO/SCRIPT_NAME keys. Those two keys are required to have been 
urldecoded already, into byte-data in some encoding. For all the 
other keys (including the ones from os.environ), they are either 
*properly* decoded in 8859-1 or are just ascii (possibly still 
urlencoded, so the app needs to urldecode and decode into a string 
with the correct encoding).


Right.  Also, it should be mentioned that none of this would be 
necessary if we could've gotten a "bytes of a known encoding" 
type.  If you look back to the last big Python-Dev discussion on 
bytes/unicode and stdlib API breakage, this was the holdup for 
getting a sane WSGI spec.


Since we couldn't change the language to fix the problem (due to the 
moratorium), we had to use this less-pleasant way of dealing with 
things, in order to get a final WSGI spec for Python 3.


(If anybody is wondering about the specifics of the language change 
that was needed, it'd be having a "bytes with known encoding" type, 
that when combined in any polymorphic operation with a unicode 
string, would result in bytes-with-encoding output, and would raise 
an error if the resulting value could not be encoded in the target 
encoding.  Then we would simply do all WSGI header operations with 
this type, using latin-1 as the target encoding.)




Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-06 Thread P.J. Eby

At 04:00 PM 1/6/2011 -0800, Raymond Hettinger wrote:

Can you please take a look at
http://docs.python.org/dev/whatsnew/3.2.html#pep-3333-python-web-server-gateway-interface-v1-0-1
to see if it accurately recaps the resolution of the WSGI text/bytes issues.
I would appreciate any feedback, as it is likely that the whatsnew
document will be most people's first chance to hear the outcome
of the multi-year discussion.


Hi Raymond -- nice work there.  A few minor suggestions:

1. Native strings are used as the keys and values of the environ 
dictionary, not just as headers for start_response.


2. The read_environ() method is strictly for use with CGI-to-WSGI 
gateways, or for bridging other CGI-like protocols (e.g. FastCGI) to 
WSGI.  It is ONLY for server implementers, in other words, and the 
typical app developer is doing something terribly wrong if they are 
even bothering to read its documentation.  ;-)


3. The primary relevance of the native string type to an app 
developer is that when porting code from Python 2 to 3, they must 
still decode environment variable values, even though they are 
already Unicode.  If their code was previously dealing only in 
Python 2 'str' objects, then nothing really changes.  If they were 
previously decoding from environ str's to unicode, then they must 
replace their prior .decode('whatever') with 
.encode('latin1').decode('whatever').  That's basically it for 
porting from Python 2.
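
As an illustrative sketch of that porting rule (not from the original mail; it assumes the app expects UTF-8 path data, which is the app's choice, not WSGI's):

```python
# A WSGI server hands PATH_INFO to the app as a "native" string:
# the raw request bytes decoded as ISO-8859-1, one character per byte.
raw_bytes = b'/caf\xc3\xa9'                  # UTF-8 bytes for '/café'
native = raw_bytes.decode('iso-8859-1')      # what environ['PATH_INFO'] holds

# Python 2: environ_value.decode('utf-8')
# Python 3: re-encode to latin-1 first, recovering the original bytes
path = native.encode('latin1').decode('utf-8')
print(path)   # -> /café
```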


IOW, this design choice allows most HTTP header manipulating code 
(whether input or output) to be ported to Python 3 with a very 
mechanical change pattern.  Most such code is working with ASCII 
anyway, since normally both input and output headers are, and there 
are few headers that an application would be likely to convert to 
actual unicode anyway.


On output via send_response(), if an application is currently 
encoding an output header  -- why they would be, I have no idea, but 
if they are -- they need to add a re-encode to latin1.  (i.e., 
.encode('whatever').decode('latin1'))


IOW, a short 2-to-3 porting guide for WSGI:

* If you just used strings for headers before, that part of your code 
doesn't change.  (And if it was broken before, it's still broken in 
exactly the same way.  No new breakage is introduced. ;-) )


* If you encoded any output headers or decoded any input headers, you 
must take into account the extra latin1 step.  This is expected to be 
rare, since it's usually only SCRIPT_NAME and PATH_INFO that anybody 
would ever care about on input, and almost never anything on output.


* Values yielded by an application or sent via a write() call MUST be 
byte strings; the environ and start_response() MUST be native 
strings.  No mixing and matching.




Re: [Python-Dev] PEP 3333: wsgi_string() function

2011-01-04 Thread P.J. Eby

At 03:44 AM 1/4/2011 +0100, Victor Stinner wrote:

Hi,

In PEP 3333, I read:
--
import os, sys

enc, esc = sys.getfilesystemencoding(), 'surrogateescape'

def wsgi_string(u):
    # Convert an environment variable to a WSGI bytes-as-unicode string
    return u.encode(enc, esc).decode('iso-8859-1')

def run_with_cgi(application):
    environ = {k: wsgi_string(v) for k,v in os.environ.items()}
    environ['wsgi.input']= sys.stdin
    environ['wsgi.errors']   = sys.stderr
    environ['wsgi.version']  = (1, 0)
    ...
--

What is this horrible encoding bytes-as-unicode? os.environ is
supposed to be correctly decoded and contain valid unicode characters.
If WSGI uses another encoding than the locale encoding (which is a bad
idea), it should use os.environb and decodes keys and values using its
own encoding.

If you really want to store bytes in unicode, str is not the right type:
use the bytes type and use os.environb instead.


If you want to discuss this, the Web-SIG is the appropriate 
place.  Also, it was the appropriate place months ago, when the final 
decision on the environ encoding was made.  ;-)


IOW, the above change to the PEP is merely fixing the code example to 
be correct for Python 3, where it previously was correct only for 
Python 2.  The PEP itself has already required this since the 
previous revisions, and wsgiref in the stdlib is already compliant 
with the above (although it uses a more sophisticated approach for 
dealing with win32 compatibility).
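
The key property of the bytes-as-unicode representation is that it is lossless: re-encoding as ISO-8859-1 recovers the exact original bytes. A small demonstration of the PEP's `wsgi_string` round trip (the filesystem encoding is assumed to be UTF-8 for this example):

```python
enc, esc = 'utf-8', 'surrogateescape'

def wsgi_string(u):
    # Convert an environment variable to a WSGI "bytes-as-unicode" string
    return u.encode(enc, esc).decode('iso-8859-1')

# Suppose the OS delivered these bytes and Python decoded them with
# the locale encoding (utf-8 here), as os.environ does:
value = b'/caf\xc3\xa9'.decode(enc, esc)

wsgi_value = wsgi_string(value)             # one character per original byte
original = wsgi_value.encode('iso-8859-1')  # round-trips exactly
print(original == b'/caf\xc3\xa9')          # -> True
```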


The rationale for this choice is described in the PEP, and was also 
discussed in the mailing list emails back when the work was being done.


IOW, this particular ship already sailed a long time ago.  In fact, 
for Jython this bytes-as-unicode approach has been the PEP 
333-defined encoding for at least *six years*...  so it's REALLY late 
to complain about it now! ;-)


PEP 3333 is merely a mapping of PEP 333 to allow WSGI apps to be 
ported from Python 2 to Python 3.  There is work in progress on the 
Web-SIG now on PEP 444, which will support only Python 2.6+, where 
'b' literals and the 'bytes' alias are available.  It is as yet 
uncertain what environ encoding will be used, but at the moment I'm 
not convinced that either pure bytes or pure unicode are acceptable 
replacements for the PEP 333-compatible approach.


In any event, that is a discussion for the Web-SIG, not Python-Dev.



Re: [Python-Dev] ICU

2010-12-02 Thread P.J. Eby

At 07:47 AM 12/2/2010 -0800, Guido van Rossum wrote:

On Wed, Dec 1, 2010 at 8:45 PM, Alexander Belopolsky
alexander.belopol...@gmail.com wrote:
 On Tue, Nov 30, 2010 at 3:13 PM, Antoine Pitrou 
solip...@pitrou.net wrote:


 Oh, about ICU:

  Actually, I remember you saying that locale should ideally be replaced
  with a wrapper around the ICU library.

 By that, I stand - however, I have given up the hope that this will
 happen anytime soon.

 Perhaps this could be made a GSOC topic.


 Incidentally, this may also address another of Python's Achilles' heels:
 the timezone support.

 http://icu-project.org/download/icutzu.html

I work with people who speak highly of ICU, so I want to encourage
work in this area.

At the same time, I'm skeptical -- IIRC, ICU is a large amount of C++
code. I don't know how easy it will be to integrate this into our
build processes for various platforms, nor how Pythonic the
resulting APIs will look to the experienced Python user.

Still, those are not roadblocks, the benefits are potentially great,
so it's definitely worth investigating!


FWIW, OSAF did a wrapping for Chandler, though I personally haven't used it:

   http://pyicu.osafoundation.org/

The README explains the mapping from the ICU APIs to Python ones, 
including iteration, string conversion, and timezone mapping for use 
with the datetime type.




--
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread P.J. Eby

At 11:31 AM 11/23/2010 -0500, Barry Warsaw wrote:

On Nov 23, 2010, at 03:15 PM, Michael Foord wrote:

(Well, there is a third option that takes __name__ and sets the constants in
the module automagically. I can understand why people would dislike that
though.)

Personally, I think if you want that, then the explicit class definition is a
better way to go.


This reminds me: a stdlib enum should support proper pickling and 
copying; i.e.:


   assert SomeEnum.anEnum is pickle.loads(pickle.dumps(SomeEnum.anEnum))

This could probably be implemented by adding something like:

   def __reduce__(self):
       return getattr, (self._class, self._enumname)

in the EnumValue class.
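
A minimal runnable sketch of that idea (`EnumValue` and `SomeEnum` are placeholder names from the discussion, not any actual stdlib classes):

```python
import pickle

class EnumValue:
    def __init__(self, cls, name):
        self._class, self._enumname = cls, name
    def __reduce__(self):
        # Pickle as "look this name up on the owning class again", so
        # unpickling returns the one canonical instance, preserving identity.
        return getattr, (self._class, self._enumname)

class SomeEnum:
    pass

SomeEnum.anEnum = EnumValue(SomeEnum, 'anEnum')

restored = pickle.loads(pickle.dumps(SomeEnum.anEnum))
print(restored is SomeEnum.anEnum)   # -> True
```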



Re: [Python-Dev] Issue 10194 - Adding a gc.remap() function

2010-10-26 Thread P.J. Eby

At 10:24 AM 10/26/2010 -0700, Peter Ingebretson wrote:

I have a relatively large application written in Python, and a
specific use case where it will significantly increase our speed
of iteration to be able to change and test modules without needing
to restart the application.


If all you really want this for is reloading, it would probably make 
more sense to simply modify the existing class and function objects 
using the reloaded values as a template, then save the modified 
classes and functions back to the module.


Have you tried http://pypi.python.org/pypi/plone.reload or 
http://svn.python.org/projects/sandbox/trunk/xreload/xreload.py, or 
any other existing code reloaders, or tried extending them for your 
specific use case?




Re: [Python-Dev] Exposing pkguitl's import emulation (was Re: [Python-checkins] r85538 - python/branches/py3k/Doc/library/pkgutil.rst)

2010-10-19 Thread P.J. Eby

At 08:03 AM 10/18/2010 +1000, Nick Coghlan wrote:

I'm a little dubious about exposing these officially. They're mainly a
hack to get some parts of the standard library working (e.g. runpy) in
the absence of full PEP 302 support in the imp module, not really
something we want to encourage anyone else to use (and yes, they
should probably have underscores in their names, but we missed that
when the various private implementations scattered around the stdlib
were consolidated in pkgutil).


Well, my intention at least was that they should be documented and 
released; it's the documenting part I didn't get around to.  ;-)


Of course, this was also pre-importlib; were we starting the work 
today, the obvious thing to do would be to expose the Python 
implementations of the relevant objects.




That said, who knows when we'll actually have it done right, so in the
meantime maybe having an official workaround is better than nothing...

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly

2010-09-27 Thread P.J. Eby

At 01:22 PM 9/27/2010 -0400, Terry Reedy wrote:

On 9/26/2010 9:38 PM, P.J. Eby wrote:

At 11:15 AM 9/27/2010 +1000, Ben Finney wrote:



You misunderstand me; I wasn't asking how to *add* a link, but how to
turn OFF the automatic conversion of the phrase PEP 333 that happens
without any special markup.



Currently, the PEP 3333 preface is littered with unnecessary links,
because the PEP pre-processor turns *every* mere textual mention of a
PEP into a link to it.


Ouch. This is about as annoying as Thunderbird's message editor 
popping up a window asking me what file I want to at.tach 
every time I write the word at-tach or a derivative without the 
extra punctuation. It would definitely not be the vehicle for 
writing about at.tach-ment syndromes.


Suggestion pending something better from rst/PEP experts:
"This PEP extends PEP 333 (abbreviated P333 hereafter)."
perhaps with "to avoid auto-link creation" added before ')' to 
pre-answer pesky questions and to avoid some editor re-expanding the 
abbreviations.


It turns out that using a backslash before the number (e.g. PEP \333) 
turns off the automatic conversion.


The PEP still hasn't shown up on Python.org, though, so I'm 
wondering if maybe I broke something else somewhere.




Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly

2010-09-27 Thread P.J. Eby

At 12:36 PM 9/27/2010 -0700, Brett Cannon wrote:

All fixed.


Nope.  I mean, sure, I checked in fixed PEP sources several hours 
ago, but python.org still doesn't show PEP 3333, or the updated 
version of PEP 333.




Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly

2010-09-27 Thread P.J. Eby

At 02:03 PM 9/27/2010 -0700, Guido van Rossum wrote:

On Mon, Sep 27, 2010 at 1:33 PM, P.J. Eby p...@telecommunity.com wrote:
 At 12:36 PM 9/27/2010 -0700, Brett Cannon wrote:

 All fixed.

 Nope.  I mean, sure, I checked in fixed PEP sources several hours ago, but
 python.org still doesn't show PEP 3333, or the updated version of PEP 333.

Seems Brett has fixed it. Both PEPs are now online.

I wonder if it would make sense to change both from Informational to
Standard Track ?


From PEP 1:


There are three kinds of PEP:
   * A Standards Track PEP describes a new feature or implementation 
for Python.
   * An Informational PEP describes a Python design issue, or 
provides general guidelines or information to the Python community, 
but does not propose a new feature. Informational PEPs do not 
necessarily represent a Python community consensus or recommendation, 
so users and implementors are free to ignore Informational PEPs or 
follow their advice.
   * A Process PEP describes a process surrounding Python, or 
proposes a change to (or an event in) a process. Process PEPs are 
like Standards Track PEPs but apply to areas other than the Python 
language itself. They may propose an implementation, but not to 
Python's codebase; they often require community consensus; unlike 
Informational PEPs, they are more than recommendations, and users are 
typically not free to ignore them. Examples include procedures, 
guidelines, changes to the decision-making process, and changes to 
the tools or environment used in Python development. Any meta-PEP is 
also considered a Process PEP.



I don't think it qualifies as a Standards PEP under the above 
definitions.  I made it Informational originally because it's rather 
like the DB API PEPs, which are Informational.


I suppose we could say it's a Process PEP, or perhaps update PEP 1 to 
add a new category (into which the DB API PEPs would also fall), or 
maybe just tweak the above definitions a bit so that the 
Informational category makes more sense.




Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly

2010-09-27 Thread P.J. Eby

At 05:41 PM 9/27/2010 -0700, Guido van Rossum wrote:

On Mon, Sep 27, 2010 at 4:29 PM, P.J. Eby p...@telecommunity.com wrote:
 At 02:03 PM 9/27/2010 -0700, Guido van Rossum wrote:

 On Mon, Sep 27, 2010 at 1:33 PM, P.J. Eby p...@telecommunity.com wrote:
  At 12:36 PM 9/27/2010 -0700, Brett Cannon wrote:
 
  All fixed.
 
  Nope.  I mean, sure, I checked in fixed PEP sources several hours ago,
  but
   python.org still doesn't show PEP 3333, or the updated version of PEP
  333.

 Seems Brett has fixed it. Both PEPs are now online.

 I wonder if it would make sense to change both from Informational to
 Standard Track ?

 From PEP 1:

 
 There are three kinds of PEP:
   * A Standards Track PEP describes a new feature or implementation for
 Python.
   * An Informational PEP describes a Python design issue, or provides
 general guidelines or information to the Python community, but does not
 propose a new feature. Informational PEPs do not necessarily represent a
 Python community consensus or recommendation, so users and implementors are
 free to ignore Informational PEPs or follow their advice.
   * A Process PEP describes a process surrounding Python, or proposes a
 change to (or an event in) a process. Process PEPs are like Standards Track
 PEPs but apply to areas other than the Python language itself. They may
 propose an implementation, but not to Python's codebase; they often require
 community consensus; unlike Informational PEPs, they are more than
 recommendations, and users are typically not free to ignore them. Examples
 include procedures, guidelines, changes to the decision-making process, and
 changes to the tools or environment used in Python development. 
Any meta-PEP

 is also considered a Process PEP.
 

 I don't think it qualifies as a Standards PEP under the above definitions.
  I made it Informational originally because it's rather like the DB API
 PEPs, which are Informational.

 I suppose we could say it's a Process PEP, or perhaps update PEP 1 to add a
 new category (into which the DB API PEPs would also fall), or maybe just
 tweak the above definitions a bit so that the Informational category makes
 more sense.

Hm. I would rather extend the definition of Standards Track to include
API standards that are important to the community even if they do not
introduce a new feature for the language or standard library. WSGI and
DB-API being the two most well-known examples but I wouldn't be
surprised if there were others, possibly in the NumPy world.


Well, one of the tradeoffs here is that Informational track allows 
something to grow into a solid standard without also having to pass 
the same level of up-front scrutiny and commitment that a Standards 
track item does.  I rather doubt that either the DBAPI *or* WSGI 
would've passed that scrutiny in early days, and the "free to ignore" 
part means that there's a lot less pushback on the minor points than 
generally occurs with Standards track PEPs.


So, I'd hate for us to lose out on the *next* DBAPI or WSGI due to an 
implied pressure of needing to get it right in the first 
place.  (Indeed, I think we need *more* Informational PEPs -- in 
retrospect there was probably some point at which I should have done 
some relating to setuptools and eggs and such.)


Overall, though, I suppose there's no problem with promoting Final 
Informational PEPs to Standards, *unless* it creates an expectation 
that Informational PEPs will become Standards and they thus end up 
being debated in the same way anyway.  (Of course, if it generally 
takes five or six years before an Informational PEP usually gets 
promoted, this is unlikely to be a major worry.)




Re: [Python-Dev] WSGI is now Python 3-friendly

2010-09-26 Thread P.J. Eby

At 07:15 PM 9/25/2010 -0700, Guido van Rossum wrote:

Don't see this as a new spec. See it as a procedural issue.


As a procedural issue, PEP 333 is an Informational PEP, in Draft 
status, which I'd like to make Final after these amendments.  See 
http://www.wsgi.org/wsgi/Amendments_1.0, which Graham created in 2007, stating:


This page is intended to collect any ideas related to amendments 
to the original WSGI 1.0 so that it can be marked as 'Final'.


IOW, there is no intention to treat the PEP as mutable going 
forward; this is just cleanup so we can mark it Final.  After that, 
it's an ex-parrot.




Clarifications of ambiguous/unspecified behavior can possibly rule as
non-conforming implementations that used to get the benefit of the
doubt. Best-practice recommendations also have the effect of changing
(perceived) compliance.


I understand the general principle, but with respect to these 
*specific* changes, any perceived-compliance arguments that were 
going to happen, already happened years ago.  The changes are merely 
to officially document the way those arguments already turned out, so 
the PEP can become Final.


Specifically, the changes all fall into one of three categories:

1. Textual clarification (SERVER_PORT is not an int, iteration can 
stop before all output is consumed)


2. Practical issues with wsgi.input arising from the fact that 
real-world programs needed its behavior to be more file-like than 
the specification required...  and which essentially forced servers 
that were not using socket.makefile() to make their emulations work 
like that, anyway (or else be rejected by users).


3. Clarification of behavior that would break HTTP compliance (apps 
or servers sending more than Content-Length bytes) and is therefore 
*already a bug* in any implementation that does it.


Since in all three categories any implementation that did not end up 
following the recommendations on its own is going to have been 
considered buggy by its users (regardless of its formal 
compliance), and because the changes do not actually declare the 
buggy behaviors in categories 2 and 3 to be non-compliant, I do not 
see how any of these changes can produce the type of problems you're 
worried about here.


Certainly, if I thought such problems were possible, I wouldn't have 
accepted these amendments.  Likewise, if I thought that changes would 
continue to be made to the PEP past this point, the goal wouldn't be 
getting it to Final status.




Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly

2010-09-26 Thread P.J. Eby

At 08:20 AM 9/26/2010 -0700, Guido van Rossum wrote:

I'm happy approving Final status for the
*original* PEP 333 and I'm happy to approve a new PEP which includes
PJE's corrections.


Can we make it PEP 3333, then?  ;-)

That number would at least communicate that it's the same thing, but 
for Python 3.


Really, my reason for trying to do the (non Py3-specific) amendments 
in a way that didn't require a new PEP number was because of the many 
ancillary questions that it raises for the community, such as:


* Is this is some sort of competition/replacement to PEP 444?
* What happened to the old one, why can't we just use that?
* Why isn't there a different protocol version?
* How is this different from the old one?

To be fair, I *also* wanted to avoid all the work associated with 
*answering* them.  ;-)  (Heck, I really wanted to avoid the work of 
having to even *think* about which questions *might* arise and how 
they'd need to be addressed.)


OTOH, I can certainly see that my attempt to avoid this has *already* 
failed: it simply brought up a different set of questions, just on 
Python-Dev instead of Web-SIG or Python-list.


Oh well.  Perhaps making the numbering appear to be a continuation 
will help a bit.


Another option would be to make a PEP that consists solely of the 
amendments and errata themselves, as this would answer most of the 
above questions directly.


Still another would be to abandon the effort to amend the PEP, and 
simply leave things as they are now: AFAICT, the fact that these 
amendments aren't in the PEP hasn't stopped anybody from *treating* 
most of them as if they were.  (Because everyone understands that 
failure to follow them constitutes a bug in your program, even if it 
technically complies with the spec.)





Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly

2010-09-26 Thread P.J. Eby

At 01:44 PM 9/26/2010 -0700, Guido van Rossum wrote:

On Sun, Sep 26, 2010 at 12:47 PM, Barry Warsaw ba...@python.org wrote:
 On Sep 26, 2010, at 1:33 PM, P.J. Eby wrote:

 At 08:20 AM 9/26/2010 -0700, Guido van Rossum wrote:
 I'm happy approving Final status for the
 *original* PEP 333 and I'm happy to approve a new PEP which includes
 PJE's corrections.

 Can we make it PEP 3333, then?  ;-)

 That works for me.

Go for it.


Shall I just svn cp it, then (to preserve edit history), or wait 
for the PEP editor do it?




Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly

2010-09-26 Thread P.J. Eby
Done.  The other amendments were never actually made, so I just 
reverted the Python 3 bit after moving it to the new PEP.  I'll make 
the changes to 3333 instead as soon as I have another time slot free.


At 01:56 PM 9/26/2010 -0700, Guido van Rossum wrote:

Since you have commit privileges, just do it. The PEP editor position
mostly exists to assure non-committers are not prevented from
authoring PEPs.

Please do add a prominent note at the top of PEP 333 pointing to PEP
3333 for further information on Python 3 compliance or some such
words. Add a similar note at the top of PEP 3333 -- maybe mark up the
differences in PEP 3333 so people can easily tell what was added. And
move PEP 333 to Final status.

--Guido

On Sun, Sep 26, 2010 at 1:50 PM, P.J. Eby p...@telecommunity.com wrote:
 At 01:44 PM 9/26/2010 -0700, Guido van Rossum wrote:

 On Sun, Sep 26, 2010 at 12:47 PM, Barry Warsaw ba...@python.org wrote:
  On Sep 26, 2010, at 1:33 PM, P.J. Eby wrote:
 
  At 08:20 AM 9/26/2010 -0700, Guido van Rossum wrote:
  I'm happy approving Final status for the
  *original* PEP 333 and I'm happy to approve a new PEP which includes
  PJE's corrections.
 
  Can we make it PEP 3333, then?  ;-)
 
  That works for me.

 Go for it.

 Shall I just svn cp it, then (to preserve edit history), or wait for the
 PEP editor do it?





--
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly

2010-09-26 Thread P.J. Eby

At 02:59 PM 9/26/2010 -0400, Terry Reedy wrote:
You could mark added material is a way that does not conflict with 
rst or html. Or use .rst to make new text stand out in the .html web 
verion (bold, underlined, red, or whatever). People familiar with 
333 can focus on the marked sections. New readers can ignore the marking.


If you (or anybody else) have any idea how to do that (highlight 
stuff in PEP-dialect .rst), let me know.


(For that matter, if anybody knows how to make it not turn *every* 
PEP reference into a link, that'd be good too!  It doesn't really 
need to turn 5 or 6 occurrences of PEP 333 in the same paragraph 
into separate links.  ;-) )




Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly

2010-09-26 Thread P.J. Eby

At 11:15 AM 9/27/2010 +1000, Ben Finney wrote:

P.J. Eby <pje at telecommunity.com> writes:


 (For that matter, if anybody knows how to make it not turn *every* PEP
 reference into a link, that'd be good too! It doesn't really need to
 turn 5 or 6 occurrences of PEP 333 in the same paragraph into
 separate links. ;-) )

reST, being designed explicitly for Python documentation, has support
for PEP references built in:



You misunderstand me; I wasn't asking how to *add* a link, but how to 
turn OFF the automatic conversion of the phrase "PEP 333" that 
happens without any special markup.


Currently, the PEP  preface is littered with unnecessary links, 
because the PEP pre-processor turns *every* mere textual mention of a 
PEP into a link to it.




[Python-Dev] WSGI is now Python 3-friendly

2010-09-25 Thread P.J. Eby
I have only done the Python 3-specific changes at this point; the 
diff is here if anybody wants to review, nitpick or otherwise comment:


  
http://svn.python.org/view/peps/trunk/pep-0333.txt?r1=85014&r2=85013&pathrev=85014

For that matter, if anybody wants to take a crack at updating Python 
3's wsgiref based on the above, feel free.  ;-)  I'll be happy to 
answer any questions I can that come up in the process.


(Please note: I went with Ian Bicking's "headers are strings, bodies 
are bytes" proposal, rather than my original "bodies and outputs are 
bytes" one, as there were not only some good arguments in its favor, 
but because it also resulted in fewer changes to the PEP, especially 
in the code samples.)


I will continue to work on adding the other addenda/errata mentioned here:

  http://mail.python.org/pipermail/web-sig/2010-September/004655.html

But because these are "shoulds" rather than "musts", and apply to both 
Python 2 and 3, they are not as high priority for immediate 
implementation in wsgiref and do not necessarily need to hold up the 
3.2 release.


(Nonetheless, if anybody is willing to implement them in the Python 3 
version, I will happily review the changes for backport into the 
Python 2 standalone version of wsgiref, and issue an updated release 
to include them.)


Thanks!



Re: [Python-Dev] WSGI is now Python 3-friendly

2010-09-25 Thread P.J. Eby

At 09:22 PM 9/25/2010 -0400, Jesse Noller wrote:

It seems like it will end up
different enough to be a different specification, closely related to
the original, but different enough to trip up all the people
maintaining current WSGI servers and apps.


The only actual *change* to the spec is mandating the use of the 
'bytes' type or equivalent for HTTP bodies when using Python 3.


Seriously, that's *it*.

Everything else that's (planned to be) added is either a 100%-genuine 
clarification (e.g. nothing in the spec *ever* said SERVER_PORT 
could be an int, but apparently some people somehow interpreted it 
so), or else a best-practice recommendation from people who actually 
implemented WSGI servers.


For example, the readline() size hint is not supported in the 
original spec (meaning clients can't call it and be compliant).  The 
planned modification is "servers should implement it" (best 
practice), but you can't call an implementation that *doesn't* 
implement it noncompliant.  (This just addresses the fact that most 
practical implementations *did* in fact support it, and code out 
there relies on this.)
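As a concrete illustration of that "should," a server-side input wrapper
might honor the size hint roughly like this (a minimal sketch of my own;
the class name and structure are assumptions, not from the spec):

```python
import io

class InputStream:
    """Illustrative server input wrapper whose readline() takes a size hint."""
    def __init__(self, raw):
        self._raw = raw

    def read(self, size=-1):
        return self._raw.read(size)

    def readline(self, size=-1):
        # Honor the size hint when the caller supplies one; otherwise
        # read an unbounded line, as the original spec required.
        return self._raw.readline(size)

stream = InputStream(io.BytesIO(b'first line\nsecond line\n'))
stream.readline()   # b'first line\n'
stream.readline(6)  # b'second'
```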


So, no (previously-)compliant implementations were harmed in the 
making of the updated spec.  If they were compliant before, they're 
compliant now.


I'm actually a bit surprised people are bringing this up now, since 
when I announced the plan to make these changes, I said that nothing 
would be changed that would break anything...  even for what I 
believe are the only Python 3 WSGI implementations right now (by 
Graham Dumpleton and Robert Brewer).


Indeed, all of the changes (except the bytes thing) are stuff 
previously discussed endlessly on the Web-SIG (years ago in most 
cases) and widely agreed on as "this should have been made clear in 
the original PEP."


And, I also explicitly deferred and/or rejected items that *can't* be 
done in a 100% backward-compatible way, and would have to be WSGI 1.1 
or higher -- indeed, I have a long list of changes from Graham that 
I've pronounced can't be done without a 1.1.


Indeed, the entire point of my scope choices was to allow all 
this to happen *without* a whole new spec.  ;-)




Re: [Python-Dev] WSGI is now Python 3-friendly

2010-09-25 Thread P.J. Eby

At 02:07 PM 9/25/2010 -0700, Guido van Rossum wrote:

This is a very laudable initiative and I approve of the changes -- but
I really think it ought to be a separate PEP rather than pretending it
is just a set of textual corrections on the existing PEP 333.


With the exception of the bytes change, I ruled out accepting any 
proposed amendments that would actually alter the protocol.  The 
amendments are all either textual clarifications, clarifications of 
ambiguous/unspecified areas, or best-practice recommendations by 
implementors.  (i.e., which are generally already implemented in major servers)


The full list of things Graham and others have asked for or 
recommended would indeed require a 1.1 version at minimum, and thus a 
new PEP.  But I really don't want to start down that road right now, 
and therefore hope that I can talk Graham or some other poor soul 
into shepherding a 1.1 PEP instead.  ;-)


(Seriously: through an ironic twist of fate, I have done nearly 
*zero* Python web programming since around the time I drafted the 
first spec in 2004, so even if it makes sense for me to finish PEP 
333, it makes little sense for me to be starting a *new* one on the topic now!)




[Python-Dev] Backup plan: WSGI 1 Addenda and wsgiref update for Py3

2010-09-21 Thread P.J. Eby
While the Web-SIG is trying to hash out PEP 444, I thought it would 
be a good idea to have a backup plan that would allow the Python 3 
stdlib to move forward, without needing a major new spec to settle 
out implementation questions.


After all, even if PEP 333 is ultimately replaced by PEP 444, it's 
probably a good idea to have *some* sort of WSGI 1-ish thing 
available on Python 3, with bytes/unicode and other matters settled.


In the past, I was waiting for some consensuses (consensi?) on 
Web-SIG about different approaches to Python 3, looking for some sort 
of definite "yes, we all like this" response.  However, I can see 
now that this just means it's my fault we don't have a spec yet. :-(


So, unless any last-minute showstopper rebuttals show up this week, 
I've decided to go ahead and officially bless nearly all of what Graham 
Dumpleton (who's not only the mod_wsgi author, but has put huge 
amounts of work into shepherding WSGI-on-Python3 proposals, WSGI 
amendments, etc.) has proposed, with a few minor exceptions.


In other words: almost none of the following is my own original work; 
it's like 90% Graham's.  Any praise for this belongs to him; the only 
thing that belongs to me is the blame for not doing this 
sooner!  (Sorry Graham.  You asked me to do this ages ago, and you were right.)


Anyway, I'm posting this for comment to both Python-Dev and the 
Web-SIG.  If you are commenting on the technical details of the 
amendments, please reply to the Web-SIG only.  If you are commenting 
on the development agenda for wsgiref or other Python 3 library 
issues, please reply to Python-Dev only.  That way, neither list will 
see off-topic discussions.  Thanks!



The Plan


I plan to update the proposal below per comments and feedback during 
this week, then update PEP 333 itself over the weekend or early next 
week, followed by a code review of Python 3's wsgiref, and 
implementation of needed changes (such as recoding os.environ to 
latin1-captured bytes in the CGI handler).
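The latin-1 recoding step mentioned above can be sketched as follows
(the helper name and shape are my own assumptions, not the actual
wsgiref change; `raw_environ` stands in for a bytes-level environment
mapping such as os.environb on POSIX):

```python
# Decoding environment bytes as latin-1 "captures" every byte, since
# latin-1 maps code points 0-255 one-to-one onto bytes -- so the
# original bytes stay recoverable by re-encoding.
def decode_environ(raw_environ):
    return {key.decode('latin-1'): value.decode('latin-1')
            for key, value in raw_environ.items()}

env = decode_environ({b'PATH_INFO': b'/caf\xc3\xa9'})
env['PATH_INFO'].encode('latin-1')  # round-trips to b'/caf\xc3\xa9'
```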


To complete the changes, it is possible that I may need assistance 
from one or more developers who have more Python 3 experience.  If 
after reading the proposed changes to the spec, you would like to 
volunteer to help with updating wsgiref to match, please let me know!



The Proposal



Overview


1. The primary purpose of this update is to provide a uniform porting 
pattern for moving Python 2 WSGI code to Python 3, meaning a pattern 
of changes that can be mechanically applied to as little code as 
practical, while still keeping the WSGI spec easy to programmatically 
validate (e.g. via ``wsgiref.validate``).


The Python 3 specific changes are to use:

* ``bytes`` for I/O streams in both directions
* ``str`` for environ keys and values
* ``bytes`` for arguments to start_response() and write()
* text stream for wsgi.errors

In other words: "strings in, bytes out" for headers, bytes for bodies.
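Under those rules, a minimal Python 3 application might look like this
(a sketch of my own, not text from the amendment):

```python
def app(environ, start_response):
    # environ keys and values are str ("strings in")
    path = environ.get('PATH_INFO', '/')
    body = ('Hello from %s' % path).encode('utf-8')   # body chunks: bytes
    # under this proposal, status and headers also go out as bytes
    start_response(b'200 OK', [
        (b'Content-Type', b'text/plain; charset=utf-8'),
        (b'Content-Length', str(len(body)).encode('ascii')),
    ])
    return [body]   # iterable of bytes
```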

In general, only changes that don't break Python 2 WSGI 
implementations are allowed.  The changes should also not break 
mod_wsgi on Python 3, but may make some Python 3 wsgi applications 
non-compliant, despite continuing to function on mod_wsgi.


This is because mod_wsgi allows applications to output string headers 
and bodies, but I am ruling that option out because it forces every 
piece of middleware to have to be tested with arbitrary combinations 
of strings and bytes in order to test compliance.  If you want your 
application to output strings rather than bytes, you can always use a 
decorator to do that.  (And a sample one could be provided in wsgiref.)
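Such a porting decorator might look roughly like this (the name
`text_app`, the `_encode` helper, and the latin-1 choice are
illustrative assumptions, not part of the proposal):

```python
import functools

def _encode(value):
    # latin-1 maps code points 0-255 one-to-one onto bytes
    return value.encode('latin-1') if isinstance(value, str) else value

def text_app(app):
    # Hypothetical porting aid: the wrapped app may emit str status,
    # headers, and body chunks; the server underneath still sees bytes.
    @functools.wraps(app)
    def wrapper(environ, start_response):
        def encoding_start_response(status, headers, exc_info=None):
            return start_response(
                _encode(status),
                [(_encode(k), _encode(v)) for k, v in headers],
                exc_info)
        return (_encode(chunk)
                for chunk in app(environ, encoding_start_response))
    return wrapper
```

A server then receives only bytes, so middleware never has to be tested
against mixed strings-and-bytes output.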



2. The secondary purpose of the update is to address some 
long-standing open issues documented here:


   http://www.wsgi.org/wsgi/Amendments_1.0

As with the Python 3 changes, only changes that don't retroactively 
invalidate existing implementations are allowed.



3. There is no tertiary purpose.  ;-)  (By which I mean, all other 
kinds of changes are out-of-scope for this update.)



4. The section below labeled "A Note On String Types" is proposed for 
verbatim addition to the Specification Overview section in the PEP; 
the other sections below describe changes to be made inline at the 
appropriate part of the spec, and changes that were proposed but are 
rejected for inclusion in this amendment.



A Note On String Types
----------------------

In general, HTTP deals with bytes, which means that this 
specification is mostly about handling bytes.


However, the content of those bytes often has some kind of textual 
interpretation, and in Python, strings are the most convenient way to 
handle text.


But in many Python versions and implementations, strings are Unicode, 
rather than bytes.  This requires a careful balance between a usable 
API and correct translations between bytes and text in the context of 
HTTP...  especially to support porting code between Python 
implementations with different ``str`` types.


WSGI therefore 

Re: [Python-Dev] [Web-SIG] Backup plan: WSGI 1 Addenda and wsgiref update for Py3

2010-09-21 Thread P.J. Eby

At 12:55 PM 9/21/2010 -0400, Ian Bicking wrote:
On Tue, Sep 21, 2010 at 12:47 PM, Chris McDonough chr...@plope.com wrote:

On Tue, 2010-09-21 at 12:09 -0400, P.J. Eby wrote:
 While the Web-SIG is trying to hash out PEP 444, I thought it would
 be a good idea to have a backup plan that would allow the Python 3
 stdlib to move forward, without needing a major new spec to settle
 out implementation questions.

If a WSGI-1-compatible protocol seems more sensible to folks, I'm
personally happy to defer discussion on PEP 444 or any other
backwards-incompatible proposal.


I think both make sense, making WSGI 1 sensible for Python 3 (as 
well as other small errata like the size hint) doesn't detract from 
PEP 444 at all, IMHO.


Yep.  I agree.  I do, however, want to get these amendments settled 
and make sure they get carried over to whatever spec is the successor 
to PEP 333.  I've had a lot of trouble following exactly what was 
changed in 444, and I'm a tad worried that several new ambiguities 
may be being introduced.  So, solidifying 333 a bit might be helpful 
if it gives a good baseline against which to diff 444 (or whatever).




Re: [Python-Dev] Backup plan: WSGI 1 Addenda and wsgiref update for Py3

2010-09-21 Thread P.J. Eby

At 06:52 PM 9/21/2010 +0200, Antoine Pitrou wrote:

On Tue, 21 Sep 2010 12:09:44 -0400
P.J. Eby p...@telecommunity.com wrote:
 While the Web-SIG is trying to hash out PEP 444, I thought it would
 be a good idea to have a backup plan that would allow the Python 3
 stdlib to move forward, without needing a major new spec to settle
 out implementation questions.

If this allows the Web situation in Python 3 to be improved faster
and with less hassle then all the better.
There's something strange in your proposal: it mentions WSGI 2 at
several places while there's no guarantee about what WSGI 2 will be (is
there?).


Sorry - "WSGI 2" should be read as shorthand for "whatever new spec 
succeeds PEP 333," whether that's PEP 444 or something else.


It just means that any new spec that doesn't have to be 
backward-compatible can (and should) more thoroughly address the 
issue in question. 




Re: [Python-Dev] [Catalog-sig] egg_info in PyPI

2010-09-18 Thread P.J. Eby

At 05:19 PM 9/18/2010 +0200, Martin v. Löwis wrote:

In the specific case of tl.eggdeps, the dependency information is only
used to create printable graphs. If this turns out to be slightly 
incorrect, people would notice if they try to use the packages in
question.


By the way, just providing this information for .egg files and *not* 
for sdists would ensure accuracy of the metadata for that 
platform/python version.





Re: [Python-Dev] [Catalog-sig] egg_info in PyPI

2010-09-18 Thread P.J. Eby

At 06:06 PM 9/18/2010 +0200, Martin v. Löwis wrote:

Am 18.09.10 17:49, schrieb P.J. Eby:

At 05:19 PM 9/18/2010 +0200, Martin v. Löwis wrote:

In the specific case of tl.eggdeps, the dependency information is only
used to create printable graphs. If this turns out to be slightly
incorrect, people would notice if they try to use the packages in
question.


By the way, just providing this information for .egg files and *not* for
sdists would ensure accuracy of the metadata for that platform/python
version.


True (I presume - unless there are also dependencies on the specific
OS version or system installation that may affect the metadata).


No, because an egg's egg-info is what it is.  easy_install doesn't 
rebuild that information, so it is correct by 
definition.  ;-)  (Certainly, it is what will be used for 
dependency information.)




OTOH, I do think that the users asking for that prefer per-release
information, despite the limitations that this may have.

OTTH, if the concerns could be relieved if egg-info would be provided
for all files that have it, I could provide that as well/instead.


I am +0 on the idea myself, as I don't think the plan is quite enough 
to be able to provide a user-experience upgrade for use cases besides 
"make me a dependency graph without downloading the distributions themselves."


It certainly would be nice to be able to say to the user, "here are 
the things I will need to download in order to fulfill your request," 
but if you have to download individual files to get at that 
information, I'm not sure how much it helps vs. just downloading the files.




Re: [Python-Dev] standards for distribution names

2010-09-16 Thread P.J. Eby

At 12:08 PM 9/16/2010 +0100, Chris Withers wrote:

Following on from this question:

http://twistedmatrix.com/pipermail/twisted-python/2010-September/022877.html

...I'd thought that the correct names for distributions would have 
been documented in one of:


...

Where are the standards for this, or is it still a case of "whatever 
setuptools does"?


Actually, in this case, it's whatever distutils does.  If you don't 
build your .exe's with Distutils, or if you rename them after the 
fact, then setuptools won't recognize them as things it can consume.


FYI, Twisted has a long history of releasing distribution files that 
are either built using non-distutils tools or else renamed after being built.


Note, too, that if the Windows exe's they're providing aren't built 
by the distutils bdist_wininst command, then setuptools is probably 
not going to be able to consume them, no matter what they're called.








Re: [Python-Dev] standards for distribution names

2010-09-16 Thread P.J. Eby

At 12:08 PM 9/16/2010 +0100, Chris Withers wrote:
...I'd thought that the correct names for distributions would have 
been documented in one of:


http://www.python.org/dev/peps/pep-0345
http://www.python.org/dev/peps/pep-0376
http://www.python.org/dev/peps/pep-0386

...but having read them, I drew a blank.


Forgot to mention: see distinfo_dirname() in PEP 376 for an 
explanation of distribution-name normalization.


(Case-insensitivity and os-specific case handling is not addressed in 
the PEPs, though, AFAICT.)




Re: [Python-Dev] 3.x as the official release

2010-09-16 Thread P.J. Eby

At 10:18 PM 9/16/2010 +0200, Éric Araujo wrote:
On 15/09/2010 21:45, Tarek Ziadé wrote:

 Could we remove in any case the wsgiref.egg-info file?  Since we've
 been working on a new format for that (PEP 376), that should be
 starting to get used in the coming years, it'll be a bit of a
 nonsense to have that metadata file in the stdlib shipped with 3.2.

On a related subject: Would it make sense not to run install_egg_info 
from install anymore?  We probably can't remove the command because of 
backward compat, but we could stop running it (thus creating egg-info 
files) by default.


If you're talking about distutils2 on Python 3, then of course 
anything goes: backward compatibility isn't an issue.  For 2.x, not 
writing the files would indeed produce backward compatibility problems.




Re: [Python-Dev] 3.x as the official release

2010-09-15 Thread P.J. Eby

At 11:11 AM 9/15/2010 -0700, Guido van Rossum wrote:

Given that wsgiref is in the stdlib, I think we should hold up the 3.2
release (and even the first beta) until this is resolved, unless we
can convince ourselves that it's okay to delete wsgiref from the
stdlib (which sounds unlikely but may not be any more incompatible
than making it work properly :-).


FWIW, I'd be fine with that option.



I want to emphasize that I am *not* a stakeholder so my preference for
bytes or Unicode shouldn't matter; that said, given WSGI's traditional
emphasis on using the lowest-level, vanilla standard datatypes (e.g.
you can't even subclass dict let alone provide another kind of mapping
-- it has to be a real dict) it makes sense to me that the values
should be bytes, os.environ notwithstanding. The keys probably could
be Unicode (HTTP headers are required to use only 7-bit ASCII
characters anyways right?). But I'd be happy to be shown the error of
my ways (or given a link showing prior discussion of the matter --
preferably with a conclusion :-).


There isn't a conclusion yet, but the proposals under discussion are 
summarized here:


  http://www.wsgi.org/wsgi/Python_3#Proposals

The primary points of consensus are bytes for wsgi.input, and native 
strings (i.e. Unicode on Python 3) for environment keys.


If I were to offer a suggestion to a PEP author or dictator wanting 
to get something out ASAP, it would probably be to create a 
compromise between the flat model (my personal favorite) and the 
mod_wsgi model, as an addendum to PEP 333.  Specifically:


* leave start_response/write in play (ala mod_wsgi)

* use the required types from the flat proposal (i.e. status, 
headers, and output stream MUST be bytes)


* add a decorator to wsgiref that supports using native strings as 
output instead of bytes, for ease-of-porting (combine mod_wsgi's 
ease-of-porting w/flat's simple verifiability)


This would probably allow us to get by with the least changes to 
existing code, the stdlib, the standard itself, and 
wsgiref.   (wsgiref itself would still need a thorough code review, 
especially wsgiref.validate, but it'd be unlikely to change much.)




Re: [Python-Dev] 3.x as the official release

2010-09-15 Thread P.J. Eby

At 11:12 PM 9/15/2010 +0200, Éric Araujo wrote:
Unless I remember wrong, the intent was not to break code that used 
pkg_resources.require('wsgiref')


More precisely, at the time it was done, setuptools was slated for 
inclusion in Python 2.5, and the idea was that when modules moved 
from PyPI to the stdlib, they would include the metadata so that 
projects requiring the module on an older version of Python would not 
need to use Python-version-dependent dependencies.


So, for example, if a package was written on 2.4 using a requirement 
of wsgiref, then that code would run unchanged on 2.5 using the 
stdlib-supplied copy.


In practice, this didn't work out in 2.x, and it's meaningless on 3.x 
where nothing has migrated yet from PyPI to stdlib AFAIK.  ;-)




Re: [Python-Dev] 3.x as the official release

2010-09-15 Thread P.J. Eby

At 11:50 PM 9/15/2010 +0200, Dirkjan Ochtman wrote:

On Wed, Sep 15, 2010 at 21:18, P.J. Eby p...@telecommunity.com wrote:
 If I were to offer a suggestion to a PEP author or dictator wanting to get
 something out ASAP, it would probably be to create a compromise between the
 flat model (my personal favorite) and the mod_wsgi model, as an addendum
 to PEP 333.  Specifically:

 * leave start_response/write in play (ala mod_wsgi)

The alternative is returning a three-tuple (status, headers,
content-iterable), right?

I would definitely prefer just returning a three-tuple instead of the
crappy start_response callback that returns a write callable. It makes
applications easier to write, and the unified model should also make
server implementation easier. It also combines nicely with "yield from" in
some cases.


I would prefer it too (which is why the flat model is my favorite), 
but I think it would be easier to get a quick consensus for something 
that allows apps to be more mechanically ported from 2.x to 3.x.


That's why I said, "offer a suggestion to ... get something out ASAP."  ;-)



Re: [Python-Dev] PEP 444 aka Web3 (was Re: how to decide on a Python 3 design for wsgiref)

2010-09-15 Thread P.J. Eby

At 09:22 AM 9/16/2010 +1000, James Mills wrote:

On Thu, Sep 16, 2010 at 9:06 AM, Chris McDonough chr...@plope.com wrote:
 Comments and competing specs would be useful.

Can I post comments here ? :)


Please, let's put any spec-detail commentary on the Web-SIG instead 
(commenting here on process issues related to the 3.x releases is of 
course fine).




Re: [Python-Dev] 'hasattr' is broken by design

2010-08-25 Thread P.J. Eby

At 12:10 PM 8/25/2010 +1200, Greg Ewing wrote:

Consider an object that is trying to be a transparent
proxy for another object, and behave as much as possible
as though it really were the other object. Should an
attribute statically defined on the proxied object be
considered dynamically defined on the proxy? If so, then
the proxy isn't as transparent as some people may want.


Yep.  That's why the proposed addition to inspect is a bad idea.  If 
we encourage that sort of static thinking, it will lead to people 
creating all sorts of breakage with respect to more dynamic code.


AFAICT, the whole "avoid running code" thing only makes sense for a 
debugging tool -- at which point, you can always use the trace 
facility and throw an error when any Python code runs that's not part 
of your debugging tool.  Something like:


def exists(ob, attr):
    __running__ = True
    # ... set trace function here
    try:
        try:
            getattr(ob, attr)
            return True
        except AttributeError:
            return False
        except CodeRanError:
            return True   # or False if you prefer
    finally:
        __running__ = False
        # restore old tracing here

Where the trace function is just something that throws CodeRanError 
if it detects a call event while the __running__ flag is True.  This 
would stop any Python code from actually executing.  (It'd need to 
keep the same trace function for c_call events, since those might lead 
to nested non-C calls.)
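A runnable version of the sketch, using sys.settrace, might look like
this (the details are my own reconstruction, not the exact code from
the message above -- raising from the trace function aborts the frame
being traced, which stands in for the "throw an error" step):

```python
import sys

class CodeRanError(Exception):
    pass

def exists(ob, attr):
    def tracer(frame, event, arg):
        # Any Python frame entered during the lookup means code ran.
        raise CodeRanError
    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        try:
            getattr(ob, attr)
            return True
        except AttributeError:
            return False
        except CodeRanError:
            return True   # or False if you prefer
    finally:
        sys.settrace(old)

class Demo:
    plain = 1
    @property
    def computed(self):
        return 2

d = Demo()
exists(d, 'plain')     # True, without running any Python code
exists(d, 'computed')  # True, detected via CodeRanError
exists(d, 'missing')   # False
```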


Of course, a debugger's object inspection tool would probably 
actually want to return either the attribute value, or a special 
value to mean "dynamic calculation needed."




Re: [Python-Dev] 'hasattr' is broken by design

2010-08-25 Thread P.J. Eby

At 08:58 PM 8/25/2010 +0300, Michael Foord wrote:
If your proxy class defines __call__ then callable returns True, 
even if the delegation to the proxied object would cause an 
AttributeError to be raised.


Nope.  You just have to delegate via __getattribute__ (since 2.2) 
instead of __getattr__:


>>> from peak.util.proxies import ObjectProxy
>>> o = ObjectProxy(lambda: 1)
>>> o()
1
>>> o.__call__
<method-wrapper '__call__' of function object at 0x00E004B0>

>>> o = ObjectProxy(1)
>>> o()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\cygwin\home\pje\projects\proxytypes\peak\util\proxies.py", line 6, in __call__
    return self.__subject__(*args,**kw)
TypeError: 'int' object is not callable

>>> o.__call__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\cygwin\home\pje\projects\proxytypes\peak\util\proxies.py", line 12, in __getattribute__
    return getattr(subject,attr)
AttributeError: 'int' object has no attribute '__call__'


As you can see, the __call__ attribute in each case is whatever the 
proxied object's __call__ attribute is, even though the proxy itself 
has a __call__ method, that is invoked when the proxy is called.


This is actually pretty straightforward stuff since the introduction 
of __getattribute__.


(The code is at http://pypi.python.org/pypi/ProxyTypes, btw.)
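The core of the technique can be reduced to a few lines (a hypothetical
stripped-down proxy of my own; the real ProxyTypes code is richer):

```python
class Proxy:
    """Minimal transparent proxy: delegate everything, dunders included."""
    def __init__(self, subject):
        object.__setattr__(self, '_subject', subject)

    def __getattribute__(self, name):
        # Every attribute access, __call__ included, goes to the subject.
        return getattr(object.__getattribute__(self, '_subject'), name)

    def __call__(self, *args, **kw):
        # Invoked for p(), since special methods are looked up on the type.
        return object.__getattribute__(self, '_subject')(*args, **kw)

p = Proxy(lambda: 1)
p()                     # 1
q = Proxy(1)
callable(q)             # True: the *type* defines __call__ ...
hasattr(q, '__call__')  # ... but False: delegation raises AttributeError
```

This reproduces the behavior shown in the session above: the proxy is
callable at the type level, yet its __call__ *attribute* is whatever the
proxied object has.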



Re: [Python-Dev] 'hasattr' is broken by design

2010-08-24 Thread P.J. Eby

At 03:37 PM 8/24/2010 +0200, Hrvoje Niksic wrote:
a) a business case of throwing anything other than AttributeError 
from __getattr__ and friends is almost certainly a bug waiting to happen, and


FYI, best practice for __getattr__ is generally to bail with an 
AttributeError as soon as you see double underscores in the name, 
unless you intend to support special attributes.


I don't think this is documented anywhere, but experience got this 
pretty ingrained in my head since Python 2.2 or even earlier.
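In code, the practice looks like this (class and attribute names are
mine, for illustration only):

```python
class Record:
    """Illustrative class: unknown attributes are computed on the fly."""
    def __getattr__(self, name):
        # Best practice: refuse dunder names immediately, so protocol
        # probes (copy, pickle, ABC machinery) see a plain AttributeError
        # instead of a bogus computed value.
        if name.startswith('__') and name.endswith('__'):
            raise AttributeError(name)
        return 'computed:%s' % name

r = Record()
r.anything                   # 'computed:anything'
hasattr(r, '__deepcopy__')   # False, thanks to the early bail-out
```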




Re: [Python-Dev] 'hasattr' is broken by design

2010-08-24 Thread P.J. Eby

At 10:13 AM 8/24/2010 -0500, Benjamin Peterson wrote:

2010/8/24 James Y Knight f...@fuhm.net:

 On Aug 24, 2010, at 10:26 AM, Benjamin Peterson wrote:

 2010/8/24 P.J. Eby p...@telecommunity.com:

 At 03:37 PM 8/24/2010 +0200, Hrvoje Niksic wrote:

 a) a business case of throwing anything other than AttributeError from
 __getattr__ and friends is almost certainly a bug waiting to happen, and

 FYI, best practice for __getattr__ is generally to bail with an
 AttributeError as soon as you see double underscores in the name, unless
 you
 intend to support special attributes.

 Unless you're in an old-style class, you shouldn't get any double
 underscore methods in __getattr__ (or __getattribute__). If you do,
 it's a bug.

 Uh, did you see the message that was in response to?

 Maybe it should be a bug report?

Old version of Python I think.


If by old you mean 2.6, sure.  (Also, I did say this was a best 
practice since 2.2.)




Re: [Python-Dev] 'hasattr' is broken by design

2010-08-23 Thread P.J. Eby

At 12:02 AM 8/24/2010 +0300, Michael Foord wrote:
For properties there is *no reason* why code should be executed 
merely in order to discover if the attribute exists or not.


That depends on what you mean by "exists."  Note that a property 
might raise AttributeError to signal that the attribute is not 
currently set.  Likewise, unless you special-case __slots__ 
descriptors, you can have the bizarre condition where a static 
existence check will report True, but getattr() will still raise an 
AttributeError.
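The __slots__ case is easy to demonstrate (class name is mine): the slot
descriptor is statically present on the class, yet actually getting the
attribute raises until it has been assigned.

```python
class Point:
    __slots__ = ('x',)

p = Point()
'x' in Point.__dict__   # True: the slot descriptor is statically visible
hasattr(p, 'x')         # False: the descriptor raises while the slot is unset
p.x = 3
hasattr(p, 'x')         # True once assigned
```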


The idea that you could determine the presence of an attribute on an 
object without executing that object's code is something that hasn't 
been practical since the birth of descriptors in Python 2.2.



Yes I know the dance (walking the mro fetching the attribute out of 
the appropriate type __dict__ or the instance dict - or looking on 
the metaclass if the object you are introspecting is a type itself), 
it is just not trivial - which is why I think it is a shame that 
people are forced to implement it just to ask if a member exists 
without triggering code execution.


Even if you implement it, you will get wrong answers in some 
cases.  __getattribute__ is allowed to throw out the entire algorithm 
you just described and replace it utterly with something else.


My ProxyTypes library makes use of that fact, for example, so if you 
actually attempted to inspect a proxy instance with your 
re-implemented dance, your code will fail to notice what attributes 
the proxy actually has.
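
Python later grew inspect.getattr_static() (3.2+), which implements 
roughly the MRO-walking dance described above without triggering 
descriptor code -- and a __getattribute__-based proxy (a minimal 
sketch below, not the actual ProxyTypes code) demonstrates exactly 
the failure mode described: the static dance misses attributes the 
proxy really has.

```python
import inspect

class Proxy:
    """Minimal delegating proxy (illustrative; not the ProxyTypes code)."""
    def __init__(self, target):
        object.__setattr__(self, "_target", target)
    def __getattribute__(self, name):
        # Throw out the normal lookup algorithm entirely: delegate
        # every attribute access to the wrapped target.
        return getattr(object.__getattribute__(self, "_target"), name)

p = Proxy([1, 2, 3])
print(hasattr(p, "append"))    # True: the proxy forwards it to the list
try:
    inspect.getattr_static(p, "append")
except AttributeError:
    print("static lookup missed it")   # the "dance" gives the wrong answer
```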




Re: [Python-Dev] 'hasattr' is broken by design

2010-08-23 Thread P.J. Eby

At 06:12 PM 8/23/2010 -0400, Yury Selivanov wrote:

BTW, is it possible to add new magic method __hasattr__?  Maybe not
in Python 3.2, but in general.


In order to do this properly, you'd need to also add __has__ or 
__exists__ (or some such) to the descriptor protocol; otherwise you 
break descriptors' ability to operate independently of the class 
they're used in.  You would probably also need a __hasattribute__, in 
order to be able to properly synchronize with __getattr__/__getattribute__.
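
The hook discussed here was never adopted; purely as a sketch of the 
idea (all names hypothetical), a __hasattr__-aware hasattr() might 
look like this:

```python
class WithHasattr:
    """Sketch of the *proposed* (never adopted) __hasattr__ hook."""
    _known = {"x", "y"}
    def __hasattr__(self, name):
        # Answer existence questions without running __getattr__.
        return name in self._known
    def __getattr__(self, name):
        if name in self._known:
            return 0
        raise AttributeError(name)

def has_attr(obj, name):
    # What a hook-aware hasattr() might do if the protocol existed.
    hook = getattr(type(obj), "__hasattr__", None)
    if hook is not None:
        return hook(obj, name)
    return hasattr(obj, name)

print(has_attr(WithHasattr(), "x"))  # True
print(has_attr(WithHasattr(), "z"))  # False
```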


Seems like overkill to me, though, as I'm not sure how such a 
protocol actually helps ORM or persistence schemes (and I've written 
a few).  Pretty much, if you're trying to check for the existence of 
an attribute, you're probably about to be getting that attribute 
anyway.  (i.e. why query the existence of an attribute you *don't* 
intend to use?)




Re: [Python-Dev] PEP 376 proposed changes for basic plugins support

2010-08-03 Thread P.J. Eby

At 10:28 AM 8/3/2010 +0200, M.-A. Lemburg wrote:

Since you are into comparing numbers, you might want to count
the number of Zope plugins that are available on PyPI and its plugin
system has been around much longer than setuptools has been.
I don't think that proves anything, though.


Actually, some of the ones I found in the search using entry points 
*were* Zope, which, as I mentioned before, is increasingly moving 
away from the old approach in favor of entry points.


In any case, I am not advocating *setuptools* -- I'm advocating that 
if PEP 376 expands to add plugin support, that it do so with a file 
format and associated API based on that of entry points, so as to 
make migration of those ~187 modules and their associated plugins to 
distutils2 a little easier.


In other words, I'm trying to make it easier for people to move OFF 
of setuptools.


Crazy, I know, but there you go.  ;-)



Re: [Python-Dev] PEP 376 proposed changes for basic plugins support

2010-08-03 Thread P.J. Eby

At 01:40 PM 8/3/2010 +0200, M.-A. Lemburg wrote:

If you look at the proposal, it is really just about adding a
new data store to manage a certain package type called plugins.
Next time around, someone will want to see support for skins or
themes. Then perhaps identify script packages, or
application packages, or namespace packages, or stubs, etc.
All this can be had by providing this kind of extra
meta-information in the already existing format.


If by "existing format" you mean entry points, then yes, that is 
true.  ;-)  They are used today for most of the things you listed; 
anything that's an importable Python object (module, class, function, 
package, constant, global) can be listed as an entry point belonging 
to a named group.  Heck, the first code sample on Nullege for 
iter_entry_points is some package called Apydia loading an entry 
point group called apydia.themes!


Seriously, though, PEP 376 is just setuptools' egg-info under a 
different name with uninstall support added.  And egg-info was 
designed to be able to hold all those things you're talking 
about.  The EggTranslations project, for example, defines 
i18n-support files that can be placed under egg-info, and provides 
its own APIs for looking those things up.  Applications using 
EggTranslations can not only have their own translations shipped as 
plugins, but plugins can provide translations for other plugins of 
the same application.  (I believe it also supports providing other 
i18n resources such as icons as well.)


So, it isn't actually necessary for the stdlib to provide any 
particular support for specific kinds of metadata within PEP 376, as 
long as the PEP 376 API supports finding packages with metadata files 
of a particular name.  (EggTranslations uses similar APIs provided by 
pkg_resources.)


However, since Tarek proposed adding a stdlib-supported plugins 
feature, I am suggesting it adopt the entry_points.txt file name and 
format, to avoid unnecessary API fragmentation.




If we add a new extra file to be managed by the package
managers every time someone comes up with a new use case,
we'd just clutter up the disk with more and more CSV file
extracts and make PEP 376 more and more complex.


The setuptools egg-info convention is not to create files that don't 
contain any useful content, so that their presence or absence conveys 
information.  If that convention is continued in PEP 376, features 
that aren't used won't take up any disk space.


As for cluttering the PEP, IMO any metadata files that aren't part of 
the installation database feature should probably have their own PEP.




Re: [Python-Dev] Yield-From Implementation Updated for Python 3

2010-08-02 Thread P.J. Eby

At 09:24 PM 8/1/2010 -0700, Guido van Rossum wrote:

I don't understand all the details and corner
cases (e.g. the concatenation of stacks


It's just to ensure that you never have From's iterating over other 
From's, vs. just iterating whatever's at the top of the stack.




which seems to have to do with the special-casing of From objects in __new__)


It isn't connected, actually except that it's another place where I'm 
keeping From's flat, instead of nested.  (I hear that flat is better.  ;-) )




I am curious whether, if you need a trampoline for async I/O anyway,
there would be a swaying argument for integrating this functionality
into the general trampoline (as in the PEP 342 example),


Originally, that was why I wasn't very enthusiastic about PEP 380; it 
didn't seem to me to be adding any new value over what you could do 
with existing, widely-used libraries.  (Twisted's had its own *and* 
multiple third-party From-ish libraries supporting it for many years now.)


After I wrote From(), however (which was originally intended to show 
why I thought 380 was unnecessary), I realized that having One 
Obvious Way to implement generator-based pseudothreads independent of 
an event loop, is actually useful precisely *because* it separates 
the pseudothreadedness from what you're using the pseudothreadedness for.


Essentially, the PEP 380-ish bit is the hardest part of writing an 
actual pseudothread implementation; connecting that implementation to 
an I/O framework is actually the relatively simple part.  You just 
write code that steps into the generator, and uses the yielded object 
to initiate an I/O operation and register a callback.  (If you're 
using Twisted or something else that has promise-like deferred 
results, it's *really* easy, because you only have a couple of types 
of yielded objects to deal with, and a uniform callback signature.)


Indeed, if you're using an existing async I/O framework, you don't 
even really *have* a trampoline as such -- you just have a bit of 
code that registers callbacks to itself, and the app's main event 
loop just calls back to that wrapper when the I/O is done.


In effect, an I/O framework integration would just give you a single 
API like run(From(geniter)), which then performs one iteration, 
and then registers whatever callback it's told to by the yield, and 
the callback it registers would actually be a reinvocation of run() 
on the same From instance when the I/O is ready, but with a value to 
pass back into the send(), or an error to throw().  So, the I/O 
framework's event loop is half of the trampoline, and the wrapper 
that sends or throws, then registers an I/O callback, is the other half.


Something like:

from functools import partial

def run(coroutine, value=None, exc_info=()):
    if exc_info:
        action = coroutine.throw(*exc_info)
    else:
        action = coroutine.send(value)
    action.registerCallback(partial(run, coroutine))

Where 'action' is some I/O command object, and registerCallback() 
will call its argument back with a value or exc_info, after the I/O is done.


Of course, a real framework integration might actually dispatch on 
type here rather than using special command objects like this, and 
there might be more glue code to deal with exceptions, but really, 
the heart of the thing is just going to look like that.  (I just 
wrote it that way to show the basic structure.)
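
To make the shape concrete, here is a toy, self-contained driver.  
The Action class and its synchronous registerCallback() are purely 
illustrative stand-ins for a real framework's I/O command objects 
and event loop, and real glue would also need the StopIteration 
handling shown here:

```python
from functools import partial

class Action:
    """Toy stand-in for a framework's I/O command object (hypothetical)."""
    def __init__(self, result):
        self.result = result
    def registerCallback(self, cb):
        # A real event loop would invoke cb when the I/O completes;
        # this toy version just calls it back immediately.
        cb(self.result)

results = []

def task():
    a = yield Action(1)     # "start an I/O", resume with its result
    b = yield Action(2)
    results.append(a + b)

def run(coroutine, value=None, exc_info=()):
    try:
        if exc_info:
            action = coroutine.throw(*exc_info)
        else:
            action = coroutine.send(value)
    except StopIteration:
        return              # the pseudothread finished
    action.registerCallback(partial(run, coroutine))

run(task())
print(results)  # [3]
```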


Really, it's just a few functions, maybe a utility routine or two, 
and maybe a big if-then or dictionary dispatch on types if you just 
want to be able to 'yield' existing I/O objects provided by the frameworks.


IOW, it's a *lot* simpler than actually rolling your own I/O or GUI 
framework like Twisted or Eventlet or wxPython or tk or some other such thing.




But it seems a bit of a waste to have two different trampolines,
especially since the trampoline itself is so hard to understand
(speaking for myself here :-). ISTM that the single combined
trampoline is easier to understand than the From class.


Well, the PEP 342 example was made to look simple, because it doesn't 
have to actually DO anything (like I/O!)  To work for real, it'd need 
some pluggability, and some things to help it interoperate with 
different GUI and I/O frameworks and event loops.  (Using your own 
event loop for real isn't very useful in a lot of non-trivial applications.)


Heck, after writing From(), it gave me an idea that I could just 
write a trampoline that *could* integrate with other event loops, 
with an idea to have it be a general-purpose companion to From.


But, after several wasted hours, I realized that yes, it *could* be 
written (I still have the draft), but it was mostly just something 
that would save a little boilerplate in bolting From()'s onto an 
existing async I/O framework, and not really anything to write home about.


So, I guess what I'm saying is, the benefit of separating the 
trampoline from control flow, is that people can then use them 

Re: [Python-Dev] PEP 376 proposed changes for basic plugins support

2010-08-02 Thread P.J. Eby

At 01:53 PM 8/2/2010 +, exar...@twistedmatrix.com wrote:

On 01:27 pm, m...@egenix.com wrote:

exar...@twistedmatrix.com wrote:

This is also roughly how Twisted's plugin system works.  One drawback,
though, is that it means potentially executing a large amount of Python
in order to load plugins.  This can build up to a significant
performance issue as more and more plugins are installed.


I'd say that it's up to the application to deal with this problem.

An application which requires lots and lots of plugins could
define a registration protocol that does not require loading
all plugins at scanning time.


Just for the record, solving this problem is precisely what entry 
points are for: they provide a discovery mechanism that doesn't 
require importing anything until you actually need it.


It's not fixable at the application level, at least in Twisted's 
plugin system.  It sounds like Zope's system has the same problem, 
but all I know of that system is what you wrote above.


I don't know about Zope in general, but there are certainly Zope 
corp. projects that use entry points instead of namespaces (buildout, 
for one), and I believe that there's been a long time push to move 
third-party code out of the common namespace package.  i.e., AFAIK, 
Zope 3 doesn't use package namespaces as a primary method of extension.



  The cost increases with the number of plugins installed on the 
system, not the number of plugins the application wants to load.


Pretty much any plugin discovery system is going to scale that way, 
but entry points only require file reads rather than imports, and 
have a shared cache for all code in use by the application.  So if, 
say, Twisted uses entry points and an application running on Twisted 
also uses entry points, the loading cost is only paid once for both 
sets of entry points inspected.




Re: [Python-Dev] PEP 376 proposed changes for basic plugins support

2010-08-02 Thread P.J. Eby

At 01:10 PM 8/2/2010 +0200, Tarek Ziadé wrote:

I don't have a specific example in mind, and I must admit that if an
application does the right thing
(provide the right configuration file), this activate feature is not
useful at all. So it seems to be a bad idea.


Well, it's not a *bad* idea as such; actually, having conventions for 
such configuration, and libraries that help to implement the 
convention are a *good* idea, and I support it.  I just don't think 
it makes much sense to *impose* the convention on the app developers; 
there are, after all, use cases that don't need the extra configuration.


Setuptools was mainly designed to support the application plugin 
directory model for invasive sorts of plugins, and the global 
plugin availability model for the kind of plugins that a user has to 
explicitly select (e.g. file type converters, special distutils 
commands, etc.).  However, there are definitely use cases for 
user-configured plugins, and the apps that do it generally use some 
sort of configuration file to identify which entry points they'll actually use.




IOW, have entry points like setuptools provides, but in a metadata
field instead of a entry_points.txt file.


May I suggest, then, that we keep entry_points.txt, but simply 
provide a summary in PKG-INFO?  (i.e., list the groups and names provided)


This would still make it easy for human browsing/discovery of entry 
points on PyPI, but it would allow easy forward/backward 
compatibility between setuptools and distutils2, while also providing 
faster lookup of entry points (because you can skip distributions 
that don't have an entry points file, vs. having to parse *every* 
PKG-INFO file).


Or to put it another way, when I implement PEP 376 support in 
setuptools 0.7, I'll only have to change the name of the .egg-info 
directory and copy the entry point summary into PKG-INFO.  And, even 
more to the point, people who define entry points with distutils2 
will then be able to have them work with setuptools-based projects, 
and vice versa, helping to smooth the transition.




Re: [Python-Dev] PEP 376 proposed changes for basic plugins support

2010-08-02 Thread P.J. Eby

At 05:08 PM 8/2/2010 +0200, Éric Araujo wrote:
I wonder if functions in pkgutil or importlib could allow one to 
iterate over the plugins (i.e. submodules and subpackages of the 
namespace package) without actually loading them.


See pkgutil.walk_packages(), available since 2.5.

It has to load __init__.py files, especially because of namespace 
packages, but it doesn't load any non-package modules.
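
A quick self-contained illustration (the directory and module names 
are made up): pkgutil.walk_packages() reports plain modules on a 
path without importing them, so their module-level code never runs 
during discovery:

```python
import os
import pkgutil
import sys
import tempfile

# Build a throwaway directory holding two plain (non-package) modules.
tmp = tempfile.mkdtemp()
for name in ("alpha_plugin", "beta_plugin"):
    with open(os.path.join(tmp, name + ".py"), "w") as f:
        f.write("print('imported!')\n")

# walk_packages() lists the modules without importing them,
# so the print() above never executes.
found = sorted(info.name for info in pkgutil.walk_packages([tmp]))
print(found)  # ['alpha_plugin', 'beta_plugin']
```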


That being said, using namespace packages for plugins kind of defeats 
the true purpose of namespace packages, which is to give developers 
private package namespaces they can use across multiple projects, 
like zope.*, peak.*, twisted.*, etc., thereby avoiding naming 
conflicts in the root package namespace.


Granted, you can always re-nest namespaces and do something like 
someproject.plugins.mynamehere.myplugin, but with entry points you 
can just register something in mynamehere.mysomeprojectplugin, and 
flat is better than nested.  ;-)  (Plus, you can include information 
about the individual plugins/features residing in that module in the 
metadata, and avoid importing until/unless you need that feature.)




Re: [Python-Dev] PEP 376 proposed changes for basic plugins support

2010-08-02 Thread P.J. Eby

At 09:03 PM 8/2/2010 +0100, Michael Foord wrote:

Ouch. I really don't want to emulate that system. For installing a
plugin for a single project the recommended technique is:

 * Unpack the source. It should provide a setup.py.
 * Run:

   $ python setup.py bdist_egg

 Then you will have a *.egg file. Examine the output of running
python to find where this was created.

 Once you have the plugin archive, you need to copy it into the
plugins directory of the project environment


Those instructions are apparently out-of-date; you can actually just 
easy_install -m or pip the plugin directly to the plugins 
directory, without any additional intervening steps.


(The only reason to create an .egg file for Trac is if you intend to 
distribute to non-developer users who will be told to just drop it in 
the plugins directory.)




For global plugins it just uses entry points, which is similar to the
functionality we are suggesting adding...


I believe it's using entry points for both, actually.  It just has an 
(application-specific) filtering mechanism to restrict which entry 
points get loaded.




Really this sounds *astonishingly* like the system we are proposing. :-)


Which is why I keep pointing out that the code for doing most of it 
is already available in setuptools, distribute, pip, buildout, etc., 
and so (IMO) ought to just get copied into distutils2, the way 
easy_install's package index code was.  ;-)


(Of course, adding some filtering utilities to make it easier for 
apps to do explicit configuration would still be nice.)




Re: [Python-Dev] PEP 376 proposed changes for basic plugins support

2010-08-02 Thread P.J. Eby

At 10:37 PM 8/2/2010 +0200, M.-A. Lemburg wrote:

If that's the case, then it would be better to come up with an
idea of how to make access to that meta-data available in a less
I/O intense way, e.g. by having pip or other package managers update
a central SQLite database cache of the data found on disk.


Don't forget system packaging tools like .deb, .rpm, etc., which do 
not generally take kindly to updating such things.  For better or 
worse, the filesystem *is* our central database these days.


Btw, while adding PLUGINS to PEP 376 is a new proposal, it's 
essentially another spelling of the existing entry_points.txt used by 
eggs; it changes the format to csv instead of .ini, and adds 
description and type fields, but drops requirements information 
and I'm not sure if it can point to arbitrary objects the way 
entry_points.txt can.
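
For reference, entry_points.txt is plain .ini: each section header 
names an entry point group, and each value is a "module:attribute" 
reference that is resolved only on demand.  A self-contained parsing 
sketch (group and plugin names invented for illustration):

```python
import configparser

# An invented entry_points.txt: group "myapp.plugins", two plugins.
ENTRY_POINTS_TXT = """\
[myapp.plugins]
hello = mypkg.hello:HelloPlugin
goodbye = mypkg.bye:ByePlugin
"""

cp = configparser.ConfigParser()
cp.read_string(ENTRY_POINTS_TXT)

# Each value is a "module:attribute" reference; nothing is imported
# until the application actually resolves an entry point it wants.
for name, ref in sorted(cp["myapp.plugins"].items()):
    module, _, attr = ref.partition(":")
    print(name, "->", module, attr)
```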


Anyway, entry_points.txt has been around enough years in the field 
that the concept itself can't really be called new - it's actually 
quite proven.  Checking 
http://nullege.com/codes/search/pkg_resources.iter_entry_points/call 
, I find 187 modules using just that one entry points API.


Some projects do have more than one module loading plugins, but the 
majority of those 187 appear to be different projects.


Note that that's modules *loading plugins*, not plugins being 
provided...  so the total number of PyPI projects using entry points 
in some way is likely much higher, once you add in the plugins that 
these 187 lookups are, well, looking up.





Re: [Python-Dev] PEP 382 progress: import hooks

2010-08-02 Thread P.J. Eby

At 05:28 PM 8/2/2010 -0700, Brett Cannon wrote:


On Fri, Jul 23, 2010 at 09:54, P.J. Eby p...@telecommunity.com wrote:

At 11:57 AM 7/23/2010 +0100, Brett Cannon wrote:



On Thu, Jul 22, 2010 at 19:19, P.J. Eby p...@telecommunity.com wrote:


What does "is not a package" actually mean in that context?


The module is a module but not a package.


Um... that's not any clearer.  Are you saying that a module of 
the same name takes precedence over a package?  Is that the current 
precedence as well?



No, packages take precedence. I meant that something is a module but 
it is not a package; a package implicitly includes a module, but a 
module is not automatically a package.


That explanation still isn't making things any clearer for me.  That 
is, I don't know how to get from that statement to actual code, even 
if I were writing a filesystem or zip importer, let alone anything more exotic.



 zipimport also does it this way as it too does not differentiate a 
reload from a clean load beyond grabbing the module from 
sys.modules if it is already there. PEP 302 does not directly state 
that reloading should not reset the attributes that import must 
set, simply that a module from sys.modules must be reused. Since 
zipimport does it this way I wouldn't count on other loaders not 
setting __path__.


Fair enough, though certainly unfortunate.  In particular, it means 
that it's not actually possible to correctly/completely implement PEP 
382 on any already-released version of Python, without essentially 
replacing zipimport.  (Unless the spec can be tweaked a bit.)



I'm personally not worried about supporting older versions of Python 
as this is a new feature. Better to design it properly than come up 
with some hack solution as we will all have to live with this for a long time.


Currently, older Pythons are the only versions I *do* support, so I'm 
very concerned with it.  Otherwise, I'd not be asking all these questions.  ;-)


Personally, I think there are features in the PEP that make things 
unnecessarily complicated - for example, supporting both __init__.py 
*and* .pth files in the same directory.  If it were either/or, it 
would be a LOT easier to implement on older Pythons, since it 
wouldn't matter when you initialized the __path__ in that case.


(By the way, there were some other questions I asked about the PEP 
382 revisions, that you didn't reply to in previous emails (such as 
the format of the strings to be returned by find_path()); I hope 
either you or Martin can fill those in for me, and hopefully update 
the PEP with the things we have talked about in this thread.)




Re: [Python-Dev] Yield-From Implementation Updated for Python 3

2010-08-01 Thread P.J. Eby

At 08:49 AM 8/1/2010 -0400, Kevin Jacobs jac...@bioinformed.com wrote:
On Sun, Aug 1, 2010 at 3:54 AM, Greg Ewing 
greg.ew...@canterbury.ac.nz wrote:

I have updated my prototype yield-from implementation
to work with Python 3.1.2.


My work is primarily on the management and analysis of huge genomics 
datasets.  I use Python generators extensively and intensively to 
perform efficient computations and transformations on these datasets 
that avoid the need to materialize them in main memory to the extent 
possible.   I've spent a great deal of effort working around the 
lack of an efficient yield from construct and would be very 
excited to see this feature added.


Just so you know, you don't need to wait for this to be added to 
Python in order to have such a construct; it just won't have the 
extra syntax sugar.  See the sample code I posted here using a 
@From.container decorator, and a yield From() call:


  http://mail.python.org/pipermail/python-dev/2010-July/102320.html

This code effectively reduces your generator nesting depth to a 
constant, no matter how deeply you nest sub-generator 
invocations.  It's not as efficient as the equivalent C 
implementation, but if you're actually being affected by nesting 
overhead now, it will nonetheless provide you with some immediate 
relief, if you backport it to 2.x code.  (It's not very 3.x-ish as it 
sits, really.)




Re: [Python-Dev] PEP 376 proposed changes for basic plugins support

2010-08-01 Thread P.J. Eby

At 02:03 AM 8/2/2010 +0200, Tarek Ziadé wrote:

but then we would be back to the problem mentioned about entry points:
installing projects can implicitly add a plugin and activate it, and break
existing applications that iterate over entry points without further
configuration. So being able to disable plugins from the beginning seems
important to me


So which are these apps that don't allow configuration, and which are 
the plugins that break them?  Have the issues been reported so that 
the authors can fix them?


ISTM that the issue can only arise in cases where you are installing 
plugins to a *global* environment, rather than to an environment 
specific to the application.


In the case of setuptools, for example, it's expected that a project 
will use 'setup_requires' to identify the plugins it wishes to use, 
apart from any that were intentionally installed globally.  (The 
requested plugins are then added to sys.path only for the duration of 
the setup script execution.)


Other applications have plugin directories where their plugins are to 
be installed, and still others have explicit configuration to enable 
named plugins.


Even in the worst-case scenario, where an app has no plugin 
configuration and no private plugin directory, you can still control 
plugin availability by installing plugins to the directory where the 
application's main script is located, or set PYTHONPATH to point to 
a directory you've chosen to hold the plugins of your choice.


So without specific examples of why this is a problem, it's hard to 
see why a special Python-specific set of configuration files is 
needed to resolve it, vs. say, encouraging application authors to use 
the available alternatives for doing plugin directories, config files, etc.




Re: [Python-Dev] proto-pep: plugin proposal (for unittest)

2010-07-30 Thread P.J. Eby

At 03:34 PM 7/30/2010 +0100, Michael Foord wrote:
Automatic discoverability, a-la setuptools entry points, is not 
without its problems though. Tarek outlines some of these in a more 
recent blog post:


FWIW, it's not discovery that's the problem, but configuring *which* 
plugins you wish to have active.  Entry points support access by 
name, and it's up to the application using them to decide *which* ones to load.


The underlying idea is that entry points expose a hook; it's up to 
the app to decide which ones it should actually import and use.  An 
application also can list the available plugins and ask the user, 
etc.  (For example, setuptools only loads setup() argument entry 
points for specified arguments, and command entry points only for the 
commands a user explicitly invokes.)


IOW, entry points provide access to plugins, not policy or 
configuration for *which* plugins you wish to use.  This was an 
intentional decision since applications vary widely in what sort of 
configuration mechanism they use.  In the simplest cases (e.g. 
single-app environments like Chandler), simply making the plugin 
available on sys.path (e.g. via a special plugins directory) is 
configuration enough.  In more complex use cases, an app might have 
to import plugins in order to get more information about them.





Re: [Python-Dev] unexpected import behaviour

2010-07-30 Thread P.J. Eby

At 11:50 PM 7/30/2010 +0400, Oleg Broytman wrote:

On Fri, Jul 30, 2010 at 07:46:44PM +0100, Daniel Waterworth wrote:
 can anyone think of a case where someone has
 been annoyed that, having imported that same module twice via
 symlinks, they have had problems relating to modules being independent
 instances?

   I've had problems with two instances of the same module imported after
sys.path manipulations. Never had a problem with reimported scripts.


I have.  The unittest module used to have this problem, when used 
as a script.




Re: [Python-Dev] proto-pep: plugin proposal (for unittest)

2010-07-30 Thread P.J. Eby

At 04:37 PM 7/30/2010 +0200, Tarek Ziadé wrote:

On Fri, Jul 30, 2010 at 4:04 PM, Barry Warsaw ba...@python.org wrote:
..
 * Registration - How do third party plugins declare themselves to 
exist, and

  be enabled?  Part of this seems to me to include interface declarations
  too.  Is installation of the plugin enough to register it?  How 
do end users

  enable and disable plugins that may be registered on their system?  How do
  plugins describe themselves (provide short and log descriptions, declare
  options, hook into command line interfaces, etc.)?

 * Installation - How are plugins installed on the system?  Do they have to
  appear in a special directory on the file system?  Do they need special
  setup.py magic to write extra files?  Do they need to live in a 
pre-defined

  namespace?

FWIW We are thinking about adding in distutils2 a system quite similar
to the entry points
setuptools has, but with extra abilities for the end user :

- activate / deactivate plugins without having to remove the project
that added them
- configure globally if plugins are implicitly activated or not --
and maybe allow the distutils2 installer to ask the user
  when a plugin is detected if he wants it activated or not
- provide a tool to browse them


Note, by the way, that none of these are mutually exclusive to the 
entry point mechanism; it is simply up to an application developer to 
decide which of those features he/she wishes to provide.  A library 
that provides common implementations of such features on top of entry 
points would be a good idea.


pkg_resources already supplies one such tool, btw: the 
find_plugins() API for locating projects in one or more plugin 
directories that *could* be added to sys.path to provide plugins for 
an application.  It's then up to the application to filter this list 
further (e.g. via its own configuration).




Re: [Python-Dev] Thoughts fresh after EuroPython

2010-07-25 Thread P.J. Eby

At 04:29 PM 7/25/2010 +1000, Nick Coghlan wrote:

So, while I can understand Guido's temptation (PEP 380 *is* pretty
cool), I'm among those that hope he resists that temptation. Letting
these various ideas bake a little longer without syntactic support
likely won't hurt either.


Well, if somebody wants to clean up my syntax-sugar-free version a 
little (maybe adding a From.return_(value) staticmethod that raises 
StopIteration(value)) and throw it in the stdlib, then people can 
certainly experiment with the feature in 3.2, and get an opportunity 
to iron out any implementation issues before going to the 
C-and-sugared version later.




Re: [Python-Dev] PEP 380 - return value question and prototype implementation (was Thoughts fresh after EuroPython)

2010-07-24 Thread P.J. Eby

At 07:08 AM 7/24/2010 -0700, Guido van Rossum wrote:

- After seeing Raymond's talk about monocle (search for it on PyPI) I
am getting excited again about PEP 380 (yield from, return values from
generators). Having read the PEP on the plane back home I didn't see
anything wrong with it, so it could just be accepted in its current
form.


I would like to reiterate (no pun intended) the suggestion of a 
special syntactic form for the return, such as "yield return x" or 
"return with x" or something similar, to distinguish it from a normal 
generator return.


I think that when people are getting used to the idea of generators, 
it's important for them to get the idea that the function's return 
value isn't really a value, it's an iterator object.  Allowing a 
return value, but then having that value silently disappear, seems 
like it would delay that learning, so, a special form might help to 
make it clear that the generator in question is intended for use with 
a corresponding yield from, and help avoid confusion on this.


(I could of course be wrong, and would defer to anyone who sees a 
better way to explain/teach around this issue.  In any event, I'm +1 
on the PEP otherwise.)


By the way, the PEP's optimized implementation could probably be 
done just by making generator functions containing yield-from 
statements return an object of a different type than the standard 
geniter.  Here's a Python implementation sketch, using a helper class 
and a decorator -- translation to a C version is likely 
straightforward, as it'll basically be this plus a light sprinkling 
of syntactic sugar.


So, in the pure-Python prototype (without syntax sugaring), usage 
would look like this:


@From.container
def some_generator(...):
    ...
    yield From(other_generator(...))  # equivalent to 'yield from'
    ...

def other_generator(...):
    ...
    raise StopIteration(value)  # equivalent to 'return value'


We mark some_generator() with @From.container to indicate that it 
uses 'yield from' internally (which would happen automatically in the 
C/syntax sugar version).  We don't mark other_generator(), though, 
because it doesn't contain a 'yield from'.


Now, the implementation code (a slightly altered/watered-down version 
of a trampoline I've used before in 2.x, hopefully altered correctly 
for Python 3.x syntax/semantics):


import sys

class From:
    @classmethod
    def container(cls, func):
        def decorated(*args, **kw):
            # wrap the returned generator in a From() instance
            return cls(func(*args, **kw))
        return decorated

    def __new__(cls, geniter):
        if isinstance(geniter, cls):
            # It's already a 'From' instance, just return it
            return geniter
        self = object.__new__(cls)
        self.stack = [geniter]
        return self

    def __iter__(self):
        return self

    def __next__(self):
        return self._step()

    def send(self, value):
        return self._step(value)

    def throw(self, *exc_info):
        return self._step(None, exc_info)

    def _step(self, value=None, exc_info=()):
        if not self.stack:
            raise RuntimeError("Can't resume completed generator")
        try:
            while self.stack:
                try:
                    it = self.stack[-1]
                    if exc_info:
                        try:
                            rv = it.throw(*exc_info)
                        finally:
                            exc_info = ()
                    elif value is not None:
                        rv = it.send(value)
                    else:
                        rv = next(it)
                except:
                    value = None
                    exc_info = sys.exc_info()
                    self.stack.pop()    # this generator is finished either way
                    if exc_info[0] is StopIteration:
                        # pass return value up the stack
                        value, = exc_info[1].args or (None,)
                        exc_info = ()   # but not the error
                else:
                    if isinstance(rv, From):
                        self.stack.extend(rv.stack)  # Call subgenerator
                        value, exc_info, rv = None, (), None
                    else:
                        return rv   # it's a value to yield/return
            else:
                # Stack's empty, so exit w/current return value or error
                if exc_info:
                    raise exc_info[1]
                else:
                    return value
        finally:
            exc_info = ()   # don't let this create garbage

    def close(self):
        if self.stack:
            try:
                # There's probably a cleaner way to do this in Py 3, I just
                # don't know what it is off the top of my head...
                raise GeneratorExit
            except GeneratorExit:
                try:
                    self.throw(*sys.exc_info())
                except (GeneratorExit, StopIteration):
                    pass

Re: [Python-Dev] PEP 380 - return value question and prototype implementation (was Thoughts fresh after EuroPython)

2010-07-24 Thread P.J. Eby

At 08:21 PM 7/24/2010 -0700, Guido van Rossum wrote:

FWIW, the thing that was harder to debug when I tried to write some
code involving generators and a trampoline recently, was thinking of a
function as a generator without actually putting a yield in it
(because a particular version of a coroutine pattern didn't need to
block at all). Monocle uses a decorator to flag all coroutines which
fixes this up in the right way, which I think is clever, but I'm torn
about the need to flag every coroutine with a decorator -- Monocle
makes the decorator really short (@_o) because, as Raymond (not
Monocle's author but its advocate at EuroPython) said, you'll be
using this hundreds of times. Which I find disturbing in itself.


I haven't used Monocle, but in all the libraries I've written myself 
for this sort of thing (Trellis and peak.events), a decorator is only 
required for a generator that is a root task; everything else is 
just a normal generator.


For example, in Trellis you use @Task.factory to mark a function as 
spawning an independent task each time it's called, but subgenerator 
functions called within the task don't need to be marked, and in fact 
the "yield from" is just a "yield" -- the trampoline expects all 
yields of generators to be subgenerator calls.  (PEP 380 can't do 
this of course, since it also doubles as a sort of 'yield *' -- i.e., 
you may care about the yielded values.)


Note, though, that even in the sketch I just gave, you don't *really* 
need to decorate every function, just the ones that need to be called 
from *non*-decorated functions...  i.e. "root" coroutines.  Even 
then, you could *still* skip the decorator and replace:


  an_iter = decorated_root_function()

with:

  an_iter = From(undecorated_root_function())

and not need to decorate *anything*.





Re: [Python-Dev] PEP 382 progress: import hooks

2010-07-23 Thread P.J. Eby

At 11:57 AM 7/23/2010 +0100, Brett Cannon wrote:


On Thu, Jul 22, 2010 at 19:19, P.J. Eby p...@telecommunity.com wrote:


What does "is not a package" actually mean in that context?


The module is a module but not a package.


Um...  that's not any clearer.  Are you saying that a module of the 
same name takes precedence over a package?  Is that the current 
precedence as well?



Regarding load_module_with_path(), how does its specification differ 
from simply creating a module in sys.modules, setting its __path__, 
and then invoking the standard load_module()?  (i.e., is this 
method actually needed, since a correct PEP 302 loader *must* reuse 
an existing module object in sys.modules)



It must reuse the module itself, but a proper reload would reset 
__path__, as leaving it unchanged is not a proper resetting of the 
module object. So this method is needed in order to force the loader


Um, no.  Reloading doesn't reset the module contents, not even 
__path__.  Never has, from Python 2.2 through 2.7 -- even in 3.1.  At 
least, not for normal filesystem .py/.pyc files.  (I tested with 
'os', adding an extra 'foo' attribute, and also setting a __path__; 
both were unaffected by reload(), in all 7 Python versions.)


Perhaps you're saying this happens with zipfiles, or packages that 
already have a __path__, or...?





Am I correct in understanding that, as written, one would have to 
redefine __import__ to implement this in a library for older Python 
versions? Â Or is it implementable as a meta_path importer?



Redefine __import__ (unless Martin and I are missing something, but 
I tried to think of how to implement this using sys.meta_path and 
couldn't come up with a solution).


I'm thinking it *could* be done with a meta_path hook, but only by 
doubling the search length in the event that the search failed.  That 
seems a bit icky, but replacing the entire import process seems 
ickier (more code surface to maintain, more bug potential) in the 
case of supporting older Pythons.




Re: [Python-Dev] PEP 382 progress: import hooks

2010-07-22 Thread P.J. Eby

At 01:51 PM 7/22/2010 +0100, Martin v. Löwis wrote:

At EuroPython, I sat down with Brett and we propose an approach
how namespace packages get along with import hooks. I reshuffled
the order in which things get done a little bit, and added a
section that elaborates on the hooks.

Basically, a finder will need to support a find_path method,
return all .pth files, and a loader will need to support a
load_module_with_path method, to initialize __path__.

Please comment if you think that this needs further changes;


I'm not certain I understand it precisely.  There seem to be some 
ambiguities in the spec, e.g.:


"If fullname is not found, is not a package, or does not have any 
*.pth files, None must be returned."


What does "is not a package" actually mean in that context?  What 
happens if an empty list is returned -- does that mean the importer is 
saying, "this is a package, whether it has an __init__.py or not"?


As for the list of strings returned, is each string the entire 
contents of the .pth file?  Is it to be \n-separated, or is any 
universal-newlines-compatible string accepted?  Is there a particular 
order in which .pth file contents are to be returned?


Regarding load_module_with_path(), how does its specification differ 
from simply creating a module in sys.modules, setting its __path__, 
and then invoking the standard load_module()?  (i.e., is this method 
actually needed, since a correct PEP 302 loader *must* reuse an 
existing module object in sys.modules)




I'll hope to start implementing it soon.


Am I correct in understanding that, as written, one would have to 
redefine __import__ to implement this in a library for older Python 
versions?  Or is it implementable as a meta_path importer?




Regards,
Martin


Thanks for your work on this, I was just thinking about pinging to 
see how it was going.  ;-)


(I want setuptools 0.7 to be able to supply an add-on module for 
supporting this PEP in older Pythons, so that its current .pth hacks 
for implementing namespace packages can be dropped.)




Re: [Python-Dev] bytes / unicode

2010-06-27 Thread P.J. Eby

At 03:53 PM 6/27/2010 +1000, Nick Coghlan wrote:

We could talk about this even longer, but the most effective way
forward is going to be a patch that improves the URL parsing
situation.


Certainly, it's the only practical solution for the immediate problems in 3.2.

I only mentioned that I hate the idea because I'd be more 
comfortable if it was explicitly declared to be a temporary hack to 
work around the absence of a string coercion protocol, due to the 
moratorium on language changes.


But, since the moratorium *is* in effect, I'll try to make this my 
last post on string protocols for a while...  and maybe wait until 
I've looked at the code (str/bytes C implementations) in more detail 
and can make a more concrete proposal for what the protocol would be 
and how it would work.  (Not to mention closer to the end of the moratorium.)



There are a *very small* number of APIs where it is appropriate to 
be polymorphic


This is only true if you focus exclusively on bytes vs. unicode, 
rather than the general issue that it's currently impractical to pass 
*any* sort of user-defined string type through code that you don't 
directly control (stdlib or third-party).




The virtues of a separate poly_str type are that:
1. It can be simple and implemented in Python, dispatching to str or
bytes as appropriate (probably in the strings module)
2. No chance of impacting the performance of the core interpreter (as
builtins are not affected)


Note that adding a string coercion protocol isn't going to change 
core performance for existing cases, since any place where the 
protocol would be invoked would be a code branch that either throws 
an error or *already* falls back to some other protocol (e.g. the 
buffer protocol).




3. Lower impact if it turns out to have been a bad idea


How many protocols have been added that turned out to be bad 
ideas?  The only ones that have been removed in 3.x, IIRC, are 
three-way compare, slice-specific operations, and __coerce__...  and 
I'm going to miss __cmp__.  ;-)


However, IIUC, the reason these protocols were dropped isn't because 
they were bad ideas.  Rather, they're things that can be 
implemented in terms of a finer-grained protocol.  i.e., if you want 
__cmp__ or __getslice__ or __coerce__, you can always implement them 
via a mixin that converts the newer fine-grained protocols into 
invocations of the older protocol.  (As I plan to do for __cmp__ in 
the handful of places I use it.)


At the moment, however, this isn't possible for multi-string 
operations outside of __add__/__radd__ and comparison -- the coercion 
rules are hard-wired and can't be overridden by user-defined types.
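A toy sketch of what such an overridable coercion might look like (the `__coerce_str__` hook and the `Tainted`/`concat` names are invented here for illustration; no such protocol exists in Python):

```python
class Tainted(str):
    # hypothetical user-defined string type that wants to survive mixing
    def __coerce_str__(self, other):
        # the user-defined type decides what mixing with 'other' yields
        return Tainted(str(self) + str(other))

def concat(a, b):
    # a protocol-aware string operation defers to the operands' hook,
    # instead of hard-wiring the result type
    if hasattr(a, "__coerce_str__"):
        return a.__coerce_str__(b)
    if hasattr(b, "__coerce_str__"):
        return type(b)(str(a) + str(b))
    return a + b

assert isinstance(concat(Tainted("user:"), "input"), Tainted)
assert type(concat("plain", " str")) is str
```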




Re: [Python-Dev] bytes / unicode

2010-06-26 Thread P.J. Eby

At 12:42 PM 6/26/2010 +0900, Stephen J. Turnbull wrote:

What I'm saying here is that if bytes are the signal of validity, and
the stdlib functions preserve validity, then it's better to have the
stdlib functions object to unicode data as an argument.  Compare the
alternative: it returns a unicode object which might get passed around
for a while before one of your functions receives it and identifies it
as unvalidated data.


I still don't follow, since passing in bytes should return 
bytes.  Returning unicode would be an error, in the case of a 
polymorphic function (per Guido).




But you agree that there are better mechanisms for validation
(although not available in Python yet), so I don't see this as an
potential obstacle to polymorphism now.


Nope.  I'm just saying that, given two bytestrings to url-join or 
path join or whatever, a polymorph should hand back a 
bytestring.  This seems pretty uncontroversial.
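Python 3's path-joining functions already behave this way, for what it's worth:

```python
import posixpath

# bytes in -> bytes out; str in -> str out
assert posixpath.join(b"usr", b"lib") == b"usr/lib"
assert posixpath.join("usr", "lib") == "usr/lib"
```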




  What I want is for the stdlib to create stringlike objects of a
  type determined by the types of the inputs --

In general this is a hard problem, though.  Polymorphism, OK, one-way
tainting OK, but in general combining related types is pretty
arbitrary, and as in the encoded-bytes case, the result type often
varies depending on expectations of callers, not the types of the
data.


But the caller can enforce those expectations by passing in arguments 
whose types do what they want in such cases, as long as the string 
literals used by the function don't get to override the relevant 
parts of the string protocol(s).


The idea that I'm proposing is that the basic string and byte types 
should defer to user-defined string types for mixed type 
operations, so that polymorphism of string-manipulation functions is 
the *default* case, rather than a *special* case.  This makes 
tainting easier to implement, as well as optimizing and other special 
cases (like my source string w/file and line info, or a string with 
font/formatting attributes).








Re: [Python-Dev] bytes / unicode

2010-06-26 Thread P.J. Eby

At 12:43 PM 6/27/2010 +1000, Nick Coghlan wrote:

While full support for third party strings and
byte sequence implementations is an interesting idea, I think it's
overkill for the specific problem of making it easier to write
str/bytes agnostic functions for tasks like URL parsing.


OTOH, to write your partial implementation is almost as complex - it 
still must take into account joining and formatting, and so by that 
point, you've just proposed a new protocol for coercion...  so why 
not just make the coercion protocol explicit in the first place, 
rather than hardwiring a third type's worth of special cases?


Remember, bytes and strings already have to detect mixed-type 
operations.  If there was an API for that, then the hardcoded special 
cases would just be replaced, or supplemented with type slot checks 
and calls after the special cases.


To put it another way, if you already have two types special-casing 
their interactions with each other, then rather than add a *third* 
type to that mix, maybe it's time to have a protocol instead, so that 
the types that care can do the special-casing themselves, and you 
generalize to N user types.


(Btw, those who are saying that the resulting potential for N*N 
interaction makes the feature unworkable seem to be overlooking 
metaclasses and custom numeric types -- two Python features that in 
principle have the exact same problem, when you use them beyond a 
certain scope.  At least with those features, though, you can 
generally mix your user-defined metaclasses or numeric types with the 
Python-supplied basic ones and call arbitrary Python functions on 
them, without as much heartbreak as you'll get with a from-scratch 
stringlike object.)


All that having been said, a new protocol probably falls under the 
heading of the language moratorium, unless it can be considered new 
methods on builtins?  (But that seems like a stretch even to me.)


I just hate the idea that functions taking strings should have to be 
*rewritten* to be explicitly type-agnostic.  It seems *so* 
un-Pythonic...  like if all the bitmasking functions you'd ever 
written using 32-bit int constants had to be rewritten just because 
we added longs to the language, and you had to upcast them to be 
compatible or something.  Sounds too much like C or Java or some 
other non-Python language, where dynamism and polymorphy are the 
special case, instead of the general rule. 




Re: [Python-Dev] bytes / unicode

2010-06-25 Thread P.J. Eby

At 04:49 PM 6/25/2010 +0900, Stephen J. Turnbull wrote:

P.J. Eby writes:

  This doesn't have to be in the functions; it can be in the
  *types*.  Mixed-type string operations have to do type checking and
  upcasting already, but if the protocol were open, you could make an
  encoded-bytes type that would handle the error checking.

Don't you realize that encoded-bytes is equivalent to use of a very
limited profile of ISO 2022 coding extensions?  Such as Emacs/MULE
internal encoding or TRON code?  It has been tried.  It does not work.

I understand how types can do such checking; my point is that the
encoded-bytes type doesn't have enough information to do it in the
cases where you think it is better than converting to str.  There are
*no useful operations* that can be done on two encoded-bytes with
different encodings unless you know the ultimate target codec.


I do know the ultimate target codec -- that's the point.

IOW, I want to be able to do to all my operations by passing 
target-encoded strings to polymorphic functions.  Then, the moment 
something creeps in that won't go to the target codec, I'll be able 
to track down the hole in the legacy code that's letting bad data creep in.




  The
only sensible way to define the concatenation of ('ascii', 'English')
with ('euc-jp','日本語') is something like ('ascii', 'English',
'euc-jp','日本語'), and *not* ('euc-jp','English日本語'), because you
don't know that the ultimate target codec is 'euc-jp'-compatible.
Worse, you need to build in all the information about which codecs are
mutually compatible into the encoded-bytes type.  For example, if the
ultimate target is known to be 'shift_jis', it's trivially compatible
with 'ascii' and 'euc-jp' requires a conversion, but latin-9 you can't
have.


The interaction won't be with other encoded bytes, it'll be with 
other *unicode* strings.  Ones coming from other code, and literals 
embedded in the stdlib.





No, the problem is not with the Unicode, it is with the code that
allows characters not encodable with the target codec.


And which code that is, precisely, is the thing that may be very 
difficult to find, unless I can identify it at the first point it 
enters (and corrupts) my output data.  When dealing with a large code 
base, this may be a nontrivial problem.




Re: [Python-Dev] bytes / unicode

2010-06-25 Thread P.J. Eby

At 01:18 AM 6/26/2010 +0900, Stephen J. Turnbull wrote:

It seems to me what is wanted here is something like Perl's taint
mechanism, for *both* kinds of strings.  Am I missing something?


You could certainly view it as a kind of tainting.  The part where 
the type would be bytes-based is indeed somewhat incidental to the 
actual use case -- it's just that if you already have the bytes, and 
all you want to do is tag them (e.g. the WSGI headers case), the 
extra encoding step seems pointless.


A string coercion protocol (that would be used by .join(), .format(), 
__contains__, __mod__, etc.) would allow you to do whatever sort of 
tainted-string or tainted-bytes implementations one might wish to 
have.  I suppose that tainting user inputs (as in Perl) would be just 
as useful of an application of the same coercion protocol.


Actually, I have another use case for this custom string coercion, 
which is that I once wrote a string subclass whose purpose was to 
track the original file and line number of some text.  Even though 
only my code was manipulating the strings, it was very difficult to 
get the tainting to work correctly without extreme care as to the 
string methods used.  (For example, I had to use string addition 
rather than %-formatting.)
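A minimal illustration of why (the SourceString name and attributes are made up for this sketch): str methods return plain str, so only the operations you explicitly override preserve the subclass and its metadata.

```python
class SourceString(str):
    def __new__(cls, text, filename="?", line=0):
        self = super().__new__(cls, text)
        self.filename, self.line = filename, line
        return self

    def __add__(self, other):
        # explicitly carry the source info through concatenation
        return SourceString(str(self) + str(other), self.filename, self.line)

s = SourceString("x = 1", "config.py", 10)
assert isinstance(s + "\n", SourceString)        # overridden op: info kept
assert not isinstance("%s\n" % s, SourceString)  # %-formatting drops it
```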




But with your architecture, it seems to me that you actually don't
want polymorphic functions in the stdlib.  You want the stdlib
functions to be bytes-oriented if and only if they are reliable.  (This
is what I was saying to Guido elsewhere.)


I'm not sure I follow you.  What I want is for the stdlib to create 
stringlike objects of a type determined by the types of the inputs -- 
where the logic for deciding this coercion can be controlled by the 
input objects' types, rather than putting this in the hands of the 
stdlib function.


And of course, this applies to non-stdlib functions, too -- anything 
that simply manipulates user-defined string classes, should allow the 
user-defined classes to determine the coercion of the result.



BTW, this was a little unclear to me:

  [Collisions will] be with other *unicode* strings.  Ones coming
  from other code, and literals embedded in the stdlib.

What about the literals in the stdlib?  Are you saying they contain
invalid code points for your known output encoding?  Or are you saying
that with non-polymorphic unicode stdlib, you get lots of false
positives when combining with your validated bytes?


No, I mean that the current string coercion rules cause everything to 
be converted to unicode, thereby discarding the tainting information, 
so to speak.  This applies equally to other tainting use cases, and 
other uses for custom stringlike objects.




Re: [Python-Dev] bytes / unicode

2010-06-24 Thread P.J. Eby

At 05:12 PM 6/24/2010 +0900, Stephen J. Turnbull wrote:

Guido van Rossum writes:

  For example: how we can make the suite of functions used for URL
  processing more polymorphic, so that each developer can choose for
  herself how URLs need to be treated in her application.

While you have come down on the side of polymorphism (as opposed to
separate functions), I'm a little nervous about it.  Specifically,
Philip Eby expressed a desire for earlier type errors, while
polymorphism seems to ensure that you'll need to Look Before You Leap
to get early error detection.


This doesn't have to be in the functions; it can be in the 
*types*.  Mixed-type string operations have to do type checking and 
upcasting already, but if the protocol were open, you could make an 
encoded-bytes type that would handle the error checking.


(Btw, in some earlier emails, Stephen, you implied that this could be 
fixed with codecs -- but it can't, because the problem isn't with the 
bytes containing invalid Unicode, it's with the Unicode containing 
invalid bytes -- i.e., characters that can't be encoded to the 
ultimate codec target.)
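That is, the failure happens on the encoding side, not the decoding side:

```python
text = "caf\u00e9"           # perfectly valid Unicode...
try:
    text.encode("ascii")     # ...but not encodable to the target codec
    failed = False
except UnicodeEncodeError:
    failed = True
assert failed
```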




Re: [Python-Dev] bytes / unicode

2010-06-23 Thread P.J. Eby

At 08:34 PM 6/22/2010 -0400, Glyph Lefkowitz wrote:

I suspect the practical problem here is that there's no CharacterString ABC


That, and the absence of a string coercion protocol so that mixing 
your custom string with standard strings will do the right thing for 
your intended use.




Re: [Python-Dev] bytes / unicode

2010-06-22 Thread P.J. Eby

At 07:41 AM 6/23/2010 +1000, Nick Coghlan wrote:

Then my example above could be made polymorphic (for ASCII compatible
encodings) by writing:

  [x for x in seq if x.endswith(x.coerce(b))]

I'm trying to see downsides to this idea, and I'm not really seeing
any (well, other than 2.7 being almost out the door and the fact we'd
have to grant ourselves an exception to the language moratorium)


Notice, however, that if multi-string operations used a coercion 
protocol (they currently have to do type checks already for 
byte/unicode mixes), then you could make the entire stdlib 
polymorphic by default, even for other kinds of strings that don't exist yet.


If you invent a new numeric type, generally speaking you can pass it 
to existing stdlib functions taking numbers, as long as it implements 
the appropriate protocols.  Why not do the same for strings?
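The numeric analogy, concretely: a user-defined number that implements the arithmetic protocols works with existing stdlib functions unchanged (the Meters type here is just for illustration):

```python
class Meters:
    def __init__(self, n):
        self.n = n

    def __add__(self, other):
        other_n = other.n if isinstance(other, Meters) else other
        return Meters(self.n + other_n)

    __radd__ = __add__   # lets sum() start from its default 0

# sum() knows nothing about Meters, yet the protocol makes it work
total = sum([Meters(1), Meters(2), Meters(3)])
assert isinstance(total, Meters) and total.n == 6
```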




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 10:20 PM 6/21/2010 +1000, Nick Coghlan wrote:

For the idea of avoiding excess copying of bytes through multiple
encoding/decoding calls... isn't that meant to be handled at an
architectural level (i.e. decode once on the way in, encode once on
the way out)? Optimising the single-byte codec case by minimising data
copying (possibly through creative use of PEP 3118) may be something
that we want to look at eventually, but it strikes me as something of
a premature optimisation at this point in time (i.e. the old adage
"first get it working, then get it working fast").


The issue is, I'd like to have an idempotent incantation that I can 
use to make the inputs and outputs to stdlib functions behave in a 
type-safe manner with respect to bytes, in cases where bytes are 
really what I want operated on.


Note too that this is an argument for symmetry in wrapping the inputs 
and outputs, so that the code doesn't have to know what it's dealing with!


After all, right now, if a stdlib function might return bytes or 
unicode depending on runtime conditions, I can't even hardcode an 
.encode() call -- it would fail if the return type is bytes.


This basically goes against the "tell, don't ask" pattern, and the 
Pythonically idempotent approach.  That is, Python builtins normally 
return you back the same thing if it's already what you want -- 
int(someInt) -> someInt, iter(someIter) -> someIter, etc.
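For instance (CPython behavior):

```python
x = 42
assert int(x) is x     # an exact int passes through unchanged
it = iter([1, 2, 3])
assert iter(it) is it  # an iterator passes through unchanged
```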


Since this incantation may need to be used often, and in places that 
are not known to me in advance, I would like it to not impose new 
overhead in unexpected places.  (i.e., the usual argument brought 
against making changes to the 'list' type that would change certain 
operations from O(1) to O(log something)).


It's more about predictability, and having One *Obvious* Way To Do 
It, as opposed to several ways, which you need to think carefully 
about and restructure your entire architecture around if 
necessary.  One obvious way means I can focus on the mechanical 
effort of porting *first*, without having to think.


So, the performance issue isn't really about performance *per se*, so 
much as about the mental UI of the language.  You could just as 
easily lie and tell me that your bstr implementation is O(1), and I 
would probably be happy and never notice, because the issue was never 
really about performance as such, but about having to *think* about 
it.  (i.e., breaking flow.)


Really, the entire issue can presumably be dealt with by some series 
of incantations - it's just code after all.  But having to sit and 
think about *every* situation where I'm dealing with bytes/unicode 
distinctions seems like a torture compared to being able to say, 
"okay, so when dealing with this sort of API and this sort of data, 
this is the One Obvious Way to do the conversions."


It's One Obvious Way that I want, but some people seem to be arguing 
that the One Obvious Way is to Think Carefully About It Every Time -- 
and that seems to violate the Obvious part, IMO.  ;-)




Re: [Python-Dev] bytes / unicode

2010-06-21 Thread P.J. Eby

At 10:51 PM 6/21/2010 +1000, Nick Coghlan wrote:

It may be that there are places where we need to rewrite standard
library algorithms to be bytes/str neutral (e.g. by using length one
slices instead of indexing). It may be that there are more APIs that
need to grow encoding keyword arguments that they then pass on to
the functions they call or use to convert str arguments to bytes (or
vice-versa). But without people trying to port affected libraries and
reporting bugs when they find issues, the situation isn't going to
improve.

Now, if these bugs are already being reported against 3.1 and just
aren't getting fixed, that's a completely different story...


The overall impression, though, is that this isn't really a step 
forward.  Now, bytes are the special case instead of unicode, but 
that special case isn't actually handled any better by the stdlib - 
in fact, it's arguably worse.  And, the burden of addressing this 
seems to have been shifted from the people who made the change, to 
the people who are going to use it.  But those people are not 
necessarily in a position to tell you anything more than, "give me 
something that works with bytes."


What I can tell you is that before, since string constants in the 
stdlib were ascii bytes, and transparently promoted to unicode, 
stdlib behavior was *predictable* in the presence of special cases: 
you got back either bytes or unicode, but either way, you could 
idempotently upgrade the result to unicode, or just pass it on.  APIs 
were "str safe, unicode aware."  If you passed in bytes, you weren't 
going to get unicode without a warning, and if you passed in unicode, 
it'd work and you'd get unicode back.


Now, the APIs are neither safe nor aware -- if you pass bytes in, you 
get unpredictable results back.


Ironically, it almost *would* have been better if bytes simply didn't 
work as strings at all, *ever*, but if you could wrap them with a 
bstr() to *treat* them as text.  You could still have restrictions on 
combining them, as long as it was a restriction on the unicode you 
mixed with them.  That is, if you could combine a bstr and a str if 
the *str* was restricted to ASCII.


If we had the Python 3 design discussions to do over again, I think I 
would now have stuck with the position of not letting bytes be 
string-compatible at all, and instead proposed an explicit bstr() 
wrapper/adapter to use them as strings, that would (in that case) 
force coercion in the direction of bytes rather than strings.  (And 
bstr need not have been a builtin - it could have been something you 
import, to help discourage casual usage.)
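The bstr adapter described above was never implemented; as a rough sketch of what was meant (the class name, constructor, and coercion details here are all hypothetical), it might look like this -- combining with a str coerces toward bytes, and fails loudly unless the str side is pure ASCII:

```python
# Hypothetical bstr() adapter: wraps bytes so they can be *treated* as
# text, with coercion forced in the direction of bytes.  Not a real
# stdlib type -- a sketch of the proposal only.

class bstr:
    """Treat a bytes object as text; mixed operations coerce to bytes."""

    def __init__(self, data):
        self.data = bytes(data)

    def __add__(self, other):
        if isinstance(other, str):
            # The str side must be restricted to ASCII, per the proposal;
            # anything else raises UnicodeEncodeError instead of guessing.
            other = other.encode('ascii')
        elif isinstance(other, bstr):
            other = other.data
        return bstr(self.data + other)

    def __repr__(self):
        return 'bstr(%r)' % (self.data,)

print(bstr(b'abc') + 'def')      # ASCII-only str combines fine
try:
    bstr(b'abc') + '\xe9'        # non-ASCII str is rejected loudly
except UnicodeEncodeError:
    print('non-ASCII str rejected')
```

The point of the restriction is that the *unicode* side, not the bytes side, carries the burden of proving it is safe to mix.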


Might this approach lead to some people doing things wrong in the 
case of porting?  Sure.  But there'd be little reason to use it in 
new code that didn't have a real need for bytestring manipulation.


It might've been a better balance between practicality and purity, in 
that it keeps the language pure, while offering a practical way to 
deal with things in bytes if you really need to.  And, bytes wouldn't 
silently succeed *some* of the time, leading to a trap.  An easy 
inconsistency is worse than a bit of uniform chicken-waving.


Is it too late to make that tradeoff?  Probably.  Certainly it's not 
practical to *implement* outside the language core, and removing 
string methods would fux0r anybody whose currently-ported code relies 
on bytes objects having string-like methods.




Re: [Python-Dev] bytes / unicode

2010-06-21 Thread P.J. Eby

At 01:08 AM 6/22/2010 +0900, Stephen J. Turnbull wrote:

But if you need that everywhere, what's so hard about

def urljoin_wrapper (base, subdir):
return urljoin(str(base, 'latin-1'), subdir).encode('latin-1')

Now, note how that pattern fails as soon as you want to use
non-ISO-8859-1 languages for subdir names.
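Concretely (using the Python 3 location of urljoin, and an illustrative Japanese subdir name), the latin-1 round-trip pattern works only as long as the result happens to be representable in ISO-8859-1:

```python
# Demonstrating the failure mode noted above: the latin-1 round-trip
# wrapper breaks as soon as the subdir name isn't representable in
# ISO-8859-1.  The wrapper is Stephen's pattern, verbatim.
from urllib.parse import urljoin

def urljoin_wrapper(base, subdir):
    return urljoin(str(base, 'latin-1'), subdir).encode('latin-1')

print(urljoin_wrapper(b'http://example.com/a/', 'b'))  # fine for ASCII

try:
    urljoin_wrapper(b'http://example.com/a/', '\u65e5\u672c')  # Japanese
except UnicodeEncodeError:
    print('non-latin-1 subdir cannot be encoded back')
```

So every component author in the stack has to know not only to write the wrapper, but also when the latin-1 trick is and isn't valid.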


Bear in mind that the use cases I'm talking about here are WSGI 
stacks with components written by multiple authors -- each of whom 
may have to define that function, and still get it right.


Sure, there are some things that could go in wsgiref in the 
stdlib.  However, as of this moment, there's only a very uneasy rough 
consensus in Web-Sig as to how the heck WSGI should actually *work* 
on Python 3, because of issues like these.


That makes it tough to actually say what should happen in the stdlib 
-- e.g., which things should be classed as stdlib bugs, which things 
should be worked around with wrappers or new functions, etc.




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 11:43 AM 6/21/2010 -0400, Barry Warsaw wrote:

On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:
Something that may make sense to ease the porting process is for some
of these on the boundary I/O related string manipulation functions
(such as os.path.join) to grow encoding keyword-only arguments. The
recommended approach would be to provide all strings, but bytes could
also be accepted if an encoding was specified. (If you want to mix
encodings - tough, do the decoding yourself).

This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
for it.

Would it make sense to have encoding-carrying bytes and str types?


It's not a stupid idea, and could potentially work.  It also might 
have a better chance of being able to actually be *implemented* in 
3.x than my idea.



Basically, I'm thinking of types (maybe even the current ones) that carry
around a .encoding attribute so that they can be automatically encoded and
decoded where necessary.  This at least would simplify APIs that need to do
the conversion.


I'm not really sure how much use the encoding is on a unicode object 
- what would it actually mean?


Hm. I suppose it would effectively mean "this string can be 
represented in this encoding" -- which is useful, in that you could 
fail operations when combining with bytes of a different encoding.


Hm... no, in that case you should just encode the string to the 
bytes' encoding, and let that throw an error if it fails.  So, 
really, there's no reason for a string to know its encoding.  All you 
need is the bytes type to have an encoding attribute, and when doing 
mixed-type operations between bytes and strings, coerce to *bytes of 
the same encoding*.


However, if .encoding is None, then coercion would follow the same 
rules as now -- i.e., convert the bytes to unicode, assuming an ascii 
encoding.  (This would be different than setting an encoding of 
'ascii', because in that case, it means you want cross-type 
operations to result in ascii bytes, rather than a unicode string, 
and to fail if the unicode part can't be encoded appropriately.  The 
'None' setting is effectively a nod to compatibility with prior 3.x 
versions, since I assume we can't just throw out the old coercion behavior.)


Then, a few more changes to the bytes type would round out the implementation:

* Allow .decode() to not specify an encoding, unless .encoding is None

* Add back in the missing string methods (e.g. .encode(), since you 
can transparently upgrade to a string)


* Smart __str__, as shown in your proposal.
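A minimal sketch of the coercion rule described above, using a hypothetical encoding-carrying bytes type (the class name `ebytes` and its exact API are assumptions, not anything in the stdlib): mixed str/bytes operations coerce to *bytes of the same encoding*, failing if the str side can't be encoded, and .decode() needs no argument because the encoding is known.

```python
# Hypothetical encoding-carrying bytes type, per the discussion above.
# Mixing with str coerces toward bytes of self.encoding; a str that
# can't be encoded that way raises UnicodeEncodeError.

class ebytes:
    def __init__(self, data, encoding):
        self.data = bytes(data)
        self.encoding = encoding

    def __add__(self, other):
        if isinstance(other, str):
            # Coerce the str side into this object's encoding.
            other = other.encode(self.encoding)
        elif isinstance(other, ebytes):
            if other.encoding != self.encoding:
                raise ValueError('mixed encodings')  # "tough, do it yourself"
            other = other.data
        return ebytes(self.data + other, self.encoding)

    def decode(self):
        # No encoding argument needed -- the object already knows it.
        return self.data.decode(self.encoding)

s = ebytes('日本'.encode('euc-jp'), 'euc-jp') + '語'
print(s.decode())  # 日本語
```

The None-encoding compatibility case and the smart __str__ would layer on top of the same mechanism.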



Would it be feasible?  Dunno.


Probably, although it might mean adding back in special cases that 
were previously taken out, and a few new ones.




  Would it help ease the bytes/str confusion?  Dunno.


Not sure what confusion you mean -- Web-SIG and I at least are not 
confused about the difference between bytes and str, or we wouldn't 
be having an issue.  ;-)  Or maybe you mean the stdlib's API 
confusion?  In which case, yes, definitely!




  But I think it would help make APIs easier to design and use because
it would cut down on the encoding-keyword function signature infection.


Not only that, but I believe it would also retroactively make the 
stdlib's implementation of those APIs correct again, and give us 
One Obvious Way to work with bytes of a known encoding, while 
constraining any unicode that gets combined with those bytes to be 
validly encodable.  It also gives you an idempotent constructor for 
bytes of a specified encoding, that can take either a bytes of 
unspecified encoding, a bytes of the correct encoding, or a string 
that can be encoded as such.


In short, +1.  (I wish it were possible to go back and make bytes 
non-strings and have only this ebytes or bstr or whatever type have 
string methods, but I'm pretty sure that ship has already sailed.)




Re: [Python-Dev] email package status in 3.X

2010-06-21 Thread P.J. Eby

At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote:

What do you think of making the encoding attribute a mandatory part of
creating an ebyte object?  (ex: ``eb = ebytes(b, 'euc-jp')``).


As long as the coercion rules force str+ebytes (or str % ebytes, 
ebytes % str, etc.) to result in another ebytes (and fail if the str 
can't be encoded in the ebytes' encoding), I'm personally fine with 
it, although I really like the idea of tacking the encoding to bytes 
objects in the first place.


OTOH, one potential problem with having the encoding on the bytes 
object rather than the ebytes object is that then you can't easily 
take bytes from a socket and then say what encoding they are, without 
interfering with the sockets API (or whatever other place you get the 
bytes from).


So, on balance, making ebytes a separate type (perhaps one that's 
just a pointer to the bytes and a pointer to the encoding) would 
indeed make more sense.  It having different coercion rules for 
interacting with strings would make more sense too in that 
case.  (The ideal, of course, would still be to not let bytes objects 
be stringlike at all, with only ebytes acting string-like.  That way, 
you'd be forced to be explicit about your encoding when working with 
bytes, but all you'd need to do was make an ebytes call.)




Re: [Python-Dev] bytes / unicode

2010-06-21 Thread P.J. Eby

At 05:49 PM 6/21/2010 +0100, Michael Foord wrote:
Why is your proposed bstr wrapper not practical to implement outside 
the core and use in your own libraries and frameworks?


__contains__ doesn't have a converse operation, so you can't code a 
type that works around this (Python 3.1 shown):


>>> from os.path import join
>>> join(b'x','y')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python31\lib\ntpath.py", line 161, in join
    if b[:1] in seps:
TypeError: Type str doesn't support the buffer API
>>> join('y',b'x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python31\lib\ntpath.py", line 161, in join
    if b[:1] in seps:
TypeError: 'in <string>' requires string as left operand, not bytes

IOW, only one of these two cases can be worked around by using a bstr 
(or ebytes) that doesn't have support from the core string type.


I'm not sure if the in operator is the only case where implementing 
such a type would fail, but it's the most obvious one.  String 
formatting, of both the % and .format() varieties is 
another.  (__rmod__ doesn't help if your bytes object is one of 
several data items in a tuple or dict -- the common case for % formatting.)
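The dispatch asymmetry can be shown directly (the class name here is an illustrative stand-in, not a real proposal): `x in seps` consults only `type(seps).__contains__`, and since there is no reflected "__rcontains__" hook, str never asks the wrapper when the wrapper is the *left* operand of `in`.

```python
# Why a bstr/ebytes-style wrapper can't fix the `in` case above:
# membership tests dispatch only to the right-hand (container) operand,
# so the wrapper's hooks are irrelevant when it is the member.

class Wrapped:
    """Stand-in for a string-like wrapper around bytes."""
    def __contains__(self, item):
        return True  # runs only when the wrapper is the container

w = Wrapped()
print('x' in w)   # True: the wrapper's own __contains__ is consulted
try:
    w in '\\/'    # str.__contains__ runs; the wrapper has no say
except TypeError:
    print('no converse hook for the left operand')
```

That is exactly the `b[:1] in seps` line in ntpath.py: seps is a plain str, so no user-defined left-operand type can intercept it.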




Re: [Python-Dev] bytes / unicode

2010-06-21 Thread P.J. Eby

At 12:56 PM 6/21/2010 -0400, Toshio Kuratomi wrote:

One comment here -- you can also have uri's that aren't decodable into their
true textual meaning using a single encoding.

Apache will happily serve out uris that have utf-8, shift-jis, and euc-jp
components inside of their path but the textual representation that 
was intended

will be garbled (or be represented by escaped byte sequences).  For that
matter, apache will serve requests that have no true textual representation
as it is working on the byte level rather than the character level.

So a complete solution really should allow the programmer to pass in uris as
bytes when the programmer knows that they need it.


ebytes(somebytes, 'garbage'), perhaps, which would be like ascii, but 
where combining with non-garbage would result in another 'garbage' ebytes?



