Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
At 02:02 PM 8/11/2011 -0400, Glyph Lefkowitz wrote:
> Rather than a one-by-one ad-hoc consideration of which attribute should be set to None or empty strings or what have you, I'd really like to see a discussion in the PEP saying what a package really is vs. what a module is, and what one can reasonably expect from it from an API and tooling perspective.

The assumption I've been working from is the only guarantee I've ever seen the Python docs give: i.e., that a package is a module object with a __path__ attribute.

Modules aren't even required to have a __file__ attribute -- builtin modules don't, for example. (And the contents of __file__ are not required to have any particular semantics: PEP 302 notes that it can be a dummy value like "<frozen>", for example.)

Technically, btw, PEP 302 requires __file__ to be a string, so making __file__ = None will be a backwards-incompatible change. But any code that walks modules in sys.modules is going to break today if it expects a __file__ attribute to exist, because 'sys' itself doesn't have one!

So, my leaning is towards leaving off __file__, since today's code already has to deal with it being nonexistent if it's working with arbitrary modules, and that'll produce breakage sooner rather than later -- the twisted.python.modules code, for example, would fail with a loud AttributeError, rather than going on to silently assume that a module with a dummy __file__ isn't a package. (Which is NOT a valid assumption *now*, btw, as I'll explain below.)

Anyway, if you have any suggestions for verbiage that should be added to the PEP to clarify these assumptions, I'd be happy to add them. However, I think that the real problem you're encountering at the moment has more to do with making assumptions about the Python import ecosystem that aren't valid today, and haven't been valid since at least the introduction of PEP 302, if not earlier import hook systems as well.

> But the whole pure virtual mechanism here seems to pile even more inconsistency on top of an already irritatingly inconsistent import mechanism. I was reasonably happy with my attempt to paper over PEP 302's weirdnesses from a user perspective:
>
> http://twistedmatrix.com/documents/11.0.0/api/twisted.python.modules.html
>
> (or https://launchpad.net/modules if you are not a Twisted user)
>
> Users of this API can traverse the module hierarchy with certain expectations; each module or package would have .pathEntry and .filePath attributes, each of which would refer to the appropriate place. Of course __path__ complicates things a bit, but so it goes.

I don't mean to be critical, and no doubt what you've written works fine for your current requirements, but on my quick attempt to skim through the code I found many things which appear to me to be incompatible with PEP 302. That is, the above code hardcodes a variety of assumptions about the import system that haven't been true since Python 2.3. (For example, it assumes that the contents of sys.path strings have inspectable semantics, that the contents of __file__ can tell you things about the module-ness or package-ness of a module object, etc.)
If you want to fully support PEP 302, you might want to consider making this a wrapper over the corresponding pkgutil APIs (available since Python 2.5) that do roughly the same things, but which delegate all path string inspection to importer objects and allow extensible delegation for importers that don't support the optional methods involved. (Of course, if the pkgutil APIs are missing something you need, perhaps you could propose additions.)

> Now it seems like pure virtual packages are going to introduce a new type of special case into the hierarchy which have neither .pathEntry nor .filePath objects.

The problem is that your API's notion that these things exist as coherent concepts was never really a valid assumption in the first place. .pth files and namespace packages already meant that the idea of a package coming from a single path entry made no sense. And namespace packages installed by setuptools' system packaging mode *don't have a __file__ attribute* today... heck, they don't have __init__ modules, either.

So, adding virtual packages isn't actually going to change anything, except perhaps by making these scenarios more common.
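For what it's worth, here's a minimal sketch (mine, not code from the PEP or from Twisted) of walking sys.modules while relying *only* on the documented guarantee -- a package is a module with a __path__ -- and treating __file__ as strictly optional:

    import sys

    def is_package(module):
        # The only documented invariant: packages have a __path__.
        return hasattr(module, '__path__')

    for name, mod in sorted(sys.modules.items()):
        if mod is None:
            continue  # sys.modules can contain None placeholders
        kind = 'package' if is_package(mod) else 'module'
        where = getattr(mod, '__file__', None)  # e.g. 'sys' has no __file__
        print('%-8s %s (%r)' % (kind, name, where))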
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
> Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute or anything gross like that to improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be.

The flip side of that is that you can't always know whether a directory is a virtual package without deep inspection: one consequence of PEP 402 is that any directory that contains a Python module (of whatever type), however deeply nested, will be a valid package name. So, you can't rule out that a given directory *might* be a package, without walking its entire reachable subtree. (Within the subset of directory names that are valid Python identifiers, of course.)

However, you *can* quickly tell that a directory *might* be a package or is *probably* one: if it contains modules, or has the same name as an already-discovered module, it's a pretty safe bet that you can flag it as such.

In any case, you probably should *not* do the building of a virtual path yourself; the protocols and APIs added by PEP 402 should allow you to simply ask for the path to be constructed on your behalf. Otherwise, you are going to be back in the same business of second-guessing arbitrary importer backends again! (E.g., note that PEP 402 does not say virtual package subpaths must be filesystem or zipfile subdirectories of their parents -- an importer could just as easily allow you to treat subdirectories named 'twisted.python' as part of a virtual package with that name!)

Anyway, pkgutil defines some extra methods that importers can implement to support module-walking, and part of the PEP 402 implementation should be to make this support virtual packages as well.

> This code still needs to support Python 2.4, but I will make a note of this for future reference.

A suggestion: just take the pkgutil code and bundle it for Python 2.4 as something._pkgutil. There's very little about it that's 2.5+ specific, at least when I wrote the bits that do the module walking.

Of course, the main disadvantage of pkgutil for your purposes is that it currently requires packages to be imported in order to walk their child modules. (IIRC, it does *not*, however, require them to be imported in order to discover their existence.)

> In that case, I guess it's a good thing; these bugs should be dealt with. Thanks for pointing them out. My opinion of PEP 402 has been completely reversed - although I'd still like to see a section about the module system from a library/tools author point of view rather than a time-traveling perl user's narrative :).

LOL. If you will propose the wording you'd like to see, I'll be happy to check it for any current-and-or-future incorrect assumptions. ;-)
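For reference, here's roughly what the existing pkgutil module-walking entry points look like in use (2.5+; discovery alone doesn't import anything, but recursive walking imports each package to get at its __path__):

    import pkgutil

    # Discovery only: no imports performed, just (importer, name, ispkg) tuples.
    for importer, name, ispkg in pkgutil.iter_modules():
        print('%s: %s' % ('package' if ispkg else 'module', name))

    # Recursive walking: each package gets imported so its __path__ can be
    # searched; onerror= keeps one broken package from aborting the walk.
    for importer, name, ispkg in pkgutil.walk_packages(onerror=lambda name: None):
        pass  # names include dotted subpaths, e.g. 'json.decoder'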
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
At 05:03 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
> Are there any rules about passing invalid identifiers to __import__ though, or is that just less likely? :)

I suppose you have a point there. ;-)

> I still like the idea of a 'marker' file. It would be great if there were a new marker like __package__.py.

Having any required marker file makes separately-installable portions of a package impossible, since it would then be in conflict at installation time. The (semi-)competing proposal, PEP 382, is based on allowing each portion to have a differently-named marker; we came up with PEP 402 as a way to get rid of the need for any marker files (not to mention the bikeshedding involved.)

> What do you mean "building of a virtual path"?

Constructing the __path__-to-be of a not-yet-imported virtual package. The PEP defines a protocol for constructing this, by asking the importer objects to provide __path__ entries, and it does not require anything to be imported. So there's no reason to re-implement the algorithm yourself.

> The more that this can focus on module-walking without executing code, the happier I'll be :).

Virtual packages actually improve on this situation, in that a virtual path can be computed without the need to import the package. (Assuming a submodule or subpackage doesn't munge the __path__, of course.)
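To sketch what "asking" might look like -- keeping in mind that get_subpath() is the *proposed* optional importer method from the PEP, not an existing API, so this is illustrative only:

    import pkgutil, sys

    def virtual_path_for(fullname, parent_path=None):
        # Build the __path__-to-be for 'fullname' without importing it,
        # by querying each importer on the relevant search path.
        path = []
        for entry in (sys.path if parent_path is None else parent_path):
            importer = pkgutil.get_importer(entry)
            get_subpath = getattr(importer, 'get_subpath', None)  # proposed method
            if get_subpath is not None:
                subpath = get_subpath(fullname)
                if subpath is not None:
                    path.append(subpath)
        return path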
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote:
> Hi, I've read PEP 402 and would like to offer comments.

Thanks.

> Minor: I would reserve "packaging" for packaging/distribution/installation/deployment matters, not Python modules. I suggest "Python package semantics".

Changing to "Python package import semantics" to hopefully be even clearer. ;-)

(Nitpick: I was somewhat intentionally ambiguous because we are talking here about how a package is physically implemented in the filesystem, and that actually *is* kind of a packaging issue. But it's not necessarily a *useful* intentional ambiguity, so I've no problem with removing it.)

> Minor: In the UNIX world, or with version control tools, moving and renaming are the same one thing (hg mv spam.py spam/__init__.py for example). Also, if you turn a module into a package, you may want to move code around, change imports, etc., so I'm not sure the renaming part is such a big step. Anyway, if the import-sig people say that users think it's a complex or costly operation, I can believe it.

It's not that it's complex or costly in anything other than *mental* overhead -- you have to remember to do it, and it's not particularly obvious. (But people on import-sig did mention this and other things covered by the PEP as being a frequent root cause of beginner inquiries on #python, Stack Overflow, et al.)

>> (By the way, both of these additions to the import protocol (i.e. the dynamically-added ``__path__``, and dynamically-created modules) apply recursively to child packages, using the parent package's ``__path__`` in place of ``sys.path`` as a basis for generating a child ``__path__``. This means that self-contained and virtual packages can contain each other without limitation, with the caveat that if you put a virtual package inside a self-contained one, it's gonna have a really short ``__path__``!)
>
> I don't understand the caveat or its implications.

Since each package's __path__ is by default the same length as, or shorter than, its parent's, if you put a virtual package inside a self-contained one, it will be functionally no different from a self-contained one, in that it will have only one path entry. So, it's not really useful to put a virtual package inside a self-contained one, even though you can do it. (Apart from it letting you avoid a superfluous __init__ module, assuming it's indeed superfluous.)

>> In other words, we don't allow pure virtual packages to be imported directly, only modules and self-contained packages. (This is an acceptable limitation, because there is no *functional* value to importing such a package by itself. After all, the module object will have no *contents* until you import at least one of its subpackages or submodules!) Once ``zc.buildout`` has been successfully imported, though, there *will* be a ``zc`` module in ``sys.modules``, and trying to import it will of course succeed. We are only preventing an *initial* import from succeeding, in order to prevent false-positive import successes when clashing subdirectories are present on ``sys.path``.
>
> I find that limitation acceptable. After all, there is no zc project, and no zc module, just a zc namespace. I'll just regret that it's not possible to provide a module docstring to inform that this is a namespace package used for X and Y.

It *is* possible -- you'd just have to put it in a zc.py file. IOW, this PEP still allows namespace-defining packages to exist, as was requested by early commenters on PEP 382.
It just doesn't *require* them to exist in order for the namespace contents to be importable.

>> The resulting list (whether empty or not) is then stored in a ``sys.virtual_package_paths`` dictionary, keyed by module name.
>
> This was probably said on import-sig, but here I go: yet another import artifact in the sys module! I hope we get ImportEngine in 3.3 to clean up all this.

Well, I rather *like* having them there, personally, vs. having to learn yet another API, but oh well, whatever. AFAIK, ImportEngine isn't going to do away with the need for the global ones to live somewhere, at least not in 3.3.

>> * A new ``extend_virtual_paths(path_entry)`` function, to extend existing, already-imported virtual packages' ``__path__`` attributes to include any portions found in a new ``sys.path`` entry. This function should be called by applications extending ``sys.path`` at runtime, e.g. when adding a plugin directory or an egg to the path.
>
> Let's imagine my application Spam has a namespace spam.ext for plugins. To use a custom directory where plugins are stored, or a zip file with plugins (I don't use eggs, so let me talk about zip files here), I'd have to call sys.path.append *and* pkgutil.extend_virtual_paths?

As written in the current proposal, yes. There was some discussion on Python-Dev about having this happen automatically, and I proposed that it could be done by making virtual
Re: [Python-Dev] Import lock considered mysterious
At 02:48 PM 7/22/2011 +0200, Antoine Pitrou wrote:
> See http://bugs.python.org/issue9260
> There's a patch there but it needs additional sophistication to remove deadlocks when doing concurrent circular imports.

I don't think that approach can work, as PEP 302 loaders can currently assume the global import lock is being held when they run... and in general, there are too many global data structures in sys that need to be protected by code that uses such things.

A simpler solution to Greg's problem would be to have a timeout on attempts to acquire the import lock, and have it fail with a RuntimeError describing the problem. (*Not* an ImportError, mind you, which might get ignored and trigger a different code path.) The timeout would need to be on the order of seconds to prevent false positives, and there'd need to be a way to change or remove the timeout in the event somebody really needs to. But it would eliminate the mysteriousness. A unique and Google-able error message would let someone find a clear explanation of what's going on, as well.

A second thing that *could* be helpful would be to issue a warning when a new thread is started (or waited on) while the import lock is held. This is already known to be a bad thing to do. The tricky part is issuing the warning for the right caller level, but I suppose you could walk back up the call stack until you found some module-level code, and then fingered that line of code as the culprit.

Yes, that might do it: the code for starting or waiting on a thread could check to see if the import lock is held by the current thread, and if so, walk up the stack to find a module frame (one where f_globals is f_locals, __name__ is in f_locals, and sys.modules[__name__].__dict__ is f_locals), and if one is found, issue a warning about not starting or waiting on threads in module-level code.

Between that and the timeout, the mysteriousness could be completely done away with, without throwing a monkey wrench into the current import mechanisms.
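In fact, the warning half could be prototyped in pure Python today, since imp.lock_held() already exists; only the acquisition timeout needs C-level support. A rough sketch (without the stack-walking refinement):

    import imp, threading, warnings

    _original_start = threading.Thread.start

    def _checked_start(self):
        if imp.lock_held():
            # Most likely culprit: module-level code starting a thread
            # while its own import still holds the import lock.
            warnings.warn("thread started while the import lock is held; "
                          "this can deadlock if the thread imports anything",
                          RuntimeWarning, stacklevel=2)
        return _original_start(self)

    threading.Thread.start = _checked_start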
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 11:52 AM 7/21/2011 +1000, Nick Coghlan wrote:
> Trying to change how packages are identified at the Python level makes PEP 382 sound positively appealing. __path__ needs to stay :)

In which case, it should be a list, not a sentinel. ;-)

> Even better would be for these (and sys.path) to be list subclasses that did the right thing under the hood, as Glenn suggested. Code that *replaces* rather than modifies these attributes would still potentially break virtual packages, but code that modifies them in place would do the right thing automatically. (Note that all code that manipulates sys.path and __path__ attributes requires explicit calls to correctly support current namespace package mechanisms, so this would actually be an improvement on the status quo rather than making anything worse).

I think the simplest thing, if we're keeping __path__ (and on reflection, I think we should), would be to simply call extend_virtual_paths() automatically on new path entries found in sys.path when an import is performed, relative to the previous value of sys.path. That is, we save an old copy of sys.path somewhere, and whenever __import__() is called (well, once it gets past checking if the target is already in sys.modules, anyway), it checks the current sys.path against it, and calls extend_virtual_paths() on any sys.path entries that weren't in the old sys.path.

This is not the most efficient thing in the world, as it will cause a bunch of stat calls to happen against the new directories, in the middle of a possibly-entirely-unrelated import operation, but it would certainly address the issue in the Simplest Way That Could Possibly Work.

A stricter (safer) version of the same thing would be one where we only update __path__ values that are unchanged since we created them, and rather than only appending new entries, we replace the __path__ with a newly-computed one. This version is safer because it avoids corner cases like "I imported foo.bar while foo.baz 1.1 was on my path, then I prepended a directory to sys.path that has foo.baz 1.2, but I still get foo.baz 1.1 when I import." But it loses in cases where people do direct __path__ manipulation. On the other hand, it's a lot easier to say "you break it, you bought it" where __path__ manipulation is concerned, so I'm actually pretty inclined towards using the strict version.

Hey... here's a crazy idea. Suppose that a virtual package's __path__ is a *tuple* instead of a list? Now, in order to change it, you *have* to replace it. And we can cache the tuple we initially set it to in sys.virtual_package_paths, so we can do an 'is' check before replacing it. Voila: __path__ still exists and is still a sequence for a virtual package, but you have to explicitly replace it if you want to do anything funky -- at which point you're responsible for maintaining it.

I'm tempted to say, "well, why not use a list-subclass proxy, then?", but that means more work for no real difference.

I just went through dozens of examples of __path__ usage (found via Google), and I found exactly two examples of code that modifies a __path__ that is not:

1. In the __init__.py whose __path__ it is (i.e., code that'll still have a list), or

2. Modifying the __path__ of an explicitly-named self-contained package that's part of the same distribution.

The two examples are from Twisted, and Google AppEngine.
In the Twisted case, it's some sort of namespace-package-like plugin chicanery, and in the AppEngine case, well, I'm not sure what the heck it's doing, but it seems to be making sure that you can still import stuff that has the same name as stdlib stuff, or something.

The Twisted case (and an apparent copy of the same code in a project called flumotion) uses ihooks, though, so I'm not sure it'll even get executed for virtual packages. The Google case loops over everything in sys.modules, in a function by the name of appengine.dist.fix_paths()... but I wasn't able to find out who calls this function, when, and why.

So, pretty much, except for these bits of nosy code, the vast majority of code out there seems to only mess with its own self-contained paths, making the use of tuples seem like a pretty safe choice. (Oh, and all the code I found that reads paths without modifying them uses only tuple-safe operations.)

So, if we implement automatic __path__ updates for virtual packages, I'm currently leaning towards the strict approach using tuples, but could possibly be persuaded towards read-only list-proxies instead.

Side note: it looks like a *lot* of code out there abuses __path__[0] to find data files, so I probably need to add a note to the PEP about not doing that when you convert a self-contained package to a virtual one. Of course, I suppose using a sentinel could address *that* problem, or an iteration-only proxy. The main concern here is that using __path__[0] will *seem* to work when you first use it with a
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 12:59 PM 7/21/2011 -0700, Reliable Domains wrote:
> I assume that the implicit extend_virtual_paths() would be smart enough to only do real work if there are virtual packages to do it in, so much of the performance costs (bunch of stats) are bounded by the existence of and number of virtual packages that have actually been imported, correct?

Yes - this is true even for an explicit call. It only does this for imported virtual packages, and child virtual packages are only checked for if the parent package exists. So, in the case of a directory being added that has no parent packages, the cost in stats is equal to the number of top-level, *imported* virtual packages.

The __path__ wrapper scheme can do this even better, and defer doing any of the stat calls until/unless another import occurs for one of those packages. So if you munge sys.path and then don't import anything from a virtual package, no extra stat calls would happen at all.
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 03:04 AM 7/22/2011 +0200, Antoine Pitrou wrote:
> The additional confusion lies in the fact that a module can be shadowed by something which is not a module (a mere global variable). I find it rather baffling.

If you move x.py to x/__init__.py, it does *exactly the same thing* in current versions of Python:

    Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from x import y
    >>> import x.y
    >>> x.y
    <module 'x.y' from 'x\y.py'>
    >>> y
    5

The PEP does nothing new or different here. If something is baffling you, it's the behavior of "from ... import", not the actual importing process. "from x import y" means "import x; y = x.y". The PEP does not propose we change this. ;-)
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 06:46 PM 7/20/2011 +1000, Nick Coghlan wrote:
> On Wed, Jul 20, 2011 at 1:58 PM, P.J. Eby p...@telecommunity.com wrote:
>> So, without further ado, here it is:
>
> I pushed this version up to the PEPs repo, so it now has a number (402) and can be read in prettier HTML format: http://www.python.org/dev/peps/pep-0402/

Technically, shouldn't this be a 3XXX series PEP? Or are we not doing those any more, now that all PEPs would be 3XXX?
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 02:24 AM 7/20/2011 -0700, Glenn Linderman wrote:
> When I read about creating __path__ from sys.path, I immediately thought of the issue of programs that extend sys.path, and the above is the workaround for such programs. But it requires such programs to do work, and there are a lot of such programs (I, a relative newbie, have had to write some). As it turns out, I can't think of a situation where I have extended sys.path that would result in a problem for fancy namespace packages, because so far I've only written modules, not packages, and only modules are on the paths that I add to sys.path. But that does not make for a general solution.

Most programs extend sys.path in order to import things. If those things aren't yet imported, they don't have a __path__ yet, and so don't need to be fixed. Only programs that modify sys.path *after* importing something that has a dynamic __path__ would need to do anything about that.

> Is there some way to create a new __path__ that would reflect the fact that it has been dynamically created, rather than set from __init__.py, and then when it is referenced, calculate (and cache?) a new value of __path__ to actually search?

That's what extend_virtual_paths() is for. It updates the __path__ of all currently-imported virtual packages. Where before you wrote:

    sys.path.append('foo')

You would now write:

    sys.path.append('foo')
    pkgutil.extend_virtual_paths('foo')

...assuming you have virtual packages you've already imported. If you don't, there's no reason to call extend_virtual_paths(). But it doesn't hurt anything if you call it unnecessarily, because it uses sys.virtual_packages to find out what to update, and if you haven't imported any virtual packages, there's nothing to update and the call will be a quick no-op.
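If the two-step dance bothers you, it's trivial to wrap -- bearing in mind that extend_virtual_paths() is the function *proposed* for pkgutil by the PEP, not something that exists today:

    import sys, pkgutil

    def add_path_entry(entry):
        # Extend sys.path and bring any already-imported virtual packages'
        # __path__ attributes up to date, in one step. Safe to call even
        # when no virtual packages have been imported (it's then a no-op).
        sys.path.append(entry)
        pkgutil.extend_virtual_paths(entry)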
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 10:40 AM 7/20/2011 -0400, Neal Becker wrote:
> I wonder if this fixes the long-standing issue in OS vendors' distributions. In Fedora, for example, there are both arch-specific and non-arch directories: /usr/lib/python2.7 + /usr/lib64/python2.7, for example. Pure python goes into /usr/lib/python2.7, and code including binaries goes into /usr/lib64/python2.7. But if a package has both, it all has to go into /usr/lib64/python2.7, because the current loader can't find pieces in 2 different directories. You can't have both /usr/lib/python2.7/site-packages/foo and /usr/lib64/python2.7/site-packages/foo. So if this PEP will allow pieces of foo to be found in 2 different places, that would be helpful, IMO.

It's more of a long-term solution than a short-term one. In order for it to work the way you want, 'foo' would need to have its main code in foo.py rather than foo/__init__.py. You could of course make that change on the author's behalf for your distro, or remove it altogether if it doesn't contain any actual code.

However, if you're going to make changes anyway, you could just as well change its __init__.py to append extra directories to the module __path__... and that's something you can do right now.
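For completeness, the existing workaround looks like this: a distro patch could add these two lines to foo/__init__.py, using the pkgutil API that has been in the stdlib since 2.3:

    # In foo/__init__.py: merge every 'foo' directory found on sys.path
    # into this package's __path__, not just the one containing this file.
    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)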
Re: [Python-Dev] [Python-checkins] peps: Restore whitespace characters lost via email transmission.
At 04:21 PM 7/20/2011 +0200, Éric Araujo wrote:
> FYI, reST uses three-space indents, not four (so that blocks align nicely under the leading two dots + one space), so I think the change was intentional. The "Documenting Python" guide tells this (in the standard docs), and I think it applies to PEPs too.

PEP 12 prescribes four-space indents for PEPs, actually, wherever it mentions a specific indentation depth. Also, a formfeed character was lost, not just the leading spaces. Essentially, though, I was just merging my working copy, and those were the only differences that showed up (apart from the filled-in Post-History header), so I assumed it was just whitespace lost in transmission.

(I'm a bit surprised that three-space indents are mandated for anything involving documenting Python in reST, though, since that would mean you'd also have to indent your code samples by three spaces, or else have an editor that supports two different tab widths.)
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 08:56 AM 7/20/2011 -0700, Jeff Hardy wrote:
> On Tue, Jul 19, 2011 at 8:58 PM, P.J. Eby p...@telecommunity.com wrote:
>> The biggest likely exception to the above would be when a piece of code tries to check whether some package is installed by importing it. If this is done *only* by importing a top-level module (i.e., not checking for a ``__version__`` or some other attribute), *and* there is a directory of the same name as the sought-for package on ``sys.path`` somewhere, *and* the package is not actually installed, then such code could *perhaps* be fooled into thinking a package is installed that really isn't.
>
> This part worries me slightly. Imagine a program as such:
>
>     datagen.py
>     json/foo.js
>     json/bar.js
>
> datagen.py uses the files in json/ to generate sample data for a database. In datagen.py is the following code:
>
>     try:
>         import json
>     except ImportError:
>         import simplejson as json
>
> Currently, this works just fine, but it will break (as I understand it) under the PEP because the json directory will become a virtual package and no ImportError will be raised.

Well, it won't fail as long as there actually *is* a json module or package on the path. ;-) But I do see your point.

> Is there a mitigation for this in the PEP that I've missed?

A possible mitigation would be to require that get_subpath() only return a directory name if that directory in fact contains importable modules somewhere. This is actually discussed a bit later as an open issue under "Implementation Notes", indicating that iter_modules() has this issue as well. The main open questions in doing this kind of checking have to do with recursion: it's perfectly valid to have, say, a 'zc/' directory whose only content is a 'buildout/' subdirectory. Of course, it still wouldn't help if the 'json/' subdirectory in your example did contain .py files.

There is another possibility, though: what if we change the logic for pure-virtual package creation so that the parent module is created *if and only if* a child module is found? In that case, trying to import a pure virtual 'zc' package would fail, but importing 'zc.buildout' would succeed as long as there was a zc/buildout.py or a zc/buildout/__init__.py somewhere. And in your example, 'import json' would fail -- which is to say, succeed. ;-)

This is a minor change to the spec, though perhaps a bit hairier to implement in practice. The current import.c loop over the module name parts (iterating over, say, 'zc', then 'buildout', and importing them in turn) would need to be reworked so that it could either roll back the virtual package creation in the event of sub-import failure, or conversely delay creation of the parent package(s) until a sub-import finds a module. I certainly think it's *doable*, mind you, but I'd hate to have to do it in C. ;-)

Hm. Here's another variant that might be easier to implement (even in C), and could offer some other advantages as well. Suppose we replace the sys.virtual_packages set() with a sys.virtual_paths dict(): a dictionary that maps from module names to __path__ lists, and that's populated by the __path__ creation algorithm described in the PEP. (An empty list would mean that __path__ creation failed for that module/package name.)

Now, if a module doesn't have a __path__ (or doesn't exist), we look in sys.virtual_paths for the module name. If the retrieved list is empty, we fail the import. If it's not, we proceed... but *don't* create a module or set the existing module's __path__.
Then, at the point where an import succeeds, and we're going to set an attribute on the parent module, we recursively construct parent modules and set their __path__ attributes from sys.virtual_paths, if a module doesn't exist in sys.modules, or its __path__ isn't set.

Voila. Now there are fewer introspection problems as well: trying to 'import json.foo' when there's no 'foo.py' in any json/ directory will *not* create an empty 'json' package in sys.modules as a side effect. And it won't add a __path__ to the 'json' module if there were a json.py found, either.

What's more, since importing a pure virtual package now fails unless you've successfully imported something from it before, it makes more sense for it to not have a __file__, or to have a __file__ of None.

Actually, it's too bad that we have to have parent packages in sys.modules, or I'd suggest we just make pure virtual packages unimportable, period. Technically, we *could* always create dummy parent modules for virtual packages and *not* put them in sys.modules, but I'm not sure if that's a good idea. It would be more consistent in some ways with the idea that virtual packages are not directly importable, but an interesting side effect would be that if module A does:

    import foo.bar

and module B does:

    import foo.baz

Then module A's version of 'foo' has *only* a 'bar' attribute and B's version has *only* a 'baz' attribute
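To make the sys.virtual_paths variant concrete, here's a rough sketch; compute_virtual_path() is a stand-in for the PEP's __path__ creation algorithm, not a real function:

    import sys

    sys.virtual_paths = {}  # module name -> computed __path__ (empty == failed)

    def path_for(name):
        # Consult a real module's __path__ first; fall back to the cache.
        # Note: no module object is created and no __path__ is set here --
        # that happens only once a child import actually succeeds.
        mod = sys.modules.get(name)
        if mod is not None and hasattr(mod, '__path__'):
            return mod.__path__
        if name not in sys.virtual_paths:
            sys.virtual_paths[name] = compute_virtual_path(name)
        path = sys.virtual_paths[name]
        if not path:
            raise ImportError('No module named %s' % name)
        return path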
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 12:37 PM 7/20/2011 -0400, Erik wrote:
> The best solution I can think of would be to have a way for a module to mark itself as finalized (I'm not sure if that's the best term--just the first that popped into my head). This would prevent its __path__ from being created or extended in any way. For example, if the json module contains `__finalized__ = True` or something of the like, any `import json.foo` would immediately fail.

That wouldn't actually fix the problem Jeff brought up, which was the case where there *wasn't* a json.py. In any case, we can fix this now by banning direct import of pure-virtual packages.

> In that case there would need to be a way to mark a directory as not containing importable code. Not sure what the best approach to that would be, especially since one of the goals of this PEP seems to be to avoid marker files.

For this particular issue, we don't need it. For tools that process Python code, or use pkgutil.walk_packages(), there may still be use cases, so we'll keep an eye open for relevant input. Hopefully someone will say something that jars loose an idea or two, as happened with Jeff's issue above.

(Btw, as we speak, I am swiping Jeff's example and adding it into the PEP. ;-) It makes a great motivating example for banning pure-virtual package imports.)
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 01:35 PM 7/20/2011 -0600, Eric Snow wrote:
> This is a really nice solution. So a virtual package is not imported until a submodule of the virtual package is successfully imported

Correct...

> (except for direct import of pure virtual packages).

Not correct. ;-) What we do is avoid creating a parent module or altering its __path__ until a submodule/subpackage import is just about to be successfully completed.

See the change I just pushed to the PEP: http://hg.python.org/peps/rev/a6f02035c66c

Or read the revised Specification section here (which is a bit easier to read than the diff): http://www.python.org/dev/peps/pep-0402/#specification

The change is basically that we wait until a successful find_module() happens before creating or tweaking any parent modules. This way, the load_module() part will still see an initialized parent package in sys.modules, and if it does any relative imports, they'll still work. (It *does* mean that if an error happens during load_module(), then future imports of the virtual package will succeed, but I'm okay with that corner case.)

> It seems like sys.virtual_packages should be populated even during a failed submodule import. Is that right?

Yes. In the actual draft, btw, I dubbed it ``sys.virtual_package_paths`` and made it a dictionary. This actually makes the pkgutil.extend_path() code more general: it'll be able to fix the paths of things you haven't actually imported yet. ;-)

> Also, it makes sense that the above applies to all virtual packages, not just pure ones.

Well, if the package isn't pure, then what you've imported is really just an ordinary module, not a package at all. ;-)

> When a pure virtual package is directly imported, a new [empty] module is created and its __path__ is set to the matching value in sys.virtual_packages. However, an impure virtual package is not created upon direct import, and its __path__ is not updated until a submodule import is attempted. Even the sys.virtual_packages entry is not generated until the submodule attempt, since the virtual package mechanism doesn't kick in until the point that an ImportError is currently raised. This isn't that big a deal, but it would be the one behavioral difference between the two kinds of virtual packages. So either leave that one difference, disallow direct import of pure virtual packages, or attempt to make virtual packages for all non-package imports. That last one would impose the virtual package overhead on many more imports so it is probably too impractical. I'm fine with leaving the one difference.

At this point, I've updated the PEP to disallow direct imports of pure virtual packages. AFAICT it's the only approach that ensures you can't get false-positive imports by having unrelated-but-similarly-named directories floating around. So, really, there's not a difference, except that you can't import a useless empty module that you have no real business importing in the first place... and I'm fine with that. ;-)

> FYI, last night I started on an importlib-based implementation for the PEP and the above solution would be really easy to incorporate.

Well, you might want to double-check that now that I've updated the spec. ;-) In the new approach, you cannot rely on parent modules existing before proceeding to the submodule import. However, I've just glanced at the importlib trunk, and I think I see what you mean. It's already using a recursive approach, rather than an iterative one, so the change should be a lot simpler there than in import.c.
There probably just needs to be a pair of functions like:

    def _get_parent_path(parent):
        pmod = sys.modules.get(parent)
        if pmod is None:
            try:
                pmod = _gcd_import(parent)
            except ImportError:
                # Can't import parent, is it a virtual package?
                path = imp.get_virtual_path(parent)
                if not path:
                    # no, allow the parent's import error to propagate
                    raise
                return path
        if hasattr(pmod, '__path__'):
            return pmod.__path__
        else:
            return imp.get_virtual_path(parent)

    def _get_parent_module(parent):
        pmod = sys.modules.get(parent)
        if pmod is None:
            pmod = sys.modules[parent] = imp.new_module(parent)
            if '.' in parent:
                head, _, tail = parent.rpartition('.')
                setattr(_get_parent_module(head), tail, pmod)
        if not hasattr(pmod, '__path__'):
            pmod.__path__ = imp.get_virtual_path(parent)
        return pmod

And then instead of hanging on to parent_module during the import process, you'd just grab a path from _get_parent_path(), and initialize parent_module a little later, i.e.:

    if parent:
        path = _get_parent_path(parent)
        if not path:
            msg = (_ERR_MSG + '; {} is not a
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 03:22 PM 7/20/2011 -0600, Eric Snow wrote:
> On Wed, Jul 20, 2011 at 2:44 PM, P.J. Eby p...@telecommunity.com wrote:
>> So, yeah, actually, that's looking pretty sweet. Basically, we just have to throw a virtual_package_paths dict into the sys module, and do the above along with the get_virtual_path() function and add get_subpath() to the importer objects, in order to get the PEP's core functionality working.
>
> Exactly. That's part of why the importlib approach is so appealing to me.

Actually, it turns out I was a little too optimistic -- the sketch I gave doesn't work right for anything but top-level virtual packages, because I didn't take into account the part where get_virtual_path() needs a parent path. Fixing *that* error then leads to a really nasty bit of mutual recursion, in which the parent module imports are attempted over and over again in something like O(N**2), I think. In order to get rid of that, _gcd_import would have to grow some internal memoization so it doesn't retry the same imports repeatedly.

Ironically enough, this is because _gcd_import() is recursive, and thus attempts the imports in the opposite order (sort of) than import.c does, which means that you can't get hold of the parent's __path__ without recursing (again). :-(

And trying to work around that with memoization led me to the realization that you actually can't implement PEP 402 using that type of recursion. That is, to implement the spec correctly, _gcd_import is going to have to be refactored to iterate left-to-right over module name parts, rather than recursing right-to-left. That's because PEP 402 only allows for processing a virtual path if a module is not found, *not* if a module is found but can't be loaded. But, with importlib currently being recursive, it only knows that a parent import failed via ImportError, not whether that error arose from failing to find the module, or failing to load the module!

So, the core part of the _gcd_import() function will need to be rewritten to iterate instead of recursing. (Still, it's probably not going to be *terribly* difficult. I'll take a look at doing a sketch of that next, but if I do one I'll send it to Import-SIG instead of here; it's not a detail that matters to the general PEP discussion.)
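The shape of that refactoring would be roughly as follows; the find/load/virtual_path callables are stand-ins for importlib internals, so this is a sketch of the control flow only, not working importlib code:

    def import_iteratively(fullname, find, load, virtual_path):
        # Walk 'a.b.c' left to right, so that "not found" (try the
        # virtual-path fallback) is distinguishable from "found but
        # failed to load" (let the ImportError propagate unmodified).
        module, path = None, None
        parts = fullname.split('.')
        for i in range(len(parts)):
            name = '.'.join(parts[:i + 1])
            loader = find(name, path)
            if loader is not None:
                module = load(loader, name)   # load errors propagate as-is
                path = getattr(module, '__path__', None)
            else:
                # No module found: compute a virtual __path__ or fail the
                # import; parent module objects get created lazily, later.
                path = virtual_path(name, path)
                module = None
        return module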
Re: [Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
At 03:09 PM 7/20/2011 -0700, Glenn Linderman wrote:
> On 7/20/2011 6:05 AM, P.J. Eby wrote:
>> At 02:24 AM 7/20/2011 -0700, Glenn Linderman wrote:
>>> When I read about creating __path__ from sys.path, I immediately thought of the issue of programs that extend sys.path, and the above is the workaround for such programs. But it requires such programs to do work, and there are a lot of such programs (I, a relative newbie, have had to write some). As it turns out, I can't think of a situation where I have extended sys.path that would result in a problem for fancy namespace packages, because so far I've only written modules, not packages, and only modules are on the paths that I add to sys.path. But that does not make for a general solution.
>>
>> Most programs extend sys.path in order to import things. If those things aren't yet imported, they don't have a __path__ yet, and so don't need to be fixed. Only programs that modify sys.path *after* importing something that has a dynamic __path__ would need to do anything about that.
>
> Sure. But there are a lot of things already imported by Python itself, and if this mechanism gets used in the stdlib, a program wouldn't know whether it is safe or not to not bother with the pkgutil.extend_virtual_paths() call.

I'm not sure I see how the mechanism could meaningfully be used in the stdlib, since IIUC we're not going for Perl-style package naming. So, all stdlib packages would be self-contained.

> Plus, that requires importing pkgutil, which isn't necessarily done by every program that extends the sys.path (import sys is sufficient at present). Plus, if some 3rd party packages are imported before sys.path is extended, the knowledge of how they are implemented is required to make a choice about whether it is needed to import pkgutil and call extend_virtual_paths or not.

I'd recommend *always* using it, outside of simple startup code.

> So I am still left with my original question: Is there some way to create a new __path__ that would reflect the fact that it has been dynamically created, rather than set from __init__.py, and then when it is referenced, calculate (and cache?) a new value of __path__ to actually search?

Hm. Yes, there is a way to do something like that, but it would complicate things a bit. We'd need to:

1. Leave __path__ off of the modules, and always pull them from sys.virtual_package_paths, and

2. Before using a value in sys.virtual_package_paths, check whether sys.path had changed since we last cached anything, and if so, clear sys.virtual_package_paths first, to force a refresh.

This doesn't sound particularly forbidding, but there are various unpleasant consequences, like being unable to tell whether a module is a package or not, and whether it's a virtual package or not. We'd have to invent new ways to denote these things. On the bright side, though, it *would* allow transparent live updates to virtual package paths, so it might be worth considering.

By the way, the reason we have to get rid of __path__ is that if we kept it, then code could change it, and then we wouldn't know if it was actually safe to change it automatically... even if no code had actually changed it. In principle, we could keep __path__ attributes around, and automatically update them in the case where sys.path has changed, so long as user code hasn't directly altered or replaced the __path__.
But it seems to me to be a dangerous corner case; I'd rather that code which touches __path__ be taking responsibility for that path's correctness from then on, rather than having it get updated (possibly incorrectly) behind its back. So, I'd say that for this approach, we'd have to actually leave __path__ off of virtual packages' parent modules.

Anyway, it seems worth considering. We just need to sort out what the downsides are for any current tools thinking that such modules aren't packages. (But hey, at least it'll be consistent with what such tools would think of the on-disk representation! That is, a tool that thinks a foo.py alongside a foo/ subdirectory is just a module with no package, will also think that 'foo', once imported, is a module with no package.)

> And, in the absence of knowing (because I didn't write them) whether any of the packages I imported before extending sys.path are virtual packages or not, I would have to do this every time I extend sys.path. And so it becomes a burden on writing programs. If the code is so boilerplate as you describe, should sys.path become an object that acts like a list, instead of a list, and have its append method automatically do the pkgutil.extend_virtual_paths for me? Then I wouldn't have to worry about whether any of the packages I imported were virtual packages or not.

Well, then we'd have to worry about other mutation methods, and things like 'sys.path = [blah, blah]', as well. So if we're going to ditch
[Python-Dev] Draft PEP: Simplified Package Layout and Partitioning
So, over on the Import-SIG, we were talking about the implementation and terminology for PEP 382, and it became increasingly obvious that things were, well, not entirely okay in the "implementation is easy to explain" department.

Anyway, to make a long story short, we came up with an alternative implementation plan that actually solves some other problems besides the one that PEP 382 sets out to solve, and whose implementation is a bit easier to explain. (In fact, for users coming from various other languages, it hardly needs any explanation at all.)

However, for long-time users of Python, the approach may require a bit more justification, which is why roughly 2/3rds of the PEP consists of a detailed rationale, specification overview, rejected alternatives, and backwards-compatibility discussion... which is still a lot less verbiage than reading through the lengthy Import-SIG threads that led up to the proposal. ;-) (The remaining 1/3rd of the PEP is the short, sweet, and easy-to-explain implementation detail.)

Anyway, the PEP has already been discussed on the Import-SIG, and is proposed as an alternative to PEP 382 ("Namespace packages"). We expect, however, that many people will be interested in it for reasons having little to do with the namespace packaging use case. So, we would like to submit this for discussion, hole-finding, and eventual Pronouncement. As Barry put it, "I think it's certainly worthy of posting to python-dev to see if anybody else can shoot holes in it, or come up with useful solutions to open questions. I'll be very interested to see Guido's reaction to it. :)"

So, without further ado, here it is:

PEP: XXX
Title: Simplified Package Layout and Partitioning
Version: $Revision$
Last-Modified: $Date$
Author: P.J. Eby
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 12-Jul-2011
Python-Version: 3.3
Post-History:
Replaces: 382

Abstract
========

This PEP proposes an enhancement to Python's package importing to:

* Surprise users of other languages less,
* Make it easier to convert a module into a package, and
* Support dividing packages into separately installed components (ala "namespace packages", as described in PEP 382)

The proposed enhancements do not change the semantics of any currently-importable directory layouts, but make it possible for packages to use a simplified directory layout (that is not importable currently).

However, the proposed changes do NOT add any performance overhead to the importing of existing modules or packages, and performance for the new directory layout should be about the same as that of previous "namespace package" solutions (such as ``pkgutil.extend_path()``).

The Problem
===========

.. epigraph::

    "Most packages are like modules. Their contents are highly interdependent and can't be pulled apart. [However,] some packages exist to provide a separate namespace. ... It should be possible to distribute sub-packages or submodules of these [namespace packages] independently."

    -- Jim Fulton, shortly before the release of Python 2.3 [1]_

When new users come to Python from other languages, they are often confused by Python's packaging semantics. At Google, for example, Guido received complaints from a "large crowd with pitchforks" [2]_ that the requirement for packages to contain an ``__init__`` module was a misfeature, and should be dropped.

In addition, users coming from languages like Java or Perl are sometimes confused by a difference in Python's import path searching.
In most other languages that have a similar path mechanism to Python's ``sys.path``, a package is merely a namespace that contains modules or classes, and can thus be spread across multiple directories in the language's path. In Perl, for instance, a ``Foo::Bar`` module will be searched for in ``Foo/`` subdirectories all along the module include path, not just in the first such subdirectory found.

Worse, this is not just a problem for new users: it prevents *anyone* from easily splitting a package into separately-installable components. In Perl terms, it would be as if every possible ``Net::`` module on CPAN had to be bundled up and shipped in a single tarball!

For that reason, various workarounds for this latter limitation exist, circulated under the term "namespace packages". The Python standard library has provided one such workaround since Python 2.3 (via the ``pkgutil.extend_path()`` function), and the "setuptools" package provides another (via ``pkg_resources.declare_namespace()``).

The workarounds themselves, however, fall prey to a *third* issue with Python's way of laying out packages in the filesystem.

Because a package *must* contain an ``__init__`` module, any attempt to distribute modules for that package must necessarily include that ``__init__`` module, if those modules are to be importable. However, the very fact that each distribution of modules for a package must contain
Re: [Python-Dev] EuroPython Language Summit report
At 12:32 PM 6/25/2011 -0400, R. David Murray wrote:
> So your proposed code would allow me, when writing a generator in my code, to do something that would allow me to yield up all the values from an arbitrary generator I'm calling, over which I have no control (ie: I can't modify its code)?

With a decorator on your own function, yes. See:

http://mail.python.org/pipermail/python-dev/2010-July/102320.html

for details.

Mostly, though, that proposal was a suggestion for how the optimized implementation would work - i.e., a suggestion that PEP 380 be implemented that way under the hood, by implicitly turning 'yield from' into 'yield From()' and wrapping the generator itself with another From() instance. (IOW, that was a proposal for how to avoid the extra overhead of recursive yielding in a series of nested yield-froms.)
Re: [Python-Dev] EuroPython Language Summit report
At 10:46 AM 6/25/2011 +1000, Nick Coghlan wrote:
> Indeed, PEP 380 is *really* hard to do properly without language support.

No, it isn't. You add a decorator, a 'from_' class, and a 'return_' function, and there you go. (See my previous code sketches here in early PEP 380 discussions.)

Python frameworks have been doing variations of the same thing (with varying features and APIs) for at least 7 years now -- even on Python versions that lack decorators or the ability to return values from yield statements.

So the main benefit of a PEP for this functionality would be providing a common implementation/API - and that could be initially done in the stdlib, without any added syntax support.
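For the record, here's a bare-bones sketch of that no-new-syntax approach (the names and details here are illustrative, not the exact API from those earlier discussions, and it ignores throw()/close() forwarding for brevity):

    import functools

    class from_(object):
        """Yield from_(subgen) to delegate to subgen without new syntax."""
        def __init__(self, gen):
            self.gen = gen

    class _Return(Exception):
        def __init__(self, value):
            self.value = value

    def return_(value):
        """Use return_(x) where PEP 380 would write 'return x'."""
        raise _Return(value)

    def delegating(func):
        """Drive a stack of generators, flattening from_() delegations
        so nested yields don't pay a per-level resumption cost."""
        @functools.wraps(func)
        def wrapper(*args, **kw):
            stack, to_send = [func(*args, **kw)], None
            while stack:
                try:
                    yielded = stack[-1].send(to_send)
                except _Return as e:       # subgenerator "returned" a value
                    stack.pop()
                    to_send = e.value      # becomes the from_() result
                except StopIteration:
                    stack.pop()
                    to_send = None
                else:
                    if isinstance(yielded, from_):
                        stack.append(yielded.gen)  # descend, no recursion
                        to_send = None
                    else:
                        to_send = yield yielded    # plain value: pass through
        return wrapper

A generator written for this scheme yields values normally, yields from_(other) to delegate, and calls return_(x) in place of PEP 380's "return x" (a top-level return_() value is simply discarded in this sketch).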
Re: [Python-Dev] Python 3.x and bytes
At 01:56 AM 6/14/2011 +0000, exar...@twistedmatrix.com wrote:
> On 12:35 am, ncogh...@gmail.com wrote:
>> On Tue, Jun 14, 2011 at 9:40 AM, P.J. Eby p...@telecommunity.com wrote:
>>> You can still do it one at a time:
>>>
>>>     CHAR, = b'C'
>>>     INT, = b'I'
>>>     ... etc.
>>>
>>> I just tried it with Python 3.1 and it works there.
>>
>> I almost mentioned that, although it does violate one of the unwritten rules of the Zen (in this case, "syntax shall not look like grit on Tim's monitor")
>
>     [CHAR] = b'C'
>     [INT] = b'I'
>     ...

Holy carpal tunnel time machine... That works in 2.3. (Without the 'b', of course.)

Didn't know you could just use list syntax like that. It's an extra character to type, and two more shift keyings, but brevity isn't always the soul of clarity.
Re: [Python-Dev] Python 3.x and bytes
At 03:11 PM 6/13/2011 -0700, Ethan Furman wrote:
> Nick Coghlan wrote:
>> Agreed, but:
>>
>>     EOH, CHAR, DATE, FLOAT, INT, LOGICAL, MEMO, NUMBER = b'\rCDFILMN'
>>
>> is a shorter way to write the same thing. Going two per line makes it easier to mentally map the characters:
>>
>>     EOH, CHAR = b'\rC'
>>     DATE, FLOAT = b'DF'
>>     INT, LOGICAL = b'IL'
>>     MEMO, NUMBER = b'MN'
>
> Wow. I didn't realize that could be done. That very nearly makes up for not being able to do it one char at a time.

You can still do it one at a time:

    CHAR, = b'C'
    INT, = b'I'
    ... etc.

I just tried it with Python 3.1 and it works there.
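One wrinkle worth remembering when porting such tables from 2.x: in Python 3, unpacking a bytes object binds *integers*, not length-1 strings:

    # Python 3: iterating/unpacking bytes yields ints, one per byte.
    EOH, CHAR = b'\rC'       # EOH == 13, CHAR == 67
    INT, = b'I'              # single-name unpack needs the trailing comma
    [MEMO] = b'M'            # list-target form, the 2.3-era spelling
    assert (EOH, CHAR, INT, MEMO) == (13, 67, 73, 77)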
Re: [Python-Dev] Python jails
At 06:23 PM 6/10/2011 -0600, Sam Edwards wrote:
> I have a couple remaining issues that I haven't quite sussed out:
> [long list of questions deleted]

You might be able to answer some of them by looking at this project:

http://pypi.python.org/pypi/RestrictedPython

which implements the necessary ground machinery for doing that sort of thing, in the form of a specialized Python compiler (implemented in Python, for 2.3 through 2.7) that allows you to implement whatever sorts of guards and security policies you want on top of it.

Even if it doesn't answer all your questions in and of itself, it may prove a fruitful environment in which you can experiment with various approaches and see which ones you actually like, without first having to write a bunch of code yourself.

Discussing an official implementation of this sort of thing as a language feature is probably best left to python-ideas, though, until and unless you actually have a PEP to propose.
Re: [Python-Dev] python and super
At 03:55 PM 4/14/2011 +0100, Michael Foord wrote: Ricardo isn't suggesting that Python should always call super for you, but when you *start* the chain by calling super then Python could ensure that all the methods are called for you. If an individual method doesn't call super then a theoretical implementation could skip the parent's methods (unless another child calls super). That would break classes that deliberately don't call super. I can think of examples in my own code that would break, especially in __init__() cases. It's perfectly sensible and useful for there to be classes that intentionally fail to call super(), and yet have a subclass that wants to use super(). So, this change would expose an internal implementation detail of a class to its subclasses, and make fragile base class problems worse (i.e., where an internal change to a base class breaks a previously-working subclass). ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
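To make the breakage concrete, here is a small illustration (my example, not from the thread): Base deliberately ends the chain, so Mixin.__init__ must not run; a scheme that "ensures all the methods are called" would invoke it anyway.

    class Mixin(object):
        def __init__(self):
            print("Mixin.__init__")           # must NOT run in this design
            super(Mixin, self).__init__()

    class Base(object):
        def __init__(self):
            print("Base.__init__ (deliberately not calling super)")

    class Child(Base, Mixin):
        def __init__(self):
            super(Child, self).__init__()     # chain stops at Base, on purpose

    Child()   # prints only Base.__init__'s message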
Re: [Python-Dev] PEP 396, Module Version Numbers
At 03:24 PM 4/10/2011 +, exar...@twistedmatrix.com wrote: On 04:02 am, p...@telecommunity.com wrote: At 08:52 AM 4/10/2011 +1000, Ben Finney wrote: This is an often-overlooked case, I think. The unspoken assumption is often that ``setup.py`` is a suitable place for the overall version string, but this is not the case when that string must be read by non-Python programs. If you haven't used the distutils a lot, you might not realize that you can do this: $ python setup.py --version 0.6c12 (The --name option also works, and they can be used together -- the answers will be on two separate lines.) This only works as long as setup.py is around - which it typically no longer is after installation is complete. And though it's common and acceptable enough to launch a child process in a shell script in order to get some piece of information, it isn't as pleasant in a Python program. Can you get this version information out of setup.py without running a child process and without monkey-patching sys.argv and sys.stdout? I was replying to the part above about setup.py ... must be read by non-Python programs. In other words, I thought the question was, given a not-yet-installed source package, how can we find the version number without writing Python code. Your question is a bit different. ;-) As it happens, if you have a source distribution of a package, you can expect to find a PKG-INFO file that contains version info anyway, generated from the source file. This is true for both distutils and setuptools-built source distributions. (It is not the case, alas, for simple revision control checkouts.) Anyway, I was merely addressing the technical question of how to get information from the tools that already exist, rather than advocating any solutions. And, along that same line, monkeypatching sys.argv and sys.stdout aren't technically necessary for you to get the information from a setup script, but a sandbox to keep the setup script from trying to do any installation steps is probably a good idea. (Some people have written setup scripts that actually copy files or do other things before they even call setup(). Nasty -- and one of the reasons that easy_install has a sandboxing facility.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
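For what it's worth, one in-process answer to the question is distutils' own run_setup -- a sketch, with the same caveat raised above: the setup script's module-level code still executes, so the sandboxing concern still applies.

    from distutils.core import run_setup

    # stop_after='init' stops processing once setup() has received its arguments
    dist = run_setup('setup.py', stop_after='init')
    print(dist.get_name(), dist.get_version())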
Re: [Python-Dev] PEP 396, Module Version Numbers
At 08:52 AM 4/10/2011 +1000, Ben Finney wrote: This is an often-overlooked case, I think. The unspoken assumption is often that ``setup.py`` is a suitable place for the overall version string, but this is not the case when that string must be read by non-Python programs. If you haven't used the distutils a lot, you might not realize that you can do this: $ python setup.py --version 0.6c12 (The --name option also works, and they can be used together -- the answers will be on two separate lines.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] The purpose of SETUP_LOOP, BREAK_LOOP, CONTINUE_LOOP
At 08:25 AM 3/12/2011 -0500, Eugene Toder wrote: Right, I'm not suggesting to remove all blocks, only SETUP_LOOP blocks. Do you see the problem in that case? I think you guys are forgetting about FOR_ITER, listcomps, and the like. That is, IIRC, the reason loops use the block stack is because they put things on the regular stack, that need to be cleared off the stack when the loop is exited (whether normally or via an exception). In other words, just jumping out of a loop without popping the block stack would leave junk on the regular stack, thereby failing to deallocate the loop iterator. In the case of a nested loop, this would also mean that the outer loop would start using the inner loop's iterator, and all sorts of hilarity would then ensue. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
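You can watch the iterator-on-the-stack arrangement with dis on interpreters of that era (SETUP_LOOP and BREAK_LOOP existed through CPython 3.7; later versions compile loops differently):

    import dis

    def f(seq):
        for x in seq:      # FOR_ITER keeps the loop iterator on the value stack
            if x:
                break      # BREAK_LOOP must pop the block *and* the iterator

    dis.dis(f)             # on CPython <= 3.7, look for SETUP_LOOP/BREAK_LOOP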
Re: [Python-Dev] PEP 395: Module Aliasing
At 05:35 PM 3/4/2011 +, Michael Foord wrote: That (below) is not distutils it is setuptools. distutils just uses `scripts=[...]`, which annoyingly *doesn't* work with setuptools. Er, what? That's news to me. Could you file a bug report about what doesn't work, specifically? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 3333: wsgi_string() function
At 09:43 AM 1/7/2011 -0500, James Y Knight wrote: On Jan 7, 2011, at 6:51 AM, Victor Stinner wrote: I don't understand why you are attached to this horrible hack (bytes-in-unicode). It introduces more work and more confusion than using raw bytes unchanged. It doesn't work and so something has to be changed. It's gross but it does work. This has been discussed ad nauseam on web-sig over a period of years. I'd like to reiterate that it is only even a potential issue for the PATH_INFO/SCRIPT_NAME keys. Those two keys are required to have been urldecoded already, into byte-data in some encoding. For all the other keys (including the ones from os.environ), they are either *properly* decoded in 8859-1 or are just ascii (possibly still urlencoded, so the app needs to urldecode and decode into a string with the correct encoding). Right. Also, it should be mentioned that none of this would be necessary if we could've gotten a bytes of a known encoding type. If you look back to the last big Python-Dev discussion on bytes/unicode and stdlib API breakage, this was the holdup for getting a sane WSGI spec. Since we couldn't change the language to fix the problem (due to the moratorium), we had to use this less-pleasant way of dealing with things, in order to get a final WSGI spec for Python 3. (If anybody is wondering about the specifics of the language change that was needed, it'd be having a bytes with known encoding type, that when combined in any polymorphic operation with a unicode string, would result in bytes-with-encoding output, and would raise an error if the resulting value could not be encoded in the target encoding. Then we would simply do all WSGI header operations with this type, using latin-1 as the target encoding.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
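Concretely, the hack is lossless because latin-1 maps all 256 byte values, so an app that knows the real encoding can always recover the original bytes (illustrative values, mine rather than the thread's):

    path_info = '/caf\xc3\xa9'           # UTF-8 bytes smuggled as latin-1 text
    raw = path_info.encode('latin-1')    # exactly the original bytes again
    print(raw.decode('utf-8'))           # '/café'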
Re: [Python-Dev] PEP 3333: wsgi_string() function
At 04:00 PM 1/6/2011 -0800, Raymond Hettinger wrote: Can you please take a look at http://docs.python.org/dev/whatsnew/3.2.html#pep-3333-python-web-server-gateway-interface-v1-0-1 to see if it accurately recaps the resolution of the WSGI text/bytes issues. I would appreciate any feedback, as it is likely that the whatsnew document will be most people's first chance to hear the outcome of the multi-year discussion. Hi Raymond -- nice work there. A few minor suggestions: 1. Native strings are used as the keys and values of the environ dictionary, not just as headers for start_response. 2. The read_environ() method is strictly for use with CGI-to-WSGI gateways, or for bridging other CGI-like protocols (e.g. FastCGI) to WSGI. It is ONLY for server implementers, in other words, and the typical app developer is doing something terribly wrong if they are even bothering to read its documentation. ;-) 3. The primary relevance of the native string type to an app developer is that when porting code from Python 2 to 3, they must still decode environment variable values, even though they are already Unicode. If their code was previously dealing only in Python 2 'str' objects, then nothing really changes. If they were previously decoding from environ str's to unicode, then they must replace their prior .decode('whatever') with .encode('latin1').decode('whatever'). That's basically it for porting from Python 2. IOW, this design choice allows most HTTP header manipulating code (whether input or output) to be ported to Python 3 with a very mechanical change pattern. Most such code is working with ASCII anyway, since normally both input and output headers are, and there are few headers that an application would be likely to convert to actual unicode anyway. On output via start_response(), if an application is currently encoding an output header -- why they would be, I have no idea, but if they are -- they need to add a re-encode to latin1. (i.e., .encode('whatever').decode('latin1')) IOW, a short 2-to-3 porting guide for WSGI: * If you just used strings for headers before, that part of your code doesn't change. (And if it was broken before, it's still broken in exactly the same way. No new breakage is introduced. ;-) ) * If you encoded any output headers or decoded any input headers, you must take into account the extra latin1 step. This is expected to be rare, since it's usually only SCRIPT_NAME and PATH_INFO that anybody would ever care about on input, and almost never anything on output. * Values yielded by an application or sent via a write() call MUST be byte strings; the environ and start_response() MUST be native strings. No mixing and matching. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
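Put together, the porting guide boils down to apps shaped like this minimal sketch -- native strings for status and headers, bytes for the body:

    def app(environ, start_response):
        body = u'Hello, world!'.encode('utf-8')    # bodies are bytes
        start_response('200 OK', [                 # status/headers: native str
            ('Content-Type', 'text/plain; charset=utf-8'),
            ('Content-Length', str(len(body))),
        ])
        return [body]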
Re: [Python-Dev] PEP 3333: wsgi_string() function
At 03:44 AM 1/4/2011 +0100, Victor Stinner wrote: Hi, In the PEP 3333, I read: -- import os, sys enc, esc = sys.getfilesystemencoding(), 'surrogateescape' def wsgi_string(u): # Convert an environment variable to a WSGI bytes-as-unicode string return u.encode(enc, esc).decode('iso-8859-1') def run_with_cgi(application): environ = {k: wsgi_string(v) for k,v in os.environ.items()} environ['wsgi.input'] = sys.stdin environ['wsgi.errors'] = sys.stderr environ['wsgi.version'] = (1, 0) ... -- What is this horrible encoding bytes-as-unicode? os.environ is supposed to be correctly decoded and contain valid unicode characters. If WSGI uses another encoding than the locale encoding (which is a bad idea), it should use os.environb and decode keys and values using its own encoding. If you really want to store bytes in unicode, str is not the right type: use the bytes type and use os.environb instead. If you want to discuss this, the Web-SIG is the appropriate place. Also, it was the appropriate place months ago, when the final decision on the environ encoding was made. ;-) IOW, the above change to the PEP is merely fixing the code example to be correct for Python 3, where it previously was correct only for Python 2. The PEP itself has already required this since the previous revisions, and wsgiref in the stdlib is already compliant with the above (although it uses a more sophisticated approach for dealing with win32 compatibility). The rationale for this choice is described in the PEP, and was also discussed in the mailing list emails back when the work was being done. IOW, this particular ship already sailed a long time ago. In fact, for Jython this bytes-as-unicode approach has been the PEP 333-defined encoding for at least *six years*... so it's REALLY late to complain about it now! ;-) PEP 3333 is merely a mapping of PEP 333 to allow WSGI apps to be ported from Python 2 to Python 3. There is work in progress on the Web-SIG now on PEP 444, which will support only Python 2.6+, where 'b' literals and the 'bytes' alias are available. It is as yet uncertain what environ encoding will be used, but at the moment I'm not convinced that either pure bytes or pure unicode are acceptable replacements for the PEP 333-compatible approach. In any event, that is a discussion for the Web-SIG, not Python-Dev. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] ICU
At 07:47 AM 12/2/2010 -0800, Guido van Rossum wrote: On Wed, Dec 1, 2010 at 8:45 PM, Alexander Belopolsky alexander.belopol...@gmail.com wrote: On Tue, Nov 30, 2010 at 3:13 PM, Antoine Pitrou solip...@pitrou.net wrote: Oh, about ICU: Actually, I remember you saying that locale should ideally be replaced with a wrapper around the ICU library. By that, I stand - however, I have given up the hope that this will happen anytime soon. Perhaps this could be made a GSOC topic. Incidentally, this may also address another of Python's Achilles' heels: the timezone support. http://icu-project.org/download/icutzu.html I work with people who speak highly of ICU, so I want to encourage work in this area. At the same time, I'm skeptical -- IIRC, ICU is a large amount of C++ code. I don't know how easy it will be to integrate this into our build processes for various platforms, nor how Pythonic the resulting APIs will look to the experienced Python user. Still, those are not roadblocks, the benefits are potentially great, so it's definitely worth investigating! FWIW, OSAF did a wrapping for Chandler, though I personally haven't used it: http://pyicu.osafoundation.org/ The README explains the mapping from the ICU APIs to Python ones, including iteration, string conversion, and timezone mapping for use with the datetime type. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] constant/enum type in stdlib
At 11:31 AM 11/23/2010 -0500, Barry Warsaw wrote: On Nov 23, 2010, at 03:15 PM, Michael Foord wrote: (Well, there is a third option that takes __name__ and sets the constants in the module automagically. I can understand why people would dislike that though.) Personally, I think if you want that, then the explicit class definition is a better way to go. This reminds me: a stdlib enum should support proper pickling and copying; i.e.: assert SomeEnum.anEnum is pickle.loads(pickle.dumps(SomeEnum.anEnum)) This could probably be implemented by adding something like: def __reduce__(self): return getattr, (self._class, self._enumname) in the EnumValue class. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
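Here is a self-contained sketch of that idea; the EnumValue/_class/_enumname names follow the message, and a real stdlib enum would of course be more careful about how _class gets wired up:

    import pickle

    class EnumValue(object):
        def __init__(self, enumname):
            self._enumname = enumname
        def __reduce__(self):
            # Pickle by reference: unpickling runs getattr(SomeEnum, 'anEnum')
            return getattr, (self._class, self._enumname)

    class SomeEnum(object):
        anEnum = EnumValue('anEnum')

    EnumValue._class = SomeEnum   # wire up once the owning class exists

    assert SomeEnum.anEnum is pickle.loads(pickle.dumps(SomeEnum.anEnum))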
Re: [Python-Dev] Issue 10194 - Adding a gc.remap() function
At 10:24 AM 10/26/2010 -0700, Peter Ingebretson wrote: I have a relatively large application written in Python, and a specific use case where it will significantly increase our speed of iteration to be able to change and test modules without needing to restart the application. If all you really want this for is reloading, it would probably make more sense to simply modify the existing class and function objects using the reloaded values as a template, then save the modified classes and functions back to the module. Have you tried http://pypi.python.org/pypi/plone.reload or http://svn.python.org/projects/sandbox/trunk/xreload/xreload.py, or any other existing code reloaders, or tried extending them for your specific use case? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
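For contrast with a gc.remap()-style approach, here is a crude sketch of that patch-in-place idea (roughly what xreload-style reloaders do; __slots__, metaclasses, and deleted attributes are all glossed over, and importlib.reload is the modern spelling):

    import importlib

    def patching_reload(module):
        old_vars = dict(vars(module))            # snapshot before reloading
        importlib.reload(module)                 # reload in place
        for name, new_obj in list(vars(module).items()):
            old_obj = old_vars.get(name)
            if isinstance(old_obj, type) and isinstance(new_obj, type):
                # Use the reloaded class as a template for the old one,
                # so existing instances pick up the new behavior.
                for attr, value in vars(new_obj).items():
                    if attr not in ('__dict__', '__weakref__'):
                        setattr(old_obj, attr, value)
                setattr(module, name, old_obj)   # keep the old identity
        return module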
Re: [Python-Dev] Exposing pkguitl's import emulation (was Re: [Python-checkins] r85538 - python/branches/py3k/Doc/library/pkgutil.rst)
At 08:03 AM 10/18/2010 +1000, Nick Coghlan wrote: I'm a little dubious about exposing these officially. They're mainly a hack to get some parts of the standard library working (e.g. runpy) in the absence of full PEP 302 support in the imp module, not really something we want to encourage anyone else to use (and yes, they should probably have underscores in their names, but we missed that when the various private implementations scattered around the stdlib were consolidated in pkgutil). Well, my intention at least was that they should be documented and released; it's the documenting part I didn't get around to. ;-) Of course, this was also pre-importlib; were we starting the work today, the obvious thing to do would be to expose the Python implementations of the relevant objects. That said, who knows when we'll actually have it done right, so in the meantime maybe having an official workaround is better than nothing... ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly
At 01:22 PM 9/27/2010 -0400, Terry Reedy wrote: On 9/26/2010 9:38 PM, P.J. Eby wrote: At 11:15 AM 9/27/2010 +1000, Ben Finney wrote: You misunderstand me; I wasn't asking how to *add* a link, but how to turn OFF the automatic conversion of the phrase PEP 333 that happens without any special markup. Currently, the PEP preface is littered with unnecessary links, because the PEP pre-processor turns *every* mere textual mention of a PEP into a link to it. Ouch. This is about as annoying as Thunderbird's message editor popping up a window asking me what file I want to at.tach every time I write the word 'at-tach' or a derivative without the extra punctuation. It would definitely not be the vehicle for writing about at-tach-ment syndromes. Suggestion pending something better from rst/PEP experts: This PEP extends PEP 333 (abbreviated P333 hereafter), perhaps with "to avoid auto-link creation" added before the ')' to pre-answer pesky questions and to avoid some editor re-expanding the abbreviations. It turns out that using a backslash before the number (e.g. PEP \333) turns off the automatic conversion. The PEP still hasn't shown up on Python.org, though, so I'm wondering if maybe I broke something else somewhere. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly
At 12:36 PM 9/27/2010 -0700, Brett Cannon wrote: All fixed. Nope. I mean, sure, I checked in fixed PEP sources several hours ago, but python.org still doesn't show PEP 3333, or the updated version of PEP 333. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly
At 02:03 PM 9/27/2010 -0700, Guido van Rossum wrote: On Mon, Sep 27, 2010 at 1:33 PM, P.J. Eby p...@telecommunity.com wrote: At 12:36 PM 9/27/2010 -0700, Brett Cannon wrote: All fixed. Nope. I mean, sure, I checked in fixed PEP sources several hours ago, but python.org still doesn't show PEP 3333, or the updated version of PEP 333. Seems Brett has fixed it. Both PEPs are now online. I wonder if it would make sense to change both from Informational to Standards Track? From PEP 1: There are three kinds of PEP: * A Standards Track PEP describes a new feature or implementation for Python. * An Informational PEP describes a Python design issue, or provides general guidelines or information to the Python community, but does not propose a new feature. Informational PEPs do not necessarily represent a Python community consensus or recommendation, so users and implementors are free to ignore Informational PEPs or follow their advice. * A Process PEP describes a process surrounding Python, or proposes a change to (or an event in) a process. Process PEPs are like Standards Track PEPs but apply to areas other than the Python language itself. They may propose an implementation, but not to Python's codebase; they often require community consensus; unlike Informational PEPs, they are more than recommendations, and users are typically not free to ignore them. Examples include procedures, guidelines, changes to the decision-making process, and changes to the tools or environment used in Python development. Any meta-PEP is also considered a Process PEP. I don't think it qualifies as a Standards PEP under the above definitions. I made it Informational originally because it's rather like the DB API PEPs, which are Informational. I suppose we could say it's a Process PEP, or perhaps update PEP 1 to add a new category (into which the DB API PEPs would also fall), or maybe just tweak the above definitions a bit so that the Informational category makes more sense. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly
At 05:41 PM 9/27/2010 -0700, Guido van Rossum wrote: On Mon, Sep 27, 2010 at 4:29 PM, P.J. Eby p...@telecommunity.com wrote: At 02:03 PM 9/27/2010 -0700, Guido van Rossum wrote: On Mon, Sep 27, 2010 at 1:33 PM, P.J. Eby p...@telecommunity.com wrote: At 12:36 PM 9/27/2010 -0700, Brett Cannon wrote: All fixed. Nope. I mean, sure, I checked in fixed PEP sources several hours ago, but python.org still doesn't show PEP 3333, or the updated version of PEP 333. Seems Brett has fixed it. Both PEPs are now online. I wonder if it would make sense to change both from Informational to Standards Track? From PEP 1: There are three kinds of PEP: * A Standards Track PEP describes a new feature or implementation for Python. * An Informational PEP describes a Python design issue, or provides general guidelines or information to the Python community, but does not propose a new feature. Informational PEPs do not necessarily represent a Python community consensus or recommendation, so users and implementors are free to ignore Informational PEPs or follow their advice. * A Process PEP describes a process surrounding Python, or proposes a change to (or an event in) a process. Process PEPs are like Standards Track PEPs but apply to areas other than the Python language itself. They may propose an implementation, but not to Python's codebase; they often require community consensus; unlike Informational PEPs, they are more than recommendations, and users are typically not free to ignore them. Examples include procedures, guidelines, changes to the decision-making process, and changes to the tools or environment used in Python development. Any meta-PEP is also considered a Process PEP. I don't think it qualifies as a Standards PEP under the above definitions. I made it Informational originally because it's rather like the DB API PEPs, which are Informational. I suppose we could say it's a Process PEP, or perhaps update PEP 1 to add a new category (into which the DB API PEPs would also fall), or maybe just tweak the above definitions a bit so that the Informational category makes more sense. Hm. I would rather extend the definition of Standards Track to include API standards that are important to the community even if they do not introduce a new feature for the language or standard library. WSGI and DB-API being the two most well-known examples but I wouldn't be surprised if there were others, possibly in the NumPy world. Well, one of the tradeoffs here is that Informational track allows something to grow into a solid standard without also having to pass the same level of up-front scrutiny and commitment that a Standards track item does. I rather doubt that either the DBAPI *or* WSGI would've passed that scrutiny in early days, and the "free to ignore" part means that there's a lot less pushback on the minor points than generally occurs with Standards track PEPs. So, I'd hate for us to lose out on the *next* DBAPI or WSGI due to an implied pressure of needing to get it right in the first place. (Indeed, I think we need *more* Informational PEPs -- in retrospect there was probably some point at which I should have done some relating to setuptools and eggs and such.) Overall, though, I suppose there's no problem with promoting Final Informational PEPs to Standards, *unless* it creates an expectation that Informational PEPs will become Standards and they thus end up being debated in the same way anyway.
(Of course, if it generally takes five or six years before an Informational PEP gets promoted, this is unlikely to be a major worry.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] WSGI is now Python 3-friendly
At 07:15 PM 9/25/2010 -0700, Guido van Rossum wrote: Don't see this as a new spec. See it as a procedural issue. As a procedural issue, PEP 333 is an Informational PEP, in Draft status, which I'd like to make Final after these amendments. See http://www.wsgi.org/wsgi/Amendments_1.0, which Graham created in 2007, stating: This page is intended to collect any ideas related to amendments to the original WSGI 1.0 so that it can be marked as 'Final'. IOW, there is no intention to treat the PEP as mutable going forward; this is just cleanup so we can mark it Final. After that, it's an ex-parrot. Clarifications of ambiguous/unspecified behavior can possibly rule as non-conforming implementations that used to get the benefit of the doubt. Best-practice recommendations also have the effect of changing (perceived) compliance. I understand the general principle, but with respect to these *specific* changes, any perceived-compliance arguments that were going to happen, already happened years ago. The changes are merely to officially document the way those arguments already turned out, so the PEP can become Final. Specifically, the changes all fall into one of three categories: 1. Textual clarification (SERVER_PORT is not an int, iteration can stop before all output is consumed) 2. Practical issues with wsgi.input arising from the fact that real-world programs needed its behavior to be more file-like than the specification required... and which essentially forced servers that were not using socket.makefile() to make their emulations work like that, anyway (or else be rejected by users). 3. Clarification of behavior that would break HTTP compliance (apps or servers sending more than Content-Length bytes) and is therefore *already a bug* in any implementation that does it. Since in all three categories any implementation that did not end up following the recommendations on its own is going to have been considered buggy by its users (regardless of its formal compliance), and because the changes do not actually declare the buggy behaviors in categories 2 and 3 to be non-compliant, I do not see how any of these changes can produce the type of problems you're worried about here. Certainly, if I thought such problems were possible, I wouldn't have accepted these amendments. Likewise, if I thought that changes would continue to be made to the PEP past this point, the goal wouldn't be getting it to Final status. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly
At 08:20 AM 9/26/2010 -0700, Guido van Rossum wrote: I'm happy approving Final status for the *original* PEP 333 and I'm happy to approve a new PEP which includes PJE's corrections. Can we make it PEP 3333, then? ;-) That number would at least communicate that it's the same thing, but for Python 3. Really, my reason for trying to do the (non Py3-specific) amendments in a way that didn't require a new PEP number was because of the many ancillary questions that it raises for the community, such as: * Is this some sort of competition/replacement to PEP 444? * What happened to the old one, why can't we just use that? * Why isn't there a different protocol version? * How is this different from the old one? To be fair, I *also* wanted to avoid all the work associated with *answering* them. ;-) (Heck, I really wanted to avoid the work of having to even *think* about which questions *might* arise and how they'd need to be addressed.) OTOH, I can certainly see that my attempt to avoid this has *already* failed: it simply brought up a different set of questions, just on Python-Dev instead of Web-SIG or Python-list. Oh well. Perhaps making the numbering appear to be a continuation will help a bit. Another option would be to make a PEP that consists solely of the amendments and errata themselves, as this would answer most of the above questions directly. Still another would be to abandon the effort to amend the PEP, and simply leave things as they are now: AFAICT, the fact that these amendments aren't in the PEP hasn't stopped anybody from *treating* most of them as if they were. (Because everyone understands that failure to follow them constitutes a bug in your program, even if it technically complies with the spec.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly
At 01:44 PM 9/26/2010 -0700, Guido van Rossum wrote: On Sun, Sep 26, 2010 at 12:47 PM, Barry Warsaw ba...@python.org wrote: On Sep 26, 2010, at 1:33 PM, P.J. Eby wrote: At 08:20 AM 9/26/2010 -0700, Guido van Rossum wrote: I'm happy approving Final status for the *original* PEP 333 and I'm happy to approve a new PEP which includes PJE's corrections. Can we make it PEP 3333, then? ;-) That works for me. Go for it. Shall I just svn cp it, then (to preserve edit history), or wait for the PEP editor to do it? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly
Done. The other amendments were never actually made, so I just reverted the Python 3 bit after moving it to the new PEP. I'll make the changes to PEP 3333 instead as soon as I have another time slot free. At 01:56 PM 9/26/2010 -0700, Guido van Rossum wrote: Since you have commit privileges, just do it. The PEP editor position mostly exists to assure non-committers are not prevented from authoring PEPs. Please do add a prominent note at the top of PEP 333 pointing to PEP 3333 for further information on Python 3 compliance or some such words. Add a similar note at the top of PEP 3333 -- maybe mark up the differences in PEP 3333 so people can easily tell what was added. And move PEP 333 to Final status. --Guido On Sun, Sep 26, 2010 at 1:50 PM, P.J. Eby p...@telecommunity.com wrote: At 01:44 PM 9/26/2010 -0700, Guido van Rossum wrote: On Sun, Sep 26, 2010 at 12:47 PM, Barry Warsaw ba...@python.org wrote: On Sep 26, 2010, at 1:33 PM, P.J. Eby wrote: At 08:20 AM 9/26/2010 -0700, Guido van Rossum wrote: I'm happy approving Final status for the *original* PEP 333 and I'm happy to approve a new PEP which includes PJE's corrections. Can we make it PEP 3333, then? ;-) That works for me. Go for it. Shall I just svn cp it, then (to preserve edit history), or wait for the PEP editor to do it? ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly
At 02:59 PM 9/26/2010 -0400, Terry Reedy wrote: You could mark added material in a way that does not conflict with rst or html. Or use .rst to make new text stand out in the .html web version (bold, underlined, red, or whatever). People familiar with 333 can focus on the marked sections. New readers can ignore the marking. If you (or anybody else) have any idea how to do that (highlight stuff in PEP-dialect .rst), let me know. (For that matter, if anybody knows how to make it not turn *every* PEP reference into a link, that'd be good too! It doesn't really need to turn 5 or 6 occurrences of PEP 333 in the same paragraph into separate links. ;-) ) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Web-SIG] WSGI is now Python 3-friendly
At 11:15 AM 9/27/2010 +1000, Ben Finney wrote: P.J. Eby pje at telecommunity.com writes: (For that matter, if anybody knows how to make it not turn *every* PEP reference into a link, that'd be good too! It doesn't really need to turn 5 or 6 occurrences of PEP 333 in the same paragraph into separate links. ;-) ) reST, being designed explicitly for Python documentation, has support for PEP references built in: You misunderstand me; I wasn't asking how to *add* a link, but how to turn OFF the automatic conversion of the phrase PEP 333 that happens without any special markup. Currently, the PEP preface is littered with unnecessary links, because the PEP pre-processor turns *every* mere textual mention of a PEP into a link to it. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] WSGI is now Python 3-friendly
I have only done the Python 3-specific changes at this point; the diff is here if anybody wants to review, nitpick or otherwise comment: http://svn.python.org/view/peps/trunk/pep-0333.txt?r1=85014&r2=85013&pathrev=85014 For that matter, if anybody wants to take a crack at updating Python 3's wsgiref based on the above, feel free. ;-) I'll be happy to answer any questions I can that come up in the process. (Please note: I went with Ian Bicking's "headers are strings, bodies are bytes" proposal, rather than my original "bodies and outputs are bytes" one, as there were not only some good arguments in its favor, but because it also resulted in fewer changes to the PEP, especially in the code samples.) I will continue to work on adding the other addenda/errata mentioned here: http://mail.python.org/pipermail/web-sig/2010-September/004655.html But because these are "shoulds" rather than "musts", and apply to both Python 2 and 3, they are not as high priority for immediate implementation in wsgiref and do not necessarily need to hold up the 3.2 release. (Nonetheless, if anybody is willing to implement them in the Python 3 version, I will happily review the changes for backport into the Python 2 standalone version of wsgiref, and issue an updated release to include them.) Thanks! ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] WSGI is now Python 3-friendly
At 09:22 PM 9/25/2010 -0400, Jesse Noller wrote: It seems like it will end up different enough to be a different specification, closely related to the original, but different enough to trip up all the people maintaining current WSGI servers and apps. The only actual *change* to the spec is mandating the use of the 'bytes' type or equivalent for HTTP bodies when using Python 3. Seriously, that's *it*. Everything else that's (planned to be) added is either 100% truly just clarifications (e.g. nothing in the spec *ever* said SERVER_PORT could be an int, but apparently some people somehow interpreted it so), or else best-practice recommendations from people who actually implemented WSGI servers. For example, the readline() size hint is not supported in the original spec (meaning clients can't call it and be compliant). The planned modification is servers should implement it (best practice), but you can't call an implementation that *doesn't* implement it noncompliant. (This just addresses the fact that most practical implementations *did* in fact support it, and code out there relies on this.) So, no (previously-)compliant implementations were harmed in the making of the updated spec. If they were compliant before, they're compliant now. I'm actually a bit surprised people are bringing this up now, since when I announced the plan to make these changes, I said that nothing would be changed that would break anything... even for what I believe are the only Python 3 WSGI implementations right now (by Graham Dumpleton and Robert Brewer). Indeed, all of the changes (except the bytes thing) are stuff previously discussed endlessly on the Web-SIG (years ago in most cases) and widely agreed on as, this should have been made clear in the original PEP. And, I also explicitly deferred and/or rejected items that *can't* be done in a 100% backward-compatible way, and would have to be WSGI 1.1 or higher -- indeed, I have a long list of changes from Graham that I've pronounced can't be done without a 1.1. Indeed, the entire point of the my scope choices were to allow all this to happen *without* a whole new spec. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] WSGI is now Python 3-friendly
At 02:07 PM 9/25/2010 -0700, Guido van Rossum wrote: This is a very laudable initiative and I approve of the changes -- but I really think it ought to be a separate PEP rather than pretending it is just a set of textual corrections on the existing PEP 333. With the exception of the bytes change, I ruled out accepting any proposed amendments that would actually alter the protocol. The amendments are all either textual clarifications, clarifications of ambiguous/unspecified areas, or best-practice recommendations by implementors. (i.e., which are generally already implemented in major servers) The full list of things Graham and others have asked for or recommended would indeed require a 1.1 version at minimum, and thus a new PEP. But I really don't want to start down that road right now, and therefore hope that I can talk Graham or some other poor soul into shepherding a 1.1 PEP instead. ;-) (Seriously: through an ironic twist of fate, I have done nearly *zero* Python web programming since around the time I drafted the first spec in 2004, so even if it makes sense for me to finish PEP 333, it makes little sense for me to be starting a *new* one on the topic now!) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Backup plan: WSGI 1 Addenda and wsgiref update for Py3
While the Web-SIG is trying to hash out PEP 444, I thought it would be a good idea to have a backup plan that would allow the Python 3 stdlib to move forward, without needing a major new spec to settle out implementation questions. After all, even if PEP 333 is ultimately replaced by PEP 444, it's probably a good idea to have *some* sort of WSGI 1-ish thing available on Python 3, with bytes/unicode and other matters settled. In the past, I was waiting for some consensuses (consensi?) on Web-SIG about different approaches to Python 3, looking for some sort of definite "yes, we all like this" response. However, I can see now that this just means it's my fault we don't have a spec yet. :-( So, unless any last-minute showstopper rebuttals show up this week, I've decided to go ahead and officially bless nearly all of what Graham Dumpleton (who's not only the mod_wsgi author, but has put huge amounts of work into shepherding WSGI-on-Python3 proposals, WSGI amendments, etc.) has proposed, with a few minor exceptions. In other words: almost none of the following is my own original work; it's like 90% Graham's. Any praise for this belongs to him; the only thing that belongs to me is the blame for not doing this sooner! (Sorry Graham. You asked me to do this ages ago, and you were right.) Anyway, I'm posting this for comment to both Python-Dev and the Web-SIG. If you are commenting on the technical details of the amendments, please reply to the Web-SIG only. If you are commenting on the development agenda for wsgiref or other Python 3 library issues, please reply to Python-Dev only. That way, neither list will see off-topic discussions. Thanks! The Plan I plan to update the proposal below per comments and feedback during this week, then update PEP 333 itself over the weekend or early next week, followed by a code review of Python 3's wsgiref, and implementation of needed changes (such as recoding os.environ to latin1-captured bytes in the CGI handler). To complete the changes, it is possible that I may need assistance from one or more developers who have more Python 3 experience. If after reading the proposed changes to the spec, you would like to volunteer to help with updating wsgiref to match, please let me know! The Proposal Overview 1. The primary purpose of this update is to provide a uniform porting pattern for moving Python 2 WSGI code to Python 3, meaning a pattern of changes that can be mechanically applied to as little code as practical, while still keeping the WSGI spec easy to programmatically validate (e.g. via ``wsgiref.validate``). The Python 3 specific changes are to use: * ``bytes`` for I/O streams in both directions * ``str`` for environ keys and values * ``bytes`` for arguments to start_response() and write() * text stream for wsgi.errors In other words, "strings in, bytes out" for headers, bytes for bodies. In general, only changes that don't break Python 2 WSGI implementations are allowed. The changes should also not break mod_wsgi on Python 3, but may make some Python 3 wsgi applications non-compliant, despite continuing to function on mod_wsgi. This is because mod_wsgi allows applications to output string headers and bodies, but I am ruling that option out because it forces every piece of middleware to have to be tested with arbitrary combinations of strings and bytes in order to test compliance. If you want your application to output strings rather than bytes, you can always use a decorator to do that. (And a sample one could be provided in wsgiref; see the sketch after this message.) 2.
The secondary purpose of the update is to address some long-standing open issues documented here: http://www.wsgi.org/wsgi/Amendments_1.0 As with the Python 3 changes, only changes that don't retroactively invalidate existing implementations are allowed. 3. There is no tertiary purpose. ;-) (By which I mean, all other kinds of changes are out-of-scope for this update.) 4. The section below labeled A Note On String Types is proposed for verbatim addition to the Specification Overview section in the PEP; the other sections below describe changes to be made inline at the appropriate part of the spec, and changes that were proposed but are rejected for inclusion in this amendment. A Note On String Types -- In general, HTTP deals with bytes, which means that this specification is mostly about handling bytes. However, the content of those bytes often has some kind of textual interpretation, and in Python, strings are the most convenient way to handle text. But in many Python versions and implementations, strings are Unicode, rather than bytes. This requires a careful balance between a usable API and correct translations between bytes and text in the context of HTTP... especially to support porting code between Python implementations with different ``str`` types. WSGI therefore defines two kinds of string: "native" strings (always implemented using the type named ``str``), used for request/response headers and metadata, and "bytestrings" (implemented using ``bytes`` in Python 3, ``str`` elsewhere), used for the bodies of requests and responses.
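A sketch of the string-to-bytes decorator mentioned above (a hypothetical helper, not necessarily the sample wsgiref would ship):

    def text_app(app, charset='utf-8'):
        """Wrap an app that yields text so that it emits bytes, per the spec."""
        def wrapper(environ, start_response):
            for chunk in app(environ, start_response):
                if not isinstance(chunk, bytes):
                    chunk = chunk.encode(charset)   # text out -> bytes out
                yield chunk
        return wrapper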
Re: [Python-Dev] [Web-SIG] Backup plan: WSGI 1 Addenda and wsgiref update for Py3
At 12:55 PM 9/21/2010 -0400, Ian Bicking wrote: On Tue, Sep 21, 2010 at 12:47 PM, Chris McDonough mailto:chr...@plope.comchr...@plope.com wrote: On Tue, 2010-09-21 at 12:09 -0400, P.J. Eby wrote: While the Web-SIG is trying to hash out PEP 444, I thought it would be a good idea to have a backup plan that would allow the Python 3 stdlib to move forward, without needing a major new spec to settle out implementation questions. If a WSGI-1-compatible protocol seems more sensible to folks, I'm personally happy to defer discussion on PEP 444 or any other backwards-incompatible proposal. I think both make sense, making WSGI 1 sensible for Python 3 (as well as other small errata like the size hint) doesn't detract from PEP 444 at all, IMHO. Yep. I agree. I do, however, want to get these amendments settled and make sure they get carried over to whatever spec is the successor to PEP 333. I've had a lot of trouble following exactly what was changed in 444, and I'm a tad worried that several new ambiguities may be being introduced. So, solidifying 333 a bit might be helpful if it gives a good baseline against which to diff 444 (or whatever). ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Backup plan: WSGI 1 Addenda and wsgiref update for Py3
At 06:52 PM 9/21/2010 +0200, Antoine Pitrou wrote: On Tue, 21 Sep 2010 12:09:44 -0400 P.J. Eby p...@telecommunity.com wrote: While the Web-SIG is trying to hash out PEP 444, I thought it would be a good idea to have a backup plan that would allow the Python 3 stdlib to move forward, without needing a major new spec to settle out implementation questions. If this allows the Web situation in Python 3 to be improved faster and with less hassle then all the better. There's something strange in your proposal: it mentions WSGI 2 at several places while there's no guarantee about what WSGI 2 will be (is there?). Sorry - "WSGI 2" should be read as shorthand for "whatever new spec succeeds PEP 333", whether that's PEP 444 or something else. It just means that any new spec that doesn't have to be backward-compatible can (and should) more thoroughly address the issue in question. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Catalog-sig] egg_info in PyPI
At 05:19 PM 9/18/2010 +0200, Martin v. Löwis wrote: In the specific case of tl.eggdeps, the dependency information is only used to create printable graphs. If this turns out to be slightly incorrect, people would notice if they try to use the packages in question. By the way, just providing this information for .egg files and *not* for sdists would ensure accuracy of the metadata for that platform/python version. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Catalog-sig] egg_info in PyPI
At 06:06 PM 9/18/2010 +0200, Martin v. Löwis wrote: On 18.09.10 17:49, P.J. Eby wrote: At 05:19 PM 9/18/2010 +0200, Martin v. Löwis wrote: In the specific case of tl.eggdeps, the dependency information is only used to create printable graphs. If this turns out to be slightly incorrect, people would notice if they try to use the packages in question. By the way, just providing this information for .egg files and *not* for sdists would ensure accuracy of the metadata for that platform/python version. True (I presume - unless there are also dependencies on the specific OS version or system installation that may affect the metadata). No, because an egg's egg-info is what it is. easy_install doesn't rebuild that information, so it is correct by definition. ;-) (Certainly, it is what will be used for dependency information.) OTOH, I do think that the users asking for that prefer per-release information, despite the limitations that this may have. OTTH, if the concerns could be relieved by providing egg-info for all files that have it, I could provide that as well/instead. I am +0 on the idea myself, as I don't think the plan is quite enough to be able to provide a user-experience upgrade for use cases besides "make me a dependency graph" without downloading the distributions themselves. It certainly would be nice to be able to say to the user, "here are the things I will need to download in order to fulfill your request," but if you have to download individual files to get at that information, I'm not sure how much it helps vs. just downloading the files. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standards for distribution names
At 12:08 PM 9/16/2010 +0100, Chris Withers wrote: Following on from this question: http://twistedmatrix.com/pipermail/twisted-python/2010-September/022877.html ...I'd thought that the correct names for distributions would have been documented in one of: ... Where are the standards for this or is it still a case of whatever setuptools does? Actually, in this case, it's whatever distutils does. If you don't build your .exe's with Distutils, or if you rename them after the fact, then setuptools won't recognize them as things it can consume. FYI, Twisted has a long history of releasing distribution files that are either built using non-distutils tools or else renamed after being built. Note, too, that if the Windows exe's they're providing aren't built by the distutils bdist_wininst command, then setuptools is probably not going to be able to consume them, no matter what they're called. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] standards for distribution names
At 12:08 PM 9/16/2010 +0100, Chris Withers wrote: ...I'd thought that the correct names for distributions would have been documented in one of: http://www.python.org/dev/peps/pep-0345 http://www.python.org/dev/peps/pep-0376 http://www.python.org/dev/peps/pep-0386 ...but having read them, I drew a blank. Forgot to mention: see distinfo_dirname() in PEP 376 for an explanation of distribution-name normalization. (Case-insensitivity and os-specific case handling is not addressed in the PEPs, though, AFAICT.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 3.x as the official release
At 10:18 PM 9/16/2010 +0200, Éric Araujo wrote: On 15/09/2010 21:45, Tarek Ziadé wrote: Could we remove the wsgiref.egg-info file in any case? Since we've been working on a new format for that (PEP 376), which should be starting to get used in the coming years, it'll be a bit nonsensical to have that metadata file in the stdlib shipped with 3.2 On a related subject: Would it make sense not to run install_egg_info from install anymore? We probably can't remove the command because of backward compat, but we could stop running it (thus creating egg-info files) by default. If you're talking about distutils2 on Python 3, then of course anything goes: backward compatibility isn't an issue. For 2.x, not writing the files would indeed produce backward compatibility problems. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 3.x as the official release
At 11:11 AM 9/15/2010 -0700, Guido van Rossum wrote: Given that wsgiref is in the stdlib, I think we should hold up the 3.2 release (and even the first beta) until this is resolved, unless we can convince ourselves that it's okay to delete wsgiref from the stdlib (which sounds unlikely but may not be any more incompatible than making it work properly :-). FWIW, I'd be fine with that option. I want to emphasize that I am *not* a stakeholder so my preference for bytes or Unicode shouldn't matter; that said, given WSGI's traditional emphasis on using the lowest-level, vanilla standard datatypes (e.g. you can't even subclass dict let alone provide another kind of mapping -- it has to be a real dict) it makes sense to me that the values should be bytes, os.environ notwithstanding. The keys probably could be Unicode (HTTP headers are required to use only 7-bit ASCII characters anyways right?). But I'd be happy to be shown the error of my ways (or given a link showing prior discussion of the matter -- preferably with a conclusion :-). There isn't a conclusion yet, but the proposals under discussion are summarized here: http://www.wsgi.org/wsgi/Python_3#Proposals The primary points of consensus are bytes for wsgi.input, and native strings (i.e. Unicode on Python 3) for environment keys. If I were to offer a suggestion to a PEP author or dictator wanting to get something out ASAP, it would probably be to create a compromise between the flat model (my personal favorite) and the mod_wsgi model, as an addendum to PEP 333. Specifically: * leave start_response/write in play (ala mod_wsgi) * use the required types from the flat proposal (i.e. status, headers, and output stream MUST be bytes) * add a decorator to wsgiref that supports using native strings as output instead of bytes, for ease-of-porting (combine mod_wsgi's ease-of-porting w/flat's simple verifiability) This would probably allow us to get by with the least changes to existing code, the stdlib, the standard itself, and wsgiref. (wsgiref itself would still need a thorough code review, especially wsgiref.validate, but it'd be unlikely to change much.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 3.x as the official release
At 11:12 PM 9/15/2010 +0200, Ãric Araujo wrote: Unless I remember wrong, the intent was not to break code that used pkg_resources.require('wsgiref') More precisely, at the time it was done, setuptools was slated for inclusion in Python 2.5, and the idea was that when modules moved from PyPI to the stdlib, they would include the metadata so that projects requiring the module on an older version of Python would not need to use Python-version-dependent dependencies. So, for example, if a package was written on 2.4 using a requirement of wsgiref, then that code would run unchanged on 2.5 using the stdlib-supplied copy. In practice, this didn't work out in 2.x, and it's meaningless on 3.x where nothing has migrated yet from PyPI to stdlib AFAIK. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 3.x as the official release
At 11:50 PM 9/15/2010 +0200, Dirkjan Ochtman wrote: On Wed, Sep 15, 2010 at 21:18, P.J. Eby p...@telecommunity.com wrote: If I were to offer a suggestion to a PEP author or dictator wanting to get something out ASAP, it would probably be to create a compromise between the flat model (my personal favorite) and the mod_wsgi model, as an addendum to PEP 333. Specifically: * leave start_response/write in play (ala mod_wsgi) The alternative is returning a three-tuple status, headers, content-iterable, right? I would definitely prefer just returning a three-tuple instead of the crappy start_response callback that returns a write callable. It makes applications easier to write, and the unified model should also make server implementation easier. It also combines nicely with yield from in some cases. I would prefer it too (which is why the flat model is my favorite), but I think it would be easier to get a quick consensus for something that allows apps to be more mechanically ported from 2.x to 3.x. That's why I said, offer a suggestion to ... get something out ASAP. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
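For readers skimming the thread, the two calling conventions being compared look roughly like this; the exact required types were still under discussion, so treat this as schematic:

    # PEP 333's start_response style, which the compromise would keep:
    def app(environ, start_response):
        start_response(b'200 OK', [(b'Content-Type', b'text/plain')])
        return [b'Hello, world!']

    # The "flat" three-tuple style: (status, headers, content-iterable)
    def app_flat(environ):
        return (b'200 OK',
                [(b'Content-Type', b'text/plain')],
                [b'Hello, world!'])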
Re: [Python-Dev] PEP 444 aka Web3 (was Re: how to decide on a Python 3 design for wsgiref)
At 09:22 AM 9/16/2010 +1000, James Mills wrote: On Thu, Sep 16, 2010 at 9:06 AM, Chris McDonough chr...@plope.com wrote: Comments and competing specs would be useful. Can I post comments here ? :) Please, let's put any spec-detail commentary on the Web-SIG instead (commenting here on process issues related to the 3.x releases is of course fine). ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 'hasattr' is broken by design
At 12:10 PM 8/25/2010 +1200, Greg Ewing wrote: Consider an object that is trying to be a transparent proxy for another object, and behave as much as possible as though it really were the other object. Should an attribute statically defined on the proxied object be considered dynamically defined on the proxy? If so, then the proxy isn't as transparent as some people may want. Yep. That's why the proposed addition to inspect is a bad idea. If we encourage that sort of static thinking, it will lead to people creating all sorts of breakage with respect to more dynamic code. AFAICT, the whole "avoid running code" thing only makes sense for a debugging tool -- at which point, you can always use the trace facility and throw an error when any Python code runs that's not part of your debugging tool. Something like:

    def exists(ob, attr):
        __running__ = True
        # ... set trace function here
        try:
            try:
                getattr(ob, attr)
                return True
            except AttributeError:
                return False
            except CodeRanError:
                return True  # or False if you prefer
        finally:
            __running__ = False
            # restore old tracing here

Where the trace function is just something that throws CodeRanError if it detects a call event and the __running__ flag is True. This would stop any Python code from actually executing. (It'd need to keep the same trace function for c_call events, since those might lead to nested non-C calls.) Of course, a debugger's object inspection tool would probably actually want to return either the attribute value, or a special value to mean "dynamic calculation needed". ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 'hasattr' is broken by design
At 08:58 PM 8/25/2010 +0300, Michael Foord wrote: If your proxy class defines __call__ then callable returns True, even if the delegation to the proxied object would cause an AttributeError to be raised. Nope. You just have to delegate via __getattribute__ (available since 2.2) instead of __getattr__:

    >>> from peak.util.proxies import ObjectProxy
    >>> o = ObjectProxy(lambda: 1)
    >>> o()
    1
    >>> o.__call__
    <method-wrapper '__call__' of function object at 0x00E004B0>
    >>> o = ObjectProxy(1)
    >>> o()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "c:\cygwin\home\pje\projects\proxytypes\peak\util\proxies.py", line 6, in __call__
        return self.__subject__(*args, **kw)
    TypeError: 'int' object is not callable
    >>> o.__call__
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "c:\cygwin\home\pje\projects\proxytypes\peak\util\proxies.py", line 12, in __getattribute__
        return getattr(subject, attr)
    AttributeError: 'int' object has no attribute '__call__'

As you can see, the __call__ attribute in each case is whatever the proxied object's __call__ attribute is, even though the proxy itself has a __call__ method that is invoked when the proxy is called. This is actually pretty straightforward stuff since the introduction of __getattribute__. (The code is at http://pypi.python.org/pypi/ProxyTypes, btw.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 'hasattr' is broken by design
At 03:37 PM 8/24/2010 +0200, Hrvoje Niksic wrote: a) a business case of throwing anything other than AttributeError from __getattr__ and friends is almost certainly a bug waiting to happen, and FYI, best practice for __getattr__ is generally to bail with an AttributeError as soon as you see double underscores in the name, unless you intend to support special attributes. I don't think this is documented anywhere, but experience got this pretty ingrained in my head since Python 2.2 or even earlier. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
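In code, that practice looks something like this (a minimal sketch; Wrapper and _target are placeholders for whatever actually does the delegating):

    class Wrapper:
        def __init__(self, target):
            self._target = target

        def __getattr__(self, name):
            # Bail out with AttributeError on dunder names, so machinery
            # probing for special methods doesn't get bogus delegation.
            if name.startswith('__') and name.endswith('__'):
                raise AttributeError(name)
            return getattr(self._target, name)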
Re: [Python-Dev] 'hasattr' is broken by design
At 10:13 AM 8/24/2010 -0500, Benjamin Peterson wrote: 2010/8/24 James Y Knight f...@fuhm.net: On Aug 24, 2010, at 10:26 AM, Benjamin Peterson wrote: 2010/8/24 P.J. Eby p...@telecommunity.com: At 03:37 PM 8/24/2010 +0200, Hrvoje Niksic wrote: a) a business case of throwing anything other than AttributeError from __getattr__ and friends is almost certainly a bug waiting to happen, and FYI, best practice for __getattr__ is generally to bail with an AttributeError as soon as you see double underscores in the name, unless you intend to support special attributes. Unless you're in an old-style class, you shouldn't get any double underscore methods in __getattr__ (or __getattribute__). If you do, it's a bug. Uh, did you see the message that this was in response to? Maybe it should be a bug report? Old version of Python I think. If by old you mean 2.6, sure. (Also, I did say this was a best practice since 2.2.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] 'hasattr' is broken by design
At 12:02 AM 8/24/2010 +0300, Michael Foord wrote: For properties there is *no reason* why code should be executed merely in order to discover if the attribute exists or not. That depends on what you mean by "exists". Note that a property might raise AttributeError to signal that the attribute is not currently set. Likewise, unless you special case __slots__ descriptors, you can have the bizarre condition where hasattr() will return True, but getattr() will still raise an AttributeError. The idea that you could determine the presence of an attribute on an object without executing that object's code is something that hasn't been practical since the birth of descriptors in Python 2.2. Yes I know the dance (walking the MRO, fetching the attribute out of the appropriate type __dict__ or the instance dict -- or looking on the metaclass if the object you are introspecting is a type itself), it is just not trivial -- which is why I think it is a shame that people are forced to implement it just to ask if a member exists without triggering code execution. Even if you implement it, you will get wrong answers in some cases. __getattribute__ is allowed to throw out the entire algorithm you just described and replace it utterly with something else. My ProxyTypes library makes use of that fact, for example, so if you actually attempted to inspect a proxy instance with your re-implemented dance, your code will fail to notice what attributes the proxy actually has. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
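For concreteness, the naive version of the dance being discussed looks roughly like this; note that it deliberately skips data-descriptor precedence and metaclass lookup (part of why it's non-trivial), and, per the caveat above, any __getattribute__ override makes its answer simply wrong:

    def static_find(obj, name):
        # Look for `name` without running descriptors' __get__,
        # __getattr__, or properties: instance dict first, then the MRO.
        inst_dict = getattr(obj, '__dict__', None)
        if inst_dict is not None and name in inst_dict:
            return inst_dict[name]
        for klass in type(obj).__mro__:
            if name in klass.__dict__:
                return klass.__dict__[name]
        raise AttributeError(name)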
Re: [Python-Dev] 'hasattr' is broken by design
At 06:12 PM 8/23/2010 -0400, Yury Selivanov wrote: BTW, is it possible to add new magic method __hasattr__? Maybe not in Python 3.2, but in general. In order to do this properly, you'd need to also add __has__ or __exists__ (or some such) to the descriptor protocol; otherwise you break descriptors' ability to operate independently of the class they're used in. You would probably also need a __hasattribute__, in order to be able to properly synchronize with __getattr__/__getattribute__. Seems like overkill to me, though, as I'm not sure how such a protocol actually helps ORM or persistence schemes (and I've written a few). Pretty much, if you're trying to check for the existence of an attribute, you're probably about to be getting that attribute anyway. (i.e. why query the existence of an attribute you *don't* intend to use?) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 376 proposed changes for basic plugins support
At 10:28 AM 8/3/2010 +0200, M.-A. Lemburg wrote: Since you are into comparing numbers, you might want to count the number of Zope plugins that are available on PyPI and its plugin system has been around much longer than setuptools has been. I don't think that proves anything, though. Actually, some of the ones I found in the search using entry points *were* Zope, which, as I mentioned before, is increasingly moving away from the old approach in favor of entry points. In any case, I am not advocating *setuptools* -- I'm advocating that if PEP 376 expands to add plugin support, that it do so with a file format and associated API based on that of entry points, so as to make migration of those ~187 modules and their associated plugins to distutils2 a little easier. In other words, I'm trying to make it easier for people to move OFF of setuptools. Crazy, I know, but there you go. ;-) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 376 proposed changes for basic plugins support
At 01:40 PM 8/3/2010 +0200, M.-A. Lemburg wrote: If you look at the proposal, it is really just about adding a new data store to manage a certain package type called plugins. Next time around, someone will want to see support for skins or themes. Then perhaps identify script packages, or application packages, or namespace packages, or stubs, etc. All this can be had by providing this kind of extra meta-information in the already existing format. If by existing format, you mean entry points, then yes, that is true. ;-) They are used today for most of the things you listed; anything that's an importable Python object (module, class, function, package, constant, global) can be listed as an entry point belonging to a named group. Heck, the first code sample on Nullege for iter_entry_points is some package called Apydia loading an entry point group called apydia.themes! Seriously, though, PEP 376 is just setuptools' egg-info under a different name with uninstall support added. And egg-info was designed to be able to hold all those things you're talking about. The EggTranslations project, for example, defines i18n-support files that can be placed under egg-info, and provides its own APIs for looking those things up. Applications using EggTranslations can not only have their own translations shipped as plugins, but plugins can provide translations for other plugins of the same application. (I believe it also supports providing other i18n resources such as icons as well.) So, it isn't actually necessary for the stdlib to provide any particular support for specific kinds of metadata within PEP 376, as long as the PEP 376 API supports finding packages with metadata files of a particular name. (EggTranslations uses similar APIs provided by pkg_resources.) However, since Tarek proposed adding a stdlib-supported plugins feature, I am suggesting it adopt the entry_points.txt file name and format, to avoid unnecessary API fragmentation. If we add a new extra file to be managed by the package managers every time someone comes up with a new use case, we'd just clutter up the disk with more and more CSV file extracts and make PEP 376 more and more complex. The setuptools egg-info convention is not to create files that don't contain any useful content, so that their presence or absence conveys information. If that convention is continued in PEP 376, features that aren't used won't take up any disk space. As for cluttering the PEP, IMO any metadata files that aren't part of the installation database feature should probably have their own PEP. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Yield-From Implementation Updated for Python 3
At 09:24 PM 8/1/2010 -0700, Guido van Rossum wrote: I don't understand all the details and corner cases (e.g. the concatenation of stacks It's just to ensure that you never have From's iterating over other From's, vs. just iterating whatever's at the top of the stack. which seems to have to do with the special-casing of From objects in __new__) It isn't connected, actually, except that it's another place where I'm keeping From's flat, instead of nested. (I hear that flat is better. ;-) ) I am curious whether, if you need a trampoline for async I/O anyway, there would be a swaying argument for integrating this functionality into the general trampoline (as in the PEP 342 example), Originally, that was why I wasn't very enthusiastic about PEP 380; it didn't seem to me to be adding any new value over what you could do with existing, widely-used libraries. (Twisted's had its own *and* multiple third-party From-ish libraries supporting it for many years now.) After I wrote From(), however (which was originally intended to show why I thought 380 was unnecessary), I realized that having One Obvious Way to implement generator-based pseudothreads independent of an event loop, is actually useful precisely *because* it separates the pseudothreadedness from what you're using the pseudothreadedness for. Essentially, the PEP 380-ish bit is the hardest part of writing an actual pseudothread implementation; connecting that implementation to an I/O framework is actually the relatively simple part. You just write code that steps into the generator, and uses the yielded object to initiate an I/O operation and register a callback. (If you're using Twisted or something else that has promise-like deferred results, it's *really* easy, because you only have a couple of types of yielded objects to deal with, and a uniform callback signature.) Indeed, if you're using an existing async I/O framework, you don't even really *have* a trampoline as such -- you just have a bit of code that registers callbacks to itself, and the app's main event loop just calls back to that wrapper when the I/O is done. In effect, an I/O framework integration would just give you a single API like run(From(geniter)), which performs one iteration and then registers whatever callback it's told to by the yield; the callback it registers would actually be a reinvocation of run() on the same From instance when the I/O is ready, but with a value to pass back into the send(), or an error to throw(). So, the I/O framework's event loop is half of the trampoline, and the wrapper that sends or throws, then registers an I/O callback, is the other half. Something like:

    from functools import partial

    def run(coroutine, value=None, exc_info=()):
        if exc_info:
            action = coroutine.throw(*exc_info)
        else:
            action = coroutine.send(value)
        action.registerCallback(partial(run, coroutine))

Where 'action' is some I/O command object, and registerCallback() will call its argument back with a value or exc_info, after the I/O is done. Of course, a real framework integration might actually dispatch on type here rather than using special command objects like this, and there might be more glue code to deal with exceptions, but really, the heart of the thing is just going to look like that. (I just wrote it that way to show the basic structure.) Really, it's just a few functions, maybe a utility routine or two, and maybe a big if-then or dictionary dispatch on types if you just want to be able to 'yield' existing I/O objects provided by the frameworks.
IOW, it's a *lot* simpler than actually rolling your own I/O or GUI framework like Twisted or Eventlet or wxPython or tk or some other such thing. But it seems a bit of a waste to have two different trampolines, especially since the trampoline itself is so hard to understand (speaking for myself here :-). ISTM that the single combined trampoline is easier to understand than the From class. Well, the PEP 342 example was made to look simple, because it doesn't have to actually DO anything (like I/O!). To work for real, it'd need some pluggability, and some things to help it interoperate with different GUI and I/O frameworks and event loops. (Using your own event loop for real isn't very useful in a lot of non-trivial applications.) Heck, after writing From(), I had the idea that I could just write a trampoline that *could* integrate with other event loops, as a general-purpose companion to From. But, after several wasted hours, I realized that yes, it *could* be written (I still have the draft), but it was mostly just something that would save a little boilerplate in bolting From()'s onto an existing async I/O framework, and not really anything to write home about. So, I guess what I'm saying is, the benefit of separating the trampoline from control flow is that people can then use them
Re: [Python-Dev] PEP 376 proposed changes for basic plugins support
At 01:53 PM 8/2/2010 +0000, exar...@twistedmatrix.com wrote: On 01:27 pm, m...@egenix.com wrote: exar...@twistedmatrix.com wrote: This is also roughly how Twisted's plugin system works. One drawback, though, is that it means potentially executing a large amount of Python in order to load plugins. This can build up to a significant performance issue as more and more plugins are installed. I'd say that it's up to the application to deal with this problem. An application which requires lots and lots of plugins could define a registration protocol that does not require loading all plugins at scanning time. Just for the record, solving this problem is precisely what entry points are for: they provide a discovery mechanism that doesn't require importing anything until you actually need it. It's not fixable at the application level, at least in Twisted's plugin system. It sounds like Zope's system has the same problem, but all I know of that system is what you wrote above. I don't know about Zope in general, but there are certainly Zope corp. projects that use entry points instead of namespaces (buildout, for one), and I believe that there's been a long-time push to move third-party code out of the common namespace package. i.e., AFAIK, Zope 3 doesn't use package namespaces as a primary method of extension. The cost increases with the number of plugins installed on the system, not the number of plugins the application wants to load. Pretty much any plugin discovery system is going to scale that way, but entry points only require file reads rather than imports, and have a shared cache for all code in use by the application. So if, say, Twisted uses entry points and an application running on Twisted also uses entry points, the loading cost is only paid once for both sets of entry points inspected. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
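Concretely, the deferred-import property looks like this with the pkg_resources API (the group name is hypothetical):

    import pkg_resources

    # Discovery reads only installed metadata files; none of the plugins'
    # code is imported at this point.
    for ep in pkg_resources.iter_entry_points('myapp.plugins'):
        print(ep.name, ep.module_name)  # metadata only
        plugin = ep.load()  # the import cost is paid here, on demand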
Re: [Python-Dev] PEP 376 proposed changes for basic plugins support
At 01:10 PM 8/2/2010 +0200, Tarek Ziadé wrote: I don't have a specific example in mind, and I must admit that if an application does the right thing (provide the right configuration file), this activate feature is not useful at all. So it seems to be a bad idea. Well, it's not a *bad* idea as such; actually, having conventions for such configuration, and libraries that help to implement the convention, are a *good* idea, and I support it. I just don't think it makes much sense to *impose* the convention on the app developers; there are, after all, use cases that don't need the extra configuration. Setuptools was mainly designed to support the application plugin directory model for invasive sorts of plugins, and the global plugin availability model for the kind of plugins that a user has to explicitly select (e.g. file type converters, special distutils commands, etc.). However, there are definitely use cases for user-configured plugins, and the apps that do it generally use some sort of configuration file to identify which entry points they'll actually use. IOW, have entry points like setuptools provides, but in a metadata field instead of an entry_points.txt file. May I suggest, then, that we keep entry_points.txt, but simply provide a summary in PKG-INFO? (i.e., list the groups and names provided) This would still make it easy for human browsing/discovery of entry points on PyPI, but it would allow easy forward/backward compatibility between setuptools and distutils2, while also providing faster lookup of entry points (because you can skip distributions that don't have an entry points file, vs. having to parse *every* PKG-INFO file). Or to put it another way, when I implement PEP 376 support in setuptools 0.7, I'll only have to change the name of the .egg-info directory and copy the entry point summary into PKG-INFO. And, even more to the point, people who define entry points with distutils2 will then be able to have them work with setuptools-based projects, and vice versa, helping to smooth the transition. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 376 proposed changes for basic plugins support
At 05:08 PM 8/2/2010 +0200, Éric Araujo wrote: I wonder if functions in pkgutil or importlib could allow one to iterate over the plugins (i.e. submodules and subpackages of the namespace package) without actually loading them. See pkgutil.walk_packages(), available since 2.5. It has to load __init__.py files, especially because of namespace packages, but it doesn't load any non-package modules. That being said, using namespace packages for plugins kind of defeats the true purpose of namespace packages, which is to give developers private package namespaces they can use across multiple projects, like zope.*, peak.*, twisted.*, etc., thereby avoiding naming conflicts in the root package namespace. Granted, you can always re-nest namespaces and do something like someproject.plugins.mynamehere.myplugin, but with entry points you can just register something in mynamehere.mysomeprojectplugin, and flat is better than nested. ;-) (Plus, you can include information about the individual plugins/features residing in that module in the metadata, and avoid importing until/unless you need that feature.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
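A quick usage sketch of the API mentioned above (myns.plugins is a hypothetical namespace package):

    import pkgutil
    import myns.plugins

    # Discovers submodules/subpackages without importing the non-package
    # modules; subpackage __init__.py files do get loaded along the way.
    for finder, name, ispkg in pkgutil.walk_packages(
            myns.plugins.__path__, prefix='myns.plugins.'):
        print(name, ispkg)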
Re: [Python-Dev] PEP 376 proposed changes for basic plugins support
At 09:03 PM 8/2/2010 +0100, Michael Foord wrote: Ouch. I really don't want to emulate that system. For installing a plugin for a single project the recommended technique is:
* Unpack the source. It should provide a setup.py.
* Run: $ python setup.py bdist_egg
Then you will have a *.egg file. Examine the output of running python to find where this was created. Once you have the plugin archive, you need to copy it into the plugins directory of the project environment Those instructions are apparently out-of-date; you can actually just easy_install -m or pip the plugin directly to the plugins directory, without any additional intervening steps. (The only reason to create an .egg file for Trac is if you intend to distribute to non-developer users who will be told to just drop it in the plugins directory.) For global plugins it just uses entry points, which is similar to the functionality we are suggesting adding... I believe it's using entry points for both, actually. It just has an (application-specific) filtering mechanism to restrict which entry points get loaded. Really this sounds *astonishingly* like the system we are proposing. :-) Which is why I keep pointing out that the code for doing most of it is already available in setuptools, distribute, pip, buildout, etc., and so (IMO) ought to just get copied into distutils2, the way easy_install's package index code was. ;-) (Of course, adding some filtering utilities to make it easier for apps to do explicit configuration would still be nice.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 376 proposed changes for basic plugins support
At 10:37 PM 8/2/2010 +0200, M.-A. Lemburg wrote: If that's the case, then it would be better to come up with an idea of how to make access to that meta-data available in a less I/O intense way, e.g. by having pip or other package managers update a central SQLite database cache of the data found on disk. Don't forget system packaging tools like .deb, .rpm, etc., which do not generally take kindly to updating such things. For better or worse, the filesystem *is* our central database these days. Btw, while adding PLUGINS to PEP 376 is a new proposal, it's essentially another spelling of the existing entry_points.txt used by eggs; it changes the format to csv instead of .ini, and adds description and type fields, but drops requirements information and I'm not sure if it can point to arbitrary objects the way entry_points.txt can. Anyway, entry_points.txt has been around enough years in the field that the concept itself can't really be called new - it's actually quite proven. Checking http://nullege.com/codes/search/pkg_resources.iter_entry_points/call , I find 187 modules using just that one entry points API. Some projects do have more than one module loading plugins, but the majority of those 187 appear to be different projects. Note that that's modules *loading plugins*, not plugins being provided... so the total number of PyPI projects using entry points in some way is likely much higher, once you add in the plugins that these 187 lookups are, well, looking up. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
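For reference, the .ini-style entry_points.txt format being described looks like this (group and entry names invented for illustration); each entry maps a name to an importable object:

    [myapp.file_handlers]
    csv = somepkg.handlers:CSVHandler
    json = somepkg.handlers.json:load_handler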
Re: [Python-Dev] PEP 382 progress: import hooks
At 05:28 PM 8/2/2010 -0700, Brett Cannon wrote: On Fri, Jul 23, 2010 at 09:54, P.J. Eby p...@telecommunity.com wrote: At 11:57 AM 7/23/2010 +0100, Brett Cannon wrote: On Thu, Jul 22, 2010 at 19:19, P.J. Eby p...@telecommunity.com wrote: What does "is not a package" actually mean in that context? The module is a module but not a package. Um... that's not any clearer. Are you saying that a module of the same name takes precedence over a package? Is that the current precedence as well? No, packages take precedence. I meant that something is a module but it is not a package; a package implicitly includes a module, but a module is not automatically a package. That explanation still isn't making things any clearer for me. That is, I don't know how to get from that statement to actual code, even if I were writing a filesystem or zip importer, let alone anything more exotic. zipimport also does it this way as it too does not differentiate a reload from a clean load beyond grabbing the module from sys.modules if it is already there. PEP 302 does not directly state that reloading should not reset the attributes that import must set, simply that a module from sys.modules must be reused. Since zipimport does it this way I wouldn't count on other loaders not setting __path__. Fair enough, though certainly unfortunate. In particular, it means that it's not actually possible to correctly/completely implement PEP 382 on any already-released version of Python, without essentially replacing zipimport. (Unless the spec can be tweaked a bit.) I'm personally not worried about supporting older versions of Python as this is a new feature. Better to design it properly than come up with some hack solution as we will all have to live with this for a long time. Currently, older Pythons are the only versions I *do* support, so I'm very concerned with it. Otherwise, I'd not be asking all these questions. ;-) Personally, I think there are features in the PEP that make things unnecessarily complicated - for example, supporting both __init__.py *and* .pth files in the same directory. If it were either/or, it would be a LOT easier to implement on older Pythons, since it wouldn't matter when you initialized the __path__ in that case. (By the way, there were some other questions I asked about the PEP 382 revisions that you didn't reply to in previous emails (such as the format of the strings to be returned by find_path()); I hope either you or Martin can fill those in for me, and hopefully update the PEP with the things we have talked about in this thread.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Yield-From Implementation Updated for Python 3
At 08:49 AM 8/1/2010 -0400, Kevin Jacobs jac...@bioinformed.com wrote: On Sun, Aug 1, 2010 at 3:54 AM, Greg Ewing greg.ew...@canterbury.ac.nz wrote: I have updated my prototype yield-from implementation to work with Python 3.1.2. My work is primarily on the management and analysis of huge genomics datasets. I use Python generators extensively and intensively to perform efficient computations and transformations on these datasets that avoid the need to materialize them in main memory to the extent possible. I've spent a great deal of effort working around the lack of an efficient yield from construct and would be very excited to see this feature added. Just so you know, you don't need to wait for this to be added to Python in order to have such a construct; it just won't have the extra syntax sugar. See the sample code I posted here using a @From.container decorator, and a yield From() call: http://mail.python.org/pipermail/python-dev/2010-July/102320.html This code effectively reduces your generator nesting depth to a constant, no matter how deeply you nest sub-generator invocations. It's not as efficient as the equivalent C implementation, but if you're actually being affected by nesting overhead now, it will nonetheless provide you with some immediate relief, if you backport it to 2.x code. (It's not very 3.x-ish as it sits, really.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 376 proposed changes for basic plugins support
At 02:03 AM 8/2/2010 +0200, Tarek Ziadé wrote: but then we would be back to the problem mentioned about entry points: installing projects can implicitly add a plugin and activate it, and break existing applications that iterate over entry points without further configuration. So being able to disable plugins from the beginning seems important to me So which are these apps that don't allow configuration, and which are the plugins that break them? Have the issues been reported so that the authors can fix them? ISTM that the issue can only arise in cases where you are installing plugins to a *global* environment, rather than to an environment specific to the application. In the case of setuptools, for example, it's expected that a project will use 'setup_requires' to identify the plugins it wishes to use, apart from any that were intentionally installed globally. (The requested plugins are then added to sys.path only for the duration of the setup script execution.) Other applications have plugin directories where their plugins are to be installed, and still others have explicit configuration to enable named plugins. Even in the worst-case scenario, where an app has no plugin configuration and no private plugin directory, you can still control plugin availability by installing plugins to the directory where the application's main script is located, or by setting PYTHONPATH to point to a directory you've chosen to hold the plugins of your choice. So without specific examples of why this is a problem, it's hard to see why a special Python-specific set of configuration files is needed to resolve it, vs. say, encouraging application authors to use the available alternatives for doing plugin directories, config files, etc. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] proto-pep: plugin proposal (for unittest)
At 03:34 PM 7/30/2010 +0100, Michael Foord wrote: Automatic discoverability, a-la setuptools entry points, is not without its problems though. Tarek outlines some of these in a more recent blog post: FWIW, it's not discovery that's the problem, but configuring *which* plugins you wish to have active. Entry points support access by name, and it's up to the application using them to decide *which* ones to load. The underlying idea is that entry points expose a hook; it's up to the app to decide which ones it should actually import and use. An application can also list the available plugins and ask the user, etc. (For example, setuptools only loads setup() argument entry points for specified arguments, and command entry points only for the commands a user explicitly invokes.) IOW, entry points provide access to plugins, not policy or configuration for *which* plugins you wish to use. This was an intentional decision since applications vary widely in what sort of configuration mechanism they use. In the simplest cases (e.g. single-app environments like Chandler), simply making the plugin available on sys.path (e.g. via a special plugins directory) is configuration enough. In more complex use cases, an app might have to import plugins in order to get more information about them. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] unexpected import behaviour
At 11:50 PM 7/30/2010 +0400, Oleg Broytman wrote: On Fri, Jul 30, 2010 at 07:46:44PM +0100, Daniel Waterworth wrote: can anyone think of a case where someone has been annoyed that, having imported that same module twice via symlinks, they have had problems relating to modules being independent instances? I've had problems with two instances of the same module imported after sys.path manipulations. Never had a problem with reimported scripts. I have. The unittest module used to have this problem, when used as a script. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] proto-pep: plugin proposal (for unittest)
At 04:37 PM 7/30/2010 +0200, Tarek Ziadé wrote: On Fri, Jul 30, 2010 at 4:04 PM, Barry Warsaw ba...@python.org wrote: ..
* Registration - How do third party plugins declare themselves to exist, and be enabled? Part of this seems to me to include interface declarations too. Is installation of the plugin enough to register it? How do end users enable and disable plugins that may be registered on their system? How do plugins describe themselves (provide short and long descriptions, declare options, hook into command line interfaces, etc.)?
* Installation - How are plugins installed on the system? Do they have to appear in a special directory on the file system? Do they need special setup.py magic to write extra files? Do they need to live in a pre-defined namespace?
FWIW We are thinking about adding in distutils2 a system quite similar to the entry points setuptools has, but with extra abilities for the end user:
- activate / deactivate plugins without having to remove the project that added them
- configure globally if plugins are implicitly activated or not -- and maybe allow the distutils2 installer to ask the user when a plugin is detected if he wants it activated or not
- provide a tool to browse them
Note, by the way, that none of these are mutually exclusive to the entry point mechanism; it is simply up to an application developer to decide which of those features he/she wishes to provide. A library that provides common implementations of such features on top of entry points would be a good idea. pkg_resources already supplies one such tool, btw: the find_plugins() API for locating projects in one or more plugin directories that *could* be added to sys.path to provide plugins for an application. It's then up to the application to filter this list further (e.g. via its own configuration). ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
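A sketch of that find_plugins() API in use (the plugin directory path is hypothetical):

    from pkg_resources import working_set, Environment

    # Scan an application-specific plugin directory for candidate
    # distributions; nothing is imported or activated yet.
    plugin_env = Environment(['/path/to/myapp/plugins'])
    distributions, errors = working_set.find_plugins(plugin_env)
    for dist in distributions:
        working_set.add(dist)  # the app decides which ones to activate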
Re: [Python-Dev] Thoughts fresh after EuroPython
At 04:29 PM 7/25/2010 +1000, Nick Coghlan wrote: So, while I can understand Guido's temptation (PEP 380 *is* pretty cool), I'm among those that hope he resists that temptation. Letting these various ideas bake a little longer without syntactic support likely won't hurt either. Well, if somebody wants to clean up my syntax-sugar-free version a little (maybe adding a From.return_(value) staticmethod that raises StopIteration(value)) and throw it in the stdlib, then people can certainly experiment with the feature in 3.2, and get an opportunity to iron out any implementation issues before going to the C-and-sugared version later. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
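The helper suggested above would be a one-liner on the From class posted earlier in the thread; roughly:

    class From:
        # ... trampoline implementation as previously posted ...

        @staticmethod
        def return_(value=None):
            # Sugar-free equivalent of PEP 380's `return value` inside a
            # subgenerator: signal the return value via StopIteration.
            raise StopIteration(value)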
Re: [Python-Dev] PEP 380 - return value question and prototype implementation (was Thoughts fresh after EuroPython)
At 07:08 AM 7/24/2010 -0700, Guido van Rossum wrote: - After seeing Raymond's talk about monocle (search for it on PyPI) I am getting excited again about PEP 380 (yield from, return values from generators). Having read the PEP on the plane back home I didn't see anything wrong with it, so it could just be accepted in its current form. I would like to reiterate (no pun intended) the suggestion of a special syntactic form for the return, such as "yield return x", or "return with x" or something similar, to distinguish it from a normal generator return. I think that when people are getting used to the idea of generators, it's important for them to get the idea that the function's return value isn't really a value, it's an iterator object. Allowing a return value, but then having that value silently disappear, seems like it would delay that learning, so, a special form might help to make it clear that the generator in question is intended for use with a corresponding yield from, and help avoid confusion on this. (I could of course be wrong, and would defer to anyone who sees a better way to explain/teach around this issue. In any event, I'm +1 on the PEP otherwise.) By the way, the PEP's optimized implementation could probably be done just by making generator functions containing yield-from statements return an object of a different type than the standard geniter. Here's a Python implementation sketch, using a helper class and a decorator -- translation to a C version is likely straightforward, as it'll basically be this plus a light sprinkling of syntactic sugar. So, in the pure-Python prototype (without syntax sugaring), usage would look like this:

    @From.container
    def some_generator(...):
        ...
        yield From(other_generator(...))   # equivalent to 'yield from'
        ...

    def other_generator(...):
        ...
        raise StopIteration(value)         # equivalent to 'return value'

We mark some_generator() with @From.container to indicate that it uses 'yield from' internally (which would happen automatically in the C/syntax sugar version). We don't mark other_generator(), though, because it doesn't contain a 'yield from'.
Now, the implementation code (a slightly altered/watered-down version of a trampoline I've used before in 2.x, hopefully altered correctly for Python 3.x syntax/semantics):

    import sys

    class From:
        @classmethod
        def container(cls, func):
            def decorated(*args, **kw):
                # wrap generator in a From() instance
                return cls(func(*args, **kw))
            return decorated

        def __new__(cls, geniter):
            if isinstance(geniter, cls):
                # It's already a 'From' instance, just return it
                return geniter
            self = object.__new__(cls)
            self.stack = [geniter]
            return self

        def __iter__(self):
            return self

        def __next__(self):
            return self._step()

        def send(self, value):
            return self._step(value)

        def throw(self, *exc_info):
            return self._step(None, exc_info)

        def _step(self, value=None, exc_info=()):
            if not self.stack:
                raise RuntimeError("Can't resume completed generator")
            try:
                while self.stack:
                    try:
                        it = self.stack[-1]
                        if exc_info:
                            try:
                                rv = it.throw(*exc_info)
                            finally:
                                exc_info = ()
                        elif value is not None:
                            rv = it.send(value)
                        else:
                            rv = next(it)
                    except:
                        value = None
                        exc_info = sys.exc_info()
                        self.stack.pop()  # this generator is finished
                        if exc_info[0] is StopIteration:
                            # pass return value up the stack
                            value, = exc_info[1].args or (None,)
                            exc_info = ()  # but not the error
                    else:
                        if isinstance(rv, From):
                            self.stack.extend(rv.stack)  # Call subgenerator
                            value, exc_info, rv = None, (), None
                        else:
                            return rv  # it's a value to yield/return
                else:
                    # Stack's empty, so exit w/current return value or error
                    if exc_info:
                        raise exc_info[1]
                    else:
                        return value
            finally:
                exc_info = ()  # don't let this create garbage

        def close(self):
            if self.stack:
                try:
                    # There's probably a cleaner way to do this in Py 3, I just
                    # don't know what it is off the top of my head...
                    raise GeneratorExit
                except GeneratorExit as e:
                    try:
                        self.throw(*sys.exc_info())
                    except
Re: [Python-Dev] PEP 380 - return value question and prototype implementation (was Thoughts fresh after EuroPython)
At 08:21 PM 7/24/2010 -0700, Guido van Rossum wrote: FWIW, the thing that was harder to debug when I tried to write some code involving generators and a trampoline recently, was thinking of a function as a generator without actually putting a yield in it (because a particular version of a coroutine pattern didn't need to block at all). Monocle uses a decorator to flag all coroutines which fixes this up in the right way, which I think is clever, but I'm torn about the need to flag every coroutine with a decorator -- Monocle makes the decorator really short (@_o) because, as Raymond (not Monocle's author but its advocate at EuroPython) said, you'll be using this hundreds of times. Which I find disturbing in itself. I haven't used Monocle, but in all the libraries I've written myself for this sort of thing (Trellis and peak.events), a decorator is only required for a generator that is a root task; everything else is just a normal generator. For example, in Trellis you use @Task.factory to mark a function as spawning an independent task each time it's called, but subgenerator functions called within the task don't need to be marked, and in fact the yield from is just a yield - the trampoline expects all yields of generators to be subgenerator calls. (PEP 380 can't do this of course, since it also doubles as a sort of 'yield *' - i.e., you may care about the yielded values) Note, though, that even in the sketch I just gave, you don't *really* need to decorate every function, just the ones that need to be called from *non*-decorated functions... i.e. root coroutines. Even then, you could *still* skip the decorator and replace:

    an_iter = decorated_root_function()

with:

    an_iter = From(undecorated_root_function())

and not need to decorate *anything*. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 382 progress: import hooks
At 11:57 AM 7/23/2010 +0100, Brett Cannon wrote: On Thu, Jul 22, 2010 at 19:19, P.J. Eby p...@telecommunity.com wrote: What does "is not a package" actually mean in that context? The module is a module but not a package. Um... that's not any clearer. Are you saying that a module of the same name takes precedence over a package? Is that the current precedence as well? Regarding load_module_with_path(), how does its specification differ from simply creating a module in sys.modules, setting its __path__, and then invoking the standard load_module()? (i.e., is this method actually needed, since a correct PEP 302 loader *must* reuse an existing module object in sys.modules) It must reuse the module itself but a proper reload would reset __path__, as leaving it unchanged is not a proper resetting of the module object. So this method is needed in order to force the loader Um, no. Reloading doesn't reset the module contents, not even __path__. Never has, from Python 2.2 through 2.7 -- even in 3.1. At least, not for normal filesystem .py/.pyc files. (I tested with 'os', adding an extra 'foo' attribute, and also setting a __path__; both were unaffected by reload(), in all 7 Python versions. Perhaps you're saying this happens with zipfiles, or packages that already have a __path__, or...?) Am I correct in understanding that, as written, one would have to redefine __import__ to implement this in a library for older Python versions? Or is it implementable as a meta_path importer? Redefine __import__ (unless Martin and I are missing something, but I tried to think of how to implement this using sys.meta_path and couldn't come up with a solution). I'm thinking it *could* be done with a meta_path hook, but only by doubling the search length in the event that the search failed. That seems a bit icky, but replacing the entire import process seems ickier (more code surface to maintain, more bug potential) in the case of supporting older Pythons. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 382 progress: import hooks
At 01:51 PM 7/22/2010 +0100, Martin v. Löwis wrote: At EuroPython, I sat down with Brett and we proposed an approach for how namespace packages can get along with import hooks. I reshuffled the order in which things get done a little bit, and added a section that elaborates on the hooks. Basically, a finder will need to support a find_path method, returning all .pth files, and a loader will need to support a load_module_with_path method, to initialize __path__. Please comment if you think that this needs further changes; I'm not certain I understand it precisely. There seem to be some ambiguities in the spec, e.g.: If fullname is not found, is not a package, or does not have any *.pth files, None must be returned. What does "is not a package" actually mean in that context? What happens if an empty list is returned - does that mean the importer is saying "this is a package", whether it has an __init__.py or not? As for the list of strings returned, is each string the entire contents of the .pth file? Is it to be \n-separated, or is any universal-newlines-compatible string accepted? Is there a particular order in which .pth file contents are to be returned? Regarding load_module_with_path(), how does its specification differ from simply creating a module in sys.modules, setting its __path__, and then invoking the standard load_module()? (i.e., is this method actually needed, since a correct PEP 302 loader *must* reuse an existing module object in sys.modules) I'll hope to start implementing it soon. Am I correct in understanding that, as written, one would have to redefine __import__ to implement this in a library for older Python versions? Or is it implementable as a meta_path importer? Regards, Martin Thanks for your work on this, I was just thinking about pinging to see how it was going. ;-) (I want setuptools 0.7 to be able to supply an add-on module for supporting this PEP in older Pythons, so that its current .pth hacks for implementing namespace packages can be dropped.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
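For reference, the pattern alluded to in the load_module_with_path() question would look roughly like this (a sketch; the function name and arguments are illustrative):

    import sys
    import types

    def load_package_with_path(loader, fullname, path_entries):
        # Pre-create the module with its __path__ set, then let a
        # conforming PEP 302 loader reuse it from sys.modules and
        # finish loading it.
        module = sys.modules.setdefault(fullname, types.ModuleType(fullname))
        module.__path__ = list(path_entries)
        return loader.load_module(fullname)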
Re: [Python-Dev] bytes / unicode
At 03:53 PM 6/27/2010 +1000, Nick Coghlan wrote: We could talk about this even longer, but the most effective way forward is going to be a patch that improves the URL parsing situation. Certainly, it's the only practical solution for the immediate problems in 3.2. I only mentioned that I hate the idea because I'd be more comfortable if it was explicitly declared to be a temporary hack to work around the absence of a string coercion protocol, due to the moratorium on language changes. But, since the moratorium *is* in effect, I'll try to make this my last post on string protocols for a while... and maybe wait until I've looked at the code (str/bytes C implementations) in more detail and can make a more concrete proposal for what the protocol would be and how it would work. (Not to mention closer to the end of the moratorium.) There are a *very small* number of APIs where it is appropriate to be polymorphic This is only true if you focus exclusively on bytes vs. unicode, rather than the general issue that it's currently impractical to pass *any* sort of user-defined string type through code that you don't directly control (stdlib or third-party). The virtues of a separate poly_str type are that: 1. It can be simple and implemented in Python, dispatching to str or bytes as appropriate (probably in the strings module) 2. No chance of impacting the performance of the core interpreter (as builtins are not affected) Note that adding a string coercion protocol isn't going to change core performance for existing cases, since any place where the protocol would be invoked would be a code branch that either throws an error or *already* falls back to some other protocol (e.g. the buffer protocol). 3. Lower impact if it turns out to have been a bad idea How many protocols have been added that turned out to be bad ideas? The only ones that have been removed in 3.x, IIRC, are three-way compare, slice-specific operations, and __coerce__... and I'm going to miss __cmp__. ;-) However, IIUC, the reason these protocols were dropped isn't because they were bad ideas. Rather, they're things that can be implemented in terms of a finer-grained protocol. i.e., if you want __cmp__ or __getslice__ or __coerce__, you can always implement them via a mixin that converts the newer fine-grained protocols into invocations of the older protocol. (As I plan to do for __cmp__ in the handful of places I use it.) At the moment, however, this isn't possible for multi-string operations outside of __add__/__radd__ and comparison -- the coercion rules are hard-wired and can't be overridden by user-defined types. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] bytes / unicode
At 12:42 PM 6/26/2010 +0900, Stephen J. Turnbull wrote: What I'm saying here is that if bytes are the signal of validity, and the stdlib functions preserve validity, then it's better to have the stdlib functions object to unicode data as an argument. Compare the alternative: it returns a unicode object which might get passed around for a while before one of your functions receives it and identifies it as unvalidated data. I still don't follow, since passing in bytes should return bytes. Returning unicode would be an error, in the case of a polymorphic function (per Guido). But you agree that there are better mechanisms for validation (although not available in Python yet), so I don't see this as a potential obstacle to polymorphism now. Nope. I'm just saying that, given two bytestrings to url-join or path join or whatever, a polymorph should hand back a bytestring. This seems pretty uncontroversial. What I want is for the stdlib to create stringlike objects of a type determined by the types of the inputs -- In general this is a hard problem, though. Polymorphism, OK, one-way tainting OK, but in general combining related types is pretty arbitrary, and as in the encoded-bytes case, the result type often varies depending on expectations of callers, not the types of the data. But the caller can enforce those expectations by passing in arguments whose types do what they want in such cases, as long as the string literals used by the function don't get to override the relevant parts of the string protocol(s). The idea that I'm proposing is that the basic string and byte types should defer to user-defined string types for mixed type operations, so that polymorphism of string-manipulation functions is the *default* case, rather than a *special* case. This makes tainting easier to implement, as well as optimizing and other special cases (like my source string w/file and line info, or a string with font/formatting attributes). ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] bytes / unicode
At 12:43 PM 6/27/2010 +1000, Nick Coghlan wrote: While full support for third party strings and byte sequence implementations is an interesting idea, I think it's overkill for the specific problem of making it easier to write str/bytes agnostic functions for tasks like URL parsing. OTOH, to write your partial implementation is almost as complex - it still must take into account joining and formatting, and so by that point, you've just proposed a new protocol for coercion... so why not just make the coercion protocol explicit in the first place, rather than hardwiring a third type's worth of special cases? Remember, bytes and strings already have to detect mixed-type operations. If there was an API for that, then the hardcoded special cases would just be replaced, or supplemented with type slot checks and calls after the special cases. To put it another way, if you already have two types special-casing their interactions with each other, then rather than add a *third* type to that mix, maybe it's time to have a protocol instead, so that the types that care can do the special-casing themselves, and you generalize to N user types. (Btw, those who are saying that the resulting potential for N*N interaction makes the feature unworkable seem to be overlooking metaclasses and custom numeric types -- two Python features that in principle have the exact same problem, when you use them beyond a certain scope. At least with those features, though, you can generally mix your user-defined metaclasses or numeric types with the Python-supplied basic ones and call arbitrary Python functions on them, without as much heartbreak as you'll get with a from-scratch stringlike object.) All that having been said, a new protocol probably falls under the heading of the language moratorium, unless it can be considered new methods on builtins? (But that seems like a stretch even to me.) I just hate the idea that functions taking strings should have to be *rewritten* to be explicitly type-agnostic. It seems *so* un-Pythonic... like if all the bitmasking functions you'd ever written using 32-bit int constants had to be rewritten just because we added longs to the language, and you had to upcast them to be compatible or something. Sounds too much like C or Java or some other non-Python language, where dynamism and polymorphy are the special case, instead of the general rule. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] bytes / unicode
At 04:49 PM 6/25/2010 +0900, Stephen J. Turnbull wrote: P.J. Eby writes: This doesn't have to be in the functions; it can be in the *types*. Mixed-type string operations have to do type checking and upcasting already, but if the protocol were open, you could make an encoded-bytes type that would handle the error checking.

Don't you realize that encoded-bytes is equivalent to use of a very limited profile of ISO 2022 coding extensions? Such as the Emacs/MULE internal encoding, or TRON code? It has been tried. It does not work. I understand how types can do such checking; my point is that the encoded-bytes type doesn't have enough information to do it in the cases where you think it is better than converting to str. There are *no useful operations* that can be done on two encoded-bytes with different encodings unless you know the ultimate target codec.

I do know the ultimate target codec -- that's the point. IOW, I want to be able to do all my operations by passing target-encoded strings to polymorphic functions. Then, the moment something creeps in that won't go to the target codec, I'll be able to track down the hole in the legacy code that's letting bad data creep in.

The only sensible way to define the concatenation of ('ascii', 'English') with ('euc-jp', '日本語') is something like ('ascii', 'English', 'euc-jp', '日本語'), and *not* ('euc-jp', 'English日本語'), because you don't know that the ultimate target codec is 'euc-jp'-compatible. Worse, you need to build all the information about which codecs are mutually compatible into the encoded-bytes type. For example, if the ultimate target is known to be 'shift_jis', it's trivially compatible with 'ascii', 'euc-jp' requires a conversion, but latin-9 you can't have.

The interaction won't be with other encoded bytes, it'll be with other *unicode* strings. Ones coming from other code, and literals embedded in the stdlib. No, the problem is not with the Unicode, it is with the code that allows characters not encodable with the target codec.

And which code that is, precisely, is the thing that may be very difficult to find, unless I can identify it at the first point it enters (and corrupts) my output data. When dealing with a large code base, this may be a nontrivial problem.
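A minimal sketch of the encoded-bytes type being described, assuming the target codec is known up front (EncodedBytes is illustrative, not a real type):

    class EncodedBytes(bytes):
        """Bytes tagged with the ultimate target codec."""

        def __new__(cls, data, encoding):
            self = super().__new__(cls, data)
            self.encoding = encoding
            return self

        def __add__(self, other):
            if isinstance(other, str):
                # The validation step: a str that can't be encoded to
                # the target codec fails here, at the point of entry.
                other = other.encode(self.encoding)
            return EncodedBytes(bytes(self) + bytes(other), self.encoding)

    eb = EncodedBytes('English'.encode('euc-jp'), 'euc-jp')
    eb + '日本語'   # fine: encodable to euc-jp
    eb + '€'        # UnicodeEncodeError: the "hole in the legacy code" found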
Re: [Python-Dev] bytes / unicode
At 01:18 AM 6/26/2010 +0900, Stephen J. Turnbull wrote: It seems to me what is wanted here is something like Perl's taint mechanism, for *both* kinds of strings. Am I missing something?

You could certainly view it as a kind of tainting. The part where the type would be bytes-based is indeed somewhat incidental to the actual use case -- it's just that if you already have the bytes, and all you want to do is tag them (e.g. the WSGI headers case), the extra encoding step seems pointless.

A string coercion protocol (one that would be used by .join(), .format(), __contains__, __mod__, etc.) would allow you to write whatever sort of tainted-string or tainted-bytes implementation you might wish to have. I suppose that tainting user inputs (as in Perl) would be just as useful an application of the same coercion protocol.

Actually, I have another use case for this custom string coercion, which is that I once wrote a string subclass whose purpose was to track the original file and line number of some text. Even though only my code was manipulating the strings, it was very difficult to get the tracking to work correctly without extreme care as to the string methods used. (For example, I had to use string addition rather than %-formatting.)

But with your architecture, it seems to me that you actually don't want polymorphic functions in the stdlib. You want the stdlib functions to be bytes-oriented if and only if they are reliable. (This is what I was saying to Guido elsewhere.)

I'm not sure I follow you. What I want is for the stdlib to create stringlike objects of a type determined by the types of the inputs -- where the logic for deciding this coercion can be controlled by the input objects' types, rather than being in the hands of the stdlib function. And of course, this applies to non-stdlib functions, too -- anything that simply manipulates user-defined string classes should allow the user-defined classes to determine the coercion of the result.

BTW, this was a little unclear to me: [Collisions will] be with other *unicode* strings. Ones coming from other code, and literals embedded in the stdlib. What about the literals in the stdlib? Are you saying they contain invalid code points for your known output encoding? Or are you saying that with a non-polymorphic unicode stdlib, you get lots of false positives when combining with your validated bytes?

No, I mean that the current string coercion rules cause everything to be converted to unicode, thereby discarding the tainting information, so to speak. This applies equally to other tainting use cases, and other uses for custom stringlike objects.
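The problem with today's coercion rules can be shown with a plain str subclass (a demonstration of the failure, not the proposed fix):

    class Tainted(str):
        """Marker subclass: this text came from an untrusted source."""

    t = Tainted("<script>alert(1)</script>")
    print(type(t + "!"))          # <class 'str'> -- the taint is silently lost
    print(type("Hi, %s" % t))     # <class 'str'> -- lost through %-formatting
    print(type(t.title()))        # <class 'str'> -- lost through every str method

Under the proposed coercion protocol, each of these would come back Tainted, because str would defer to the richer type instead of always producing a plain str.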
Re: [Python-Dev] bytes / unicode
At 05:12 PM 6/24/2010 +0900, Stephen J. Turnbull wrote: Guido van Rossum writes: For example: how we can make the suite of functions used for URL processing more polymorphic, so that each developer can choose for herself how URLs need to be treated in her application. While you have come down on the side of polymorphism (as opposed to separate functions), I'm a little nervous about it. Specifically, Phillip Eby expressed a desire for earlier type errors, while polymorphism seems to ensure that you'll need to Look Before You Leap to get early error detection.

This doesn't have to be in the functions; it can be in the *types*. Mixed-type string operations have to do type checking and upcasting already, but if the protocol were open, you could make an encoded-bytes type that would handle the error checking. (Btw, in some earlier emails, Stephen, you implied that this could be fixed with codecs -- but it can't, because the problem isn't with the bytes containing invalid Unicode; it's with the Unicode containing invalid bytes -- i.e., characters that can't be encoded to the ultimate codec target.)
Re: [Python-Dev] bytes / unicode
At 08:34 PM 6/22/2010 -0400, Glyph Lefkowitz wrote: I suspect the practical problem here is that there's no CharacterString ABC

That, and the absence of a string coercion protocol so that mixing your custom string with standard strings will do the right thing for your intended use.
Re: [Python-Dev] bytes / unicode
At 07:41 AM 6/23/2010 +1000, Nick Coghlan wrote: Then my example above could be made polymorphic (for ASCII-compatible encodings) by writing: [x for x in seq if x.endswith(x.coerce(b))] I'm trying to see downsides to this idea, and I'm not really seeing any (well, other than 2.7 being almost out the door and the fact we'd have to grant ourselves an exception to the language moratorium).

Notice, however, that if multi-string operations used a coercion protocol (they currently have to do type checks already for bytes/unicode mixes), then you could make the entire stdlib polymorphic by default, even for kinds of strings that don't exist yet. If you invent a new numeric type, generally speaking you can pass it to existing stdlib functions that take numbers, as long as it implements the appropriate protocols. Why not do the same for strings?
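Since user code can't add a coerce() method to str and bytes, the same idea can be sketched as a free function (names invented for illustration):

    def coerce_like(template, value, encoding='ascii'):
        # Convert value to the type of template, assuming an
        # ASCII-compatible encoding, per the suggestion above.
        if isinstance(template, str) and isinstance(value, bytes):
            return value.decode(encoding)
        if isinstance(template, bytes) and isinstance(value, str):
            return value.encode(encoding)
        return value

    seq = ['spam.txt', 'eggs.py']
    b = b'.txt'
    [x for x in seq if x.endswith(coerce_like(x, b))]   # ['spam.txt']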
Re: [Python-Dev] email package status in 3.X
At 10:20 PM 6/21/2010 +1000, Nick Coghlan wrote: For the idea of avoiding excess copying of bytes through multiple encoding/decoding calls... isn't that meant to be handled at an architectural level (i.e. decode once on the way in, encode once on the way out)? Optimising the single-byte codec case by minimising data copying (possibly through creative use of PEP 3118) may be something that we want to look at eventually, but it strikes me as something of a premature optimisation at this point in time (i.e. the old adage: first get it working, then get it working fast).

The issue is, I'd like to have an idempotent incantation that I can use to make the inputs and outputs of stdlib functions behave in a type-safe manner with respect to bytes, in cases where bytes are really what I want operated on. Note too that this is an argument for symmetry in wrapping the inputs and outputs, so that the code doesn't have to know what it's dealing with! After all, right now, if a stdlib function might return bytes or unicode depending on runtime conditions, I can't even hardcode an .encode() call -- it would fail if the return type is bytes. This goes against the "tell, don't ask" pattern, and against the Pythonically idempotent approach: Python builtins normally hand you back the same thing if it's already what you want -- int(someInt) -> someInt, iter(someIter) -> someIter, etc.

Since this incantation may need to be used often, and in places that are not known to me in advance, I would like it not to impose new overhead in unexpected places (i.e., the usual argument brought against making changes to the 'list' type that would change certain operations from O(1) to O(log something)). It's more about predictability, and having One *Obvious* Way To Do It, as opposed to several ways that you need to think carefully about, restructuring your entire architecture if necessary. One obvious way means I can focus on the mechanical effort of porting *first*, without having to think. So the performance issue isn't really about performance *per se*, so much as about the mental UI of the language. You could just as easily lie and tell me that your bstr implementation is O(1), and I would probably be happy and never notice, because the issue was never really about performance as such, but about having to *think* about it (i.e., breaking flow).

Really, the entire issue can presumably be dealt with by some series of incantations -- it's just code, after all. But having to sit and think about *every* situation where I'm dealing with bytes/unicode distinctions seems like torture, compared to being able to say: okay, when dealing with this sort of API and this sort of data, this is the One Obvious Way to do the conversions. It's One Obvious Way that I want, but some people seem to be arguing that the One Obvious Way is to Think Carefully About It Every Time -- and that seems to violate the Obvious part, IMO. ;-)
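The incantation being asked for would look something like this -- a sketch only; as_bytes and its utf-8 default are assumptions of this example, not stdlib behavior:

    def as_bytes(s, encoding='utf-8'):
        # Idempotent, like int() on an int: bytes pass straight
        # through, str gets encoded exactly once.
        if isinstance(s, bytes):
            return s
        return s.encode(encoding)

    as_bytes(b'/tmp/x')    # b'/tmp/x' -- unchanged, like int(someInt)
    as_bytes('/tmp/x')     # b'/tmp/x' -- encoded on the way through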
Re: [Python-Dev] bytes / unicode
At 10:51 PM 6/21/2010 +1000, Nick Coghlan wrote: It may be that there are places where we need to rewrite standard library algorithms to be bytes/str neutral (e.g. by using length-one slices instead of indexing). It may be that there are more APIs that need to grow encoding keyword arguments that they then pass on to the functions they call, or use to convert str arguments to bytes (or vice-versa). But without people trying to port affected libraries and reporting bugs when they find issues, the situation isn't going to improve. Now, if these bugs are already being reported against 3.1 and just aren't getting fixed, that's a completely different story...

The overall impression, though, is that this isn't really a step forward. Now bytes are the special case instead of unicode, but that special case isn't actually handled any better by the stdlib -- in fact, it's arguably handled worse. And the burden of addressing this seems to have been shifted from the people who made the change to the people who are going to use it. But those people are not necessarily in a position to tell you anything more than "give me something that works with bytes".

What I can tell you is that before, since string constants in the stdlib were ASCII bytes that transparently promoted to unicode, stdlib behavior was *predictable* in the presence of special cases: you got back either bytes or unicode, but either way, you could idempotently upgrade the result to unicode, or just pass it on. APIs were "str safe, unicode aware". If you passed in bytes, you weren't going to get unicode without warning, and if you passed in unicode, it'd work and you'd get unicode back. Now the APIs are neither safe nor aware -- if you pass bytes in, you get unpredictable results back.

Ironically, it almost *would* have been better if bytes simply didn't work as strings at all, *ever*, but could be wrapped with a bstr() to *treat* them as text. You could still have restrictions on combining them, as long as it was a restriction on the unicode you mixed with them -- that is, you could combine a bstr and a str only if the *str* was restricted to ASCII.

If we had the Python 3 design discussions to do over again, I think I would now have stuck with the position of not letting bytes be string-compatible at all, and instead proposed an explicit bstr() wrapper/adapter for using them as strings, one that would (in that case) force coercion in the direction of bytes rather than strings. (And bstr need not have been a builtin -- it could have been something you import, to help discourage casual usage.)

Might this approach lead to some people doing things wrong when porting? Sure. But there'd be little reason to use it in new code that didn't have a real need for bytestring manipulation. It might've been a better balance between practicality and purity, in that it keeps the language pure while offering a practical way to deal with things in bytes if you really need to. And bytes wouldn't silently succeed *some* of the time, leading to a trap. An easy inconsistency is worse than a bit of uniform chicken-waving.

Is it too late to make that tradeoff? Probably. Certainly it's not practical to *implement* outside the language core, and removing string methods now would fux0r anybody whose currently-ported code relies on bytes objects having string-like methods.
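For concreteness, the proposed bstr might have looked roughly like this -- a hypothetical sketch, in which coercion goes toward bytes and only ASCII str may mix in:

    class bstr(bytes):
        """Explicitly treat bytes as text, coercing mixes toward bytes."""

        def __add__(self, other):
            if isinstance(other, str):
                other = other.encode('ascii')   # non-ASCII str fails loudly
            return bstr(bytes(self) + other)

        def __radd__(self, other):
            if isinstance(other, str):
                other = other.encode('ascii')
            return bstr(other + bytes(self))

    bstr(b'caf\xc3\xa9') + ' menu'    # still a bstr, no decode step
    'menu: ' + bstr(b'caf\xc3\xa9')   # coerces toward bytes, not str

Note that + can be intercepted from outside the core this way (via __radd__), but as the exchange below shows, __contains__ and %-formatting cannot -- which is why a pure add-on version was never practical.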
Re: [Python-Dev] bytes / unicode
At 01:08 AM 6/22/2010 +0900, Stephen J. Turnbull wrote: But if you need that everywhere, what's so hard about

    def urljoin_wrapper(base, subdir):
        return urljoin(str(base, 'latin-1'), subdir).encode('latin-1')

Now, note how that pattern fails as soon as you want to use non-ISO-8859-1 languages for subdir names. Bear in mind that the use cases I'm talking about here are WSGI stacks with components written by multiple authors -- each of whom may have to define that function, and still get it right.

Sure, there are some things that could go in wsgiref in the stdlib. However, as of this moment, there's only a very uneasy rough consensus on Web-SIG as to how the heck WSGI should actually *work* on Python 3, because of issues like these. That makes it tough to say what should happen in the stdlib -- e.g., which things should be classed as stdlib bugs, which should be worked around with wrappers or new functions, etc.
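Spelled out, the failure mode referred to above (runnable against Python 3's urllib.parse; the example URL is illustrative):

    from urllib.parse import urljoin

    def urljoin_wrapper(base, subdir):
        # The pattern quoted above: smuggle bytes through as latin-1
        return urljoin(str(base, 'latin-1'), subdir).encode('latin-1')

    urljoin_wrapper(b'http://example.com/a/', 'b')      # b'http://example.com/a/b'
    urljoin_wrapper(b'http://example.com/a/', '日本語')  # UnicodeEncodeError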
Re: [Python-Dev] email package status in 3.X
At 11:43 AM 6/21/2010 -0400, Barry Warsaw wrote: On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote: Something that may make sense to ease the porting process is for some of these on-the-boundary, I/O-related string manipulation functions (such as os.path.join) to grow encoding keyword-only arguments. The recommended approach would be to provide all strings, but bytes could also be accepted if an encoding was specified. (If you want to mix encodings -- tough, do the decoding yourself.) This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz for it. Would it make sense to have encoding-carrying bytes and str types?

It's not a stupid idea, and could potentially work. It also might have a better chance of actually being *implemented* in 3.x than my idea.

Basically, I'm thinking of types (maybe even the current ones) that carry around a .encoding attribute so that they can be automatically encoded and decoded where necessary. This at least would simplify APIs that need to do the conversion. I'm not really sure how much use the encoding is on a unicode object -- what would it actually mean?

Hm. I suppose it would effectively mean "this string can be represented in this encoding" -- which is useful, in that you could fail operations when combining with bytes of a different encoding. Hm... no, in that case you should just encode the string to the bytes' encoding, and let that throw an error if it fails. So, really, there's no reason for a string to know its encoding. All you need is for the bytes type to have an encoding attribute, and when doing mixed-type operations between bytes and strings, to coerce to *bytes of the same encoding*.

However, if .encoding is None, then coercion would follow the same rules as now -- i.e., convert the bytes to unicode, assuming an ascii encoding. (This would be different from setting an encoding of 'ascii', because in that case it means you want cross-type operations to result in ascii bytes rather than a unicode string, and to fail if the unicode part can't be encoded appropriately. The None setting is effectively a nod to compatibility with prior 3.x versions, since I assume we can't just throw out the old coercion behavior.)

Then a few more changes to the bytes type would round out the implementation:

* Allow .decode() to omit the encoding, unless .encoding is None
* Add back the missing string methods (e.g. .encode(), since you can transparently upgrade to a string)
* A smart __str__, as shown in your proposal

Would it be feasible? Dunno. Probably, although it might mean adding back in special cases that were previously taken out, and a few new ones.

Would it help ease the bytes/str confusion? Dunno. I'm not sure what confusion you mean -- Web-SIG and I, at least, are not confused about the difference between bytes and str, or we wouldn't be having an issue. ;-) Or maybe you mean the stdlib's API confusion? In which case, yes, definitely!

But I think it would help make APIs easier to design and use, because it would cut down on the encoding-keyword function signature infection. Not only that, but I believe it would also retroactively make the stdlib's implementation of those APIs correct again, and give us One Obvious Way to work with bytes of a known encoding, while constraining any unicode that gets combined with those bytes to be validly encodable.
It also gives you an idempotent constructor for bytes of a specified encoding, one that can take either a bytes of unspecified encoding, a bytes of the correct encoding, or a string that can be encoded as such. In short, +1. (I wish it were possible to go back and make bytes non-strings, with only this ebytes or bstr or whatever type carrying the string methods, but I'm pretty sure that ship has already sailed.)
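Pulling the pieces above together into a sketch -- illustrative only, since ebytes is a proposal and not an existing type:

    class ebytes(bytes):
        """Bytes that carry their encoding (None = legacy coercion rules)."""

        def __new__(cls, data, encoding=None):
            if isinstance(data, str):
                data = data.encode(encoding or 'ascii')  # must be encodable
            self = super().__new__(cls, data)
            self.encoding = encoding
            return self

        def __add__(self, other):
            if isinstance(other, str):
                # Coerce to *bytes of the same encoding*; unencodable
                # unicode fails here instead of corrupting the output.
                other = other.encode(self.encoding or 'ascii')
            return ebytes(bytes(self) + bytes(other), self.encoding)

        def decode(self, encoding=None):
            # .decode() may omit the encoding when the object knows its own
            return super().decode(encoding or self.encoding or 'ascii')

    eb = ebytes('メール', 'euc-jp')
    (eb + ': subject').decode()   # 'メール: subject', no codec name repeated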
Re: [Python-Dev] email package status in 3.X
At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote: What do you think of making the encoding attribute a mandatory part of creating an ebytes object? (ex: ``eb = ebytes(b, 'euc-jp')``)

As long as the coercion rules force str+ebytes (or str % ebytes, ebytes % str, etc.) to result in another ebytes (and to fail if the str can't be encoded in the ebytes' encoding), I'm personally fine with it, although I really like the idea of tacking the encoding onto bytes objects in the first place.

OTOH, one potential problem with having the encoding on the bytes object rather than on a separate ebytes type is that you then can't easily take bytes from a socket and declare what encoding they are, without interfering with the socket API (or whatever other place you get the bytes from). So, on balance, making ebytes a separate type (perhaps one that's just a pointer to the bytes and a pointer to the encoding) would indeed make more sense. Its having different coercion rules for interacting with strings would make more sense too, in that case.

(The ideal, of course, would still be to not let bytes objects be stringlike at all, with only ebytes acting string-like. That way, you'd be forced to be explicit about your encoding when working with bytes -- but all you'd need to do is make an ebytes call.)
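The separate-type variant might be as simple as the following sketch (names invented; the byte string shown is just an example of data received off the wire):

    class EBytes:
        """A pointer to existing bytes plus a pointer to their encoding."""
        __slots__ = ('raw', 'encoding')

        def __init__(self, raw, encoding):
            self.raw = raw            # the original bytes, untouched
            self.encoding = encoding

        def __str__(self):
            return self.raw.decode(self.encoding)

    # Bytes from any existing API (e.g. a socket) get tagged after the fact:
    data = b'\xc6\xfc\xcb\xdc\xb8\xec'
    str(EBytes(data, 'euc-jp'))       # '日本語'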
Re: [Python-Dev] bytes / unicode
At 05:49 PM 6/21/2010 +0100, Michael Foord wrote: Why is your proposed bstr wrapper not practical to implement outside the core and use in your own libraries and frameworks?

__contains__ doesn't have a converse operation, so you can't code a type that works around this (Python 3.1 shown):

    >>> from os.path import join
    >>> join(b'x', 'y')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "c:\Python31\lib\ntpath.py", line 161, in join
        if b[:1] in seps:
    TypeError: Type str doesn't support the buffer API
    >>> join('y', b'x')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "c:\Python31\lib\ntpath.py", line 161, in join
        if b[:1] in seps:
    TypeError: 'in <string>' requires string as left operand, not bytes

IOW, only one of these two cases can be worked around by using a bstr (or ebytes) that doesn't have support from the core string type. I'm not sure if the in operator is the only case where implementing such a type would fail, but it's the most obvious one. String formatting, of both the % and .format() varieties, is another. (__rmod__ doesn't help if your bytes object is one of several data items in a tuple or dict -- the common case for %-formatting.)
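To make the asymmetry explicit: the in operator consults only the *container's* __contains__, so a wrapper type can fix the first case but never the second (demonstration code, Python 3):

    class BStr(bytes):
        def __contains__(self, item):
            # Works when BStr is the container...
            if isinstance(item, str):
                item = item.encode('ascii')
            return bytes.__contains__(self, item)

    print('/' in BStr(b'a/b'))   # True: our hook runs
    BStr(b'/') in 'a/b'          # TypeError: str.__contains__ is in charge,
                                 # and there is no __rcontains__ to fall back to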
Re: [Python-Dev] bytes / unicode
At 12:56 PM 6/21/2010 -0400, Toshio Kuratomi wrote: One comment here -- you can also have URIs that aren't decodable into their true textual meaning using a single encoding. Apache will happily serve out URIs that have utf-8, shift-jis, and euc-jp components inside their path, but the textual representation that was intended will be garbled (or be represented by escaped byte sequences). For that matter, Apache will serve requests that have no true textual representation at all, since it works at the byte level rather than the character level. So a complete solution really should allow the programmer to pass in URIs as bytes when the programmer knows that they need it.

Perhaps ebytes(somebytes, 'garbage'), which would be like ascii, but where combining with non-garbage would result in another 'garbage' ebytes?