Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On Aug 29, 2014, at 7:44 PM, Alex Gaynor <alex.gay...@gmail.com> wrote:

> Disabling verification entirely externally to the program, through a CLI flag or environment variable.

I'm pretty down on this idea. The problem is that it's a pretty blunt instrument to swing, and it's almost impossible to imagine it not hitting things it shouldn't; it's far too likely to be used in applications that make two sets of outbound connections: 1) to some internal service on which you want to disable verification, and 2) to some external service which needs strong validation. A global flag causes the latter to fail silently when subjected to a MITM attack, and that's exactly what we're trying to avoid.

It also makes things much harder for library authors: I write an API client for some API, and make TLS connections to it. I want those to be verified by default. I can't even rely on the httplib defaults, because someone might disable them from the outside. I would strongly recommend against such a mechanism.

For what it's worth, Twisted simply unconditionally started verifying certificates in 14.0, with no disable switch, and (to my knowledge) literally no users have complained. Twisted has a very, very strict backwards compatibility policy. For example, I once refused to accept the deletion of a class that raised an exception upon construction, on the grounds that someone might have been inadvertently importing that class, and they shouldn't see an exception until they've seen a deprecation for one release. Despite that, we classified failing to verify certificates as a security bug, and fixed it with no deprecation period. When users type the 's' after the 'p' and before the ':' in a URL, they implicitly expect browser-like certificate verification.
The lack of complaints is despite the fact that 14.0 has been out for several months now, and, thanks to the aforementioned strict policy, users tend to upgrade fairly often (since they know they can almost always do so without fear of application-breaking consequences). According to PyPI metadata, 14.0.0 has had 273,283 downloads so far.

Furthermore, "disable verification" is a nonsensical thing to do with TLS; "select a trust root" is a valid configuration option, and OpenSSL already provides it via the SSL_CERT_DIR environment variable, so there's no need for Python to provide anything beyond that.

-glyph

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
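To make the "select a trust root, don't disable verification" distinction concrete, here is a sketch using the stdlib ssl module as it shipped after PEP 476 (the internal-CA file path is a hypothetical example):

```python
import ssl

# PEP 476 behavior: a default context verifies certificates and hostnames.
ctx = ssl.create_default_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED
assert ctx.check_hostname

# "Select a trust root" is the legitimate knob. OpenSSL already honors
# SSL_CERT_DIR / SSL_CERT_FILE; the env variable names are visible here:
print(ssl.get_default_verify_paths())

# To trust a specific internal CA instead of the system set, you would
# load it explicitly rather than disabling verification, e.g.:
#     ctx.load_verify_locations(cafile="/etc/my-internal-ca/ca.pem")
```

The point is that every option here changes *which* certificates are trusted, never *whether* the peer is authenticated at all.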
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On Sep 2, 2014, at 4:01 PM, Nick Coghlan <ncogh...@gmail.com> wrote:

> On 3 Sep 2014 08:18, Alex Gaynor <alex.gay...@gmail.com> wrote:
>> Antoine Pitrou <solipsis at pitrou.net> writes:
>>> And how many people are using Twisted as an HTTPS client? (compared to e.g. Python's httplib, and all the third-party libraries building on it?)
>> I don't think anyone could give an honest estimate of these counts, however there's two factors to bear in mind: a) It's extremely strongly recommended to use requests to make any HTTP requests, precisely because httplib is negligent in certificate and hostname checking by default, b) We're talking about Python 3, which has fewer users than Python 2.
> Creating *new* incompatibilities between Python 2 & Python 3 is a major point of concern. One key focus of 3.5 is *reducing* barriers to migration, and this PEP would be raising a new one.

No. Providing the security that the user originally asked for is not a backwards-incompatible change. It is a bug fix. And believe me: I care a _LOT_ about reducing barriers to migration. This would not be on my list of the top 1000 things that make migration difficult.

> It's a change worth making, but we have time to ensure there are easy ways to do things like skipping cert validation, or tolerate expired certificates.

The API already supports both of these things. What I believe you're implicitly saying is that there needs to be a way to do this without editing code, and... no, there really doesn't. Not to mention the fact that you could already craft a horrific monkeypatch to allow operators to cause the ssl module to malfunction by 'pip install'ing a separate package, which is about as supported as this should be.

-glyph
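For what it's worth, the shape of such a "horrific monkeypatch" is a two-liner, because the stdlib's private context-factory hook (the one PEP 476 itself ended up documenting as the opt-out) can simply be reassigned. This is a sketch of why a code-level escape hatch already exists, not a recommendation:

```python
import ssl

# The monkeypatch: swap the default HTTPS context factory for the
# unverified one, so stdlib clients silently stop verifying peers.
# Do not ship this; it exists to show how blunt the instrument is.
ssl._create_default_https_context = ssl._create_unverified_context

ctx = ssl._create_default_https_context()
assert ctx.verify_mode == ssl.CERT_NONE   # verification is now off globally
assert not ctx.check_hostname
```

An operator who genuinely needs this can 'pip install' a package that does exactly the above at import time, which is about the right level of support for it.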
Re: [Python-Dev] PEP 476: Enabling certificate validation by default!
On Sep 2, 2014, at 4:28 PM, Nick Coghlan <ncogh...@gmail.com> wrote:

> On 3 Sep 2014 09:08, David Reid <dr...@dreid.org> wrote:
>> Nick Coghlan <ncoghlan at gmail.com> writes:
>>> Creating *new* incompatibilities between Python 2 & Python 3 is a major point of concern.
>> Clearly this change should be backported to Python 2.
> Proposing to break backwards compatibility in a maintenance release (...)

As we keep saying, this is not a break in backwards compatibility; it's a bug fix. Yes, systems might break, but that breakage represents an increase in security which may well be operationally important. Not everyone with a working application has the relevant understanding and expertise to know that Python's HTTP client is exposing them to surveillance. These applications should break. That is the very nature of the fix. It is not a compatibility break that the system starts correctly rejecting invalid connections.

By way of analogy, here's another kind of breach in security: an arbitrary remote code execution vulnerability in XML-RPC. I think we all agree that any 0-day RCE vulnerabilities in Python really ought to be fixed, and that fixes for them could legitimately be included without worrying about backwards compatibility breaks. (At least... gosh, I hope so.)

Perhaps this arbitrary remote execution looks harmless: the use of an eval() instead of an int() someplace. Perhaps someone discovered that they can do "3 + 4" in their XML-RPC requests and the server does the computation for them. Great! They start relying on this in their applications to use symbolic values in their requests instead of having explicit enumerations. This can save you quite a bit of code! When the RCE is fixed, this application will break, and that's fine. In fact, that's the whole point of issuing the fix: that people will no longer be able to make arbitrary computation requests of your server any more.
If that server's maintainer has the relevant context and actually wants the XML-RPC endpoint to enable arbitrary RCE, they can easily modify their application to start doing eval() on the data that they received, just as someone can easily modify their application to intentionally disable all connection security. (Let's stop calling it "certificate verification", because that sounds like some kind of clerical detail: if you disable certificate verification, TLS connections are unauthenticated and unidentified, and therefore insecure.)

For what it's worth, on the equivalent Twisted change, I originally had just these concerns, but my mind was changed when I considered what exactly the user-interface ramifications were for people typing that 's' for 'secure' in URLs. I was convinced, and we made the change, and there have been no ill effects that I'm aware of as a result. In fact, there has been a renewed interest in Twisted for HTTP client work, because we finally made security work more or less like it's supposed to, and the standard library is so broken.

I care about the health of the broader Python community, so I will passionately argue that this change should be made. But for me personally, it's a lot easier to justify that everyone should use Twisted (at least since 14+), because transport security in the stdlib is such a wreck, and even if it gets fixed it's going to have easy options to turn it off unilaterally, so your application can never really be sure it's getting transport security when it's requesting transport security.

-glyph
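To make the analogy concrete, here is a sketch of the hypothetical XML-RPC parameter handler before and after the fix (the function names and the "3 + 4" client habit are invented for illustration):

```python
def parse_count_unsafe(value):
    # The "RCE feature": eval() where int() belongs. "3 + 4" happens to
    # work, so clients start depending on server-side computation.
    return eval(value)

def parse_count_fixed(value):
    # The security fix. Clients that sent "3 + 4" now break, which is
    # precisely the point of issuing the fix.
    return int(value)

assert parse_count_unsafe("3 + 4") == 7
assert parse_count_fixed("7") == 7
try:
    parse_count_fixed("3 + 4")
except ValueError:
    pass  # the symbolic-expression "feature" is gone, as intended
```

A maintainer who truly wants the old behavior can reintroduce eval() in their own code, exactly as a maintainer who wants insecure TLS can configure it explicitly.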
[Python-Dev] Language Summit Follow-Up
At the language summit, Alex and I volunteered to put together some recommendations on what changes could be made to Python (the language) in order to facilitate a smoother transition from Python 2 to Python 3. One of the things that motivated this was the (surprising, to us) consideration that features like ensurepip might be added to future versions of the 2.7 installers from python.org.

The specific motivations for writing this are:

- Library maintainers have a rapidly expanding matrix that requires an increasing number of branches to satisfy.
- People with large corporate codebases absolutely cannot port all at once.
- If you don't have perfect test coverage, then you can't make any progress.

So these changes are intended to make porting from Python 2 to Python 3 more guided and incremental. We believe that these attributes are necessary.

We would like to stress that we don't believe anything on this list is as important as the continuing efforts that everyone in the broader ecosystem is making. If you just want to ease the transition by working on anything at all, the best use of your time right now is porting https://warehouse.python.org/project/MySQL-python/ to Python 3. :)

Nevertheless, there are some things that the language and CPython could do. Unfortunately, we had to reject any proposal that involved new __future__ imports, since unknown __future__ imports are un-catchable SyntaxErrors.

Here are some ideas for Python 2.7+:

1. Add ensurepip to the installers. Having pip reliably available increases the availability of libraries that help with porting, and will generally strengthen the broader ecosystem in the (increasingly long) transition period.

2. Add some warnings about Python 3 compatibility. It should at least be possible to get a warning for every:
   - implicit string coercion,
   - old-style class,
   - old-style division,
   - print statement,
   - old-style exception syntax,
   - use of buffer(),
   - bytes(memoryview(b'abc')),
   - import from an old stdlib location (see point 4),
   - long integer literal,
   - use of a variable beyond the lifetime of an 'except Exception as e' block or a list comprehension.

3. Backport 'yield from' to allow people to use Tulip and Tulip-compatible code, and to facilitate the development of Tulip-friendly libraries and a Tulip ecosystem. A robust Tulip ecosystem requires the participation of people who are not yet using Python 3.

4. Add aliases for the renamed modules in the stdlib. This will allow people to just write Python 3 in a lot more circumstances.

5. (re-)Enable warnings by default, including enabling -3 warnings. Right now all warnings are silent by default, which greatly reduces discoverability of future compatibility issues. I hope it's not controversial to say that most new Python code is still being written against Python 2.7 today; if people are writing that code in such a way that it's not 3-friendly, it should be a more immediately noticeable issue.

6. Get rid of 2to3. Particularly, of any discussion of using 2to3 in the documentation. More than one very experienced, well-known Python developer in this discussion has told me that they thought 2to3 was the blessed way to port their code, and it's no surprise that they think so, given that the first technique https://docs.python.org/3/howto/pyporting.html mentions is still 2to3. We should replace 2to3 with something like https://github.com/mitsuhiko/python-modernize. 2to3 breaks your code on Python 2, and doesn't necessarily get it running on Python 3. A more conservative approach that reduced the amount of work to get your code 2/3 compatible, but was careful to leave everything working, would be a lot more effective.

7. Add a new 'bytes' type that actually behaves like the Python 3 bytes type (bytes(5)).

We have rejected any changes for Python 3.5, simply because of the extremely long time it would take to get those features into users' hands.
Any changes for Python 3 that we're proposing would need to get into a 3.4.x release, so that, for example, they can make their way into Ubuntu 14.04 LTS. Here are some ideas for Python 3.4.x:

1. Usage of Python 2 style syntax (for example, a print statement) or stdlib module names (for example, 'import urllib2') should result in a specific, informative warning, not a generic SyntaxError/ImportError. This will really help new users.

2. Add 'unicode' back as an alias for 'str'. Just today I was writing some documentation where I had to resort to some awkward encoding tricks just to get a bytes object out, without explaining the whole 2/3 dichotomy in some unrelated prose.

We'd like to thank all the individuals who gave input and feedback in creating this list.

-glyph & Alex Gaynor
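As a sketch of what these aliases would smooth over, here is the kind of shim that 2/3-straddling code writes by hand today (the alias names mirror the proposals above; nothing here is provided by the stdlib):

```python
import sys

if sys.version_info[0] >= 3:
    unicode = str   # what "add 'unicode' back as an alias for 'str'" would provide
    long = int      # the same idea for the removed long type

assert unicode("abc") == "abc"
assert long(5) == 5

# And the constructor mismatch a backported Python-3-style bytes type
# would address: on Python 3, bytes(5) is a zero-filled buffer,
assert bytes(5) == b"\x00" * 5
# while on Python 2, bytes is an alias for str, so bytes(5) == '5'.
```

Every library currently carries some variant of this boilerplate; building the aliases in would let most of it be deleted.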
Re: [Python-Dev] PEP 418 is too divisive and confusing and should be postponed
On Apr 7, 2012, at 3:40 AM, Steven D'Aprano wrote:

> In any case, NTP is not the only thing that adjusts the clock, e.g. the operating system will adjust the time for daylight savings.

Daylight savings time is not a clock adjustment, at least not in the sense in which this thread has mostly been using the word "clock". It doesn't affect the seconds-from-epoch measurement; it affects the way in which the clock is formatted to the user.

-glyph
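A quick illustration of that distinction: the epoch-seconds reading is unaffected by DST, and DST only appears on the formatting side.

```python
import time

t = time.time()            # seconds since the epoch: DST-independent
utc = time.gmtime(t)       # the same instant, rendered without any DST
local = time.localtime(t)  # DST shows up only here, as formatting
print(utc.tm_hour, local.tm_hour, local.tm_isdst)

# The underlying measurement is fixed: second 0 of the epoch is
# 1970-01-01T00:00:00 UTC regardless of any DST rule.
assert time.gmtime(0).tm_year == 1970
```

When DST begins, time.time() keeps ticking smoothly; only the localtime rendering jumps by an hour.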
Re: [Python-Dev] this is why we shouldn't call it a monotonic clock (was: PEP 418 is too divisive and confusing and should be postponed)
On Apr 5, 2012, at 8:07 PM, Zooko Wilcox-O'Hearn wrote:

> On Thu, Apr 5, 2012 at 7:14 PM, Greg Ewing <greg.ew...@canterbury.ac.nz> wrote:
>> This is the strict mathematical meaning of the word monotonic, but the way it's used in relation to OS clocks, it seems to mean rather more than that.
> Yep. As far as I can tell, nobody has a use for an unsteady, monotonic clock. There seem to be two groups of people:
> 1. Those who think that "monotonic clock" means a clock that never goes backwards. These people are in the majority. After all, that's what the word "monotonic" means. However, a clock which guarantees *only* this is useless.

While this is a popular view on this list and in this discussion, it is also a view that seems to contradict quite a lot that has been written on the subject, and seems contrary to the usual jargon when referring to clocks.

> 2. Those who think that "monotonic clock" means a clock that never jumps, and that runs at a rate approximating the rate of real time. This is a very useful kind of clock to have! It is what C++ now calls a "steady clock". It is what all the major operating systems provide.

All clocks run at a rate approximating the rate of real time. That is very close to the definition of the word "clock" in this context. All clocks have flaws in that approximation, and really those flaws are the whole point of access to distinct clock APIs; different applications can cope with different flaws. There seems to be a persistent desire in this discussion to specify and define these flaws out of existence, where this API really should instead be embracing and classifying the flaws. (Victor is doing a truly amazing job with the PEP in that regard; it's already the first web search hit on every search engine I've tried for more than half of these terms.)
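For concreteness, here is how the "embrace and classify the flaws" approach looks in the API PEP 418 proposed (a sketch assuming the PEP's time.monotonic() and the per-clock metadata of time.get_clock_info(), as they later shipped):

```python
import time

# Mathematical monotonicity: successive readings never go backwards...
a = time.monotonic()
b = time.monotonic()
assert b >= a

# ...but the other, more useful properties are per-platform flaws, and
# the API reports them rather than defining them out of existence:
info = time.get_clock_info("monotonic")
print(info.monotonic, info.adjustable, info.resolution)
```

Nothing in that metadata can promise steadiness across a suspend; it can only tell you which flaws this platform's clock is known to have.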
"Steadiness", in the C++ sense, only applies to most OS clocks that are given the label of "monotonic" during the run of a single program on a single computer, while that computer is running at some close approximation of full power. As soon as you close your laptop lid, the property of steadiness with respect to real local time goes away; the clock stops ticking forward, and only resumes when the lid is opened again.

The thing I'd like to draw attention to here is that when you get one of these clocks, you *do not* get a parallel facility that allows you to identify whether a suspend has happened (or, for that matter, when the wall clock has stepped). Or at least, nobody's proposed one for Python. I proposed one for Twisted, http://twistedmatrix.com/trac/ticket/2424#comment:26, but you need an event loop for that, because you need to be able to register interest in that event.

I believe that the fact that these clocks are only semi-steady, or only steady with respect to certain kinds of time, is why the term "monotonic clock" remains so popular, despite the fact that mathematical monotonicity is not actually their most useful property. While these OS-provided clocks have other useful properties, they only have those properties under specific conditions which you cannot necessarily detect and you definitely cannot enforce. But they all remain monotonic in the mathematical sense (modulo hardware and OS bugs), so it is the term "monotonic" which comes to label all their other, more useful, but less reliable properties.

> The people in class 1 are more correct, technically, and far more numerous, but the concept from 1 is a useless concept that should be forgotten.

Technically correct; the best kind of correct! The people in class 1 are only "more correct" if you accept that mis-applying jargon from one field (mathematics) to replace generally-accepted terminology in another field (software clocks) is the right thing to do.
I think it's better to learn the local jargon and try to apply it consistently. If you search around the web for the phrase "monotonic clock", it's applied in a sense closest to the one you mean on thousands and thousands of web pages. "Steady clock" generally applies with reference to C++, and even then is often found in phrases like "is_steady indicates whether this clock is a monotonic clock". Software developers mis-apply mathematical terms like "isomorphic", "orthogonal", "incidental", "tangential", and "reflexive" all the time. Physicists and mathematicians also disagree on the subtleties of the same terms. Context is everything.

> So before proceeding, we should mutually agree that we have no interest in implementing a clock of type 1. It wouldn't serve anyone's use case (correct me if I'm wrong!) and the major operating systems don't offer such a thing anyway.

+1.

> Then, if we all agree to stop thinking about that first concept, then we need to agree whether we're all going to use the word "monotonic clock" to refer to the second concept, or if we're going to use a different word (such as "steady clock") to refer to the second concept.

I would prefer
Re: [Python-Dev] Use QueryPerformanceCounter() for time.monotonic() and/or time.highres()?
On Apr 2, 2012, at 10:39 AM, Kristján Valur Jónsson wrote:

> no steps is something unquantifiable. All time has steps in it.

"No steps" means something very specific when referring to time APIs, as I recently explained here: http://article.gmane.org/gmane.comp.python.devel/131487/.

-glyph
Re: [Python-Dev] Playing with a new theme for the docs
On Mar 21, 2012, at 6:28 PM, Greg Ewing wrote:

> Ned Batchelder wrote:
>> Any of the tweaks people are suggesting could be applied individually using this technique. We could just as easily choose to make the site left-justified, and let the full-justification fans use custom stylesheets to get it.
> Is it really necessary for the site to specify the justification at all? Why not leave it to the browser and whatever customisation the user chooses to make?

It's design. It's complicated. Maybe yes, if you look at research related to default usage patterns, saccade distance, reading speed, and retention latency. Maybe no, if you look at research related to fixation/focus time, eye strain, and non-linear access patterns. Maybe maybe, if you look at the subjective aesthetics of the page according to various criteria, like "does it look like a newspaper" and "do I have to resize my browser every time I visit a new site to get a decent width for reading".

As has been said several times previously in this thread, it's best to leave this up to a design czar who will at least make some decisions that will make some people happy. I'm fairly certain it's not possible to create a design that's optimal for all readers in all cases.

-glyph
Re: [Python-Dev] Issue 13524: subprocess on Windows
On Mar 21, 2012, at 4:38 PM, Brad Allen wrote:

> I tripped over this one trying to make one of our Python apps at work Windows compatible. We had no idea that a magic 'SystemRoot' environment variable would be required, and it was causing issues for pyzmq. It might be nice to reflect the findings of this email thread on the subprocess documentation page: http://docs.python.org/library/subprocess.html Currently the docs mention this:
>
> Note: If specified, env must provide any variables required for the program to execute. On Windows, in order to run a side-by-side assembly the specified env must include a valid SystemRoot.
>
> How about rewording that to:
>
> Note: If specified, env must provide any variables required for the program to execute. On Windows, a valid SystemRoot environment variable is required for some Python libraries such as the 'random' module. Also, in order to run a side-by-side assembly the specified env must include a valid SystemRoot.

Also, in order to execute in any installation environment where libraries are found in non-default locations, you will need to set LD_LIBRARY_PATH. Oh, and you will also need to set $PATH on UNIX so that libraries can find their helper programs, and %PATH% on Windows so that any compiled dynamically-loadable modules and/or DLLs can be loaded. And by the way, you will also need to relay DYLD_LIBRARY_PATH, not LD_LIBRARY_PATH, if you did a UNIX-style build on OS X. Don't forget that you probably also need PYTHONPATH to make sure any subprocess environments can import the same modules as their parent. Not to mention SSH_AUTH_SOCK if your application requires access to _remote_ process spawning, rather than just local. Oh, and DISPLAY, in case your subprocesses need GUI support from an X11 program (which sometimes you need just to initialize certain libraries which don't actually do anything with a GUI). Oh, and __CF_USER_TEXT_ENCODING is important sometimes too; don't forget that.
And if your subprocess is in Perl or Ruby or Java, you may need a couple dozen other variables which your deployment environment has set for you too. Did I mention CFLAGS or LC_ALL yet? Let me tell you a story about this one HP/UX machine... Ahem.

Bottom line: it seems like screwing with the process-spawning environment to make it minimal is a good idea for simplicity, for security, and for modularity. But take it from me, it isn't. I guarantee you that you don't actually know what is in your operating system's environment, and initializing it is a complicated, many-step dance which some vendor or sysadmin or product integrator figured out how to do much better than your hapless Python program can. %SystemRoot% is just the tip of a very big, very nasty iceberg. Better not to keep refining exactly why it's required, or someone will eventually be adding a new variable (starting with %APPDATA% and %HOMEPATH%) that can magically cause your subprocess not to spawn properly to this page every six months for eternity.

If you're spawning processes as a regular user, you should just take the environment you're given, perhaps with a few specific, light additions whose meaning you understand. If you're spawning a process as an administrator or root, you should probably initialize the environment for the user you want to spawn that process as, using an OS-specific mechanism like login(1). (Sorry that I don't know the Windows equivalent.)

-glyph
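The regular-user recommendation above can be sketched in a few lines: inherit the parent environment wholesale and add only what you understand (the MYAPP_CONFIG variable here is a hypothetical addition):

```python
import os
import subprocess
import sys

# Inherit the vendor-initialized environment (SystemRoot and all) and
# make only a light, well-understood addition; never build a "minimal"
# env dict from scratch.
env = os.environ.copy()
env["MYAPP_CONFIG"] = "/etc/myapp.conf"   # hypothetical addition

out = subprocess.check_output(
    [sys.executable, "-c",
     "import os; print(os.environ['MYAPP_CONFIG'])"],
    env=env,
)
assert out.strip() == b"/etc/myapp.conf"
```

On Windows, the copy automatically carries %SystemRoot%, %PATH%, and whatever else the platform dance put there, which is the whole point.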
Re: [Python-Dev] sharing sockets among processes on windows
On Mar 13, 2012, at 5:27 PM, Kristján Valur Jónsson wrote:

> Hi, I'm interested in contributing a patch to duplicate sockets between processes on Windows. The API to do this is WSADuplicateSocket()/WSASocket(), as already used by dup() in _socketmodule.c. Here's what I have:

Just in case anyone is interested, we also have a ticket for this in Twisted: http://twistedmatrix.com/trac/ticket/4389. It would be great to share code as much as possible.

-glyph
Re: [Python-Dev] Python 3 optimizations, continued, continued again...
On Feb 1, 2012, at 12:46 PM, Guido van Rossum wrote:

> I understand that you're hesitant to just dump your current mess, and you want to clean it up before you show it to us. That's fine. (...) And remember, it doesn't need to be perfect (in fact perfectionism is probably a bad idea here).

Just as a general point of advice to open source contributors, I'd suggest erring on the side of the latter suggestion rather than the former: dump your current mess, along with the relevant caveats (it's a mess, much of it is irrelevant), so that other developers can help you clean it up, rather than putting the entire burden of the cleanup on yourself. Experience has taught me that most people who hold back work because it needs cleanup eventually run out of steam, and their work never gets integrated and maintained.

-glyph
Re: [Python-Dev] Packaging and setuptools compatibility
On Jan 24, 2012, at 12:54 PM, Alexis Métaireau wrote:

> I'm wondering if we should support that (a way to have plugins) in the new packaging thing, or not. If not, this means we should come up with another solution to support this outside of packaging (maybe in distribute). If yes, then we should design it, and probably make it a sub-part of packaging.

First, my interest: Twisted has its own plugin system, and I would like it to continue to work in the future.

I do not believe that packaging should support plugins directly. Run-time metadata is not the packaging system's job. However, the packaging system does need to provide some guarantees about how to install and update data at installation (and post-installation) time, so that databases of plugin metadata may be kept up to date. Basically, packaging's job is constructing explicitly declared parallels between your development environment and your deployment environment. Some such databases are outside of Python entirely (for example, you might think of /etc/init.d as such a database), so even if you don't care about the future of Twisted's weirdo plugin system, it would be nice for this to be supported.

In other words, packaging should have a meta-plugin system: a way for a plugin system to register itself and provide an API for things to install their metadata, and a way to query the packaging module about how a Python package is installed, so that it can put things near to it in an appropriate way. (Keep in mind that "near to it" may mean in a filesystem directory, or a zip file, or stuffed inside a bundle or executable.)

In my design of Twisted's plugin system, we used PEP 302 as this sort of meta-standard, and (modulo certain bugs in easy_install and pip, most of which are apparently getting fixed in pip pretty soon) it worked out reasonably well. The big missing pieces are post-install and post-uninstall hooks. If we had those, translating to native packages for Twisted (and for things that use it) could be made totally automatic.

-glyph
Re: [Python-Dev] Fixing the XML batteries
On Dec 10, 2011, at 2:38 AM, Stefan Behnel wrote:

> Note, however, that html5lib is likely way too big to add it to the stdlib, and that BeautifulSoup lacks a parser for non-conforming HTML in Python 3, which would be the target release series for better HTML support. So, whatever library or API you would want to use for HTML processing is currently only the second question, as long as Py3 lacks a real-world HTML parser in the stdlib, as well as a robust character detection mechanism. I don't think that can be fixed all that easily.

Here's the problem in a nutshell, I think:

- Everybody wants an HTML parser in the stdlib, because it's inconvenient to pull in a dependency for such a simple task.
- Everybody wants the stdlib to remain small, stable, and simple, and not get overcomplicated.
- Parsing arbitrary HTML5 is a monstrously complex problem, for which there exist rapidly-evolving standards and libraries. Parsing 'the web' (which is rapidly growing to include stuff like SVG, MathML, etc.) is even harder.

My personal opinion is that html5lib gets this problem almost completely right, and so it should be absorbed by the stdlib. Trying to re-invent this from scratch, or even to use something like BeautifulSoup, which uses a bunch of heuristics and hacks rather than reference to the laboriously-crafted standard that says exactly how parsing malformed stuff has to go to be like a browser, seems like it will just give the stdlib solution a reputation for working on the test input but not working in the real world. (No disrespect to BeautifulSoup: it was a great attempt in the pre-HTML5 world it was born into, and I've used it numerous times to implement useful things. But much more effort has been poured into this problem since then, and the problems are better understood now.)

-glyph
Re: [Python-Dev] Fixing the XML batteries
On Dec 10, 2011, at 6:30 PM, Terry Reedy wrote:

> A little data: the html5lib project lives at https://code.google.com/p/html5lib/ It has 4 owners and 22 other committers. The most recent release, html5lib 0.90 for Python, is nearly 2 years old. Since there is a separate Python3 repository, and there is no mention of Python 3 compatibility elsewhere that I saw, including the PyPI listing, I assume that it is for Python 2 only.

I believe that you are correct.

> A comment on a recent (July 11) Python 3 issue https://code.google.com/p/html5lib/issues/detail?id=187 suggests that the Python 3 version still has problems: "Merged in now, though still lots of errors and failures in the testsuite."

I don't see what bearing this has on the discussion. There are three possible ways I can imagine to interpret this information. First, you could believe that porting a codebase from Python 2 to Python 3 is much easier than solving a difficult domain-specific problem. In that case, html5lib has done the hard part, and someone interested in HTML-in-the-stdlib should do the rest. Second, you could believe that porting a codebase from Python 2 to Python 3 is harder than solving a difficult domain-specific problem, in which case something is seriously wrong with Python 3 or its attendant migration tools, and that needs to be fixed, so someone should fix that rather than worrying about parsing HTML right now. (I doubt that many subscribers to this list would share this opinion, though.) Third, you could believe that parsing HTML is not a difficult domain-specific problem. But only a crazy person would believe that, so you're left with one of the previous options :).

-glyph
Re: [Python-Dev] Maintenance burden of str.swapcase
On Sep 11, 2011, at 11:49 AM, Michael Foord wrote:

> Does anyone *actually* use .title() for this? (And why not just use the correct casing in the string literal...)

Yes. Twisted does, in various MIME-ish places (IMAP, SIP), although not in HTTP from what I can see. I imagine other similar software would as well. One issue is that you don't always have a string literal to work with. If you're proxying traffic, you start from a mis-cased header and you possibly need to correct it to a canonically-cased one. (On at least one occasion I've had to use such a proxy to make certain buggy client software work.) Of course you could have something like {b'CONNECTION-LOST': b'Connection-Lost', ...} somewhere at module scope, but that feels a bit sillier than just having a nice '.title()' method. -glyph
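A runnable sketch of the point being made (the function name and header values are invented for illustration): in Python 3, bytes objects have a .title() method, so a proxy can canonicalize a mis-cased header without maintaining any lookup table at module scope.

```python
# A proxy receives a mis-cased header name and must forward the
# canonically-cased form; bytes.title() does the work that a
# hand-maintained {b'CONNECTION-LOST': b'Connection-Lost', ...}
# mapping would otherwise hard-code.
def canonicalize(header: bytes) -> bytes:
    # .title() capitalizes each hyphen-separated word and lowercases
    # the rest, which matches conventional HTTP header casing.
    return header.title()

print(canonicalize(b"CONNECTION-LOST"))  # b'Connection-Lost'
print(canonicalize(b"x-FORWARDED-for"))  # b'X-Forwarded-For'
```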
Re: [Python-Dev] Maintenance burden of str.swapcase
On Sep 7, 2011, at 10:26 AM, Stephen J. Turnbull wrote:

> How about title?
>
> >>> 'content-length'.title()
> 'Content-Length'

You might say that the protocol has to be case-insensitive so this is a silly frill: there are definitely enough case-sensitive crappy bits of network middleware out there that this function is critically important for an HTTP server. In general I'd like to defend keeping as many of these methods as possible for compatibility (porting to Py3 is already hard enough). Although even I might have a hard time defending 'swapcase', which is never used _at all_ within Twisted, on text or bytes. The only use-case I can think of for that method is goofy joke text filters, and it wouldn't be very good at that either. -glyph
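The quoted interpreter snippet generalizes: for most header names, .title() produces the canonical capitalization. (The caveat below about acronym-bearing names is an editorial observation, not something from the original message.)

```python
# .title() as a header-name canonicalizer, per Stephen's suggestion.
headers = ["content-length", "CONNECTION", "x-forwarded-for"]
print([h.title() for h in headers])
# ['Content-Length', 'Connection', 'X-Forwarded-For']

# The wrinkle: names containing acronyms don't come out canonically,
# so .title() is a convenience, not a complete canonicalization scheme.
print("content-md5".title())  # 'Content-Md5', not 'Content-MD5'
```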
Re: [Python-Dev] Python 3 optimizations continued...
On Sep 1, 2011, at 5:23 AM, Cesare Di Mauro wrote:

> A simple solution: when tracing is enabled, the new instruction format will never be executed (and information tracking disabled as well).

Correct me if I'm wrong: doesn't this mean that no profiler will be able to accurately measure the performance impact of the new instruction format, and that one may therefore get incorrect data when one is trying to make a CPU optimization for real-world performance?
Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)
On Aug 28, 2011, at 7:27 PM, Guido van Rossum wrote:

> In general, an existing library cannot be called without access to its .h files -- there are probably struct and constant definitions, platform-specific #ifdefs and #defines, and other things in there that affect the linker-level calling conventions for the functions in the library.

Unfortunately I don't know a lot about this, but I keep hearing about something called rffi that PyPy uses to call C from RPython: http://readthedocs.org/docs/pypy/en/latest/rffi.html. This has some shortcomings currently, most notably the fact that it needs those .h files (and therefore a C compiler) at runtime, so it's currently a non-starter for code distributed to users. Not to mention the fact that, as you can see, it's not terribly thoroughly documented. But that ExternalCompilationInfo object looks very promising, since it has fields like includes, libraries, etc. It seems like it's a bit more type-safe than ctypes or Cython, and it seems to me that it could cache some of the information that it extracts from header files and store it for later, when a compiler might not be around. Perhaps someone with more PyPy knowledge than I could explain whether this is a realistic contender for other Python runtimes?
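For contrast with rffi's headers-at-runtime approach, here is a minimal sketch of what the stdlib's ctypes requires instead: since ctypes never reads a .h file, the programmer must restate by hand exactly the ABI information Guido notes lives in the headers. (The choice of libm and pow is just for illustration; find_library's behavior is platform-dependent.)

```python
import ctypes
import ctypes.util

# find_library consults platform-specific naming conventions; no
# compiler or header files are involved at any point.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# This is the information that normally lives in math.h -- restated
# manually, with no way for ctypes to check it against reality.
libm.pow.argtypes = [ctypes.c_double, ctypes.c_double]
libm.pow.restype = ctypes.c_double

print(libm.pow(2.0, 10.0))  # 1024.0
```

Getting argtypes/restype wrong doesn't produce a compile error; it produces garbage or a crash at call time, which is the type-safety gap being discussed.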
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Aug 12, 2011, at 11:24 AM, P.J. Eby wrote:

> That is, the above code hardcodes a variety of assumptions about the import system that haven't been true since Python 2.3.

Thanks for this feedback. I honestly did not realize how old and creaky this code had gotten. It was originally developed for Python 2.4 and it certainly shows its age. Practically speaking, the code is correct for the bundled importers, and paths and zipfiles are all we've cared about thus far.

> (For example, it assumes that the contents of sys.path strings have inspectable semantics, that the contents of __file__ can tell you things about the module-ness or package-ness of a module object, etc.)

Unfortunately, the primary goal of this code is to do something impossible - walk the module hierarchy without importing any code. So some heuristics are necessary. Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute or anything gross like that to improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be. However, the isPackage() method can and should be looking at the module if it's already loaded, and not always guessing based on paths. The whole reason there's an 'importPackages' flag to walk() is that some applications of this code care more about accuracy than others, so it tries to be as correct as it can be. (Of course this is still wrong for the case where a __path__ is dynamically constructed by user code, but there's only so well one can do at that.)

> If you want to fully support PEP 302, you might want to consider making this a wrapper over the corresponding pkgutil APIs (available since Python 2.5) that do roughly the same things, but which delegate all path string inspection to importer objects and allow extensible delegation for importers that don't support the optional methods involved.
This code still needs to support Python 2.4, but I will make a note of this for future reference.

> (Of course, if the pkgutil APIs are missing something you need, perhaps you could propose additions.)

>> Now it seems like pure virtual packages are going to introduce a new type of special case into the hierarchy which have neither .pathEntry nor .filePath objects.

> The problem is that your API's notion that these things exist as coherent concepts was never really a valid assumption in the first place. .pth files and namespace packages already meant that the idea of a package coming from a single path entry made no sense. And namespace packages installed by setuptools' system packaging mode *don't have a __file__ attribute* today... heck they don't have __init__ modules, either.

The fact that getModule('sys') breaks is reason enough to re-visit some of these design decisions.

> So, adding virtual packages isn't actually going to change anything, except perhaps by making these scenarios more common.

In that case, I guess it's a good thing; these bugs should be dealt with. Thanks for pointing them out. My opinion of PEP 402 has been completely reversed - although I'd still like to see a section about the module system from a library/tools author point of view rather than a time-traveling Perl user's narrative :).
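A small sketch of the pkgutil approach being recommended: iter_modules() discovers modules by delegating to the importers on sys.path (per PEP 302), rather than by inspecting path strings directly, and it does so without importing the modules it finds.

```python
import pkgutil

# Discover top-level modules and packages without importing them;
# each entry comes from whatever PEP 302 finder claimed that
# sys.path entry, so zipfiles and other importers work too.
discovered = {info.name for info in pkgutil.iter_modules()}

# Stdlib packages are visible even though nothing was imported.
print("json" in discovered, "email" in discovered)
```

(This uses the modern ModuleInfo.name attribute; the mechanism, though not this exact spelling, is what was available as of Python 2.5.)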
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Aug 12, 2011, at 2:33 PM, P.J. Eby wrote:

> At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
>> Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute or anything gross like that to improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be.
> The flip side of that is that you can't always know whether a directory is a virtual package without deep inspection: one consequence of PEP 402 is that any directory that contains a Python module (of whatever type), however deeply nested, will be a valid package name. So, you can't rule out that a given directory *might* be a package, without walking its entire reachable subtree. (Within the subset of directory names that are valid Python identifiers, of course.)

Are there any rules about passing invalid identifiers to __import__ though, or is that just less likely? :)

> However, you *can* quickly tell that a directory *might* be a package or is *probably* one: if it contains modules, or is the same name as an already-discovered module, it's a pretty safe bet that you can flag it as such.

I still like the idea of a 'marker' file. It would be great if there were a new marker like __package__.py. I say this more for the benefit of users looking at a directory on their filesystem and trying to understand whether this is a package or not than I do for my own programmatic tools, though; it's already hard enough to understand the package-ness of a part of your filesystem and its interactions with PYTHONPATH; making directories mysteriously and automatically become packages depending on context will worsen that situation, I think.
I also have this not-terribly-well-defined idea that it would be handy for different providers of the _contents_ of namespace packages to provide their own instrumentation to be made aware that they've been added to the __path__ of a particular package. This may be a solution in search of a problem, but I imagine that each __package__.py would be executed in the same module namespace. This would allow namespace packages to do things like set up compatibility aliases, lazy imports, plugin registrations, etc., as they currently do with __init__.py. Perhaps it would be better to define its relationship to the package-module namespace in a more sensible way than "execute all over each other in no particular order". Also, if I had my druthers, Python would raise an exception if someone added a directory marked as a package to sys.path and refuse to import things from it, and, when a submodule was run as a script, it would add the nearest directory not marked as a package to sys.path, rather than the script's directory itself. The whole "__name__ is wrong because your current directory was wrong when you ran that command" thing is so confusing to explain that I hope we can eventually consign it to the dustbin of history. But if you can't even reasonably guess whether a directory is supposed to be an entry on sys.path or a package, that's going to be really hard to do.

> In any case, you probably should *not* do the building of a virtual path yourself; the protocols and APIs added by PEP 402 should allow you to simply ask for the path to be constructed on your behalf. Otherwise, you are going to be back in the same business of second-guessing arbitrary importer backends again!

What do you mean "building of a virtual path"?

> (E.g. note that PEP 402 does not say virtual package subpaths must be filesystem or zipfile subdirectories of their parents - an importer could just as easily allow you to treat subdirectories named 'twisted.python' as part of a virtual package with that name!)
> Anyway, pkgutil defines some extra methods that importers can implement to support module-walking, and part of the PEP 402 implementation should be to make this support virtual packages as well.

The more that this can focus on module-walking without executing code, the happier I'll be :).

>> This code still needs to support Python 2.4, but I will make a note of this for future reference.

> A suggestion: just take the pkgutil code and bundle it for Python 2.4 as something._pkgutil. There's very little about it that's 2.5+ specific, at least when I wrote the bits that do the module walking. Of course, the main disadvantage of pkgutil for your purposes is that it currently requires packages to be imported in order to walk their child modules. (IIRC, it does *not*, however, require them to be imported in order to discover their existence.)

One of the stipulations of this code is that it might give different results when the modules are loaded and not. So it's fine to inspect that first and then invoke pkgutil only in the 'loaded' case, with the knowledge that the not-loaded case may
Re: [Python-Dev] PEP 402: Simplified Package Layout and Partitioning
On Aug 11, 2011, at 11:39 AM, Barry Warsaw wrote:

> On Aug 11, 2011, at 04:39 PM, Éric Araujo wrote:
>>> * XXX what is the __file__ of a pure virtual package? ``None``? Some arbitrary string? The path of the first directory with a trailing separator? No matter what we put, *some* code is going to break, but the last choice might allow some code to accidentally work. Is that good or bad?
>> A pure virtual package having no source file, I think it should have no __file__ at all. I don’t know if that would break more code than using an empty string for example, but it feels righter.
> I agree that the empty string is the worst of the choices. no __file__ or __file__=None is better.

In some sense, I agree: hacks like empty strings are likely to lead to path-manipulation bugs where the wrong file gets opened (or worse, deleted, with predictable deleterious effects). But the whole pure virtual mechanism here seems to pile even more inconsistency on top of an already irritatingly inconsistent import mechanism. I was reasonably happy with my attempt to paper over PEP 302's weirdnesses from a user perspective: http://twistedmatrix.com/documents/11.0.0/api/twisted.python.modules.html (or https://launchpad.net/modules if you are not a Twisted user). Users of this API can traverse the module hierarchy with certain expectations; each module or package would have .pathEntry and .filePath attributes, each of which would refer to the appropriate place. Of course __path__ complicates things a bit, but so it goes. Now it seems like pure virtual packages are going to introduce a new type of special case into the hierarchy which have neither .pathEntry nor .filePath objects. Rather than a one-by-one ad-hoc consideration of which attribute should be set to None or empty strings or what have you, I'd really like to see a discussion in the PEP saying what a package really is vs. what a module is, and what one can reasonably expect from it from an API and tooling perspective.
Right now I have to puzzle out the intent of the final API from the problem/solution description and thought experiment. Despite authoring several namespace packages myself, I don't have any of the problems described in the PEP. I just want to know how to write correct tools given this new specification. I suspect that this PEP will be the only reference for how packages work for a long time to come (just as PEP 302 was before it), so it should really get this right.
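For what it's worth, the behavior Python eventually shipped (in PEP 420, the accepted successor to PEP 402) can be observed directly: a directory with no __init__.py imports as a namespace package whose module object has no usable __file__, only a __path__. The package name below is invented for this sketch.

```python
import importlib
import os
import sys
import tempfile

# Create <tmpdir>/demo_nspkg/ containing no __init__.py at all.
parent = tempfile.mkdtemp()
os.mkdir(os.path.join(parent, "demo_nspkg"))
sys.path.insert(0, parent)

mod = importlib.import_module("demo_nspkg")
print(getattr(mod, "__file__", None))  # no source file: None (or unset)
print(list(mod.__path__))              # the single directory created above
```

So the "no __file__ at all / __file__=None" position quoted above is essentially what tooling has to handle today.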
Re: [Python-Dev] HTMLParser and HTML5
On Jul 29, 2011, at 7:46 AM, Stefan Behnel wrote:

> Joao S. O. Bueno, 29.07.2011 13:22:
>> On Fri, Jul 29, 2011 at 1:37 AM, Stefan Behnel wrote:
>>> Brett Cannon, 28.07.2011 23:49:
>>>> On Thu, Jul 28, 2011 at 11:25, Matt wrote:
>>>>> - What policies are in place for keeping parity with other HTML parsers (such as those in web browsers)?
>>>> There aren't any beyond "it would be nice". [...] It's more of an issue of someone caring enough to do the coding work to bring the parser up to spec for HTML5 (or introduce new code to live beside the HTML4 parsing code).
>>> Which, given that html5lib readily exists, would likely be a lot more work than anyone who is interested in HTML5 handling would want to invest. I don't think we need a new HTML5 parsing implementation only to have it in the stdlib. That's the old sunny Java way of doing it.
>> I disagree. Having proper html parsing out of the box is part of the "batteries included" thing.
> Well, you can easily prove me wrong by implementing this. Stefan

Please don't implement this just to prove Stefan wrong :). The thing to do, if you want html parsing in the stdlib, is to _incorporate_ html5lib, which is already a perfectly good, thoroughly tested HTML parser, and simply deprecate HTMLParser and friends. Implementing a new parser would serve no purpose I can see. -glyph
Re: [Python-Dev] HTMLParser and HTML5
On Jul 29, 2011, at 3:00 PM, Matt wrote:

> I don't see any real reason to drop a decent piece of code (HTMLParser, that is) in favor of a third party library when only relatively minor updates are needed to bring it up to speed with the latest spec.

I am not really one to throw stones here, as Twisted contains a lenient pseudo-XML parser which I still maintain - one which decidedly does not agree with HTML5's requirements for dealing with invalid data, but is just a bunch of ad-hoc guesses of my own. My impression of HTML5 is that HTMLParser would require significant modifications and possibly a drastic re-architecture in order to really do HTML5 right; especially the parts that the html5lib authors claim make HTML5 streaming-unfriendly, i.e. subtree reordering when encountering certain types of invalid data. But if I'm wrong about that, and there are just a few spec updates and bugfixes that need to be applied, by all means, ignore my comment. -glyph
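To make the gap concrete (using the Python 3 spelling, html.parser): the stdlib parser reports tag events for invalid markup exactly as written, with none of the tree-construction recovery HTML5 specifies - the unclosed <b> below is simply never closed, where an HTML5 parser would reconstruct the tree around it.

```python
from html.parser import HTMLParser

# Collect start/end tag events to see what the stdlib parser reports
# for invalid input.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.starts = []
        self.ends = []

    def handle_starttag(self, tag, attrs):
        self.starts.append(tag)

    def handle_endtag(self, tag):
        self.ends.append(tag)

parser = TagCollector()
parser.feed("<p><b>bold text</p>")  # invalid: <b> is never closed
print(parser.starts, parser.ends)   # ['p', 'b'] ['p']
```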
Re: [Python-Dev] Comments of the PEP 3151
On Jul 26, 2011, at 6:49 PM, Antoine Pitrou wrote:

> On Mon, 25 Jul 2011 15:28:47 +1000 Nick Coghlan ncogh...@gmail.com wrote:
>> There may be some error codes that we choose to map to these generic errors, even if we don't give them their own exception types at this point (e.g. ECONSHUTDOWN could map directly to ConnectionError).
> Ok, I can find neither ECONSHUTDOWN nor ECONNSHUTDOWN on www.opengroup.org, and it's not mentioned in errnomodule.c. Is it some system-specific error code?

I assume that ESHUTDOWN is the errno in question? (This is also already mentioned in the PEP.)
Re: [Python-Dev] The socket HOWTO
On Jun 5, 2011, at 3:35 PM, Martin v. Löwis wrote:

> And that's all fine. I still claim that you have to *understand* sockets in order to use it properly. By this, I mean stuff like "what is a TCP connection?", "how is it established?", "how is UDP different from TCP?", "when data arrives, what layers of software does it go through?", "what is a port number?", etc.

Yes, these are all excellent concepts to be familiar with. But the word "socket" (and the socket HOWTO) refers to a specific way to interface with those concepts, the Berkeley socket API: http://en.wikipedia.org/wiki/Berkeley_sockets. Which you don't have to know anything about if you're going to use Twisted. You should know about IPC in general, and TCP/UDP specifically, if you're going to use Twisted, but sockets are completely optional. Also, I feel that I should point out that the sockets HOWTO does not cover even a single one of these concepts in any useful depth. If you think that these are what it should be explaining, it needs some heavy editing. Here's what it has to say about each one:

"what is a TCP connection?" - The only place that the characters "TCP" appear in the entire document is in the phrase "... which is completely different from TCP_NODELAY". Nowhere is a TCP connection explained at a conceptual level, except to say that it's something a web browser does.

"how is UDP different from TCP?" - The phrase "UDP" never appears in the HOWTO. DGRAM sockets get a brief mention as "anything else" in the sentence "... you’ll get better behavior and performance from a STREAM socket than anything else". (To be fair, I do endorse teaching that the difference between TCP and UDP is "that you should not use UDP" to anyone not sufficiently advanced to read the relevant reference documentation themselves.)

"when data arrives, what layers of software does it go through?" - There's no discussion of this that I can find at all.

"what is a port number?"
Aside from a few comments in the code examples, the only discussion of port numbers is "low number ports are usually reserved for “well known” services (HTTP, SNMP etc)". It would be very good to have a Python networking overview somewhere that explained this stuff at a very high level, and described how data might get into or out of your program, with links to things like the socket HOWTO that describe more specific techniques. This would be useful because most commonly, I think that data will get into Python network programs via WSGI, not direct sockets or anything like Twisted. To be clear, having read it now: I do _not_ agree with Antoine that this document should be deleted. I dimly recall that it helped me understand some things in the very early days of Twisted. While it's far from perfect, it might help someone in a similar situation understand those things as well today. I just found it interesting that the main concepts one would associate with such a HOWTO are nowhere to be found :). -glyph
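Since the HOWTO never shows it, here is about the smallest possible demonstration of the TCP/UDP distinction, run entirely over loopback in a single process: a DGRAM socket exchanges self-contained datagrams with no connection and no handshake at all.

```python
import socket

# Receiver: bind a UDP socket to an OS-assigned loopback port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick one
addr = receiver.getsockname()

# Sender: no connect() and no handshake -- just fire a datagram.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", addr)

data, peer = receiver.recvfrom(1024)
print(data)  # b'hello'

sender.close()
receiver.close()
```

The same exchange over TCP (SOCK_STREAM) would require listen(), connect(), and accept() before any data could flow: that is the connection the HOWTO never explains.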
Re: [Python-Dev] The socket HOWTO
On Jun 4, 2011, at 11:32 PM, Martin v. Löwis wrote:

> b) telling people to use Twisted or asyncore on the server side if they are new to sockets is bad advice. People *first* have to understand sockets, and *then* can use these libraries and frameworks. Those libraries aren't made to be black boxes that work even if you don't know how - you *have* to know how they work inside, or else you can't productively use them.

First, Twisted doesn't always use the BSD sockets API; the Windows IOCP reactor, especially, starts off with the socket() function, but things go off in a different direction pretty quickly from there. So it's perfectly fine to introduce yourself to networking via Twisted, and many users have done just that. If you're using it idiomatically, you should never encounter a socket object or file descriptor poking through the API anywhere. Asyncore is different: you do need to know how sockets work in order to use it, because you're expected to call .send() and .recv() yourself. (And, in my opinion, this is a serious design flaw, for reasons which will hopefully be elucidated in the PEP that Laurens is now writing.) Second, it makes me a little sad that it appears to be folk wisdom that Twisted is only for servers. A lot of work has gone into making it equally appropriate for clients. This is especially true if your client has a GUI, where Twisted is often better than a protocol-specific library, which may either be blocking or have its own ad-hoc event loop. I don't have an opinion on the socket HOWTO per se, only on the possibility of linking to Twisted as an alternate implementation mechanism. It really would be better to say "go use Twisted rather than reading any of the following" than "read the following, which will help you understand Twisted".
Re: [Python-Dev] Python 3.x and bytes
On May 19, 2011, at 1:43 PM, Guido van Rossum wrote:

> -1; the result is not a *character* but an integer.

Well, really the result ought to be an octet, but I suppose adding an 'octet' type is beyond the scope of even this sprawling discussion :).

> I'm personally favoring using b'a'[0] and possibly hiding this in a constant definition.

As someone who spends a frankly unfortunate amount of time handling protocols where things like this are necessary, I agree with this recommendation. In protocols where one needs to compare network data with one-byte type identifiers or packet prefixes, more (documented) constants and less inscrutable junk like

    if p == 'c': ...
    elif p == 'j': ...
    elif p == 'J': # for compatibility ...

would definitely be a good thing. Of course, I realize that this sort of programmer will most likely replace those constants with 99, 106, 74 rather than take a moment to document what they mean, but at least they'll have to pause for a moment and realize that they have now lost _all_ mnemonics... In fact, I feel like I would want to push in the opposite direction: don't treat one-byte bytes slices less like integers; I wish I could more easily treat n-byte sequences _more_ like integers! :). More protocols have 2-byte or 4-byte network-endian packed integers embedded in them than have individual tag bytes that I want to examine. For the typical ASCII-ish protocol where you want to look at command names and CRLF-separated messages, you'd never want to look at an individual octet, and stringish operations like split() will give you what you want.
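The two styles contrasted above, as a sketch (the packet layout and constant name are invented): a one-byte tag indexes out of a bytes object as an int in Python 3, and the multi-byte network-endian fields the message wishes were easier to handle are exactly what struct unpacks.

```python
import struct

# A made-up packet: one tag byte followed by a 2-byte big-endian length.
TAG_COMPAT = ord("J")    # a named constant beats a bare 74
packet = b"J\x00\x10"

tag = packet[0]          # indexing bytes yields an int, not bytes
print(tag == TAG_COMPAT)  # True

(length,) = struct.unpack("!H", packet[1:3])  # '!' = network endian
print(length)  # 16
```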
Re: [Python-Dev] Linus on garbage collection
On May 6, 2011, at 12:31 PM, Michael Foord wrote:

> pypy and .NET choose to arbitrarily break cycles rather than leave objects unfinalised and memory unreclaimed. Not sure what Java does.

I think that's a mischaracterization of their respective collectors; "arbitrarily break cycles" implies that user code would see broken or incomplete objects, at least during finalization, which I'm fairly sure is not true on either .NET or PyPy. Java definitely has a collector that can handle cycles too. (None of these are reference counting.) -glyph
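CPython itself illustrates the distinction: its reference counting is supplemented by a cycle collector, so cycles are reclaimed without user code ever observing a half-broken object. A minimal sketch:

```python
import gc

# Build a reference cycle: the list contains itself, so its refcount
# can never fall to zero through reference counting alone.
cycle = []
cycle.append(cycle)
del cycle

# The cycle collector finds and reclaims the unreachable cycle;
# collect() returns the number of unreachable objects it found.
print(gc.collect() >= 1)  # True
```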
Re: [Python-Dev] Linus on garbage collection
Apologies in advance for contributing to an obviously and increasingly off-topic thread, but this kind of FUD about GC is a pet peeve of mine.

On May 6, 2011, at 10:04 AM, Neal Becker wrote:

> http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

Counterpoint: http://lwn.net/Articles/268783/. Sorry Linus, sometimes correctness matters more than performance. But even the performance argument is kind of bogus. See, for example, this paper on real-time garbage collection: http://domino.research.ibm.com/comm/research_people.nsf/pages/dgrove.ecoop07.html. That's just one example of an easy-to-find solution to a problem that Linus holds up as unsolved or unsolvable. There are solutions to pretty much all of the problems that Linus brings up. One of these solutions is even famously implemented by CPython! The CPython string += idiom optimization fixes at least one case of the "you tend to always copy the node" antipattern Linus describes, and lots of languages (especially Scheme and derivatives, IIRC) have very nice optimizations around this area. One could argue that any functional language without large pools of mutable state (i.e. Erlang) is a massive optimization for this case.

Another example: the "dirty cache" problem Linus talks about can be addressed by having a GC that cooperates with the VMM: http://www.cs.umass.edu/~emery/pubs/f034-hertz.pdf. And the "re-using stuff as fast as possible" thing is exactly the kind of problem that generational GCs address. When you run out of space in cache, you reap your first generation before you start copying stuff. One of the key insights of generational GC is that you'll usually reclaim enough (in this case, cache-local) memory that you can keep going for a little while. You don't have to read a super fancy modern paper on this; Wikipedia explains nicely: http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Generational_GC_.28ephemeral_GC.29.
Of course if you don't tune your GC at all for your machine-specific cache size, you won't see this performance benefit play out.

I don't know if there's a programming language and runtime with a real-time, VM-cooperating garbage collector that actually exists today which has all the bells and whistles required to implement an OS kernel, so I wouldn't give the Linux kernel folks too much of a hard time for still using C; but there's nothing wrong with the idea in the abstract. The performance differences between automatic and manual GC are dubious at best, and with a really good GC and a language that supports it, GC tends to win big. When it loses, it loses in ways which can be fixed in one area of the code (the GC) rather than millions of tiny fixes across your whole codebase, as is the case with strategies used by manual collection algorithms.

The assertion that modern hardware is not designed for big data-structure pointer-chasing is also a bit silly. On the contrary, modern hardware has evolved staggeringly massive caches, specifically because large programs (whether they're GC'd or not) tend to do lots of this kind of thing, because there's a certain level of complexity beyond which one can no longer avoid it. It's old hardware, with tiny caches (that were, by virtue of their tininess, closer to the main instruction-processing silicon), that was optimized for the "carefully stack-allocating everything in the world to conserve cache" approach. You can see this pretty clearly by running your favorite Python benchmark of choice on machines which are similar except for cache size. The newer machine, with the bigger cache, will run Python considerably faster, but doesn't help the average trivial C benchmark that much - or, for that matter, Linux benchmarks. -glyph
Re: [Python-Dev] the role of assert in the standard library ?
On Apr 28, 2011, at 12:59 PM, Guido van Rossum wrote:

> On Thu, Apr 28, 2011 at 12:54 AM, Tarek Ziadé ziade.ta...@gmail.com wrote:
>> In my opinion assert should be avoided completely anywhere else than in the tests. If this is a wrong statement, please let me know why :)
> I would turn that around. The assert statement should not be used in unit tests; unit tests should use self.assertXyzzy() always. In regular code, assert should be about detecting buggy code. It should not be used to test for error conditions in input data. (Both these can be summarized as "if you still want the test to happen with -O, don't use assert".)

You're both right! :) My take on assert is "don't use it, ever". assert is supposed to be about conditions that never happen. So, to run through the few cases where I might use it:

If I use it to enforce a precondition, it's wrong because under -OO my preconditions won't be checked and my input might be invalid.

If I use it to enforce a postcondition, then my API's consumers have to occasionally handle this weird error, except it won't be checked under -OO so they won't be able to handle it consistently.

If I use it to try to make assertions about internal state during a computation, then I introduce an additional, untested (at the very least untested under -OO), probably undocumented (did I remember to say "and raises AssertionError when..." in its docstring?) code path where, when this bad thing happens, I get an exception instead of a result. If that's an important failure mode, then there ought to be a documented exception, which the computation's consumers can deal with. If it really should never happen, then I really should have just written some unit tests verifying that it doesn't happen in any case I can think of. And I shouldn't be writing code to handle cases I can't come up with any way to exercise, because how do I know that it's going to do the right thing?
(If I had a dollar for every 'assert' message that didn't have the right number of arguments to its format string, etc.) Also, when things that should never happen do actually happen in real life, is a random exception that interrupts the process actually an improvement over just continuing on with some potentially bad data? In most cases, no, it really isn't, because by blowing up you've removed the ability of the user to take corrective action or do a workaround. (In the cases where blowing up is better because you're about to do something destructive, again, a test seems in order.) My Python code is very well documented, which means that there is sometimes a significant runtime overhead from docstrings. That's really my only interest in -OO: reducing memory footprint of Python processes by dropping dozens of megabytes of library documentation from each process. The fact that it changes the semantics of 'assert' is an unfortunate distraction. So the only time I'd even consider using 'assert' is in a throwaway script which might be run once, that I'm not going to write any tests for and I'm not going to maintain, but I might care about just enough to want to blow up instead of calling 'os.unlink' if certain conditions are not met. (But then every time I actually use it that way, I realize that I should have dealt with the error sanely and I probably have to go back and fix it anyway.)
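The -O objection can be seen directly: the same precondition written as an assert disappears under optimization, while an explicit exception survives. A minimal sketch (the function names are illustrative, not from the thread):

```python
def withdraw_assert(balance, amount):
    # Precondition as an assert: silently skipped under `python -O`,
    # so a negative amount slips through and *increases* the balance.
    assert amount >= 0, "amount must be non-negative"
    return balance - amount

def withdraw_checked(balance, amount):
    # Precondition as a documented exception: enforced regardless of
    # optimization flags, and callers can handle it consistently.
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return balance - amount
```

Under `python -O`, `withdraw_assert(100, -5)` returns 105 with no complaint; `withdraw_checked` raises `ValueError` no matter how the interpreter was invoked.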
Re: [Python-Dev] python and super
On Apr 14, 2011, at 12:59 PM, Ronald Oussoren wrote: What would the semantics be of a super that (...) I think it's long past time that this discussion moved to python-ideas, if you don't mind.
Re: [Python-Dev] Supporting Visual Studio 2010
On Apr 5, 2011, at 8:52 AM, exar...@twistedmatrix.com wrote: On 09:58 am, mar...@v.loewis.de wrote: Won't that still be an issue despite the stable ABI? Extensions on Windows should be linked to the same version of MSVCRT used to compile Python Not if they use the stable ABI. There still might be issues if you mix CRTs, but none related to the Python ABI - in particular, none of those crashing conditions can arise from the stable ABI. Does this mean new versions of distutils let you build_ext with any C compiler, instead of enforcing the same compiler as it has done previously? That would be great. That *would* be great. But is it possible? http://www.python.org/dev/peps/pep-0384/ says functions expecting FILE* are not part of the ABI, "to avoid depending on a specific version of the Microsoft C runtime DLL on Windows". Can extension modules that need to read and write files practically avoid all of those functions? (If your extension module links a library with a different CRT, but doesn't pass functions back and forth to Python, is that OK?) The PEP also says that it will allow users to check whether their modules conform to the ABI, but it doesn't say how that will be done. How can we build extension modules so that we're sure we're ABI-conformant?
Re: [Python-Dev] Policy for making changes to the AST
On Apr 4, 2011, at 2:00 PM, Guido van Rossum wrote: On Mon, Apr 4, 2011 at 10:05 AM, fwierzbi...@gmail.com wrote: As a re-implementor of ast.py that tries to be node for node compatible, I'm fine with #1 but would really like to have tests that will fail in test_ast.py to alert me! [and] On Mon, Apr 4, 2011 at 10:38 AM, Michael Foord fuzzy...@voidspace.org.uk wrote: A lot of tools that work with Python source code use ast - so even though other implementations may not use the same ast under the hood they will probably at least *want* to provide a compatible implementation. IronPython is in that boat too (although I don't know if we *have* a compatible implementation yet - we certainly feel like we *should* have one). Ok, so it sounds like ast is *not* limited to CPython? Oh, definitely not. I would be pretty dismayed if tools like http://bazaar.launchpad.net/~divmod-dev/divmod.org/trunk/files/head:/Pyflakes/ would not run on Jython or PyPy.
Re: [Python-Dev] Differences among Emacsen
On Mar 30, 2011, at 2:54 PM, Barry Warsaw wrote: On Mar 30, 2011, at 09:43 AM, Ralf Schmitt wrote: Barry Warsaw ba...@python.org writes: In case you missed it, there are now *three* Python modes. Tim Peters' original and best (in my completely unbiased opinion wink) python-mode.el which is still being developed, the older but apparently removed from Emacs python.el and the 'new' (so I've heard) python.el. https://github.com/fgallina/python.el is the fourth one. Wonderful. I have a plea for posterity: since I'm sure that a hundred people will see this post and decide that the best solution to this proliferation of Python plugins for Emacs is that there should be a new one that is even better than all these other ones (and also totally incompatible, of course)... I won't try to stop you all from doing that, but please at least don't call it python.el. This is like if ActiveState, Wing, PyCharm and PyDev for Eclipse had all decided to call their respective projects IDLE because that's what you call a Python IDE :). It would be nice to be able to talk about Python / Emacs code without having to do an Abbott and Costello routine.
Re: [Python-Dev] Finally switch urllib.parse to RFC3986 semantics?
On Mar 18, 2011, at 8:41 PM, Guido van Rossum wrote: Really. Do they still call them URIs? :-) Well, by RFC 398*7* they're calling them IRIs instead. 'irilib', perhaps? ;-)
Re: [Python-Dev] funky buildbot
On Mar 10, 2011, at 3:18 PM, Bill Janssen wrote: It's a new Mac Mini running the latest Snow Leopard, with Python 2.6.1 (the /usr/bin/python) and buildslave 0.8.3, using Twisted 8.2.0. I realize that Python 2.6 is pretty old too, but a _lot_ of bugfixes have gone into Twisted since 8.2. I'm not 100% sure this is a Twisted issue but you may want to try upgrading to 10.2.0 and see if that fixes things. (I have a dim memory of similar issues which were eventually fixed by something in our subprocess support...) -glyph
Re: [Python-Dev] Support the /usr/bin/python2 symlink upstream
On Fri, Mar 4, 2011 at 10:03 AM, Westley Martínez aniko...@gmail.com wrote: On Fri, 2011-03-04 at 00:54 -0800, Aaron DeVore wrote: On Thu, Mar 3, 2011 at 11:44 PM, Kerrick Staley m...@kerrickstaley.com wrote: That way, if the sysadmin does decide to replace the installed python file, he can do so without inadvertently deleting the previously installed binary. Nit pick: Change he to they to be gender neutral. Nit pick: Change they to he to be grammatically correct. If we really have to be gender neutral, change he to he or she. This grammatical rule is a modern fiction with no particular utility. Go ahead and use singular they as a gender-neutral pronoun; it was good enough for Shakespeare, Twain, Austen and Shaw, it should be good enough for Python. http://en.wikipedia.org/wiki/Singular_they#Examples_of_generic_they
Re: [Python-Dev] Import and unicode: part two
On Jan 20, 2011, at 11:46 AM, Guido van Rossum wrote: On Thu, Jan 20, 2011 at 5:16 AM, Nick Coghlan ncogh...@gmail.com wrote: On Thu, Jan 20, 2011 at 10:08 PM, Simon Cross hodgestar+python...@gmail.com wrote: I'm changing my vote on this to a +1 for two reasons: * Initially I thought this wasn't supported by Python at all but I see that currently it is supported but that support is broken (or at least limited to UTF-8 filesystem encodings). Since support is there, might as well make it better (especially if it tidies up the code base at the same time). * I still don't think it's a good idea to give modules non-ASCII names but the consenting adults approach suggests we should let people shoot themselves in the foot if they believe they have good reason to do so. I'm also +1 on this for the reasons Simon gives. Same here. *Most* code will never be shared, or will only be shared between users in the same community. When it goes wrong it's also a learning opportunity. :-) Despite my usual proclivity for being contrarian, I find myself in agreement here. Linux users with locales that don't specify UTF-8 frankly _should_ have to deal with all kinds of nastiness until they can transcode their filesystems. MacOS and Windows both have a right answer here and your third-party tools shouldn't create mojibake in your filenames. However, I feel that we should not necessarily be making non-ASCII programmers second-class citizens, if they are to be supported at all. The obvious outcome of the current regime is, if you want your code to work in the wider world, you have to make everything ASCII, so non-ASCII programmers have to do a huge amount of extra work to prepare their stuff for distribution. As an English speaker I'd be happy about that, but as a person with a lot of Chinese in-laws, it gives me pause. There is a difference between sharing code for inspection and editing (where a little codec pain is good for the soul: set your locale to UTF-8 and forget it already!)
and sharing code so that a (non-programming) user can just run it. If I can write software in English and distribute it to Chinese people, fair's fair, they should be able to write it in Chinese and have it work on my computer. To support the latter, could we just make sure that zipimport has a consistent, non-locale-or-operating-system-dependent interpretation of encoding? That way a distributed egg would be importable from a zipfile regardless of how screwed up the distribution target machine's filesystem is. (And this is yet more motivation for distributors to set zip_safe=True.)
Re: [Python-Dev] Import and unicode: part two
On Jan 20, 2011, at 12:02 AM, Glenn Linderman wrote: But for local code, having to think up an ASCII name for a module rather than use the obvious native-language name, is just brain-burden when creating the code. Is it really? You already had to type 'import', presumably if you can think in Python you can think in ASCII. (After my experiences with namespace crowding in Twisted, I'm inclined to suggest something more like import m_07117FE4A1EBD544965DC19573183DA2 as café - then I never need to worry about café2 looking ugly or cafe being incompatible :).)
Re: [Python-Dev] Import and unicode: part two
On Jan 20, 2011, at 12:19 AM, Glenn Linderman wrote: Now if the stuff after m_ was the hex UTF-8 of café, that could get interesting :) (As it happens, it's the hex digest of the MD5 of the UTF-8 of café... ;-))
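The mangling joke is easy to reproduce; this sketch assumes only what the message states, that the name is 'm_' plus the uppercased hex MD5 digest of the UTF-8 encoding of café:

```python
import hashlib

name = "café"
# MD5 of the UTF-8 bytes, rendered as lowercase hex, then uppercased
# to match the "m_07117FE4..." style quoted in the previous message.
digest = hashlib.md5(name.encode("utf-8")).hexdigest()
mangled = "m_" + digest.upper()

# The result is a pure-ASCII, importable Python identifier.
assert mangled.isidentifier()
```

The point of the joke being, of course, that the mangled form is collision-free and ASCII-safe but utterly unreadable.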
Re: [Python-Dev] devguide: Point out that OS X users need to change examples to use python.exe instead of
On Jan 10, 2011, at 1:37 PM, Łukasz Langa wrote: I'm using the case-sensitive variant of HFS+ since 10.4. It works, I like it and you get ./python with it. I realize that this isn't a popularity contest for this feature, but I feel like I should pipe up here and mention that it breaks some applications - for example, you can't really install World of Warcraft on a case-sensitive filesystem. Not the filesystem's fault really, but it is a good argument for why users shouldn't choose it.
Re: [Python-Dev] Checking input range in time.asctime and time.ctime
On Jan 5, 2011, at 4:33 PM, Guido van Rossum wrote: Shouldn't the logic be to take the current year into account? By the time 2070 comes around, I'd expect 70 to refer to 2070, not to 1970. In fact, I'd expect it to refer to 2070 long before 2070 comes around. All of which makes me think that this is better left to the app, which can decide for itself whether it is more important to represent dates in the future or dates in the past. The point of this somewhat silly flag (as I understood its description earlier in the thread) is to provide compatibility with POSIX 2-digit dates. As per http://pubs.opengroup.org/onlinepubs/007908799/xsh/strptime.html - %y is the year within century. When a century is not otherwise specified, values in the range 69-99 refer to years in the twentieth century (1969 to 1999 inclusive); values in the range 00-68 refer to years in the twenty-first century (2000 to 2068 inclusive). Leading zeros are permitted but not required. So, 70 means 1970, forever, in programs that care about this nonsense. Personally, by the time 2070 comes around, I hope that 70 will just refer to 70 A.D., and get you odd looks if you use it in a written date - you might as well just write '0' :).
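Python's own strptime already follows the POSIX pivot quoted above, so the "70 means 1970, forever" behavior can be demonstrated directly:

```python
import time

# POSIX %y pivot rule: 69-99 -> 1969-1999, 00-68 -> 2000-2068.
assert time.strptime("70", "%y").tm_year == 1970
assert time.strptime("69", "%y").tm_year == 1969
assert time.strptime("68", "%y").tm_year == 2068
```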
Re: [Python-Dev] Possible optimization for LOAD_FAST ?
On Jan 2, 2011, at 10:18 PM, Guido van Rossum wrote: On Sun, Jan 2, 2011 at 5:50 PM, Alex Gaynor alex.gay...@gmail.com wrote: No, it's singularly impossible to prove that any global load will be any given value at compile time. Any optimization based on this premise is wrong. True. My proposed way out of this conundrum has been to change the language semantics slightly so that global names which (a) coincide with a builtin, and (b) have no explicit assignment to them in the current module, would be fair game for such optimizations, with the understanding that the presence of e.g. len = len anywhere in the module (even in dead code!) would be sufficient to disable the optimization. But barring someone interested in implementing something based on this rule, the proposal has languished for many years. Wouldn't this optimization break things like mocking out 'open' for testing via 'module.open = fakeopen'? I confess I haven't ever wanted to change 'len' but that one seems pretty useful. If CPython wants such optimizations, it should do what PyPy and its ilk do, which is to notice the assignment, but recompile code in that module to disable the fast path at runtime, preserving the existing semantics.
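The mocking pattern at issue is completely ordinary today: a test shadows a builtin by assigning a module-level global from outside the module, which is exactly the case the proposed rule (no explicit assignment *in* the module means the builtin is fair game for constant-folding) would invalidate. A hypothetical sketch, using a synthetic module for self-containment:

```python
import io
import types

# Build a module whose function resolves 'open' the normal way:
# module globals first, then builtins.
mod = types.ModuleType("config_reader")
exec(
    "def read_config(path):\n"
    "    with open(path) as f:\n"
    "        return f.read()\n",
    mod.__dict__,
)

# A test monkeypatches the module's 'open' from the outside -- legal
# under current semantics, with no assignment inside the module itself.
mod.open = lambda path: io.StringIO("fake contents")
assert mod.read_config("ignored.cfg") == "fake contents"
```

If `open` inside `read_config` had been folded to the builtin at compile time, the patched version would silently never be called.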
Re: [Python-Dev] len(chr(i)) = 2?
On Nov 24, 2010, at 4:03 AM, Stephen J. Turnbull wrote: You end up proliferating types that all do the same kind of thing. Judicious use of inheritance helps, but getting the fundamental abstraction right is hard. Or at least, Emacs hasn't found it in 20 years of trying. Emacs hasn't even figured out how to do general purpose iteration in 20 years of trying either. The easiest way I've found to loop across an arbitrary pile of 'stuff' is the CL 'loop' macro, which you're not even supposed to use. Even then, you still have to make the arcane and pointless distinction of using 'across' or 'in' or 'on'. Python, on the other hand, has iteration pretty well tied up nicely in a bow. I don't know how to respond to the rest of your argument. Nothing you've said has in any way indicated to me why having code-point offsets is a good idea, only that people who know C and elisp would rather sling around piles of integers than have good abstract types. For example: I think it more likely that markers are very expensive to create and use compared to integers. What? When you do 'for x in str' in Python, you are already creating an iterator object, which has to store the exact same amount of state that our proposed 'marker' or 'character pointer' would have to store. The proposed UTF-8 marker would have to do a tiny bit more work when iterating because it would have to combine multibyte characters, but in exchange for that you get to skip a whole ton of copying when encoding and decoding. How is this expensive to create and use? For every application I have ever designed, encountered, or can even conjecture about, this would be cheaper. (Assuming not just a UTF-8 string type, but one for UTF-16 as well, where native data is in that format already.)
For what it's worth, not wanting to use abstract types in Emacs makes sense to me: I've written my share of elisp code, and it is hard to create reasonable abstractions in Emacs, because the facilities for defining types and creating polymorphic logic are so crude. It's a lot easier to just assume your underlying storage is an array, because at the end of the day you're going to need to call some functions on it which care whether it's an array or an alist or a list or a vector anyway, so you might as well just say so up front. But in Python we could just call 'mystring.by_character()' or 'mystring.by_codepoint()' and get an iterator object back and forget about all that junk.
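The 'mystring.by_codepoint()' API above is hypothetical, but a toy sketch shows the shape of the idea: UTF-8 bytes stay the canonical storage, and iteration views are handed out on demand instead of a code-point array being materialized up front.

```python
class U8String:
    """Toy string type whose canonical storage is UTF-8 bytes."""

    def __init__(self, data: bytes):
        self._data = data

    def by_byte(self):
        # Raw byte view, useful for I/O-oriented consumers.
        return iter(self._data)

    def by_codepoint(self):
        # Decoding lazily per call stands in for a real incremental
        # UTF-8 walk; no second persistent representation is kept.
        return iter(self._data.decode("utf-8"))

s = U8String("café".encode("utf-8"))
assert len(list(s.by_byte())) == 5               # 'é' occupies two bytes
assert list(s.by_codepoint()) == ["c", "a", "f", "é"]
```

A real implementation would also offer a grapheme-level view, which this sketch omits.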
Re: [Python-Dev] len(chr(i)) = 2?
On Nov 24, 2010, at 10:55 PM, Stephen J. Turnbull wrote: Greg Ewing writes: On 24/11/10 22:03, Stephen J. Turnbull wrote: But if you actually need to remember positions, or regions, to jump to later or to communicate to other code that manipulates them, doing this stuff the straightforward way (just copying the whole iterator object to hang on to its state) becomes expensive. If the internal representation of a text pointer (I won't call it an iterator because that means something else in Python) is a byte offset or something similar, it shouldn't take up any more space than a Python int, which is what you'd be using anyway if you represented text positions by grapheme indexes or whatever. That's not necessarily true. Eg, in Emacs (there you go again), Lisp integers are not only immediate (saving one pointer), but the type is encoded in the lower bits, so that there is no need for a type pointer -- the representation is smaller than the opaque marker type. Altogether, up to 8 of 12 bytes saved on a 32-bit platform, or 16 of 24 bytes on a 64-bit platform. Yes, yes, lisp is very clever. Maybe some other runtime, like PyPy, could make this optimization. But I don't think that anyone is filling up main memory with gigantic piles of character indexes and need to squeeze out that extra couple of bytes of memory on such a tiny object. Plus, this would allow such a user to stop copying the character data itself just to decode it, and on mostly-ascii UTF-8 text (a common use-case) this is a 2x savings right off the bat. In Python it's true that markers can use the same data structure as integers and simply provide different methods, and it's arguable that Python's design is better. But if you use bytes internally, then you have problems. No, you just have design questions. Do you expose that byte value to the user? Yes, but only if they ask for it. It's useful for computing things like quota and the like. 
Can users (programmers using the language and end users) specify positions in terms of byte values? Sure, why not? If so, what do you do if the user specifies a byte value that points into a multibyte character? Go to the beginning of the multibyte character. Report that position; if the user then asks the requested marker object for its position, it will report that byte offset, not the originally-requested one. (Obviously, do the same thing for surrogate pair code points.) What if the user wants to specify position by number of characters? Part of the point that we are trying to make here is that nobody really cares about that use-case. In order to know anything useful about a position in a text, you have to have traversed to that location in the text. You can remember interesting things like the offsets of starts of lines, or the x/y positions of characters. Can you translate efficiently? No, because there's no point :). But you _could_ implement an overlay that cached things like the beginning of lines, or the x/y positions of interesting characters. As I say elsewhere, it's possible that there really never is a need to efficiently specify an absolute position in a large text as a character (grapheme, whatever) count. But I think it would be hard to implement an efficient text-processing *language*, eg, a Python module for *full conformance* in handling Unicode, on top of UTF-8. Still: why? I guess if I have some free time I'll try my hand at it, and maybe I'll run into a wall and realize you're right :). Any time you have an algorithm that requires efficient access to arbitrary text positions, you'll spend all your skull sweat fighting the representation. At least, that's been my experience with Emacsen. What sort of algorithm would that be, though? The main thing that I could think of is a text editor trying to efficiently allow the user to scroll to the middle of a large file without reading the whole thing into memory. 
But, in that case, you could use byte-positions to estimate, and display a heuristic number while calculating the real line numbers. (This is what 'less' does, and it seems to work well.) So I don't really see what you're arguing for here. How do *you* think positions in Unicode strings should be represented? I think what users should see is character positions, and they should be able to specify them numerically as well as via an opaque marker object. I don't care whether that position is represented as bytes or characters internally, except that the experience of Emacsen is that representation as byte positions is both inefficient and fragile. The representation as character positions is more robust but slightly more inefficient. Is it really the representation as byte positions which is fragile (i.e. the internal implementation detail), or the exposure of that position to calling code, and the idiomatic usage of that number as an integer?
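The "go to the beginning of the multibyte character" behavior proposed earlier in this exchange is cheap over UTF-8, because continuation bytes are self-identifying: they all match the bit pattern 10xxxxxx. A sketch:

```python
def snap_to_char_start(data: bytes, offset: int) -> int:
    # Walk backwards past UTF-8 continuation bytes (0b10xxxxxx) until
    # we land on the lead byte of the character containing `offset`.
    while offset > 0 and data[offset] & 0xC0 == 0x80:
        offset -= 1
    return offset

text = "naïve".encode("utf-8")            # 'ï' encodes as 0xC3 0xAF
assert snap_to_char_start(text, 3) == 2   # inside 'ï' -> its lead byte
assert snap_to_char_start(text, 4) == 4   # 'v' already starts a character
```

This is the whole cost of accepting an arbitrary byte value as a position: at most three backward steps, with no scan from the start of the text.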
Re: [Python-Dev] constant/enum type in stdlib
On Nov 23, 2010, at 10:37 AM, ben.cottr...@nominum.com wrote: I'd prefer not to think of the number of times I've made the following mistake: s = socket.socket(socket.SOCK_DGRAM, socket.AF_INET) If it's any consolation, it's fewer than the number of times I have :). (More fun, actually, is where you pass a file descriptor to the wrong argument of 'fromfd'...)
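The swap goes unnoticed precisely because both constants are plain integers (on Linux, AF_INET and SOCK_DGRAM even share the value 2). Distinct enum types make the mistake detectable; a minimal hypothetical sketch using IntEnum (which arrived later, in PEP 435, and which the real socket module does not use for argument checking):

```python
from enum import IntEnum

class AddressFamily(IntEnum):
    AF_INET = 2

class SocketKind(IntEnum):
    SOCK_DGRAM = 2

def make_socket(family, kind):
    # With distinct enum types, a type check can reject swapped
    # arguments even though both members share the integer value 2.
    if not isinstance(family, AddressFamily):
        raise TypeError("family must be an AddressFamily")
    if not isinstance(kind, SocketKind):
        raise TypeError("kind must be a SocketKind")
    return (family, kind)

make_socket(AddressFamily.AF_INET, SocketKind.SOCK_DGRAM)   # fine
```

Since Python 3.4 the stdlib does expose `socket.AddressFamily` and `socket.SocketKind` enums for readability, though the constructor still accepts bare ints for compatibility.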
Re: [Python-Dev] constant/enum type in stdlib
On Nov 23, 2010, at 10:01 AM, Antoine Pitrou wrote: Well, it is easy to assign range(N) to a tuple of names when desired. I don't think an automatically-enumerating constant generator is needed. I don't think that numerical enumerations are the only kind of constants we're talking about. Others have already mentioned strings. Also, see http://tm.tl/4671 for some other use-cases. Since this isn't coming to 2.x, we're probably going to do our own thing anyway (unless it turns out that flufl.enum is so great that we want to add another dependency...) but I'm hoping that the outcome of this discussion will point to something we can be compatible with.
Re: [Python-Dev] len(chr(i)) = 2?
On Nov 23, 2010, at 7:22 PM, James Y Knight wrote: On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote: Maybe Python should have used UTF-8 as its internal unicode representation. Then people who were foolish enough to assume one character per string item would have their programs break rather soon under only light unicode testing. :-) You put a smiley, but, in all seriousness, I think that's actually the right thing to do if anyone writes a new programming language. It is clearly the right thing if you don't have to be concerned with backwards-compatibility: nobody really needs to be able to access the Nth codepoint in a string in constant time, so there's not really any point in storing a vector of codepoints. Instead, provide bidirectional iterators which can traverse the string by byte, codepoint, or by grapheme (that is: the set of combining characters + base character that go together, making up one thing which a human would think of as a character). I really hope that this idea is not just for new programming languages. If you switch from doing unicode wrong to doing unicode right in Python, you quadruple the memory footprint of programs which primarily store and manipulate large amounts of text. This is especially ridiculous in PyGTK applications, where the internal representation required by the GUI is UTF-8 anyway, so the round-tripping of string data back and forth to the exploded UTF-32 representation is wasting gobs of memory and time. It at least makes sense when your C library's idea about character width and your Python build match up. But, in a desktop app this is unlikely to be a performance concern; in servers, it's a big deal; measurably so. I am pretty sure that in the server apps that I work on, we are eventually going to need our own string type and UTF-8 logic that does exactly what James suggested - certainly if we ever hope to support Py3.
(I dimly recall that both James and I have made this point before, but it's pretty important, so it bears repeating.)
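The footprint claim is easy to check on a modern CPython, where (post-PEP 393) a str widens to four bytes per code point as soon as a single astral character appears; at the time of this thread the same effect came from wide UCS-4 builds. A rough measurement, numbers illustrative only:

```python
import sys

# A large, mostly-ASCII text with one astral character appended: the
# str must store four bytes per code point, while UTF-8 stays at about
# one byte per character for this data.
text = "x" * 1_000_000 + "\U0001F600"
wide = sys.getsizeof(text)                  # UCS-4 str: ~4 MB
narrow = sys.getsizeof(text.encode("utf-8"))  # UTF-8 bytes: ~1 MB
assert wide > 3 * narrow   # roughly the 4x blowup, minus fixed overhead
```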
Re: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)
On Nov 23, 2010, at 9:02 AM, Antoine Pitrou wrote: On Tue, 23 Nov 2010 00:07:09 -0500 Glyph Lefkowitz gl...@twistedmatrix.com wrote: On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto ocean-c...@m2.ccsnet.ne.jp wrote: Hello. Does this affect python? Thank you. http://www.openssl.org/news/secadv_20101116.txt No. Well, actually it does, but Python links against the system OpenSSL on most platforms (except Windows), so it's up to the OS vendor to apply the patch. It does? If so, I must have misunderstood the vulnerability. Can you explain how it affects Python?
Re: [Python-Dev] len(chr(i)) = 2?
On Nov 23, 2010, at 9:44 PM, Stephen J. Turnbull wrote: James Y Knight writes: You put a smiley, but, in all seriousness, I think that's actually the right thing to do if anyone writes a new programming language. It is clearly the right thing if you don't have to be concerned with backwards-compatibility: nobody really needs to be able to access the Nth codepoint in a string in constant time, so there's not really any point in storing a vector of codepoints. A sad commentary on the state of Emacs usage, nobody. The theory is that accessing the first character of a region in a string often occurs as a primitive operation in O(N) or worse algorithms, sometimes without enough locality at the collection of regions level to give a reasonably small average access time. I'm not sure what you mean by the theory is. Whose theory? About what? In practice, any *Emacs user can tell you that yes, we do need to be able to access the Nth codepoint in a buffer in constant time. The O(N) behavior of current Emacs implementations means that people often use a binary coding system on large files. Yes, some position caching is done, but if you have a large file (eg, a mail file) which is virtually segmented using pointers to regions, locality gets lost. (This is not a design bug, this is a fundamental requirement: consider fast switching between threaded view and author-sorted view.) Sounds like a design bug to me. Personally, I'd implement fast switching between threaded view and author-sorted view the same way I'd address any other multiple-views-on-the-same-data problem. I'd retain data structures for both, and update them as the underlying model changed. 
These representations may need to maintain cursors into the underlying character data, if they must retain giant wads of character data as an underlying representation (arguably the _main_ design bug in Emacs, that it encourages you to do that for everything, rather than imposing a sensible structure), but those cursors don't need to be code-point counters; they could be byte offsets, or opaque handles whose precise meaning varied with the potentially variable underlying storage. Also, please remember that Emacs couldn't be implemented with giant Python strings anyway: crucially, all of this stuff is _mutable_ in Emacs. And of course an operation that sorts regions in a buffer using character pointers will have the same problem. Working with memory pointers, OTOH, sucks more than that; GNU Emacs recently bit the bullet and got rid of their higher-level memory-oriented APIs, all of the Lisp structures now work with pointers, and only the very low-level structures know about character-to-memory pointer translation. This performance issue is perceptible even on 3GHz machines with not so large (50MB) mbox files. It's *horrid* if you do something like occur on a 1GB log file, then try randomly jumping to detected log entries. Case in point: occur needs to scan the buffer anyway; you can't do better than linear time there. So you're going to iterate through the buffer, using one of the techniques that James proposed, and remember some locations. Why not just have those locations be opaque cursors into your data? In summary: you're right, in that James missed a spot. You need bidirectional, *copyable* iterators that can traverse the string by byte, codepoint, grapheme, or decomposed glyph.
Re: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)
On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto ocean-c...@m2.ccsnet.ne.jp wrote: Hello. Does this affect python? Thank you. http://www.openssl.org/news/secadv_20101116.txt No.
Re: [Python-Dev] Breaking undocumented API
On Nov 16, 2010, at 4:49 PM, Guido van Rossum wrote: PEP 8 isn't nearly visible enough, either. Whatever the rule is, it needs to be presented with the information itself. If the rule is that things not documented in the library manual have no compatibility guarantees, then all of the means of getting documentation *other* than looking at the library manual need to indicate this somehow (alternatively, the information shouldn't be duplicated, but I doubt I'll convince anyone of that). Assuming people actually read the disclaimers. I don't think it necessarily needs to be presented as a disclaimer. There will always be people who just ignore part of the information presented, but the message could be something along the lines of Here's some basic documentation, but it might be out-of-date or incomplete. You can find a better reference at http://helpful-hyperlink.example.com. If it's easy to click on the link, I think a lot of people will click on it. Especially since the library reference really _is_ more helpful than the docstrings, for the standard library. (IMHO, dir()'s semantics are so weird that it should emit a warning too, like looking for docs? please use help().)
Re: [Python-Dev] Breaking undocumented API
On Nov 10, 2010, at 2:21 PM, James Y Knight wrote: On the other hand, if you make the primary mechanism to indicate privateness be a leading underscore, that's obvious to everyone. +1. One of the best features of Python is the ability to make a conscious decision to break the interface of a library and just get on with your work, even if your use-case is not really supported, because nothing can stop you calling its private functionality. But, IMHO the worst problem with Python is the fact that you can do this _without realizing it_ and pay a steep maintenance price later when an upgrade of something springs the trap that you had unwittingly set for yourself. The leading-underscore convention is the only thing I've found that even mitigates this problem.
Re: [Python-Dev] Breaking undocumented API
On Nov 8, 2010, at 4:50 PM, Guido van Rossum wrote: On Mon, Nov 8, 2010 at 3:55 PM, Glyph Lefkowitz gl...@twistedmatrix.com wrote: This seems like a pretty clear case of practicality beats purity. Not only has nobody complained about deprecatedModuleAttribute, but there are tons of things which show up in sys.modules that aren't modules in the sense of 'instances of ModuleType'. The Twisted reactor, for example, is an instance, and we've been doing *that* for about 10 years with no complaints. But the Twisted universe is only a subset of the Python universe. The Python stdlib needs to move more carefully. While this is true, I think the Twisted universe generally represents a particularly conservative, compatibility-conscious area within the Python universe (multiverse?). I know of several Twisted users who regularly upgrade to the most recent version of Twisted without incident, but can't move from Python 2.4-2.5 because of compatibility issues. That's not to say that there are no areas within the larger Python ecosystem that I'm unaware of where putting non-module-objects into sys.modules would cause issues. But if it were a practice that were at all common, I suspect that we would have bumped into it by now.
Re: [Python-Dev] Breaking undocumented API
On Nov 8, 2010, at 2:35 PM, exar...@twistedmatrix.com wrote: On 09:57 pm, br...@python.org wrote: On Mon, Nov 8, 2010 at 13:45, exar...@twistedmatrix.com wrote: On 09:25 pm, br...@python.org wrote: On Mon, Nov 8, 2010 at 13:03, exar...@twistedmatrix.com wrote: On 07:58 pm, br...@python.org wrote: I don't think a strict don't remove without deprecation policy is workable. For example, is trace.rx_blank constant part of the trace module API that needs to be preserved indefinitely? I don't even know if it is possible to add a deprecation warning to it, but CoverageResults._blank_re would certainly be a better place for it. The deprecation policy obviously cannot apply to module-level attributes. I'm not sure why this is. Can you elaborate? There is no way to directly trigger a DeprecationWarning for an attribute. We can still document it, but there is just no way to programmatically enforce it. What about `deprecatedModuleAttribute` (http://twistedmatrix.com/documents/current/api/twisted.python.deprecate.html) or zope.deprecation (http://docs.zope.org/zope3/Book/deprecation/show.html) which inspired it? Just checked the code and it looks like it substitutes the module for some proxy object? To begin with, that breaks subclass checks. After that I don't know the ramifications without really digging into the ModuleType code. That could be fixed if ModuleType allowed subclassing. :) For what it's worth, no one has complained about problems caused by `deprecatedModuleAttribute`, but we've only been using it for about two and a half years. This seems like a pretty clear case of practicality beats purity. Not only has nobody complained about deprecatedModuleAttribute, but there are tons of things which show up in sys.modules that aren't modules in the sense of 'instances of ModuleType'. The Twisted reactor, for example, is an instance, and we've been doing *that* for about 10 years with no complaints.
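The module-proxy technique under discussion can be sketched roughly as follows. This is a toy illustration of the idea, not the actual implementation of Twisted's `deprecatedModuleAttribute` or zope.deprecation: a ModuleType subclass is substituted into sys.modules and emits a DeprecationWarning whenever a listed attribute is touched.

```python
import sys
import types
import warnings

class DeprecatedAttrModule(types.ModuleType):
    """Module proxy that warns on access to deprecated attributes."""

    def __init__(self, wrapped, deprecated):
        super().__init__(wrapped.__name__)
        self.__dict__.update(wrapped.__dict__)
        self.__dict__["_deprecated"] = deprecated

    def __getattribute__(self, name):
        deprecated = types.ModuleType.__getattribute__(
            self, "__dict__").get("_deprecated", {})
        if name in deprecated:
            warnings.warn(f"{name} is deprecated: {deprecated[name]}",
                          DeprecationWarning, stacklevel=2)
        return types.ModuleType.__getattribute__(self, name)

# Build a throwaway module with an attribute we want to deprecate,
# then swap the proxy into sys.modules in its place.
mod = types.ModuleType("example")
mod.old_name = 42
sys.modules["example"] = DeprecatedAttrModule(mod, {"old_name": "use new_name"})

import example
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = example.old_name    # triggers the DeprecationWarning
```

Note that, as the thread points out, code doing `isinstance`/identity checks against the original module object would now see the proxy instead.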
Re: [Python-Dev] Pickle alternative in stdlib (Was: On breaking modules into packages)
On Nov 4, 2010, at 12:49 PM, Guido van Rossum wrote: What's the attack you're thinking of on marshal? It never executes any code while unmarshalling (although it can unmarshal code objects -- but the receiving program has to do something additionally to execute those). These issues may have been fixed now, but a long time ago I recall seeing some nasty segfaults which looked exploitable when feeding marshal malformed data. If they still exist, running a fuzzer on some pyc files should reveal them pretty quickly. When I ran across them I didn't think much of them, and probably did not even report the bug, since marshal is mostly used to load code anyway, which is implicitly trusted.
Re: [Python-Dev] On breaking modules into packages Was: [issue10199] Move Demo/turtle under Lib/
On Nov 3, 2010, at 1:04 PM, James Y Knight wrote: This is the strongest reason why I recommend to everyone I know that they not use pickle for storage they'd like to keep working after upgrades [not just of stdlib, but other 3rd party software or their own software]. :) +1. Twisted actually tried to preserve pickle compatibility in the bad old days, but it was impossible. Pickles should never really be saved to disk unless they contain nothing but lists, ints, strings, and dicts.
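A minimal illustration of why instance pickles rot across upgrades (a contrived in-process rename standing in for a real refactoring; the class and module names are hypothetical):

```python
# Pickles store the import path of the class; rename the class and every
# previously saved pickle becomes unloadable.
import pickle

class Point:                     # imagine this lives in myapp.models
    def __init__(self, x, y):
        self.x, self.y = x, y

blob = pickle.dumps(Point(1, 2))

Renamed = Point                  # simulate an upgrade that renamed the class
del Point

broke = False
try:
    pickle.loads(blob)
except AttributeError:           # "Can't get attribute 'Point' ..."
    broke = True

# Plain lists/ints/strings/dicts carry no import path, so they round-trip:
safe = pickle.loads(pickle.dumps({"x": 1, "y": 2}))
```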
Re: [Python-Dev] On breaking modules into packages Was: [issue10199] Move Demo/turtle under Lib/
On Nov 3, 2010, at 11:26 AM, Alexander Belopolsky wrote: This may not be a problem for smart tools, but for me and a simple editor what used to be: Maybe this is the real problem? It's 2010, we should all be far enough beyond EDLIN that our editors can jump to the definition of a Python class. Even Vim can be convinced to do this (http://rope.sourceforge.net/ropevim.html). Could Python itself make this easier? Maybe ship with a command that says hey, somewhere on sys.path, there is a class with this name. Please run '$EDITOR file +line' (or the current OS's equivalent) so I can look at the source code.
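The suggested helper is easy to prototype with inspect (`locate` is a hypothetical name; the '$EDITOR file +line' convention is the one from the message above):

```python
# Given a dotted name, find the file and line where the object is defined,
# so an editor could be launched at that spot.
import importlib
import inspect

def locate(dotted_name: str):
    modname, _, attr = dotted_name.rpartition(".")
    obj = getattr(importlib.import_module(modname), attr)
    filename = inspect.getsourcefile(obj)
    _, lineno = inspect.getsourcelines(obj)
    return filename, lineno

filename, lineno = locate("json.decoder.JSONDecoder")
# An editor could then be invoked as: $EDITOR <filename> +<lineno>
```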
Re: [Python-Dev] closing files and sockets in a timely manner in the stdlib
On Oct 30, 2010, at 2:39 PM, Jack Diederich wrote: On Fri, Oct 29, 2010 at 8:35 PM, Brett Cannon br...@python.org wrote: For those of you who have not noticed, Antoine committed a patch that raises a ResourceWarning under a pydebug build if a file or socket is closed through garbage collection instead of being explicitly closed. Just yesterday I discovered /proc/your PID here/fd/ which is a list of open file descriptors for your PID on *nix and includes all open files, pipes, and sockets. Very handy; I filed some tickets about company internal libs that were opening file handles as a side effect of import (logging mostly). I tried to provoke standard python imports (non-test) to leave some open handles and came up empty. That path (and anything below /proc, really) is a list of open file descriptors specifically on Linux, not *nix. Also on Linux, you can avoid your pid here by just doing /proc/self. A more portable (albeit not standard) path for what file descriptors do I have open is /dev/fd/. This is supported via a symlink to /proc/self on all the Linuxes I've tested on. There's no portable standard equivalent for not-yourself processes that I'm aware of, though. See more discussion here: http://twistedmatrix.com/trac/ticket/4522.
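Enumerating your own open descriptors the way the message describes is a one-liner (this relies on /dev/fd existing, which holds on Linux, macOS, and most BSDs but is not guaranteed by any standard):

```python
# List this process's open file descriptors via /dev/fd
# (on Linux this is a symlink to /proc/self/fd, as discussed above).
import os

fds = sorted(int(name) for name in os.listdir("/dev/fd"))
# stdin/stdout/stderr are normally among these, plus the fd used
# to read the directory itself.
```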
Re: [Python-Dev] Continuing 2.x
On Oct 28, 2010, at 10:51 PM, Brett Cannon wrote: I think people need to stop viewing the difference between Python 2.7 and Python 3.2 as this crazy shift and view it from python-dev's perspective; it should be viewed as one following from the other at this point. You can view it as Python 3.2 is the next version after Python 2.7 just like 2.7 followed 2.6, which makes the policies we follow for releases make total sense and negates this discussion. It just so happens people don't plan to switch to the newest release immediately as the backward-incompatible changes are more involved than what people are used to from past releases. Brett, with all due respect, this is not a reasonable position. You are making it sound like the popular view of 3.2 is a crazy shift is based on a personal dislike of python-dev or something. The fact is that the amount of effort required to port to 3.2 is extreme compared to previous upgrades, and most people still aren't willing to deal with it. It is a crazy shift. Let's take PyPI numbers as a proxy. There are ~8000 packages with a Programming Language::Python classifier. There are ~250 with Programming Language::Python::3. Roughly speaking, we can say that is 3% of Python code which has been ported so far. Python 3.0 was released at the end of 2008, so people have had roughly 2 years to port, which comes to 1.5% per year. Let's say that 20% of the code on PyPI is just junk; it's unfair to expect 100% of all code ever to get ported. But, still: with this back-of-the-envelope estimate of the rate of porting, it will take over 50 years before a decisive majority of Python code is on Python 3. By contrast, there are 536 packages with ::2.6, and 177 with ::2.7. (Trying to compare apples to apples here, since I assume the '2' tag is much more lightly used than '3' to identify supported versions; I figure someone likely to tag one micro-version would also tag the other.)
2.7 was released on July 3rd, so let's be generous and say approximately 6 months. That's 30% of packages, ported in 6 months, or 60% per year. This means that Python 3 is two orders of magnitude crazier of a shift than 2.7. I know that the methods involved in arriving at these numbers are not particularly good. But, I think that if their accuracy differs from that of the download stats, it's better: it takes a much more significant commitment to actually write some code and upload it than to accidentally download 3.x because it's the later version. Right now, Kristján is burning off his (non-fungible) enthusiasm in this discussion rather than addressing more 2.x maintenance issues. If 3.x adoption takes off and makes a nice hockey stick graph, then few people will care about this in retrospect. In the intervening hypothetical half-century while we wait to see how it pans out, isn't it better to just have an official Python branch for the maybe 2.8 release? Nobody from the current core team needs to work on it, necessarily; either other, new maintainers will show up or they won't. For that matter, Kristján is still talking about porting much of his work to 3.x anyway. In the best case (3.x takes over the world in 6 months) a 2.x branch won't be needed and nobody will show up to do the work of a release; some small amount of this work (the stuff not ported to 3.x) will be lost. In the medium case (3.x adoption is good, but there are still millions of 2.x users in 5 years) it will accumulate some helpers that will make migrating to 3.x even smoother than with 2.7. In the worst case (straw man: 3.x adoption actually declines, and distros start maintaining their own branches of 2.7) I'm sure everyone will be glad that some of this maintenance effort took place and there's some central place to continue it. I'm perfectly willing to admit that I'm still too pessimistic about this and I could be wrong.
But given the relatively minimal amount of effort required to let 2.x bugs continue to get fixed under the aegis of Python.org rather than going through the painful negotiation process of figuring out where else to host it (and thereby potentially losing a bunch of maintenance that would not otherwise happen), it seems foolhardy to insist that those of us who think 2.x is going to necessitate another release must necessarily be wrong.
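The back-of-the-envelope arithmetic in the message above, made explicit (the package counts are exactly the figures quoted there):

```python
# Porting-rate estimate from PyPI classifier counts quoted in the thread.
py3_tagged, total = 250, 8000
py3_share = py3_tagged / total            # ~3% of packages ported
py3_rate = py3_share / 2                  # over ~2 years -> ~1.5% per year

py27_tagged, py26_tagged = 177, 536
py27_share = py27_tagged / py26_tagged    # ~33% in ~6 months
py27_rate = py27_share * 2                # -> ~66% per year

speedup = py27_rate / py3_rate            # roughly 40x faster uptake for 2.7
```

(So the quoted "two orders of magnitude" is generous; the same numbers come out closer to a 40x difference.)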
Re: [Python-Dev] Support for async read/write
On Oct 19, 2010, at 9:55 PM, exar...@twistedmatrix.com wrote: Not only is the performance usually worse than expected, the behavior of aio_* functions requires all kinds of subtle and mysterious coordination with signal handling, which I'm not entirely sure Python would even be able to pull off without some modifications to the signal module. (And, as Jean-Paul mentioned, if your OS kernel runs out of space in a queue somewhere, completion notifications might just never be delivered at all.) Just to be clear, James corrected me there. I thought Jesus was talking about the mostly useless Linux AIO APIs, which have the problems I described. He was actually talking about the POSIX AIO APIs, which have a different set of problems making them a waste of time. I know, I'm referring to the behavior of POSIX AIO. Perhaps I'm overstating the case with 'subtle and mysterious', then, but the POSIX 'aiocb' structure still includes an 'aio_sigevent' member which is the way to find out about I/O event completion. If you're writing an application that uses AIO, basically all of your logic ends up living in the context of a signal handler, and as http://www.opengroup.org/onlinepubs/95399/functions/xsh_chap02_04.html#tag_02_04_01 puts it, When signal-catching functions are invoked asynchronously with process execution, the behavior of some of the functions defined by this volume of IEEE Std 1003.1-2001 is unspecified if they are called from a signal-catching function. Of course, you could try using signalfd(), but that's not in POSIX. (Or, you could use SIGEV_THREAD, but that would be functionally equivalent to running read() in a thread, except much more difficult.)
Re: [Python-Dev] Support for async read/write
On Oct 20, 2010, at 12:31 AM, Jeffrey Yasskin wrote: No comment on the rest of your claim, but this is a silly argument. The standard says the same thing about at least fcntl.h, signal.h, pthread.h, and ucontext.h, which clearly are useful. It was meant to be tongue-in-cheek :). Perhaps I should not have assumed that everyone else was as familiar with the POSIX documentation; I figured that most readers would know that most pages say that. But, that was the result of a string of many different searches attempting to find someone explaining why this was a good idea or why anyone would want to use it. I think in this case, it's accurate.
Re: [Python-Dev] Support for async read/write
On Oct 19, 2010, at 8:09 PM, James Y Knight wrote: There's a difference. os._exit is useful. os.open is useful. aio_* are *not* useful. For anything. If there's anything you think you want to use them for, you're wrong. It either won't work properly or it will be worse performing than the simpler alternatives. I'd like to echo this sentiment. This is not about providing a 'safe' wrapper to hide some powerful feature of these APIs: the POSIX aio_* functions are really completely useless. To quote the relevant standard http://www.opengroup.org/onlinepubs/95399/basedefs/aio.h.html: APPLICATION USAGE None. RATIONALE None. FUTURE DIRECTIONS None. Not only is the performance usually worse than expected, the behavior of aio_* functions requires all kinds of subtle and mysterious coordination with signal handling, which I'm not entirely sure Python would even be able to pull off without some modifications to the signal module. (And, as Jean-Paul mentioned, if your OS kernel runs out of space in a queue somewhere, completion notifications might just never be delivered at all.) I would love for someone to prove me wrong. In particular, I would really love for there to be a solution to asynchronous filesystem I/O better than start a thread, read until you block. But, as far as I know, there isn't, and wrapping these functions will just confuse and upset anyone who attempts to use them in any way.
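"Start a thread, read until you block" — the alternative endorsed above — is only a few lines. A sketch (with a throwaway temp file standing in for real input): a worker thread performs the blocking reads and hands completed chunks back over a queue.

```python
# The simple alternative to POSIX aio_*: blocking reads in a worker thread,
# with completions delivered over a queue instead of a signal handler.
import os
import queue
import tempfile
import threading

def async_read(path, chunk_size=4096):
    q = queue.Queue()
    def worker():
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                q.put(chunk)
        q.put(None)              # sentinel: end of file
    threading.Thread(target=worker, daemon=True).start()
    return q

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10000)
    path = tmp.name

q = async_read(path)
chunks = []
while (chunk := q.get()) is not None:    # the "main loop" drains the queue
    chunks.append(chunk)
data = b"".join(chunks)
os.unlink(path)
```

In a real event loop you would poll the queue (or a pipe the worker writes to) instead of blocking on q.get(), but the structure is the same.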
Re: [Python-Dev] Supporting raw bytes data in urllib.parse.* (was Re: Polymorphic best practices)
On Sep 18, 2010, at 10:18 PM, Steve Holden wrote: I could probably be persuaded to merge the APIs, but the email6 precedent suggests to me that separating the APIs better reflects the mental model we're trying to encourage in programmers manipulating text (i.e. the difference between the raw octet sequence and the text character sequence/parsed data). That sounds pretty sane and coherent to me. While I don't like the email6 precedent as such (that there would be different parsed objects, based on whether you started parsing with bytes or with strings), the idea that when you are working directly with bytes or text, you should have to know which one you have, is a good one. +1 for keeping the APIs separate with 'urlsplitb' etc.
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Sep 16, 2010, at 4:51 PM, R. David Murray wrote: Given a message, there are many times you want to serialize it as text (for example, for presentation in a UI). You could provide alternate serialization methods to get text out on demand, but then what if someone wants to push that text representation back in to email to rebuild a model of the message? You tell them too bad, make some bytes out of that text. Leave it up to the application. Period, the end, it's not the library's job. If you pushed the text out to a 'view message source' UI representation, then the vicissitudes of the system clipboard and other encoding and decoding things may corrupt it in inscrutable ways. You can't fix it. Don't try. So now we have both a bytes parser and a string parser. Why do so many messages on this subject take this for granted? It's wrong for the email module just like it's wrong for every other package. There are plenty of other (better) ways to deal with this problem. Let the application decide how to fudge the encoding of the characters back into bytes that can be parsed. In the face of ambiguity, refuse the temptation to guess and all that. The application has more of an idea of what's going on than the library here, so let it make encoding decisions. Put another way, there's nothing wrong with having a text parser, as long as it just encodes the text according to some known encoding and then parses the bytes :). So, after much discussion, what we arrived at (so far!) is a model that mimics the Python3 split between bytes and strings. If you start with bytes input, you end up with a BytesMessage object. If you start with string input to the parser, you end up with a StringMessage. That may be a handy way to deal with some grotty internal implementation details, but having a 'decode()' method is broken.
The thing I care about, as a consumer of this API, is that there is a clearly defined Message interface, which gives me a uniform-looking place where I can ask for either characters (if I'm displaying them to the user) or bytes (if I'm putting them on the wire). I don't particularly care where those bytes came from. I don't care what decoding tricks were necessary to produce the characters. Now, it may be worthwhile to have specific normalization / debrokenifying methods which deal with specific types of corrupt data from the wire; encoding-guessing, replacement-character insertion or whatever else are fine things to try. It may also be helpful to keep around a list of errors in the message, for inspection. But as we know, there are lots of ways that MIME data can go bad other than encoding, so that's just one variety of error that we might want to keep around. (Looking at later messages as I'm about to post this, I think this all sounds pretty similar to Antoine's suggestions, with respect to keeping the implementation within a single class, and not having BytesMessage/UnicodeMessage at the same abstraction level.)
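The "text parser that just encodes the text and then parses the bytes" idea amounts to a one-line shim at the application level (`message_from_text` here is a hypothetical helper, not an email-package API; the caller, not the library, picks the encoding):

```python
# Application-level shim: the application chooses the encoding, and the
# library only ever parses bytes.
from email import message_from_bytes

def message_from_text(text, encoding="utf-8"):
    return message_from_bytes(text.encode(encoding))

msg = message_from_text("Subject: hello\r\n\r\nbody\r\n")
```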
Re: [Python-Dev] Polymorphic best practices [was: (Not) delaying the 3.2 release]
On Sep 16, 2010, at 7:34 PM, Barry Warsaw wrote: On Sep 16, 2010, at 06:11 PM, Glyph Lefkowitz wrote: That may be a handy way to deal with some grotty internal implementation details, but having a 'decode()' method is broken. The thing I care about, as a consumer of this API, is that there is a clearly defined Message interface, which gives me a uniform-looking place where I can ask for either characters (if I'm displaying them to the user) or bytes (if I'm putting them on the wire). I don't particularly care where those bytes came from. I don't care what decoding tricks were necessary to produce the characters. But first you have to get to that Message interface. This is why the current email package separates parsing and generating from the representation model. You could conceivably have a parser that rot13's all the payload, or just parses the headers and leaves the payload as a blob of bytes. But the parser tries to be lenient in what it accepts, so that one bad header doesn't cause it to just punt on everything that follows. Instead, it parses what it can and registers a defect on that header, which the application can then reason about, because it has a Message object. If it were to just throw up its hands (i.e. raise an exception), you'd basically be left with a blob of useless crap that will just get /dev/null'd. Oh, absolutely. Please don't interpret anything I say as meaning that the email API should not handle broken data. I'm just saying that you should not expect broken data to round-trip through translation to characters and back, any more than you should expect a broken PNG to round-trip through a translation to a 2d array of pixels and back. Now, it may be worthwhile to have specific normalization / debrokenifying methods which deal with specific types of corrupt data from the wire; encoding-guessing, replacement-character insertion or whatever else are fine things to try. 
It may also be helpful to keep around a list of errors in the message, for inspection. But as we know, there are lots of ways that MIME data can go bad other than encoding, so that's just one variety of error that we might want to keep around. Right. The middle ground IMO is what the current parser does. It recognizes the problem, registers a defect, and tries to recover, but it doesn't fix the corrupt data. So for example, if you had a valid RFC 2047 encoded Subject but a broken X-Foo header, you'd at least still end up with a Message object. The value of the good headers would be things from which you can get the unicode value, the raw bytes value, parse its parameters, munge it, etc. while the bad header might be something you can only get the raw bytes from. My take on this would be that you should always be able to get bytes or characters, but characters are always suspect, in that once you've decoded, if you had invalid bytes, then they're replacement characters (or your choice of encoding fix).
Re: [Python-Dev] Garbage announcement printed on interpreter shutdown
On Sep 10, 2010, at 5:10 PM, Amaury Forgeot d'Arc wrote: 2010/9/10 Fred Drake fdr...@acm.org: On Fri, Sep 10, 2010 at 4:32 PM, Georg Brandl g.bra...@gmx.net wrote: IMO this runs contrary to the decision we made when DeprecationWarnings were made silent by default: it spews messages not only at developers, but also at users, who don't need it and probably are going to be quite confused by it, Agreed; this should be silent by default. +1. I suggest to enable it only when Py_DEBUG (or Py_TRACE_REFS or Py_REF_DEBUG?) is defined. Would it be possible to treat it the same way as a deprecation warning, and show it under the same conditions? It would be nice to know if my Python program is leaking uncollectable objects without rebuilding the interpreter.
Re: [Python-Dev] Internal counter to debug leaking file descriptors
On Aug 31, 2010, at 10:03 AM, Guido van Rossum wrote: On Linux you can look somewhere in /proc, but I don't know that it would help you find where a file was opened. /dev/fd is actually a somewhat portable way of getting this information. I don't think it's part of a standard, but on Linux it's usually a symlink to /proc/self/fd, and it's available on MacOS and most BSDs (based on a hasty and completely-not-comprehensive investigation). But it won't help you find out when the FDs were originally opened, no.
Re: [Python-Dev] 'hasattr' is broken by design
On Aug 24, 2010, at 8:31 AM, Benjamin Peterson wrote: 2010/8/24 Hrvoje Niksic hrvoje.nik...@avl.com: The __length_hint__ lookup expects either no exception or AttributeError, and will propagate others. I'm not sure if this is a bug. On the one hand, throwing anything except AttributeError from __getattr__ is bad style (which is why we fixed the bug by deriving our business exception from AttributeError), but the __length_hint__ check is supposed to be an internal optimization completely invisible to the caller of list(). __length_hint__ is internal and undocumented, so it can do whatever it wants. As it happens though, list() is _quite_ public. Saying X is internal and undocumented, so it can do whatever it wants is never really realistic, especially in response to someone saying we already saw this problem in production, _without_ calling / referring to / knowing about this private API.
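The fix described above — deriving the business exception from AttributeError — looks like this in miniature (class names are hypothetical; the contrast with a plain RuntimeError shows why the subclassing matters):

```python
# An AttributeError subclass is swallowed by hasattr() and by internal
# special-method probes; any other exception type escapes from them.
class BusinessError(AttributeError):
    """Domain error that also behaves as a missing-attribute signal."""

class Record:
    def __getattr__(self, name):
        raise BusinessError(f"no such field: {name}")

assert not hasattr(Record(), "anything")   # swallowed: AttributeError subclass

class BadRecord:
    def __getattr__(self, name):
        raise RuntimeError(f"no such field: {name}")

escaped = False
try:
    hasattr(BadRecord(), "anything")       # non-AttributeError propagates
except RuntimeError:
    escaped = True
```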
Re: [Python-Dev] Fixing #7175: a standard location for Python config files
On Aug 12, 2010, at 6:30 AM, Tim Golden wrote: I don't care how many stats we're doing You might not, but I certainly do. And I can guarantee you that the authors of command-line tools that have to start up in under ten seconds, for example 'bzr', care too.
Re: [Python-Dev] PEP 376 proposed changes for basic plugins support
On Aug 3, 2010, at 4:28 AM, M.-A. Lemburg wrote: I don't think that's a problem: the SQLite database would be a cache like e.g. a font cache or TCSH command cache, not a replacement of the meta files stored in directories. Such a database would solve many things at once: faster access to the meta-data of installed packages, fewer I/O calls during startup, more flexible ways of doing queries on the meta-data, needed for introspection and discovery, etc. This is exactly what Twisted already does with its plugin cache, and the previously-cited ticket in this thread should expand the types of metadata which can be obtained about plugins. Packaging systems are perfectly capable of generating and updating such metadata caches, but various packagers of Twisted (Debian's especially) didn't read our documentation and kept moving around the place where Python source files were installed, which routinely broke the post-installation hooks and caused all kinds of problems. I would strongly recommend looping in the Python packaging teams from various distros *before* adding another such cache, unless you want to be fielding bugs from Launchpad.net for five years :).
Re: [Python-Dev] PEP 376 proposed changes for basic plugins support
On Aug 2, 2010, at 9:53 AM, exar...@twistedmatrix.com wrote: On 01:27 pm, m...@egenix.com wrote: exar...@twistedmatrix.com wrote: On 12:21 pm, m...@egenix.com wrote: See Zope for an example of how well this simple mechanism works out in practice: it simply scans the Products namespace for sub-packages and then loads each sub-package it finds to have it register itself with Zope. This is also roughly how Twisted's plugin system works. One drawback, though, is that it means potentially executing a large amount of Python in order to load plugins. This can build up to a significant performance issue as more and more plugins are installed. I'd say that it's up to the application to deal with this problem. An application which requires lots and lots of plugins could define a registration protocol that does not require loading all plugins at scanning time. It's not fixable at the application level, at least in Twisted's plugin system. It sounds like Zope's system has the same problem, but all I know of that system is what you wrote above. The cost increases with the number of plugins installed on the system, not the number of plugins the application wants to load. We do have a plan to address this in Twisted's plugin system (eventually): http://twistedmatrix.com/trac/ticket/3773, although I'm not sure if that's relevant to the issue at hand.
Re: [Python-Dev] proto-pep: plugin proposal (for unittest)
On Aug 1, 2010, at 3:52 PM, Ronald Oussoren wrote: On 1 Aug, 2010, at 17:22, Éric Araujo wrote: Speaking of which... Your documentation says it's named ~/unittest.cfg, could you make this a file in the user base (that is, the prefix where 'setup.py install --user' will install files)? Putting .pydistutils.cfg .pypirc .unittest2.cfg .idlerc and possibly others in the user home directory (or %APPDATA% on win32 and what-have-you on Mac) is unnecessary clutter. However, $PYTHONUSERBASE is not the right directory for configuration files, as pointed out in http://bugs.python.org/issue7175 It would be nice to agree on a ~/.python (resp. %APPDATA%/Python) or $XDG_CONFIG_HOME/python directory and put config files there. ~/Library/Python would be a good location on OSX, even if the 100% formally correct location would be ~/Preferences/Python (at least for framework builds; unix-style builds may want to follow the unix convention). 100% formally speaking, MacOS behaves like UNIX in many ways. http://en.wikipedia.org/wiki/Single_UNIX_Specification#Mac_OS_X_and_Mac_OS_X_Server It's fine to have a mac-pathname-convention-following place for such data, but please _also_ respect the UNIX-y version on the Mac. The only possible outcome of Python on the Mac respecting only Mac pathnames is to have automation scripts that work fine on BSD and Linux, but then break when you try to run them on a Mac. There is really no benefit to intentionally avoiding honoring the UNIX conventions. (For another example, note that although Python resides in /System/Library, on the mac, the thing that's in your $PATH when you're using a terminal is the symlink in /usr/bin/python.) Also, no, ~/Preferences isn't the right place for it either; there's no such thing. You probably meant ~/Library/Preferences. I'd say that since ~/Library/Python is already used, there's no particular reason to add a new ~/Library/Preferences/Python location.
After all, if you really care a lot about platform conventions, you should put it in ~/Library/Preferences/org.python.distutils.plist, but I don't see what benefit that extra complexity would have for anyone. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Python signal processing question
On Jul 22, 2010, at 12:00 AM, Stephen J. Turnbull wrote: My understanding of OSError is that the OS is saying "sorry, what you tried to do is perfectly reasonable under some circumstances, but you can't do that now". ENOMEM, EPERM, ENOENT etc fit this model. RuntimeError OTOH is basically saying "You should know better than to try that!" EINVAL fits this model. That is not my understanding of OSError at all, especially given that I have seen plenty of OSErrors that have EINVAL set by various things. OSError's docstring specifically says "OS system call failed.", and that's the way I've always understood it: you made a syscall and got some kind of error. Python _mostly_ avoids classifying OSErrors into different exception types in other APIs. The selection of RuntimeError in this particular case seems somewhat random and ad hoc, given that out-of-range signal values give ValueError while SIGKILL and SIGSTOP give RuntimeError. The RuntimeError's args start with 22 (which I assume is supposed to mean EINVAL) but it doesn't have an 'errno' attribute as an OSError would. The ValueError doesn't relate to an errno at all. Nowhere does the documentation say "raises OSError or ValueError or TypeError or RuntimeError whose args[0] may be an errno". To be clear, this particular area doesn't bother me. I've been dealing with weird and puzzling signal-handling issues in Python for years and years and this dusty corner of the code has never come up. I did want to reply to this particular message, though, because I *would* eventually like the exception hierarchy raised by certain stdlib functions to be more thoroughly documented and coherent, but a prerequisite to that is to avoid rationalizing the random potpourri of exception types that certain parts of the stdlib emit. I think signal.signal is one such part.
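The inconsistency described is easy to observe directly. One caveat for modern readers: this is a 2010-era discussion, and since Python 3.3 the RuntimeError for SIGKILL/SIGSTOP became an OSError, so the sketch below catches both (POSIX only):

```python
import signal

# Out-of-range signal numbers raise ValueError...
try:
    signal.signal(999, signal.SIG_IGN)
    range_error = None
except ValueError as e:
    range_error = type(e).__name__

# ...while SIGKILL's handler simply can't be changed; historically this
# raised RuntimeError (with 22/EINVAL as args[0]), now OSError.
try:
    signal.signal(signal.SIGKILL, signal.SIG_IGN)
    kill_error = None
except (RuntimeError, OSError) as e:
    kill_error = type(e).__name__
```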
Re: [Python-Dev] What to do with languishing patches?
On Jul 18, 2010, at 1:46 PM, Alexander Belopolsky wrote: We already have "postponed" and "remind" resolutions, but these are exclusive of "accepted". I think there should be a clear way to mark the issue "accepted and would be applied if X.Y was out already". Chances are one of the resolution labels already has such meaning, but in this case it should be more prominently documented as such. This is what branches are for. When the X.Y release cycle starts, there should be a branch for X.Y. Any "would be applied" patches can simply be applied to trunk without interrupting anything; the X.Y release branch can be merged back into trunk as necessary.
Re: [Python-Dev] avoiding accidental shadowing of top-level libraries by the main module
On Jul 13, 2010, at 5:02 PM, Nick Coghlan wrote: My concerns aren't about a module reimporting itself directly, they're about the case where a utility module is invoked as __main__ but is also imported normally somewhere else in a program (e.g. pdb is invoked as a top-level debugger, but is also imported directly for some reason). Currently that works as a non-circular import and will only cause hassles if there is top-level state in the affected module that absolutely must be a singleton within a given application. Either change (disallowing it completely as you suggest, or making it a circular import, as I suggest) runs the risk of breaking code that currently appears to work correctly. Fred's point about the practice of changing __name__ in the main module corrupting generated pickles is one I hadn't thought of before though. It's not just pickle; anything that requires __name__ (or __module__) to be accurate for introspection or debugging is also problematic. I have long considered it a 'best practice' (ugh, I hate that phrase, but I can't think of what else to call it) to _always_ do this type of shadowing, and avoid defining _any_ names in the __name__ == '__main__' case, so that there's no ambiguity: http://glyf.livejournal.com/60326.html ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
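The practice described above — keeping the `__name__ == '__main__'` block free of definitions, so that no state ends up living under two module names — can be sketched as follows. The module name `mymodule` is hypothetical; the sketch writes the module to a temp directory and runs it as a script to show that `main()` executes in the *imported* copy, with an accurate `__name__`:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical module following the pattern: the __main__ block defines
# nothing, it just re-imports the module under its real name and dispatches.
module_source = textwrap.dedent("""
    _state = []              # module-level singleton state

    def main():
        _state.append("ran")
        print(__name__, len(_state))

    if __name__ == "__main__":
        import mymodule      # re-import under the real name...
        mymodule.main()      # ...so all state lives in one module object
""")

d = tempfile.mkdtemp()
with open(os.path.join(d, "mymodule.py"), "w") as f:
    f.write(module_source)

result = subprocess.run(
    [sys.executable, os.path.join(d, "mymodule.py")],
    capture_output=True, text=True, cwd=d,
)
```

Running the file prints `mymodule 1`, not `__main__ 1`: the work happened in the canonically-named module, so pickles and introspection see accurate `__module__` values and `_state` exists exactly once.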
Re: [Python-Dev] Removing IDLE from the standard library
On Jul 12, 2010, at 4:34 AM, Éric Araujo wrote: Plus, http://twistedmatrix.com/trac/report/15 is a useful resource for core developers with only a little bit of free time to do a review. Title: “Review Tickets, By Order You Should Review Them In” I haven’t found a description of this order, can you explain? Thanks. Part of the reason that the report is worded that way is that we may decide that the order should be different, but it will still be the order that you should review them in :). Right now the order is amount of time since last change, sorted from highest to lowest. In other words, first come, first serve, by last activity. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Idle-dev] Removing IDLE from the standard library
On Jul 12, 2010, at 11:36 AM, Reid Kleckner wrote: (Somewhat off-topic): Another pain point students had was accidentally shadowing stdlib modules, like random. Renaming the file didn't solve the problem either, because it left behind .pycs, which I had to help them delete. I feel your pain. It seems like every third person who starts playing with Twisted starts off by making a file called 'twisted.py' and then getting really confused by the behavior. I would love it if this could be fixed, but I haven't yet thought of a solution that would be less confusing than the problem itself.
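The shadowing problem described here can be reproduced deliberately, which makes it easier to explain to students: a file named `random.py` whose directory comes first on `sys.path` (as a script's own directory does) wins over the standard library module.

```python
import os
import sys
import tempfile

# Create a file that shadows the stdlib's random module.
d = tempfile.mkdtemp()
with open(os.path.join(d, "random.py"), "w") as f:
    f.write("value = 'shadowed'\n")

sys.path.insert(0, d)
sys.modules.pop("random", None)   # forget any already-imported copy
import random                     # finds the shadowing file, not the stdlib
shadowed = getattr(random, "value", None)

# Undo the damage so the real module is importable again.
sys.path.pop(0)
sys.modules.pop("random", None)
```

The stale-`.pyc` variant of the problem is the same mechanism: the compiled file sits in the script directory and keeps winning the `sys.path` race even after the source is renamed.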
Re: [Python-Dev] [Idle-dev] Removing IDLE from the standard library
On Jul 12, 2010, at 5:47 PM, Fred Drake wrote: On Mon, Jul 12, 2010 at 5:42 PM, Michael Foord fuzzy...@voidspace.org.uk wrote: I'm sure Brett will love this idea, but if it was impossible to reimport the script being executed as __main__ with a different name it would solve these problems. Indeed! And I'd be quite content with such a solution, since I consider scripts and modules to be distinct. but ... isn't the whole point of 'python -m' to make scripts and modules _not_ be distinct?___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] [Idle-dev] Removing IDLE from the standard library
On Jul 11, 2010, at 10:22 AM, Tal Einat wrote: Most of the responses up to this point have been strongly against my proposal. The main reason given is that it is nice to have a graphical IDE supported out-of-the-box with almost any Python installation. This is especially important for novice programmers and in teaching environments. I understand this sentiment, but I think that supplying a quirky IDE with many caveats, lacking documentation, some bugs and a partially working debugger ends up causing more confusion than good. The people who are actually *in* those environments seem to disagree with you :). I think you underestimate the difficulty of getting software installed and overestimate the demands of new Python users and students. While I don't ever use IDLE if there's an alternative available, I have been very grateful many times for its presence in environments where it was a struggle even to say install Python. A workable editor and graphical shell is important, whatever its flaws. (And I think you exaggerate IDLE's flaws just a bit.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing IDLE from the standard library
On Jul 11, 2010, at 2:37 PM, Martin v. Löwis wrote: Initially (five years ago!) I tried to overcome these issues by improving IDLE, solving problems and adding a few key features. Without going into details, suffice to say that IDLE hasn't improved much since 2005 despite my efforts. For example, see http://bugs.python.org/issue1529142, where it took nearly 3 years to fix a major issue from the moment I posted the first workaround. For another example, see http://bugs.python.org/issue3068, where I posted a patch for an extension configuration dialog over two years ago, and it hasn't received as much as a sneeze in response. I can understand that this is frustrating, but please understand that this is not specific to your patches, or to IDLE. Many other patches on bugs.python.org remain unreviewed for many years. That's because many of the issues are really tricky, and there are very few people who both have the time and the expertise to evaluate them. This problem seems to me to be the root cause here. Guido proposes to give someone interested in IDLE commit access, and hopefully that will help in this particular area. But, as I recall, at the last language summit there was quite a bit of discussion about how to address the broader issue of patches falling into a black hole. Is anybody working on it? (This seems to me like an area where a judicious application of PSF funds might help; if every single bug were actively triaged and responded to, even if it weren't reviewed, and patch contributors were directed to take specific steps to elicit a response or a review, the fact that patch reviews take a while might not be so bad.) FWIW, I don't consider a few months as a long time for a patch review. It may not be a long time compared to other patch reviews, but it is a very long time for a volunteer to wait for something, especially if that something is any indication that the python developers care that this patch was submitted at all. 
There seems to be at least one thread a month on this list from a disgruntled community member complaining (directly or indirectly) about this delay. I think that makes it a big problem. At the moment, I'm personally able to perhaps review one issue per week (sometimes less); at this rate, it'll take several years until I get to everything. I guess it depends what you mean by everything, but given that the open bug count is actually increasing at a significant rate, I would say that you can never possibly get to everything. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing IDLE from the standard library
On Jul 11, 2010, at 3:19 PM, Martin v. Löwis wrote: Unfortunately, it's often not clear what the submitter wants: does she want to help, or want to get help? For a bug report, I often post a message can you provide a patch?, but sometimes, it isn't that clear. Perhaps this is the one area where the biggest advance could be made: a clarification of the workflow. My experience with Python issues which have been triaged is that everyone who triages tickets has a slightly different idea of who is responsible for the ticket and what they're supposed to do next at every point in the process. Triage, as described on http://www.python.org/dev/workflow/, emphasizes making sure that all fields in the issue tracker are properly set, rather than on communicating with the contributor or reporter. On Twisted, we try to encourage triagers to focus on communicating the workflow ramifications of what a particular contributor has done. We try to provide a response to the bug reporter or patch submitter that says thanks, but in order to move this along, you need to go through the following steps and sometimes even attach a link to the workflow document pointing out exactly where in the process the ticket is now stuck. (At least, that's what we're trying to do.) This involves a lot of repeating ourselves in ticket comments, but it's well worth it (and as more of the repetition moves into citing links to documents that have been written to describe aspects of the workflow, it's less onerous). http://www.python.org/dev/workflow/ describes what the steps are, but it's in a sort of procedural passive voice that doesn't say who is responsible for doing reviews or how to get a list of patches which need to be reviewed or what exactly a third-party non-core-committer reviewer should do to remove the 'Patch review' keyword. 
http://twistedmatrix.com/trac/wiki/TwistedDevelopment#SubmittingaPatch and http://twistedmatrix.com/trac/wiki/ReviewProcess meander around a bit, but a while ago we re-worked them so that each section has a specific audience (authors, reviewers, or external patch submitters) and that helped readers understand what they're intended to do. Plus, http://twistedmatrix.com/trac/report/15 is a useful resource for core developers with only a little bit of free time to do a review. (I'm just offering some suggestions based on what I think has worked, not to hold Twisted up as a paragon of a perfect streamlined process. We still have folks complain about stuck patches, these documents are _far_ from perfect, and there are still some varying opinions about how certain workflow problems should be dealt with and differences in quality of review. Plus, we have far fewer patches to deal with than Python. Nevertheless, the situation used to be worse for us, and these measures seem to have helped.)___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Removing IDLE from the standard library
On Jul 11, 2010, at 5:33 PM, Georg Brandl wrote: Honestly, how would you feel as a committer to have scores of issues assigned to you -- as a consequence of speedy triage -- knowing that you have to invest potentially hours of volunteer time into them, while the person doing the triaging is done with the bug in a few minutes and paid for it? I'd feel a little bit duped. That doesn't strike me as a particularly useful type of triage. The most useful type of triage in this case would be the kind where the bug gets re-assigned to the *original contributor*, not a core committer, with a message clearly saying thanks! but we will not do anything further with this ticket until *you* do XYZ. This may result in some tickets getting left by wayside, but at least it will be clear that they have been left by the wayside, and whose responsibility they really are. Even so, I would certainly feel better having scores of issues assigned to me than I would feel having scores of issues that are just hanging out in limbo forever. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Licensing // PSF // Motion of non-confidence
On Jul 6, 2010, at 8:09 AM, Steven D'Aprano wrote: You've never used Apple's much-missed Hypertalk, have you? :)

on mailingListMessage
    get the message
    put it into aMessage
    if the thread of aMessage contains "license wankery" then
        put aMessage into the trash
    end if
end mailingListMessage
Re: [Python-Dev] Can Python implementations reject semantically invalid expressions?
On Jul 2, 2010, at 12:28 AM, Steven D'Aprano wrote: This question was inspired by something asked on #python today. Consider it a hypothetical, not a serious proposal. We know that many semantic errors in Python lead to runtime errors, e.g. 1 + "1". If an implementation rejected them at compile time, would it still be Python? E.g. if the keyhole optimizer raised SyntaxError (or some other exception) on seeing this: def f(): return 1 + "1" instead of compiling something which can't fail to raise an exception, would that still be a legal Python implementation? I'd say no. Python has defined semantics in this situation: a TypeError is raised. To me, this seems akin to a keyhole optimizer arbitrarily deciding that raise TypeError() should cause the compiler to abort. If this type of expression were common, it would be within the rights of, for example, a Python JIT to generate a fast path through 'f' that wouldn't bother to actually invoke its 'int' type's '__add__' method, since there is no possible way for a Python program to tell the difference, since int.__add__ is immutable.
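The defined semantics being argued for are easy to check: the definition compiles without complaint, and the TypeError appears only when the body actually runs.

```python
def f():
    return 1 + "1"   # compiles fine; no error at definition time

try:
    f()              # the TypeError only exists at call time
    err = None
except TypeError as e:
    err = type(e).__name__
```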
Re: [Python-Dev] thoughts on the bytes/string discussion
On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote: Regarding the proposal of a String ABC, I hope this isn't going to become a backdoor to reintroduce the Python 2 madness of allowing equivalency between text and bytes for *some* strings of bytes and not others. For my part, what I want out of a string ABC is simply the ability to do application-specific optimizations. There are many applications where all input and output is text, but _must_ be UTF-8. Even GTK uses UTF-8 as its native text representation, so output could just be display. Right now, in Python 3, the only way to be correct about this is to copy every byte of input into 4 bytes of output, then copy each code point *back* into a single byte of output. If all your application does is rewrite the occasional XML attribute, for example, this cost can be significant, if not overwhelming. I'd like a version of 'decode' which would give me a type that was, in every respect, unicode, and responded to all protocols exactly as other unicode objects (or str objects, if you prefer py3 nomenclature ;-)) do, but wouldn't actually copy any of that memory unless it really needed to (for example, to pass to a C API that expected native wide characters), and that would hold on to the original bytes so that it could produce them on demand if encoded to the same encoding again. So, as others in this thread have mentioned, the 'ABC' really implies some stuff about C APIs as well. I'm not sure about the exact performance impact of such a class, which is why I'd like the ability to implement it *outside* of the stdlib and see how it works on a project, and return with a proposal along with some data. There are also different ways to implement this, and other optimizations (like ropes) which might be better. 
You can almost do this today, but the lack of things like the hypothetical __rcontains__ does make it impossible to be totally transparent about it.___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
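A deliberately tiny sketch of the lazily-decoded string idea described above. The class name and interface are hypothetical, and a real version would have to proxy the full str protocol transparently — which is exactly where the missing `__rcontains__` hook bites:

```python
class LazyUTF8:
    """Holds UTF-8 bytes; decodes to str only if someone needs characters."""

    def __init__(self, raw):
        self._raw = raw       # keep the original bytes around
        self._text = None     # decoded lazily, at most once

    def __str__(self):
        if self._text is None:
            self._text = self._raw.decode("utf-8")
        return self._text

    def encode(self, encoding="utf-8"):
        if encoding == "utf-8" and self._text is None:
            return self._raw  # round-trip without ever copying into a str
        return str(self).encode(encoding)

s = LazyUTF8("héllo".encode("utf-8"))
same_bytes = s.encode()       # no decode has happened yet
text = str(s)                 # decode on demand
```

The UTF-8-in, UTF-8-out path never pays the 1-byte-to-4-bytes-and-back cost the message complains about; only callers that genuinely need characters trigger the decode.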
Re: [Python-Dev] bytes / unicode
On Jun 22, 2010, at 8:57 PM, Robert Collins wrote: bzr has a cache of decoded strings in it precisely because decode is slow. We accept slowness encoding to the user's locale because that's typically much less data to examine than we've examined while generating the commit/diff/whatever. We also face memory pressure on a regular basis, and that has been, at least partly, due to UCS4 - our translation cache helps there because we have fewer duplicate UCS4 strings. Thanks for setting the record straight - apologies if I missed this earlier in the thread. It does seem vaguely familiar.
Re: [Python-Dev] email package status in 3.X
On Jun 23, 2010, at 8:17 AM, Steve Holden wrote: Guido van Rossum wrote: On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver tsea...@palladion.com wrote: Any turdiness (which I am *not* arguing for) is a natural consequence of the kinds of backward incompatibilities which were *not* ruled out for Python 3, along with the (early, now waning) "build it and they will come" optimism about adoption rates. FWIW, my optimism is *not* waning. I think it's good that we're having this discussion and I expect something useful will come out of it; I also expect in general that the (admittedly serious) problem of having to port all dependencies will be solved in the next few years. Not by magic, but because many people are taking small steps in the right direction, and there will be light eventually. In the mean time I don't blame anyone for sticking with 2.x or being too busy to help port stuff to 3.x. Python 3 has been a long time in the making -- it will be a bit longer still, which was expected. +1 The important thing is to avoid bigotry and FUD, and deal with things the way they are. The #python IRC team have just helped us make a major step forward. This won't be a campaign with a victorious charge over some imaginary finish line. For sure. I don't speak for Tres, but I don't think he was talking about optimism about *adoption*, overall; rather, optimism about adoption *rates*. And I don't think he was talking about it coming from Guido :). There has definitely been some irrational exuberance from some quarters. The form it usually takes is someone making a blog post which assumes, because the author could port their smallish library or application without too much hassle, that Python 2.x is already dead and everyone should be off of it in a couple of weeks. I've never heard this position from the core team or any official communication or documentation.
Far from it: the realistic attitude that the Python 3 migration is something that will take a while has significantly reduced my own concerns. Even the aforementioned blog posts have been encouraging in some ways, because a lot of people are reporting surprisingly easy transitions. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] bytes / unicode
On Jun 21, 2010, at 10:58 PM, Stephen J. Turnbull wrote: The RFC says that URIs are text, and therefore they can (and IMO should) be operated on as text in the stdlib. No, *blue* is the best color for a shed. Oops, wait, let me try that again. While I broadly agree with this statement, it is really an oversimplification. An URI is a structured object, with many different parts, which are transformed from bytes to ASCII (or something latin1-ish, which is really just bytes with a nice face on them) to real, honest-to-goodness text via the IRI specification: http://tools.ietf.org/html/rfc3987. Note also that the complete solution argument cuts both ways. Eg, a complete solution should implement UTS 39 confusables detection[1] and IDNA[2]. Good luck doing that with bytes! And good luck doing that with just characters, too. You need a parsed representation of the URI that you can encode different parts of in different ways. (My understanding is that you should only really implement confusables detection in the netloc... while that may be a bogus example, you're certainly only supposed to do IDNA in the netloc!) You can just call urlsplit() all over the place to emulate this, but this does not give you the ability to go back to the original bytes, and thereby preserve things like brokenly-encoded segments, which seems to be what a lot of this hand-wringing is about. To put it another way, there is no possible information-preserving string or bytes type that will make everyone happy as a result from urljoin(). The only return-type that gives you *everything* is URI. just using 'latin-1' as the encoding allows you to use the (unicode) string operations internally, and then spew your mess out into the world for someone else to clean up, just as using bytes would. This is the limitation that everyone seems to keep dancing around. If you are using the stdlib, with functions that operate on sequences like 'str' or 'bytes', you need to choose from one of three options: 1. 
decode everything to latin1 (although I prefer to call it charmap when used in this way) so that you can have some mojibake that will fool a function that needs a unicode object, but not lose any information about your input so that it can be transformed back into exact bytes (and be very careful to never pass it somewhere that it will interact with real text!), 2. actually decode things to an appropriate encoding to be displayed to the user and manipulated with proper text-manipulation tools, and throw away information about the bytes, 3. keep both the bytes and the characters together (perhaps in a data structure) so that you can both display the data and encode it in situationally-appropriate ways. The stdlib as it is today is not going to handle the 3rd case for anyone. I think that's fine; it is not the stdlib's job to solve everyone's problems. I've been happy with it providing correctly-functioning pieces that can be used to build more elaborate solutions. This is what I meant when I said I agree with Stephen's first point: the stdlib *should* just keep operating entirely on strings, because URIs are defined, by the spec, to be sequences of ASCII characters. But that's not the whole story. PJE's bstr and ebytes proposals set my teeth on edge. I can totally understand the motivation for them, but I think it would be a big step backwards for python 3 to succumb to that temptation, even in the form of a third-party library. It is really trying to cram more information into a pile of bytes than truly exists there. (Also, if we're going to have encodings attached to bytes objects, I would very much like to add JPEG and FLAC to the list of possibilities.) The real tension there is that WSGI is desperately trying to avoid defining any data structures (i.e. classes), while still trying to work with structured data. An URI class with a 'child' method could handily solve this problem. 
You could happily call IRI(...).join(some bytes).join(some text) and then just say give me some bytes, it's time to put this on the network, or give me some characters, I have to show something to the user, or even give me some characters appropriate for an 'href=' target in some HTML I'm generating - although that last one could be left to the HTML generator, provided it could get enough information from the URI/IRI object's various parts itself. I don't mean to pick on WSGI, either. This is a common pain-point for porting software to 3.x - you had a string, it kinda worked most of the time before, but now you need to keep track of text too and the functions which seemed to work on bytes no longer do. ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
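A small illustration of "different parts, encoded in different ways" using only stdlib pieces (this is not the proposed URI/IRI class, just a sketch of the underlying point): IDNA applies to the host, while percent-escaping applies per path segment.

```python
from urllib.parse import quote, urlsplit

# A hypothetical IRI with non-ASCII in both the host and the path.
parts = urlsplit("http://bücher.example/söme path/index.html")

# The netloc gets IDNA; applying IDNA to the path would be nonsense.
host = parts.hostname.encode("idna")

# Path segments get percent-escaping, segment by segment (avoiding the
# %2F confusion mentioned above by never escaping across separators).
path = "/".join(quote(seg) for seg in parts.path.split("/"))
```

A parsed representation like `parts` is what lets each component pick its own byte-level encoding; a flat str or bytes value cannot express that.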
Re: [Python-Dev] bytes / unicode
On Jun 22, 2010, at 12:53 PM, Guido van Rossum wrote: On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger raymond.hettin...@gmail.com wrote: On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote: This is a common pain-point for porting software to 3.x - you had a string, it kinda worked most of the time before, but now you need to keep track of text too and the functions which seemed to work on bytes no longer do. Thanks Glyph. That is a nice summary of one kind of challenge facing programmers. Ironically, Glyph also described the pain in 2.x: it only kinda worked. It was not my intention to be ironic about it - that was exactly what I meant :). 3.x is forcing you to confront an issue that you _should_ have confronted for 2.x anyway. (And, I hope, most libraries doing a 3.x migration will take the opportunity to make their 2.x APIs unicode-clean while still in 2to3 mode, and jump ship to 3.x source only _after_ there's a nice transition path for their clients that can be taken in 2 steps.) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] bytes / unicode
On Jun 22, 2010, at 2:07 PM, James Y Knight wrote: Yeah. This is a real issue I have with the direction Python3 went: it pushes you into decoding everything to unicode early, even when you don't care -- all you really wanted to do is pass it from one API to another, with some well-defined transformations, which don't actually depend on it having been decoded properly. (For example, extracting the path from the URL and attempting to open it as a file on the filesystem.) But you _do_ need to decode it in this case. If you got your URL from some funky UTF-32 datasource, b"\x00\x00\x00/" is not a path separator, "/" is. Plus, you should really be separating path segments and looking at them individually so that you don't fall victim to %2F bugs. And if you want your code to be portable, you need a Unicode representation of your pathname anyway for Windows; plus, there, you need to care about \ as well as /. The fact that your wire-bytes were probably ASCII(-ish) and your filesystem probably encodes pathnames as UTF-8 and so everything looks like it lines up is no excuse not to be explicit about your expectations there. You may want to transcode your characters into some other characters later, but that shouldn't stop you from treating them as characters of some variety in the meanwhile. The surrogateescape method is a nice workaround for this, but I can't help thinking that it might've been better to just treat stuff as possibly-invalid-but-probably-utf8 byte-strings from input, through processing, to output. It seems kinda too late for that, though: next time someone designs a language, they can try that. :) I can think of lots of optimizations that might be interesting for Python (or perhaps some other runtime less concerned with cleverness overload, like PyPy) to implement, like a UTF-8 combining-characters overlay that would allow for fast indexing, lazily populated as random access dictates.
But this could all be implemented as smartness inside .encode() and .decode() and the str and bytes types without changing the way the API works. I realize that there are implications at the C level, but as long as you can squeeze a function call into certain places, it could still work. I can also appreciate what's been said in this thread a bunch of times: to my knowledge, nobody has actually shown a profile of an application where encoding is significant overhead. I believe that encoding _will_ be a significant overhead for some applications (and actually I think it will be very significant for some applications that I work on), but optimizations should really be implemented once that's been demonstrated, so that there's a better understanding of what the overhead is, exactly. Is memory a big deal? Is CPU? Is it both? Do you want to tune for the tradeoff? etc, etc. Clever data-structures seem premature until someone has a good idea of all those things.
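The surrogateescape workaround mentioned above can be seen in a short round-trip. This sketch is not from the original thread; it is just a minimal demonstration of the PEP 383 error-handler semantics:

```python
# Round-tripping possibly-invalid UTF-8 through str with the
# surrogateescape error handler (PEP 383).
raw = b"caf\xe9/path"  # 0xE9 is latin-1 e-acute, invalid as UTF-8

# Decoding maps the bad byte to a lone surrogate (U+DCE9)
# instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", "surrogateescape")
assert text == "caf\udce9/path"

# Encoding with the same handler restores the original bytes exactly,
# so data can pass through str-based APIs without loss.
assert text.encode("utf-8", "surrogateescape") == raw
```

This is the mechanism that lets filenames and other not-quite-text byte strings survive a decode/process/encode pipeline unchanged.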
Re: [Python-Dev] bytes / unicode
On Jun 22, 2010, at 7:23 PM, Ian Bicking wrote: This is a place where bytes+encoding might also have some benefit. XML is someplace where you might load a bunch of data but only touch a little bit of it, and the amount of data is frequently large enough that the efficiencies are important. Different encodings have different characteristics, though, which makes them amenable to different types of optimizations. If you've got an ASCII string or a latin1 string, the optimizations of unicode are pretty obvious; if you've got one in UTF-16 with no multi-code-unit sequences, you could also hypothetically cheat for a while if you're on a UCS4 build of Python. I suspect the practical problem here is that there's no CharacterString ABC in the collections module for third-party libraries to provide their own peculiarly-optimized implementations that could lazily turn into real 'str's as needed. I'd volunteer to write a PEP if I thought I could actually get it done :-\. If someone else wants to be the primary author though, I'll try to help out.
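To make the idea concrete, here is a hypothetical sketch of what such an ABC might look like. Every name here (CharacterString, ASCIIBytesString) is invented for illustration; nothing like it exists in the collections module:

```python
from abc import ABCMeta, abstractmethod

class CharacterString(metaclass=ABCMeta):
    """Hypothetical ABC for lazily-materialized character strings."""

    @abstractmethod
    def __getitem__(self, index):
        """Return the character at the given index."""

    @abstractmethod
    def __len__(self):
        """Return the length in characters (not bytes)."""

    @abstractmethod
    def __str__(self):
        """Materialize into a real str only when one is actually needed."""

class ASCIIBytesString(CharacterString):
    """ASCII-only implementation: character indexing is O(1) on raw bytes."""

    def __init__(self, data):
        if not all(b < 128 for b in data):
            raise ValueError("not ASCII")
        self._data = data

    def __getitem__(self, index):
        return chr(self._data[index])

    def __len__(self):
        return len(self._data)

    def __str__(self):
        return self._data.decode("ascii")

s = ASCIIBytesString(b"hello")
```

A third-party UTF-16 or lazily-indexed UTF-8 implementation would provide the same interface with its own encoding-specific cheats underneath.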
Re: [Python-Dev] bytes / unicode
On Jun 21, 2010, at 2:17 PM, P.J. Eby wrote: One issue I remember from my enterprise days is some of the Asian-language developers at NTT/Verio explaining to me that unicode doesn't actually solve certain issues -- that there are use cases where you really *do* need bytes plus encoding in order to properly express something. The thing that I have heard in passing from a couple of folks with experience in this area is that some older software in Asia would present characters differently if they were originally encoded in a Japanese encoding versus a Chinese encoding, even though they were really the same characters. I do know that Han Unification is a giant political mess (http://en.wikipedia.org/wiki/Han_unification makes for some interesting reading), but my understanding is that it has handled enough of the cases by now that one can write software to display Asian languages and it will basically work with a modern version of unicode. (And of course, there's always the private use area, as Stephen Turnbull pointed out.) Regardless, this is another example where keeping around a string isn't really enough. If you need to display a Japanese character in a distinct way because you are operating in the Japanese *script*, you need a tag surrounding your data that is a hint to its presentation. The fact that these presentation hints were sometimes determined by their encoding is an unfortunate historical accident.
Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)
On Jun 19, 2010, at 5:02 PM, Terry Reedy wrote: However, I have very little experience with IRC and consequently have little idea what getting a permanent, owned, channel like #python entails. Hence the '?' that follows. What do others think? Sure, this is a good idea. Technically speaking, this is extremely easy. Somebody needs to /msg chanserv register #python3 and that's about it. (In this case, that someone may need to be Brett Cannon, since he is the official group contact for Freenode regarding Python-related channels.) Practically speaking, you will need a group of at least a dozen contributors, each in a different timezone, who sit there all day answering questions :). Otherwise the ownership of the channel is just a signpost pointing at an empty room.
Re: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)
On Jun 19, 2010, at 5:39 PM, geremy condra wrote: Bottom line, what I'd really like to do is kick them all off of #python, but practically I see very little that can be done to rectify the situation at this point. Here's something you can do: port libraries to python 3 and make the ecosystem viable. It's as simple as that. Nobody on #python has an ideological axe to grind, they just want to tell users to use tools which actually solve their problems. (Well, unless you think that helping users is ideological axe-grinding, in which case I think you may want to re-examine your own premises.) If Python 3 had all the same features and libraries as Python 2, and ran in all the same places (for example, as Stephen Thorne reminded me when I asked him about this, the oldest supported version of Red Hat Enterprise Linux...) then it would be an equally viable answer on IRC. It's going to take a lot of work to get it to that point. Even if you write code, of course, it's too much work for one person to fill the whole gap. Have some patience. The PSF is funding these efforts, and more library authors are porting all the time. Eventually, resistance in forums like Freenode's #python will disappear. But you can't make it go away by wishing it away, you have to get rid of the cause.
Re: [Python-Dev] PEP 3148 ready for pronouncement
On May 24, 2010, at 5:36 AM, Brian Quinlan wrote: On May 24, 2010, at 5:16 AM, Glyph Lefkowitz wrote: On May 23, 2010, at 2:37 AM, Brian Quinlan wrote: On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote: ProcessPoolExecutor has the same serialization perils that multiprocessing does. My original plan was to link to the multiprocessing docs to explain them but I couldn't find them listed. Linking to the pickle documentation might be a good start. Yes, the execution context is Executor-dependent. The section under ProcessPoolExecutor and ThreadPoolExecutor spells this out, I think. I suppose so. I guess I'm just looking for more precise usage of terminology. (This is a PEP, after all. It's a specification that multiple VMs may have to follow, not just some user documentation for a package, even if they'll *probably* be using your code in all cases.) I'd be happier if there were a clearer term than calls for the things being scheduled (submissions?), since the done callbacks aren't called in the subprocess for ProcessPoolExecutor, as we just discussed. Sure. Really, almost any contract would work, it just needs to be spelled out. It might be nice to know whether the thread invoking the callbacks is a daemon thread or not, but I suppose it's not strictly necessary. Your concern is that the thread will be killed when the interpreter exits? It won't be. Good to know. Tell it to the PEP though, not me ;). No reaction on [invoker vs. future]? I think you'll wish you did this in a couple of years when you start bumping into application code that calls set_result :). My reactions are mixed ;-) Well, you are not obliged to take my advice, as long as I am not obliged to refrain from mocking you mercilessly if it happens that I was right in a couple of years ;-). Your proposal is to add a level of indirection to make it harder for people to call implementation methods. The downside is that it makes it a bit harder to write tests and Executors. 
Both tests and executors will still create and invoke methods directly on one object; the only additional difficulty seems to be the need to type '.future' every so often on the executor/testing side of things, and that seems a cost well worth paying to avoid confusion over who is allowed to call those methods and when. I also can't see a big problem in letting people call set_result in client code though it is documented as being only for Executor implementations and tests. On the implementation side, I don't see why an Invoker needs a reference to the future. Well, uh...

    class Invoker(object):
        def __init__(self):
            """Should only be called by Executor implementations."""
            self.future = Future()

^ this is what I'd call a reference to the future
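A runnable sketch of the producer/consumer split being proposed, with the Invoker owning the Future through a '.future' attribute. Apart from that attribute, every detail here is invented for illustration and deliberately simplified (no locking, no exception handling):

```python
class Future:
    """Consumer side: application code can only read results and
    attach callbacks -- there is no set_result to call by accident."""

    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def result(self):
        if not self._done:
            raise RuntimeError("result not set yet")
        return self._result

    def add_done_callback(self, fn):
        if self._done:
            fn(self)  # already complete: invoke immediately
        else:
            self._callbacks.append(fn)

class Invoker:
    """Producer side: held only by the executor (or a test), which is
    the one party allowed to set the result."""

    def __init__(self):
        self.future = Future()

    def set_result(self, value):
        f = self.future
        f._result = value
        f._done = True
        for cb in f._callbacks:
            cb(f)

# The executor keeps the Invoker and hands out only inv.future:
inv = Invoker()
seen = []
inv.future.add_done_callback(lambda f: seen.append(f.result()))
inv.set_result(42)
```

Code that only ever sees inv.future has nothing like set_result in reach, which is exactly the confusion the split is meant to prevent.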
Re: [Python-Dev] PEP 3148 ready for pronouncement
On May 26, 2010, at 3:37 AM, Paul Moore wrote: On 26 May 2010 08:11, Lennart Regebro rege...@gmail.com wrote: On Wed, May 26, 2010 at 06:22, Nick Coghlan ncogh...@gmail.com wrote: - download a futures module from PyPI and live with the additional dependency Why would that be a problem? That has been hashed out repeatedly on this and other lists. Can it please be stipulated that for *some* people, in *some* cases, it is a problem? Sure, but I for one fully support Lennart asking the question, because while in the short term this *is* a problem with packaging tools in the Python ecosystem, in the long term (as you do note) it's an organizational dysfunction that can be addressed with better tools. I think it would be bad to ever concede the point that sane factoring of dependencies and code re-use aren't worth it because some jerk in Accounting or System Operations wants you to fill out a requisition form for a software component that's free and liberally licensed anyway. To support the unfortunate reality that such jerks in such departments really do in fact exist, there should be simple tools to glom a set of small, nicely factored dependencies into a giant monolithic ball of crud that installs all at once, and slap a sticker on the side of it that says I am only filling out your stupid form once, okay. This should be as distant as possible from the actual decision to package things in sensibly-sized chunks. In other words, while I kinda-sorta buy Brian's argument that having this module in easy reach will motivate more people to use a standard, tested idiom for parallelization, I *don't* think that the stdlib should be expanded simply to accommodate those who just don't want to install additional packages for anything.
Re: [Python-Dev] PEP 3148 ready for pronouncement
On May 26, 2010, at 4:55 AM, Brian Quinlan wrote: I said exactly the opposite of what I meant: futures don't need a reference to the invoker. Indeed they don't, and they really shouldn't have one. If I wrote that they did, then it was an error. ... and that appears to be it! Thank you for your very gracious handling of a pretty huge pile of criticism :). Good luck with the PEP, -glyph
Re: [Python-Dev] PEP 3148 ready for pronouncement
On May 23, 2010, at 2:37 AM, Brian Quinlan wrote: On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote: On May 22, 2010, at 8:47 PM, Brian Quinlan wrote: Jesse, the designated pronouncer for this PEP, has decided to keep discussion open for a few more days. So fire away! As you wish! I retract my request ;-) May you get what you wish for, may you find what you are seeking :). The PEP should be consistent in its usage of terminology about callables. It alternately calls them callables, functions, and functions or methods. It would be nice to clean this up and be consistent about what can be called where. I personally like callables. Did you find the terminology confusing? If not then I propose not changing it. Yes, actually. Whenever I see references to the multiprocessing module, I picture a giant HERE BE (serialization) DRAGONS sign. When I saw that some things were documented as being functions, I thought that maybe there was intended to be a restriction like "these can only be top-level functions" so they're easy for different executors to locate and serialize. I didn't realize that the intent was arbitrary callables until I carefully re-read the document and noticed that the terminology was inconsistent. But changing it in the user docs is probably a good idea. I like callables too. Great. Still, users will inevitably find the PEP and use it as documentation too. The execution context of callable code is not made clear. Implicitly, submit() or map() would run the code in threads or processes as defined by the executor, but that's not spelled out clearly. Any response to this bit? Did I miss something in the PEP? More relevant to my own interests, the execution context of the callables passed to add_done_callback and remove_done_callback is left almost completely to the imagination. 
If I'm reading the sample implementation correctly, http://code.google.com/p/pythonfutures/source/browse/branches/feedback/python3/futures/process.py#241, it looks like in the multiprocessing implementation, the done callbacks are invoked in a random local thread. The fact that they are passed the future itself *sort* of implies that this is the case, but the multiprocessing module plays fast and loose with object identity all over the place, so it would be good to be explicit and say that it's *not* a pickled copy of the future sitting in some arbitrary process (or even on some arbitrary machine). The callbacks will always be called in a thread other than the main thread in the process that created the executor. Is that a strong enough contract? Sure. Really, almost any contract would work, it just needs to be spelled out. It might be nice to know whether the thread invoking the callbacks is a daemon thread or not, but I suppose it's not strictly necessary. This is really minor, I know, but why does it say NOTE: This method can be used to create adapters from Futures to Twisted Deferreds? First of all, what's the deal with NOTE; it's the only NOTE in the whole PEP, and it doesn't seem to add anything. This sentence would read exactly the same if that word were deleted. Without more clarity on the required execution context of the callbacks, this claim might not actually be true anyway; Deferred callbacks can only be invoked in the main reactor thread in Twisted. But even if it is perfectly possible, why leave so much of the adapter implementation up to the imagination? If it's important enough to mention, why not have a reference to such an adapter in the reference Futures implementation, since it *should* be fairly trivial to write? I'm a bit surprised that this doesn't allow for better interoperability with Deferreds given this discussion: discussion snipped I did not communicate that well. 
As implemented, it's quite possible to implement a translation layer which turns a Future into a Deferred. What I meant by that comment was, the specification in the PEP was too loose to be sure that such a layer would work with arbitrary executors. For what it's worth, the Deferred translator would look like this, if you want to include it in the PEP (untested though, you may want to run it first):

    from twisted.internet.defer import Deferred
    from twisted.internet.reactor import callFromThread

    def future2deferred(future):
        d = Deferred()
        def invoke_deferred():
            try:
                result = future.result()
            except:
                d.errback()
            else:
                d.callback(result)
        def done_callback(same_future):
            callFromThread(invoke_deferred)
        future.add_done_callback(done_callback)
        return d

This does beg the question of what the traceback will look like in that except: block though. I guess the multi-threaded executor will use python3 exception chaining so Deferred should be able to show a sane
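For readers following along later: the module under discussion eventually shipped as concurrent.futures in Python 3.2, and the done-callback contract being debated here can be exercised directly. This is a small illustration, not part of the original exchange:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

records = []

def on_done(future):
    # Record which thread ran the callback alongside the result;
    # nothing here assumes it is (or is not) the main thread.
    records.append((threading.current_thread().name, future.result()))

with ThreadPoolExecutor(max_workers=1) as pool:
    f = pool.submit(pow, 2, 10)
    f.add_done_callback(on_done)
# Exiting the with-block waits for the work item (and thus the
# callback it triggers) to complete.
```

In the stdlib as shipped, a callback added to an already-completed future runs immediately in the calling thread, which is exactly the sort of contract detail this part of the thread was pushing to have spelled out.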
Re: [Python-Dev] PEP 3148 ready for pronouncement
On May 22, 2010, at 8:47 PM, Brian Quinlan wrote: Jesse, the designated pronouncer for this PEP, has decided to keep discussion open for a few more days. So fire away! As you wish! The PEP should be consistent in its usage of terminology about callables. It alternately calls them callables, functions, and functions or methods. It would be nice to clean this up and be consistent about what can be called where. I personally like callables. The execution context of callable code is not made clear. Implicitly, submit() or map() would run the code in threads or processes as defined by the executor, but that's not spelled out clearly. More relevant to my own interests, the execution context of the callables passed to add_done_callback and remove_done_callback is left almost completely to the imagination. If I'm reading the sample implementation correctly, http://code.google.com/p/pythonfutures/source/browse/branches/feedback/python3/futures/process.py#241, it looks like in the multiprocessing implementation, the done callbacks are invoked in a random local thread. The fact that they are passed the future itself *sort* of implies that this is the case, but the multiprocessing module plays fast and loose with object identity all over the place, so it would be good to be explicit and say that it's *not* a pickled copy of the future sitting in some arbitrary process (or even on some arbitrary machine). This is really minor, I know, but why does it say NOTE: This method can be used to create adapters from Futures to Twisted Deferreds? First of all, what's the deal with NOTE; it's the only NOTE in the whole PEP, and it doesn't seem to add anything. This sentence would read exactly the same if that word were deleted. Without more clarity on the required execution context of the callbacks, this claim might not actually be true anyway; Deferred callbacks can only be invoked in the main reactor thread in Twisted. 
But even if it is perfectly possible, why leave so much of the adapter implementation up to the imagination? If it's important enough to mention, why not have a reference to such an adapter in the reference Futures implementation, since it *should* be fairly trivial to write? The fact that add_done_callback is implemented using a set is weird, since it means you can't add the same callback more than once. The set implementation also means that the callbacks get called in a semi-random order, potentially creating even _more_ hard-to-debug order of execution issues than you'd normally have with futures. And I think that this documentation will be unclear to a lot of novice developers: many people have trouble with the idea that a = Foo(); b = Foo(); a.bar_method != b.bar_method, but import foo_module; foo_module.bar_function == foo_module.bar_function. It's also weird that you can remove callbacks - what's the use case? Deferreds have no callback-removal mechanism and nobody has ever complained of the need for one, as far as I know. (But lots of people do add the same callback multiple times.) I suggest having add_done_callback, implementing it with a list so that callbacks are always invoked in the order that they're added, and getting rid of remove_done_callback. futures._base.Executor isn't exposed publicly, but it needs to be. The PEP kinda makes it sound like it is (Executor is an abstract class...). Plus, A third party library wanting to implement an executor of its own shouldn't have to copy and paste the implementation of Executor.map. One minor suggestion on the internal future methods bit - something I wish we'd done with Deferreds was to put 'callback()' and 'addCallbacks()' on separate objects, so that it was very explicit whether you were on the emitting side of a Deferred or the consuming side. 
That seems to be the case with these internal methods - they are not so much internal as they are for the producer of the Future (whether a unit test or executor) so you might want to put them on a different object that it's easy for the thing creating a Future() to get at but hard for any subsequent application code to fiddle with by accident. Off the top of my head, I suggest naming it Invoker(). A good way to do this would be to have an Invoker class which can't be instantiated (raises an exception from __init__ or somesuch), then a Future.create() method which returns an Invoker, which itself has a '.future' attribute. Finally, why isn't this just a module on PyPI? It doesn't seem like there's any particular benefit to making this a stdlib module and going through the whole PEP process - except maybe to prompt feedback like this :). Issues like the ones I'm bringing up could be fixed pretty straightforwardly if it were just a matter of filing a bug on a small package, but fixing a stdlib module is a major undertaking.
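The identity subtlety called out above -- the reason a set of callbacks behaves surprisingly -- can be demonstrated directly (a small illustration, not from the original message):

```python
import math

class Foo:
    def bar_method(self):
        pass

a, b = Foo(), Foo()

# Bound methods from different instances compare unequal,
# so a set of callbacks will happily hold both...
assert a.bar_method != b.bar_method

# ...while a module-level function is one shared object, so adding
# it "twice" to a set silently deduplicates it.
assert math.sqrt == math.sqrt

callbacks = {a.bar_method, b.bar_method, math.sqrt, math.sqrt}
# Only three distinct callbacks survive: the duplicate math.sqrt
# collapsed, the two bound methods did not.
```

A list-based implementation has neither surprise: every add is kept, in order.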