Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)
Guido van Rossum wrote: On Sun, Aug 28, 2011 at 11:23 AM, Stefan Behnel stefan...@behnel.de wrote:

Hi, sorry for hooking in here with my usual Cython bias and promotion. When the question comes up what a good FFI for Python should look like, it's an obvious reaction on my part to throw Cython into the game.

Terry Reedy, 28.08.2011 06:58: Dan, I once had more or less the same opinion/question as you with regard to ctypes, but I now see at least 3 problems.

1) It seems hard to write it correctly. There are currently 47 open ctypes issues, with 9 being feature requests, leaving 38 behavior-related issues. Tom Heller has not been able to work on it since the beginning of 2010 and has formally withdrawn as maintainer. No one else that I know of has taken his place.

Cython has an active set of developers and a rather large and growing user base. It certainly has lots of open issues in its bug tracker, but most of them are there because we *know* where the development needs to go, not so much because we don't know how to get there. After all, the semantics of Python and C/C++, between which Cython sits, are pretty much established.

Cython compiles to C code for CPython, (hopefully soon [1]) to Python+ctypes for PyPy, and (mostly [2]) to C++/CLI code for IronPython, which boils down to the same build-time and runtime kinds of dependencies that the supported Python runtimes have anyway. It does not add dependencies on any external libraries by itself, such as the libffi in CPython's ctypes implementation. For the CPython backend, the generated code is very portable and is self-contained when compiled against the CPython runtime (plus, obviously, libraries that the user code explicitly uses). It generates efficient code for all existing CPython versions starting with Python 2.4, with several optimisations also for recent CPython versions (including the upcoming 3.3).

2) It is not trivial to use it correctly.
Cython is basically Python, so Python developers with some C or C++ knowledge tend to get along with it quickly. I can't say yet how easy it is (or will be) to write code that is portable across independent Python implementations, but given that that field is still young, there's certainly a lot that can be done to aid this.

Cython does sound attractive for cross-Python-implementation use.

This is exciting. I think it needs a SWIG-like companion script that can write at least first-pass ctypes code from the .h header files. Or maybe it could/should use header info at runtime (with the .h bundled with a module).

From my experience, this is a nice-to-have more than a requirement. It has been requested for Cython a couple of times, especially by new users, and there are a couple of scripts out there that do this to some extent. But the usual problem is that Cython users (and, similarly, ctypes users) do not want a 1:1 mapping of a library API to a Python API (there's SWIG for that), and you can't easily get more than a trivial mapping out of a script. But, yes, a one-shot generator for the necessary declarations would at least help in cases where the API to be wrapped is somewhat large.

Hm, the main use that was proposed here for ctypes is to wrap existing libraries (not to create nicer APIs; that can be done in pure Python on top of this). In general, an existing library cannot be called without access to its .h files -- there are probably struct and constant definitions, platform-specific #ifdefs and #defines, and other things in there that affect the linker-level calling conventions for the functions in the library. (Just like Python's own .h files -- e.g. the extensive renaming of the Unicode APIs depending on narrow/wide build.) How does Cython deal with these?

I wonder if for this particular purpose SWIG isn't the better match. (If SWIG weren't universally hated, even by its original author.
:-)

SIP is an alternative to SWIG:
http://www.riverbankcomputing.com/software/sip/intro
http://pypi.python.org/pypi/SIP

and there are a few others as well:
http://wiki.python.org/moin/IntegratingPythonWithOtherLanguages

3) It seems to be slower than compiled C extension wrappers. That, at least, was the discovery of someone who re-wrote pygame using ctypes. (The hope was that using ctypes would aid porting to 3.x, but the time penalty was apparently too much for time-critical code.)

Cython code can be as fast as C code, and in some cases, especially when developer time is limited, even faster than hand-written C extensions. It allows for a straightforward optimisation path from regular Python code down to the speed of C, and trivial interaction with C code itself, if the need arises.

Stefan

[1] The PyPy port of Cython is currently being written as a GSoC project.
[2] The IronPython port of Cython was written to facilitate a NumPy port to the .NET environment. It's currently not a complete port of all Cython
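To make the ctypes side of this thread concrete, here is a minimal sketch of wrapping an existing C library (libm) by hand. The type declarations written out below are exactly the information that lives in the .h file and that a SWIG-like generator script would produce a first pass of; the library lookup is platform-dependent (the "libm.so.6" fallback is an assumption that holds on common Linux systems).

```python
import ctypes
import ctypes.util

# Locate the C math library.  find_library() is platform-dependent;
# fall back to a common Linux soname if it returns None.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Without access to math.h, argument and return types must be
# declared by hand -- by default ctypes assumes int, which would
# silently corrupt double results.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```

This also illustrates Terry's point 2 (it is not trivial to use ctypes correctly): forgetting the `restype` declaration produces wrong answers rather than an error.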
Re: [Python-Dev] PEP 393 review
On Sun, Aug 28, 2011 at 21:47, Martin v. Löwis mar...@v.loewis.de wrote:

result strings. In PEP 393, a buffer must be scanned for the highest code point, which means that each byte must be inspected twice (a second time when the copying occurs).

This may be a silly question: are there things in place to optimize this for the case where two strings are combined? E.g. the highest character in the combined string is max(highest character in either of the strings).

Also, this PEP makes me wonder if there should be a way to distinguish between language PEPs and (CPython) implementation PEPs, by adding a tag or using the PEP number ranges somehow.

Cheers,

Dirkjan

___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 393 review
Le 29/08/2011 11:03, Dirkjan Ochtman a écrit : On Sun, Aug 28, 2011 at 21:47, Martin v. Löwis mar...@v.loewis.de wrote:

result strings. In PEP 393, a buffer must be scanned for the highest code point, which means that each byte must be inspected twice (a second time when the copying occurs).

This may be a silly question: are there things in place to optimize this for the case where two strings are combined? E.g. the highest character in the combined string is max(highest character in either of the strings).

The double-scan issue is only for codec decoders. If you combine two Unicode objects (a+b), you already know the highest code point and the kind of each string.

Victor
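Victor's point about a+b can be illustrated in pure Python (an illustration of the idea only, not the C implementation): each PEP 393 string effectively carries a "kind" (1, 2, or 4 bytes per code unit, determined by its highest code point), and the kind of a concatenation is just the maximum of the operands' kinds, so the combined buffer never needs rescanning.

```python
def kind(s):
    # PEP 393 stores 1-, 2- or 4-byte units depending on the
    # highest code point occurring in the string.
    m = max(map(ord, s), default=0)
    if m < 256:
        return 1
    if m < 65536:
        return 2
    return 4

def concat_kind(a, b):
    # No scan of the combined buffer is needed: the result's kind
    # is simply the maximum of the operands' kinds.
    return max(kind(a), kind(b))

print(concat_kind("abc", "\u20ac"))       # 2 (euro sign is in the BMP)
print(concat_kind("abc", "\U0001F600"))   # 4 (non-BMP code point)
```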
Re: [Python-Dev] PEP 393 review
Le 28/08/2011 23:06, Martin v. Löwis a écrit : Am 28.08.2011 22:01, schrieb Antoine Pitrou:

- the iobench results are between 2% acceleration (seek operations), 16% slowdown for small-sized reads (4.31 MB/s vs. 5.22 MB/s) and 37% for large-sized reads (154 MB/s vs. 235 MB/s). The speed difference is probably in the UTF-8 decoder; I have already restored the runs-of-ASCII optimization and am out of ideas for further speedups. Again, having to scan the UTF-8 string twice is probably one cause of slowdown.

I don't think it's the UTF-8 decoder, because I see an even larger slowdown with simpler encodings (e.g. -E latin1 or -E utf-16le).

Those haven't been ported to the new API yet. Consider, for example, d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test; with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this is a 25% speedup for PEP 393.

If I understand correctly, performance now depends highly on the characters used? A pure ASCII string is faster than a string with characters in the ISO-8859-1 charset? Is it also true for BMP characters vs. non-BMP characters? Do these benchmark tools use only ASCII characters, or also some ISO-8859-1 characters? Or, better, different Unicode ranges in different tests?

Victor
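Outside of iobench, Victor's question can be probed with a quick micro-benchmark that decodes buffers drawn from different Unicode ranges. A hypothetical sketch (the buffer sizes and iteration count are arbitrary choices, not from the thread):

```python
import timeit

# Decode same-sized byte buffers whose characters fall in different
# Unicode ranges: pure ASCII, ISO-8859-1, and BMP (3-byte UTF-8).
cases = [
    ("ASCII",      b"a" * 4096,                       "utf-8"),
    ("ISO-8859-1", b"\xe9" * 4096,                    "latin-1"),
    ("BMP",        ("\u20ac" * 1365).encode("utf-8"), "utf-8"),
]
for name, data, codec in cases:
    t = timeit.timeit(lambda: data.decode(codec), number=1000)
    print("%-10s %.4f s" % (name, t))
```

Under PEP 393 one would expect the ASCII case to be fastest, since the decoder can use the compact 1-byte representation directly.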
Re: [Python-Dev] Software Transactional Memory for Python
Hi Guido,

On Sun, Aug 28, 2011 at 6:43 PM, Guido van Rossum gu...@python.org wrote:

This sounds like a very interesting idea to pursue, even if it's late, and even if it's experimental, and even if it's possible to cause deadlocks (no news there). I propose that we offer a C API in Python 3.3 as well as an extension module that offers the proposed decorator.

Very good idea. http://bugs.python.org/issue12850

The extension module, called 'stm' for now, is designed as an independent 3rd-party extension module. It should at this point not be included in the stdlib; for one thing, it needs some more testing than my quick one-page hacks, and we need to seriously look at the deadlock issues mentioned here. But the patch to ceval.c above looks rather straightforward to me and could, if no subtle issue is found, be included in the standard CPython.

A bientôt,

Armin.
Re: [Python-Dev] Ctypes and the stdlib
Guido van Rossum, 29.08.2011 04:27: On Sun, Aug 28, 2011 at 11:23 AM, Stefan Behnel wrote: Terry Reedy, 28.08.2011 06:58:

I think it needs a SWIG-like companion script that can write at least first-pass ctypes code from the .h header files. Or maybe it could/should use header info at runtime (with the .h bundled with a module).

From my experience, this is a nice-to-have more than a requirement. It has been requested for Cython a couple of times, especially by new users, and there are a couple of scripts out there that do this to some extent. But the usual problem is that Cython users (and, similarly, ctypes users) do not want a 1:1 mapping of a library API to a Python API (there's SWIG for that), and you can't easily get more than a trivial mapping out of a script. But, yes, a one-shot generator for the necessary declarations would at least help in cases where the API to be wrapped is somewhat large.

Hm, the main use that was proposed here for ctypes is to wrap existing libraries (not to create nicer APIs, that can be done in pure Python on top of this).

The same applies to Cython, obviously. The main advantage of Cython over ctypes for this is that the Python-level wrapper code is also compiled into C, so whenever the need for a thicker wrapper arises in some part of the API, you don't lose any performance in intermediate layers.

In general, an existing library cannot be called without access to its .h files -- there are probably struct and constant definitions, platform-specific #ifdefs and #defines, and other things in there that affect the linker-level calling conventions for the functions in the library. (Just like Python's own .h files -- e.g. the extensive renaming of the Unicode APIs depending on narrow/wide build.) How does Cython deal with these?

In the CPython backend, the header files are normally #included by the generated C code, so they are used at C compilation time. Cython has its own view on the header files in separate declaration files (.pxd).
Basically, it looks like this:

    # file mymath.pxd
    cdef extern from "aheader.h":
        double PI
        double E
        double abs(double x)

These declaration files usually only contain the parts of a header file that are used in the user code, either manually copied over or extracted by scripts (that's what I was referring to in my reply to Terry). The complete 'real' content of the header file is then used by the C compiler at C compilation time. The user code employs a cimport statement to import the declarations at Cython compilation time, e.g.

    # file mymodule.pyx
    cimport mymath
    print mymath.PI + mymath.E

would result in C code that #includes aheader.h, adds the C constants PI and E, converts the result to a Python float object and prints it out using the normal CPython machinery. This means that declarations can be reused across modules, just like with header files. In fact, Cython actually ships with a couple of common declaration files, e.g. for parts of libc, NumPy or CPython's C-API.

I don't know that much about the IronPython backend, but from what I heard, it uses basically the same build-time mechanisms and generates a thin C++ wrapper and a corresponding CLI part as glue layer.

The ctypes backend for PyPy works differently in that it generates a Python module from the .pxd files that contains the declarations as ctypes code. Then, the user code imports that normally at Python runtime. Obviously, this means that there are cases where the Cython-level declarations and thus the generated ctypes code will not match the ABI for a given target platform. So, in the worst case, there is a need to manually adapt the ctypes declarations in the Python module that was generated from the .pxd. Not worse than the current situation, though, and the rest of the Cython wrapper will compile into plain Python code that simply imports the declarations from the .pxd modules. But there's certainly room for improvements here.
Stefan
Re: [Python-Dev] Ctypes and the stdlib
On 29 August 2011 10:39, Stefan Behnel stefan...@behnel.de wrote:

In the CPython backend, the header files are normally #included by the generated C code, so they are used at C compilation time. Cython has its own view on the header files in separate declaration files (.pxd). Basically, it looks like this:

    # file mymath.pxd
    cdef extern from "aheader.h":
        double PI
        double E
        double abs(double x)

These declaration files usually only contain the parts of a header file that are used in the user code, either manually copied over or extracted by scripts (that's what I was referring to in my reply to Terry). The complete 'real' content of the header file is then used by the C compiler at C compilation time. The user code employs a cimport statement to import the declarations at Cython compilation time, e.g.

    # file mymodule.pyx
    cimport mymath
    print mymath.PI + mymath.E

would result in C code that #includes aheader.h, adds the C constants PI and E, converts the result to a Python float object and prints it out using the normal CPython machinery.

One thing that would make it easier for me to understand the role of Cython in this context would be to see a simple example of the type of thin wrapper we're talking about here. The above code is nearly this, but the pyx file executes real code. For example, how do I simply expose pi and abs from math.h? Based on the above, I tried a pyx file containing just the code

    cdef extern from "math.h":
        double pi
        double abs(double x)

but the resulting module exported no symbols. What am I doing wrong? Could you show a working example of writing such a wrapper?

This is probably a bit off-topic, but it seems to me that whenever Cython comes up in these discussions, the implications of Cython-as-an-implementation-of-python obscure the idea of simply using Cython as a means of writing thin library wrappers. Just to clarify - the above code (if it works) seems to me like a nice simple means of writing wrappers.
Something involving this in a pxd file, plus a pyx file with a whole load of dummy

    def abs(x): return cimported_module.abs(x)

definitions, seems ok, but annoyingly clumsy (particularly for big APIs).

I've kept python-dev in this response, on the assumption that others on the list might be glad of seeing a concrete example of using Cython to build wrapper code. But anything further should probably be taken off-list...

Thanks,
Paul.

PS This would also probably be a useful addition to the Cython wiki and/or the manual. I searched both and found very little other than a page on wrapping C++ classes (which is not very helpful for simple C global functions and constants).
Re: [Python-Dev] Should we move to replace re with regex?
On Sun, Aug 28, 2011 at 7:28 AM, Guido van Rossum gu...@python.org wrote:

Are you volunteering? (Even if you don't want to be the only maintainer, it still sounds like you'd be a good co-maintainer of the regex module.)

My name is listed in the experts index for 're' [0], and that should make me already co-maintainer for the module.

[...] 4) add documentation for the module and the (public) functions in Doc/library (this should be done anyway).

Does regex have a significant public C interface? (_sre.c doesn't.) Does it have a Python-level interface beyond what re.py offers (apart from the obvious new flags and new regex syntax/semantics)?

I don't think it does. Explaining the new syntax/semantics is useful for developers (e.g. what \p and \X are supposed to match), but also for users, so it's fine to have this documented in Doc/library/re.rst (and I don't think it's necessary to duplicate it in the README/PEP/Wiki).

This will ensure that the general quality of the code is good, and when someone actually has to work on the code, there's enough documentation to make it possible.

That sounds like a good description of a process that could lead to acceptance of regex as a re replacement.

So if we want to get this done I think we need Matthew for 1) (unless someone else wants to do it and have him review the result). If making a diff with the current re is doable and makes sense, we can use the rietveld instance on the bug tracker to make the review for 2). The same could be done with a diff that replaces the whole module though. 3) will follow after 2), and 4) is not difficult and can be done when we actually replace re (it's probably enough to reorganize a bit and convert to rst the page on PyPI).

Best Regards,
Ezio Melotti

[0]: http://docs.python.org/devguide/experts.html#stdlib
Re: [Python-Dev] PEP 393 Summer of Code Project
On Mon, 29 Aug 2011 12:43:24 +0900 Stephen J. Turnbull step...@xemacs.org wrote:

Since when can s[0] represent a code point outside the BMP, for s a Unicode string in a narrow build?

Remember, the UCS-2/narrow vs. UCS-4/wide distinction is *not* about what Python supports vs. the outside world. It's about what the str/unicode type is an array of.

Why would that be?

Antoine.
Re: [Python-Dev] Software Transactional Memory for Python
On Sun, 28 Aug 2011 09:43:33 -0700 Guido van Rossum gu...@python.org wrote:

This sounds like a very interesting idea to pursue, even if it's late, and even if it's experimental, and even if it's possible to cause deadlocks (no news there). I propose that we offer a C API in Python 3.3 as well as an extension module that offers the proposed decorator.

The C API could then be used to implement alternative APIs purely as extension modules (e.g. would a deadlock-detecting API be possible?). We could offer the C API without shipping an extension module ourselves. I don't think we should provide (and maintain!) a Python API that helps users put themselves in all kinds of nasty situations. There is enough misunderstanding around the GIL and multithreading already.

Regards

Antoine.
Re: [Python-Dev] LZMA compression support in 3.3
On Aug 27, 2011, at 10:36 PM, Nadeem Vawda wrote:

I talked to Antoine about this on IRC; he didn't seem to think a PEP would be necessary. But a summary of the discussion on the tracker issue might still be a useful thing to have, given how long it's gotten.

I agree with Antoine - no PEP should be necessary. A well reviewed and tested module should do it.

-Barry
[Python-Dev] SWIG (was Re: Ctypes and the stdlib)
On Mon, Aug 29, 2011 at 12:27 PM, Guido van Rossum gu...@python.org wrote:

I wonder if for this particular purpose SWIG isn't the better match. (If SWIG weren't universally hated, even by its original author. :-)

Hate is probably a strong word, but as the author of Swig, let me chime in here ;-). I think there are probably some lessons to be learned from Swig.

As Nick noted, Swig is best suited when you have control over both sides (C/C++ and Python) of whatever code you're working with. In fact, the original motivation for Swig was to give application programmers (scientists in my case) a means for automatically generating the Python bindings to their code. However, there was one other important assumption--and that was the fact that all of your real code was going to be written in C/C++ and that the Python scripting interface was just an optional add-on (perhaps even just a throw-away thing). Keep in mind, Swig was first created in 1995 and at that time, the use of Python (or any similar language) was a pretty radical idea in the sciences. Moreover, there was a lot of legacy code that people just weren't going to abandon. Thus, I always viewed Swig as a kind of transitional vehicle for getting people to use Python who might otherwise not even consider it.

Getting back to Nick's point though, to really use Swig effectively, it was always known that you might have to reorganize or refactor your C/C++ code to make it more Python friendly. However, due to the automatic wrapper generation, you didn't have to do it all at once. Basically your code could organically evolve and Swig would just keep up with whatever you were doing. In my projects, we'd usually just tuck Swig away in some Makefile somewhere and forget about it.

One of the major complexities of Swig is the fact that it attempts to parse C/C++ header files. This very notion is actually a dangerous trap waiting for anyone who wants to wander into it.
You might look at a header file and say, well how hard could it be to just grab a few definitions out of there? I'll just write a few regexes or come up with some simple hack for recognizing function definitions or something. Yes, you can do that, but you're immediately going to find that whatever approach you take starts to break down into horrible corner cases. Swig started out like this and quickly turned into a quagmire of esoteric bug reports. All sorts of problems with preprocessor macros, typedefs, missing headers, and other things. For a while, I would get these bug reports that would go something like "I had this C++ class inside a namespace with an abstract method taking a typedef'd const reference to this smart pointer ... and Swig broke." Hell, I can't even understand the bug report, let alone know how to fix it. Almost all of these bugs were due to the fact that Swig started out as a hack and didn't really have any kind of solid conceptual foundation for how it should be put together.

If you flash forward a bit, from about 2001-2004 there was a very serious push to fix these kinds of issues. Although it was not a complete rewrite of Swig, there were a huge number of changes to how it worked during this time. Swig grew a fully compatible C++ preprocessor that fully supported macros. A complete C++ type system was implemented, including support for namespaces, templates, and even such things as template partial specialization. Swig evolved into a multi-pass compiler that was doing all sorts of global analysis of the interface. Just to give you an idea, Swig would do things such as automatically detect/wrap C++ smart pointers. It could wrap overloaded C++ methods/functions. Also, if you had a C++ class with virtual methods, it would only make one Python wrapper function and then reuse it across all wrapped subclasses.
Under the covers of all of this, the implementation basically evolved into a sophisticated macro preprocessor coupled with a pattern matching engine built on top of the C++ type system. For example, you could write patterns that matched specific C++ types (the much hated typemap feature) and you could write patterns that matched entire C++ declarations. This whole pattern matching approach had huge power if you knew what you were doing. For example, I had a graduate student working on adding contracts to Swig--something that was being funded by an NSF grant. It was cool and mind boggling all at once.

In hindsight however, I think the complexity of Swig has exceeded anyone's ability to fully understand it (including my own). For example, to even make sense of what's happening, you have to have a pretty solid grasp of the C/C++ type system (easier said than done). Couple that with all sorts of crazy pattern matching, low-level code fragments, and a ton of macro definitions, and your head will literally explode if you try to figure out what's happening. So far as I know,
Re: [Python-Dev] Software Transactional Memory for Python
On Mon, Aug 29, 2011 at 5:20 AM, Antoine Pitrou solip...@pitrou.net wrote: On Sun, 28 Aug 2011 09:43:33 -0700 Guido van Rossum gu...@python.org wrote:

This sounds like a very interesting idea to pursue, even if it's late, and even if it's experimental, and even if it's possible to cause deadlocks (no news there). I propose that we offer a C API in Python 3.3 as well as an extension module that offers the proposed decorator.

The C API could then be used to implement alternative APIs purely as extension modules (e.g. would a deadlock-detecting API be possible?). We could offer the C API without shipping an extension module ourselves. I don't think we should provide (and maintain!) a Python API that helps users put themselves in all kinds of nasty situations. There is enough misunderstanding around the GIL and multithreading already.

+1
Re: [Python-Dev] Software Transactional Memory for Python
Hi Charles-François,

2011/8/27 Charles-François Natali neolo...@free.fr:

The problem is that many locks are actually acquired implicitly. For example, `print` to a buffered stream will acquire the fileobject's mutex.

Indeed. After looking more at the kind of locks used throughout the stdlib, I notice that in many cases a lock is acquired by code in the following simple pattern:

    Py_BEGIN_ALLOW_THREADS
    PyThread_acquire_lock(self->lock, 1);
    Py_END_ALLOW_THREADS

If one thread is waiting in the END_ALLOW_THREADS for another one to release the GIL, but the other one is in a "with atomic" block and tries to acquire the same lock, deadlock. But the issue can be resolved: the first thread in the above example needs to notice that the other thread is in a "with atomic" block, and be nice and release the lock again. Then it waits until the "with atomic" block finishes, and tries again from the start.

We could do this by putting the above pattern in its own function (which makes some sense anyway, because the pattern is repeated left and right, and is often complicated by an additional if (!PyThread_acquire_lock(self->lock, 0)) before); and then allowing that function to be overridden by the external 'stm' module. I suspect that I need to do a more thorough review of the stdlib to make sure (at least more than now) that all potential deadlocking places can be avoided with a similar refactoring. All in all, it seems that the patch to CPython itself will need to be more than just the few lines in ceval.c --- but still very reasonable both in size and in content.

A bientôt,

Armin.
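The refactoring Armin sketches can be illustrated in Python (purely illustrative; the real change would be in C, and `atomic_in_progress` is a stand-in for whatever check the external 'stm' module would install):

```python
import threading

def acquire_lock(lock, atomic_in_progress=lambda: False, poll=0.01):
    # Factored-out version of the repeated C pattern
    # (Py_BEGIN_ALLOW_THREADS / PyThread_acquire_lock /
    # Py_END_ALLOW_THREADS).  If the lock cannot be taken while
    # another thread sits in a "with atomic" block, be nice: back
    # off instead of deadlocking, wait for the atomic block to
    # finish, then retry from the start.
    while not lock.acquire(timeout=poll):
        while atomic_in_progress():
            pass  # spin until the atomic block completes
```

Centralizing the acquire in one helper is what makes it overridable by an 'stm' extension module without touching every call site in the stdlib.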
Re: [Python-Dev] Should we move to replace re with regex?
On Aug 27, 2011, at 07:11 PM, Martin v. Löwis wrote:

A PEP should IMO only cover end-user aspects of the new re module. Code organization is typically not in the PEP. To give a specific example: you mentioned that there is (near) code duplication in MRAB's module. As a reviewer, I would discuss whether this can be eliminated - but not in the PEP.

+1

-Barry
Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)
2011/8/29 Glyph Lefkowitz gl...@twistedmatrix.com: On Aug 28, 2011, at 7:27 PM, Guido van Rossum wrote:

In general, an existing library cannot be called without access to its .h files -- there are probably struct and constant definitions, platform-specific #ifdefs and #defines, and other things in there that affect the linker-level calling conventions for the functions in the library.

Unfortunately I don't know a lot about this, but I keep hearing about something called rffi that PyPy uses to call C from RPython: http://readthedocs.org/docs/pypy/en/latest/rffi.html. This has some shortcomings currently, most notably the fact that it needs those .h files (and therefore a C compiler) at runtime.

This is incorrect. rffi is actually quite like ctypes. The part you are referring to is probably rffi_platform [1], which invokes the compiler to determine constant values and struct offsets, or ctypes_configure, which does need runtime headers [2].

[1] https://bitbucket.org/pypy/pypy/src/92e36ab4eb5e/pypy/rpython/tool/rffi_platform.py
[2] https://bitbucket.org/pypy/pypy/src/92e36ab4eb5e/ctypes_configure/

-- Regards, Benjamin
Re: [Python-Dev] Should we move to replace re with regex?
On Aug 26, 2011, at 05:25 PM, Dan Stromberg wrote:

from __future__ import is an established way of trying something for a while to see if it's going to work.

Actually, no. The documentation says:

-snip snip-
__future__ is a real module, and serves three purposes:

* To avoid confusing existing tools that analyze import statements and expect to find the modules they're importing.
* To ensure that future statements run under releases prior to 2.1 at least yield runtime exceptions (the import of __future__ will fail, because there was no module of that name prior to 2.1).
* To document when incompatible changes were introduced, and when they will be — or were — made mandatory. This is a form of executable documentation, and can be inspected programmatically via importing __future__ and examining its contents.
-snip snip-

So, really the __future__ module is a way to introduce accepted but incompatible changes in a controlled way, through successive releases. It's never been used to introduce experimental features that might be removed if they don't work out.

Cheers,
-Barry
Re: [Python-Dev] Should we move to replace re with regex?
On Aug 27, 2011, at 01:15 PM, Ben Finney wrote:

My question is directed more to M-A Lemburg's passage above, and its implicit assumption that the user understand the changes between “Unicode 2.0/3.0 semantics” and “Unicode 6 semantics”, and how their own needs relate to those semantics.

More likely, it'll be a choice between wanting Unicode 6 semantics, and don't care. So the PEP could include some clues as to why you'd care to use regex instead of re.

-Barry
Re: [Python-Dev] issue 6721 Locks in python standard library should be sanitized on fork
On Sat, Aug 27, 2011 at 2:59 AM, Ask Solem a...@celeryproject.org wrote: On 26 Aug 2011, at 16:53, Antoine Pitrou wrote: Hi, I think that deprecating the use of threads w/ multiprocessing - or at least crippling it is the wrong answer. Multiprocessing needs the helper threads it uses internally to manage queues, etc. Removing that ability would require a near-total rewrite, which is just a non-starter. I agree that this wouldn't actually benefit anyone. Besides, I don't think it's even possible to avoid threads in multiprocessing, given the various constraints. We would have to force the user to run their main thread in an event loop, and that would be twisted (tm). I would focus on the atfork() patch more directly, ignoring multiprocessing in the discussion, and focusing on the merits of gps' initial proposal and patch. I think this could also be combined with Charles-François' patch. Regards Have to agree with Jesse and Antoine here. Celery (celeryproject.org) uses multiprocessing, is widely used in production, and is regarded as stable software that has been known to run for months at a time, only to be restarted for software upgrades. I have been investigating an issue for some time that I'm pretty sure is caused by this. It occurs only rarely, so rarely that I have not had any actual bug reports about it; it's just something I have experienced during extensive testing. The tone of the discussion on the bug tracker makes me think that I have been very lucky :-) Using the fork+exec approach seems like a much more realistic solution than rewriting multiprocessing.Pool and Manager to not use threads. In fact this is something I have been considering as a fix for the suspected issue for some time. It does have implications that are annoying for sure, but we are already used to this on the Windows platform (it could help portability even). +3 (agreed to Jesse, Antoine and Ask here). 
The http://bugs.python.org/issue8713 described non-fork implementation that always uses subprocesses rather than plain forked processes is the right way forward for multiprocessing. -gps
Re: [Python-Dev] Cython, ctypes and the stdlib
Hi, I agree that this is getting off-topic for this list. I'm answering here in some detail to lighten things up a bit regarding thin and thick wrappers, but please move further usage related questions to the cython-users mailing list. Paul Moore, 29.08.2011 12:37: On 29 August 2011 10:39, Stefan Behnel wrote: In the CPython backend, the header files are normally #included by the generated C code, so they are used at C compilation time. Cython has its own view on the header files in separate declaration files (.pxd). Basically looks like this: # file mymath.pxd cdef extern from "aheader.h": double PI double E double abs(double x) These declaration files usually only contain the parts of a header file that are used in the user code, either manually copied over or extracted by scripts (that's what I was referring to in my reply to Terry). The complete 'real' content of the header file is then used by the C compiler at C compilation time. The user code employs a cimport statement to import the declarations at Cython compilation time, e.g. # file mymodule.pyx cimport mymath print mymath.PI + mymath.E would result in C code that #includes "aheader.h", adds the C constants PI and E, converts the result to a Python float object and prints it out using the normal CPython machinery. One thing that would make it easier for me to understand the role of Cython in this context would be to see a simple example of the type of thin wrapper we're talking about here. The above code is nearly this, but the pyx file executes real code. Yes, that's the idea. If all you want is an exact, thin wrapper, you are better off with SWIG (well, assuming that performance is not important for you - Cython is a *lot* faster). But if you use it, or any other plain glue code generator, chances are that you will quickly learn that you do not actually want a thin wrapper. Instead, you want something that makes the external library easily and efficiently usable from Python code. 
Which means that the wrapper will be thin in some places and thick in others, sometimes very thick in selected places, and usually growing thicker over time. You can do this by using a glue code generator and writing the rest in a Python wrapper on top of the thin glue code. It's just that Cython makes such a wrapper much more efficient (for CPython), be it in terms of CPU performance (fast Python interaction, overhead-free C interaction, native C data type support, various Python code optimisations), or in terms of parallelisation support (explicit GIL-free threading and OpenMP), or just general programmer efficiency, e.g. regarding automatic data conversion or ease and safety of manual C memory management. For example, how do I simply expose pi and abs from math.h? Based on the above, I tried a pyx file containing just the code cdef extern from "math.h": double pi double abs(double x) but the resulting module exported no symbols. Recent Cython versions have support for directly exporting C values (e.g. enum values) at the Python module level. However, the normal way is to explicitly implement the module API as you guessed, i.e. cimport mydecls # assuming there is a mydecls.pxd PI = mydecls.PI def abs(x): return mydecls.abs(x) Looks simple, right? 
Nothing interesting here, until you start putting actual code into it, as in this (totally contrived and untested, but much more correct) example: from libc cimport math cdef extern from *: # these are defined by the always included Python.h: long LONG_MAX, LONG_MIN def abs(x): if isinstance(x, float): # -> C double return math.fabs(x) elif isinstance(x, int): # -> may or may not be a C integer if LONG_MIN <= x <= LONG_MAX: return <unsigned long>math.labs(x) else: # either within long long or raise OverflowError return <unsigned long long>math.llabs(x) else: # assume it can at least coerce to a C long, # or raise ValueError or OverflowError or whatever return <unsigned long>math.labs(x) BTW, there is some simple templating/generics-like type merging support upcoming in a GSoC to simplify this kind of type specific code. This is probably a bit off-topic, but it seems to me that whenever Cython comes up in these discussions, the implications of Cython-as-an-implementation-of-python obscure the idea of simply using Cython as a means of writing thin library wrappers. Cython is not a glue code generator, it's a full-fledged programming language. It's Python, with additional support for C data types. That makes it great for writing non-trivial wrappers between Python and C. It's not so great for the trivial cases, but luckily, those are rare. ;) I've kept python-dev in this response, on the assumption that
[Python-Dev] PEP categories (was Re: PEP 393 review)
On Aug 29, 2011, at 11:03 AM, Dirkjan Ochtman wrote: Also, this PEP makes me wonder if there should be a way to distinguish between language PEPs and (CPython) implementation PEPs, by adding a tag or using the PEP number ranges somehow. I've thought about this, and about a similar split between language changes and stdlib changes (i.e. new modules such as regex). Probably the best thing to do would be to allocate some 1000's to the different categories, like we did for the 3xxx Python 3k PEPS (now largely moot though). -Barry
Re: [Python-Dev] issue 6721 Locks in python standard library should be sanitized on fork
+3 (agreed to Jesse, Antoine and Ask here). The http://bugs.python.org/issue8713 described non-fork implementation that always uses subprocesses rather than plain forked processes is the right way forward for multiprocessing. I see two drawbacks: - it will be slower, since the interpreter startup time is non-negligible (well, normally you shouldn't spawn a new process for every item, but it should be noted) - it'll consume more memory, since we lose the COW advantage (even though it's already limited by the fact that even treating a variable read-only can trigger an incref, as was noted in a previous thread) cf
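For reference, the fork-vs-subprocess choice debated here later became selectable per context when the issue 8713 work landed as the start-method API in Python 3.4. A minimal sketch of the spawn path (the worker function and its message are illustrative):

```python
import multiprocessing as mp

def worker(q):
    q.put("hello from child")

if __name__ == "__main__":
    # "spawn" always execs a fresh interpreter: slower startup and no
    # copy-on-write sharing, but also no inherited locks or threads.
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # prints: hello from child
    p.join()
```

On POSIX, "fork" remains available through the same API, so code that needs the COW advantage can keep it.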
Re: [Python-Dev] PEP categories (was Re: PEP 393 review)
On Mon, Aug 29, 2011 at 18:24, Barry Warsaw ba...@python.org wrote: Also, this PEP makes me wonder if there should be a way to distinguish between language PEPs and (CPython) implementation PEPs, by adding a tag or using the PEP number ranges somehow. I've thought about this, and about a similar split between language changes and stdlib changes (i.e. new modules such as regex). Probably the best thing to do would be to allocate some 1000's to the different categories, like we did for the 3xxx Python 3k PEPS (now largely moot though). Allocating 1000's seems sensible enough to me. And yes, the division between recent 3.x and non-3.x PEPs seems quite arbitrary. Cheers, Dirkjan P.S. Perhaps the index could list accepted and open PEPs before meta and informational? And maybe reverse the order under some headings, for example in the finished category...
Re: [Python-Dev] PEP categories (was Re: PEP 393 review)
On Mon, 29 Aug 2011 18:38:23 +0200 Dirkjan Ochtman dirk...@ochtman.nl wrote: On Mon, Aug 29, 2011 at 18:24, Barry Warsaw ba...@python.org wrote: Also, this PEP makes me wonder if there should be a way to distinguish between language PEPs and (CPython) implementation PEPs, by adding a tag or using the PEP number ranges somehow. I've thought about this, and about a similar split between language changes and stdlib changes (i.e. new modules such as regex). Probably the best thing to do would be to allocate some 1000's to the different categories, like we did for the 3xxx Python 3k PEPS (now largely moot though). Allocating 1000's seems sensible enough to me. And yes, the division between recent 3.x and non-3.x PEPs seems quite arbitrary. I like the 3k numbers myself :))
Re: [Python-Dev] PEP categories (was Re: PEP 393 review)
Barry Warsaw, 29.08.2011 18:24: On Aug 29, 2011, at 11:03 AM, Dirkjan Ochtman wrote: Also, this PEP makes me wonder if there should be a way to distinguish between language PEPs and (CPython) implementation PEPs, by adding a tag or using the PEP number ranges somehow. I've thought about this, and about a similar split between language changes and stdlib changes (i.e. new modules such as regex). Probably the best thing to do would be to allocate some 1000's to the different categories, like we did for the 3xxx Python 3k PEPS (now largely moot though). These things tend to get somewhat clumsy over time, though. What about a stdlib change that only applies to CPython for some reason, e.g. because no other implementation currently has that module? I think it's ok to make a coarse-grained distinction by numbers, but there should also be a way to tag PEPs textually. Stefan
Re: [Python-Dev] issue 6721 Locks in python standard library should be sanitized on fork
2011/8/29 Charles-François Natali neolo...@free.fr: +3 (agreed to Jesse, Antoine and Ask here). The http://bugs.python.org/issue8713 described non-fork implementation that always uses subprocesses rather than plain forked processes is the right way forward for multiprocessing. I see two drawbacks: - it will be slower, since the interpreter startup time is non-negligible (well, normally you shouldn't spawn a new process for every item, but it should be noted) Yes; but spawning and forking are both slow to begin with - it's documented (I hope heavily enough) that you should spawn multiprocessing children early, and keep them around instead of constantly creating/destroying them. - it'll consume more memory, since we lose the COW advantage (even though it's already limited by the fact that even treating a variable read-only can trigger an incref, as was noted in a previous thread) cf Yes, it would consume slightly more memory; but the benefits - making it consistent across *all* platforms with the *same* restrictions gets us closer to the principle of least surprise.
Re: [Python-Dev] SWIG (was Re: Ctypes and the stdlib)
snip I've sometimes thought it might be interesting to create a Swig replacement purely in Python. When I work on the PLY project, this is often what I think about. In that project, I've actually built a number of the parsing tools that would be useful in creating such a thing. The only catch is that when I start thinking along these lines, I usually reach a point where I say nah, I'll just write the whole application in Python. Anyways, this is probably way more than anyone wants to know about Swig. Getting back to the original topic of using it to make standard library modules, I just don't know. I think you probably could have some success with an automatic code generator of some kind. I'm just not sure it should take the Swig approach of parsing C++ headers. I think you could do better. Dave, Having written a full C99 parser (http://code.google.com/p/pycparser/) based on your (excellent) PLY library, my impression is that the problem is with the problem, not with the solution. Strange sentence, I know :-) What I mean is that parsing C++ (even its headers) is inherently hard, which is why the solutions tend to grow so complex. Even with the modest C99, clean and simple solutions based on theoretical approaches (like PLY with its generated LALR parsers) tend to run into walls [*]. C++ is an order of magnitude harder. If I went to implement something like SWIG today, I would almost surely base my implementation on Clang (http://clang.llvm.org/). They have a full C++ parser (carefully hand-crafted, quite admirably keeping a relatively comprehensible code-base for such a task) used in a real compiler front-end, and a flexible library structure aimed at creating tools. There are also Python bindings that would allow to do most of the interesting Python-interface-specific work in Python - parse the C++ headers using Clang's existing parser into ASTs - then generate ctypes / extensions from that, *in Python*. The community is also gladly accepting contributions. 
I've had some fixes committed for the Python bindings and the C interfaces that tie them to Clang, and got the impression from Clang's core devs that further contributions will be most welcome. So whatever is missing from the Python bindings can be easily added. Eli [*] http://eli.thegreenplace.net/2011/05/02/the-context-sensitivity-of-c%E2%80%99s-grammar-revisited/
Re: [Python-Dev] issue 6721 Locks in python standard library should be sanitized on fork
On Mon, 29 Aug 2011 13:03:53 -0400 Jesse Noller jnol...@gmail.com wrote: 2011/8/29 Charles-François Natali neolo...@free.fr: +3 (agreed to Jesse, Antoine and Ask here). The http://bugs.python.org/issue8713 described non-fork implementation that always uses subprocesses rather than plain forked processes is the right way forward for multiprocessing. I see two drawbacks: - it will be slower, since the interpreter startup time is non-negligible (well, normally you shouldn't spawn a new process for every item, but it should be noted) Yes; but spawning and forking are both slow to begin with - it's documented (I hope heavily enough) that you should spawn multiprocessing children early, and keep them around instead of constantly creating/destroying them. I think fork() is quite fast on modern systems (e.g. Linux). exec() is certainly slow, though. The third drawback is that you are limited to picklable objects when specifying the arguments for your child process. This can be annoying if, for example, you wanted to pass an OS resource. Regards Antoine.
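The picklability restriction Antoine mentions is easy to demonstrate with the stdlib alone (the argument dict here is made up for illustration):

```python
import pickle
import threading

# Plain data survives the pickle round-trip that spawned (non-forked)
# children require for their arguments:
args = {"path": "/tmp/x", "retries": 3}
assert pickle.loads(pickle.dumps(args)) == args

# OS-backed objects such as locks do not; a forked child would simply
# inherit them, but they cannot be sent to a fresh process:
try:
    pickle.dumps(threading.Lock())
except TypeError:
    print("a Lock cannot be pickled")
```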
Re: [Python-Dev] issue 6721 Locks in python standard library should be sanitized on fork
On Mon, Aug 29, 2011 at 1:16 PM, Antoine Pitrou solip...@pitrou.net wrote: On Mon, 29 Aug 2011 13:03:53 -0400 Jesse Noller jnol...@gmail.com wrote: 2011/8/29 Charles-François Natali neolo...@free.fr: +3 (agreed to Jesse, Antoine and Ask here). The http://bugs.python.org/issue8713 described non-fork implementation that always uses subprocesses rather than plain forked processes is the right way forward for multiprocessing. I see two drawbacks: - it will be slower, since the interpreter startup time is non-negligible (well, normally you shouldn't spawn a new process for every item, but it should be noted) Yes; but spawning and forking are both slow to begin with - it's documented (I hope heavily enough) that you should spawn multiprocessing children early, and keep them around instead of constantly creating/destroying them. I think fork() is quite fast on modern systems (e.g. Linux). exec() is certainly slow, though. The third drawback is that you are limited to picklable objects when specifying the arguments for your child process. This can be annoying if, for example, you wanted to pass an OS resource. Regards Antoine. Yes, it is annoying; but again - this makes it more consistent with the windows implementation. I'd rather that restriction than the sanitization of the ability to use threading and multiprocessing alongside one another.
Re: [Python-Dev] issue 6721 Locks in python standard library should be sanitized on fork
Le lundi 29 août 2011 à 13:23 -0400, Jesse Noller a écrit : Yes, it is annoying; but again - this makes it more consistent with the windows implementation. I'd rather that restriction than the sanitization of the ability to use threading and multiprocessing alongside one another. That sanitization is generally useful, though. For example if you want to use any I/O after a fork(). Regards Antoine.
[Python-Dev] Python 3 optimizations continued...
Hi, pretty much a year ago I wrote about the optimizations I did for my PhD thesis that target the Python 3 series interpreters. While I got some replies, the discussion never really picked up and no final explicit conclusion was reached. AFAICT, because of the following two factors, my optimizations were not that interesting for inclusion with the distribution at that time: a) Unladen Swallow was targeting Python 3, too. b) My prototype did not pass the regression tests. As of November 2010 (IIRC), Google is not supporting work on US anymore, and the project is stalled. (If I am wrong and there is still activity and any plans with the corresponding PEP, please let me know.) Which is why I recently spent some time fixing issues so that I can run the regression tests. There is still some work to be done, but by and large it should be possible to complete all regression tests in reasonable time (with the actual infrastructure in place, enabling optimizations later on is not a problem at all, either.) So, the two big issues aside, is there any interest in incorporating these optimizations in Python 3? Have a nice day, --stefan PS: It probably is unusual, but in a part of my home page I have created a link to indicate interest (makes both counting and voting easier, http://www.ics.uci.edu/~sbruntha/) There were also links indicating interest in funding the work; I have disabled these, so as not to upset anybody or make the impression of begging for money...
Re: [Python-Dev] issue 6721 Locks in python standard library should be sanitized on fork
On Mon, Aug 29, 2011 at 1:22 PM, Antoine Pitrou solip...@pitrou.net wrote: Le lundi 29 août 2011 à 13:23 -0400, Jesse Noller a écrit : Yes, it is annoying; but again - this makes it more consistent with the windows implementation. I'd rather that restriction than the sanitization of the ability to use threading and multiprocessing alongside one another. That sanitization is generally useful, though. For example if you want to use any I/O after a fork(). Oh! I don't disagree; I'm just against the removal of the ability to mix multiprocessing and threads; which it does internally and others do in every day code. The proposed removal of that functionality - using the two together - would leave users in the dust, and not needed if we patch http://bugs.python.org/issue8713 - which at its core is just an additional flag. We could document the risk(s) of using the fork() mechanism, which has to remain the default for some time. The point is that the solution to http://bugs.python.org/issue6721 should not be intertwined with or cause a severe change in the multiprocessing module (e.g. rewriting from scratch), etc. I'm not arguing that both bugs should not be fixed. jesse
Re: [Python-Dev] Python 3 optimizations continued...
2011/8/29 stefan brunthaler s.bruntha...@uci.edu: So, the two big issues aside, is there any interest in incorporating these optimizations in Python 3? Perhaps there would be something to say given patches/overviews/specifics. -- Regards, Benjamin
Re: [Python-Dev] issue 6721 Locks in python standard library should be sanitized on fork
On Mon, Aug 29, 2011 at 8:16 PM, Antoine Pitrou solip...@pitrou.net wrote: On Mon, 29 Aug 2011 13:03:53 -0400 Jesse Noller jnol...@gmail.com wrote: Yes; but spawning and forking are both slow to begin with - it's documented (I hope heavily enough) that you should spawn multiprocessing children early, and keep them around instead of constantly creating/destroying them. I think fork() is quite fast on modern systems (e.g. Linux). exec() is certainly slow, though. On my system, the time it takes worker code to start is:

  40 usec with thread.start_new_thread
  240 usec with threading.Thread().start
  450 usec with os.fork
  1 ms with multiprocessing.Process.start
  25 ms with subprocess.Popen to start a trivial script

so os.fork has similar latency to threading.Thread().start, while spawning is 100 times slower.
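Numbers like those above can be approximated with a rough benchmark (results are machine-dependent; the helper functions are illustrative, not how the poster measured):

```python
import os
import subprocess
import sys
import threading
import time

def timed(fn, n=20):
    # Average wall-clock time per call over n runs.
    t0 = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - t0) / n

def run_thread():
    t = threading.Thread(target=lambda: None)
    t.start()
    t.join()

def run_fork():
    pid = os.fork()
    if pid == 0:
        os._exit(0)          # child exits immediately
    os.waitpid(pid, 0)

def run_spawn():
    # Start a fresh interpreter running a trivial script.
    subprocess.run([sys.executable, "-c", "pass"])

results = {"thread": timed(run_thread), "spawn": timed(run_spawn)}
if hasattr(os, "fork"):      # fork is POSIX-only
    results["fork"] = timed(run_fork)
for name, dt in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {dt * 1e6:.0f} usec")
```

The interpreter-startup cost dominates the spawn case, which is the first drawback Charles-François listed.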
Re: [Python-Dev] Python 3 optimizations continued...
Perhaps there would be something to say given patches/overviews/specifics. Currently I don't have patches, but for an overview and specifics, I can provide the following: * My optimizations basically rely on quickening to incorporate run-time information. * I use two separate instruction dispatch routines, and use profiling to switch from the regular Python 3 dispatch routine to an optimized one (the implementation is actually vice versa, but that is not important now) * The optimized dispatch routine has a changed instruction format (word-sized instead of bytecodes) that allows for regular instruction decoding (without the HAS_ARG-check) and inlining of some objects in the instruction format on 64-bit architectures. * I use inline-caching based on quickening (passes almost all regression tests [302 out of 307]), eliminate reference count operations using quickening (passes but has a memory leak), promote frequently accessed local variables to their dedicated instructions (passes), and cache LOAD_GLOBAL/LOAD_NAME objects in the instruction encoding when possible (I am working on this right now.) The changes I made can be summarized as: * I changed some header files to accommodate additional information (Python.h, ceval.h, code.h, frameobject.h, opcode.h, tupleobject.h) * I changed mostly abstract.c to incorporate runtime-type feedback. * All other changes target mostly ceval.c and all supplementary code is in a sub-directory named opt and all generated files in a sub-directory within that (opt/gen). * I have a code generator in place that takes care of generating all the functions; it uses the Mako template system for creating C code and does not necessarily need to be shipped with the interpreter (though one can play around and experiment with it.) So, all in all, the changes are not that big to the actual implementation, and most of the code is generated (using sloccount, opt has 1990 lines of C, and opt/gen has 8649 lines of C). 
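A toy model of the quickening idea described in the overview (the frame/instruction representation here is invented purely for illustration; the real work operates on CPython's bytecode in ceval.c):

```python
# Quickening: a generic instruction observes its operand types at run
# time and rewrites itself, in place, into a type-specialized version;
# a later type miss deoptimizes the slot back to the generic form.
def generic_add(frame, i):
    a, b = frame["stack"][-2:]
    if type(a) is int and type(b) is int:
        frame["code"][i] = int_add        # quicken this instruction slot
    frame["stack"][-2:] = [a + b]

def int_add(frame, i):
    a, b = frame["stack"][-2:]
    if not (type(a) is int and type(b) is int):
        frame["code"][i] = generic_add    # deoptimize on a type miss
        return generic_add(frame, i)
    frame["stack"][-2:] = [a + b]         # fast path: no dispatch on type

frame = {"code": [generic_add], "stack": [2, 3]}
frame["code"][0](frame, 0)
assert frame["stack"] == [5]
assert frame["code"][0] is int_add        # the slot was specialized
```

Inline caching works the same way: the observed callee or global is stored in the rewritten instruction so later executions skip the lookup.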
That's a quick summary; if there are any further or more in-depth questions, let me know. best, --stefan
Re: [Python-Dev] SWIG (was Re: Ctypes and the stdlib)
On Mon, Aug 29, 2011 at 7:14 PM, Eli Bendersky eli...@gmail.com wrote: snip I've sometimes thought it might be interesting to create a Swig replacement purely in Python. When I work on the PLY project, this is often what I think about. In that project, I've actually built a number of the parsing tools that would be useful in creating such a thing. The only catch is that when I start thinking along these lines, I usually reach a point where I say nah, I'll just write the whole application in Python. Anyways, this is probably way more than anyone wants to know about Swig. Getting back to the original topic of using it to make standard library modules, I just don't know. I think you probably could have some success with an automatic code generator of some kind. I'm just not sure it should take the Swig approach of parsing C++ headers. I think you could do better. Dave, Having written a full C99 parser (http://code.google.com/p/pycparser/) based on your (excellent) PLY library, my impression is that the problem is with the problem, not with the solution. Strange sentence, I know :-) What I mean is that parsing C++ (even its headers) is inherently hard, which is why the solutions tend to grow so complex. Even with the modest C99, clean and simple solutions based on theoretical approaches (like PLY with its generated LALR parsers) tend to run into walls [*]. C++ is an order of magnitude harder. If I went to implement something like SWIG today, I would almost surely base my implementation on Clang (http://clang.llvm.org/). They have a full C++ parser (carefully hand-crafted, quite admirably keeping a relatively comprehensible code-base for such a task) used in a real compiler front-end, and a flexible library structure aimed at creating tools. 
There are also Python bindings that would allow to do most of the interesting Python-interface-specific work in Python - parse the C++ headers using Clang's existing parser into ASTs - then generate ctypes / extensions from that, *in Python*. The community is also gladly accepting contributions. I've had some fixes committed for the Python bindings and the C interfaces that tie them to Clang, and got the impression from Clang's core devs that further contributions will be most welcome. So whatever is missing from the Python bindings can be easily added. Agreed, I know some people have looked into that direction in the scientific python community (to generate .pxd for cython). I wrote one of the hacks Stefan referred to (based on ctypeslib using gccxml), and using clang makes so much more sense. To go back to the initial issue, using cython to wrap C code makes a lot of sense. In the scipy community, I believe there is broad agreement that most code which would require C/C++ should be done in cython instead (numpy and scipy already do so a bit). I personally cannot see many situations where writing wrappers in C by hand works better than cython (especially since cython handles python2/3 automatically for you). cheers, David
Re: [Python-Dev] PEP categories (was Re: PEP 393 review)
On Aug 29, 2011, at 06:55 PM, Stefan Behnel wrote: These things tend to get somewhat clumsy over time, though. What about a stdlib change that only applies to CPython for some reason, e.g. because no other implementation currently has that module? I think it's ok to make a coarse-grained distinction by numbers, but there should also be a way to tag PEPs textually. Yeah, the categories would be pretty coarse grained, and their orthogonality would cause classification problems. I suppose we could use some kind of hashtag approach. OTOH, I'm not entirely sure it's worth it either. ;) I think we'd need a concrete proposal and someone willing to hack the PEP0 autogen tools. -Barry
Re: [Python-Dev] PEP categories (was Re: PEP 393 review)
On Aug 29, 2011, at 06:40 PM, Antoine Pitrou wrote: I like the 3k numbers myself :)) Me too. :) But I think we've pretty much abandoned that convention for any new PEPs. Well, until Guido announces Python 4k. :) -Barry
Re: [Python-Dev] LZMA compression support in 3.3
I've updated the issue http://bugs.python.org/issue6715 with a patch containing my work so far - the LZMACompressor and LZMADecompressor classes, along with some tests. These two classes should provide a fairly complete interface to liblzma; it will be possible to implement LZMAFile on top of them, entirely in Python. Note that the C code does no I/O; this will be handled by LZMAFile. Please take a look, and let me know what you think. Cheers, Nadeem
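The module described here did ship as lzma in Python 3.3; a quick round trip through the final API, including the incremental LZMACompressor class mentioned above (the sample data is arbitrary):

```python
import lzma

data = b"python-dev " * 10_000

# One-shot API:
compressed = lzma.compress(data)
assert lzma.decompress(compressed) == data
assert len(compressed) < len(data)

# Incremental API, matching the LZMACompressor class in the patch:
c = lzma.LZMACompressor()
out = c.compress(data[: len(data) // 2])
out += c.compress(data[len(data) // 2 :])
out += c.flush()
assert lzma.decompress(out) == data
```

LZMAFile was indeed later implemented on top of these classes in pure Python, as proposed.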
Re: [Python-Dev] PEP 393 review
Am 29.08.2011 11:03, schrieb Dirkjan Ochtman: On Sun, Aug 28, 2011 at 21:47, Martin v. Löwis mar...@v.loewis.de wrote: result strings. In PEP 393, a buffer must be scanned for the highest code point, which means that each byte must be inspected twice (a second time when the copying occurs). This may be a silly question: are there things in place to optimize this for the case where two strings are combined? E.g. highest character in combined string is max(highest character in either of the strings). Unicode_Concat goes like this:

    maxchar = PyUnicode_MAX_CHAR_VALUE(u);
    if (PyUnicode_MAX_CHAR_VALUE(v) > maxchar)
        maxchar = PyUnicode_MAX_CHAR_VALUE(v);

    /* Concat the two Unicode strings */
    w = (PyUnicodeObject *) PyUnicode_New(
        PyUnicode_GET_LENGTH(u) + PyUnicode_GET_LENGTH(v), maxchar);
    if (w == NULL)
        goto onError;
    PyUnicode_CopyCharacters(w, 0, u, 0, PyUnicode_GET_LENGTH(u));
    PyUnicode_CopyCharacters(w, PyUnicode_GET_LENGTH(u), v, 0, PyUnicode_GET_LENGTH(v));

Also, this PEP makes me wonder if there should be a way to distinguish between language PEPs and (CPython) implementation PEPs, by adding a tag or using the PEP number ranges somehow. Well, no. This would equally apply to every single patch, and is just not feasible. Instead, alternative implementations typically target a CPython version, and then find out what features they need to implement to claim conformance. Regards, Martin
Re: [Python-Dev] Should we move to replace re with regex?
On 8/29/2011 9:00 AM, Barry Warsaw wrote: On Aug 27, 2011, at 07:11 PM, Martin v. Löwis wrote: A PEP should IMO only cover end-user aspects of the new re module. Code organization is typically not in the PEP. To give a specific example: you mentioned that there is (near) code duplication in MRAB's module. As a reviewer, I would discuss whether this can be eliminated - but not in the PEP. +1 I think at this point we need a tracker issue to which such reviews can be attached, for safe-keeping, even if most discussion continues here. -- Terry Jan Reedy
Re: [Python-Dev] SWIG (was Re: Ctypes and the stdlib)
Then there is gccxml, although I'm not sure how active it is now.
Re: [Python-Dev] PEP 393 review
Those haven't been ported to the new API, yet. Consider, for example, d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test; with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this is a 25% speedup for PEP 393. If I understand correctly, the performance now highly depends on the characters used? A pure ASCII string is faster than a string with characters in the ISO-8859-1 charset? How did you infer that from the above paragraph??? ASCII and Latin-1 are mostly identical in terms of performance - the ASCII decoder should be slightly slower than the Latin-1 decoder, since the ASCII decoder needs to check for errors, whereas the Latin-1 decoder will never be confronted with errors. What matters is a) is the codec already rewritten to use the new representation, or must it go through Py_UNICODE[] first, requiring then a second copy to the canonical form? and b) what is the cost of finding out the highest character - regardless of what the highest character turns out to be. Is it also true for BMP characters vs non-BMP characters? Well... If you are talking about the ASCII and Latin-1 codecs - neither of these support most BMP characters, let alone non-BMP characters. In general, non-BMP characters are more expensive to process since they take more space. Do these benchmark tools use only ASCII characters, or also some ISO-8859-1 characters? See for yourself. iobench uses Latin-1, including non-ASCII, but not non-Latin-1. Or, better, different Unicode ranges in different tests? That's why I asked for a list of benchmarks to perform. I cannot run an infinite number of benchmarks prior to adoption of the PEP. Regards, Martin
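Point (b), the cost of finding the highest character, can be illustrated in pure Python - a sketch of the rule PEP 393 uses to pick a representation, not CPython's actual C code:

```python
def pep393_kind(s):
    # The width PEP 393 picks for a string is driven entirely by its
    # highest code point, so every new string requires one full scan.
    maxchar = max(map(ord, s), default=0)
    if maxchar < 128:
        return "ASCII (1 byte/char)"
    elif maxchar < 256:
        return "Latin-1 (1 byte/char)"
    elif maxchar < 0x10000:
        return "UCS-2 (2 bytes/char)"
    return "UCS-4 (4 bytes/char)"

assert pep393_kind("abc") == "ASCII (1 byte/char)"
assert pep393_kind("caf\xe9") == "Latin-1 (1 byte/char)"
assert pep393_kind("\u20ac") == "UCS-2 (2 bytes/char)"
assert pep393_kind("\U0001F600") == "UCS-4 (4 bytes/char)"
```

Note that the scan itself costs the same whatever the answer turns out to be, which is Martin's point: the maxchar search is a fixed per-string overhead.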
Re: [Python-Dev] issue 6721 Locks in python standard library should be sanitized on fork
On Mon, Aug 29, 2011 at 8:42 PM, Jesse Noller jnol...@gmail.com wrote: On Mon, Aug 29, 2011 at 1:22 PM, Antoine Pitrou solip...@pitrou.net wrote: That sanitization is generally useful, though. For example if you want to use any I/O after a fork(). Oh! I don't disagree; I'm just against the removal of the ability to mix multiprocessing and threads; which it does internally and others do in every day code. I am not familiar with the python-dev definition for deprecation, but when I used the word in the bug discussion I meant to advertise to users that they should not mix threading and forking since that mix is and will remain broken by design; I did not mean removal or crippling of functionality. “When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.” - Through the Looking-Glass (btw, my tone is not scornful) And there is no way around it - the mix in general is broken, with an atfork mechanism or without it. People can choose to keep doing it in their every day code at their own risk, be it significantly high or insignificantly low. But the documentation should explain the problem clearly. As for the internal use of threads in the multiprocessing module I proposed a potential way to sanitize those particular worker threads: http://bugs.python.org/issue6721#msg140402 If it makes sense and entails changes to internal multiprocessing worker threads, those changes could be applied as bug fixes to Python 2.x and previous Python 3.x releases. This does not contradict adding now the feature to spawn, and to make it the only possibility in the future. I agree that this is the saner approach but it is a new feature not a bug fix. Nir
Re: [Python-Dev] PEP 393 review
tl;dr: PEP-393 reduces the memory usage for strings of a very small Django app from 7.4MB to 4.4MB, all other objects taking about 1.9MB. Am 26.08.2011 16:55, schrieb Guido van Rossum: It would be nice if someone wrote a test to roughly verify these numbers, e.g. by allocating lots of strings of a certain size and measuring the process size before and after (being careful to adjust for the list or other data structure required to keep those objects alive). I have now written a Django application to measure the effect of PEP 393, using the debug mode (to find all strings), and sys.getsizeof: https://bitbucket.org/t0rsten/pep-393/src/ad02e1b4cad9/pep393utils/djmemprof/count/views.py The results for 3.3 and pep-393 are attached. The Django app is small in every respect: trivial ORM, very few objects (just for the sake of exercising the ORM at all), no templating, short strings. The memory snapshot is taken in the middle of a request. The tests were run on a 64-bit Linux system with 32-bit Py_UNICODE. The tally of strings by length confirms that both tests have indeed comparable sets of objects (not surprising since it is identical Django source code and the identical application). Most strings in this benchmark are shorter than 16 characters, and a few have several thousand characters. The tally of byte lengths shows that it's the really long memory blocks that are gone with the PEP. Digging into the internal representation, it's possible to estimate unaccounted bytes. For PEP 393:

    bytes - 80*strings - (chars+strings) = 190053

This is the total of the wchar_t and UTF-8 representations for objects that have them, plus any 2-byte and 4-byte strings accounted incorrectly in the above formula. Unfortunately, for default,

    bytes + 56*strings - 4*(chars+strings) = 0

as unicode.__sizeof__ doesn't account for the (separate) PyBytes object that may carry the default encoding. So in practice, the 3.3 number should be somewhat larger.
In both cases, the app didn't account for internal fragmentation; this would be possible by rounding up each string size to the next multiple of 8 (given that it's all allocated through the object allocator). It should be possible to squeeze a little bit out of the 190kB, by finding objects for which the wchar_t or UTF-8 representations are created unnecessarily. Regards, Martin

3.3.0a0 (default:45b63a8a76c9, Aug 29 2011, 21:45:49) [GCC 4.6.1 20110526 (prerelease)]
Strings: 36075  Chars: 1303746  Bytes: 7379484  Other objects: 1906432
By Length (length: numstrings): Up to 4: 5710, Up to 8: 8997, Up to 16: 11657, Up to 32: 4267, Up to 64: 2319, Up to 128: 1373, Up to 256: 828, Up to 512: 558, Up to 1024: 233, Up to 2048: 104, Up to 4096: 23, Up to 8192: 5, Up to 16384: 0, Up to 32768: 1
By Size (size: numstrings): Up to 40: 0, Up to 80: 7913, Up to 160: 21796, Up to 320: 3317, Up to 640: 1452, Up to 1280: 847, Up to 2560: 482, Up to 5120: 183, Up to 10240: 65, Up to 20480: 18, Up to 40960: 1, Up to 81920: 1

3.3.0a0 (pep-393:6ffa3b569228, Aug 29 2011, 22:00:31) [GCC 4.6.1 20110526 (prerelease)]
Strings: 36091  Chars: 1304098  Bytes: 4417522  Other objects: 1866616
By Length (length: numstrings): Up to 4: 5728, Up to 8: 8997, Up to 16: 11658, Up to 32: 4239, Up to 64: 2335, Up to 128: 1382, Up to 256: 828, Up to 512: 558, Up to 1024: 233, Up to 2048: 104, Up to 4096: 23, Up to 8192: 5, Up to 16384: 0, Up to 32768: 1
By Size (size: numstrings): Up to 40: 0, Up to 80: 0, Up to 160: 33247, Up to 320: 1500, Up to 640: 1007, Up to 1280: 226, Up to 2560: 86, Up to 5120: 21, Up to 10240: 3, Up to 20480: 1
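The "Up to N" tallies above can be reproduced with a few lines of Python. This is a sketch of the counting step in the linked views.py; the actual discovery of all live strings via Django's debug mode is not shown. Lengths use buckets 4, 8, 16, ... and sizes use 40, 80, 160, ...

```python
import sys
from collections import Counter

def bucket(n, base):
    # Smallest base*2**k that is >= n: the "Up to N" edge for value n.
    b = base
    while b < n:
        b *= 2
    return b

def tally(strings):
    by_len = Counter(bucket(len(s), 4) for s in strings)
    by_size = Counter(bucket(sys.getsizeof(s), 40) for s in strings)
    return by_len, by_size

by_len, by_size = tally(["spam", "x" * 30, "y" * 1000])
assert by_len == {4: 1, 32: 1, 1024: 1}
```

The by-size tally depends on the interpreter build, which is exactly what makes it useful for comparing a Py_UNICODE build against a PEP 393 build.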
Re: [Python-Dev] Python 3 optimizations continued...
So, the two big issues aside, is there any interest in incorporating these optimizations in Python 3? The question really is whether this is an all-or-nothing deal. If you could identify smaller parts that can be applied independently, interest would be higher. Also, I'd be curious whether your techniques help or hinder a potential integration of a JIT generator. Regards, Martin
Re: [Python-Dev] PEP 393 review
Martin v. Löwis wrote: tl;dr: PEP-393 reduces the memory usage for strings of a very small Django app from 7.4MB to 4.4MB, all other objects taking about 1.9MB. [...] The tests were run on a 64-bit Linux system with 32-bit Py_UNICODE. For comparison, could you run the test of the unmodified Python 3.3 on a 16-bit Py_UNICODE version as well ? Thanks, -- Marc-Andre Lemburg
Re: [Python-Dev] PEP 393 review
On Mon, 29 Aug 2011 22:32:01 +0200 Martin v. Löwis mar...@v.loewis.de wrote: I have now written a Django application to measure the effect of PEP 393, using the debug mode (to find all strings), and sys.getsizeof: https://bitbucket.org/t0rsten/pep-393/src/ad02e1b4cad9/pep393utils/djmemprof/count/views.py The results for 3.3 and pep-393 are attached. This looks very nice. Is 3.3 a wide build? (how about a narrow build?) (is it with your own port of Django to py3k, or is there an official branch for it?) Regards Antoine.
Re: [Python-Dev] Python 3 optimizations continued...
The question really is whether this is an all-or-nothing deal. If you could identify smaller parts that can be applied independently, interest would be higher. Well, it's not an all-or-nothing deal. In my current architecture, I can selectively enable most of the optimizations as I see fit. The only pre-requisite (in my implementation) is that I have two dispatch loops with a changed instruction format. It is, however, not a technical necessity, just the way I implemented it. Basically, you can choose whatever you like best, and I could extract that part. I am just offering to add all the things that I have done :) Also, I'd be curious whether your techniques help or hinder a potential integration of a JIT generator. This is something I have previously frequently discussed with several JIT people. IMHO, having my optimizations in-place also helps a JIT compiler, since it can re-use the information I gathered to generate more aggressively optimized native machine code right away (the inline caches can be generated with the type information right away, some functions could be inlined with the guard statements subsumed, etc.) Another benefit could be that the JIT compiler can spend longer time on generating code, because the interpreter is already faster (so in some cases it would probably not make sense to include a non-optimizing fast and simple JIT compiler). There are others on the list, who probably can/want to comment on this, too. That aside, I think that while having a JIT is an important goal, I can very well imagine scenarios where the additional memory consumption (for the generated native machine code) of a JIT for each process (I assume that the native machine code caches are not shared) hinders scalability. 
I have in fact no data to back this up, but I think that would be an interesting trade-off: say, a 30% gain in performance without substantial additional memory requirements on existing hardware, versus higher achievable speedups that require more machines. Regards, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
On Mon, 29 Aug 2011 11:33:14 -0700 stefan brunthaler s.bruntha...@uci.edu wrote: * The optimized dispatch routine has a changed instruction format (word-sized instead of bytecodes) that allows for regular instruction decoding (without the HAS_ARG-check) and inlining of some objects in the instruction format on 64-bit architectures. Having a word-sized bytecode format would probably be acceptable in itself, so if you want to submit a patch for that, go ahead. Regards Antoine.
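The decoding difference being discussed can be sketched in Python (illustrative only, not the actual patch): classic CPython bytecode is variable length - a 1-byte opcode, followed by a 2-byte argument only when the opcode is >= HAVE_ARGUMENT - so the dispatch loop must branch on HAS_ARG, while a word-sized format packs (opcode, arg) into every instruction and makes the fetch uniform.

```python
HAVE_ARGUMENT = 90  # CPython's threshold at the time

def decode_variable(code):
    # Variable-length decoding: the HAS_ARG check on every instruction.
    i = 0
    while i < len(code):
        op = code[i]; i += 1
        if op >= HAVE_ARGUMENT:
            arg = code[i] | (code[i + 1] << 8); i += 2
        else:
            arg = None
        yield op, arg

def decode_words(words):
    # Word-sized decoding: one fixed-size fetch, no branch.
    for w in words:
        yield w & 0xFF, w >> 8

assert list(decode_variable(bytes([1, 100, 5, 0]))) == [(1, None), (100, 5)]
assert list(decode_words([1, (5 << 8) | 100])) == [(1, 0), (100, 5)]
```

The uniform format also leaves room in the word for inlining small operands directly, which is the second half of the quoted point.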
Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)
Guido van Rossum wrote: (Just like Python's own .h files -- e.g. the extensive renaming of the Unicode APIs depending on narrow/wide build) How does Cython deal with these? Pyrex/Cython deal with it by generating C code that includes the relevant headers, so the C compiler expands all the macros, interprets the struct declarations, etc. All you need to do when writing the .pyx file is follow the same API that you would if you were writing C code to use the library. -- Greg
Re: [Python-Dev] PEP 3151 from the BDFOP
On Aug 24, 2011, at 12:51 PM, Nick Coghlan wrote: On Wed, Aug 24, 2011 at 9:57 AM, Antoine Pitrou solip...@pitrou.net wrote: Using IOError.__new__ is the easiest way to ensure that all code raising IO errors takes advantage of the errno mapping. Otherwise you may get APIs raising the proper subclasses, and other APIs always raising base IOError (it doesn't happen often, but some Python library code raises an IOError with an explicit errno). It's also the natural place to put the errno-exception type mapping so that existing code will raise the new errors without requiring modification. We could spell it as a new class method (from_errno or similar), but there isn't any ambiguity in doing it directly in __new__, so a class method seems pointlessly inconvenient. As I mentioned, my main concern with this is the surprise factor for people debugging and reading the code. A class method would solve that, but looks uglier and doesn't work with existing code. -Barry
Re: [Python-Dev] PEP 3151 from the BDFOP
On Aug 24, 2011, at 01:57 AM, Antoine Pitrou wrote: One guiding principle for me is that we should keep the abstraction as thin as possible. In particular, I'm concerned about mapping multiple errnos into a single Error. For example both EPIPE and ESHUTDOWN mapping to BrokenPipeError, or EACCES or EPERM to PermissionError. I think we should resist this, so that one errno maps to exactly one Error. Where grouping is desired, Python already has mechanisms to deal with that, e.g. superclasses and multiple inheritance. Therefore, I think it would be better to have + FileSystemPermissionError + AccessError (EACCES) + PermissionError (EPERM) I'm not sure that's a good idea: Was it the specific grouping under FileSystemPermissionError that you're objecting to, or the keep the abstraction thin principle? Let's say we threw out the idea of FSPE superclass, would you still want to collapse EACCES and EPERM into PermissionError, or would separate exceptions for each be okay? It's still pretty easy to catch both in one except clause, and it won't be too annoying if it's rare. Yes, FileSystemError might be removed. I thought that it would be useful, in some library routines, to catch all filesystem-related errors indistinctly, but it's not a complete catchall actually (for example, AccessError is outside of the FileSystemError subtree). Reading your IRC message (sorry, I was afk) it sounds like you think FileSystemError can be removed. I like keeping the hierarchy flat. Similarly, I think it would be helpful to have the errno name (e.g. ENOENT) in the error message string. That way, it won't get in the way for most code, but would be usefully printed out for uncaught exceptions. Agreed, but I think that's a feature request quite orthogonal to the PEP. The errno *number* is still printed as it was before:

    >>> open("foo")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    FileNotFoundError: [Errno 2] No such file or directory: 'foo'

(see e.g. 
http://bugs.python.org/issue12762) True, but since you're going to be creating a bunch of new exception classes, it should be relatively painless to give them a better str. Thanks for pointing out that bug; I agree with it. A second guiding principle should be that careful code that works in Python 3.2 must continue to work in Python 3.3 once PEP 3151 is accepted, but also for Python 2 code ported straight to Python 3.3. I don't think porting straight to 3.3 would make a difference, especially now that the idea of deprecating old exception names has been abandoned. Cool. Do be prepared for complaints about compatibility for careless code though - there's a ton of that out in the wild, and people will always complain when their working code breaks due to an upgrade. Be *very* explicit about this in the release notes and NEWS file, and put your asbestos underoos on. I'll take care about that :) :) Have you considered the impact of this PEP on other Python implementations? My hazy memory of Jython tells me that errnos don't really leak into Java and thus Jython much, but what about PyPy and IronPython? E.g. step 1's deprecation strategy seems pretty CPython-centric. Alternative implementations already have to implement errno codes in one way or another if they want to have a chance of running existing code. So I don't think the PEP makes much of a difference for them. But their implementors can give their opinion on this. Let's give them a little more time to chime in (hopefully, they are reading this thread). We needn't wait too long though. As for step 1 (coalescing the errors). This makes sense and I'm generally agreeable, but I'm wondering whether it's best to re-use IOError for this rather than introduce a new exception. Not that I can think of a good name for that. I'm just not totally convinced that existing code when upgrading to Python 3.3 won't introduce silent failures. 
If an existing error is to be re-used for this, I'm torn on whether IOError or OSError is a better choice. Popularity aside, OSError *feels* more right. I don't have any personal preference. Previous discussions seemed to indicate people preferred IOError. But changing the implementation to OSError would be simple. I agree OSError feels slightly more right, as in more generic. Thanks for making this change in the PEP. And that anything raising an exception (e.g. via PyErr_SetFromErrno) other than the new ones will raise IOError? I'm not sure I understand the question precisely. My question mostly was about raising OSError (as the current PEP states) with an errno that does *not* map to one of the new exceptions. In that case, I don't think there's anything you could raise other than exactly OSError, right? The errno mapping mechanism is implemented in IOError.__new__, but it gets called only if the class is exactly IOError, not a subclass:

    >>> IOError(errno.EPERM, "foo")
    PermissionError(1, 'foo')
    >>> class MyIOError(IOError):
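The mechanism Antoine describes can be sketched in pure Python (illustrative class and mapping names only; the real logic lives in C, in the exception type's __new__):

```python
import errno

class Error(Exception):
    def __new__(cls, *args):
        # Only the exact base class remaps; subclasses are left alone,
        # matching the behavior shown in the session above.
        if cls is Error and len(args) >= 2:
            cls = _errno_map.get(args[0], cls)
        return super().__new__(cls)

    def __init__(self, *args):
        self.args = args
        self.errno = args[0] if len(args) >= 2 else None

class PermissionLikeError(Error):
    pass

# Hypothetical errno -> subclass table.
_errno_map = {errno.EPERM: PermissionLikeError, errno.EACCES: PermissionLikeError}

e = Error(errno.EPERM, "operation not permitted")
assert type(e) is PermissionLikeError            # base-class call maps to subclass
assert type(PermissionLikeError(errno.ENOENT, "x")) is PermissionLikeError
assert Error("just a message").errno is None     # no errno, no remapping
```

Putting the mapping in __new__ rather than a class method is what lets existing `raise IOError(errno, msg)` call sites produce the new subclasses without modification, which is exactly the trade-off Barry and Nick are weighing.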
Re: [Python-Dev] SWIG (was Re: Ctypes and the stdlib)
Thanks for an insightful post, Dave! I took the liberty of mentioning it on Google+: https://plus.google.com/115212051037621986145/posts/NyEiLEfR6HF (PS. Anyone wanting a G+ invite, go here: https://plus.google.com/i/7w3niYersIA:8fxDrfW-6TA ) --Guido On Mon, Aug 29, 2011 at 5:41 AM, David Beazley d...@dabeaz.com wrote: On Mon, Aug 29, 2011 at 12:27 PM, Guido van Rossum gu...@python.org wrote: I wonder if for this particular purpose SWIG isn't the better match. (If SWIG weren't universally hated, even by its original author. :-) Hate is probably a strong word, but as the author of Swig, let me chime in here ;-). I think there are probably some lessons to be learned from Swig. As Nick noted, Swig is best suited when you have control over both sides (C/C++ and Python) of whatever code you're working with. In fact, the original motivation for Swig was to give application programmers (scientists in my case) a means for automatically generating the Python bindings to their code. However, there was one other important assumption--and that was the fact that all of your real code was going to be written in C/C++ and that the Python scripting interface was just an optional add-on (perhaps even just a throw-away thing). Keep in mind, Swig was first created in 1995 and at that time, the use of Python (or any similar language) was a pretty radical idea in the sciences. Moreover, there was a lot of legacy code that people just weren't going to abandon. Thus, I always viewed Swig as a kind of transitional vehicle for getting people to use Python who might otherwise not even consider it. Getting back to Nick's point though, to really use Swig effectively, it was always known that you might have to reorganize or refactor your C/C++ code to make it more Python friendly. However, due to the automatic wrapper generation, you didn't have to do it all at once. Basically your code could organically evolve and Swig would just keep up with whatever you were doing. 
In my projects, we'd usually just tuck Swig away in some Makefile somewhere and forget about it. One of the major complexities of Swig is the fact that it attempts to parse C/C++ header files. This very notion is actually a dangerous trap waiting for anyone who wants to wander into it. You might look at a header file and say, well, how hard could it be to just grab a few definitions out of there? I'll just write a few regexes or come up with some simple hack for recognizing function definitions or something. Yes, you can do that, but you're immediately going to find that whatever approach you take starts to break down into horrible corner cases. Swig started out like this and quickly turned into a quagmire of esoteric bug reports. All sorts of problems with preprocessor macros, typedefs, missing headers, and other things. For a while, I would get these bug reports that would go something like I had this C++ class inside a namespace with an abstract method taking a typedef'd const reference to this smart pointer ... and Swig broke. Hell, I can't even understand the bug report, let alone know how to fix it. Almost all of these bugs were due to the fact that Swig started out as a hack and didn't really have any kind of solid conceptual foundation for how it should be put together. If you flash forward a bit, from about 2001-2004 there was a very serious push to fix these kinds of issues. Although it was not a complete rewrite of Swig, there were a huge number of changes to how it worked during this time. Swig grew a fully compatible C++ preprocessor that fully supported macros. A complete C++ type system was implemented, including support for namespaces, templates, and even such things as template partial specialization. Swig evolved into a multi-pass compiler that was doing all sorts of global analysis of the interface. Just to give you an idea, Swig would do things such as automatically detect/wrap C++ smart pointers. It could wrap overloaded C++ methods/functions. 
Also, if you had a C++ class with virtual methods, it would only make one Python wrapper function and then reuse it across all wrapped subclasses. Under the covers of all of this, the implementation basically evolved into a sophisticated macro preprocessor coupled with a pattern matching engine built on top of the C++ type system. For example, you could write patterns that matched specific C++ types (the much hated typemap feature) and you could write patterns that matched entire C++ declarations. This whole pattern matching approach had huge power if you knew what you were doing. For example, I had a graduate student working on adding contracts to Swig--something that was being funded by an NSF grant. It was cool and mind boggling all at once. In hindsight however, I think the complexity of Swig has exceeded anyone's ability to fully
Re: [Python-Dev] PEP 3151 from the BDFOP
On Mon, 29 Aug 2011 17:18:33 -0400 Barry Warsaw ba...@python.org wrote: On Aug 24, 2011, at 01:57 AM, Antoine Pitrou wrote: One guiding principle for me is that we should keep the abstraction as thin as possible. In particular, I'm concerned about mapping multiple errnos into a single Error. For example both EPIPE and ESHUTDOWN mapping to BrokenPipeError, or EACCES or EPERM to PermissionError. I think we should resist this, so that one errno maps to exactly one Error. Where grouping is desired, Python already has mechanisms to deal with that, e.g. superclasses and multiple inheritance. Therefore, I think it would be better to have + FileSystemPermissionError + AccessError (EACCES) + PermissionError (EPERM) I'm not sure that's a good idea: Was it the specific grouping under FileSystemPermissionError that you're objecting to, or the keep the abstraction thin principle? The former. EPERM is generally returned for things which aren't filesystem-related. (although I also think separating EACCES and EPERM is of little value *in practice*) Let's say we threw out the idea of FSPE superclass, would you still want to collapse EACCES and EPERM into PermissionError, or would separate exceptions for each be okay? I have a preference for the former, but am not against the latter. I just think that, given AccessError and PermissionError, most users won't know up front which one they should care about. It's still pretty easy to catch both in one except clause, and it won't be too annoying if it's rare. Indeed. Reading your IRC message (sorry, I was afk) it sounds like you think FileSystemError can be removed. I like keeping the hierarchy flat. Ok. It can be reintroduced later on. 
(the main reason why I think it can be removed is that EACCES in itself is often tied to filesystem access rights; so the EACCES exception class would have to be a subclass of FileSystemError, while the EPERM one should not :-))

    >>> open("foo")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    FileNotFoundError: [Errno 2] No such file or directory: 'foo'

(see e.g. http://bugs.python.org/issue12762) True, but since you're going to be creating a bunch of new exception classes, it should be relatively painless to give them a better str. Thanks for pointing out that bug; I agree with it. Well, the str right now is exactly the same as OSError's. My question mostly was about raising OSError (as the current PEP states) with an errno that does *not* map to one of the new exceptions. In that case, I don't think there's anything you could raise other than exactly OSError, right? And indeed, that's what the implementation does :) So, for raising OSError with an errno mapping to one of the subclasses, it appears to break the explicit is better than implicit principle, and I think it could lead to hard-to-debug or understand code. You'll look at code that raises OSError, but the exception that gets printed will be one of the subclasses. I'm afraid that if you don't know that this is happening, you're going to think you're going crazy. Except that it only happens if you use a recognized errno. For example if you do:

    >>> OSError(errno.ENOENT, "not found")
    FileNotFoundError(2, 'not found')

Not if you just pass a message (or anything else, actually):

    >>> OSError("some message")
    OSError('some message',)

But if you pass an explicit errno, then the subclass doesn't appear that surprising, does it? The other half is, let's say raising FileNotFoundError with the EEXIST errno. I'm guessing that the __init__'s for the new OSError subclasses will not have an `errno` attribute, so there's no way you can do that, but the PEP does not discuss this. 
Actually, the __new__ and the __init__ are exactly the same as OSError's:

>>> e = FileNotFoundError("some message")
>>> e.errno
>>> e = FileNotFoundError(errno.ENOENT, "some message")
>>> e.errno
2

Wow, I didn't know ESRCH. What would you call the respective exceptions? - ChildProcessError for ECHILD? [...] - ProcessLookupError for ESRCH? [...] So in a sense, both are lookup errors, though I think it's going too far to multiply inherit from LookupError. Maybe ChildWaitError or ChildLookupError for the former? ProcessLookupError seems good to me. Ok. What if all the errno symbolic names were mapped as attributes on IOError? The only advantage of that would be to eliminate the need to import errno, or for the ugly `e.errno == errno.ENOENT` stuff. That would then be rewritten as `e.errno == IOError.ENOENT`. A mild savings to be sure, but still. Hmm, I guess that's explorable as an orthogonal idea. Cool. How should we capture that? A separate PEP perhaps, or more appropriately (IMHO) a tracker entry, since it's just about enriching the attributes of an existing type. I think it's a bit weird to define a whole lot of constants on a built-in type, though. Okay, so here's what's still
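As implemented in CPython 3.3+ (where PEP 3151 landed), the mapping discussed above is directly observable from Python code; a short sketch of the behaviour the thread describes:

```python
import errno

# With a recognized errno as the first of two arguments, OSError.__new__
# returns the matching subclass automatically.
e = OSError(errno.ENOENT, "not found")
print(type(e).__name__)  # FileNotFoundError

# With just a message (no errno), a plain OSError comes back.
e = OSError("some message")
print(type(e).__name__)  # OSError
```

This is the "magical" __new__ behaviour debated below: existing raising code keeps saying `raise OSError(err, msg)` and catching code still sees the new subclasses.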
Re: [Python-Dev] Python 3 optimizations continued...
On Monday, 29 August 2011 at 19:35:14, stefan brunthaler wrote: pretty much a year ago I wrote about the optimizations I did for my PhD thesis that target the Python 3 series interpreters Does it speed up Python? :-) Could you provide numbers (benchmarks)? Victor ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 393 review
On Monday, 29 August 2011 at 21:34:48, you wrote: Those haven't been ported to the new API, yet. Consider, for example, d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test; with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this is a 25% speedup for PEP 393. If I understand correctly, the performance now highly depends on the characters used? A pure ASCII string is faster than a string with characters in the ISO-8859-1 charset? How did you infer that from the above paragraph? ASCII and Latin-1 are mostly identical in terms of performance - the ASCII decoder should be slightly slower than the Latin-1 decoder, since the ASCII decoder needs to check for errors, whereas the Latin-1 decoder will never be confronted with errors. I don't compare ASCII and ISO-8859-1 decoders. I was asking if decoding b'abc' from ISO-8859-1 is faster than decoding b'ab\xff' from ISO-8859-1, and if yes: why? Your patch replaces PyUnicode_New(size, 255) ... memcpy() by PyUnicode_FromUCS1(). I don't understand how it makes Python faster: PyUnicode_FromUCS1() first scans the input string for the maximum code point. I suppose that the main difference is that the ISO-8859-1 decoded string is stored as the UTF-8 encoded string (shared pointer) if all characters of the string are ASCII characters. In this case, encoding the string to UTF-8 doesn't cost anything; we already have the result. Am I correct? Victor
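The compact storage being discussed can be observed from pure Python on a PEP 393 build (CPython 3.3+): an ASCII-only string uses one byte per character with the smallest header, a Latin-1 string one byte per character with a slightly larger header, and a BMP string two bytes per character. A small illustration; exact byte counts vary across CPython versions, so only the ordering is meaningful:

```python
import sys

ascii_s  = "a" * 100        # all code points < 128   -> 1 byte/char, smallest header
latin1_s = "\xe9" * 100     # all code points < 256   -> 1 byte/char, larger header
bmp_s    = "\u20ac" * 100   # all code points < 65536 -> 2 bytes/char

# The three sizes increase in this order on a PEP 393 build of CPython.
print(sys.getsizeof(ascii_s), sys.getsizeof(latin1_s), sys.getsizeof(bmp_s))
```

For the ASCII-only case, the one-byte representation is byte-for-byte the UTF-8 encoding, which is what makes the shared-pointer trick Victor describes possible.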
Re: [Python-Dev] Python 3 optimizations continued...
Does it speed up Python? :-) Could you provide numbers (benchmarks)? Yes, it does ;) The maximum overall speedup I achieved was by a factor of 2.42 on my i7-920 for the spectralnorm benchmark of the computer language benchmark game. Others from the same set are:

binarytrees: 1.9257 (1.9891)
fannkuch: 1.6509 (1.7264)
fasta: 1.5446 (1.7161)
mandelbrot: 2.0040 (2.1847)
nbody: 1.6165 (1.7602)
spectralnorm: 2.2538 (2.4176)
---
overall: 1.8213 (1.9382)

(The first number is the combination of all optimizations; the one in parentheses is with my last optimization [Interpreter Instruction Scheduling] enabled, too.) For a comparative real-world benchmark I tested Martin von Loewis' django port (there are not that many meaningful Python 3 real-world benchmarks) and got a speedup of 1.3 (without IIS). This is reasonably good; US got a speedup of 1.35 on this benchmark. I just checked that pypy-c-latest on 64 bit reports 1.5 (the pypy-c-jit-latest figures seem to be not working currently or *really* fast...), but I cannot tell directly how that relates to speedups (it just says less is better and I did not quickly find an explanation). Since I did this benchmark last year, I have spent more time investigating it and found that I could do better, but I would have to guess as to how much. (An interesting aside, though: on this benchmark, the executable never grew beyond 5 MB of memory usage, exactly like the vanilla Python 3 interpreter.) hth, --stefan
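The quoted overall figure is close to the geometric mean of the per-benchmark speedups, which is the conventional way to aggregate ratios like these. A quick check, treating the listed numbers as given (not re-measured); the small gap to the reported 1.8213 suggests a slightly different aggregation or rounding was used:

```python
import math

# Speedup factors as listed in the message (all-optimizations column).
speedups = {
    "binarytrees": 1.9257, "fannkuch": 1.6509, "fasta": 1.5446,
    "mandelbrot": 2.0040, "nbody": 1.6165, "spectralnorm": 2.2538,
}

# Geometric mean: exp of the mean of the logs.
geomean = math.exp(sum(map(math.log, speedups.values())) / len(speedups))
print(f"{geomean:.3f}")  # ~1.816, vs. the reported overall of 1.8213
```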
Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)
On Sat, Aug 27, 2011 at 11:58 PM, Terry Reedy tjre...@udel.edu wrote: Dan, I once had the more or less the same opinion/question as you with regard to ctypes, but I now see at least 3 problems. 1) It seems hard to write it correctly. There are currently 47 open ctypes issues, with 9 being feature requests, leaving 38 behavior-related issues. Tom Heller has not been able to work on it since the beginning of 2010 and has formally withdrawn as maintainer. No one else that I know of has taken his place. I am trying to work through getting these issues resolved. The hard part so far has been getting reviews and commits. The following patches are awaiting review (the patch for issue 11241 has been accepted, just not applied):

1. http://bugs.python.org/issue9041
2. http://bugs.python.org/issue9651
3. http://bugs.python.org/issue11241

I am more than happy to keep working through these issues, but I need some help getting the patches actually applied since I don't have commit rights. -- # Meador
Re: [Python-Dev] PEP 3151 from the BDFOP
On Tue, Aug 30, 2011 at 7:18 AM, Barry Warsaw ba...@python.org wrote: Okay, so here's what's still outstanding for me:

* Should we eliminate FileSystemError? (probably yes)

I've also been persuaded that this isn't a generally meaningful categorisation, so +1 for dropping it. ConnectionError is worth keeping, though.

* Should we ensure one errno == one exception? - i.e. separate EACCES and EPERM - i.e. separate EPIPE and ESHUTDOWN

I think the concept of a 1:1 mapping is a complete non-starter, since OSError is always going to map to multiple errnos (i.e. everything that hasn't been assigned to a specific subclass). Maintaining the class categorisation down to a certain level for ease of separate handling is worthwhile, but below that point it's better to let people realise that they need to understand the subtleties of the different errno values.

* Should the str of the new exception subclasses be improved (e.g. to include the symbolic name instead of the errno first)?

I'd say that's a distinct RFE on the tracker (since it applies regardless of the acceptance or rejection of PEP 3151). Good idea in principle, though.

* Is the OSError.__new__() hackery a good idea?

I agree it's a little magical, but I also think the PEP becomes pretty useless without it. If OSError.__new__ handles the mapping, then most code (including C code) doesn't need to change - it will raise the new subclasses automatically. If we demand that all exception *raising* code be changed, then exception *catching* code will have a hard time assuming that the new subclasses are going to be raised correctly instead of a top-level OSError. To make that transition feasible, I think we *need* to make it as hard as we can (if not impossible) to raise OSError instances with defined errno values that *don't* conform to the new hierarchy, so that 3.3+ exception catching code doesn't need to worry about things like ENOENT being raised as OSError instead of FileNotFoundError. 
Only code that also supports earlier versions should need to resort to inspecting the errno values for the coarse distinctions that the PEP provides via the new class hierarchy.

* Should the PEP define the signature of the new exceptions (e.g. to prohibit passing in an incorrect errno to an OSError subclass)?

Unfortunately, I think the variations in errno details across platforms mean that being too restrictive in this space would cause more problems than it solves. So it may be wiser to technically allow people to do silly things like raise FileNotFoundError(errno.EPIPE) with the admonition not to actually do that because it is obscure and confusing. Consenting adults, etc.

* Can we add ECHILD and ESRCH, and if so, what names should we use?

+1 for ChildProcessError and ProcessLookupError (as peer exceptions on the tier directly below OSError)

* Where can we capture the idea of putting the symbolic names on OSError class attributes, or is it a dumb idea that should be ditched?

Tracker RFE for the former and maybe for the latter. With this PEP, the need for direct inspection of errno values should be significantly reduced in most code, so importing errno shouldn't be necessary.

* How long should we wait for other Python implementations to chime in?

Until Antoine gets back from his holiday sounds reasonable to me. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
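The practical upshot for catching code: with the hierarchy in place, the old errno-inspection idiom collapses into an ordinary except clause. A before/after sketch (the path is hypothetical, chosen only so that open() fails with ENOENT):

```python
import errno

path = "/no/such/file"  # hypothetical missing path, for illustration only

# Pre-PEP 3151 idiom: catch the broad class, then inspect errno by hand.
try:
    open(path)
except OSError as exc:
    if exc.errno != errno.ENOENT:
        raise
    missing_old = True

# Post-PEP 3151 idiom: the subclass itself carries the distinction.
try:
    open(path)
except FileNotFoundError:
    missing_new = True
```

As Nick notes, only code that must also run on pre-3.3 interpreters still needs the first form.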
Re: [Python-Dev] Ctypes and the stdlib
On Mon, Aug 29, 2011 at 2:39 AM, Stefan Behnel stefan...@behnel.de wrote: Guido van Rossum, 29.08.2011 04:27: Hm, the main use that was proposed here for ctypes is to wrap existing libraries (not to create nicer APIs, that can be done in pure Python on top of this). The same applies to Cython, obviously. The main advantage of Cython over ctypes for this is that the Python-level wrapper code is also compiled into C, so whenever the need for a thicker wrapper arises in some part of the API, you don't lose any performance in intermediate layers. Yes, this is a very nice advantage. The only advantage that I can think of for ctypes is that it doesn't require a toolchain -- you can just write the Python code and get going. With Cython you will always have to invoke the Cython compiler. Another advantage may be that it works *today* for PyPy -- I don't know the status of Cython for PyPy. Also (maybe this was answered before?), how well does Cython deal with #include files (especially those you don't have control over, like the ones typically required to use some libfoo.so safely on all platforms)? -- --Guido van Rossum (python.org/~guido)
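The no-toolchain property Guido mentions is easy to demonstrate: wrapping a C function with ctypes needs nothing but a running Python. A minimal sketch calling cos() from the C math library on a Unix-like system; it also illustrates the "not trivial to use correctly" point from earlier in the thread, since forgetting restype silently misreads the result as an int:

```python
import ctypes
import ctypes.util

# Locate and load the C math library at runtime; no compiler involved.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declaring the signature is the easy-to-get-wrong part: without
# restype = c_double, the double return value is misread as an int.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```

With Cython, the equivalent declaration would be checked at C compile time; with ctypes, a wrong signature is only discovered (if at all) at runtime.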
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 7:14 AM, Antoine Pitrou solip...@pitrou.net wrote: On Mon, 29 Aug 2011 11:33:14 -0700 stefan brunthaler s.bruntha...@uci.edu wrote: * The optimized dispatch routine has a changed instruction format (word-sized instead of bytecodes) that allows for regular instruction decoding (without the HAS_ARG check) and inlining of some objects in the instruction format on 64-bit architectures. Having a word-sized bytecode format would probably be acceptable in itself, so if you want to submit a patch for that, go ahead. Although any such patch should discuss how it compares with Cesare's work on wpython. Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Armin Rigo made the judgment years ago that CPython was a poor platform for serious optimisation when he stopped working on Psyco and started PyPy instead, and I think the contrasting fates of PyPy and Unladen Swallow have borne out that opinion. Significantly increasing the complexity of CPython for speed-ups that are dwarfed by those available through PyPy seems like a poor trade-off to me. At a bare minimum, I don't think any significant changes should be made under the "it will be faster" justification until the bulk of the real-world benchmark suite used for speed.pypy.org is available for Python 3. (Wasn't there a GSoC project about that?) Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
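For context on what a fixed-width instruction format buys: decoding becomes uniform, since every instruction has the same width and no per-opcode HAS_ARG test is needed to find the next one. CPython itself eventually adopted such a "wordcode" in 3.6 (two bytes per instruction), so the idea can be sketched against a modern interpreter; this decoder is an illustration of the uniform-stride property, not of Stefan's actual implementation:

```python
import dis

def decode_wordcode(code_obj):
    """Decode CPython >= 3.6 fixed-width bytecode: opcode byte + arg byte."""
    raw = code_obj.co_code
    # Uniform two-byte stride: advancing to the next instruction never
    # depends on the current opcode (EXTENDED_ARG only widens arguments).
    return [(dis.opname[raw[i]], raw[i + 1]) for i in range(0, len(raw), 2)]

def f(x):
    return x + 1

for name, arg in decode_wordcode(f.__code__):
    print(name, arg)
```

Under the pre-3.6 variable-width format, the same loop would need a HAVE_ARGUMENT check on every opcode just to know whether to advance by one byte or three.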
Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)
On Mon, Aug 29, 2011 at 2:17 PM, Greg Ewing greg.ew...@canterbury.ac.nz wrote: Guido van Rossum wrote: (Just like Python's own .h files -- e.g. the extensive renaming of the Unicode APIs depending on narrow/wide build) How does Cython deal with these? Pyrex/Cython deal with it by generating C code that includes the relevant headers, so the C compiler expands all the macros, interprets the struct declarations, etc. All you need to do when writing the .pyx file is follow the same API that you would if you were writing C code to use the library. Interesting. Then how does Pyrex/Cython typecheck your code at compile time? -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] Python 3 optimizations continued...
Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Armin Rigo made the judgment years ago that CPython was a poor platform for serious optimisation when he stopped working on Psyco and started PyPy instead, and I think the contrasting fates of PyPy and Unladen Swallow have borne out that opinion. Significantly increasing the complexity of CPython for speed-ups that are dwarfed by those available through PyPy seems like a poor trade-off to me. I agree with the trade-off, but the nice thing is that CPython's interpreter remains simple and portable with my optimizations. All of these optimizations are purely interpretative, and the complexity of CPython is not affected much. (For example, I have an inline-cached version of BINARY_ADD that is called INCA_FLOAT_ADD [INCA being my abbreviation for INline CAching]; you don't actually have to look at its source code, since it is generated by my code generator, but you can immediately tell what's going on by looking at instruction traces.) So, the interpreter remains fully portable, and any compatibility issues with C modules should not occur either. At a bare minimum, I don't think any significant changes should be made under the "it will be faster" justification until the bulk of the real-world benchmark suite used for speed.pypy.org is available for Python 3. (Wasn't there a GSoC project about that?) Having more tests would surely be helpful. As already said, the most real-world stuff I can do is Martin's django patch (some of the other benchmarks, though, are from the shootout, and I can [and did] run them, too: binarytrees, fannkuch, fasta, mandelbrot, nbody and spectralnorm. I also have the AI benchmark from Unladen Swallow but no current figures.) Best, --stefan
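A pure-Python sketch of the inline-caching idea behind something like INCA_FLOAT_ADD: a generic add site that, after observing its operand types once, rewrites itself into a type-specialized fast path behind a cheap guard, and deoptimizes back when the guard fails. This only illustrates the technique; the real work happens at the C bytecode-dispatch level:

```python
class AddSite:
    """One call site for '+', specialized after its first execution."""

    def __init__(self):
        self.impl = self._generic  # start on the slow, generic path

    def _generic(self, a, b):
        if type(a) is float and type(b) is float:
            # Inline cache: remember the observed types by swapping in
            # a specialized implementation at this site.
            self.impl = self._float_add
        return a + b

    def _float_add(self, a, b):
        # Guard: if the types changed, deoptimize to the generic path.
        if type(a) is not float or type(b) is not float:
            self.impl = self._generic
            return a + b
        return a + b  # in the interpreter this would be a direct float add

site = AddSite()
print(site.impl(1.5, 2.5))  # 4.0 -- and the site is now specialized
print(site.impl(1, 2))      # 3   -- guard fails, the site deoptimizes
```

This also hints at Stefan's later point about JITs: the types cached at each site are exactly the profile information a JIT could reuse.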
Re: [Python-Dev] issue 6721 Locks in python standard library should be sanitized on fork
On 8/29/2011 3:41 PM, Nir Aides wrote: I am not familiar with the python-dev definition for deprecation, but [possible to planned eventual removal] when I used the word in the bug discussion I meant to advertise to users that they should not mix threading and forking, since that mix is and will remain broken by design; I did not mean removal or crippling of functionality. This would be a note or warning in the doc. You can suggest what and where to add something on an existing issue or a new one. -- Terry Jan Reedy
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, 30 Aug 2011 10:00:28 +1000 Nick Coghlan ncogh...@gmail.com wrote: Having a word-sized bytecode format would probably be acceptable in itself, so if you want to submit a patch for that, go ahead. Although any such patch should discuss how it compares with Cesare's work on wpython. Personally, I *like* CPython fitting into the simple-and-portable niche in the Python interpreter space. Changing the bytecode width wouldn't make the interpreter more complex. Armin Rigo made the judgment years ago that CPython was a poor platform for serious optimisation when he stopped working on Psyco and started PyPy instead, and I think the contrasting fates of PyPy and Unladen Swallow have borne out that opinion. Well, PyPy didn't show any significant achievements before they spent *much* more time on it than the Unladen Swallow guys did. Whether or not a good JIT is possible on top of CPython might remain a largely unanswered question. Significantly increasing the complexity of CPython for speed-ups that are dwarfed by those available through PyPy seems like a poor trade-off to me. Some years ago we were waiting for Unladen Swallow to improve itself and be ported to Python 3. Now it seems we are waiting for PyPy to be ported to Python 3. I'm not sure how "let's just wait" is a good trade-off if someone proposes interesting patches (which, of course, remains to be seen). At a bare minimum, I don't think any significant changes should be made under the "it will be faster" justification until the bulk of the real-world benchmark suite used for speed.pypy.org is available for Python 3. (Wasn't there a GSoC project about that?) I'm not sure what "the bulk" is, but have you already taken a look at http://hg.python.org/benchmarks/ ? Regards Antoine.
Re: [Python-Dev] Python 3 optimizations continued...
On Mon, Aug 29, 2011 at 2:05 PM, stefan brunthaler s.bruntha...@uci.edu wrote: The question really is whether this is an all-or-nothing deal. If you could identify smaller parts that can be applied independently, interest would be higher. Well, it's not an all-or-nothing deal. In my current architecture, I can selectively enable most of the optimizations as I see fit. The only pre-requisite (in my implementation) is that I have two dispatch loops with a changed instruction format. It is, however, not a technical necessity, just the way I implemented it. Basically, you can choose whatever you like best, and I could extract that part. I am just offering to add all the things that I have done :) +1 from me on going forward with your performance improvements. The more you can break them down into individual smaller patch sets the better, as they can be reviewed and applied as needed. A prerequisites patch, a patch for the wide opcodes, etc. For benchmarks, given this is Python 3, just get as many useful ones running as you can. Some in this thread seemed to give the impression that CPython performance is not something to care about. I disagree. I see CPython being the main implementation of Python used in most places for a long time. Improving its performance merely raises the bar to be met by other implementations if they want to compete. That is a good thing! -gps Also, I'd be curious whether your techniques help or hinder a potential integration of a JIT generator. This is something I have frequently discussed with several JIT people before. IMHO, having my optimizations in place also helps a JIT compiler, since it can re-use the information I gathered to generate more aggressively optimized native machine code right away (the inline caches can be generated with the type information right away, some functions could be inlined with the guard statements subsumed, etc.) 
Another benefit could be that the JIT compiler can spend longer on generating code, because the interpreter is already faster (so in some cases it would probably not make sense to include a non-optimizing, fast-and-simple JIT compiler). There are others on the list who can probably comment on this, too. That aside, I think that while having a JIT is an important goal, I can very well imagine scenarios where the additional memory consumption (for the generated native machine code) of a JIT in each process (I assume that the native machine code caches are not shared) hinders scalability. I have in fact no data to back this up, but I think that would be an interesting trade-off: say, a 30% gain in performance without substantial additional memory requirements on existing hardware, compared to higher achievable speedups that require more machines. Regards, --stefan
Re: [Python-Dev] Python 3 optimizations continued...
On Tue, Aug 30, 2011 at 12:38 PM, Gregory P. Smith g...@krypto.org wrote: Some in this thread seemed to give the impression that CPython performance is not something to care about. I disagree. I see CPython being the main implementation of Python used in most places for a long time. Improving its performance merely raises the bar to be met by other implementations if they want to compete. That is a good thing! Not the impression I intended to give. I merely want to highlight that we need to be careful that incremental increases in complexity are justified with real, measured performance improvements. PyPy has set the bar on how to do that - people who seriously want to make CPython faster need to focus on getting speed.python.org sorted *first* (so we know where we're starting) and *then* work on trying to improve CPython's numbers relative to that starting point. The PSF has the hardware to run the site but, unless more has been going on in the background than I am aware of, is still lacking trusted volunteers to do the following:

1. Getting codespeed up and running on the PSF hardware
2. Hooking it in to the CPython source control infrastructure
3. Getting a reasonable set of benchmarks running on 3.x (likely starting with the already ported set in Mercurial, but eventually we want the full suite that PyPy uses)
4. Once PyPy, Jython and IronPython offer 3.x compatible versions, start including them as well (alternatively, offer 2.x performance comparisons as well, although that's less interesting from a CPython point of view since it can't be used to guide future CPython optimisation efforts)

Anecdotal, non-reproducible performance figures are *not* the way to go about serious optimisation efforts. Using a dedicated machine is vulnerable to architecture-specific idiosyncrasies, but ad hoc testing on other systems can still be used as a sanity check. Regards, Nick. 
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
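The minimal discipline Nick argues for is available from the stdlib alone: time with several repeats and report the best run, since slower runs measure interference rather than the code. A sketch of a reproducible micro-benchmark harness; the workload here is only a stand-in for the code under test:

```python
import timeit

def workload():
    # Stand-in benchmark body; replace with the code under test.
    return sum(i * i for i in range(1000))

# Repeat the measurement several times and keep the minimum: the
# fastest run is the least noise-affected, hence the honest statistic.
timings = timeit.repeat(workload, repeat=5, number=1000)
best = min(timings)
print(f"best of 5: {best:.4f}s for 1000 calls")
```

This still does not replace a shared, tracked benchmark suite like the one proposed for speed.python.org, but it at least makes individual figures repeatable.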
Re: [Python-Dev] Ctypes and the stdlib (was Re: LZMA compression support in 3.3)
Guido van Rossum wrote: On Mon, Aug 29, 2011 at 2:17 PM, Greg Ewing greg.ew...@canterbury.ac.nz wrote: All you need to do when writing the .pyx file is follow the same API that you would if you were writing C code to use the library. Interesting. Then how does Pyrex/Cython typecheck your code at compile time? You might be reading more into that statement than I meant. You have to supply Pyrex/Cython versions of the C declarations, either hand-written or generated by a tool. But you write them based on the advertised C API -- you don't have to manually expand macros, work out the low-level layout of structs, or anything like that (as you often have to do when using ctypes). -- Greg