Re: [Python-Dev] thoughts on the bytes/string discussion
Tres Seaver wrote:
> I do know for a fact that using a UCS2-compiled Python instead of the system's UCS4-compiled Python leads to a measurable, noticeable drop in memory consumption of long-running webserver processes using Unicode

Would there be any sanity in having an option to compile Python with UTF-8 as the internal string representation?

--
Greg
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] thoughts on the bytes/string discussion
Ian Bicking, 26.06.2010 00:26: On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum wrote: On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz I'd like a version of 'decode' which would give me a type that was, in every respect, unicode, and responded to all protocols exactly as other unicode objects (or "str objects", if you prefer py3 nomenclature ;-)) do, but wouldn't actually copy any of that memory unless it really needed to (for example, to pass to a C API that expected native wide characters), and that would hold on to the original bytes so that it could produce them on demand if encoded to the same encoding again. So, as others in this thread have mentioned, the 'ABC' really implies some stuff about C APIs as well. Well, there's the buffer API, so you can already create something that refers to an existing C buffer. However, with respect to a string, you will have to make sure the underlying buffer doesn't get freed while the string is still in use. That will be hard and sometimes impossible to do at the C-API level, even if the string is allowed to keep a reference to something that holds the buffer. At least in lxml, such a feature would be completely worthless, as text is never held by any ref-counted Python wrapper object. It's only part of the XML tree, which is allowed to change at (more or less) any time, so the underlying char* buffer could just get freed without further notice. Adding a guard against that would likely have a larger impact on the performance than the decoding operations. I'm not sure about the exact performance impact of such a class, which is why I'd like the ability to implement it *outside* of the stdlib and see how it works on a project, and return with a proposal along with some data. There are also different ways to implement this, and other optimizations (like ropes) which might be better. 
You can almost do this today, but the lack of things like the hypothetical "__rcontains__" does make it impossible to be totally transparent about it. But you'd still have to validate it, right? You wouldn't want to go on using what you thought was wrapped UTF-8 if it wasn't actually valid UTF-8 (or you'd be worse off than in Python 2). So you're really just worried about space consumption. I'd like to see a lot of hard memory profiling data before I got overly worried about that. It wasn't my profiling, but I seem to recall that Fredrik Lundh specifically benchmarked ElementTree with all-unicode and sometimes-ascii-bytes, and found that using Python 2 strs in some cases provided notable advantages. I know Stefan copied ElementTree in this regard in lxml, maybe he also did a benchmark or knows of one? Actually, bytes vs. unicode doesn't make that a big difference in Py2 for lxml. ElementTree is a lot older, so I guess it made a larger difference when its code was written (and I even think I recall seeing numbers for lxml where it seemed to make a notable difference). In lxml, text content is stored in the C tree of libxml2 as UTF-8 encoded char* text. On request, lxml creates a string object from it and returns it. In Py2, it checks for plain ASCII content first and returns a byte string for that. Only non-ASCII strings are returned as decoded unicode strings. In Py3, it always returns unicode strings. When I run a little benchmark on lxml in Py2.6.5 that just reads some short text content from an Element object, I only see a tiny difference between unicode strings and byte strings. The gap obviously increases when the text gets longer, e.g. when I serialise the complete text content of an XML document to either a byte string or a unicode string. But even for documents in the megabyte range we are still talking about single milliseconds here, and the difference stays well below 10%. It's seriously hard to make that the performance bottleneck in an XML application. 
Also, since the string objects are only instantiated on request, memory isn't an issue either. That's different for (c)ElementTree again, where string content is stored as Python objects. Four times the size even for plain ASCII strings (e.g. numbers, IDs or even trailing whitespace!) can well become a problem there, and can easily dominate the overall size of the in-memory tree. Plain ASCII content is surprisingly common in XML documents.

Stefan
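A minimal sketch of the Py2-era return-type rule described above, written in Python 3 for illustration only. The name text_from_utf8 is hypothetical and is not lxml's API; lxml does this in C against libxml2's char* buffers. The size comparison uses a UTF-32 encoding as a stand-in for the four bytes per character of a UCS4 build.

```python
def text_from_utf8(raw: bytes):
    """Return raw unchanged if it is pure ASCII, otherwise decode it as UTF-8.

    Hypothetical helper; it mirrors the rule described for lxml under Python 2.
    """
    if raw.isascii():          # bytes.isascii() needs Python 3.7+
        return raw
    return raw.decode("utf-8")


ascii_text = text_from_utf8(b"id-12345")
other_text = text_from_utf8("caf\u00e9".encode("utf-8"))

# Storage cost of plain ASCII content at one byte vs. four bytes per character.
narrow = len("id-12345".encode("utf-8"))
wide = len("id-12345".encode("utf-32-le"))
```

The ASCII input comes back as the same bytes object, so no decoding work or extra copy is needed for the common case of IDs, numbers and whitespace.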
Re: [Python-Dev] thoughts on the bytes/string discussion
Greg Ewing, 26.06.2010 09:58:
> Tres Seaver wrote:
>> I do know for a fact that using a UCS2-compiled Python instead of the system's UCS4-compiled Python leads to a measurable, noticeable drop in memory consumption of long-running webserver processes using Unicode
>
> Would there be any sanity in having an option to compile Python with UTF-8 as the internal string representation?

It would break Py_UNICODE, because the internal size of a unicode character would no longer be fixed.

Stefan
Re: [Python-Dev] Signs of neglect?
Nick Coghlan wrote:
> On Sat, Jun 26, 2010 at 9:23 AM, Benjamin Peterson wrote:
>> 2010/6/25 Steve Holden:
>> I would call it more a sign of no tests rather than one of neglect and
>> perhaps also an indication of the usefulness of those tools.
>
> Less than useful tools with no tests probably qualify as neglected...
>
> An assessment of the contents of the Py3k tools directory is probably
> in order, with at least a basic "will it run?" check added for those
> we decide to keep.
>
Neither webchecker nor wcgui.py will run - the former breaks because sgmllib is missing, the latter because it uses the wrong name for "tkinter" (but overcoming this will throw it back to an sgmllib dependency too).

Guido thinks it's OK to abandon at least some of them, so I don't see the rest getting much love in the future. They do need sorting through - I don't see anyone wanting xxci.py, for example ("check in files for which rcsdiff returns nonzero exit status").

But I'm grateful you agree with my diagnosis of neglect (not that a diagnosis in itself is going to help in fixing things).

regards
 Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
See Python Video! http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/
"All I want for my birthday is another birthday" - Ian Dury, 1942-2000
[Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL
We have just released a proof-of-concept implementation of a new approach to thread management - "newthreading". It is available for download at

https://sourceforge.net/projects/newthreading/

The user's guide is at

http://www.animats.com/papers/languages/newthreadingintro.html

This is a pure Python implementation of synchronized objects, along with a set of restrictions which make programs race-condition free, even without a Global Interpreter Lock. The basic idea is that classes derived from SynchronizedObject are automatically locked at entry and unlocked at exit. They're also unlocked when a thread blocks within the class. So at no time can two threads be active in such a class at one time.

In addition, only "frozen" objects can be passed in and out of synchronized objects. (This is somewhat like the multiprocessing module, where you can only pass objects that can be "pickled". But it's not as restrictive; multiple threads can access the same synchronized object, one at a time.)

This pure Python implementation is usable, but does not improve performance. It's a proof of concept implementation so that programmers can try out synchronized classes and see what it's like to work within those restrictions.

The semantics of Python don't change for single-thread programs. But when the program forks off the first new thread, the rules change, and some of the dynamic features of Python are disabled.

Some of the ideas are borrowed from Java, and some are from "safethreading". The point is to come up with a set of liveable restrictions which would allow getting rid of the GIL. This is becoming essential as Unladen Swallow starts to work and the number of processors per machine keeps climbing.

This may in time become a Python Enhancement Proposal. We'd like to get some experience with it first. Try it out and report back. The SourceForge forum for the project is the best place to report problems.
John Nagle
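A rough sketch of the "locked at entry, unlocked at exit" idea from the announcement, using only the standard threading module. This is not the newthreading implementation: the names Synchronized and _locked are hypothetical, and the sketch does not release the lock when a thread blocks, nor does it enforce the "frozen objects only" rule.

```python
import functools
import inspect
import threading


def _locked(method):
    """Wrap a method so it holds the instance's lock while it runs."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper


class Synchronized:
    """Hypothetical stand-in for the idea of a synchronized base class."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Wrap every public plain function defined directly on the subclass.
        for name, attr in list(vars(cls).items()):
            if inspect.isfunction(attr) and not name.startswith("_"):
                setattr(cls, name, _locked(attr))

    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls)
        obj._lock = threading.RLock()  # re-entrant, so methods may call each other
        return obj


class Counter(Synchronized):
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1


def worker(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every call to increment runs under the per-object lock, the four workers cannot interleave inside it, and the final count is the full 4000.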
Re: [Python-Dev] [pypy-dev] PyPy 1.3 released
Hi,

On Fri, Jun 25, 2010 at 05:27:52PM -0600, Maciej Fijalkowski wrote:
>python setup.py build

As corrected on the blog (http://morepypy.blogspot.com/), this line should read:

    pypy setup.py build

Armin.
Re: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL
On 26/06/2010 07:11, John Nagle wrote:
> We have just released a proof-of-concept implementation of a new approach to thread management - "newthreading". It is available for download at
>
> https://sourceforge.net/projects/newthreading/
>
> The user's guide is at
>
> http://www.animats.com/papers/languages/newthreadingintro.html

The user guide says:

    The suggested import is

    from newthreading import *

The import * form is considered bad practice in *general* and should not be recommended unless there is a good reason. This is slightly off-topic for python-dev, although I appreciate that you want feedback with the eventual goal of producing a PEP - however the introduction of free-threading in Python has not been hampered by lack of synchronization primitives but by the difficulty of changing the interpreter without unduly impacting single threaded code.

Providing an alternative garbage collection mechanism other than reference counting would be a more interesting first-step as far as I can see, as that removes the locking required around every access to an object (which currently touches the reference count). Introducing free-threading by *changing* the threading semantics (so you can't share non-frozen objects between threads) would not be acceptable. That comment is likely to be based on a misunderstanding of your future intentions though. :-)

All the best,

Michael Foord

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
Re: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL
On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord wrote:
> On 26/06/2010 07:11, John Nagle wrote:
>>
>> We have just released a proof-of-concept implementation of a new
>> approach to thread management - "newthreading". It is available
>> for download at
>>
>> https://sourceforge.net/projects/newthreading/
>>
>> The user's guide is at
>>
>> http://www.animats.com/papers/languages/newthreadingintro.html
>
> The user guide says:
>
> The suggested import is
>
> from newthreading import *
>
> The import * form is considered bad practice in *general* and should not be
> recommended unless there is a good reason. This is slightly off-topic for
> python-dev, although I appreciate that you want feedback with the eventual
> goal of producing a PEP - however the introduction of free-threading in
> Python has not been hampered by lack of synchronization primitives but by
> the difficulty of changing the interpreter without unduly impacting single
> threaded code.
>

I asked John to drop a message here for this project - so feel free to flame me if anyone. This *is* relevant, and I'd guess fairly interesting to the group as a whole.

jesse
Re: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL
On Sat, 26 Jun 2010 14:29:24 +0100 Michael Foord wrote:
>
> the introduction of
> free-threading in Python has not been hampered by lack of
> synchronization primitives but by the difficulty of changing the
> interpreter without unduly impacting single threaded code.

Exactly what I think too.

cheers

Antoine.
Re: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL
On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord wrote:
> On 26/06/2010 07:11, John Nagle wrote:
>>
>> We have just released a proof-of-concept implementation of a new
>> approach to thread management - "newthreading". It is available
>> for download at
>>
>> https://sourceforge.net/projects/newthreading/
>>
>> The user's guide is at
>>
>> http://www.animats.com/papers/languages/newthreadingintro.html
>
> The user guide says:
>
> The suggested import is
>
> from newthreading import *
>
> The import * form is considered bad practice in *general* and should not be
> recommended unless there is a good reason. This is slightly off-topic for
> python-dev, although I appreciate that you want feedback with the eventual
> goal of producing a PEP - however the introduction of free-threading in
> Python has not been hampered by lack of synchronization primitives but by
> the difficulty of changing the interpreter without unduly impacting single
> threaded code.
>
> Providing an alternative garbage collection mechanism other than reference
> counting would be a more interesting first-step as far as I can see, as that
> removes the locking required around every access to an object (which
> currently touches the reference count). Introducing free-threading by
> *changing* the threading semantics (so you can't share non-frozen objects
> between threads) would not be acceptable. That comment is likely to be based
> on a misunderstanding of your future intentions though. :-)
>
> All the best,
>
> Michael Foord

I'd also like to point out that one of the projects John cites is Adam Olsen's Safethread work:

http://code.google.com/p/python-safethread/

which, in and of itself, is a good read.
Re: [Python-Dev] thoughts on the bytes/string discussion
Greg Ewing writes:

> Would there be any sanity in having an option to compile
> Python with UTF-8 as the internal string representation?

Losing Py_UNICODE as mentioned by Stefan Behnel (IIRC) is just the beginning of the pain.

If Emacs's experience is any guide, the cost in speed and complexity of a variable-width internal representation is high. There are a number of tricks you can use, but basically everything becomes O(n) for the natural implementation of most operations (such as indexing by character). You can get around that with a position cache, of course, but that adds complexity, and really cuts into the space saving (and worse, adds another chunk that may or may not be paged in when you need it).

What we're considering is a system where buffers come in 1-, 2-, and 4-octet widechars, with automatic translation depending on content. But the buffer is the primary random-access structure in Emacsen, so optimizing it is probably worth our effort. I doubt it would be worth it for Python, but my intuitions here are not reliable.
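To make the O(n) point concrete, here is a small sketch of indexing by character over UTF-8 bytes without any position cache. The function name utf8_index is hypothetical, and the code assumes its input is already valid UTF-8; it illustrates the cost, not how Emacs or CPython store text.

```python
def utf8_index(buf: bytes, n: int) -> str:
    """Return the n-th code point (0-based) of valid UTF-8 data by linear scan."""
    if n < 0:
        raise IndexError(n)
    count = -1
    start = None
    for pos, byte in enumerate(buf):
        if byte & 0xC0 != 0x80:      # not a continuation byte: a character starts here
            count += 1
            if count == n:
                start = pos          # first byte of the character we want
            elif count == n + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return buf[start:].decode("utf-8")   # the requested character was the last one


data = "a\u00e9\u20acz".encode("utf-8")   # 1-, 2-, 3- and 1-byte characters
third = utf8_index(data, 2)
```

Every lookup walks the buffer from the start, so fetching character n costs time proportional to n, whereas a fixed-width array can jump straight to offset n times the character width.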
Re: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL
On 6/26/2010 7:44 AM, Jesse Noller wrote:
> On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord wrote:
>> On 26/06/2010 07:11, John Nagle wrote:
>>> We have just released a proof-of-concept implementation of a new approach to thread management - "newthreading".
>>
>> The import * form is considered bad practice in *general* and should not be recommended unless there is a good reason.

I agree. I just did that to make the examples cleaner.

>> however the introduction of free-threading in Python has not been hampered by lack of synchronization primitives but by the difficulty of changing the interpreter without unduly impacting single threaded code.

That's what I'm trying to address here.

>> Providing an alternative garbage collection mechanism other than reference counting would be a more interesting first-step as far as I can see, as that removes the locking required around every access to an object (which currently touches the reference count). Introducing free-threading by *changing* the threading semantics (so you can't share non-frozen objects between threads) would not be acceptable. That comment is likely to be based on a misunderstanding of your future intentions though. :-)

This work comes out of a discussion a few of us had at a restaurant in Palo Alto after a Stanford talk by the group at Facebook which is building a JIT compiler for PHP. We were discussing how to make threading both safe for the average programmer and efficient. JavaScript and PHP don't have threads at all; Python has safe threading, but it's slow. C/C++/Java all have race condition problems, of course. The Facebook guy pointed out that you can't redefine a function dynamically in PHP, and they get a performance win in their JIT by exploiting this.

I haven't gone into the memory model in enough detail in the technical paper. The memory model I envision for this has three memory zones:

1. Shared fully-immutable objects: primarily strings, numbers, and tuples, all of whose elements are fully immutable. These can be shared without locking, and reclaimed by a concurrent garbage collector like Boehm's. They have no destructors, so finalization is not an issue.

2. Local objects. These are managed as at present, and require no locking. These can either be thread-local, or local to a synchronized object. There are no links between local objects under different "ownership". Whether each thread and object has its own private heap, or whether there's a common heap with locks at the allocator is an implementation decision.

3. Shared mutable objects: mostly synchronized objects, but also immutable objects like tuples which contain references to objects that aren't fully immutable. These are the high-overhead objects, and require locking during reference count updates, or atomic reference count operations if supported by the hardware. The general idea is to minimize the number of objects in this zone.

The zone of an object is determined when the object is created, and never changes. This is relatively simple to implement. Tuples (and frozensets, frozendicts, etc.) are normally zone 2 objects. Only "freeze" creates collections in zones 1 and 3. Synchronized objects are always created in zone 3. There are no difficult handoffs, where an object that was previously thread-local now has to be shared and has to acquire locks during the transition.

Existing interlinked data structures, like parse trees and GUIs, are by default zone 2 objects, with the same semantics as at present. They can be placed inside a SynchronizedObject if desired, which makes them usable from multiple threads. That's optional; they're thread-local otherwise.

The rationale behind "freezing" some of the language semantics when the program goes multi-thread comes from two sources - Adam Olsen's Safethread work, and the acceptance of the multiprocessing module.
Olsen tried to retain all the dynamism of the language in a multithreaded environment, but locking all the underlying dictionaries was a boat-anchor on the whole system, and slowed things down so much that he abandoned the project. The Unladen Swallow documentation indicates that early thinking on the project was that Olsen's approach would allow getting rid of the GIL, but later notes indicate that no path to a GIL-free JIT system is currently in development. The multiprocessing module provides semantics similar to threading with "freezing". Data passed between processes is "frozen" by pickling. Processes can't modify each other's code. Restrictive though the multiprocessing module is, it appears to be useful. It is sometimes recommended as the Pythonic approach to multi-core CPUs. This is an indication that "freezing" is not unacceptable to the user community. Most of the real-world use cases for extreme dynamism involve events that happen during startup. Configuration files are read, modules are selectively included, functions are overridden,
Re: [Python-Dev] bytes / unicode
At 12:42 PM 6/26/2010 +0900, Stephen J. Turnbull wrote:

What I'm saying here is that if bytes are the signal of validity, and the stdlib functions preserve validity, then it's better to have the stdlib functions object to unicode data as an argument. Compare the alternative: it returns a unicode object which might get passed around for a while before one of your functions receives it and identifies it as unvalidated data.

I still don't follow, since passing in bytes should return bytes. Returning unicode would be an error, in the case of a "polymorphic" function (per Guido).

But you agree that there are better mechanisms for validation (although not available in Python yet), so I don't see this as a potential obstacle to polymorphism now.

Nope. I'm just saying that, given two bytestrings to url-join or path join or whatever, a polymorph should hand back a bytestring. This seems pretty uncontroversial.

> What I want is for the stdlib to create stringlike objects of a
> type determined by the types of the inputs --

In general this is a hard problem, though. Polymorphism, OK, one-way tainting OK, but in general combining related types is pretty arbitrary, and as in the encoded-bytes case, the result type often varies depending on expectations of callers, not the types of the data.

But the caller can enforce those expectations by passing in arguments whose types do what they want in such cases, as long as the string literals used by the function don't get to override the relevant parts of the string protocol(s).

The idea that I'm proposing is that the basic string and byte types should defer to "user-defined" string types for mixed type operations, so that polymorphism of string-manipulation functions is the *default* case, rather than a *special* case. This makes tainting easier to implement, as well as optimizing and other special cases (like my "source string w/file and line info", or a string with font/formatting attributes).
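As an illustration of the uncontroversial part ("given two bytestrings ... a polymorph should hand back a bytestring"), here is a minimal sketch of a join that returns the type it was given. The name join_url is hypothetical and is not a stdlib function. Note that it has to choose its separator literal by input type, which is the per-function special-casing that the proposed deferral to user-defined string types is meant to make unnecessary; that deferral itself is not shown here.

```python
def join_url(base, part):
    """Join two URL pieces with a single slash, bytes in -> bytes out, str in -> str out."""
    if isinstance(base, bytes) != isinstance(part, bytes):
        raise TypeError("cannot mix bytes and str arguments")
    # The literal must match the argument type, or the concatenation would fail.
    sep = b"/" if isinstance(base, bytes) else "/"
    return base.rstrip(sep) + sep + part.lstrip(sep)


as_bytes = join_url(b"http://example.org/", b"/index.html")
as_text = join_url("http://example.org", "index.html")
```

Mixing the two types raises TypeError rather than guessing an encoding, which matches Python 3's refusal to combine bytes and str implicitly.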
Re: [Python-Dev] versioned .so files for Python 3.2
On 25.06.2010 22:12, James Y Knight wrote:
> On Jun 25, 2010, at 4:53 AM, Scott Dial wrote:
>> On 6/24/2010 8:23 PM, James Y Knight wrote:
>>> On Jun 24, 2010, at 5:53 PM, Scott Dial wrote:
>>>> If the package has .so files that aren't compatible with other version of python, then what is the motivation for placing that in a shared location (since it can't actually be shared)
>>>
>>> Because python looks for .so files in the same place it looks for the .py files of the same package.
>>
>> My suggestion was that a package that contains .so files should not be shared (e.g., the entire lxml package should be placed in a version-specific path). The motivation for this PEP was to simplify the installation of python packages for distros; it was not to reduce the number of .py files on the disk.
>>
>> Placing .so files together does not simplify that install process in any way. You will still have to handle such packages in a special way.
>
> This is a good point, but I think still falls short of a solution. For a package like lxml, indeed you are correct. Since debian needs to build it once per version, it could just put the entire package (.py files and .so files) into a different per-python-version directory.

This is what is currently done. This will increase the size of packages by duplicating the .py files, or you have to install the .py in a common location (irrelevant to sys.path), and provide (sym)links to the expected location.

A "different per-python-version directory" also has the disadvantage that file conflicts between (distribution) packages cannot be detected.

> However, then you have to also consider python packages made up of multiple distro packages -- like twisted or zope. Twisted includes some C extensions in the core package. But then there are other twisted modules (installed under a "twisted.foo" name) which do not include C extensions. If the base twisted package is installed under a version-specific directory, then all of the submodule packages need to also be installed under the same version-specific directory (and thus built for all versions).
>
> In the past, it has proven somewhat tricky to coordinate which directory the modules for package "foo" should be installed in, because you need to know whether *any* of the related packages includes a native ".so" file, not just the current package.
>
> The converse situation, where a base package did *not* get installed into a version-specific directory because it includes no native code, but a submodule *does* include a ".so" file, is even trickier.

I don't think that installation into different locations based on the presence of extension will work. Should a location really change if an extension is added as an optimization? Splitting a (python) package into different installation locations should be avoided.

Matthias
Re: [Python-Dev] versioned .so files for Python 3.2
On 26.06.2010 02:19, Nick Coghlan wrote: On Sat, Jun 26, 2010 at 6:12 AM, James Y Knight wrote: However, then you have to also consider python packages made up of multiple distro packages -- like twisted or zope. Twisted includes some C extensions in the core package. But then there are other twisted modules (installed under a "twisted.foo" name) which do not include C extensions. If the base twisted package is installed under a version-specific directory, then all of the submodule packages need to also be installed under the same version-specific directory (and thus built for all versions). In the past, it has proven somewhat tricky to coordinate which directory the modules for package "foo" should be installed in, because you need to know whether *any* of the related packages includes a native ".so" file, not just the current package. The converse situation, where a base package did *not* get installed into a version-specific directory because it includes no native code, but a submodule *does* include a ".so" file, is even trickier. I think there are two major ways to tackle this: - allow multiple versions of a .so file within a single directory (i.e Barry's current suggestion) we already do this, see the naming of the extensions of a python debug build on Windows. Several distributions (Debian, Fedora, Ubuntu) do use this as well to provide extensions for python debug builds. - enhanced namespace packages, allowing a single package to be spread across multiple directories, some of which may be Python version specific (i.e. modifications to PEP 382 to support references to version-specific directories) this is not what I want to use in a distribution. package management systems like rpm and dpkg do handle conflicts and replacements of files pretty well, having the same file in potentially different locations in the file system doesn't help detecting conflicts and duplicate packages. 
Matthias
Re: [Python-Dev] versioned .so files for Python 3.2
On 24.06.2010 22:46, Barry Warsaw wrote:
> On Jun 24, 2010, at 02:28 PM, Barry Warsaw wrote:
>> On Jun 24, 2010, at 01:00 PM, Benjamin Peterson wrote:
>>> 2010/6/24 Barry Warsaw:
>>>> On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote:
>>>>> 2010/6/24 Barry Warsaw:
>>>>>> Please let me know what you think. I'm happy to just commit this to the py3k branch if there are no objections. I don't think a new PEP is in order, but an update to PEP 3147 might make sense.
>>>>> How will this interact with PEP 384 if that is implemented?
>>>> I'm trying to come up with something that will work immediately while PEP 384 is being adopted.
>>> But how will modules specify that they support multiple ABIs then?
>> I didn't understand, so asked Benjamin for clarification in IRC.
>>
>> barry: if python 3.3 will only load x.3.3.so, but x.3.2.so supports the stable abi, will it load it?
>> [14:25] gutworth: thanks, now i get it :)
>> [14:26] gutworth: i think it should, but it wouldn't under my scheme. let me think about it
>
> So, we could say that PEP 384 compliant extension modules would get written without a version specifier. IOW, we'd treat foo.so as using the ABI. It would then be up to the Python runtime to throw ImportErrors if in fact we were loading a legacy, non-PEP 384 compliant extension.

Is it realistic to never break the ABI? I would think of having the ABI encoded in the file name as well, and only bump the ABI if it does change. With the "versioned .so files" proposal an ABI bump is necessary with every python version; with PEP 384 the ABI bump will be decoupled from the python version.

Matthias
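To make the lookup question from the IRC exchange concrete, here is a minimal sketch of the scheme as discussed, not of anything actually committed. The function names and the "module.major.minor.so" pattern are assumptions taken from the examples in the thread (x.3.3.so, foo.so); the unversioned name is treated as a stable-ABI build, as Barry suggests.

```python
def candidate_names(module, version, stable_abi_ok=True):
    """Return extension filenames to try, most specific first.

    Hypothetical scheme from the thread: 'foo.3.3.so' is tied to one
    Python version, a bare 'foo.so' is assumed to use the stable ABI.
    """
    major, minor = version
    names = ["%s.%d.%d.so" % (module, major, minor)]
    if stable_abi_ok:
        names.append("%s.so" % module)
    return names


def pick_extension(module, version, available):
    """Return the first candidate that exists in `available`, else None."""
    for name in candidate_names(module, version):
        if name in available:
            return name
    return None
```

Under this sketch, Python 3.3 would skip x.3.2.so even if that file happened to be built against the stable ABI, which is exactly the gap Benjamin points out; only an unversioned x.so would be picked up across versions.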
Re: [Python-Dev] FHS compliance of Python installation
On 25.06.2010 02:54, Ben Finney wrote:
> James Y Knight writes:
>> Really, python should store the .py files in /usr/share/python/, the .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc files in /var/lib/python2.5-debug. But python doesn't work like that.
>
> +1
>
> So who's going to draft the “Filesystem Hierarchy Standard compliance” PEP? :-)

This has nothing to do with the FHS. The FHS talks about data, not code.
Re: [Python-Dev] FHS compliance of Python installation
On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote:
> On 25.06.2010 02:54, Ben Finney wrote:
>> James Y Knight writes:
>>
>>> Really, python should store the .py files in /usr/share/python/, the
>>> .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc
>>> files in /var/lib/python2.5-debug. But python doesn't work like that.
>>
>> +1
>>
>> So who's going to draft the “Filesystem Hierarchy Standard compliance”
>> PEP? :-)
>
> This has nothing to do with the FHS. The FHS talks about data, not code.

Really? It has some guidelines here for object files, etc., at least as of 2004.

http://www.pathname.com/fhs/pub/fhs-2.3.html

A quick scan suggests /usr/lib is the right place to look:

http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA

cheers,
--titus
-- 
C. Titus Brown, c...@msu.edu
Re: [Python-Dev] FHS compliance of Python installation
On 26.06.2010 22:30, C. Titus Brown wrote:
> On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote:
>> On 25.06.2010 02:54, Ben Finney wrote:
>>> James Y Knight writes:
>>>> Really, python should store the .py files in /usr/share/python/, the .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc files in /var/lib/python2.5-debug. But python doesn't work like that.
>>>
>>> +1
>>>
>>> So who's going to draft the “Filesystem Hierarchy Standard compliance” PEP? :-)
>>
>> This has nothing to do with the FHS. The FHS talks about data, not code.
>
> Really? It has some guidelines here for object files, etc., at least as of 2004.
>
> http://www.pathname.com/fhs/pub/fhs-2.3.html
>
> A quick scan suggests /usr/lib is the right place to look:
>
> http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA

Agreed for object files, but

http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREARCHITECTUREINDEPENDENTDATA

explicitly states "The /usr/share hierarchy is for all read-only architecture independent *data* files".
Re: [Python-Dev] versioned .so files for Python 3.2
On 25.06.2010 20:58, Brett Cannon wrote:
> On Fri, Jun 25, 2010 at 01:53, Scott Dial wrote:
>> Placing .so files together does not simplify that install process in any way. You will still have to handle such packages in a special way. You must still compile the package multiple times for each relevant version of python (with special tagging that I imagine distutils can take care of) and, worse yet, you have created a more tricky install than merely having multiple search paths (e.g., installing/uninstalling lxml for *one* version of python is actually more difficult in this scheme).
>
> This is meant to be used by distros in a programmatic fashion, so my response is "so what?" Their package management system is going to maintain the directory, not a person. You and I are not going to be using this for anything. This is purely meant for Linux OS vendors (maybe OS X) to manage their installs through their package software. I honestly do not expect human beings to be mucking around with these installs (and I suspect Barry doesn't either).

Placing files for a distribution in a version-independent path does help distributions handle file conflicts, detect duplicates, and move files between different (distribution) packages. Having non-conflicting extension names is a scheme which is already used on some platforms (debug builds on Windows).

The question for me is whether just renaming the .so files is acceptable for upstream, or whether distributors should implement this on their own, as something like:

    if ext_path.startswith('/usr/') and not ext_path.startswith('/usr/local/'):
        load_ext('foo.2.6.so')
    else:
        load_ext('foo.so')

I fear this will cause issues when e.g. virtualenv environments start copying parts from the system installation instead of symlinking it.

Matthias
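The distributor-side rule sketched in the message above can be written as a small runnable function. This is only an illustration of the idea: `extension_filename` and its parameters are hypothetical names, and the "module.major.minor.so" pattern simply follows the 'foo.2.6.so' example in the message.

```python
import posixpath


def extension_filename(ext_dir, module, version):
    """Pick the .so name a distro-patched loader might look for.

    Assumption (from the message above): directories under /usr but
    outside /usr/local use version-tagged names; everything else keeps
    the plain upstream name.
    """
    ext_dir = posixpath.normpath(ext_dir) + "/"
    is_system = (ext_dir.startswith("/usr/")
                 and not ext_dir.startswith("/usr/local/"))
    if is_system:
        return "%s.%d.%d.so" % (module, version[0], version[1])
    return "%s.so" % module
```

The virtualenv concern shows up directly: a file copied from /usr/lib/python2.6/site-packages as foo.2.6.so into an environment under a home directory would no longer be found, because the rule asks for foo.so there.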
Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?
Brett Cannon wrote:
> I finally realized why clang has not been silencing its warnings about unused return values: I have -Wno-unused-value set in CFLAGS which comes before OPT (which defines -Wall) as set in PY_CFLAGS in Makefile.pre.in.
>
> I could obviously set OPT in my environment, but that would override the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, but the README says that's for stuff that tweak binary compatibility.
>
> So basically what I am asking is what environment variable should I use? If CFLAGS is correct then does anyone have any issues if I change the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes after OPT?

It does not matter much to me: flags set in BASECFLAGS, CFLAGS, OPT or EXTRA_CFLAGS all end up in the makefile macro CFLAGS, and once Python is distributed, distutils will use them to build extension modules. So all of these variables are equivalent for builds.

Also, after running configure without the OPT variable set, we could check what the script selected for the build platform and rerun configure with OPT plus our own flags set on the command line (! ;) ).

Roumen
Re: [Python-Dev] FHS compliance of Python installation
On Jun 26, 2010, at 4:35 PM, Matthias Klose wrote:
> On 26.06.2010 22:30, C. Titus Brown wrote:
>> On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote:
>>> On 25.06.2010 02:54, Ben Finney wrote:
>>>> James Y Knight writes:
>>>>> Really, python should store the .py files in /usr/share/python/, the .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc files in /var/lib/python2.5-debug. But python doesn't work like that.
>>>>
>>>> +1
>>>>
>>>> So who's going to draft the “Filesystem Hierarchy Standard compliance” PEP? :-)
>>>
>>> This has nothing to do with the FHS. The FHS talks about data, not code.
>>
>> Really? It has some guidelines here for object files, etc., at least as of 2004.
>>
>> http://www.pathname.com/fhs/pub/fhs-2.3.html
>>
>> A quick scan suggests /usr/lib is the right place to look:
>>
>> http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA
>
> agreed for object files, but
>
> http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREARCHITECTUREINDEPENDENTDATA
>
> explicitly states "The /usr/share hierarchy is for all read-only architecture independent *data* files".

I always figured the "read-only architecture independent" bit was the important part there, and "code is data". Emacs's el files go into /usr/share/emacs, for instance.

James
Re: [Python-Dev] thoughts on the bytes/string discussion
The several posts in this and other threads got me thinking about text versus number computing (which I am more familiar with).

For numbers, we have in Python three builtins, the general purpose ints and floats and the more specialized complex. Two other rational types can be imported for specialized uses. And then there are 3rd-party libraries like mpz and numpy with more number and array-of-number types.

What makes these all potentially work together is the special method system, including, in particular, the rather complete set of __rxxx__ number methods. The latter allow non-commutative operations to be mixed either way and ease mixed commutative operations.

For text, we have general purpose str and encoded bytes (and bytearray). I think these are sufficient for general use and I am not sure there should even be anything else in the stdlib. But I think it should be possible to experiment with and use specialized 3rd-party text classes just as one can with number classes.

I can imagine that inter-operation, when appropriate, might work better with addition of a couple of missing __rxxx__ methods, such as the mentioned __rcontains__. Although adding such would affect the implementation of a core syntax feature, it would not affect syntax as such as seen by the user.

-- 
Terry Jan Reedy
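As a small illustration of the point about reflected methods, the sketch below shows a user-defined number type cooperating with int through __rsub__, and shows that no such hook exists for the `in` operator. The classes `Meters` and `Needle` are made up for this example; `__rcontains__` is the hypothetical method from the discussion, and Python does not call it.

```python
class Meters:
    """Toy numeric type that interoperates with plain ints and floats."""

    def __init__(self, value):
        self.value = value

    def __sub__(self, other):
        # Handles `Meters - number`.
        if isinstance(other, (int, float)):
            return Meters(self.value - other)
        return NotImplemented

    def __rsub__(self, other):
        # Handles `number - Meters`: int.__sub__ returns NotImplemented
        # for a Meters operand, so Python falls back to this method.
        if isinstance(other, (int, float)):
            return Meters(other - self.value)
        return NotImplemented


class Needle:
    """Toy text-like type; `in` gives it no chance to take part."""

    def __rcontains__(self, container):
        # Hypothetical hook: the interpreter never looks this up.
        return True
```

Here `10 - Meters(3)` works because subtraction has a reflected form, while `Needle() in "abc"` raises TypeError because str.__contains__ only accepts str and there is nothing to fall back to.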
Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?
On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote:
> I finally realized why clang has not been silencing its warnings about
> unused return values: I have -Wno-unused-value set in CFLAGS which
> comes before OPT (which defines -Wall) as set in PY_CFLAGS in
> Makefile.pre.in.
>
> I could obviously set OPT in my environment, but that would override
> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS,
> but the README says that's for stuff that tweak binary compatibility.
>
> So basically what I am asking is what environment variable should I
> use? If CFLAGS is correct then does anyone have any issues if I change
> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes
> after OPT?
>

Since no one objected I swapped the order in r82259. In case anyone else uses clang to compile Python, this means that -Wno-unused-value will now work to silence the warning about unused return values that is caused by some macros. Probably using -Wno-empty-body is also good to avoid all the warnings triggered by the UCS4 macros in cjkcodecs.
Re: [Python-Dev] versioned .so files for Python 3.2
On 6/26/2010 4:06 PM, Matthias Klose wrote:
> On 25.06.2010 22:12, James Y Knight wrote:
>> On Jun 25, 2010, at 4:53 AM, Scott Dial wrote:
>>> Placing .so files together does not simplify that install process in any
>>> way. You will still have to handle such packages in a special way.
>>
>> This is a good point, but I think still falls short of a solution. For a
>> package like lxml, indeed you are correct. Since debian needs to build
>> it once per version, it could just put the entire package (.py files and
>> .so files) into a different per-python-version directory.
>
> This is what is currently done. This will increase the size of packages
> by duplicating the .py files, or you have to install the .py in a common
> location (irrelevant to sys.path), and provide (sym)links to the
> expected location.

"This is what is currently done" and "provide (sym)links to the expected location" are conflicting statements. If you are symlinking .py files from a shared location, then that is not the same as "just install the package into a version-specific location". What motivation is there for preferring symlinks?

Who cares if a distro package install yields duplicate .py files? Nor am I motivated by having to carry duplicate .py files in a distribution package (I imagine the compression of duplicate .py files is amazing).

> A "different per-python-version directory" also has the disadvantage
> that file conflicts between (distribution) packages cannot be detected.

Why? That sounds like a broken tool; maybe I am naive, please explain. If two packages install /usr/lib/python2.6/foo.so, that should be just as detectable as two installing /usr/lib/python-shared/foo.cpython-26.so.

If you *must* compile .so files for every supported version of python at packaging time, then you are already saying the set of python versions is known.
I fail to see the difference between a package that installs .py and .so files into many directories and one that has many .so files in a single directory, except that many directories *already* works. The only gain I can see is that you save duplicate .py files in the package and on the filesystem, and I don't feel that gain alone warrants this fundamental change.

I would appreciate a proper explanation of why/how a single directory is better for your distribution. Also, I haven't heard anyone that wasn't using debian tools chime in with support for any of this, so I would like to know how this can help RPMs and ebuilds and the like.

> I don't think that installation into different locations based on the
> presence of extension will work. Should a location really change if an
> extension is added as an optimization? Splitting a (python) package
> into different installation locations should be avoided.

I'm not sure why changing paths would matter; any package that writes data in its install location would be considered broken by your distro already, so what harm is there in having the packaging tool move it later? Your tool will remove the old path and place it in a new path.

All of these shenanigans seem to stem from your distro's python-support/-central design, which seems to be entirely motivated by reducing duplicate files and *not* simplifying the packaging. While this plan works rather well with .py files, the devil is in the details. I don't think Python should be getting involved in what I believe is a flawed design.

What happens to the distro packaging if a python package splits the codebase between 2.x and 3.x (meaning they have distinct .py files)? As someone else mentioned, how is virtualenv going to interact with packages that install like this?
-- 
Scott Dial
sc...@scottdial.com
scod...@cs.indiana.edu
Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?
Brett Cannon wrote: > On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: >> I finally realized why clang has not been silencing its warnings about >> unused return values: I have -Wno-unused-value set in CFLAGS which >> comes before OPT (which defines -Wall) as set in PY_CFLAGS in >> Makefile.pre.in. >> >> I could obviously set OPT in my environment, but that would override >> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, >> but the README says that's for stuff that tweak binary compatibility. >> >> So basically what I am asking is what environment variable should I >> use? If CFLAGS is correct then does anyone have any issues if I change >> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes >> after OPT? >> > > Since no one objected I swapped the order in r82259. In case anyone > else uses clang to compile Python, this means that -Wno-unused-value > will now work to silence the warning about unused return values that > is caused by some macros. Probably using -Wno-empty-body is also good > to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. I think you need to come up with a different solution and revert the change... OPT has historically been the only variable to use for adjusting the Python C compiler settings. As the name implies this was usually used to adjust the optimizer settings, including raising the optimization level from the default or disabling it. With your change CFLAGS will always override OPT and thus any optimization definitions made in OPT will no longer have an effect. Note that CFLAGS defines -O2 on many platforms. In your particular case, you should try setting OPT to "... -Wno-unused-value ..." (ie. replace -Wall with your setting). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 27 2010) >>> Python/Zope Consulting and Support ...http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

2010-07-19: EuroPython 2010, Birmingham, UK 21 days to go

::: Try our new mxODBC.Connect Python Database Interface for free !

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
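The ordering objection in the message above can be modelled in a few lines. This is only a toy model: it assumes the compiler honours the last -O option on the command line (which is how GCC documents its behaviour), and the flag values used in the usage note are illustrative, not Python's actual defaults.

```python
def effective_opt_level(*flag_groups):
    """Return the last -O flag found, scanning groups in command-line order.

    Each group is a whitespace-separated flag string such as the value
    of OPT or CFLAGS. Returns None if no -O flag is present.
    """
    level = None
    for group in flag_groups:
        for flag in group.split():
            if flag.startswith("-O"):
                level = flag
    return level
```

With a user-chosen OPT of "-O0 -Wall" and a platform CFLAGS of "-g -O2", `effective_opt_level(CFLAGS, OPT)` gives "-O0", whereas the swapped order `effective_opt_level(OPT, CFLAGS)` gives "-O2". That is the effect being objected to: once CFLAGS comes last, its optimization level silently wins over the one set in OPT.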
Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?
On Sat, Jun 26, 2010 at 16:37, M.-A. Lemburg wrote: > Brett Cannon wrote: >> On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: >>> I finally realized why clang has not been silencing its warnings about >>> unused return values: I have -Wno-unused-value set in CFLAGS which >>> comes before OPT (which defines -Wall) as set in PY_CFLAGS in >>> Makefile.pre.in. >>> >>> I could obviously set OPT in my environment, but that would override >>> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, >>> but the README says that's for stuff that tweak binary compatibility. >>> >>> So basically what I am asking is what environment variable should I >>> use? If CFLAGS is correct then does anyone have any issues if I change >>> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes >>> after OPT? >>> >> >> Since no one objected I swapped the order in r82259. In case anyone >> else uses clang to compile Python, this means that -Wno-unused-value >> will now work to silence the warning about unused return values that >> is caused by some macros. Probably using -Wno-empty-body is also good >> to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. > > I think you need to come up with a different solution and revert > the change... > > OPT has historically been the only variable to use for > adjusting the Python C compiler settings. Just found the relevant section in the README. > > As the name implies this was usually used to adjust the > optimizer settings, including raising the optimization level > from the default or disabling it. It meant optional to me, not optimization. I hate abbreviations sometimes. > > With your change CFLAGS will always override OPT and thus > any optimization definitions made in OPT will no longer > have an effect. That was the point; OPT defines defaults through configure.in and I simply wanted to add to those instead of having OPT completely overwritten by me. 
>
> Note that CFLAGS defines -O2 on many platforms.

So then wouldn't that mean they want that to be the optimization level? Or does that default exist historically just so that there is some default, with the expectation that the application overrides it as desired?

>
> In your particular case, you should try setting OPT to
> "... -Wno-unused-value ..." (ie. replace -Wall with your
> setting).

So what is CFLAGS for then? ``configure -h`` says it's for "C compiler flags"; that's extremely ambiguous. And it doesn't help that OPT is not mentioned by ``configure -h``, as that is what I have always gone by to know what flags are available for compilation.

-Brett
Re: [Python-Dev] bytes / unicode
On Sun, Jun 27, 2010 at 4:17 AM, P.J. Eby wrote:
> The idea that I'm proposing is that the basic string and byte types should
> defer to "user-defined" string types for mixed type operations, so that
> polymorphism of string-manipulation functions is the *default* case, rather
> than a *special* case. This makes tainting easier to implement, as well as
> optimizing and other special cases (like my "source string w/file and line
> info", or a string with font/formatting attributes).

Rather than building this into the base string type, perhaps it would be better (at least initially) to add in a polymorphic str subtype that worked along the following lines:

1. Has an encoded argument in the constructor (e.g. poly_str("/", encoded=b"/"))
2. If given objects with an encode() method, assumes they're strings and uses its own parent class methods
3. If given objects with a decode() method, assumes they're encoded and delegates to the encoded attribute

str/bytes agnostic functions would need to invoke poly_str deliberately, while bytes-only and text-only algorithms could just use the appropriate literals.

Third party types would be supported to some degree (by having either encode or decode methods), although they could still run into trouble with some operations. (While full support for third party strings and byte sequence implementations is an interesting idea, I think it's overkill for the specific problem of making it easier to write str/bytes agnostic functions for tasks like URL parsing.)

Regards,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
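A minimal sketch of the three rules above, to make the proposal easier to picture. This is not an existing stdlib type: the name `poly_str` and the `encoded` argument come from the message, while the `_pick` helper and the choice to cover only concatenation and join() are assumptions made for this illustration.

```python
class poly_str(str):
    """Text literal that also carries an encoded (bytes) form."""

    def __new__(cls, text, encoded):
        self = super().__new__(cls, text)
        self.encoded = encoded
        return self

    def _pick(self, other):
        # Rule 2: objects with encode() are treated as text.
        if hasattr(other, "encode"):
            return str(self)
        # Rule 3: objects with decode() are treated as encoded data.
        if hasattr(other, "decode"):
            return self.encoded
        raise TypeError("expected a str-like or bytes-like object")

    def __add__(self, other):
        return self._pick(other) + other

    def __radd__(self, other):
        return other + self._pick(other)

    def join(self, parts):
        parts = list(parts)
        # With nothing to inspect, fall back to the text form.
        sep = self._pick(parts[0]) if parts else str(self)
        return sep.join(parts)
```

A str/bytes agnostic helper would then create `sep = poly_str("/", encoded=b"/")` once and call `sep.join(segments)`, getting text back for text segments and bytes back for bytes segments. Every other str method would need the same treatment, which gives a feel for how much surface a real implementation has to cover.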
Re: [Python-Dev] thoughts on the bytes/string discussion
On Sun, Jun 27, 2010 at 8:11 AM, Terry Reedy wrote:
> I can imagine that inter-operation, when appropriate, might work better with
> addition of a couple of missing __rxxx__ methods, such as the mentioned
> __rcontains__. Although adding such would affect the implementation of a
> core syntax feature, it would not affect syntax as such as seen by the user.

The problem with strings isn't really the binary operations like __contains__ - adding __rcontains__ would be a fairly simple extrapolation of the existing approaches.

Where it gets really messy for strings is the fact that whereas invoking named methods directly on numbers is rare, invoking them on strings is very common, and some of those methods (e.g. split(), join(), __mod__()) allow or require an iterable rather than a single object. This extends the range of use cases to be covered beyond those with syntactic support to potentially include all string methods that take arguments. Creating minimally surprising semantics for the methods which accept iterables is also rather challenging.

It's an interesting idea, but I think it's overkill for the specific problem of making it easier to perform more text-like manipulations in a bytes-only domain.

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
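The difference between operators and named methods can be seen with a toy str subclass. `Tainted` is a made-up example type (echoing the tainting use case mentioned elsewhere in the thread); the behaviour shown for join() and split() is that of the built-in str methods, which return plain str objects.

```python
class Tainted(str):
    """Toy str subclass marking untrusted text (illustrative only)."""

    def __add__(self, other):
        # `Tainted + str` keeps the marker.
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        # `str + Tainted` also reaches the subclass, via reflection.
        return Tainted(str.__add__(other, self))


def survives(value):
    """Report whether the Tainted marker is still attached."""
    return isinstance(value, Tainted)
```

Concatenation in either order keeps the marker, since `+` dispatches to the subclass. By contrast, `", ".join([Tainted("a"), Tainted("b")])` and `Tainted("a b").split()` hand back plain str objects: the named methods never consult their arguments' types, so covering them would take a protocol far wider than a handful of __rxxx__ hooks.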
Re: [Python-Dev] bytes / unicode
At 12:43 PM 6/27/2010 +1000, Nick Coghlan wrote: While full support for third party strings and byte sequence implementations is an interesting idea, I think it's overkill for the specific problem of making it easier to write str/bytes agnostic functions for tasks like URL parsing. OTOH, to write your partial implementation is almost as complex - it still must take into account joining and formatting, and so by that point, you've just proposed a new protocol for coercion... so why not just make the coercion protocol explicit in the first place, rather than hardwiring a third type's worth of special cases? Remember, bytes and strings already have to detect mixed-type operations. If there was an API for that, then the hardcoded special cases would just be replaced, or supplemented with type slot checks and calls after the special cases. To put it another way, if you already have two types special-casing their interactions with each other, then rather than add a *third* type to that mix, maybe it's time to have a protocol instead, so that the types that care can do the special-casing themselves, and you generalize to N user types. (Btw, those who are saying that the resulting potential for N*N interaction makes the feature unworkable seem to be overlooking metaclasses and custom numeric types -- two Python features that in principle have the exact same problem, when you use them beyond a certain scope. At least with those features, though, you can generally mix your user-defined metaclasses or numeric types with the Python-supplied basic ones and call arbitrary Python functions on them, without as much heartbreak as you'll get with a from-scratch stringlike object.) All that having been said, a new protocol probably falls under the heading of the language moratorium, unless it can be considered "new methods on builtins"? (But that seems like a stretch even to me.) 
I just hate the idea that functions taking strings should have to be *rewritten* to be explicitly type-agnostic. It seems *so* un-Pythonic... like if all the bitmasking functions you'd ever written using 32-bit int constants had to be rewritten just because we added longs to the language, and you had to upcast them to be compatible or something. Sounds too much like C or Java or some other non-Python language, where dynamism and polymorphy are the special case, instead of the general rule.
Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?
On Sat, Jun 26, 2010 at 4:37 PM, M.-A. Lemburg wrote: > Brett Cannon wrote: >> On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: >>> I finally realized why clang has not been silencing its warnings about >>> unused return values: I have -Wno-unused-value set in CFLAGS which >>> comes before OPT (which defines -Wall) as set in PY_CFLAGS in >>> Makefile.pre.in. >>> >>> I could obviously set OPT in my environment, but that would override >>> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, >>> but the README says that's for stuff that tweak binary compatibility. >>> >>> So basically what I am asking is what environment variable should I >>> use? If CFLAGS is correct then does anyone have any issues if I change >>> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes >>> after OPT? >>> >> >> Since no one objected I swapped the order in r82259. In case anyone >> else uses clang to compile Python, this means that -Wno-unused-value >> will now work to silence the warning about unused return values that >> is caused by some macros. Probably using -Wno-empty-body is also good >> to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. > > I think you need to come up with a different solution and revert > the change... > > OPT has historically been the only variable to use for > adjusting the Python C compiler settings. > > As the name implies this was usually used to adjust the > optimizer settings, including raising the optimization level > from the default or disabling it. > > With your change CFLAGS will always override OPT and thus > any optimization definitions made in OPT will no longer > have an effect. > > Note that CFLAGS defines -O2 on many platforms. > > In your particular case, you should try setting OPT to > "... -Wno-unused-value ..." (ie. replace -Wall with your > setting). The python configure environment variables are really confused. 
If OPT is intended to be user-overridden for optimization settings, it shouldn't be used to set -Wall and -Wstrict-prototypes. If it's intended to set warning options, it shouldn't also set optimization options. Setting the user-visible customization option on the configure command line shouldn't stomp unrelated defaults.

In configure-based systems, CFLAGS is traditionally (http://sources.redhat.com/automake/automake.html#Flag-Variables-Ordering) the way to tack options onto the end of the command line. Python breaks this by threading flags through CFLAGS in the makefile, which means they all get stomped if the user sets CFLAGS on the make command line. We should instead use another spelling ("CFlags"?) for the internal variable, and append $(CFLAGS) to it.

AC_PROG_CC is the macro that sets CFLAGS to -g -O2 on gcc-based systems (http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html#index-AC_005fPROG_005fCC-842). If Python's configure.in sets an otherwise-empty CFLAGS to -g before calling AC_PROG_CC, AC_PROG_CC won't change it. Or we could just preserve the user's CFLAGS setting across AC_PROG_CC regardless of whether it's set, to let the user set CFLAGS on the configure line without stomping any defaults.
Re: [Python-Dev] bytes / unicode
On Sun, Jun 27, 2010 at 1:49 PM, P.J. Eby wrote:
> I just hate the idea that functions taking strings should have to be
> *rewritten* to be explicitly type-agnostic. It seems *so* un-Pythonic...
> like if all the bitmasking functions you'd ever written using 32-bit int
> constants had to be rewritten just because we added longs to the language,
> and you had to upcast them to be compatible or something. Sounds too much
> like C or Java or some other non-Python language, where dynamism and
> polymorphy are the special case, instead of the general rule.

The difference is that we have three classes of algorithm here:
- those that work only on octet sequences
- those that work only on character sequences
- those that can work on either

Python 2 lumped all 3 classes of algorithm together through the multi-purpose 8-bit str type. The unicode type provided some scope to separate out the second category, but the divisions were rather blurry.

Python 3 forces the first two to be separated by using either octets (bytes/bytearray) or characters (str). There are a *very small* number of APIs where it is appropriate to be polymorphic, but this is currently difficult due to the need to supply literals of the appropriate type for the objects being operated on.

This isn't ever going to happen automagically due to the need to explicitly provide two literals (one for octet sequences, one for character sequences).

The virtues of a separate poly_str type are that:
1. It can be simple and implemented in Python, dispatching to str or bytes as appropriate (probably in the string module)
2. No chance of impacting the performance of the core interpreter (as builtins are not affected)
3. Lower impact if it turns out to have been a bad idea

We could talk about this even longer, but the most effective way forward is going to be a patch that improves the URL parsing situation.

Cheers,
Nick.
-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
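To make the poly_str idea above concrete, here is a minimal pure-Python sketch of one way such a type might dispatch. Nothing here is an existing stdlib API: the class body, the `like` method, and the `split_scheme` helper are all hypothetical names chosen for illustration, and the ASCII default encoding is an assumption.

```python
class poly_str:
    """Hypothetical helper holding one literal in both str and bytes form."""

    def __init__(self, text, encoding="ascii"):
        self._text = text
        self._octets = text.encode(encoding)

    def like(self, sample):
        """Return the literal as the same kind of sequence as `sample`."""
        if isinstance(sample, str):
            return self._text
        if isinstance(sample, (bytes, bytearray)):
            return self._octets
        raise TypeError("expected str, bytes or bytearray, got %s"
                        % type(sample).__name__)


# The literal is written once; its type is chosen at call time.
_SCHEME_SEP = poly_str("://")


def split_scheme(url):
    """Split 'scheme://rest' into (scheme, rest) for str or bytes input."""
    sep = _SCHEME_SEP.like(url)
    scheme, found, rest = url.partition(sep)
    if not found:
        # No separator: empty scheme of the same type as the input.
        return url[:0], url
    return scheme, rest


print(split_scheme("http://python.org/dev"))
print(split_scheme(b"http://python.org/dev"))
```

The function body is written once and works on either octet or character sequences, because the only type-specific piece, the literal, is looked up from the argument. Builtins are untouched, which matches the second virtue listed above.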