Re: [Python-Dev] Fixing the XML batteries
On Fri, Dec 16, 2011 at 4:53 PM, Stefan Behnel stefan...@behnel.de wrote:

If these changes are considered acceptable, I'll copy the above over to the documentation bug I opened at http://bugs.python.org/issue11379

Can these doc changes go into both 2.7 and 3.3? Given that there is no important difference between the implementations, I don't see why the documentation should differ in Py2.

Your suggested tweaks look good to me and could go into all of 2.7, 3.2 and 3.3.

b) cElementTree should finally lose its special status as a separate library and disappear as an accelerator module behind ElementTree. There was no opposition and a general agreement on this in the thread, except for the warning that Fredrik Lundh should have a word in this. I wrote him an e-mail and didn't get a response so far. We can wait a little longer, I guess, there's still time before 3.3beta.

Having ElementTree implicitly do from _elementtree import * is a 3.3 only change, though. (Note that xml.etree.cElementTree isn't the actual acceleration module - that honor already goes to _elementtree. The only bit missing is the automatic import in xml.etree.ElementTree and the appropriate test updates to ensure the Python version still gets tested.)

Cheers, Nick.

-- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia

___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
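The "accelerator module behind a pure-Python module" arrangement discussed above can be sketched roughly as follows. This is only an illustration of the general pattern, not the actual xml.etree.ElementTree source: the names python_sum, load_implementation and _hypothetical_accel are made up for the example, and the accelerator module is assumed not to exist so that the fallback path is exercised.

```python
import importlib


def python_sum(values):
    """Pure-Python implementation that is always available."""
    total = 0
    for value in values:
        total += value
    return total


def load_implementation(accelerator_name, attribute, fallback):
    """Return the accelerated attribute if its module imports, else the fallback."""
    try:
        module = importlib.import_module(accelerator_name)
    except ImportError:
        # No accelerator available: keep using the pure-Python version.
        return fallback
    return getattr(module, attribute, fallback)


# "_hypothetical_accel" is an invented module name, so the fallback is used here.
fast_sum = load_implementation("_hypothetical_accel", "fast_sum", python_sum)
print(fast_sum([1, 2, 3]))
```

Keeping the accelerator name as a parameter is one way to address the testing concern raised in the message: a test can pass a name that does not import and so force the pure-Python code path to run.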
Re: [Python-Dev] French sprint this week-end
Victor Stinner victor.stin...@haypocalc.com wrote:

Do you know of a simple task to start contributing to Python? Something useful and not boring if possible :-) There is the easy tag on the bug tracker, but many issues have a long history, already have a patch, etc. Do you know of other generic tasks like improving code coverage or support of some rare platforms?

On some buildbots compiler warnings are starting to accumulate. Installing a recent version of gcc and fixing those might be a good task. If the participants are new to buildbot, it might even be interesting for them. :)

Stefan Krah
Re: [Python-Dev] French sprint this week-end
On Fri, Dec 16, 2011 at 11:00, Stefan Krah ste...@bytereef.org wrote:

Victor Stinner victor.stin...@haypocalc.com wrote:

Do you know of a simple task to start contributing to Python? Something useful and not boring if possible :-) There is the easy tag on the bug tracker, but many issues have a long history, already have a patch, etc. Do you know of other generic tasks like improving code coverage or support of some rare platforms?

On some buildbots compiler warnings are starting to accumulate. Installing a recent version of gcc and fixing those might be a good task. If the participants are new to buildbot, it might even be interesting for them. :)

Do we have buildbots that build Python with Clang instead of GCC? The reason I'm asking is that Clang's diagnostics are usually better, and fixing all its warnings could nicely complement fixing GCC's qualms.

Eli
Re: [Python-Dev] French sprint this week-end
On Fri, Dec 16, 2011 at 10:17, Eli Bendersky eli...@gmail.com wrote:

Do we have buildbots that build Python with Clang instead of GCC? The reason I'm asking is that Clang's diagnostics are usually better, and fixing all its warnings could nicely complement fixing GCC's qualms.

The box running my buildslave has clang installed, so someone with access to the buildmaster could probably set that up without too much trouble.

Cheers, Dirkjan
Re: [Python-Dev] A new dict for Xmas?
Greg Ewing wrote:

Mark Shannon wrote:

I have a new dict implementation which allows sharing of keys between objects of the same class.

We already have the __slots__ mechanism for memory savings. Have you done any comparisons with that?

You can't make Python programmers use slots, neither can you automatically change existing programs. Are you suggesting that because the __slots__ mechanism exists, the dict implementation doesn't have to be efficient?

Seems to me that __slots__ ought to save even more memory, since it eliminates the per-instance dict altogether rather than just the keys half of it.

Of course using __slots__ saves more memory, but people don't use them much.

Cheers, Mark.
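For readers who want to see the __slots__ versus per-instance dict trade-off being argued about, here is a minimal sketch. The class names PointDict and PointSlots are invented for the example, and sys.getsizeof only reports shallow sizes, so the figures are a rough indication on CPython rather than the exact numbers quoted in this thread.

```python
import sys


class PointDict:
    """Ordinary class: every instance carries its own attribute dict."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


class PointSlots:
    """Same attributes, but stored in fixed slots with no per-instance dict."""

    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


with_dict = PointDict(1, 2, 3)
with_slots = PointSlots(1, 2, 3)

# Shallow size of the object plus its attribute dict, versus the slotted object alone.
dict_cost = sys.getsizeof(with_dict) + sys.getsizeof(with_dict.__dict__)
slots_cost = sys.getsizeof(with_slots)
print("dict-based instance:", dict_cost, "bytes")
print("slots-based instance:", slots_cost, "bytes")
```

The cost of the saving is flexibility: assigning an attribute that is not listed in __slots__ raises AttributeError, which is part of why existing programs cannot simply be switched over automatically.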
Re: [Python-Dev] Fixing the XML batteries
Stefan Behnel, 14.12.2011 20:41:

It's clear from the discussion that there are still users and that new code is still being written that uses MiniDOM. However, I would argue that this cannot possibly be performance critical code and that it only deals with somewhat small documents. I say that because MiniDOM is evidently not suitable for large documents or performance critical applications, so this is the only explanation I have why the performance problems would not be obvious in the cases where it is still being used. And if they do show, it appears to be much more likely that users rewrite their code using ElementTree or lxml than that they try to fix MiniDOM's performance issues.

Out of curiosity, I reran my benchmarks under PyPy 1.7. http://blog.behnel.de/index.php?p=210

In short: MiniDOM performs substantially better there, both in terms of time and space. That by itself doesn't make PyPy an interesting platform for XML processing (using lxml in CPython is way faster), but I found it interesting to note that the problem is not strictly inherent in MiniDOM. It also depends a lot on the runtime environment, even when it comes to memory usage.

Stefan
Re: [Python-Dev] Fixing the XML batteries
On 16/12/2011 07:53, Stefan Behnel wrote:

Additionally, the documentation on the xml.sax page would benefit from the following paragraph:

[[Note: The xml.sax package provides an implementation of the SAX interface whose API is similar to that in other programming languages. Users who are unfamiliar with the SAX interface or who would like to write less code for efficient stream processing of XML files should consider using the iterparse() function in the xml.etree.ElementTree module instead.]]

A small caveat to note about iterparse(), which I otherwise like a lot: when processing very big data (I encountered this with a region-wide OpenStreetMap XML dump), you have to remove the processed nodes from the root element. Otherwise, its memory footprint increases with the size of the document.
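The iterparse() caveat above can be illustrated with a short sketch. The function name count_elements and the tiny in-memory sample document are assumptions made for the example; the point is only the pattern of grabbing the root element from the first event and clearing it after each processed child.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source, tag):
    """Count <tag> elements while discarding each one after it is processed."""
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            # Drop the processed children so the tree does not grow with the
            # document. Note that clear() also resets the root's attributes.
            root.clear()
    return count, len(root)


# Small stand-in for a large dump: 1000 empty <node/> children under one root.
sample = io.BytesIO(b"<osm>" + b"<node id='1'/>" * 1000 + b"</osm>")
count, remaining_children = count_elements(sample, "node")
print(count, remaining_children)
```

Without the root.clear() call the loop would still produce the same count, but every parsed child would stay attached to the root until parsing finished.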
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (2011-12-09 - 2011-12-16)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue. Do NOT respond to this message.

Issues counts and deltas:
  open    3175 ( +6)
  closed     0 (+40)
  total  25395 (+46)

Open issues with patches: 1360

Issues opened (31)
==

#11886: test_time.test_tzset() fails on x86 FreeBSD 7.2 3.x: AEST ti
http://bugs.python.org/issue11886 reopened by haypo

#13571: Backup files support in IDLE
http://bugs.python.org/issue13571 opened by maniram.maniram

#13572: import _curses fails because of UnicodeDecodeError('utf8' code
http://bugs.python.org/issue13572 opened by haypo

#13573: csv.writer uses str() for floats instead of repr()
http://bugs.python.org/issue13573 opened by rhettinger

#13574: refresh example in doc for Extending and Embedding
http://bugs.python.org/issue13574 opened by flox

#13576: Handling of broken condcoms in HTMLParser
http://bugs.python.org/issue13576 opened by ezio.melotti

#13577: __qualname__ is not present on builtin methods and functions
http://bugs.python.org/issue13577 opened by meador.inge

#13578: Add subprocess.iter_output() convenience function
http://bugs.python.org/issue13578 opened by ncoghlan

#13579: string.Formatter doesn't understand the !a conversion specifie
http://bugs.python.org/issue13579 opened by ncoghlan

#13581: help() appears to be broken; doesn't display __doc__ for class
http://bugs.python.org/issue13581 opened by christopherthemagnificent

#13582: IDLE and pythonw.exe stderr problem
http://bugs.python.org/issue13582 opened by serwy

#13583: sqlite3.Row doesn't support slice indexes
http://bugs.python.org/issue13583 opened by xapple

#13585: Add contextlib.CleanupManager
http://bugs.python.org/issue13585 opened by Nikratio

#13586: Replace selected not working/consistent with find
http://bugs.python.org/issue13586 opened by marco

#13587: Correcting the typos error in Doc/howto/urllib2.rst
http://bugs.python.org/issue13587 opened by Bithin.A

#13588: Change name of internal closure functions in importlib
http://bugs.python.org/issue13588 opened by brett.cannon

#13589: Aifc low level serialization primitives fix
http://bugs.python.org/issue13589 opened by Oleg.Plakhotnyuk

#13590: Prebuilt python-2.7.2 binaries for macosx can not compile c ex
http://bugs.python.org/issue13590 opened by teamnoir

#13592: repr(regex) doesn't include actual regex
http://bugs.python.org/issue13592 opened by dwt

#13594: Aifc markers write fix
http://bugs.python.org/issue13594 opened by Oleg.Plakhotnyuk

#13598: string.Formatter doesn't support empty curly braces {}
http://bugs.python.org/issue13598 opened by maniram.maniram

#13601: sys.stderr should be unbuffered (or always line-buffered)
http://bugs.python.org/issue13601 opened by pitrou

#13604: update PEP 393 (match implementation)
http://bugs.python.org/issue13604 opened by Jim.Jewett

#13605: document argparse's nargs=REMAINDER
http://bugs.python.org/issue13605 opened by bethard

#13607: Move generator specific sections out of ceval.
http://bugs.python.org/issue13607 opened by ron_adam

#13608: remove born-deprecated PyUnicode_AsUnicodeAndSize
http://bugs.python.org/issue13608 opened by Jim.Jewett

#13609: Add os.get_terminal_size() function
http://bugs.python.org/issue13609 opened by denilsonsa

#13610: On Python parsing numbers.
http://bugs.python.org/issue13610 opened by Jean-Michel.Fauth

#13611: Integrate ElementC14N module into xml.etree package
http://bugs.python.org/issue13611 opened by scoder

#13612: xml.etree.ElementTree says unknown encoding of a regular encod
http://bugs.python.org/issue13612 opened by dongying

#13613: Small error in regular expression poker hand example
http://bugs.python.org/issue13613 opened by Eddie E

Most recent 15 issues with no replies (15)
==

#13611: Integrate ElementC14N module into xml.etree package
http://bugs.python.org/issue13611

#13608: remove born-deprecated PyUnicode_AsUnicodeAndSize
http://bugs.python.org/issue13608

#13605: document argparse's nargs=REMAINDER
http://bugs.python.org/issue13605

#13594: Aifc markers write fix
http://bugs.python.org/issue13594

#13590: Prebuilt python-2.7.2 binaries for macosx can not compile c ex
http://bugs.python.org/issue13590

#13587: Correcting the typos error in Doc/howto/urllib2.rst
http://bugs.python.org/issue13587

#13586: Replace selected not working/consistent with find
http://bugs.python.org/issue13586

#13583: sqlite3.Row doesn't support slice indexes
http://bugs.python.org/issue13583

#13579: string.Formatter doesn't understand the !a conversion specifie
http://bugs.python.org/issue13579

#13576: Handling of broken condcoms in HTMLParser
http://bugs.python.org/issue13576

#13574: refresh example in doc for Extending and Embedding
http://bugs.python.org/issue13574

#13565: test_multiprocessing.test_notify_all() hangs on AMD64 Snow Le
[Python-Dev] A new dict for Xmas?
Greg Ewing wrote:

Mark Shannon wrote:

I have a new dict implementation which allows sharing of keys between objects of the same class.

We already have the __slots__ mechanism for memory savings. Have you done any comparisons with that?

You can't make Python programmers use slots, neither can you automatically change existing programs.

The automatic change is exactly what a dictionary upgrade provides. I haven't read your patch in detail yet, but it sounds like you're replacing the array of keys + array of values with just an array of values, and getting the numerical index from a single per-class array of keys.

That would normally be sensible (so thanks!), but it isn't a drop-in replacement. If you have a Data class intended to take arbitrary per-instance attributes, it just forces them all to keep resizing up, even though individual instances would be small with the current dict.

How is this more extreme than replacing a pure dict with some auto-calculated slots and an other_attrs dict that would normally remain empty? [It may be harder to implement, because of the difficulty of calculating the slots in advance ... but I don't see it as any worse, once implemented.]

Of course, maybe your shared dict just points to sequential array positions (rather than matching the key position) ... in which case, it may well beat slots, though the Data class would still be a problem.

-jJ
Re: [Python-Dev] A new dict for Xmas?
On 12/16/2011 5:03 AM, Mark Shannon wrote:

Of course using __slots__ saves more memory, but people don't use them much.

Do you think the stdlib should be using __slots__ more?

-- Terry Jan Reedy
Re: [Python-Dev] A new dict for Xmas?
Jim Jewett wrote:

Greg Ewing wrote:

Mark Shannon wrote:

I have a new dict implementation which allows sharing of keys between objects of the same class.

We already have the __slots__ mechanism for memory savings. Have you done any comparisons with that?

You can't make Python programmers use slots, neither can you automatically change existing programs.

The automatic change is exactly what a dictionary upgrade provides. I haven't read your patch in detail yet, but it sounds like you're replacing the array of keys + array of values with just an array of values, and getting the numerical index from a single per-class array of keys.

Each dictionary has key/hash/values as before, but instead of one array, they are broken into two: a key/hash array and a value array. The key/hash arrays can be shared amongst dicts; this happens for well-behaved classes and completely empty dicts, otherwise each dict gets two arrays.

That would normally be sensible (so thanks!), but it isn't a drop-in replacement. If you have a Data class intended to take arbitrary

It is a drop-in replacement. It conforms to the current API.

per-instance attributes, it just forces them all to keep resizing up, even though individual instances would be small with the current dict.

There is a cut-off point. At the moment it's quite unsophisticated about how it does this, but it could easily be improved. Suggestions are welcome.

How is this more extreme than replacing a pure dict with some auto-calculated slots and an other_attrs dict that would normally remain empty?

It's less extreme, but equally effective.

[It may be harder to implement, because of the difficulty of calculating the slots in advance ... but I don't see it as any worse, once implemented.]

It's a trade-off between ease of implementation and effectiveness. I think the shared key/hash array approach gets most of the advantages of a full map implementation (like PyPy or V8) with much less hassle.

Of course, maybe your shared dict just points to sequential array positions (rather than matching the key position) ... in which case, it may well beat slots, though the Data class would still be a problem.

It won't beat slots, mainly due to the extra space required to minimise collisions, but it is a lot more compact than the present approach. For a well-behaved class with lots of instances, each with 3 or 4 attributes (i.e. the minimum size dict), it cuts the space used by the per-instance dict from 136 bytes (32-bit machine) to 64 bytes plus the shared key/hash array. Slots would only require 12 or 16 bytes. (When verifying these numbers I found a bug in the resizing, which I have just fixed.)

The next enhancement would be to store the naked value array directly into an instance, trimming the space cost down to just 32 bytes, but this would cause compatibility issues as the (internal) API would need to change.

Cheers, Mark.
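To make the key-sharing idea in this exchange easier to picture, here is a deliberately simplified toy model in Python. It is not Mark's implementation: the real patch works on CPython's C-level hash tables with key/hash arrays, while this sketch uses invented classes (SharedKeys, SplitDict) and a plain name-to-index mapping. It only shows how several instances can share one key table while each keeps its own value array.

```python
_MISSING = object()  # marks a value slot that this particular dict has not filled


class SharedKeys:
    """One table per class, mapping attribute name to a value-array index."""

    def __init__(self):
        self.index_of = {}

    def index_for(self, name):
        # A new name gets the next free index; a known name keeps its index.
        return self.index_of.setdefault(name, len(self.index_of))


class SplitDict:
    """Per-instance part: only a value array, with keys looked up in SharedKeys."""

    def __init__(self, shared_keys):
        self.shared_keys = shared_keys
        self.values = []

    def __setitem__(self, name, value):
        index = self.shared_keys.index_for(name)
        if index >= len(self.values):
            # Grow the value array up to the index assigned by the shared table.
            self.values.extend([_MISSING] * (index + 1 - len(self.values)))
        self.values[index] = value

    def __getitem__(self, name):
        index = self.shared_keys.index_of.get(name)
        if index is None or index >= len(self.values) or self.values[index] is _MISSING:
            raise KeyError(name)
        return self.values[index]


keys = SharedKeys()
first, second = SplitDict(keys), SplitDict(keys)
first["x"] = 1
second["y"] = 2
print(len(first.values), len(second.values))
```

In this toy model the second instance ends up with a two-entry value array even though it stores only one attribute, which mirrors the concern raised about classes whose instances take arbitrary attributes, and suggests why a cut-off that falls back to an unshared dict is useful.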
Re: [Python-Dev] A new dict for Xmas?
Terry Reedy wrote:

On 12/16/2011 5:03 AM, Mark Shannon wrote:

Of course using __slots__ saves more memory, but people don't use them much.

Do you think the stdlib should be using __slots__ more?

For some things yes, but where it's critical slots are already used. Take the ordered dict, the nodes in that use slots. The advantage of improving things in the VM is that we don't have to rewrite half of the stdlib.

Cheers, Mark.