[issue2892] improve cElementTree iterparse error handling
Fredrik Lundh fred...@effbot.org added the comment: Note that this was fixed in upstream 1.3 (and verified by the selftests), but the fix and test were apparently lost when that code was merged into 2.7. Since 2.7 is supposed to ship with 1.3, this is a regression, not a feature request. (But 2.7 is in rc, and I'm on vacation, so I guess it's a bit too late to do anything about that. I'll leave the final decision to flox and the python-dev crowd.) -- assignee: effbot -> flox versions: +Python 2.7 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue2892 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser
Fredrik Lundh fred...@effbot.org added the comment: Namespaces are a fundamental part of the XML information model (both xpath and infoset) and all modern XML document formats, so I'm not sure what problem you're trying to solve by pretending that they don't exist. It's a bit like modifying "import foo" to work like "from foo import *"... -- nosy: +effbot -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue8583
[issue6488] ElementTree documentation refers to path with no explanation, and inconsistently
Fredrik Lundh fred...@effbot.org added the comment:
> As per PEP 257, “Returns” should become “Return” (it’s a command, not a description).
Upstream ET uses JavaDoc conventions, where the conventions are designed by technical writers, not hackers. In JavaDoc, descriptions are 3rd person declarative (after all, the documentation describes what the function does, not what you want it to do). http://java.sun.com/j2se/javadoc/writingdoccomments/ The incompatibilities with Python's NIH-standards are unfortunate, but that's the way it is.
-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue6488
[issue6488] ElementTree documentation refers to path with no explanation, and inconsistently
Fredrik Lundh fred...@effbot.org added the comment: The missing/extra words in the findtext description are just a case of sloppy copy-editing, most likely after a quick reformatting. Not sure why you're spending all this energy arguing about commas, though. -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue6488
[issue8047] Serialiser in ElementTree returns unicode strings in Py3k
Fredrik Lundh fred...@effbot.org added the comment: Hmm. I'm not entirely sure about giving False a meaning when None has traditionally had a different (and documented) meaning. And sleeping on it hasn't convinced me in either direction :-( (well, I'd say no, but the compatibility argument is somewhat tempting)
I'm not that concerned by changing the default for write -- 3.x users with utf-8 as the default output encoding will get different output, but still perfectly valid XML. 3.x users with non-utf-8 default encodings will get valid XML also in cases where it didn't work before.
tostring() is more problematic, but I'm leaning towards Guido's torpedoes approach there -- changing the default output to bytestrings is more likely to cause code to blow up than cause bad output, and you can trivially make your program backwards compatible by adding an extra check/decode after the call.
Supporting unicode for lxml.etree compatibility is fine with me, but I think it might make sense to support the string "unicode" as well (as a pseudo-encoding -- it's pretty clear to me that nobody will ever define a real character encoding with that name :-).
Have you posted/can you post the patch to Rietveld, btw? I have some questions about the code that are independent of the encoding decision.
-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue8047
[issue8047] Serialiser in ElementTree returns unicode strings in Py3k
Fredrik Lundh fred...@effbot.org added the comment:
> 'None' has always been the documented default for the encoding parameter
That's probably mostly by accident, at least in original ET, but the 1.3 draft docs at effbot.org/elementtree do spell it out explicitly for the 'write' method: "Output encoding. If omitted or set to None, defaults to US-ASCII." Not sure I'd consider this text binding in itself, though (even if I'd argue that it's preferred to have the same interpretation of encoding everywhere).
> writing out the Unicode serialisation will result in an incorrect XML serialisation
I think Guido meant the ElementTree.write method; is that broken too?
The file.write(et.tostring()) issue is probably my most pressing concern here; that's a common use case (e.g. when using iterparse to cut pieces from a big document), and the defaults were chosen to increase the chance that this automatically does the right thing for non-ASCII even if the programmer never tests it. In 3.X, that construct is suddenly dependent on the interpreter's default encoding.
I think I'd prefer old tostring behaviour and a separate tounicode function, and I'm still not convinced that the latter is required for the XML use case (which implies that maybe it should live in lxml.html for the HTML case, even if it ends up calling the same internal implementation). Or should that be tobytes and tounicode to eliminate all ambiguity?
-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue8047
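For readers following the tostring discussion, here is a minimal sketch of how the two forms behave in current Python 3 releases of xml.etree.ElementTree. This is the behaviour as it eventually shipped, not the 3.1 behaviour being debated above; the element name and text are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

e = ET.Element("tag")
e.text = "hell\u00f6"

# Default: bytes in US-ASCII, with non-ASCII text as character references.
as_bytes = ET.tostring(e)

# The "unicode" pseudo-encoding returns a str instead of bytes.
as_text = ET.tostring(e, encoding="unicode")

# Writing the bytes form to a binary stream does not depend on the locale.
buf = io.BytesIO()
buf.write(as_bytes)
```

The bytes form is what makes the file.write(et.tostring()) idiom independent of the interpreter's default encoding, provided the file is opened in binary mode.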
[issue8047] Serialiser in ElementTree returns unicode strings in Py3k
Fredrik Lundh fred...@effbot.org added the comment: (what's the Python 3 replacement for the array module, btw?) -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue8047
[issue8047] Serialiser in ElementTree returns unicode strings in Py3k
Fredrik Lundh fred...@effbot.org added the comment:
> Yes, the feature has been implemented deep down in the _encode() helper function, so it impacts the entire serialiser, not only its API
Ouch.
    >>> import locale
    >>> locale.getpreferredencoding() == "utf-8"
    False
    >>> from xml.etree.ElementTree import *
    >>> e = Element("tag")
    >>> e.text = "hellö"
    >>> tostring(e)
    '<tag>hellö</tag>'
    >>> ElementTree(e).write("out.xml")
    >>> tree = parse("out.xml")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python31\lib\xml\etree\ElementTree.py", line 843, in parse
        tree.parse(source, parser)
      File "C:\Python31\lib\xml\etree\ElementTree.py", line 581, in parse
        parser.feed(data)
      File "C:\Python31\lib\xml\etree\ElementTree.py", line 1221, in feed
        self._parser.Parse(data, 0)
    xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 9
-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue8047
[issue8047] Serialiser in ElementTree returns unicode strings in Py3k
Fredrik Lundh fred...@effbot.org added the comment:
> I wouldn't raise much opposition against tobytes() as an alias for tostring(), although that sounds more like duplicating an otherwise simple API.
Adding an alias would be a way to address the 2.X/3.X terminology overlap; "string" traditionally implies 8-bit in 2.X, and apparently now Unicode in 3.X. That's likely to cause a lot of confusion for people switching over (and to people writing 3.X documentation, as well; the array module's documentation is an example).
ET isn't the only thing with tostring functionality, of course -- it's pretty much the standard name for "serialize data structure to byte string for later transmission" -- so it probably wouldn't hurt with a python-dev pronouncement here.
-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue8047
[issue8047] Serialiser in ElementTree returns unicode strings in Py3k
Fredrik Lundh fred...@effbot.org added the comment:
> I wouldn't raise much opposition against tobytes() as an alias for tostring(), although that sounds more like duplicating an otherwise simple API.
Adding an alias would be a way to address the 2.X/3.X terminology overlap; "string" traditionally implies 8-bit in 2.X, and apparently now Unicode in 3.X. That's likely to cause a lot of confusion for people switching from 2 to 3 (and to people writing 3.X documentation, apparently; the array module's documentation is an example of that). (And once everyone has switched over, we can deprecate the tostring spelling... :)
ET isn't the only thing with tostring functionality, of course -- it's pretty much the standard name for "serialize data structure to byte string for later transmission" -- so it probably wouldn't hurt with a python-dev pronouncement here.
-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue8047
[issue8047] Serialiser in ElementTree returns unicode strings in Py3k
Fredrik Lundh fred...@effbot.org added the comment: Interesting. But isn't the problem with 3.1 that it relies on the standard encoding, which results in code that may or may not work depending on a global platform setting? Who's doing the encoding in the new version? And what ends up in the file? -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue8047
[issue8047] Serialiser in ElementTree returns unicode strings in Py3k
Fredrik Lundh fred...@effbot.org added the comment: Oops :) Yeah, that was a pretty lousy way to show what encoding I was using for that test:
    >>> import locale
    >>> locale.getpreferredencoding()
    'cp1252'
(Somewhat related, it would be nice if Python actually normalized defaultencoding/preferredencoding to some canonical name for the codec in use, i.e. preferred MIME name or at least IANA; we had a rather nice little bug recently that wouldn't have happened if that had been the case...)
-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue8047
[issue7114] HTMLParser doesn't handle ![CDATA[ ... ]]
Fredrik Lundh fred...@effbot.org added the comment: And to clarify, XHTML is a reformulation of HTML4 using XML syntax, so you should use an XML parser to parse it, not an HTML parser. The formats are related, but not identical. -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue7114
[issue5100] ElementTree.iterparse and Element.tail confusion
Fredrik Lundh fred...@effbot.org added the comment: Footnote: iterparse does things this way mostly to keep the implementation simple and fast; due to buffering, the tree builder is usually ahead of the event generation by up to 16k. See the note on this page: http://effbot.org/zone/element-iterparse.htm and the message it links to for more on this topic.
Your case is a very common use case for tostring, so it would probably have made sense to make tostring skip the tail on the element itself, at least if it's whitespace only. Guess we could add an option... But in your case, you can probably just nuke or normalize the tail attribute before writing it out (i.e. set it to None or "\n").
-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue5100
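The suggested workaround can be sketched roughly as follows, using the standard library's iterparse. The document and the "item" tag name are invented for the illustration; the idea is simply to clear the tail right before serializing each piece.

```python
import io
import xml.etree.ElementTree as ET

doc = io.BytesIO(b"<root>\n  <item>one</item>\n  <item>two</item>\n</root>")

pieces = []
for event, elem in ET.iterparse(doc, events=("end",)):
    if elem.tag == "item":
        # Drop whatever whitespace follows the element, so that it does
        # not leak into the serialized fragment.
        elem.tail = None
        pieces.append(ET.tostring(elem))
```

Setting the tail to "\n" instead of None would keep one fragment per line when the pieces are concatenated.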
[issue8047] Serialiser in ElementTree returns unicode strings in Py3k
Fredrik Lundh fred...@effbot.org added the comment:
> if I don't specify an encoding, I get unicode. If I do specify an encoding, I get encoded bytes.
You're confusing the XML document encoding with character set encoding. A serialized (unparsed) XML document is a byte stream, not a string of Unicode characters. And the character set encoding is both embedded in that byte stream and affects how it's generated in more than one way; you cannot just recode XML documents nilly willy and expect things to work.
A parsed XML document (an infoset) -- for ET, that's the tree of Element objects -- does indeed contain Unicode strings, but the transformation from the byte stream to the Unicode string doesn't just involve character set decoding; there are several other constructs that are handled by the XML parser.
Ha.
> There has been a very long temporal window
You should have had plenty of time to fix it, then, right?
-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue8047
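As an illustration of the point about the encoding being embedded in the byte stream, here is a small sketch against the current standard library ElementTree (element name and text are made up). Serializing to ISO-8859-1 produces bytes whose XML declaration names that encoding; recoding the bytes without touching the declaration silently corrupts the text.

```python
import xml.etree.ElementTree as ET

e = ET.Element("tag")
e.text = "hell\u00f6"

# The serialized document is a byte stream that declares its own encoding.
data = ET.tostring(e, encoding="iso-8859-1")
roundtrip = ET.fromstring(data)

# Naive recoding: the bytes are now UTF-8, but the declaration still
# claims ISO-8859-1, so the parser decodes them with the wrong codec.
recoded = data.decode("iso-8859-1").encode("utf-8")
garbled = ET.fromstring(recoded)
```

Here roundtrip.text matches the original, while garbled.text comes back as mojibake rather than raising an error, which is arguably the worse failure mode.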
[issue6472] Update ElementTree with upstream changes
Fredrik Lundh fred...@effbot.org added the comment: W00t! -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue6472
[issue8047] Serialiser in ElementTree returns unicode strings in Py3k
Fredrik Lundh fred...@effbot.org added the comment:
    >>> import array
    >>> array.array("i", [1, 2, 3]).tostring()
    b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue8047
[issue8047] Serialiser in ElementTree returns unicode strings in Py3k
Fredrik Lundh fred...@effbot.org added the comment: So now it's the domain experts against some hypothetical people that might exist? Tricky. -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue8047
[issue7462] Implement fastsearch algorithm for rfind/rindex
Fredrik Lundh fred...@effbot.org added the comment: Thanks Florent! Are there any simple, common cases that are made slower by this patch? The original fastsearch implementation has a couple of special cases to make sure it's faster than the original code in all cases. The reason it wasn't implemented for reverse search was more a question of developer time constraints; reverse search isn't nearly as common as forward search, and we had other low-hanging fruit to deal with. (btw, while it's great that someone finally got around to fixing this, it wouldn't surprise me if replacing the KMP implementation in SRE with a fastsearch would save as many CPU cycles worldwide as this patch :) -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue7462
[issue3475] _elementtree.c import can fail silently
Fredrik Lundh fred...@effbot.org added the comment: Note that "fail silently" is a bit of a misnomer - if the embedded import doesn't work, portions of the library will fail pretty loudly. Feel free to use some variation of the suggested patch, or just wait until the next upstream release gets imported (if ever). -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue3475
[issue7139] ElementTree: Incorrect serialization of end-of-line characters in attribute values
Fredrik Lundh fred...@effbot.org added the comment: The real problem here is that XML attributes weren't really designed to hold data that doesn't survive normalization. One would have thought that making it difficult to do that, and easy to store such things as character data, would have made people think a bit before designing XML formats that do things the other way around, but apparently some people find it hard having to use their brain when designing things...
FWIW, the current ET 1.3 beta escapes newlines but not tabs and carriage returns; I don't really mind adding tabs, but I'm less sure about carriage return -- XML pretty much treats CR as a junk character also outside attributes, and escaping it in all contexts would just be silly.
-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue7139
[issue6562] OverflowError in RLock.acquire()
Fredrik Lundh fred...@effbot.org added the comment: PIL is completely thread-agnostic, so I'm not sure there's anything PIL can do to fix this. (and ImageQt is of course an interface to PyQt, which is an interface to Qt, which consists of a *lot* more than 50 lines...) -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue6562
[issue6233] ElementTree (py3k) doesn't properly encode characters that can't be represented in the specified encoding
Fredrik Lundh fred...@effbot.org added the comment: That's backwards, unless I'm missing something here: charrefs represent Unicode characters, not UTF-8 byte values. The character LATIN SMALL LETTER A WITH TILDE with the character value 227 should be represented as &#227; if serialized to an encoding that doesn't support non-ASCII characters. And there's no need to use RE:s to filter things under 3.X; those parts of ET 1.2 are there for pre-2.0 compatibility. Did you try running the tests with the escape function I posted? -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue6233
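The distinction between code points and UTF-8 byte values can be checked directly with Python's built-in "xmlcharrefreplace" error handler. This is only a small illustration of the point above, not the patch under discussion.

```python
text = "\u00e3"  # LATIN SMALL LETTER A WITH TILDE, code point 227

# The character reference carries the Unicode code point (227)...
encoded = text.encode("ascii", "xmlcharrefreplace")

# ...not the two bytes the character happens to occupy in UTF-8.
utf8_bytes = text.encode("utf-8")
```

A serializer that emitted one reference per UTF-8 byte would produce two unrelated characters when the document is parsed again.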
[issue5166] ElementTree and minidom don't prevent creation of not well-formed XML
Fredrik Lundh fred...@effbot.org added the comment: For ET, that's very much on purpose. Validating data provided by every single application would kill performance for all of them, even if only a small minority would ever try to serialize data that cannot be represented in XML. -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue5166
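A short sketch of what that design choice means in practice, using the standard library ElementTree; the deliberately invalid tag name is made up for the example. Serialization succeeds without complaint, and the problem only surfaces when something tries to parse the result.

```python
import xml.etree.ElementTree as ET

# A tag name containing spaces is not a valid XML name, but ET does not check.
bad = ET.Element("not a tag")
serialized = ET.tostring(bad)  # no error is raised here

try:
    ET.fromstring(serialized)
    reparsed_ok = True
except ET.ParseError:
    reparsed_ok = False
```

Applications that accept names or text from untrusted sources would need to validate them on their own before building the tree.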
[issue6266] cElementTree.iterparse ElementTree.iterparse return differently encoded strings
Fredrik Lundh fred...@effbot.org added the comment: It should definitely give what's intended (either a Unicode string, or, if the content is plain ASCII, an 8-bit string). What did you get instead? -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue6266
[issue6233] ElementTree (py3k) doesn't properly encode characters that can't be represented in the specified encoding
Fredrik Lundh fred...@effbot.org added the comment: Umm. Isn't _encode used to encode tags and attribute names? The charref syntax is only valid in CDATA sections and attribute values, which are encoded by the corresponding _escape functions. I suspect this patch will make things blow up on a non-ASCII tag/attribute name. -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue6233
[issue6233] ElementTree (py3k) doesn't properly encode characters that can't be represented in the specified encoding
Fredrik Lundh fred...@effbot.org added the comment: Did you look at the 1.3 alpha code base when you came up with this idea? Unfortunately, 1.3's _encode is used for a different purpose...
I don't have time to test it tonight, but I suspect that 1.3's escape_data/escape_attrib functions might work better under 3.X; they do the text.replace dance first, and then an explicit text.encode(encoding, "xmlcharrefreplace") at the end. E.g.
    def _escape_cdata(text, encoding):
        # escape character data
        try:
            # it's worth avoiding do-nothing calls for strings that are
            # shorter than 500 character, or so.  assume that's, by far,
            # the most common case in most applications.
            if "&" in text:
                text = text.replace("&", "&amp;")
            if "<" in text:
                text = text.replace("<", "&lt;")
            if ">" in text:
                text = text.replace(">", "&gt;")
            return text.encode(encoding, "xmlcharrefreplace")
        except (TypeError, AttributeError):
            _raise_serialization_error(text)
-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue6233
[issue6266] cElementTree.iterparse ElementTree.iterparse return differently encoded strings
Fredrik Lundh fred...@effbot.org added the comment: Converting from UTF-8 to Unicode is the right thing to do, but converting back to Latin-1 is not correct -- note that ET returns a Unicode string, not an 8-bit string. There's a makestring helper that does the right thing in the library; just changing:
    parcel = Py_BuildValue("ss", (prefix) ? prefix : "", uri);
to
    parcel = Py_BuildValue("sN", (prefix) ? prefix : "", makestring(uri));
should work (even if you should probably do that in two steps, and look for errors from makestring before proceeding).
-- Python tracker rep...@bugs.python.org http://bugs.python.org/issue6266
[issue5767] xmlrpclib loads invalid documents
Fredrik Lundh fred...@effbot.org added the comment: sgmlop doesn't do much validation; to quote the homepage: "[sgmlop] is tolerant, and happily accepts XML-like data that are not well-formed. If you need strictness, use another parser." But given that Python ships with cElementTree these days, and cElementTree's XMLParser (based on expat) is faster than both sgmlop and pyexpat, maybe it's time to remove sgmlop support from xmlrpclib... -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue5767
[issue1143] Update to latest ElementTree in Python 2.7
Fredrik Lundh eff...@users.sourceforge.net added the comment: ET 1.3 is still in alpha, though. Hopefully, that'll sort itself out over the next few weeks. -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue1143
[issue1538691] Patch cElementTree to export CurrentLineNumber
Fredrik Lundh eff...@users.sourceforge.net added the comment: In the upstream 1.0.6, the ParseError exception has a position attribute that contains a (line, column) tuple. -- Python tracker rep...@bugs.python.org http://bugs.python.org/issue1538691
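The same attribute is available on xml.etree.ElementTree.ParseError in current standard library releases. A minimal sketch, with a deliberately broken document made up for the example (this shows the present-day stdlib, not the upstream 1.0.6 release mentioned above):

```python
import xml.etree.ElementTree as ET

broken = "<a>\n<b>\n</a>"  # the closing tag on line 3 does not match <b>

try:
    ET.fromstring(broken)
    position = None
except ET.ParseError as err:
    # position is a (line, column) tuple describing where parsing failed
    position = err.position
```

The line number is 1-based, which makes it straightforward to point a user at the offending line of the input.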
[issue1777] ElementTree/cElementTree findtext inconsistency
Fredrik Lundh eff...@users.sourceforge.net added the comment: Forgot to mention that this is fixed in the cElementTree trunk (public as of today's 1.0.6 preview release). Will merge with Python trunk when I find the time... Python tracker rep...@bugs.python.org http://bugs.python.org/issue1777
[issue4100] xml.etree.ElementTree does not read xml-text over page bonderies
Fredrik Lundh [EMAIL PROTECTED] added the comment: Roland's right - iterparse only guarantees that it has seen the ">" character of a starting tag when it emits a "start" event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present. If you need a fully populated element, look for "end" events instead. Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue4100
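A rough sketch of that advice using the standard library's iterparse: attributes are read at the "start" event, while the text is only consulted at the "end" event. The document and the "item" tag are invented for the illustration.

```python
import io
import xml.etree.ElementTree as ET

doc = io.BytesIO(b"<root><item id='1'>some text</item></root>")

texts = []
item_id = None
for event, elem in ET.iterparse(doc, events=("start", "end")):
    if elem.tag != "item":
        continue
    if event == "start":
        # Attributes are guaranteed to be available here.
        item_id = elem.get("id")
    else:
        # Text and children are only guaranteed to be complete at "end".
        texts.append((item_id, elem.text))
```

With a small input like this one the text may well be present at "start" too, because of buffering, but code that relies on that can break as soon as an element straddles a read boundary.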
[issue433029] SRE: posix classes aren't supported
Fredrik Lundh [EMAIL PROTECTED] added the comment: Yes, this refers to the POSIX character classes as described here: http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html (Ideally, there should be an (internal) API that lets you register class definitions from the Python level.) Support for Unicode properties could perhaps be addressed at the same time: http://unicode.org/unicode/reports/tr18/#Basic_Unicode_Support Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue433029
[issue3547] Ctypes is confused by bitfields of varying integer types
Fredrik Lundh [EMAIL PROTECTED] added the comment: Looks fine to me, except for the comment in the test suite. Should
    +# MS compilers do NOT combine c_short and c_int into
    +# one field, gcc doesn't.
perhaps be
    +# MS compilers do NOT combine c_short and c_int into
    +# one field, gcc do.
?
Is using explicit tests for MSVC vs. GCC a good idea, btw? What about other compilers? Can the test be changed to accept either value?
-- nosy: +effbot
Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3547
[issue3547] Ctypes is confused by bitfields of varying integer types
Fredrik Lundh [EMAIL PROTECTED] added the comment: "Do" should be "does", right. Not enough coffee today :) Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3547
[issue3865] explain that profilers should be used for profiling, not benchmarking
Fredrik Lundh [EMAIL PROTECTED] added the comment: (the reason this is extra bad for C modules is that the profilers introduce overhead for Python code, but not for C-level functions. For example, using the standard profiler to benchmark parser performance for xml.etree.ElementTree vs. xml.etree.cElementTree will make ET appear to be about 10 times slower than it actually is.) Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3865
[issue3865] explain that profilers should be used for profiling, not benchmarking
New submission from Fredrik Lundh [EMAIL PROTECTED]: You often see people using the profiler for benchmarking instead of profiling. I suggest adding a note that explains that the profiler modules are designed to provide an execution profile for a given program, not for benchmarking different libraries or, even worse, benchmarking Python code against C libraries. Point people to the timeit module if they want reasonably accurate results.
(and yes, it would be nice if the copyright text on the page http://docs.python.org/dev/library/profile.html was moved to the bottom of the page. If necessary, add something like "This description of the profile module is Copyright © 1994, by InfoSeek Corporation, all rights reserved. Full copyright message below" at the top.)
-- assignee: georg.brandl components: Documentation messages: 73213 nosy: effbot, georg.brandl severity: normal status: open title: explain that profilers should be used for profiling, not benchmarking type: feature request versions: Python 2.6
Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3865
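For comparison, a benchmark along the lines the submission recommends might look roughly like this with the timeit module. The synthetic document and the repeat counts are arbitrary choices for the sketch, not values taken from the report.

```python
import timeit
import xml.etree.ElementTree as ET

# A synthetic document; size and shape are arbitrary for this sketch.
XML = "<root>" + "<item>x</item>" * 1000 + "</root>"

def parse():
    return ET.fromstring(XML)

# Take the best of several runs to reduce noise from other processes.
best = min(timeit.repeat(parse, number=20, repeat=3))
```

Unlike a profiler run, this measures wall-clock time without per-call instrumentation, so Python-level and C-level code are timed on equal terms.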
[issue3825] Major reworking of Python 2.5.2 re module
Fredrik Lundh [EMAIL PROTECTED] added the comment: A bit more information on the changes to the core engine that are responsible for the 2x speedup (on what?) would be nice to have, I think (especially since you seem to have removed the KMP prefix scanner). (Isn't there a RE benchmark suite somewhere under tests?) -- nosy: +effbot -- Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3825
[issue3811] Update Unicode database to 5.1.0
Fredrik Lundh [EMAIL PROTECTED] added the comment: The patch looks fine to me (assuming that I didn't miss something critical hidden among the large table diffs). (I'd probably have named the NODELTA flag after what it is rather than what it isn't, but I cannot think of a short replacement right now, so let's leave it as it is.) Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3811
[issue648658] xmlrpc can't do proxied HTTP
Fredrik Lundh [EMAIL PROTECTED] added the comment: It's a missing feature, not a bug in the existing code. But if you're desperate, why not just use the transport implementation that's attached to this issue? Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue648658
[issue3475] _elementtree.c import can fail silently
Fredrik Lundh [EMAIL PROTECTED] added the comment: This is fixed in the ET 1.3-compatible codebase. Since it's too late to add ET 1.3 to 2.6, I guess it's time to make a new 1.2 bugfix release for 2.6. Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3475
[issue3353] make built-in tokenizer available via Python C API
Fredrik Lundh [EMAIL PROTECTED] added the comment: That should be all that's needed to expose the existing API, as is. If you want to verify the build, you can grab the pytoken.c and setup.py files from this directory, and try building the module. http://svn.effbot.org/public/stuff/sandbox/pytoken/ Make sure you remove the local copy of tokenizer.h that's present in that directory before you build. If that module builds, all's well. Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3353
[issue3409] ElementPath.Path.findall problem with unicode input
Fredrik Lundh [EMAIL PROTECTED] added the comment: Hmm. That's embarrassing. What was I thinking? Guess it's time to update the 2.X codebase to ET 1.2.8. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3409 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3353] make built-in tokenizer available via Python C API
Fredrik Lundh [EMAIL PROTECTED] added the comment: There are a few things in the struct that need to be public, but that's nothing that cannot be handled by documentation. No need to complicate the API just in case. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3353 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3353] make built-in tokenizer available via Python C API
New submission from Fredrik Lundh [EMAIL PROTECTED]: CPython provides a Python-level API to the parser, but not to the tokenizer itself. Somewhat annoyingly, it does provide a nice C API, but that's not properly exposed for external modules. To fix this, the tokenizer.h file should be moved from the Parser directory to the Include directory, and the (semi-public) functions that are already available must be flagged with PyAPI_FUNC, as shown below. The PyAPI_FUNC fix should be non-intrusive enough to go into 2.6 and 3.0; moving stuff around is perhaps better left for a later release (which could also include a Python binding).

Index: tokenizer.h
===================================================================
--- tokenizer.h (revision 514)
+++ tokenizer.h (working copy)
@@ -54,10 +54,10 @@
 const char* str;
 };
-extern struct tok_state *PyTokenizer_FromString(const char *);
-extern struct tok_state *PyTokenizer_FromFile(FILE *, char *, char *);
-extern void PyTokenizer_Free(struct tok_state *);
-extern int PyTokenizer_Get(struct tok_state *, char **, char **);
+PyAPI_FUNC(struct tok_state *) PyTokenizer_FromString(const char *);
+PyAPI_FUNC(struct tok_state *) PyTokenizer_FromFile(FILE *, char *, char *);
+PyAPI_FUNC(void) PyTokenizer_Free(struct tok_state *);
+PyAPI_FUNC(int) PyTokenizer_Get(struct tok_state *, char **, char **);
 #ifdef __cplusplus
 }

----------
components: Interpreter Core
messages: 69650
nosy: effbot
severity: normal
status: open
title: make built-in tokenizer available via Python C API
type: feature request
versions: Python 2.6, Python 2.7, Python 3.0

___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3353 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3299] invalid object destruction in re.finditer()
Fredrik Lundh [EMAIL PROTECTED] added the comment: This report makes no sense to me; at least in Python 2.X, PyObject_Del removes a chunk of memory from the object heap. It's designed to be used from dealloc implementations, to release the actual memory (either directly, or as the default implementation for the tp_free slot). It can also be used in constructors, to destroy an object that was just created if something goes wrong. If you change PyObject_Del to Py_DECREF willy-nilly, things will indeed crash. (With the original 2.5 code, I cannot see how a non-string argument to finditer() can result in a call to scanner_dealloc(); the argument will be rejected by getstring(), which causes state_init() to return, which causes pattern_scanner() to free the object it just created, and return.) ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3299 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue2842] Dictionary methods: inconsistency
Changes by Fredrik Lundh [EMAIL PROTECTED]: -- nosy: -effbot __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue2842 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1327] Python 2.4+ spends too much time in PyEval_EvalFrame w/ xmlrpmclib
Fredrik Lundh added the comment: Can you switch on verbose mode in xmlrpclib, so you can see *where* the transfer hangs? Arguing that a hanging Python program must be caused by a bug in the code that *executes* the Python program isn't that meaningful, really. After all, that code is used to run *all* Python programs, so I think we'd have noticed if it had a tendency to hang unexpectedly... __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1327 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1761] Bug in re.sub()
Fredrik Lundh added the comment: For the record, $ is defined to match "before a newline at the end of the string, or at the end of the string" in normal mode, and "before any newline, or at the end of the string" in multiline mode. (And I have a vague memory that the "before a newline" behaviour in normal mode was added for Perl compatibility.)

> It seems that it matches BOTH the end of the string AND just before the newline at the end of the string.

Of course it does: re.sub scans the string for matches from left to right, and does the substitution everywhere the pattern matches, only skipping over the matched parts. Or in other words, whether a pattern matches N characters at position X has no influence on whether it matches at position X+N or not. __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1761 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
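The behaviour described above can be checked with a short sketch. It assumes a current Python 3 interpreter rather than the 2.x versions under discussion, and the helper name dollar_positions is made up for illustration.

```python
import re

def dollar_positions(text, flags=0):
    """Return every index at which the zero-width pattern '$' matches."""
    return [m.start() for m in re.finditer(r"$", text, flags)]

text = "foo\n"

# Normal mode: '$' matches just before the trailing newline AND at the very end.
print(dollar_positions(text))                    # [3, 4]

# re.sub replaces at every match position, so the marker shows up twice.
print(repr(re.sub(r"$", "#", text)))             # 'foo#\n#'

# Multiline mode: '$' also matches before every other newline.
print(dollar_positions("a\nb\n", re.MULTILINE))  # [1, 3, 4]
```

Because the match is empty, nothing is "skipped over" between position 3 and position 4, which is why both substitutions happen.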
[issue1777] ElementTree/cElementTree findtext inconsistency
Fredrik Lundh added the comment: Looks like the mechanisms used to decide when to invoke the full ElementPath machinery differ somewhat. I've added this to the TODO list for ET 1.3; in the meantime, my advice is "don't do that". (Adding a check for '.' to the PATHCHAR macro should fix this, I think.) -- priority: - normal __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1777 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1327] Python 2.4+ spends too much time in PyEval_EvalFrame w/ xmlrpmclib
Fredrik Lundh added the comment: That changes to ceval should have introduced some kind of XML-RPC package limit seems a bit unlikely. If you can still reproduce this, can you try instrumenting the xmlrpclib.py library to see where it gets stuck? (passing in verbose=True to the Server[Proxy] constructor might be good enough) -- nosy: +effbot __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1327 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
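For reference, a minimal sketch of switching on verbose mode. It assumes Python 3, where the xmlrpclib module discussed here lives on as xmlrpc.client; the helper name and the endpoint URL are hypothetical.

```python
import xmlrpc.client

def make_verbose_proxy(url):
    """Build a proxy that echoes the HTTP/XML traffic of each call to stdout.

    Constructing the proxy does not open a connection; the traffic is only
    printed once a remote method is actually called.
    """
    return xmlrpc.client.ServerProxy(url, verbose=True)

# Hypothetical endpoint, not taken from the issue:
# proxy = make_verbose_proxy("http://localhost:8000/")
# proxy.system.listMethods()   # request and response are dumped as they happen
```

With the dump in front of you, the last line printed before the hang shows whether the client is stuck sending the request or waiting for the response.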
[issue1698167] xml.etree document element.tag
Fredrik Lundh added the comment: This is fixed in the development version, so I'm closing this for now. The updated docs can be found here: http://docs.python.org/dev/library/xml.etree.elementtree.html -- resolution: - fixed status: open - closed _ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1698167 _ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1700] Regular Expression inline flags not handled correctly for some unicode characters
Fredrik Lundh added the comment: Looks like the wrong execution flags are being passed to the function that creates the actual pattern object; the SRE compiler does the right thing, but the engine isn't running with the right flags in the last case. Changing the call to _sre.compile in sre_compile.py to:

    return _sre.compile(
        pattern, flags | p.pattern.flags, code,
        p.pattern.groups-1,
        groupindex, indexgroup
        )

should do the trick, I think. (Got no time to fix my broken Python SVN setup right now, but if someone wants to verify this and add the necessary tests to the test suite, be my guest.) __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1700 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
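A quick way to see whether inline flags reach the compiled pattern object is sketched below. It assumes a current Python 3 interpreter and an arbitrary non-ASCII test string; it does not reproduce the exact failing case from the report.

```python
import re

# Inline (?i) flag, with a pattern containing non-ASCII letters (e-acute).
pattern = re.compile("(?i)\u00e9t\u00e9")

# The inline flag should be visible on the compiled pattern object...
print(bool(pattern.flags & re.IGNORECASE))    # True

# ...and the engine should honour it for non-ASCII characters as well.
print(bool(pattern.match("\u00c9T\u00c9")))   # True
```

If the flags computed by the parser were not merged into the flags handed to the pattern object, the first check would be the one to fail.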
[issue1160] Medium size regexp crashes python
Fredrik Lundh added the comment: Well, I'm not sure 81k qualifies as medium sized, really. If you look at the size distribution for typical RE:s (which are usually handwritten, not machine generated), that's one or two orders of magnitude larger than medium. (And even if this was guaranteed to work on all Python builds, my guess is that performance would be pretty bad compared to using a minimal RE and checking potential matches against a set. The | operator is mostly O(N), not O(1).) As for fixing this, the byte code used by the RE engine uses a word size equal to the Unicode character size (sizeof(Py_UNICODE)) for the given platform. I don't think it would be that hard to set it to 32 bits also on platforms using 16-bit Unicode characters (if anyone would like to experiment, just set SRE_CODE to unsigned long in sre.h and see what happens when you run the test suite). __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1160 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
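The "minimal RE plus set" approach mentioned above might look roughly like this. The keyword list and the \w+ candidate pattern are assumptions made for illustration; the original report's data is not shown here.

```python
import re

# Hypothetical keyword list; in the scenario above it would hold thousands of entries.
KEYWORDS = {"alpha", "beta", "gamma"}

# A small pattern that finds candidate tokens, instead of one huge a|b|c|... alternation.
WORD = re.compile(r"\w+")

def find_keywords(text):
    """Return the candidate tokens that are in KEYWORDS, in order of appearance."""
    return [m.group() for m in WORD.finditer(text) if m.group() in KEYWORDS]

print(find_keywords("alpha beta delta gamma!"))  # ['alpha', 'beta', 'gamma']
```

Set membership is a hash lookup, so the cost per candidate stays roughly constant as the keyword list grows, whereas an alternation generally has to try the branches one by one.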
[issue1143] Updated to latest ElementTree in 2.6
New submission from Fredrik Lundh: The xml.etree package should be updated to ElementTree 1.3/cElementTree 1.0.6 (or later). -- assignee: effbot components: XML messages: 55811 nosy: effbot priority: normal severity: minor status: open title: Updated to latest ElementTree in 2.6 type: rfe versions: Python 2.6 __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1143 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1602189] Suggest a textlist() method for ElementTree
Fredrik Lundh added the comment: ElementTree 1.3 provides a variant of this (tentatively called itertext). -- resolution: - accepted superseder: - Updated to latest ElementTree in 2.6 _ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1602189 _ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
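As a sketch of what itertext offers, using the xml.etree.ElementTree module shipped with current Python versions (the sample document is made up):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>Hello <b>big</b> world</p>")

# itertext() yields the text and tail fragments in document order.
print(list(root.itertext()))     # ['Hello ', 'big', ' world']

# Joining the fragments gives the flattened text content of the element.
print("".join(root.itertext()))  # Hello big world
```

Wrapping the call in list() gives the equivalent of the proposed textlist() method.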
[issue1143] Update to latest ElementTree in Python 2.6
Changes by Fredrik Lundh: -- title: Updated to latest ElementTree in 2.6 - Update to latest ElementTree in Python 2.6 __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1143 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1602189] Suggest a textlist() method for ElementTree
Changes by Fredrik Lundh: -- status: open - closed _ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1602189 _ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1745722] please add wsgi to SimpleXMLRPCServer
Fredrik Lundh added the comment: A proper patch, including tests (if possible) and documentation, would be nice. (also note that SimpleXMLRPCServer was written by Brian Quinlan.) -- assignee: effbot - _ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1745722 _ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1690840] xmlrpclib methods submit call on __str__, __repr__
Fredrik Lundh added the comment: I'm trying to think of a reason for actually providing __repr__ over RPC, but I cannot find any. Not quite as sure about __str__, though; I suggest adding a __repr__ method, but leaving the rest as is. -- assignee: effbot - collinwinter _ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1690840 _ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue814253] Grouprefs in lookbehind assertions
Changes by Fredrik Lundh: -- type: - behavior versions: +Python 2.4, Python 2.5 Tracker [EMAIL PROTECTED] http://bugs.python.org/issue814253 ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1140] re.sub returns str when processing empty unicode string
Fredrik Lundh added the comment: (is there a way to just add a comment in the new tracker, btw, or is everything a change note, even if nothing has changed?) __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1140 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1140] re.sub returns str when processing empty unicode string
Fredrik Lundh added the comment: Well, I spent a minute hunting around for a "comment" field or an "add comment" button. Guess this is a "you only need to learn this once" thing... __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1140 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1140] re.sub returns str when processing empty unicode string
Changes by Fredrik Lundh: __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1140 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1123] split(None, maxsplit) does not strip whitespace correctly
Fredrik Lundh added the comment: Looks like a *documentation* bug to me; at the implementation level, None just means "no empty parts, treat runs of whitespace as separators". -- nosy: +effbot __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1123 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1123] split(None, maxsplit) does not strip whitespace correctly
Fredrik Lundh added the comment: But wasn't your complaint that the implementation didn't match the documentation? As I said, the *implementation* treats runs of whitespace as separators, except for whitespace at the beginning or end (or in other words, it never returns empty strings). That matches the documentation, except for the "first" in "first, whitespace characters are stripped from both ends". As far as I can tell, the documentation has never matched the implementation here. __ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1123 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
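The difference between the implementation and the "strip first" wording can be seen in a short sketch (run against a current Python 3 interpreter; the sample string is arbitrary):

```python
s = "  a  b  c  "

# No maxsplit: runs of whitespace are separators and no empty strings come back.
print(s.split())          # ['a', 'b', 'c']

# With maxsplit, leading whitespace is skipped, but the remainder is returned
# as-is, so it keeps its trailing whitespace. The string is not stripped first.
print(s.split(None, 1))   # ['a', 'b  c  ']

# An explicit separator behaves differently: empty parts are kept.
print("  a  b  ".split(" "))  # ['', '', 'a', '', 'b', '', '']
```

If the input really were stripped from both ends first, the second call would return ['a', 'b  c'] instead.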