[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2011-10-28 Thread Florent Xicluna
Florent Xicluna added the comment: 3.1 is no longer in scope for this issue. -- resolution: -> out of date, status: open -> closed

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-08-08 Thread Guido van Rossum
Changes by Guido van Rossum : -- nosy: -gvanrossum

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-08-08 Thread Florent Xicluna
Florent Xicluna added the comment: Done for 3.2 with r83851. Still open, if someone wants to propose a patch for 3.1. -- assignee: effbot -> , keywords: +easy -patch, stage: commit review -> needs patch, versions: -Python 3.2

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-08-08 Thread Stefan Behnel
Stefan Behnel added the comment: I would suggest fixing the tostring() behaviour also in a future 3.1.x bug fix release. After all, the current behaviour means that 3.0 and 3.1 would behave differently from any other (released or future) Python version here.

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-07-31 Thread Florent Xicluna
Florent Xicluna added the comment: Patch updated here, and on Rietveld too. http://codereview.appspot.com/664043
Rules (as discussed):
- tree.tostring(encoding=None) => encodes to "US-ASCII" (compatible with 2.7 and lxml.etree)
- tree.tostring(encoding="unicode") => outputs Unicode
- tre
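
The first two rules above can be checked against a current Python 3 interpreter. The following is a minimal sketch, not the patch itself; it assumes the module-level function xml.etree.ElementTree.tostring(), and the element name and text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample element containing one non-ASCII character.
elem = ET.Element("greeting")
elem.text = "café"

# No encoding given: the result is bytes, encoded as US-ASCII, with
# non-ASCII characters written as numeric character references.
as_bytes = ET.tostring(elem)

# The pseudo-encoding "unicode" asks for a str instead of bytes.
as_text = ET.tostring(elem, encoding="unicode")

# A real encoding name yields bytes in that encoding.
as_utf8 = ET.tostring(elem, encoding="utf-8")

print(type(as_bytes).__name__, as_bytes)
print(type(as_text).__name__, as_text)
print(type(as_utf8).__name__)
```

Keeping bytes as the default matches the 2.x behaviour, and the explicit "unicode" name makes the text result an opt-in.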

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-07-31 Thread Florent Xicluna
Changes by Florent Xicluna : Removed file: http://bugs.python.org/file16543/issue8047_etree_encoding.diff

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-22 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Stefan Behnel wrote:
>
> Stefan Behnel added the comment:
>
>> Supporting unicode for lxml.etree compatibility is fine with me, but I
>> think it might make sense to support the string "unicode" as well (as
>> a pseudo-encoding -- it's pretty clear to me

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-22 Thread Stefan Behnel
Stefan Behnel added the comment:
> Supporting unicode for lxml.etree compatibility is fine with me, but I
> think it might make sense to support the string "unicode" as well (as
> a pseudo-encoding -- it's pretty clear to me that nobody will ever
> define a real character encoding with that name

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-22 Thread Florent Xicluna
Florent Xicluna added the comment: http://codereview.appspot.com/664043 (patch against 3.x) IIUC, the changes proposed (for 3.2) are: - default encoding or bool(encoding) == False ==> fallback to 'US-ASCII' encoding (instead of Unicode) - encoding=str or encoding='unicode' ==> serialize

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-21 Thread Fredrik Lundh
Fredrik Lundh added the comment: Hmm. I'm not entirely sure about giving False a meaning when None has traditionally had a different (and documented) meaning. And sleeping on it hasn't convinced me in either direction :-( (well, I'd say no, but the compatibility argument is somewhat temptin

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-14 Thread Stefan Behnel
Stefan Behnel added the comment: That's a funny idea. I like that. +1

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-14 Thread Florent Xicluna
Florent Xicluna added the comment: Currently "tree.write(file)" returns Unicode in 3.1 (and 3.x). I would propose the following change:
>>> tree.write(file) # ==> encode to ASCII without xml declaration (compatible 2.x)
>>> tree.write(file, encoding="utf-8") # ==> encode to UTF-8 without xm
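
As a rough illustration of the proposal above, the sketch below runs ElementTree.write() on a current Python 3 against in-memory targets. The element name, its text and the variable names are assumptions for the example; io.BytesIO and io.StringIO stand in for real files.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document with one non-ASCII character.
root = ET.Element("doc")
root.text = "naïve"
tree = ET.ElementTree(root)

# Default call: bytes are written, encoded as US-ASCII, no XML declaration.
binary_target = io.BytesIO()
tree.write(binary_target)

# Text output has to be requested explicitly with encoding="unicode".
text_target = io.StringIO()
tree.write(text_target, encoding="unicode")

print(binary_target.getvalue())
print(text_target.getvalue())
```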

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Guido van Rossum
Guido van Rossum added the comment: I propose that we continue to see Fredrik as elementtree's "BDFL". If Fredrik wants the API in 3.2 to be changed to be backwards compatible with 2.x, we should do that, and damn the torpedoes (um, 3.1 compatibility). I would do this ASAP; if you can, fix it

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Fredrik Lundh added the comment: Oops :) Yeah, that was a pretty lousy way to show what encoding I was using for that test:
>>> import locale
>>> locale.getpreferredencoding()
'cp1252'
>>>
(Somewhat related, it would be nice if Python actually normalized defaultencoding/preferredencoding to s

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Florent Xicluna
Florent Xicluna added the comment:
>>> tree = parse("out.xml")
Actually the test in my previous message does not prove anything. locale.getpreferredencoding() returns "UTF-8" != "utf-8". :)
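
The pitfall noted above, comparing encoding names as raw strings, can be avoided by normalising both names through the codec registry. This is only a sketch: same_encoding() is a hypothetical helper written for this example, not something proposed in the thread, and it relies on codecs.lookup() from the standard library.

```python
import codecs

def same_encoding(a: str, b: str) -> bool:
    """Hypothetical helper: True if both names resolve to the same codec."""
    return codecs.lookup(a).name == codecs.lookup(b).name

# A plain string comparison is case-sensitive and misses aliases.
print("UTF-8" == "utf-8")               # False
# Comparing the canonical codec names gives the intended answer.
print(same_encoding("UTF-8", "utf-8"))  # True
print(same_encoding("utf8", "UTF-8"))   # True
print(same_encoding("cp1252", "utf-8")) # False
```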

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Fredrik Lundh added the comment: Interesting. But isn't the problem with 3.1 that it relies on the standard encoding, which results in code that may or may not work depending on a global platform setting? Who's doing the encoding in the new version? And what ends up in the file? -

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Florent Xicluna
Florent Xicluna added the comment: I plan to merge ET 1.3 in the 3.x branch tomorrow (See #6472) Currently, the patch is consistent with 3.1 behaviour. It could be changed later, depending on the pronouncement on this compatibility issue. > Previously, in ElementTree, serialising without an e

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Fredrik Lundh added the comment: "I wouldn't raise much opposition against tobytes() as an alias for tostring(), although that sounds more like duplicating an otherwise simple API." Adding an alias would be a way to address the 2.X/3.X terminology overlap; string traditionally implies 8-bit in 2

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Changes by Fredrik Lundh : --

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Fredrik Lundh added the comment: "Yes, the feature has been implemented deep down in the _encode() helper function, so it impacts the entire serialiser, not only its API" Ouch.
>>> import locale
>>> locale.getpreferredencoding() == "utf-8"
False
>>> from xml.etree.ElementTree import *
>>> e =

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Stefan Behnel
Stefan Behnel added the comment: "'None' has always been the documented default for the encoding parameter" What I meant here was that "help(ET.tostring)" will show you that as the default. Also, in the docs, the signature is "tostring(tree, encoding=None)", so None is the documented default

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Fredrik Lundh added the comment: (what's the Python 3 replacement for the array module, btw?)

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-12 Thread Fredrik Lundh
Fredrik Lundh added the comment: "'None' has always been the documented default for the encoding parameter" That's probably mostly by accident at least in original ET, but the 1.3 draft docs at effbot.org/elementtree does spell it out explicitly for the 'write' method: Output encoding. If

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Stefan Behnel
Stefan Behnel added the comment: One more thing: given that many web-frameworks are still not available for Py3 at this time, and that there are still tons of third-party libraries missing on that platform, I would be surprised if there was any ElementTree based XML/HTML processing code writt

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Stefan Behnel
Stefan Behnel added the comment: Hi Guido, your comment was long overdue in this discussion.
Guido van Rossum, 12.03.2010 01:35:
> My thinking was that since an XML document looks like text, it should
> probably be considered text, at least by default. (There may have
> been some unittests tha

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Antoine Pitrou
Changes by Antoine Pitrou : -- nosy: -pitrou

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: Not wanting to waste my time anymore on this.

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Guido van Rossum
Guido van Rossum added the comment: Hey, can we all try to get along? For anyone who didn't follow the link to r56841, that was mine (though Christian Heimes provided the basis for much of the patch apart from elementtree), and I wrote at the time: """I had to fix a few tests and modules bey

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread R. David Murray
R. David Murray added the comment: Well, Benjamin pointed out to me that it would be a bad thing if array.tostring produced a string. True, the method is named wrong, but it is less broken than returning a string. I suspect that that is the same argument Fredrik is making: that returning th

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Fredrik Lundh
Fredrik Lundh added the comment: So now it's the domain experts against some hypothetical people that might exist? Tricky.

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: On Thu, 11 Mar 2010 22:03:37 +, Fredrik Lundh wrote:
>
> >>> import array
> >>> array.array("i", [1, 2, 3]).tostring()
> b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
The fact that array is old, rusty and slightly broken doesn't mean we should pr

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Fredrik Lundh
Fredrik Lundh added the comment:
>>> import array
>>> array.array("i", [1, 2, 3]).tostring()
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
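
For readers trying the session above on a recent interpreter: array.tostring() became a deprecated alias of array.tobytes() in Python 3.2 and was removed in Python 3.9. The sketch below uses tobytes() and frombytes() instead. The variable names are illustrative, and the exact bytes depend on the platform's int size and byte order, so no output is shown.

```python
import array

values = array.array("i", [1, 2, 3])

# tobytes() returns the machine representation as a bytes object,
# which is what tostring() returned in the session quoted above.
raw = values.tobytes()

# frombytes() is the inverse: it appends items decoded from raw bytes.
roundtrip = array.array("i")
roundtrip.frombytes(raw)

print(type(raw).__name__, len(raw), values.itemsize)
print(roundtrip == values)
```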

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread R. David Murray
R. David Murray added the comment: You may well be correct. But just because no one reported a bug does not mean that no one is using the API. The person using it may find it perfectly logical (and may be writing py3 only code, not porting py2 code). However, regardless of whether we decide

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Stefan Behnel
Stefan Behnel added the comment: Then I would call that a clear sign that no-one actually stumbled over this feature in Py3 before I did, well hidden as it was. Still time to fix it.

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Antoine Pitrou
Antoine Pitrou added the comment: > > Ha. There has been a very long temporal window > > You should have had plenty of time to fix it, then, right? Under the condition that someone would have actually reported it, yes. We don't magically fix bugs if nobody (including us) detects and reports th

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Fredrik Lundh
Fredrik Lundh added the comment:
> if I don't specify an encoding, I get unicode. If I do specify an encoding,
> I get encoded bytes.
You're confusing the XML document encoding with character set encoding. A serialized (unparsed) XML document is a byte stream, not a string of Unicode charac
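
The point that a serialised document is a byte stream carrying its own encoding information can be illustrated with the parser. This is a sketch with made-up sample documents: the same text is serialised under two different declared encodings, giving two different byte streams that parse back to the same characters.

```python
import xml.etree.ElementTree as ET

# Two hypothetical serialisations of the same document. The bytes differ
# because each declaration names a different document encoding.
latin1_doc = (
    "<?xml version='1.0' encoding='iso-8859-1'?><greeting>café</greeting>"
).encode("iso-8859-1")
utf8_doc = (
    "<?xml version='1.0' encoding='utf-8'?><greeting>café</greeting>"
).encode("utf-8")

# The parser reads the declaration from the byte stream and decodes the
# content itself; the caller never passes an encoding.
text_from_latin1 = ET.fromstring(latin1_doc).text
text_from_utf8 = ET.fromstring(utf8_doc).text

print(latin1_doc != utf8_doc)
print(text_from_latin1 == text_from_utf8)
```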

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread R. David Murray
R. David Murray added the comment: I suspect that what Antoine is referring to is the fact that Python 3.1 has this behavior. Whether or not it is explicitly documented is a secondary issue. We're having a similar issue in the unittest package, where there's a new function, assertSameElement

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Stefan Behnel
Stefan Behnel added the comment: Sorry, Antoine, but you can't possibly mean what you say here. The culprit in question is clearly one of the best hidden features of the new Py3 ET API. The only existing reference to it that I can find is the SVN commit comment when it was applied. How is tha

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> The "no header" thing is very much done on purpose, and it's
> documented in the upstream ElementTree documentation.
I'm sorry, where is that? I can't find it either at http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree.tostr

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-11 Thread Fredrik Lundh
Fredrik Lundh added the comment: The "no header" thing is very much done on purpose, and it's documented in the upstream ElementTree documentation. I suggest dropping this "Python 3 exists in its own universe" nonsense; it's not very professional, and it's hurting Python, its users, and all t

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-08 Thread Antoine Pitrou
Changes by Antoine Pitrou : -- assignee: -> georg.brandl, components: +Documentation, nosy: +georg.brandl

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-08 Thread Antoine Pitrou
Antoine Pitrou added the comment: On Mon, 08 Mar 2010 09:01:19 +, Stefan Behnel wrote:
>
> Antoine, in the same comment, you say that it was not backported to
> Py2 in order to prevent breaking existing code, and then you ask if
> it's difficult to support in lxml. ;-)
I meant breaking

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-08 Thread Florent Xicluna
Florent Xicluna added the comment: With ET 1.3, you should have an explicit keyword argument "xml_declaration":
# if xml_declaration or (xml_declaration is None and encoding not in ("utf-8", "us-ascii")):
if method == "xml":
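
To show how the xml_declaration keyword behaves in practice, the sketch below calls ElementTree.write() on a current Python 3. The serialise() helper and the element name are assumptions made for the example, not part of ElementTree or of the patch under discussion.

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.Element("root"))

def serialise(tree, **kwargs):
    """Hypothetical helper: run tree.write() into memory, return the bytes."""
    buf = io.BytesIO()
    tree.write(buf, **kwargs)
    return buf.getvalue()

# Default: US-ASCII output, and no declaration is written.
default = serialise(tree)

# xml_declaration=True forces the declaration, even for UTF-8.
forced = serialise(tree, encoding="utf-8", xml_declaration=True)

# xml_declaration=False suppresses it, even for another encoding.
suppressed = serialise(tree, encoding="iso-8859-1", xml_declaration=False)

print(default)
print(forced)
print(suppressed)
```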

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-08 Thread Stefan Behnel
Stefan Behnel added the comment: Antoine, in the same comment, you say that it was not backported to Py2 in order to prevent breaking existing code, and then you ask if it's difficult to support in lxml. ;-) Supporting the same behaviour in lxml would either mean that it breaks existing code

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-07 Thread Antoine Pitrou
Antoine Pitrou added the comment: As Florent said, it is a rule of py3k to avoid implicit encoding/decoding. The fact that it could have made sense for 2.x as well is not relevant, since the change was only done in py3k (and for good reason: we normally try not to break compatibility without

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-07 Thread Stefan Behnel
Stefan Behnel added the comment: It has been brought up several times that ET is special in the stdlib in that it is an externally maintained package. Correct me if I'm wrong, but the rules seem to be: features come outside, adaptation to Py3 can happen inside. What we are talking about here

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-05 Thread Antoine Pitrou
Antoine Pitrou added the comment: I don't know what compatibility you are talking about. Py3k deliberately breaks compatibility with many 2.x behaviours that were considered defective or suboptimal. -- nosy: +pitrou

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-03 Thread Florent Xicluna
Florent Xicluna added the comment: With ET 1.3, the serializer ElementTree.write() should output bytes only. And the default encoding is still US-ASCII. The new behaviour is specific to the 3.x branch (since 3.0, r56841). Even if it is not fully backward compatible, I don't find this behavior

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-03 Thread R. David Murray
R. David Murray added the comment: My understanding is that backward compatibility, while nice to retain, was not considered a stopper for cleaning up interfaces in py3. Exactly how considered this change was, I have no idea, but as I said it does make sense to me. As for 2.x, what's there

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-03 Thread Stefan Behnel
Stefan Behnel added the comment: I agree that the lxml API is somewhat clumsy here. I just mentioned it to show that there are already ways to do it in a backwards compatible way, so this change does two things: it breaks existing code, and it does so in a way that is incompatible with other

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-03 Thread R. David Murray
R. David Murray added the comment: I'm not an ElementTree user, but that spelling (etree.tostring(encode=str), or even etree.tostring(encode=unicode)) strikes me as horrible. You don't encode to unicode, you *decode* to unicode. Thus the current Python3 interface works the way I'd expect: i

[issue8047] Serialiser in ElementTree returns unicode strings in Py3k

2010-03-02 Thread Stefan Behnel
New submission from Stefan Behnel : The xml.etree.ElementTree package in the Python 3.x standard library breaks compatibility with existing ET 1.2 code. The serialiser returns a unicode string when no encoding is passed. Previously, the serialiser was guaranteed to return a byte string. By def