[issue10783] struct.pack() and Unicode strings

2010-12-28 Thread David Beazley
David Beazley added the comment: Thanks everyone for looking at this!

2010-12-28 Thread Georg Brandl
Georg Brandl added the comment: Thanks, Victor!

2010-12-28 Thread STINNER Victor
STINNER Victor added the comment: Fixed by r87537. Thanks, Amaury, for your review! I also removed some ugly (implicit) conversions from test_struct. -- resolution: -> fixed status: open -> closed
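A minimal sketch of the behaviour after the fix, assuming Python 3.2 or later (a release that includes r87537): a str passed to the 's' or 'c' format is rejected with struct.error instead of being silently encoded as UTF-8. The helper rejects_str is a hypothetical name used only for illustration.

```python
import struct

def rejects_str(fmt, value):
    """Hypothetical helper: True if struct.pack refuses this argument."""
    try:
        struct.pack(fmt, value)
    except struct.error:
        return True
    return False

# After the fix, neither 's' nor 'c' encodes a str implicitly.
print(rejects_str("10s", "Jalape\u00f1o"))  # True
print(rejects_str("c", "*"))                 # True
# bytes arguments are still accepted as before.
print(rejects_str("10s", b"abc"))            # False
```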

2010-12-28 Thread David Beazley
David Beazley added the comment: As a user of Python 3, I would like to echo Victor's comment about fixing the API right now rather than having to deal with it later. I can only speak for myself, but I would guess that anyone using Python 3 already understands that it's bleeding edge and that…

2010-12-28 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: Since the Release Manager agrees with the change, I withdraw my objection. I have three remarks on the patch:

- Some examples in the documentation should be fixed: http://docs.python.org/dev/py3k/library/struct.html#examples

    >>> pack('ci', '*', 0x121314…
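The documentation example passes the str '*' to the 'c' format. A minimal sketch of the bytes-only form, assuming Python 3.2 or later; the value 1 and the '<' prefix (little-endian, standard sizes, no alignment padding) are choices made here so the output is predictable, not the values from the truncated doc example.

```python
import struct

# 'c' packs exactly one byte, so it takes a bytes object of length 1.
# '<' gives standard sizes with no padding: 1 byte for 'c' + 4 bytes for 'i'.
packed = struct.pack("<ci", b"*", 1)
print(packed)                        # b'*\x01\x00\x00\x00'
print(struct.unpack("<ci", packed))  # (b'*', 1)
```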

2010-12-28 Thread Georg Brandl
Georg Brandl added the comment: I agree this automatic conversion is broken and should be fixed. I'm not sure whether emitting a DeprecationWarning now and fixing it 18 months later is the right thing, especially since DeprecationWarnings are now silent by default. As Victor says, the incompatibility is explicit…

2010-12-28 Thread STINNER Victor
STINNER Victor added the comment:

Amaury> At this point a feature change seems unlikely,
Amaury> but it's not too late to emit a DeprecationWarning.

I would rather break the API today than have to maintain a broken API for 10 or 20 years :-) And we have a very small user base using Python 3, it…

2010-12-28 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: But there is probably working code out there that passes Unicode strings. For example, I've seen code like struct.pack('<6sHHBBB', 'GIF87a', ...). Are you suggesting that this 3.1 code should stop working in 3.2? In any case, the 'c' format should probably be changed…
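Code like the GIF example keeps working on every Python 3 release if the signature is written as a bytes literal. A minimal sketch, assuming the three trailing 'B' fields are the GIF logical-screen packed flags, background colour index and pixel aspect ratio; pack_gif_header and its arguments are hypothetical, since the original call's remaining arguments are elided.

```python
import struct

# GIF header: 6-byte signature, width and height as little-endian unsigned
# shorts, then three single-byte fields (13 bytes in total).
HEADER_FORMAT = "<6sHHBBB"

def pack_gif_header(width, height, flags=0, background=0, aspect=0):
    # Hypothetical helper: b"GIF87a" is bytes, so no implicit encoding occurs.
    return struct.pack(HEADER_FORMAT, b"GIF87a", width, height,
                       flags, background, aspect)

header = pack_gif_header(320, 200)
print(len(header), header[:6])  # 13 b'GIF87a'
```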

2010-12-28 Thread Raymond Hettinger
Raymond Hettinger added the comment:

>> At this point a feature change seems unlikely,

Amaury, it is not too late to fix anything that's broken. New features are out, but we are free to fix anything hosed this badly.

2010-12-27 Thread Raymond Hettinger
Raymond Hettinger added the comment: Errors should not pass silently :-) Given the buggy behavior, we have several options, including simply removing the implicit conversion and going back to bytes only.

2010-12-27 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: At this point a feature change seems unlikely, but it's not too late to emit a DeprecationWarning. -- nosy: +amaury.forgeotdarc

2010-12-27 Thread R. David Murray
R. David Murray added the comment:

    >>> struct.pack('2s', 'ha')
    b'ha'
    >>> struct.pack('2s', 'hé')
    b'h\xc3'
    >>> struct.pack('3s', 'hé')
    b'h\xc3\xa9'

That looks like a *buggy* API to me, too. I don't see how we can let that stand.
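Why the '2s' result is broken: 'hé' is two characters but three bytes in UTF-8, so a 2-byte field cuts 'é' in half and leaves invalid UTF-8. A minimal sketch, assuming Python 3.2 or later; note that struct still truncates bytes to the field size, so even with explicit encoding the caller has to size the field by byte length.

```python
import struct

text = "h\u00e9"                # 'hé': two characters
encoded = text.encode("utf-8")  # three bytes: b'h\xc3\xa9'

# A '2s' field keeps only the first two bytes, splitting 'é'.
packed = struct.pack("2s", encoded)
try:
    packed.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False

# Sizing the field by the encoded length keeps every byte.
fits = struct.pack(f"{len(encoded)}s", encoded)
print(len(text), len(encoded), packed, valid, fits)
```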

2010-12-27 Thread STINNER Victor
Changes by STINNER Victor: -- resolution: invalid -> status: closed -> open

2010-12-27 Thread STINNER Victor
STINNER Victor added the comment: This "feature" was introduced in a big commit by Guido van Rossum (made before Python 3.0): r55500. The changelog is strange because it starts with "Make test_zipfile pass. The zipfile module now does all I/O in binary mode using bytes." but ends with "The…

2010-12-27 Thread David Beazley
David Beazley added the comment: Actually, here's another one of my favorite examples:

    >>> import struct
    >>> struct.pack("s","\xf1")
    b'\xc3'
    >>>

Not only does this not encode the correct value, it doesn't even encode the entire UTF-8 encoding (just the first byte of it). Like I said, pity…
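A bare 's' means a 1-byte field, so which single byte ends up in the output depends entirely on the encoding. A minimal sketch, assuming Python 3.2 or later, where the caller has to pick that encoding explicitly; UTF-8 and Latin-1 are chosen here only as two contrasting examples.

```python
import struct

ch = "\xf1"  # 'ñ', code point U+00F1

utf8 = ch.encode("utf-8")      # b'\xc3\xb1': two bytes
latin1 = ch.encode("latin-1")  # b'\xf1': one byte

# 's' is the same as '1s': only the first byte is kept.
print(struct.pack("s", utf8))    # b'\xc3'
print(struct.pack("s", latin1))  # b'\xf1'
```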

2010-12-27 Thread David Beazley
David Beazley added the comment: I encountered this issue in the context of distributed computing/interprocess communication involving binary-encoded records (and encoding/decoding such records using struct). At its core, this is all about I/O: something where encoding and decoding matter…
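For binary-encoded records like these, one way to keep the text handling explicit is to encode before packing and decode after unpacking. A minimal sketch, assuming a hypothetical record layout of a 10-byte name plus an unsigned 32-bit id; encode_record, decode_record and NAME_SIZE are illustrative names, not part of any real protocol.

```python
import struct

NAME_SIZE = 10
# Hypothetical layout: 10-byte name, unsigned 32-bit id, little-endian.
RECORD = struct.Struct(f"<{NAME_SIZE}sI")

def encode_record(name, ident, encoding="utf-8"):
    raw = name.encode(encoding)
    # Refuse to truncate: struct would silently cut the bytes to NAME_SIZE.
    if len(raw) > NAME_SIZE:
        raise ValueError(f"name needs {len(raw)} bytes, field holds {NAME_SIZE}")
    return RECORD.pack(raw, ident)

def decode_record(data, encoding="utf-8"):
    raw, ident = RECORD.unpack(data)
    # struct pads short fields with NUL bytes; this assumes names never
    # legitimately end in NUL.
    return raw.rstrip(b"\x00").decode(encoding), ident

record = encode_record("Jalape\u00f1o", 42)
print(len(record), decode_record(record))
```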

2010-12-27 Thread Raymond Hettinger
Raymond Hettinger added the comment: A likely answer to "why is this encoding at all?" is that it made it easier to transition code from Python 2.x, where strings were usually ASCII and encoding them as UTF-8 would make no difference to the output. The 2-to-3 fixer was good at handling name…

2010-12-27 Thread Raymond Hettinger
Raymond Hettinger added the comment: Many of these kinds of "decisions" were made quickly, haphazardly, and with almost no discussion, by contributors who were new to Python core development (not familiar with the API norms). Given the rat's nest of bytes/text problems in Py3.0 and…

2010-12-27 Thread David Beazley
David Beazley added the comment: Why is it even encoding at all? Almost every other part of Python 3 forces you to be explicit about bytes/string conversion. For example:

    struct.pack("10s", x.encode('utf-8'))

Given that automatic conversion is documented, it's not clear what can be done at…
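A minimal sketch of the explicit form, assuming Python 3.2 or later: encoding first reproduces exactly the bytes that 3.2b1 produced implicitly in the original report, but the choice of UTF-8 is now visible in the calling code.

```python
import struct

x = "Jalape\u00f1o"

# 9 UTF-8 bytes ('ñ' takes two) plus one NUL of padding for the 10-byte field.
packed = struct.pack("10s", x.encode("utf-8"))
print(packed)  # b'Jalape\xc3\xb1o\x00'
```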

2010-12-27 Thread Raymond Hettinger
Raymond Hettinger added the comment: Can we at least offer an optional choice of encoding? -- nosy: +rhettinger
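One way an optional encoding choice could look is a thin wrapper that encodes str arguments before calling struct.pack. This is a hypothetical sketch: pack_with_encoding is not part of the struct module, and it assumes only bytes-taking formats ('s', 'p', 'c') receive str arguments (other formats reject str anyway).

```python
import struct

def pack_with_encoding(fmt, *args, encoding="utf-8", errors="strict"):
    """Hypothetical wrapper (not in the struct module): encode each str
    argument with the caller's chosen encoding, then pack as usual."""
    converted = [
        arg.encode(encoding, errors) if isinstance(arg, str) else arg
        for arg in args
    ]
    return struct.pack(fmt, *converted)

name = "Jalape\u00f1o"
print(pack_with_encoding("10s", name))                      # b'Jalape\xc3\xb1o\x00'
print(pack_with_encoding("10s", name, encoding="latin-1"))  # b'Jalape\xf1o\x00\x00'
```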

2010-12-27 Thread R. David Murray
R. David Murray added the comment: But clearly intentional, and now enshrined in released code. -- nosy: +mark.dickinson, r.david.murray resolution: -> invalid stage: -> committed/rejected status: open -> closed

2010-12-27 Thread David Beazley
David Beazley added the comment: Hmmm. Well, the docs seem to say that it's allowed and that it will be encoded as UTF-8. Given the treatment of Unicode/bytes elsewhere in Python 3, all I can say is that this behavior is rather surprising.

2010-12-27 Thread David Beazley
David Beazley added the comment: Note: This is what happens in Python 2.6.4:

    >>> import struct
    >>> struct.pack("10s",u"Jalape\u00f1o")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    struct.error: argument for 's' must be a string
    >>>

2010-12-27 Thread David Beazley
New submission from David Beazley: Is the struct.pack() function supposed to automatically encode Unicode strings into binary? For example:

    >>> struct.pack("10s","Jalape\u00f1o")
    b'Jalape\xc3\xb1o\x00'
    >>>

This is Python 3.2b1.

-- components: Library (Lib) messages: 124727 nosy: dab