Re: unicode(s, enc).encode(enc) == s ?

2008-01-03 Thread mario
On Jan 2, 9:34 pm, Martin v. Löwis wrote: In any case, it goes well beyond the situation that triggered my original question in the first place, that basically was to provide a reasonable check on whether round-tripping a string is successful -- this is in the context of

Re: unicode(s, enc).encode(enc) == s ?

2008-01-03 Thread mario
Thanks again. I will chunk my responses as your message has too much in it for me to process all at once... On Jan 2, 9:34 pm, Martin v. Löwis wrote: Thanks a lot Martin and Marc for the really great explanations! I was wondering if it would be reasonable to imagine a

Re: unicode(s, enc).encode(enc) == s ?

2008-01-02 Thread mario
Thanks a lot Martin and Marc for the really great explanations! I was wondering if it would be reasonable to imagine a utility that will determine whether, for a given encoding, two byte strings would be equivalent. But I think such a utility will require *extensive* knowledge about many

Re: unicode(s, enc).encode(enc) == s ?

2008-01-02 Thread Martin v. Löwis
> Thanks a lot Martin and Marc for the really great explanations! I was wondering if it would be reasonable to imagine a utility that will determine whether, for a given encoding, two byte strings would be equivalent.
But that is much easier to answer: s1.decode(enc) == s2.decode(enc)
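The comparison s1.decode(enc) == s2.decode(enc) can be wrapped in a small helper. This is only a sketch: the function name `equivalent` is made up for illustration, and it is written in Python 3 syntax (bytes.decode) rather than the Python 2 used in the thread.

```python
def equivalent(s1: bytes, s2: bytes, enc: str) -> bool:
    """Return True if both byte strings decode to the same text in `enc`.

    Hypothetical helper, not part of any library. A UnicodeDecodeError
    propagates if either input is not valid in the given encoding.
    """
    return s1.decode(enc) == s2.decode(enc)


# Two different byte sequences that should stand for the same text
# in ISO-2022-JP (the second one starts with an explicit switch to ASCII).
print(equivalent(b"Hallo", b"\x1b(BHallo", "iso-2022-jp"))

# Two byte sequences that stand for different text.
print(equivalent(b"Hallo", b"Hello", "iso-2022-jp"))
```

Comparing the decoded text sidesteps any need to know which byte-level variants a given encoding allows.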

Re: unicode(s, enc).encode(enc) == s ?

2007-12-28 Thread mario
On Dec 27, 7:37 pm, Martin v. Löwis wrote: Certainly. ISO-2022 is famous for having ambiguous encodings. Try these: unicode("Hallo","iso-2022-jp") unicode("\x1b(BHallo","iso-2022-jp") unicode("\x1b(JHallo","iso-2022-jp") unicode("\x1b(BHal\x1b(Jlo","iso-2022-jp") or likewise

Re: unicode(s, enc).encode(enc) == s ?

2007-12-28 Thread Marc 'BlackJack' Rintsch
On Fri, 28 Dec 2007 03:00:59 -0800, mario wrote: On Dec 27, 7:37 pm, Martin v. Löwis wrote: Certainly. ISO-2022 is famous for having ambiguous encodings. Try these: unicode("Hallo","iso-2022-jp") unicode("\x1b(BHallo","iso-2022-jp") unicode("\x1b(JHallo","iso-2022-jp")

Re: unicode(s, enc).encode(enc) == s ?

2007-12-28 Thread Martin v. Löwis
> Wow, that's not easy to see. Why would anyone ever want that? Is there any logic behind this?
It's the pre-Unicode solution to the "we want to have many characters encoded in a single file" problem. Suppose you have pre-defined character sets A, B, C, and you want text to contain characters from
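The snippet is cut off here, so the following is only a rough illustration of mixing several character sets in one byte stream, not a reconstruction of the rest of the message. It uses Python 3 and a sample string chosen for this sketch, and shows that ISO-2022-JP output contains escape sequences marking where the active character set changes.

```python
# Sample text (an assumption for this sketch): ASCII followed by two kanji,
# written with \u escapes so the source file itself stays ASCII.
text = "Hallo \u65e5\u672c"

raw = text.encode("iso-2022-jp")
print(raw)

ESC = b"\x1b"
# Every escape sequence used by ISO-2022-JP is ESC plus two more bytes,
# so after splitting on ESC the first two bytes of each later segment
# name the character set and the rest is the payload in that set.
head, *switched = raw.split(ESC)
print("initial ASCII run:", head)
for seg in switched:
    print("switch: ESC", seg[:2], "payload:", seg[2:])
```

The payload bytes after the first switch are ordinary printable ASCII values, which is why the same bytes mean different characters depending on the most recent escape sequence.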

unicode(s, enc).encode(enc) == s ?

2007-12-27 Thread mario
I have checks in my code to ensure that a decode/encode cycle returns the original string. Given no UnicodeErrors, are there any cases in which the following is not True? unicode(s, enc).encode(enc) == s

Re: unicode(s, enc).encode(enc) == s ?

2007-12-27 Thread Martin v. Löwis
> Given no UnicodeErrors, are there any cases for the following not to be True? unicode(s, enc).encode(enc) == s
Certainly. ISO-2022 is famous for having ambiguous encodings. Try these: unicode("Hallo","iso-2022-jp") unicode("\x1b(BHallo","iso-2022-jp") unicode("\x1b(JHallo","iso-2022-jp") unicode
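To see the ambiguity in action, here is a minimal sketch that runs the byte strings quoted in this thread through a decode/encode cycle. It is written for Python 3, so unicode(s, enc) becomes s.decode(enc) on a bytes object; the variable names are chosen for illustration only.

```python
ENC = "iso-2022-jp"

# Byte strings quoted in the thread, written as Python 3 bytes literals.
# ESC ( B selects ASCII and ESC ( J selects JIS X 0201 Roman.
variants = [
    b"Hallo",
    b"\x1b(BHallo",
    b"\x1b(JHallo",
    b"\x1b(BHal\x1b(Jlo",
]

# Decode each variant, then check whether re-encoding gives back the input.
decoded = [v.decode(ENC) for v in variants]
roundtrip_ok = [v.decode(ENC).encode(ENC) == v for v in variants]

for v, text, ok in zip(variants, decoded, roundtrip_ok):
    print(repr(v), "->", repr(text), "round-trips:", ok)
```

All inputs decode without a UnicodeError, but the encoder can emit only one byte sequence for a given text, so at most one of several equivalent inputs can survive the cycle unchanged.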