On Jan 4, 8:03 am, mario <[EMAIL PROTECTED]> wrote:
> On Jan 2, 2:25 pm, Piet van Oostrum <[EMAIL PROTECTED]> wrote:
>
> > Apparently for the empty string the encoding is irrelevant as it will not
> > be used. I guess there is an early check for this special case in the code.
>
> In the module I am working on [*] I am remembering a failed encoding
> to allow me, if necessary, to later re-process fewer encodings.

If you were in fact doing that, you would not have had a problem. What
you appear to have been doing is (a) remembering a NON-failing encoding
and assuming that it would continue not to fail, and (b) not
differentiating between failure reasons (codec doesn't exist, input not
consistent with specified encoding).

A good strategy when dealing with encodings that are unknown (in the
sense that they come from user input, or from a list of encodings you
got out of the manual, or are constructed on the fly (e.g.
encoding = 'cp' + str(code_page_number) # old MS Excel files)) is to
try to decode some vanilla ASCII alphabetic text, so that you can give
an immediate in-context error message.

> In the case of an empty string AND an unknown encoding this strategy
> failed...
>
> Anyhow, the question is, should the behaviour be the same for these
> operations, and if so what should it be:
>
> u"".encode("non-existent")
> unicode("", "non-existent")

Perhaps you should make TWO comparisons:

(1)
    unistrg = strg.decode(encoding)
with
    unistrg = unicode(strg, encoding)
[the latter "optimises" the case where strg is ''; the former can't
because its output may be '', not u'', depending on the encoding, so
it must do the lookup]

(2)
    unistrg = strg.decode(encoding)
with
    strg = unistrg.encode(encoding)
[both always do the lookup]

In any case, a pointless question (IMHO); the behaviour is extremely
unlikely to change, as the chance of breaking existing code outvotes
any desire to clean up a minor inconsistency that is easily worked
around.
--
http://mail.python.org/mailman/listinfo/python-list
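
For what it's worth, here is a minimal sketch of the "probe with vanilla ASCII text" strategy suggested above, keeping the two failure reasons apart. It is written for Python 3 (an assumption; the thread itself is about Python 2's str/unicode), and the names PROBE and check_encoding are made up for illustration. It calls codecs.lookup() explicitly, so the result does not depend on whether decoding an empty string skips the codec lookup.

```python
import codecs

# Hypothetical probe text: plain ASCII letters, never an empty string.
PROBE = b"abcdefghijklmnopqrstuvwxyz"


def check_encoding(encoding):
    """Return (ok, reason) for a candidate encoding name."""
    try:
        # Fails immediately if no codec is registered under this name.
        codecs.lookup(encoding)
    except LookupError:
        return False, "unknown codec"
    try:
        PROBE.decode(encoding)
    except LookupError:
        # The codec exists but bytes.decode() refuses to use it
        # because it is not a text encoding.
        return False, "not a text encoding"
    except UnicodeError:
        # The codec exists but cannot decode plain ASCII letters.
        return False, "cannot decode ASCII probe"
    return True, "ok"


if __name__ == "__main__":
    code_page_number = 1252
    for name in ("utf-8", "cp" + str(code_page_number), "non-existent"):
        print(name, check_encoding(name))
```

Running the check once, at the point where the encoding name first arrives, lets you report "no such codec" in context, instead of discovering it later while decoding real data (or not discovering it at all when the data happens to be empty).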