On Wednesday 9 November 2011 22:03:52, you wrote:
> > Should we:
> >
> >  * Drop this codec (public and documented, but I don't know if it is
> >    used)
> >  * Use wchar_t* (Py_UNICODE*) to provide a result similar to
> >    Python 3.2, and so fix the decoder to handle surrogate pairs
> >  * Use the real representation (Py_UCS1*, Py_UCS2* or Py_UCS4* string)
>
> It's described as "Return the internal representation of the operand".
> That would suggest that the last choice (i.e. return the real internal
> representation) would be best, except that this doesn't round-trip.
> Adding a prefix byte indicating the kind (and perhaps also the ASCII
> flag) would then be closest to the real representation.
>
> As that is likely not very useful, and might break some applications
> of the encoding (if there are any at all) which might expect to
> pass unicode-internal strings across Python versions, I would then
> also deprecate the encoding.
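
To make the round-trip problem concrete, here is a minimal sketch in Python 3. It only mimics the PEP 393 storage kinds (1, 2 or 4 bytes per character) with the struct module. The helper names (kind_of, encode_raw, encode_with_kind, decode_with_kind) are hypothetical and not CPython APIs, little-endian layout is assumed, and the ASCII flag is left out.

```python
import struct

# Hypothetical helpers: they imitate the PEP 393 kinds, they are not CPython APIs.
_FORMATS = {1: "B", 2: "H", 4: "I"}  # struct codes for 1, 2 and 4 byte units


def kind_of(text):
    """Smallest unit size (1, 2 or 4 bytes) able to hold every character."""
    top = max(map(ord, text), default=0)
    return 1 if top < 0x100 else 2 if top < 0x10000 else 4


def encode_raw(text):
    """Raw little-endian code units, with no hint about the unit size."""
    fmt = "<%d%s" % (len(text), _FORMATS[kind_of(text)])
    return struct.pack(fmt, *map(ord, text))


def encode_with_kind(text):
    """Same payload, preceded by one byte giving the unit size."""
    return bytes([kind_of(text)]) + encode_raw(text)


def decode_with_kind(data):
    """Inverse of encode_with_kind()."""
    kind, payload = data[0], data[1:]
    fmt = "<%d%s" % (len(payload) // kind, _FORMATS[kind])
    return "".join(map(chr, struct.unpack(fmt, payload)))


# Without the prefix, two different strings give the same bytes:
print(encode_raw("ab") == encode_raw("\u6261"))  # True
# With the prefix, the payload can be decoded unambiguously:
print(decode_with_kind(encode_with_kind("\u6261")) == "\u6261")  # True
```

The prefix byte is what removes the ambiguity, but the result is still tied to one interpreter's storage choices, which is why it would not help programs exchanging data across Python versions.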
After a quick search on Google codesearch (before it disappears!), I don't
think that "encoding" a Unicode string to its internal PEP 393
representation would satisfy any program. wchar_t* looks like a better
candidate: programs probably use unicode_internal to decode strings coming
from libraries that use wchar_t* (and no PyUnicodeObject).

taskcoach, drag & drop code using wxPython:

    data = self.__thunderbirdMailDataObject.GetData()
    # We expect the data to be encoded with 'unicode_internal',
    # but on Fedora it can also be 'utf-16', be prepared:
    try:
        data = data.decode('unicode_internal')
    except UnicodeDecodeError:
        data = data.decode('utf-16')

=> the result of thunderbirdMailDataObject.GetData() should be a Unicode
   string, not bytes

hydrat, tokenizer:

    def bytes(str):
        return filter(lambda x: x != '\x00', str.encode('unicode_internal'))

=> this algorithm is really strange...

djebel, fscache/rst.py:

    class RstDocument(object):
        ...
        def __init__(self, path, options={}):
            opts = {'input_encoding': 'euc-jp',
                    'output_encoding': 'unicode_internal',
                    'doctitle_xform': True,
                    'file_insertion_enabled': True}
            ...
            doctree = core.publish_doctree(source=file(path, 'rb').read(),
                                           ..., settings_overrides=opts)
            ...
            content = parts['html_body'] or u''
            if not isinstance(content, unicode):
                content = unicode(content, 'unicode_internal')
            if not isinstance(title, unicode):
                title = unicode(title, 'unicode_internal')
            ...

=> I don't understand this code

Victor
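
PS: for the wchar_t* use case, a program does not strictly need unicode_internal. The following is a minimal Python 3 sketch, assuming the bytes hold native-endian wchar_t code units as produced by a C library. The function name decode_wchar_bytes is hypothetical. It picks the UTF-16 or UTF-32 codec from the platform's wchar_t size, so surrogate pairs are handled by the codec when wchar_t is 16 bits wide.

```python
import ctypes
import sys


def decode_wchar_bytes(data):
    """Decode raw bytes made of native wchar_t code units (hypothetical helper)."""
    # 16 bits on Windows, 32 bits on most Unix systems.
    bits = ctypes.sizeof(ctypes.c_wchar) * 8
    # Name the byte order explicitly so the codec does not look for a BOM.
    order = "le" if sys.byteorder == "little" else "be"
    return data.decode("utf-%d-%s" % (bits, order))


# Example: bytes of a wchar_t buffer, including its trailing null character.
raw = bytes(ctypes.create_unicode_buffer("h\xe9llo"))
print(decode_wchar_bytes(raw).rstrip("\0") == "h\xe9llo")  # True
```

Something like this would cover the taskcoach case above without depending on how the interpreter stores strings internally.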