Glyph Lefkowitz writes:
 > On Jun 21, 2010, at 10:58 PM, Stephen J. Turnbull wrote:
 > > Note also that the "complete solution" argument cuts both ways.  Eg, a
 > > "complete" solution should implement UTS 39 "confusables detection"[1]
 > > and IDNA[2].  Good luck doing that with bytes!
 >
 > And good luck doing that with just characters, too.

I agree with you, sorry.  I meant to cast doubt on the idea of complete
solutions, or at least on claims that completeness is an excuse for
putting something in the stdlib.

 > This is the limitation that everyone seems to keep dancing around.
 > If you are using the stdlib, with functions that operate on
 > sequences like 'str' or 'bytes', you need to choose from one of
 > three options:

There's a *fourth* way: specially designed codecs that preserve as much
metainformation as you need, while always using the str format
internally.  This can be done for at least 100,000 separate
(character, encoding) pairs by multiplexing into private-use space, with
an auxiliary table of encodings and equivalences.

That's probably overkill.  In many cases, adding a simple PEP 383
mechanism (to preserve uninterpreted bytes) might be enough, though,
and that's pretty plausible IMO.
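
To make the PEP 383 point concrete, here is a minimal sketch in Python 3,
assuming the stdlib "surrogateescape" error handler is the mechanism
wanted.  The helper names decode_preserving and encode_preserving are
made up for illustration; they are not stdlib functions.  Bytes that
cannot be decoded are smuggled through str as lone surrogates in the
range U+DC80..U+DCFF and come back out unchanged on encoding.

```python
def decode_preserving(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode, mapping each undecodable byte 0xNN to the surrogate U+DCNN."""
    return raw.decode(encoding, errors="surrogateescape")


def encode_preserving(text: str, encoding: str = "utf-8") -> bytes:
    """Encode, turning each escaped surrogate back into its original byte."""
    return text.encode(encoding, errors="surrogateescape")


# Hypothetical input: mostly ASCII, with two bytes that are not valid UTF-8.
raw = b"caf\xe9 \xff ok"

text = decode_preserving(raw)

# The uninterpreted bytes are visible in the str as lone surrogates.
escaped = [hex(ord(c)) for c in text if 0xDC80 <= ord(c) <= 0xDCFF]

# Ordinary str operations work on the decodable part and leave the
# escaped bytes alone, so the round trip is lossless.
assert encode_preserving(text) == raw
assert encode_preserving(text.upper()) == b"CAF\xe9 \xff OK"
```

Note that this preserves only the bytes themselves, not which encoding
they came from; that is the metainformation the private-use scheme
above would have to carry in its auxiliary table.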