P.J. Eby writes:

 > I do know the ultimate target codec -- that's the point.
 >
 > IOW, I want to be able to do all my operations by passing
 > target-encoded strings to polymorphic functions.
IOW, you *do* have text and (ignoring efficiency issues) could just as
well use str.  But That Other Code is unreliable, so you need a marker
for your own internal strings indicating that they are validated,
while other strings are not.

This has nothing to do with bytes vs. str as string types, then; it's
all about validated (which your architecture indicates by using the
bytes type) vs. unvalidated (which your architecture indicates with
unicode).

E.g., in the case of your USPS vs. e-commerce example, you can't even
handle all bytes, so not all possible bytes objects are valid.  And
other applications might not be able to handle all Japanese, but only
a subset, so having valid EUC-JP wouldn't be enough; you'd have to
check the repertoire -- might as well use str.  (See the first sketch
at the end of this message.)

It seems to me what is wanted here is something like Perl's taint
mechanism, for *both* kinds of strings (second sketch at the end).
Am I missing something?

But with your architecture, it seems to me that you actually don't
want polymorphic functions in the stdlib.  You want the stdlib
functions to be bytes-oriented if and only if they are reliable.
(This is what I was saying to Guido elsewhere.)

BTW, this was a little unclear to me:

 > [Collisions will] be with other *unicode* strings.  Ones coming
 > from other code, and literals embedded in the stdlib.

What about the literals in the stdlib?  Are you saying they contain
invalid code points for your known output encoding?  Or are you
saying that with a non-polymorphic unicode stdlib, you get lots of
false positives when combining with your validated bytes?
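To make the repertoire point concrete, here is a rough sketch (the
function name and the "allowed" set are made up for illustration, not
anyone's actual API):

    # Bytes that decode cleanly as EUC-JP can still contain characters
    # the application doesn't handle, so validity of the encoding alone
    # isn't enough.

    def within_repertoire(raw, allowed):
        """True if raw is well-formed EUC-JP *and* every character is
        in the application's supported set."""
        try:
            text = raw.decode("euc-jp")
        except UnicodeDecodeError:
            return False    # not even valid EUC-JP
        # The encoding check passed, but the repertoire check has to
        # be done character by character -- i.e., on str, not bytes.
        return all(ch in allowed for ch in text)

The last line is the point: once you have to look at characters
anyway, keeping the bytes representation buys you nothing.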
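And a rough sketch of what I mean by a taint mechanism (everything
here is hypothetical; the point is only that trust is carried by an
explicit flag, not by the choice of bytes vs. str):

    class TaintError(Exception):
        pass

    class Marked:
        """Wrap a bytes or str value together with its trust status."""

        def __init__(self, value, validated=False):
            self.value = value
            self.validated = validated

        def trusted(self):
            """Return the value, refusing if it is still tainted."""
            if not self.validated:
                raise TaintError("unvalidated string: %r" % (self.value,))
            return self.value

    def validate(raw, codec="euc-jp"):
        """Mark raw bytes as validated iff they decode cleanly under
        the target codec."""
        try:
            raw.decode(codec)
        except UnicodeDecodeError:
            return Marked(raw, validated=False)
        return Marked(raw, validated=True)

Strings coming from That Other Code would start life tainted whether
they arrive as bytes or as unicode; it's validation, not type, that
flips the flag.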