* Michael Ludwig <michael.lud...@xing.com> [2010-05-04 14:55]:
> But wait a second: While URIs are meant to be made of
> characters, they're also meant to go over the wire, and there
> are no characters on the wire, only bytes. There is no standard
> encoding defined for the wire, although UTF-8 has come to be
> seen as the standard encoding for URIs containing non-ASCII
> characters. Given that Perl has two standard encodings (UTF-8
> and ISO-8859-1) for text and relies on the internal flag to tell
> which one is meant, shouldn't the URI module either only accept
> bytes or only characters? Or rather, provide two different
> constructors instead of only one trying to be intelligent?
>
>   URI->bytes( $bytes ); # byte string
>   URI->chars( $chars ); # character string
>
> And, in addition, define the character encoding used for
> serialization.
Yes, exactly. And both methods would use the moral equivalent of a plain `split //`, with no trickery such as `\C`. The only difference between them is that the `chars` method would `encode_utf8` the string first and then percent-encode the resulting octets blindly, whereas the `bytes` method would leave the string as is but croak if it found a codepoint > 0xFF (since the string is supposed to represent an octet sequence already).

Notably absent in both cases: any dependence on the state of the UTF8 flag of the string.

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
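A minimal Perl sketch of the two code paths described above. The function names `uri_from_chars` and `uri_from_bytes` are hypothetical stand-ins: they are not part of the URI module, and `URI->chars`/`URI->bytes` exist only as a proposal in this thread. The set of characters left unescaped (RFC 3986 unreserved and reserved characters, plus `%` so already-escaped input passes through) is also an assumption of the sketch. Both paths split the string with a plain `split //` and never consult the UTF8 flag.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Encode qw(encode_utf8);

# Octets left as-is: RFC 3986 unreserved and reserved characters, plus '%'
# so already-escaped input passes through (an assumption of this sketch).
my %SAFE;
$SAFE{$_} = 1 for 'A' .. 'Z', 'a' .. 'z', '0' .. '9',
    split //, q{-._~:/?#[]@!$&'()*+,;=%};

# Percent-encode an octet string: one plain split, one character at a time,
# with no look at the UTF8 flag.
sub escape_octets {
    my ($octets) = @_;
    my $out = '';
    for my $c (split //, $octets) {
        $out .= exists $SAFE{$c} ? $c : sprintf('%%%02X', ord $c);
    }
    return $out;
}

# Hypothetical counterpart of the proposed URI->chars: treat the input as
# characters, encode them to UTF-8 octets, then escape blindly.
sub uri_from_chars {
    my ($chars) = @_;
    return escape_octets(encode_utf8($chars));
}

# Hypothetical counterpart of the proposed URI->bytes: treat the input as
# octets and refuse anything that cannot be one.
sub uri_from_bytes {
    my ($bytes) = @_;
    croak 'Wide character (codepoint > 0xFF) in byte string'
        if $bytes =~ /[^\x00-\xFF]/;
    return escape_octets($bytes);
}

print uri_from_chars("http://example.com/caf\x{e9}"), "\n";  # http://example.com/caf%C3%A9
print uri_from_bytes("http://example.com/caf\x{e9}"), "\n";  # http://example.com/caf%E9
```

Running `utf8::upgrade` on the input before either call leaves the output unchanged, since only the string's characters, not its internal representation, decide the result.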