On 04.05.2010 at 11:09, Gisle Aas wrote:

> I regret that I let \C sneak into the URI module. Now we have an interface
> that depends on the internal UTF-8 flag of the strings passed in.

Does it? How so? If it's a byte string, well, it's a byte string, and \C
doesn't change that. If, on the other hand, it's a text string, \C forces
byte semantics upon it. Isn't that what you want to do in that function?
(Okay, there's no spec for that function, so I don't really know what you
want to do.)

But doesn't the function return the same result regardless of whether the
UTF-8 flag is set? As demonstrated by this test script:

    use strict;
    use warnings;
    use utf8;                  # source in UTF-8
    use Encode;
    binmode STDOUT, ':utf8';   # terminal UTF-8

    my $text   = 'Käse';       # all characters below 256
    my $bytes  = encode_utf8 $text;
    my $text2  = 'Jiří';       # some characters above 255
    my $bytes2 = encode_utf8 $text2;

    printf "%x %s\n", ord $_, $_ for
        $text,   $text   =~ m/(\C)/g,
        $bytes,  $bytes  =~ m/(\C)/g,
        $text2,  $text2  =~ m/(\C)/g,
        $bytes2, $bytes2 =~ m/(\C)/g;

> This makes it very hard to explain, makes it not do what you want when
> different types of strings are combined and makes it hard to fix in ways
> that don't break some code.

Could you provide an example of how this might not do what you want when
different types of strings are combined?

> My plan for fixing this is to introduce URI::IRI with an interface that
> encodes all non-URI characters as percent-encoded UTF-8 and live with the
> inconsistency for URI (until Perl redefines what \C means).

--
Michael.Ludwig (#) XING.com