Re: [PATCH 2/4] gitweb: Make feed title valid utf8
Jakub Narębski jna...@gmail.com writes: Jürgen Kreileder wrote: Properly encode site and project names for RSS and Atom feeds. Signed-off-by: Jürgen Kreileder j...@blackdown.de --- gitweb/gitweb.perl |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 9cfe5b5..09294eb 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -8056,7 +8056,7 @@ sub git_feed { return if ($cgi-request_method() eq 'HEAD'); # header variables -my $title = $site_name - $project/$action; +my $title = to_utf8($site_name) . - . to_utf8($project) . /$action; my $feed_type = 'log'; if (defined $hash) { $title .= - '$hash'; Was this patch triggered by some bug? Yes, I actually see broken encoding with the old code, e.g on https://git.blackdown.de/old.cgi?p=contactalbum.git;a=rss my first name is messed up in the title tag. New version: https://git.blackdown.de/?p=contactalbum.git;a=rss Because the above is not necessary, as git_feed() has $title = esc_html($title); a bit later, which does to_utf8() internally. Good point. But it doesn't fix the string in question: It looks like to_utf8($a $b) != (to_utf8($a) . . to_utf($b)). Juergen -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] gitweb: Make feed title valid utf8
W dniu 09.04.2013 19:40, Jürgen Kreileder napisał: Jakub Narębski jna...@gmail.com writes: Jürgen Kreileder wrote: Properly encode site and project names for RSS and Atom feeds. - my $title = $site_name - $project/$action; + my $title = to_utf8($site_name) . - . to_utf8($project) . /$action; Was this patch triggered by some bug? Yes, I actually see broken encoding with the old code, e.g on https://git.blackdown.de/old.cgi?p=contactalbum.git;a=rss my first name is messed up in the title tag. New version: https://git.blackdown.de/?p=contactalbum.git;a=rss Because the above is not necessary, as git_feed() has $title = esc_html($title); a bit later, which does to_utf8() internally. Good point. But it doesn't fix the string in question: It looks like to_utf8($a $b) != (to_utf8($a) . . to_utf8($b)). Strange. I wonder if the bug is in our to_utf8() implementation, or in Encode, or in Perl... and whether this bug can be triggered anywhere else in gitweb. What Perl version and Encode module version do you use? -- Jakub Narębski -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] gitweb: Make feed title valid utf8
Jakub Narębski jna...@gmail.com writes: W dniu 09.04.2013 19:40, Jürgen Kreileder napisał: Jakub Narębski jna...@gmail.com writes: Jürgen Kreileder wrote: Properly encode site and project names for RSS and Atom feeds. - my $title = $site_name - $project/$action; + my $title = to_utf8($site_name) . - . to_utf8($project) . /$action; Was this patch triggered by some bug? Yes, I actually see broken encoding with the old code, e.g on https://git.blackdown.de/old.cgi?p=contactalbum.git;a=rss my first name is messed up in the title tag. New version: https://git.blackdown.de/?p=contactalbum.git;a=rss Because the above is not necessary, as git_feed() has $title = esc_html($title); a bit later, which does to_utf8() internally. Good point. But it doesn't fix the string in question: It looks like to_utf8($a $b) != (to_utf8($a) . . to_utf8($b)). Strange. I wonder if the bug is in our to_utf8() implementation, or in Encode, or in Perl... and whether this bug can be triggered anywhere else in gitweb. I don't think it's a bug, more like a consequence of concatenating utf8 and non-utf8 strings: my $a = ü; my $b = ü; my $c = $a - $b; print $c - . to_utf8($c) . : . (utf8::is_utf8($c) ? utf8 : not utf8) . \n; # GOOD $b = to_utf8($b); $c = $a - $b; print $c - . to_utf8($c) . : . (utf8::is_utf8($c) ? utf8 : not utf8) . \n; # GOOD yields (hopefully the broken encoding shows up correctly here): ü - ü - ü - ü: not utf8 ü - ü - ü - ü: utf8 In gitweb we have the bad case: my $title = $site_name - $project/$action; $project and $action are apparently utf8 already but $site_name isn't. The resulting string is marked as utf8 - although the encoding of $site_name was never fixed. The to_utf8() in esc_html() returns the string without fixing anything because of that. What Perl version and Encode module version do you use? 5.14.2 and 2.42_01 on Ubuntu. Same results with 5.12.4 and 2.39 on OS X. Juergen -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] gitweb: Make feed title valid utf8
W dniu 09.04.2013 21:22, Jürgen Kreileder napisał: Jakub Narębski jna...@gmail.com writes: W dniu 09.04.2013 19:40, Jürgen Kreileder napisał: Jakub Narębski jna...@gmail.com writes: Jürgen Kreileder wrote: Properly encode site and project names for RSS and Atom feeds. Good point. But it doesn't fix the string in question: It looks like to_utf8($a $b) != (to_utf8($a) . . to_utf8($b)). Strange. I wonder if the bug is in our to_utf8() implementation, or in Encode, or in Perl... and whether this bug can be triggered anywhere else in gitweb. I don't think it's a bug, more like a consequence of concatenating utf8 and non-utf8 strings: my $a = ü; my $b = ü; my $c = $a - $b; print $c - . to_utf8($c) . : . (utf8::is_utf8($c) ? utf8 : not utf8) . \n; # GOOD $b = to_utf8($b); $c = $a - $b; print $c - . to_utf8($c) . : . (utf8::is_utf8($c) ? utf8 : not utf8) . \n; # GOOD yields (hopefully the broken encoding shows up correctly here): ü - ü - ü - ü: not utf8 ü - ü - ü - ü: utf8 Ah, so it looks like it is misfeature of the way Perl handles Unicode; concatenating adds 'UTF8' flag if either of concatenates strings has it to the result. [Which I have checked using Devel::Peek with perl -MDevel::Peek -E ' my $a = ż; my $b = \x{17c}; Dump $a; Dump $b; Dump $b - $a' ] In gitweb we have the bad case: my $title = $site_name - $project/$action; $project and $action are apparently utf8 already but $site_name isn't. $project and $action are taken from URL, and we have to run decode_utf8 (at least for query params) for gitweb to work correctly. $site_name is usually taken from config file, and gitweb doesn't have use utf8 pragma. The resulting string is marked as utf8 - although the encoding of $site_name was never fixed. The to_utf8() in esc_html() returns the string without fixing anything because of that. O.K. _Maybe_ it would be worth adding explanation of this to commit message (and I see I should audit gitweb for similar problems elsewhere), but anyway Acked-by: Jakub Narebski jna...@gmail.com -- Jakub Narębski -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] gitweb: Make feed title valid utf8
Properly encode site and project names for RSS and Atom feeds. Signed-off-by: Jürgen Kreileder j...@blackdown.de --- gitweb/gitweb.perl |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 9cfe5b5..09294eb 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -8056,7 +8056,7 @@ sub git_feed { return if ($cgi-request_method() eq 'HEAD'); # header variables - my $title = $site_name - $project/$action; + my $title = to_utf8($site_name) . - . to_utf8($project) . /$action; my $feed_type = 'log'; if (defined $hash) { $title .= - '$hash'; -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html