Re: [PATCH 2/4] gitweb: Make feed title valid utf8

2013-04-09 Thread Jürgen Kreileder
Jakub Narębski jna...@gmail.com writes:

 Jürgen Kreileder wrote:

 Properly encode site and project names for RSS and Atom feeds.
 
 Signed-off-by: Jürgen Kreileder j...@blackdown.de
 ---
  gitweb/gitweb.perl |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
 index 9cfe5b5..09294eb 100755
 --- a/gitweb/gitweb.perl
 +++ b/gitweb/gitweb.perl
 @@ -8056,7 +8056,7 @@ sub git_feed {
  return if ($cgi-request_method() eq 'HEAD');
  
  # header variables
 -my $title = $site_name - $project/$action;
 +my $title = to_utf8($site_name) .  -  . to_utf8($project) . 
 /$action;
  my $feed_type = 'log';
  if (defined $hash) {
  $title .=  - '$hash';
 

 Was this patch triggered by some bug?

Yes, I actually see broken encoding with the old code, e.g on 
https://git.blackdown.de/old.cgi?p=contactalbum.git;a=rss
my first name is messed up in the title tag.

New version: https://git.blackdown.de/?p=contactalbum.git;a=rss

 Because the above is not necessary, as git_feed() has

   $title = esc_html($title);

 a bit later, which does to_utf8() internally.

Good point.  But it doesn't fix the string in question:
It looks like to_utf8($a $b) != (to_utf8($a) .   . to_utf($b)).


Juergen
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] gitweb: Make feed title valid utf8

2013-04-09 Thread Jakub Narębski
W dniu 09.04.2013 19:40, Jürgen Kreileder napisał:
 Jakub Narębski jna...@gmail.com writes:
 Jürgen Kreileder wrote:

 Properly encode site and project names for RSS and Atom feeds.

 -   my $title = $site_name - $project/$action;
 +   my $title = to_utf8($site_name) .  -  . to_utf8($project) . 
 /$action;

 Was this patch triggered by some bug?
 
 Yes, I actually see broken encoding with the old code, e.g on 
 https://git.blackdown.de/old.cgi?p=contactalbum.git;a=rss
 my first name is messed up in the title tag.
 
 New version: https://git.blackdown.de/?p=contactalbum.git;a=rss
 
 Because the above is not necessary, as git_feed() has

  $title = esc_html($title);

 a bit later, which does to_utf8() internally.
 
 Good point.  But it doesn't fix the string in question:
 It looks like to_utf8($a $b) != (to_utf8($a) .   . to_utf8($b)).

Strange.  I wonder if the bug is in our to_utf8() implementation,
or in Encode, or in Perl... and whether this bug can be triggered
anywhere else in gitweb.

What Perl version and Encode module version do you use?
-- 
Jakub Narębski
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] gitweb: Make feed title valid utf8

2013-04-09 Thread Jürgen Kreileder
Jakub Narębski jna...@gmail.com writes:

 W dniu 09.04.2013 19:40, Jürgen Kreileder napisał:
 Jakub Narębski jna...@gmail.com writes:
 Jürgen Kreileder wrote:

 Properly encode site and project names for RSS and Atom feeds.

 -  my $title = $site_name - $project/$action;
 +  my $title = to_utf8($site_name) .  -  . to_utf8($project) . 
 /$action;

 Was this patch triggered by some bug?
 
 Yes, I actually see broken encoding with the old code, e.g on 
 https://git.blackdown.de/old.cgi?p=contactalbum.git;a=rss
 my first name is messed up in the title tag.
 
 New version: https://git.blackdown.de/?p=contactalbum.git;a=rss
 
 Because the above is not necessary, as git_feed() has

 $title = esc_html($title);

 a bit later, which does to_utf8() internally.
 
 Good point.  But it doesn't fix the string in question:
 It looks like to_utf8($a $b) != (to_utf8($a) .   . to_utf8($b)).

 Strange.  I wonder if the bug is in our to_utf8() implementation,
 or in Encode, or in Perl... and whether this bug can be triggered
 anywhere else in gitweb.

I don't think it's a bug, more like a consequence of concatenating utf8
and non-utf8 strings:

my $a = ü;
my $b = ü;
my $c = $a - $b;
print $c - . to_utf8($c) . :  . (utf8::is_utf8($c) ? utf8 : not 
utf8) . \n; # GOOD
$b = to_utf8($b);
$c = $a - $b;
print $c - . to_utf8($c) . :  . (utf8::is_utf8($c) ? utf8 : not 
utf8) . \n; # GOOD

yields (hopefully the broken encoding shows up correctly here):

ü - ü - ü - ü: not utf8
ü - ü - ü - ü: utf8


In gitweb we have the bad case: 

   my $title = $site_name - $project/$action;

$project and $action are apparently utf8 already but $site_name isn't.
The resulting string is marked as utf8 - although the encoding of
$site_name was never fixed.  The to_utf8() in esc_html() returns the string
without fixing anything because of that.

 What Perl version and Encode module version do you use?

5.14.2 and 2.42_01 on Ubuntu.  Same results with 5.12.4 and 2.39 on OS X.


   Juergen
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] gitweb: Make feed title valid utf8

2013-04-09 Thread Jakub Narębski
W dniu 09.04.2013 21:22, Jürgen Kreileder napisał:
 Jakub Narębski jna...@gmail.com writes: 
 W dniu 09.04.2013 19:40, Jürgen Kreileder napisał:
 Jakub Narębski jna...@gmail.com writes:
 Jürgen Kreileder wrote:

 Properly encode site and project names for RSS and Atom feeds.

 Good point.  But it doesn't fix the string in question:
 It looks like to_utf8($a $b) != (to_utf8($a) .   . to_utf8($b)).

 Strange.  I wonder if the bug is in our to_utf8() implementation,
 or in Encode, or in Perl... and whether this bug can be triggered
 anywhere else in gitweb.
 
 I don't think it's a bug, more like a consequence of concatenating utf8
 and non-utf8 strings:
 
 my $a = ü;
 my $b = ü;
 my $c = $a - $b;
 print $c - . to_utf8($c) . :  . (utf8::is_utf8($c) ? utf8 : not 
 utf8) . \n; # GOOD
 $b = to_utf8($b);
 $c = $a - $b;
 print $c - . to_utf8($c) . :  . (utf8::is_utf8($c) ? utf8 : not 
 utf8) . \n; # GOOD
 
 yields (hopefully the broken encoding shows up correctly here):
 
 ü - ü - ü - ü: not utf8
 ü - ü - ü - ü: utf8

Ah, so it looks like it is misfeature of the way Perl handles Unicode;
concatenating adds 'UTF8' flag if either of concatenates strings has
it to the result.

[Which I have checked using Devel::Peek with
 perl -MDevel::Peek -E '
  my $a = ż; my $b = \x{17c};
  Dump $a; Dump $b; Dump $b - $a'
]

 In gitweb we have the bad case: 
 
my $title = $site_name - $project/$action;
 
 $project and $action are apparently utf8 already but $site_name isn't.

$project and $action are taken from URL, and we have to run decode_utf8
(at least for query params) for gitweb to work correctly.

$site_name is usually taken from config file, and gitweb doesn't have
use utf8 pragma.

 The resulting string is marked as utf8 - although the encoding of
 $site_name was never fixed.  The to_utf8() in esc_html() returns the string
 without fixing anything because of that.

O.K.

_Maybe_ it would be worth adding explanation of this to commit message
(and I see I should audit gitweb for similar problems elsewhere), but anyway

Acked-by: Jakub Narebski jna...@gmail.com

-- 
Jakub Narębski
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] gitweb: Make feed title valid utf8

2013-04-08 Thread Jürgen Kreileder
Properly encode site and project names for RSS and Atom feeds.

Signed-off-by: Jürgen Kreileder j...@blackdown.de
---
 gitweb/gitweb.perl |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 9cfe5b5..09294eb 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -8056,7 +8056,7 @@ sub git_feed {
return if ($cgi-request_method() eq 'HEAD');
 
# header variables
-   my $title = $site_name - $project/$action;
+   my $title = to_utf8($site_name) .  -  . to_utf8($project) . 
/$action;
my $feed_type = 'log';
if (defined $hash) {
$title .=  - '$hash';
-- 
1.7.10.4
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html