Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names
Dear all,

I have a git project whose source code uses GBK encoding. When I use the gitweb blame view, it reports an error and then stops parsing:

    Malformed UTF-8 character (fatal) at /usr/share/gitweb/gitweb.cgi line 1595, <$fd> line 45.

After applying this patch, the blame view of GBK source files works normally again.

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 79057b7..e6fdcfe 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -6704,7 +6704,6 @@ sub git_blame_common {
 			$hash_base, '--', $file_name
 			or die_error(500, "Open git-blame --porcelain failed");
 	}
-	binmode $fd, ':utf8';
 
 	# incremental blame data returns early
 	if ($format eq 'data') {

Searching the git.git log, I found that the following commit added the binmode line; maybe this commit should be rechecked? Thanks.

fd87004e51df835e5833bfe1bff3ad0137d42227 gitweb: Fix the author initials in blame for non-ASCII names

BR, 2014-03-17
--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
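To illustrate the failure reported above, here is a minimal sketch, assuming the blamed file contains GBK-encoded text. The sample octets are an illustrative input, not taken from the report, and the sketch uses Encode's strict decoder rather than the ':utf8' I/O layer from the patch, so the exact error text differs from the gitweb message.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two CJK characters encoded as GBK octets (illustrative input only).
my $gbk_octets = "\xD6\xD0\xCE\xC4";

# Strict UTF-8 decoding croaks on these octets. decode() with a CHECK
# argument may modify its input, so work on a copy.
my $copy = $gbk_octets;
my $utf8_ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
print "strict UTF-8 decode ", ($utf8_ok ? "succeeded" : "failed"), "\n";

# Decoding with the encoding the file was actually written in works.
my $text = decode('gbk', $gbk_octets);
printf "GBK decode gave %d characters\n", length $text;
```

The point of the sketch is only that octets which are valid in one encoding can be malformed UTF-8, which is why forcing a UTF-8 interpretation on the whole blame stream breaks for such files.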
Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names
I did. I just clumsily sent out the wrong patch, i.e. I tested it manually on another system and then fat-fingered $fh instead of $fd.

Should I send another patch, or do you want to just fix this one up?

On Fri, Aug 30, 2013 at 8:13 PM, Junio C Hamano gits...@pobox.com wrote:
> Junio C Hamano gits...@pobox.com writes:
> Ævar Arnfjörð Bjarmason ava...@gmail.com writes:
>
> Acked-by: Jakub Narębski jna...@gmail.com
> Tested-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
> Tested-by: Simon Ruderich si...@ruderich.org
> ---
> +++ b/gitweb/gitweb.perl
> @@ -6631,6 +6631,7 @@ sub git_blame_common {
> ...
> +binmode $fh, ':utf8';
>
> [Fri Aug 30 17:48:17 2013] gitweb.perl: Global symbol "$fh" requires explicit package name at /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl line 6634.
> [Fri Aug 30 17:48:17 2013] gitweb.perl: Execution of /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl aborted due to compilation errors.
>
> I think in this function the filehandle is called $fd, not $fh.
>
> Has any of you really tested this???
Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names
On Fri, Aug 30, 2013 at 11:39 PM, Kyle J. McKay mack...@gmail.com wrote:
> On Aug 30, 2013, at 11:13, Junio C Hamano wrote:
> Junio C Hamano gits...@pobox.com writes:
> Ævar Arnfjörð Bjarmason ava...@gmail.com writes:
>
> + binmode $fh, ':utf8';
>
> What happens if the author name is written in ISO-8859-1 instead of UTF-8 in the actual commit object itself?
>
> I'm pretty sure I've seen this where older commits have an ISO-8859-1 author name and then newer commits have a UTF-8 version of the same author's name. In fact, in the git repository itself, look at commit 0cb3f80d (UTF-8) and commit 7eb93c89 (ISO-8859-1) to see this in action.

Well, then you have a problem, though it is only with old history (before the introduction of the encoding header in the commit object).

A better and more complete solution would be to use the to_utf8() function instead of the ':utf8' layer. When to_utf8() finds an invalid UTF-8 sequence, it uses $fallback_encoding (by default latin1, i.e. ISO-8859-1) instead.

On my TODO list is creating a PerlIO layer ':utf8-with-fallback', which would replace all those to_utf8() calls...

--
Jakub Narebski
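The fallback idea described above can be sketched roughly as follows. This is a minimal illustration, not gitweb's actual to_utf8(): the helper name decode_with_fallback and the sample octets are hypothetical, and only the latin1 default is taken from the message.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Default taken from the description above; the variable here is local
# to this sketch.
my $fallback_encoding = 'latin1';

# Hypothetical helper: return a character string for the given octets,
# trying UTF-8 first and falling back to $fallback_encoding.
sub decode_with_fallback {
    my ($octets) = @_;
    return undef unless defined $octets;

    # utf8::decode() converts in place and returns false if the input
    # is not well-formed UTF-8, so try it on a copy.
    my $copy = $octets;
    return $copy if utf8::decode($copy);

    return decode($fallback_encoding, $octets);
}

# The same first name as UTF-8 octets and as ISO-8859-1 octets.
my $from_utf8   = decode_with_fallback("\xC3\x86var");
my $from_latin1 = decode_with_fallback("\xC6var");
print "both decode to the same string\n" if $from_utf8 eq $from_latin1;
```

Because the decision is made per string rather than per filehandle, a history that mixes UTF-8 and ISO-8859-1 author names can still be decoded consistently, which a single ':utf8' layer on the pipe cannot do.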
[PATCH] gitweb: Fix the author initials in blame for non-ASCII names
Change the @author_initials feature Jakub added in v1.6.4-rc2-14-ga36817b to match non-ASCII author initials as intended.

The regexp Jakub added was intended to match non-ASCII (/\b([[:upper:]])\B/g). But in Perl this doesn't actually match non-ASCII upper-case characters unless the string being matched against has the UTF8 flag.

So when we open a pipe to git blame we need to mark the file descriptor we're opening as utf8 explicitly.

Without that, it abbreviates me to "AB" not "ÆAB", entirely because "Æ" isn't /[[:upper:]]/ unless the string being matched against has the UTF8 flag.

Here's something that demonstrates the issue:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    binmode STDOUT, ':utf8' if $ENV{UTF8};

    open my $fd, "-|", "git", "blame", "--incremental", "--", "Makefile"
        or die "Can't open: $!";
    binmode $fd, ":utf8" if $ENV{UTF8};

    while (my $line = <$fd>) {
        next unless my ($author) = $line =~ /^author (.*)/;
        my @author_initials = ($author =~ /\b([[:upper:]])\B/g);
        printf "%s (%s)\n", join("", @author_initials), $author;
    }

When that's run with and without UTF8 being true in the environment it gives, on git.git:

    $ UTF8=0 perl author-initials.pl | sort | uniq -c | sort -nr | head -n 5
         99 JH (Junio C Hamano)
         35 JN (Jonathan Nieder)
         35 JK (Jeff King)
         20 JS (Johannes Schindelin)
         16 AB (Ævar Arnfjörð Bjarmason)
    $ UTF8=1 perl author-initials.pl | sort | uniq -c | sort -nr | head -n 5
         99 JH (Junio C Hamano)
         35 JN (Jonathan Nieder)
         35 JK (Jeff King)
         20 JS (Johannes Schindelin)
         16 ÆAB (Ævar Arnfjörð Bjarmason)

Acked-by: Jakub Narębski jna...@gmail.com
Tested-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
Tested-by: Simon Ruderich si...@ruderich.org
---
 gitweb/gitweb.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index f429f75..ad48a5a 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -6631,6 +6631,7 @@ sub git_blame_common {
 			$hash_base, '--', $file_name
 			or die_error(500, "Open git-blame --porcelain failed");
 	}
+	binmode $fh, ':utf8';
 
 	# incremental blame data returns early
 	if ($format eq 'data') {
--
1.8.4.rc2
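The behaviour the patch description relies on can also be seen without a git repository. The following is a minimal sketch, assuming a Perl run without `use locale` or the `unicode_strings` feature. The helper name initials is hypothetical; the regexp is the one quoted in the patch description.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper wrapping the regexp from the patch description.
sub initials {
    my ($name) = @_;
    return join '', $name =~ /\b([[:upper:]])\B/g;
}

# The author name as raw UTF-8 octets, as read from a pipe that has
# no I/O layer applied.
my $octets = "\xC3\x86var Arnfj\xC3\xB6r\xC3\xB0 Bjarmason";

# The same name decoded into a character string (UTF8 flag set).
my $chars = decode('UTF-8', $octets);

binmode STDOUT, ':encoding(UTF-8)';
print initials($octets), "\n";   # AB
print initials($chars), "\n";    # ÆAB
```

On the octet string the non-ASCII bytes are treated as non-word, non-upper-case characters, so only the ASCII capitals match. Once the string is decoded, Unicode rules apply and the leading Æ is recognised as upper case.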
Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names
Junio C Hamano gits...@pobox.com writes:
Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

> Acked-by: Jakub Narębski jna...@gmail.com
> Tested-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
> Tested-by: Simon Ruderich si...@ruderich.org
> ---
> +++ b/gitweb/gitweb.perl
> @@ -6631,6 +6631,7 @@ sub git_blame_common {
> ...
> +binmode $fh, ':utf8';

[Fri Aug 30 17:48:17 2013] gitweb.perl: Global symbol "$fh" requires explicit package name at /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl line 6634.
[Fri Aug 30 17:48:17 2013] gitweb.perl: Execution of /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl aborted due to compilation errors.

I think in this function the filehandle is called $fd, not $fh.

Has any of you really tested this???
Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names
On Fri, Aug 30, 2013 at 11:13:19AM -0700, Junio C Hamano wrote:
> I think in this function the filehandle is called $fd, not $fh.
>
> Has any of you really tested this???

I did, but I applied the change by hand without applying the patch directly and didn't notice the difference. Sorry for that.

Regards
Simon
--
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9
Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names
On Aug 30, 2013, at 11:13, Junio C Hamano wrote:
> Junio C Hamano gits...@pobox.com writes:
> Ævar Arnfjörð Bjarmason ava...@gmail.com writes:
>
> Acked-by: Jakub Narębski jna...@gmail.com
> Tested-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
> Tested-by: Simon Ruderich si...@ruderich.org
> ---
> +++ b/gitweb/gitweb.perl
> @@ -6631,6 +6631,7 @@ sub git_blame_common {
> ...
> + binmode $fh, ':utf8';
>
> [Fri Aug 30 17:48:17 2013] gitweb.perl: Global symbol "$fh" requires explicit package name at /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl line 6634.
> [Fri Aug 30 17:48:17 2013] gitweb.perl: Execution of /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl aborted due to compilation errors.
>
> I think in this function the filehandle is called $fd, not $fh.
>
> Has any of you really tested this???

What happens if the author name is written in ISO-8859-1 instead of UTF-8 in the actual commit object itself?

I'm pretty sure I've seen this where older commits have an ISO-8859-1 author name and then newer commits have a UTF-8 version of the same author's name. In fact, in the git repository itself, look at commit 0cb3f80d (UTF-8) and commit 7eb93c89 (ISO-8859-1) to see this in action.