Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names

2014-03-17 Thread Kicer Jiao
Dear all,

I have a Git project whose source code is GBK-encoded. When I use the
gitweb blame view, it reports an error and then stops parsing:
Malformed UTF-8 character (fatal) at /usr/share/gitweb/gitweb.cgi line
1595, <$fd> line 45.

After applying this patch, the blame view of a GBK source file works normally again.
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 79057b7..e6fdcfe 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -6704,7 +6704,6 @@ sub git_blame_common {
 			$hash_base, '--', $file_name
 			or die_error(500, "Open git-blame --porcelain failed");
 	}
-	binmode $fd, ':utf8';
 
 	# incremental blame data returns early
 	if ($format eq 'data') {

When I searched the git.git log, I found that this commit added the
binmode ... line; maybe this commit should be rechecked? Thanks.
fd87004e51df835e5833bfe1bff3ad0137d42227  gitweb: Fix the author
initials in blame for non-ASCII names
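
One way to confirm that a failure like this comes from non-UTF-8 input is
to run the bytes through a strict decode. The following is only a sketch:
is_valid_utf8 is a hypothetical helper that is not part of gitweb, and the
sample text is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not in gitweb): true if $bytes is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # $bytes unmodified. eval turns the exception into a false value.
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

my $text      = "\x{4E2D}\x{6587}";      # two CJK characters, for illustration
my $gbk_line  = encode('gbk', $text);    # what a GBK source line looks like
my $utf8_line = encode('UTF-8', $text);  # the same text as UTF-8

printf "gbk bytes valid as UTF-8:   %s\n", is_valid_utf8($gbk_line)  ? 'yes' : 'no';
printf "utf-8 bytes valid as UTF-8: %s\n", is_valid_utf8($utf8_line) ? 'yes' : 'no';
```

The GBK line is reported as invalid and the UTF-8 line as valid, which is
consistent with a ':utf8' layer being the wrong choice for blame output of
a GBK file.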


BR,
2014-03-17
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names

2013-08-31 Thread Ævar Arnfjörð Bjarmason
I did. I just clumsily sent out the wrong patch. I.e. tested it
manually on another system, and then fat-fingered $fh instead of $fd.

Should I send another patch or do you want to just fix this one up?

On Fri, Aug 30, 2013 at 8:13 PM, Junio C Hamano gits...@pobox.com wrote:
 Junio C Hamano gits...@pobox.com writes:

 Ævar Arnfjörð Bjarmason  ava...@gmail.com writes:

 Acked-by: Jakub Narębski jna...@gmail.com
 Tested-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
 Tested-by: Simon Ruderich si...@ruderich.org
 ---
 +++ b/gitweb/gitweb.perl
 @@ -6631,6 +6631,7 @@ sub git_blame_common {
 ...
 +binmode $fh, ':utf8';


 [Fri Aug 30 17:48:17 2013] gitweb.perl: Global symbol "$fh" requires
 explicit package name at
 /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl line 6634.
 [Fri Aug 30 17:48:17 2013] gitweb.perl: Execution of
 /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl aborted due to
 compilation errors.

 I think in this function the filehandle is called $fd, not $fh.  Has
 any of you really tested this???


Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names

2013-08-31 Thread Jakub Narębski
On Fri, Aug 30, 2013 at 11:39 PM, Kyle J. McKay mack...@gmail.com wrote:
 On Aug 30, 2013, at 11:13, Junio C Hamano wrote:
 Junio C Hamano gits...@pobox.com writes:
 Ævar Arnfjörð Bjarmason  ava...@gmail.com writes:

 +   binmode $fh, ':utf8';

 What happens if the author name is written in ISO-8859-1 instead of UTF-8 in
 the actual commit object itself?

 I'm pretty sure I've seen this where older commits have an ISO-8859-1 author
 name and then newer commits have a UTF-8 version of the same author's name.

 In fact, in the git repository itself, look at commit 0cb3f80d (UTF-8) and
 commit 7eb93c89 (ISO-8859-1) to see this in action.

Well, then you have a problem, though only with old history (from before
the introduction of the encoding header in the commit object).

A better and more complete solution would be to use the to_utf8() function
instead of the ':utf8' layer; when it finds an invalid UTF-8 sequence, it
uses $fallback_encoding (by default latin1, i.e. ISO-8859-1) instead.
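
The fallback idea can be sketched roughly as follows. This is an
illustration under stated assumptions, not gitweb's actual code:
decode_with_fallback is a hypothetical name, and $fallback_encoding is set
here to the latin1 default mentioned above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed default, mirroring the latin1 fallback described above.
my $fallback_encoding = 'latin1';

# Hypothetical helper: try UTF-8 first, then fall back to $fallback_encoding.
sub decode_with_fallback {
    my ($str) = @_;
    return undef unless defined $str;
    # utf8::decode() converts in place and returns false when the bytes
    # are not well-formed UTF-8.
    return $str if utf8::is_utf8($str) || utf8::decode($str);
    return decode($fallback_encoding, $str);
}

my $from_utf8   = decode_with_fallback("\xC3\x86var");  # UTF-8 bytes
my $from_latin1 = decode_with_fallback("\xC6var");      # ISO-8859-1 bytes
print $from_utf8 eq $from_latin1 ? "same name\n" : "different names\n";
```

Both inputs end up as the same character string, so an author whose name
appears in ISO-8859-1 in old commits and in UTF-8 in newer ones would be
treated consistently.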


On my TODO list is creating a PerlIO layer ':utf8-with-fallback', which would
replace all those to_utf8() calls...

-- 
Jakub Narebski


[PATCH] gitweb: Fix the author initials in blame for non-ASCII names

2013-08-30 Thread Ævar Arnfjörð Bjarmason
Change the @author_initials feature Jakub added in
v1.6.4-rc2-14-ga36817b to match non-ASCII author initials as intended.

The regexp Jakub added was intended to match
non-ASCII (/\b([[:upper:]])\B/g). But in Perl this doesn't actually
match non-ASCII upper-case characters unless the string being matched
against has the UTF8 flag.

As a result, gitweb currently abbreviates me to AB, not ÆAB, entirely
because Æ isn't /[[:upper:]]/ unless the string being matched against has
the UTF8 flag.

So when we open a pipe to git blame we need to mark the file
descriptor we're opening as utf8 explicitly.

Here's something that demonstrates the issue:

#!/usr/bin/env perl
use strict;
use warnings;

binmode STDOUT, ':utf8' if $ENV{UTF8};
open my $fd, "-|", "git", "blame", "--incremental", "--", "Makefile" or die "Can't open: $!";
binmode $fd, ":utf8" if $ENV{UTF8};
while (my $line = <$fd>) {
    next unless my ($author) = $line =~ /^author (.*)/;
    my @author_initials = ($author =~ /\b([[:upper:]])\B/g);
    printf "%s (%s)\n", join("", @author_initials), $author;
}

When that's run with and without UTF8 being true in the environment it
gives, on git.git:

$ UTF8=0 perl author-initials.pl | sort | uniq -c |
sort -nr | head -n 5
 99 JH (Junio C Hamano)
 35 JN (Jonathan Nieder)
 35 JK (Jeff King)
 20 JS (Johannes Schindelin)
 16 AB (Ævar Arnfjörð Bjarmason)
$ UTF8=1 perl author-initials.pl | sort | uniq -c |
sort -nr | head -n 5
 99 JH (Junio C Hamano)
 35 JN (Jonathan Nieder)
 35 JK (Jeff King)
 20 JS (Johannes Schindelin)
 16 ÆAB (Ævar Arnfjörð Bjarmason)
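
The same effect can be reproduced without a repository. The sketch below
is only an illustration: initials is a hypothetical helper wrapping the
regexp from this patch, and the name is hard-coded as UTF-8 octets rather
than read from git blame.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: collect initials with the regexp discussed above.
sub initials {
    my ($name) = @_;
    return join '', $name =~ /\b([[:upper:]])\B/g;
}

# The author name as raw UTF-8 octets (no UTF8 flag), as read from a pipe
# that has no ':utf8' layer.
my $bytes = "\xC3\x86var Arnfj\xC3\xB6r\xC3\xB0 Bjarmason";
# The same name as a decoded character string (UTF8 flag on).
my $chars = decode('UTF-8', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
printf "octets:     %s\n", initials($bytes);
printf "characters: %s\n", initials($chars);
```

On the octet string the regexp yields AB; on the decoded string it yields
ÆAB, matching the two runs shown above.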

Acked-by: Jakub Narębski jna...@gmail.com
Tested-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
Tested-by: Simon Ruderich si...@ruderich.org
---
 gitweb/gitweb.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index f429f75..ad48a5a 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -6631,6 +6631,7 @@ sub git_blame_common {
 			$hash_base, '--', $file_name
 			or die_error(500, "Open git-blame --porcelain failed");
 	}
+	binmode $fh, ':utf8';
 
 	# incremental blame data returns early
 	if ($format eq 'data') {
-- 
1.8.4.rc2



Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names

2013-08-30 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Ævar Arnfjörð Bjarmason  ava...@gmail.com writes:

 Acked-by: Jakub Narębski jna...@gmail.com
 Tested-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
 Tested-by: Simon Ruderich si...@ruderich.org
 ---
 +++ b/gitweb/gitweb.perl
 @@ -6631,6 +6631,7 @@ sub git_blame_common {
 ...
 +binmode $fh, ':utf8';


 [Fri Aug 30 17:48:17 2013] gitweb.perl: Global symbol "$fh" requires
 explicit package name at
 /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl line 6634.
 [Fri Aug 30 17:48:17 2013] gitweb.perl: Execution of
 /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl aborted due to
 compilation errors.

I think in this function the filehandle is called $fd, not $fh.  Has
any of you really tested this???


Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names

2013-08-30 Thread Simon Ruderich
On Fri, Aug 30, 2013 at 11:13:19AM -0700, Junio C Hamano wrote:
 I think in this function the filehandle is called $fd, not $fh.  Has
 any of you really tested this???

I did, but I applied the change by hand without applying the
patch directly and didn't notice the difference. Sorry for that.

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9


Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names

2013-08-30 Thread Kyle J. McKay

On Aug 30, 2013, at 11:13, Junio C Hamano wrote:
 Junio C Hamano gits...@pobox.com writes:
 Ævar Arnfjörð Bjarmason  ava...@gmail.com writes:

 Acked-by: Jakub Narębski jna...@gmail.com
 Tested-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
 Tested-by: Simon Ruderich si...@ruderich.org
 ---
 +++ b/gitweb/gitweb.perl
 @@ -6631,6 +6631,7 @@ sub git_blame_common {
 ...
 +   binmode $fh, ':utf8';

 [Fri Aug 30 17:48:17 2013] gitweb.perl: Global symbol "$fh" requires
 explicit package name at
 /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl line 6634.
 [Fri Aug 30 17:48:17 2013] gitweb.perl: Execution of
 /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl aborted due to
 compilation errors.

 I think in this function the filehandle is called $fd, not $fh.  Has
 any of you really tested this???

What happens if the author name is written in ISO-8859-1 instead of
UTF-8 in the actual commit object itself?

I'm pretty sure I've seen this where older commits have an ISO-8859-1
author name and then newer commits have a UTF-8 version of the same
author's name.

In fact, in the git repository itself, look at commit 0cb3f80d (UTF-8)
and commit 7eb93c89 (ISO-8859-1) to see this in action.