Re: [PATCH] gitk: don't highlight submodule diff lines outside submodule diffs

2018-11-06 Thread Роман Донченко

On 06.11.2018 23:06, Stefan Beller wrote:

On Tue, Nov 6, 2018 at 12:03 PM Роман Донченко wrote:


A line that starts with "  <" or "  >" is not necessarily a submodule
diff line. It might just be a context line in a normal diff, representing
a line starting with " <" or " >" respectively.

Use the currdiffsubmod variable to track whether we are currently
inside a submodule diff and only highlight these lines if we are.


This explanation makes sense, some prior art is at
https://public-inbox.org/git/20181021163401.4458-1-du...@example.com/
which was not taken AFAICT.


Didn't see that patch. That said, I think it's incorrect, since it never 
resets currdiffsubmod back to the empty string, so if a normal diff 
follows a submodule diff, the same issue will occur.


(The `set $currdiffsubmod ""` lines that are already there are
effectively useless because they set the variable whose name is the
contents of currdiffsubmod, rather than currdiffsubmod itself. I assume
it was a typo.)
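
For illustration only, the tracking described in the commit message can be
sketched in Python instead of gitk's Tcl. This is not the gitk code: the
function name classify_lines, the tag names, and the assumption that a
submodule diff begins with a line starting with "Submodule " are all
hypothetical. What it tries to show is that the flag has to be cleared
whenever a new file diff starts, otherwise a normal diff that follows a
submodule diff is still highlighted.

```python
def classify_lines(lines):
    """Yield (line, tag) pairs for a stream of diff output lines.

    Hypothetical sketch: "submodule" is only assigned while we are inside
    a submodule diff, mirroring the role of currdiffsubmod in the patch.
    """
    in_submodule = False
    for line in lines:
        if line.startswith("diff --git "):
            # Start of a new file: reset the state (the fix in the patch).
            in_submodule = False
            tag = "header"
        elif line.startswith("Submodule "):
            # Assumed marker for the start of a submodule diff.
            in_submodule = True
            tag = "header"
        elif in_submodule and line.startswith(("  >", "  <")):
            tag = "submodule"
        else:
            tag = "normal"
        yield line, tag


sample = [
    "Submodule sub 1234567..89abcde:",
    "  > add feature",
    "diff --git a/file b/file",
    "  > context line that merely starts with ' >'",
]
tags = [tag for _, tag in classify_lines(sample)]
```

Without the reset on "diff --git", the last line of the sample would be
tagged as a submodule line as well.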

-Roman



Thanks,
Stefan



Signed-off-by: Роман Донченко 
---
  gitk | 8 ++++----
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gitk b/gitk
index a14d7a1..6bb6dc6 100755
--- a/gitk
+++ b/gitk
@@ -8109,6 +8109,8 @@ proc parseblobdiffline {ids line} {
 }
 # start of a new file
 set diffinhdr 1
+   set currdiffsubmod ""
+
 $ctext insert end "\n"
 set curdiffstart [$ctext index "end - 1c"]
 lappend ctext_file_names ""
@@ -8191,12 +8193,10 @@ proc parseblobdiffline {ids line} {
 } else {
 $ctext insert end "$line\n" filesep
 }
-} elseif {![string compare -length 3 "  >" $line]} {
-   set $currdiffsubmod ""
+} elseif {$currdiffsubmod ne "" && ![string compare -length 3 "  >" $line]} {
 set line [encoding convertfrom $diffencoding $line]
 $ctext insert end "$line\n" dresult
-} elseif {![string compare -length 3 "  <" $line]} {
-   set $currdiffsubmod ""
+} elseif {$currdiffsubmod ne "" && ![string compare -length 3 "  <" $line]} {
 set line [encoding convertfrom $diffencoding $line]
 $ctext insert end "$line\n" d0
  } elseif {$diffinhdr} {
--
2.19.1.windows.1



[PATCH] gitk: don't highlight submodule diff lines outside submodule diffs

2018-11-06 Thread Роман Донченко
A line that starts with "  <" or "  >" is not necessarily a submodule
diff line. It might just be a context line in a normal diff, representing
a line starting with " <" or " >" respectively.

Use the currdiffsubmod variable to track whether we are currently
inside a submodule diff and only highlight these lines if we are.

Signed-off-by: Роман Донченко 
---
 gitk | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gitk b/gitk
index a14d7a1..6bb6dc6 100755
--- a/gitk
+++ b/gitk
@@ -8109,6 +8109,8 @@ proc parseblobdiffline {ids line} {
}
# start of a new file
set diffinhdr 1
+   set currdiffsubmod ""
+
$ctext insert end "\n"
set curdiffstart [$ctext index "end - 1c"]
lappend ctext_file_names ""
@@ -8191,12 +8193,10 @@ proc parseblobdiffline {ids line} {
} else {
$ctext insert end "$line\n" filesep
}
-} elseif {![string compare -length 3 "  >" $line]} {
-   set $currdiffsubmod ""
+} elseif {$currdiffsubmod ne "" && ![string compare -length 3 "  >" $line]} {
set line [encoding convertfrom $diffencoding $line]
$ctext insert end "$line\n" dresult
-} elseif {![string compare -length 3 "  <" $line]} {
-   set $currdiffsubmod ""
+} elseif {$currdiffsubmod ne "" && ![string compare -length 3 "  <" $line]} {
set line [encoding convertfrom $diffencoding $line]
$ctext insert end "$line\n" d0
 } elseif {$diffinhdr} {
-- 
2.19.1.windows.1



[PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec

2014-12-14 Thread Роман Донченко
More specifically:

* Add \ to the list of characters not allowed in a token (see RFC 2047
  errata).

* Share regexes between unquote_rfc2047 and is_rfc2047_quoted. Besides
  removing duplication, this also makes unquote_rfc2047 more stringent.

* Allow both q and Q to identify the encoding.

* Allow lowercase hexadecimal digits in the Q encoding.

And, more on the cosmetic side:

* Change the encoded-text regex to exclude rather than include characters,
  for clarity and consistency with token.

Signed-off-by: Роман Донченко d...@corrigendum.ru
Acked-by: Jeff King p...@peff.net
---
 git-send-email.perl | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 9949db0..d461ffb 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -145,6 +145,11 @@ my $have_mail_address = eval { require Mail::Address; 1 };
 my $smtp;
 my $auth;
 
+# Regexes for RFC 2047 productions.
+my $re_token = qr/[^][()<>@,;:\\"\/?.= \000-\037\177-\377]+/;
+my $re_encoded_text = qr/[^? \000-\037\177-\377]+/;
+my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/;
+
 # Variables we fill in automatically, or via prompting:
 my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh,
$initial_reply_to,$initial_subject,@files,
@@ -913,15 +918,20 @@ $time = time - scalar $#files;
 
 sub unquote_rfc2047 {
local ($_) = @_;
-   my $encoding;
-   s{=\?([^?]+)\?q\?(.*?)\?=}{
-   $encoding = $1;
-   my $e = $2;
-   $e =~ s/_/ /g;
-   $e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
-   $e;
+   my $charset;
+   s{$re_encoded_word}{
+   $charset = $1;
+   my $encoding = $2;
+   my $text = $3;
+   if ($encoding eq 'q' || $encoding eq 'Q') {
+   $text =~ s/_/ /g;
+   $text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
+   $text;
+   } else {
+   $&; # other encodings not supported yet
+   }
}eg;
-   return wantarray ? ($_, $encoding) : $_;
+   return wantarray ? ($_, $charset) : $_;
 }
 
 sub quote_rfc2047 {
@@ -934,10 +944,8 @@ sub quote_rfc2047 {
 
 sub is_rfc2047_quoted {
my $s = shift;
-   my $token = qr/[^][()<>@,;:"\/?.= \000-\037\177-\377]+/;
-   my $encoded_text = qr/[!->@-~]+/;
length($s) <= 75 &&
-   $s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
+   $s =~ m/^(?:"[[:ascii:]]*"|$re_encoded_word)$/o;
 }
 
 sub subject_needs_rfc2047_quoting {
-- 
2.1.1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly

2014-12-14 Thread Роман Донченко
The RFC says that they are to be concatenated after decoding (i.e. the
intervening whitespace is ignored).
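
A minimal Python sketch of this behaviour follows; it is an illustration,
not the Perl from the patch. The names ENCODED_WORD, SEP, RUN and
unquote_adjacent are hypothetical, and the single-word pattern is
deliberately simplified compared with the RFC 2047 productions. A run of
encoded words separated only by spaces or tabs is matched as a whole,
split, decoded word by word and joined with nothing in between.

```python
import re

# Simplified encoded-word pattern (assumption: not the full RFC grammar).
ENCODED_WORD = re.compile(rb"=\?([^?\s]+)\?([^?\s]+)\?([^?\s]+)\?=")
SEP = rb"[ \t]+"
# One encoded word followed by zero or more whitespace-separated ones.
RUN = re.compile(
    ENCODED_WORD.pattern + rb"(?:" + SEP + ENCODED_WORD.pattern + rb")*"
)


def unquote_adjacent(data: bytes) -> bytes:
    """Decode Q-encoded words, dropping whitespace between adjacent ones."""

    def decode_run(run):
        decoded = []
        for word in re.split(SEP, run.group(0)):
            match = ENCODED_WORD.fullmatch(word)
            encoding, text = match.group(2), match.group(3)
            if encoding in (b"q", b"Q"):
                text = text.replace(b"_", b" ")
                text = re.sub(
                    rb"=([0-9A-Fa-f]{2})",
                    lambda h: bytes([int(h.group(1), 16)]),
                    text,
                )
                decoded.append(text)
            else:
                decoded.append(word)  # other encodings left untouched
        return b"".join(decoded)

    return RUN.sub(decode_run, data)
```

Whitespace between an encoded word and ordinary text is kept; only the
whitespace between two encoded words disappears.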

Signed-off-by: Роман Донченко d...@corrigendum.ru
Acked-by: Jeff King p...@peff.net
---
 git-send-email.perl   | 26 --
 t/t9001-send-email.sh |  7 +++
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index d461ffb..7d5cc8a 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -919,17 +919,23 @@ $time = time - scalar $#files;
 sub unquote_rfc2047 {
local ($_) = @_;
my $charset;
-   s{$re_encoded_word}{
-   $charset = $1;
-   my $encoding = $2;
-   my $text = $3;
-   if ($encoding eq 'q' || $encoding eq 'Q') {
-   $text =~ s/_/ /g;
-   $text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
-   $text;
-   } else {
-   $; # other encodings not supported yet
+   my $sep = qr/[ \t]+/;
+   s{$re_encoded_word(?:$sep$re_encoded_word)*}{
+   my @words = split $sep, $&;
+   foreach (@words) {
+   m/$re_encoded_word/;
+   $charset = $1;
+   my $encoding = $2;
+   my $text = $3;
+   if ($encoding eq 'q' || $encoding eq 'Q') {
+   $_ = $text;
+   s/_/ /g;
+   s/=([0-9A-F]{2})/chr(hex($1))/egi;
+   } else {
+   # other encodings not supported yet
+   }
}
+   join '', @words;
}eg;
return wantarray ? ($_, $charset) : $_;
 }
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 19a3ced..fa965ff 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -240,6 +240,13 @@ test_expect_success $PREREQ 'non-ascii self name is suppressed' "
'non_ascii_self_suppressed'
 "
 
+# This name is long enough to force format-patch to split it into multiple
+# encoded-words, assuming it uses UTF-8 with the Q encoding.
+test_expect_success $PREREQ 'long non-ascii self name is suppressed' "
+   test_suppress_self_quoted 'Ƒüñníęř €. Nâṁé' 'odd_?=m...@example.com' \
+   'long_non_ascii_self_suppressed'
+"
+
 test_expect_success $PREREQ 'sanitized self name is suppressed' "
test_suppress_self_unquoted '\"A U. Thor\"' 'aut...@example.com' \
'self_name_sanitized_suppressed'
-- 
2.1.1



Re: [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly

2014-12-07 Thread Роман Донченко
Jeff King p...@peff.net wrote on Sun, 07 Dec 2014 12:18:59 +0300:



On Sat, Dec 06, 2014 at 10:36:23PM +0300, Роман Донченко wrote:


The RFC says that they are to be concatenated after decoding (i.e. the
intervening whitespace is ignored).


Thanks. Both patches look good to me, and I'd be happy to have them
applied as-is. I wrote a few comments below, but in all cases I think I
convinced myself that what you wrote is best.


I had the same concerns myself, and eventually convinced myself of the  
same. :-)



One final note on this bit of code: if there are multiple encoded words,
we grab the $charset from the final encoded word, and never report the
earlier charsets. Technically they do not all have to be the same
(rfc2047 even has an example where they are not). I think we can dismiss
this, though, as:

  1. It was like this before your patches (we might have seen multiple
 non-adjacent encoded words; you're just handling adjacent ones),
 and nobody has complained.

  2. Using two separate encodings in the same header is sufficiently
 ridiculous that I can live with us not handling it properly.


Yeah, that bugs me as well. But I think handling multiple encodings would  
require substantial reworking of the code, so I chickened out (with the  
same excuses :-)).
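
To make the limitation concrete, here is a small Python sketch (not part
of the patches, which only report the charset of the last encoded word).
The names WORD and decode_words are hypothetical, and the pattern is a
simplification that only recognises the Q encoding. It shows what keeping
the charset of each word separately could look like.

```python
import re

# Simplified pattern: charset, then "q" or "Q", then the encoded text.
WORD = re.compile(rb"=\?([^?\s]+)\?[qQ]\?([^?\s]+)\?=")


def decode_words(header: bytes):
    """Return one (charset, decoded text) pair per Q-encoded word."""
    pairs = []
    for match in WORD.finditer(header):
        charset = match.group(1).decode("ascii")
        raw = match.group(2).replace(b"_", b" ")
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda h: bytes([int(h.group(1), 16)]),
            raw,
        )
        # Each word is decoded with its own charset.
        pairs.append((charset, raw.decode(charset)))
    return pairs


mixed = decode_words(b"=?ISO-8859-1?Q?a=E9?= =?UTF-8?Q?=C3=A9b?=")
```

With a single reported charset, one of the two words in the example would
be interpreted with the wrong encoding.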


Roman.


Re: [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly

2014-12-07 Thread Роман Донченко
Philip Oakley philipoak...@iee.org wrote on Sun, 07 Dec 2014 20:48:05 +0300:



From: Роман Донченко d...@corrigendum.ru
Jeff King p...@peff.net wrote on Sun, 07 Dec 2014 12:18:59 +0300:



On Sat, Dec 06, 2014 at 10:36:23PM +0300, Роман Донченко wrote:
One final note on this bit of code: if there are multiple encoded words,
we grab the $charset from the final encoded word, and never report the
earlier charsets. Technically they do not all have to be the same
(rfc2047 even has an example where they are not). I think we can dismiss
this, though, as:

  1. It was like this before your patches (we might have seen multiple
 non-adjacent encoded words; you're just handling adjacent ones),
 and nobody has complained.

  2. Using two separate encodings in the same header is sufficiently
 ridiculous that I can live with us not handling it properly.


Yeah, that bugs me as well. But I think handling multiple encodings would
require substantial reworking of the code, so I chickened out (with the
same excuses :-)).


Would that be worth a terse comment in the documentation change part of
the patch?

"Multiple (RFC2047) encodings are not supported.",
or would that be bike shed noise?


I didn't change any documentation... and in either case, they weren't  
supported in the first place, so I don't think it's anything I need to  
mention.



[PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec

2014-12-06 Thread Роман Донченко
More specifically:

* Add \ to the list of characters not allowed in a token (see RFC 2047
  errata).

* Share regexes between unquote_rfc2047 and is_rfc2047_quoted. Besides
  removing duplication, this also makes unquote_rfc2047 more stringent.

* Allow both q and Q to identify the encoding.

* Allow lowercase hexadecimal digits in the Q encoding.

And, more on the cosmetic side:

* Change the encoded-text regex to exclude rather than include characters,
  for clarity and consistency with token.

Signed-off-by: Роман Донченко d...@corrigendum.ru
---
 git-send-email.perl | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 9949db0..d461ffb 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -145,6 +145,11 @@ my $have_mail_address = eval { require Mail::Address; 1 };
 my $smtp;
 my $auth;
 
+# Regexes for RFC 2047 productions.
+my $re_token = qr/[^][()<>@,;:\\"\/?.= \000-\037\177-\377]+/;
+my $re_encoded_text = qr/[^? \000-\037\177-\377]+/;
+my $re_encoded_word = qr/=\?($re_token)\?($re_token)\?($re_encoded_text)\?=/;
+
 # Variables we fill in automatically, or via prompting:
 my (@to,$no_to,@initial_to,@cc,$no_cc,@initial_cc,@bcclist,$no_bcc,@xh,
$initial_reply_to,$initial_subject,@files,
@@ -913,15 +918,20 @@ $time = time - scalar $#files;
 
 sub unquote_rfc2047 {
local ($_) = @_;
-   my $encoding;
-   s{=\?([^?]+)\?q\?(.*?)\?=}{
-   $encoding = $1;
-   my $e = $2;
-   $e =~ s/_/ /g;
-   $e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
-   $e;
+   my $charset;
+   s{$re_encoded_word}{
+   $charset = $1;
+   my $encoding = $2;
+   my $text = $3;
+   if ($encoding eq 'q' || $encoding eq 'Q') {
+   $text =~ s/_/ /g;
+   $text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
+   $text;
+   } else {
+   $&; # other encodings not supported yet
+   }
}eg;
-   return wantarray ? ($_, $encoding) : $_;
+   return wantarray ? ($_, $charset) : $_;
 }
 
 sub quote_rfc2047 {
@@ -934,10 +944,8 @@ sub quote_rfc2047 {
 
 sub is_rfc2047_quoted {
my $s = shift;
-   my $token = qr/[^][()<>@,;:"\/?.= \000-\037\177-\377]+/;
-   my $encoded_text = qr/[!->@-~]+/;
length($s) <= 75 &&
-   $s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
+   $s =~ m/^(?:"[[:ascii:]]*"|$re_encoded_word)$/o;
 }
 
 sub subject_needs_rfc2047_quoting {
-- 
2.1.1



[PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly

2014-12-06 Thread Роман Донченко
The RFC says that they are to be concatenated after decoding (i.e. the
intervening whitespace is ignored).
---
 git-send-email.perl   | 26 --
 t/t9001-send-email.sh |  7 +++
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index d461ffb..7d5cc8a 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -919,17 +919,23 @@ $time = time - scalar $#files;
 sub unquote_rfc2047 {
local ($_) = @_;
my $charset;
-   s{$re_encoded_word}{
-   $charset = $1;
-   my $encoding = $2;
-   my $text = $3;
-   if ($encoding eq 'q' || $encoding eq 'Q') {
-   $text =~ s/_/ /g;
-   $text =~ s/=([0-9A-F]{2})/chr(hex($1))/egi;
-   $text;
-   } else {
-   $; # other encodings not supported yet
+   my $sep = qr/[ \t]+/;
+   s{$re_encoded_word(?:$sep$re_encoded_word)*}{
+   my @words = split $sep, $&;
+   foreach (@words) {
+   m/$re_encoded_word/;
+   $charset = $1;
+   my $encoding = $2;
+   my $text = $3;
+   if ($encoding eq 'q' || $encoding eq 'Q') {
+   $_ = $text;
+   s/_/ /g;
+   s/=([0-9A-F]{2})/chr(hex($1))/egi;
+   } else {
+   # other encodings not supported yet
+   }
}
+   join '', @words;
}eg;
return wantarray ? ($_, $charset) : $_;
 }
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 19a3ced..fa965ff 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -240,6 +240,13 @@ test_expect_success $PREREQ 'non-ascii self name is suppressed' "
'non_ascii_self_suppressed'
 "
 
+# This name is long enough to force format-patch to split it into multiple
+# encoded-words, assuming it uses UTF-8 with the Q encoding.
+test_expect_success $PREREQ 'long non-ascii self name is suppressed' "
+   test_suppress_self_quoted 'Ƒüñníęř €. Nâṁé' 'odd_?=m...@example.com' \
+   'long_non_ascii_self_suppressed'
+"
+
 test_expect_success $PREREQ 'sanitized self name is suppressed' "
test_suppress_self_unquoted '\"A U. Thor\"' 'aut...@example.com' \
'self_name_sanitized_suppressed'
-- 
2.1.1



Re: [PATCH] send-email: handle adjacent RFC 2047-encoded words properly

2014-11-24 Thread Роман Донченко
Junio C Hamano gits...@pobox.com wrote on Mon, 24 Nov 2014 10:27:51 +0300:


On Sun, Nov 23, 2014 at 3:50 PM, Роман Донченко d...@corrigendum.ru wrote:

The RFC says that they are to be concatenated after decoding (i.e. the
intervening whitespace is ignored).

I change the sender's name to an all-Cyrillic string in the tests so that
its encoded form goes over the 76 characters in a line limit, forcing
format-patch to split it into multiple encoded words.

Since I have to modify the regular expression for an encoded word anyway,
I take the opportunity to bring it closer to the spec, most notably
disallowing embedded spaces and making it case-insensitive (thus allowing
the encoding to be specified as both q and Q).

Signed-off-by: Роман Донченко d...@corrigendum.ru


This sounds like a worthy thing to do in general.

I wonder if the C implementation we have for mailinfo needs similar
update, though. I vaguely recall that we have case-insensitive start for
q/b segments, but do not remember the details offhand.


That's what git am uses, right? I think that already works correctly (or  
at least doesn't have the bug this patch fixes). I didn't do extensive  
testing or look at the code, though.




Was the change to the test to use Cyrillic really necessary, or did it
suffice if you simply extended the existing "Funny Name" spelled with
strange accents, but you substituted the whole string anyway?

Until I found out what the new string says by running web-based
translation on it, I felt somewhat uneasy. As I do not read
Cyrillic/Russian, we may have been adding some profanity without
knowing. It turns out that the string just says Cyrillic Name, so I am
not against using the new string, but it simply looked odd to replace the
string whole-sale when you merely need a longer string. It made it look
as if a bug was specific to Cyrillic when it wasn't.


Ah, if only I had thought of including profanity beforehand. ;-)

Seriously though, I just needed to hit the 76 character limit, and  
switching the keyboard layout is a lot easier than copypasting Latin  
letters with diacritics (plus I had trouble coming up with a long enough  
extension of Funny Name...). I can see how that's problematic, though;  
I'll change it.
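
A back-of-the-envelope check of the length argument, as a Python sketch.
The function name q_encoded_length is hypothetical, and the encoding rules
are simplifying assumptions (space becomes "_", printable ASCII other than
"=", "?" and "_" is kept, every other byte becomes "=XX"); format-patch
may escape more characters than this. The 75-character bound is the one
used for a single encoded word in is_rfc2047_quoted.

```python
def q_encoded_length(name: str, charset: str = "UTF-8") -> int:
    """Length of name as ONE Q-encoded word, under simplified rules."""
    body = 0
    for byte in name.encode(charset):
        if byte == 0x20:
            body += 1  # space is written as "_"
        elif 0x21 <= byte <= 0x7E and byte not in b"=?_":
            body += 1  # printable ASCII kept literally
        else:
            body += 3  # "=XX" escape
    return len(f"=?{charset}?q?") + body + len("?=")


short_name = q_encoded_length("Füñný Nâmé")
long_name = q_encoded_length("Кириллическое Имя")
```

Under these assumptions the accented Latin name still fits into a single
encoded word, while the all-Cyrillic name does not, because every Cyrillic
letter takes two UTF-8 bytes and therefore six characters when escaped.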



As you may notice by reading "git log --no-merges" from recent history,
we tend not to say "I did X, I did Y". If the tone of the above message
were more similar to them, it may have been easier to read.


Technically, I said "I do", not "I did"... but sure, point taken.

Roman.


Re: [PATCH] send-email: handle adjacent RFC 2047-encoded words properly

2014-11-24 Thread Роман Донченко
Jeff King p...@peff.net wrote on Mon, 24 Nov 2014 18:36:09 +0300:



On Mon, Nov 24, 2014 at 02:50:04AM +0300, Роман Донченко wrote:


The RFC says that they are to be concatenated after decoding (i.e. the
intervening whitespace is ignored).

I change the sender's name to an all-Cyrillic string in the tests so that
its encoded form goes over the 76 characters in a line limit, forcing
format-patch to split it into multiple encoded words.

Since I have to modify the regular expression for an encoded word anyway,
I take the opportunity to bring it closer to the spec, most notably
disallowing embedded spaces and making it case-insensitive (thus allowing
the encoding to be specified as both q and Q).


The overall goal makes sense to me. Thanks for working on this. I have a
few questions/comments, though.


 sub unquote_rfc2047 {
local ($_) = @_;
+
+   my $et = qr/[!->@-~]+/; # encoded-text from RFC 2047
+   my $sep = qr/[ \t]+/;
+   my $encoded_word = qr/=\?($et)\?q\?($et)\?=/i;


The first $et in $encoded_word is actually the charset, which is defined
by RFC 2047 as:

 charset = token; see section 3

 token = 1*<Any CHAR except SPACE, CTLs, and especials>

 especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
             <"> / "/" / "[" / "]" / "?" / "." / "="

Your regex is a little more liberal. I doubt that it is a big deal in
practice (actually, in practice, I suspect [a-zA-Z0-9-] would be fine).
But if we are tightening things up in general, it may make sense to do
so here (and I notice that is_rfc2047_quoted does a more thorough $token
definition, and it probably makes sense for the two functions to be
consistent).


Yeah, I did realize that token is more restrictive than encoded-text, but  
I didn't want to stray too far from the subject line of the patch. What  
I'll probably do is split the patch into two, one for regex tweaking and  
one for multiple-word handling. And yeah, I'll try to make the two  
functions use the same regexes.




For your definition of encoded-text, RFC 2047 says:

 encoded-text = 1*<Any printable ASCII character other than "?"
  or SPACE>

It looks like you pulled the definition of $et from is_rfc2047_quoted,
but I am not clear on where that original came from (it is from a3a8262,
but that commit message does not explain the regex).


No, it's actually an independent discovery. :-) I don't think it needs  
explanation, though - it's just a character class with two ranges covering  
every printable character but the question mark.
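
The claim about the two ranges can be checked mechanically. The following
Python sketch (hypothetical names ET and encoded_text_chars) assumes the
class is written as [!->@-~], i.e. "!" through ">" and "@" through "~",
and compares what it accepts against printable ASCII without SPACE and
"?".

```python
import re

# The two-range character class: "!" through ">" and "@" through "~".
ET = re.compile(r"[!->@-~]")


def encoded_text_chars():
    """Return the set of ASCII characters accepted by the class."""
    return {chr(c) for c in range(128) if ET.fullmatch(chr(c))}


# Printable ASCII excluding SPACE (0x20) and "?".
expected = {chr(c) for c in range(0x21, 0x7F)} - {"?"}
same = encoded_text_chars() == expected
```

The only printable character between the two ranges is "?" (0x3F), which
is why two ranges are enough.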



Also, I note that we handle 'q'-style encodings here, but not 'b'. I
wonder if it is worth adding that in while we are in the area (it is not
a big deal if you always send-email git-generated patches, as we never
generate it).


I could add b decoding, but since format-patch never generates b  
encodings, testing would be a problem. And I'd rather not do it without  
any tests.





+   s{$encoded_word(?:$sep$encoded_word)+}{


If I am reading this right, it requires at least two $encoded_words.
Should this + be a *?


I hang my head in shame. Looks like I'll have to add more tests...
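
The difference between the two quantifiers can be seen in a short Python
sketch (hypothetical names; the word pattern is a simplification of the
one in the patch). With "+" the pattern needs at least one separator
followed by a second encoded word, so a lone encoded word is never
matched and would be left undecoded.

```python
import re

# Simplified single encoded word and separator (assumptions).
WORD = rb"=\?[^?\s]+\?[qQ]\?[^?\s]+\?="
SEP = rb"[ \t]+"

# "+" requires two or more words; "*" accepts one or more.
AT_LEAST_TWO = re.compile(WORD + rb"(?:" + SEP + WORD + rb")+")
ONE_OR_MORE = re.compile(WORD + rb"(?:" + SEP + WORD + rb")*")

single = b"=?UTF-8?q?abc?="
pair = b"=?UTF-8?q?abc?= =?UTF-8?q?def?="

plus_on_single = AT_LEAST_TWO.search(single)
star_on_single = ONE_OR_MORE.search(single)
```

Both patterns match the two-word input in full; only the "*" variant also
matches the single word.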




+   my @words = split $sep, $&;
+   foreach (@words) {
+   m/$encoded_word/;
+   $encoding = $1;
+   $_ = $2;
+   s/_/ /g;
+   s/=([0-9A-F]{2})/chr(hex($1))/eg;


In the spirit of your earlier change, should this final regex be
case-insensitive? RFC 2047 says only Upper case should be used for
hexadecimal digits A through F. but that does not seem like a MUST
to me.


Sounds reasonable.

Roman.


[PATCH] send-email: handle adjacent RFC 2047-encoded words properly

2014-11-23 Thread Роман Донченко
The RFC says that they are to be concatenated after decoding (i.e. the
intervening whitespace is ignored).

I change the sender's name to an all-Cyrillic string in the tests so that
its encoded form goes over the 76 characters in a line limit, forcing
format-patch to split it into multiple encoded words.

Since I have to modify the regular expression for an encoded word anyway,
I take the opportunity to bring it closer to the spec, most notably
disallowing embedded spaces and making it case-insensitive (thus allowing
the encoding to be specified as both q and Q).

Signed-off-by: Роман Донченко d...@corrigendum.ru
---
 git-send-email.perl   | 21 +++--
 t/t9001-send-email.sh | 18 +-
 2 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 9949db0..4bb9f6f 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -913,13 +913,22 @@ $time = time - scalar $#files;
 
 sub unquote_rfc2047 {
local ($_) = @_;
+
+   my $et = qr/[!->@-~]+/; # encoded-text from RFC 2047
+   my $sep = qr/[ \t]+/;
+   my $encoded_word = qr/=\?($et)\?q\?($et)\?=/i;
+
my $encoding;
-   s{=\?([^?]+)\?q\?(.*?)\?=}{
-   $encoding = $1;
-   my $e = $2;
-   $e =~ s/_/ /g;
-   $e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
-   $e;
+   s{$encoded_word(?:$sep$encoded_word)+}{
+   my @words = split $sep, $&;
+   foreach (@words) {
+   m/$encoded_word/;
+   $encoding = $1;
+   $_ = $2;
+   s/_/ /g;
+   s/=([0-9A-F]{2})/chr(hex($1))/eg;
+   }
+   join '', @words;
}eg;
return wantarray ? ($_, $encoding) : $_;
 }
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 19a3ced..318b870 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -236,7 +236,7 @@ test_expect_success $PREREQ 'self name with dot is suppressed' "
 "
 
 test_expect_success $PREREQ 'non-ascii self name is suppressed' "
-   test_suppress_self_quoted 'Füñný Nâmé' 'odd_?=m...@example.com' \
+   test_suppress_self_quoted 'Кириллическое Имя' 'odd_?=m...@example.com' \
'non_ascii_self_suppressed'
 "
 
@@ -946,25 +946,25 @@ test_expect_success $PREREQ 'utf8 author is correctly passed on' '
clean_fake_sendmail &&
test_commit weird_author &&
test_when_finished "git reset --hard HEAD^" &&
-   git commit --amend --author "Füñný Nâmé <odd_?=m...@example.com>" &&
-   git format-patch --stdout -1 >funny_name.patch &&
+   git commit --amend --author "Кириллическое Имя <odd_?=m...@example.com>" &&
+   git format-patch --stdout -1 >nonascii_name.patch &&
git send-email --from="Example <nob...@example.com>" \
  --to=nob...@example.com \
  --smtp-server="$(pwd)/fake.sendmail" \
- funny_name.patch &&
-   grep "^From: Füñný Nâmé <odd_?=m...@example.com>" msgtxt1
+ nonascii_name.patch &&
+   grep "^From: Кириллическое Имя <odd_?=m...@example.com>" msgtxt1
 '
 
 test_expect_success $PREREQ 'utf8 sender is not duplicated' '
clean_fake_sendmail &&
test_commit weird_sender &&
test_when_finished "git reset --hard HEAD^" &&
-   git commit --amend --author "Füñný Nâmé <odd_?=m...@example.com>" &&
-   git format-patch --stdout -1 >funny_name.patch &&
-   git send-email --from="Füñný Nâmé <odd_?=m...@example.com>" \
+   git commit --amend --author "Кириллическое Имя <odd_?=m...@example.com>" &&
+   git format-patch --stdout -1 >nonascii_name.patch &&
+   git send-email --from="Кириллическое Имя <odd_?=m...@example.com>" \
  --to=nob...@example.com \
  --smtp-server="$(pwd)/fake.sendmail" \
- funny_name.patch &&
+ nonascii_name.patch &&
grep "^From: " msgtxt1 >msgfrom &&
test_line_count = 1 msgfrom
 '
-- 
2.1.1
