Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects

2012-10-25 Thread Jeff King
On Wed, Oct 24, 2012 at 11:08:26PM +0200, Krzysztof Mazur wrote:

 ok, I'm sending a version that just adds quote_subject() without
 changing any logic, so now we still have in first case:
 
  /[^[:ascii:]]/
 
 and in the latter case:
  
  !is_rfc2047_quoted($subject)  /^[:ascii:]]/
 
 
 In the next patch I will just add matching for =? in 
 subject_needs_rfc2047_quoting() and we will have:
 
/=?/ || /[^[:ascii:]]/
 
 and in the latter case:
  
!is_rfc2047_quoted($subject)  (/=\?/ || /^[:ascii:]]/)
 
 This will also add quoting for any rfc2047 quoted subject or any
 other rfc2047-like subject, as you suggested.

Thanks, the two-patch series you outline makes a lot of sense to me.

 Krzysiek
 -- 
 From a70c5385f9b4da69a8ce00a1448f87f63bbd500d Mon Sep 17 00:00:00 2001
 From: Krzysztof Mazur krzys...@podlesie.net
 Date: Wed, 24 Oct 2012 22:46:00 +0200
 Subject: [PATCH] git-send-email: introduce quote_subject()

When sending a patch following some cover letter material, please cut
out any non-essential headers and use the scissors symbol, like this:

  -- 8 --
  Subject: [PATCH] this subject overrides the whole email's subject

  the regular body and diff go here...

That format is understood by git am and means I do not have to
manually munge it, which saves a little work.

 +sub quote_subject {
 + local $subject = shift;
 + my $encoding = shift || 'UTF-8';
 +
 + if (subject_needs_rfc2047_quoting($subject)) {
 + return quote_rfc2047($subject, $encoding);
 + }
 + return $subject;
 +}

There is some funny whitespace here (space followed by tab).

 - if ($broken_encoding{$t}  !is_rfc2047_quoted($subject) 
 - ($subject =~ /[^[:ascii:]]/)) {
 - $subject = quote_rfc2047($subject, $auto_8bit_encoding);
 + if ($broken_encoding{$t}  !is_rfc2047_quoted($subject)) {
 + $subject = quote_subject($subject, $auto_8bit_encoding);
   }

Hmm. What is this patch on top of? It looks like it is on top of your
original patch, but when I tried it on top of that, it does not apply
either, and the index lines in the patch do not mention a sha1 that I do
not have.

Do you mind re-rolling a final 2-patch series with:

  1. Your original patch and this one squashed together, with an
 appropriate commit message.

  2. The second quote when we see '=?' patch.

Thanks.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects

2012-10-25 Thread Jeff King
On Thu, Oct 25, 2012 at 05:01:49AM -0400, Jeff King wrote:

  -   if ($broken_encoding{$t}  !is_rfc2047_quoted($subject) 
  -   ($subject =~ /[^[:ascii:]]/)) {
  -   $subject = quote_rfc2047($subject, $auto_8bit_encoding);
  +   if ($broken_encoding{$t}  !is_rfc2047_quoted($subject)) {
  +   $subject = quote_subject($subject, $auto_8bit_encoding);
  }
 
 Hmm. What is this patch on top of? It looks like it is on top of your
 original patch, but when I tried it on top of that, it does not apply
 either, and the index lines in the patch do not mention a sha1 that I do
 not have.

 Do you mind re-rolling a final 2-patch series with:

Ah, never mind. I missed your earlier use compose-encoding for
Subject. I've queued it and all of the follow-ons onto the
km/send-email-compose-encoding topic.

Thanks.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects

2012-10-25 Thread Krzysztof Mazur
On Thu, Oct 25, 2012 at 05:01:49AM -0400, Jeff King wrote:
 
 Hmm. What is this patch on top of? It looks like it is on top of your
 original patch, but when I tried it on top of that, it does not apply
 either, and the index lines in the patch do not mention a sha1 that I do
 not have.

Sorry, it's against km/send-email-compose-encoding (or current next)
+ git-send-email: use compose-encoding for Subject.

 
 Do you mind re-rolling a final 2-patch series with:
 
   1. Your original patch and this one squashed together, with an
  appropriate commit message.

I think that it's better to do refactoring and fix for ASCII in separate
patches. Maybe we should reverse order of first two patches. This first
will do refactoring and the second will just replace quote_rfc2047()
with quote_subject() in broken encoding case and add test
for this problem.

 
   2. The second quote when we see '=?' patch.
 
 Thanks.
 
 -Peff

ok, I will resend the final series.

I need also to fix git-send-email: use compose-encoding for Subject
patch. Now it's depends both on this series and
km/send-email-compose-encoding branch.

Krzysiek
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects

2012-10-25 Thread Jeff King
On Thu, Oct 25, 2012 at 01:19:19PM +0200, Krzysztof Mazur wrote:

 On Thu, Oct 25, 2012 at 06:08:54AM -0400, Jeff King wrote:
  
  Ah, never mind. I missed your earlier use compose-encoding for
  Subject. I've queued it and all of the follow-ons onto the
  km/send-email-compose-encoding topic.
  
 
 thanks, what about the problem with whitespaces in quote_subject patch?

I fixed the whitespace problems, and just applying your patches in
sequence on top of send-email-compose-encoding actually looks sensible
(after looking at it, I came to the same conclusion as you that the two
patches should be kept separate). I'll push out the results of tonight's
work in a few minutes, if you want to eyeball what's in pu.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects

2012-10-25 Thread Krzysztof Mazur
On Thu, Oct 25, 2012 at 06:08:54AM -0400, Jeff King wrote:
 
 Ah, never mind. I missed your earlier use compose-encoding for
 Subject. I've queued it and all of the follow-ons onto the
 km/send-email-compose-encoding topic.
 

thanks, what about the problem with whitespaces in quote_subject patch?

Krzysiek
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects

2012-10-24 Thread Krzysztof Mazur
The git-send-email always use RFC2047 subject quoting for files
with broken encoding - non-ASCII files without Content-Transfer-Encoding,
even for ASCII subjects. Now for ASCII subjects the RFC2047 quoting will be
skipped.

Signed-off-by: Krzysztof Mazur krzys...@podlesie.net
---
 git-send-email.perl   |  3 ++-
 t/t9001-send-email.sh | 17 +
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index adcb4e3..efeae4c 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1327,7 +1327,8 @@ foreach my $t (@files) {
$body_encoding = $auto_8bit_encoding;
}
 
-   if ($broken_encoding{$t}  !is_rfc2047_quoted($subject)) {
+   if ($broken_encoding{$t}  !is_rfc2047_quoted($subject) 
+   ($subject =~ /[^[:ascii:]]/)) {
$subject = quote_rfc2047($subject, $auto_8bit_encoding);
}
 
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 89fceda..6c6af7d 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -1143,6 +1143,23 @@ EOF
 '
 
 test_expect_success $PREREQ 'setup expect' '
+cat expected EOF
+Subject: subject goes here
+EOF
+'
+
+test_expect_success $PREREQ 'ASCII subject is not RFC2047 quoted' '
+   clean_fake_sendmail 
+   echo bogus |
+   git send-email --from=aut...@example.com --to=nob...@example.com \
+   --smtp-server=$(pwd)/fake.sendmail \
+   --8bit-encoding=UTF-8 \
+   email-using-8bit stdout 
+   grep Subject msgtxt1 actual 
+   test_cmp expected actual
+'
+
+test_expect_success $PREREQ 'setup expect' '
 cat content-type-decl EOF
 MIME-Version: 1.0
 Content-Type: text/plain; charset=UTF-8
-- 
1.8.0.3.gf4c35fc

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects

2012-10-24 Thread Jeff King
On Wed, Oct 24, 2012 at 10:03:35AM +0200, Krzysztof Mazur wrote:

 The git-send-email always use RFC2047 subject quoting for files
 with broken encoding - non-ASCII files without Content-Transfer-Encoding,
 even for ASCII subjects. Now for ASCII subjects the RFC2047 quoting will be
 skipped.
 [...]
 - if ($broken_encoding{$t}  !is_rfc2047_quoted($subject)) {
 + if ($broken_encoding{$t}  !is_rfc2047_quoted($subject) 
 + ($subject =~ /[^[:ascii:]]/)) {

Is that test sufficient? We would also need to encode if it has rfc2047
specials, no?

It looks like we use the same regex elsewhere. Maybe this would be a
good chance to abstract out a needs_rfc2047_quoting while we are in the
area?

Other than that, I did not see anything wrong with the patch.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects

2012-10-24 Thread Krzysztof Mazur
On Wed, Oct 24, 2012 at 04:46:36AM -0400, Jeff King wrote:
 On Wed, Oct 24, 2012 at 10:03:35AM +0200, Krzysztof Mazur wrote:
 
  The git-send-email always use RFC2047 subject quoting for files
  with broken encoding - non-ASCII files without Content-Transfer-Encoding,
  even for ASCII subjects. Now for ASCII subjects the RFC2047 quoting will be
  skipped.
  [...]
  -   if ($broken_encoding{$t}  !is_rfc2047_quoted($subject)) {
  +   if ($broken_encoding{$t}  !is_rfc2047_quoted($subject) 
  +   ($subject =~ /[^[:ascii:]]/)) {
 
 Is that test sufficient? We would also need to encode if it has rfc2047
 specials, no?

For Subject this should be sufficient. According to RFC822 after
Subject: we have text token,

--- from RFC822 ---
 /  Subject   :  *text
--- from RFC822 ---

and text is defined as:

--- from RFC822 ---
 text=  any CHAR, including bare; = atoms, specials,
 CR  bare LF, but NOT   ;  comments and
 including CRLF ;  quoted-strings are
 ;  NOT recognized.
--- from RFC822 ---

so only CRLF is not allowed in Subject.


So the problem only exists for broken RFC2047-like texts, but I think
it's ok to just pass such subjects, in most cases the Subject comes
from already formatted patch file. I think that we just want to fix Subjects
without specified encoding here.


In most cases, when git-send-email is used for patches generated
by git format-patch we just don't want to corrupt Subject. The
git format-patch generates broken patches when commit message
uses only ASCII characters and patch contains some non-ASCII characters.
In this case original git-send-email, without this patch, adds RFC2047
quotation for pure ASCII Subject.

 
 It looks like we use the same regex elsewhere. Maybe this would be a
 good chance to abstract out a needs_rfc2047_quoting while we are in the
 area?

It's a good idea, however rules are different for Subject and addresses
(sanitize_address).

I think we can go even further, we can just add quote_subject(),
which performs this test and calls quote_rfc2047() if necessary.
I'm sending bellow patch that does that.

Krzysiek
-- 
From a1e6eef831275485ec1555d94ff0d9aac852dd12 Mon Sep 17 00:00:00 2001
From: Krzysztof Mazur krzys...@podlesie.net
Date: Wed, 24 Oct 2012 19:08:57 +0200
Subject: [PATCH] git-send-email: introduce quote_subject()

The quote_rfc2047() always adds RFC2047 quoting and to avoid quoting ASCII
subjects, before calling quote_rfc2047() subject must be tested for non-ASCII
characters. To avoid this new quote_subject() function is introduced.
The quote_subject() performs this test and calls quote_rfc2047() only if
necessary.

Signed-off-by: Krzysztof Mazur krzys...@podlesie.net
---
 git-send-email.perl | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index efeae4c..e9aec8d 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -657,9 +657,7 @@ EOT
$initial_subject = $1;
my $subject = $initial_subject;
$_ = Subject:  .
-   ($subject =~ /[^[:ascii:]]/ ?
-quote_rfc2047($subject, $compose_encoding) :
-$subject) .
+   quote_subject($subject, $compose_encoding) .
\n;
} elsif (/^In-Reply-To:\s*(.+)\s*$/i) {
$initial_reply_to = $1;
@@ -907,6 +905,22 @@ sub is_rfc2047_quoted {
$s =~ m/^(?:[[:ascii:]]*|=\?$token\?$token\?$encoded_text\?=)$/o;
 }
 
+sub subject_needs_rfc2047_quoting {
+   my $s = shift;
+
+   return !is_rfc2047_quoted($s)  ($s =~ /[^[:ascii:]]/);
+}
+
+sub quote_subject {
+   local $subject = shift;
+   my $encoding = shift || 'UTF-8';
+
+   if (subject_needs_rfc2047_quoting($subject)) {
+   return quote_rfc2047($subject, $encoding);
+   }
+   return $subject;
+}
+
 # use the simplest quoting being able to handle the recipient
 sub sanitize_address {
my ($recipient) = @_;
@@ -1327,9 +1341,8 @@ foreach my $t (@files) {
$body_encoding = $auto_8bit_encoding;
}
 
-   if ($broken_encoding{$t}  !is_rfc2047_quoted($subject) 
-   ($subject =~ /[^[:ascii:]]/)) {
-   $subject = quote_rfc2047($subject, $auto_8bit_encoding);
+   if ($broken_encoding{$t}) {
+   $subject = quote_subject($subject, $auto_8bit_encoding);
}
 
if (defined $author and $author ne $sender) {
-- 
1.8.0.3.gf4c35fc

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects

2012-10-24 Thread Jeff King
On Wed, Oct 24, 2012 at 07:10:36PM +0200, Krzysztof Mazur wrote:

   - if ($broken_encoding{$t}  !is_rfc2047_quoted($subject)) {
   + if ($broken_encoding{$t}  !is_rfc2047_quoted($subject) 
   + ($subject =~ /[^[:ascii:]]/)) {
  
  Is that test sufficient? We would also need to encode if it has rfc2047
  specials, no?
 
 For Subject this should be sufficient. According to RFC822 after
 Subject: we have text token,
 [...]
 So the problem only exists for broken RFC2047-like texts, but I think
 it's ok to just pass such subjects, in most cases the Subject comes
 from already formatted patch file. I think that we just want to fix Subjects
 without specified encoding here.

Right, but I was specifically worried about raw =?, which is only an
issue due to rfc2047 itself.

However, reading the patch again, we are already checking for that with
is_rfc2047_quoted. It might miss the case where we have =? but not the
rest of a valid encoded word, but any compliant parser should recognize
that and leave it be.

So I think your original patch is actually correct.

 I think we can go even further, we can just add quote_subject(),
 which performs this test and calls quote_rfc2047() if necessary.
 I'm sending bellow patch that does that.

Yeah, it would still be nice to keep the logic in one place.

 diff --git a/git-send-email.perl b/git-send-email.perl
 index efeae4c..e9aec8d 100755
 --- a/git-send-email.perl
 +++ b/git-send-email.perl
 @@ -657,9 +657,7 @@ EOT
   $initial_subject = $1;
   my $subject = $initial_subject;
   $_ = Subject:  .
 - ($subject =~ /[^[:ascii:]]/ ?
 -  quote_rfc2047($subject, $compose_encoding) :
 -  $subject) .
 + quote_subject($subject, $compose_encoding) .

Hrm. Isn't this one technically a regression if the $subject contains
encoded words? IOW, in this case we feed quote_subject a known-raw
header; any rfc2047 in it would want to be encoded to be preserved.

But in this case:

 @@ -1327,9 +1341,8 @@ foreach my $t (@files) {
   $body_encoding = $auto_8bit_encoding;
   }
  
 - if ($broken_encoding{$t}  !is_rfc2047_quoted($subject) 
 - ($subject =~ /[^[:ascii:]]/)) {
 - $subject = quote_rfc2047($subject, $auto_8bit_encoding);
 + if ($broken_encoding{$t}) {
 + $subject = quote_subject($subject, $auto_8bit_encoding);
   }

We have a possibly already-encoded header, and we would want to avoid
double-encoding it.

In the first case, the wants quoting logic should be:

  is_rfc2047_quoted($subject) || /[^[:ascii:]]/

and in the latter case it would be:

  !is_rfc2047_quoted($subject)  /^[:ascii:]]/

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects

2012-10-24 Thread Krzysztof Mazur
On Wed, Oct 24, 2012 at 03:25:30PM -0400, Jeff King wrote:
 Right, but I was specifically worried about raw =?, which is only an
 issue due to rfc2047 itself.
 
 However, reading the patch again, we are already checking for that with
 is_rfc2047_quoted. It might miss the case where we have =? but not the
 rest of a valid encoded word, but any compliant parser should recognize
 that and leave it be.
 
 So I think your original patch is actually correct.
 
 [...]
 We have a possibly already-encoded header, and we would want to avoid
 double-encoding it.
 
 In the first case, the wants quoting logic should be:
 
   is_rfc2047_quoted($subject) || /[^[:ascii:]]/
 
 and in the latter case it would be:
 
   !is_rfc2047_quoted($subject)  /^[:ascii:]]/
 

ok, I'm sending a version that just adds quote_subject() without
changing any logic, so now we still have in first case:

 /[^[:ascii:]]/

and in the latter case:
 
 !is_rfc2047_quoted($subject)  /^[:ascii:]]/


In the next patch I will just add matching for =? in 
subject_needs_rfc2047_quoting() and we will have:

   /=?/ || /[^[:ascii:]]/

and in the latter case:
 
   !is_rfc2047_quoted($subject)  (/=\?/ || /^[:ascii:]]/)

This will also add quoting for any rfc2047 quoted subject or any
other rfc2047-like subject, as you suggested.

Krzysiek
-- 
From a70c5385f9b4da69a8ce00a1448f87f63bbd500d Mon Sep 17 00:00:00 2001
From: Krzysztof Mazur krzys...@podlesie.net
Date: Wed, 24 Oct 2012 22:46:00 +0200
Subject: [PATCH] git-send-email: introduce quote_subject()

The quote_rfc2047() always adds RFC2047 quoting and to avoid quoting ASCII
subjects, before calling quote_rfc2047() subject must be tested for non-ASCII
characters. To avoid this new quote_subject() function is introduced.
The quote_subject() performs this test and calls quote_rfc2047() only if
necessary.

Signed-off-by: Krzysztof Mazur krzys...@podlesie.net
---
 git-send-email.perl | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index efeae4c..eb1b876 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -657,9 +657,7 @@ EOT
$initial_subject = $1;
my $subject = $initial_subject;
$_ = Subject:  .
-   ($subject =~ /[^[:ascii:]]/ ?
-quote_rfc2047($subject, $compose_encoding) :
-$subject) .
+   quote_subject($subject, $compose_encoding) .
\n;
} elsif (/^In-Reply-To:\s*(.+)\s*$/i) {
$initial_reply_to = $1;
@@ -907,6 +905,22 @@ sub is_rfc2047_quoted {
$s =~ m/^(?:[[:ascii:]]*|=\?$token\?$token\?$encoded_text\?=)$/o;
 }
 
+sub subject_needs_rfc2047_quoting {
+   my $s = shift;
+
+   return ($s =~ /[^[:ascii:]]/);
+}
+
+sub quote_subject {
+   local $subject = shift;
+   my $encoding = shift || 'UTF-8';
+
+   if (subject_needs_rfc2047_quoting($subject)) {
+   return quote_rfc2047($subject, $encoding);
+   }
+   return $subject;
+}
+
 # use the simplest quoting being able to handle the recipient
 sub sanitize_address {
my ($recipient) = @_;
@@ -1327,9 +1341,8 @@ foreach my $t (@files) {
$body_encoding = $auto_8bit_encoding;
}
 
-   if ($broken_encoding{$t}  !is_rfc2047_quoted($subject) 
-   ($subject =~ /[^[:ascii:]]/)) {
-   $subject = quote_rfc2047($subject, $auto_8bit_encoding);
+   if ($broken_encoding{$t}  !is_rfc2047_quoted($subject)) {
+   $subject = quote_subject($subject, $auto_8bit_encoding);
}
 
if (defined $author and $author ne $sender) {
-- 
1.8.0.4.ge8ddce6
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html