Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects
On Wed, Oct 24, 2012 at 11:08:26PM +0200, Krzysztof Mazur wrote: ok, I'm sending a version that just adds quote_subject() without changing any logic, so now we still have in first case: /[^[:ascii:]]/ and in the latter case: !is_rfc2047_quoted($subject) /^[:ascii:]]/ In the next patch I will just add matching for =? in subject_needs_rfc2047_quoting() and we will have: /=?/ || /[^[:ascii:]]/ and in the latter case: !is_rfc2047_quoted($subject) (/=\?/ || /^[:ascii:]]/) This will also add quoting for any rfc2047 quoted subject or any other rfc2047-like subject, as you suggested. Thanks, the two-patch series you outline makes a lot of sense to me. Krzysiek -- From a70c5385f9b4da69a8ce00a1448f87f63bbd500d Mon Sep 17 00:00:00 2001 From: Krzysztof Mazur krzys...@podlesie.net Date: Wed, 24 Oct 2012 22:46:00 +0200 Subject: [PATCH] git-send-email: introduce quote_subject() When sending a patch following some cover letter material, please cut out any non-essential headers and use the scissors symbol, like this: -- 8 -- Subject: [PATCH] this subject overrides the whole email's subject the regular body and diff go here... That format is understood by git am and means I do not have to manually munge it, which saves a little work. +sub quote_subject { + local $subject = shift; + my $encoding = shift || 'UTF-8'; + + if (subject_needs_rfc2047_quoting($subject)) { + return quote_rfc2047($subject, $encoding); + } + return $subject; +} There is some funny whitespace here (space followed by tab). - if ($broken_encoding{$t} !is_rfc2047_quoted($subject) - ($subject =~ /[^[:ascii:]]/)) { - $subject = quote_rfc2047($subject, $auto_8bit_encoding); + if ($broken_encoding{$t} !is_rfc2047_quoted($subject)) { + $subject = quote_subject($subject, $auto_8bit_encoding); } Hmm. What is this patch on top of? It looks like it is on top of your original patch, but when I tried it on top of that, it does not apply either, and the index lines in the patch do not mention a sha1 that I do not have. Do you mind re-rolling a final 2-patch series with: 1. Your original patch and this one squashed together, with an appropriate commit message. 2. The second quote when we see '=?' patch. Thanks. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects
On Thu, Oct 25, 2012 at 05:01:49AM -0400, Jeff King wrote: - if ($broken_encoding{$t} !is_rfc2047_quoted($subject) - ($subject =~ /[^[:ascii:]]/)) { - $subject = quote_rfc2047($subject, $auto_8bit_encoding); + if ($broken_encoding{$t} !is_rfc2047_quoted($subject)) { + $subject = quote_subject($subject, $auto_8bit_encoding); } Hmm. What is this patch on top of? It looks like it is on top of your original patch, but when I tried it on top of that, it does not apply either, and the index lines in the patch do not mention a sha1 that I do not have. Do you mind re-rolling a final 2-patch series with: Ah, never mind. I missed your earlier use compose-encoding for Subject. I've queued it and all of the follow-ons onto the km/send-email-compose-encoding topic. Thanks. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects
On Thu, Oct 25, 2012 at 05:01:49AM -0400, Jeff King wrote: Hmm. What is this patch on top of? It looks like it is on top of your original patch, but when I tried it on top of that, it does not apply either, and the index lines in the patch do not mention a sha1 that I do not have. Sorry, it's against km/send-email-compose-encoding (or current next) + git-send-email: use compose-encoding for Subject. Do you mind re-rolling a final 2-patch series with: 1. Your original patch and this one squashed together, with an appropriate commit message. I think that it's better to do refactoring and fix for ASCII in separate patches. Maybe we should reverse order of first two patches. This first will do refactoring and the second will just replace quote_rfc2047() with quote_subject() in broken encoding case and add test for this problem. 2. The second quote when we see '=?' patch. Thanks. -Peff ok, I will resend the final series. I need also to fix git-send-email: use compose-encoding for Subject patch. Now it's depends both on this series and km/send-email-compose-encoding branch. Krzysiek -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects
On Thu, Oct 25, 2012 at 01:19:19PM +0200, Krzysztof Mazur wrote: On Thu, Oct 25, 2012 at 06:08:54AM -0400, Jeff King wrote: Ah, never mind. I missed your earlier use compose-encoding for Subject. I've queued it and all of the follow-ons onto the km/send-email-compose-encoding topic. thanks, what about the problem with whitespaces in quote_subject patch? I fixed the whitespace problems, and just applying your patches in sequence on top of send-email-compose-encoding actually looks sensible (after looking at it, I came to the same conclusion as you that the two patches should be kept separate). I'll push out the results of tonight's work in a few minutes, if you want to eyeball what's in pu. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects
On Thu, Oct 25, 2012 at 06:08:54AM -0400, Jeff King wrote: Ah, never mind. I missed your earlier use compose-encoding for Subject. I've queued it and all of the follow-ons onto the km/send-email-compose-encoding topic. thanks, what about the problem with whitespaces in quote_subject patch? Krzysiek -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects
The git-send-email always use RFC2047 subject quoting for files with broken encoding - non-ASCII files without Content-Transfer-Encoding, even for ASCII subjects. Now for ASCII subjects the RFC2047 quoting will be skipped. Signed-off-by: Krzysztof Mazur krzys...@podlesie.net --- git-send-email.perl | 3 ++- t/t9001-send-email.sh | 17 + 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/git-send-email.perl b/git-send-email.perl index adcb4e3..efeae4c 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -1327,7 +1327,8 @@ foreach my $t (@files) { $body_encoding = $auto_8bit_encoding; } - if ($broken_encoding{$t} !is_rfc2047_quoted($subject)) { + if ($broken_encoding{$t} !is_rfc2047_quoted($subject) + ($subject =~ /[^[:ascii:]]/)) { $subject = quote_rfc2047($subject, $auto_8bit_encoding); } diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh index 89fceda..6c6af7d 100755 --- a/t/t9001-send-email.sh +++ b/t/t9001-send-email.sh @@ -1143,6 +1143,23 @@ EOF ' test_expect_success $PREREQ 'setup expect' ' +cat expected EOF +Subject: subject goes here +EOF +' + +test_expect_success $PREREQ 'ASCII subject is not RFC2047 quoted' ' + clean_fake_sendmail + echo bogus | + git send-email --from=aut...@example.com --to=nob...@example.com \ + --smtp-server=$(pwd)/fake.sendmail \ + --8bit-encoding=UTF-8 \ + email-using-8bit stdout + grep Subject msgtxt1 actual + test_cmp expected actual +' + +test_expect_success $PREREQ 'setup expect' ' cat content-type-decl EOF MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 -- 1.8.0.3.gf4c35fc -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects
On Wed, Oct 24, 2012 at 10:03:35AM +0200, Krzysztof Mazur wrote: The git-send-email always use RFC2047 subject quoting for files with broken encoding - non-ASCII files without Content-Transfer-Encoding, even for ASCII subjects. Now for ASCII subjects the RFC2047 quoting will be skipped. [...] - if ($broken_encoding{$t} !is_rfc2047_quoted($subject)) { + if ($broken_encoding{$t} !is_rfc2047_quoted($subject) + ($subject =~ /[^[:ascii:]]/)) { Is that test sufficient? We would also need to encode if it has rfc2047 specials, no? It looks like we use the same regex elsewhere. Maybe this would be a good chance to abstract out a needs_rfc2047_quoting while we are in the area? Other than that, I did not see anything wrong with the patch. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects
On Wed, Oct 24, 2012 at 04:46:36AM -0400, Jeff King wrote: On Wed, Oct 24, 2012 at 10:03:35AM +0200, Krzysztof Mazur wrote: The git-send-email always use RFC2047 subject quoting for files with broken encoding - non-ASCII files without Content-Transfer-Encoding, even for ASCII subjects. Now for ASCII subjects the RFC2047 quoting will be skipped. [...] - if ($broken_encoding{$t} !is_rfc2047_quoted($subject)) { + if ($broken_encoding{$t} !is_rfc2047_quoted($subject) + ($subject =~ /[^[:ascii:]]/)) { Is that test sufficient? We would also need to encode if it has rfc2047 specials, no? For Subject this should be sufficient. According to RFC822 after Subject: we have text token, --- from RFC822 --- / Subject : *text --- from RFC822 --- and text is defined as: --- from RFC822 --- text= any CHAR, including bare; = atoms, specials, CR bare LF, but NOT ; comments and including CRLF ; quoted-strings are ; NOT recognized. --- from RFC822 --- so only CRLF is not allowed in Subject. So the problem only exists for broken RFC2047-like texts, but I think it's ok to just pass such subjects, in most cases the Subject comes from already formatted patch file. I think that we just want to fix Subjects without specified encoding here. In most cases, when git-send-email is used for patches generated by git format-patch we just don't want to corrupt Subject. The git format-patch generates broken patches when commit message uses only ASCII characters and patch contains some non-ASCII characters. In this case original git-send-email, without this patch, adds RFC2047 quotation for pure ASCII Subject. It looks like we use the same regex elsewhere. Maybe this would be a good chance to abstract out a needs_rfc2047_quoting while we are in the area? It's a good idea, however rules are different for Subject and addresses (sanitize_address). I think we can go even further, we can just add quote_subject(), which performs this test and calls quote_rfc2047() if necessary. I'm sending bellow patch that does that. Krzysiek -- From a1e6eef831275485ec1555d94ff0d9aac852dd12 Mon Sep 17 00:00:00 2001 From: Krzysztof Mazur krzys...@podlesie.net Date: Wed, 24 Oct 2012 19:08:57 +0200 Subject: [PATCH] git-send-email: introduce quote_subject() The quote_rfc2047() always adds RFC2047 quoting and to avoid quoting ASCII subjects, before calling quote_rfc2047() subject must be tested for non-ASCII characters. To avoid this new quote_subject() function is introduced. The quote_subject() performs this test and calls quote_rfc2047() only if necessary. Signed-off-by: Krzysztof Mazur krzys...@podlesie.net --- git-send-email.perl | 25 +++-- 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/git-send-email.perl b/git-send-email.perl index efeae4c..e9aec8d 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -657,9 +657,7 @@ EOT $initial_subject = $1; my $subject = $initial_subject; $_ = Subject: . - ($subject =~ /[^[:ascii:]]/ ? -quote_rfc2047($subject, $compose_encoding) : -$subject) . + quote_subject($subject, $compose_encoding) . \n; } elsif (/^In-Reply-To:\s*(.+)\s*$/i) { $initial_reply_to = $1; @@ -907,6 +905,22 @@ sub is_rfc2047_quoted { $s =~ m/^(?:[[:ascii:]]*|=\?$token\?$token\?$encoded_text\?=)$/o; } +sub subject_needs_rfc2047_quoting { + my $s = shift; + + return !is_rfc2047_quoted($s) ($s =~ /[^[:ascii:]]/); +} + +sub quote_subject { + local $subject = shift; + my $encoding = shift || 'UTF-8'; + + if (subject_needs_rfc2047_quoting($subject)) { + return quote_rfc2047($subject, $encoding); + } + return $subject; +} + # use the simplest quoting being able to handle the recipient sub sanitize_address { my ($recipient) = @_; @@ -1327,9 +1341,8 @@ foreach my $t (@files) { $body_encoding = $auto_8bit_encoding; } - if ($broken_encoding{$t} !is_rfc2047_quoted($subject) - ($subject =~ /[^[:ascii:]]/)) { - $subject = quote_rfc2047($subject, $auto_8bit_encoding); + if ($broken_encoding{$t}) { + $subject = quote_subject($subject, $auto_8bit_encoding); } if (defined $author and $author ne $sender) { -- 1.8.0.3.gf4c35fc -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects
On Wed, Oct 24, 2012 at 07:10:36PM +0200, Krzysztof Mazur wrote: - if ($broken_encoding{$t} !is_rfc2047_quoted($subject)) { + if ($broken_encoding{$t} !is_rfc2047_quoted($subject) + ($subject =~ /[^[:ascii:]]/)) { Is that test sufficient? We would also need to encode if it has rfc2047 specials, no? For Subject this should be sufficient. According to RFC822 after Subject: we have text token, [...] So the problem only exists for broken RFC2047-like texts, but I think it's ok to just pass such subjects, in most cases the Subject comes from already formatted patch file. I think that we just want to fix Subjects without specified encoding here. Right, but I was specifically worried about raw =?, which is only an issue due to rfc2047 itself. However, reading the patch again, we are already checking for that with is_rfc2047_quoted. It might miss the case where we have =? but not the rest of a valid encoded word, but any compliant parser should recognize that and leave it be. So I think your original patch is actually correct. I think we can go even further, we can just add quote_subject(), which performs this test and calls quote_rfc2047() if necessary. I'm sending bellow patch that does that. Yeah, it would still be nice to keep the logic in one place. diff --git a/git-send-email.perl b/git-send-email.perl index efeae4c..e9aec8d 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -657,9 +657,7 @@ EOT $initial_subject = $1; my $subject = $initial_subject; $_ = Subject: . - ($subject =~ /[^[:ascii:]]/ ? - quote_rfc2047($subject, $compose_encoding) : - $subject) . + quote_subject($subject, $compose_encoding) . Hrm. Isn't this one technically a regression if the $subject contains encoded words? IOW, in this case we feed quote_subject a known-raw header; any rfc2047 in it would want to be encoded to be preserved. But in this case: @@ -1327,9 +1341,8 @@ foreach my $t (@files) { $body_encoding = $auto_8bit_encoding; } - if ($broken_encoding{$t} !is_rfc2047_quoted($subject) - ($subject =~ /[^[:ascii:]]/)) { - $subject = quote_rfc2047($subject, $auto_8bit_encoding); + if ($broken_encoding{$t}) { + $subject = quote_subject($subject, $auto_8bit_encoding); } We have a possibly already-encoded header, and we would want to avoid double-encoding it. In the first case, the wants quoting logic should be: is_rfc2047_quoted($subject) || /[^[:ascii:]]/ and in the latter case it would be: !is_rfc2047_quoted($subject) /^[:ascii:]]/ -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects
On Wed, Oct 24, 2012 at 03:25:30PM -0400, Jeff King wrote: Right, but I was specifically worried about raw =?, which is only an issue due to rfc2047 itself. However, reading the patch again, we are already checking for that with is_rfc2047_quoted. It might miss the case where we have =? but not the rest of a valid encoded word, but any compliant parser should recognize that and leave it be. So I think your original patch is actually correct. [...] We have a possibly already-encoded header, and we would want to avoid double-encoding it. In the first case, the wants quoting logic should be: is_rfc2047_quoted($subject) || /[^[:ascii:]]/ and in the latter case it would be: !is_rfc2047_quoted($subject) /^[:ascii:]]/ ok, I'm sending a version that just adds quote_subject() without changing any logic, so now we still have in first case: /[^[:ascii:]]/ and in the latter case: !is_rfc2047_quoted($subject) /^[:ascii:]]/ In the next patch I will just add matching for =? in subject_needs_rfc2047_quoting() and we will have: /=?/ || /[^[:ascii:]]/ and in the latter case: !is_rfc2047_quoted($subject) (/=\?/ || /^[:ascii:]]/) This will also add quoting for any rfc2047 quoted subject or any other rfc2047-like subject, as you suggested. Krzysiek -- From a70c5385f9b4da69a8ce00a1448f87f63bbd500d Mon Sep 17 00:00:00 2001 From: Krzysztof Mazur krzys...@podlesie.net Date: Wed, 24 Oct 2012 22:46:00 +0200 Subject: [PATCH] git-send-email: introduce quote_subject() The quote_rfc2047() always adds RFC2047 quoting and to avoid quoting ASCII subjects, before calling quote_rfc2047() subject must be tested for non-ASCII characters. To avoid this new quote_subject() function is introduced. The quote_subject() performs this test and calls quote_rfc2047() only if necessary. Signed-off-by: Krzysztof Mazur krzys...@podlesie.net --- git-send-email.perl | 25 +++-- 1 file changed, 19 insertions(+), 6 deletions(-) diff --git a/git-send-email.perl b/git-send-email.perl index efeae4c..eb1b876 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -657,9 +657,7 @@ EOT $initial_subject = $1; my $subject = $initial_subject; $_ = Subject: . - ($subject =~ /[^[:ascii:]]/ ? -quote_rfc2047($subject, $compose_encoding) : -$subject) . + quote_subject($subject, $compose_encoding) . \n; } elsif (/^In-Reply-To:\s*(.+)\s*$/i) { $initial_reply_to = $1; @@ -907,6 +905,22 @@ sub is_rfc2047_quoted { $s =~ m/^(?:[[:ascii:]]*|=\?$token\?$token\?$encoded_text\?=)$/o; } +sub subject_needs_rfc2047_quoting { + my $s = shift; + + return ($s =~ /[^[:ascii:]]/); +} + +sub quote_subject { + local $subject = shift; + my $encoding = shift || 'UTF-8'; + + if (subject_needs_rfc2047_quoting($subject)) { + return quote_rfc2047($subject, $encoding); + } + return $subject; +} + # use the simplest quoting being able to handle the recipient sub sanitize_address { my ($recipient) = @_; @@ -1327,9 +1341,8 @@ foreach my $t (@files) { $body_encoding = $auto_8bit_encoding; } - if ($broken_encoding{$t} !is_rfc2047_quoted($subject) - ($subject =~ /[^[:ascii:]]/)) { - $subject = quote_rfc2047($subject, $auto_8bit_encoding); + if ($broken_encoding{$t} !is_rfc2047_quoted($subject)) { + $subject = quote_subject($subject, $auto_8bit_encoding); } if (defined $author and $author ne $sender) { -- 1.8.0.4.ge8ddce6 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html