Re: [PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line

2015-06-19 Thread Matthieu Moy
Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr writes:

 Matthieu Moy matthieu@grenoble-inp.fr writes:

 Cool. Then almost all the work is done to get an automated test. Next
 step would be to add the tests itself in the code. I would do that by
 adding a hidden --selfcheck option to git send-email that would compare
 Mail::Address-parse($string); and split_addrs($string); for all your
 testcases, and die if they do not match. Then calling it from the
 testsuite would be trivial.

 Ok, are there such --selfcheck options elsewhere?

Not as far as I know.

 If I understand it right, you want to put the tests inside the
 git-send-email script. I don't feel really good about that but I guess
 it's hard to test it otherwise...

Hmm, actually there is, I didn't look at the right places yesterday.
git-send-email.perl already does 'use Git;', and there's already a set
of unit-tests for Git.pm: t9700-perl-git.sh, which calls perl
$TEST_DIRECTORY/t9700/test.pl.

So, you can just add your code as a function in Git.pm and unit-tests in
t/t9700/test.pl.

 Also what will we do with the failing tests? Just discard them? I
 think there's two sort of failing test:

  - When output provided by parse_address_ without Mail::Address
is better or has no impact at all on the code. Such as:

I'm not sure we can be better as long as we do use Mail::Address when
available. Any difference is potentially harmfull for the user because
it means that Git will have different behavior on different machines.

Perhaps this is an argument to use your version unconditionally and drop
Mail::Address actually.

But you can still test that with

  is(parse_address_(...), Doe, Jane, description);

(possibly not calling Mail::Address)

http://search.cpan.org/~exodist/Test-Simple-1.001014/lib/Test/More.pm

The cases where Mail::Address and your version give the same result can
be tested with a foreach loop calling

  is(parse_address_(...), Mail::Address(...), ...);

  - When we don't really care about the output, because the user entry
is wrong, and we just expect the script to be aborted somehow... We
don't need to test that.

... but if you already have the tests, you can keep them as known
failure.

See the TODO: BLOCK section of the doc of Test::More.

 I can do that on top of your series if you don't have time.

 Time will become a problem soon, but I think I can handle it unless
 you really want to do it !

If you have time, just do it.

Thanks,

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line

2015-06-18 Thread Matthieu Moy
Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr writes:

 I've some more tests, maybe I should put them all in this post ?

Yes, please post as much as you have. Ideally, this should be
automatically tested, but if you don't have time to write the automated
tests, at least having a track of what you did on the list archives can
help someone else to do it.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line

2015-06-18 Thread Matthieu Moy
Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr writes:

 Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr writes:
 
  I've some more tests, maybe I should put them all in this post ?
 
 Yes, please post as much as you have. Ideally, this should be
 automatically tested, but if you don't have time to write the automated
 tests, at least having a track of what you did on the list archives can
 help someone else to do it.

 It may not be easily readable without colors, so there are the scripts
 at the end.

Cool. Then almost all the work is done to get an automated test. Next
step would be to add the tests itself in the code. I would do that by
adding a hidden --selfcheck option to git send-email that would compare
Mail::Address-parse($string); and split_addrs($string); for all your
testcases, and die if they do not match. Then calling it from the
testsuite would be trivial.

I can do that on top of your series if you don't have time.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line

2015-06-18 Thread Remi Lespinet
Matthieu Moy matthieu@grenoble-inp.fr writes:

 Cool. Then almost all the work is done to get an automated test. Next
 step would be to add the tests itself in the code. I would do that by
 adding a hidden --selfcheck option to git send-email that would compare
 Mail::Address-parse($string); and split_addrs($string); for all your
 testcases, and die if they do not match. Then calling it from the
 testsuite would be trivial.

Ok, are there such --selfcheck options elsewhere? If I understand it
right, you want to put the tests inside the git-send-email script. I
don't feel really good about that but I guess it's hard to test it
otherwise...  Also what will we do with the failing tests?  Just
discard them?  I think there's two sort of failing test:

 - When output provided by parse_address_ without Mail::Address
   is better or has no impact at all on the code. Such as:

Input: Doe, Jane j...@example.com
Split: Doe, Ja ne j...@example.com
M::A : Doe, Ja ne j...@example.com

   This output is done on purpose. If it was the same output with
   Mail::Address, we could have avoided commit 8/9 of this serie btw.

   I think we should also test these cases.

 - When we don't really care about the output, because the user entry
   is wrong, and we just expect the script to be aborted somehow... We
   don't need to test that.

We could also add an option to specify whether we want to use
Mail::Address or not and do the tests in t9001* (but this would
take much more time).

 I can do that on top of your series if you don't have time.

Time will become a problem soon, but I think I can handle it unless
you really want to do it !
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line

2015-06-17 Thread Remi Lespinet
Matthieu Moy matthieu@grenoble-inp.fr writes

  +   my $commentrgx=qr/\((?:[^)]*)\)/;
  +   my $quotergx=qr/(?:[^\\\]|\\.)*/;
  +   my $wordrgx=qr/(?:[^][\s():;@\\,.]|\\.)+/;
 
 Spaces around = please.
 ...
  +   foreach my $token (@tokens) {
  +   if ($token =~ /^[,;]$/) {
 
 Here and below: you're indenting with a 4-column offset, it should be 8.

Should have spent more time on the form... Thanks

 The code below is a bit hard to read (I'm neither fluent in Perl nor in
 the RFC ...). A few more comments would help. A few examples below (it's
 up to you to integrate them or not).

Ok, I'll add comments for the hardest parts.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line

2015-06-17 Thread Remi Lespinet
 Junio C Hamano gits...@pobox.com writes
 Suffix rgx that means regular expression is a bit unusual, and
 also hard to read when squashed to another word.  Elsewhere in the
 same script, we seem to use $re_whatever to store precompiled
 regular expressions, so perhaps $re_comment, $re_quote, etc.?

Yes it's indeed a better name. I had not seen it, thanks!


  +if ($str_address ne   $str_phrase ne ) {
  +$str_address = qq[$str_address];
  +}
 
 We see both git@vger.kernel.org and git@vger.kernel.org around
 here for an address without comment or phrase; this chooses to turn
 them both into git@vger.kernel.org form?  Not a complaint but am
 thinking aloud to see if I am reading it correctly.

If there's no phrase, this will choose the git@vger.kernel.org form,
in both cases, because it'll be recognize as an address, $str_address
will be git@vger.kernel.org and $str_phrase will be empty before the
if ($str_address ne  ...)
Here are some tests:

Input: j...@example.com
Split: j...@example.com
M::A : j...@example.com
--
Input: j...@example.com
Split: j...@example.com
M::A : j...@example.com
--
Input: Jane j...@example.com
Split: Jane j...@example.com
M::A : Jane j...@example.com
--
Input: Jane Doe j...@example.com
Split: Jane Doe j...@example.com
M::A : Jane Doe j...@example.com
--
Input: Jane j...@example.com
Split: Jane j...@example.com
M::A : Jane j...@example.com
--
Input: Doe, Jane j...@example.com
Split: Doe, Jane j...@example.com
M::A : Doe, Jane j...@example.com

I've some more tests, maybe I should put them all in this post ?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line

2015-06-17 Thread Junio C Hamano
Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr writes:

 parse_address_line had not the same behavior whether the user had
 Mail::Address or not. Teach parse_address_line to behave like
 Mail::Address.

Sounds like a fun project ;-)

 + my $commentrgx=qr/\((?:[^)]*)\)/;
 + my $quotergx=qr/(?:[^\\\]|\\.)*/;
 + my $wordrgx=qr/(?:[^][\s():;@\\,.]|\\.)+/;
 + my $tokenrgx = qr/(?:$quotergx|$wordrgx|$commentrgx|\S)/;

Suffix rgx that means regular expression is a bit unusual, and
also hard to read when squashed to another word.  Elsewhere in the
same script, we seem to use $re_whatever to store precompiled
regular expressions, so perhaps $re_comment, $re_quote, etc.?

 + my @tokens = map { $_ =~ /\s*($tokenrgx)\s*/g } @_;
 + push @tokens, ,;
 +
 + my (@addr_list, @phrase, @address, @comment, @buffer) = ();
 + foreach my $token (@tokens) {
 + if ($token =~ /^[,;]$/) {
 + if (@address) {
 + push @address, @buffer;
 + } else {
 + push @phrase, @buffer;
 + }
 +
 + my $str_phrase = join ' ', @phrase;
 + my $str_address = join '', @address;
 + my $str_comment = join ' ', @comment;
 +
 + if ($str_phrase =~ /[][():;@\\,.\000-\037\177]/) {
 + $str_phrase =~ s/(^|[^\\])/$1/g;
 + $str_phrase = qq[$str_phrase];
 + }
 +
 + if ($str_address ne   $str_phrase ne ) {
 + $str_address = qq[$str_address];
 + }

We see both git@vger.kernel.org and git@vger.kernel.org around
here for an address without comment or phrase; this chooses to turn
them both into git@vger.kernel.org form?  Not a complaint but am
thinking aloud to see if I am reading it correctly.

 +
 + my $str_mailbox = $str_phrase $str_address $str_comment;
 + $str_mailbox =~ s/^\s*|\s*$//g;

So an empty @comment will not leave spaces after $str_address, which
makes sense (likewise for @phrase).

 + push @addr_list, $str_mailbox if ($str_mailbox);
 +
 + @phrase = @address = @comment = @buffer = ();
 + } elsif ($token =~ /^\(/) {
 + push @comment, $token;
 + } elsif ($token eq ) {
 + push @phrase, (splice @address), (splice @buffer);

That is a clever use of splice (My Perl's rusty; you learn new
things every day) ;-)

 + } elsif ($token eq ) {
 + push @address, (splice @buffer);
 + } elsif ($token eq @) {
 + push @address, (splice @buffer), @;
 + } elsif ($token eq .) {
 + push @address, (splice @buffer), .;
 + } else {
 + push @buffer, $token;
 + }
 + }
 +
 + return @addr_list;
  }
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line

2015-06-17 Thread Remi Lespinet
parse_address_line had not the same behavior whether the user had
Mail::Address or not. Teach parse_address_line to behave like
Mail::Address.

When the user input is correct, this implementation behaves
exactly like Mail::Address except when there are quotes
inside the name:

  Jane Doe j...@example.com

In this case the result of parse_address_line is:

  With M::A : Jane Do e j...@example.com
  Without   : Jane Do e j...@example.com

When the user input is not correct, the behavior is also mostly
the same.

Unlike Mail::Address, this doesn't parse groups and recursive
commentaries.

Signed-off-by: Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr
---
 git-send-email.perl | 54 +++--
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index a0cd7ff..a1f6c18 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -477,9 +477,59 @@ foreach my $entry (@bcclist) {
 sub parse_address_line {
if ($have_mail_address) {
return map { $_-format } Mail::Address-parse($_[0]);
-   } else {
-   return split_addrs($_[0]);
}
+
+   my $commentrgx=qr/\((?:[^)]*)\)/;
+   my $quotergx=qr/(?:[^\\\]|\\.)*/;
+   my $wordrgx=qr/(?:[^][\s():;@\\,.]|\\.)+/;
+   my $tokenrgx = qr/(?:$quotergx|$wordrgx|$commentrgx|\S)/;
+
+   my @tokens = map { $_ =~ /\s*($tokenrgx)\s*/g } @_;
+   push @tokens, ,;
+
+   my (@addr_list, @phrase, @address, @comment, @buffer) = ();
+   foreach my $token (@tokens) {
+   if ($token =~ /^[,;]$/) {
+   if (@address) {
+   push @address, @buffer;
+   } else {
+   push @phrase, @buffer;
+   }
+
+   my $str_phrase = join ' ', @phrase;
+   my $str_address = join '', @address;
+   my $str_comment = join ' ', @comment;
+
+   if ($str_phrase =~ /[][():;@\\,.\000-\037\177]/) {
+   $str_phrase =~ s/(^|[^\\])/$1/g;
+   $str_phrase = qq[$str_phrase];
+   }
+
+   if ($str_address ne   $str_phrase ne ) {
+   $str_address = qq[$str_address];
+   }
+
+   my $str_mailbox = $str_phrase $str_address $str_comment;
+   $str_mailbox =~ s/^\s*|\s*$//g;
+   push @addr_list, $str_mailbox if ($str_mailbox);
+
+   @phrase = @address = @comment = @buffer = ();
+   } elsif ($token =~ /^\(/) {
+   push @comment, $token;
+   } elsif ($token eq ) {
+   push @phrase, (splice @address), (splice @buffer);
+   } elsif ($token eq ) {
+   push @address, (splice @buffer);
+   } elsif ($token eq @) {
+   push @address, (splice @buffer), @;
+   } elsif ($token eq .) {
+   push @address, (splice @buffer), .;
+   } else {
+   push @buffer, $token;
+   }
+   }
+
+   return @addr_list;
 }
 
 sub split_addrs {
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line

2015-06-17 Thread Matthieu Moy
Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr writes:

 ---
  git-send-email.perl | 54 
 +++--
  1 file changed, 52 insertions(+), 2 deletions(-)

 diff --git a/git-send-email.perl b/git-send-email.perl
 index a0cd7ff..a1f6c18 100755
 --- a/git-send-email.perl
 +++ b/git-send-email.perl
 @@ -477,9 +477,59 @@ foreach my $entry (@bcclist) {
  sub parse_address_line {
   if ($have_mail_address) {
   return map { $_-format } Mail::Address-parse($_[0]);
 - } else {
 - return split_addrs($_[0]);
   }
 +
 + my $commentrgx=qr/\((?:[^)]*)\)/;
 + my $quotergx=qr/(?:[^\\\]|\\.)*/;
 + my $wordrgx=qr/(?:[^][\s():;@\\,.]|\\.)+/;

Spaces around = please.

The code below is a bit hard to read (I'm neither fluent in Perl nor in
the RFC ...). A few more comments would help. A few examples below (it's
up to you to integrate them or not).

 + my $tokenrgx = qr/(?:$quotergx|$wordrgx|$commentrgx|\S)/;
 +
 + my @tokens = map { $_ =~ /\s*($tokenrgx)\s*/g } @_;
 + push @tokens, ,;


# parse a full address like
# Phrase (comment) addr...@example.com

(to clarify the wording)

 + my (@addr_list, @phrase, @address, @comment, @buffer) = ();
 + foreach my $token (@tokens) {
 + if ($token =~ /^[,;]$/) {

Here and below: you're indenting with a 4-column offset, it should be 8.

 + if (@address) {
 + push @address, @buffer;
 + } else {
 + push @phrase, @buffer;
 + }
 +
 + my $str_phrase = join ' ', @phrase;
 + my $str_address = join '', @address;
 + my $str_comment = join ' ', @comment;

# Escape special-characters with backslash
 + if ($str_phrase =~ /[][():;@\\,.\000-\037\177]/) {
 + $str_phrase =~ s/(^|[^\\])/$1/g;
 + $str_phrase = qq[$str_phrase];
 + }
 +
 + if ($str_address ne   $str_phrase ne ) {
 + $str_address = qq[$str_address];
 + }
 +
 + my $str_mailbox = $str_phrase $str_address $str_comment;
 + $str_mailbox =~ s/^\s*|\s*$//g;
 + push @addr_list, $str_mailbox if ($str_mailbox);
 +
 + @phrase = @address = @comment = @buffer = ();
 + } elsif ($token =~ /^\(/) {
 + push @comment, $token;
 + } elsif ($token eq ) {
 + push @phrase, (splice @address), (splice @buffer);
 + } elsif ($token eq ) {
 + push @address, (splice @buffer);
 + } elsif ($token eq @) {
 + push @address, (splice @buffer), @;
 + } elsif ($token eq .) {
 + push @address, (splice @buffer), .;
 + } else {

# We don't know what the token belongs to yet. We'll
# decide where to append @buffer later.
 + push @buffer, $token;
 + }
 + }
 +
 + return @addr_list;

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html