Re: [PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line
Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr writes: Matthieu Moy matthieu@grenoble-inp.fr writes: Cool. Then almost all the work is done to get an automated test. Next step would be to add the tests itself in the code. I would do that by adding a hidden --selfcheck option to git send-email that would compare Mail::Address-parse($string); and split_addrs($string); for all your testcases, and die if they do not match. Then calling it from the testsuite would be trivial. Ok, are there such --selfcheck options elsewhere? Not as far as I know. If I understand it right, you want to put the tests inside the git-send-email script. I don't feel really good about that but I guess it's hard to test it otherwise... Hmm, actually there is, I didn't look at the right places yesterday. git-send-email.perl already does 'use Git;', and there's already a set of unit-tests for Git.pm: t9700-perl-git.sh, which calls perl $TEST_DIRECTORY/t9700/test.pl. So, you can just add your code as a function in Git.pm and unit-tests in t/t9700/test.pl. Also what will we do with the failing tests? Just discard them? I think there's two sort of failing test: - When output provided by parse_address_ without Mail::Address is better or has no impact at all on the code. Such as: I'm not sure we can be better as long as we do use Mail::Address when available. Any difference is potentially harmfull for the user because it means that Git will have different behavior on different machines. Perhaps this is an argument to use your version unconditionally and drop Mail::Address actually. But you can still test that with is(parse_address_(...), Doe, Jane, description); (possibly not calling Mail::Address) http://search.cpan.org/~exodist/Test-Simple-1.001014/lib/Test/More.pm The cases where Mail::Address and your version give the same result can be tested with a foreach loop calling is(parse_address_(...), Mail::Address(...), ...); - When we don't really care about the output, because the user entry is wrong, and we just expect the script to be aborted somehow... We don't need to test that. ... but if you already have the tests, you can keep them as known failure. See the TODO: BLOCK section of the doc of Test::More. I can do that on top of your series if you don't have time. Time will become a problem soon, but I think I can handle it unless you really want to do it ! If you have time, just do it. Thanks, -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line
Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr writes: I've some more tests, maybe I should put them all in this post ? Yes, please post as much as you have. Ideally, this should be automatically tested, but if you don't have time to write the automated tests, at least having a track of what you did on the list archives can help someone else to do it. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line
Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr writes: Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr writes: I've some more tests, maybe I should put them all in this post ? Yes, please post as much as you have. Ideally, this should be automatically tested, but if you don't have time to write the automated tests, at least having a track of what you did on the list archives can help someone else to do it. It may not be easily readable without colors, so there are the scripts at the end. Cool. Then almost all the work is done to get an automated test. Next step would be to add the tests itself in the code. I would do that by adding a hidden --selfcheck option to git send-email that would compare Mail::Address-parse($string); and split_addrs($string); for all your testcases, and die if they do not match. Then calling it from the testsuite would be trivial. I can do that on top of your series if you don't have time. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line
Matthieu Moy matthieu@grenoble-inp.fr writes: Cool. Then almost all the work is done to get an automated test. Next step would be to add the tests itself in the code. I would do that by adding a hidden --selfcheck option to git send-email that would compare Mail::Address-parse($string); and split_addrs($string); for all your testcases, and die if they do not match. Then calling it from the testsuite would be trivial. Ok, are there such --selfcheck options elsewhere? If I understand it right, you want to put the tests inside the git-send-email script. I don't feel really good about that but I guess it's hard to test it otherwise... Also what will we do with the failing tests? Just discard them? I think there's two sort of failing test: - When output provided by parse_address_ without Mail::Address is better or has no impact at all on the code. Such as: Input: Doe, Jane j...@example.com Split: Doe, Ja ne j...@example.com M::A : Doe, Ja ne j...@example.com This output is done on purpose. If it was the same output with Mail::Address, we could have avoided commit 8/9 of this serie btw. I think we should also test these cases. - When we don't really care about the output, because the user entry is wrong, and we just expect the script to be aborted somehow... We don't need to test that. We could also add an option to specify whether we want to use Mail::Address or not and do the tests in t9001* (but this would take much more time). I can do that on top of your series if you don't have time. Time will become a problem soon, but I think I can handle it unless you really want to do it ! -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line
Matthieu Moy matthieu@grenoble-inp.fr writes + my $commentrgx=qr/\((?:[^)]*)\)/; + my $quotergx=qr/(?:[^\\\]|\\.)*/; + my $wordrgx=qr/(?:[^][\s():;@\\,.]|\\.)+/; Spaces around = please. ... + foreach my $token (@tokens) { + if ($token =~ /^[,;]$/) { Here and below: you're indenting with a 4-column offset, it should be 8. Should have spent more time on the form... Thanks The code below is a bit hard to read (I'm neither fluent in Perl nor in the RFC ...). A few more comments would help. A few examples below (it's up to you to integrate them or not). Ok, I'll add comments for the hardest parts. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line
Junio C Hamano gits...@pobox.com writes Suffix rgx that means regular expression is a bit unusual, and also hard to read when squashed to another word. Elsewhere in the same script, we seem to use $re_whatever to store precompiled regular expressions, so perhaps $re_comment, $re_quote, etc.? Yes it's indeed a better name. I had not seen it, thanks! +if ($str_address ne $str_phrase ne ) { +$str_address = qq[$str_address]; +} We see both git@vger.kernel.org and git@vger.kernel.org around here for an address without comment or phrase; this chooses to turn them both into git@vger.kernel.org form? Not a complaint but am thinking aloud to see if I am reading it correctly. If there's no phrase, this will choose the git@vger.kernel.org form, in both cases, because it'll be recognize as an address, $str_address will be git@vger.kernel.org and $str_phrase will be empty before the if ($str_address ne ...) Here are some tests: Input: j...@example.com Split: j...@example.com M::A : j...@example.com -- Input: j...@example.com Split: j...@example.com M::A : j...@example.com -- Input: Jane j...@example.com Split: Jane j...@example.com M::A : Jane j...@example.com -- Input: Jane Doe j...@example.com Split: Jane Doe j...@example.com M::A : Jane Doe j...@example.com -- Input: Jane j...@example.com Split: Jane j...@example.com M::A : Jane j...@example.com -- Input: Doe, Jane j...@example.com Split: Doe, Jane j...@example.com M::A : Doe, Jane j...@example.com I've some more tests, maybe I should put them all in this post ? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line
Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr writes: parse_address_line had not the same behavior whether the user had Mail::Address or not. Teach parse_address_line to behave like Mail::Address. Sounds like a fun project ;-) + my $commentrgx=qr/\((?:[^)]*)\)/; + my $quotergx=qr/(?:[^\\\]|\\.)*/; + my $wordrgx=qr/(?:[^][\s():;@\\,.]|\\.)+/; + my $tokenrgx = qr/(?:$quotergx|$wordrgx|$commentrgx|\S)/; Suffix rgx that means regular expression is a bit unusual, and also hard to read when squashed to another word. Elsewhere in the same script, we seem to use $re_whatever to store precompiled regular expressions, so perhaps $re_comment, $re_quote, etc.? + my @tokens = map { $_ =~ /\s*($tokenrgx)\s*/g } @_; + push @tokens, ,; + + my (@addr_list, @phrase, @address, @comment, @buffer) = (); + foreach my $token (@tokens) { + if ($token =~ /^[,;]$/) { + if (@address) { + push @address, @buffer; + } else { + push @phrase, @buffer; + } + + my $str_phrase = join ' ', @phrase; + my $str_address = join '', @address; + my $str_comment = join ' ', @comment; + + if ($str_phrase =~ /[][():;@\\,.\000-\037\177]/) { + $str_phrase =~ s/(^|[^\\])/$1/g; + $str_phrase = qq[$str_phrase]; + } + + if ($str_address ne $str_phrase ne ) { + $str_address = qq[$str_address]; + } We see both git@vger.kernel.org and git@vger.kernel.org around here for an address without comment or phrase; this chooses to turn them both into git@vger.kernel.org form? Not a complaint but am thinking aloud to see if I am reading it correctly. + + my $str_mailbox = $str_phrase $str_address $str_comment; + $str_mailbox =~ s/^\s*|\s*$//g; So an empty @comment will not leave spaces after $str_address, which makes sense (likewise for @phrase). + push @addr_list, $str_mailbox if ($str_mailbox); + + @phrase = @address = @comment = @buffer = (); + } elsif ($token =~ /^\(/) { + push @comment, $token; + } elsif ($token eq ) { + push @phrase, (splice @address), (splice @buffer); That is a clever use of splice (My Perl's rusty; you learn new things every day) ;-) + } elsif ($token eq ) { + push @address, (splice @buffer); + } elsif ($token eq @) { + push @address, (splice @buffer), @; + } elsif ($token eq .) { + push @address, (splice @buffer), .; + } else { + push @buffer, $token; + } + } + + return @addr_list; } -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line
parse_address_line had not the same behavior whether the user had Mail::Address or not. Teach parse_address_line to behave like Mail::Address. When the user input is correct, this implementation behaves exactly like Mail::Address except when there are quotes inside the name: Jane Doe j...@example.com In this case the result of parse_address_line is: With M::A : Jane Do e j...@example.com Without : Jane Do e j...@example.com When the user input is not correct, the behavior is also mostly the same. Unlike Mail::Address, this doesn't parse groups and recursive commentaries. Signed-off-by: Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr --- git-send-email.perl | 54 +++-- 1 file changed, 52 insertions(+), 2 deletions(-) diff --git a/git-send-email.perl b/git-send-email.perl index a0cd7ff..a1f6c18 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -477,9 +477,59 @@ foreach my $entry (@bcclist) { sub parse_address_line { if ($have_mail_address) { return map { $_-format } Mail::Address-parse($_[0]); - } else { - return split_addrs($_[0]); } + + my $commentrgx=qr/\((?:[^)]*)\)/; + my $quotergx=qr/(?:[^\\\]|\\.)*/; + my $wordrgx=qr/(?:[^][\s():;@\\,.]|\\.)+/; + my $tokenrgx = qr/(?:$quotergx|$wordrgx|$commentrgx|\S)/; + + my @tokens = map { $_ =~ /\s*($tokenrgx)\s*/g } @_; + push @tokens, ,; + + my (@addr_list, @phrase, @address, @comment, @buffer) = (); + foreach my $token (@tokens) { + if ($token =~ /^[,;]$/) { + if (@address) { + push @address, @buffer; + } else { + push @phrase, @buffer; + } + + my $str_phrase = join ' ', @phrase; + my $str_address = join '', @address; + my $str_comment = join ' ', @comment; + + if ($str_phrase =~ /[][():;@\\,.\000-\037\177]/) { + $str_phrase =~ s/(^|[^\\])/$1/g; + $str_phrase = qq[$str_phrase]; + } + + if ($str_address ne $str_phrase ne ) { + $str_address = qq[$str_address]; + } + + my $str_mailbox = $str_phrase $str_address $str_comment; + $str_mailbox =~ s/^\s*|\s*$//g; + push @addr_list, $str_mailbox if ($str_mailbox); + + @phrase = @address = @comment = @buffer = (); + } elsif ($token =~ /^\(/) { + push @comment, $token; + } elsif ($token eq ) { + push @phrase, (splice @address), (splice @buffer); + } elsif ($token eq ) { + push @address, (splice @buffer); + } elsif ($token eq @) { + push @address, (splice @buffer), @; + } elsif ($token eq .) { + push @address, (splice @buffer), .; + } else { + push @buffer, $token; + } + } + + return @addr_list; } sub split_addrs { -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC v4 07/10] send-email: reduce dependancies impact on parse_address_line
Remi Lespinet remi.lespi...@ensimag.grenoble-inp.fr writes: --- git-send-email.perl | 54 +++-- 1 file changed, 52 insertions(+), 2 deletions(-) diff --git a/git-send-email.perl b/git-send-email.perl index a0cd7ff..a1f6c18 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -477,9 +477,59 @@ foreach my $entry (@bcclist) { sub parse_address_line { if ($have_mail_address) { return map { $_-format } Mail::Address-parse($_[0]); - } else { - return split_addrs($_[0]); } + + my $commentrgx=qr/\((?:[^)]*)\)/; + my $quotergx=qr/(?:[^\\\]|\\.)*/; + my $wordrgx=qr/(?:[^][\s():;@\\,.]|\\.)+/; Spaces around = please. The code below is a bit hard to read (I'm neither fluent in Perl nor in the RFC ...). A few more comments would help. A few examples below (it's up to you to integrate them or not). + my $tokenrgx = qr/(?:$quotergx|$wordrgx|$commentrgx|\S)/; + + my @tokens = map { $_ =~ /\s*($tokenrgx)\s*/g } @_; + push @tokens, ,; # parse a full address like # Phrase (comment) addr...@example.com (to clarify the wording) + my (@addr_list, @phrase, @address, @comment, @buffer) = (); + foreach my $token (@tokens) { + if ($token =~ /^[,;]$/) { Here and below: you're indenting with a 4-column offset, it should be 8. + if (@address) { + push @address, @buffer; + } else { + push @phrase, @buffer; + } + + my $str_phrase = join ' ', @phrase; + my $str_address = join '', @address; + my $str_comment = join ' ', @comment; # Escape special-characters with backslash + if ($str_phrase =~ /[][():;@\\,.\000-\037\177]/) { + $str_phrase =~ s/(^|[^\\])/$1/g; + $str_phrase = qq[$str_phrase]; + } + + if ($str_address ne $str_phrase ne ) { + $str_address = qq[$str_address]; + } + + my $str_mailbox = $str_phrase $str_address $str_comment; + $str_mailbox =~ s/^\s*|\s*$//g; + push @addr_list, $str_mailbox if ($str_mailbox); + + @phrase = @address = @comment = @buffer = (); + } elsif ($token =~ /^\(/) { + push @comment, $token; + } elsif ($token eq ) { + push @phrase, (splice @address), (splice @buffer); + } elsif ($token eq ) { + push @address, (splice @buffer); + } elsif ($token eq @) { + push @address, (splice @buffer), @; + } elsif ($token eq .) { + push @address, (splice @buffer), .; + } else { # We don't know what the token belongs to yet. We'll # decide where to append @buffer later. + push @buffer, $token; + } + } + + return @addr_list; -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html