Hello all,
After quite a few years of using Perl, I have only now discovered how complex regular expressions can be written with the qr// operator. It is documented in the perlop manpage, but somehow I never realized how useful it is.

Have a look at the code below, which was taken from http://www.cs.cmu.edu/~cache/email/

Note that $atext holds the *compiled* regular expression, so it can be used within another regular expression, possibly to create yet another compiled regular expression, such as $dot_atom_text below. The variable holds neither a matching result nor just the text that was typed between the slashes: it carries the meaning of the regular expression, in a form that can be interpolated into other patterns the way a string would be.

Now skip to the last lines of the code and see how readable the final expression is. Remember all those patterns for "what I call whitespace" or "this, this or that", repeated ten times in a long one-liner? I really wonder what they were good for.

Which makes me wonder: why aren't all nontrivial regular expressions written like this?

   Eli

sub is_valid_email ($) {
  my ($addr) = @_;

  # Building blocks; the names follow the RFC 2822 grammar.
  # atext: letters, digits and the punctuation allowed in an atom
  my $atext = qr/[A-Za-z0-9\!\#\$\%\&\'\*\+\-\/\=\?\^\_\`\{\|\}\~]/;
  my $dot_atom_text = qr/$atext+(\.$atext+)*/;

  # Pieces of a quoted local part, e.g. "john doe"
  my $no_ws_ctl_char = qr/[\x01-\x08\x0b\x0c\x0e-\x1f\x7f]/;
  my $qtext_char = qr/([\x21\x23-\x5b\x5d-\x7e]|$no_ws_ctl_char)/;
  my $text = qr/[\x01-\x09\x0b\x0c\x0e-\x7f]/;
  my $qtext = qr/($qtext_char|\\$text)*/;
  my $quoted_string = qr/"$qtext"/;

  # Pieces of a domain literal, e.g. [192.0.2.1]
  my $quotedpair = qr/\\$text/;
  my $dtext = qr/[\x21-\x5a\x5e-\x7e\x01-\x08\x0b\x0c\x0e-\x1f\x7f]/;
  my $dcontent = qr/($dtext|$quotedpair)/;
  my $domain_literal = qr/\[(${dcontent})*\]/;

  # The final expression: local part, "@", domain
  if ( $addr =~ /^($dot_atom_text|$quoted_string)\@($dot_atom_text|$domain_literal)$/ ) {
    return 1;
  } else {
    return 0;
  }
}

-- 
Web: http://www.billauer.co.il

_______________________________________________
Perl mailing list
http://perl.org.il/mailman/listinfo/perl
