Re: Email::Valid

2001-06-01 Thread Peter Haworth

On Wed, 30 May 2001 17:14:15 +0100, Matthew Robinson wrote:
> RFC822 will allow all of the following (taken from CGI Programming with
> Perl) and was designed to accept all the addresses in use in 1982:
>
> Alfred Neuman <Neuman@BBN-TENEXA>
> ":sysmail"@ Some-Group. Some-Org
> Muhammed.(I am the greatest) Ali @(the)vegas.WBA

Attached is the address parser from my mail client (which I might eventually release).
It returns an arrayref of hashrefs, containing:
  addr => The actual address (minus comments)
  comment => All the comments
  text => The whole text of the address
  name => The name

If I parse q(Alfred Neuman <Neuman@BBN-TENEXA>, ":sysmail"@ Some-Group. Some-Org,
Muhammed.(I am the greatest) Ali @(the)vegas.WBA) with it, I get this back:

$VAR1 = [
  {
    'text' => 'Alfred Neuman <Neuman@BBN-TENEXA>',
    'comment' => undef,
    'addr' => 'Neuman@BBN-TENEXA',
    'name' => 'Alfred Neuman'
  },
  {
    'text' => ' ":sysmail"@ Some-Group. Some-Org ',
    'comment' => undef,
    'addr' => ':[EMAIL PROTECTED]',
    'name' => 'Alfred Neuman'
  },
  {
    'text' => 'Muhammed.(I am the greatest) Ali @(the)vegas.WBA',
    'comment' => ' (I am the greatest) (the)',
    'addr' => '[EMAIL PROTECTED]',
    'name' => 'Alfred Neuman'
  }
];

Oooh, look! It's broken! Oh well, back to the drawing board.

-- 
Peter Haworth   [EMAIL PROTECTED]
``Shall we have perl yell if the string Matt Wright
  is found in a comment when running under -w too?''
-- Dan Sugalski

# $Revision: 1.8 $

%token tComma tColon tSemi
%token tAngLeft tAngRight
%token tAt tDot
%token tAtom tQuotedString tQuotedPair

%%

addresses:
  address
{ [ $_[1] ] }
| addresses tComma address
{ [ @{$_[1]},$_[3] ] }
;

address:
  address_
{
  $_[0]->ParseComments;
  my $data=$_[0]->YYData;

  my $addr={
    addr => $_[1],
    comment => $data->{COMMENT},
    text => $data->{TEXT},
    name => $data->{NAME},
  };
  delete $data->{COMMENT};
  delete $data->{TEXT};
  $addr->{name}=~s/^\s+//s;
  $addr;
}
;

address_:
  group
| mailbox
;

group:
  phrase tColon mailboxes tSemi
;

mailboxes:
  mailbox
| mailboxes tComma mailbox
;

mailbox:
  addr_spec
| opt_phrase route_addr
{ $_[0]->YYData->{NAME}.= $_[1]; $_[2] }
;

addr_spec:
  local_part tAt domain
{ "$_[1]$_[2]$_[3]" }
;

opt_phrase:
| phrase
;

phrase:
  word
| phrase word
{ "$_[1] $_[2]" }
;

route_addr:
  tAngLeft opt_route addr_spec tAngRight
{ $_[3] } # XXX Ignore route for now
;

opt_route:
  routes tColon
|
;

routes:
  routes tAt domain
| tAt domain
;

local_part:
  local_part tDot word
{ "$_[1]$_[2]$_[3]" }
| word
;

domain:
  domain tDot sub_domain
{ "$_[1]$_[2]$_[3]" }
| sub_domain
;

sub_domain:
  domain_ref
/* | domain_literal */
;

domain_ref:
  tAtom
;

word:
  tAtom
| tQuotedString
;


%%

my %tokens=reverse(
  tComma => ',',
  tColon => ':',
  tSemi => ';',
  tAngLeft => '<',
  tAngRight => '>',
  tParLeft => '(',
  tParRight => ')',
  tBraLeft => '[',
  tBraRight => ']',
  tAt => '@',
  tDot => '.',
);
my $tokens=join '',keys %tokens;

# Remove whitespace and comments
# This is done outside the lexer, since we call it before the first token
sub ParseComments{
  my($parser)=@_;
  my $data=$parser->YYData;

  for($data->{INPUT}){
    while(s/^(\s+)// || /^\(/){
      $data->{TEXT}.=$1;
      if(s/^\(//){
        # Collect one (possibly nested) comment
        my $level=1;
        my $ctext='(';
        while($level){
          s/^([^()\\]+)//
            and $ctext.=$1;
          s/^((?:\\.)+)//
            and $ctext.=$1;
          s/^\(//
            and $ctext.='(' and ++$level;
          if(s/^\)//){
            $ctext.=')';
            last unless --$level;
          }
        }
        $data->{COMMENT}.= $ctext;
        $data->{TEXT}.=$ctext;
      }
    }
  }
}

# Debugging version
sub __Lexer{
  my($parser)=@_;
  my @ret=&_Lexer;

  local $"=',';
  warn "Lex returned: (@ret)\n";
  @ret;
}

sub _Lexer{
  my($parser)=@_;
  my $data=$parser->YYData;

  # Remove whitespace and comments
  $parser->ParseComments;

  # Determine next token
  for($data->{INPUT}){
    return ('',undef) if $_ eq '';

    # Single-character punctuation tokens
    if(s/^([\Q$tokens\E])//o){
      $data->{TEXT}.=$1 unless $1 eq ',';
      return ($tokens{$1},$1);
    }
    # Quoted string: unescape quoted pairs, stop at the closing quote
    if(s/^"//){
      my $str;
      while(1){
        if(s/^"//){
          $data->{TEXT}.=qq("$str");
          return (tQuotedString => $str);
        }elsif(s/^\\(.)//s){
          $str.=$1;
        }elsif(s/^([^\\"]+)//){
          $str.=$1;
        }else{
          $data->{TEXT}.=qq("$str");
          return (tQuotedString => $str);
        }
      }
    }
    if(s/^\\(.)//s){
      $data->{TEXT}.="\\$1";
      return (tQuotedPair => $1);
    }
    if(s/^([^\s\000-\037()<>\@,;\\".\[\]]+)//){
      $data->{TEXT}.=$1;
      return (tAtom => $1);
    }
    # Anything else is a single unknown character. This must stay inside
    # the for loop so that it consumes from INPUT rather than the outer $_.
    if(s/^(.)//s){
      $data->{TEXT}.=$1;
      return (tUnknown => $1);
    }
  }
}

sub _Error{
  my($self)=@_;

  # XXX 

Re: Email::Valid

2001-05-30 Thread Simon Wistow

Andy Williams wrote:
>
> Has anyone used this module at all?

How does it match up against tchrist's stuff?

http://sunsite.lanet.lv/ftp/mirror/x2ftp/msdos/admtools/ckaddr



-- 
simon wistow                   wireless systems coder
i think, i said i think this is our fault.



Re: Email::Valid

2001-05-30 Thread Andy Williams

On Wed, 30 May 2001, Simon Wistow wrote:

> Andy Williams wrote:
> >
> > Has anyone used this module at all?
>
> How does it match up against tchrist's stuff?


All the ones that claimed to be valid from E::V failed ckaddr!
[EMAIL PROTECTED] had this result from ckaddr:
user: andyw. is good
host: hillway.com is good
address `[EMAIL PROTECTED]' is bad: rfc822 failure

So I guess [EMAIL PROTECTED] is invalid even though it works. Weird!

Thanks

Andy




Re: Email::Valid

2001-05-30 Thread Dominic Mitchell

On Wed, May 30, 2001 at 11:40:03AM -0400, Andy Williams wrote:
> All the ones that claimed to be valid from E::V failed ckaddr!
> [EMAIL PROTECTED] had this result from ckaddr:
> user: andyw. is good
> host: hillway.com is good
> address `[EMAIL PROTECTED]' is bad: rfc822 failure
>
> So I guess [EMAIL PROTECTED] is invalid even though it works. Weird!

What is valid on the left hand side of an email address is extremely
weird anyway.  Practically anything is allowed.  A pseudo grammar for
them is in RFC822.  There's also much fun trying to parse them in
Friedl's book on regular expressions (the owl book).  He ends up with a
mammoth 5k regex to parse email addresses...

-Dom

-- 
| Semantico: creators of major online resources  |
|   URL: http://www.semantico.com/   |
|   Tel: +44 (1273) 72   |
|   Address: 33 Bond St., Brighton, Sussex, BN1 1RD, UK. |



Re: Email::Valid

2001-05-30 Thread Greg McCarroll

* Andy Williams ([EMAIL PROTECTED]) wrote:
>
> So I guess [EMAIL PROTECTED] is invalid even though it works. Weird!
>

it's not the email address that's broken, it's your SMTP server ;-)

-- 
Greg McCarroll  http://www.mccarroll.uklinux.net



Re: Email::Valid

2001-05-30 Thread Matthew Byng-Maddick

On Wed, May 30, 2001 at 11:02:11AM -0400, Andy Williams wrote:
> Has anyone used this module at all?
> I just tried it and got some weird results!!!
> It thought the following were VALID:
> [EMAIL PROTECTED]
> tricad@dial,pipex.com
> [EMAIL PROTECTED],co.uk
> enquiries@peter-il;land.co.uk
> martyn@the,coot.freeserveco.uk
> shirleyhemes@.uk.com
> [EMAIL PROTECTED],co.uk
> 3jsolution@.21.com
> paula,[EMAIL PROTECTED]
> ian,[EMAIL PROTECTED]
> [EMAIL PROTECTED]

You are correct in that these should all be invalid.

> and that this was INVALID:
> [EMAIL PROTECTED]

It is.

RFC822 S6.1
| local-part  =  word *("." word)             ; uninterpreted
|                                             ; case-preserved

and S3.3 (reordered) for the definitions of word
| word        =  atom / quoted-string
|
| atom        =  1*<any CHAR except specials, SPACE and CTLs>
|
| quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
|                                             ;   quoted chars.
|
| qtext       =  <any CHAR excepting <">,     ; => may be folded
|                "\" & CR, and including
|                linear-white-space>
|
| specials    =  "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
|             /  "," / ";" / ":" / "\" / <">  ;  string, to use
|             /  "." / "[" / "]"              ;  within a word.
|
| CTL         =  <any ASCII control           ; (  0- 37,  0.- 31.)
|                 character and DEL>          ; (    177,     127.)
|
| SPACE       =  <ASCII SP, space>            ; (     40,      32.)
|
| CR          =  <ASCII CR, carriage return>  ; (     15,      13.)
|
| CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
|
| LWSP-char   =  SPACE / HTAB                 ; semantics = SPACE
|
| linear-white-space =  1*([CRLF] LWSP-char)  ; semantics = SPACE
|                                             ; CRLF => folding
| HTAB        =  <ASCII HT, horizontal-tab>   ; (     11,       9.)
|
| <">         =  <ASCII quote mark>           ; (     42,      34.)
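
As a rough illustration of the local-part rule quoted above, here is a minimal sketch in Perl. It assumes ASCII input and ignores comments and folding. The helper name is_rfc822_local_part is hypothetical and is not part of Email::Valid or ckaddr.

```perl
use strict;
use warnings;

# atom: one or more characters that are not specials, SPACE or CTLs
# (assumption: the input is plain ASCII).
my $atom          = qr/[^\s\000-\037\177()<>\@,;:\\".\[\]]+/;
# quoted-string: <"> *(qtext / quoted-pair) <">
my $quoted_string = qr/"(?:[^"\\\015]|\\.)*"/s;
# word = atom / quoted-string
my $word          = qr/(?:$atom|$quoted_string)/;
# local-part = word *("." word)
my $local_part    = qr/\A$word(?:\.$word)*\z/;

# Hypothetical helper: returns 1 if $text matches the local-part rule.
sub is_rfc822_local_part {
    my ($text) = @_;
    return $text =~ $local_part ? 1 : 0;
}

for my $candidate ('andyw', 'andyw.', '"andyw."', 'paula,jones') {
    printf "%-12s %s\n", $candidate,
        is_rfc822_local_part($candidate) ? 'valid' : 'invalid';
}
```

Under this reading of the grammar a trailing dot fails because every "." must be followed by another word, while the same text inside a quoted-string passes.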

> I've tried [EMAIL PROTECTED] and it works fine

Be conservative in what you send and liberal in what you accept

> I've tried the ones above that claim to be VALID and they all fail.

They are all wrong.

MBM




Re: Email::Valid

2001-05-30 Thread Andy Williams

On Wed, 30 May 2001, Greg McCarroll wrote:

> * Andy Williams ([EMAIL PROTECTED]) wrote:
> >
> > So I guess [EMAIL PROTECTED] is invalid even though it works. Weird!
> >
>
> it's not the email address that's broken, it's your SMTP server ;-)


Could be right, it's sendmail :(

Andy




Re: Email::Valid

2001-05-30 Thread Matthew Byng-Maddick

On Wed, May 30, 2001 at 11:49:06AM -0400, Andy Williams wrote:
> On Wed, 30 May 2001, Greg McCarroll wrote:
> > * Andy Williams ([EMAIL PROTECTED]) wrote:
> > > So I guess [EMAIL PROTECTED] is invalid even though it works. Weird!
> > it's not the email address that's broken, it's your SMTP server ;-)
> Could be right, it's sendmail :(

Exim allows it too, surprisingly. I don't know about qm**l or postfix.

Of course, "andyw."@hillway.com is actually valid. :)

MBM




Re: Email::Valid

2001-05-30 Thread Andy Williams





This man is not guilty of manslaughter, he is only guilty
of being Arnold J. Rimmer. That is his crime... it is also
his punishment.


On Wed, 30 May 2001, Matthew Byng-Maddick wrote:

<snip>

> You are correct in that these should all be invalid.

Great.

> > and that this was INVALID:
> > [EMAIL PROTECTED]
>
> It is.
Damn

<another snip>
Thanks for the RFC... I think!
> Be conservative in what you send and liberal in what you accept

I will...

Andy




RE: Email::Valid

2001-05-30 Thread Scottow Adrian - adscot

Hello,

This can be found in the Owl book - Mastering Regular Expressions or on the
web at: http://public.yahoo.com/~jfriedl/regex/code.html.

If you try running these valid emails through this bit of code, it says they
are all invalid...

Cheers,

Adrian






Re: Email::Valid

2001-05-30 Thread Matthew Byng-Maddick

On Wed, May 30, 2001 at 11:56:56AM -0400, Andy Williams wrote:
> On Wed, 30 May 2001, Matthew Byng-Maddick wrote:
> <snip>
> > You are correct in that these should all be invalid.
> Great.
> > > and that this was INVALID:
> > > [EMAIL PROTECTED]
> > It is.
> Damn
> <another snip>
> Thanks for the RFC... I think!

:)

> > Be conservative in what you send and liberal in what you accept
> I will...

Sorry, that wasn't to you, so much as why the mailer accepts it. It is
something occasionally seen, mostly the people I've seen doing it are
spammers, and should therefore die anyway.

A quick test shows that SAUCE doesn't like it, although I'm going to
have to file a bug report against SAUCE as it doesn't deal properly
with quoting, it accepts the quoted version, though. :)

MBM




Re: Email::Valid

2001-05-30 Thread Andy Williams

On Wed, 30 May 2001, Matthew Byng-Maddick wrote:

> On Wed, May 30, 2001 at 11:56:56AM -0400, Andy Williams wrote:
> > On Wed, 30 May 2001, Matthew Byng-Maddick wrote:
> > <snip>
> > > You are correct in that these should all be invalid.
> > Great.
> > > > and that this was INVALID:
> > > > [EMAIL PROTECTED]
> > > It is.
> > Damn
> > <another snip>
> > Thanks for the RFC... I think!
>
> :)
>
> > > Be conservative in what you send and liberal in what you accept
> > I will...
>
> Sorry, that wasn't to you, so much as why the mailer accepts it. It is
> something occasionally seen, mostly the people I've seen doing it are
> spammers, and should therefore die anyway.
>
> A quick test shows that SAUCE doesn't like it, although I'm going to
> have to file a bug report against SAUCE as it doesn't deal properly
> with quoting, it accepts the quoted version, though. :)


Surprise, surprise... MS Exchange accepts it!

Andy




Re: Email::Valid

2001-05-30 Thread Matthew Robinson

From: Dominic Mitchell [EMAIL PROTECTED]
Sent: Wednesday, May 30, 2001 4:45 PM


> On Wed, May 30, 2001 at 11:40:03AM -0400, Andy Williams wrote:
> > All the ones that claimed to be valid from E::V failed ckaddr!
> > [EMAIL PROTECTED] had this result from ckaddr:
> > user: andyw. is good
> > host: hillway.com is good
> > address `[EMAIL PROTECTED]' is bad: rfc822 failure
> >
> > So I guess [EMAIL PROTECTED] is invalid even though it works. Weird!
>
> What is valid on the left hand side of an email address is extremely
> weird anyway.  Practically anything is allowed.  A pseudo grammar for
> them is in RFC822.  There's also much fun trying to parse them in
> Friedl's book on regular expressions (the owl book).  He ends up with a
> mammoth 5k regex to parse email addresses...
>
> -Dom


Having just had a look at E::V it looks like the module is using the
'mammoth 5k regex'.  I prefer the regex that is given in CGI Programming
with Perl.  This regex is designed to accept the more common address
formats.

RFC822 will allow all of the following (taken from CGI Programming with
Perl) and was designed to accept all the addresses in use in 1982:

Alfred Neuman <Neuman@BBN-TENEXA>
":sysmail"@ Some-Group. Some-Org
Muhammed.(I am the greatest) Ali @(the)vegas.WBA

I have checked the following code against the original test cases, which
E::V reported as valid, and it rejects every address in that list.

sub IsValidAddress {
    my $addr_to_check = shift;

    # Strip whitespace that is not inside a quoted string
    $addr_to_check =~ s/("(?:[^"\\]|\\.)*"|[^\t "]*)[ \t]*/$1/g;

    my $esc         = '\\\\';
    my $space       = '\040';
    my $ctrl        = '\000-\037';
    my $dot         = '\.';
    my $nonASCII    = '\x80-\xff';
    my $CRlist      = '\012\015';
    my $letter      = 'a-zA-Z';
    my $digit       = '\d';

    my $atom_char   = qq{ [^$space<>\@,;:".\\[\\]$esc$ctrl$nonASCII] };
    my $atom        = qq{ $atom_char+ };
    my $byte        = qq{ (?: 1?$digit?$digit |
                              2[0-4]$digit    |
                              25[0-5]         ) };

    my $qtext       = qq{ [^$esc$nonASCII$CRlist"] };
    my $quoted_pair = qq{ $esc [^$nonASCII] };
    my $quoted_str  = qq{ " (?: $qtext | $quoted_pair )* " };

    my $word        = qq{ (?: $atom | $quoted_str ) };
    my $ip_address  = qq{ \\[ $byte (?: $dot $byte ){3} \\] };
    my $sub_domain  = qq{ [$letter$digit]
                          [$letter$digit-]{0,61}
                          [$letter$digit] };
    my $top_level   = qq{ (?: $atom_char ){2,4} };
    my $domain_name = qq{ (?: $sub_domain $dot )+ $top_level };
    my $domain      = qq{ (?: $domain_name | $ip_address ) };
    my $local_part  = qq{ $word (?: $dot $word )* };

    my $address     = qq{ $local_part \@ $domain };

    # Returns the whitespace-stripped address, or "" if it does not match
    return $addr_to_check =~ /^$address$/ox ? $addr_to_check : "";
}


Hope this helps,

Matt
--
s!msfQ!s$utvKs(Q)\1!sfiupoBs^reverse Ibdlfses^#
s$#!uojsqs(.)chr(ord($1)-1)ges(.*)reverse $1see






Re: Email::Valid

2001-05-30 Thread Matthew Byng-Maddick

On Wed, May 30, 2001 at 05:14:15PM +0100, Matthew Robinson wrote:
[IsValidAddress sub, which was built from the RFC822 grammar...]

Unfortunately, a fair few mailers don't allow IP literals as valid
domain-parts anymore, due to abuse.
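
If you want to follow those mailers and refuse IP literals up front, a minimal sketch could look like this. It assumes the domain part is everything after the last "@" and only looks for the bracketed domain-literal form. The helper name has_domain_literal is hypothetical, not something from Email::Valid or any mailer.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the domain part of $address is an
# RFC 822 domain-literal such as [192.0.2.1].
sub has_domain_literal {
    my ($address) = @_;
    # Assumption: everything after the last "@" is the domain part.
    my ($domain) = $address =~ /\@([^\@]*)\z/
        or return 0;
    return $domain =~ /\A\[.*\]\z/s ? 1 : 0;
}

for my $candidate ('user@[192.0.2.1]', 'user@example.com') {
    printf "%-20s %s\n", $candidate,
        has_domain_literal($candidate) ? 'reject (IP literal)' : 'ok';
}
```

This is a policy check layered on top of the syntax check, so an address can be valid under RFC822 and still be refused.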

MBM