On Fri, 1 Oct 2010 09:34:01 pm Norman Khine wrote:
> hello, i have this code
>
> http://pastie.org/1193091
>
> i would like to extend this so that it validates TLD's such as
> .travel and .museum, i can do this by changing {2,4} to {2,7} but
> this sort of defeats the purpose of validating the correct email
> address.

The only good advice for using regular expressions to validate email addresses is... Don't. Just don't even try.

The only way to validate an email address is to actually try to send email to it and see if it can be delivered. That is the ONLY way to know if an address is valid.

First off, even if you could easily detect invalid addresses -- and you can't, but for the sake of the argument let's pretend you can -- it wouldn't help you at all. [email protected] is syntactically valid, but I guarantee that it will *never* be deliverable. [email protected] is syntactically correct, and it *could* be a real address, but if you can actually deliver mail to it, I'll eat my hat.

If you absolutely must try to detect syntactically invalid addresses, the most you should bother to do is check that the string isn't blank. If you don't care about local addresses, you can also check that it contains at least one @ sign. (A little-known fact is that email addresses can contain multiple @ signs.) Other than that, leave it up to the mail server to validate the address -- which it does by trying to deliver mail to it.

Somebody has created a Perl regex to validate *some* email addresses. Even this one doesn't accept all valid addresses, although it comes close:

http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

Read it and weep.

See here for more info:

http://northernplanets.blogspot.com/2007/03/how-not-to-validate-email-addresses.html

This is exactly the sort of thing that you should avoid like the plague:

http://www.regular-expressions.info/email.html

This is full of bad advice. This clown arrogantly claims that his regex "matches any email address". It doesn't. He then goes on to admit "my claim only holds true when one accepts my definition of what a valid email address really is". Oh really? What about the RFC that *defines* what email addresses are?
Shouldn't that count for more than the misinformed opinion of somebody who arrogantly dismisses bug reports for his regex because it "matches 99% of the email addresses in use today"?

99% sounds like a lot, but if you have 20,000 people using your software, that's 200 whose valid email address will be misidentified. He goes on to admit that his regex wrongly rejects .museum addresses, but he considers that acceptable. He seriously suggests that it would be a good idea for your program to list all the TLDs, and even all the country codes, even though "by the time you read this, the list might already be out of date".

This is shonky programming. Avoid it like poison.

--
Steven D'Aprano
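For anyone who wants the minimal check from this post spelled out (reject blank strings, and require at least one @ sign unless local addresses are acceptable), here is a rough sketch in Python. The function name passes_minimal_check and the allow_local flag are made up for illustration; they come from no library and were not part of the original code. Passing this check says nothing about whether mail can be delivered, which still has to be tested by sending it.

```python
def passes_minimal_check(address, allow_local=False):
    """Return True if `address` passes the minimal syntactic checks.

    Hypothetical helper for illustration only: it checks that the
    string is not blank and, unless local addresses are allowed,
    that it contains at least one @ sign.
    """
    # Reject None, empty strings and whitespace-only strings.
    if not address or not address.strip():
        return False
    # Local addresses (no domain part) need no @ sign at all.
    if allow_local:
        return True
    # Require at least one @ sign; more than one is deliberately allowed.
    return "@" in address


if __name__ == "__main__":
    for candidate in ["", "norman", "user@example.museum", '"a@b"@example.com']:
        print(repr(candidate), passes_minimal_check(candidate))
```

Note that the check counts nothing and matches no TLD list, so .travel and .museum addresses pass without any special handling.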
