RE: [PHP] Re: email validation (no regex)
J B wrote:
> On 9/21/05, Michael Sims [EMAIL PROTECTED] wrote:
>> Additionally, some mail servers unconditionally accept mail addressed to ANY username at their domain, whether that user actually exists or not. This is very bad practice, because it usually means the accepting MTA is a dumb host that has to forward all incoming mail to an internal mail server which knows which accounts exist, and if that server ends up rejecting the message, the dumb MTA creates a DSN and sends it back to the envelope sender (which is quite often forged). This causes the so-called backscatter which results in innocent people getting bounces for messages they didn't send. Nevertheless, lots of mail servers are configured this way, so you cannot simply assume that an account is real just because you didn't get a 5xx on RCPT TO.
>
> Just as a side note, and I do agree that this behaviour is bad practice in principle, but I imagine they (the MTAs) do this for the same reason that login prompts don't tell you when you enter a bogus username and still prompt for the password and give a generic access denied error...it prevents username fishing.

There probably are a few people who accept mail to any address at their domain to foil dictionary attacks, but IMHO the vast majority of servers that are set up this way are due to mail admins who just don't know any better. It's not always easy to set up a border MTA so that it knows about the accounts that exist on an internal machine...it usually involves custom scripting or real-time callouts to the internal server and it takes a relatively knowledgeable admin to implement it (at least that has been my experience).

I had someone else email me privately saying that they did the above precisely to foil dictionary attacks, but this person configured his server to simply discard email to nonexistent accounts. That has its disadvantages (since it could make legit senders believe their messages are being delivered when they aren't) but at least it doesn't create any backscatter. In the default case, accepting all email unconditionally then later rejecting it is just irresponsible, since it makes you a vector for abuse, and could eventually get you blacklisted if other mail servers get sick of receiving bogus bounces from your domain...

(As a side note, apparently the list software doesn't like the off-topic nature of this sub-thread (I just received a 550 on this message), so this will be my last post on the matter. But since I've gone to the trouble of typing it up let me throw in the words PHP, web, and Apache, so this will make it through. :) )

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
RE: [PHP] Re: email validation (no regex)
>> So, what is the general thought about validating email addresses in this manner? JM

> There is a good reason why virtually everyone uses regex patterns for email validating.

Excellent start! And that good reason is...? How can regex ensure that the email address that is submitted is a valid (i.e. working, able to receive email) address? Why is regex a better way?

JM
RE: [PHP] Re: email validation (no regex)
jim...

validating email means different things to different people... but there's no way you're going to be able to 'throw' together something in 2-3 days that others have taken years to create/refine...

if you only want to determine if an email address is valid, what does that mean to you? are you following the current/latest rfc 2822 (i think) standard? or are you just trying to get a quick halfway ok function...

as an example, i was looking at a way of using a regex/function for email validation for a user input form... i decided that it was simply too tough to deal with the various nuances, and chickened out, using a combination perl/php approach...

but you could do what you want to do. however, it's going to be painful if you want it to match the rfc spec...

good luck...

-bruce

ps. take a look at perl's Email::Valid module if you want to get a feel for how extensive this task can get...

-Original Message-
From: Jim Moseby [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 21, 2005 11:01 AM
To: 'Al'; php-general@lists.php.net
Subject: RE: [PHP] Re: email validation (no regex)

>> So, what is the general thought about validating email addresses in this manner? JM

> There is a good reason why virtually everyone uses regex patterns for email validating.

Excellent start! And that good reason is...? How can regex ensure that the email address that is submitted is a valid (i.e. working, able to receive email) address? Why is regex a better way?

JM
Re: [PHP] Re: email validation (no regex)
Jim Moseby said the following on 09/21/05 11:00:
>>> So, what is the general thought about validating email addresses in this manner? JM
>> There is a good reason why virtually everyone uses regex patterns for email validating.
> Excellent start! And that good reason is...? How can regex ensure that the email address that is submitted is a valid (i.e. working, able to receive email) address? Why is regex a better way?

Personally I would go for a combination. Regex is much faster, so if you can eliminate fake addresses with regex you won't have to waste your time attempting to look up MX records or connect to mail servers that don't exist.

My apologies for the line wrapping, but the following is a slightly modified function I found online and have been using for a while. It doesn't actually connect to the remote server and try sending to the address provided like your function does; it merely checks for a valid MX for the domain. The extra time spent attempting a fake send to an address was deemed not worth the bother, as some mail servers (especially qmail) do not, by default or without patching, block messages from being sent to nonexistent email addresses. Instead the message is accepted and bounced. Your method will not detect this.
- Ben

function isValidEmail($address, $checkMX = false)
{
    // Return true or false depending on whether the email address is valid
    $valid_tlds = array("arpa", "biz", "com", "edu", "gov", "int", "mil", "net", "org", "aero",
        "ad", "ae", "af", "ag", "ai", "al", "am", "an", "ao", "aq", "ar", "as", "at", "au", "aw", "az",
        "ba", "bb", "bd", "be", "bf", "bg", "bh", "bi", "bj", "bm", "bn", "bo", "br", "bs", "bt", "bv", "bw", "by", "bz",
        "ca", "cc", "cf", "cd", "cg", "ch", "ci", "ck", "cl", "cm", "cn", "co", "cr", "cs", "cu", "cv", "cx", "cy", "cz",
        "de", "dj", "dk", "dm", "do", "dz", "ec", "ee", "eg", "eh", "er", "es", "et",
        "fi", "fj", "fk", "fm", "fo", "fr", "fx",
        "ga", "gb", "gd", "ge", "gf", "gh", "gi", "gl", "gm", "gn", "gp", "gq", "gr", "gs", "gt", "gu", "gw", "gy",
        "hk", "hm", "hn", "hr", "ht", "hu", "id", "ie", "il", "in", "io", "iq", "ir", "is", "it",
        "jm", "jo", "jp", "ke", "kg", "kh", "ki", "km", "kn", "kp", "kr", "kw", "ky", "kz",
        "la", "lb", "lc", "li", "lk", "lr", "ls", "lt", "lu", "lv", "ly",
        "ma", "mc", "md", "mg", "mh", "mk", "ml", "mm", "mn", "mo", "mp", "mq", "mr", "ms", "mt", "mu", "mv", "mw", "mx", "my", "mz",
        "na", "nc", "ne", "nf", "ng", "ni", "nl", "no", "np", "nr", "nt", "nu", "nz", "om",
        "pa", "pe", "pf", "pg", "ph", "pk", "pl", "pm", "pn", "pr", "pt", "pw", "py", "qa",
        "re", "ro", "ru", "rw",
        "sa", "sb", "sc", "sd", "se", "sg", "sh", "si", "sj", "sk", "sl", "sm", "sn", "so", "sr", "st", "su", "sv", "sy", "sz",
        "tc", "td", "tf", "tg", "th", "tj", "tk", "tm", "tn", "to", "tp", "tr", "tt", "tv", "tw", "tz",
        "ua", "ug", "uk", "um", "us", "uy", "uz", "va", "vc", "ve", "vg", "vi", "vn", "vu",
        "wf", "ws", "ye", "yt", "yu", "za", "zm", "zr", "zw",
        "coop", "info", "museum", "name", "pro");

    // Rough email address validation using POSIX-style regular expressions
    if (!eregi("[EMAIL PROTECTED],}\.[a-z0-9\-\.]{2,}$", $address)) {
        return false;
    } else {
        $address = strtolower($address);
    }

    // Explode the address on name and domain parts
    $name_domain = explode("@", $address);

    // There can be only one ;-) I mean... the @ symbol
    if (count($name_domain) != 2)
        return false;

    // Check the domain parts
    $domain_parts = explode(".", $name_domain[1]);
    if (count($domain_parts) < 2)
        return false;

    // Check the TLD ($domain_parts[count($domain_parts) - 1])
    if (!in_array($domain_parts[count($domain_parts) - 1], $valid_tlds))
        return false;

    // Search DNS for MX records corresponding to the domain ($name_domain[1])
    if ($checkMX && !getmxrr($name_domain[1], $mxhosts))
        return false;

    return true;
}
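As a rough illustration of the combination approach described in this message (a cheap syntax check first, a DNS lookup only if that passes), here is a minimal sketch. It is not the function posted above: it swaps the regex and TLD list for PHP's built-in `filter_var()`, and the name `looksDeliverable` and the injectable `$hasMailHost` callable are assumptions made for illustration.

```php
<?php
// Hypothetical helper, not from the thread: syntax check first, DNS second.
// $hasMailHost is an injectable callable so the DNS step can be stubbed or skipped.
function looksDeliverable(string $address, ?callable $hasMailHost = null): bool
{
    // Step 1: local syntax check, no network traffic.
    if (filter_var($address, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }

    // Step 2: optional DNS check on the part after the last '@'.
    if ($hasMailHost === null) {
        return true;
    }
    $domain = substr($address, strrpos($address, '@') + 1);
    return (bool) $hasMailHost($domain);
}

// A DNS-backed call might look like this (assumes outbound DNS is available):
// looksDeliverable($input, function ($d) { return checkdnsrr($d, 'MX'); });
```

Keeping the DNS step behind a callable means the fast path never touches the network, which is the point Ben makes about not wasting lookups on obviously malformed input.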
RE: [PHP] Re: email validation (no regex)
> jim... validating email means different things to different people...

True, but for the most part people just want to know whether a user has entered a real working email address into their forms. What better test than to try to send an email to it?

> but there's no way you're going to be able to 'throw' together something in 2-3 days that others have taken years to create/refine...

I threw the example I posted together in about 10 minutes (and it shows :). Even though I'm not at a place where I can test it right now, I think it will work with some tweaking.

> if you only want to determine if an email address is valid, what does that mean to you? are you following the current/latest rfc 2822 (i think) standard? or are you just trying to get a quick halfway ok function...

Of course the SMTP standard would have to be followed; I typed what you see from memory, just as a conceptual model.

> as an example, i was looking at a way of using a regex/function for email validation for a user input form... i decided that it was simply too tough to deal with the various nuances, and chickened out, using a combination perl/php approach...

So what do you get from them that my function would not give you?

> but you could do what you want to do. however, it's going to be painful if you want it to match the rfc spec...

Really? Why does it need to be painful? I just need to do an 'EHLO', 'Mail From:', 'RCPT to:' and 'QUIT'. It's not going to actually send an email. Seems simple to me. Maybe there's something else in the spec that I don't see?

> good luck...

Thanks. :o)

> ps. take a look at perl's Email::Valid module if you want to get a feel for how extensive this task can get...

My question is why does it have to be so complicated? SMTP servers are the best email validation devices known to man. Why not let them do the dirty work?

JM -- playing devil's advocate :o)
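The EHLO / MAIL FROM / RCPT TO / QUIT dialogue Jim describes could be sketched roughly as below. This is not Jim's posted function (which is not reproduced in this part of the thread); the function names, the ten-second timeout, and the `$heloName` and `$probeSender` parameters are all assumptions for illustration. It also assumes outbound connections to port 25 are permitted, which many networks block.

```php
<?php
// Hypothetical sketch: map an SMTP reply line to a coarse outcome.
function classifyReply(string $replyLine): string
{
    $code = (int) substr($replyLine, 0, 3);
    if ($code >= 200 && $code < 300) {
        return 'accepted';   // e.g. 250; acceptance is not proof the mailbox exists
    }
    if ($code >= 400 && $code < 500) {
        return 'try-later';  // 4xx: temporary failure
    }
    if ($code >= 500 && $code < 600) {
        return 'rejected';   // 5xx: permanent failure, e.g. 550
    }
    return 'unknown';
}

// Read one SMTP reply; continuation lines have '-' after the code, the last has ' '.
function readReply($socket): string
{
    $line = '';
    while (($next = fgets($socket, 1024)) !== false) {
        $line = $next;
        if (strlen($line) < 4 || $line[3] === ' ') {
            break;
        }
    }
    return $line;
}

// Walk the dialogue up to RCPT TO and quit without ever sending DATA.
function probeRecipient(string $mailHost, string $address, string $heloName, string $probeSender): string
{
    $socket = @fsockopen($mailHost, 25, $errno, $errstr, 10);
    if ($socket === false) {
        return 'unknown';
    }
    stream_set_timeout($socket, 10);
    readReply($socket); // server greeting

    $verdict = 'unknown';
    $steps = ["EHLO $heloName", "MAIL FROM:<$probeSender>", "RCPT TO:<$address>"];
    foreach ($steps as $i => $command) {
        fwrite($socket, $command . "\r\n");
        $outcome = classifyReply(readReply($socket));
        if ($i === 2) {
            $verdict = $outcome;  // reply to RCPT TO
        } elseif ($outcome !== 'accepted') {
            break;                // EHLO or MAIL FROM was refused
        }
    }
    fwrite($socket, "QUIT\r\n");
    fclose($socket);
    return $verdict;
}
```

Only a 'rejected' result carries much information here; an 'accepted' result leaves open whether the mailbox actually exists.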
RE: [PHP] Re: email validation (no regex)
>> What you have is virtually impossible to determine if all legitimate possibilities are covered. email validation using regex is a very heavily analyzed subject. Google regex email validate and you'll find loads of expressions. Look at the Zend article, it provides some insight.

> I fully understand about the almost limitless possibilities. Googling the subject returns results more mind boggling than the regex itself. :o) Do ANY of the regex examples you have found cover all those possibilities? If so, why are there so many different approaches? For most applications, where you will only be validating a small number of emails in a given day, why put yourself to all the regex pain, still to not have covered all the possibilities? In the end, with regards to email validation, all most people need is to know that a given email has a proper username, just 1 '@' in the middle, and a valid domain. If it doesn't, it's a bogus email address.

As to that, why not validate the email address by sending an automated message to the supplied account, requiring the person to click on a validation link? Easy, simple, works better than either method currently being discussed, purely for its simplicity, if nothing else.

Much warmth,
Murray
---
Lost in thought...
http://www.planetthoughtful.org
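The validation-link idea Murray suggests might be sketched as follows. Nothing here comes from the thread: the function names, the `token` query parameter, and the choice to store only a SHA-256 hash of the token are assumptions, and sending the mail and persisting the hash are left to the caller.

```php
<?php
// Hypothetical sketch: issue a one-time token for a confirmation link.
function issueConfirmation(string $baseUrl): array
{
    $token = bin2hex(random_bytes(16));           // 32 hex characters
    return [
        'link' => $baseUrl . '?token=' . $token,  // goes into the email
        'hash' => hash('sha256', $token),         // goes into storage
    ];
}

// Check the token submitted when the user clicks the link.
function confirmationMatches(string $storedHash, string $submittedToken): bool
{
    // hash_equals() compares in constant time.
    return hash_equals($storedHash, hash('sha256', $submittedToken));
}
```

Unlike a regex or an SMTP probe, a clicked link shows that the person who filled in the form can actually read mail at that address.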
RE: [PHP] Re: email validation (no regex)
because you should want/need to validate that the address is correct prior to determining if the email server is up and running...

the regex function simply allows you to quickly determine if the address is valid... doesn't mean that it's going to go to an actual live user...!!

btw simply checking for a single '@' with a domain doesn't do it... what if the user has '[EMAIL PROTECTED]' or '[EMAIL PROTECTED]'. will your regex accept/deny this???

welcome to the world of email validation

-bruce

-Original Message-
From: Murray @ PlanetThoughtful [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 21, 2005 12:01 PM
To: 'Jim Moseby'; php-general@lists.php.net
Subject: RE: [PHP] Re: email validation (no regex)

>> What you have is virtually impossible to determine if all legitimate possibilities are covered. email validation using regex is a very heavily analyzed subject. Google regex email validate and you'll find loads of expressions. Look at the Zend article, it provides some insight.

> I fully understand about the almost limitless possibilities. Googling the subject returns results more mind boggling than the regex itself. :o) Do ANY of the regex examples you have found cover all those possibilities? If so, why are there so many different approaches? For most applications, where you will only be validating a small number of emails in a given day, why put yourself to all the regex pain, still to not have covered all the possibilities? In the end, with regards to email validation, all most people need is to know that a given email has a proper username, just 1 '@' in the middle, and a valid domain. If it doesn't, it's a bogus email address.

As to that, why not validate the email address by sending an automated message to the supplied account, requiring the person to click on a validation link? Easy, simple, works better than either method currently being discussed, purely for its simplicity, if nothing else.

Much warmth,
Murray
---
Lost in thought...
http://www.planetthoughtful.org
RE: [PHP] Re: email validation (no regex)
> because you should want/need to validate that the address is correct prior to determining if the email server is up and running...
>
> the regex function simply allows you to quickly determine if the address is valid... doesn't mean that it's going to go to an actual live user...!!
>
> btw simply checking for a single '@' with a domain doesn't do it... what if the user has '[EMAIL PROTECTED]' or '[EMAIL PROTECTED]'. will your regex accept/deny this???
>
> welcome to the world of email validation
>
> -bruce
>
>> As to that, why not validate the email address by sending an automated message to the supplied account, requiring the person to click on a validation link? Easy, simple, works better than either method currently being discussed, purely for its simplicity, if nothing else.

I agree, so basic validation is A Good Thing. However, the most desirable form of validation would have to be: can I send a legitimate email to that account and receive acknowledgement that it's working by having the user click on a validation link?

Much warmth,
Murray
---
Lost in thought...
http://www.planetthoughtful.org
RE: [PHP] Re: email validation (no regex)
> btw simply checking for a single '@' with a domain doesn't do it... what if the user has '[EMAIL PROTECTED]' or '[EMAIL PROTECTED]'. will your regex accept/deny this???

My function will quickly deny those because the DNS lookup for them will immediately fail. Will your regex deny '[EMAIL PROTECTED]'? Should it?

> welcome to the world of email validation

That's your world. Mine is much simpler. :o)

Seriously, I think Ben and Manuel have it right. A combination approach is probably most effective (and complex). I was hoping for a simple solution for the regex challenged. Of course the old tried and true validation email that requires the user to validate himself is the most fool-proof method, but that's not an on-the-fly solution.

JM
RE: [PHP] Re: email validation (no regex)
> because you should want/need to validate that the address is correct prior to determining if the email server is up and running...
>
> the regex function simply allows you to quickly determine if the address is valid... doesn't mean that it's going to go to an actual live user...!!
>
> btw simply checking for a single '@' with a domain doesn't do it... what if the user has '[EMAIL PROTECTED]' or '[EMAIL PROTECTED]'. will your regex accept/deny this???
>
> welcome to the world of email validation
>
> -bruce
>
>> As to that, why not validate the email address by sending an automated message to the supplied account, requiring the person to click on a validation link? Easy, simple, works better than either method currently being discussed, purely for its simplicity, if nothing else.

I agree, so basic validation is A Good Thing. However, the most desirable form of validation would have to be: can I send a legitimate email to that account and receive acknowledgement that it's working by having the user click on a validation link? After all, for all the regex / interrogation you perform, you still can't be certain that the user entered an account *they own*.

See? Sending a validation email is *also* A Good Thing!

Much warmth,
Murray
---
Lost in thought...
http://www.planetthoughtful.org
RE: [PHP] Re: email validation (no regex)
>> but you could do what you want to do. however, it's going to be painful if you want it to match the rfc spec...

> Really? Why does it need to be painful? I just need to do a 'EHLO', 'Mail From:' and 'RCPT to:' and 'QUIT'. It's not going to actually send an email. Seems simple to me. Maybe there's something else in the spec that I don't see?

Some mail servers can be configured to not reject the email until the end of DATA. I know you can do this in postfix. Although if the user is invalid, why you'd wait I don't know, but it is possible.

-philip
RE: [PHP] Re: email validation (no regex)
Philip Hallstrom wrote:
>>> but you could do what you want to do. however, it's going to be painful if you want it to match the rfc spec...
>>
>> Really? Why does it need to be painful? I just need to do a 'EHLO', 'Mail From:' and 'RCPT to:' and 'QUIT'. It's not going to actually send an email. Seems simple to me. Maybe there's something else in the spec that I don't see?
>
> Some mail servers can be configured to not reject the email until the end of DATA. I know you can do this in postfix. Although if the user is invalid, why you'd wait I don't know, but it is possible.

Additionally, some mail servers unconditionally accept mail addressed to ANY username at their domain, whether that user actually exists or not. This is very bad practice, because it usually means the accepting MTA is a dumb host that has to forward all incoming mail to an internal mail server which knows which accounts exist, and if that server ends up rejecting the message, the dumb MTA creates a DSN and sends it back to the envelope sender (which is quite often forged). This causes the so-called backscatter which results in innocent people getting bounces for messages they didn't send.

Nevertheless, lots of mail servers are configured this way, so you cannot simply assume that an account is real just because you didn't get a 5xx on RCPT TO.
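One way to account for accept-everything servers when interpreting an RCPT TO probe is to also probe a random mailbox that almost certainly does not exist and compare the two replies. This technique is not proposed in the thread; the sketch below, with its hypothetical function names and verdict strings, only shows the decision logic on two reply codes that have already been obtained.

```php
<?php
// Hypothetical sketch: combine two RCPT TO reply codes into one verdict.
// $realReply  is the code returned for the address under test;
// $bogusReply is the code returned for a random mailbox at the same domain.
function interpretProbe(int $realReply, int $bogusReply): string
{
    if ($realReply >= 500) {
        return 'rejected';       // permanent failure for the real address
    }
    if ($realReply >= 400 || $bogusReply < 500) {
        return 'inconclusive';   // temporary failure, or server does not refuse unknown users
    }
    return 'probably-exists';    // real address accepted, bogus one refused
}

// A random local part for the bogus probe (the naming scheme is an assumption).
function randomLocalPart(): string
{
    return 'probe-' . bin2hex(random_bytes(8));
}
```

Even a 'probably-exists' verdict is only a hint, and repeated probing of other people's servers may itself be treated as abuse.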
RE: [PHP] Re: email validation (no regex)
>> -Original Message-
>> From: Jim Moseby [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, September 21, 2005 12:21 PM
>> To: php-general@lists.php.net
>> Subject: RE: [PHP] Re: email validation (no regex)
>>
>>> btw simply checking for a single '@' with a domain doesn't do it... what if the user has '[EMAIL PROTECTED]' or '[EMAIL PROTECTED]'. will your regex accept/deny this???
>>
>> My function will quickly deny those because the DNS lookup for them will immediately fail. Will your regex deny '[EMAIL PROTECTED]'? Should it?
>>
>>> welcome to the world of email validation
>>
>> That's your world. Mine is much simpler. :o) Seriously, I think Ben and Manuel have it right. A combination approach is probably most effective (and complex). I was hoping for a simple solution for the regex challenged. Of course the old tried and true validation email that requires the user to validate himself is the most fool-proof method, but that's not an on-the-fly solution.
>
> jim... these are valid emails... as defined by the rfc.. so your function would be in error..

This is where I think you and I are not connecting. I don't care if they are valid according to the RFC. I want to know if they are likely to be *WORKING* email addresses. And so, from that perspective, my function would not necessarily be in error, but working as designed.

Others have brought up truly valid points with regards to the reliability of it though. Different quirks of MTA configuration and function are difficult to overcome. I have learned you cannot rely on 'RCPT To:' responding with a '250' as verification that it is a valid user. I have learned that a domain need not have an MX record at all to receive mail.

Learning is why I'm here, and why I posted this question. Thank you for your input.

JM
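The point that a domain can receive mail without any MX record (senders fall back to the domain's own address record, the "implicit MX" rule in the SMTP specification) could be handled along these lines. The function names are hypothetical, and the DNS-backed wrapper assumes working DNS resolution.

```php
<?php
// Hypothetical sketch: order MX hosts by preference, else fall back to the domain itself.
function orderMailHosts(string $domain, array $mxHosts, array $mxWeights, bool $hasAddressRecord): array
{
    if ($mxHosts !== []) {
        // Lowest preference value first; both arrays must have the same length.
        array_multisort($mxWeights, SORT_ASC, SORT_NUMERIC, $mxHosts);
        return $mxHosts;
    }
    // No MX records: the domain's own address record acts as an implicit MX.
    return $hasAddressRecord ? [$domain] : [];
}

// DNS-backed wrapper around the pure function above.
function mailHostsFor(string $domain): array
{
    $hosts = [];
    $weights = [];
    getmxrr($domain, $hosts, $weights);
    $hasAddress = checkdnsrr($domain, 'A') || checkdnsrr($domain, 'AAAA');
    return orderMailHosts($domain, $hosts, $weights, $hasAddress);
}
```

A check that returns false as soon as `getmxrr()` finds nothing would wrongly reject such domains, which is the gap this fallback is meant to cover.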
Re: [PHP] Re: email validation (no regex)
On 9/21/05, Michael Sims [EMAIL PROTECTED] wrote:
> Additionally, some mail servers unconditionally accept mail addressed to ANY username at their domain, whether that user actually exists or not. This is very bad practice, because it usually means the accepting MTA is a dumb host that has to forward all incoming mail to an internal mail server which knows which accounts exist, and if that server ends up rejecting the message, the dumb MTA creates a DSN and sends it back to the envelope sender (which is quite often forged). This causes the so-called backscatter which results in innocent people getting bounces for messages they didn't send. Nevertheless, lots of mail servers are configured this way, so you cannot simply assume that an account is real just because you didn't get a 5xx on RCPT TO.

Just as a side note, and I do agree that this behaviour is bad practice in principle, but I imagine they (the MTAs) do this for the same reason that login prompts don't tell you when you enter a bogus username and still prompt for the password and give a generic access denied error...it prevents username fishing.

Of course, I would think that a better solution would be to do immediate rejection and then block the remote IP after X send attempts with invalid usernames, but maybe there's a compelling reason not to do that and I just haven't thought of it...
Re: [PHP] Re: email validation (no regex)
J B wrote:
> On 9/21/05, Michael Sims [EMAIL PROTECTED] wrote:
>> Additionally, some mail servers unconditionally accept mail addressed to ANY username at their domain, whether that user actually exists or not. This is very bad practice, because it usually means the accepting MTA is a dumb host that has to forward all incoming mail to an internal mail server which knows which accounts exist, and if that server ends up rejecting the message, the dumb MTA creates a DSN and sends it back to the envelope sender (which is quite often forged). This causes the so-called backscatter which results in innocent people getting bounces for messages they didn't send. Nevertheless, lots of mail servers are configured this way, so you cannot simply assume that an account is real just because you didn't get a 5xx on RCPT TO.
>
> Just as a side note, and I do agree that this behaviour is bad practice in principle, but I imagine they (the MTAs) do this for the same reason that login prompts don't tell you when you enter a bogus username and still prompt for the password and give a generic access denied error...it prevents username fishing.
>
> Of course, I would think that a better solution would be to do immediate rejection and then block the remote IP after X send attempts with invalid usernames, but maybe there's a compelling reason not to do that and I just haven't thought of it...

If someone else on my ISP tries to username fish and gets my ISP's MTA's IP blocked by any other MTA, I'd sure be pissed off about it.

That's probably the reason why they don't block remote IPs after X invalid username send attempts -- MTAs are often shared by many, many users.

--
Jasper Bryant-Greene
Freelance web developer
http://jasper.bryant-greene.name/