On 12/06/2016 21:14, Bill Cole wrote:

I was not at all confused, but sometimes when people are Wrong On The Internet in special ways I cannot resist the urge to respond with a paraphrased geek meme...

Look up Jamie Zawinski's famous "2 problems" quote regarding regular expressions. It is a perfect fit for the application of regular expressions to address validation.

It is actually for another software project (a mail server)

Please don't take this as derogatory, because I DO NOT mean it to be, but can you explain why the world needs yet another new mail server implementation?

As an example of why I ask this, consider that Microsoft rewrote the SMTP implementation in Exchange 2013 and did it wrong, breaking multi-recipient message handling. I guess they had some reason, but the point is that new code means new bugs, even when you have an elaborate QA organization in place to prevent that.

that, being a mail server, must ensure email addresses are valid.

Not really. It needs to make sure that it never generates invalid addresses and it probably should check addresses in its inputs for types of invalidity that your later code will assume not to be present, but those are both far from a need to validate addresses perfectly (or even near-perfectly) to the RFC specification. Having a logical set of addresses that you'd never generate but will still blindly and harmlessly work with, some of which may not fit the RFC specs, is a NON-PROBLEM.

Even if you wanted to draw an RFC-perfect boundary between valid and invalid addresses, complex regular expressions are a poor tool for that because the logic of REs doesn't align with that of the ABNF used in RFCs. A single regular expression CANNOT precisely match the whole RFC822/2822/5322 address space. The closest approximation in Perl RE is huge, indecipherable, and machine-generated. It also cannot deal with nested comments, a valid albeit pathological address structure under the ABNF definition. In POSIX RE the problems are MUCH worse.
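
To illustrate the nested-comment point: matching arbitrarily nested parentheses needs a depth counter, which a classical RE does not have, while a few lines of ordinary code handle it directly. This is only a rough sketch in Python; the function name `skip_comment` is made up for illustration, and it deliberately ignores folding whitespace and the obsolete syntax.

```python
def skip_comment(text: str, pos: int = 0) -> int:
    """Return the index just past the comment that starts at text[pos].

    Returns -1 if there is no comment at pos or it is never closed.
    Hypothetical helper: tracks nesting depth and quoted-pairs only.
    """
    if pos >= len(text) or text[pos] != "(":
        return -1
    depth = 0
    i = pos
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # quoted-pair: the next character is escaped, skip both
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return -1  # ran off the end with the comment still open
```

Calling `skip_comment("(a (nested) comment) rest")` returns the index of the space before `rest`, and an unbalanced input such as `"(unclosed (comment)"` returns -1.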

On the other hand, you COULD use very simple REs to serially and recursively decompose addresses into the constructs defined by the ABNF spec, using the same logic as the spec to validate addresses. This is not as interesting a "problem" as writing the One True RFC822 RE, but it is a fairly trivial coding exercise and would run more efficiently than a single RE with the benefit of being more readable and debuggable.
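
A minimal sketch of that decomposition, in Python, for the bare addr-spec only. The assumptions are loud ones: it skips comments and folding whitespace, ignores the obsolete syntax, and does not handle display names or groups. All names here (`ATOM`, `QUOTED`, `DOMAIN_LITERAL`, `is_dot_atom`, `is_valid_addr_spec`) are invented for the example. Each RE covers one small ABNF construct and the surrounding code follows the grammar's structure.

```python
import re

# atext, per RFC 5322 section 3.2.3
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
ATOM = re.compile(rf"{ATEXT}+")
# quoted-string: DQUOTE *(qtext / WSP / quoted-pair) DQUOTE
QUOTED = re.compile(r'"(?:[\x21\x23-\x5b\x5d-\x7e \t]|\\[\x21-\x7e \t])*"')
# domain-literal: "[" *dtext "]"
DOMAIN_LITERAL = re.compile(r"\[[\x21-\x5a\x5e-\x7e]*\]")


def is_dot_atom(text: str) -> bool:
    """dot-atom-text = 1*atext *("." 1*atext)"""
    return all(ATOM.fullmatch(part) for part in text.split("."))


def is_valid_addr_spec(addr: str) -> bool:
    """addr-spec = local-part "@" domain (simplified, see assumptions above)."""
    if addr.startswith('"'):
        # local-part is a quoted-string; it may itself contain "@"
        m = QUOTED.match(addr)
        if not m:
            return False
        rest = addr[m.end():]
    else:
        # local-part is a dot-atom, which cannot contain "@"
        local, sep, domain = addr.partition("@")
        if not sep or not is_dot_atom(local):
            return False
        rest = "@" + domain
    if not rest.startswith("@"):
        return False
    domain = rest[1:]
    return is_dot_atom(domain) or bool(DOMAIN_LITERAL.fullmatch(domain))
```

With this sketch, `user@example.com`, `"a@b"@example.com` and `user@[192.0.2.1]` are accepted, while `a..b@example.com` and `user@` are rejected. When one of the small REs turns out to be wrong, it can be tested and fixed in isolation.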

I quoted the regexp in context of showing my point about how 'squiggly' they can be and that I am able to read them.....to a point. (I was proud because 'googling' around for a regex email address validator string shows some VERY suspicious and extortionately, seemingly unnecessarily, long offerings. So I had a go myself).

And just like a hilariously long list of predecessors, you came up with an RE which fails to precisely reproduce the ABNF definition of a valid address for message headers. This is why you now have 2 problems:

1. The one you invented of needing to precisely validate email addresses to an RFC specification that is not a perfect match for the addressing supported by any coherent package of production-grade mail software.

2. A regular expression that is absurdly complex which you incorrectly believe solves (1) while in fact it does not. It is maybe good enough, but maybe not. It's an untestable approximation of its design goal, which is an intrinsic problem for software.




.......AND relax!
