On 12/06/2016 21:14, Bill Cole wrote:
I was not at all confused, but sometimes when people are Wrong On The
Internet in special ways I cannot resist the urge to respond with a
paraphrased geek meme...
Look up Jamie Zawinski's famous "2 problems" quote regarding regular
expressions. It is a perfect fit for the application of regular
expressions to address validation.
It is actually for another software project (a mail server)
Please don't take this as derogatory, because I DO NOT mean it to be,
but can you explain why the world needs yet another new mail server
implementation?
As an example of why I ask this, consider that Microsoft rewrote the
SMTP implementation in Exchange 2013 and did it wrong, breaking
multi-recipient message handling. I guess they had some reason, but
the point is that new code means new bugs, even when you have an
elaborate QA organization in place to prevent that.
that, being a mail server, must ensure email addresses are valid.
Not really. It needs to make sure that it never generates invalid
addresses and it probably should check addresses in its inputs for
types of invalidity that your later code will assume not to be
present, but those are both far from a need to validate addresses
perfectly (or even near-perfectly) to the RFC specification. Having a
logical set of addresses that you'd never generate but will still
blindly and harmlessly work with, some of which may not fit the RFC
specs, is a NON-PROBLEM.
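
As a rough illustration of checking inputs only for what later code assumes, here is a minimal Python sketch. The assumptions themselves (no control characters, a bounded length, an "@" with something on each side) are invented for the example, as are the names `violates_assumptions`, `CONTROL_CHARS` and `MAX_LENGTH`; a real server would substitute whatever its own downstream code actually relies on.

```python
import re

# Hypothetical downstream assumptions, chosen only for illustration:
#   - no control characters (so CR/LF cannot smuggle in extra header lines)
#   - a bounded overall length
#   - at least one "@" with something on both sides
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
MAX_LENGTH = 254  # illustrative limit, not taken from the post


def violates_assumptions(address: str) -> bool:
    """Return True if the address breaks an assumption later code relies on."""
    if len(address) > MAX_LENGTH:
        return True
    if CONTROL_CHARS.search(address):
        return True
    at = address.rfind("@")
    # rfind returns -1 when there is no "@"; 0 means an empty local part.
    return at <= 0 or at == len(address) - 1
```

Note that this deliberately says nothing about RFC validity: an address can pass this check and still fall outside the ABNF, which is the harmless case described above.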
Even if you wanted to draw an RFC-perfect boundary between valid and
invalid addresses, complex regular expressions are a poor tool for
that because the logic of REs doesn't align with that of the ABNF used in
RFCs. A single regular expression CANNOT precisely match the whole
RFC822/2822/5322 address space. The closest approximation in Perl RE
is huge, indecipherable, and machine-generated. It also cannot deal
with nested comments, a valid albeit pathological address structure
under the ABNF definition. In POSIX RE the problems are MUCH worse.
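
To make the nesting limitation concrete, here is a small Python sketch (Python's `re` module, like POSIX RE, has no recursion construct). The helper name `comment_pattern` is made up for illustration, and the pattern ignores quoted-pairs and the real ctext character set. A pattern built for a fixed depth handles comments nested that deep and no deeper:

```python
import re


def comment_pattern(depth):
    """Build a pattern matching parenthesized comments nested at most `depth` levels."""
    pattern = r"\([^()]*\)"  # depth 1: no nested parentheses at all
    for _ in range(depth - 1):
        # Each extra level has to be spelled out by wrapping the previous pattern.
        pattern = rf"\((?:[^()]|{pattern})*\)"
    return re.compile(pattern)


two_levels = comment_pattern(2)
print(bool(two_levels.fullmatch("(a (b) c)")))      # True
print(bool(two_levels.fullmatch("(a (b (c)) d)")))  # False: three levels deep
```

Whatever depth is picked, an address with one more level of nesting falls outside the pattern.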
On the other hand, you COULD use very simple REs to serially and
recursively decompose addresses into the constructs defined by the
ABNF spec, using the same logic as the spec to validate addresses.
This is not as interesting a "problem" as writing the One True RFC822
RE, but it is a fairly trivial coding exercise and would run more
efficiently than a single RE with the benefit of being more readable
and debuggable.
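
A sketch of that decomposition in Python, under stated assumptions: it covers only the non-obsolete RFC 5322 addr-spec (local-part "@" domain), treats whitespace as plain spaces and tabs rather than full folding whitespace, and ignores display names, groups and length limits. All function and constant names are mine, not from any library. Each RE matches one small ABNF construct, and nested comments are handled by ordinary recursion rather than by the RE engine.

```python
import re

# One small RE per ABNF construct (character ranges follow RFC 5322).
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")
QUOTED_STRING = re.compile(r'"(?:[ \t\x21\x23-\x5b\x5d-\x7e]|\\[\t\x20-\x7e])*"')
DOMAIN_LITERAL = re.compile(r"\[[ \t\x21-\x5a\x5e-\x7e]*\]")
# A run of ctext and quoted-pairs: comment content up to the next parenthesis.
CCONTENT = re.compile(r"(?:[ \t\x21-\x27\x2a-\x5b\x5d-\x7e]|\\[\t\x20-\x7e])*")
WSP = re.compile(r"[ \t]*")


def skip_comment(s, pos):
    """comment = "(" *(ctext / quoted-pair / comment) ")". Return end index or -1."""
    if pos >= len(s) or s[pos] != "(":
        return -1
    pos += 1
    while True:
        pos = CCONTENT.match(s, pos).end()
        if pos >= len(s):
            return -1                   # ran out of input: unterminated comment
        if s[pos] == ")":
            return pos + 1
        if s[pos] != "(":
            return -1                   # character not allowed inside a comment
        pos = skip_comment(s, pos)      # nested comment: recurse
        if pos < 0:
            return -1


def skip_cfws(s, pos):
    """Skip any mix of whitespace and comments. Return new index or -1."""
    while True:
        pos = WSP.match(s, pos).end()
        if pos >= len(s) or s[pos] != "(":
            return pos
        pos = skip_comment(s, pos)
        if pos < 0:
            return -1


def is_addr_spec(s):
    """addr-spec = local-part "@" domain, each optionally surrounded by CFWS."""
    pos = skip_cfws(s, 0)
    if pos < 0:
        return False
    local = DOT_ATOM.match(s, pos) or QUOTED_STRING.match(s, pos)
    if not local:
        return False
    pos = skip_cfws(s, local.end())
    if pos < 0 or pos >= len(s) or s[pos] != "@":
        return False
    pos = skip_cfws(s, pos + 1)
    if pos < 0:
        return False
    domain = DOT_ATOM.match(s, pos) or DOMAIN_LITERAL.match(s, pos)
    if not domain:
        return False
    return skip_cfws(s, domain.end()) == len(s)


print(is_addr_spec("user(a (nested) comment)@example.com"))  # True
print(is_addr_spec("a..b@example.com"))                      # False
```

Because every function corresponds to one named rule in the spec, a wrong answer can be traced to a specific rule and tested in isolation.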
I quoted the regexp in context of showing my point about how
'squiggly' they can be and that I am able to read them.....to a
point. (I was proud because 'googling' around for a regex email
address validator string shows some VERY suspicious and
extortionately, seemingly unnecessarily, long offerings. So I had a go
myself).
And just like a hilariously long list of predecessors, came up with a
RE which fails to precisely reproduce the ABNF definition of a valid
address for message headers. This is why you now have 2 problems:
1. The one you invented of needing to precisely validate email
addresses to an RFC specification that is not a perfect match for the
addressing supported by any coherent package of production-grade mail
software.
2. A regular expression that is absurdly complex which you incorrectly
believe solves (1) while in fact it does not. It is maybe good enough,
but maybe not. It's an untestable approximation of its design goal,
which is an intrinsic problem for software.
.......AND relax!