On 11 Jun 2016, at 4:21, Groach wrote:
On 11/06/2016 05:09, Bill Cole wrote:
So, you thought validating email addresses was a problem demanding a
solution? And you "solved" it with a regular expression?
Congratulations on now having 2 problems. They should be very happy
together.
The regex I quoted was out of context and completely unrelated to the
problem (sorry if that confused you).
I was not at all confused, but sometimes when people are Wrong On The
Internet in special ways I cannot resist the urge to respond with a
paraphrased geek meme...
Look up Jamie Zawinski's famous "2 problems" quote regarding regular
expressions. It is a perfect fit for the application of regular
expressions to address validation.
It is actually for another software project (a mail server)
Please don't take this as derogatory, because I DO NOT mean it to be,
but can you explain why the world needs yet another new mail server
implementation?
As an example of why I ask this, consider that Microsoft rewrote the
SMTP implementation in Exchange 2013 and did it wrong, breaking
multi-recipient message handling. I guess they had some reason, but the
point is that new code means new bugs, even when you have an elaborate
QA organization in place to prevent that.
that, being a mail server, must ensure email addresses are valid.
Not really. It needs to make sure that it never generates invalid
addresses and it probably should check addresses in its inputs for types
of invalidity that your later code will assume not to be present, but
those are both far from a need to validate addresses perfectly (or even
near-perfectly) to the RFC specification. Having a logical set of
addresses that you'd never generate but will still blindly and
harmlessly work with, some of which may not fit the RFC specs, is a
NON-PROBLEM.
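To make the distinction concrete, here is a minimal Python sketch of the kind of input check described above. It assumes a hypothetical server whose later code relies only on addresses being ASCII, free of control characters, split at an '@', and within the RFC 5321 length limits (64 octets for the local part, 255 for the domain, and a 256-octet path, which leaves 254 for the mailbox inside the angle brackets). The function name is made up for illustration, and it deliberately accepts some RFC-invalid addresses.

```python
# Hypothetical envelope-address sanity check: it rejects only defects that
# downstream code is assumed to rely on. It is NOT an RFC 5322 validator.

MAX_LOCAL_PART = 64    # RFC 5321, section 4.5.3.1.1
MAX_DOMAIN = 255       # RFC 5321, section 4.5.3.1.2
MAX_MAILBOX = 254      # 256-octet path limit minus the "<" and ">"


def is_safe_envelope_address(addr: str) -> bool:
    """Return True if addr is free of the defects this (hypothetical) server
    cannot tolerate: non-ASCII, control characters, a missing '@', or
    lengths beyond the RFC 5321 limits."""
    if not addr.isascii():
        return False  # assumption: no SMTPUTF8 support
    # CR, LF, NUL and other control characters could smuggle extra
    # protocol lines or corrupt log and queue-file formats.
    if any(ord(c) < 0x20 or ord(c) == 0x7F for c in addr):
        return False
    if len(addr) > MAX_MAILBOX:
        return False
    # Split at the last '@'; both halves must be non-empty.
    local, sep, domain = addr.rpartition("@")
    if not sep or not local or not domain:
        return False
    return len(local) <= MAX_LOCAL_PART and len(domain) <= MAX_DOMAIN
```

Note that is_safe_envelope_address("john..doe@example.com") returns True even though that address is invalid under RFC 5322: it is one of the addresses this server would never generate but can still pass along without harm.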
Even if you wanted to draw an RFC-perfect boundary between valid and
invalid addresses, complex regular expressions are a poor tool for that
because the logic of REs doesn't align with that of the ABNF used in RFCs. A
single regular expression CANNOT precisely match the whole
RFC822/2822/5322 address space. The closest approximation in Perl RE is
huge, indecipherable, and machine-generated. It also cannot deal with
nested comments, a valid albeit pathological address structure under the
ABNF definition. In POSIX RE the problems are MUCH worse.
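Nesting is the sticking point because comments can contain comments to any depth, so recognizing them needs a counter, which a true regular expression does not have. Below is a small Python sketch (strip_comments is a hypothetical helper) that removes comments with an explicit depth counter. It assumes, as a simplification, that quoted-strings are the only context where '(' is literal; domain-literals are not handled.

```python
def strip_comments(text: str) -> str:
    """Remove RFC 5322 comments, which may nest, e.g. 'a(b(c)d)e' -> 'ae'.
    Quoted-strings are copied through untouched, since '(' is literal there.
    Simplification: domain-literals are not treated specially.
    Raises ValueError on unbalanced parentheses or quotes."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a quoted-string (only tracked at depth 0)
    i = 0
    while i < len(text):
        c = text[i]
        if c == "\\":  # quoted-pair: backslash escapes the next character
            if i + 1 >= len(text):
                raise ValueError("dangling backslash")
            if depth == 0:
                out.append(text[i:i + 2])
            i += 2
            continue
        if in_quotes:
            out.append(c)
            if c == '"':
                in_quotes = False
        elif c == "(":
            depth += 1  # entering a (possibly nested) comment
        elif c == ")":
            if depth == 0:
                raise ValueError("unbalanced ')'")
            depth -= 1
        elif depth == 0:
            out.append(c)  # ordinary text outside any comment
            if c == '"':
                in_quotes = True
        i += 1
    if depth or in_quotes:
        raise ValueError("unterminated comment or quoted-string")
    return "".join(out)
```

For example, strip_comments("john(a (nested) comment)@example.com") yields "john@example.com". The single integer depth is exactly what a regular expression cannot track for unbounded nesting.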
On the other hand, you COULD use very simple REs to serially and
recursively decompose addresses into the constructs defined by the ABNF
spec, using the same logic as the spec to validate addresses. This is
not as interesting a "problem" as writing the One True RFC822 RE, but it
is a fairly trivial coding exercise and would run more efficiently than
a single RE with the benefit of being more readable and debuggable.
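A rough Python sketch of that serial decomposition, covering only addr-spec from RFC 5322 section 3.4.1. Each small regex corresponds to one ABNF production (dot-atom, quoted-string, domain-literal), and the function tries them in order, so an error names the production that failed. Assumptions: comments have already been removed, FWS is reduced to spaces and tabs without folding, and the obsolete obs-* forms are ignored. parse_addr_spec and the constant names are illustrative, not from any library.

```python
import re

# Hypothetical sketch: each pattern mirrors one RFC 5322 production.
# Simplifications: no CFWS, FWS limited to spaces/tabs, no obs-* forms.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")
QTEXT = r"[\x21\x23-\x5B\x5D-\x7E]"          # printable except '"' and '\'
QUOTED_PAIR = r"\\[\x21-\x7E \t]"            # '\' followed by VCHAR / WSP
QUOTED_STRING = re.compile(rf'"(?:[ \t]*(?:{QTEXT}|{QUOTED_PAIR}))*[ \t]*"')
DTEXT = r"[\x21-\x5A\x5E-\x7E]"              # printable except '[', ']', '\'
DOMAIN_LITERAL = re.compile(rf"\[(?:[ \t]*{DTEXT})*[ \t]*\]")


def parse_addr_spec(addr: str) -> tuple[str, str]:
    """Decompose addr-spec = local-part "@" domain, one production at a
    time. Returns (local_part, domain) or raises ValueError."""
    # local-part = dot-atom / quoted-string: try each at the start.
    m = QUOTED_STRING.match(addr) or DOT_ATOM.match(addr)
    if m is None:
        raise ValueError("no valid local-part")
    local = m.group()
    rest = addr[m.end():]
    if not rest.startswith("@"):
        raise ValueError("expected '@' after local-part")
    domain = rest[1:]
    # domain = dot-atom / domain-literal: must consume the whole remainder.
    if not (DOT_ATOM.fullmatch(domain) or DOMAIN_LITERAL.fullmatch(domain)):
        raise ValueError("no valid domain")
    return local, domain
```

For example, parse_addr_spec('"john doe"@example.com') returns ('"john doe"', 'example.com'), while "john..doe@example.com" raises ValueError at the '@' check, because dot-atom stops before the empty atom. Parsing the local-part first also copes with an '@' inside a quoted-string, which a naive split would get wrong.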
I quoted the regexp in the context of showing my point about how
'squiggly' they can be and that I am able to read them.....to a point.
(I was proud because 'googling' around for a regex email address
validator string turns up some VERY suspicious and extortionately,
seemingly unnecessarily, long offerings. So I had a go myself.)
And, just like a hilariously long list of predecessors, you came up with
an RE that fails to reproduce precisely the ABNF definition of a valid
address for message headers. This is why you now have 2 problems:
1. The one you invented of needing to precisely validate email addresses
to an RFC specification that is not a perfect match for the addressing
supported by any coherent package of production-grade mail software.
2. A regular expression that is absurdly complex which you incorrectly
believe solves (1) while in fact it does not. It is maybe good enough,
but maybe not. It's an untestable approximation of its design goal,
which is an intrinsic problem for software.
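To illustrate point 2 without using the regex in question, below is a hypothetical one-liner of the sort such searches turn up, checked against a handful of addresses whose RFC 5322 addr-spec validity is easy to establish by hand. Spot checks like these can reveal disagreements but can never prove agreement, which is the sense in which such a regex is an untestable approximation.

```python
import re

# A hypothetical "typical" one-line validator of the kind found by searching
# the web; it is NOT the regex discussed in this thread.
TYPICAL = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

# Addresses paired with their validity under RFC 5322 addr-spec.
CASES = {
    "user@example.com": True,
    '"john doe"@example.com': True,  # quoted-string local-part
    "user!x@example.com": True,      # '!' is atext
    "user@[192.0.2.1]": True,        # domain-literal
    "john..doe@example.com": False,  # empty atom between dots
    ".john@example.com": False,      # leading dot
    "a@b..com": False,               # empty domain label
}


def disagreements() -> list[str]:
    """Return the addresses where TYPICAL and RFC 5322 disagree."""
    return [addr for addr, valid in CASES.items()
            if bool(TYPICAL.match(addr)) != valid]
```

Here disagreements() flags six of the seven addresses: the pattern rejects all three valid but unusual forms and accepts all three invalid ones.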