Re: Where to find DETAIL for spamassassin default RULES

Groach Sun, 12 Jun 2016 14:32:29 -0700


On 12/06/2016 21:14, Bill Cole wrote:

I was not at all confused, but sometimes when people are Wrong On The Internet in special ways I cannot resist the urge to respond with a paraphrased geek meme...
Look up Jamie Zawinski's famous "2 problems" quote regarding regular expressions. It is a perfect fit for the application of regular expressions to address validation.
It is actually for another software project (a mail server)
Please don't take this as derogatory, because I DO NOT mean it to be, but can you explain why the world needs yet another new mail server implementation?
As an example of why I ask this, consider that Microsoft rewrote the SMTP implementation in Exchange 2013 and did it wrong, breaking multi-recipient message handling. I guess they had some reason, but the point is that new code means new bugs, even when you have an elaborate QA organization in place to prevent that.
that, being a mail server, must ensure email addresses are valid.
Not really. It needs to make sure that it never generates invalid addresses and it probably should check addresses in its inputs for types of invalidity that your later code will assume not to be present, but those are both far from a need to validate addresses perfectly (or even near-perfectly) to the RFC specification. Having a logical set of addresses that you'd never generate but will still blindly and harmlessly work with, some of which may not fit the RFC specs, is a NON-PROBLEM.
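To illustrate the distinction, here is a minimal Python sketch of an input check that rejects only what later code cannot cope with, rather than attempting RFC validation. The function name `is_safe_to_handle` and the 254-character limit are assumptions made for this example, not taken from any particular mail server.

```python
def is_safe_to_handle(addr: str, max_len: int = 254) -> bool:
    """Loose sanity check; deliberately NOT an RFC 5322 validator."""
    # max_len is an assumed limit chosen for this sketch.
    if not addr or len(addr) > max_len:
        return False
    # Control characters (CR and LF included) could let an address
    # smuggle extra header lines or SMTP commands into later processing.
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in addr):
        return False
    # Later code is assumed to split on the last "@", so both halves must exist.
    local, sep, domain = addr.rpartition("@")
    return bool(sep and local and domain)
```

Anything that passes may still be invalid under the RFC grammar; the point is only that it can be carried around harmlessly.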
Even if you wanted to draw a RFC-perfect boundary between valid and invalid addresses, complex regular expressions are a poor tool for that because the logic of REs does not align with that of the ABNF used in RFCs. A single regular expression CANNOT precisely match the whole RFC822/2822/5322 address space. The closest approximation in Perl RE is huge, indecipherable, and machine-generated. It also cannot deal with nested comments, a valid albeit pathological address structure under the ABNF definition. In POSIX RE the problems are MUCH worse.
On the other hand, you COULD use very simple REs to serially and recursively decompose addresses into the constructs defined by the ABNF spec, using the same logic as the spec to validate addresses. This is not as interesting a "problem" as writing the One True RFC822 RE, but it is a fairly trivial coding exercise and would run more efficiently than a single RE with the benefit of being more readable and debuggable.
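A rough Python sketch of that decomposition approach follows. It is an illustration under stated assumptions, not a complete RFC 5322 implementation: it handles only the non-obsolete addr-spec form (local part, "@", domain), ignores folded whitespace, and all function and pattern names are invented for this example. Each RE covers one small construct, and nested comments are handled by plain recursion.

```python
import re

# One small RE per ABNF construct (simplified; obsolete forms are ignored).
ATOM = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+")
QUOTED = re.compile(r'"(?:[^"\\\r\n]|\\.)*"')
DOMAIN_LITERAL = re.compile(r"\[[^\[\]\\]*\]")
WS = re.compile(r"[ \t]*")

def skip_comment(s, i):
    """s[i] is '('. Return the index just past its matching ')'."""
    i += 1
    while i < len(s):
        if s[i] == "\\":
            i += 2                      # escaped character inside a comment
        elif s[i] == "(":
            i = skip_comment(s, i)      # nested comment: recurse
        elif s[i] == ")":
            return i + 1
        else:
            i += 1
    raise ValueError("unterminated comment")

def replace_comments(s):
    """Replace each comment with one space, leaving quoted strings intact."""
    out, i = [], 0
    while i < len(s):
        m = QUOTED.match(s, i)
        if m:
            out.append(m.group())
            i = m.end()
        elif s[i] == "(":
            i = skip_comment(s, i)
            out.append(" ")
        elif s[i] == ")":
            raise ValueError("unbalanced ')'")
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

def match_dot_atom(s, i):
    """Match atom *("." atom) at index i; return the end index or None."""
    m = ATOM.match(s, i)
    if not m:
        return None
    i = m.end()
    while i < len(s) and s[i] == ".":
        m = ATOM.match(s, i + 1)
        if not m:
            return None
        i = m.end()
    return i

def is_addr_spec(text):
    """Check text against the simplified local-part "@" domain grammar."""
    try:
        s = replace_comments(text)
    except ValueError:
        return False
    i = WS.match(s).end()
    m = QUOTED.match(s, i)              # local part: quoted string or dot-atom
    i = m.end() if m else match_dot_atom(s, i)
    if i is None:
        return False
    i = WS.match(s, i).end()
    if i >= len(s) or s[i] != "@":
        return False
    i = WS.match(s, i + 1).end()
    m = DOMAIN_LITERAL.match(s, i)      # domain: literal or dot-atom
    i = m.end() if m else match_dot_atom(s, i)
    if i is None:
        return False
    return WS.match(s, i).end() == len(s)
```

With this sketch, `is_addr_spec("user(a (nested) comment)@example.com")` returns `True`, while `is_addr_spec("user..name@example.com")` and `is_addr_spec("user(unclosed@example.com")` return `False`. Each failure can be traced to one small, separately testable step.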
I quoted the regexp in context of showing my point about how 'squiggly' they can be and that I am able to read them.....to a point. (I was proud because 'googling' around for a regex email address validator string shows some VERY suspicious and extortionately, seemingly unnecessarily, long offerings. So I had a go myself).
And just like a hilariously long list of predecessors, came up with a RE which fails to precisely reproduce the ABNF definition of a valid address for message headers. This is why you now have 2 problems:
1. The one you invented of needing to precisely validate email addresses to a RFC specification that is not a perfect match for the addressing supported by any coherent package of production-grade mail software.
2. A regular expression that is absurdly complex which you incorrectly believe solves (1) while in fact it does not. It is maybe good enough, but maybe not. It's an untestable approximation of its design goal, which is an intrinsic problem for software.





.......AND relax!
