Alberto, your reasoning is correct, based on my experience of actually implementing and using such a system, albeit in a small-scale environment. As "sm" points out, it is particularly useful as a "pass" rule for exact matches to your users' actual email client "real name"s.
I've implemented this as part of a qmail filter that runs after SA. As I've mentioned in other posts, I'm in a shared web hosting environment and have no control over SA, so I designed my filter to complement the great strengths of SA and fill in the holes that are created by a limited environment. Just over twenty domains use my filter, and we all share data so as to improve everyone's killrates. I have no idea how practical this would be as an SA plugin, and am Perl-illiterate, so I merely describe how I have approached it.

More than a year ago, I started using _VERY_ crude general header based (To/Cc checking) real name "pass" rules, then in March of 2007 I added an explicit "RealName" virtual header so as to allow more powerful rules, including "match not" type penalty rules.

* Main Issues: *

- generating a list of account specific real names (preferably automatically)
- real-time extraction of the correct "real name"
- some "real names" have been compromised, and should receive MUCH lower pass scores
- some account names are inherently poorly suited to real name pass rules (e.g. "jayne.cobb", since all words in the real name also appear as words in the account part; "jcobb" is a better form)
- some senders transpose real name parts (e.g. "Cobb, Jayne" in place of "Jayne Cobb")
- some senders use cutesy nicknames or other tricks (e.g. "Hero of Canton" in place of "Jayne Cobb")
- some senders (particularly Bulkers) use the complete account name as the real name, and should not be scored normally (e.g. "[EMAIL PROTECTED]" [EMAIL PROTECTED])

* PREP: Semi-automatic Real Name Data Generation: *

I'm just-a-programmer, not a sysadmin, so I don't know how a typical pipeline works; however, if it's practical, automatic real name extraction should be fairly straightforward.
Just write something that you can temporarily plug in _AFTER_ SA, and which extracts the account & real name pair from everything which passes SA, accumulates the frequencies, and picks the most often occurring real name(s) for each account (I usually limit this to one or two). Include an option for human inspection, mainly for cases where there is no clear cut winner. In my experience, the majority of accounts can be generated automatically; however, it's wise to inspect all possibilities. That's manageable for small companies (less than 20), and shouldn't be too bad for low 100s. The collector app only needs to be run for a week or so. New users could be added manually.

It took me much less than five minutes to generate such a data list AND all matching rules for the last person to join my Team (18 accounts, one week of data), and my tool merely dumps the per account RealNames with frequencies. A slicker tool could make this VERY practical for larger userbases. Maintenance and verification would probably be an utter pain for anything in the 1000s, so best to let us small and nimble types prove its efficacy. :)

There is anecdotal evidence that Hotmail may be doing something with real name based rules, though there are reports that it's a somewhat suboptimal implementation. I speculate that they could easily pull the real name straight out of each user's settings.

* Plugin: Real Name Extraction: *

An actual SA plugin would need to use the SMTP Recipient (or most reliable Delivered-To account name) to pick out the matching account from the To or Cc headers, then pull out its real name. There should also be some facility for associating external aliases with accounts (e.g. a redirected ISP account). If it FAILS to find a matching account, _ALL_ other real name tests should be skipped or return false.

* Plugin: Real Name Testing: *

If it does find a matching account, three main real name based tests can be performed: empty, match, match not.
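Before the worked sample, here is a minimal Python sketch of the collection and extraction steps described above. It is an illustration only, not the qmail filter discussed in this post: the function names `extract_real_name` and `collect`, the `top_n` parameter, and the `example.com` addresses are all hypothetical, and it assumes the recipient address and the raw To/Cc header values are already available for each message that passed SA.

```python
from collections import Counter, defaultdict
from email.utils import getaddresses


def extract_real_name(recipient, to_cc_values):
    """Find `recipient` among the To/Cc addresses and return its real name.

    Returns None when the account is not present at all (every real name
    test should then be skipped), and "" when it is present with no name.
    """
    wanted = recipient.strip().lower()
    for name, addr in getaddresses(to_cc_values):
        if addr.strip().lower() == wanted:
            # Collapse runs of whitespace so "Jayne   Cobb" == "Jayne Cobb".
            return " ".join(name.split())
    return None


def collect(messages, top_n=2):
    """Tally real names per account and keep the `top_n` most frequent.

    `messages` is an iterable of (recipient, [To/Cc header values]) pairs
    taken from mail that has already passed SA (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for recipient, to_cc_values in messages:
        name = extract_real_name(recipient, to_cc_values)
        if name:  # ignore "account not found" (None) and empty names ("")
            counts[recipient.strip().lower()][name.lower()] += 1
    # The (name, frequency) pairs are what a human would inspect.
    return {acct: tally.most_common(top_n) for acct, tally in counts.items()}


if __name__ == "__main__":
    sample = [
        ("jcobb@example.com", ['"Jayne Cobb" <jcobb@example.com>']),
        ("jcobb@example.com", ['"Hero of Canton" <jcobb@example.com>']),
        ("jcobb@example.com", ['"Jayne Cobb" <jcobb@example.com>']),
    ]
    for account, names in collect(sample).items():
        print(account, names)
```

Keeping the frequencies in the output, rather than only the winning name, leaves room for the human inspection step in cases where there is no clear cut winner.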
It's probably easier to understand how these work with a sample, so let's say we have a user whose account is "[EMAIL PROTECTED]", the real name in his email client is "Jayne Cobb", and an automatic real name collector has shown that occasionally he receives important email that uses the real name "Hero of Canton". Somewhere, we would construct two data lists specific to his account, that would look something like this:

    realname_full = jayne cobb, hero of canton
    realname_words = jayne, cobb, hero, canton

The generic real name "match" test would only trigger if the extracted real name exactly (case-insensitive) matched either "jayne cobb" or "hero of canton", and the "match not" test would only trigger if NONE of the four words "jayne, cobb, hero, canton" was found anywhere in the real name. It's feasible to do "soft" matching, instead of word boundary based matching (my code allows either).

Here are some examples:

    [EMAIL PROTECTED]
    "Jayne Cobb" [EMAIL PROTECTED]
    "Jayen Cobb" [EMAIL PROTECTED]
    "Peter Petrelli" [EMAIL PROTECTED]

The first triggers an "empty" test, but none of the other types of tests. The second triggers an exact "match" pass rule. The third has a misspelling, so it fails an exact "match" pass rule, AND it also fails a "match not" penalty rule because one of the words ("Cobb") does match. In other words, it receives ZERO total real name points. The fourth triggers a "match not" penalty rule, because NO words match.

By using a LIST of acceptable individual words in the "match not" rule, there's no need to mess about with fuzzy matching. It is still possible for a fuzzy misfire to occur; however, so far I have not seen any actual FPs caused by them (in more than half a million human+machine reviewed emails). Our only FPs have contained words that were wildly off, so fuzzy matching would have made no difference. As always, careful scoring is appropriate, and your mileage may vary. A fuzzy matching option might be more suited to a later version of a plugin.
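The three tests and the four examples above can be sketched in Python roughly as follows. This is a hedged illustration, not the filter's actual code: the function name `real_name_test`, its return labels, and the `soft` flag are hypothetical, and "soft" matching is assumed here to mean a plain substring check, which may differ from the real implementation.

```python
import re

# Per-account data lists, taken from the sample above.
REALNAME_FULL = {"jayne cobb", "hero of canton"}
REALNAME_WORDS = {"jayne", "cobb", "hero", "canton"}


def real_name_test(real_name, full=REALNAME_FULL, words=REALNAME_WORDS, soft=False):
    """Classify an extracted real name.

    Returns one of:
      "skip"      - no matching account was found (real_name is None)
      "empty"     - the account was found but carries no real name
      "match"     - the whole name equals one of the accepted full names
      "match_not" - none of the accepted words appears in the name
      "neutral"   - some word matches but the full name does not
    """
    if real_name is None:
        return "skip"
    name = " ".join(real_name.lower().split())  # case-insensitive, tidy spaces
    if not name:
        return "empty"
    if name in full:
        return "match"
    if soft:
        # Assumed meaning of "soft": an accepted word appears as a substring.
        hit = any(word in name for word in words)
    else:
        # Word boundary based: compare whole tokens only.
        tokens = set(re.findall(r"[a-z0-9']+", name))
        hit = bool(tokens & words)
    return "neutral" if hit else "match_not"


if __name__ == "__main__":
    for candidate in ["", "Jayne Cobb", "Jayen Cobb", "Peter Petrelli"]:
        print(repr(candidate), "->", real_name_test(candidate))
```

Under these rules a transposed name such as "Cobb, Jayne" comes out as "neutral": it is not an exact match, but both words are on the accepted list, so it earns neither the pass score nor the penalty.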
* Scoring Notes: *

I generally score the "empty" test either not at all or fairly low (0.5). I find it's most helpful as a bonus penalty in compound/meta rules; for example, I give many attachments (zip, PDF, or any image) a small to medium score if the real name is empty.

I score the "match" rule between -0.51 and -4.59, depending on whether the real name has been compromised (one of our users gets a lot of ED spam sent from Russia with his correct real name), and whether that person has critical "pass" needs. I have found it to be an EXCELLENT means of preventing FPs, particularly during times when I'm tinkering with stuff to fight an emerging threat, and make a dumb mistake. :)

I score the "match not" rules typically in the 1.02 to 3.06 range (default of 2.60). FPs have been extremely low, with most being unimportant bulk/junk type mail.

One weakness in my own filter is the lack of metas. If an SA Real Name plugin were developed, it would be more powerful, since it could be used to reject specific attachment types that also triggered a "match not" test. That level of control is more suited to a small business, but it sure is nice to have. :)

* Efficacy and Performance Notes: *

Since I rolled this out last March, these tests alone immediately improved my users' typical killrates from about 99.40% to 99.75% (three of us are now at 100.00%), with a significant decrease in FPs. Those levels have been maintained, even during a period when many emerging threats have driven down our SA rates (again, using a very constrained SA setup).

I have no feel for the SA system performance issues. In my case, I do all the "simple" (fast) tests first, then exit if the score is high enough, and only then do DNS tests. My general impression is that my overall performance is higher, because on average these tests avoid more tests than the time they consume.

Bottom line, I think these can be very effective for a smallish environment.
Granted, I really need to write some code to extract precise stats. I am confident of the beneficial effect on FPs, because I check ALL of those by hand.

- "Chip"