On Sat, Mar 09, 2013 at 04:18:40PM +0100, Axb wrote:
> For stuff like that we need to rely on hashers, bayes and other
> content rules.

Don't get me wrong: spamd, running on the mail server, handles these
just fine. It's only the mass-check code that produces these
superfluous messages. If it's difficult to fix, by all means leave it
as it is.

> On 03/09/2013 03:57 PM, Marc Andre Selig wrote:
> >Whitespace within the URL is removed in line with RFC 1738/2396/3986,
> >and we end up with "http://podify-merchants..com/?...", which is of
> >course invalid.
>
> One good way to test these cases is open such a spam msg with
> Thunderbird and maybe some Outlook flavour.

No Windows around here. ;) However, a quick check with what I've got
yields these results (for my own domain):

- SpamAssassin on its own, outside of auto-mass-check, triggers
  URI_OBFU_WWW (for a null label) and is otherwise happy.

- urlview on Debian GNU/Linux, Apple's Mail.app, and Symbian's Mail all
  take the full URL and pass it on to the configured web client.
  urlview could actually be told that these are not valid URLs by
  editing /etc/urlview/system.urlview or ~/.urlview.

- K-9 Mail on Android actually knows that these URLs are invalid. It
  does not show the one with the null label as a URL at all. For the
  other case, it shortens the overlong label to its last 63 characters
  and passes the resulting URL to the browser.

None of the MUAs I tested allow line breaks within URLs, treating the
line break as a URL delimiter instead.

- Firefox (both on Linux and on OS X) just says "Server not found".

- Safari has a localized message for "Server not found".

- Chrome (on OS X) has the same message for the overlong name, but
  recognizes the null label as invalid and passes that one on to the
  configured search engine instead of trying to resolve it.

- Web on Symbian (an ancient WebKit engine) tries to resolve both and
  shows the standard message for unresolvable domains.

- lynx says "Can't access startfile".

- curl says "Couldn't resolve host".

- wget says "Name or service not known", "unable to resolve host
  address".

None of these trigger an additional error message. All behave in a
somewhat defensible way, either showing some variation on the theme of
"server not found" to the user or trying to "improve" the request in
some way. There are no warnings written to syslog or mailed to
postmaster.

Except for Chrome on OS X and K-9 Mail on Android, all of these
consistently show the same behaviour for overlong and null labels in
domain names and for non-existent but syntactically valid domain names.

> If they can't handle the URL, then we can't really expect SA easily
> to do it either.

As they all handle the invalid URL just fine, including stock
SpamAssassin while running in daemon mode, I still believe the
mass-check code should be able to do it as well. But of course, I would
never insist on it. ;)

Regards,
Marc
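
For reference, the two cases discussed above (a null label, as in
"podify-merchants..com", and a label longer than 63 characters) can be
detected with a few lines of code. The following is only a rough sketch
in Python, not anything taken from SpamAssassin or the mass-check
scripts; the names label_problems and MAX_LABEL_LEN are made up for
illustration, and the 63-character limit is the per-label limit from
RFC 1035.

```python
from urllib.parse import urlsplit

MAX_LABEL_LEN = 63  # per-label limit from RFC 1035 (hypothetical constant name)


def label_problems(url):
    """Return a list of problems found in the host labels of `url`.

    Hypothetical helper for illustration only; it checks label
    lengths and nothing else (no character-set or TLD validation).
    """
    host = urlsplit(url).hostname or ""
    if not host:
        return ["no host"]
    # A single trailing dot denotes the DNS root and is not a null label.
    if host.endswith("."):
        host = host[:-1]
    problems = []
    for label in host.split("."):
        if label == "":
            problems.append("null label")
        elif len(label) > MAX_LABEL_LEN:
            problems.append("overlong label (%d characters)" % len(label))
    return problems


if __name__ == "__main__":
    for candidate in (
        "http://podify-merchants..com/?x=1",
        "http://" + "a" * 64 + ".example.com/",
        "http://example.com/",
    ):
        print(candidate, label_problems(candidate))
```

A caller could treat a non-empty result the same way as a non-existent
but syntactically valid domain, which is roughly what most of the
clients listed above appear to do.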
