On Sat, Mar 09, 2013 at 04:18:40PM +0100, Axb wrote:

> For stuff like that we need to rely on hashers, bayes and other
> content rules.

Don't get me wrong: spamd, running on the mail server, handles these
just fine.  It's only the mass-check code that produces these superfluous
messages.

If it's difficult to fix, by all means leave it as it is.


> On 03/09/2013 03:57 PM, Marc Andre Selig wrote:

> >Whitespace within the URL is removed in line with RFC 1738/2396/3986,
> >and we end up with "http://podify-merchants..com/?...", which is of
> >course invalid.

> One good way to test these cases is to open such a spam msg with
> Thunderbird and maybe some Outlook flavour.

No Windows around here. ;)  However, a quick check with what I've got
yields these results (for my own domain):

- SpamAssassin on its own, outside of auto-mass-check, triggers
  URI_OBFU_WWW (for a null label) and is otherwise happy.

- urlview on Debian GNU/Linux, Apple's Mail.app, and Symbian's Mail
  all take the full URL and pass it on to the configured web client.
  urlview could actually be told that these are not valid URLs by editing
  /etc/urlview/system.urlview or ~/.urlview.

- K-9 Mail on Android actually knows that these URLs are invalid.  It does
  not show the one with the null label as a URL at all.  For the other
  case, it shortens the overlong label to its last 63 characters and
  passes the resulting URL to the browser.

None of the MUAs I tested allow line breaks within URLs, treating the
line break as a URL delimiter instead.

- Firefox (both on Linux and on OS X) just says "Server not found".

- Safari has a localized message for "Server not found".

- Chrome (on OS X) has the same message for the overlong name, but
  recognizes the null label as invalid and passes that one on to the
  configured search engine instead of trying to resolve it.

- Web on Symbian (an ancient Webkit engine) tries to resolve both and
  shows the standard message for unresolvable domains.

- lynx says "Can't access startfile".

- curl says "Couldn't resolve host".

- wget says "Name or service not known", "unable to resolve host address".

None of these trigger an additional error message.  All behave in a
somewhat defensible way, either showing some variation on the theme of
"server not found" to the user or trying to "improve" the request in some
way.  There are no warnings written to syslog or mailed to postmaster.

Except for Chrome on OS X and K-9 Mail on Android, all of these
consistently show the same behaviour for overlong and null labels in
domain names and for non-existent but syntactically valid domain names.
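
For reference, the two invalid cases discussed here (a null label and a
label longer than 63 characters) can be detected before any DNS lookup
is attempted.  What follows is only a rough Python sketch of such a
check, not what SpamAssassin or mass-check actually does; the function
name label_problems and the example hosts are made up for illustration.

```python
from urllib.parse import urlsplit

MAX_LABEL_LEN = 63  # per-label limit from RFC 1035


def label_problems(url):
    """Return a list of label-level problems in the URL's host name.

    Hypothetical helper for illustration; it checks only label length,
    not the characters allowed inside a label.
    """
    host = urlsplit(url).hostname or ""
    if not host:
        return ["no host"]
    # A single trailing dot denotes the DNS root and is legal.
    if host.endswith("."):
        host = host[:-1]
    problems = []
    for label in host.split("."):
        if label == "":
            problems.append("null label")
        elif len(label) > MAX_LABEL_LEN:
            problems.append("overlong label")
    return problems


print(label_problems("http://podify-merchants..com/"))    # ['null label']
print(label_problems("http://" + "a" * 64 + ".example/"))  # ['overlong label']
print(label_problems("http://example.com/"))               # []
```

A caller could skip the lookup, and thus the warning, whenever the
returned list is non-empty, and still treat the URL as unresolvable.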


> If they can't handle the URL, then we can't really expect SA easily
> to do it either.

As they all handle the invalid URL just fine, including stock SpamAssassin
while running in daemon mode, I still believe the mass-check code should
be able to do it as well.  But of course, I would never insist on it. ;)

Regards,
Marc
