On Wed, Sep 15, 2004 at 02:17:15AM -0700, Jeff Chan wrote:
> On Wednesday, September 15, 2004, 1:38:30 AM, Julian Field wrote:
> > ... Is it possible to detect where
> > <A HREF="foo">bar</A>
> > and foo and bar are unrelated domains?
> 
> That could be a good idea for a rule.  It would be nice if it
> could be determined canonically, without actually resolving
> either location.

IMHO this is nearly impossible.

A trivial string back-reference check can never
determine whether 'foo' and 'bar' are un*related*,
only whether the text *in* the HREF differs from
the text shown to the user as the highlighted link.

In every case where the HREF is only 'semantically'
*related* to the link text that follows, a string check
will assume 'spam', while real spam/scam mail will sooner
or later just obfuscate the text portion with JavaScript
or encoding tricks.

e.g.:   <a HREF="www.eplus.de">imail.de</a>
        is 'related' (even if badly constructed)
        because you reach your 'imail.de' mail
        via the 'www.eplus.de' webserver.

        Also, many mail texts of the kind
         ... to reach FOO click <a HREF="somedomain">here</a>
        would be very difficult to 'analyze correctly'.
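
To illustrate the point, here is a minimal sketch of such a trivial
back-reference check, written in Python rather than as an actual
spamassassin rule. The pattern and the name href_text_mismatch are
hypothetical, made up for this illustration; the pattern only handles
a bare <a href="..."> tag with no other attributes and compares the
raw strings without any normalization.

```python
import re

# Hypothetical pattern: capture the HREF value, then use a negative
# lookahead with a back-reference (\1) to require that the visible
# link text is NOT the same string as the HREF value.
MISMATCH = re.compile(
    r'<a\s+href="([^"]*)"\s*>'   # group 1: raw text inside the HREF
    r'(?!\1\s*</a>)'             # link text must differ from group 1
    r'([^<]*)</a>',              # group 2: text shown to the user
    re.IGNORECASE,
)


def href_text_mismatch(html):
    """Return (href, link_text) pairs where the two strings differ."""
    return [(m.group(1), m.group(2)) for m in MISMATCH.finditer(html)]


samples = [
    '<a HREF="www.eplus.de">imail.de</a>',
    '... to reach FOO click <a HREF="somedomain">here</a>',
    '<a HREF="example.org">example.org</a>',
]
for s in samples:
    print(s, "->", href_text_mismatch(s))
```

Both examples from above are reported as mismatches, although neither
is necessarily spam; only the third sample, where the link text repeats
the HREF verbatim, passes. The check sees unequal strings, not unrelated
domains.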

So I believe it to be an interesting idea for AI specialists,
but alas not for inclusion in spamassassin as it works now.

Stucki  (postmaster at mi.fu-berlin.de using spamassassin 2.63)
