http://bugzilla.spamassassin.org/show_bug.cgi?id=1375
------- Additional Comments From [EMAIL PROTECTED] 2004-03-02 14:35 -------

Sorry, but I disagree with most of the previous comment. Before even getting into the arguments, there is a simple counterexample to your proposal of ignoring just empty links. The example attached a few comments ago shows a spammer already including an href to an innocent site, kai.com, with an empty link area. Your proposal would result in the next spam from that person including the same href with font-size-1 text, making the test useless. There is no reason to add a useless test.

We are not talking about a general open-ended AI problem. The browser solves the problem already by interpreting the HTML and rendering pixels. If there are enough pixels in contrasting foreground and background colors in an area that is declared as a clickable hotspot, then the link is visible. The question is not whether it is possible to do the same thing, but how close we can get to the same determination using only a reasonable amount of processing.

We already have code to determine whether text has been made invisible by being inside an HTML comment, in an invisible color, or in a very tiny font. We need that already to catch attempts to make invisible non-spam content dominate the scoring. That still leaves open the different problem that an image can be visible or invisible, and we cannot tell which without downloading it from a website, possibly triggering a webbug. I don't know how to get around that one, which means that while I strongly disagree that this is an "AI problem" whose solution would give us a place in history, I do agree that we may not be able to solve the general problem.

Most importantly, I disagree with your conclusions:

"the distributed nature of dns would seem to defeat any attempts at dos by looking up links"

If SA has to look up hundreds of legitimate domains to process each message, that will slow down processing too much.
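As an aside on the visibility checks mentioned above, here is a rough sketch of that kind of test (hypothetical Python, not the actual SpamAssassin code, which is Perl): treat a link as invisible when its anchor renders no readable text, whether because it is empty, buried in an HTML comment, or wrapped in a tiny font. The invisible-color case is omitted, since doing it properly requires tracking the inherited background color.

```python
from html.parser import HTMLParser

class LinkVisibility(HTMLParser):
    """Collect (href, visible_text) pairs from an HTML fragment.
    Text inside comments never reaches handle_data, and text inside
    tiny <font> elements is suppressed, so both count as invisible."""

    def __init__(self):
        super().__init__()
        self.links = []     # (href, visible anchor text)
        self._href = None   # href of the anchor currently open, if any
        self._text = []
        self._tiny = 0      # nesting depth of tiny-font elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"]
            self._text = []
        if tag == "font":
            size = (attrs.get("size") or "").strip()
            if size in ("0", "1"):  # assumed "very tiny" cutoff
                self._tiny += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None
        if tag == "font" and self._tiny:
            self._tiny -= 1

    def handle_data(self, data):
        if self._href is not None and not self._tiny:
            self._text.append(data)

def invisible_links(html):
    """Return hrefs whose clickable area renders no visible text."""
    parser = LinkVisibility()
    parser.feed(html)
    return [href for href, text in parser.links if not text]
```

On the kai.com example, an empty `<a href="http://kai.com/"></a>` and a font-size-1 variant of it both come back as invisible, while a normal link with readable anchor text does not.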
Spammers can create throwaway domains and host them on DNS servers that are designed to slow down anything that queries them. The distributed nature of DNS only helps to the degree that queries are cached, and spam can contain variations of host names that will ensure caching doesn't help. The way to avoid a DoS is not to look up absolutely every link, but to look up a random sample. That, however, lets the spammer set their own probability of detection by choosing how many invisible links they include for each visible link.

"the only thing spammers would achieve by loading up spams with bogus links, is making it less likely that their spams would get through"

The links would only be "bogus" in the sense that they are not really links that the spammer wants anybody to click on. They could point to real, innocent websites that we would not want on any RBL, like the kai.com example. They would not appear when someone reads the spam, so they will not be clicked on. The only thing that might look up the domains of the hrefs would be spam filters, which will find that they are innocent.

What _might_ work is a rule that is DoS-proof because it looks up only a limited number of hrefs, and another rule that penalizes mail that has enough links that it may be an attempt to introduce chaff to fool the first rule. Both of those would be made more effective by ignoring links that have invisible text. I still don't know what we would do about links that use images for their clickable area.

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
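The two-rule combination proposed above can be sketched as follows (hypothetical rule names and thresholds, not actual SpamAssassin rules): one rule caps the DNS work per message, the other scores messages that carry enough links to dilute that capped sample. The detection probability the spammer controls by padding is also easy to write down.

```python
import random
from math import comb

MAX_LOOKUPS = 5       # assumed per-message lookup cap (keeps rule 1 DoS-proof)
CHAFF_THRESHOLD = 20  # assumed link count above which rule 2 fires

def sample_for_lookup(hrefs, max_lookups=MAX_LOOKUPS, rng=random):
    """Rule 1: resolve only a bounded random sample of hrefs, so a
    message stuffed with slow throwaway domains cannot force
    unbounded DNS work per message."""
    if len(hrefs) <= max_lookups:
        return list(hrefs)
    return rng.sample(hrefs, max_lookups)

def chaff_penalty(hrefs, threshold=CHAFF_THRESHOLD):
    """Rule 2: flag messages with so many links that the bounded
    sample above could be swamped by decoys pointing at innocent
    sites."""
    return len(hrefs) > threshold

def detection_probability(total, bad, sampled=MAX_LOOKUPS):
    """Chance that a random sample of `sampled` links hits at least
    one of the `bad` ones -- exactly the odds the spammer tunes by
    padding the message with decoy links."""
    sampled = min(sampled, total)
    return 1.0 - comb(total - bad, sampled) / comb(total, sampled)
```

For example, with one blacklisted domain among 10 links and a sample of 5, detection probability is 0.5; padding the same message out to 40 links drops it to 0.125, which is the dilution attack the chaff rule is meant to catch.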
