> > We just created a URL signature algorithm to be able to query an
> > entire URL at our URIBL:
> >
> > https://spfbl.net/en/uribl/
> >
> > Now we are able to blacklist any malicious shortener URL.
>
> Leandro,
>
> Thanks for all you do! And good luck with that. But there are a few
> potential problems. When I analyzed Google's shorteners about a month
> ago, I found that a VERY large percentage of the most malicious
> shortened URLs were a situation where the spammers were generating a
> unique shortener for each individual message/recipient address. This
> causes the following HUGE problems (at least for THESE particular
> shorteners) when publishing a full-URL dnsbl:
Thank you for all those observations!

> (1) Much of what you populate your rbldnsd file with is going to be
> totally ineffective for anyone, since it ONLY applied to whatever
> single email address where the spam was originally sent (where you had
> trapped it) - everyone else is going to get DIFFERENT shorteners for
> the spam from these same campaigns that are sent to their users.

You are right, but we do not use rbldnsd. We have our own DNSBL
implementation that uses a more efficient data structure. Anyway, I
think it is not a good idea to list every shortener, as you said. Maybe
we should require three complaints about the same shortener before
listing it. We will discover the best threshold over time.

> (2) Get ready for EXTREME rbldnsd bloat. You're gonna need a LOT of
> RAM eventually. And if you ever distribute via rsync, those are going
> to be HUGE rsync files (and then THEY will need a lot of RAM). Sadly,
> most of that bloat is going to come from entries that are doing
> absolutely nothing for anyone.

Exactly! We use a VM with 16 GB, and the software is using about 10 GB
to keep more than 30 million records in memory. That is about 350 bytes
per record. Our software has an expiration mechanism, so this memory
usage is not growing too fast now. But we must always keep an eye on it.

> (3) You might be revealing your spam traps to the spammers. In cases
> where the spammers are sending that 1-to-1 spam with single-recipient
> shorteners, then all they have to do is enumerate through their list
> of shorteners, checking them against your list - and they INSTANTLY
> get a list of every recipient address that triggers a listing on your
> DNSBL. If you want to destroy the effectiveness of your own DNSBL's
> spam traps - be my guest. But if you're getting 3rd party spam feeds
> (paid or free) - then know that you're then screwing over your 3rd
> party spam feed's spam traps - and those OTHER anti-spam systems that
> rely on such feeds, which will then diminish in quality.
> (unless you are filtering OUT these MANY 1-to-1 shortener spams)

It is not only spamtraps that will trigger this listing. All active
users will trigger it too, through complaints. The spammer will not
know who is a spamtrap and who is an active user.

> Maybe there are enough OTHER shorteners (ones that are being sent to
> multiple recipients) to make this worthwhile? But the bloat from the
> ones that are uniquely generated could be a challenge, and could
> potentially cause a MASSIVE amount of useless queries. I'd be very
> interested to see what PERCENTAGE of such queries generated a hit!
>
> Meanwhile, in my analysis I did about a month ago, about 80% of
> Google's shorteners found in egregious spams (that did this one-to-one
> shortener-to-recipient tactic)... were all banging on one of ONLY a
> dozen different spammers' domains. Therefore, doing a lookup on these
> and then checking the domain found at the base of the link it
> redirects to... is a more effective strategy for these - whereas, for
> THESE 80% of egregious Google shorteners, a full URL lookup is
> worthless, consuming resources without a single hit.

That is right. We see the same situation here. But checking the first
URL is not the only action we take. Our script can follow shortener
redirections and catch the spammer by the last URL of the redirection
chain:

https://www.dropbox.com/s/5aorrijafw5ygk0/uribl.pl?dl=0

The spammers can be trapped by any shortener they have, or by the dozen
domains that those shorteners hide.

> Alternatively, you may have found a way to filter out these types of
> individualized shorteners, to prevent that bloat? But even then,
> everyone should know that while your new list might be helpful, it
> would be good for others to know your new list isn't applicable to a
> large percentage of spammy shorteners, since it is still useless
> against these individualized shorteners.

I think we must all cause as much work for the spammers as they cause
for us.
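The redirect-following approach behind that script can be sketched in Python. This is only an illustration under my own assumptions (the linked uribl.pl does the real work; the function names and hop limit here are mine): it issues HEAD requests so no page bodies are downloaded, and walks the chain until a non-redirect response is reached.

```python
import http.client
import urllib.parse

REDIRECT_CODES = {301, 302, 303, 307, 308}

def next_hop(current_url, status, location):
    """Given an HTTP status and Location header, return the next URL
    in the redirection chain, or None if the chain ends here."""
    if status in REDIRECT_CODES and location:
        # Location may be relative, so resolve it against the current URL.
        return urllib.parse.urljoin(current_url, location)
    return None

def final_url(url, max_hops=10):
    """Follow redirects with HEAD requests (no bodies downloaded) and
    return the last URL of the chain, where the spammer's real domain
    usually appears. max_hops guards against redirect loops."""
    for _ in range(max_hops):
        parts = urllib.parse.urlsplit(url)
        cls = (http.client.HTTPSConnection if parts.scheme == "https"
               else http.client.HTTPConnection)
        conn = cls(parts.netloc, timeout=10)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        nxt = next_hop(url, resp.status, resp.getheader("Location"))
        conn.close()
        if nxt is None:
            return url
        url = nxt
    return url
```

The domain of the returned URL is what would then be checked against the domain blacklist, catching the dozen spammer domains even when every shortener is unique.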
If the spammer uses individualized shorteners, we can list each one by
crossing data with listed final-chain URL domains. If they use
individualized URL domains, we can list each one by crossing data with
the listed URLs' equivalent IP (the same machine serves all the spammer
domains). We can make it more and more expensive for spammers. But we
must work together to do it.

> NOTE: Google has made some improvements recently, and I haven't yet
> analyzed how much those improvements have changed any of these things
> I've mentioned.
>
> PS - the alphanumeric code at the end of these shorteners tends to be
> case-sensitive, while the rest of the URL is NOT case-sensitive (and
> they also work with both "https" and "http")... so you might want to
> standardize this on (1) https and (2) everything lower case up until
> the code at the end of the shortener - before the MD5 is calculated.
> Otherwise, it could easily break if the spammer just mixes up the
> capitalization of the shortener URL up until the code at the end.

Great! I had not thought of this. This is a new problem, and your
solution is very good. Let's discuss this idea more before any
implementation.

> --
> Rob McEwen
> https://www.invaluement.com
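Putting Rob's PS together with the signature idea from the opening message, here is a minimal Python sketch of what that normalization could look like. The exact split point (everything after the last "/" treated as the case-sensitive code) and the DNSBL zone name are my assumptions for illustration, not SPFBL's published scheme:

```python
import hashlib
import socket
from urllib.parse import urlsplit

def canonical_signature(url):
    """Normalize as suggested in the PS - force https, lowercase the
    host and path prefix, keep the trailing short code case-sensitive -
    then take the MD5. Assumption: everything after the last '/' is the
    case-sensitive code."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    prefix, _, code = (parts.path or "/").rpartition("/")
    canon = "https://" + host + prefix.lower() + "/" + code
    return hashlib.md5(canon.encode("utf-8")).hexdigest()

def is_listed(url, zone="uribl.example.net"):
    """Query <md5>.<zone>: any A record means listed, NXDOMAIN means
    not listed. The zone here is a placeholder, not SPFBL's real one."""
    try:
        socket.gethostbyname(canonical_signature(url) + "." + zone)
        return True
    except socket.gaierror:
        return False
```

With this, "HTTP://Goo.GL/AbCdE" and "https://goo.gl/AbCdE" produce the same signature, while changing the case of the code itself still produces a different one.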