Re: new (small) shortener campaign suggestion for URLRedirect

2010-03-01 Thread Jonas Eckerman

I think I'm misunderstanding something, but I'm not sure what.

Please tell me why I'm confused. :-)

On 2010-02-24 11:30, Chip M. wrote:


 Jonas, do you have any performance and/or efficacy stats for your
 URLRedirect plugin?


Unfortunately, no. I am logging info from it (to the general mail log), 
but I haven't put anything together to analyze the logs.



 My main concern with doing real-time HTTP HEADs is performance.


Well... Since the URLRedirect plugin inserts the redirection targets 
into the message's metadata so that other plugins (such as URIBL) can 
see those targets, the HEAD requests must be done before other rules.


Currently the plugin only caches the results in a simple Perl structure, 
and on a per-instance basis. Putting the cache in shared memory or a 
database is an obvious improvement that should be done. This could make 
a big difference at high-volume sites. The two reasons I haven't done 
much with the plugin, except adding support for Marc Perkel's URL 
shortener DNS list, are that I have no idea whether anyone except me 
actually uses the plugin, and I haven't seen much URL shortener spam 
since I made it.
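
A minimal sketch of what a database-backed cache shared between 
instances might look like. This is illustrative Python, not the 
plugin's Perl code; the class name RedirectCache, the table layout and 
the one-hour expiry are all assumptions made for the example.

```python
import sqlite3
import time


class RedirectCache:
    """Hypothetical cache mapping a redirector URL to its target.

    Backed by SQLite so that several processes pointing at the same
    file could share results (":memory:" is private to one process).
    """

    def __init__(self, path=":memory:", ttl=3600):
        self.ttl = ttl  # assumed expiry in seconds, not a plugin setting
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS redirects ("
            "url TEXT PRIMARY KEY, target TEXT, stored REAL)"
        )

    def get(self, url, now=None):
        """Return the cached target, or None if missing or expired."""
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT target, stored FROM redirects WHERE url = ?", (url,)
        ).fetchone()
        if row is None or now - row[1] > self.ttl:
            return None
        return row[0]

    def put(self, url, target, now=None):
        """Store or refresh the target for a URL."""
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO redirects VALUES (?, ?, ?)",
            (url, target, now),
        )
        self.db.commit()
```

With a file path instead of ":memory:", a lookup would be tried before 
any HEAD request is sent, and only misses would go out to the network.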


Other than the lack of a shared cache, the plugin shouldn't be much of 
a performance problem. It does the HEAD requests in parallel (when 
possible), and has a runtime-definable timeout (default is 10 seconds, 
I do not recommend raising that). It also has limits on the number of 
requests done for one message.
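
The behaviour described above (parallel HEAD requests, a timeout, and a 
cap on requests per message) could be sketched roughly as follows. This 
is illustrative Python, not the plugin's Perl code. The function names, 
the cap of 5 requests, and the choice to treat the timeout as an 
overall limit as well as a per-request one are assumptions.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor, wait
from urllib.parse import urlsplit


def head_location(url, timeout):
    """Send one HEAD request; return the Location header of a 3xx reply."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        if 300 <= resp.status < 400:
            return resp.getheader("Location")
        return None
    finally:
        conn.close()


def resolve_all(urls, timeout=10, max_requests=5, head=head_location):
    """Resolve up to max_requests distinct URLs in parallel.

    Returns {url: target} for the URLs that answered with a redirect
    within the timeout. Failures and slow answers are simply left out.
    """
    urls = list(dict.fromkeys(urls))[:max_requests]  # dedupe, keep order, cap
    if not urls:
        return {}
    pool = ThreadPoolExecutor(max_workers=len(urls))
    futures = {pool.submit(head, u, timeout): u for u in urls}
    done, _ = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on stragglers
    results = {}
    for fut in done:
        try:
            target = fut.result()
        except Exception:
            target = None  # network or protocol error: no target
        if target:
            results[futures[fut]] = target
    return results
```

The head parameter is there so that the network call can be swapped 
out, for example for a cache lookup or a stub in tests.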


I would like to do the requests in parallel with other processing as 
well, but that's hard. It has to insert the targets into metadata before 
any rules or plugins that use URIs, and I don't know of a way to do 
that other than having it run its course before the regular rules.


 Jonas, I've been thinking that if you embedded the SA spam score in
 the HTTP request's Agent, that would provide BitLy/et-al with
 extremely useful data, which should improve their detection rate
 (if they choose to use the extra data).

This would require the plugin to know the score before SpamAssassin has 
calculated it, which is kind of difficult. I have not found any good 
algorithm implementing that kind of prescience. :-)


It could insert the score(s) (or score aggregates) of *previous* 
messages into the user agent, but I think a general score aggregate 
system not tied to URL shortener services would be more suited for that. 
I, for example, might be interested in aggregate scores for mails 
referring to our web sites even though they are not URL shorteners.


 You could also include the recipient's domain, which may help them
 to correlate data.

I'm not sure what they would do with that, but again I think that kind 
of thing fits more into a general report framework than in a very 
specialized SpamAssassin plugin such as URLRedirect.


 I'm also wondering about using UDP to send a quick real-time,
 no-response-needed message (instead of a high overhead HTTP
 request), then (mostly) auto-quarantine, and later in a separate,
 batch queue, do a proper HTTP Head.  Anything that's clean after a
 certain amount of time, could be automatically re-injected back
 into the main queue.  That would allow pooling of requests, and
 shift the load from the main email gateway.

I really don't understand this idea. What would the UDP packet contain? 
What would it be good for?


You can do a SpamAssassin run in a batch queue already. The 
URLRedirect plugin doesn't care if SpamAssassin runs in a queue job, at 
delivery, in the receiving MTA, or wherever. When to call SA, which is 
what calls URLRedirect's methods, is completely outside the plugin's 
control.


If you want to queue some mails for later SA checking you should do that 
without having to run the same SA as you want to avoid running. You 
could do that by having some faster, leaner thing check the mail and 
decide to either send it to SA directly or to the slow queue. You could 
also run two SA configs where one has just about everything (including 
default rules) disabled. That SA could have rules used to determine 
whether to send the mail on to the normal SA or the slow queue.


In any case, it's hard for an SA plugin to decide whether SA should have 
been called or not, so this has to be done with more than just an SA 
plugin.


The only impact of the URLRedirect plugin I can see in this is that you 
could use it to just see if there is an identified redirector URL in the 
message and use that to decide in the first (lean and fast) SA run in 
the two-SA-scenario above. If this is what you meant, I could easily 
implement an option to just identify redirectors rather than actually 
testing them with HEAD requests.
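
A rough sketch of that "identify only" idea, used as the lean first 
stage that picks a queue. This is illustrative Python, not an existing 
plugin option. The host list, the function names and the queue labels 
"slow" and "normal" are assumptions made for the example.

```python
import re
from urllib.parse import urlsplit

# Assumption: a tiny illustrative host list; a real setup would use its
# own list of redirector hosts.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com"}

# Deliberately simple URL matcher, good enough for a cheap first pass.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def find_redirector_urls(body, hosts=SHORTENER_HOSTS):
    """Return the URLs in body whose host is a known redirector.

    No network traffic happens here: the URLs are only identified.
    """
    found = []
    for url in URL_RE.findall(body):
        try:
            host = (urlsplit(url).hostname or "").lower()
        except ValueError:
            continue  # malformed URL, ignore it
        if host in hosts:
            found.append(url)
    return found


def choose_queue(body):
    """Send mail with redirector URLs to the slow queue, the rest on."""
    return "slow" if find_redirector_urls(body) else "normal"
```

Mail sent to the slow queue would then get the full SA run, HEAD 
requests included, while everything else goes straight through.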


 The UDP part would be pointless without cooperation from the
 shortener services.  If they do embrace it, they can use the UDP
 data to more quickly identify most bad links, and have that all
 ready for when we send out HTTP requests.

I don't get this either. How would the UDP requests help them find bad 
links? How it help 

new (small) shortener campaign suggestion for URLRedirect

2010-02-24 Thread Chip M.
Jonas, do you have any performance and/or efficacy stats for your
URLRedirect plugin?

After months of near silence, I'm seeing an interesting (albeit
low volume) shortener campaign, that's picking up volume AND
effectiveness.

Only one of my 40-ish domains was getting these, then this week two
other domains started getting a trickle.  They make up about 5.6%
of the post-Spamhaus volume for the main victim, and (currently)
much less than 1% for the two new victims.

They're mostly targeting BitLy, and they're getting MUCH better at
thwarting BitLy's detection (for a short period, it was near 100%
catch rate, now it's down to almost zero).  The volume is low
enough that I've been batch HTTP-HEADing them, a few times each
day.

They're not doing anything stunningly innovative, and they should
be fairly obvious (both algorithmically and to the (experienced)
naked eye), but the volume must be small enough that they're
sneaking under the radar.

I've been killing these partly by giving BitLy/et-al a fairly hefty
score, UNLESS the sender uses the correct RealName in the To
header, or they've already got a generous skip rule.

That's brute force, and terribly inelegant.
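
For illustration, the scoring rule described above could be sketched 
like this. It is a simplified Python sketch, not actual filter 
configuration. The function name, the lookup table of real names and 
the penalty of 4.0 are assumptions.

```python
from email.utils import parseaddr


def shortener_penalty(to_header, real_names, has_shortener,
                      has_skip_rule=False, penalty=4.0):
    """Return the extra score for a message containing a shortener URL.

    real_names maps a recipient address (lowercase) to the display name
    that legitimate senders are expected to use in the To header.
    """
    if not has_shortener or has_skip_rule:
        return 0.0
    display_name, address = parseaddr(to_header)
    expected = real_names.get(address.lower())
    if expected and display_name.strip().lower() == expected.lower():
        return 0.0  # sender knows the recipient's real name
    return penalty


# Example with made-up data:
names = {"chip@example.org": "Chip M."}
score = shortener_penalty('"Chip M." <chip@example.org>', names, True)
```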

The FP rate hasn't been too bad, and I'm of the opinion that a
URL shortener is an admission by the sender that the content is low
priority, and all but begs to be delayed and salvaged by our FP
pipeline.

Still, I'd like to find a more elegant way to handle these, partly
so I can be even more aggressive.

My main concern with doing real-time HTTP HEADs is performance.

Jonas, I've been thinking that if you embedded the SA spam score in
the HTTP request's Agent, that would provide BitLy/et-al with
extremely useful data, which should improve their detection rate
(if they choose to use the extra data).

You could also include the recipient's domain, which may help them
to correlate data.  In the campaigns I've looked at, the patterns
are fairly obvious, as long as you look at _JUST_ the data for that
domain's spam (granted, I'm in a small domain environment, and that
really wouldn't be relevant for a big ISP's data, however that's
easy to separate at the shorteners' end).

Does anyone have a contact at BitLy?  They seem to be on the ball,
and serious.  They could well embrace cooperation. :)

I'm also wondering about using UDP to send a quick real-time,
no-response-needed message (instead of a high overhead HTTP
request), then (mostly) auto-quarantine, and later in a separate,
batch queue, do a proper HTTP Head.  Anything that's clean after a
certain amount of time, could be automatically re-injected back
into the main queue.  That would allow pooling of requests, and
shift the load from the main email gateway.

The UDP part would be pointless without cooperation from the
shortener services.  If they do embrace it, they can use the UDP
data to more quickly identify most bad links, and have that all
ready for when we send out HTTP requests.

Ideally, if a pilot program proved successful, some of the
shorteners might be willing to host a proper DNS blocklist, which
would improve performance for everybody. :)

The commercial shorteners COULD even use a blocklist under their
control (i.e. preferred white/pass-listing) as a selling point for
their (future?) paying customers, and/or as an incentive for their
link creators to opt-in to a more rigorous creation channel
(i.e. still keep the no-brainer, Pakled-friendly channel, but also
have a channel for adults (in the original sense of that word) who
are willing to perform some level of verification).


I'm also seeing a very pernicious Digg campaign, all being sent
via Hotmail.  In some ways, that's more effective than the
mainstream shorteners, since it appears Digg does not check any
blocklists. :(

These services are just too dang tempting a target, so I expect
these campaigns to continue.


More fevered ramblings from one of your mostly harmless Iowa Geeks,
- Chip