[twitter-dev] Re: tco crawler details

2010-06-11 Thread Ken

Presumably, in order to check that a URL is not malicious, tco would have to access and analyse it. In his post, Raffi said that Twitter "will redirect them to the original URL after first confirming with our database that that URL is not malicious." So the check is per URL, not per domain. One of

[twitter-dev] Re: tco crawler details

2010-06-11 Thread Ken

Dean - I meant the IP of the crawler - we have lots of DENY ACLs in place to curb rogue bots.

On Jun 11, 3:21 pm, Dean Collins d...@cognation.net wrote:
> Of course it is. Twitter were asked what defines a bad site on the second day but I haven't seen a reply apart from more questions about who

[twitter-dev] Re: tco crawler details

2010-06-11 Thread Dewald Pretorius

My guess is that Twitter uses the Google SafeBrowsing API, in addition to other blacklist APIs.

http://code.google.com/apis/safebrowsing/

Google SafeBrowsing is basically two databases that you can host locally and that Google keeps constantly updated. One database consists of potential phishing
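
To make the "locally hosted databases" idea concrete, here is a minimal Python sketch of a per-URL lookup against two local lists. It is not the SafeBrowsing protocol or client: the `LocalBlacklist` class, the `canonicalize` and `url_key` helpers, the SHA-256 keying, and the simplified URL normalization are all assumptions made for illustration, and the second list is assumed to hold malware URLs because the message is cut off before describing it.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Simplified, assumed normalization: lowercase scheme and host, drop the fragment."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def url_key(url: str) -> str:
    """Hash the normalized URL so the local lists store fixed-size keys (assumed scheme)."""
    return hashlib.sha256(canonicalize(url).encode("utf-8")).hexdigest()


class LocalBlacklist:
    """Hypothetical stand-in for two locally hosted blacklist databases."""

    def __init__(self, phishing_urls=(), malware_urls=()):
        self.phishing = {url_key(u) for u in phishing_urls}
        self.malware = {url_key(u) for u in malware_urls}

    def classify(self, url: str):
        """Return 'phishing', 'malware', or None when the URL is on neither list."""
        key = url_key(url)
        if key in self.phishing:
            return "phishing"
        if key in self.malware:
            return "malware"
        return None


if __name__ == "__main__":
    blacklist = LocalBlacklist(
        phishing_urls=["http://Example.com/login"],
        malware_urls=["http://bad.example/x.exe"],
    )
    for candidate in ("http://example.com/login#top", "http://example.com/other"):
        print(candidate, "->", blacklist.classify(candidate))
```

Because the lookup key is the whole normalized URL, one flagged page does not block the rest of its domain, which matches Ken's reading that the check is by URL rather than by domain.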