Re: [twitter-dev] tco crawler details

2010-06-11 Thread John Adams
t.co is not a crawler. Are you referring to the URL unpacking process or
something else?

-john


On Thu, Jun 10, 2010 at 11:46 PM, Ken <k...@cimas.ch> wrote:

 If tco is to be the new three-letter agency and gatekeeper, we would
 like to treat it nicely and whitelist its crawler. If tco is
 inadvertently blocked, what happens?

 I do not know if we have already been checked by tco, as I have not
 sent or received a DM with one of our own URLs.

 What are the user-agent and IP addresses used by this crawler? Does it
 check robots.txt?

 And since, for some, a tco thumbs-down could be a problem, is there a
 (speedy) appeals process?



RE: [twitter-dev] tco crawler details

2010-06-11 Thread Dean Collins
Of course it is.

Twitter were asked what "defines" a "bad" site on the second day, but I
haven't seen a reply apart from more questions about who is making the
choice, e.g. will pornography be classed as "bad", will religious free
speech be classed as "bad"?

I don't think the Twitheads thought through what it means to now offer
an "AOL" version of the web, and the long-term responsibilities that this
entails through implicit guarantees to their users.

Of course, Ken, you don't expect them to publish their IP address list, do
you? Otherwise some smartass would route this IP address to a "clean"
site and everyone else to the "bad" content.

Regards,

Dean Collins
Cognation Inc
d...@cognation.net
+1-212-203-4357   New York
+61-2-9016-5642   (Sydney in-dial).
+44-20-3129-6001 (London in-dial).



From: twitter-development-talk@googlegroups.com
[mailto:twitter-development-t...@googlegroups.com] On Behalf Of John
Adams
Sent: Friday, 11 June 2010 6:00 AM
To: twitter-development-talk@googlegroups.com
Subject: Re: [twitter-dev] tco crawler details

 




Re: [twitter-dev] tco crawler details

2010-06-11 Thread John Kalucki
We've already been checking for bad links now for at least a year, if
not 18 months. It's been so long, I can't remember when it went into
production. Link checking seems to work very well.

-John Kalucki
http://twitter.com/jkalucki
Infrastructure, Twitter Inc.



On Fri, Jun 11, 2010 at 6:21 AM, Dean Collins <d...@cognation.net> wrote: