[twitter-dev] Re: Twitter API is unresponsive, can we crawl the twitter.com website?

2009-08-09 Thread Chad Etzel

Hello,

If you crawl and/or scrape the site, your IP(s) will be blackholed
indefinitely. We do check for this behavior and identify it.
-Chad

On Sun, Aug 9, 2009 at 10:23 AM, Cameron Kaiser wrote:
>
>> > > Would it be okay to start taking a peek at some of the more popular
>> > > users' pages on twitter.com, and crawl their favorites from there
>> > > rather than using the API?  I guess this assumes my IP wouldn't get
>> > > blocked as it seems to be by the API.
>
>> I'm going to say it is a good way to get your app/IP permanently banned from
>> Twitter.
>
> I think John K said something specific about banning screen scrapers not
> too long ago as well.


[twitter-dev] Re: Twitter API is unresponsive, can we crawl the twitter.com website?

2009-08-09 Thread Nick Arnett
On Sun, Aug 9, 2009 at 7:55 AM, Cameron Kaiser wrote:

>
> > I guess if you crawl like a crawler it should be okay; otherwise, how
> > would a newly developed spider work? But if you crawl like a scraper
> > you'll be banned.
> > But I am not sure whether Twitter can differentiate between them, I
> > mean by the request pattern. Does Twitter check it?
>
> I'm sure they have ways of determining it internally, and I'm sure they
> won't reveal what those ways are.


They already have, to a certain extent.  Their robots.txt asks for a crawl
delay of up to 10 seconds, which is an eternity in web crawling.  At one
request every 10 seconds, a crawler is limited to 8,640 requests a day
(86,400 seconds / 10), which is peanuts, as I'm sure most people here
would realize immediately.

http://twitter.com/robots.txt

#Google Search Engine Robot
User-agent: Googlebot
# Crawl-delay: 10 -- Googlebot ignores crawl-delay ftl
Disallow: /*?
Disallow: /*/with_friends

#Yahoo! Search Engine Robot
User-Agent: Slurp
Crawl-delay: 1
Disallow: /*?
Disallow: /*/with_friends

#Microsoft Search Engine Robot
User-Agent: msnbot
Crawl-delay: 10
Disallow: /*?
Disallow: /*/with_friends

# Every bot that might possibly read and respect this file.
User-agent: *
Disallow: /*?
Disallow: /*/with_friends
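
As a rough illustration of the arithmetic above, the following Python sketch
parses the rules quoted here with the standard library's
urllib.robotparser and converts each Crawl-delay into a requests-per-day
ceiling. The helper name daily_request_budget is made up for this example.
Note that the standard-library parser treats Disallow paths as plain
prefixes, so the wildcard rules such as /*? are not interpreted here; the
sketch only reads Crawl-delay.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt quoted above (August 2009 snapshot).
ROBOTS_TXT = """\
#Google Search Engine Robot
User-agent: Googlebot
# Crawl-delay: 10 -- Googlebot ignores crawl-delay ftl
Disallow: /*?
Disallow: /*/with_friends

#Yahoo! Search Engine Robot
User-Agent: Slurp
Crawl-delay: 1
Disallow: /*?
Disallow: /*/with_friends

#Microsoft Search Engine Robot
User-Agent: msnbot
Crawl-delay: 10
Disallow: /*?
Disallow: /*/with_friends

# Every bot that might possibly read and respect this file.
User-agent: *
Disallow: /*?
Disallow: /*/with_friends
"""

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_request_budget(parser, user_agent):
    """Hypothetical helper: max requests/day implied by Crawl-delay.

    Returns None when no (non-zero) Crawl-delay applies to the agent.
    """
    delay = parser.crawl_delay(user_agent)
    if not delay:
        return None
    return SECONDS_PER_DAY // delay


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "Slurp", "msnbot"):
    print(agent, daily_request_budget(parser, agent))
```

The Googlebot entry yields no budget because its Crawl-delay line is
commented out in the file.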


What you don't see here is whatever internal list of user-agents and IP
addresses they might block.

Having spent countless days identifying robots on dozens of big consumer and
business sites, I know that it's very hard to eliminate the low-volume ones
that spoof normal web clients.  But any one that starts grabbing a
significant number of pages, or the same page more often than every 24 hours
or so, stands out from the data quickly... and probably gets blocked.  Of
course, that's essentially what the DDoS is doing, except that it probably
also probes for and uses specific vulnerabilities, rather than just making
page requests.

Nick
(who, among other things, was the product manager for the first commercial
web crawler and helped set the robots.txt standard)
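
The kind of log analysis described in the message above might be sketched
as follows in Python. This is only an illustration under stated
assumptions, not how Twitter or any particular site does it: the function
name, the (ip, path, timestamp) log format, and all thresholds are
hypothetical, and a real system would need to tolerate ordinary page
reloads and clients that share an IP address.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def suspicious_clients(requests, max_pages=1000, max_quick_refetches=20,
                       refetch_window=timedelta(hours=24)):
    """Flag client IPs whose request pattern looks like a crawler.

    `requests` is an iterable of (ip, path, timestamp) tuples, assumed to
    be sorted by timestamp. All thresholds are illustrative defaults.
    """
    last_seen = {}                      # (ip, path) -> time of previous fetch
    page_counts = defaultdict(int)      # ip -> total requests seen
    quick_refetches = defaultdict(int)  # ip -> re-fetches inside the window
    flagged = set()

    for ip, path, when in requests:
        # Signal 1: a significant number of pages from one client.
        page_counts[ip] += 1
        if page_counts[ip] > max_pages:
            flagged.add(ip)

        # Signal 2: the same page fetched again sooner than the window.
        previous = last_seen.get((ip, path))
        if previous is not None and when - previous < refetch_window:
            quick_refetches[ip] += 1
            if quick_refetches[ip] > max_quick_refetches:
                flagged.add(ip)
        last_seen[(ip, path)] = when

    return flagged


# Tiny made-up log: one client re-fetches the same page every hour.
t0 = datetime(2009, 8, 9)
log = [
    ("10.0.0.1", "/a", t0),
    ("10.0.0.2", "/a", t0),
    ("10.0.0.1", "/a", t0 + timedelta(hours=1)),
    ("10.0.0.2", "/b", t0 + timedelta(hours=1)),
    ("10.0.0.1", "/a", t0 + timedelta(hours=2)),
]
print(suspicious_clients(log, max_pages=100, max_quick_refetches=1))
```

Low-volume clients that spoof a normal browser would pass both checks,
which matches the point above that those are the hard ones to eliminate.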


[twitter-dev] Re: Twitter API is unresponsive, can we crawl the twitter.com website?

2009-08-09 Thread Cameron Kaiser

> I guess if you crawl like a crawler it should be okay; otherwise, how
> would a newly developed spider work? But if you crawl like a scraper
> you'll be banned.
> But I am not sure whether Twitter can differentiate between them, I
> mean by the request pattern. Does Twitter check it?

I'm sure they have ways of determining it internally, and I'm sure they
won't reveal what those ways are.

-- 
 personal: http://www.cameronkaiser.com/ --
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckai...@floodgap.com
-- Those wise enough to avoid politics are governed by those who aren't. --


[twitter-dev] Re: Twitter API is unresponsive, can we crawl the twitter.com website?

2009-08-09 Thread shiplu
I guess if you crawl like a crawler it should be okay; otherwise, how
would a newly developed spider work? But if you crawl like a scraper
you'll be banned.
But I am not sure whether Twitter can differentiate between them, I
mean by the request pattern. Does Twitter check it?



-- 
A K M Mokaddim
http://talk.cmyweb.net
http://twitter.com/shiplu
Stop Top Posting !!
Writing in Bangla is much better than writing in Banglish
Sent from Dhaka, Bangladesh


[twitter-dev] Re: Twitter API is unresponsive, can we crawl the twitter.com website?

2009-08-09 Thread Cameron Kaiser

> > > Would it be okay to start taking a peek at some of the more popular
> > > users' pages on twitter.com, and crawl their favorites from there
> > > rather than using the API?  I guess this assumes my IP wouldn't get
> > > blocked as it seems to be by the API.

> I'm going to say it is a good way to get your app/IP permanently banned from
> Twitter.

I think John K said something specific about banning screen scrapers not
too long ago as well.

-- 
 personal: http://www.cameronkaiser.com/ --
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckai...@floodgap.com
-- God may be subtle, but He isn't plain mean. -- Albert Einstein -


[twitter-dev] Re: Twitter API is unresponsive, can we crawl the twitter.com website?

2009-08-09 Thread avail4one
hmmm, maybe. i honestly wasn't trying to suggest a 'bad idea' was actually
good, but i was just reading through recent messages, trying to make some
sense out of it all, and realized that I first heard of twitter pretty early
on a radio broadcast. that was mostly an advertisement. twitter was still a
baby. I was an early adopter. I created my account right away just to see
how great it was.

anyhow, the question just occurred to me of why we are talking about this
on this old-fashioned google thing and not twitter... and i think the answer
is this: "it's a numbers game". and it always has been. and of course this
makes no sense whatsoever, it's not supposed to. that's exactly what that
kind of response means.

just a thought.

On Sun, Aug 9, 2009 at 1:25 AM, Abraham Williams <4bra...@gmail.com> wrote:

> I'm going to say it is a good way to get your app/IP permanently banned
> from Twitter.
>
> Abraham
>
> 2009/8/9 avail4one 
>
> I think it's ok to go for it, what does everyone else think?
>>
>>
>>
>>
>> On Sun, Aug 9, 2009 at 1:10 AM, Tim Haines  wrote:
>>
>>>
>>> Hi there,
>>>
>>> This is probably a very dumb question, but I thought I'd ask it
>>> anyway.
>>>
>>> I run http://favstar.fm, and I can't call the API more than 60 or so
>>> times an hour at the moment (or something in that region).  My worthy
>>> competitor http://favotter.matope.com remains unaffected though, and
>>> has been able to continue his fav crawling uninterrupted.
>>>
>>> Would it be okay to start taking a peek at some of the more popular
>>> users' pages on twitter.com, and crawl their favorites from there
>>> rather than using the API?  I guess this assumes my IP wouldn't get
>>> blocked as it seems to be by the API.
>>>
>>> Dumb question right?  Curious to hear.
>>>
>>> Tim.
>>>
>>>
>>>
>>
>>
>> --
>> \./'\./ /'\ \ ]. /'\./'\ /'\ /'\./
>>
>
>
>
> --
> Abraham Williams | Community Evangelist | http://web608.org
> Hacker | http://abrah.am | http://twitter.com/abraham
> Project | http://fireeagle.labs.poseurtech.com
> This email is: [ ] blogable [x] ask first [ ] private.
>



-- 
\./'\./ /'\ \ ]. /'\./'\ /'\ /'\./


[twitter-dev] Re: Twitter API is unresponsive, can we crawl the twitter.com website?

2009-08-09 Thread Tim Haines

That's probably what I would have said if the question was posed to me
too.  Or maybe "ask twitter, but likely not". Hence the dumb question
warning.

Tim.

On Aug 9, 8:25 pm, Abraham Williams <4bra...@gmail.com> wrote:
> I'm going to say it is a good way to get your app/IP permanently banned from
> Twitter.
>
> Abraham
>
> 2009/8/9 avail4one 
>
>
>
> > I think it's ok to go for it, what does everyone else think?
>
> > On Sun, Aug 9, 2009 at 1:10 AM, Tim Haines  wrote:
>
> >> Hi there,
>
> >> This is probably a very dumb question, but I thought I'd ask it
> >> anyway.
>
> >> I run http://favstar.fm, and I can't call the API more than 60 or so
> >> times an hour at the moment (or something in that region).  My worthy
> >> competitor http://favotter.matope.com remains unaffected though, and
> >> has been able to continue his fav crawling uninterrupted.
>
> >> Would it be okay to start taking a peek at some of the more popular
> >> users' pages on twitter.com, and crawl their favorites from there
> >> rather than using the API?  I guess this assumes my IP wouldn't get
> >> blocked as it seems to be by the API.
>
> >> Dumb question right?  Curious to hear.
>
> >> Tim.
>
> > --
> > \./'\./ /'\ \ ]. /'\./'\ /'\ /'\./
>
> --
> Abraham Williams | Community Evangelist | http://web608.org
> Hacker | http://abrah.am | http://twitter.com/abraham
> Project | http://fireeagle.labs.poseurtech.com
> This email is: [ ] blogable [x] ask first [ ] private.


[twitter-dev] Re: Twitter API is unresponsive, can we crawl the twitter.com website?

2009-08-09 Thread Abraham Williams
I'm going to say it is a good way to get your app/IP permanently banned from
Twitter.

Abraham

2009/8/9 avail4one 

> I think it's ok to go for it, what does everyone else think?
>
>
>
>
> On Sun, Aug 9, 2009 at 1:10 AM, Tim Haines  wrote:
>
>>
>> Hi there,
>>
>> This is probably a very dumb question, but I thought I'd ask it
>> anyway.
>>
>> I run http://favstar.fm, and I can't call the API more than 60 or so
>> times an hour at the moment (or something in that region).  My worthy
>> competitor http://favotter.matope.com remains unaffected though, and
>> has been able to continue his fav crawling uninterrupted.
>>
>> Would it be okay to start taking a peek at some of the more popular
>> users' pages on twitter.com, and crawl their favorites from there
>> rather than using the API?  I guess this assumes my IP wouldn't get
>> blocked as it seems to be by the API.
>>
>> Dumb question right?  Curious to hear.
>>
>> Tim.
>>
>>
>>
>
>
> --
> \./'\./ /'\ \ ]. /'\./'\ /'\ /'\./
>



-- 
Abraham Williams | Community Evangelist | http://web608.org
Hacker | http://abrah.am | http://twitter.com/abraham
Project | http://fireeagle.labs.poseurtech.com
This email is: [ ] blogable [x] ask first [ ] private.


[twitter-dev] Re: Twitter API is unresponsive, can we crawl the twitter.com website?

2009-08-09 Thread avail4one
I think it's ok to go for it, what does everyone else think?



On Sun, Aug 9, 2009 at 1:10 AM, Tim Haines  wrote:

>
> Hi there,
>
> This is probably a very dumb question, but I thought I'd ask it
> anyway.
>
> I run http://favstar.fm, and I can't call the API more than 60 or so
> times an hour at the moment (or something in that region).  My worthy
> competitor http://favotter.matope.com remains unaffected though, and
> has been able to continue his fav crawling uninterrupted.
>
> Would it be okay to start taking a peek at some of the more popular
> users' pages on twitter.com, and crawl their favorites from there
> rather than using the API?  I guess this assumes my IP wouldn't get
> blocked as it seems to be by the API.
>
> Dumb question right?  Curious to hear.
>
> Tim.
>
>
>


-- 
\./'\./ /'\ \ ]. /'\./'\ /'\ /'\./