I've had similar problems with Yahoo. The only solution was to change the
IP address of the spider host.
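
A less drastic option is to slow the robot down per host, which is roughly
what the quoted message below proposes (each thread hitting different
domains). Here is a minimal sketch of that idea, not a known fix for the
Yahoo block. The class name `HostThrottle`, the method `reserve` and the
5-second default delay are illustrative assumptions; Yahoo has not published
a limit as far as this thread knows.

```python
import threading
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum delay between requests to the same host.

    Hypothetical helper for illustration; the delay value is a guess.
    """

    def __init__(self, min_delay=5.0, clock=time.monotonic):
        self.min_delay = min_delay      # seconds between hits on one host
        self.clock = clock              # injectable so it can be tested
        self._next_ok = {}              # host -> earliest allowed start time
        self._lock = threading.Lock()   # shared by all worker threads

    def reserve(self, url):
        """Return how many seconds to sleep before fetching url."""
        host = urlsplit(url).hostname or ""
        with self._lock:
            now = self.clock()
            start = max(now, self._next_ok.get(host, now))
            # Book the slot so the next caller for this host waits longer.
            self._next_ok[host] = start + self.min_delay
            return start - now
```

Each worker thread would call `time.sleep(throttle.reserve(url))` before
fetching, so requests to different hosts proceed at once while requests to
one host are spaced out.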

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> Nick Arnett
> Sent: Saturday, July 27, 2002 10:54 PM
> To: Internet robots discussion
> Subject: [Robots] Does Yahoo have new robot defenses?
>
>
>
> It looks to me as though Yahoo has some sort of robot defense
> operating.  I was just testing a multi-threaded robot that I use to
> analyze discussions, including Yahoo's stock market boards.  On the
> first run, it seemed to do fine, but when I tried to run it again a few
> minutes later, it didn't retrieve anything... so I tried going to the
> message boards using IE on the same machine.  Every page is returning a
> 403 Forbidden error now -- even when I try to see robots.txt.  As far
> as I know, Yahoo has never even had a robots.txt file.
>
> I'm guessing that the speed of my robot triggered a block against this
> IP address.  Another machine, in the same subnet, can access the pages
> just fine.
>
> I've been working on the underlying database for the last few weeks, so
> I haven't run the spider lately.  Thus, I'm not sure when this behavior
> might have started.
>
> My robot is quite fast and my connection yields throughput of about 1
> Mbit/sec, so it certainly hit their server fairly hard.  But hey, it's
> Yahoo.  If they can't handle getting hit this hard on a mid-day
> Saturday, it's hard to imagine who can.
>
> No lectures about well-behaved robots, please.  I know, I know.  The
> next step for that robot will be to have each thread hit completely
> different domains.  Perhaps each one will rotate through a few domains.
>
> Anybody know what Yahoo might be doing, or what its policy is about
> robots?  I haven't been able to find anything that addresses the issue
> directly.  I don't see anything under its TOS that would clearly apply.
> If they want to have a limit on robots, I sure would appreciate it if
> they would say what it is...
>
> It's been about 30 minutes now and I'm still blocked, it seems.
>
> Just checked from another machine -- they still have no robots.txt at
> all.
>
> Nick
>
> --
> [EMAIL PROTECTED]
> (408) 904-7198
>
>
>

