I've had similar problems with Yahoo. The only solution was to change the IP address of the spider host.

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Nick Arnett
> Sent: Saturday, July 27, 2002 10:54 PM
> To: Internet robots discussion
> Subject: [Robots] Does Yahoo have new robot defenses?
>
> It looks to me as though Yahoo has some sort of robot defense operating. I
> was just testing a multi-threaded robot that I use to analyze discussions,
> including Yahoo's stock market boards. On the first run, it seemed to do
> fine, but when I tried to run it again a few minutes later, it didn't
> retrieve anything... so I tried going to the message boards using IE on the
> same machine. Every page is returning a 403 Forbidden error now -- even
> when I try to see robots.txt. As far as I know, Yahoo has never even had a
> robots.txt file.
>
> I'm guessing that the speed of my robot triggered a block against this IP
> address. Another machine, in the same subnet, can access the pages just
> fine.
>
> I've been working on the underlying database for the last few weeks, so I
> haven't run the spider lately. Thus, I'm not sure when this behavior might
> have started.
>
> My robot is quite fast and my connection yields throughput of about 1
> mbit/sec, so it certainly hit their server fairly hard. But hey, it's
> Yahoo. If they can't handle getting hit this hard on a mid-day Saturday,
> it's hard to imagine who can.
>
> No lectures about well-behaved robots, please. I know, I know. The next
> step for that robot will be to have each thread hit completely different
> domains. Perhaps each one will rotate through a few domains.
>
> Anybody know what Yahoo might be doing, or what its policy is about robots?
> I haven't been able to find anything that addresses the issue directly. I
> don't see anything under its TOS that would clearly apply. If they want to
> have a limit on robots, I sure would appreciate it if they would say what it
> is...
>
> It's been about 30 minutes now and I'm still blocked, it seems.
>
> Just checked from another machine -- they still have no robots.txt at all.
>
> Nick
>
> --
> [EMAIL PROTECTED]
> (408) 904-7198
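
For what it's worth, the "rotate through a few domains" idea Nick mentions can be sketched roughly as below. This is only an illustration, not anyone's actual robot: the function name `interleave_by_host` and the example.com-style URLs are made up, and it only reorders a URL list so that consecutive requests go to different hosts.

```python
from collections import OrderedDict, deque
from urllib.parse import urlsplit


def interleave_by_host(urls):
    """Yield URLs in round-robin order across hosts.

    Hypothetical helper: groups URLs by hostname, then takes one URL
    from each host in turn, so no host is hit twice in a row while
    other hosts still have work queued.
    """
    queues = OrderedDict()
    for url in urls:
        host = urlsplit(url).hostname or ""
        queues.setdefault(host, deque()).append(url)

    while queues:
        # Copy the keys so exhausted hosts can be dropped while looping.
        for host in list(queues):
            pending = queues[host]
            yield pending.popleft()
            if not pending:
                del queues[host]


if __name__ == "__main__":
    todo = [
        "http://a.example/1",
        "http://a.example/2",
        "http://b.example/1",
        "http://a.example/3",
        "http://c.example/1",
    ]
    for url in interleave_by_host(todo):
        print(url)
```

Note that once only one host is left in the queue, the rotation no longer spaces anything out, so a per-host delay between requests would still be needed to avoid hammering a single server.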