On Sat, 8 Apr 2000, Adam Megacz wrote:

> Ok, I guess it's not really a robot, but metacrawler does connect to
> sites such as altavista.com and masquerade as a client -- somewhat
> similar to what robots do.

Mmm. Not really. There is no "masquerade" involved; they _are_ a client. This isn't really on-topic here, since metasearch engines don't care about what 95% of being a robot is about (knowing what URLs to query and how, following links in retrieved documents, etc.) but...

>
> I've always been curious -- why doesn't altavista block access from
> the IP's owned by metacrawler.com? Metacrawler slows down altavista,
> and they don't make any money from banner ads (since metacrawler
> strips the banner ads off of the pages returned by altavista).

While it would not be appropriate for me to comment on the particulars of the situation with MetaCrawler and AltaVista, there are a few general comments I can make about metasearch engines and the backend engines they query.

MetaCrawler does indeed pass through certain banner ads from certain engines. Take a look at the second and later pages of some MetaCrawler results and you will sometimes see them in the middle of the results.

MetaCrawler includes attribution of where the results come from beside each result, which can serve as a form of advertising and a way to drive some amount of traffic to the particular engine. For lesser-known engines, the exposure that metasearch engines give could be a reason for them to want such traffic.

For a lot of Internet companies, revenue is still something they are trying to figure out; they figure that if they can just get the traffic in whatever way they can, they will work out how to do something with it later.

Some search engines, such as goto.com, receive money themselves from the sites being linked to for clickthroughs on certain paid results. They still get these payments even if the clickthrough comes from a metasearch engine.

I think you will find that every metasearch engine with a significant volume of traffic has business agreements in place with the engines they query.
The nature of these agreements, including such things as whether the search engine pays the metasearcher, vice versa, or neither, would depend on the situation and the clout of the various companies involved. All of these arrangements are possible, however.

For all the random tiny metasearch engines out there that just point themselves at some search engines and query away, their small volume of queries isn't worth the bother to investigate or block. You can be certain, however, that if they grow to a significant volume of queries, most search engines will notice and will take action.

One view is reflected by the response David Filo (one of the founders of Yahoo and a "Chief Yahoo") gave to a similar question about MetaCrawler at ApacheCon in 1998. He said "MetaCrawler doesn't matter, and will never matter" in terms of Yahoo's traffic. You will note, however, that MetaCrawler doesn't query Yahoo at the moment, and 1998 is a very long time ago in net-time. You will also probably note that there are some search engines that no major metasearch engine queries; obviously, there is a reason for that.

I did work at Go2Net (http://www.go2net.com) in the past, which is the company that owns both MetaCrawler (http://www.metacrawler.com) and Dogpile (http://www.dogpile.com), which are probably the two most trafficked metasearch engines around. While this helps give me perspective on the issue, everything I have said is based on public information.
