At 11:43 AM 3/14/2002 -0800, you wrote:
>Hello everyone,
>
>How do you guys create your web crawler in such a way
>that it would step over bot bait pages like WSPosion?
>Do you simply include them in a list of urls to avoid?

This is a good question. I would be interested in learning a bit more about this as well.

>or do you keep track of web sites with unusually large
>amounts of web pages, such as a web site with about 200
>pages, before abandoning or sending an alert?

Well, there are many sites nowadays that literally have tens of thousands of web pages as a matter of course. (Especially sites that have discussion forums and archive mailing lists.) So you wouldn't want to simply limit sites to a small number of pages. I don't know about the rest, but I doubt that this would be the way to go for a large-scale implementation.

-Art

--
Art Pollard
http://www.lextek.com/
Suppliers of High Performance Text Retrieval Engines.

--
This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]). For list server commands, send "help" in the body of a message to "[EMAIL PROTECTED]".
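[Editor's note: as a rough illustration of the alternative hinted at above -- detecting trap-like URL patterns rather than capping every site at a small page count -- here is a minimal sketch. The thresholds and heuristics are assumptions for illustration, not anything proposed on the list.]

```python
from urllib.parse import urlparse
from collections import Counter

# Hypothetical thresholds -- a real crawler would tune these per deployment.
MAX_PATH_DEPTH = 12          # bait generators often produce very deep URLs
MAX_SEGMENT_REPEATS = 3      # /a/b/a/b/a/b/... is a classic trap signature

def looks_like_trap(url: str) -> bool:
    """Heuristic check for spider-trap URLs (infinite dynamically
    generated link spaces), instead of a hard per-site page cap."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Excessive path depth is a common sign of generated link mazes.
    if len(segments) > MAX_PATH_DEPTH:
        return True
    # The same path segment repeated many times suggests a link loop.
    counts = Counter(segments)
    return any(n > MAX_SEGMENT_REPEATS for n in counts.values())
```

A crawler could combine such a check with a generous per-host page budget, so that large legitimate sites (forums, mailing-list archives) are still crawled in full while obviously trap-shaped URLs are skipped early.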
