At 11:43 AM 3/14/2002 -0800, you wrote:

>Hello everyone,
>
>How do you guys create your web crawler in such a way
>that it would step over bot-bait pages like WSPoison?
>Do you simply include them in a list of URLs to avoid?

This is a good question.  I would be interested in
learning a bit more about this as well.
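For what it's worth, the "list of URLs to avoid" approach the question mentions can be as simple as a prefix blocklist checked before each fetch. This is only a sketch; the prefixes below are made-up examples, not real trap URLs.

```python
# Hypothetical blocklist of known bot-bait URL prefixes (examples only).
TRAP_PREFIXES = [
    "http://example.com/cgi-bin/wspoison",
    "http://example.org/trap/",
]

def is_trap(url):
    """Return True if the URL matches a known bot-bait prefix."""
    return any(url.startswith(prefix) for prefix in TRAP_PREFIXES)

print(is_trap("http://example.com/cgi-bin/wspoison?x=1"))  # True
print(is_trap("http://example.com/index.html"))            # False
```

The obvious weakness, of course, is that the list only covers traps you already know about.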

>or do you keep track of web sites with unusually large
>numbers of web pages, such as a web site with about 200
>pages, before abandoning or sending an alert?

Well, there are many sites nowadays that have literally
tens of thousands of web pages as a matter of course.
(Especially sites that have discussion forums or archive
mailing lists.)

So, you wouldn't want to simply limit sites to a small
number of pages.

I don't know about the rest, but I doubt this would be the
way to go for a large-scale implementation.
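That said, a per-host page counter that merely raises an alert (rather than a hard cutoff) might be a reasonable compromise. A minimal sketch, with an arbitrary threshold of 10,000 chosen only for illustration, since legitimate sites routinely exceed a few hundred pages:

```python
from collections import defaultdict
from urllib.parse import urlparse

# Arbitrary example threshold; a real crawler would tune this per site.
ALERT_THRESHOLD = 10_000
pages_seen = defaultdict(int)

def record_fetch(url):
    """Count a fetched page for its host.

    Returns True exactly once, when the host first reaches the
    threshold, so the operator gets a single alert per host.
    """
    host = urlparse(url).netloc
    pages_seen[host] += 1
    return pages_seen[host] == ALERT_THRESHOLD
```

An operator could then inspect flagged hosts by hand instead of the crawler silently abandoning them.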

-Art


-- 
Art Pollard
http://www.lextek.com/
Suppliers of High Performance Text Retrieval Engines.


--
This message was sent by the Internet robots and spiders discussion list 
([EMAIL PROTECTED]).  For list server commands, send "help" in the body of a message 
to "[EMAIL PROTECTED]".