People write spiders that potentially span all/any hosts to harvest email
addresses for spam, to check whether trademarks are being used illegally,
to see whether copyrights are being violated, etc.

> The replies to my request for advice have been very helpful! I'll pick one
> and reply to it:
>
> At 10:01 2002-03-07 -0800, Otis Gospodnetic wrote:
> >[about my forthcoming book]
> >(i.e. I'm a potential customer :))  When will it be published?
>
> It's probably going into tech edit later this month.  So it'll probably be
> out this summer.  (Although bear in mind that I live in New Mexico, where
> summer is just about everything between February and December.)
>
> >I think lots of people do want to know about recursive spiders, and I
> >bet some of the most frequent obstacles are issues like: queueing, depth-
> >vs. breadth-first crawling, (memory-)efficient storage of extracted and
> >crawled links, etc.
>
> I'm getting the feeling that I should see spiders as being of two kinds:  kinds
> that spider everything under a given URL
> (like  "http://www.speech.cs.cmu.edu/~sburke/pub/"  or "http://www."), and
> kinds that go hog wild across all of the Web.
>
> The usefulness of the single-host spiders is pretty obvious to me.
> But why do people want to write spiders that potentially span all/any hosts?
> (Aside from people who are working for Google or similar.)
>
> --
> Sean M. Burke    [EMAIL PROTECTED]    http://www.spinn.net/~sburke/
>
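
For what it's worth, the issues Otis lists (the queue of pending URLs, breadth-
vs. depth-first order, and remembering which links have already been crawled)
and the single-host vs. whole-Web distinction above all show up even in a toy
crawler. Below is a minimal sketch in Python; the names (crawl, START_URL,
PREFIX, BREADTH_FIRST) and the use of the requests and beautifulsoup4 libraries
are just my illustration, not anything from the list or the book, and a real
spider would also want to honor robots.txt and pause between requests.

from collections import deque
from urllib.parse import urljoin, urldefrag

import requests
from bs4 import BeautifulSoup

START_URL = "http://www.speech.cs.cmu.edu/~sburke/pub/"  # prefix example from the thread
PREFIX = START_URL        # single-host spider: only follow links under this prefix
BREADTH_FIRST = True      # False pops from the same end, i.e. crawls depth-first


def crawl(start_url, limit=100):
    frontier = deque([start_url])   # queue of URLs still to fetch
    seen = {start_url}              # every URL ever queued, so nothing is fetched twice
    while frontier and limit > 0:
        url = frontier.popleft() if BREADTH_FIRST else frontier.pop()
        limit -= 1
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        if "html" not in resp.headers.get("Content-Type", ""):
            continue
        for a in BeautifulSoup(resp.text, "html.parser").find_all("a", href=True):
            link, _fragment = urldefrag(urljoin(url, a["href"]))  # absolutize, drop #fragment
            if link.startswith(PREFIX) and link not in seen:
                seen.add(link)
                frontier.append(link)
        yield url


if __name__ == "__main__":
    for page in crawl(START_URL):
        print(page)

Swapping popleft() for pop() is all it takes to go from breadth-first to
depth-first; for a Web-wide spider you would drop the PREFIX check, and the
"seen" set is the part that eventually stops fitting in memory.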


--
This message was sent by the Internet robots and spiders discussion list 
([EMAIL PROTECTED]).  For list server commands, send "help" in the body of a message 
to "[EMAIL PROTECTED]".
