The replies to my request for advice have been very helpful! I'll pick one 
and reply to it:

At 10:01 2002-03-07 -0800, Otis Gospodnetic wrote:
>[about my forthcoming book]
>(i.e. I'm a potential customer :))  When will it be published?

It's probably going into tech edit later this month.  So it'll probably be 
out this summer.  (Though bear in mind that I live in New Mexico, where 
summer is just about everything between February and December.)


>I think lots of people do want to know about recursive spiders, and I
>bet one of the most frequent obstacles are issues like: queueing, depth
>vs. breadth first crawling, (memory) efficient storage of extracted and
>crawled links, etc.
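
Right, those are the mechanics.  Just so we're talking about the same
thing, here's a minimal sketch of that bookkeeping in Python, assuming a
plain in-memory seen-set (so nothing memory-efficient yet) and an
extract_links() callback that stands in for whatever fetch-and-parse
routine you already have:

  from collections import deque
  from urllib.parse import urljoin

  def crawl(seed, extract_links, depth_first=False, max_pages=1000):
      # extract_links(url) is assumed to fetch the page and return the
      # href strings found on it; plug in your own fetch/parse code.
      frontier = deque([seed])   # URLs waiting to be visited
      seen = {seed}              # every URL ever queued, so each is queued once
      order = []                 # the order we actually visited pages in
      while frontier and len(order) < max_pages:
          # pop() treats the frontier as a stack (depth-first);
          # popleft() treats it as a queue (breadth-first).
          url = frontier.pop() if depth_first else frontier.popleft()
          order.append(url)
          for link in extract_links(url):
              absolute = urljoin(url, link)   # resolve relative links
              if absolute not in seen:
                  seen.add(absolute)
                  frontier.append(absolute)
      return order

The seen-set is the part that blows up in memory on a big crawl; people
swap in an on-disk table or a Bloom filter there, but the control flow
stays the same.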

I'm getting the feeling that I should see spiders as being of two 
kinds:  kinds that spider everything under a given URL 
(like "http://www.speech.cs.cmu.edu/~sburke/pub/" or "http://www."), and 
kinds that go hog wild across all of the Web.
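
For the first kind, the scope rule is basically just a prefix test on
each extracted URL before it goes into the queue; a rough sketch (the
prefix here is only an example):

  from urllib.parse import urlparse

  def make_scope_test(prefix):
      # Accept only URLs on the same host whose path starts with the
      # prefix's path, that is, everything "under" the given URL.
      p = urlparse(prefix)
      def in_scope(url):
          u = urlparse(url)
          return u.netloc == p.netloc and u.path.startswith(p.path)
      return in_scope

  in_scope = make_scope_test("http://www.speech.cs.cmu.edu/~sburke/pub/")

The second kind is the same crawl loop with no such test, which is where
the queueing and storage questions above really start to bite.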

The usefulness of the single-host spiders is pretty obvious to me.
But why do people want to write spiders that potentially span all/any hosts?
(Aside from people who are working for Google or similar.)

--
Sean M. Burke    [EMAIL PROTECTED]    http://www.spinn.net/~sburke/


