The replies to my request for advice have been very helpful! I'll pick one and reply to it:

At 10:01 2002-03-07 -0800, Otis Gospodnetic wrote:
>[about my forthcoming book]
>(i.e. I'm a potential customer :)) When will it be published?

It's probably going into tech edit later this month, so it'll probably be out this summer. (Altho bear in mind that I live in New Mexico, where summer is just about everything between February and December.)

>I think lots of people do want to know about recursive spiders, and I
>bet one of the most frequent obstacles are issues like: queueing, depth
>vs. breadth first crawling, (memory) efficient storage of extracted and
>crawled links, etc.

I'm getting the feeling that I should see spiders as being of two kinds: kinds that spider everything under a given URL (like "http://www.speech.cs.cmu.edu/~sburke/pub/" or "http://www."), and kinds that go hog wild across all of the Web.

The usefulness of the single-host spiders is pretty obvious to me. But why do people want to write spiders that potentially span all/any hosts? (Aside from people who are working for Google or similar.)

--
Sean M. Burke [EMAIL PROTECTED] http://www.spinn.net/~sburke/

--
This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]). For list server commands, send "help" in the body of a message to "[EMAIL PROTECTED]".
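The issues Otis lists (queueing, depth- vs breadth-first crawling, and keeping track of which links have been seen) can be illustrated in a few lines. This is only a sketch: the link graph `LINKS` and the `crawl` function are hypothetical stand-ins, with a dictionary taking the place of actually fetching pages, and a `seen` set doing the simplest possible duplicate suppression. The same frontier structure (a double-ended queue) gives either traversal order depending on which end you pop from, and the host check confines the spider to a single site:

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory link graph standing in for real fetched pages.
LINKS = {
    "http://example.com/":  ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://other.com/x"],   # off-host, should be skipped
    "http://example.com/c": [],
}

def crawl(start, breadth_first=True):
    """Crawl only URLs on the start URL's host (a toy sketch).

    The deque is the frontier: popping from the left gives
    breadth-first order; popping from the right gives depth-first.
    The `seen` set keeps each URL from being queued twice, which is
    the minimal answer to "efficient storage of crawled links".
    """
    host = urlparse(start).netloc
    frontier = deque([start])
    seen = {start}
    order = []
    while frontier:
        url = frontier.popleft() if breadth_first else frontier.pop()
        order.append(url)
        for link in LINKS.get(url, []):
            # Single-host spider: stay on the starting host only.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                frontier.append(link)
    return order
```

A real spider would replace the `LINKS` lookup with an HTTP fetch and link extraction, and for large crawls the in-memory `seen` set is exactly the part that stops scaling, which is presumably why Otis flags "(memory) efficient storage" as an obstacle.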
