People write spiders that potentially span all/any hosts to harvest email addresses for spam, to see whether trademarks are being used illegally, to see whether copyrights are being violated, etc.
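
On the queueing and breadth- vs. depth-first obstacles mentioned in the quoted message below, here is a minimal sketch of the single-host case in Python (all of the names in it are mine, not from anyone's actual spider). A FIFO queue gives breadth-first order, and the in-memory "seen" set is exactly where link storage becomes a problem once a crawl gets large. A real spider would also honor robots.txt (e.g. via urllib.robotparser) and throttle its requests; this sketch skips both.

    # Minimal breadth-first, single-host spider (standard library only).
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen

    class LinkParser(HTMLParser):
        """Collect href attributes from <a> tags."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(start_url, max_pages=50):
        host = urlparse(start_url).netloc
        queue = deque([start_url])   # FIFO queue => breadth-first order
        seen = {start_url}           # grows with every URL ever discovered
        while queue and max_pages > 0:
            url = queue.popleft()
            max_pages -= 1
            try:
                with urlopen(url, timeout=10) as resp:
                    if "html" not in resp.headers.get("Content-Type", ""):
                        continue
                    body = resp.read().decode("utf-8", errors="replace")
            except OSError:
                continue
            parser = LinkParser()
            parser.feed(body)
            for href in parser.links:
                absolute = urljoin(url, href)
                # Stay on the starting host: the "single-host" kind of spider.
                if urlparse(absolute).netloc == host and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
            yield url

    if __name__ == "__main__":
        for page in crawl("http://www.speech.cs.cmu.edu/~sburke/pub/"):
            print(page)

Swapping popleft() for pop() turns the same loop depth-first. For an all-hosts crawl you would drop the host check and move the seen set out of memory, which is where the hard part starts.
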
> The replies to my request for advice have been very helpful! I'll pick one
> and reply to it:
>
> At 10:01 2002-03-07 -0800, Otis Gospodnetic wrote:
> >[about my forthcoming book]
> >(i.e. I'm a potential customer :)) When will it be published?
>
> It's probably going into tech edit later this month. So it'll probably be
> out this summer. (Altho bear in mind that I live in New Mexico, where
> summer is just about everything between February and December.)
>
> >I think lots of people do want to know about recursive spiders, and I
> >bet one of the most frequent obstacles are issues like: queueing, depth
> >vs. breadth first crawling, (memory) efficient storage of extracted and
> >crawled links, etc.
>
> I'm getting the feeling that I should see spiders as of two kinds: kinds
> that spider everything under a given URL
> (like "http://www.speech.cs.cmu.edu/~sburke/pub/" or "http://www."), and
> kinds that go hog wild across all of the Web.
>
> The usefulness of the single-host spiders is pretty obvious to me.
> But why do people want to write spiders that potentially span all/any hosts?
> (Aside from people who are working for Google or similar.)
>
> --
> Sean M. Burke  [EMAIL PROTECTED]  http://www.spinn.net/~sburke/

--
This message was sent by the Internet robots and spiders discussion list
([EMAIL PROTECTED]). For list server commands, send "help" in the body of a
message to "[EMAIL PROTECTED]".
