About two months ago, John Kleven posted asking about using Nutch just to crawl.

I have essentially the same question.  One development tack I could take
with my project is to use Nutch for crawling, then use Xapian for
tokenization, indexing, etc.  Over time we will need to spider a lot of sites,
so I'm disinclined to use wget.

Does Nutch have an out-of-the-box capability to spider sites and write the
output to HTML files?  If not, can someone give me a quick summary of how I
would properly modify or subclass the Nutch code?
