Hi, I worked with Nutch until last year, and I am now trying to do something new with it (related to continuous queries).
Until now I have only used Nutch to build the index and search within a generated site map (with the WebDB). Now I want to use it to build an archive of a certain number of sites. So I want Nutch to crawl the sites every day (as I used it before), but also to download and save the real content of the sites (all HTML and pictures), so that I can work with that content directly.

Is it possible to make Nutch also save the content as it is crawled, rather than only creating the WebDB and the index? I currently have a solution built from a Perl script, wget, and Lucene, but it would be perfect if I could use Nutch from now on.

Thanks for your help.

Nils

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general