The section of the tutorial on intranet crawls should work for your small number of websites. The bin/nutch script wraps up all the crawl functions for you (fetching, indexing, deduping, etc.). Each night you'll just need to run a new crawl, delete the results of the previous day's crawl, copy over the results of the new crawl, and restart Tomcat.
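The nightly routine described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested deployment script: every path and default value (/opt/nutch, /opt/tomcat, the seed list, the crawl directory, the depth) is hypothetical, the function name nightly_recrawl is made up for this example, and the crawl options (-dir, -depth) follow the intranet-crawl form of the command from the tutorial, so check them against your Nutch version. As a small variation on "delete, then copy", it crawls into a fresh directory first and only swaps it in if the crawl succeeded.

```shell
#!/bin/sh
# Nightly recrawl sketch. All locations below are assumptions; adjust them.
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"          # hypothetical Nutch install dir
CATALINA_HOME="${CATALINA_HOME:-/opt/tomcat}"   # hypothetical Tomcat install dir
SEEDS="${SEEDS:-/opt/nutch/urls}"               # hypothetical seed URL list
LIVE_DIR="${LIVE_DIR:-/opt/nutch/crawl}"        # crawl dir the search webapp reads
DEPTH="${DEPTH:-3}"                             # link depth, an arbitrary choice

nightly_recrawl() {
    new_dir="${LIVE_DIR}.new"
    old_dir="${LIVE_DIR}.old"

    # Clear leftovers from an earlier, possibly interrupted, run.
    rm -rf "$new_dir" "$old_dir"

    # Run the one-step crawl into a fresh directory.
    # If it fails, stop here and leave the live results untouched.
    "$NUTCH_HOME/bin/nutch" crawl "$SEEDS" -dir "$new_dir" -depth "$DEPTH" || return 1

    # Stop Tomcat, swap yesterday's results for the new ones, start Tomcat.
    "$CATALINA_HOME/bin/shutdown.sh"
    if [ -d "$LIVE_DIR" ]; then
        mv "$LIVE_DIR" "$old_dir" || return 1
    fi
    mv "$new_dir" "$LIVE_DIR" || return 1
    "$CATALINA_HOME/bin/startup.sh"

    # Delete the previous day's crawl.
    rm -rf "$old_dir"
}
```

To use it, call nightly_recrawl at the end of the script and run the script from cron once a night. The very first run works the same way, since the swap step is skipped when no live crawl directory exists yet.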
Jake.

-----Original Message-----
From: Sugra Llistaire [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 23, 2006 4:55 AM
To: [email protected]
Subject: Simple indexation and reindexation

Hello,

I have a small number of websites to be indexed. Formerly, my search engine was udmGoSearch. But I'm glad to see there is this J2EE search engine. But I'm trying to emulate process of udm search with nutch and it doesn't seem to be possible.

The system was simple. First day, I indexed the web site. Nightly, I executed a script to reindex the website. I didn't have to think in fetching, duplicating, injecting. All this was included in udm's script. Of course, it is unavoidable, reconfiguring urls filters and all that stuff.

Is it possible to use nutch with this easy process? Has anyone implemented the script that makes all this job? A first indexation script and a nightly reindexation script.

Thanks in advance.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
