Hi, I was able to get Nutch to crawl my company's intranet and set up a search webapp without much trouble. However, I have some questions about maintaining that web app.
I'd like to be able to update the crawl periodically (probably nightly) with minimal fuss. I saw two bash scripts for updating a crawl:

http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
http://wiki.apache.org/nutch/Nutch_-_The_Java_Search_Engine

However, both failed when I tried to use them; apparently they use Nutch commands that are no longer supported in 0.9. Simple modifications of the scripts didn't seem to help much. So...

Question #1: How do I update a previous crawl with Nutch 0.9? Does someone have an updated version of the bash scripts in the links above? Or does Nutch now do an all-in-one recrawl and I just haven't found the documentation yet?

My second question is about refreshing my webapp. The java.net article above says:

"Even with the re-crawl script, we have a problem with updating the live search index. As mentioned above, the NutchBean class opens the index to search when it is initialized. Since the Nutch web app caches the NutchBean in the application servlet context, updates to the index will never be picked up as long as the servlet container is running. This problem is recognized by the Nutch community, so it will likely be fixed in an upcoming release (Nutch 0.7.1 was the stable release at the time of writing)."

Question #2: Has this issue been resolved in Nutch 0.9? What's the easiest way to get the 0.9 webapp to pick up changes to a crawl? I'm comfortable monkeying around with the webapp a bit if necessary, but if there is a simple way of updating it, I'd prefer that.

Thanks for any help,
Michael
