Hi,

I am currently evaluating Nutch for an intranet site search engine. I am by no means an expert in this field, although I am trying to learn more about it.
1. I was reading one of the articles referenced on the Nutch site, http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html, and I was a little concerned about its warning about "re-crawling" the site. I understand that crawling, building the index, and so on are separate steps, but it sounded to me like new pages on my web site would be ignored until I restarted the Nutch server, even after I had re-crawled. Am I correct about this? How do most people deal with it?

2. It seems like I would want to re-crawl or re-index the site nightly. All of this seems to be done with shell scripts, and I wonder what options are available to someone working on a Windows platform. I guess I could run cygrunsrv/cron on Windows. Is there some reason more of this scripting couldn't be redone as a Java program? Also, has anybody considered creating a Windows service to manage indexing/crawling, like the one that manages the Tomcat web server?

Thanks,
Bob

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
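On question 2, here is a rough sketch of how the nightly schedule could live in a plain Java program instead of a cron job. It has not been tested against Nutch. It only shows the scheduling part: `recrawl()` is a hypothetical placeholder for whatever crawl and index steps you actually run, and the 2:00 AM start time (`RUN_AT`) is an assumption.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class NightlyRecrawl {
    // Assumption: the time of day at which the nightly re-crawl should start.
    static final LocalTime RUN_AT = LocalTime.of(2, 0);

    /** Milliseconds from 'now' until the next occurrence of 'runAt' (today or tomorrow). */
    static long delayUntil(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            // Today's run time has already passed (or is right now), so use tomorrow's.
            next = next.plusDays(1);
        }
        return Duration.between(now, next).toMillis();
    }

    /** Hypothetical placeholder for the real crawl/index steps. */
    static void recrawl() {
        // Hypothetical: invoke the Nutch crawl and indexing steps here, for example
        // by launching them with ProcessBuilder or calling their Java entry points.
        System.out.println("Re-crawl started at " + LocalDateTime.now());
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelay = delayUntil(LocalDateTime.now(), RUN_AT);
        // Repeat every 24 hours after the first run. The executor thread is
        // non-daemon, so the JVM keeps running, as a long-lived service would.
        scheduler.scheduleAtFixedRate(NightlyRecrawl::recrawl, initialDelay,
                TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
    }
}
```

A fixed 24-hour period drifts by an hour across daylight-saving changes. Recomputing `delayUntil` after each run and rescheduling with `schedule(...)` would avoid that. A long-running program like this is also the kind of thing a Windows service wrapper could host.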
