Hi Kevin,

After you replace the crawl folder, just touch the webapp's web.xml so that Tomcat reloads the webapp; there is no need to stop and restart Tomcat. Use this command:

touch your_webapp_folder/WEB-INF/web.xml

Bye,
bhupal

Kevin.Y wrote:
>
> I'm using Nutch 0.9 to crawl some specified "content" URLs, such as
> http://xxxxx/art/1.htm
> http://xxxxx/art/2.htm
> http://xxxxx/art/3.htm
> ....
>
> Here is what I'm doing:
> I put these "content" URLs into a url.txt file, then use the "bin/nutch crawl"
> command to run a crawl.
> After that I get the crawl data; let me call it crawl_A.
> I make crawl_A the searcher.dir of the webapp.
> So far it can be searched normally.
> I crawl another set of "content" URLs, get crawl_B, and merge it with
> crawl_A using the script here:
> http://wiki.apache.org/nutch/MergeCrawl
> After merging I get a new merged one called crawl_C.
> Then I stop Tomcat, replace crawl_A with crawl_C, and then restart
> it.
>
> That's how I "update" my crawl data, and I don't think it's a smart
> way... especially since I have to stop and restart Tomcat, otherwise I get
> some file problems.
>
> Is there any better way? Any advice will be appreciated!

--
View this message in context: http://www.nabble.com/Need-some-advise-about-updating-crawl-data-tp15027375p15155283.html
Sent from the Nutch - User mailing list archive at Nabble.com.
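The replace-then-touch procedure can be wrapped in a small shell function. This is a minimal sketch, not part of Nutch: the function name swap_crawl and its three arguments (the merged crawl such as crawl_C, the live directory that searcher.dir points at, and the deployed webapp folder) are hypothetical. It assumes Tomcat's default configuration, where WEB-INF/web.xml is a watched resource, so updating its timestamp makes Tomcat reload the webapp.

```shell
#!/bin/sh
# Hypothetical helper (not a Nutch script): swap a merged crawl into place,
# then touch web.xml so Tomcat reloads the webapp instead of being restarted.
#   $1 = new merged crawl directory (e.g. crawl_C)
#   $2 = live crawl directory that the webapp's searcher.dir points at
#   $3 = deployed webapp folder (contains WEB-INF/web.xml)
swap_crawl() {
  NEW_CRAWL=$1
  LIVE_CRAWL=$2
  WEBAPP_DIR=$3

  # Keep the previous crawl as "<live>.old" for rollback instead of deleting it.
  rm -rf "$LIVE_CRAWL.old"
  if [ -d "$LIVE_CRAWL" ]; then
    mv "$LIVE_CRAWL" "$LIVE_CRAWL.old" || return 1
  fi

  # Move the merged crawl to the path the webapp already searches.
  mv "$NEW_CRAWL" "$LIVE_CRAWL" || return 1

  # Update web.xml's timestamp; with WEB-INF/web.xml as a watched resource,
  # Tomcat reloads the context and reopens the index.
  touch "$WEBAPP_DIR/WEB-INF/web.xml"
}
```

For example, `swap_crawl /data/crawl_C /data/crawl /path/to/tomcat/webapps/your_webapp` (all paths hypothetical). Keeping the old crawl as crawl.old means you can move it back and touch web.xml again if the merged index turns out to be broken.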
