Benjamin Higgins wrote:
> Hello,
>
> Is it necessary or desirable to run updatesegs and/or merge for a
> single-machine setup that crawls ~1 million pages?

As far as I know, you need merge too. You also have to sync the segments with the crawldb, if that is what you are referring to.

> I ask because it appears that the 'crawl' tool, specifically for intranets,
> runs these commands, but they aren't included in the whole-web instructions
> in the tutorial.
>
> Also, can Nutch still service search queries while it is going through the
> whole generate, fetch, updatedb, index, dedup process? At what point do the
> new segments become searchable -- right after indexing?

Yes, it can keep serving queries. However, the new segments only become searchable once the recrawl process is done (after the indexes are merged, I believe) and the Tomcat instance reloads the webapp.

> Thanks!
>
> Ben

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
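
To make the ordering in the reply above concrete, here is a rough dry-run sketch in Python. It only models the sequence of steps named in this thread (generate, fetch, updatedb, index, dedup, merge) plus the webapp reload; it does not call Nutch. The names `RECRAWL_STEPS`, `describe_cycle`, and `searchable_after` are made up for illustration, and command arguments are left out on purpose because they differ between Nutch releases. Check the usage output of `bin/nutch` for your version before scripting the real thing.

```python
# Dry-run sketch only: models the order of one recrawl cycle, runs nothing.
# Step names come from the thread; everything else is a hypothetical helper.

RECRAWL_STEPS = [
    ("generate", "select URLs from the crawldb into a new segment"),
    ("fetch", "download the pages listed in that segment"),
    ("updatedb", "write the fetch results back into the crawldb"),
    ("index", "build an index for the fetched segment"),
    ("dedup", "remove duplicate documents from the indexes"),
    ("merge", "merge the indexes into a single index"),
    ("reload webapp", "have Tomcat reload the search webapp (not a Nutch command)"),
]


def describe_cycle(steps=RECRAWL_STEPS):
    """Return one numbered, human-readable line per step, in order."""
    return [f"{i}. {name}: {purpose}" for i, (name, purpose) in enumerate(steps, 1)]


def searchable_after(steps=RECRAWL_STEPS):
    """Per the reply, new segments are searchable only after the last step."""
    return steps[-1][0]


if __name__ == "__main__":
    for line in describe_cycle():
        print(line)
    print("New segments searchable after:", searchable_after())
```

The point of the sketch is that the running webapp keeps answering queries from the old index throughout the cycle, and only the final reload switches it to the new one.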
