Benjamin Higgins wrote:
> Hello,
>
> Is it necessary or desirable to run updatesegs and/or merge for a
> single-machine setup that crawls ~1 million pages?
>

As far as I know, you do need merge. You also have to sync the segments
with the crawldb, if that is what you are referring to.
> I ask because it appears that the 'crawl' tool, specifically for intranets,
> runs these commands, but they aren't included in the whole-web instructions
> in the tutorial.
>
>
> Also, can Nutch still service search queries while it is going through the
> whole generate, fetch, updatedb, index, dedup process?  At what point do the
> new segments become searchable -- right after indexing?
>
Yes. However, the new segments only become searchable once the recrawl
process is done (after the indexes are merged, I believe) and the Tomcat
instance reloads the webapp.

> Thanks!
>
> Ben
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
