On 1/2/07, Sean Dean <[EMAIL PROTECTED]> wrote:
> There actually isn't much of a reason to generate "huge" multi-million page
> fetch lists when you can create lots of smaller ones and merge them together.
> This allows for more of a ladder-style approach, and in some cases reduces
> the risk of errors on the Hadoop-based versions (0.8+), such as large
> unrecoverable fetches or failed parse-reduce stages.

The problem I'm facing is that I'm not sure how to merge my indexes
together. For example, I fetch about 200,000 pages across 3 or 4 separate
fetches. Once those are done, I run the index command, everything goes
smoothly, and my index is built.

That said, if I then run a new fetch and try to index it, I get an error
saying "crawl/indexes" already exists.
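
In case it helps, here is roughly what I am running (the -topN value and
segment paths are placeholders for my setup, and I am assuming the 0.8-style
commands from the tutorial):

  # One batch: generate a fetch list, fetch it, update the crawldb
  bin/nutch generate crawl/crawldb crawl/segments -topN 50000
  SEGMENT=`ls -d crawl/segments/* | tail -1`
  bin/nutch fetch $SEGMENT
  bin/nutch updatedb crawl/crawldb $SEGMENT

  # After 3 or 4 such batches, build the index -- works fine the first time
  bin/nutch invertlinks crawl/linkdb crawl/segments/*
  bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*

  # Fetch one more batch, then re-run the index command above:
  # it aborts because crawl/indexes already exists

Right now my only workaround is to delete crawl/indexes and re-index every
segment from scratch, which takes longer with each new batch.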

How does one actually merge different fetches into the same index
without having to recreate the index each time?

Thanks!
Justin
