Merging different crawls into a single index?

McCallie, David Sat, 04 Feb 2006 19:56:42 -0800


Hello,


First, let me thank all the developers who have created Nutch -- it is
wonderful and elegant code.

Second, a simple question:

I am using "bin/nutch crawl" to crawl and index two separate sites: one
is an HTTP site, and the second is a network file system. The two
crawls have completely different URL seed files and different
crawl-urlfilter.txt files. When both crawls are done, I'd like to
merge their indexes into a single index for the webapp to search. How
should I do this? I tried using "bin/nutch merge" to merge the two
index directories into a third directory. This created a valid Lucene
index (verified with Luke), but the merged index doesn't work with the
search.jsp in the webapp. I assume that I need to merge the crawldb and
linkdb as well, but I can't see how to do this.

Thanks in advance,

--david





