David,
you don't need to merge the crawl db and link db. You do need to provide a
link db, but it is only used for some detail information. I personally
removed this feature from the JSPs.
Merging the indexes will work, though; it is just a question of where
you store the index, how you name the folder, and that you provide at
least a dummy linkdb.
I'm not sure what the name of the merged index folder should be. I
guess "index", but you can take a look at the NutchBean init methods
to verify.
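A rough sketch of that folder check, in Python only for illustration. The folder names "index" and "linkdb" are assumptions (the first is just my guess from above), the helper name prepare_search_dir is made up, and I have not confirmed that an empty folder is accepted as a dummy linkdb, so treat this as a starting point and verify against the NutchBean init methods.

```python
import os

# Assumed folder names, not confirmed in this thread; check NutchBean first.
ASSUMED_INDEX_DIR = "index"
ASSUMED_LINKDB_DIR = "linkdb"


def prepare_search_dir(search_dir):
    """Report whether a merged index folder exists and add a placeholder linkdb.

    Hypothetical helper: it only inspects and creates directories, it does
    not call Nutch or validate the Lucene index itself.
    """
    index_path = os.path.join(search_dir, ASSUMED_INDEX_DIR)
    linkdb_path = os.path.join(search_dir, ASSUMED_LINKDB_DIR)

    created_linkdb = False
    if not os.path.isdir(linkdb_path):
        # Placeholder only; an empty folder may or may not satisfy the webapp.
        os.makedirs(linkdb_path)
        created_linkdb = True

    return {
        "has_index": os.path.isdir(index_path),
        "created_linkdb": created_linkdb,
    }
```

Run it against the directory the webapp searches in. If "has_index" comes back False, the merged index is probably stored under a different name than the one assumed here.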
HTH
Stefan
On 05.02.2006 at 04:54, McCallie,David wrote:
Hello,
First, let me thank all the developers who have created Nutch -- it is
wonderful and elegant code.
Second, a simple question:
I am using "bin/nutch crawl" to crawl and index two separate sites: one
is an http site, and the second is a network file system. These two
crawls have completely different URL seed files, and different
crawl-urlfilter.txt files. When the two crawls are done, I'd like to
merge the indexes into a single index for the webapp to search. How
should I do this? I tried using "bin/nutch merge" to simply merge the
index directories into a third directory. This created a valid Lucene
Index (verified with Luke) but it won't work with the search.jsp in the
webapp. I assume that I need to merge the crawldb and linkdb as well,
but I can't see how to do this?
Thanks in advance,
--david
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net