Hi Dan

I'll presume you've done the crawls already.

Each resulting crawl folder should contain three folders: db, index and
segments.

Create your search.dir folder and create a segments folder in that.

Each segments folder in each crawl folder should contain folders with
timestamps as the names.  Copy the contents of:

crawlA/segments
crawlB/segments
crawlC/segments

(i.e. the folders with timestamps as names) into:

search.dir/segments
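
If you'd rather script the copy, here is a minimal sketch of that step. The function name collect_segments and the skip-on-name-clash check are my own additions, not anything Nutch provides; it assumes each crawl folder has the segments layout described above.

```shell
#!/bin/sh
# Copy every timestamped segment folder from each crawl into one
# shared segments folder.
# Usage: collect_segments DEST_DIR CRAWL_DIR...
# (collect_segments is a hypothetical helper, not part of Nutch.)
collect_segments() {
    dest="$1"
    shift
    mkdir -p "$dest/segments" || return 1
    for crawl in "$@"; do
        for seg in "$crawl"/segments/*; do
            # Skip the unexpanded glob when a crawl has no segments.
            [ -d "$seg" ] || continue
            name=$(basename "$seg")
            # Assumed safeguard: never overwrite a segment with the same name.
            if [ -e "$dest/segments/$name" ]; then
                echo "skipping $seg: $name already exists" >&2
                continue
            fi
            cp -r "$seg" "$dest/segments/"
        done
    done
}
```

You would call it as: collect_segments search.dir crawlA crawlB crawlC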

Next, delete the duplicates from the segments by running the command:

bin/nutch dedup -local search.dir/segments

Then you need to merge the segments to create an index folder, so run
the command:

bin/nutch merge -local search.dir/index search.dir/segments/*
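
The two commands above can be wrapped so that the merge only runs if the dedup succeeded. This is just a sketch: build_index and the NUTCH variable are names I made up for illustration, and the nutch arguments are exactly the ones shown above.

```shell
#!/bin/sh
# Run dedup, then merge, stopping at the first failure.
# Usage: build_index SEARCH_DIR
# NUTCH is a hypothetical override for the launcher path; it
# defaults to bin/nutch, so run this from the Nutch install folder.
build_index() {
    search_dir="$1"
    nutch="${NUTCH:-bin/nutch}"
    # Remove duplicate documents across all copied segments.
    "$nutch" dedup -local "$search_dir/segments" || return 1
    # Merge the per-segment indexes into a single index folder.
    "$nutch" merge -local "$search_dir/index" "$search_dir"/segments/* || return 1
}
```

You would call it as: build_index search.dir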

You should now have two folders in your search.dir:
search.dir/segments
search.dir/index
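
A quick way to confirm that layout before pointing the searcher at it, as a rough sketch (check_search_dir is a made-up helper name, and it only checks that the folders exist, not that the index is valid):

```shell
#!/bin/sh
# Verify that a search folder has the two folders needed for serving.
# Usage: check_search_dir SEARCH_DIR
# (check_search_dir is a hypothetical helper, not part of Nutch.)
check_search_dir() {
    dir="$1"
    for sub in segments index; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing: $dir/$sub" >&2
            return 1
        fi
    done
    # At least one segment folder should be present.
    set -- "$dir"/segments/*
    if [ ! -d "$1" ]; then
        echo "no segments in $dir/segments" >&2
        return 1
    fi
    return 0
}
```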

That's all you need for serving pages (db folder is only used when
fetching).

Now just set the searcher.dir property value in nutch-site.xml to be the
location of search.dir.
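
For reference, the small helper below prints a property element you could paste into nutch-site.xml. The helper name print_searcher_property is made up, and using an absolute path is my own assumption (it avoids depending on the folder the web server was started from); check the fragment against your existing nutch-site.xml before pasting.

```shell
#!/bin/sh
# Print a searcher.dir property element for nutch-site.xml.
# Usage: print_searcher_property SEARCH_DIR
# (print_searcher_property is a hypothetical helper, not part of Nutch.)
print_searcher_property() {
    # Resolve to an absolute path (an assumption, see the note above).
    abs=$(cd "$1" && pwd) || return 1
    cat <<EOF
<property>
  <name>searcher.dir</name>
  <value>$abs</value>
</property>
EOF
}
```

You would call it as: print_searcher_property search.dir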

That's how I've been doing it, although it may not be the "right" way.
:-) Hope this helps.

Cheers
Aled


> -----Original Message-----
> From: Dan Morrill [mailto:[EMAIL PROTECTED]]
> Sent: 29 March 2006 18:06
> To: [email protected]
> Cc: [EMAIL PROTECTED]
> Subject: Multiple crawls how to get them to work together
> 
> Hi folks,
> 
> I have 3 crawls, crawlA, crawlB, and crawlC. I would like all 
> of them to be available to the search.jsp page. 
> 
> I went through the site, saw merge, index, make new db, and 
> followed all the directions that I could find, but still no 
> resolution on this one. So what I need are some ideas on 
> where to proceed from here. I intend on having 2 or
> 3 boxes make a crawl, then somehow merge the crawls together 
> and form a "master" under search.dir. I would also want to 
> update this one on a regular basis. 
> 
> Unfortunately, the instructions to date have all been tried, 
> and have all led to the idea not working. There are also no 
> indexmerger or indexsemgents directives in nutch 0.7.1. Any 
> support ideas, direct pointers, or even step-by-step 
> instructions on how to do this would be welcome (outside of 
> what is in the tutorials because that has been tried already, 
> including support ideas in the user web mail list). 
> 
> Cheers/r/dan
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
