Zaheed,
Thank you, that works well. Do you know if there is a big performance
overhead with starting 2 servers? As an alternative, could we use
Lucene's MultiSearcher?
-- Renaud
Zaheed Haque wrote:
Hi:
Assuming you have:
index 1 at /data/crawl1
index 2 at /data/crawl2
In nutch-site.xml, set:
searcher.dir = /data
Under /data you need a text file called search-servers.txt (see the
searcher.dir description in nutch-default.xml).
The text file lists one search server per line:
hostname1 portnumber
hostname2 portnumber
For example:
localhost 1234
localhost 5678
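If you prefer to script this step, here is a minimal sketch that writes the server list. The function name `write_search_servers` is hypothetical; it assumes the file is named `search-servers.txt` and that the directory you pass is the one `searcher.dir` points to.

```shell
#!/bin/sh
# Hypothetical helper: write the distributed-search server list that
# NutchBean reads from the directory named by searcher.dir.
# Usage: write_search_servers DIR "host port" ["host port" ...]
write_search_servers() {
  dir="$1"
  shift
  # Truncate (or create) the file, then add one "host port" line per server.
  : > "$dir/search-servers.txt" || return 1
  for entry in "$@"; do
    printf '%s\n' "$entry" >> "$dir/search-servers.txt"
  done
}
```

For the example above you would call `write_search_servers /data "localhost 1234" "localhost 5678"`.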
Then start one search server per index:
bin/nutch server 1234 /data/crawl1 &
bin/nutch server 5678 /data/crawl2 &
Now try:
bin/nutch org.apache.nutch.search.NutchBean www
You should see results :-)
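The server commands can also be wrapped in a small script. This is only a sketch: `start_search_servers` is a hypothetical name, `NUTCH` is an assumed override variable (defaulting to `bin/nutch`, so run it from the Nutch install directory), and arguments are given as port / crawl-directory pairs.

```shell
#!/bin/sh
# Hypothetical helper: start one Nutch search server per index, in the
# background, using the "bin/nutch server PORT CRAWL_DIR" form above.
# Usage: start_search_servers PORT1 DIR1 [PORT2 DIR2 ...]
# A trailing unpaired argument is ignored.
start_search_servers() {
  nutch="${NUTCH:-bin/nutch}"   # assumption: run from the Nutch install dir
  while [ "$#" -ge 2 ]; do
    "$nutch" server "$1" "$2" &
    echo "started search server on port $1 for $2"
    shift 2
  done
}
```

For the setup above: `start_search_servers 1234 /data/crawl1 5678 /data/crawl2`, then run the NutchBean query once both servers have finished starting.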
Cheers
On 9/5/06, Renaud Richardet <[EMAIL PROTECTED]> wrote:
@Dennis,
Can you explain how to set up distributed search while storing the 2
indexes on the same local machine (if possible)?
@Feng,
We created a shell script to merge 2 runs; let us know if it works for
you:
http://wiki.apache.org/nutch/MergeCrawl
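For reference, here is a minimal sketch of the merge approach Dennis outlines in the quoted reply (mergedb, mergesegs, invertlinks, index). It is not the MergeCrawl wiki script. `merge_crawls` and `NUTCH` are hypothetical names, and the argument order for each command is assumed from the Nutch 0.8 usage messages; run each `bin/nutch` command without arguments to confirm its usage on your version.

```shell
#!/bin/sh
# Hypothetical helper: merge two independent crawl directories into one.
# Assumes the Nutch 0.8 layout <crawl>/crawldb and <crawl>/segments/<name>,
# and that OUT is a new, empty directory.
# Usage: merge_crawls CRAWL1 CRAWL2 OUT
merge_crawls() {
  crawl1="$1"; crawl2="$2"; out="$3"
  nutch="${NUTCH:-bin/nutch}"   # assumption: run from the Nutch install dir

  # 1. Merge the two crawl databases into one.
  "$nutch" mergedb "$out/crawldb" "$crawl1/crawldb" "$crawl2/crawldb" || return 1

  # 2. Merge every segment from both runs into a new segment under OUT/segments.
  "$nutch" mergesegs "$out/segments" "$crawl1"/segments/* "$crawl2"/segments/* || return 1

  # 3. Rebuild the link database from the merged segment(s).
  "$nutch" invertlinks "$out/linkdb" "$out"/segments/* || return 1

  # 4. Index the merged crawldb, linkdb and segment(s).
  "$nutch" index "$out/indexes" "$out/crawldb" "$out/linkdb" "$out"/segments/* || return 1
}
```

You would then point searcher.dir at OUT, for example after `merge_crawls /data/crawl1 /data/crawl2 /data/merged`.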
Renaud
Dennis Kubes wrote:
> You can keep the indexes separate and use the distributed search
> server, one per index, or you can use the mergedb and mergesegs
> commands to merge the two runs into a single crawldb and a single
> segments directory, then re-run invertlinks and index to create a
> single index that can be searched.
>
> Dennis
>
> Feng Ji wrote:
>> Hi there,
>>
>> In Nutch 0.8, I ran two crawls independently, each with its own webdb.
>>
>> For each run, I did invertlinks and index. So each one is searchable.
>>
>> Now I want to combine them for search. I tried the "merge"
>> command to merge the two indexes, but searching the resulting
>> index output dir returns no results.
>> Do I need to put the output dir in the same directory as the two
>> crawl/ dirs above?
>>
>> What are the proper steps to combine two separate runs into one
>> set of search results? Do I need to merge the two webdbs, merge the
>> two segments, and then run invertlinks and index again?
>>
>> Thanks for your time,
>>
>> Michael,
>>
>
--
Renaud Richardet
COO America
Wyona - Open Source Content Management - Apache Lenya
office +1 857 776-3195 mobile +1 617 230 9112
renaud.richardet <at> wyona.com http://www.wyona.com