Re: [Nutch-general] how to combine two run's result for search

Renaud Richardet Tue, 05 Sep 2006 11:24:23 -0700

Zaheed,

Thank you, that works well. Do you know if there is a big performance
overhead with starting 2 servers? As an alternative, could we use
Lucene's MultiSearcher?


-- Renaud


Zaheed Haque wrote:
> Hi:
>
> Assuming you have
>
> index 1 at /data/crawl1
> index 2 at /data/crawl2
>
> In nutch-site.xml
> searcher.dir = /data
>
> Under /data you have a text file called search-servers.txt (I think;
> please check the searcher.dir description in nutch-default.xml)
>
> In the text file you will have the following
>
> hostname1 portnumber
> hostname2 portnumber
>
> example
> localhost 1234
> localhost 5678
>
> Then you need to start
>
> bin/nutch server 1234 /data/crawl1 &
>
> and
>
> bin/nutch server 5678 /data/crawl2 &
>
> now try
>
> bin/nutch org.apache.nutch.search.NutchBean www
>
> you should see results :-)
>
> Cheers
>
> On 9/5/06, Renaud Richardet <[EMAIL PROTECTED]> wrote:
>> @Dennis,
>> Can you explain how to setup distributed search while storing the 2
>> indexes on the same local machine (if possible)?
>>
>> @Feng,
>> We created a shell script to merge 2 runs; let us know if it works for
>> you.
>> http://wiki.apache.org/nutch/MergeCrawl
>>
>> Renaud
>>
>>
>> Dennis Kubes wrote:
>> > You can keep the indexes separate and use the distributed search
>> > server, one per index. Or you can use the mergedb and mergesegs
>> > commands to merge the two runs into a single crawldb and a single
>> > segments directory, then re-run invertlinks and index to create a
>> > single index, which can then be searched.
>> >
>> > Dennis
>> >
>> > Feng Ji wrote:
>> >> Hi there,
>> >>
>> >> In Nutch 0.8, I have crawled from two webDBs independently.
>> >>
>> >> For each run, I ran invertlinks and index, so each one is searchable.
>> >>
>> >> Now I want to combine them for search. I tried the "merge" command
>> >> to merge the two indexes, but searching the resulting index output
>> >> dir returns nothing. Do I need to put the output dir in the same
>> >> directory as the two crawl/ dirs above?
>> >>
>> >> I wonder what the proper steps are to combine two separate runs into
>> >> one set of search results. Do I need to combine the two webdbs, merge
>> >> the two segments, then run invertlinks and index?
>> >>
>> >> Thanks for your time,
>> >>
>> >> Michael,
>> >>
>> >
>>
>> -- 
>> Renaud Richardet
>> COO America
>> Wyona    -   Open Source Content Management   -   Apache Lenya
>> office +1 857 776-3195                  mobile +1 617 230 9112
>> renaud.richardet <at> wyona.com           http://www.wyona.com
>>
>>
>
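The two-server setup Zaheed describes can be scripted. This is a minimal sketch, assuming Nutch 0.8 conventions: searcher.dir set to the data directory, NutchBean reading the server list from search-servers.txt in that directory (Zaheed flags the exact file name as uncertain; the searcher.dir description in nutch-default.xml is the place to confirm it), and one "bin/nutch server PORT CRAWL_DIR" process per crawl. The helper names write_server_list and start_search_server and the NUTCH override variable are hypothetical, not part of Nutch.

```shell
#!/bin/sh
# Hypothetical helpers for the two-server distributed search setup.
# Assumes Nutch 0.8: NutchBean looks for search-servers.txt in searcher.dir.

# Command used to invoke Nutch; override with NUTCH=echo for a dry run.
NUTCH=${NUTCH:-bin/nutch}

# write_server_list DATA_DIR PORT [PORT ...]
# (Re)creates DATA_DIR/search-servers.txt with one "localhost PORT" line
# per port, matching the "hostname portnumber" format in the thread.
write_server_list() {
  data_dir=$1
  shift
  : > "$data_dir/search-servers.txt"   # truncate any previous list
  for port in "$@"; do
    echo "localhost $port" >> "$data_dir/search-servers.txt"
  done
}

# start_search_server PORT CRAWL_DIR
# Starts one distributed search server in the background, serving the
# index found under CRAWL_DIR.
start_search_server() {
  $NUTCH server "$1" "$2" &
}
```

For the example in the thread this would be: write_server_list /data 1234 5678, then start_search_server 1234 /data/crawl1 and start_search_server 5678 /data/crawl2, followed by the NutchBean query. Running with NUTCH=echo prints the server commands instead of starting them, which is a cheap way to check the paths first.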
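Dennis's alternative, merging the two runs into a single crawl, can be sketched the same way. This assumes the Nutch 0.8 argument order for each command (output first, then inputs) for mergedb, mergesegs, invertlinks and index; running each bin/nutch command without arguments prints its usage, which is worth checking on your install before relying on this. merge_crawls and the NUTCH override are hypothetical, and this is not the MergeCrawl wiki script mentioned in the thread.

```shell
#!/bin/sh
# Hypothetical sketch: merge two Nutch 0.8 crawls into one searchable crawl.
# Assumed layout: each crawl dir contains crawldb/ and segments/.

# Command used to invoke Nutch; override with NUTCH=echo for a dry run.
NUTCH=${NUTCH:-bin/nutch}

# merge_crawls OUT_DIR CRAWL1 CRAWL2
merge_crawls() {
  out=$1 c1=$2 c2=$3
  # 1. Merge the two crawl databases into OUT_DIR/crawldb.
  $NUTCH mergedb "$out/crawldb" "$c1/crawldb" "$c2/crawldb" || return 1
  # 2. Merge every segment of both runs into a new segment under OUT_DIR/segments.
  $NUTCH mergesegs "$out/segments" "$c1"/segments/* "$c2"/segments/* || return 1
  # 3. Rebuild the link database from the merged segment(s).
  $NUTCH invertlinks "$out/linkdb" "$out"/segments/* || return 1
  # 4. Index the merged crawldb, linkdb and segments into OUT_DIR/indexes.
  $NUTCH index "$out/indexes" "$out/crawldb" "$out/linkdb" "$out"/segments/*
}
```

Calling merge_crawls /data/merged /data/crawl1 /data/crawl2 would leave one crawl under /data/merged; pointing searcher.dir at it should then let NutchBean search the merged data directly, without running any search servers.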



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
