Re: [Nutch-general] how to combine two run's result for search

Dennis Kubes Tue, 05 Sep 2006 18:55:02 -0700

Are those like the Shuttle boards? The smaller, quarter-size boxes?

Dennis


Zaheed Haque wrote:
> Renaud:
>
> Yes and no! I have done some testing as Dennis Kubes suggested and got
> similar results to his test. In short: 4 Nutch search servers on one
> box, each on a different disk, with (in my case) 0.75 million docs per
> disk. I had about 4 GB of memory and one AMD 64 processor, and it worked
> out reasonably well. I need to do more testing to fine-tune this, because
> cost really becomes the issue here. I have also thought about doing some
> testing with VIA EPIA boards. Maybe in the future :-)
>
> The main problem I encountered is this one:
>
> http://issues.apache.org/jira/browse/NUTCH-92
>
> but that will be solved sooner or later; it is just a matter of time.
>
> Cheers
>
>
> On 9/5/06, Renaud Richardet <[EMAIL PROTECTED]> wrote:
>> Zaheed,
>>
>> Thank you, that works well. Do you know whether starting 2 servers
>> adds a big performance overhead? As an alternative, could we use
>> Lucene's MultiSearcher?
>>
>> -- Renaud
>>
>>
>> Zaheed Haque wrote:
>> > Hi:
>> >
>> > Assuming you have
>> >
>> > index 1 at /data/crawl1
>> > index 2 at /data/crawl2
>> >
>> > In nutch-site.xml
>> > searcher.dir = /data
>> >
>> > Under /data you need a text file called search-servers.txt (I think;
>> > please check the searcher.dir description in nutch-default.xml)
>> >
>> > In the text file you will have the following
>> >
>> > hostname1 portnumber
>> > hostname2 portnumber
>> >
>> > example
>> > localhost 1234
>> > localhost 5678
>> >
>> > Then you need to start
>> >
>> > bin/nutch server 1234 /data/crawl1 &
>> >
>> > and
>> >
>> > bin/nutch server 5678 /data/crawl2 &
>> >
>> > now try
>> >
>> > bin/nutch org.apache.nutch.search.NutchBean www
>> >
>> > you should see results :-)
>> >
>> > Cheers
>> >
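Zaheed's steps above can be wrapped in two small shell functions. This is a minimal sketch, assuming both servers run on localhost and that searcher.dir in nutch-site.xml points at the directory passed to write_server_list; the function names write_server_list and start_server and the DRY_RUN switch are hypothetical helpers, not part of Nutch. Each line of search-servers.txt holds a host and a port separated by a space, as in the example above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Nutch) for running one Nutch search
# server per index on a single machine. Adjust paths and ports as needed.

# write_server_list DIR PORT [PORT ...]
# Creates (or overwrites) DIR/search-servers.txt with one
# "localhost PORT" line per port; DIR is assumed to be searcher.dir.
write_server_list() {
  dir=$1
  shift
  : > "$dir/search-servers.txt"
  for port in "$@"; do
    printf 'localhost %s\n' "$port" >> "$dir/search-servers.txt"
  done
}

# start_server PORT CRAWL_DIR
# Starts "bin/nutch server PORT CRAWL_DIR" in the background.
# With DRY_RUN=1 it only prints the command it would run.
start_server() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'bin/nutch server %s %s &\n' "$1" "$2"
  else
    bin/nutch server "$1" "$2" &
  fi
}
```

With those defined, `write_server_list /data 1234 5678`, `start_server 1234 /data/crawl1` and `start_server 5678 /data/crawl2` reproduce the setup above, after which `bin/nutch org.apache.nutch.search.NutchBean www` can be used to test a query.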
>> > On 9/5/06, Renaud Richardet <[EMAIL PROTECTED]> wrote:
>> >> @Dennis,
>> >> Can you explain how to set up distributed search while storing the 2
>> >> indexes on the same local machine (if possible)?
>> >>
>> >> @Feng,
>> >> We created a shell script to merge 2 runs; let us know if it works
>> >> for you.
>> >> http://wiki.apache.org/nutch/MergeCrawl
>> >>
>> >> Renaud
>> >>
>> >>
>> >> Dennis Kubes wrote:
>> >> > You can keep the indexes separate and use the distributed search
>> >> > server, one server per index. Or you can use the mergedb and
>> >> > mergesegs commands to merge the two runs into a single crawldb and a
>> >> > single segment, then re-run invertlinks and index to create a single
>> >> > index, which can then be searched.
>> >> >
>> >> > Dennis
>> >> >
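Dennis's second option (one crawldb, one segment, one index) can be sketched as a shell function. This is a minimal sketch, not the MergeCrawl script linked above: it assumes each run directory holds crawldb/ and segments/, that results go into a new output directory, and that the argument order below matches the Nutch 0.8 usage messages for mergedb, mergesegs, invertlinks and index (running each command without arguments prints its usage, so check before relying on it). The helper names merge_crawls and run and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: merge two Nutch 0.8 crawl runs into one index.
# Assumed layout: RUN_A/ and RUN_B/ each contain crawldb/ and segments/.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# merge_crawls RUN_A RUN_B OUT
merge_crawls() {
  a=$1 b=$2 out=$3
  # 1. Merge the two crawldbs into one.
  run bin/nutch mergedb "$out/crawldb" "$a/crawldb" "$b/crawldb" || return 1
  # 2. Merge every segment of both runs into a new segment under OUT.
  run bin/nutch mergesegs "$out/segments" "$a"/segments/* "$b"/segments/* || return 1
  # 3. Rebuild the link database from the merged segment(s).
  run bin/nutch invertlinks "$out/linkdb" -dir "$out/segments" || return 1
  # 4. Index the merged segment(s) against the merged crawldb and linkdb.
  run bin/nutch index "$out/indexes" "$out/crawldb" "$out/linkdb" "$out"/segments/* || return 1
}
```

Each step returns on failure because every later step reads the previous step's output; `DRY_RUN=1 merge_crawls crawl1 crawl2 merged` only prints the four commands.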
>> >> > Feng Ji wrote:
>> >> >> Hi there,
>> >> >>
>> >> >> In Nutch 0.8, I have crawled two webDBs independently.
>> >> >>
>> >> >> For each run, I did invertlinks and index, so each one is
>> >> >> searchable.
>> >> >>
>> >> >> Now I want to combine them together for search. I tried the "merge"
>> >> >> command to merge the two indexes, but searching the resulting index
>> >> >> output dir returns nothing.
>> >> >> Do I need to put the output dir in the same directory as the two
>> >> >> crawl/ dirs above?
>> >> >>
>> >> >> What are the proper steps to combine two separate runs into one
>> >> >> search result? Do I need to combine the two webDBs, merge the two
>> >> >> segments, and then run invertlinks and index?
>> >> >>
>> >> >> Thanks for your time,
>> >> >>
>> >> >> Michael,
>> >> >>
>> >> >
>> >>
>> >> --
>> >> Renaud Richardet
>> >> COO America
>> >> Wyona    -   Open Source Content Management   -   Apache Lenya
>> >> office +1 857 776-3195                  mobile +1 617 230 9112
>> >> renaud.richardet <at> wyona.com           http://www.wyona.com
>> >>
>> >>
>> >
>>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
