Re: [Nutch-general] how to combine two run's result for search

Zaheed Haque Tue, 05 Sep 2006 12:12:53 -0700

Renaud:

Yes and no! I have done some testing as Dennis Kubes suggested and got
similar results to his. In short: 4 Nutch search servers on one box,
each index on its own disk, with (in my case) 0.75 million docs per
disk. I had about 4 GB of memory and one AMD64 processor, and it worked
out rather OK. I need to do more testing to fine-tune this, because it
really brings up the issue of cost. I have also thought about doing some
testing with VIA EPIA boards. Maybe in the future :-)
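A minimal sketch of the one-box, several-disks layout described above. It assumes hypothetical index directories such as /disk1/crawl and /disk2/crawl (one crawl per disk), a hypothetical base port of 9001, and bin/nutch as the launcher. It writes one "host port" line per server into search-servers.txt and prints (or, with DRY_RUN=0, runs) one `bin/nutch server PORT DIR` per index.

```shell
# Sketch: one Nutch search server per index directory, all on this host.
# Assumptions (hypothetical, adjust to your setup): the bin/nutch path,
# the base port, and the index directories passed as arguments.
NUTCH=${NUTCH:-bin/nutch}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, 0 = start the servers

# start_local_servers SEARCHER_DIR BASE_PORT INDEX_DIR...
# Rewrites SEARCHER_DIR/search-servers.txt with one "localhost PORT" line
# per index directory, using consecutive ports starting at BASE_PORT.
start_local_servers() {
  searcher_dir=$1
  port=$2
  shift 2
  : > "$searcher_dir/search-servers.txt"   # truncate any previous list
  for index_dir in "$@"; do
    echo "localhost $port" >> "$searcher_dir/search-servers.txt"
    if [ "$DRY_RUN" = 1 ]; then
      echo "$NUTCH server $port $index_dir &"
    else
      "$NUTCH" server "$port" "$index_dir" &   # run in the background
    fi
    port=$((port + 1))
  done
}
```

For four disks that would be `start_local_servers /data 9001 /disk1/crawl /disk2/crawl /disk3/crawl /disk4/crawl`, with searcher.dir pointing at /data so NutchBean finds the generated list.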


The problem I encountered is more this one:

http://issues.apache.org/jira/browse/NUTCH-92

but this will be solved sooner or later; it is just a matter of time.

Cheers


On 9/5/06, Renaud Richardet <[EMAIL PROTECTED]> wrote:
> Zaheed,
>
> Thank you, that works well. Do you know if there is a big performance
> overhead with starting 2 servers? As an alternative, could we use
> Lucene's MultiSearcher?
>
> -- Renaud
>
>
> Zaheed Haque wrote:
> > Hi:
> >
> > Assuming you have
> >
> > index 1 at /data/crawl1
> > index 2 at /data/crawl2
> >
> > In nutch-site.xml
> > searcher.dir = /data
> >
> > Under /data you have a text file called search-servers.txt (see the
> > searcher.dir description in nutch-default.xml)
> >
> > In the text file you will have the following
> >
> > hostname1 portnumber
> > hostname2 portnumber
> >
> > example
> > localhost 1234
> > localhost 5678
> >
> > Then you need to start
> >
> > bin/nutch server 1234 /data/crawl1 &
> >
> > and
> >
> > bin/nutch server 5678 /data/crawl2 &
> >
> > now try
> >
> > bin/nutch org.apache.nutch.search.NutchBean www
> >
> > you should see results :-)
> >
> > Cheers
> >
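The "searcher.dir = /data" line above is shorthand for a property in nutch-site.xml. A minimal sketch that prints such a file, assuming the Nutch 0.8 format where properties sit inside a <configuration> element. If you already have a nutch-site.xml, copy just the <property> block into it rather than overwriting the file.

```shell
# Print a minimal nutch-site.xml whose searcher.dir points at the directory
# that contains search-servers.txt (the first argument, e.g. /data).
# Assumption: Nutch 0.8-style config with a <configuration> root element.
nutch_site_xml() {
  cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>searcher.dir</name>
    <value>$1</value>
  </property>
</configuration>
EOF
}
```

For the example above: `nutch_site_xml /data > conf/nutch-site.xml`.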
> > On 9/5/06, Renaud Richardet <[EMAIL PROTECTED]> wrote:
> >> @Dennis,
> >> Can you explain how to setup distributed search while storing the 2
> >> indexes on the same local machine (if possible)?
> >>
> >> @Feng,
> >> We created a shell script to merge 2 runs; let us know if that works for
> >> you.
> >> http://wiki.apache.org/nutch/MergeCrawl
> >>
> >> Renaud
> >>
> >>
> >> Dennis Kubes wrote:
> >> > You can keep the indexes separate and use the distributed search
> >> > server, one per index. Or you can use the mergedb and mergesegs
> >> > commands to merge the two runs into a single crawldb and a single
> >> > segments directory, then re-run invertlinks and index to create a
> >> > single index, which can then be searched.
> >> >
> >> > Dennis
> >> >
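Dennis's second option (merge, then re-index) can be sketched as a shell function. This is a sketch under assumptions: each run lives in its own crawl directory with crawldb/ and segments/ subdirectories (like /data/crawl1 and /data/crawl2 above), and paths contain no spaces. The command names come from the message. The argument order is an assumption based on the Nutch 0.8 usage messages, so check each command's usage output for your version (later releases may expect `-dir` before the segments directory for invertlinks). By default it only prints the commands; set DRY_RUN=0 to run them.

```shell
# Sketch of the merge-then-reindex route: mergedb, mergesegs, invertlinks, index.
# Assumptions: each crawl dir has crawldb/ and segments/, paths have no spaces,
# and the argument order matches your Nutch version's usage messages.
NUTCH=${NUTCH:-bin/nutch}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands only, 0 = run them

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# merge_crawls OUT_DIR CRAWL_DIR...
merge_crawls() {
  out=$1
  shift
  crawldbs=""
  segments=""
  for crawl in "$@"; do
    crawldbs="$crawldbs $crawl/crawldb"
    for seg in "$crawl"/segments/*; do   # every segment of every run
      segments="$segments $seg"
    done
  done
  # $crawldbs and $segments are unquoted on purpose: they split into
  # one argument per path.
  run "$NUTCH" mergedb "$out/crawldb" $crawldbs || return 1
  run "$NUTCH" mergesegs "$out/segments" $segments || return 1
  # Rebuild the link database and the index from the merged data.
  run "$NUTCH" invertlinks "$out/linkdb" "$out/segments" || return 1
  run "$NUTCH" index "$out/indexes" "$out/crawldb" "$out/linkdb" "$out"/segments/*
}
```

For the two runs in this thread: `merge_crawls /data/merged /data/crawl1 /data/crawl2`, then point searcher.dir at /data/merged. The last line's glob is expanded only when that command runs, so with DRY_RUN=0 it picks up the segment that mergesegs has just written.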
> >> > Feng Ji wrote:
> >> >> Hi there,
> >> >>
> >> >> In Nutch 0.8, I have crawled two webDBs independently.
> >> >>
> >> >> For each run, I did invertlinks and index, so each one is searchable.
> >> >>
> >> >> Now I want to combine them for search. I tried the "merge"
> >> >> command to merge the two indexes, but searching the resulting index
> >> >> output dir returns nothing.
> >> >> Do I need to put the output dir in the same directory as the two
> >> >> crawl/ dirs above?
> >> >>
> >> >> I wonder what the proper steps are to combine two separate runs into
> >> >> one search result. Do I need to combine the two webDBs, merge the two
> >> >> segments, and then run invertlinks and index?
> >> >>
> >> >> Thanks for your time,
> >> >>
> >> >> Michael,
> >> >>
> >> >
> >>
> >> --
> >> Renaud Richardet
> >> COO America
> >> Wyona    -   Open Source Content Management   -   Apache Lenya
> >> office +1 857 776-3195                  mobile +1 617 230 9112
> >> renaud.richardet <at> wyona.com           http://www.wyona.com
> >>
> >>
> >
>
>
>
