Renaud: Yes and no! I have done some testing as Dennis Kubes suggested and got results similar to his. In short: four Nutch search servers on one box, each serving an index from its own disk, in my case 0.75 million docs per disk. The box had about 4 GB of memory and one AMD64 processor, and it worked out rather OK. I need to do more testing to fine-tune this, because it really brings up the issue of cost. I have also thought about doing some testing with VIA EPIA boards. Maybe in the future :-)
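As a rough, dry-run sketch of the one-box, four-disk layout above, the shell functions below write a search-servers.txt in the format from the quoted instructions (one `host port` line per server) and print the matching `bin/nutch server` start commands instead of running them. The helper names `write_search_servers` and `print_server_commands`, the disk paths (/disk1/crawl through /disk4/crawl) and the consecutive port numbering are all hypothetical; the file name search-servers.txt inside the `searcher.dir` directory is, as far as I know, what NutchBean looks for in Nutch 0.8.

```shell
#!/bin/sh
# Dry-run sketch: one Nutch search server per disk on a single box.
# Assumptions (hypothetical): each disk holds one crawl dir such as
# /disk1/crawl, and ports are numbered consecutively from a base port.

# Write <searcher.dir>/search-servers.txt with one "host port" line per server.
write_search_servers() {
  searcher_dir=$1
  host=$2
  shift 2
  : > "$searcher_dir/search-servers.txt"   # create or truncate the file
  for port in "$@"; do
    echo "$host $port" >> "$searcher_dir/search-servers.txt"
  done
}

# Print (not run) one "bin/nutch server <port> <crawl dir> &" line per
# crawl dir, assigning ports base, base+1, base+2, ... in order.
print_server_commands() {
  port=$1
  shift
  for crawl in "$@"; do
    echo "bin/nutch server $port $crawl &"
    port=$((port + 1))
  done
}

# Example for the four-disk box (hypothetical paths and ports):
#   write_search_servers /data localhost 1234 1235 1236 1237
#   print_server_commands 1234 /disk1/crawl /disk2/crawl /disk3/crawl /disk4/crawl
```

Printing the commands rather than executing them keeps the sketch safe to try; the printed lines can be pasted into a shell once the paths match the real disks.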
The problem I encountered is more like this one: http://issues.apache.org/jira/browse/NUTCH-92, but that will be solved sooner or later; it's just a matter of time.

Cheers

On 9/5/06, Renaud Richardet <[EMAIL PROTECTED]> wrote:
> Zaheed,
>
> Thank you, that works well. Do you know if there is a big performance
> overhead with starting 2 servers? As an alternative, could we use
> Lucene's MultiSearcher?
>
> -- Renaud
>
>
> Zaheed Haque wrote:
> > Hi:
> >
> > Assuming you have
> >
> > index 1 at /data/crawl1
> > index 2 at /data/crawl2
> >
> > In nutch-site.xml
> > searcher.dir = /data
> >
> > Under /data you have a text file called search-servers.txt (I think; please
> > check the searcher.dir description in nutch-default.xml)
> >
> > In the text file you will have the following
> >
> > hostname1 portnumber
> > hostname2 portnumber
> >
> > example
> > localhost 1234
> > localhost 5678
> >
> > Then you need to start
> >
> > bin/nutch server 1234 /data/crawl1 &
> >
> > and
> >
> > bin/nutch server 5678 /data/crawl2 &
> >
> > now try
> >
> > bin/nutch org.apache.nutch.searcher.NutchBean www
> >
> > you should see results :-)
> >
> > Cheers
> >
> > On 9/5/06, Renaud Richardet <[EMAIL PROTECTED]> wrote:
> >> @Dennis,
> >> Can you explain how to set up distributed search while storing the 2
> >> indexes on the same local machine (if possible)?
> >>
> >> @Feng,
> >> We created a shell script to merge 2 runs; let us know if that works for
> >> you.
> >> http://wiki.apache.org/nutch/MergeCrawl
> >>
> >> Renaud
> >>
> >>
> >> Dennis Kubes wrote:
> >> > You can keep the indexes separate and use the distributed search
> >> > server, one per index, or you can use the mergedb and mergesegs
> >> > commands to merge the two runs into a single crawldb and a single
> >> > segments directory, then re-run invertlinks and index to create a
> >> > single index, which can then be searched.
> >> >
> >> > Dennis
> >> >
> >> > Feng Ji wrote:
> >> >> Hi there,
> >> >>
> >> >> In Nutch 0.8, I have crawled from two webDBs independently.
> >> >>
> >> >> For each run, I did invertlinks and index, so each one is searchable.
> >> >>
> >> >> Now I want to combine them together for search. I tried the "merge"
> >> >> command to merge the two indexes, but searching the resulting index
> >> >> output dir returns nothing.
> >> >> Do I need to put the output dir in the same directory as the two
> >> >> crawl/ dirs above?
> >> >>
> >> >> I wonder what the proper steps are to combine two separate runs into
> >> >> one search result. Do I need to combine the two webdbs, merge the two
> >> >> segments, and then do invertlinks and index?
> >> >>
> >> >> Thanks for your time,
> >> >>
> >> >> Michael
> >> >>
> >> >
> >>
> >> --
> >> Renaud Richardet
> >> COO America
> >> Wyona - Open Source Content Management - Apache Lenya
> >> office +1 857 776-3195 mobile +1 617 230 9112
> >> renaud.richardet <at> wyona.com http://www.wyona.com
> >>
> >
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
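Dennis's second option in the thread (merge the two runs, then rebuild the link database and the index) can be sketched as a shell function. This is a hedged sketch, not the MergeCrawl wiki script: the command names mergedb, mergesegs, invertlinks and index come from the thread, but the argument order is as I recall it from the Nutch 0.8 usage messages, so run each command without arguments to confirm before relying on it. The function name `merge_crawls`, the assumed layout (crawldb/ and segments/ under each crawl dir) and the `NUTCH` override variable are hypothetical; setting `NUTCH=echo` turns the whole thing into a dry run. It also leaves out extra steps such as deduplication.

```shell
#!/bin/sh
# Hedged sketch: merge two crawl runs into one searchable crawl dir.
# Assumed layout: each crawl dir contains crawldb/ and segments/.
# NUTCH can be overridden (e.g. NUTCH=echo for a dry run).
NUTCH=${NUTCH:-bin/nutch}

merge_crawls() {
  out=$1   # new, merged crawl dir
  a=$2     # first crawl dir
  b=$3     # second crawl dir

  # 1. Merge the two crawl databases into a single crawldb.
  $NUTCH mergedb "$out/crawldb" "$a/crawldb" "$b/crawldb" || return 1

  # 2. Merge every segment from both runs; the merged segment
  #    is written under $out/segments.
  $NUTCH mergesegs "$out/segments" "$a"/segments/* "$b"/segments/* || return 1

  # 3. Re-run invertlinks over the merged segments to build one linkdb.
  $NUTCH invertlinks "$out/linkdb" -dir "$out/segments" || return 1

  # 4. Index the merged segment(s) into $out/index, the directory
  #    NutchBean should pick up when searcher.dir points at $out.
  $NUTCH index "$out/index" "$out/crawldb" "$out/linkdb" "$out"/segments/* || return 1
}

# Example (hypothetical paths):
#   merge_crawls /data/merged /data/crawl1 /data/crawl2
```

Each step stops the function on failure, so a broken mergedb does not silently feed an empty crawldb into the indexer.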
