Regarding: http://www.mail-archive.com/[email protected]/msg08854.html
I too want to run multiple nutch instances. I have a two-CPU (two cores each) development box on which to develop my search application. I have installed a nightly build of nutch. Currently that installation is working on a crawl that will take many days to complete. In the meantime, I want to be able to try some other tests. At this stage I'm more interested in the whole crawl cycle: inject, generate, fetch, updatedb, invertlinks, index. I'm less interested in search for now.

So for instance, I'd like to install an even more recent nightly build, then run some short crawls with it. Maybe I'd like to have another version of nutch that I hack up. I'd want to play with it even as one of the other instances is running a crawl.

My current installation is in:
/usr/local/nutch-2007-06-27_06-52-44

I've also noticed that the log file hadoop.log gets created here:
/var/tmp/nutch-2007-06-27_06-52-44

Other than these I haven't seen any environment variables or other global properties that might conflict. So it seems I could just install to:
/usr/local/new_nutch
and I presume that this would be created:
/var/tmp/new_nutch

Some other discussions relating to this subject are here:

http://www.mail-archive.com/[email protected]/msg04838.html

As for different setups for different Nutch instances, I think you could have multiple installations on your server where each instance would have its own conf directory (with specific config files), and source code can be shared via symbolic link.

http://www.mail-archive.com/[email protected]/msg02138.html

Running multiple nutch instances on one box is possible but difficult. The problem is that tomcat and also nutch (0.8 map reduce / ndfs) use a set of tcp ports that are already blocked if another unix user already runs nutch. The best way to go is to first use subversion or cvs as a centralized repository for your customized code; then all developers can share code and work together on the same code base.
Besides that, each developer should run a tiny test instance of nutch on her developer machine. In the end it is a good idea to have a script that downloads the code from cvs once a day, runs a test suite, and deploys the code on your 'big' server. http://cruisecontrol.sourceforge.net/ is a helpful tool.

http://www.mail-archive.com/[email protected]/msg05061.html

Q: Let's say I want to run 2 search engines on the same server. For search engine one I use the database "crawl" and for the second search engine I use "crawl2" as the database. For accessing the content, could I use different ports for each engine? Engine one will be localhost:8080 and engine two will be localhost:8081. Just asking if this is possible.

A: Yes, this is possible. You can use different ports, different virtual hosts, or different context paths to separate the two UIs. You still need to have two separate web applications with two separate configurations (pointing to two separate directories).

Q: The two different web applications are really no big deal. Could I be pointed in the right direction for setting this up? Someone else set up nutch/tomcat/java for me, so I am not exactly sure where I would set up the virtual host or where a config file would exist that would point to the database path.

A: I guess the simplest way to do it is to copy the nutch war file under <TOMCAT>/webapps with two different names (search1.war and search2.war); then, after tomcat has extracted the archives, edit the file <TOMCAT>/webapps/search1/WEB-INF/classes/nutch-site.xml and change searcher.dir to point to the correct directory. For the other instance the configuration file is <TOMCAT>/webapps/search2/WEB-INF/classes/nutch-site.xml

----- Original Message ----
From: karthik085 <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, July 20, 2007 3:13:24 PM
Subject: Multiple Nuch Instances

1. Can I run multiple instances of nutch for crawling/indexing? I got mixed opinions - some say yes and some say no.
Can someone who has tried this let me know? One guy said it is difficult because multiple nutch instances have to use different ports.

2. If I can run multiple instances of nutch, can I run nutch v 0.7.2, nutch 0.9 and nutch-dev at the same time for crawling/indexing websites?

Please let me know.
Thanks.

--
View this message in context: http://www.nabble.com/Multiple-Nuch-Instances-tf4119823.html#a11716837
Sent from the Nutch - User mailing list archive at Nabble.com.
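
For what it's worth, the Tomcat answer quoted above (two copies of the war, each with its own searcher.dir in WEB-INF/classes/nutch-site.xml) could probably be scripted. Below is a rough Python sketch, not something I have run against a real Nutch install. It assumes nutch-site.xml uses the usual <configuration>/<property>/<name>/<value> layout; the function name set_searcher_dir is my own invention, and any paths you pass in are yours to choose.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def set_searcher_dir(site_xml: Path, crawl_dir: str) -> None:
    """Set (or add) the searcher.dir property in one webapp's nutch-site.xml.

    Hypothetical helper, assuming the <configuration>/<property> layout.
    Note: ElementTree drops XML comments and the stylesheet processing
    instruction when it rewrites the file.
    """
    if site_xml.exists():
        tree = ET.parse(str(site_xml))
        root = tree.getroot()
    else:
        # No config yet: start from an empty <configuration> element.
        root = ET.Element("configuration")
        tree = ET.ElementTree(root)

    for prop in root.findall("property"):
        if prop.findtext("name") == "searcher.dir":
            # Property already present: overwrite its value.
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = crawl_dir
            break
    else:
        # Property missing: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = "searcher.dir"
        ET.SubElement(prop, "value").text = crawl_dir

    site_xml.parent.mkdir(parents=True, exist_ok=True)
    tree.write(str(site_xml), encoding="utf-8", xml_declaration=True)
```

You would call it once per webapp after Tomcat has unpacked the two wars, e.g. set_searcher_dir(Path(tomcat) / "webapps/search1/WEB-INF/classes/nutch-site.xml", "/path/to/crawl") and the same for search2 with the crawl2 directory, where tomcat stands for wherever your <TOMCAT> directory actually is.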
