Grigory,

There are many ways to achieve what you are looking for, although nothing is "automatic" out of the box.
For search servers, you would simply run the distributed search server processes on your query servers. On your JSP/web server you would create the search hosts file (search-servers.txt) that lists the host and port number of each of your query servers. If you want redundancy, the easiest way is to set up round-robin DNS for your query servers. When you create your search-servers.txt, you list the load-balanced name for the query servers, and DNS decides which specific host each request hits.

Generally, when creating indexes I do about 1 million pages at a time and merge as many as the hardware I'm running on can handle (memory and CPU load). So typically I would say 4-10 million pages on a single server.

As for updating while the system is up: currently you have to bounce (restart) the web services when you add new indexes, since they aren't aware of changes. However, with Nutch/Lucene you can do almost all admin functionality (index, fetch, generate segments) without worrying about locks and such. If there is a lock issue, you will usually be alerted to it :)

To replicate your data, you can look at the NDFS code, use rsync, or use one of the distributed file systems such as Coda, GFS, or even AFS if you want more of an NFS-type system.

A lot of Nutch is left for you to scale based on your requirements, although with the recent JMX work and the configuration work being done, it will become easier to manage such systems and integrate a distributed architecture.

If you say more about what you're trying to accomplish, we may be able to help further. I would recommend reading the mailing lists. If you want direct help, feel free to email me directly.

-byron

--- gbeg <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I am particularly interested in the following
> issues:
>
> 1. Deployment of nutch
>  - how to establish the search system with
>    several search servers,
>  - how to divide the "data" between them,
>  - how to perform scheduled refetchings,
>  - how many fetchers should be, etc.
>
> 2. High availability/redundancy of the system.
>  - How to update the indexes / webdb while
>    keeping the system alive,
>  - how to replicate the data,
>  - what happens when a search server goes
>    offline,
>  - how to make webdb redundant, etc.
>
> Please point me to the resources if there are any,
> otherwise let's gather the knowledge and create the
> appropriate docs :)

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
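For the search hosts file Byron describes (search-servers.txt), the sketch below shows how a web front end might read a list of query servers. It assumes one `host port` pair per non-blank line. That layout, the `#` comment support, the class name `SearchServersFile`, and the example hostnames and port are all hypothetical illustrations, not Nutch's own parser or defaults. A round-robin DNS name can appear as just another entry.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Nutch): parses a search-servers.txt style
 * file, assuming each non-blank line is "host port".
 */
public final class SearchServersFile {

    private SearchServersFile() {}

    /** Parses "host port" lines; blank lines and '#' comments are skipped. */
    public static List<InetSocketAddress> parse(String contents) {
        List<InetSocketAddress> servers = new ArrayList<>();
        String[] lines = contents.split("\\R"); // split on any line terminator
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            // Comment support is an assumption added for this sketch.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                        "line " + (i + 1) + ": expected 'host port', got: " + lines[i]);
            }
            int port;
            try {
                port = Integer.parseInt(parts[1]);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                        "line " + (i + 1) + ": bad port: " + parts[1], e);
            }
            if (port < 1 || port > 65535) {
                throw new IllegalArgumentException(
                        "line " + (i + 1) + ": port out of range: " + port);
            }
            // createUnresolved does no DNS lookup here, so a round-robin name
            // is not pinned to a single address when the file is parsed.
            servers.add(InetSocketAddress.createUnresolved(parts[0], port));
        }
        return servers;
    }

    public static void main(String[] args) throws IOException {
        // Read the file named on the command line, or fall back to sample data
        // with hypothetical hosts and port.
        String contents = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "search1.example.com 9999\nsearch2.example.com 9999\n";
        for (InetSocketAddress server : parse(contents)) {
            System.out.println(server.getHostString() + ":" + server.getPort());
        }
    }
}
```

One design note if you rely on round-robin DNS from Java: the JVM caches successful DNS lookups (controlled by the `networkaddress.cache.ttl` security property). That cache can affect how evenly requests actually spread across the query servers behind one name.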
