Hi there. I am facing the same question and am looking for the same solution. Your solution seems easy :) My question is: which file system does the application run on, LocalFileSystem or DistributedFileSystem?
Thanks
/Jack

On 2/9/06, Ravi Chintakunta <[EMAIL PROTECTED]> wrote:
> Hi David,
>
> Thanks for your reply.
>
> After posting the question, I have done this in a more optimum way.
>
> - I used only a single NutchBean and modified it so that the search
> method takes the indices being searched as an argument. This single
> NutchBean creates separate IndexReaders on the merged indices in the
> directories and keeps them in a map.
>
> - Based on the indexes that are searched, NutchBean creates an
> IndexSearcher using the appropriate IndexReaders. I have added a
> constructor to IndexSearcher that takes an array of IndexReaders and
> uses a MultiReader to initialize itself.
>
> - The NutchBean creates a single FetchedSegments with the combination
> of the segments directories in all the directories.
>
> The advantages with this are:
>
> - A single IndexReader for an index - so no additional filehandles are
> created.
> - No opening / closing of readers or segments - this improves performance.
>
>
> - Ravi Chintakunta
>
>
> > This is almost exactly what I've done. I create a new NutchBean for
> > each search, and point it at whichever of 9 subdirectories the user has
> > selected; because I really don't want 511 (2^9-1) beans hanging around.
> >
> > The reason for the "too many open files" is that the NutchBean doesn't
> > clean up after itself - I guess because for most people, the NutchBean
> > is going to be reused.
> >
> > I added a close() method to FetchSegments.Segment in my installation,
> > to close all the readers. I added a closeSegments() method to
> > NutchBean, to call close() on each segment that's been opened. Then I
> > call closeSegments() after each search.
> >
> > I realise that NutchBean really wasn't designed to support being
> > instantiated once per search, but I don't care. It works well, and
> > performance is not an issue.
> >
> > Regards,
> > David.
> >
> >
> > Date: Mon, 6 Feb 2006 20:59:34 -0500
> > From: Ravi Chintakunta <[EMAIL PROTECTED]>
> > To: [email protected]
> > Subject: [Nutch-general] Dynamic merging of indices
> > Reply-To: [EMAIL PROTECTED]
> >
> > I have multiple indices for the crawls across various intranet sites
> > stored in separate folders. My search application should support
> > searching across one or more of these indices dynamically - by way of
> > checkboxes on the web page. For this, I have modified NutchBean to
> > create the IndexSearcher and FetchedSegments from the segments
> > directory (not the merged index directory) in these folders. Based on
> > the selected intranet sites, a NutchBean is instantiated for the
> > indices of the selected sites and the results are displayed.
> >
> > With this I had the "Too many open files error" and have increased the
> > number of files limit.
> >
> > This seems to work well now. But if I have 5 such sites, then I am
> > opening 2^5 = 32 times more files than I would have opened.
> >
> > My question is: Is there a better way of doing this? Like:
> >
> > - Can I open an IndexReader on each of the merged index directory and
> > dynamically create an IndexSearcher by merging these readers using
> > MultiReader?
> >
> > - Is an IndexReader thread safe and can it be used simultaneously in
> > different IndexSearchers?
> >
> > - Can I create the IndexReader on the merged index directory and
> > create the corresponding FetchedSegments on the corresponding
> > non-merged segments directory?
> >
> > Thanks
> > Ravi Chintakunta
> >
>

--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
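
For anyone following the thread, the "one reader per index, kept in a map" idea from Ravi's message can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual NutchBean change: OpenedIndex and SharedReaderCache are hypothetical names, and OpenedIndex merely stands in for a real Lucene IndexReader. The point is that each index directory is opened once and the same reader objects are reused for whatever combination a search selects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an opened index; in Nutch/Lucene this role
// would be played by a real IndexReader on a merged index directory.
final class OpenedIndex {
    final String directory;

    OpenedIndex(String directory) {
        this.directory = directory;
    }
}

// Hypothetical cache: opens one reader per index directory and reuses it.
final class SharedReaderCache {
    private final Map<String, OpenedIndex> readers = new LinkedHashMap<>();
    private int openCount = 0;

    // Opens the reader for a directory on first use, then returns the same one.
    synchronized OpenedIndex readerFor(String directory) {
        OpenedIndex reader = readers.get(directory);
        if (reader == null) {
            reader = new OpenedIndex(directory);
            readers.put(directory, reader);
            openCount++;
        }
        return reader;
    }

    // Returns the shared readers for the indices selected in one search.
    // In the approach from the thread, these would then be combined
    // (for example through a MultiReader) into a single searcher.
    List<OpenedIndex> select(List<String> directories) {
        List<OpenedIndex> selected = new ArrayList<>();
        for (String directory : directories) {
            selected.add(readerFor(directory));
        }
        return selected;
    }

    // Number of readers actually opened so far.
    synchronized int openCount() {
        return openCount;
    }

    public static void main(String[] args) {
        SharedReaderCache cache = new SharedReaderCache();
        cache.select(Arrays.asList("site1", "site2"));
        cache.select(Arrays.asList("site2", "site3"));
        System.out.println("readers opened: " + cache.openCount());
    }
}
```

With this shape, two searches over overlapping selections open three readers in total rather than four, and the number of open readers grows with the number of indices, not with the number of possible combinations.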
