Single NutchBean and multiple indices support

Jack Tang Wed, 15 Feb 2006 09:25:19 -0800

Hi there.

I am facing the same question and looking for the same solution.
Your solution seems easy :) My question is: which file system does the
application run on, LocalFileSystem or DistributedFileSystem?


Thanks
/Jack

On 2/9/06, Ravi Chintakunta <[EMAIL PROTECTED]> wrote:
> Hi David,
>
> Thanks for your reply.
>
> After posting the question, I have done this in a more optimal way.
>
> - I used only a single NutchBean and modified it so that the search
> method takes the indices being searched as an argument. This single
> NutchBean creates separate IndexReaders on the merged indices in the
> directories and keeps them in a map.
>
> - Based on the indexes that are searched, NutchBean creates an
> IndexSearcher using the appropriate IndexReaders. I have added a
> constructor to IndexSearcher that takes an array of IndexReaders and
> uses a MultiReader to initialize itself.
>
> - The NutchBean creates a single FetchedSegments with the combination
> of the segments directories in all the directories.
>
> The advantages with this are:
>
> - A single IndexReader for an index - so no additional filehandles are 
> created.
> - No opening / closing of readers or segments - this improves performance.
>
>
> - Ravi Chintakunta
>
>
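
The reader map described above can be sketched in plain Java roughly as
follows. This is only a minimal sketch under stated assumptions, not the
actual patch: StubReader and ReaderCache are hypothetical names, and
StubReader stands in for a Lucene IndexReader. A real version would wrap
the selected readers in a MultiReader before building the IndexSearcher.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an open index reader (not the Lucene class). */
final class StubReader {
    final String indexDir;
    StubReader(String indexDir) { this.indexDir = indexDir; }
}

/** Opens each index at most once and reuses that reader for every search. */
final class ReaderCache {
    private final Map<String, StubReader> readers = new HashMap<>();
    private int opened = 0;

    /** Returns the cached reader for a directory, opening it on first use. */
    synchronized StubReader get(String indexDir) {
        StubReader r = readers.get(indexDir);
        if (r == null) {
            r = new StubReader(indexDir); // a real reader would be opened here
            readers.put(indexDir, r);
            opened++;
        }
        return r;
    }

    /** Picks the readers for the indices selected in one search request. */
    synchronized List<StubReader> select(List<String> indexDirs) {
        List<StubReader> selected = new ArrayList<>();
        for (String dir : indexDirs) {
            selected.add(get(dir));
        }
        return selected;
    }

    /** Number of readers opened so far, one per distinct directory. */
    synchronized int openedCount() { return opened; }
}
```

Two searches over overlapping selections share the same reader object, so
the number of open readers grows with the number of indices rather than
with the number of searches.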
> > This is almost exactly what I've done.  I create a new NutchBean for
> > each search, and point it at whichever of 9 subdirectories the user has
> > selected, because I really don't want 511 (2^9-1) beans hanging around.
> >
> > The reason for the "too many open files" is that the NutchBean doesn't
> > clean up after itself - I guess because for most people, the NutchBean
> > is going to be reused.
> >
> > I added a close() method to FetchedSegments.Segment in my installation,
> > to close all the readers.  I added a closeSegments() method to
> > NutchBean, to call close() on each segment that's been opened.  Then I
> > call closeSegments() after each search.
> >
> > I realise that NutchBean really wasn't designed to support being
> > instantiated once per search, but I don't care.  It works well, and
> > performance is not an issue.
> >
> > Regards,
> > David.
> >
> >
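
The close-after-each-search approach above could look roughly like this.
It is a sketch only, assuming hypothetical StubSegment and PerSearchBean
classes in place of the modified Nutch classes; close() on the bean plays
the role of the closeSegments() call described above.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an opened segment that holds file handles. */
final class StubSegment implements Closeable {
    private boolean open = true;
    boolean isOpen() { return open; }
    @Override public void close() { open = false; }
}

/** Hypothetical per-search bean that remembers every segment it opens. */
final class PerSearchBean implements Closeable {
    private final List<StubSegment> segments = new ArrayList<>();

    /** Opens a segment and records it so that it can be closed later. */
    StubSegment openSegment() {
        StubSegment s = new StubSegment();
        segments.add(s);
        return s;
    }

    /** Closes every segment opened by this bean. */
    @Override public void close() {
        for (StubSegment s : segments) {
            s.close();
        }
        segments.clear();
    }
}
```

Wrapping each search in try-with-resources (or try/finally on older JVMs)
makes sure the segments are closed even when the search throws.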
> > Date: Mon, 6 Feb 2006 20:59:34 -0500
> > From: Ravi Chintakunta <[EMAIL PROTECTED]>
> > To: [email protected]
> > Subject: [Nutch-general] Dynamic merging of indices
> > Reply-To: [EMAIL PROTECTED]
> >
> > I have multiple indices for the crawls across various intranet sites
> > stored in separate folders. My search application should support
> > searching across one or more of these indices dynamically - by way of
> > checkboxes on the web page.  For this, I have modified NutchBean to
> > create the IndexSearcher and FetchedSegments from the segments
> > directory (not the merged index directory) in these folders.  Based on
> > the selected intranet sites, a NutchBean is instantiated for the
> > indices of the selected sites and the results are displayed.
> >
> > With this I hit the "Too many open files" error and have increased the
> > open files limit.
> >
> > This seems to work well now. But if I have 5 such sites, then I am
> > opening 2^5 = 32 times more files than I would have opened.
> >
> > My question is: Is there a better way of doing this? Like:
> >
> > - Can I open an IndexReader on each of the merged index directories and
> > dynamically create an IndexSearcher by merging these readers using
> > MultiReader?
> >
> > - Is an IndexReader thread safe and can it be used simultaneously in
> > different IndexSearchers?
> >
> > - Can I create the IndexReader on the merged index directory and
> > create the corresponding FetchedSegments on the corresponding
> > non-merged segments directory?
> >
> > Thanks
> > Ravi Chintakunta
> >
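
For a sense of scale, the counts in this thread can be checked with a few
lines of Java. This is a rough sketch with a hypothetical OpenCount class,
assuming one bean is kept alive for every non-empty combination of n
indices (n >= 1) and each bean opens every index it covers.

```java
/** Hypothetical helper for counting beans and index opens. */
final class OpenCount {
    /** Non-empty subsets of n indices: 2^n - 1. */
    static long nonEmptySubsets(int n) {
        return (1L << n) - 1;
    }

    /**
     * Index opens when every non-empty subset has its own bean.
     * Each index belongs to 2^(n-1) subsets, so the total is n * 2^(n-1).
     */
    static long opensWithBeanPerSubset(int n) {
        return (long) n << (n - 1);
    }
}
```

Under that assumption 9 indices need 511 beans, while shared readers need
only one open reader per index.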
>


--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
