Re: [Nutch-dev] Creating Lucene Compound Index

Andrzej Bialecki Tue, 02 Jan 2007 05:10:12 -0800

Alan Tanaman wrote:
> Currently Nutch creates a Lucene multifile index, and makes sure any
> existing compound index is converted to multifile by using the
> IndexWriter.setUseCompoundFile(false) method.
>
> This is done whenever an IndexWriter is opened in the following methods:
>
> org.apache.nutch.indexer.Indexer.getRecordWriter
> org.apache.nutch.indexer.IndexSorter.sort
> org.apache.nutch.indexer.IndexMerger.merge
>
> Is there a technical constraint as to why Nutch should ensure usage of
> multifile (or prevent compound) and not allow the type to be set by a
> property setting?
>
> Does anyone object to/support a patch to allow this to be configurable?


Multifile indexes are somewhat faster, and require much less temporary
space during indexing. Why would you want to use the compound format
with Nutch? In a typical Nutch deployment you work with a single index,
or at most a few indexes, per machine. In that case a regular
non-compound index works better, and there is no danger of running out
of file handles.
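
For reference, the configurable switch proposed above could be sketched
roughly as follows. This is only an illustration under stated
assumptions: the property name "indexer.use.compound.file" is
hypothetical (it is not an existing Nutch setting), and the
CompoundSwitch interface is a stand-in for Lucene's IndexWriter so that
the snippet compiles without Lucene on the classpath. The default of
false keeps the current multifile behavior.

```java
import java.util.Properties;

public class CompoundFormatConfig {
    /** Hypothetical property name, chosen for illustration only. */
    public static final String USE_COMPOUND_KEY = "indexer.use.compound.file";

    /** Stand-in for the single IndexWriter method discussed in this thread. */
    public interface CompoundSwitch {
        void setUseCompoundFile(boolean value);
    }

    /** Reads the flag; a missing property means false, i.e. multifile. */
    public static boolean useCompoundFile(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(USE_COMPOUND_KEY, "false"));
    }

    /** Applies the configured format instead of a hard-coded false. */
    public static void configure(CompoundSwitch writer, Properties conf) {
        writer.setUseCompoundFile(useCompoundFile(conf));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(USE_COMPOUND_KEY, "true");
        // A lambda plays the role of the writer and just reports the call.
        configure(v -> System.out.println("setUseCompoundFile(" + v + ")"), conf);
    }
}
```

In the three methods listed in the quoted message, the same lookup would
replace the literal false currently passed to setUseCompoundFile, so
existing installations would see no change unless the property is set.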

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



