Alan Tanaman wrote:
> Currently Nutch creates a Lucene multifile index, and makes sure any
> existing compound index is converted to multifile by using the
> IndexWriter.setUseCompoundFile(false) method.
>
> This is done whenever an IndexWriter is opened in the following methods:
>
> org.apache.nutch.indexer.Indexer.getRecordWriter
> org.apache.nutch.indexer.IndexSorter.sort
> org.apache.nutch.indexer.IndexMerger.merge
>
> Is there a technical constraint as to why Nutch should ensure usage of
> multifile (or prevent compound) and not allow the type to be set by a
> property setting?
>
> Does anyone object to/support a patch to allow this to be configurable?

Multifile indexes are somewhat faster, and require much less temporary
space during indexing. Why would you want to use the compound format
with Nutch?

The typical use of Nutch is that you work with a single index, or at
most a few, per machine. In that case a regular non-compound index works
better, and there is no danger of running out of file handles.

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
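
For illustration only, the configurable behaviour asked about in the quoted message might look roughly like the sketch below. It is not taken from Nutch or from any actual patch. The property name indexer.use.compound.file, the CompoundFileConfig class and the CompoundFileSetter interface are all hypothetical; the interface merely stands in for the IndexWriter.setUseCompoundFile(boolean) call mentioned in the thread, so the example runs without Lucene on the classpath. The default is false, which would keep the current multifile behaviour unless the property is set.

```java
import java.util.Properties;

class CompoundFileConfig {
    // Hypothetical property name; not an existing Nutch setting.
    static final String USE_COMPOUND_KEY = "indexer.use.compound.file";

    // Stand-in for the single IndexWriter method the thread refers to,
    // so this sketch compiles without a Lucene dependency.
    interface CompoundFileSetter {
        void setUseCompoundFile(boolean useCompound);
    }

    // Reads the flag; defaults to false (multifile), matching current behaviour.
    static boolean useCompoundFile(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(USE_COMPOUND_KEY, "false"));
    }

    // Would be called wherever a writer is opened (assumed call sites:
    // Indexer.getRecordWriter, IndexSorter.sort, IndexMerger.merge).
    static void configure(CompoundFileSetter writer, Properties conf) {
        writer.setUseCompoundFile(useCompoundFile(conf));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        boolean[] applied = new boolean[1];
        // Records the value that would be passed to the real writer.
        CompoundFileSetter writer = value -> applied[0] = value;

        configure(writer, conf);
        System.out.println("default compound=" + applied[0]);

        conf.setProperty(USE_COMPOUND_KEY, "true");
        configure(writer, conf);
        System.out.println("override compound=" + applied[0]);
    }
}
```

With a default of false, installations that never set the property would keep the multifile format that the reply recommends for the typical one-or-few-indexes-per-machine setup.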