Distinction between AtomicReader and CompositeReader

2013-04-24 Thread Paul Taylor
Trying to convert some Lucene 3 code to Lucene 4, I want to use termEnums.docs(ir.getLiveDocs()) to only return docs that have not been deleted for a particular term. However getLiveDocs() is only available for AtomicReaders, and although I just have a single index it is file based and uses
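In Lucene 4, deletions are tracked per segment. A CompositeReader over a file-based index is a list of AtomicReader leaves. Each leaf has its own getLiveDocs() bits, which are null when that segment has no deletions, and a docBase offset into the global doc ID space. As I understand it, there are two usual Lucene 4 approaches. One is to loop over reader.leaves() and pass each leaf's live docs to its own TermsEnum. The other is to use MultiFields.getLiveDocs(reader) for a merged view.

The plain-Java sketch below only models the per-segment idea, so it runs without Lucene on the classpath. Segment, liveDocsForTerm and the postings map are hypothetical stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LiveDocsSketch {

    /** Hypothetical stand-in for one segment (a leaf of a composite reader). */
    static final class Segment {
        final int docBase;                 // global ID of this segment's first doc
        final boolean[] liveDocs;          // null means "no deletions", mirroring Lucene's convention
        final Map<String, int[]> postings; // term -> segment-local doc IDs, ascending

        Segment(int docBase, boolean[] liveDocs, Map<String, int[]> postings) {
            this.docBase = docBase;
            this.liveDocs = liveDocs;
            this.postings = postings;
        }
    }

    /** Collects global doc IDs containing the term, skipping deleted docs segment by segment. */
    static List<Integer> liveDocsForTerm(List<Segment> leaves, String term) {
        List<Integer> result = new ArrayList<>();
        for (Segment leaf : leaves) {
            int[] docs = leaf.postings.get(term);
            if (docs == null) {
                continue; // term absent from this segment
            }
            for (int localDoc : docs) {
                // Liveness is checked against the segment-local ID...
                if (leaf.liveDocs == null || leaf.liveDocs[localDoc]) {
                    // ...and the global ID is recovered by adding docBase.
                    result.add(leaf.docBase + localDoc);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, int[]> p0 = new HashMap<>();
        p0.put("lucene", new int[] {0, 2});
        Map<String, int[]> p1 = new HashMap<>();
        p1.put("lucene", new int[] {0, 1});

        // Segment 0 holds global docs 0-2 with no deletions.
        // Segment 1 holds global docs 3-4, and its local doc 0 is deleted.
        List<Segment> leaves = Arrays.asList(
            new Segment(0, null, p0),
            new Segment(3, new boolean[] {false, true}, p1));

        System.out.println(liveDocsForTerm(leaves, "lucene")); // [0, 2, 4]
    }
}
```

Checking liveness per leaf keeps every lookup in segment-local coordinates. A merged view is convenient but spans segments on the fly.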

Too many unique terms

2013-04-24 Thread Manuel LeNormand
Hi there, Looking at my index (about 1M docs) I see a lot of unique terms, more than 8M, which is a significant part of my total term count. These are very likely useless terms, binaries or other meaningless numbers that come with few of my docs. I am totally fine with deleting them so these terms
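Terms that occur in only one document can be identified by their document frequency. In Lucene 4, each term's count is available from TermsEnum.docFreq() while enumerating a field's terms.

The minimal plain-Java sketch below assumes documents are already tokenized into lists of strings. singletonTerms is a hypothetical helper. It only finds candidate terms and does not remove anything from an index.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SingletonTermsSketch {

    /** Returns terms that occur in exactly one document (docFreq == 1), sorted for stable output. */
    static Set<String> singletonTerms(List<List<String>> docs) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            // Count each term at most once per document, as document frequency does.
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if (e.getValue() == 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("index", "terms", "x9f3a"),
            Arrays.asList("index", "terms", "terms"),
            Arrays.asList("index", "0xdeadbeef"));
        System.out.println(singletonTerms(docs)); // [0xdeadbeef, x9f3a]
    }
}
```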

Re: Too many unique terms

2013-04-24 Thread Adrien Grand
Hi Manuel, On Thu, Apr 25, 2013 at 12:29 AM, Manuel LeNormand manuel.lenorm...@gmail.com wrote: Hi there, Looking at my index (about 1M docs) I see a lot of unique terms, more than 8M, which is a significant part of my total term count. These are very likely useless terms, binaries or other

Re: Distinction between AtomicReader and CompositeReader

2013-04-24 Thread Adrien Grand
Hi Paul On Wed, Apr 24, 2013 at 1:35 PM, Paul Taylor paul_t...@fastmail.fm wrote: Trying to convert some Lucene 3 code to Lucene 4, I want to use termEnums.docs(ir.getLiveDocs()) to only return docs that have not been deleted for a particular term. However getLiveDocs() is only available for

Re: org.apache.lucene.classification - bug in SimpleNaiveBayesClassifier

2013-04-24 Thread Adrien Grand
Hi Alexey, On Tue, Apr 23, 2013 at 3:28 PM, Alexey Anatolevitch alexeyl...@gmail.com wrote: I was trying it with 4.2.1 and SimpleNaiveBayesClassifier seems to have a bug - the local copy of BytesRef referenced by foundClass is affected by subsequent TermsEnum.iterator.next() calls as the
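The symptom described matches a general hazard with reused buffers. The BytesRef returned by TermsEnum's next() may be reused across calls. Keeping a reference to it without copying it, for example with BytesRef.deepCopyOf, therefore lets later next() calls overwrite the kept value.

The self-contained sketch below reproduces that pattern without Lucene. MutableBytes, ReusingIterator and bestTerm are hypothetical stand-ins, and the scores are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class ReusedBufferSketch {

    /** Hypothetical mutable holder, loosely modelled on a (bytes, length) pair. */
    static final class MutableBytes {
        byte[] bytes = new byte[16];
        int length;

        String utf8() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }

        MutableBytes deepCopy() {
            MutableBytes copy = new MutableBytes();
            copy.bytes = Arrays.copyOf(bytes, length);
            copy.length = length;
            return copy;
        }
    }

    /** Hypothetical iterator that overwrites and returns the SAME holder on every call. */
    static final class ReusingIterator {
        private final List<String> terms;
        private final MutableBytes shared = new MutableBytes();
        private int pos = 0;

        ReusingIterator(List<String> terms) {
            this.terms = terms;
        }

        MutableBytes next() {
            if (pos == terms.size()) {
                return null;
            }
            byte[] src = terms.get(pos++).getBytes(StandardCharsets.UTF_8);
            if (src.length > shared.bytes.length) {
                shared.bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, shared.bytes, 0, src.length);
            shared.length = src.length;
            return shared;
        }
    }

    /** Keeps the highest-scoring term; copyTerm=false reproduces the aliasing bug. */
    static String bestTerm(List<String> terms, int[] scores, boolean copyTerm) {
        ReusingIterator it = new ReusingIterator(terms);
        MutableBytes best = null;
        int bestScore = Integer.MIN_VALUE;
        int i = 0;
        for (MutableBytes term = it.next(); term != null; term = it.next(), i++) {
            if (scores[i] > bestScore) {
                bestScore = scores[i];
                // Without a copy, 'best' aliases the shared holder and changes on the next call.
                best = copyTerm ? term.deepCopy() : term;
            }
        }
        return best == null ? null : best.utf8();
    }

    public static void main(String[] args) {
        List<String> classes = Arrays.asList("sports", "politics", "tech");
        int[] scores = {5, 9, 1};
        System.out.println(bestTerm(classes, scores, false)); // tech (wrong: overwritten later)
        System.out.println(bestTerm(classes, scores, true));  // politics
    }
}
```

Copying only when a new best term is found keeps the cost to one allocation per improvement, not one per term.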