Hi, I have an index in which each document consists of a double value, which can range between certain bounds, and an associated tag. I am trying to find all the docs that match a certain tag (or combination of tags) and whose value falls within a certain range.
I'm trying to use the `TermRangeTermsEnum` from the flex API as part of a custom parser. This is how I'm using it (in the `getDocIdSet()` method):

```java
Terms myField = fields.terms("Count"); // this is the field I'm interested in for the range
TermsEnum termsEnum = myField.iterator(null); // nothing to reuse on the first call

BytesRef lowerBound = new BytesRef();
NumericUtils.longToPrefixCodedBytes(NumericUtils.doubleToSortableLong(lower), 0, lowerBound);
BytesRef upperBound = new BytesRef();
NumericUtils.longToPrefixCodedBytes(NumericUtils.doubleToSortableLong(upper), 0, upperBound);
TermRangeTermsEnum termRangeTermsEnum = new TermRangeTermsEnum(termsEnum, lowerBound, upperBound, true, true);

DocsEnum docs = null;
FixedBitSet rangeFilter = new FixedBitSet(reader.maxDoc());
// Create a bitset of all docs that pass the range filter
while (termRangeTermsEnum.next() != null) {
    docs = termRangeTermsEnum.docs(startResults, docs, DocsEnum.FLAG_NONE); // no freqs since we don't need them
    while (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
        rangeFilter.set(docs.docID());
    }
}

Terms tagField = fields.terms("Tag"); // the other field I want to filter by
termsEnum = tagField.iterator(termsEnum);

// Filter by docs that match the tag ('tags' is a private String[] field of the class)
Set<Integer> myIds = new HashSet<Integer>();
for (String s : tags) {
    BytesRef ref = new BytesRef(s);
    if (termsEnum.seekExact(ref, false)) { // don't use cache since we could pollute the cache here easily
        // rangeFilter is passed as liveDocs, so docs outside the range are skipped
        docs = termsEnum.docs(rangeFilter, docs, DocsEnum.FLAG_NONE); // no freqs since we don't need them
        while (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
            myIds.add(docs.docID());
        }
    }
}
```

This does return the results I want, but it doesn't perform very well. By comparison, using a `TermsEnum` on the tag field and checking the range by hand performs much better: it is an order of magnitude faster for small numbers of records (fewer than 1000) and about 3-4 times faster for larger ones.
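To make the comparison concrete, here is a toy, stdlib-only model of the two strategies. It is not Lucene code and is only a sketch under stated assumptions: a `TreeMap` stands in for the sorted term dictionary of the "Count" field, `TreeMap.subMap` plays the role of `TermRangeTermsEnum`, a map of doc-id lists stands in for the "Tag" postings, and a plain `double[]` stands in for the value cache. The names `RangeTagSketch`, `viaRangeBitSet` and `viaValueCache` are hypothetical, and `doubleToSortableLong` re-implements the sign-bit trick that `NumericUtils.doubleToSortableLong` is assumed to use, so that ordering the encoded longs matches ordering the doubles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of the two strategies; not Lucene code. All names are hypothetical.
class RangeTagSketch {

    // Assumed to mirror NumericUtils.doubleToSortableLong: for negative doubles,
    // flip all non-sign bits so that signed long order matches double order.
    static long doubleToSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    // Stand-in for the "Count" term dictionary: encoded value -> doc ids, kept sorted.
    final TreeMap<Long, List<Integer>> valueTerms = new TreeMap<>();
    // Stand-in for the "Tag" postings: tag -> doc ids.
    final Map<String, List<Integer>> tagPostings = new HashMap<>();
    // Stand-in for the per-document value cache.
    final double[] valueCache;

    RangeTagSketch(double[] values, String[] tags) {
        valueCache = values.clone();
        for (int doc = 0; doc < values.length; doc++) {
            valueTerms.computeIfAbsent(doubleToSortableLong(values[doc]), k -> new ArrayList<>()).add(doc);
            tagPostings.computeIfAbsent(tags[doc], k -> new ArrayList<>()).add(doc);
        }
    }

    // Strategy 1 (range enum + bitset): visit every term in [lower, upper], mark its
    // docs in a bitset, then keep only the tagged docs whose bit is set.
    Set<Integer> viaRangeBitSet(Set<String> tags, double lower, double upper) {
        BitSet inRange = new BitSet(valueCache.length);
        for (List<Integer> docs : valueTerms.subMap(
                doubleToSortableLong(lower), true, doubleToSortableLong(upper), true).values()) {
            for (int doc : docs) {
                inRange.set(doc);
            }
        }
        Set<Integer> result = new HashSet<>();
        for (String tag : tags) {
            for (int doc : tagPostings.getOrDefault(tag, Collections.emptyList())) {
                if (inRange.get(doc)) {
                    result.add(doc);
                }
            }
        }
        return result;
    }

    // Strategy 2 (check by hand): walk only the tag postings and test each doc's cached value.
    Set<Integer> viaValueCache(Set<String> tags, double lower, double upper) {
        Set<Integer> result = new HashSet<>();
        for (String tag : tags) {
            for (int doc : tagPostings.getOrDefault(tag, Collections.emptyList())) {
                double v = valueCache[doc];
                if (v >= lower && v <= upper) {
                    result.add(doc);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RangeTagSketch index = new RangeTagSketch(
                new double[] {0.5, -2.0, 1.5, 3.0, 1.5, -0.5},
                new String[] {"red", "blue", "red", "red", "blue", "blue"});
        Set<String> tags = new HashSet<>(Arrays.asList("red", "blue"));
        // TreeSet only sorts the ids for printing; both lines print [0, 2, 4, 5]
        System.out.println(new TreeSet<>(index.viaRangeBitSet(tags, -1.0, 2.0)));
        System.out.println(new TreeSet<>(index.viaValueCache(tags, -1.0, 2.0)));
    }
}
```

In this model the first strategy touches every document whose value is in range, whatever its tag, and allocates a bitset sized to the whole index, while the second touches only the documents that carry the requested tags. The model assumes ordinary finite values: `-0.0` and `NaN` order differently under the sortable-long encoding than under `>=`/`<=`.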
Here is the version that checks the range by hand:

```java
Terms tagField = fields.terms("Tag");
termsEnum = tagField.iterator(termsEnum);
Set<Integer> myIds = new HashSet<Integer>();
double value;
for (String s : tags) {
    BytesRef ref = new BytesRef(s);
    if (termsEnum.seekExact(ref, false)) { // don't use cache since we could pollute the cache here easily
        docs = termsEnum.docs(initialSet, docs, DocsEnum.FLAG_NONE); // no freqs since we don't need them
        while (docs.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
            // cache holds each document's double value (for example a FieldCache.Doubles)
            value = cache.get(docs.docID());
            if (value >= lower && value <= upper) { // check for the range, using the double bounds
                myIds.add(docs.docID());
            }
        }
    }
}
```

Is this the expected usage of `TermRangeTermsEnum`? Is this the expected performance as well? Any pointers or helpful references are welcome.

Thanks,
CV