Re: Best approach to Intersect results with big Set<String>?

2011-09-04 Thread Chris Hostetter

: This works, but I'm concerned about how many terms we could end up
: with as the size grows.
: 
: Another possibility could be a Filter that iterates through FieldCache
: and checks if each value is in the Set<String>.
: 
: Any thoughts/directions on things to look at?

It really all depends on what orders of magnitude you're talking
about: the number of filters, the cardinality of those filters, and
the likelihood of reuse (i.e.: will the same Set<String> be used many
times? Will the strings in that Set typically be reused, but in
various permutations?)


You might want to consider applying the concepts from Field Faceting
(particularly the tradeoffs between the fc and enum methods, good
values for facet.enum.cache.minDf, the fieldValueCache's use of
bigTerms, etc.), since you're facing roughly the same questions. The
difference is that instead of computing a bunch of distinct facet
counts, you want to compute the intersection of a bunch of filters, so
you need to decide when to cache those filters independently, when not
to bother caching them at all, when to cache them as a reusable unit,
etc.
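A minimal sketch of those caching decisions, written as a plain-Java model rather than against Solr's own caches. ExclusionFilterCache, minDf, and the lookupId function are hypothetical stand-ins (lookupId plays the role of the index returning the docs that match one id). Small per-id doc sets are recomputed every time, larger ones are cached individually in the spirit of facet.enum.cache.minDf, and the combined result for a whole Set is cached as one reusable unit keyed by the Set's contents.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/**
 * Hypothetical model of the caching choices: per-id doc sets are cached only
 * when they match at least minDf docs (in the spirit of facet.enum.cache.minDf),
 * and the final result for a whole Set is cached as one reusable unit.
 * Assumes one fixed index snapshot; a new snapshot would need fresh caches.
 */
public class ExclusionFilterCache {
    private final int minDf;                           // hypothetical per-id caching threshold
    private final Function<String, BitSet> lookupId;   // stand-in for "ask the index which docs match this id"
    private final Map<String, BitSet> perIdCache = new HashMap<>();
    private final Map<Set<String>, BitSet> unitCache = new HashMap<>();
    private int uncachedLookups = 0;                   // how often lookupId actually ran

    public ExclusionFilterCache(int minDf, Function<String, BitSet> lookupId) {
        this.minDf = minDf;
        this.lookupId = lookupId;
    }

    public int getUncachedLookups() {
        return uncachedLookups;
    }

    private BitSet docsForId(String id) {
        BitSet cached = perIdCache.get(id);
        if (cached != null) {
            return cached;
        }
        uncachedLookups++;
        BitSet docs = lookupId.apply(id);
        if (docs.cardinality() >= minDf) {
            perIdCache.put(id, docs);                  // big enough to be worth caching on its own
        }
        return docs;
    }

    /** Returns the docs in [0, maxDoc) that match none of the given ids. */
    public BitSet accepted(Set<String> ids, int maxDoc) {
        Set<String> key = Set.copyOf(ids);             // immutable key: later changes to ids can't corrupt the cache
        BitSet cached = unitCache.get(key);
        if (cached == null) {
            BitSet excluded = new BitSet(maxDoc);
            for (String id : key) {
                excluded.or(docsForId(id));            // union of every id's matching docs
            }
            cached = new BitSet(maxDoc);
            cached.set(0, maxDoc);                     // start from "all docs"...
            cached.andNot(excluded);                   // ...and remove every excluded one
            unitCache.put(key, cached);
        }
        return (BitSet) cached.clone();                // callers get a copy, so the cached unit stays intact
    }
}
```

Whether either cache pays off depends on the reuse questions above: if every request arrives with a different Set, the unit cache only consumes memory, and an unbounded HashMap like this one would need an eviction policy (e.g. LRU) in practice.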


-Hoss


Best approach to Intersect results with big Set<String>?

2011-09-01 Thread Ryan McKinley
I have an application where I need to return all results that are not
in a Set<String> (the Set is managed from Hazelcast, but that is
not relevant).

As a first approach, I have a SearchComponent that injects a BooleanQuery:

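  // Exclude every document whose uniqueKey matches one of the ids.
  // Each id adds one clause, so this is capped by
  // BooleanQuery.getMaxClauseCount() (1024 by default).
  BooleanQuery bq = new BooleanQuery(true); // true = disable coord
  for (String id : ids) {
    // Term(field, text): the first argument is the field name, assumed
    // here to be "id", not the id value itself
    bq.add(new BooleanClause(new TermQuery(new Term("id", id)), Occur.MUST_NOT));
  }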

This works, but I'm concerned about how many terms we could end up
with as the size grows.

Another possibility could be a Filter that iterates through FieldCache
and checks if each value is in the Set<String>.
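The FieldCache idea can be sketched in plain Java, assuming the uniqueKey values are available as a String[] indexed by internal doc id (roughly what Lucene 3.x's FieldCache.DEFAULT.getStrings(reader, field) returns per segment reader; treat that mapping as an assumption and check it against your Lucene version). ExcludeSetFilter and docsNotIn are hypothetical names, and no Lucene classes are used.

```java
import java.util.BitSet;
import java.util.Set;

/**
 * Hypothetical model of a FieldCache-backed exclusion filter. The String[]
 * stands in for the per-document uniqueKey values (index = internal doc id).
 */
public class ExcludeSetFilter {

    /** Returns the docs whose key is NOT in excluded; docs with no key (null) are kept. */
    public static BitSet docsNotIn(String[] keyByDoc, Set<String> excluded) {
        BitSet accepted = new BitSet(keyByDoc.length);
        for (int doc = 0; doc < keyByDoc.length; doc++) {
            String key = keyByDoc[doc];
            // One hash lookup per document, however large the Set is,
            // so there is no clause limit to run into as the Set grows.
            if (key == null || !excluded.contains(key)) {
                accepted.set(doc);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        String[] keyByDoc = {"a", "b", "c", "d", "e"};
        Set<String> excluded = Set.of("b", "d");
        System.out.println(docsNotIn(keyByDoc, excluded)); // prints {0, 2, 4}
    }
}
```

The cost is one hash lookup per document on every request, plus the memory for the cached array, but it no longer grows with the number of ids or hits the BooleanQuery clause limit; the BooleanQuery approach is roughly the reverse, with cost that grows with the Set's size.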

Any thoughts/directions on things to look at?

thanks
ryan