Thanks Erick. If I did not know up front which token might be in the index, is there an efficient way to get the field for a specific document and do some custom processing on it?
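[Editorial note: one hedged sketch of per-document access to an *indexed* field without loading stored documents, using the Lucene 3.x FieldCache. The field name "acl" is illustrative, and this assumes the field is single-valued and not tokenized into multiple terms; multi-valued fields would need TermDocs-style iteration instead.]

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

// Sketch: read the indexed value of a field for one document without
// touching stored fields. FieldCache un-inverts the field once per
// reader, so each subsequent per-document lookup is just an array read.
public class IndexedFieldAccess {

    // Assumes "acl" is indexed, single-valued, and untokenized.
    public static String indexedValue(IndexReader reader, int docId)
            throws IOException {
        String[] values = FieldCache.DEFAULT.getStrings(reader, "acl");
        return values[docId]; // null if the document has no value
    }
}
```

The first call per reader pays the un-inversion cost; after that the lookup avoids the expensive stored-document fetch entirely.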
On Mon, Aug 29, 2011 at 8:34 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> Start here, I think:
>
> http://lucene.apache.org/java/3_0_2/api/core/index.html?org/apache/lucene/index/TermDocs.html
>
> Best
> Erick
>
> On Mon, Aug 29, 2011 at 8:24 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>> Thanks for the reply. The fields I want are indexed, but how would I
>> go directly at the fields I want?
>>
>> In regards to indexing the auth tokens, I've thought about this and am
>> trying to get confirmation that it is reasonable given our constraints.
>>
>> On Mon, Aug 29, 2011 at 8:20 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>>> Yeah, loading the document inside a Collector is a
>>> definite no-no. Have you tried going directly
>>> at the fields you want (assuming they're indexed)?
>>> That *should* be much faster, but whether it'll be
>>> fast enough is a good question. I'm thinking of some
>>> of the Terms methods here. You *might* get some joy
>>> out of making sure lazy field loading is enabled
>>> (and making sure the fields you're accessing for
>>> your logic are indexed), but I'm not entirely sure
>>> about that bit.
>>>
>>> This kind of problem is sometimes handled
>>> by indexing "auth tokens" with the documents
>>> and including an OR clause on the query
>>> with the authorizations for a particular
>>> user, but that works best if there is an upper
>>> limit (in the hundreds) on the tokens a user can
>>> possibly have; often this works best with some kind
>>> of grouping. Making this work when a user can
>>> have tens of thousands of auth tokens is... er...
>>> contra-indicated.
>>>
>>> Hope this helps a bit...
>>> Erick
>>>
>>> On Sun, Aug 28, 2011 at 11:59 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>> Just a bit more information.
>>>> Inside my class, which extends
>>>> FilteredDocIdSet, all of the time seems to be spent
>>>> retrieving the document from the readerCtx, via
>>>>
>>>> Document doc = readerCtx.reader.document(docid);
>>>>
>>>> If I comment this out and just return true, things fly along as I
>>>> expect. My query is returning a total of 2 million documents.
>>>>
>>>> On Sun, Aug 28, 2011 at 11:39 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>> I have a need to post-process Solr results based on some access
>>>>> controls which are set up outside of Solr. Currently we've written
>>>>> something that extends SearchComponent, and in the prepare method I'm
>>>>> doing something like this:
>>>>>
>>>>> QueryWrapperFilter qwf = new QueryWrapperFilter(rb.getQuery());
>>>>> Filter filter = new CustomFilter(qwf);
>>>>> FilteredQuery fq = new FilteredQuery(rb.getQuery(), filter);
>>>>> rb.setQuery(fq);
>>>>>
>>>>> Inside my CustomFilter I have a FilteredDocIdSet which checks whether
>>>>> each document should be returned. This works as I expect, but for some
>>>>> reason is very, very slow. Even if I take out all of the machinery
>>>>> that does any logic with the document and only return true in the
>>>>> FilteredDocIdSet's match method, the query still takes an inordinate
>>>>> amount of time compared to not including this custom filter. So my
>>>>> question: is this the most appropriate way of handling this? What
>>>>> performance should be expected from such a setup? Any
>>>>> information/pointers would be greatly appreciated.
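[Editorial note: a hedged sketch of the CustomFilter described above, reworked so that match() never calls reader.document(docid), which the thread identifies as the bottleneck. It uses the Lucene 3.x Filter API (getDocIdSet taking an IndexReader); the field name "acl" and the allowed-token set are illustrative, and the field is assumed indexed, single-valued, and untokenized.]

```java
import java.io.IOException;
import java.util.Set;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.FilteredDocIdSet;

// Sketch of an access-control filter that checks an indexed "acl"
// field via the FieldCache instead of loading stored documents.
public class AclFilter extends Filter {

    private final Filter delegate;          // e.g. a QueryWrapperFilter
    private final Set<String> allowedTokens; // the user's auth tokens

    public AclFilter(Filter delegate, Set<String> allowedTokens) {
        this.delegate = delegate;
        this.allowedTokens = allowedTokens;
    }

    @Override
    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
        // Un-inverted once per reader; match() then costs one array
        // read per candidate instead of a stored-field fetch.
        final String[] acl = FieldCache.DEFAULT.getStrings(reader, "acl");
        return new FilteredDocIdSet(delegate.getDocIdSet(reader)) {
            @Override
            protected boolean match(int docid) {
                return acl[docid] != null && allowedTokens.contains(acl[docid]);
            }
        };
    }
}
```

With millions of candidate documents, replacing the per-document stored-field load with an array lookup is the difference the thread is chasing; the remaining cost is the one-time FieldCache population per reader.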