Yeah, loading the document inside a Collector (or inside your FilteredDocIdSet's match method, which runs once for every candidate hit) is a definite no-no. Have you tried going directly at the fields you want (assuming they're indexed)? That *should* be much faster, but whether it'll be fast enough is a good question. I'm thinking of some of the Terms methods here, i.e. reading the indexed terms and their postings rather than the stored document. You *might* get some joy out of making sure lazy field loading is enabled (enableLazyFieldLoading in solrconfig.xml), and making sure the fields you're accessing for your logic are indexed, but I'm not entirely sure about that bit.
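To make "going directly at the fields" a bit more concrete, here's a plain-Java model with no Lucene dependency, so none of this is Lucene's API. A map from term to docids stands in for one segment's postings for a hypothetical single-valued `owner` field. The postings get uninverted once into a docid-indexed array (roughly the idea behind Lucene's FieldCache for single-valued, untokenized fields), after which the per-document check is an array lookup instead of a stored-document load. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Plain-Java model, NOT Lucene's API. "owner" is a hypothetical
 * single-valued, indexed field; postings are simulated with a Map.
 */
public class UninvertedAccessCheck {

    /** Value of the owner field for each docid; null if the doc has none. */
    private final String[] ownerByDoc;

    /**
     * @param postings term -> docids containing that term (simulated
     *                 inverted index for one field of one segment)
     * @param maxDoc   number of documents in the segment
     */
    public UninvertedAccessCheck(Map<String, int[]> postings, int maxDoc) {
        ownerByDoc = new String[maxDoc];
        // Visit every term and every posting once. This cost is paid once
        // per segment, not once per hit as with loading stored documents.
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                ownerByDoc[doc] = e.getKey();
            }
        }
    }

    /** The cheap per-document check a filter's match(docid) could call. */
    public boolean accept(int docid, Set<String> allowedOwners) {
        String owner = ownerByDoc[docid];
        return owner != null && allowedOwners.contains(owner);
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("alice", new int[] {0, 2});
        postings.put("bob", new int[] {1});
        UninvertedAccessCheck check = new UninvertedAccessCheck(postings, 4);

        Set<String> allowed = new HashSet<>();
        allowed.add("alice");
        for (int doc = 0; doc < 4; doc++) {
            System.out.println(doc + " -> " + check.accept(doc, allowed));
        }
    }
}
```

The trade-off is memory (one array slot per document) in exchange for not touching stored fields at query time. Because it keeps only one term per document, this model only fits a single-valued field; a multi-valued field would need a different structure.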
This kind of problem is sometimes handled by indexing "auth tokens" with the documents and including an OR clause on the query with the authorizations for a particular user. That works best when there is an upper limit (in the 100s) on the number of tokens a user can possibly have, and it often works best with some kind of grouping. Making this work when a user can have tens of thousands of auth tokens is... er... contra-indicated...

Hope this helps a bit...
Erick

On Sun, Aug 28, 2011 at 11:59 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> Just a bit more information. Inside my class which extends
> FilteredDocIdSet all of the time seems to be getting spent in
> retrieving the document from the readerCtx, doing this
>
>     Document doc = readerCtx.reader.document(docid);
>
> If I comment out this and just return true things fly along as I
> expect. My query is returning a total of 2 million documents also.
>
> On Sun, Aug 28, 2011 at 11:39 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>> I have a need to post process Solr results based on some access
>> controls which are setup outside of Solr, currently we've written
>> something that extends SearchComponent and in the prepare method I'm
>> doing something like this
>>
>>     QueryWrapperFilter qwf = new QueryWrapperFilter(rb.getQuery());
>>     Filter filter = new CustomFilter(qwf);
>>     FilteredQuery fq = new FilteredQuery(rb.getQuery(), filter);
>>     rb.setQuery(fq);
>>
>> Inside my CustomFilter I have a FilteredDocIdSet which checks if the
>> document should be returned. This works as I expect but for some
>> reason is very very slow. Even if I take out any of the machinery
>> which does any logic with the document and only return true in the
>> FilteredDocIdSets match method the query still takes an inordinate
>> amount of time as compared to not including this custom filter. So my
>> question, is this the most appropriate way of handling this? What
>> should the performance out of such a setup be expected to be? Any
>> information/pointers would be greatly appreciated.
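Going back to the auth-token suggestion at the top of this reply, here's a minimal sketch of building that OR clause as a Solr filter query string. It assumes each document carries its tokens in a hypothetical multi-valued string field named `auth_tokens`; the class and method names are likewise made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the auth-token approach: documents are indexed with the tokens
 * allowed to see them (hypothetical multi-valued string field), and a
 * user's tokens are ORed together into one filter query.
 */
public class AuthTokenFilterQuery {

    /** Quote a token so query-syntax characters inside it are taken literally. */
    static String quote(String token) {
        return "\"" + token.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Builds e.g. auth_tokens:("grp_sales" OR "user_42").
     * Each token becomes one boolean clause, so the number of tokens is
     * capped by Solr's maxBooleanClauses setting (1024 by default).
     */
    static String buildFq(String field, List<String> userTokens) {
        if (userTokens.isEmpty()) {
            // A user with no tokens should see nothing; let the caller decide how.
            throw new IllegalArgumentException("user has no auth tokens");
        }
        List<String> quoted = new ArrayList<>();
        for (String t : userTokens) {
            quoted.add(quote(t));
        }
        return field + ":(" + String.join(" OR ", quoted) + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildFq("auth_tokens", List.of("grp_sales", "user_42")));
    }
}
```

Sending the result as an `fq` parameter (rather than ANDing it into `q`) lets Solr cache the matching document set in its filterCache, which pays off when the same user searches repeatedly. The clause cap is also why this only works well when users have hundreds of tokens rather than tens of thousands.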