Yeah, loading the document inside a Collector is a
definite no-no: that code runs once per hit, and with
2 million hits that's 2 million stored-document reads.
Have you tried going directly at the fields you want
(assuming they're indexed)? That *should* be much
faster, but whether it'll be fast enough is a good
question. I'm thinking of some of the Terms methods
here. You *might* also get some joy out of making
sure lazy field loading is enabled, but I'm not
entirely sure about that bit.
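
To make the "go at the fields directly" idea concrete, here is a
minimal plain-Java sketch of the shape of it. It is a model, not
Lucene code: the per-document value comes from an array filled once
up front, so the per-hit check never touches stored documents. The
class name AclColumnMatcher, the "owner" values and the allowed set
are all made up for illustration, and how you would actually fill
the array from the index depends on your Lucene version.

```java
import java.util.Set;

// Plain-Java model of a per-hit access check. No Lucene classes are used;
// AclColumnMatcher, ownerByDocId and allowedOwners are hypothetical names.
final class AclColumnMatcher {
    // One value per docid, loaded once before the search runs. This stands in
    // for reading an indexed field's values into memory.
    private final String[] ownerByDocId;
    private final Set<String> allowedOwners;

    AclColumnMatcher(String[] ownerByDocId, Set<String> allowedOwners) {
        this.ownerByDocId = ownerByDocId;
        this.allowedOwners = allowedOwners;
    }

    // Called once per candidate hit: an array read plus a hash lookup,
    // with no per-document I/O.
    boolean match(int docId) {
        String owner = ownerByDocId[docId];
        return owner != null && allowedOwners.contains(owner);
    }

    // Counts how many docids in the array pass the check.
    int countMatches() {
        int count = 0;
        for (int docId = 0; docId < ownerByDocId.length; docId++) {
            if (match(docId)) {
                count++;
            }
        }
        return count;
    }
}
```

The point is only that match() does constant-time in-memory work, so
the cost of building the array is paid once instead of per hit.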

This kind of problem is sometimes handled
by indexing "auth tokens" with the documents
and including an OR clause on the query
with the authorizations for a particular
user. That works best if there is an upper
limit (in the 100s) on the number of tokens a
user can possibly have, and it is often paired
with some kind of grouping. Making this work when
a user can have tens of thousands of auth tokens
is...er... contra-indicated...
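
A rough sketch of the auth-token approach, assuming the tokens are
indexed in a field I'm calling "acl" here (a made-up name), and
assuming you build the clause as a query string. AuthTokenFilter and
the MAX_TOKENS cap of 500 are likewise just illustrative choices,
not anything Solr defines.

```java
import java.util.List;

// Builds an OR clause over a user's auth tokens, e.g.
//   acl:("group-eng" OR "group-ops")
// AuthTokenFilter, the field name and MAX_TOKENS are hypothetical.
final class AuthTokenFilter {
    // Arbitrary illustrative cap, in the spirit of "in the 100s".
    static final int MAX_TOKENS = 500;

    static String buildFilterQuery(String field, List<String> tokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("user has no auth tokens");
        }
        if (tokens.size() > MAX_TOKENS) {
            throw new IllegalArgumentException(
                "too many auth tokens: " + tokens.size());
        }
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            // Quote each token so it is treated as a single phrase.
            sb.append('"').append(escape(tokens.get(i))).append('"');
        }
        return sb.append(')').toString();
    }

    // Escapes backslashes and double quotes inside a quoted token.
    private static String escape(String token) {
        return token.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The resulting string could be ANDed onto the user's query or sent as
a separate filter query. Either way the access check becomes ordinary
term matching rather than per-document logic.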

Hope this helps a bit...
Erick

On Sun, Aug 28, 2011 at 11:59 PM, Jamie Johnson <jej2...@gmail.com> wrote:
> Just a bit more information.  Inside my class which extends
> FilteredDocIdSet all of the time seems to be getting spent in
> retrieving the document from the readerCtx, doing this
>
> Document doc = readerCtx.reader.document(docid);
>
> If I comment out this and just return true things fly along as I
> expect.  My query is returning a total of 2 million documents also.
>
> On Sun, Aug 28, 2011 at 11:39 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>> I have a need to post process Solr results based on some access
>> controls which are set up outside of Solr. Currently we've written
>> something that extends SearchComponent, and in the prepare method I'm
>> doing something like this:
>>
>>     QueryWrapperFilter qwf = new QueryWrapperFilter(rb.getQuery());
>>     Filter filter = new CustomFilter(qwf);
>>     FilteredQuery fq = new FilteredQuery(rb.getQuery(), filter);
>>     rb.setQuery(fq);
>>
>> Inside my CustomFilter I have a FilteredDocIdSet which checks if the
>> document should be returned.  This works as I expect but for some
>> reason is very very slow.  Even if I take out any of the machinery
>> which does any logic with the document and only return true in the
>> FilteredDocIdSet's match method the query still takes an inordinate
>> amount of time as compared to not including this custom filter.  So my
>> question, is this the most appropriate way of handling this?  What
>> should the performance out of such a setup be expected to be?  Any
>> information/pointers would be greatly appreciated.
>>
>
