Sounds like you're looking for https://issues.apache.org/jira/browse/SOLR-2429 which has been committed to trunk and also the 3_x branch (after the release of 3.3).
	Erik

On Aug 29, 2011, at 11:46 , Jamie Johnson wrote:

> Thanks guys, perhaps I am just going about this the wrong way. So let
> me explain my problem and perhaps there is a more appropriate
> solution. What I need to do is basically hide certain results based
> on some passed in user parameter (say their service tier for
> instance). What I'd like to do is have some way to plug in my custom
> logic to basically remove certain documents from the result set using
> this information. Now that being said I technically don't need to
> remove the documents from the full result set, I really only need to
> remove them from the current page (but still ensuring that a page is
> filled and sorted). At present I'm trying to see if there is a way
> for me to add this type of logic after the QueryComponent has
> executed, perhaps by going through the DocIdandSet at this point and
> then intersecting the DocIdSet with a DocIdSet which would filter out
> the stuff I don't want seen. Does this sound reasonable or like a
> fool's errand?
>
>
>
> On Mon, Aug 29, 2011 at 10:51 AM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>> I haven't followed the details, but what I'm guessing you want here is
>> Lucene's FieldCache. Perhaps something along the lines of how faceting uses
>> it (in SimpleFacets.java) -
>>
>>     FieldCache.DocTermsIndex si =
>>         FieldCache.DEFAULT.getTermsIndex(searcher.getIndexReader(), fieldName);
>>
>>     Erik
>>
>> On Aug 29, 2011, at 09:58 , Erick Erickson wrote:
>>
>>> If you're asking whether there's a way to find, say,
>>> all the values for the "auth" field associated with
>>> a document... no. The nature of an inverted
>>> index makes this hard (think of finding all
>>> the definitions in a dictionary where the word
>>> "earth" was in the definition).
>>>
>>> Best
>>> Erick
>>>
>>> On Mon, Aug 29, 2011 at 9:21 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>> Thanks Erick, if I did not know the token up front that could be in
>>>> the index is there not an efficient way to get the field for a
>>>> specific document and do some custom processing on it?
>>>>
>>>> On Mon, Aug 29, 2011 at 8:34 AM, Erick Erickson <erickerick...@gmail.com>
>>>> wrote:
>>>>> Start here I think:
>>>>>
>>>>> http://lucene.apache.org/java/3_0_2/api/core/index.html?org/apache/lucene/index/TermDocs.html
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>> On Mon, Aug 29, 2011 at 8:24 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>> Thanks for the reply. The fields I want are indexed, but how would I
>>>>>> go directly at the fields I wanted?
>>>>>>
>>>>>> In regards to indexing the auth tokens I've thought about this and am
>>>>>> trying to get confirmation if that is reasonable given our
>>>>>> constraints.
>>>>>>
>>>>>> On Mon, Aug 29, 2011 at 8:20 AM, Erick Erickson
>>>>>> <erickerick...@gmail.com> wrote:
>>>>>>> Yeah, loading the document inside a Collector is a
>>>>>>> definite no-no. Have you tried going directly
>>>>>>> at the fields you want (assuming they're
>>>>>>> indexed)? That *should* be much faster, but
>>>>>>> whether it'll be fast enough is a good question. I'm
>>>>>>> thinking some of the Terms methods here. You
>>>>>>> *might* get some joy out of making sure lazy
>>>>>>> field loading is enabled (and make sure the
>>>>>>> fields you're accessing for your logic are
>>>>>>> indexed), but I'm not entirely sure about
>>>>>>> that bit.
>>>>>>>
>>>>>>> This kind of problem is sometimes handled
>>>>>>> by indexing "auth tokens" with the documents
>>>>>>> and including an OR clause on the query
>>>>>>> with the authorizations for a particular
>>>>>>> user, but that works best if there is an upper
>>>>>>> limit (in the 100s) of tokens that a user can possibly
>>>>>>> have; often this works best with some kind of
>>>>>>> grouping. Making this work when a user can
>>>>>>> have tens of thousands of auth tokens is...er...
>>>>>>> contra-indicated...
>>>>>>>
>>>>>>> Hope this helps a bit...
>>>>>>> Erick
>>>>>>>
>>>>>>> On Sun, Aug 28, 2011 at 11:59 PM, Jamie Johnson <jej2...@gmail.com>
>>>>>>> wrote:
>>>>>>>> Just a bit more information. Inside my class which extends
>>>>>>>> FilteredDocIdSet all of the time seems to be getting spent in
>>>>>>>> retrieving the document from the readerCtx, doing this
>>>>>>>>
>>>>>>>>     Document doc = readerCtx.reader.document(docid);
>>>>>>>>
>>>>>>>> If I comment out this and just return true things fly along as I
>>>>>>>> expect. My query is returning a total of 2 million documents also.
>>>>>>>>
>>>>>>>> On Sun, Aug 28, 2011 at 11:39 AM, Jamie Johnson <jej2...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> I have a need to post process Solr results based on some access
>>>>>>>>> controls which are set up outside of Solr, currently we've written
>>>>>>>>> something that extends SearchComponent and in the prepare method I'm
>>>>>>>>> doing something like this
>>>>>>>>>
>>>>>>>>>     QueryWrapperFilter qwf = new QueryWrapperFilter(rb.getQuery());
>>>>>>>>>     Filter filter = new CustomFilter(qwf);
>>>>>>>>>     FilteredQuery fq = new FilteredQuery(rb.getQuery(), filter);
>>>>>>>>>     rb.setQuery(fq);
>>>>>>>>>
>>>>>>>>> Inside my CustomFilter I have a FilteredDocIdSet which checks if the
>>>>>>>>> document should be returned. This works as I expect but for some
>>>>>>>>> reason is very very slow. Even if I take out any of the machinery
>>>>>>>>> which does any logic with the document and only return true in the
>>>>>>>>> FilteredDocIdSet's match method the query still takes an inordinate
>>>>>>>>> amount of time as compared to not including this custom filter. So my
>>>>>>>>> question, is this the most appropriate way of handling this? What
>>>>>>>>> should the performance out of such a setup be expected to be? Any
>>>>>>>>> information/pointers would be greatly appreciated.
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>>
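
For anyone reading the thread later, here is a rough sketch of the "auth tokens" approach Erick Erickson describes above: index the tokens with each document, then restrict every search with an OR clause over the tokens the current user holds. It only builds the filter string in plain Java and does not call Solr or Lucene. The class AuthFilterQuery, its methods, the maxTokens guard and the list of escaped characters are all assumptions made for illustration; only the "auth" field name comes from the thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr or Lucene.
class AuthFilterQuery {

    // Assumption: characters the query parser treats as special.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Backslash-escape special characters and whitespace in a single token.
    static String escape(String token) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build "field:(t1 OR t2 ...)" from the tokens a user holds.
    // maxTokens is a hypothetical guard reflecting the advice that this
    // works best when a user has at most a few hundred tokens.
    static String buildAuthFilter(String field, List<String> tokens, int maxTokens) {
        if (tokens.isEmpty()) {
            throw new IllegalArgumentException("user has no auth tokens");
        }
        if (tokens.size() > maxTokens) {
            throw new IllegalArgumentException("too many auth tokens: " + tokens.size());
        }
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append(escape(tokens.get(i)));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        List<String> userTokens = Arrays.asList("tier1", "gold+plus");
        // prints auth:(tier1 OR gold\+plus)
        System.out.println(buildAuthFilter("auth", userTokens, 500));
    }
}
```

The resulting string would typically be sent as an fq parameter alongside the user's query, so the restriction is applied through the index rather than by loading each stored document inside a filter or collector.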