I haven't followed the details, but what I'm guessing you want here is Lucene's FieldCache. Perhaps something along the lines of how faceting uses it (in SimpleFacets.java) -

    FieldCache.DocTermsIndex si = FieldCache.DEFAULT.getTermsIndex(searcher.getIndexReader(), fieldName);

	Erik

On Aug 29, 2011, at 09:58 , Erick Erickson wrote:

> If you're asking whether there's a way to find, say,
> all the values for the "auth" field associated with
> a document... no. The nature of an inverted
> index makes this hard (think of finding all
> the definitions in a dictionary where the word
> "earth" was in the definition).
>
> Best
> Erick
>
> On Mon, Aug 29, 2011 at 9:21 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>> Thanks Erick, if I did not know the token up front that could be in
>> the index is there not an efficient way to get the field for a
>> specific document and do some custom processing on it?
>>
>> On Mon, Aug 29, 2011 at 8:34 AM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>> Start here I think:
>>>
>>> http://lucene.apache.org/java/3_0_2/api/core/index.html?org/apache/lucene/index/TermDocs.html
>>>
>>> Best
>>> Erick
>>>
>>> On Mon, Aug 29, 2011 at 8:24 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>> Thanks for the reply. The fields I want are indexed, but how would I
>>>> go directly at the fields I wanted?
>>>>
>>>> In regards to indexing the auth tokens I've thought about this and am
>>>> trying to get confirmation if that is reasonable given our
>>>> constraints.
>>>>
>>>> On Mon, Aug 29, 2011 at 8:20 AM, Erick Erickson <erickerick...@gmail.com>
>>>> wrote:
>>>>> Yeah, loading the document inside a Collector is a
>>>>> definite no-no. Have you tried going directly
>>>>> at the fields you want (assuming they're
>>>>> indexed)? That *should* be much faster, but
>>>>> whether it'll be fast enough is a good question. I'm
>>>>> thinking some of the Terms methods here. You
>>>>> *might* get some joy out of making sure lazy
>>>>> field loading is enabled (and make sure the
>>>>> fields you're accessing for your logic are
>>>>> indexed), but I'm not entirely sure about
>>>>> that bit.
>>>>>
>>>>> This kind of problem is sometimes handled
>>>>> by indexing "auth tokens" with the documents
>>>>> and including an OR clause on the query
>>>>> with the authorizations for a particular
>>>>> user, but that works best if there is an upper
>>>>> limit (in the 100s) of tokens that a user can possibly
>>>>> have, often this works best with some kind of
>>>>> grouping. Making this work when a user can
>>>>> have tens of thousands of auth tokens is...er...
>>>>> contra-indicated...
>>>>>
>>>>> Hope this helps a bit...
>>>>> Erick
>>>>>
>>>>> On Sun, Aug 28, 2011 at 11:59 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>> Just a bit more information. Inside my class which extends
>>>>>> FilteredDocIdSet all of the time seems to be getting spent in
>>>>>> retrieving the document from the readerCtx, doing this
>>>>>>
>>>>>>     Document doc = readerCtx.reader.document(docid);
>>>>>>
>>>>>> If I comment out this and just return true things fly along as I
>>>>>> expect. My query is returning a total of 2 million documents also.
>>>>>>
>>>>>> On Sun, Aug 28, 2011 at 11:39 AM, Jamie Johnson <jej2...@gmail.com>
>>>>>> wrote:
>>>>>>> I have a need to post process Solr results based on some access
>>>>>>> controls which are setup outside of Solr, currently we've written
>>>>>>> something that extends SearchComponent and in the prepare method I'm
>>>>>>> doing something like this
>>>>>>>
>>>>>>>     QueryWrapperFilter qwf = new QueryWrapperFilter(rb.getQuery());
>>>>>>>     Filter filter = new CustomFilter(qwf);
>>>>>>>     FilteredQuery fq = new FilteredQuery(rb.getQuery(), filter);
>>>>>>>     rb.setQuery(fq);
>>>>>>>
>>>>>>> Inside my CustomFilter I have a FilteredDocIdSet which checks if the
>>>>>>> document should be returned. This works as I expect but for some
>>>>>>> reason is very very slow. Even if I take out any of the machinery
>>>>>>> which does any logic with the document and only return true in the
>>>>>>> FilteredDocIdSets match method the query still takes an inordinate
>>>>>>> amount of time as compared to not including this custom filter. So my
>>>>>>> question, is this the most appropriate way of handling this? What
>>>>>>> should the performance out of such a setup be expected to be? Any
>>>>>>> information/pointers would be greatly appreciated.
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
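
To make the FieldCache idea from the top of this message a bit more concrete, here is a rough, plain-Java sketch of what "un-inverting" a single-valued field buys you inside a per-document filter. It is only an illustration of the concept and deliberately uses no Lucene classes: UninvertedField, value() and matching() are made-up names, and the real FieldCache API differs between Lucene versions. The point is that the per-document check becomes an array lookup rather than a stored-document load like reader.document(docid).

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Set;

/** Hypothetical stand-in for an un-inverted, single-valued field (NOT a Lucene class). */
final class UninvertedField {
    private final String[] terms; // sorted unique values seen in the field
    private final int[] ords;     // ords[docId] = index into terms, or -1 if the doc has no value

    /** Built once up front; valuePerDoc[docId] is that document's value or null. */
    UninvertedField(String[] valuePerDoc) {
        terms = Arrays.stream(valuePerDoc)
                .filter(v -> v != null)
                .distinct()
                .sorted()
                .toArray(String[]::new);
        ords = new int[valuePerDoc.length];
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            String v = valuePerDoc[doc];
            ords[doc] = (v == null) ? -1 : Arrays.binarySearch(terms, v);
        }
    }

    /** Value for one document, or null; an array lookup, no stored-document load. */
    String value(int docId) {
        int ord = ords[docId];
        return ord < 0 ? null : terms[ord];
    }

    /** Marks every document whose value is in the allowed set. */
    BitSet matching(Set<String> allowed) {
        // Resolve each distinct term once, then test documents by ordinal only.
        boolean[] ok = new boolean[terms.length];
        for (int i = 0; i < terms.length; i++) {
            ok[i] = allowed.contains(terms[i]);
        }
        BitSet hits = new BitSet(ords.length);
        for (int doc = 0; doc < ords.length; doc++) {
            if (ords[doc] >= 0 && ok[ords[doc]]) {
                hits.set(doc);
            }
        }
        return hits;
    }
}
```

In a filter's match method you would do the equivalent of value(docid) or an ordinal check against the cached structure. Whether that is fast enough for 2 million hits is something you would have to measure, and note that this sketch assumes at most one value per document.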