Re: Post Processing Solr Results

Erik Hatcher Mon, 29 Aug 2011 07:52:09 -0700

I haven't followed the details, but I'm guessing what you want here is Lucene's
FieldCache, perhaps along the lines of how faceting uses it (in
SimpleFacets.java):


   FieldCache.DocTermsIndex si =
       FieldCache.DEFAULT.getTermsIndex(searcher.getIndexReader(), fieldName);

        Erik
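
A FieldCache terms index of the kind Erik mentions is built by un-inverting a field once per reader: the field's unique terms are kept in sorted order and each document gets an ordinal pointing into them, so reading a document's value becomes an array lookup instead of a stored-document load. The plain-Java sketch below models only that idea, under simplifying assumptions (a single-valued field, postings supplied as a map from term to doc IDs, ordinal -1 meaning "no value"). SimpleTermsIndex and its methods are hypothetical names, not Lucene's DocTermsIndex API.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeSet;

/**
 * Hypothetical, simplified model of a FieldCache-style terms index for a
 * single-valued field: unique terms sorted once, plus one ordinal per document.
 */
final class SimpleTermsIndex {
    private final String[] sortedTerms; // unique terms in sorted order
    private final int[] docToOrd;       // index into sortedTerms, or -1 if the doc has no value

    /** Builds the index by "un-inverting" postings (term -> doc IDs) once. */
    SimpleTermsIndex(Map<String, int[]> postings, int maxDoc) {
        sortedTerms = new TreeSet<>(postings.keySet()).toArray(new String[0]);
        docToOrd = new int[maxDoc];
        Arrays.fill(docToOrd, -1);
        for (int ord = 0; ord < sortedTerms.length; ord++) {
            for (int doc : postings.get(sortedTerms[ord])) {
                // Assumes a single-valued field; if a doc had several terms,
                // the last one in sorted order would win here.
                docToOrd[doc] = ord;
            }
        }
    }

    /** Constant-time per-document lookup; no stored document is loaded. */
    String valueFor(int docId) {
        int ord = docToOrd[docId];
        return ord < 0 ? null : sortedTerms[ord];
    }

    /** Number of unique terms held by the index. */
    int numTerms() {
        return sortedTerms.length;
    }
}
```

Lucene caches the un-inverted structure per reader, so the build cost is paid once rather than per query; the trade-off is memory that grows roughly with the number of documents plus the number of unique terms.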

On Aug 29, 2011, at 09:58, Erick Erickson wrote:

> If you're asking whether there's a way to find, say,
> all the values for the "auth" field associated with
> a document... no. The nature of an inverted
> index makes this hard (think of finding all
> the definitions in a dictionary where the word
> "earth" was in the definition).
> 
> Best
> Erick
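
Erick's dictionary analogy can be made concrete with a toy inverted index: going from a term to its documents is a single map lookup, but going from a document to its terms means visiting the postings of every term in the field. The class below is a hypothetical plain-Java illustration, not Lucene code, and assumes doc IDs are added in increasing order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy inverted index: term -> ascending list of doc IDs. */
final class ToyInvertedIndex {
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    /** Records that docId contains term; doc IDs are assumed to arrive in increasing order. */
    void add(String term, int docId) {
        List<Integer> docs = postings.computeIfAbsent(term, t -> new ArrayList<>());
        if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
            docs.add(docId);
        }
    }

    /** The cheap direction: term -> documents is one map lookup. */
    List<Integer> docsFor(String term) {
        return postings.getOrDefault(term, List.of());
    }

    /** The expensive direction: document -> terms must visit every term's postings. */
    List<String> termsFor(int docId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            if (e.getValue().contains(docId)) { // scans this term's postings
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

That full scan over all terms is essentially the work a FieldCache does once, up front, so that later per-document lookups are cheap.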
> 
> On Mon, Aug 29, 2011 at 9:21 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>> Thanks Erick. If I don't know up front which tokens could be in
>> the index, is there no efficient way to get the field for a
>> specific document and do some custom processing on it?
>> 
>> On Mon, Aug 29, 2011 at 8:34 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>>> Start here I think:
>>> 
>>> http://lucene.apache.org/java/3_0_2/api/core/index.html?org/apache/lucene/index/TermDocs.html
>>> 
>>> Best
>>> Erick
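
The TermDocs interface linked above is a cursor over one term's postings: in the 3.x API you obtain one with IndexReader.termDocs(Term), loop while next() returns true, read doc() at each step, and use skipTo(target) to jump ahead. Since that needs a real index, the sketch below models only the iteration pattern over a sorted int array; PostingsCursor is a hypothetical stand-in, not the Lucene class.

```java
import java.util.BitSet;

/**
 * Hypothetical stand-in for a TermDocs-style cursor over one term's postings
 * (sorted, distinct doc IDs). Mirrors the next()/doc()/skipTo() pattern only.
 */
final class PostingsCursor {
    private final int[] docs; // sorted doc IDs that contain the term
    private int pos = -1;

    PostingsCursor(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Advances to the next matching document; returns false when exhausted. */
    boolean next() {
        if (pos + 1 < docs.length) {
            pos++;
            return true;
        }
        return false;
    }

    /** Current document ID; only valid after next() or skipTo() returned true. */
    int doc() {
        return docs[pos];
    }

    /** Moves past the current entry to the first doc >= target; false if none remains. */
    boolean skipTo(int target) {
        while (next()) {
            if (doc() >= target) {
                return true;
            }
        }
        return false;
    }

    /** Typical use: mark every document containing the term in a bit set. */
    static BitSet collect(PostingsCursor cursor, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        while (cursor.next()) {
            bits.set(cursor.doc());
        }
        return bits;
    }
}
```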
>>> 
>>> On Mon, Aug 29, 2011 at 8:24 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>> Thanks for the reply. The fields I want are indexed, but how would I
>>>> go directly at the fields I want?
>>>> 
>>>> As for indexing the auth tokens, I've thought about this and am
>>>> trying to confirm whether that is reasonable given our
>>>> constraints.
>>>> 
>>>> On Mon, Aug 29, 2011 at 8:20 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>>> Yeah, loading the document inside a Collector is a
>>>>> definite no-no. Have you tried going directly
>>>>> at the fields you want (assuming they're
>>>>> indexed)? That *should* be much faster, but
>>>>> whether it'll be fast enough is a good question. I'm
>>>>> thinking of some of the Terms methods here. You
>>>>> *might* get some joy out of making sure lazy
>>>>> field loading is enabled (and making sure the
>>>>> fields you're accessing for your logic are
>>>>> indexed), but I'm not entirely sure about
>>>>> that bit.
>>>>> 
>>>>> This kind of problem is sometimes handled
>>>>> by indexing "auth tokens" with the documents
>>>>> and including an OR clause on the query
>>>>> with the authorizations for a particular
>>>>> user, but that works best if there is an upper
>>>>> limit (in the 100s) on the number of tokens a user
>>>>> can possibly have; often it works best with some
>>>>> kind of grouping. Making this work when a user can
>>>>> have tens of thousands of auth tokens is...er...
>>>>> contra-indicated...
>>>>> 
>>>>> Hope this helps a bit...
>>>>> Erick
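
The auth-token approach Erick describes can be modelled with bit sets: each document is indexed with the tokens allowed to see it, a user's visible set is the OR (union) of the postings for that user's tokens, and the final results are the query hits ANDed with that set. The sketch below is hypothetical (AuthTokenFilter is not a Solr or Lucene class) and assumes postings are already available as a map from token to doc IDs.

```java
import java.util.BitSet;
import java.util.Collection;
import java.util.Map;

/**
 * Hypothetical sketch of "index auth tokens, OR them into the query".
 * A user's visible documents are the union of the postings of that user's tokens.
 */
final class AuthTokenFilter {
    /** tokenPostings: auth token -> doc IDs indexed with that token. */
    static BitSet visibleDocs(Map<String, int[]> tokenPostings,
                              Collection<String> userTokens, int maxDoc) {
        BitSet visible = new BitSet(maxDoc);
        // One postings walk per user token: the cost grows with the number of
        // tokens, which is why a bounded token count per user matters.
        for (String token : userTokens) {
            int[] docs = tokenPostings.get(token);
            if (docs == null) {
                continue; // token does not occur in the index
            }
            for (int doc : docs) {
                visible.set(doc);
            }
        }
        return visible;
    }

    /** Results = query hits AND (token1 OR token2 OR ...); inputs are not modified. */
    static BitSet restrict(BitSet queryHits, BitSet visible) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(visible);
        return result;
    }
}
```

With a few hundred tokens per user this is a few hundred postings walks per request; with tens of thousands the work scales accordingly.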
>>>>> 
>>>>> On Sun, Aug 28, 2011 at 11:59 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>> Just a bit more information. Inside my class that extends
>>>>>> FilteredDocIdSet, all of the time seems to be spent
>>>>>> retrieving the document from the readerCtx, i.e. doing this:
>>>>>> 
>>>>>> Document doc = readerCtx.reader.document(docid);
>>>>>> 
>>>>>> If I comment this out and just return true, things fly along as I
>>>>>> expect. My query also returns a total of 2 million documents.
>>>>>> 
>>>>>> On Sun, Aug 28, 2011 at 11:39 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>>> I need to post-process Solr results based on some access
>>>>>>> controls that are set up outside of Solr. Currently we've written
>>>>>>> something that extends SearchComponent, and in the prepare method I'm
>>>>>>> doing something like this:
>>>>>>> 
>>>>>>>     QueryWrapperFilter qwf = new QueryWrapperFilter(rb.getQuery());
>>>>>>>     Filter filter = new CustomFilter(qwf);
>>>>>>>     FilteredQuery fq = new FilteredQuery(rb.getQuery(), filter);
>>>>>>>     rb.setQuery(fq);
>>>>>>> 
>>>>>>> Inside my CustomFilter I have a FilteredDocIdSet that checks whether the
>>>>>>> document should be returned. This works as I expect, but for some
>>>>>>> reason it is very, very slow. Even if I take out all of the machinery
>>>>>>> that does any logic with the document and only return true in the
>>>>>>> FilteredDocIdSet's match method, the query still takes an inordinate
>>>>>>> amount of time compared to not including this custom filter. So my
>>>>>>> question is: is this the most appropriate way of handling this? What
>>>>>>> performance should we expect from such a setup? Any
>>>>>>> information/pointers would be greatly appreciated.
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>
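
The behaviour Jamie reports is consistent with how a FilteredDocIdSet works: it walks the wrapped set's doc IDs in order and calls match(doc) once per candidate, so whatever match does, such as reader.document(docid), is repeated for every candidate document (up to 2 million here). The hypothetical plain-Java model below (FilteredDocIds is not the Lucene class) counts those calls; it borrows Lucene's convention of signalling exhaustion with Integer.MAX_VALUE, the value of DocIdSetIterator.NO_MORE_DOCS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Hypothetical model of a FilteredDocIdSet-style iterator: it walks the wrapped
 * set's doc IDs in ascending order and keeps those for which match(doc) is true.
 * match() runs once per candidate, so its cost is multiplied by the candidate count.
 */
final class FilteredDocIds {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same sentinel value Lucene uses

    private final int[] candidates; // doc IDs from the wrapped set, ascending
    private final IntPredicate match;
    private int pos = -1;

    FilteredDocIds(int[] candidates, IntPredicate match) {
        this.candidates = candidates;
        this.match = match;
    }

    /** Returns the next accepted doc ID, or NO_MORE_DOCS when exhausted. */
    int nextDoc() {
        while (pos + 1 < candidates.length) {
            int doc = candidates[++pos];
            if (match.test(doc)) {
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }

    /** Drains the remaining accepted doc IDs into a list. */
    List<Integer> drain() {
        List<Integer> out = new ArrayList<>();
        for (int d = nextDoc(); d != NO_MORE_DOCS; d = nextDoc()) {
            out.add(d);
        }
        return out;
    }
}
```

Keeping match cheap, for example by consulting a per-document array built once per reader instead of loading the stored document, is what keeps this per-candidate cost manageable.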
