Re: Post Processing Solr Results

Erik Hatcher Mon, 29 Aug 2011 11:15:49 -0700

Sounds like you're looking for https://issues.apache.org/jira/browse/SOLR-2429 
which has been committed to trunk and also the 3_x branch (after the release of 
3.3).


        Erik
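As I understand it, SOLR-2429 adds non-cached filters plus a post-filtering hook. A filter marked cache=false with a high enough cost (100 or more) can act as a collector that sits between the main query and the normal result collector, so the expensive access check only runs on documents that already matched everything else. In Solr that hook is the PostFilter interface, which hands back a DelegatingCollector. The sketch below does not use those classes. It models the collector chain with hypothetical stand-in types (SimpleCollector, VisibilityCollector, PostFilterDemo) and a caller-supplied isVisible check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-in for a collector: receives matching doc ids in index order.
interface SimpleCollector {
    void collect(int docId);
}

// Terminal collector: keeps every doc id it is given (plays the role of the normal top-docs collector).
class ListCollector implements SimpleCollector {
    final List<Integer> docs = new ArrayList<>();

    @Override
    public void collect(int docId) {
        docs.add(docId);
    }
}

// Post-filter style wrapper: forwards a doc id to the next collector only if the access check passes.
class VisibilityCollector implements SimpleCollector {
    private final SimpleCollector delegate;
    private final IntPredicate isVisible; // hypothetical per-user access check

    VisibilityCollector(SimpleCollector delegate, IntPredicate isVisible) {
        this.delegate = delegate;
        this.isVisible = isVisible;
    }

    @Override
    public void collect(int docId) {
        if (isVisible.test(docId)) {
            delegate.collect(docId);
        }
    }
}

class PostFilterDemo {
    // Feeds docs that already matched the main query (and cheaper filters) through the chain.
    static List<Integer> run(int[] matchingDocs, IntPredicate isVisible) {
        ListCollector sink = new ListCollector();
        SimpleCollector chain = new VisibilityCollector(sink, isVisible);
        for (int doc : matchingDocs) {
            chain.collect(doc);
        }
        return sink.docs;
    }
}
```

For example, PostFilterDemo.run(new int[]{1, 2, 3, 4}, d -> d % 2 == 0) yields [2, 4].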

On Aug 29, 2011, at 11:46 , Jamie Johnson wrote:

> Thanks guys. Perhaps I am just going about this the wrong way, so let
> me explain my problem and perhaps there is a more appropriate
> solution.  What I need to do is basically hide certain results based
> on some passed-in user parameter (say, their service tier).  What
> I'd like is some way to plug in my custom logic to remove certain
> documents from the result set using this information.  That being
> said, I technically don't need to remove the documents from the full
> result set; I really only need to remove them from the current page
> (while still ensuring that the page is filled and sorted).  At present
> I'm trying to see if there is a way for me to add this type of logic
> after the QueryComponent has executed, perhaps by going through the
> DocListAndSet at that point and then intersecting the DocIdSet with a
> DocIdSet that would filter out the stuff I don't want seen.  Does this
> sound reasonable, or like a fool's errand?
> 
> 
> 
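If hiding documents only on the current page is acceptable, one way to keep the page full and sorted is to walk the already-sorted candidates, drop hidden ones, and count the page offset in terms of visible documents. A minimal sketch, assuming the ranked doc ids are already in hand and that isVisible is a hypothetical per-user check (VisiblePage is likewise an illustrative name):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class VisiblePage {
    /**
     * Returns up to `rows` visible doc ids, skipping the first `start` visible ones.
     * `rankedDocs` must already be in sort order; hidden docs are skipped without
     * shrinking the page, so the caller must supply enough candidates.
     */
    static List<Integer> page(int[] rankedDocs, int start, int rows, IntPredicate isVisible) {
        List<Integer> out = new ArrayList<>();
        if (rows <= 0) {
            return out;
        }
        int visibleSeen = 0;
        for (int doc : rankedDocs) {
            if (!isVisible.test(doc)) {
                continue; // hidden from this user
            }
            if (visibleSeen++ < start) {
                continue; // visible, but belongs to an earlier page
            }
            out.add(doc);
            if (out.size() == rows) {
                break; // page is full
            }
        }
        return out;
    }
}
```

The catch is that start counts visible documents, so the walk has to begin at the top of the ranking for every page, the candidate list must be long enough to survive the hidden documents, and numFound from the unfiltered query will overcount.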
> On Mon, Aug 29, 2011 at 10:51 AM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
>> I haven't followed the details, but what I'm guessing you want here is
>> Lucene's FieldCache.  Perhaps something along the lines of how faceting uses
>> it (in SimpleFacets.java):
>>
>>   FieldCache.DocTermsIndex si = FieldCache.DEFAULT.getTermsIndex(searcher.getIndexReader(), fieldName);
>> 
>>        Erik
>> 
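FieldCache un-inverts a field once per index reader into per-document arrays, so looking up a document's value becomes an array read instead of a stored-document load (the step that was dominating the time inside the FilteredDocIdSet). The pure-Java sketch below models that idea over a hypothetical in-memory postings map (term to doc ids). It is not Lucene's API, the UninvertedField name is illustrative, and it assumes a single-valued field with doc ids below maxDoc.

```java
import java.util.Arrays;
import java.util.Map;

class UninvertedField {
    final String[] terms; // sorted term table; an ord is an index into it
    final int[] ord;      // ord[docId], or -1 if the doc has no value for the field

    // Builds the doc -> value table once by walking every term's postings
    // (the expensive inverted-index walk), assuming a single-valued field.
    UninvertedField(Map<String, int[]> postings, int maxDoc) {
        terms = postings.keySet().toArray(new String[0]);
        Arrays.sort(terms);
        ord = new int[maxDoc];
        Arrays.fill(ord, -1);
        for (int i = 0; i < terms.length; i++) {
            for (int doc : postings.get(terms[i])) {
                ord[doc] = i;
            }
        }
    }

    // Constant-time per-document lookup; no stored document is loaded.
    String valueFor(int docId) {
        int o = ord[docId];
        return o < 0 ? null : terms[o];
    }
}
```

After construction, valueFor(docId) is an array read, paid for with one int per document plus the term table held in memory.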
>> On Aug 29, 2011, at 09:58 , Erick Erickson wrote:
>> 
>>> If you're asking whether there's a way to find, say,
>>> all the values for the "auth" field associated with
>>> a document... no. The nature of an inverted
>>> index makes this hard (think of finding all
>>> the definitions in a dictionary where the word
>>> "earth" was in the definition).
>>> 
>>> Best
>>> Erick
>>> 
>>> On Mon, Aug 29, 2011 at 9:21 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>> Thanks Erick. If I don't know up front which tokens could be in
>>>> the index, is there not an efficient way to get the field for a
>>>> specific document and do some custom processing on it?
>>>> 
>>>> On Mon, Aug 29, 2011 at 8:34 AM, Erick Erickson <erickerick...@gmail.com> 
>>>> wrote:
>>>>> Start here I think:
>>>>> 
>>>>> http://lucene.apache.org/java/3_0_2/api/core/index.html?org/apache/lucene/index/TermDocs.html
>>>>> 
>>>>> Best
>>>>> Erick
>>>>> 
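TermDocs enumerates the documents that contain a given term. Walking it once per auth token a user holds and setting bits gives a set of permitted documents that can be computed once and reused across queries. This sketch uses a hypothetical in-memory map (term to doc ids) in place of the index; in Lucene 3.x the inner loop would instead be a TermDocs iteration via next() and doc(). TokenFilterBits is an illustrative name.

```java
import java.util.BitSet;
import java.util.Collection;
import java.util.Map;

class TokenFilterBits {
    // Marks every doc that carries at least one of the user's tokens
    // by walking each token's postings list once.
    static BitSet allowedDocs(Map<String, int[]> postings, Collection<String> userTokens, int maxDoc) {
        BitSet allowed = new BitSet(maxDoc);
        for (String token : userTokens) {
            int[] docs = postings.get(token);
            if (docs == null) {
                continue; // token does not occur in the index
            }
            for (int doc : docs) {
                allowed.set(doc);
            }
        }
        return allowed;
    }
}
```

The cost scales with the total postings length of the user's tokens rather than with the number of query hits, which can make it much cheaper than checking each hit's stored fields.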
>>>>> On Mon, Aug 29, 2011 at 8:24 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>> Thanks for the reply.  The fields I want are indexed, but how would I
>>>>>> go directly at the fields I wanted?
>>>>>> 
>>>>>> With regard to indexing the auth tokens, I've thought about this and am
>>>>>> trying to confirm whether that is reasonable given our
>>>>>> constraints.
>>>>>> 
>>>>>> On Mon, Aug 29, 2011 at 8:20 AM, Erick Erickson 
>>>>>> <erickerick...@gmail.com> wrote:
>>>>>>> Yeah, loading the document inside a Collector is a
>>>>>>> definite no-no. Have you tried going directly
>>>>>>> at the fields you want (assuming they're
>>>>>>> indexed)? That *should* be much faster, but
>>>>>>> whether it'll be fast enough is a good question. I'm
>>>>>>>> thinking of some of the Terms methods here. You
>>>>>>> *might* get some joy out of making sure lazy
>>>>>>> field loading is enabled (and make sure the
>>>>>>> fields you're accessing for your logic are
>>>>>>> indexed), but I'm not entirely sure about
>>>>>>> that bit.
>>>>>>> 
>>>>>>> This kind of problem is sometimes handled
>>>>>>> by indexing "auth tokens" with the documents
>>>>>>> and including an OR clause on the query
>>>>>>> with the authorizations for a particular
>>>>>>>> user, but that works best if there is an upper
>>>>>>>> limit (in the 100s) on the number of tokens a user can possibly
>>>>>>>> have; often this works best with some kind of
>>>>>>>> grouping. Making this work when a user can
>>>>>>> have tens of thousands of auth tokens is...er...
>>>>>>> contra-indicated...
>>>>>>> 
>>>>>>> Hope this helps a bit...
>>>>>>> Erick
>>>>>>> 
>>>>>>> On Sun, Aug 28, 2011 at 11:59 PM, Jamie Johnson <jej2...@gmail.com> 
>>>>>>> wrote:
>>>>>>>>>> Just a bit more information.  Inside my class that extends
>>>>>>>>>> FilteredDocIdSet, all of the time seems to be spent
>>>>>>>>>> retrieving the document from the readerCtx, i.e. doing this:
>>>>>>>> 
>>>>>>>> Document doc = readerCtx.reader.document(docid);
>>>>>>>> 
>>>>>>>>>> If I comment this out and just return true, things fly along as I
>>>>>>>>>> expect.  My query is also returning a total of 2 million documents.
>>>>>>>> 
>>>>>>>> On Sun, Aug 28, 2011 at 11:39 AM, Jamie Johnson <jej2...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>>>>> I have a need to post-process Solr results based on some access
>>>>>>>>>>>> controls that are set up outside of Solr.  Currently we've written
>>>>>>>>>>>> something that extends SearchComponent, and in the prepare method I'm
>>>>>>>>>>>> doing something like this:
>>>>>>>>>>>>
>>>>>>>>>>>>     // Wrap the incoming query as a filter and decorate it with the access check
>>>>>>>>>>>>     QueryWrapperFilter qwf = new QueryWrapperFilter(rb.getQuery());
>>>>>>>>>>>>     Filter filter = new CustomFilter(qwf);
>>>>>>>>>>>>     // Run the original query restricted to the docs the filter accepts
>>>>>>>>>>>>     FilteredQuery fq = new FilteredQuery(rb.getQuery(), filter);
>>>>>>>>>>>>     rb.setQuery(fq);
>>>>>>>>>>>>
>>>>>>>>>>>> Inside my CustomFilter I have a FilteredDocIdSet which checks whether the
>>>>>>>>>>>> document should be returned.  This works as I expect, but for some
>>>>>>>>>>>> reason it is very, very slow.  Even if I take out all of the machinery
>>>>>>>>>>>> that does any logic with the document and only return true in the
>>>>>>>>>>>> FilteredDocIdSet's match method, the query still takes an inordinate
>>>>>>>>>>>> amount of time compared to not including this custom filter.  So my
>>>>>>>>>>>> question: is this the most appropriate way of handling this?  What
>>>>>>>>>>>> performance should be expected from such a setup?  Any
>>>>>>>>>>>> information/pointers would be greatly appreciated.
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>>
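On the slowdown in the original message: a FilteredDocIdSet-style wrapper calls its match check once for every document the wrapped set produces, so with 2 million hits the check runs 2 million times regardless of how many rows are displayed. And since the wrapped set is a QueryWrapperFilter over the same query that FilteredQuery then runs, the query is likely evaluated twice. The stdlib sketch below models only the wrapping iterator; FilteredDocIterator and its IntPredicate are hypothetical stand-ins for FilteredDocIdSet and its match(int) method, not Lucene's classes.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;
import java.util.function.IntPredicate;

// Yields only the inner iterator's doc ids for which the check returns true,
// preserving their increasing order.
class FilteredDocIterator implements PrimitiveIterator.OfInt {
    private final PrimitiveIterator.OfInt inner;
    private final IntPredicate match; // stands in for a match(int docid) method
    private int next;
    private boolean hasNext;

    FilteredDocIterator(PrimitiveIterator.OfInt inner, IntPredicate match) {
        this.inner = inner;
        this.match = match;
        advance();
    }

    // Pull from the inner iterator until a doc passes the check or input runs out.
    private void advance() {
        hasNext = false;
        while (inner.hasNext()) {
            int doc = inner.nextInt();
            if (match.test(doc)) {
                next = doc;
                hasNext = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public int nextInt() {
        if (!hasNext) {
            throw new NoSuchElementException();
        }
        int result = next;
        advance();
        return result;
    }
}
```

Wrapping doc ids 0 through 9 with a d % 3 == 0 check yields 0, 3, 6, 9 and invokes the check ten times, once per input doc.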
