Re: Filtering search results by an external set of values

Mikhail Khludnev Tue, 24 Jan 2012 02:06:12 -0800

Phil,

Some time ago I posted my thoughts on a similar problem. Scroll to
part II.


http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201201.mbox/%3CCANGii8egoB1_rXFfwJMheyxx72v48B_DA-6KteKOymiBrR=m...@mail.gmail.com%3E

Regards

On Tue, Jan 24, 2012 at 1:36 PM, John, Phil (CSS) <philj...@capita.co.uk> wrote:

> Thanks for the responses.
>
> Groups probably wouldn't work as while there will be some overlap between
> customers, each will have a very different overall set of accessible
> resources.
>
> I'll try the suggestion about simply reindexing, or using the no-cache
> option and see how I get on.
>
> Failing that, are there hooks to write custom filter modules that use
> other parts of the records to decide whether to include them in a result
> set or not? In our use case, the documents represent articles, which have
> an "issue" field. Each customer has defined issues (or ranges of issues)
> that they have subscriptions to, so the upper bounds for "what to filter"
> would probably be fairly small (10k - 20k issues/ranges). This could
> probably be used with the no-cache option you've pointed me to.
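
As a rough illustration of the issue-based filter described above, here is a minimal Python sketch that turns a customer's subscribed issues and ranges into a standard Lucene-syntax filter string. The field name `issue`, the assumption that it is numeric, and the helper `build_issue_filter` are all hypothetical; nothing here comes from the actual schema.

```python
from typing import Iterable, Tuple, Union

# A subscription is either one issue number or an inclusive (low, high) range.
IssueSpec = Union[int, Tuple[int, int]]


def build_issue_filter(specs: Iterable[IssueSpec], field: str = "issue") -> str:
    """Build a Lucene-syntax filter string from a customer's subscriptions."""
    clauses = []
    for spec in specs:
        if isinstance(spec, tuple):
            low, high = spec
            if low > high:
                raise ValueError(f"invalid range: {spec}")
            clauses.append(f"{field}:[{low} TO {high}]")  # inclusive range
        else:
            clauses.append(f"{field}:{spec}")  # single issue
    if not clauses:
        # Refuse to build an empty filter, which could be read as "no restriction".
        raise ValueError("customer has no subscriptions")
    return " OR ".join(clauses)


print(build_issue_filter([(10, 20), 35]))  # issue:[10 TO 20] OR issue:35
```

With 10k - 20k clauses the resulting string would likely exceed the default boolean clause limit (maxBooleanClauses in solrconfig.xml, 1024 by default as far as I know), so that limit would need raising, or adjacent ranges would need merging before the filter is built.
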
>
> Best wishes,
>
> Phil.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 23 January 2012 17:34
> To: solr-user@lucene.apache.org
> Subject: Re: Filtering search results by an external set of values
>
> A second, but arguably quite expert option, is to use the no-cache option.
> See: https://issues.apache.org/jira/browse/SOLR-2429
>
> The idea here is that you can specify that a filter is "expensive" and it
> will only be run after all the other filters etc. have been applied.
> Furthermore, it will not be cached, and only documents that pass through all
> the other filters will be matched against this filter. It has been
> specifically used for ACL calculations...
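
A minimal sketch of what such a request might look like, built in Python. The `cache=false` and `cost` local parameters are the ones introduced by SOLR-2429; the field names, the example filters and the helper `build_search_params` are assumptions made up for illustration. As far as I understand it, non-cached filters are evaluated in order of increasing cost, and true post-filtering (cost of 100 or more) additionally requires a query type that implements Solr's PostFilter interface, which an ordinary field query does not.

```python
from urllib.parse import urlencode


def build_search_params(user_query: str, cheap_filter: str,
                        expensive_filter: str, cost: int = 100):
    """Return Solr request parameters with one cached and one non-cached filter."""
    return [
        ("q", user_query),
        # Ordinary filter: cached in the filter cache and reused across requests.
        ("fq", cheap_filter),
        # Expensive filter: not cached, ordered after cheaper filters by its cost.
        ("fq", f"{{!cache=false cost={cost}}}{expensive_filter}"),
    ]


params = build_search_params("solr", "type:article", "issue:[10 TO 20]")
# Hypothetical endpoint path; adjust to the actual core and handler.
print("/solr/select?" + urlencode(params))
```
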
>
> That said, see exactly how painful storing auth tokens is. I can index, on
> a relatively underpowered laptop, 11M Wiki documents in 5 minutes or so. If
> your worst-case rights update takes 1/2 hour to re-index and it only happens
> once a month, why be complex?
>
> And groups, as Jan says, often make even this unnecessary.
>
> Best
> Erick
>
> On Mon, Jan 23, 2012 at 5:16 AM, Jan Høydahl <jan....@cominvent.com>
> wrote:
> > Hi,
> >
> > Do you have any kind of "group" membership for your users? If you have,
> > a resource's list of security access tokens could be smaller and avoid
> > re-indexing most resources when adding "normal" users which mostly
> > belong to groups. The common way is to add filters on the query. You
> > may do it yourself or have some framework/plugin do it for you, see
> > http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security
> >
> > --
> > Jan Høydahl, search solution architect Cominvent AS -
> > www.cominvent.com Solr Training - www.solrtraining.com
> >
> > On 23. jan. 2012, at 11:49, John, Phil (CSS) wrote:
> >
> >> Hi,
> >>
> >>
> >>
> >> We're building quite a large shared index of resources, using Solr.
> >> The application that makes use of these resources is a multitenant
> >> one (i.e., many customers using the same index). For resources that
> >> are "private" to a customer, it's fairly easy to tag a document with
> >> their customer ID and use a FilterQuery to limit results to just
> >> their "stuff".
> >>
> >>
> >>
> >> We are soon going to be adding a large number (many tens of millions)
> >> of records that will be shared amongst customers. Not all customers
> >> will have access to the same shared resources, e.g.:
> >>
> >>
> >>
> >> * Shared resource 1:
> >>     - Customer 1
> >>     - Customer 3
> >>
> >> * Shared resource 2:
> >>     - Customer 2
> >>     - Customer 1
> >>
> >>
> >>
> >> The issue is, what is the best way to model this in Solr? Should we
> >> have multiple customer_id fields on each record, and then use the
> >> filter query as with "private" resources, or is there a better way of
> >> doing it?
> >> What happens if we need to do a bulk change, i.e. adding a new
> >> customer, or a previous customer has a large change in what shared
> >> resources they have access to? Am I right in thinking that we'd need
> >> to go through every shared resource, read it, make the required
> >> change, and reindex it?
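
To make the cost of the multi-valued field model concrete, here is a small in-memory Python sketch of the bulk change described above. It does not call any Solr API: documents are plain dicts, and the field name `customer_ids` and the helper `docs_to_reindex` are hypothetical. It only works out which shared resources would have to be re-sent to the index when one customer's access changes.

```python
from typing import Dict, Iterable, List, Set


def docs_to_reindex(docs: Iterable[Dict], customer_id: int,
                    granted_ids: Set[str]) -> List[Dict]:
    """Return updated copies of the docs whose customer list must change."""
    changed = []
    for doc in docs:
        has_access = customer_id in doc["customer_ids"]
        should_have = doc["id"] in granted_ids
        if has_access == should_have:
            continue  # nothing to do, this document keeps its indexed form
        ids = set(doc["customer_ids"])
        if should_have:
            ids.add(customer_id)      # newly granted
        else:
            ids.discard(customer_id)  # access revoked
        changed.append({**doc, "customer_ids": sorted(ids)})
    return changed


shared = [
    {"id": "resource1", "customer_ids": [1, 3]},
    {"id": "resource2", "customer_ids": [1, 2]},
]
# Customer 3 loses resource1 and gains resource2: both must be re-indexed.
for doc in docs_to_reindex(shared, 3, {"resource2"}):
    print(doc)
```

The number of documents returned is the re-indexing work a single rights change would trigger under this model, which is the figure to weigh against the query-time alternatives.
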
> >>
> >>
> >>
> >> I'm wondering if there's a way, instead of updating these resources
> >> directly, I could construct a set of documents that would act as a
> >> filter at query time of which shared resources to return?
> >>
> >>
> >>
> >> Kind regards,
> >>
> >>
> >>
> >> Phil John
> >>
> >> Technical Lead, Capita Software Services
> >>
> >> Knights Court, Solihull Parkway
> >>
> >> Birmingham Business Park B37 7YB
> >>
> >> Office: 0870 400 5000
> >>
> >> Fax: 0870 400 5001
> >> email: philj...@capita.co.uk <mailto:philj...@capita.co.uk>
> >>
> >>
> >>
> >> Part of Capita plc www.capita.co.uk <http://www.capita.co.uk>
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
