Phil,

Some time ago I posted my thoughts about a similar problem. Scroll to part II.
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201201.mbox/%3CCANGii8egoB1_rXFfwJMheyxx72v48B_DA-6KteKOymiBrR=m...@mail.gmail.com%3E

Regards

On Tue, Jan 24, 2012 at 1:36 PM, John, Phil (CSS) <philj...@capita.co.uk> wrote:

> Thanks for the responses.
>
> Groups probably wouldn't work: while there will be some overlap between
> customers, each will have a very different overall set of accessible
> resources.
>
> I'll try the suggestion about simply reindexing, or using the no-cache
> option and see how I get on.
>
> Failing that, are there hooks to write custom filter modules that use
> other parts of the records to decide on whether to include them in a result
> set or not? In our use case, the documents represent articles, which have
> an "issue" field. Each customer has defined issues (or ranges of issues)
> that they have subscriptions to, so the upper bounds for "what to filter"
> would probably be fairly small (10k - 20k issues/ranges). This could
> probably be used with the no-cache option you've pointed me to.
>
> Best wishes,
>
> Phil.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 23 January 2012 17:34
> To: solr-user@lucene.apache.org
> Subject: Re: Filtering search results by an external set of values
>
> A second, but arguably quite expert option, is to use the no-cache option.
> See: https://issues.apache.org/jira/browse/SOLR-2429
>
> The idea here is that you can specify that a filter is "expensive" and it
> will only be run after all the other filters etc. have been applied.
> Furthermore, it will not be cached and only documents that pass through
> all the other filters will be matched against this filter. It has been
> specifically used for ACL calculations...
>
> That said, see exactly how painful storing auth tokens is. I can index, on
> a relatively underpowered laptop, 11M Wiki documents in 5 minutes or so.
> If your worst-case rights update takes 1/2 hour to re-index and it only
> happens once a month, why be complex?
>
> And groups, as Jan says, often make even this unnecessary.
>
> Best
> Erick
>
> On Mon, Jan 23, 2012 at 5:16 AM, Jan Høydahl <jan....@cominvent.com> wrote:
> > Hi,
> >
> > Do you have any kind of "group" membership for your users? If you have,
> > a resource's list of security access tokens could be smaller and avoid
> > re-indexing most resources when adding "normal" users which mostly
> > belong to groups. The common way is to add filters on the query. You
> > may do it yourself or have some framework/plugin do it for you, see
> > http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> > Solr Training - www.solrtraining.com
> >
> > On 23. jan. 2012, at 11:49, John, Phil (CSS) wrote:
> >
> >> Hi,
> >>
> >> We're building quite a large shared index of resources, using Solr.
> >> The application that makes use of these resources is a multitenant
> >> one (i.e., many customers using the same index). For resources that
> >> are "private" to a customer, it's fairly easy to tag a document with
> >> their customer ID and use a FilterQuery to limit results to just
> >> their "stuff".
> >>
> >> We are soon going to be adding a large number (many tens of millions)
> >> of records that will be shared amongst customers. Not all customers
> >> will have access to the same shared resources, e.g.:
> >>
> >> * Shared resource 1:
> >>     o Customer 1
> >>     o Customer 3
> >>
> >> * Shared resource 2:
> >>     o Customer 2
> >>     o Customer 1
> >>
> >> The issue is, what is the best way to model this in Solr? Should we
> >> have multiple customer_id fields on each record, and then use the
> >> filter query as with "private" resources, or is there a better way of
> >> doing it?
> >> What happens if we need to do a bulk change - i.e. adding a new
> >> customer, or a previous customer has a large change in what shared
> >> resources they have access to? Am I right in thinking that we'd need
> >> to go through every shared resource, read it, make the required
> >> change, and reindex it?
> >>
> >> I'm wondering if there's a way, instead of updating these resources
> >> directly, I could construct a set of documents that would act as a
> >> filter at query time of which shared resources to return?
> >>
> >> Kind regards,
> >>
> >> Phil John
> >> Technical Lead, Capita Software Services
> >> Knights Court, Solihull Parkway
> >> Birmingham Business Park B37 7YB
> >> Office: 0870 400 5000
> >> Fax: 0870 400 5001
> >> email: philj...@capita.co.uk
> >>
> >> Part of Capita plc www.capita.co.uk

--
Sincerely yours
Mikhail Khludnev
Lucid Certified Apache Lucene/Solr Developer
Grid Dynamics
<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
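
For anyone reading this thread later, here is a rough sketch of what the no-cache filter discussed above might look like when built on the client side. It assumes the "issue" field Phil mentions holds integer issue numbers and that a customer's subscriptions are available as single issues or (low, high) ranges; the function names and the cost value of 150 are made up for illustration. The cache=false and cost local params are the ones introduced by SOLR-2429. As far as I know, true post-filtering only kicks in for query types that implement PostFilter, so for a plain field query like this one the params mainly keep the filter out of the filterCache and order it after cheaper filters.

```python
from urllib.parse import urlencode


def issue_clause(item):
    """Render one issue ID, or a (low, high) range, as a Lucene clause."""
    if isinstance(item, tuple):
        low, high = item
        return "issue:[%d TO %d]" % (low, high)
    return "issue:%d" % item


def subscription_filter(subscriptions, cost=150):
    """Build an uncached fq value from a customer's issues and ranges."""
    if not subscriptions:
        raise ValueError("customer has no subscriptions")
    clauses = " OR ".join(issue_clause(s) for s in subscriptions)
    # cache=false and cost are the local params added by SOLR-2429.
    return "{!cache=false cost=%d}%s" % (cost, clauses)


def select_params(user_query, subscriptions):
    """Return URL-encoded parameters for a Solr select request."""
    return urlencode([
        ("q", user_query),
        ("fq", subscription_filter(subscriptions)),
    ])
```

Calling subscription_filter([12, (100, 200)]) gives {!cache=false cost=150}issue:12 OR issue:[100 TO 200]. With 10k - 20k individual issues the boolean clause count could exceed Solr's maxBooleanClauses setting, so collapsing subscriptions into ranges where possible is probably worth it.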