Re: Modelling Access Control
Hello all, I am also trying to model ACLs on Solr search. In my case both the data set and the user base are very large. Putting the ACL inside Solr gives quite good response times, while keeping the ACL outside Solr seems to be a nightmare. However, keeping the ACL inside Solr puts a heavy load on keeping the index up to date, because adding a single user to a project with 3 entities in it requires updating all of them in the Solr index, and we have approximately 500 user additions per day. Can anybody please explain how to implement ACLs outside Solr? One more thing: in my case *search should return in 1 sec*. Thanks in advance
-- View this message in context: http://lucene.472066.n3.nabble.com/Modelling-Access-Control-tp1756817p4017479.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Modelling Access Control
Ah haaa. I see now. :-) I didn't make that connection. Hopefully I would have before I ever tried to implement that :-) Kind of like user names and icons on a Windows login :-)
Dennis Gearon
Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
In reply to Erick Erickson erickerick...@gmail.com, Saturday, October 30, 2010, 6:01 PM.
Re: Modelling Access Control
If that's in response to Lance's comment, the answer is that if you return autosuggest possibilities you effectively allow users to see data they shouldn't. Imagine you have a field of the real names of spies. You only want the persons way high up in the security chain to access these names and you control that on a document level. Allowing autocomplete on that field would be...er...very tough on your spies' health... HTH Erick
In reply to Dennis Gearon gear...@sbcglobal.net, Tue, Oct 26, 2010 at 2:24 PM.
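Erick's point can be illustrated with a toy model. The sketch below is not Solr code: the documents, the `allowed_roles` field and the `build_suggester` helper are all made up for illustration. It only shows that a suggestion dictionary built from a field across all documents ignores document-level ACLs, so a prefix lookup can reveal a term that the search itself would hide.

```python
# Toy model, not Solr code: every name and value here is hypothetical.
DOCS = [
    {"id": 1, "name": "alice public", "allowed_roles": {"staff", "director"}},
    {"id": 2, "name": "agent zero", "allowed_roles": {"director"}},
]

def build_suggester(docs):
    """Collect every term of the 'name' field, ignoring per-document ACLs."""
    terms = set()
    for doc in docs:
        terms.update(doc["name"].split())
    return sorted(terms)

def suggest(terms, prefix):
    """Return all dictionary terms that start with the prefix."""
    return [t for t in terms if t.startswith(prefix)]

def search(docs, term, roles):
    """Document-level ACL check: only return docs the caller's roles allow."""
    return [d["id"] for d in docs
            if term in d["name"].split() and d["allowed_roles"] & roles]

terms = build_suggester(DOCS)
print(search(DOCS, "agent", {"staff"}))  # the search respects the ACL
print(suggest(terms, "ag"))              # the suggester does not
```

One way to avoid this, under the same assumptions, is to build suggestions only from fields that every user may see.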
Re: Modelling Access Control
Filter queries are cached as a set of bits which is ANDed against query results at a very early stage of query processing. They are very useful. Note that the cached entry is keyed (I think) on the parsed query, clause order included, so to hit the cache you have to pass in the same filter query string each time.
-- Lance Norskog goks...@gmail.com
In reply to Dennis Gearon gear...@sbcglobal.net, Mon, Oct 25, 2010 at 8:59 AM.
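Passing in the same filter query string each time is easier if the client builds it in one place and in a stable order. The following is a minimal sketch under assumptions: the `acl_filter_query` helper is hypothetical, and the field name `roles` is borrowed from elsewhere in this thread. It sorts and de-duplicates the role values and quotes them as phrases, so the same set of roles always yields the same `fq` string.

```python
def acl_filter_query(roles, field="roles"):
    """Build one fq string for a set of roles, always in the same order.

    Hypothetical helper; 'roles' is an assumed field name.
    """
    if not roles:
        # Fail closed: never build a filter that could match everything.
        raise ValueError("refusing to build an ACL filter for a user with no roles")

    def quote(value):
        # Quote each value as a phrase and escape backslashes and quotes.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    clauses = " OR ".join(quote(r) for r in sorted(set(roles)))
    return f"{field}:({clauses})"

print(acl_filter_query(["B", "A", "C"]))
print(acl_filter_query({"C", "A", "B", "A"}))
```

Both calls print the same string, which is the property the filter cache needs.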
Re: Modelling Access Control
The idea of ACL-based queries is: each document carries all of the groups or roles that are allowed to see it. Each user search includes all of the groups or roles the user has. The roles are stored in a multivalued string field. Each ACL-based query passes in roles:A OR roles:B OR roles:C and if any of A, B, C are in the stored ACL field, you have a match. This is called early binding. Late binding is when you return everything and the app calls LDAP and asks "can she see this? or this?". This is slow and puts a monster load on the ACL server. Very important: do not make a spelling or autosuggest index from a text field which some people can see and other people can't.
-- Lance Norskog goks...@gmail.com
In reply to Lance Norskog goks...@gmail.com, Tue, Oct 26, 2010 at 12:06 AM.
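The difference between early and late binding can be sketched in a few lines. This is a toy model under assumptions, not Solr or LDAP code: `DOCS`, `early_binding`, `late_binding` and `fake_acl_server` are all invented for illustration. Early binding matches the user's roles against the roles stored on each document; late binding returns everything and asks the ACL service once per document.

```python
# Toy model: documents carry a multivalued 'roles' field (hypothetical data).
DOCS = [
    {"id": "d1", "roles": ["A", "X"]},
    {"id": "d2", "roles": ["Y"]},
    {"id": "d3", "roles": ["C"]},
]

def early_binding(docs, user_roles):
    """Match if any of the user's roles is in the document's stored roles."""
    wanted = set(user_roles)
    return [d["id"] for d in docs if wanted.intersection(d["roles"])]

def late_binding(docs, can_see):
    """Return everything, then ask the ACL service once per document."""
    return [d["id"] for d in docs if can_see(d["id"])]

calls = []

def fake_acl_server(doc_id):
    """Stand-in for an LDAP lookup; records how often it is called."""
    calls.append(doc_id)
    return doc_id in {"d1", "d3"}

print(early_binding(DOCS, ["A", "B", "C"]))
print(late_binding(DOCS, fake_acl_server), "ACL calls:", len(calls))
```

Both approaches return the same documents here, but the late-binding version makes one ACL call per candidate document, which is the load Lance warns about.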
Re: Modelling Access Control
Son, don't touch that stove . . . ., OUCH! Hey Dad, I BURNED my hand on that stove, why didn't you tell me that?!?#! You know I need to know WHY, not just DON'T! Dennis Gearon
In reply to Lance Norskog: "Very important: do not make a spelling or autosuggest index from a text field which some people can see and other people can't."
Re: Modelling Access Control
Many thanks for all the responses. I now plan on benchmarking and validating both the filter query approach, and maintaining the ACL entirely outside of Solr. I'll decide from there. Paul
Re: Modelling Access Control
Great. I am looking forward to some feedback on the benchmarks.
-- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
In reply to Paul Carey paul.p.ca...@gmail.com, Mon, Oct 25, 2010 at 8:16 AM.
Re: Modelling Access Control
Dennis Gearon wrote: "Why use filter queries? Wouldn't reducing the set headed into the filters by putting it in the main query be faster? (A question to learn, since I do NOT know :-)"

No. At least as I understand it. In the best case, the filter query will be a lot faster, because filter queries are cached separately in the filter cache. So if the existing filter query can be found in the cache, it'll be a lot faster. If it's not in the cache, the performance should be pretty much the same as if you had included it as an additional clause in the main q query.

The reasons to put it in a fq filter are:
1) The caching behavior. You can have that certain part of the query be cached on its own, speeding up any subsequent queries that use that same fq.
2) Simplification of client code. You can leave your 'q' however you want it, using whatever kind of query parser you want to (dismax, whatever), and just add on the 'fq' without touching the 'q'. This is a lot easier to do, and especially when you're using it for access control like this, a lot harder for a bug to creep in.

Jonathan
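Jonathan's second point, adding the `fq` without touching the `q`, might look like the sketch below. It only builds a query string with the Python standard library and does not contact Solr. The `with_acl_filter` helper is hypothetical, and the `acl` field and project ids are assumed values in the style of Paul's original question.

```python
from urllib.parse import urlencode

def with_acl_filter(params, acl_fq):
    """Return a copy of the request parameters with the ACL filter added as fq.

    The user's own parameters (q, defType, ...) are passed through unchanged.
    """
    secured = list(params)
    secured.append(("fq", acl_fq))
    return secured

# Whatever the user asked for, with whatever query parser is configured.
user_params = [("q", "solar panels"), ("defType", "dismax")]

secured = with_acl_filter(user_params, "acl:(p1 OR p7)")
print(urlencode(secured))
```

Because the filter is appended in one place and the main query is never rewritten, there is less room for a bug that drops or mangles the access check.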
Re: Modelling Access Control
I'll also be interested in how that works for you. Bringing out the whole dataset without filtering it for access control means that you will then have to filter the result set in your server-side/command-line program. So the speed comparison of the filter query vs. the outside language environment will be very interesting :-) I will also do this, but in about 3-5 months. I will report it then. Dennis Gearon
In reply to Paul Carey paul.p.ca...@gmail.com, Monday, October 25, 2010, 5:16 AM.
Re: Modelling Access Control
Thanks for that insight, a lot. Dennis Gearon
In reply to Jonathan Rochkind rochk...@jhu.edu, Monday, October 25, 2010, 8:19 AM.
Re: Modelling Access Control
Hi, See SOLR-1872 for a way of providing access control whilst placing the ACL configuration itself outside of Solr, which is generally a good idea. http://www.lucidimagination.com/search/out?u=http://issues.apache.org/jira/browse/SOLR-1872 There are a number of ways to approach access control, but you will need to take a number of factors into account that aren't issues if you're doing non-ACL Solr queries. You can use this patch to achieve authentication and authorization, or use it as a template for similar techniques. Peter
In reply to Paul Carey paul.p.ca...@gmail.com, Sat, Oct 23, 2010 at 9:03 AM.
Modelling Access Control
Hi, My domain model is made of users that have access to projects which are composed of items. I'm hoping to use Solr and would like to make sure that searches only return results for items that users have access to. I've looked over some of the older posts on this mailing list about access control and saw a suggestion along the lines of acl:user_id AND (actual query). While this obviously works, there are a couple of niggles. Every item must have a list of valid user ids (typically fewer than 100 in my case). Every time a collaborator is added to or removed from a project, I need to update every item in that project. This will typically be fewer than 1000 items, so I guess it is no big deal. I wondered if the following might be a reasonable alternative, assuming the number of projects to which a user has access is lower than a certain bound: (acl:project_id OR acl:project_id OR ... ) AND (actual query). When the numbers are small - e.g. each user has access to ~20 projects and each project has ~20 collaborators - is one approach preferable over another? And when outliers exist - e.g. a project with 2000 collaborators, or a user with access to 2000 projects - is one approach more liable to fail than the other? Many thanks Paul
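The trade-off between the two schemes can be made concrete with a back-of-the-envelope sketch. The function names are invented, and the cost model is my reading of the question rather than a measurement: storing user ids on items costs one reindex per item on each membership change, while storing the project id on items costs one query clause per project the user can access. The clause limit of 1024 is an assumption based on the `maxBooleanClauses` value commonly shipped in solrconfig.xml; check your own configuration.

```python
MAX_BOOLEAN_CLAUSES = 1024  # assumed limit; check maxBooleanClauses in your solrconfig.xml

def user_ids_on_items(items_in_project, projects_per_user):
    """acl holds user ids: one query clause, but every item is reindexed per membership change."""
    return {"reindexed_docs": items_in_project, "query_clauses": 1}

def project_id_on_items(items_in_project, projects_per_user):
    """acl holds the project id: no reindexing, but one query clause per accessible project."""
    return {"reindexed_docs": 0, "query_clauses": projects_per_user}

for label, items, projects in [("typical", 1000, 20), ("outlier", 1000, 2000)]:
    for scheme in (user_ids_on_items, project_id_on_items):
        cost = scheme(items, projects)
        too_many = cost["query_clauses"] > MAX_BOOLEAN_CLAUSES
        print(label, scheme.__name__, cost, "exceeds clause limit" if too_many else "ok")
```

Under these assumptions the project-id scheme is cheaper to maintain, and the case to watch is the outlier user with access to 2000 projects, whose query would need more clauses than the assumed limit allows.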
Re: Modelling Access Control
Hi Paul, Regardless of how you implement it, I would recommend you use filter queries for the permissions check rather than making it part of the main query.
-- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
In reply to Paul Carey paul.p.ca...@gmail.com, Sat, Oct 23, 2010 at 4:03 AM.
Re: Modelling Access Control
Two things will lessen the Solr administrative load:
1/ Follow the examples of databases and *nix OSs. Give each user their own group, or set up groups that don't have regular users as OWNERS, but can have users assigned to the group to give them particular permissions, i.e. roles like publishers, reviewers, friends, etc.
2/ Put your ACL outside of Solr, using your server-side/command-line language's object-oriented properties. Force all searches to come from a single location in code (not sure how to do that), and make that piece of code check authentication and authorization.
This is how my research shows others do it, and how I plan to do it. ANY insight others have on this, I really want to hear. Dennis Gearon
In reply to Paul Carey paul.p.ca...@gmail.com, Saturday, October 23, 2010, 1:03 AM.
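One way to force all searches through a single location is a small gateway object that is the only code allowed to talk to Solr. The sketch below is an assumption-laden illustration: `SearchGateway`, the `groups` field, and the two injected callables are hypothetical, and group names are assumed to be simple tokens that need no escaping.

```python
class SearchGateway:
    """Single choke point: the only place that builds and sends search requests."""

    def __init__(self, groups_for_user, run_query):
        self._groups_for_user = groups_for_user  # hypothetical ACL lookup kept outside Solr
        self._run_query = run_query              # hypothetical function that talks to Solr

    def search(self, user_id, query):
        # Authentication: refuse anonymous callers outright.
        if user_id is None:
            raise PermissionError("authentication required")
        # Authorization: look up the groups this user may read.
        groups = sorted(self._groups_for_user(user_id))
        if not groups:
            return []  # no groups means nothing is visible
        fq = "groups:(" + " OR ".join(groups) + ")"
        return self._run_query(q=query, fq=fq)


# Usage with stand-ins for the ACL store and for Solr.
sent = []

def fake_solr(q, fq):
    sent.append((q, fq))
    return ["doc-1"]

gateway = SearchGateway(lambda user: {"reviewers", "friends"}, fake_solr)
print(gateway.search("dennis", "stoves"), sent)
```

Keeping the Solr client private to this class is what makes the check hard to bypass; the rest of the application only ever sees `search`.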
Re: Modelling Access Control
Why use filter queries? Wouldn't reducing the set headed into the filters by putting it in the main query be faster? (A question to learn, since I do NOT know :-) Dennis Gearon
In reply to Israel Ekpo israele...@gmail.com, Saturday, October 23, 2010, 7:01 AM.
Re: Modelling Access Control
Forgot to add: 3/ The external application code selects the GROUPS that the user has permission to read (Solr will only serve up what is to be read?), then searches on those groups. Dennis Gearon
In reply to Dennis Gearon gear...@sbcglobal.net, Saturday, October 23, 2010, 11:49 AM.
Re: Modelling Access Control
Pushing ACL logic outside Solr sounds like a prudent choice indeed as, in my opinion, all of the business rules/conceptual logic should reside only within the code boundaries. This way your domain will be easier to model and your code easier to read, understand and maintain. More information on filter queries, when they should be used and how they affect performance can be found here: http://wiki.apache.org/solr/FilterQueryGuidance
In reply to Dennis Gearon gear...@sbcglobal.net, 23 October 2010 20:00.
Re: Modelling Access Control
Hi All, I think using filter queries will be a good option to consider for the following reasons:
* The filter query does not affect the score of the items in the result set. If the ACL logic is part of the main query, it could influence the scores of the items in the result set.
* Using a filter query could lead to better performance in complex queries because the results from the query specified with fq are cached independently from those of the main query. Since the result of a filter query is cached, it will be used to filter the primary query result using set intersection without having to fetch the ids of the documents from the fq a second time.
I think this will be useful because we can assume that the ACL portion in the fq is relatively constant, since the permissions for each user are not something that changes frequently. http://wiki.apache.org/solr/FilterQueryGuidance
-- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
In reply to Dennis Gearon gear...@sbcglobal.net, Sat, Oct 23, 2010 at 2:58 PM.
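The two properties described above, untouched scores and a cached document set applied by intersection, can be modelled in a few lines. This is a simplified toy and not how Solr is implemented internally: the index, the `filter_cache` dictionary and both functions are hypothetical, and the filter is reduced to a single ACL token.

```python
# Toy index: doc id -> set of ACL tokens (hypothetical data).
INDEX = {1: {"acl": {"p1"}}, 2: {"acl": {"p2"}}, 3: {"acl": {"p1", "p2"}}}

filter_cache = {}  # fq string -> frozenset of matching doc ids

def doc_ids_for_filter(fq, index):
    """Compute the matching doc ids once per distinct fq, then reuse them."""
    if fq not in filter_cache:
        filter_cache[fq] = frozenset(
            doc_id for doc_id, doc in index.items() if fq in doc["acl"]
        )
    return filter_cache[fq]

def search(scored_hits, fq, index):
    """Intersect the main query's hits with the filter set; scores pass through unchanged."""
    allowed = doc_ids_for_filter(fq, index)
    return [(doc_id, score) for doc_id, score in scored_hits if doc_id in allowed]

# Hits of the main query as (doc id, score), already ranked.
hits = [(3, 2.5), (2, 1.7), (1, 0.4)]
print(search(hits, "p1", INDEX))
print(search(hits, "p1", INDEX), "cached filters:", len(filter_cache))
```

The second call reuses the cached set for `p1`, and in both calls the surviving hits keep exactly the scores the main query gave them.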