Re: Modelling Access Control

2012-11-01 Thread hupadhyay
Hello All,

I am also trying to model ACLs on Solr search. In my case the data set
is very large and the user base is also big. Putting the ACL inside Solr
gives quite good response times, but keeping the ACL outside Solr seems to
be a nightmare.

Keeping the ACL inside Solr, however, puts a heavy load on keeping the index
up to date, because adding a single user to a project with 3 entities in it
requires updating all of them in the Solr index. And we have approximately
500 user additions per day.

Can anybody please explain how to implement the ACL outside Solr?

one more thing, in my case *search should return in  1sec*

Thanks in advance





Re: Modelling Access Control

2010-10-31 Thread Dennis Gearon
Ah haaa. I see now. :-) 

I didn't make that connection. Hopefully I would have before I ever tried to 
implement that :-)

Kind of like user names and icons on a windows login :-)

Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.





Re: Modelling Access Control

2010-10-30 Thread Erick Erickson
If that's in response to Lance's comment, the answer is that if you return
autosuggest possibilities you effectively allow users to see data they
shouldn't. Imagine you have a field of the real names of spies. You only
want the persons way high up in the security chain to access these names and
you control that on a document level.

Allowing autocomplete on that field would be...er...very tough on your
spies' health...
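
A toy illustration of that leak, written as a sketch rather than as anything Solr does internally. The documents, the `roles` sets and both function names are made up for the example; the point is only that a suggestion list built from every document ignores the document-level ACL.

```python
# Hypothetical documents: the second one is visible to directors only.
DOCS = [
    {"id": 1, "name": "public analyst", "roles": {"staff", "director"}},
    {"id": 2, "name": "secret agent zed", "roles": {"director"}},
]

def suggest_from_shared_index(prefix, docs):
    """Suggest terms from every document passed in, with no ACL check."""
    terms = {word for doc in docs for word in doc["name"].split()}
    return sorted(term for term in terms if term.startswith(prefix))

def suggest_acl_aware(prefix, docs, user_roles):
    """Suggest only from documents the user is allowed to see."""
    visible = [doc for doc in docs if doc["roles"] & set(user_roles)]
    return suggest_from_shared_index(prefix, visible)

# A staff user typing "ze" learns that a term "zed" exists in a hidden document.
leaked = suggest_from_shared_index("ze", DOCS)
safe = suggest_acl_aware("ze", DOCS, {"staff"})
```

Here `leaked` contains the restricted term while `safe` is empty, even though both calls come from the same user.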

HTH
Erick

On Tue, Oct 26, 2010 at 2:24 PM, Dennis Gearon gear...@sbcglobal.net wrote:

 Son, don't touch that stove . . . .,

 OUCH! Hey Dad, I BURNED my hand on that stove, why didn't you tell me
 that?!?#! You know I need to know WHY, not just DON'T!

 Dennis Gearon

  Very important: do not make a spelling or autosuggest index
  from a
  text field which some people can see and other people
  can't.
 




Re: Modelling Access Control

2010-10-26 Thread Lance Norskog
Filter queries are a set of bits which is ANDed against query results
at a very early stage of query processing. They are very useful.  Note
that they are stored (I think) in parsed query order, so you have to
pass in the same filter query string each time.
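
A toy model of that behaviour, assuming (as the message itself only guesses) that the cache key depends on the order of the clauses. `ToyFilterCache` and the integer-as-bitset representation are inventions for this sketch and are not Solr's actual implementation.

```python
class ToyFilterCache:
    """Caches one bitset per filter string; bit i set means doc i matches."""

    def __init__(self, term_bits):
        self.term_bits = term_bits  # e.g. {"roles:A": 0b0011}
        self.cache = {}
        self.misses = 0

    def bits_for(self, fq):
        if fq not in self.cache:
            self.misses += 1
            bits = 0
            for clause in fq.split(" OR "):
                bits |= self.term_bits.get(clause, 0)  # union of the OR clauses
            self.cache[fq] = bits
        return self.cache[fq]

def search(cache, query_bits, fq):
    """AND the main query's matches against the cached filter bitset."""
    return query_bits & cache.bits_for(fq)

cache = ToyFilterCache({"roles:A": 0b0011, "roles:B": 0b0100})
first = search(cache, 0b0110, "roles:A OR roles:B")   # computed, then cached
second = search(cache, 0b0110, "roles:A OR roles:B")  # served from the cache
third = search(cache, 0b0110, "roles:B OR roles:A")   # same docs, different key
```

All three searches return the same documents, but in this model the reordered string costs a second cache miss, which is why building the filter string deterministically (for example from sorted roles) is a cheap precaution.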


-- 
Lance Norskog
goks...@gmail.com


Re: Modelling Access Control

2010-10-26 Thread Lance Norskog
The idea of ACL-based queries is: each document carries all of the
groups or roles that are allowed to see it. Each user search includes all of
the groups or roles the user has.

The roles are stored in a multivalued string field. Each ACL-based
query passes in "roles:A OR roles:B OR roles:C", and if any of A, B, C
is in the stored ACL field, you have a match.

This is called early binding. Late binding is when you return
everything and the app calls LDAP and asks "can she see this? or
this?". This is slow and puts a monster load on the ACL server.

Very important: do not make a spelling or autosuggest index from a
text field which some people can see and other people can't.
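
A minimal sketch of building the early-binding clause described above. The field name `roles` comes from the message; the helper names, the quoting of each value and the decision to reject a user with no roles are assumptions made for the example.

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def build_acl_filter(user_roles, field="roles"):
    """Return e.g. roles:"A" OR roles:"B" for the roles the user holds."""
    roles = sorted(set(user_roles))  # stable order gives a stable query string
    if not roles:
        # Assumption: a user without roles must not get an unfiltered search.
        raise ValueError("user has no roles; refusing to build an open filter")
    return " OR ".join(f"{field}:{quote_term(role)}" for role in roles)

acl_filter = build_acl_filter(["B", "A", "B"])
```

Sorting and de-duplicating the roles means the same user always produces the same string, which should help if the filter is cached by its text.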



Re: Modelling Access Control

2010-10-26 Thread Dennis Gearon
Son, don't touch that stove . . . .,

OUCH! Hey Dad, I BURNED my hand on that stove, why didn't you tell me 
that?!?#! You know I need to know WHY, not just DON'T!

Dennis Gearon

 Very important: do not make a spelling or autosuggest index
 from a
 text field which some people can see and other people
 can't.
 



Re: Modelling Access Control

2010-10-25 Thread Paul Carey
Many thanks for all the responses. I now plan on benchmarking and
validating both the filter query approach, and maintaining the ACL
entirely outside of Solr. I'll decide from there.

Paul


Re: Modelling Access Control

2010-10-25 Thread Israel Ekpo

Great.

I am looking forward to some feedback on the benchmarks.
-- 
°O°
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Re: Modelling Access Control

2010-10-25 Thread Jonathan Rochkind

Dennis Gearon wrote:

why use filter queries?

Wouldn't reducing the set headed into the filters by putting it in the main 
query be faster? (A question to learn, since I do NOT know :-)

  
No. At least as I understand it. In the best case, the filter query will 
be a lot faster, because filter queries are cached separately in the 
filter cache.  So if the existing filter query can be found in the 
cache, it'll be a lot faster. If it's not in the cache, the performance 
should be pretty much the same as if you had included it as an 
additional clause in the main q query.


The reasons to put it in a fq filter are:

1) The caching behavior. You can have that certain part of the query be 
cached on its own, speeding up any subsequent queries that use that 
same fq.


2) Simplification of client code. You can leave your 'q' however you 
want it, using whatever kind of query parser you want to (dismax, 
whatever), and just add on the 'fq' without touching the 'q'. This is 
a lot easier to do and, especially when you're using it for access 
control like this, makes it a lot harder for a bug to creep in.
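
The second point can be sketched as follows: the user's text goes into 'q' unchanged and the access check is added as a separate 'fq' parameter. The base URL, the example role filter and the function name are assumptions for illustration; `q`, `fq`, `defType` and `wt` are the standard Solr request parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_select_url(base_url, user_query, acl_filter, query_parser="dismax"):
    """Build a /select URL with the user's text in q and the ACL in fq."""
    params = [
        ("q", user_query),          # passed through untouched
        ("defType", query_parser),  # whichever parser the application already uses
        ("fq", acl_filter),         # access control lives only here
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"

# Assumed local Solr address and an example role filter.
url = build_select_url("http://localhost:8983/solr", "budget report",
                       "roles:A OR roles:B")
sent = parse_qs(urlsplit(url).query)  # decode again to inspect what was sent
```

Because the ACL never touches the 'q' string, switching query parsers or changing how the user's text is handled cannot accidentally drop the permission clause.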


Jonathan




Re: Modelling Access Control

2010-10-25 Thread Dennis Gearon
I'll also be interested in how that works for you. Bringing out the whole 
dataset without filtering it for access control means that you will then 
have to filter the result set in your server-side/command-line program.

So the speed comparison between the filter query and filtering in the outside 
language environment will be very interesting :-)

I will also do this, but in about 3-5 months. I will report it then.
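
A rough sketch of what that outside filtering could look like, under the assumption that the application pages through unfiltered hits and asks an external ACL service about each one. `fetch_page` and `can_see` are hypothetical callables standing in for the Solr request and the permission lookup.

```python
def post_filter(fetch_page, can_see, wanted, page_size=50, max_pages=20):
    """Collect `wanted` visible hits, counting how many ACL checks it took."""
    visible, checks = [], 0
    for page in range(max_pages):
        hits = fetch_page(page * page_size, page_size)  # (start, rows)
        if not hits:
            break  # ran out of results
        for hit in hits:
            checks += 1  # one call to the ACL service per candidate
            if can_see(hit):
                visible.append(hit)
                if len(visible) == wanted:
                    return visible, checks
    return visible, checks

# Toy data: 100 hits, of which the user may see every tenth one.
all_hits = list(range(100))
visible, checks = post_filter(
    fetch_page=lambda start, rows: all_hits[start:start + rows],
    can_see=lambda hit: hit % 10 == 0,
    wanted=3,
    page_size=10,
)
```

The number of checks grows with how sparse the user's permissions are, which is the cost any benchmark against the filter query approach would need to capture.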


Dennis Gearon



Re: Modelling Access Control

2010-10-25 Thread Dennis Gearon
Thanks for that insight, a lot.

Dennis Gearon



Re: Modelling Access Control

2010-10-24 Thread Peter Sturge
Hi,

See SOLR-1872 for a way of providing access control, whilst placing
the ACL configuration itself outside of Solr, which is generally a
good idea.
   
http://www.lucidimagination.com/search/out?u=http://issues.apache.org/jira/browse/SOLR-1872

There are a number of ways to approach Access Control, but you will
need to take a number of factors into account that aren't issues if
you're doing non-acl Solr queries.
You can use this patch to achieve authentication and authorization, or
use it as a template for similar techniques.

Peter





Modelling Access Control

2010-10-23 Thread Paul Carey
Hi

My domain model is made of users that have access to projects which
are composed of items. I'm hoping to use Solr and would like to make
sure that searches only return results for items that users have
access to.

I've looked over some of the older posts on this mailing list about
access control and saw a suggestion along the lines of
acl:user_id AND (actual query).

While this obviously works, there are a couple of niggles. Every item
must have a list of valid user ids (typically less than 100 in my
case). Every time a collaborator is added to or removed from a
project, I need to update every item in that project. This will
typically be fewer than 1000 items, so I guess it is no big deal.

I wondered if the following might be a reasonable alternative,
assuming the number of projects to which a user has access is lower
than a certain bound.
(acl:project_id OR acl:project_id OR ... ) AND (actual query)

When the numbers are small - e.g. each user has access to ~20 projects
and each project has ~20 collaborators - is one approach preferable
over another? And when outliers exist - e.g. a project with 2000
collaborators, or a user with access to 2000 projects - is one
approach more liable to fail than the other?
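
One way to compare the two schemes is a back-of-the-envelope cost model like the sketch below. It assumes that with user ids on each item every membership change re-indexes the whole project, and that with project ids on each item the query carries one clause per accessible project. The limit of 1024 is, as far as I know, the default of Solr's maxBooleanClauses setting, but treat it as an assumption and check your own solrconfig.xml.

```python
MAX_BOOLEAN_CLAUSES = 1024  # assumed default; verify against solrconfig.xml

def user_id_scheme(items_in_project, projects_per_user):
    """Each item lists user ids; the query needs a single acl clause."""
    return {"docs_reindexed_per_membership_change": items_in_project,
            "acl_clauses_per_query": 1}

def project_id_scheme(items_in_project, projects_per_user):
    """Each item lists its project id; the query lists the user's projects."""
    return {"docs_reindexed_per_membership_change": 0,
            "acl_clauses_per_query": projects_per_user}

def exceeds_clause_limit(cost, limit=MAX_BOOLEAN_CLAUSES):
    return cost["acl_clauses_per_query"] > limit

typical = project_id_scheme(items_in_project=1000, projects_per_user=20)
outlier = project_id_scheme(items_in_project=1000, projects_per_user=2000)
per_user = user_id_scheme(items_in_project=1000, projects_per_user=20)
```

Under these assumptions the project-id scheme avoids re-indexing entirely but the 2000-project outlier would exceed the clause limit, while the user-id scheme keeps the query small and pays on every membership change.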

Many thanks

Paul


Re: Modelling Access Control

2010-10-23 Thread Israel Ekpo
Hi Paul,

Regardless of how you implement it, I would recommend you use filter queries
for the permissions check rather than making it part of the main query.



Re: Modelling Access Control

2010-10-23 Thread Dennis Gearon
Two things will lessen the Solr administrative load:

1/ Follow the examples of databases and *nix OSs. Give each user their own group, 
or set up groups that don't have regular users as OWNERS, but can have users 
assigned to the group to give them particular permissions, i.e. roles like 
publishers, reviewers, friends, etc.

2/ Put your ACL outside of Solr, using your server-side/command-line language's 
object-oriented features. Force all searches to come from a single location 
in code (not sure how to do that), and make that piece of code check 
authentication and authorization.

This is how my research shows others do it, and how I plan to do it. ANY 
insight others have on this, I really want to hear.
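
One possible shape for that single location is a small gateway object that every search has to pass through. This is only a sketch: `SearchGateway`, the two injected callables and the `acl` field name are hypothetical, and authentication is assumed to have happened before `search` is called.

```python
import re

_SAFE_GROUP = re.compile(r"[A-Za-z0-9_]+")  # assumption: group names are simple tokens

class SearchGateway:
    """The only place in the application that is allowed to query Solr."""

    def __init__(self, groups_for_user, run_query):
        self._groups_for_user = groups_for_user  # hypothetical: user id -> group names
        self._run_query = run_query              # hypothetical: callable(q=..., fq=...)

    def search(self, user_id, user_query):
        groups = sorted(set(self._groups_for_user(user_id)))
        if not groups:
            raise PermissionError(f"user {user_id!r} may not read anything")
        for group in groups:
            if not _SAFE_GROUP.fullmatch(group):
                raise ValueError(f"unexpected group name: {group!r}")
        fq = " OR ".join(f"acl:{group}" for group in groups)
        return self._run_query(q=user_query, fq=fq)

# Usage with stand-ins for the ACL store and the Solr client.
calls = []
gateway = SearchGateway(
    groups_for_user=lambda uid: {"u1": ["reviewers", "publishers"]}.get(uid, []),
    run_query=lambda **params: calls.append(params) or "ok",
)
result = gateway.search("u1", "solr")
```

Keeping the Solr client private to this class is what enforces the single location; the rest of the code only ever sees `search(user_id, user_query)`.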

Dennis Gearon



Re: Modelling Access Control

2010-10-23 Thread Dennis Gearon
why use filter queries?

Wouldn't reducing the set headed into the filters by putting it in the main 
query be faster? (A question to learn, since I do NOT know :-)

Dennis Gearon



Re: Modelling Access Control

2010-10-23 Thread Dennis Gearon
Forgot to add,
3/ The external application code selects the GROUPS that the user has 
permission to read (Solr will only serve up what is to be read?), then searches 
on those groups.


Dennis Gearon



Re: Modelling Access Control

2010-10-23 Thread Savvas-Andreas Moysidis
Pushing ACL logic outside Solr sounds like a prudent choice indeed as, in my
opinion, all of the business rules/conceptual logic should reside only
within the code boundaries. This way your domain will be easier to model and
your code easier to read, understand and maintain.

More information on Filter Queries, when they should be used and how they
affect performance can be found here:
http://wiki.apache.org/solr/FilterQueryGuidance



Re: Modelling Access Control

2010-10-23 Thread Israel Ekpo
Hi All,

I think using filter queries will be a good option to consider, for
the following reasons:

* The filter query does not affect the score of the items in the result set.
If the ACL logic is part of the main query, it could influence the scores of
the items in the result set.

* Using a filter query could lead to better performance in complex queries
because the results from the query specified with fq are cached
independently from those of the main query. Since the result of a filter
query is cached, it will be used to filter the primary query result using
set intersection without having to fetch the ids of the documents from the
fq a second time.

I think this will be useful because we can assume that the ACL portion in
the fq is relatively constant, since the permissions for each user are not
something that changes frequently.

http://wiki.apache.org/solr/FilterQueryGuidance
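
The scoring point can be seen in a toy ranking model. The scoring below (term counts plus one point per matching ACL value) is invented purely for illustration and is not Lucene's scoring formula; the documents and function names are likewise made up.

```python
DOCS = [
    {"id": 1, "text": "solr solr", "acl": ["A"]},
    {"id": 2, "text": "solr", "acl": ["A", "B", "C"]},
]

def text_score(doc, query_terms):
    words = doc["text"].split()
    return sum(words.count(term) for term in query_terms)

def acl_matches(doc, user_roles):
    return len(set(doc["acl"]) & set(user_roles))

def rank(docs, query_terms, user_roles, acl_in_main_query):
    """Return doc ids, best first; the ACL adds to the score only if asked to."""
    scored = []
    for doc in docs:
        text, acl = text_score(doc, query_terms), acl_matches(doc, user_roles)
        if text > 0 and acl > 0:  # same documents qualify either way
            total = text + acl if acl_in_main_query else text
            scored.append((total, doc["id"]))
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]

in_query = rank(DOCS, ["solr"], ["A", "B", "C"], acl_in_main_query=True)
as_filter = rank(DOCS, ["solr"], ["A", "B", "C"], acl_in_main_query=False)
```

With the ACL in the main query, document 2 outranks the better text match simply because more of the user's roles appear on it; as a filter, the ranking depends on the text alone.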

