Re: LCF security with Solr

2010-04-06 Thread Karl Wright

Erik Hatcher wrote:

Karl -

I appreciate you starting this thread on this important topic.  To kick 
start some discussions, some thoughts are inline below...


On Apr 6, 2010, at 9:24 AM, Karl Wright wrote:
As many may be aware, the LCF model relies on "access tokens" (e.g. 
active directory SIDs).  There are "allow" tokens, and "deny" tokens.  
They are currently dropped on the floor when Solr is involved, but 
they can readily (and most naturally) be handed to Solr as metadata 
when a document is ingested.


These tokens are arbitrary strings, right?  In other words, the strings 
from one data source aren't going to be in the same format as those from 
another data source, as I understand it.




That is correct; they are arbitrary strings.  LCF defines a concept of an "authority".  Each authority is defined in the UI by a 
user, based on a specific authority connector, which creates a named "authority connection".  LCF uses the authority connection 
name to establish a "space" of tokens, so tokens from one authority cannot collide with tokens from any other authority.


The specific format of tokens is up to the authority to determine.  For the Active Directory authority (the connection named 
"AD"), they may look like this:


AD:S-1-1-0
AD:S-1-5-32-545
AD:S-1-5-32-544
AD:S-1-5-21-4271684201-248514445-3847783096-518
AD:S-1-5-21-4271684201-248514445-3847783096-519
AD:S-1-5-21-4271684201-248514445-3847783096-512
AD:S-1-5-21-4271684201-248514445-3847783096-513
AD:S-1-5-21-4271684201-248514445-3847783096-520
AD:S-1-5-21-4271684201-248514445-3847783096-500

For Documentum, they may look like this:

My Documentum:_dm_012345678
My Documentum:_dm_123456789
...

For Livelink, they could be simple numbers, with some special ones:

My Livelink:0123532
My Livelink:31231
My Livelink:Guest


So, if you had the above three authorities defined, a given user's tokens 
would consist of this entire list:

AD:S-1-1-0
AD:S-1-5-32-545
AD:S-1-5-32-544
AD:S-1-5-21-4271684201-248514445-3847783096-518
AD:S-1-5-21-4271684201-248514445-3847783096-519
AD:S-1-5-21-4271684201-248514445-3847783096-512
AD:S-1-5-21-4271684201-248514445-3847783096-513
AD:S-1-5-21-4271684201-248514445-3847783096-520
AD:S-1-5-21-4271684201-248514445-3847783096-500
My Documentum:_dm_012345678
My Documentum:_dm_123456789
...
My Livelink:0123532
My Livelink:31231
My Livelink:Guest

It all makes sense, because each individual repository connection gets affiliated with an authority connection.  So the tokens 
associated with the documents are qualified by the appropriate authority in exactly the same way.
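To make that qualification step concrete, here is a minimal sketch of the prefixing involved.  The class and method names are purely illustrative, not actual LCF APIs:

```java
// Sketch: qualifying a raw repository token with its authority
// connection name, so tokens from different authorities can never
// collide. Names here are illustrative, not actual LCF APIs.
public class TokenQualifier {
    static String qualify(String authorityConnectionName, String rawToken) {
        // The connection name acts as a namespace for the raw token.
        return authorityConnectionName + ":" + rawToken;
    }

    public static void main(String[] args) {
        System.out.println(qualify("AD", "S-1-1-0"));         // AD:S-1-1-0
        System.out.println(qualify("My Livelink", "Guest"));  // My Livelink:Guest
    }
}
```

The same qualification is applied both to the tokens a user carries and to the tokens stamped onto documents, which is why lookups line up across repositories.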



Can you provide some examples of the grant and deny strings one may get 
from a few different data sources?


The authority does not distinguish between "grant" and "deny"; that 
distinction is made only for tokens attached to documents.




Read more about the LCF security model here:

http://cwiki.apache.org/confluence/display/CONNECTORS/Lucene+Connectors+Framework+concepts 



My proposal is therefore to do the following:

(1) Choose specific metadata names that LCF will use for "allow" 
tokens and "deny" tokens;
(2) Write a Solr request handler, which would peel out the special 
headers that LCF's mod_authz_annotate module puts into the request, 
and put those into a Solr request object;


Rather than a request handler, which would be too constraining on the 
Solr configuration of various request handlers, this is probably best as 
a servlet filter that fronts Solr's dispatch filter and simply adds 
parameters to the request passed on to Solr.
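The heart of such a servlet filter would just be copying trusted headers into request parameters before handing off to the dispatch filter.  A sketch of that mapping logic, with the caveat that the header name "AAAGRANT" and the parameter name "UserTokens" are assumptions for illustration, not established LCF or Solr conventions:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the core of the proposed servlet filter: copy the access
// tokens that mod_authz_annotate placed into trusted request headers
// into parameters for the request forwarded to Solr's dispatch filter.
// "AAAGRANT" and "UserTokens" are assumed names, not settled ones.
public class TokenParamMapper {
    static Map<String, List<String>> toSolrParams(Map<String, List<String>> headers) {
        Map<String, List<String>> params = new HashMap<>();
        List<String> tokens = headers.get("AAAGRANT");
        if (tokens != null && !tokens.isEmpty()) {
            // Copy defensively; the wrapped request should own its values.
            params.put("UserTokens", new ArrayList<>(tokens));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, List<String>> headers = new HashMap<>();
        headers.put("AAAGRANT", List.of("AD:S-1-1-0", "My Livelink:Guest"));
        System.out.println(toSolrParams(headers));
    }
}
```

In the real filter this logic would sit inside an HttpServletRequestWrapper so that downstream Solr code sees the tokens as ordinary request parameters.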




ok, I'll take your word for it...

mod_authz_annotate - I need to understand this, but it will be a 
required front to Solr to take advantage of the grant/deny strings?   Is 
this where the user credentials get processed?




mod_authz_annotate is meant to be used in conjunction with mod_auth_kerb.  mod_auth_kerb provides the authentication services. 
mod_authz_annotate then takes the authenticated qualified username, and communicates with the lcf-authority-service webapp to 
obtain the complete set of tokens for that user - across all authorities.


Allowing the search component to pick up the parameters and add the 
filtering...


(3) Write a Solr search component, which pulls out the access tokens 
from the Solr request object, and effectively wraps all incoming 
queries with the appropriate clauses that limit the results returned 
according to the appropriate "allow" and "deny" metadata matches.


(a) Is this the right approach (bearing in mind that the LCF security 
model is pretty deeply ingrained in LCF at this time, and is thus not 
subject to significant changes);


Seems like a good approach with a servlet filter and search component.  
Although I'm unclear how this will work with more than one data source 
indexed with different grant/deny formats.




Works fine. ;-)  If the above doesn't answer your questions, let me know.
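To make the query-wrapping concrete, here is a sketch of the filter clause such a search component might append to every query.  The field names allow_token and deny_token are placeholders, not the metadata names step (1) would actually choose:

```java
import java.util.List;

// Sketch: build a Lucene-syntax filter that admits a document only if
// at least one of the user's tokens matches its "allow" field and none
// matches its "deny" field. Field names are placeholders.
public class SecurityFilterBuilder {
    static String buildFilterQuery(List<String> userTokens) {
        StringBuilder allow = new StringBuilder();
        StringBuilder deny = new StringBuilder();
        for (String token : userTokens) {
            // Quoting each token sidesteps problems with punctuation,
            // e.g. the '-' characters in Active Directory SIDs.
            String quoted = "\"" + token + "\"";
            if (allow.length() > 0) {
                allow.append(" OR ");
                deny.append(" OR ");
            }
            allow.append("allow_token:").append(quoted);
            deny.append("deny_token:").append(quoted);
        }
        return "(" + allow + ") AND NOT (" + deny + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildFilterQuery(
            List.of("AD:S-1-1-0", "My Livelink:Guest")));
    }
}
```

Because every token is already qualified by its authority connection name, tokens from multiple data sources can mix safely in one filter and one Solr index.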

(b) Where should all of this live?  Should it be a component of Solr, 
or a component of LCF?

Re: LCF security with Solr

2010-04-06 Thread Erik Hatcher

Karl -

I appreciate you starting this thread on this important topic.  To  
kick start some discussions, some thoughts are inline below...


On Apr 6, 2010, at 9:24 AM, Karl Wright wrote:
As many may be aware, the LCF model relies on "access tokens" (e.g.  
active directory SIDs).  There are "allow" tokens, and "deny"  
tokens.  They are currently dropped on the floor when Solr is  
involved, but they can readily (and most naturally) be handed to  
Solr as metadata when a document is ingested.


These tokens are arbitrary strings, right?  In other words, the  
strings from one data source aren't going to be in the same format as  
those from another data source, as I understand it.


Can you provide some examples of the grant and deny strings one may  
get from a few different data sources?



Read more about the LCF security model here:

http://cwiki.apache.org/confluence/display/CONNECTORS/Lucene+Connectors+Framework+concepts

My proposal is therefore to do the following:

(1) Choose specific metadata names that LCF will use for "allow"  
tokens and "deny" tokens;
(2) Write a Solr request handler, which would peel out the special  
headers that LCF's mod_authz_annotate module puts into the request,  
and put those into a Solr request object;


Rather than a request handler, which would be too constraining on the  
Solr configuration of various request handlers, this is probably best  
as a servlet filter that fronts Solr's dispatch filter and simply adds  
parameters to the request passed on to Solr.


mod_authz_annotate - I need to understand this, but it will be a  
required front to Solr to take advantage of the grant/deny strings?
Is this where the user credentials get processed?


Allowing the search component to pick up the parameters and add the  
filtering...


(3) Write a Solr search component, which pulls out the access tokens  
from the Solr request object, and effectively wraps all incoming  
queries with the appropriate clauses that limit the results returned  
according to the appropriate "allow" and "deny" metadata matches.


(a) Is this the right approach (bearing in mind that the LCF  
security model is pretty deeply ingrained in LCF at this time, and  
is thus not subject to significant changes);


Seems like a good approach with a servlet filter and search  
component.  Although I'm unclear how this will work with more than one  
data source indexed with different grant/deny formats.


(b) Where should all of this live?  Should it be a component of  
Solr, or a component of LCF?


Good questions!   I don't have any strong opinion on this just yet.   
Always a toss-up when it comes to placing code that straddles two  
projects.  But I think I lean towards having this in the new  
lucene/solr trunk as a module.  While I'm pretty Solr-centric these days, I  
can imagine that LCF can have an output connector to write to Lucene's  
API directly and some may find it handy to have some common filtering  
code shared between Lucene and Solr.


(c) The access tokens used by LCF are arbitrary strings, which are  
usually alphanumeric, but do contain certain punctuation. Would this  
cause a problem?


Punctuation won't cause a problem, but how a user's search request  
gets tied into the various grant/deny tokens is what I'm not quite  
understanding just yet.  Would there be issues with multiple data  
sources integrated into one Solr index?


Erik



Re: LCF security with Solr

2010-04-06 Thread Grant Ingersoll
You should also see SOLR-1834.  More later.

On Apr 6, 2010, at 9:24 AM, Karl Wright wrote:

> Hi,
> 
> This post pertains to the integration between Lucene Connectors Framework and 
> Solr.
> 
> I don't know a ton about Solr, but one of the engineers here at MetaCarta has 
> become quite familiar with it.  So, I took some time to try and work through 
> one of the outstanding LCF/Solr integration issues, which is how to enforce 
> the LCF security model using Solr.
> 
> As many may be aware, the LCF model relies on "access tokens" (e.g. active 
> directory SIDs).  There are "allow" tokens, and "deny" tokens.  They are 
> currently dropped on the floor when Solr is involved, but they can readily 
> (and most naturally) be handed to Solr as metadata when a document is 
> ingested.
> 
> Read more about the LCF security model here:
> 
> http://cwiki.apache.org/confluence/display/CONNECTORS/Lucene+Connectors+Framework+concepts
> 
> My proposal is therefore to do the following:
> 
> (1) Choose specific metadata names that LCF will use for "allow" tokens and 
> "deny" tokens;
> (2) Write a Solr request handler, which would peel out the special headers 
> that LCF's mod_authz_annotate module puts into the request, and put those 
> into a Solr request object;
> (3) Write a Solr search component, which pulls out the access tokens from the 
> Solr request object, and effectively wraps all incoming queries with the 
> appropriate clauses that limit the results returned according to the 
> appropriate "allow" and "deny" metadata matches.
> 
> Some questions:
> 
> (a) Is this the right approach (bearing in mind that the LCF security model 
> is pretty deeply ingrained in LCF at this time, and is thus not subject to 
> significant changes);
> (b) Where should all of this live?  Should it be a component of Solr, or a 
> component of LCF?
> (c) The access tokens used by LCF are arbitrary strings, which are usually 
> alphanumeric, but do contain certain punctuation. Would this cause a problem?
> 
> Thanks,
> Karl



LCF security with Solr

2010-04-06 Thread Karl Wright

Hi,

This post pertains to the integration between Lucene Connectors Framework and 
Solr.

I don't know a ton about Solr, but one of the engineers here at MetaCarta has become quite familiar with it.  So, I took some 
time to try and work through one of the outstanding LCF/Solr integration issues, which is how to enforce the LCF security model 
using Solr.


As many may be aware, the LCF model relies on "access tokens" (e.g. active directory SIDs).  There are "allow" tokens, and 
"deny" tokens.  They are currently dropped on the floor when Solr is involved, but they can readily (and most naturally) be 
handed to Solr as metadata when a document is ingested.


Read more about the LCF security model here:

http://cwiki.apache.org/confluence/display/CONNECTORS/Lucene+Connectors+Framework+concepts

My proposal is therefore to do the following:

(1) Choose specific metadata names that LCF will use for "allow" tokens and 
"deny" tokens;
(2) Write a Solr request handler, which would peel out the special headers that LCF's mod_authz_annotate module puts into the 
request, and put those into a Solr request object;
(3) Write a Solr search component, which pulls out the access tokens from the Solr request object, and effectively wraps all 
incoming queries with the appropriate clauses that limit the results returned according to the appropriate "allow" and "deny" 
metadata matches.


Some questions:

(a) Is this the right approach (bearing in mind that the LCF security model is pretty deeply ingrained in LCF at this time, and 
is thus not subject to significant changes);

(b) Where should all of this live?  Should it be a component of Solr, or a 
component of LCF?
(c) The access tokens used by LCF are arbitrary strings, which are usually alphanumeric, but do contain certain punctuation. 
Would this cause a problem?


Thanks,
Karl