Re: How to map the atlassian confluence security model to manifoldcf

Markus Schuch Thu, 30 May 2013 04:20:38 -0700

Hi Karl,

sorry for not being very responsive.
We had a lot to do this week. We created a Confluence plugin that adds an API to
Confluence, which gives us all the permission information we need.


Your proposal to calculate the minimal user/group list (use groups where
possible, intersect user lists where needed) sounds promising to me.
But my colleague is worried that this solution will not scale well in Solr and
that we will have to deal with very long user lists (we are implementing this
for a company of ~300,000 people). At the moment we don't know how long the
biggest user list will be.

So the next step is to examine the content and the permissions after our admins
have installed the brand-new plugin. Once we have an overview of how our admins
work with page permissions and how big our groups and the resulting intersected
user lists are, we will decide which way to go.

I'll keep you updated.

Markus

On 30.05.2013 12:17, Karl Wright wrote:
> Hi Markus,
> 
> Have you had any luck with this?
> 
> Karl
> 
> 
> 
> On Sun, May 26, 2013 at 9:32 AM, Karl Wright <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     Hi Markus,
> 
>     The usual way these things map is that there is an API call that gets a
>     list of groups and users that can see the resource, and *maybe* there's a
>     list of groups and users that are prohibited from seeing the resource.
>     These user ids and group ids get used as access tokens.  The semantics of
>     the ManifoldCF access tokens are that prohibitions supersede allowances.
>     The authority service then simply returns the user id and a list of
>     group ids to which the user belongs, provided such functionality exists in
>     the API.
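A minimal Python sketch of these token semantics, assuming the user's tokens are exactly the user id plus the group ids returned by the authority service, and that an empty allow list means the document is unrestricted. Both are assumptions for illustration, and `is_visible` is a hypothetical helper, not a ManifoldCF API:

```python
from typing import AbstractSet


def is_visible(user_tokens: AbstractSet[str],
               allow: AbstractSet[str],
               deny: AbstractSet[str]) -> bool:
    """Hypothetical check: prohibitions supersede allowances."""
    if user_tokens & deny:
        # Any matching prohibition excludes the user, whatever the allow list says.
        return False
    # Assumption: an empty allow list means "no restriction".
    return not allow or bool(user_tokens & allow)


# Tokens an authority might return for one user: user id plus group ids.
alice = {"alice", "G1", "G2"}
print(is_visible(alice, allow={"G2"}, deny=set()))   # True
print(is_visible(alice, allow={"G2"}, deny={"G1"}))  # False
```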
> 
>     In the case of Atlassian, where parents have both prohibition lists and
>     allowance lists, it is usually the case that the prohibition lists can
>     simply be unioned when they are flattened.  Being a member of any
>     prohibited group in the hierarchy is sufficient to exclude a user from
>     seeing the resource.  For allowance lists, however, it is not possible to
>     merge the lists in a simple way, since, as you point out, you are trying
>     to capture an "AND" relationship.  To make this concrete, say you have
>     three objects, A->B->C, and let's say P(A) is the allow list for A, P(B)
>     for B, etc.  Then you want
>     "user_in(P(A)) AND user_in(P(B)) AND user_in(P(C))".
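The flattening rule above can be sketched in Python as follows, using a hypothetical data layout with one allow set and one deny set per object on the path. Prohibitions from every object are unioned, while the allow lists are checked level by level with AND (an empty level is assumed to be unrestricted). The last line shows that simply unioning the allow lists as well would wrongly admit a user who matches only P(A) and P(B):

```python
from typing import List, Set


def visible_through_hierarchy(user_tokens: Set[str],
                              allow_chain: List[Set[str]],
                              deny_chain: List[Set[str]]) -> bool:
    """allow_chain[i] is P(.) of the i-th object on the path A -> B -> C and
    deny_chain[i] is its prohibition list (hypothetical layout)."""
    # Prohibitions flatten by union: one denied token anywhere excludes the user.
    if user_tokens & set().union(*deny_chain):
        return False
    # Allowances must hold at every level (AND); an empty level is unrestricted.
    return all(not allowed or bool(user_tokens & allowed)
               for allowed in allow_chain)


allow_chain = [{"G1", "G2"}, {"G2", "G3"}, {"G4"}]   # P(A), P(B), P(C)
deny_chain = [set(), set(), set()]
print(visible_through_hierarchy({"u1", "G2", "G4"}, allow_chain, deny_chain))  # True
print(visible_through_hierarchy({"u2", "G2"}, allow_chain, deny_chain))        # False
# Naively unioning the allow lists would let u2 through:
print(bool({"u2", "G2"} & set().union(*allow_chain)))                          # True
```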
> 
>     I agree that the only viable way to flatten this is to create an access
>     token for every combination of group permissions you are likely to see.
>     So if there were the groups G1, G2, G3, G4 and G5, there would have to be
>     access tokens for "G1 AND G2", "G2 AND G3", "G1 AND G2 AND G3", etc.  The
>     authority service would then be stuck returning a combinatorially large
>     number of access tokens, and that would not do at all.
> 
>     An alternative is to try to find a way to implement the AND relationship
>     between access tokens natively.  Doing it this way requires an open-ended
>     and potentially combinatorially large number of index fields; you'd need
>     one such field per page, it seems to me.  In theory Solr has a way of
>     creating N fields at index time (dynamic fields), where you just use a
>     special field prefix, and the field is created.  But there are two
>     problems with this.  First, at query time, the Lucene query the Solr
>     plugin would need to build would contain a clause for every page in
>     Atlassian.  That's not going to work.  Second, we'd need a default value
>     for access tokens for all pages in Atlassian for every document indexed,
>     and I don't think that's configurable in Solr either.
> 
>     Another alternative is to post-filter results.  This would require
>     significant support in ManifoldCF, especially in the authority connector,
>     but it could be added without too much trouble.  The downside is that
>     there are going to be cases where one would need to go through a lot of
>     results to find the few that one is allowed to see.  I'm willing to do
>     this, though, if there are no better alternatives.
> 
>     But there's one more possibility, which is worth thinking about.
>     Specifically, try the approach of actually calculating the minimal
>     user/group list for the document at indexing time.  So the access tokens
>     are group ids and user ids, and the connector logic actually calculates
>     the minimal intersection of P(A), P(B), and P(C) in the example above.
> 
>     Example 1:
>     P(A) was G1 or G2
>     P(B) was G2 or G3
>     P(C) was G4
> 
>     ...then the logic would explicitly find all users who match ALL of those
>     criteria, which would mean that the access token list for the document
>     would be a list of individual user ids in this case, not groups:
>     specifically the user ids of those users that belong to G4 and also to
>     either G2 or both G1 and G3.
> 
>     Example 2:
>     P(A) was G1 or G2 or G3
>     P(B) was G2 or G3
>     P(C) was G3
> 
>     ...then the logic would return just the group id for G3, since every
>     member of G3 already satisfies all three allow lists.
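The index-time calculation in the two examples can be sketched in Python. The group-to-members map, the function name, and the shortcut rule (if one level's groups also appear at every other level, those groups alone are the answer) are illustrative assumptions rather than a complete minimization; otherwise the sketch falls back to explicit user ids. Every level is assumed to carry a non-empty allow list:

```python
from typing import Dict, List, Set


def minimal_tokens(allow_chain: List[Set[str]],
                   members: Dict[str, Set[str]]) -> Set[str]:
    """Access tokens for a page that requires a match at every level.

    allow_chain holds the group ids of P(A), P(B), P(C), ...;
    members maps a group id to its user ids (hypothetical input)."""
    if not allow_chain or not all(allow_chain):
        raise ValueError("every level needs a non-empty allow list")
    # Shortcut: a level whose groups appear at every other level is already
    # the exact answer, because each of its members satisfies all levels.
    for candidate in allow_chain:
        if all(candidate <= other for other in allow_chain):
            return set(candidate)

    # Fallback: explicit user ids of people matching at least one group per level.
    def users_of(groups: Set[str]) -> Set[str]:
        return set().union(*(members.get(g, set()) for g in groups))

    allowed = users_of(allow_chain[0])
    for groups in allow_chain[1:]:
        allowed &= users_of(groups)
    return allowed


members = {"G1": {"ann", "carl"}, "G2": {"bea"},
           "G3": {"carl", "dan"}, "G4": {"bea", "carl", "dan"}}
example_1 = [{"G1", "G2"}, {"G2", "G3"}, {"G4"}]
example_2 = [{"G1", "G2", "G3"}, {"G2", "G3"}, {"G3"}]
print(sorted(minimal_tokens(example_1, members)))  # ['bea', 'carl']
print(sorted(minimal_tokens(example_2, members)))  # ['G3']
```

Note that carl ends up in Example 1's list through G1, G3 and G4 without being in G2, so the user list can be larger than the members of G2 AND G4 alone. In a real index, group ids and user ids would probably need distinct prefixes so the two kinds of token cannot collide.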
> 
>     The only problem with this approach that I can see is that if the sysadmin
>     structures things like Example 1, the only way a user would be rendered
>     unable to see such a document would be via reindexing.  Changing the
>     user's group membership alone would not be sufficient in that case.
>     However, I strongly suspect that real Atlassian sysadmins do things more
>     like Example 2 than Example 1.  What do you think?
> 
>     Karl
> 
> 
> 
>     On Sat, May 25, 2013 at 8:20 PM, Markus Schuch <[email protected]
>     <mailto:[email protected]>> wrote:
> 
>         Hi Karl,
> 
>         no need to apologize... a response in less than 24 hours to an open
>         source project's mailing list entry is perfect for me ;) - so thank you
>         for the quick response and thank you for sacrificing your valuable
>         holiday weekend time.
> 
>         The Confluence API returns user and/or group names when you request
>         permissions for a page.
> 
>         see:
>         
> https://developer.atlassian.com/display/CONFDEV/Remote+Confluence+Methods#RemoteConfluenceMethods-Permissions.1
>         
> https://developer.atlassian.com/display/CONFDEV/Remote+Confluence+Data+Objects#RemoteConfluenceDataObjects-contentpermissionContentPermission
> 
>         But the API methods for retrieving page permissions do not respect
>         permissions inherited from parent pages, which is a pity (see
>         https://jira.atlassian.com/browse/CONF-14965).
> 
>         To work around this problem we will have to write a Confluence plugin
>         that can give us the effective permissions for a page.
>         We looked into that and we think it is possible.
>         In theory the effective page permissions retrieved by our plugin would
>         be a list of group names and/or user names. The group names have to be
>         ANDed to respect permissions inherited from parent pages. We can
>         concatenate all needed combinations of group and user names into single
>         access tokens to create a "flattened" version of the permission
>         hierarchy. So far so good...
> 
>         But another problem arises:
>         The authority connector would also have to return access tokens that
>         are compatible with the flattened permission hierarchy, and therefore we
>         must build all possible combinations of the user's group names. If our
>         math is correct, there will be (2^n)-1 access tokens for a user (where n
>         is the number of distinct groups the user is a member of, i.e. every
>         non-empty subset of those groups). Additionally there will be more
>         combinations with the user name. This will most probably not perform
>         well for users with many group memberships.
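A short Python sketch of the flattened scheme and of the (2^n)-1 count. The `+` separator and the function names are hypothetical; group names are assumed not to contain `+`, and every level is assumed to have a non-empty allow list. The document side gets one token per way of picking a group from each level, while the user side needs a token for every non-empty subset of the user's groups:

```python
from itertools import combinations, product
from typing import Iterable, List, Set


def and_token(groups: Iterable[str]) -> str:
    """Encode an ANDed set of groups as one order-independent token."""
    return "+".join(sorted(set(groups)))


def document_tokens(allow_chain: List[Set[str]]) -> Set[str]:
    """One token per choice of one allowed group at each level."""
    return {and_token(pick) for pick in product(*allow_chain)}


def user_tokens(groups: Set[str]) -> Set[str]:
    """Every non-empty subset of the user's groups: (2^n) - 1 tokens."""
    ordered = sorted(groups)
    return {and_token(combo)
            for size in range(1, len(ordered) + 1)
            for combo in combinations(ordered, size)}


page = [{"G1", "G2"}, {"G2", "G3"}, {"G4"}]
print(sorted(document_tokens(page)))
# ['G1+G2+G4', 'G1+G3+G4', 'G2+G3+G4', 'G2+G4']
print(user_tokens({"G2", "G4"}) & document_tokens(page))  # {'G2+G4'}
print([len(user_tokens({f"G{i}" for i in range(n)})) for n in (1, 5, 10)])
# [1, 31, 1023]
```

With 20 group memberships the same user would already need 2^20 - 1 = 1,048,575 tokens, which is the scaling problem described above.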
> 
>         I see these 2 options:
>         - We could implement folder-level access tokens for a constant number X
>         of folder levels.
>         So the output connector would need to reject documents with more than
>         X folder levels.
>         Maybe there is a built-in limit on page levels in Confluence... if not,
>         then this solution is not ideal.
>         - Start to think about post-filtering...
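The first option can be sketched in Python as follows, assuming X = 4 levels, per-level field names `allow_level_0` to `allow_level_3`, and a placeholder token `__unrestricted__` for levels without a restriction (all hypothetical, not an existing ManifoldCF or Solr schema). Within a level any matching token is enough; across levels every level must match, so the query needs only X clauses regardless of how many pages exist:

```python
from typing import Dict, List, Set

MAX_LEVELS = 4                     # hypothetical constant X
UNRESTRICTED = "__unrestricted__"  # hypothetical placeholder token


def level_fields(allow_chain: List[Set[str]]) -> Dict[str, Set[str]]:
    """Index time: one field per page level; too-deep pages are rejected."""
    if len(allow_chain) > MAX_LEVELS:
        raise ValueError(f"page depth {len(allow_chain)} exceeds {MAX_LEVELS}")
    fields = {}
    for level in range(MAX_LEVELS):
        tokens = allow_chain[level] if level < len(allow_chain) else set()
        fields[f"allow_level_{level}"] = set(tokens) or {UNRESTRICTED}
    return fields


def matches(fields: Dict[str, Set[str]], user_tokens: Set[str]) -> bool:
    """Query time: OR within a level, AND across levels."""
    tokens = user_tokens | {UNRESTRICTED}
    return all(fields[f"allow_level_{level}"] & tokens
               for level in range(MAX_LEVELS))


def filter_query(user_tokens: Set[str]) -> str:
    """Same condition as a Lucene-style filter string, one clause per level.
    Assumes tokens contain no characters that need query-syntax escaping."""
    ored = " OR ".join(sorted(user_tokens | {UNRESTRICTED}))
    return " AND ".join(f"allow_level_{level}:({ored})"
                        for level in range(MAX_LEVELS))


doc = level_fields([{"G1", "G2"}, {"G2", "G3"}, {"G4"}])
print(matches(doc, {"bea", "G2", "G4"}))  # True
print(matches(doc, {"ann", "G1"}))        # False: no G2 or G3 at level 1
print(filter_query({"G2"}))
```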
> 
>         Regards,
>         Markus
> 
>         -----------------------------------------
> 
>         Sent: Saturday, 25 May 2013 at 16:54
>         From: "Karl Wright" <[email protected] <mailto:[email protected]>>
>         To: "[email protected] <mailto:[email protected]>"
>         <[email protected] <mailto:[email protected]>>
>         Subject: Re: How to map the atlassian confluence security model to
>         manifoldcf
> 
>         Hi Markus,
> 
>         Sorry for the slow response; it is a holiday weekend in the States, and
>         that has managed to impact me to some degree.
>         Anyhow, I've looked at the doc on Atlassian security, and I have some
>         questions.  First, when you call the Atlassian API and request security
>         information for a document, in what form does it come back?  If it comes
>         back as a minimal list of groups and users which can see the document,
>         then you probably just want the access tokens for this connector to be
>         group names/ids and user names/ids.  If it is more complicated, and
>         basically you have to ascend the hierarchy either explicitly or
>         implicitly, then we'll have to work a bit harder.  Either we'll have to
>         find a flat mapping of folders to access tokens, or we'll have to look
>         at extending the framework to handle more stuff.
>          
>         As for folder-level security, the reason it is deprecated at the
>         moment is that it is very challenging to implement properly in a
>         standard search engine with a fixed schema, since there are N possible
>         folder parents, where N is determined by an individual document.
>         Furthermore, the model is not really applicable to the case where there
>         is a hierarchy that cannot be flattened.  But, depending on the answer
>         to my question above, if needed we can try to come up with a workable
>         folder implementation, and extend the Solr connector and plugins as
>         well.
>          
>         Karl
>          
>          
>          
>         On Fri, May 24, 2013 at 6:57 PM, Markus Schuch <[email protected]
>         <mailto:[email protected]>> wrote:
>
>         Hi,
> 
>         we are currently writing a repository connector for Confluence.
>         We are using the Solr output connection with Solr 4.x.
>         Seeding, versioning and processing already work, and now we have to
>         face security.
> 
>         Compared to the repositories already supported by MCF, Confluence seems
>         to have a different security model.
> 
>         There are "Space" permissions for a whole wiki space, and these can
>         easily be mapped as shareAllowTokens, but there are also page
>         restrictions. Page restrictions are attached to each page (page =
>         document), and page restrictions are inherited.
> 
>         See "Example of Child Page Restrictions" in the Confluence Doc:
>         https://confluence.atlassian.com/display/DOC/Page+Restrictions
> 
>         The inheritance of page restrictions makes things difficult.
>         If we are correct, then it is not sufficient to add the page
>         restrictions as document-level access tokens, because the query-time
>         filtering handles the user's access tokens (e.g. group memberships) as a
>         disjunction. Instead we probably need a hierarchical, folder-based
>         structure of access tokens to map the inheritance of the page
>         restrictions correctly.
>         The current Solr SearchComponent does not support folder-level access
>         tokens, and the book (ManifoldCF in Action) says that these kinds of
>         tokens are considered deprecated.
>         To cut a long story short... we are stuck at the moment.
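The mismatch described here can be reproduced with a simplified Python model of share-level and document-level tokens. The field names are modeled loosely on ManifoldCF's conventions and are illustrative only, and an empty allow list is assumed to mean unrestricted. Each level is an OR over the user's tokens and the two levels are ANDed, so the space permission fits the share level, but a parent restriction (G1) and the page's own restriction (G2) must share the single document level, where they become an OR:

```python
from typing import Dict, Set

Doc = Dict[str, Set[str]]


def level_ok(doc: Doc, level: str, user_tokens: Set[str]) -> bool:
    """OR semantics within one level; denials win; empty allow = unrestricted."""
    if user_tokens & doc.get(f"deny_token_{level}", set()):
        return False
    allow = doc.get(f"allow_token_{level}", set())
    return not allow or bool(user_tokens & allow)


def can_see(doc: Doc, user_tokens: Set[str]) -> bool:
    """AND across the two fixed levels: share, then document."""
    return all(level_ok(doc, level, user_tokens) for level in ("share", "document"))


# Space permission -> share level; parent (G1) and page (G2) restrictions
# both flattened into the document level.
child_page = {"allow_token_share": {"space-users"},
              "allow_token_document": {"G1", "G2"}}
only_parent_group = {"u1", "space-users", "G1"}
print(can_see(child_page, only_parent_group))  # True, though Confluence would require G1 AND G2
```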
> 
>         Our questions:
>         Has anyone already managed to map Confluence security to MCF/Solr?
>         Or does somebody have an idea how a Confluence-like security model can
>         be mapped to MCF/Solr?
> 
>         Thanks in advance
>         Markus
> 
> 
>
