Re: How to map the atlassian confluence security model to manifoldcf

Karl Wright Sun, 26 May 2013 06:32:53 -0700

Hi Markus,

The usual way these things map is that there is an API call that gets a
list of groups and users that can see the resource, and *maybe* there's a
list of groups and users that are prohibited from seeing the resource.
These user ids and group ids get used as access tokens.  The semantics of
the ManifoldCF access tokens are that prohibitions supersede allowances.
The authority service then simply returns the user id and a list of
group ids to which the user belongs, provided such functionality exists in
the API.
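
A minimal sketch of those token semantics, assuming tokens are plain
strings and ignoring any special-case tokens the real framework may add.
The function name can_see and the sample ids are hypothetical, not part of
ManifoldCF:

```python
def can_see(user_tokens, allow_tokens, deny_tokens):
    """Return True if the user may see the document.

    A single matching deny token hides the document, even when an
    allow token also matches (prohibitions supersede allowances).
    """
    user_tokens = set(user_tokens)
    if user_tokens & set(deny_tokens):
        return False
    return bool(user_tokens & set(allow_tokens))


# What an authority service might return: the user id plus group ids.
alice = {"alice", "G1", "G2"}

print(can_see(alice, allow_tokens={"G1"}, deny_tokens=set()))   # True
print(can_see(alice, allow_tokens={"G1"}, deny_tokens={"G2"}))  # False
```

Note that the allow tokens are checked as a disjunction: one match is
enough.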

In the case of Atlassian, where parents have both prohibition lists as well
as allowance lists, it is usually the case that the prohibition lists can
simply be unioned when they are flattened.  Being a member of any
prohibited group in the hierarchy is sufficient to exclude a user from
seeing the resource.  For allowance lists, however, it is not possible to
merge the lists in a simple way, since as you point out you are trying to
capture an "AND" relationship.  To make this concrete, say you have three
objects - A->B->C, and let's say P(A) is the allow list for A, P(B) for B,
etc.  Then, you want
"user_in(P(A)) AND user_in(P(B)) AND user_in(P(C))".

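The check described above can be sketched as follows. This is only an
illustration: the dictionary layout, the function names and the sample
groups are assumptions, and every level is assumed to carry a non-empty
allow list.

```python
def flatten_deny(chain):
    """Union the deny lists of every object in the hierarchy."""
    denied = set()
    for obj in chain:
        denied |= obj["deny"]
    return denied


def user_allowed(user_tokens, chain):
    """Deny lists are unioned; allow lists are ANDed level by level."""
    if user_tokens & flatten_deny(chain):
        return False
    # user_in(P(A)) AND user_in(P(B)) AND user_in(P(C))
    return all(user_tokens & obj["allow"] for obj in chain)


# A -> B -> C, each with its own allow list P(.) and deny list.
chain = [
    {"allow": {"G1", "G2"}, "deny": {"G9"}},  # A
    {"allow": {"G2", "G3"}, "deny": set()},   # B
    {"allow": {"G4"}, "deny": set()},         # C
]

print(user_allowed({"G2", "G4"}, chain))        # True
print(user_allowed({"G1", "G4"}, chain))        # False, fails at B
print(user_allowed({"G2", "G4", "G9"}, chain))  # False, denied at A
```
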
I agree that the only viable way to flatten this is to create an access
token for every combination of group permissions you are likely to see.
So if there were the groups G1 G2 G3 G4 and G5, there would have to be
access tokens for "G1 AND G2", "G2 AND G3", "G1 AND G2 AND G3", etc.  The
authority service would then be stuck returning a combinatorially large
number of access tokens, and that would not do at all.
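
To put a number on that, here is a small sketch that enumerates every
non-empty combination of a user's groups. The " AND " token format and the
function name are made up for illustration:

```python
from itertools import combinations


def combination_tokens(groups):
    """Build one token per non-empty subset of the user's groups."""
    groups = sorted(groups)
    return [
        " AND ".join(combo)
        for size in range(1, len(groups) + 1)
        for combo in combinations(groups, size)
    ]


tokens = combination_tokens(["G1", "G2", "G3"])
print(len(tokens))  # 7, i.e. 2**3 - 1
print(len(combination_tokens(["G1", "G2", "G3", "G4", "G5"])))  # 31
```

The count is 2^n - 1 for n groups, so a user in 20 groups would already
need more than a million tokens.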

An alternative is to try and find a way to implement the AND relationship
between access tokens natively.  To do it this way requires an open-ended
and potentially combinatorially large number of index fields.  You'd need
one such field per page, seems to me.  In theory Solr has a way of
creating N fields at index time, where you just use a special field
prefix, and the field is created.  But there are two problems with this.
First, at query time, the Lucene query the Solr plugin would need to build
would contain a clause for every page in Atlassian.  That's not going to
work.  Second, we'd need a default value for access tokens for all pages
in Atlassian for every document indexed, and I don't think that's
configurable in Solr either.

Another alternative is to post-filter results.  This will require
significant support in ManifoldCF, especially in the authority connector,
but it could be added with not too much trouble.  The downside is that
there are going to be cases where one would need to go through a lot of
results to find the few that one is allowed to see.  I'm willing to do
this, though, if there are no better alternatives.
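
The downside can be seen in a toy sketch of post-filtering. Nothing here
is ManifoldCF or Solr API; post_filter and the visibility callback are
hypothetical stand-ins:

```python
def post_filter(results, is_visible, wanted):
    """Walk ranked results until `wanted` visible ones are found.

    Returns the visible results and how many had to be examined.
    """
    visible = []
    examined = 0
    for doc in results:
        examined += 1
        if is_visible(doc):
            visible.append(doc)
            if len(visible) == wanted:
                break
    return visible, examined


# Toy data: the user may see only one result in every hundred.
hits = range(1000)
page, examined = post_filter(hits, lambda d: d % 100 == 99, wanted=3)

print(page)      # [99, 199, 299]
print(examined)  # 300
```

Three visible hits cost 300 permission checks in this toy case, and the
ratio gets worse the more restricted the user is.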

But there's one more possibility, which is worth thinking about.
Specifically, try the approach of actually calculating the minimal
user/group list for the document, at indexing time.  So the access tokens
are group ids and user ids, and the connector logic actually calculates
the minimal intersection of P(A), P(B), and P(C) in the example above.

Example 1:
P(A) was G1 or G2
P(B) was G2 or G3
P(C) was G4

...then the logic would explicitly find all users which matched ALL of
those criteria - which would mean that the access token list for the
document would be a list of individual user ids in this case, not groups -
specifically the list of user ids of those users that belong to G2 AND G4.

Example 2:
P(A) was G1 or G2 or G3
P(B) was G2 or G3
P(C) was G3

...then the logic would return just the group id for G3.
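
The two examples above can be sketched as follows. This is one possible
reading of "minimal intersection", not a worked-out connector design: the
function name, the membership table and the user names are all invented,
and at least one allow list is assumed to exist.

```python
def minimal_tokens(allow_lists, members):
    """Return (group_tokens, user_tokens) for one document.

    allow_lists: one set of group ids per level, e.g. P(A), P(B), P(C).
    members:     group id -> set of user ids, as known at indexing time.
    """
    # Groups present at every level are safe to emit as group tokens.
    common_groups = set.intersection(*allow_lists)
    # Users who satisfy every level through some group membership.
    permitted = set.intersection(
        *[set().union(*(members[g] for g in level)) for level in allow_lists]
    )
    # Users not already covered by a common group need their own token.
    covered = set().union(*(members[g] for g in common_groups))
    return common_groups, permitted - covered


members = {
    "G1": {"ann"},
    "G2": {"bob", "cat"},
    "G3": {"cat", "dan"},
    "G4": {"bob", "eve"},
}

# Example 1: no group is common to all levels, so user ids are emitted.
print(minimal_tokens([{"G1", "G2"}, {"G2", "G3"}, {"G4"}], members))
# (set(), {'bob'})

# Example 2: G3 appears at every level, so the group id is enough.
print(minimal_tokens([{"G1", "G2", "G3"}, {"G2", "G3"}, {"G3"}], members))
# ({'G3'}, set())
```

In Example 1 the user-level intersection would also pick up anyone who is
in G1, G3 and G4 at once, not only members of both G2 and G4; the sample
data simply has no such user.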

The only problem with this approach that I can see is that if the sysadmin
structures things like Example 1, the only way a user would be rendered
unable to see such a document would be via reindexing.  Changing the
user's group affinity alone would not be sufficient in that case.
However, I strongly suspect that real Atlassian sysadmins do things more
like Example 2 than Example 1.  What do you think?

Karl

On Sat, May 25, 2013 at 8:20 PM, Markus Schuch <[email protected]> wrote:

> Hi Karl,
>
> no need to apologize... a response in less than 24 hours to an open source
> project's mailing list entry is perfect to me ;) - so thank you for the
> quick response and thank you for sacrificing your valuable holiday weekend
> time.
>
> The confluence API returns user and/or group names when requesting
> permissions for a page.
>
> see:
>
> https://developer.atlassian.com/display/CONFDEV/Remote+Confluence+Methods#RemoteConfluenceMethods-Permissions.1
>
> https://developer.atlassian.com/display/CONFDEV/Remote+Confluence+Data+Objects#RemoteConfluenceDataObjects-contentpermissionContentPermission
>
> But the API methods for retrieving page permissions do not respect
> permissions inherited from parent pages which is very sad. (refer to
> https://jira.atlassian.com/browse/CONF-14965)
>
> To work around this problem we will have to write a confluence plugin that
> can give us the effective permissions for a page.
> We looked into that and we think it is possible.
> In theory the effective page permissions retrieved by our plugin would be
> a list of group names and/or usernames. The groupnames have to be ANDed to
> respect permissions inherited from parent pages. We can concatenate all
> needed combinations of group and user names to single accesstokens to
> create a "flattened" version of the permission hierarchy. So far so good...
>
> But another problem arises:
> The authority connector would also have to return accesstokens that are
> compatible with the flattened permission hierarchy and therefore we must build
> all possible permutations of the user's groupnames. If our math is correct,
> there will be (2^n)-1 access tokens for a user (where n is the number of
> distinct groups the user is member of). Additionally there will be more
> combinations with the username. This will most probably not perform well
> for users with many group memberships.
>
> I see these 2 options:
> - We could implement folder level accesstokens for a constant number X of
> folder levels.
> So the outputconnector would need to reject documents with a number of
> folder levels greater than X.
> Maybe there is a built-in limit of page levels in confluence... if not,
> then this solution is not ideal.
> - Start to think about post filtering...
>
> Regards,
> Markus
>
> -----------------------------------------
>
> Sent: Saturday, 25 May 2013 at 16:54
> From: "Karl Wright" <[email protected]>
> To: "[email protected]" <[email protected]>
> Subject: Re: How to map the atlassian confluence security model to
> manifoldcf
>
> Hi Markus,
>
> Sorry for the slow response - it is a holiday weekend in the States, and
> that has managed to impact me to some degree.
>  Anyhow, I've looked at the doc on Atlassian security, and I have some
> questions.  First, when you call the Atlassian API, and request security
> information for a document, in what form does it come back?  If it comes
> back as a minimal list of groups and users which can see the document, then
> you probably just want the access tokens for this connector to be group
> names/ids and user names/ids.  If it is more complicated, and basically you
> have to ascend the hierarchy either explicitly or implicitly, then we'll
> have to work a bit harder.  Either we'll have to find a flat mapping of
> folders to access tokens, or we'll have to look at extending the framework
> to handle more stuff.
>
> As far as the folder-level security, the reason it is deprecated at the
> moment is because it is very challenging to implement properly in a
> standard search engine with a fixed schema, since there are N possible
> folder parents, where N is determined by an individual document.
> Furthermore, the model is not really applicable to the case where there is
> a hierarchy that cannot be flattened. But, depending on what the answer is
> to my question above, if needed we can try to come up with a workable
> folder implementation, and extend the Solr connector and plugins as well.
>
> Karl
>
>
>
> On Fri, May 24, 2013 at 6:57 PM, Markus Schuch <[email protected]> wrote:
>
> Hi,
>
> we are currently writing a repository connector for confluence.
> We are using the solr output connection on Solr 4.x.
> Seeding, versioning, processing works already and now we have to face
> security.
>
> Compared to the repositories already supported by mcf, confluence seems to
> have a different security model.
>
> There are "Space" permissions for a whole wiki space and these can easily
> be mapped as shareAllowTokens but there are also page restrictions. Page
> restrictions are attached to each page (page = document) and page
> restrictions are inherited.
>
> See "Example of Child Page Restrictions" in the Confluence Doc:
>
> https://confluence.atlassian.com/display/DOC/Page+Restrictions
>
> The inheritance of page restrictions makes things difficult.
> If we are correct, then it is not sufficient to add the page restrictions
> as document level access tokens, because the query time filtering handles
> the user's access tokens (e.g. group memberships) as a disjunction. Instead
> we probably need a hierarchical, folder based structure of access tokens to
> map the inheritance of the page restrictions correctly.
> The current Solr SearchComponent does not support folder level access
> tokens and the book (mcf in action) says that these kinds of tokens are
> considered deprecated.
> To cut a long story short... we are stuck at the moment.
>
> Our questions:
> Did anyone already manage to map confluence security to mcf/solr?
> Or does somebody have an idea how a confluence-like security model can be
> mapped to mcf/solr?
>
> Thanks in advance
> Markus
>
