Re: [DISCUSS] Per-doc access control

2019-02-27 Thread Adam Kocoloski


> On Feb 27, 2019, at 3:47 PM, Michael Fair  wrote:
> 
>> 
>> 
>>> This might be what is already planned (it hasn't sounded like it to me
>>> though).
>>> And I definitely think changing the perspective to make "databases" a
>>> function of the access control system and to make views based on "access
>>> controlled collection results" instead of "databases" would be quite
>>> powerful...
>>> 
>>> Regards,
>>> Mike
>> 
>> Hi Mike, what you’ve described here is very very similar to what Jan is
>> building.
>> 
>> Adam
>> 
>> 
> I read back through the links that Jan posted again; the details I was
> looking for are probably somewhere in the sharding conversation that my
> eyes glazed over on or somewhere in the notes of the roadmap discussion
> which made it a bit hard for me to find just the parts related to this (I
> most likely scrolled through it). ;-)
> 
> Thanks for clarifying for me, and for letting me chime in!
> 
> Mike

Those details are really hard to find — I can only find them because I know 
exactly where to look in the minutes of a meeting that I attended well over a 
year ago :) Probably a good case for an RFC so we have a current pointer to the 
plan.

Adam

Re: [VOTE] Release Apache CouchDB 2.3.1 RC2

2019-02-27 Thread Joan Touzet
Based on discussion with Russell Branca (chewbranca) in IRC, we need to
abort this RC vote as he is effectively voting -1. Here's the full
transcript of our discussion:

  

16:06 <+Wohali> chewbranca: you there? are you seeing these eunit
context setup errors in 2.3.0 as well as the 2.3.1 RC
and master?
16:06 <+Wohali> I don't want to hold up 2.3.1 over something that was a
pre-existing condition, but if it's something that
changed between 2.3.0 and 2.3.1/master, we need to fix
it
16:07 <chewbranca> Wohali: well the fundamental issue right now is test
   suite failures don't fail the build, which IMO should
   be fixed before any further builds
16:08 <chewbranca> I've been using this diff locally, which fails the
   `make eunit` check upon an eunit failure: 
https://gist.github.com/chewbranca/65d2969ac191a5dfaf87172ace18d2ee
16:08 <chewbranca> not sure that's the best approach, but we need
   something like that
16:08 <+Wohali> What I'm asking is: do you think this should block the
release of 2.3.1?
16:08 <+Wohali> By all means PR that to master and let's get shit in
gear
16:08 <+Wohali> I'm trying to work out when this problem started
occurring, though.
16:09 <chewbranca> yes, should definitely block any further releases,
   because unless someone is manually inspecting the
   eunit output, then we could have test failures
   bubbling through
16:11 <chewbranca> in theory this particular issue was introduced 26
   days ago with the change to running individual eunit
   tests: 
https://github.com/apache/couchdb/commit/20bbfbf972ad1f822e2ef1edfb3d47f2cec3f639
16:11 <chewbranca> so this is probably a new thing, but we've definitely
   had issues with eunit over the years
16:12 <chewbranca> Wohali: I can make a quick PR with the diff I pasted
   above and then we should be good to go IMO, but it
   wouldn't hurt to see if there's a more proper way to
   do that in a Makefile than just `|| exit 1`
16:16 <+Wohali> chewbranca: are you 100% sure that context setup
failures mean the tests are actually failing? They seem
to be running and passing even after that. I'm too
unfamiliar to know for sure.
16:17 <+Wohali> chewbranca: that change you linked isn't in 2.3.1.
16:17 <chewbranca> context setup failure means that setting up a series
   of eunit test generators failed and those tests
   aren't being executed
16:17 <+Wohali> ok.
16:18 <chewbranca> those will fail if you do `|| exit 1`, but they
   continue running today because we don't exit on the
   individual eunit runs
16:18 <+Wohali> 2.3.1 has a critical fix for buffer sizes that we need
to get out there. Would you accept me manually reviewing
the output of 2.3.1's test suite to ensure no context
setup failures?
16:18 <+Wohali> then we make this a blocker for 2.4.0?
16:18 <chewbranca> what I linked above is just a diff that I've been
   using locally because I wanted the suite to fail, and
   it works
16:19 <chewbranca> Wohali: IMO let's just add that diff and then if
   folks know a more proper Makefile approach to doing
   that type of thing then they can fix it later
16:19 <+Wohali> to both 2.3.1 and master? And to Makefile.Win I presume?
;) Then we'll have to cancel the current RC and re-spin.
...
16:25 <chewbranca> https://github.com/apache/couchdb/pull/1951
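
The underlying problem in the transcript above is that the exit status of the
individual test runs is dropped, so the overall build still reports success.
A minimal Python sketch of the idea follows; it is not the diff from the gist
or the PR. The `run_suites` helper and the three stand-in commands are
hypothetical, and unlike `|| exit 1` (which stops at the first failure) this
variant runs everything and reports failure at the end.

```python
import subprocess
import sys


def run_suites(commands):
    """Run every command and return 1 if any of them exited non-zero."""
    failed = []
    for cmd in commands:
        # subprocess.run returns a CompletedProcess; returncode is the exit status.
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd)
    for cmd in failed:
        print("FAILED:", " ".join(cmd), file=sys.stderr)
    return 1 if failed else 0


# Hypothetical stand-ins for the individual per-app eunit runs.
suites = [
    [sys.executable, "-c", "print('suite a ok')"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
    [sys.executable, "-c", "print('suite c ok')"],
]
exit_code = run_suites(suites)
print("overall exit code:", exit_code)
```

A real driver would pass `exit_code` to `sys.exit` so that the calling
`make` target fails as well.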

  


- Original Message -
> From: "Dave Cottlehuber" 
> To: dev@couchdb.apache.org
> Sent: Monday, February 25, 2019 6:10:05 AM
> Subject: Re: [VOTE] Release Apache CouchDB 2.3.1 RC2
> 
> On Mon, 25 Feb 2019, at 10:56, Dave Cottlehuber wrote:
> > On Thu, 21 Feb 2019, at 06:27, Jan Lehnardt wrote:
> > 
> > FreeBSD 12.0-RELEASE-p3 amd64 + OTP 21.2.6 custom
> > 
> > - OK sigs and checksums
> > - OK release
> > - fauxton verify is happy
> > - make check fails with the C.UTF-8 issues Joan has mentioned
> > previously
> > 
> > belated +1 from me
> > 
> > BTW the port will be a bit delayed this time as I need to bump OTP
> > version and that usually has a bit of ports tree shakeout. My patch
> > for
> > that is https://reviews.freebsd.org/D18820
> 
> I forgot to mention that the tarball has the annoying -RC2 suffix in
> filenames, which makes the downstream packaging diffs fiddly. I have
> that unfinished PR https://github.com/apache/couchdb/pull/1927
> hopefully to fix that for next time.
> 
> A+
> Dave
> 


Notifications for new repos redirected

2019-02-27 Thread Joan Touzet
Hi there,

As part of ASF's move of all Git repositories to GitHub, some of our
newer repos started having their notifications routed to the wrong
mailing list (dev@ instead of notifications@).

This should be fixed now; see https://issues.apache.org/jira/browse/INFRA-17917

Committers/PMC: When you request new repos for our project, be sure
to let INFRA know that you want notifications all sent to
our notifications@ mailing list, not dev@.

If you forget, or if there isn't a place to specify that anymore
in the self-service desk, file a new JIRA ticket with INFRA and
they will typically turn it around within an hour.

Thanks,
Joan "everything in its place" Touzet


Re: [DISCUSS] Per-doc access control

2019-02-27 Thread Michael Fair
>
>
> > This might be what is already planned (it hasn't sounded like it to me
> > though).
> > And I definitely think changing the perspective to make "databases" a
> > function of the access control system and to make views based on "access
> > controlled collection results" instead of "databases" would be quite
> > powerful...
> >
> > Regards,
> > Mike
>
> Hi Mike, what you’ve described here is very very similar to what Jan is
> building.
>
> Adam
>
>
I read back through the links that Jan posted again; the details I was
looking for are probably somewhere in the sharding conversation that my
eyes glazed over on or somewhere in the notes of the roadmap discussion
which made it a bit hard for me to find just the parts related to this (I
most likely scrolled through it). ;-)

Thanks for clarifying for me, and for letting me chime in!

Mike


Re: [DISCUSS] Per-doc access control

2019-02-27 Thread Robert Newson
Not to pile on here but "Once you read your available docs into your DB, you
can grant yourself write privileges to every document there." does seem to
miss the mark.

All replication is doing is making a copy of data you have access to. You can
modify your own copy as you please; it doesn't violate the security of the
origin server. If that server allowed you to replicate those changes back,
sure, that would be a problem, but that is wholly within the origin server's
control.

The notion that access controls could meaningfully propagate to servers outside
of the original server's control seems very much like DRM with all its
doesn't-work-without-litigation problems.

-- 
  Robert Samuel Newson
  rnew...@apache.org

On Wed, 27 Feb 2019, at 20:14, Adam Kocoloski wrote:
> 
> > On Feb 27, 2019, at 3:01 PM, Michael Fair  wrote:
> > 
> > On Wed, Feb 27, 2019 at 10:36 AM Adam Kocoloski  wrote:
> > 
> >> Hi Mike, just picking out this one snippet:
> >> 
> >>> On Feb 27, 2019, at 12:16 PM, Michael Fair 
> >> wrote:
> >>> 
> >>> If I get a replica of a database from your server, what, if anything,
> >>> prevents me from granting myself access controls to the entire database?
> >> 
> >> Replication is a client of the API like everyone else and cannot bypass
> >> the access controls on the source. You can only create a replication which
> >> has at most access to all the documents in the database that you can access
> >> yourself; i.e. a replication of a database with per-doc access controls
> >> enabled may only transfer a subset of the documents in the database.
> >> 
> > 
> > Right, but generally speaking READ access is very prevalent, WRITE access
> > is much more restrictive.
> > 
> > Or are these "access controls" really just the binary "exposed"/"not
> > exposed" sort such that had these documents simply gone into a different
> > database in the first place, and the view indexes tracked in a "per
> > database" way then everything would work as expected?
> > 
> > In fact, maybe "the same doc in multiple user centered databases" is not
> > such a bad idea/model to consider.
> > 
> > Currently, I see this idea in Couch that documents belong to a particular
> > "named document collection" called a database.  A view is really just
> > another kind of "named document collection".
> > 
> > What if instead of a "database", there was simply the single universal
> > document store and a "database" is then more like a "view" that documents
> > became a member of just like a user's access controlled "slice" would be?
> > (i.e. the dbname becomes more like a document access filter and less a
> > grouping and storage boundary)
> > 
> > Users then have a "per user scope/database" which they always access
> > through when they connect to the server.
> > The top level list of "_databases" the user sees is actually just a list of
> > those "named collections" that the user can access.  To the client side
> > APIs nothing really changes for them, what changes is how Couch internally
> > organizes itself to make "databases" a construct of the access control
> > feature and no longer a structural primitive.
> > 
> > Documents now carry the information about what "named collections" they
> > belong to in the same way as what entities are authorized to access them,
> > and "databases" basically become a "view" grouping by the "_collections"
> > array field member values on each document.
> > 
> > 
> > Many people have requested the option to expose the same document to
> > multiple databases and then use database access rights to enforce per user
> > access and share documents between users.  I've even wanted this feature
> > from time to time.  If views have to be modified to hide unauthorized
> > documents anyway, this seems a perfect opportunity to address this at the
> > same time...
> > 
> > 
> > .
> > In this model, a "document collection", whether it be a database, a user, a
> > user group, a role, or a role group becomes the entity that the access
> > control documents are "authorizing".
> > 
> > In this model, a "view" becomes an identifiable "collection" that can be
> > treated like a database.
> > That would make creating views on top of other views something
> > much easier to define/express.
> > 
> > 
> > I'm envisioning that to implement successful "access control" based views,
> > each "authorized entity" would have to maintain its own view index.
> > Otherwise it's really hard for me to imagine how a reduce function can
> > cache any precomputed results because it has no way of knowing that all the
> > underlying documents used in the reduce were authorized for the accessing
> > entity...  All reduce functions would have to run in real time to ensure they
> > are only using authorized documents...
> > 
> > If there were a separate index for each authorized entity though (and
> > especially if these entities were able to share index buckets where they
> > had all the same document access in common), then the reduce 

Re: [DISCUSS] Per-doc access control

2019-02-27 Thread Adam Kocoloski


> On Feb 27, 2019, at 3:01 PM, Michael Fair  wrote:
> 
> On Wed, Feb 27, 2019 at 10:36 AM Adam Kocoloski  wrote:
> 
>> Hi Mike, just picking out this one snippet:
>> 
>>> On Feb 27, 2019, at 12:16 PM, Michael Fair 
>> wrote:
>>> 
>>> If I get a replica of a database from your server, what, if anything,
>>> prevents me from granting myself access controls to the entire database?
>> 
>> Replication is a client of the API like everyone else and cannot bypass
>> the access controls on the source. You can only create a replication which
>> has at most access to all the documents in the database that you can access
>> yourself; i.e. a replication of a database with per-doc access controls
>> enabled may only transfer a subset of the documents in the database.
>> 
> 
> Right, but generally speaking READ access is very prevalent, WRITE access
> is much more restrictive.
> 
> Or are these "access controls" really just the binary "exposed"/"not
> exposed" sort such that had these documents simply gone into a different
> database in the first place, and the view indexes tracked in a "per
> database" way then everything would work as expected?
> 
> In fact, maybe "the same doc in multiple user centered databases" is not
> such a bad idea/model to consider.
> 
> Currently, I see this idea in Couch that documents belong to a particular
> "named document collection" called a database.  A view is really just
> another kind of "named document collection".
> 
> What if instead of a "database", there was simply the single universal
> document store and a "database" is then more like a "view" that documents
> became a member of just like a user's access controlled "slice" would be?
> (i.e. the dbname becomes more like a document access filter and less a
> grouping and storage boundary)
> 
> Users then have a "per user scope/database" which they always access
> through when they connect to the server.
> The top level list of "_databases" the user sees is actually just a list of
> those "named collections" that the user can access.  To the client side
> APIs nothing really changes for them, what changes is how Couch internally
> organizes itself to make "databases" a construct of the access control
> feature and no longer a structural primitive.
> 
> Documents now carry the information about what "named collections" they
> belong to in the same way as what entities are authorized to access them,
> and "databases" basically become a "view" grouping by the "_collections"
> array field member values on each document.
> 
> 
> Many people have requested the option to expose the same document to
> multiple databases and then use database access rights to enforce per user
> access and share documents between users.  I've even wanted this feature
> from time to time.  If views have to be modified to hide unauthorized
> documents anyway, this seems a perfect opportunity to address this at the
> same time...
> 
> 
> .
> In this model, a "document collection", whether it be a database, a user, a
> user group, a role, or a role group becomes the entity that the access
> control documents are "authorizing".
> 
> In this model, a "view" becomes an identifiable "collection" that can be
> treated like a database.
> That would make creating views on top of other views something
> much easier to define/express.
> 
> 
> I'm envisioning that to implement successful "access control" based views,
> each "authorized entity" would have to maintain its own view index.
> Otherwise it's really hard for me to imagine how a reduce function can
> cache any precomputed results because it has no way of knowing that all the
> underlying documents used in the reduce were authorized for the accessing
> entity...  All reduce functions would have to run in real time to ensure they
> are only using authorized documents...
> 
> If there were a separate index for each authorized entity though (and
> especially if these entities were able to share index buckets where they
> had all the same document access in common), then the reduce function would
> cache data based on each authorized entity because it'd be built up from a
> different view index.
> 
> The existing model where indexes are updated whenever they are accessed
> could remain intact because you wouldn't have to rebuild indices for
> entities that weren't accessing that view...
> 
> This might be what is already planned (it hasn't sounded like it to me
> though).
> And I definitely think changing the perspective to make "databases" a
> function of the access control system and to make views based on "access
> controlled collection results" instead of "databases" would be quite
> powerful...
> 
> Regards,
> Mike

Hi Mike, what you’ve described here is very very similar to what Jan is 
building.

Adam



Re: [DISCUSS] Per-doc access control

2019-02-27 Thread Michael Fair
>
> There are certainly a bunch of interesting challenges around providing
> identities that are meaningful across multiple servers in different
> domains, and I think that’s worth digging into, but I wanted to avoid
> anyone thinking that replication could trivially defeat the per-doc _access
> controls that Jan has been working on here. Cheers,
>
>
I think it depends on what you mean by "defeat".
I haven't heard enough yet to suggest that replication can't trivially defeat
access controls.
That's exactly the question I'm asking.

Let's look at two specific cases:


So in one case, it's a "pull only" replica; say you are pulling from my
server to yours.

Once you read your available docs into your DB, you can grant yourself
write privileges to every document there.

My original database is obviously unchanged, but in a more p2p environment
where the idea is that the data is "daisy chaining" between replicas in a
"mesh" instead of a "star" topology, the access controls on documents in
the mesh are completely untrustworthy if those kinds of rewrites can happen.



The other case is a bi-directional replication scenario because we want
both our servers to have "some" write access to "some" documents in a
database that we share, such that we wish to replicate the full set of
documents between each other but maintain access controls.  This case seems
to completely break down because you can modify any document you have read
access to and then I could easily replicate those changes back from you.

.

In the scenarios I'm envisioning, once a document gets onto a machine, the
administrator of that machine can completely rewrite the access controls on
any document (and by extension rewrite the document itself); if those
changes are able to replicate out to other databases, and especially back
up to the original source, that's where I see a problem.

I suppose one fix idea is that when replicating, outbound uses read access
controls and inbound enforces write access controls; but this would have to
be at the level of tracking what entity made the modifications and not the
entity doing the replication.
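
To make that fix idea concrete, here is a rough Python sketch. It assumes
hypothetical `readers` and `writers` lists on each document and a hypothetical
`author` field recorded on each revision; none of these names come from the
actual proposal.

```python
def outbound(docs, replicator):
    """Outbound: send only documents the replicating identity may read."""
    return [doc for doc in docs if replicator in doc.get("readers", [])]


def accept_inbound(current_doc, incoming_rev):
    """Inbound: accept a revision only if its recorded author may write.

    The check looks at who made the modification, not at the identity
    that happens to be running the replication.
    """
    return incoming_rev["author"] in current_doc.get("writers", [])


origin_doc = {
    "_id": "price-list",
    "readers": ["alice", "bob"],
    "writers": ["alice"],
}

rev_by_alice = {"_id": "price-list", "author": "alice", "body": "v2"}
rev_by_bob = {"_id": "price-list", "author": "bob", "body": "tampered"}

print([d["_id"] for d in outbound([origin_doc], "bob")])
print(accept_inbound(origin_doc, rev_by_alice))
print(accept_inbound(origin_doc, rev_by_bob))
```

Note that nothing in this sketch stops the administrator of a remote machine
from forging the `author` field, so it only shows where such a check would
sit, not how to make it trustworthy.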

I'm looking at scenarios where there is more than one user accessing the
replica because it's a multimaster distributed infoset scenario.

Perhaps you are thinking that the model is more "hub and spoke", where the
replica is only for a single user?


Re: [DISCUSS] Per-doc access control

2019-02-27 Thread Michael Fair
On Wed, Feb 27, 2019 at 10:36 AM Adam Kocoloski  wrote:

> Hi Mike, just picking out this one snippet:
>
> > On Feb 27, 2019, at 12:16 PM, Michael Fair 
> wrote:
> >
> > If I get a replica of a database from your server, what, if anything,
> > prevents me from granting myself access controls to the entire database?
>
> Replication is a client of the API like everyone else and cannot bypass
> the access controls on the source. You can only create a replication which
> has at most access to all the documents in the database that you can access
> yourself; i.e. a replication of a database with per-doc access controls
> enabled may only transfer a subset of the documents in the database.
>

Right, but generally speaking READ access is very prevalent, WRITE access
is much more restrictive.

Or are these "access controls" really just the binary "exposed"/"not
exposed" sort such that had these documents simply gone into a different
database in the first place, and the view indexes tracked in a "per
database" way then everything would work as expected?

In fact, maybe "the same doc in multiple user centered databases" is not
such a bad idea/model to consider.

Currently, I see this idea in Couch that documents belong to a particular
"named document collection" called a database.  A view is really just
another kind of "named document collection".

What if instead of a "database", there was simply the single universal
document store and a "database" is then more like a "view" that documents
became a member of just like a user's access controlled "slice" would be?
(i.e. the dbname becomes more like a document access filter and less a
grouping and storage boundary)

Users then have a "per user scope/database" which they always access
through when they connect to the server.
The top level list of "_databases" the user sees is actually just a list of
those "named collections" that the user can access.  To the client side
APIs nothing really changes for them, what changes is how Couch internally
organizes itself to make "databases" a construct of the access control
feature and no longer a structural primitive.

Documents now carry the information about what "named collections" they
belong to in the same way as what entities are authorized to access them,
and "databases" basically become a "view" grouping by the "_collections"
array field member values on each document.
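
As a rough illustration of that grouping, here is a small Python sketch. It
assumes a single in-memory list standing in for the universal document store;
the `group_by_collection` helper is hypothetical, and only the `_collections`
field name is taken from the paragraph above.

```python
from collections import defaultdict


def group_by_collection(store):
    """Derive each "database" as the set of docs naming it in _collections."""
    databases = defaultdict(list)
    for doc in store:
        for name in doc.get("_collections", []):
            databases[name].append(doc["_id"])
    return dict(databases)


# One universal store; the same document can belong to several collections.
store = [
    {"_id": "d1", "_collections": ["catalog", "mike"]},
    {"_id": "d2", "_collections": ["catalog"]},
    {"_id": "d3", "_collections": ["mike"]},
]

databases = group_by_collection(store)
print(databases)
```

Here "d1" shows up in both "catalog" and "mike" without being stored twice,
which is the sharing behaviour described above.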


Many people have requested the option to expose the same document to
multiple databases and then use database access rights to enforce per user
access and share documents between users.  I've even wanted this feature
from time to time.  If views have to be modified to hide unauthorized
documents anyway, this seems a perfect opportunity to address this at the
same time...


.
In this model, a "document collection", whether it be a database, a user, a
user group, a role, or a role group becomes the entity that the access
control documents are "authorizing".

In this model, a "view" becomes an identifiable "collection" that can be
treated like a database.
That would make creating views on top of other views something
much easier to define/express.


I'm envisioning that to implement successful "access control" based views,
each "authorized entity" would have to maintain its own view index.
Otherwise it's really hard for me to imagine how a reduce function can
cache any precomputed results because it has no way of knowing that all the
underlying documents used in the reduce were authorized for the accessing
entity...  All reduce functions would have to run in real time to ensure they
are only using authorized documents...

If there were a separate index for each authorized entity though (and
especially if these entities were able to share index buckets where they
had all the same document access in common), then the reduce function would
cache data based on each authorized entity because it'd be built up from a
different view index.
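
A minimal Python sketch of that idea, assuming a sum-style reduce over a
hypothetical `amount` field and one index object per authorized entity. The
`EntityIndex` class is invented for illustration and is not how couch_index
works.

```python
class EntityIndex:
    """One view index per authorized entity, with a cached reduce value."""

    def __init__(self, entity):
        self.entity = entity
        self.rows = {}   # doc id -> value emitted for this entity
        self.total = 0   # cached result of the sum reduce

    def update(self, doc):
        """Fold one changed document into this entity's index."""
        old = self.rows.pop(doc["_id"], None)
        if old is not None:
            self.total -= old
        # Only documents the entity is authorized for contribute to the cache.
        if self.entity in doc.get("_access", []):
            self.rows[doc["_id"]] = doc["amount"]
            self.total += doc["amount"]


docs = [
    {"_id": "a", "_access": ["mike"], "amount": 10},
    {"_id": "b", "_access": ["adam"], "amount": 5},
    {"_id": "c", "_access": ["mike", "adam"], "amount": 7},
]

mike_index = EntityIndex("mike")
adam_index = EntityIndex("adam")
for doc in docs:
    mike_index.update(doc)
    adam_index.update(doc)

print(mike_index.total, adam_index.total)
```

Because each cached total is built only from documents its entity can see, the
reduce never has to be recomputed at query time to filter out unauthorized
documents.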

The existing model where indexes are updated whenever they are accessed
could remain intact because you wouldn't have to rebuild indices for
entities that weren't accessing that view...

This might be what is already planned (it hasn't sounded like it to me
though).
And I definitely think changing the perspective to make "databases" a
function of the access control system and to make views based on "access
controlled collection results" instead of "databases" would be quite
powerful...

Regards,
Mike


Re: [DISCUSS] Per-doc access control

2019-02-27 Thread Adam Kocoloski
Hi Mike, just picking out this one snippet:

> On Feb 27, 2019, at 12:16 PM, Michael Fair  wrote:
> 
> If I get a replica of a database from your server, what, if anything,
> prevents me from granting myself access controls to the entire database?

Replication is a client of the API like everyone else and cannot bypass the 
access controls on the source. You can only create a replication which has at 
most access to all the documents in the database that you can access yourself; 
i.e. a replication of a database with per-doc access controls enabled may only 
transfer a subset of the documents in the database.
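
As a rough Python sketch of that behaviour: it assumes the `_access` property
is a flat list of user names and roles and that any match grants read access.
The `can_read` and `replicate` helpers are hypothetical and are not the
replicator's real interface.

```python
def can_read(doc, user_ctx):
    """A doc is readable if _access names the user or one of their roles."""
    allowed = set(doc.get("_access", []))
    return user_ctx["name"] in allowed or bool(allowed & set(user_ctx["roles"]))


def replicate(source_docs, user_ctx):
    """Copy only what the replicating user could fetch through the API."""
    return {doc["_id"]: dict(doc) for doc in source_docs if can_read(doc, user_ctx)}


source = [
    {"_id": "a", "_access": ["mike"], "body": 1},
    {"_id": "b", "_access": ["adam"], "body": 2},
    {"_id": "c", "_access": ["catalog_reader"], "body": 3},
]
mike = {"name": "mike", "roles": ["catalog_reader"]}

target = replicate(source, mike)
print(sorted(target))
```

The target ends up with "a" and "c" only; document "b" never leaves the source.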

There are certainly a bunch of interesting challenges around providing 
identities that are meaningful across multiple servers in different domains, 
and I think that’s worth digging into, but I wanted to avoid anyone thinking 
that replication could trivially defeat the per-doc _access controls that Jan 
has been working on here. Cheers,

Adam

Re: [DISCUSS] Per-doc access control

2019-02-27 Thread Michael Fair
>
> Thanks Mike, I did understand you correctly the first time. I still
> maintain that’s in the realm of authentication, not authorization, and
> should be cleanly separable from the problem of implementing per-document
> access controls.
>

Then I think we're saying the same thing except I'm looking/asking about
the actual intended mechanics of this particular cross domain case.

I agree access controls are an authorization question, but what the heck is
getting authorized and how do those things map to whatever gets
authenticated?

I was specifically proposing the idea that "role entities", stored inside
the database, are the only objects an authorization record could get
granted permissions to as a proposed nomenclature.

* Roles are authorized.
* Users are authenticated.
* Users are a server/system level entity.
* Roles are a database level entity.
* The valid role list replicates with the database.
* The document authorization descriptors replicate with the database.
* The users database does not replicate.
* The links between users and roles do not replicate.
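
To illustrate the list above, here is a small Python sketch. The `Server`
class and its fields are hypothetical; the point is only which pieces travel
with a replication and which stay local.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.users = {}       # local only: user name -> set of role names
        self.roles = set()    # replicates with the database
        self.doc_access = {}  # replicates: doc id -> set of authorized roles

    def replicate_from(self, other):
        """Copy roles and doc authorizations; never users or user->role links."""
        self.roles |= other.roles
        for doc_id, roles in other.doc_access.items():
            self.doc_access[doc_id] = set(roles)

    def can_access(self, user, doc_id):
        """Authenticate locally, then authorize via the replicated roles."""
        granted = self.users.get(user, set()) & self.roles
        return bool(granted & self.doc_access.get(doc_id, set()))


origin = Server("origin")
origin.roles = {"editor"}
origin.users = {"sam": {"editor"}}
origin.doc_access = {"d1": {"editor"}}

replica = Server("replica")
replica.replicate_from(origin)

print(origin.can_access("sam", "d1"))   # sam is linked to editor on origin
print(replica.can_access("sam", "d1"))  # no link exists on the replica
replica.users["mallory"] = {"editor"}   # the replica's admin adds a link
print(replica.can_access("mallory", "d1"))
```

The last step is the concern raised below: whoever administers the replica
decides which local users are linked to the replicated roles.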

And there is this big ol' security question about what data in the database
replicates to another site?

If I get a replica of a database from your server, what, if anything,
prevents me from granting myself access controls to the entire database?

If I get a replica of a database, how does that map my preexisting
authenticated user entities onto the authorized entities of the access
control docs?
(If there's an authorized entity called 'sam' in the access controls, do I
just create a user account called 'sam' with a password I know so I can
authenticate as 'sam', and get access to Sam's doc authorizations?)

The response of "this is an authentication versus authorization issue and
we're discussing authorizations" is something I agree with; but it doesn't
seem to address my question/observation that there is a linking
relationship between the authenticated and authorized entities; and in a
cross domain replicated environment the administrative controls for
assigning the links and creating the respective entities are under
differing authorities/controls.

IOW, defeating Couch Access Controls looks like it will simply be pulling
the database onto your own server and granting yourself access.  Doing
anything more sophisticated will require a domain aware approach to
defining authenticated and authorized entities...

Is the intended design architecture a simple 'string match on the
authenticated user name' model?

IOW, is the user/authorization linking model that every user named 'sam' is
accepted/authenticated as the same 'sam' regardless of hosting server, and
therefore all users named 'sam' on every server are always authorized the
same way?  I think it's obvious that while a valid way to handle it, that's
clearly suboptimal and in some cases will be completely unusable.
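
A short Python sketch of the difference, purely for illustration: the
`name@domain` form and both helper functions are hypothetical and not part of
any proposal in this thread, and the sketch says nothing about how a server
would verify a domain claim.

```python
def authorized_by_name(access_list, user):
    """Plain string match: every 'sam' on every server is the same 'sam'."""
    return user["name"] in access_list


def authorized_by_qualified_name(access_list, user):
    """Domain aware match: 'sam' only counts together with a domain."""
    return "{}@{}".format(user["name"], user["domain"]) in access_list


sam_at_origin = {"name": "sam", "domain": "origin.example"}
sam_elsewhere = {"name": "sam", "domain": "other.example"}

print(authorized_by_name(["sam"], sam_at_origin))
print(authorized_by_name(["sam"], sam_elsewhere))
print(authorized_by_qualified_name(["sam@origin.example"], sam_at_origin))
print(authorized_by_qualified_name(["sam@origin.example"], sam_elsewhere))
```

With the plain match both users pass; with the qualified match only the user
from the origin domain does.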

I agree access controls are an "authorization" question; my
questions/concerns/focus is "What conceptual entity are these access
control documents authorizing" and "How do authenticated entities get
mapped to those authorized entities"?

Specifically in a cross domain setting where the authentication systems are
under totally different administrative domains; and each domain can
effectively admin itself over the entire DB.

If you have bi-directional replication between two servers as I described
before (the shared product catalog), is there a design element here to
enforce those access controls in a "cross domain safe" way?

I'm not saying there has to be, but I do think that'd be a good thing, and
I don't see an immediately obvious 'how to do it' solution...

Thanks,
Mike


Re: [DISCUSS] Per-doc access control

2019-02-27 Thread Ilya Khlopotov
> The application would override the _all_docs and _changes endpoints, and if
> a user has enabled access=true for that database then you could then return
> the _all_docs and _changes requests from your application. The epi http
> work is pretty fancy; I think we could do some cool things around that to
> make this work well. 
I would advise against using EPI for this. The main role of EPI (and
especially of endpoint overrides) is to allow custom builds of CouchDB with
extended functionality. In this case, the per-doc access control feature is
part of Apache CouchDB itself. If we used an override for it, that would
prevent a subsequent override by a custom build.

best regards,
iilyak

On 2019/02/26 10:18:45, Garren Smith  wrote: 
> Hi Jan,
> 
> I've been giving this some thought and I wonder if we should take a step
> back and rethink how we do this. Instead of implementing this directly into
> the CouchDB core code, it might be better to write this as an application
> similar to Dreyfus - Cloudant's search[1]. Instead of writing this code
> directly in the core CouchDB code rather we write this as another
> application. I'm hoping then that you wouldn't have to make huge
> modifications to the CouchDB codebase which should make this easier to do.
> The application would override the _all_docs and _changes endpoints, and if
> a user has enabled access=true for that database then you could then return
> the _all_docs and _changes requests from your application. The epi http
> work is pretty fancy; I think we could do some cool things around that to
> make this work well. The app would listen to the changes feeds of any
> database that has access=true and then implement the required indexes for
> _all_docs and changes. I think we then would not have to create a custom
> indexer as we could build the indexes when new changes arrive.
> 
> I'm also hoping that another advantage of doing this as an app that listens
> to the changes feed is that there should be minimal work to get this to
> work when we switch to fdb.
> 
> This is obviously just an idea I had and I thought I would share it, not in
> an attempt to derail what you're doing, but hopefully in an attempt to make
> sure we find the easiest and most effective way to get this done.
> 
> Cheers
> Garren
> 
> 
> [1] https://github.com/cloudant-labs/dreyfus
> 
> On Sun, Feb 17, 2019 at 4:25 PM Jan Lehnardt  wrote:
> 
> > Hi Everyone,
> >
> > I’m happy to share my work in progress attempt to implement the per-doc
> > access control feature we discussed a good while ago:
> >
> >
> > https://lists.apache.org/thread.html/6aa77dd8e5974a3a540758c6902ccb509ab5a2e4802ecf4fd724a5e4@%3Cdev.couchdb.apache.org%3E
> >
> > You can check out my branch here:
> >
> > https://github.com/apache/couchdb/compare/access?expand=1
> >
> > It is very much work in progress, but it is far enough along to warrant
> > discussion.
> >
> > The main point of this branch is to show all the places that we would need
> > to change to support the proposal.
> >
> > Things I’ve left for later:
> >
> > - currently only the first element in the _access array is used. Our
> > and/or syntax can be added later.
> > - building per-access views has not been implemented yet, couch_index
> > would have to be taught about the new per-access-id index.
> > - pretty HTTP error handling
> > - tests except for a tiny shell script 
> >
> > Implementation notes:
> >
> > You create a database with the _access feature turned on like so:  PUT
> > /db?access=true
> >
> > I started out with storing _access in the document body, as that would
> > allow for a minimal change set, however, on doc updates, we try hard not to
> > load the old doc body from the database, and forcing us to do so for EVERY
> > doc update under _access seemed prohibitive, so I extended the #doc,
> > #doc_info and #full_doc_info records with a new `access` attribute that is
> > stored in both by-id and by-seq. I will need guidance on how extending
> > these records impacts multi-version cluster interop, and especially on
> > whether this is an acceptable approach.
> >
> >
> > https://github.com/apache/couchdb/compare/access?expand=1=0#diff-904ab7473ff8ddd07ea44aca414e3a36
> >
> > * * *
> >
> > The main addition is a new native query server called
> > couch_access_native_proc, which implements two new indexes by-access-id and
> > by-access-seq which do what you’d expect, pass in a userCtx and retrieve
> > the equivalent of _all_docs or _changes, but only including those docs that
> > match the username and roles in their _access property. The existing
> > handlers for _all_docs and _changes have been augmented to use the new
> > indexes instead of the default ones, unless the user is an admin.
> >
> >
> > 

Re: [DISCUSS] Per-doc access control

2019-02-27 Thread Adam Kocoloski
It’s possible, but I don’t see how to scale it efficiently.

If you let a user create a view and access it directly, you need to ensure that 
the system which reads the documents only reads the ones that the user can 
access. Effectively, the design doc needs to inherit the access roles of the 
user who created it. An external application could enforce this, although the 
design doc would somehow need a different syntax so the regular view system 
doesn’t pick up the ddoc and build the view while reading all documents.
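
To make the inheritance idea concrete, here is a minimal sketch in Python. It is 
not CouchDB code: the `creator` field on the design doc, the `can_read` helper 
and the document shapes are all assumptions made up for illustration. The only 
point is that the indexer never hands a document to the map function unless the 
ddoc's creator could have read it.

```python
# Hypothetical sketch, not CouchDB code: names and document shapes are assumptions.

def can_read(doc, name, roles):
    """True if the doc's _access list names the user or one of their roles."""
    allowed = set(doc.get("_access", []))
    return name in allowed or bool(allowed & set(roles))


def build_user_view(docs, ddoc, map_fn):
    """Index only the docs readable by the user who created the design doc."""
    owner = ddoc["creator"]          # assumed field: who created the ddoc
    rows = []
    for doc in docs:
        if not can_read(doc, owner["name"], owner["roles"]):
            continue                 # never pass unreadable docs to map_fn
        rows.extend(map_fn(doc))
    return sorted(rows)


docs = [
    {"_id": "a", "_access": ["sam"], "n": 1},
    {"_id": "b", "_access": ["pete"], "n": 2},
    {"_id": "c", "_access": ["team"], "n": 3},
]
ddoc = {"_id": "_design/mine", "creator": {"name": "sam", "roles": ["team"]}}
rows = build_user_view(docs, ddoc, lambda d: [(d["n"], d["_id"])])
```

Note that this version still visits every document and throws most of them 
away, which is exactly the cost problem for a large database with many users.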

Assuming you take care of the above, the next challenge is a performance one. 
If you have 100 users creating their own views off of the vanilla _changes feed 
of a single database, the indexing process for each view is 100x more expensive 
than it needs to be (because each indexer scans and discards 99% of the docs 
due to _access restrictions). The in-database approach tries to address this by 
creating a _changes feed where the entries are prefixed by the _access required 
to read them, so an indexer can quickly skip the ranges that it knows are 
inaccessible.
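
A rough sketch of the prefixed feed, again in Python and again with invented 
names (`build_prefixed_feed`, `changes_for`) and an assumed entry layout of 
(access id, seq, doc id). It illustrates the key ordering only and says nothing 
about how the real index is stored.

```python
import bisect

# Hypothetical sketch: entry layout and function names are assumptions.

def build_prefixed_feed(changes):
    """Sort (access, seq, doc_id) entries so each access id is one contiguous range."""
    return sorted(changes)


def changes_for(feed, access_ids):
    """Return (seq, doc_id) rows for the given access ids, seeking to each range."""
    rows = []
    for access in access_ids:
        # Jump straight to the first entry with this prefix; other ranges are
        # never scanned.
        start = bisect.bisect_left(feed, (access,))
        for entry in feed[start:]:
            if entry[0] != access:
                break                # left this access id's range
            rows.append((entry[1], entry[2]))
    return sorted(rows)


feed = build_prefixed_feed([
    ("sam", 1, "a"),
    ("pete", 2, "b"),
    ("team", 3, "c"),
    ("sam", 4, "d"),
])
sam_changes = changes_for(feed, ["sam", "team"])
```

With this ordering an indexer for "sam" (who also holds the "team" role) touches 
only the "sam" and "team" ranges, plus one boundary entry per range, instead of 
reading the whole feed.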

Adam

> On Feb 27, 2019, at 6:47 AM, Garren Smith  wrote:
> 
> Hi Adam,
> 
> I probably didn’t give views the most thought. I was thinking we could
> implement views by either using the changes feed or a new query engine like
> we do for mango. Does that make sense or is it not really possible?
> 
> 
> On Wed, Feb 27, 2019 at 2:26 AM Adam Kocoloski  wrote:
> 
>> 
>>> On Feb 26, 2019, at 7:12 PM, Michael Fair wrote:
>>> 
>>> On Tue, Feb 26, 2019 at 3:38 PM Adam Kocoloski wrote:
>>> 
>>>> Mike,
>>>> 
>>>> If I’m reading you correctly you’re concerned about cross-domain
>>>> authentication. A good problem and worth discussing, but I think it’s
>>>> cleanly decoupled from the per-doc access control work, which is
>>>> focused on *authorization*.
>>>> 
>>> 
>>> I don't think I'm talking about the same cross domain authentication you
>>> are talking about.  I think you are talking about a web page from Domain
>>> (B) attempting to access Couch resource in domain (A) (Cross site
>>> scripting access). That's not what I'm talking about.
>>> 
>>> I'm talking about what ought to happen with the authorization control
>>> definitions when you have two Couch servers, one running in Domain (A)
>>> and one running in Domain (B) with different sets of system users, such
>>> that the authorized entities in the bidirectionally replicated database
>>> don't exist in both server instances (the two distinct domains share the
>>> same document database but have disparate sets of authenticated system
>>> users).
>>> 
>>> In other words the ("sam", "pete", and "joe") users on domain/machine A
>>> are not the same thing as the ("mary", "betty", and "sue")  users on
>>> domain/machine B; yet the replicated database between the two machines
>>> has the same access control document authorization descriptors in both
>>> places.
>> 
>> 
>> Thanks Mike, I did understand you correctly the first time. I still
>> maintain that’s in the realm of authentication, not authorization, and
>> should be cleanly separable from the problem of implementing per-document
>> access controls. Cheers,
>> 
>> Adam



Re: [DISCUSS] Per-doc access control

2019-02-27 Thread Garren Smith
Hi Adam,

I probably didn’t give views the most thought. I was thinking we could
implement views by either using the changes feed or a new query engine like
we do for mango. Does that make sense or is it not really possible?


On Wed, Feb 27, 2019 at 2:26 AM Adam Kocoloski  wrote:

>
> > On Feb 26, 2019, at 7:12 PM, Michael Fair wrote:
> >
> > On Tue, Feb 26, 2019 at 3:38 PM Adam Kocoloski wrote:
> >
> >> Mike,
> >>
> >> If I’m reading you correctly you’re concerned about cross-domain
> >> authentication. A good problem and worth discussing, but I think it’s
> >> cleanly decoupled from the per-doc access control work, which is
> >> focused on *authorization*.
> >>
> >>
> >
> > I don't think I'm talking about the same cross domain authentication you
> > are talking about.  I think you are talking about a web page from Domain
> > (B) attempting to access Couch resource in domain (A) (Cross site
> > scripting access). That's not what I'm talking about.
> >
> > I'm talking about what ought to happen with the authorization control
> > definitions when you have two Couch servers, one running in Domain (A)
> > and one running in Domain (B) with different sets of system users, such
> > that the authorized entities in the bidirectionally replicated database
> > don't exist in both server instances (the two distinct domains share the
> > same document database but have disparate sets of authenticated system
> > users).
> >
> > In other words the ("sam", "pete", and "joe") users on domain/machine A
> > are not the same thing as the ("mary", "betty", and "sue")  users on
> > domain/machine B; yet the replicated database between the two machines
> > has the same access control document authorization descriptors in both
> > places.
>
>
> Thanks Mike, I did understand you correctly the first time. I still
> maintain that’s in the realm of authentication, not authorization, and
> should be cleanly separable from the problem of implementing per-document
> access controls. Cheers,
>
> Adam