Re: Support permission concepts in REST spec

2024-02-28 Thread Ryan Blue
I think we should keep this separate from views. A view could be one way to
implement this in an engine integration, but I think the best direction is
to pass the metadata directly, with a clear spec, instead of trying to
translate to a view in the REST catalog. Translation in the catalog would
(currently) require producing multiple dialects, would depend on SQL/view
support in clients, and would possibly require authorized views (which
haven't been added to the view spec yet).

While I wouldn't have the REST catalog return a view instead of a table, I
think it may be a good way to implement the feature, at least in Spark (if
Spark is explicitly trusted to enforce this!). We currently have no way to
pass filters from the data source in Spark and we could detect this case
and use a view to pass the requirements.

On Wed, Feb 28, 2024 at 2:49 AM Renjie Liu  wrote:

> Many of these decisions can be translated together to some sort of view on
>> top of a table. Consider user A has permission on table1, column c1 c2,
>> sha1 hash mask on email column, row filter age > 21. This can be translated
>> into a decision that user A can access a view *SELECT c1, c2,
>> sha1(email) FROM table1 WHERE age > 21*.
>
>
> If I understand correctly, does this mean that the REST catalog needs to
> take care of the translation? So the REST catalog needs to be aware of
> SQL engines?

Re: Support permission concepts in REST spec

2024-02-28 Thread Renjie Liu
>
> Many of these decisions can be translated together to some sort of view on
> top of a table. Consider user A has permission on table1, column c1 c2,
> sha1 hash mask on email column, row filter age > 21. This can be translated
> into a decision that user A can access a view *SELECT c1, c2, sha1(email)
> FROM table1 WHERE age > 21*.


If I understand correctly, does this mean that the REST catalog needs to
take care of the translation? So the REST catalog needs to be aware of SQL
engines?


Re: Support permission concepts in REST spec

2024-02-27 Thread Brian Olsen
This may potentially be another thread, but I want to see if we can avoid
excess work/design discussion by utilizing an open policy engine REST API (
https://www.openpolicyagent.org/docs/latest/rest-api/
) that’s already defined and used in Trino now (
https://trino.io/docs/current/security/opa-access-control.html).

I know the API may be overkill for simple permission concepts, but this
could deflect the need for Iceberg to own and manage any of the security
primitives, as I’ve seen it mentioned that we don’t want to have too much
focus on these concepts.

To me, OPA seems like a modern Ranger, but for all apps. What do you all
think?
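
For concreteness, OPA's Data API takes a POST to /v1/data/<policy-path>
with an "input" document; a catalog or engine could consult it roughly like
this (the policy path `iceberg/table/allow`, the input fields, and the
server address are all hypothetical examples):

```python
import json

# Sketch of a query against OPA's Data API (POST /v1/data/<policy-path>
# with an "input" document). The policy path and input fields below are
# hypothetical examples, and the server address is assumed.
OPA_BASE = "http://localhost:8181"

def build_opa_query(policy_path, input_doc):
    """Build the URL and JSON body for an OPA Data API query."""
    url = f"{OPA_BASE}/v1/data/{policy_path}"
    body = json.dumps({"input": input_doc})
    return url, body

url, body = build_opa_query(
    "iceberg/table/allow",
    {"user": "A", "action": "read", "table": "db.table1"},
)
# A catalog or engine would POST `body` to `url` and read the decision
# from the {"result": ...} response.
```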


Re: Support permission concepts in REST spec

2024-02-26 Thread Jack Ye
Thank you Ryan for the detailed suggestions!

So far, it sounds like there are in general 2 types of policy decisions:
1. ones that would fail an execution if not satisfied, e.g. check
constraints, protected column, read/write access to storage, etc.
2. ones that would amend an execution plan, e.g. column and row filters,
dynamic column masking, etc.

For the second type, there is another potential alternative direction I
found some systems are using. Let me also put it here, curious what people
think.

Many of these decisions can be translated together into some sort of view
on top of a table. Consider that user A has permission on table1 for
columns c1 and c2, with a sha1 hash mask on the email column and a row
filter age > 21. This can be translated into a decision that user A can
access the view *SELECT c1, c2, sha1(email) FROM table1 WHERE age > 21*.

Given that we already have an Iceberg view spec, the catalog can
potentially dynamically render such a multi-dialect view, so that table1
becomes a view "*SELECT c1, c2, sha1(email) FROM temp_table_12345 WHERE
age > 21*", where *temp_table_12345* becomes the actual underlying table
for enforcing type 1 decisions. (A temp table is just one example of how
to implement this; more design consideration is needed.)
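
The rendering step could be sketched like this (the decision structure is
hypothetical and not part of any spec):

```python
# A rough sketch of how a catalog could render the masked view described
# above from a policy decision. The decision structure is hypothetical
# and not part of the Iceberg view spec.
def render_masked_view(table, decision):
    cols = []
    for col in decision["columns"]:
        mask = decision.get("masks", {}).get(col)
        cols.append(f"{mask}({col})" if mask else col)
    sql = f"SELECT {', '.join(cols)} FROM {table}"
    if decision.get("row_filter"):
        sql += f" WHERE {decision['row_filter']}"
    return sql

decision = {
    "columns": ["c1", "c2", "email"],
    "masks": {"email": "sha1"},
    "row_filter": "age > 21",
}
print(render_masked_view("table1", decision))
# SELECT c1, c2, sha1(email) FROM table1 WHERE age > 21
```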

This approach seems to be more flexible in the sense that catalogs can
develop many different styles of policy without the need for Iceberg to
standardize on something like expression semantics, since the Iceberg view
is now a standard for expressing the decision.

Any thoughts?

-Jack

Re: Support permission concepts in REST spec

2024-02-25 Thread Ryan Blue
I think this is a good idea, but is definitely an area where we need to be
clear about how it would work for people to build with it successfully.

> All it takes is one engine to ignore these as the security provided is no
> longer applicable.

You’re right that security depends on knowing that the client is going to
enforce the requirements sent by the catalog. That just means that the
catalog either needs to deny access (401/403 response) or have some
pre-established trust in the identity that is loading a table (or view).

The current authentication mechanisms that we’ve documented have ways to do
this. For example, if you’re using a token scheme you can put additional
claims in the auth token when the client is trusted to enforce fine-grained
access. To establish trust, you can either manually create a token for a
compute service with the trust selected or we could add another OAuth2
scope to request it when connecting compute engines to catalogs. Either
way, we already have mechanisms to establish trust relationships between
engines and catalogs so this would just be an additional capability.
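
A minimal sketch of that catalog-side trust check, assuming a token claim
named `enforce-fine-grained-access` (the claim name and the response shape
are illustrative, not part of the REST spec):

```python
# Sketch of the catalog-side check described above: requirements are only
# returned to clients whose token carries a claim saying they will
# enforce them; otherwise access is denied. The claim name and response
# shape are assumptions, not part of the REST spec.
def load_table_response(token_claims, metadata, requirements):
    if token_claims.get("enforce-fine-grained-access"):
        # Trusted engine: return metadata plus the requirements to enforce.
        return 200, {"metadata": metadata, "requirements": requirements}
    if requirements:
        # Untrusted client, table has fine-grained requirements: deny.
        return 403, {"error": "client cannot enforce access requirements"}
    return 200, {"metadata": metadata}

status, _ = load_table_response({}, {"uuid": "t1"}, {"row-filter": "age > 21"})
# status is 403: an untrusted client cannot load this table
```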

> I worry a little bit about putting security features into the REST API
> that require the execution engine and catalog to agree on semantics and
> execution.

I agree in the general case, but I think there are narrow cases where we
are already handling this problem and solving those is incredibly useful. I
think a critical design constraint is that this extension should be used to
pass requirements — the result of policy decisions — and NOT be used to
pass policy itself. (And, I would change the proposed policy field in REST
responses to requirements or similar to make this clear.)

Policy is complicated and it is modelled and enforced differently across
products. Databases all have their own rules. For instance, in some schemes
database SELECT cascades to table SELECT, while others check only the table
resource for SELECT permission. I think we clearly don’t want to try to
normalize or force a standard on this space. Instead, we want catalogs and
access control systems to have the model that they choose. The REST
protocol should communicate the decisions made by those schemes.

That significantly narrows the scope of this feature. Starting with fields
that can or can’t be read and filters that must be applied is a great start
that covers a large number of use cases. And we already have clear
semantics for Iceberg filters and for column projection. We would still
need to specify additional guidance, but semantics and execution are
possible to agree on if we start small.

Here’s some additional guidance I would add:

   - Projection: the client is not allowed to read certain fields,
   specified by field ID, even if those fields are not part of the output.
   This avoids leaks using queries like SELECT count(1) FROM bank_accounts
   WHERE email = ? where email is a protected column.
   - Filtering: rows that do not match the filter must be removed
   immediately after loading the data, before rows or groups of rows are
   passed to any other operator.
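
A rough client-side sketch of these two rules (the schema, field IDs, and
requirement shapes are illustrative):

```python
# Sketch of client-side enforcement of the two rules above: protected
# fields (identified by field ID) are dropped before rows reach any other
# operator, and the required filter is applied immediately after loading.
def enforce(rows, schema, protected_field_ids, row_filter):
    allowed = [name for fid, name in schema if fid not in protected_field_ids]
    for row in rows:
        if row_filter(row):  # filter first, before any other operator
            # project away protected fields so they are never exposed
            yield {name: row[name] for name in allowed}

schema = [(1, "c1"), (2, "c2"), (3, "email")]
rows = [{"c1": 1, "c2": "x", "email": "a@example.com"},
        {"c1": 2, "c2": "y", "email": "b@example.com"}]
out = list(enforce(rows, schema, {3}, lambda r: r["c1"] > 1))
# out == [{"c1": 2, "c2": "y"}]
```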

Jack also suggested passing permissions back, which I don’t think I would
include. There was some discussion about using this to identify catalogs
that are secondary references; I think that’s okay but I would make it a
much more narrow option, like supports-commit: false rather than specifying
a set of privileges.

As for the idea about sending write constraints, this is an interesting
idea. I think we could make it work the same way that row and column
filters would work. If the client is trusted to support it, then it is
responsible for checking those constraints and not attempting to commit
changes. There's no need to complicate the commit protocol if the chain
of trust includes the ability to enforce constraints. Plus, constraints may
need to be known during job execution, not just at commit time, so it is
better to send them when loading a table.
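
The constraint check could be sketched on the client like this
(representing each constraint as a row predicate is an assumption for
illustration):

```python
# Sketch of the idea above: constraints arrive at table-load time and a
# trusted client checks them before attempting a commit. Representing
# each constraint as a row predicate is an assumption for illustration.
def check_constraints(new_rows, constraints):
    """Return True only if every new row satisfies every constraint."""
    return all(c(row) for row in new_rows for c in constraints)

constraints = [lambda r: r["age"] >= 0]  # e.g. a CHECK-style constraint
ok = check_constraints([{"age": 30}, {"age": -1}], constraints)
# ok is False, so the client must not attempt the commit
```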

Ryan

Re: Support permission concepts in REST spec

2024-02-20 Thread Jack Ye
Thanks for the response JB & Micah.

> Is this intended to be information only?

I would expect the engine to honor it to some extent. Consider the case of
writing to a table, LoadTableRequest needs to be able to express this
intent of requesting write access, such that the credentials vended back in
LoadTableResponse can have write access.

> I worry a little bit about putting security features into the REST API
that require the execution engine and catalog to agree on semantics and
execution.  All it takes is one engine to ignore these as the security
provided is no longer applicable.
> I think replicating this would be challenging, since it requires
distinguishing between direct user access to the catalog and a query engine
working on a user's behalf.

Yes, I have the same concerns, that's why I am trying to gather some
community feedback here. I think it is possible to distinguish a normal
user vs a specific engine. At least in the AWS world, we figured out a way:
if an engine is accessing the Glue API, it must go through an onboarding
process. After that process, there is a shared responsibility model: any
requests from the authorized engine will contain sensitive information like
credentials, filters, etc. The engine needs to make sure that this
sensitive information is not exposed when fulfilling query executions.
Normal users calling the catalog with their personal credentials do not see
anything sensitive. This restricts end users to always using an authorized
engine for actual data reads and writes, which is intended.

From a feature perspective, there is an opportunity I see to create a spec
for these common security constructs that different engine integrations
can try to follow. Specific authorization mechanisms like the one I
described above can be left to the individual catalog services to figure
out.

Judging from the initial feedback, it sounds like this is at least an
interesting idea worth exploring. I can provide a more detailed doc for us
to review.

Best,
Jack Ye


Re: Support permission concepts in REST spec

2024-02-16 Thread Micah Kornfield
Hi Jack,
I think this is an interesting idea but I think there are some practical
concerns (I posted them inline).

> - general access patterns, like read-only, read-write, admin full access,
> etc.

Is this intended to be information only?  I would hope the tokens and REST
API vending to clients would enforce these settings, so it seems like this
would mostly be for debug purposes (e.g. if only read access is available,
only tokens with "read" privileges are vended, or, without full admin
access, updates to the catalog would not be allowed).

> - columns that the specific caller has access to for read or write
> - filters (maybe expressed in Iceberg expression) that should be applied
> by the engine on behalf of the caller during a table scan

I have a few concerns here:
1.  I worry a little bit about putting security features into the REST API
that require the execution engine and catalog to agree on semantics and
execution.  All it takes is one engine to ignore these as the security
provided is no longer applicable.  For more tightly controlled environments
this is viable but it feels like some very large consequences if users make
the wrong choice on engine or even if there is an engine using a stale REST
API client (i.e. we would need to be very careful with
compatibility guarantees).
2.  The row-level security feature linked is designed so that end-users are
not aware of which, if any, filters were applied during the query.  I think
replicating this would be challenging, since it requires distinguishing
between direct user access to the catalog and a query engine working on a
user's behalf.
3.  In terms of dialect, I imagine it would probably make sense to be
agnostic here and follow a model similar to the one views are taking by
allowing multiple dialects (or at least wait to see how views work out in
practice).
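To make concern 1 concrete: if the catalog hands back a row filter, the engine is the enforcement point. A toy sketch of that enforcement, assuming the filter arrives as a simple (column, operator, literal) triple rather than a real Iceberg expression:

```python
# Sketch of engine-side enforcement of a catalog-supplied row filter.
# A real implementation would use Iceberg's expression model and push the
# predicate into the scan; this only illustrates the trust problem: the
# catalog can hand out the filter, but only the engine can apply it.

import operator

OPS = {">": operator.gt, ">=": operator.ge, "=": operator.eq}

def apply_row_filter(rows, filt):
    """Filter rows by a (column, op, literal) triple, e.g. ("age", ">", 21)."""
    col, op, literal = filt
    cmp = OPS[op]
    return [r for r in rows if cmp(r[col], literal)]

rows = [{"name": "a", "age": 30}, {"name": "b", "age": 18}]
filtered = apply_row_filter(rows, ("age", ">", 21))
```

An engine that simply skips this step returns all rows, which is the "one engine ignoring these" failure mode described above.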


For points 1 and 2, a different approach would be to introduce a new
standard based on something like Apache Arrow's Flight or Flight SQL
protocol that acts as a layer of abstraction between physical storage and
security controls.

- constraints (again, maybe expressed in Iceberg expression) that should
> trigger the table scan or table commit to be rejected


It feels like this should probably be part of the table spec since, in
general, it affects the commit protocol (IIUC it is already partially
covered by identifier-field IDs).

Thanks,
Micah



On Tue, Feb 13, 2024 at 10:42 AM Jack Ye  wrote:

> Hi everyone,
>
> I would like to get some initial thoughts about the possibility of adding
> some permission control constructs to the Iceberg REST spec. Do we think it
> is valuable? If so, how do we imagine its shape and form?
>
> The background of this idea is that today, Iceberg already supports loading
> credentials to a table through the config field
> 
> in LoadTableResponse, as a basic way to control data access. We heard that
> users really like this feature and want more regarding data access control
> and permission configuration in Iceberg.
>
> For example, we could consider adding a *policy* field in the REST
> LoadTableResponse, where a policy has sub-fields that describe:
> - general access patterns, like read-only, read-write, admin full access,
> etc.
> - columns that the specific caller has access to for read or write
> - filters (maybe expressed in Iceberg expression) that should be applied
> by the engine on behalf of the caller during a table scan
> - constraints (again, maybe expressed in Iceberg expression) that should
> trigger the table scan or table commit to be rejected
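One hypothetical shape for such a policy field, with a helper showing how an engine might project a schema down to the permitted columns. Every key below is illustrative only, not part of any agreed spec:

```python
# A hypothetical shape for the proposed "policy" field in LoadTableResponse.
# All keys are illustrative assumptions sketched from the bullet list above.

policy = {
    "access": "read-write",
    "readable-columns": ["c1", "c2", "email"],
    "writable-columns": ["c1", "c2"],
    "row-filters": ["age > 21"],           # applied by the engine on scans
    "constraints": ["email IS NOT NULL"],  # violations reject the commit
}

def visible_columns(schema_columns, policy):
    """Project a schema down to the columns the caller may read."""
    readable = set(policy.get("readable-columns", schema_columns))
    return [c for c in schema_columns if c in readable]

cols = visible_columns(["c1", "c2", "c3", "email"], policy)
```

Whether filters and constraints would be Iceberg expressions, SQL text, or something dialect-agnostic is exactly the open question raised elsewhere in this thread.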
>
> This could be the solution to some topics we discussed in the past. For
> example, we can use this as a solution to the EXTERNAL database semantics
> support discussion
>  by
> saying an external table has read-only access. We can also let the REST
> service decide access to columns, which solves some governance issues
> raised during the column tagging discussion
> .
>
> Outside existing discussions, this can also work pretty well with popular
> engine vendor features like row-level security, check constraints, etc.
>
> In general, permission control and data governance are important aspects
> of enterprise data warehousing. I think having these constructs in the
> REST spec and related engine integration could increase enterprise adoption
> and help our vision of standardizing access through the REST interface.
>
> Would appreciate any thoughts in this domain! And if we have some general
> interest in this direction, I can put up a more detailed design doc.
>
> Best,
> Jack Ye
>


Re: Support permission concepts in REST spec

2024-02-16 Thread Jean-Baptiste Onofré
Hi Jack,

It's a good idea and it has to be pluggable.

I think we could have a TableConfigResponse and ViewConfigResponse
that could contain the policy field plus an open list of key/value
pairs. The idea is to let the REST backend deal with permissions.

We would be able to extend permissions this way.

Generally speaking, I would propose a dual REST spec: one part about
table/view operation handling, and another about the REST catalog
behavior itself (pluggable security, etc.). But that's another
discussion I will start later :)

Regards
JB
