Re: [DISCUSS][SPEC] Add loaded-via in loadTable Request to support Definer Views

2025-08-21 Thread Ryan Blue
I agree with Prashant's arguments.

This is a good thing to add and it can enable secure views with trusted
engines, but the catalog implementer and warehouse administrator (who sets
up trusted engine relationships) take on more responsibility in those
cases. That's okay, and catalogs can instead rely on secure view
implementations in engines that support them, like Trino.

On Thu, Aug 21, 2025 at 7:34 AM Prashant Singh wrote:

> Agreed that ownership should be managed entirely by the catalog. We don't
> define Actors / Users in IRC, and a catalog can very well store this
> metadata separately (not necessarily in the view metadata object), hence
> we don't talk about the owner in this proposal.
>
> loaded-via / referenced-by needs to originate from a Trusted Engine /
> Partner that the user is using, which would make sure the loaded-via value
> is not tampered with and cannot be used as a bypass. Establishing the
> Trusted Engine workflow is *not* a goal of this proposal, as it is entirely
> up to the catalog to establish this trust relationship; whether that is
> done via OAuth (delegation workflow) or mTLS is the catalog's choice. I
> added this to the proposal as well, noting that this is not a *sufficient*
> condition for implementing a secure definer view.
>
> The way I see it, the proposal is the first step towards achieving
> DEFINER views. The spec is just sending the context ("Hey Catalog, this
> table is being accessed in the context of this View"); it does not mandate
> that the catalog authorize based on this information, and it is totally
> up to the catalog's discretion to simply ignore it.
>
> Nevertheless, this context helps from an audit POV, as most audit log
> systems record the base entity along with the entity in the audit logs
> (for example, check here).
>
> Hence I don't see why we should not add this context to the loadTable
> calls, given that we are clear from the spec POV what this is (ref) and do
> not dictate to the catalog how to authorize.
>
> Best,
> Prashant
>
> On Thu, Aug 21, 2025 at 7:10 AM Claude Warren, Jr wrote:
>
>> I believe that Ryan is correct that ownership and access should be
>> managed by the catalog.
>>
>> Determining who has access for CRUD operations on the table data, as
>> well as for CRUD operations on the table metadata, is a rather
>> complicated question. Adding Views to the mix is an exponential change in
>> the number of issues to consider.
>>
>> I think that there should be an interface that catalogs implement that
>> would allow the creation of a single set of tests, and an interface that
>> catalog developers implement to provide data for the tests. In this way
>> developers of catalogs can verify that their implementations cover all the
>> identified cases.
>>
>> I also think that adding a "loaded-via" field opens up a very broad
>> attack vector and should be avoided.
>>
>> Claude
>>
>> On Wed, Aug 20, 2025 at 4:28 PM Ryan Blue  wrote:
>>
>>> I think that ownership should be managed by the catalog and should not
>>> be set by table properties. Respecting an "owner" property would easily
>>> lead to security issues where writers can change the property with write
>>> access. I think ownership or what role is the definer, like other security
>>> attributes, should be managed by the catalog outside of table properties.
>>>
>>> It's also reasonable to include in the proposal that this is necessary
>>> but not sufficient to build secure views. A catalog also needs to establish
>>> that the engine is trusted and would also need to validate that the
>>> principal loading the table has access to the view. But these are catalog
>>> problems. While they can be noted in this proposal, it's up to the catalog
>>> to design the policy and enforce it.
>>>
>>> On Tue, Aug 19, 2025 at 7:34 PM Sung Yun  wrote:
>>>
>>>> Hi Prashant,
>>>>
>>>> Thanks for sharing this proposal! This has been top of mind for me as
>>>> well, since I’m working on integrating Trino with Iceberg while also
>>>> supporting other heterogeneous engines in the ecosystem.
>>>>
>>>> I agree that a change like this is necessary to enable SECURITY DEFINER
>>>> views to correctly provision metadata for the underlying tables. That said,
>>>> even if it’s out of scope for this document, I think it would be valuable
>>>> to also discuss potential attack vectors. Adopters of this feature should
>>>> be aware of the additional risk surface it introduces if there isn’t a
>>>> trusted security contract between the client and the catalog.
>>>>
>>>> I’d also support converging on a consistent metadata property
>>>> convention for view ownership and security mode, so we have a common
>>>> foundation across engines.
>>>>
>>>> Curious to hear what others think.
>>>>
>>>> Best,
>>>> Sung
>>>>
>>>> On 2025/08/18 22:20:33 Prashant Singh wrote:
>>>> > Hi everyone,
>>>> >
>>>> > I wa

Re: [DISCUSS][SPEC] Add loaded-via in loadTable Request to support Definer Views

2025-08-21 Thread Prashant Singh
Agreed that ownership should be managed entirely by the catalog. We don't
define Actors / Users in IRC, and a catalog can very well store this
metadata separately (not necessarily in the view metadata object), hence
we don't talk about the owner in this proposal.

loaded-via / referenced-by needs to originate from a Trusted Engine /
Partner that the user is using, which would make sure the loaded-via value
is not tampered with and cannot be used as a bypass. Establishing the
Trusted Engine workflow is *not* a goal of this proposal, as it is entirely
up to the catalog to establish this trust relationship; whether that is
done via OAuth (delegation workflow) or mTLS is the catalog's choice. I
added this to the proposal as well, noting that this is not a *sufficient*
condition for implementing a secure definer view.

The way I see it, the proposal is the first step towards achieving
DEFINER views. The spec is just sending the context ("Hey Catalog, this
table is being accessed in the context of this View"); it does not mandate
that the catalog authorize based on this information, and it is totally
up to the catalog's discretion to simply ignore it.

Nevertheless, this context helps from an audit POV, as most audit log
systems record the base entity along with the entity in the audit logs
(for example, check here).

Hence I don't see why we should not add this context to the loadTable
calls, given that we are clear from the spec POV what this is (ref) and do
not dictate to the catalog how to authorize.

Best,
Prashant

On Thu, Aug 21, 2025 at 7:10 AM Claude Warren, Jr wrote:

> I believe that Ryan is correct that ownership and access should be
> managed by the catalog.
>
> Determining who has access for CRUD operations on the table data, as
> well as for CRUD operations on the table metadata, is a rather
> complicated question. Adding Views to the mix is an exponential change in
> the number of issues to consider.
>
> I think that there should be an interface that catalogs implement that
> would allow the creation of a single set of tests, and an interface that
> catalog developers implement to provide data for the tests. In this way
> developers of catalogs can verify that their implementations cover all the
> identified cases.
>
> I also think that adding a "loaded-via" field opens up a very broad attack
> vector and should be avoided.
>
> Claude
>
> On Wed, Aug 20, 2025 at 4:28 PM Ryan Blue  wrote:
>
>> I think that ownership should be managed by the catalog and should not be
>> set by table properties. Respecting an "owner" property would easily lead
>> to security issues where writers can change the property with write access.
>> I think ownership or what role is the definer, like other security
>> attributes, should be managed by the catalog outside of table properties.
>>
>> It's also reasonable to include in the proposal that this is necessary
>> but not sufficient to build secure views. A catalog also needs to establish
>> that the engine is trusted and would also need to validate that the
>> principal loading the table has access to the view. But these are catalog
>> problems. While they can be noted in this proposal, it's up to the catalog
>> to design the policy and enforce it.
>>
>> On Tue, Aug 19, 2025 at 7:34 PM Sung Yun  wrote:
>>
>>> Hi Prashant,
>>>
>>> Thanks for sharing this proposal! This has been top of mind for me as
>>> well, since I’m working on integrating Trino with Iceberg while also
>>> supporting other heterogeneous engines in the ecosystem.
>>>
>>> I agree that a change like this is necessary to enable SECURITY DEFINER
>>> views to correctly provision metadata for the underlying tables. That said,
>>> even if it’s out of scope for this document, I think it would be valuable
>>> to also discuss potential attack vectors. Adopters of this feature should
>>> be aware of the additional risk surface it introduces if there isn’t a
>>> trusted security contract between the client and the catalog.
>>>
>>> I’d also support converging on a consistent metadata property convention
>>> for view ownership and security mode, so we have a common foundation across
>>> engines.
>>>
>>> Curious to hear what others think.
>>>
>>> Best,
>>> Sung
>>>
>>> On 2025/08/18 22:20:33 Prashant Singh wrote:
>>> > Hi everyone,
>>> >
>>> > I wanted to bring up a small but important change to how we handle view
>>> > security in Iceberg, especially for *DEFINER* views. This change is
>>> > crucial for ensuring views function as a secure gateway to sensitive
>>> > data, where access is determined by the view's creator, not the user
>>> > running the query.
>>> >
>>> > The Problem
>>> >
>>> > Currently, when a view queries a table, the loadTable request to the
>>> > REST catalog doesn't know that it's coming from a view. It just sees
>>> > the table's name and defaults to checking the p

Re: [DISCUSS][SPEC] Add loaded-via in loadTable Request to support Definer Views

2025-08-21 Thread Claude Warren, Jr
I believe that Ryan is correct that ownership and access should be
managed by the catalog.

Determining who has access for CRUD operations on the table data, as well
as for CRUD operations on the table metadata, is a rather complicated
question. Adding Views to the mix is an exponential change in the number
of issues to consider.

I think that there should be an interface that catalogs implement that
would allow the creation of a single set of tests, and an interface that
catalog developers implement to provide data for the tests. In this way
developers of catalogs can verify that their implementations cover all the
identified cases.

I also think that adding a "loaded-via" field opens up a very broad attack
vector and should be avoided.

Claude
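
One possible shape for the shared test interface suggested above, as a minimal sketch only. Nothing here exists in Iceberg: `AccessCase`, `CatalogUnderTest`, `run_shared_suite`, and the toy `IgnoresViewContext` catalog are hypothetical names, and the two interfaces from the message are folded into two abstract methods on one class for brevity.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class AccessCase:
    """One identified case: who loads which table, via which view, and the expected result."""
    description: str
    principal: str
    table: str
    loaded_via: Optional[str]
    expect_allowed: bool


class CatalogUnderTest(ABC):
    """Hypothetical adapter that a catalog developer would implement."""

    @abstractmethod
    def cases(self) -> List[AccessCase]:
        """Provide the test data (after seeding matching grants in the catalog)."""

    @abstractmethod
    def load_table_allowed(self, principal: str, table: str,
                           loaded_via: Optional[str]) -> bool:
        """Attempt the load and report whether the catalog allowed it."""


def run_shared_suite(catalog: CatalogUnderTest) -> List[str]:
    """Run every case and return the descriptions of the cases that failed."""
    return [
        case.description
        for case in catalog.cases()
        if catalog.load_table_allowed(case.principal, case.table, case.loaded_via)
        != case.expect_allowed
    ]


class IgnoresViewContext(CatalogUnderTest):
    """Toy catalog that checks only direct grants and ignores loaded_via."""

    def __init__(self) -> None:
        self.grants: Set[Tuple[str, str]] = {("alice", "my_db.orders")}

    def cases(self) -> List[AccessCase]:
        return [
            AccessCase("direct grant", "alice", "my_db.orders", None, True),
            AccessCase("no grant, no view", "bob", "my_db.orders", None, False),
            AccessCase("definer view", "bob", "my_db.orders", "my_db.secure_view", True),
        ]

    def load_table_allowed(self, principal: str, table: str,
                           loaded_via: Optional[str]) -> bool:
        return (principal, table) in self.grants


print(run_shared_suite(IgnoresViewContext()))  # ['definer view']
```

The toy catalog fails only the definer-view case, which is the kind of gap a single shared suite would surface for every implementation.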

On Wed, Aug 20, 2025 at 4:28 PM Ryan Blue  wrote:

> I think that ownership should be managed by the catalog and should not be
> set by table properties. Respecting an "owner" property would easily lead
> to security issues where writers can change the property with write access.
> I think ownership or what role is the definer, like other security
> attributes, should be managed by the catalog outside of table properties.
>
> It's also reasonable to include in the proposal that this is necessary but
> not sufficient to build secure views. A catalog also needs to establish
> that the engine is trusted and would also need to validate that the
> principal loading the table has access to the view. But these are catalog
> problems. While they can be noted in this proposal, it's up to the catalog
> to design the policy and enforce it.
>
> On Tue, Aug 19, 2025 at 7:34 PM Sung Yun  wrote:
>
>> Hi Prashant,
>>
>> Thanks for sharing this proposal! This has been top of mind for me as
>> well, since I’m working on integrating Trino with Iceberg while also
>> supporting other heterogeneous engines in the ecosystem.
>>
>> I agree that a change like this is necessary to enable SECURITY DEFINER
>> views to correctly provision metadata for the underlying tables. That said,
>> even if it’s out of scope for this document, I think it would be valuable
>> to also discuss potential attack vectors. Adopters of this feature should
>> be aware of the additional risk surface it introduces if there isn’t a
>> trusted security contract between the client and the catalog.
>>
>> I’d also support converging on a consistent metadata property convention
>> for view ownership and security mode, so we have a common foundation across
>> engines.
>>
>> Curious to hear what others think.
>>
>> Best,
>> Sung
>>
>> On 2025/08/18 22:20:33 Prashant Singh wrote:
>> > Hi everyone,
>> >
>> > I wanted to bring up a small but important change to how we handle view
>> > security in Iceberg, especially for *DEFINER* views. This change is
>> > crucial for ensuring views function as a secure gateway to sensitive
>> > data, where access is determined by the view's creator, not the user
>> > running the query.
>> >
>> > The Problem
>> >
>> > Currently, when a view queries a table, the loadTable request to the
>> > REST catalog doesn't know that it's coming from a view. It just sees
>> > the table's name and defaults to checking the permissions of the user
>> > running the query. This does not work well with the security model for
>> > *DEFINER* views and prevents them from working as intended.
>> >
>> > The Proposed Solution (PR-13810)
>> >
>> > To address this, I've created a pull request that adds an optional
>> > loaded-via field to the loadTable request. This field will contain the
>> > fully qualified name of the view (e.g., my_db.secure_view), i.e., the
>> > namespace and the view name.
>> >
>> > When the catalog sees this new field, it will know to perform the
>> > security check against the view's owner, not the query invoker. This
>> > small addition ensures that the view's security semantics are
>> > respected, and it's backward-compatible with existing clients.
>> >
>> > I've also included a simple proof-of-concept (POC) to show how a client
>> > could implement this, along with Spark. I'm open to feedback on the
>> > approach, especially on how to cleanly pass the view context to the
>> > loadTable request; the POC is for demonstration purposes only.
>> >
>> > You can view the full proposal (Iceberg View Security
>> > <https://docs.google.com/document/d/15zgmACxue8jH8SIBAJNzZ64Mx6RTRmDv2IoH3Clc2uQ/edit?tab=t.0>)
>> > and, for details, refer to PR-13810.
>> >
>> > Looking forward to your thoughts.
>> >
>> > Best,
>> > Prashant Singh
>> >
>>
>


Re: [DISCUSS][SPEC] Add loaded-via in loadTable Request to support Definer Views

2025-08-20 Thread Ryan Blue
I think that ownership should be managed by the catalog and should not be
set by table properties. Respecting an "owner" property would easily lead
to security issues where writers can change the property with write access.
I think ownership or what role is the definer, like other security
attributes, should be managed by the catalog outside of table properties.

It's also reasonable to include in the proposal that this is necessary but
not sufficient to build secure views. A catalog also needs to establish
that the engine is trusted and would also need to validate that the
principal loading the table has access to the view. But these are catalog
problems. While they can be noted in this proposal, it's up to the catalog
to design the policy and enforce it.
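
The "necessary but not sufficient" point can be illustrated with a minimal sketch of one policy a catalog might choose. Everything here is an assumption for illustration: `CatalogPolicy`, its fields, and the engine and principal names are hypothetical, and the thread leaves the actual policy entirely to each catalog.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class CatalogPolicy:
    """Hypothetical catalog-side state; none of it lives in table or view properties."""
    trusted_engines: Set[str] = field(default_factory=set)
    # (principal, object name) pairs that are allowed to read the object
    grants: Set[Tuple[str, str]] = field(default_factory=set)
    # view name -> definer principal, managed by the catalog itself
    view_definers: Dict[str, str] = field(default_factory=dict)

    def can_read(self, principal: str, obj: str) -> bool:
        return (principal, obj) in self.grants

    def authorize_load_table(self, engine: str, invoker: str, table: str,
                             loaded_via: Optional[str] = None) -> bool:
        direct = self.can_read(invoker, table)
        # Without view context, or from an untrusted engine, ignore the hint
        # and fall back to the invoker's own privileges.
        if loaded_via is None or engine not in self.trusted_engines:
            return direct
        definer = self.view_definers.get(loaded_via)
        via_view = (
            definer is not None
            and self.can_read(invoker, loaded_via)  # invoker may use the view
            and self.can_read(definer, table)       # definer may read the table
        )
        return direct or via_view


policy = CatalogPolicy(
    trusted_engines={"trusted-engine"},
    grants={("alice", "my_db.secure_view"), ("definer", "my_db.orders")},
    view_definers={"my_db.secure_view": "definer"},
)
print(policy.authorize_load_table("trusted-engine", "alice", "my_db.orders",
                                  "my_db.secure_view"))
```

In this sketch the loaded-via value alone never grants access: the engine must be trusted, the invoker must have access to the view, and the definer must have access to the table.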

On Tue, Aug 19, 2025 at 7:34 PM Sung Yun  wrote:

> Hi Prashant,
>
> Thanks for sharing this proposal! This has been top of mind for me as
> well, since I’m working on integrating Trino with Iceberg while also
> supporting other heterogeneous engines in the ecosystem.
>
> I agree that a change like this is necessary to enable SECURITY DEFINER
> views to correctly provision metadata for the underlying tables. That said,
> even if it’s out of scope for this document, I think it would be valuable
> to also discuss potential attack vectors. Adopters of this feature should
> be aware of the additional risk surface it introduces if there isn’t a
> trusted security contract between the client and the catalog.
>
> I’d also support converging on a consistent metadata property convention
> for view ownership and security mode, so we have a common foundation across
> engines.
>
> Curious to hear what others think.
>
> Best,
> Sung
>
> On 2025/08/18 22:20:33 Prashant Singh wrote:
> > Hi everyone,
> >
> > I wanted to bring up a small but important change to how we handle view
> > security in Iceberg, especially for *DEFINER* views. This change is
> > crucial for ensuring views function as a secure gateway to sensitive
> > data, where access is determined by the view's creator, not the user
> > running the query.
> >
> > The Problem
> >
> > Currently, when a view queries a table, the loadTable request to the REST
> > catalog doesn't know that it's coming from a view. It just sees the
> > table's name and defaults to checking the permissions of the user running
> > the query. This does not work well with the security model for *DEFINER*
> > views and prevents them from working as intended.
> >
> > The Proposed Solution (PR-13810)
> >
> > To address this, I've created a pull request that adds an optional
> > loaded-via field to the loadTable request. This field will contain the
> > fully qualified name of the view (e.g., my_db.secure_view), i.e., the
> > namespace and the view name.
> >
> > When the catalog sees this new field, it will know to perform the
> > security check against the view's owner, not the query invoker. This
> > small addition ensures that the view's security semantics are respected,
> > and it's backward-compatible with existing clients.
> >
> > I've also included a simple proof-of-concept (POC) to show how a client
> > could implement this, along with Spark. I'm open to feedback on the
> > approach, especially on how to cleanly pass the view context to the
> > loadTable request; the POC is for demonstration purposes only.
> >
> > You can view the full proposal (Iceberg View Security
> > <https://docs.google.com/document/d/15zgmACxue8jH8SIBAJNzZ64Mx6RTRmDv2IoH3Clc2uQ/edit?tab=t.0>)
> > and, for details, refer to PR-13810.
> >
> > Looking forward to your thoughts.
> >
> > Best,
> > Prashant Singh
> >
>


Re: [DISCUSS][SPEC] Add loaded-via in loadTable Request to support Definer Views

2025-08-19 Thread Sung Yun
Hi Prashant,

Thanks for sharing this proposal! This has been top of mind for me as well, 
since I’m working on integrating Trino with Iceberg while also supporting other 
heterogeneous engines in the ecosystem.

I agree that a change like this is necessary to enable SECURITY DEFINER views 
to correctly provision metadata for the underlying tables. That said, even if 
it’s out of scope for this document, I think it would be valuable to also 
discuss potential attack vectors. Adopters of this feature should be aware of 
the additional risk surface it introduces if there isn’t a trusted security 
contract between the client and the catalog.

I’d also support converging on a consistent metadata property convention for 
view ownership and security mode, so we have a common foundation across engines.

Curious to hear what others think.

Best,
Sung

On 2025/08/18 22:20:33 Prashant Singh wrote:
> Hi everyone,
> 
> I wanted to bring up a small but important change to how we handle view
> security in Iceberg, especially for *DEFINER* views. This change is crucial
> for ensuring views function as a secure gateway to sensitive data, where
> access is determined by the view's creator, not the user running the query.
> 
> The Problem
> 
> Currently, when a view queries a table, the loadTable request to the REST
> catalog doesn't know that it's coming from a view. It just sees the table's
> name and defaults to checking the permissions of the user running the
> query. This does not work well with the security model for *DEFINER* views
> and prevents them from working as intended.
> 
> The Proposed Solution (PR-13810)
> 
> To address this, I've created a pull request that adds an optional
> loaded-via field to the loadTable request. This field will contain the
> fully qualified name of the view (e.g., my_db.secure_view), i.e., the
> namespace and the view name.
> 
> When the catalog sees this new field, it will know to perform the security
> check against the view's owner, not the query invoker. This small addition
> ensures that the view's security semantics are respected, and it's
> backward-compatible with existing clients.
> 
> I've also included a simple proof-of-concept (POC) to show how a client
> could implement this, along with Spark. I'm open to feedback on the
> approach, especially on how to cleanly pass the view context to the
> loadTable request; the POC is for demonstration purposes only.
> 
> You can view the full proposal (Iceberg View Security
> <https://docs.google.com/document/d/15zgmACxue8jH8SIBAJNzZ64Mx6RTRmDv2IoH3Clc2uQ/edit?tab=t.0>)
> and, for details, refer to PR-13810.
> 
> Looking forward to your thoughts.
> 
> Best,
> Prashant Singh
>
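
As a rough picture of the optional field described in the original proposal, here is a minimal sketch. It models the request in memory only and is not the REST wire format from the PR: `LoadTableRequest`, `split_view_name`, and the `orders` table are hypothetical, and splitting on "." assumes that no name part itself contains a dot.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LoadTableRequest:
    """Hypothetical in-memory model of a loadTable call."""
    namespace: Tuple[str, ...]
    table: str
    # Optional view context; leaving it as None matches existing clients.
    loaded_via: Optional[str] = None


def split_view_name(qualified: str) -> Tuple[Tuple[str, ...], str]:
    """Split 'my_db.secure_view' into its namespace and view name."""
    parts = qualified.split(".")
    if len(parts) < 2 or any(not part for part in parts):
        raise ValueError(f"expected namespace.view, got {qualified!r}")
    return tuple(parts[:-1]), parts[-1]


# A plain load, as sent today, and the same load issued while resolving a view.
plain = LoadTableRequest(("my_db",), "orders")
via_view = LoadTableRequest(("my_db",), "orders", loaded_via="my_db.secure_view")

print(split_view_name(via_view.loaded_via))  # (('my_db',), 'secure_view')
```

Because the field is optional and defaults to absent, a client that never sets it produces exactly the request it produces today, which is the backward-compatibility property the proposal relies on.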