Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-07-05 Thread Márton Balassi
Thank you, team. Then based on this agreement we are moving the proposal to
the wiki and opening the PR soon.

On Thu, Jul 1, 2021 at 12:28 AM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> >
> > * Even if there is a major breaking version, Flink releases major versions
> > too where it could be added
> > Netty framework locking is true but AFAIK there was a discussion to
> > rewrite the Netty stuff to a more sexy thing but there was no agreement to
> > do that.
> >
>
> Flink major releases seem to happen even less frequently than Netty
> releases :( It would be unfortunate if a breaking Netty API change ended up
> in the FLINK-3957[1] catch-all 2.0 changes.
>
> > All in all I would agree on making it experimental.
> >
>
> Thus I am happy with this compromise, thank you :)
>
> > This would simply restrict use-cases where order is not important. Limiting
> > devs in such an odd way is a no-go.
> >
>
> I think the only case to be made for imposing limitations would be to
> encourage devs to only use this API in very specific situations, otherwise
> to solve this in another way, and revisit the API if these limitations are
> met and alternatives do not work. That said, I am still trying to
> understand this specific Cloudera case – anything you can say about the
> limitations of its Flink setup (i.e, difficult to spawn sidecar processes
> (because of Yarn?)) would be greatly helpful to me and others without this
> bit of context.
>
> But I think the proposed priority function that you've added is a nice
> compromise as well, so +1 from my side with the proposal. I would only
> further suggest that we include the other options to this problem in the
> docs as the preferred approach, where possible.
>
> Thanks,
> Austin
>
>
> [1]: https://issues.apache.org/jira/browse/FLINK-3957
>
> On Wed, Jun 30, 2021 at 10:25 AM Gabor Somogyi 
> wrote:
>
> > Answered here because the text started to be crowded.
> >
> > > It also locks Flink into the current major version of Netty (and the
> > > Netty framework itself) for the foreseeable future.
> > It's not doing any Netty version locking because:
> > * Netty will not necessarily add breaking changes in major versions; the
> > API is quite stable
> > * Even if there is a major breaking version, Flink releases major versions
> > too where it could be added
> > Netty framework locking is true but AFAIK there was a discussion to
> > rewrite the Netty stuff to a more sexy thing but there was no agreement to
> > do that.
> > All in all I would agree on making it experimental.
> >
> > > why not restrict the service loader to only allow one?
> > This would simply restrict use-cases where order is not important.
> > Limiting devs in such an odd way is a no-go.
> > The ordering came up in multiple places, which I think is a good
> > reason to fill this gap with a priority function.
> > I've updated the doc and added it...
> >
> > BR,
> > G
> >
> >
> > On Wed, Jun 30, 2021 at 3:53 PM Austin Cawley-Edwards <
> > austin.caw...@gmail.com> wrote:
> >
> >> Hi Gabor,
> >>
> >> Thanks for your answers. I appreciate the explanations. Please see my
> >> responses + further questions below.
> >>
> >>
> >> * What stability semantics do you envision for this API?
> 
> >>> As I foresee the API will be as stable as Netty API. Since there is
> >>> guarantee on no breaking changes between minor versions we can give the
> >>> same guarantee.
> >>> If for whatever reason we need to break it we can do it in major
> version
> >>> like every other open source project does.
> >>>
> >>
> >> * Does Flink expose dependencies’ APIs in other places? Since this
>  exposes the Netty API, will this make it difficult to upgrade Netty?
> 
> >>> I don't expect breaking changes between minor versions so such cases
> >>> there will be no issues. If there is a breaking change in major version
> >>> we need to wait Flink major version too.
> >>>
> >>
> >> To clarify, you are proposing this new API to have the same stability
> >> guarantees as @Public currently does? Where we will not introduce
> breaking
> >> changes unless absolutely necessary (and requiring a FLIP, etc.)?
> >>
> >> If this is the case, I think this puts the community in a tough position
> >> where we are forced to maintain compatibility with something that we do
> not
> >> have control over. It also locks Flink into the current major version of
> >> Netty (and the Netty framework itself) for the foreseeable future.
> >>
> >> I am saying we should not do this, perhaps this is the best solution to
> >> finding a good compromise here, but I am trying to discover +
> acknowledge
> >> the full implications of this proposal so they can be discussed.
> >>
> >> What do you think about marking this API as @Experimental and not
> >> guaranteeing stability between versions? Then, if we do decide we need
> to
> >> upgrade Netty (or move away from it), we can do so.
> >>
> >> * I share Till's concern about multiple factories – other HTTP
> 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-30 Thread Austin Cawley-Edwards
>
> * Even if there is a major breaking version, Flink releases major versions
> too where it could be added
> Netty framework locking is true but AFAIK there was a discussion to
> rewrite the Netty stuff to a more sexy thing but there was no agreement to
> do that.
>

Flink major releases seem to happen even less frequently than Netty
releases :( It would be unfortunate if a breaking Netty API change ended up
in the FLINK-3957[1] catch-all 2.0 changes.

> All in all I would agree on making it experimental.
>

Thus I am happy with this compromise, thank you :)

> This would simply restrict use-cases where order is not important. Limiting
> devs in such an odd way is a no-go.
>

I think the only case to be made for imposing limitations would be to
encourage devs to only use this API in very specific situations, otherwise
to solve this in another way, and revisit the API if these limitations are
met and alternatives do not work. That said, I am still trying to
understand this specific Cloudera case – anything you can say about the
limitations of its Flink setup (i.e., difficult to spawn sidecar processes
(because of Yarn?)) would be greatly helpful to me and others without this
bit of context.

But I think the proposed priority function that you've added is a nice
compromise as well, so +1 from my side with the proposal. I would only
further suggest that we include the other options to this problem in the
docs as the preferred approach, where possible.

Thanks,
Austin


[1]: https://issues.apache.org/jira/browse/FLINK-3957

On Wed, Jun 30, 2021 at 10:25 AM Gabor Somogyi 
wrote:

> Answered here because the text started to be crowded.
>
> > It also locks Flink into the current major version of Netty (and the
> > Netty framework itself) for the foreseeable future.
> It's not doing any Netty version locking because:
> * Netty will not necessarily add breaking changes in major versions; the
> API is quite stable
> * Even if there is a major breaking version, Flink releases major versions
> too where it could be added
> Netty framework locking is true but AFAIK there was a discussion to
> rewrite the Netty stuff to a more sexy thing but there was no agreement to
> do that.
> All in all I would agree on making it experimental.
>
> > why not restrict the service loader to only allow one?
> This would simply restrict use-cases where order is not important.
> Limiting devs in such an odd way is a no-go.
> The ordering came up in multiple places, which I think is a good
> reason to fill this gap with a priority function.
> I've updated the doc and added it...
>
> BR,
> G
>
>
> On Wed, Jun 30, 2021 at 3:53 PM Austin Cawley-Edwards <
> austin.caw...@gmail.com> wrote:
>
>> Hi Gabor,
>>
>> Thanks for your answers. I appreciate the explanations. Please see my
>> responses + further questions below.
>>
>>
>> * What stability semantics do you envision for this API?

>>> As I foresee the API will be as stable as Netty API. Since there is
>>> guarantee on no breaking changes between minor versions we can give the
>>> same guarantee.
>>> If for whatever reason we need to break it we can do it in major version
>>> like every other open source project does.
>>>
>>
>> * Does Flink expose dependencies’ APIs in other places? Since this
 exposes the Netty API, will this make it difficult to upgrade Netty?

>>> I don't expect breaking changes between minor versions so such cases
>>> there will be no issues. If there is a breaking change in major version
>>> we need to wait Flink major version too.
>>>
>>
>> To clarify, you are proposing this new API to have the same stability
>> guarantees as @Public currently does? Where we will not introduce breaking
>> changes unless absolutely necessary (and requiring a FLIP, etc.)?
>>
>> If this is the case, I think this puts the community in a tough position
>> where we are forced to maintain compatibility with something that we do not
>> have control over. It also locks Flink into the current major version of
>> Netty (and the Netty framework itself) for the foreseeable future.
>>
>> I am saying we should not do this, perhaps this is the best solution to
>> finding a good compromise here, but I am trying to discover + acknowledge
>> the full implications of this proposal so they can be discussed.
>>
>> What do you think about marking this API as @Experimental and not
>> guaranteeing stability between versions? Then, if we do decide we need to
>> upgrade Netty (or move away from it), we can do so.
>>
>> * I share Till's concern about multiple factories – other HTTP middleware
 frameworks commonly support chaining middlewares. Since the proposed API
 does not include these features/guarantee ordering, do you see any reason
 to allow more than one factory?

>>> I personally can't come up with a use-case where ordering is a must. I'm
>>> not telling that this is not a valid use-case but adding a feature w/o
>>> business rationale would include the maintenance cost (though I'm open to

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-30 Thread Gabor Somogyi
Answered here because the text started to be crowded.

> It also locks Flink into the current major version of Netty (and the
> Netty framework itself) for the foreseeable future.
It's not doing any Netty version locking because:
* Netty will not necessarily add breaking changes in major versions; the
API is quite stable
* Even if there is a major breaking version, Flink releases major versions
too where it could be added
Netty framework locking is true but AFAIK there was a discussion to rewrite
the Netty stuff to a more sexy thing but there was no agreement to do that.
All in all I would agree on making it experimental.

> why not restrict the service loader to only allow one?
This would simply restrict use-cases where order is not important. Limiting
devs in such an odd way is a no-go.
The ordering came up in multiple places, which I think is a good reason
to fill this gap with a priority function.
I've updated the doc and added it...
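
For illustration only, a minimal sketch of how a priority function could
order several discovered factories. The class and field names are
hypothetical and not taken from the proposal doc, and "higher value runs
first" is an assumed convention that the thread does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrioritySketch {

    /** Hypothetical stand-in for a handler factory; not the proposal's actual interface. */
    static final class Factory {
        final String name;
        final int priority;

        Factory(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    /** Returns a sorted copy; higher priority values come first (assumed convention). */
    static List<Factory> ordered(List<Factory> discovered) {
        List<Factory> copy = new ArrayList<>(discovered);
        copy.sort((a, b) -> Integer.compare(b.priority, a.priority));
        return copy;
    }

    public static void main(String[] args) {
        // Discovery order is arbitrary; the priority decides the final order.
        List<Factory> discovered = Arrays.asList(
                new Factory("internal-authorization", 10),
                new Factory("google-oauth", 100));
        for (Factory f : ordered(discovered)) {
            System.out.println(f.name);
        }
    }
}
```

With an explicit priority, the order no longer depends on the order in which
the factories happen to be discovered.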

BR,
G


On Wed, Jun 30, 2021 at 3:53 PM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> Hi Gabor,
>
> Thanks for your answers. I appreciate the explanations. Please see my
> responses + further questions below.
>
>
> * What stability semantics do you envision for this API?
>>>
>> As I foresee the API will be as stable as Netty API. Since there is
>> guarantee on no breaking changes between minor versions we can give the
>> same guarantee.
>> If for whatever reason we need to break it we can do it in major version
>> like every other open source project does.
>>
>
> * Does Flink expose dependencies’ APIs in other places? Since this exposes
>>> the Netty API, will this make it difficult to upgrade Netty?
>>>
>> I don't expect breaking changes between minor versions so such cases
>> there will be no issues. If there is a breaking change in major version
>> we need to wait Flink major version too.
>>
>
> To clarify, you are proposing this new API to have the same stability
> guarantees as @Public currently does? Where we will not introduce breaking
> changes unless absolutely necessary (and requiring a FLIP, etc.)?
>
> If this is the case, I think this puts the community in a tough position
> where we are forced to maintain compatibility with something that we do not
> have control over. It also locks Flink into the current major version of
> Netty (and the Netty framework itself) for the foreseeable future.
>
> I am saying we should not do this, perhaps this is the best solution to
> finding a good compromise here, but I am trying to discover + acknowledge
> the full implications of this proposal so they can be discussed.
>
> What do you think about marking this API as @Experimental and not
> guaranteeing stability between versions? Then, if we do decide we need to
> upgrade Netty (or move away from it), we can do so.
>
> * I share Till's concern about multiple factories – other HTTP middleware
>>> frameworks commonly support chaining middlewares. Since the proposed API
>>> does not include these features/guarantee ordering, do you see any reason
>>> to allow more than one factory?
>>>
>> I personally can't come up with a use-case where ordering is a must. I'm
>> not telling that this is not a valid use-case but adding a feature w/o
>> business rationale would include the maintenance cost (though I'm open to
>> add).
>> As I've seen Till also can't give example for that (please see the doc
>> comments). If you have anything in mind please share it and we can add
>> priority to the API.
>> There is another option too, namely we can be defensive and we can add
>> the priority right now. I would do this only if everybody states in mail
>> that it would be the best option,
>> otherwise I would stick to the original plan.
>>
>
> Let me try to come up with a use case:
> * Someone creates an authentication module for integrating with Google's
> OAuth and publishes it to flink-packages
> * Another person in another org wants to use Google OAuth and then add
> internal authorization based on the user
> * In this scenario, *Google OAuth must come before the internal
> authorization*
> * They place their module and the Google OAuth module to be picked up by
> the service loader
> * What happens?
>
> I do not think that the current proposal has a way to handle this, besides
> having the implementor of the internal authorization module bundle
> everything into one, as you have suggested. Since this is the only way to
> achieve order, why not restrict the service loader to only allow one? This
> way the API is explicit in what it supports.
>
>
> Let me know what you think,
> Austin
>
>
> On Wed, Jun 30, 2021 at 5:24 AM Gabor Somogyi 
> wrote:
>
>> Hi Austin,
>>
>> Please see my answers embedded down below.
>>
>> BR,
>> G
>>
>>
>>
>> On Tue, Jun 29, 2021 at 9:59 PM Austin Cawley-Edwards <
>> austin.caw...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Thanks for the updated proposal. I have a few questions about the API,
>>> please see below.
>>>
>>> * What stability semantics do you 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-30 Thread Austin Cawley-Edwards
Small correction:

I am *not* saying we should not do this; perhaps this is the best solution
to finding a good compromise here, but I am trying to discover +
acknowledge the full implications of this proposal so they can be
discussed.

Sorry :)

On Wed, Jun 30, 2021 at 9:53 AM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> Hi Gabor,
>
> Thanks for your answers. I appreciate the explanations. Please see my
> responses + further questions below.
>
>
> * What stability semantics do you envision for this API?
>>>
>> As I foresee the API will be as stable as Netty API. Since there is
>> guarantee on no breaking changes between minor versions we can give the
>> same guarantee.
>> If for whatever reason we need to break it we can do it in major version
>> like every other open source project does.
>>
>
> * Does Flink expose dependencies’ APIs in other places? Since this exposes
>>> the Netty API, will this make it difficult to upgrade Netty?
>>>
>> I don't expect breaking changes between minor versions so such cases
>> there will be no issues. If there is a breaking change in major version
>> we need to wait Flink major version too.
>>
>
> To clarify, you are proposing this new API to have the same stability
> guarantees as @Public currently does? Where we will not introduce breaking
> changes unless absolutely necessary (and requiring a FLIP, etc.)?
>
> If this is the case, I think this puts the community in a tough position
> where we are forced to maintain compatibility with something that we do not
> have control over. It also locks Flink into the current major version of
> Netty (and the Netty framework itself) for the foreseeable future.
>
> I am saying we should not do this, perhaps this is the best solution to
> finding a good compromise here, but I am trying to discover + acknowledge
> the full implications of this proposal so they can be discussed.
>
> What do you think about marking this API as @Experimental and not
> guaranteeing stability between versions? Then, if we do decide we need to
> upgrade Netty (or move away from it), we can do so.
>
> * I share Till's concern about multiple factories – other HTTP middleware
>>> frameworks commonly support chaining middlewares. Since the proposed API
>>> does not include these features/guarantee ordering, do you see any reason
>>> to allow more than one factory?
>>>
>> I personally can't come up with a use-case where ordering is a must. I'm
>> not telling that this is not a valid use-case but adding a feature w/o
>> business rationale would include the maintenance cost (though I'm open to
>> add).
>> As I've seen Till also can't give example for that (please see the doc
>> comments). If you have anything in mind please share it and we can add
>> priority to the API.
>> There is another option too, namely we can be defensive and we can add
>> the priority right now. I would do this only if everybody states in mail
>> that it would be the best option,
>> otherwise I would stick to the original plan.
>>
>
> Let me try to come up with a use case:
> * Someone creates an authentication module for integrating with Google's
> OAuth and publishes it to flink-packages
> * Another person in another org wants to use Google OAuth and then add
> internal authorization based on the user
> * In this scenario, *Google OAuth must come before the internal
> authorization*
> * They place their module and the Google OAuth module to be picked up by
> the service loader
> * What happens?
>
> I do not think that the current proposal has a way to handle this, besides
> having the implementor of the internal authorization module bundle
> everything into one, as you have suggested. Since this is the only way to
> achieve order, why not restrict the service loader to only allow one? This
> way the API is explicit in what it supports.
>
>
> Let me know what you think,
> Austin
>
>
> On Wed, Jun 30, 2021 at 5:24 AM Gabor Somogyi 
> wrote:
>
>> Hi Austin,
>>
>> Please see my answers embedded down below.
>>
>> BR,
>> G
>>
>>
>>
>> On Tue, Jun 29, 2021 at 9:59 PM Austin Cawley-Edwards <
>> austin.caw...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Thanks for the updated proposal. I have a few questions about the API,
>>> please see below.
>>>
>>> * What stability semantics do you envision for this API?
>>>
>> As I foresee the API will be as stable as Netty API. Since there is
>> guarantee on no breaking changes between minor versions we can give the
>> same guarantee.
>> If for whatever reason we need to break it we can do it in major version
>> like every other open source project does.
>>
>>
>>> * Does Flink expose dependencies’ APIs in other places? Since this
>>> exposes the Netty API, will this make it difficult to upgrade Netty?
>>>
>> I don't expect breaking changes between minor versions so such cases
>> there will be no issues. If there is a breaking change in major version
>> we need to wait Flink major version too.
>>
>>
>>> * I share Till's concern about multiple 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-30 Thread Austin Cawley-Edwards
Hi Gabor,

Thanks for your answers. I appreciate the explanations. Please see my
responses + further questions below.


>> * What stability semantics do you envision for this API?
>>
> As I foresee it, the API will be as stable as the Netty API. Since there is a
> guarantee of no breaking changes between minor versions, we can give the
> same guarantee.
> If for whatever reason we need to break it, we can do it in a major version
> like every other open source project does.
>

>> * Does Flink expose dependencies’ APIs in other places? Since this exposes
>> the Netty API, will this make it difficult to upgrade Netty?
>>
> I don't expect breaking changes between minor versions, so in such cases there
> will be no issues. If there is a breaking change in a major version,
> we need to wait for a Flink major version too.
>

To clarify, you are proposing this new API to have the same stability
guarantees as @Public currently does? Where we will not introduce breaking
changes unless absolutely necessary (and requiring a FLIP, etc.)?

If this is the case, I think this puts the community in a tough position
where we are forced to maintain compatibility with something that we do not
have control over. It also locks Flink into the current major version of
Netty (and the Netty framework itself) for the foreseeable future.

I am saying we should not do this, perhaps this is the best solution to
finding a good compromise here, but I am trying to discover + acknowledge
the full implications of this proposal so they can be discussed.

What do you think about marking this API as @Experimental and not
guaranteeing stability between versions? Then, if we do decide we need to
upgrade Netty (or move away from it), we can do so.

>> * I share Till's concern about multiple factories – other HTTP middleware
>> frameworks commonly support chaining middlewares. Since the proposed API
>> does not include these features/guarantee ordering, do you see any reason
>> to allow more than one factory?
>>
> I personally can't come up with a use-case where ordering is a must. I'm
> not saying that this is not a valid use-case, but adding a feature w/o
> business rationale would include the maintenance cost (though I'm open to
> adding it).
> As far as I've seen, Till also can't give an example for that (please see the doc
> comments). If you have anything in mind please share it and we can add
> priority to the API.
> There is another option too, namely we can be defensive and add the
> priority right now. I would do this only if everybody states in mail that
> it would be the best option;
> otherwise I would stick to the original plan.
>

Let me try to come up with a use case:
* Someone creates an authentication module for integrating with Google's
OAuth and publishes it to flink-packages
* Another person in another org wants to use Google OAuth and then add
internal authorization based on the user
* In this scenario, *Google OAuth must come before the internal
authorization*
* They place their module and the Google OAuth module to be picked up by
the service loader
* What happens?
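
To make the ordering dependency in this scenario concrete, here is a minimal
sketch. It does not use Netty or the proposed API; the `Handler` interface,
the token value and the user name are all hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderingSketch {

    /** Hypothetical handler: returns false to reject the request. */
    interface Handler {
        boolean handle(Map<String, String> request);
    }

    /** Stand-in for an OAuth module: resolves a token to a user. */
    static final Handler AUTHENTICATE = request -> {
        if (!"valid-token".equals(request.get("token"))) {
            return false;
        }
        request.put("user", "alice");
        return true;
    };

    /** Stand-in for internal authorization: needs the user set by authentication. */
    static final Handler AUTHORIZE = request -> "alice".equals(request.get("user"));

    /** Runs the handlers in list order and stops at the first rejection. */
    static boolean run(List<Handler> chain, String token) {
        Map<String, String> request = new HashMap<>();
        request.put("token", token);
        for (Handler handler : chain) {
            if (!handler.handle(request)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Authentication first: the request is accepted.
        System.out.println(run(Arrays.asList(AUTHENTICATE, AUTHORIZE), "valid-token"));
        // Authorization first: no user is known yet, so the request is rejected.
        System.out.println(run(Arrays.asList(AUTHORIZE, AUTHENTICATE), "valid-token"));
    }
}
```

The same two modules accept or reject the same request depending purely on
the order in which they run.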

I do not think that the current proposal has a way to handle this, besides
having the implementor of the internal authorization module bundle
everything into one, as you have suggested. Since this is the only way to
achieve order, why not restrict the service loader to only allow one? This
way the API is explicit in what it supports.
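
As a rough sketch of what "only allow one" could look like: the helper below
is hypothetical and not part of the proposal. It accepts any `Iterable`, so
the result of `java.util.ServiceLoader.load(...)` could be passed in directly,
since `ServiceLoader` implements `Iterable`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.Optional;

public class SingleFactorySketch {

    /** Returns the single discovered element, or empty if none; fails if there are several. */
    static <T> Optional<T> atMostOne(Iterable<T> discovered) {
        Iterator<T> it = discovered.iterator();
        if (!it.hasNext()) {
            return Optional.empty();
        }
        T first = it.next();
        if (it.hasNext()) {
            throw new IllegalStateException(
                    "More than one factory found; only one is supported");
        }
        return Optional.of(first);
    }

    public static void main(String[] args) {
        System.out.println(atMostOne(Collections.emptyList()).isPresent());
        System.out.println(atMostOne(Arrays.asList("google-oauth")).get());
        try {
            atMostOne(Arrays.asList("google-oauth", "internal-authorization"));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast at startup makes the single-factory limitation visible instead
of leaving the resulting order undefined.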


Let me know what you think,
Austin


On Wed, Jun 30, 2021 at 5:24 AM Gabor Somogyi 
wrote:

> Hi Austin,
>
> Please see my answers embedded down below.
>
> BR,
> G
>
>
>
> On Tue, Jun 29, 2021 at 9:59 PM Austin Cawley-Edwards <
> austin.caw...@gmail.com> wrote:
>
>> Hi all,
>>
>> Thanks for the updated proposal. I have a few questions about the API,
>> please see below.
>>
>> * What stability semantics do you envision for this API?
>>
> As I foresee the API will be as stable as Netty API. Since there is
> guarantee on no breaking changes between minor versions we can give the
> same guarantee.
> If for whatever reason we need to break it we can do it in major version
> like every other open source project does.
>
>
>> * Does Flink expose dependencies’ APIs in other places? Since this
>> exposes the Netty API, will this make it difficult to upgrade Netty?
>>
> I don't expect breaking changes between minor versions so such cases there
> will be no issues. If there is a breaking change in major version
> we need to wait Flink major version too.
>
>
>> * I share Till's concern about multiple factories – other HTTP middleware
>> frameworks commonly support chaining middlewares. Since the proposed API
>> does not include these features/guarantee ordering, do you see any reason
>> to allow more than one factory?
>>
> I personally can't come up with a use-case where ordering is a must. I'm
> not telling that this is not a valid use-case but adding a feature w/o
> business rationale would include the maintenance cost (though I'm open to
> add).
> As I've seen Till also can't give 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-30 Thread Gabor Somogyi
Hi Austin,

Please see my answers embedded down below.

BR,
G



On Tue, Jun 29, 2021 at 9:59 PM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> Hi all,
>
> Thanks for the updated proposal. I have a few questions about the API,
> please see below.
>
> * What stability semantics do you envision for this API?
>
As I foresee it, the API will be as stable as the Netty API. Since there is a
guarantee of no breaking changes between minor versions, we can give the
same guarantee.
If for whatever reason we need to break it, we can do it in a major version
like every other open source project does.


> * Does Flink expose dependencies’ APIs in other places? Since this exposes
> the Netty API, will this make it difficult to upgrade Netty?
>
I don't expect breaking changes between minor versions, so in such cases there
will be no issues. If there is a breaking change in a major version,
we need to wait for a Flink major version too.


> * I share Till's concern about multiple factories – other HTTP middleware
> frameworks commonly support chaining middlewares. Since the proposed API
> does not include these features/guarantee ordering, do you see any reason
> to allow more than one factory?
>
I personally can't come up with a use-case where ordering is a must. I'm
not saying that this is not a valid use-case, but adding a feature w/o
business rationale would include the maintenance cost (though I'm open to
adding it).
As far as I've seen, Till also can't give an example for that (please see the doc
comments). If you have anything in mind please share it and we can add
priority to the API.
There is another option too, namely we can be defensive and add the
priority right now. I would do this only if everybody states in mail that
it would be the best option;
otherwise I would stick to the original plan.


>
> Best,
> Austin
>
> On Tue, Jun 29, 2021 at 8:55 AM Márton Balassi 
> wrote:
>
>> Hi all,
>>
>> I commend Konstantin and Till when it comes to standing up for the
>> community values.
>>
>> Based on your feedback we are withdrawing the original proposal and
>> attaching a more general custom netty handler API proposal [1] written by
>> G. The change necessary to the Flink repository is approximately 500 lines
>> of code. [2]
>>
>> Please let us focus on discussing the details of this API and whether it
>> covers the necessary use cases.
>>
>> [1]
>>
>> https://docs.google.com/document/d/1Idnw8YauMK1x_14iv0rVF0Hqm58J6Dg-hi-hEuL6hwM/edit#heading=h.ijcbce3c5gip
>> [2]
>>
>> https://github.com/gaborgsomogyi/flink/commit/942f23679ac21428bb87fc85557b9b443fcaf310
>>
>> Thanks,
>> Marton
>>
>> On Wed, Jun 23, 2021 at 9:36 PM Austin Cawley-Edwards <
>> austin.caw...@gmail.com> wrote:
>>
>> > Hi all,
>> >
>> > Thanks, Konstantin and Till, for guiding the discussion.
>> >
>> > I was not aware of the results of the call with Konstantin and was
>> > attempting to resolve the unanswered questions before more, potentially
>> > fruitless, work was done.
>> >
>> > I am also looking forward to the coming proposal, as well as increasing
>> my
>> > understanding of this specific use case + its limitations!
>> >
>> > Best,
>> > Austin
>> >
>> > On Tue, Jun 22, 2021 at 6:32 AM Till Rohrmann 
>> > wrote:
>> >
>> > > Hi everyone,
>> > >
>> > > I do like the idea of keeping the actual change outside of Flink but
>> to
>> > > enable Flink to support such a use case (different authentication
>> > > mechanisms). I think this is a good compromise for the community that
>> > > combines long-term maintainability with support for new use-cases. I
>> am
>> > > looking forward to your proposal.
>> > >
>> > > I also want to second Konstantin here that the tone of your last
>> email,
>> > > Marton, does not reflect the values and manners of the Flink community
>> > and
>> > > is not representative of how we conduct discussions. Especially, the
>> more
>> > > senior community members should know this and act accordingly in
>> order to
>> > > be good role models for others in the community. Technical discussions
>> > > should not be decided by who wields presumably the greatest authority
>> but
>> > > by the soundness of arguments and by what is the best solution for a
>> > > problem.
>> > >
>> > > Let us now try to find the best solution for the problem at hand!
>> > >
>> > > Cheers,
>> > > Till
>> > >
>> > > On Tue, Jun 22, 2021 at 11:24 AM Konstantin Knauf 
>> > > wrote:
>> > >
>> > > > Hi everyone,
>> > > >
>> > > > First, Marton and I had a brief conversation yesterday offline and
>> > > > discussed exploring the approach of exposing the authentication
>> > > > functionality via an API. So, I am looking forward to your proposal
>> in
>> > > that
>> > > > direction. The benefit of such a solution would be that it is
>> > extensible
>> > > > for others and it does add a smaller maintenance (in particular
>> > testing)
>> > > > footprint to Apache Flink itself. If we end up going down this
>> route,
>> > > > flink-packages.org would be a great way to promote these third
>> 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-29 Thread Austin Cawley-Edwards
Hi all,

Thanks for the updated proposal. I have a few questions about the API,
please see below.

* What stability semantics do you envision for this API?
* Does Flink expose dependencies’ APIs in other places? Since this exposes
the Netty API, will this make it difficult to upgrade Netty?
* I share Till's concern about multiple factories – other HTTP middleware
frameworks commonly support chaining middlewares. Since the proposed API
does not include these features/guarantee ordering, do you see any reason
to allow more than one factory?

Best,
Austin

On Tue, Jun 29, 2021 at 8:55 AM Márton Balassi 
wrote:

> Hi all,
>
> I commend Konstantin and Till when it comes to standing up for the
> community values.
>
> Based on your feedback we are withdrawing the original proposal and
> attaching a more general custom netty handler API proposal [1] written by
> G. The change necessary to the Flink repository is approximately 500 lines
> of code. [2]
>
> Please let us focus on discussing the details of this API and whether it
> covers the necessary use cases.
>
> [1]
>
> https://docs.google.com/document/d/1Idnw8YauMK1x_14iv0rVF0Hqm58J6Dg-hi-hEuL6hwM/edit#heading=h.ijcbce3c5gip
> [2]
>
> https://github.com/gaborgsomogyi/flink/commit/942f23679ac21428bb87fc85557b9b443fcaf310
>
> Thanks,
> Marton
>
> On Wed, Jun 23, 2021 at 9:36 PM Austin Cawley-Edwards <
> austin.caw...@gmail.com> wrote:
>
> > Hi all,
> >
> > Thanks, Konstantin and Till, for guiding the discussion.
> >
> > I was not aware of the results of the call with Konstantin and was
> > attempting to resolve the unanswered questions before more, potentially
> > fruitless, work was done.
> >
> > I am also looking forward to the coming proposal, as well as increasing my
> > understanding of this specific use case + its limitations!
> >
> > Best,
> > Austin
> >
> > On Tue, Jun 22, 2021 at 6:32 AM Till Rohrmann 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > I do like the idea of keeping the actual change outside of Flink but to
> > > enable Flink to support such a use case (different authentication
> > > mechanisms). I think this is a good compromise for the community that
> > > combines long-term maintainability with support for new use-cases. I am
> > > looking forward to your proposal.
> > >
> > > I also want to second Konstantin here that the tone of your last email,
> > > Marton, does not reflect the values and manners of the Flink community and
> > > is not representative of how we conduct discussions. Especially, the more
> > > senior community members should know this and act accordingly in order to
> > > be good role models for others in the community. Technical discussions
> > > should not be decided by who wields presumably the greatest authority but
> > > by the soundness of arguments and by what is the best solution for a
> > > problem.
> > >
> > > Let us now try to find the best solution for the problem at hand!
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Jun 22, 2021 at 11:24 AM Konstantin Knauf 
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > First, Marton and I had a brief conversation yesterday offline and
> > > > discussed exploring the approach of exposing the authentication
> > > > functionality via an API. So, I am looking forward to your proposal in that
> > > > direction. The benefit of such a solution would be that it is extensible
> > > > for others and it does add a smaller maintenance (in particular testing)
> > > > footprint to Apache Flink itself. If we end up going down this route,
> > > > flink-packages.org would be a great way to promote these third party
> > > > "authentication modules".
> > > >
> > > > Second, Marton, I understand your frustration about the long discussion on
> > > > this "simple matter", but the condescending tone of your last mail feels
> > > > uncalled for to me. Austin expressed a valid opinion on the topic, which is
> > > > based on his experience from other Open Source frameworks (CNCF mostly). I
> > > > am sure you agree that it is important for Apache Flink to stay open and to
> > > > consider different approaches and ideas and I don't think it helps the
> > > > culture of discussion to shoot it down like this ("This is where this
> > > > discussion stops.").
> > > >
> > > > Let's continue to move this discussion forward and I am sure we'll find a
> > > > consensus based on product and technological considerations.
> > > >
> > > > Thanks,
> > > >
> > > > Konstantin
> > > >
> > > > On Tue, Jun 22, 2021 at 9:31 AM Márton Balassi <balassi.mar...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Austin,
> > > > >
> > > > > Thank you for your thoughts. This is where this discussion stops. This
> > > > > email thread already contains more characters than the implementation and
> > > > > what is needed for the next 20 years of maintenance.
> > > > >
> > > > > It is great that you have a view on modern 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-29 Thread Márton Balassi
Hi all,

I commend Konstantin and Till when it comes to standing up for the
community values.

Based on your feedback we are withdrawing the original proposal and
attaching a more general custom netty handler API proposal [1] written by
G. The change necessary to the Flink repository is approximately 500 lines
of code. [2]

Please let us focus on discussing the details of this API and whether it
covers the necessary use cases.

[1]
https://docs.google.com/document/d/1Idnw8YauMK1x_14iv0rVF0Hqm58J6Dg-hi-hEuL6hwM/edit#heading=h.ijcbce3c5gip
[2]
https://github.com/gaborgsomogyi/flink/commit/942f23679ac21428bb87fc85557b9b443fcaf310

Thanks,
Marton

On Wed, Jun 23, 2021 at 9:36 PM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> Hi all,
>
> Thanks, Konstantin and Till, for guiding the discussion.
>
> I was not aware of the results of the call with Konstantin and was
> attempting to resolve the unanswered questions before more, potentially
> fruitless, work was done.
>
> I am also looking forward to the coming proposal, as well as increasing my
> understanding of this specific use case + its limitations!
>
> Best,
> Austin
>
> On Tue, Jun 22, 2021 at 6:32 AM Till Rohrmann 
> wrote:
>
> > Hi everyone,
> >
> > I do like the idea of keeping the actual change outside of Flink but to
> > enable Flink to support such a use case (different authentication
> > mechanisms). I think this is a good compromise for the community that
> > combines long-term maintainability with support for new use-cases. I am
> > looking forward to your proposal.
> >
> > I also want to second Konstantin here that the tone of your last email,
> > Marton, does not reflect the values and manners of the Flink community and
> > is not representative of how we conduct discussions. Especially, the more
> > senior community members should know this and act accordingly in order to
> > be good role models for others in the community. Technical discussions
> > should not be decided by who wields presumably the greatest authority but
> > by the soundness of arguments and by what is the best solution for a
> > problem.
> >
> > Let us now try to find the best solution for the problem at hand!
> >
> > Cheers,
> > Till
> >
> > On Tue, Jun 22, 2021 at 11:24 AM Konstantin Knauf 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > First, Marton and I had a brief conversation yesterday offline and
> > > discussed exploring the approach of exposing the authentication
> > > functionality via an API. So, I am looking forward to your proposal in that
> > > direction. The benefit of such a solution would be that it is extensible
> > > for others and it does add a smaller maintenance (in particular testing)
> > > footprint to Apache Flink itself. If we end up going down this route,
> > > flink-packages.org would be a great way to promote these third party
> > > "authentication modules".
> > >
> > > Second, Marton, I understand your frustration about the long discussion on
> > > this "simple matter", but the condescending tone of your last mail feels
> > > uncalled for to me. Austin expressed a valid opinion on the topic, which is
> > > based on his experience from other Open Source frameworks (CNCF mostly). I
> > > am sure you agree that it is important for Apache Flink to stay open and to
> > > consider different approaches and ideas and I don't think it helps the
> > > culture of discussion to shoot it down like this ("This is where this
> > > discussion stops.").
> > >
> > > Let's continue to move this discussion forward and I am sure we'll find a
> > > consensus based on product and technological considerations.
> > >
> > > Thanks,
> > >
> > > Konstantin
> > >
> > > On Tue, Jun 22, 2021 at 9:31 AM Márton Balassi <balassi.mar...@gmail.com>
> > > wrote:
> > >
> > > > Hi Austin,
> > > >
> > > > Thank you for your thoughts. This is where this discussion stops. This
> > > > email thread already contains more characters than the implementation and
> > > > what is needed for the next 20 years of maintenance.
> > > >
> > > > It is great that you have a view on modern solutions and thank you for
> > > > offering your help with brainstorming solutions. I am responsible for Flink
> > > > at Cloudera and we do need an implementation like this and it is in fact
> > > > already in production at dozens of customers. We are open to adapting that
> > > > to expose a more generic API (and keeping Kerberos to our fork), to
> > > > contribute this to the community as others have asked for it and to protect
> > > > ourselves from occasionally having to update this critical implementation
> > > > path based on changes in the Apache codebase. I have worked with close to a
> > > > hundred Big Data customers as a consultant and an engineering manager and
> > > > committed hundreds of changes to Apache Flink over the past decade, please
> > > > trust my judgement on a simple matter like this.

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-23 Thread Austin Cawley-Edwards
Hi all,

Thanks, Konstantin and Till, for guiding the discussion.

I was not aware of the results of the call with Konstantin and was
attempting to resolve the unanswered questions before more, potentially
fruitless, work was done.

I am also looking forward to the coming proposal, as well as increasing my
understanding of this specific use case + its limitations!

Best,
Austin

On Tue, Jun 22, 2021 at 6:32 AM Till Rohrmann  wrote:

> Hi everyone,
>
> I do like the idea of keeping the actual change outside of Flink but to
> enable Flink to support such a use case (different authentication
> mechanisms). I think this is a good compromise for the community that
> combines long-term maintainability with support for new use-cases. I am
> looking forward to your proposal.
>
> I also want to second Konstantin here that the tone of your last email,
> Marton, does not reflect the values and manners of the Flink community and
> is not representative of how we conduct discussions. Especially, the more
> senior community members should know this and act accordingly in order to
> be good role models for others in the community. Technical discussions
> should not be decided by who wields presumably the greatest authority but
> by the soundness of arguments and by what is the best solution for a
> problem.
>
> Let us now try to find the best solution for the problem at hand!
>
> Cheers,
> Till
>
> On Tue, Jun 22, 2021 at 11:24 AM Konstantin Knauf 
> wrote:
>
> > Hi everyone,
> >
> > First, Marton and I had a brief conversation yesterday offline and
> > discussed exploring the approach of exposing the authentication
> > functionality via an API. So, I am looking forward to your proposal in that
> > direction. The benefit of such a solution would be that it is extensible
> > for others and it does add a smaller maintenance (in particular testing)
> > footprint to Apache Flink itself. If we end up going down this route,
> > flink-packages.org would be a great way to promote these third party
> > "authentication modules".
> >
> > Second, Marton, I understand your frustration about the long discussion on
> > this "simple matter", but the condescending tone of your last mail feels
> > uncalled for to me. Austin expressed a valid opinion on the topic, which is
> > based on his experience from other Open Source frameworks (CNCF mostly). I
> > am sure you agree that it is important for Apache Flink to stay open and to
> > consider different approaches and ideas and I don't think it helps the
> > culture of discussion to shoot it down like this ("This is where this
> > discussion stops.").
> >
> > Let's continue to move this discussion forward and I am sure we'll find a
> > consensus based on product and technological considerations.
> >
> > Thanks,
> >
> > Konstantin
> >
> > On Tue, Jun 22, 2021 at 9:31 AM Márton Balassi
> > wrote:
> >
> > > Hi Austin,
> > >
> > > Thank you for your thoughts. This is where this discussion stops. This
> > > email thread already contains more characters than the implementation and
> > > what is needed for the next 20 years of maintenance.
> > >
> > > It is great that you have a view on modern solutions and thank you for
> > > offering your help with brainstorming solutions. I am responsible for Flink
> > > at Cloudera and we do need an implementation like this and it is in fact
> > > already in production at dozens of customers. We are open to adapting that
> > > to expose a more generic API (and keeping Kerberos to our fork), to
> > > contribute this to the community as others have asked for it and to protect
> > > ourselves from occasionally having to update this critical implementation
> > > path based on changes in the Apache codebase. I have worked with close to a
> > > hundred Big Data customers as a consultant and an engineering manager and
> > > committed hundreds of changes to Apache Flink over the past decade, please
> > > trust my judgement on a simple matter like this.
> > >
> > > Please forgive me for referencing authority, this discussion was getting
> > > out of hand. Please keep vigilant.
> > >
> > > Best,
> > > Marton
> > >
> > > On Mon, Jun 21, 2021 at 10:50 PM Austin Cawley-Edwards <
> > > austin.caw...@gmail.com> wrote:
> > >
> > > > Hi Gabor + Marton,
> > > >
> > > > I don't believe that the issue with this proposal is the specific mechanism
> > > > proposed (Kerberos), but rather that it is not the level to implement it at
> > > > (Flink). I'm just one voice, so please take this with a grain of salt.
> > > >
> > > > In the other solutions previously noted there is no need to instrument
> > > > Flink which, in addition to reducing the maintenance burden, provides a
> > > > better, decoupled end result.
> > > >
> > > > IMO we should not add any new API in Flink for this use case. I think it is
> > > > unfortunate and sympathize with the work that has already been done on this
> > > > feature – 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-22 Thread Till Rohrmann
Hi everyone,

I do like the idea of keeping the actual change outside of Flink while
enabling Flink to support such a use case (different authentication
mechanisms). I think this is a good compromise for the community that
combines long-term maintainability with support for new use-cases. I am
looking forward to your proposal.

I also want to second Konstantin here that the tone of your last email,
Marton, does not reflect the values and manners of the Flink community and
is not representative of how we conduct discussions. In particular, the more
senior community members should know this and act accordingly in order to
be good role models for others in the community. Technical discussions
should not be decided by who presumably wields the greatest authority but
by the soundness of arguments and by what is the best solution for a
problem.

Let us now try to find the best solution for the problem at hand!

Cheers,
Till

On Tue, Jun 22, 2021 at 11:24 AM Konstantin Knauf  wrote:

> Hi everyone,
>
> First, Marton and I had a brief conversation yesterday offline and
> discussed exploring the approach of exposing the authentication
> functionality via an API. So, I am looking forward to your proposal in that
> direction. The benefit of such a solution would be that it is extensible
> for others and it does add a smaller maintenance (in particular testing)
> footprint to Apache Flink itself. If we end up going down this route,
> flink-packages.org would be a great way to promote these third party
> "authentication modules".
>
> Second, Marton, I understand your frustration about the long discussion on
> this "simple matter", but the condescending tone of your last mail feels
> uncalled for to me. Austin expressed a valid opinion on the topic, which is
> based on his experience from other Open Source frameworks (CNCF mostly). I
> am sure you agree that it is important for Apache Flink to stay open and to
> consider different approaches and ideas and I don't think it helps the
> culture of discussion to shoot it down like this ("This is where this
> discussion stops.").
>
> Let's continue to move this discussion forward and I am sure we'll find a
> consensus based on product and technological considerations.
>
> Thanks,
>
> Konstantin
>
> On Tue, Jun 22, 2021 at 9:31 AM Márton Balassi 
> wrote:
>
> > Hi Austin,
> >
> > Thank you for your thoughts. This is where this discussion stops. This
> > email thread already contains more characters than the implementation and
> > what is needed for the next 20 years of maintenance.
> >
> > It is great that you have a view on modern solutions and thank you for
> > offering your help with brainstorming solutions. I am responsible for Flink
> > at Cloudera and we do need an implementation like this and it is in fact
> > already in production at dozens of customers. We are open to adapting that
> > to expose a more generic API (and keeping Kerberos to our fork), to
> > contribute this to the community as others have asked for it and to protect
> > ourselves from occasionally having to update this critical implementation
> > path based on changes in the Apache codebase. I have worked with close to a
> > hundred Big Data customers as a consultant and an engineering manager and
> > committed hundreds of changes to Apache Flink over the past decade, please
> > trust my judgement on a simple matter like this.
> >
> > Please forgive me for referencing authority, this discussion was getting
> > out of hand. Please keep vigilant.
> >
> > Best,
> > Marton
> >
> > On Mon, Jun 21, 2021 at 10:50 PM Austin Cawley-Edwards <
> > austin.caw...@gmail.com> wrote:
> >
> > > Hi Gabor + Marton,
> > >
> > > I don't believe that the issue with this proposal is the specific mechanism
> > > proposed (Kerberos), but rather that it is not the level to implement it at
> > > (Flink). I'm just one voice, so please take this with a grain of salt.
> > >
> > > In the other solutions previously noted there is no need to instrument
> > > Flink which, in addition to reducing the maintenance burden, provides a
> > > better, decoupled end result.
> > >
> > > IMO we should not add any new API in Flink for this use case. I think it is
> > > unfortunate and sympathize with the work that has already been done on this
> > > feature – perhaps we could brainstorm ways to run this alongside Flink in
> > > your setup. Again, I don't think the proposed solution of an agnostic API
> > > would not work, nor is it a bad idea, but is not one that will make Flink
> > > more compatible with the modern solutions to this problem.
> > >
> > > Best,
> > > Austin
> > >
> > > On Mon, Jun 21, 2021 at 2:18 PM Márton Balassi <balassi.mar...@gmail.com>
> > > wrote:
> > >
> > > > Hi team,
> > > >
> > > > Thank you for your input. Based on this discussion I agree with G that
> > > > selecting and standardizing on a specific strong authentication mechanism
> > > > is more challenging than the whole 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-22 Thread Konstantin Knauf
Hi everyone,

First, Marton and I had a brief conversation yesterday offline and
discussed exploring the approach of exposing the authentication
functionality via an API. So, I am looking forward to your proposal in that
direction. The benefit of such a solution would be that it is extensible
for others and it adds a smaller maintenance (in particular testing)
footprint to Apache Flink itself. If we end up going down this route,
flink-packages.org would be a great way to promote these third-party
"authentication modules".

Second, Marton, I understand your frustration about the long discussion on
this "simple matter", but the condescending tone of your last mail feels
uncalled for to me. Austin expressed a valid opinion on the topic, which is
based on his experience from other Open Source frameworks (CNCF mostly). I
am sure you agree that it is important for Apache Flink to stay open and to
consider different approaches and ideas and I don't think it helps the
culture of discussion to shoot it down like this ("This is where this
discussion stops.").

Let's continue to move this discussion forward and I am sure we'll find a
consensus based on product and technological considerations.

Thanks,

Konstantin

On Tue, Jun 22, 2021 at 9:31 AM Márton Balassi 
wrote:

> Hi Austin,
>
> Thank you for your thoughts. This is where this discussion stops. This
> email thread already contains more characters than the implementation and
> what is needed for the next 20 years of maintenance.
>
> It is great that you have a view on modern solutions and thank you for
> offering your help with brainstorming solutions. I am responsible for Flink
> at Cloudera and we do need an implementation like this and it is in fact
> already in production at dozens of customers. We are open to adapting that
> to expose a more generic API (and keeping Kerberos to our fork), to
> contribute this to the community as others have asked for it and to protect
> ourselves from occasionally having to update this critical implementation
> path based on changes in the Apache codebase. I have worked with close to a
> hundred Big Data customers as a consultant and an engineering manager and
> committed hundreds of changes to Apache Flink over the past decade, please
> trust my judgement on a simple matter like this.
>
> Please forgive me for referencing authority, this discussion was getting
> out of hand. Please keep vigilant.
>
> Best,
> Marton
>
> On Mon, Jun 21, 2021 at 10:50 PM Austin Cawley-Edwards <
> austin.caw...@gmail.com> wrote:
>
> > Hi Gabor + Marton,
> >
> > I don't believe that the issue with this proposal is the specific mechanism
> > proposed (Kerberos), but rather that it is not the level to implement it at
> > (Flink). I'm just one voice, so please take this with a grain of salt.
> >
> > In the other solutions previously noted there is no need to instrument
> > Flink which, in addition to reducing the maintenance burden, provides a
> > better, decoupled end result.
> >
> > IMO we should not add any new API in Flink for this use case. I think it is
> > unfortunate and sympathize with the work that has already been done on this
> > feature – perhaps we could brainstorm ways to run this alongside Flink in
> > your setup. Again, I don't think the proposed solution of an agnostic API
> > would not work, nor is it a bad idea, but is not one that will make Flink
> > more compatible with the modern solutions to this problem.
> >
> > Best,
> > Austin
> >
> > On Mon, Jun 21, 2021 at 2:18 PM Márton Balassi
> > wrote:
> >
> > > Hi team,
> > >
> > > Thank you for your input. Based on this discussion I agree with G that
> > > selecting and standardizing on a specific strong authentication mechanism
> > > is more challenging than the whole rest of the scope of this authentication
> > > story. :-) I suggest that G and I go back to the drawing board and come up
> > > with an API that can support multiple authentication mechanisms, and we
> > > would only merge said API to Flink. Specific implementations of it can be
> > > maintained outside of the project. This way we tackle the main challenge in
> > > a truly minimal way.
> > >
> > > Best,
> > > Marton
> > >
> > > On Mon, Jun 21, 2021 at 4:18 PM Gabor Somogyi <gabor.g.somo...@gmail.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > We see that adding any kind of specific authentication raises more
> > > > questions than answers.
> > > > What would be if a generic API would be added without any real
> > > > authentication logic?
> > > > That way every provider can add its own protocol implementation as
> > > > additional jar.
> > > >
> > > > BR,
> > > > G
> > > >
> > > >
> > > > On Thu, Jun 17, 2021 at 7:53 PM Austin Cawley-Edwards <
> > > > austin.caw...@gmail.com> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> Sorry to be joining the conversation late. I'm also on the side of
> > > >> Konstantin, generally, in that this seems to not be a core goal of

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-22 Thread Márton Balassi
Hi Austin,

Thank you for your thoughts. This is where this discussion stops. This
email thread already contains more characters than the implementation and
what is needed for the next 20 years of maintenance.

It is great that you have a view on modern solutions and thank you for
offering your help with brainstorming solutions. I am responsible for Flink
at Cloudera and we do need an implementation like this and it is in fact
already in production at dozens of customers. We are open to adapting that
to expose a more generic API (and keeping Kerberos to our fork), to
contribute this to the community as others have asked for it and to protect
ourselves from occasionally having to update this critical implementation
path based on changes in the Apache codebase. I have worked with close to a
hundred Big Data customers as a consultant and an engineering manager and
committed hundreds of changes to Apache Flink over the past decade. Please
trust my judgement on a simple matter like this.

Please forgive me for referencing authority; this discussion was getting
out of hand. Please stay vigilant.

Best,
Marton

On Mon, Jun 21, 2021 at 10:50 PM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> Hi Gabor + Marton,
>
> I don't believe that the issue with this proposal is the specific mechanism
> proposed (Kerberos), but rather that it is not the level to implement it at
> (Flink). I'm just one voice, so please take this with a grain of salt.
>
> In the other solutions previously noted there is no need to instrument
> Flink which, in addition to reducing the maintenance burden, provides a
> better, decoupled end result.
>
> IMO we should not add any new API in Flink for this use case. I think it is
> unfortunate and sympathize with the work that has already been done on this
> feature – perhaps we could brainstorm ways to run this alongside Flink in
> your setup. Again, I don't think the proposed solution of an agnostic API
> would not work, nor is it a bad idea, but is not one that will make Flink
> more compatible with the modern solutions to this problem.
>
> Best,
> Austin
>
> On Mon, Jun 21, 2021 at 2:18 PM Márton Balassi 
> wrote:
>
> > Hi team,
> >
> > Thank you for your input. Based on this discussion I agree with G that
> > selecting and standardizing on a specific strong authentication mechanism
> > is more challenging than the whole rest of the scope of this authentication
> > story. :-) I suggest that G and I go back to the drawing board and come up
> > with an API that can support multiple authentication mechanisms, and we
> > would only merge said API to Flink. Specific implementations of it can be
> > maintained outside of the project. This way we tackle the main challenge in
> > a truly minimal way.
> >
> > Best,
> > Marton
> >
> > On Mon, Jun 21, 2021 at 4:18 PM Gabor Somogyi
> > wrote:
> >
> > > Hi All,
> > >
> > > We see that adding any kind of specific authentication raises more
> > > questions than answers.
> > > What would be if a generic API would be added without any real
> > > authentication logic?
> > > That way every provider can add its own protocol implementation as
> > > additional jar.
> > >
> > > BR,
> > > G
> > >
> > >
> > > On Thu, Jun 17, 2021 at 7:53 PM Austin Cawley-Edwards <
> > > austin.caw...@gmail.com> wrote:
> > >
> > >> Hi all,
> > >>
> > > >> Sorry to be joining the conversation late. I'm also on the side of
> > > >> Konstantin, generally, in that this seems to not be a core goal of Flink
> > > >> as a project and adds a maintenance burden.
> > > >>
> > > >> Would another con of Kerberos be that is likely a fading project in terms
> > > >> of network security? (serious question, please correct me if there is
> > > >> reason to believe it is gaining adoption)
> > > >>
> > > >> The point about Kerberos being independent of infrastructure is a good one
> > > >> but is something that is also solved by modern sidecar proxies + service
> > > >> meshes that can run across Kubernetes and bare-metal. These solutions also
> > > >> handle certificate provisioning, rotation, etc. in addition to higher-level
> > > >> authorization policies. Some examples of projects with this "universal
> > > >> infrastructure support" are Kuma[1] (CNCF Sandbox, I'm a maintainer) and
> > > >> Istio[2] (Google).
> > > >>
> > > >> Wondering out loud: has anyone tried to run Flink on top of cilium[3],
> > > >> which also provides zero-trust networking at the kernel level without
> > > >> needing to instrument applications? This currently only runs on Kubernetes
> > > >> on Linux, so that's a major limitation, but solves many of the request
> > > >> forging concerns at all levels.
> > >>
> > >> Thanks,
> > >> Austin
> > >>
> > >> [1]: https://kuma.io/docs/1.1.6/quickstart/universal/
> > >> [2]: https://istio.io/latest/docs/setup/install/virtual-machine/
> > >> [3]: https://cilium.io/
> > >>
> > >> On Thu, Jun 17, 2021 at 1:50 PM Till Rohrmann 
> > >> wrote:
> > >>
> > >> > I left some comments 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-21 Thread Austin Cawley-Edwards
Hi Gabor + Marton,

I don't believe that the issue with this proposal is the specific mechanism
proposed (Kerberos), but rather that Flink is not the right level at which
to implement it. I'm just one voice, so please take this with a grain of salt.

The other solutions noted previously require no instrumentation of Flink,
which, in addition to reducing the maintenance burden, provides a better,
decoupled end result.

IMO we should not add any new API in Flink for this use case. I think it is
unfortunate, and I sympathize with the work that has already been done on this
feature. Perhaps we could brainstorm ways to run this alongside Flink in
your setup. Again, I don't think the proposed agnostic API would fail to
work, nor do I think it is a bad idea, but it is not one that will make Flink
more compatible with the modern solutions to this problem.

Best,
Austin

On Mon, Jun 21, 2021 at 2:18 PM Márton Balassi 
wrote:

> Hi team,
>
> Thank you for your input. Based on this discussion I agree with G that
> selecting and standardizing on a specific strong authentication mechanism
> is more challenging than the whole rest of the scope of this authentication
> story. :-) I suggest that G and I go back to the drawing board and come up
> with an API that can support multiple authentication mechanisms, and we
> would only merge said API to Flink. Specific implementations of it can be
> maintained outside of the project. This way we tackle the main challenge in
> a truly minimal way.
>
> Best,
> Marton
>
> On Mon, Jun 21, 2021 at 4:18 PM Gabor Somogyi 
> wrote:
>
> > Hi All,
> >
> > We see that adding any kind of specific authentication raises more
> > questions than answers.
> > What would be if a generic API would be added without any real
> > authentication logic?
> > That way every provider can add its own protocol implementation as
> > additional jar.
> >
> > BR,
> > G
> >
> >
> > On Thu, Jun 17, 2021 at 7:53 PM Austin Cawley-Edwards <
> > austin.caw...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> Sorry to be joining the conversation late. I'm also on the side of
> >> Konstantin, generally, in that this seems to not be a core goal of Flink
> >> as a project and adds a maintenance burden.
> >>
> >> Would another con of Kerberos be that is likely a fading project in terms
> >> of network security? (serious question, please correct me if there is
> >> reason to believe it is gaining adoption)
> >>
> >> The point about Kerberos being independent of infrastructure is a good one
> >> but is something that is also solved by modern sidecar proxies + service
> >> meshes that can run across Kubernetes and bare-metal. These solutions also
> >> handle certificate provisioning, rotation, etc. in addition to higher-level
> >> authorization policies. Some examples of projects with this "universal
> >> infrastructure support" are Kuma[1] (CNCF Sandbox, I'm a maintainer) and
> >> Istio[2] (Google).
> >>
> >> Wondering out loud: has anyone tried to run Flink on top of cilium[3],
> >> which also provides zero-trust networking at the kernel level without
> >> needing to instrument applications? This currently only runs on Kubernetes
> >> on Linux, so that's a major limitation, but solves many of the request
> >> forging concerns at all levels.
> >>
> >> Thanks,
> >> Austin
> >>
> >> [1]: https://kuma.io/docs/1.1.6/quickstart/universal/
> >> [2]: https://istio.io/latest/docs/setup/install/virtual-machine/
> >> [3]: https://cilium.io/
> >>
> >> On Thu, Jun 17, 2021 at 1:50 PM Till Rohrmann 
> >> wrote:
> >>
> >> > I left some comments in the Google document. It would be great if
> >> > someone from the community with security experience could also take a
> >> look
> >> > at it. Maybe Eron you have an opinion on the topic.
> >> >
> >> > Cheers,
> >> > Till
> >> >
> >> > On Thu, Jun 17, 2021 at 6:57 PM Till Rohrmann 
> >> > wrote:
> >> >
> >> > > Hi Gabor,
> >> > >
> >> > > I haven't found time to look into the updated FLIP yet. I'll try to
> >> do it
> >> > > asap.
> >> > >
> >> > > Cheers,
> >> > > Till
> >> > >
> >> > > On Wed, Jun 16, 2021 at 9:35 PM Konstantin Knauf  >
> >> > > wrote:
> >> > >
> >> > >> Hi Gabor,
> >> > >>
> >> > >> > However representing Kerberos as completely new feature is not
> true
> >> > >> because
> >> > >> it's already in since Flink makes authentication at least with HDFS
> >> and
> >> > >> Hbase through Kerberos.
> >> > >>
> >> > >> True, that is one way to look at it, but there are differences,
> too:
> >> > >> Control Plane vs Data Plane, Core vs Connectors.
> >> > >>
> >> > >> > Adding OIDC or OAuth2 has the exact same concerns what you've
> guys
> >> > just
> >> > >> raised. Why exactly these? If you think this would be beneficial we
> >> can
> >> > >> discuss it in detail
> >> > >>
> >> > >> That's exactly my point. Once we start adding authx support, we
> will
> >> > >> sooner or later discuss other options besides Kerberos, too. A user
> >> who
> >> > >> would like to use OAuth can not 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-21 Thread Márton Balassi
Hi team,

Thank you for your input. Based on this discussion I agree with G that
selecting and standardizing on a specific strong authentication mechanism
is more challenging than the whole rest of the scope of this authentication
story. :-) I suggest that G and I go back to the drawing board and come up
with an API that can support multiple authentication mechanisms, and we
would merge only that API into Flink. Specific implementations of it can be
maintained outside of the project. This way we tackle the main challenge in
a truly minimal way.

Best,
Marton

On Mon, Jun 21, 2021 at 4:18 PM Gabor Somogyi 
wrote:

> Hi All,
>
> We see that adding any kind of specific authentication raises more
> questions than answers.
> What would be if a generic API would be added without any real
> authentication logic?
> That way every provider can add its own protocol implementation as
> additional jar.
>
> BR,
> G
>
>
> On Thu, Jun 17, 2021 at 7:53 PM Austin Cawley-Edwards <
> austin.caw...@gmail.com> wrote:
>
>> Hi all,
>>
>> Sorry to be joining the conversation late. I'm also on the side of
>> Konstantin, generally, in that this seems to not be a core goal of Flink
>> as
>> a project and adds a maintenance burden.
>>
>> Would another con of Kerberos be that is likely a fading project in terms
>> of network security? (serious question, please correct me if there is
>> reason to believe it is gaining adoption)
>>
>> The point about Kerberos being independent of infrastructure is a good one
>> but is something that is also solved by modern sidecar proxies + service
>> meshes that can run across Kubernetes and bare-metal. These solutions also
>> handle certificate provisioning, rotation, etc. in addition to
>> higher-level
>> authorization policies. Some examples of projects with this "universal
>> infrastructure support" are Kuma[1] (CNCF Sandbox, I'm a maintainer) and
>> Istio[2] (Google).
>>
>> Wondering out loud: has anyone tried to run Flink on top of cilium[3],
>> which also provides zero-trust networking at the kernel level without
>> needing to instrument applications? This currently only runs on Kubernetes
>> on Linux, so that's a major limitation, but solves many of the request
>> forging concerns at all levels.
>>
>> Thanks,
>> Austin
>>
>> [1]: https://kuma.io/docs/1.1.6/quickstart/universal/
>> [2]: https://istio.io/latest/docs/setup/install/virtual-machine/
>> [3]: https://cilium.io/
>>
>> On Thu, Jun 17, 2021 at 1:50 PM Till Rohrmann 
>> wrote:
>>
>> > I left some comments in the Google document. It would be great if
>> > someone from the community with security experience could also take a
>> look
>> > at it. Maybe Eron you have an opinion on the topic.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Thu, Jun 17, 2021 at 6:57 PM Till Rohrmann 
>> > wrote:
>> >
>> > > Hi Gabor,
>> > >
>> > > I haven't found time to look into the updated FLIP yet. I'll try to
>> do it
>> > > asap.
>> > >
>> > > Cheers,
>> > > Till
>> > >
>> > > On Wed, Jun 16, 2021 at 9:35 PM Konstantin Knauf 
>> > > wrote:
>> > >
>> > >> Hi Gabor,
>> > >>
>> > >> > However representing Kerberos as completely new feature is not true
>> > >> because
>> > >> it's already in since Flink makes authentication at least with HDFS
>> and
>> > >> Hbase through Kerberos.
>> > >>
>> > >> True, that is one way to look at it, but there are differences, too:
>> > >> Control Plane vs Data Plane, Core vs Connectors.
>> > >>
>> > >> > Adding OIDC or OAuth2 has the exact same concerns what you've guys
>> > just
>> > >> raised. Why exactly these? If you think this would be beneficial we
>> can
>> > >> discuss it in detail
>> > >>
>> > >> That's exactly my point. Once we start adding authx support, we will
>> > >> sooner or later discuss other options besides Kerberos, too. A user
>> who
>> > >> would like to use OAuth can not easily use Kerberos, right?
>> > >> That is one of the reasons I am skeptical about adding initial authx
>> > >> support.
>> > >>
>> > >> > Related authorization you've mentioned it can be complicated over
>> > time.
>> > >> Can
>> > >> you show us an example? We've knowledge with couple of open source
>> > >> components
>> > >> but authorization was never a horror complex story. I personally have
>> > the
>> > >> most experience with Spark which I think is quite simple and stable.
>> > Users
>> > >> can be viewers/admins
>> > >> and jobs started by others can't be modified. If you can share an
>> > example
>> > >> over-complication we can discuss on facts.
>> > >>
>> > >> Authorization is a new aspect that needs to be considered for every
>> > >> addition to the REST API. In the future users might ask for
>> additional
>> > >> roles (e.g. an editor), user-defined roles and you've already
>> mentioned
>> > >> job-level permissions yourself. And keep in mind that there might
>> also
>> > be
>> > >> larger additions in the future like the flink-sql-gateway.
>> Contributions
>> > >> like this become more expensive the more aspects we need to 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-21 Thread Gabor Somogyi
Hi All,

We see that adding any specific kind of authentication raises more
questions than answers.
What if we added a generic API without any real authentication logic?
That way every provider could add its own protocol implementation as an
additional jar.
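
To make the idea more concrete, here is a minimal sketch of what such a
generic API could look like. Everything in it is an assumption for
illustration only: the names Authenticator, AuthenticatorChain and
StaticTokenAuthenticator are hypothetical and are not existing Flink
classes, and the token check is a stand-in for a real protocol that a
provider would ship in its own jar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class PluggableAuthSketch {

    /** Hypothetical extension point: the only part that would live in Flink. */
    public interface Authenticator {
        /** Returns the authenticated principal, or empty if this provider rejects the request. */
        Optional<String> authenticate(Map<String, String> headers);
    }

    /** Hypothetical provider from an external jar; a placeholder, not real authentication logic. */
    public static class StaticTokenAuthenticator implements Authenticator {
        private final String expectedToken;
        private final String principal;

        public StaticTokenAuthenticator(String expectedToken, String principal) {
            this.expectedToken = expectedToken;
            this.principal = principal;
        }

        @Override
        public Optional<String> authenticate(Map<String, String> headers) {
            String value = headers.get("Authorization");
            if (value != null && value.equals("Bearer " + expectedToken)) {
                return Optional.of(principal);
            }
            return Optional.empty();
        }
    }

    /** Asks each registered provider in registration order; the first one that accepts wins. */
    public static class AuthenticatorChain {
        private final List<Authenticator> providers = new ArrayList<>();

        public void register(Authenticator authenticator) {
            providers.add(authenticator);
        }

        public Optional<String> authenticate(Map<String, String> headers) {
            for (Authenticator provider : providers) {
                Optional<String> principal = provider.authenticate(headers);
                if (principal.isPresent()) {
                    return principal;
                }
            }
            // No provider accepted the request (also the case when none is registered).
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        AuthenticatorChain chain = new AuthenticatorChain();
        chain.register(new StaticTokenAuthenticator("s3cret", "alice"));
        System.out.println(chain.authenticate(Map.of("Authorization", "Bearer s3cret")));
        System.out.println(chain.authenticate(Map.of()));
    }
}
```

In this sketch the providers are registered by hand to keep it
self-contained. Discovering them from additional jars on the classpath
(for example with java.util.ServiceLoader) would be one possible option,
but that is a design choice still to be discussed.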

BR,
G


On Thu, Jun 17, 2021 at 7:53 PM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> Hi all,
>
> Sorry to be joining the conversation late. I'm also on the side of
> Konstantin, generally, in that this seems to not be a core goal of Flink as
> a project and adds a maintenance burden.
>
> Would another con of Kerberos be that is likely a fading project in terms
> of network security? (serious question, please correct me if there is
> reason to believe it is gaining adoption)
>
> The point about Kerberos being independent of infrastructure is a good one
> but is something that is also solved by modern sidecar proxies + service
> meshes that can run across Kubernetes and bare-metal. These solutions also
> handle certificate provisioning, rotation, etc. in addition to higher-level
> authorization policies. Some examples of projects with this "universal
> infrastructure support" are Kuma[1] (CNCF Sandbox, I'm a maintainer) and
> Istio[2] (Google).
>
> Wondering out loud: has anyone tried to run Flink on top of cilium[3],
> which also provides zero-trust networking at the kernel level without
> needing to instrument applications? This currently only runs on Kubernetes
> on Linux, so that's a major limitation, but solves many of the request
> forging concerns at all levels.
>
> Thanks,
> Austin
>
> [1]: https://kuma.io/docs/1.1.6/quickstart/universal/
> [2]: https://istio.io/latest/docs/setup/install/virtual-machine/
> [3]: https://cilium.io/
>
> On Thu, Jun 17, 2021 at 1:50 PM Till Rohrmann 
> wrote:
>
> > I left some comments in the Google document. It would be great if
> > someone from the community with security experience could also take a
> look
> > at it. Maybe Eron you have an opinion on the topic.
> >
> > Cheers,
> > Till
> >
> > On Thu, Jun 17, 2021 at 6:57 PM Till Rohrmann 
> > wrote:
> >
> > > Hi Gabor,
> > >
> > > I haven't found time to look into the updated FLIP yet. I'll try to do
> it
> > > asap.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Jun 16, 2021 at 9:35 PM Konstantin Knauf 
> > > wrote:
> > >
> > >> Hi Gabor,
> > >>
> > >> > However representing Kerberos as completely new feature is not true
> > >> because
> > >> it's already in since Flink makes authentication at least with HDFS
> and
> > >> Hbase through Kerberos.
> > >>
> > >> True, that is one way to look at it, but there are differences, too:
> > >> Control Plane vs Data Plane, Core vs Connectors.
> > >>
> > >> > Adding OIDC or OAuth2 has the exact same concerns what you've guys
> > just
> > >> raised. Why exactly these? If you think this would be beneficial we
> can
> > >> discuss it in detail
> > >>
> > >> That's exactly my point. Once we start adding authx support, we will
> > >> sooner or later discuss other options besides Kerberos, too. A user
> who
> > >> would like to use OAuth can not easily use Kerberos, right?
> > >> That is one of the reasons I am skeptical about adding initial authx
> > >> support.
> > >>
> > >> > Related authorization you've mentioned it can be complicated over
> > time.
> > >> Can
> > >> you show us an example? We've knowledge with couple of open source
> > >> components
> > >> but authorization was never a horror complex story. I personally have
> > the
> > >> most experience with Spark which I think is quite simple and stable.
> > Users
> > >> can be viewers/admins
> > >> and jobs started by others can't be modified. If you can share an
> > example
> > >> over-complication we can discuss on facts.
> > >>
> > >> Authorization is a new aspect that needs to be considered for every
> > >> addition to the REST API. In the future users might ask for additional
> > >> roles (e.g. an editor), user-defined roles and you've already
> mentioned
> > >> job-level permissions yourself. And keep in mind that there might also
> > be
> > >> larger additions in the future like the flink-sql-gateway.
> Contributions
> > >> like this become more expensive the more aspects we need to consider.
> > >>
> > >> In general, I believe, it is important that the community focuses its
> > >> efforts where we can generate the most value to the user and -
> > personally -
> > >> I don't think there is much to gain by extending Flink's scope in that
> > >> direction. Of course, this is not black and white and there are other
> > valid
> > >> opinions.
> > >>
> > >> Thanks,
> > >>
> > >> Konstantin
> > >>
> > >> On Wed, Jun 16, 2021 at 7:38 PM Gabor Somogyi <
> > gabor.g.somo...@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi Konstantin,
> > >>>
> > >>> Thanks for the response. Related new feature introduction in case of
> > >>> Basic
> > >>> auth I tend to agree, anything else can be chosen.
> > >>>
> > >>> However representing Kerberos as completely 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-17 Thread Austin Cawley-Edwards
Hi all,

Sorry to be joining the conversation late. I'm also on the side of
Konstantin, generally, in that this seems to not be a core goal of Flink as
a project and adds a maintenance burden.

Would another con of Kerberos be that it is likely a fading project in terms
of network security? (serious question, please correct me if there is
reason to believe it is gaining adoption)

The point about Kerberos being independent of infrastructure is a good one
but is something that is also solved by modern sidecar proxies + service
meshes that can run across Kubernetes and bare-metal. These solutions also
handle certificate provisioning, rotation, etc. in addition to higher-level
authorization policies. Some examples of projects with this "universal
infrastructure support" are Kuma[1] (CNCF Sandbox, I'm a maintainer) and
Istio[2] (Google).

Wondering out loud: has anyone tried to run Flink on top of Cilium[3],
which also provides zero-trust networking at the kernel level without
needing to instrument applications? It currently runs only on Kubernetes
on Linux, which is a major limitation, but it addresses many of the
request-forging concerns at all levels.

Thanks,
Austin

[1]: https://kuma.io/docs/1.1.6/quickstart/universal/
[2]: https://istio.io/latest/docs/setup/install/virtual-machine/
[3]: https://cilium.io/

On Thu, Jun 17, 2021 at 1:50 PM Till Rohrmann wrote:

> I left some comments in the Google document. It would be great if
> someone from the community with security experience could also take a look
> at it. Maybe Eron you have an opinion on the topic.
>
> Cheers,
> Till
>
> On Thu, Jun 17, 2021 at 6:57 PM Till Rohrmann 
> wrote:
>
> > Hi Gabor,
> >
> > I haven't found time to look into the updated FLIP yet. I'll try to do it
> > asap.
> >
> > Cheers,
> > Till
> >
> > On Wed, Jun 16, 2021 at 9:35 PM Konstantin Knauf 
> > wrote:
> >
> >> Hi Gabor,
> >>
> >> > However representing Kerberos as completely new feature is not true
> >> because
> >> it's already in since Flink makes authentication at least with HDFS and
> >> Hbase through Kerberos.
> >>
> >> True, that is one way to look at it, but there are differences, too:
> >> Control Plane vs Data Plane, Core vs Connectors.
> >>
> >> > Adding OIDC or OAuth2 has the exact same concerns what you've guys
> just
> >> raised. Why exactly these? If you think this would be beneficial we can
> >> discuss it in detail
> >>
> >> That's exactly my point. Once we start adding authx support, we will
> >> sooner or later discuss other options besides Kerberos, too. A user who
> >> would like to use OAuth can not easily use Kerberos, right?
> >> That is one of the reasons I am skeptical about adding initial authx
> >> support.
> >>
> >> > Related authorization you've mentioned it can be complicated over
> time.
> >> Can
> >> you show us an example? We've knowledge with couple of open source
> >> components
> >> but authorization was never a horror complex story. I personally have
> the
> >> most experience with Spark which I think is quite simple and stable.
> Users
> >> can be viewers/admins
> >> and jobs started by others can't be modified. If you can share an
> example
> >> over-complication we can discuss on facts.
> >>
> >> Authorization is a new aspect that needs to be considered for every
> >> addition to the REST API. In the future users might ask for additional
> >> roles (e.g. an editor), user-defined roles and you've already mentioned
> >> job-level permissions yourself. And keep in mind that there might also
> be
> >> larger additions in the future like the flink-sql-gateway. Contributions
> >> like this become more expensive the more aspects we need to consider.
> >>
> >> In general, I believe, it is important that the community focuses its
> >> efforts where we can generate the most value to the user and -
> personally -
> >> I don't think there is much to gain by extending Flink's scope in that
> >> direction. Of course, this is not black and white and there are other
> valid
> >> opinions.
> >>
> >> Thanks,
> >>
> >> Konstantin
> >>
> >> On Wed, Jun 16, 2021 at 7:38 PM Gabor Somogyi <
> gabor.g.somo...@gmail.com>
> >> wrote:
> >>
> >>> Hi Konstantin,
> >>>
> >>> Thanks for the response. Related new feature introduction in case of
> >>> Basic
> >>> auth I tend to agree, anything else can be chosen.
> >>>
> >>> However representing Kerberos as completely new feature is not true
> >>> because
> >>> it's already in since Flink makes authentication at least with HDFS and
> >>> Hbase through Kerberos.
> >>> The main problem with the actual Kerberos implementation is that it
> >>> contains several bugs and only partially implemented. Following your
> >>> suggestion can we agree that we
> >>> skip the Basic auth implementation and finish an already started
> Kerberos
> >>> story by adding History Server and Job Dashboard authentication?
> >>>
> >>> Adding OIDC or OAuth2 has the exact same concerns what you've guys just
> >>> raised. Why exactly these? If you think 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-17 Thread Till Rohrmann
I left some comments in the Google document. It would be great if
someone from the community with security experience could also take a look
at it. Eron, maybe you have an opinion on the topic?

Cheers,
Till

On Thu, Jun 17, 2021 at 6:57 PM Till Rohrmann wrote:

> Hi Gabor,
>
> I haven't found time to look into the updated FLIP yet. I'll try to do it
> asap.
>
> Cheers,
> Till
>
> On Wed, Jun 16, 2021 at 9:35 PM Konstantin Knauf 
> wrote:
>
>> Hi Gabor,
>>
>> > However representing Kerberos as completely new feature is not true
>> because
>> it's already in since Flink makes authentication at least with HDFS and
>> Hbase through Kerberos.
>>
>> True, that is one way to look at it, but there are differences, too:
>> Control Plane vs Data Plane, Core vs Connectors.
>>
>> > Adding OIDC or OAuth2 has the exact same concerns what you've guys just
>> raised. Why exactly these? If you think this would be beneficial we can
>> discuss it in detail
>>
>> That's exactly my point. Once we start adding authx support, we will
>> sooner or later discuss other options besides Kerberos, too. A user who
>> would like to use OAuth can not easily use Kerberos, right?
>> That is one of the reasons I am skeptical about adding initial authx
>> support.
>>
>> > Related authorization you've mentioned it can be complicated over time.
>> Can
>> you show us an example? We've knowledge with couple of open source
>> components
>> but authorization was never a horror complex story. I personally have the
>> most experience with Spark which I think is quite simple and stable. Users
>> can be viewers/admins
>> and jobs started by others can't be modified. If you can share an example
>> over-complication we can discuss on facts.
>>
>> Authorization is a new aspect that needs to be considered for every
>> addition to the REST API. In the future users might ask for additional
>> roles (e.g. an editor), user-defined roles and you've already mentioned
>> job-level permissions yourself. And keep in mind that there might also be
>> larger additions in the future like the flink-sql-gateway. Contributions
>> like this become more expensive the more aspects we need to consider.
>>
>> In general, I believe, it is important that the community focuses its
>> efforts where we can generate the most value to the user and - personally -
>> I don't think there is much to gain by extending Flink's scope in that
>> direction. Of course, this is not black and white and there are other valid
>> opinions.
>>
>> Thanks,
>>
>> Konstantin
>>
>> On Wed, Jun 16, 2021 at 7:38 PM Gabor Somogyi 
>> wrote:
>>
>>> Hi Konstantin,
>>>
>>> Thanks for the response. Related new feature introduction in case of
>>> Basic
>>> auth I tend to agree, anything else can be chosen.
>>>
>>> However representing Kerberos as completely new feature is not true
>>> because
>>> it's already in since Flink makes authentication at least with HDFS and
>>> Hbase through Kerberos.
>>> The main problem with the actual Kerberos implementation is that it
>>> contains several bugs and only partially implemented. Following your
>>> suggestion can we agree that we
>>> skip the Basic auth implementation and finish an already started Kerberos
>>> story by adding History Server and Job Dashboard authentication?
>>>
>>> Adding OIDC or OAuth2 has the exact same concerns what you've guys just
>>> raised. Why exactly these? If you think this would be beneficial we can
>>> discuss it in detail
>>> but as a side story it would be good to finish a halfway done Kerberos
>>> story.
>>>
>>> Related authorization you've mentioned it can be complicated over time.
>>> Can
>>> you show us an example? We've knowledge with couple of open source
>>> components
>>> but authorization was never a horror complex story. I personally have the
>>> most experience with Spark which I think is quite simple and stable.
>>> Users
>>> can be viewers/admins
>>> and jobs started by others can't be modified. If you can share an example
>>> over-complication we can discuss on facts.
>>>
>>> Thank you in advance!
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Wed, Jun 16, 2021 at 5:42 PM Konstantin Knauf 
>>> wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > sorry for joining late and thanks for the insightful discussion.
>>> >
>>> > In general, I'd personally prefer not to increase the surface area of
>>> > Apache Flink unless there is a good reason. It seems we all agree that
>>> > authx is not part of the core value proposition of Apache Flink, so if
>>> we
>>> > can delegate this problem to a more specialized tool, I am in favor of
>>> > that. Apache Flink is already huge and a lot of work goes into
>>> maintenance,
>>> > so I personally have become more sensitive to this aspect over time.
>>> >
>>> > If we add support for Basic Auth and Kerberos now, users will sooner or
>>> > later ask for OIDC, LDAP, SAML,... I acknowledge that Kerberos is
>>> widely
>>> > used in the corporate, on-premises context, but isn't the focus moving
>>> more
>>> > towards more 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-17 Thread Till Rohrmann
Hi Gabor,

I haven't found time to look into the updated FLIP yet. I'll try to do it
asap.

Cheers,
Till

On Wed, Jun 16, 2021 at 9:35 PM Konstantin Knauf wrote:

> Hi Gabor,
>
> > However representing Kerberos as completely new feature is not true
> because
> it's already in since Flink makes authentication at least with HDFS and
> Hbase through Kerberos.
>
> True, that is one way to look at it, but there are differences, too:
> Control Plane vs Data Plane, Core vs Connectors.
>
> > Adding OIDC or OAuth2 has the exact same concerns what you've guys just
> raised. Why exactly these? If you think this would be beneficial we can
> discuss it in detail
>
> That's exactly my point. Once we start adding authx support, we will
> sooner or later discuss other options besides Kerberos, too. A user who
> would like to use OAuth can not easily use Kerberos, right?
> That is one of the reasons I am skeptical about adding initial authx
> support.
>
> > Related authorization you've mentioned it can be complicated over time.
> Can
> you show us an example? We've knowledge with couple of open source
> components
> but authorization was never a horror complex story. I personally have the
> most experience with Spark which I think is quite simple and stable. Users
> can be viewers/admins
> and jobs started by others can't be modified. If you can share an example
> over-complication we can discuss on facts.
>
> Authorization is a new aspect that needs to be considered for every
> addition to the REST API. In the future users might ask for additional
> roles (e.g. an editor), user-defined roles and you've already mentioned
> job-level permissions yourself. And keep in mind that there might also be
> larger additions in the future like the flink-sql-gateway. Contributions
> like this become more expensive the more aspects we need to consider.
>
> In general, I believe, it is important that the community focuses its
> efforts where we can generate the most value to the user and - personally -
> I don't think there is much to gain by extending Flink's scope in that
> direction. Of course, this is not black and white and there are other valid
> opinions.
>
> Thanks,
>
> Konstantin
>
> On Wed, Jun 16, 2021 at 7:38 PM Gabor Somogyi 
> wrote:
>
>> Hi Konstantin,
>>
>> Thanks for the response. Related new feature introduction in case of Basic
>> auth I tend to agree, anything else can be chosen.
>>
>> However representing Kerberos as completely new feature is not true
>> because
>> it's already in since Flink makes authentication at least with HDFS and
>> Hbase through Kerberos.
>> The main problem with the actual Kerberos implementation is that it
>> contains several bugs and only partially implemented. Following your
>> suggestion can we agree that we
>> skip the Basic auth implementation and finish an already started Kerberos
>> story by adding History Server and Job Dashboard authentication?
>>
>> Adding OIDC or OAuth2 has the exact same concerns what you've guys just
>> raised. Why exactly these? If you think this would be beneficial we can
>> discuss it in detail
>> but as a side story it would be good to finish a halfway done Kerberos
>> story.
>>
>> Related authorization you've mentioned it can be complicated over time.
>> Can
>> you show us an example? We've knowledge with couple of open source
>> components
>> but authorization was never a horror complex story. I personally have the
>> most experience with Spark which I think is quite simple and stable. Users
>> can be viewers/admins
>> and jobs started by others can't be modified. If you can share an example
>> over-complication we can discuss on facts.
>>
>> Thank you in advance!
>>
>> BR,
>> G
>>
>>
>> On Wed, Jun 16, 2021 at 5:42 PM Konstantin Knauf 
>> wrote:
>>
>> > Hi everyone,
>> >
>> > sorry for joining late and thanks for the insightful discussion.
>> >
>> > In general, I'd personally prefer not to increase the surface area of
>> > Apache Flink unless there is a good reason. It seems we all agree that
>> > authx is not part of the core value proposition of Apache Flink, so if
>> we
>> > can delegate this problem to a more specialized tool, I am in favor of
>> > that. Apache Flink is already huge and a lot of work goes into
>> maintenance,
>> > so I personally have become more sensitive to this aspect over time.
>> >
>> > If we add support for Basic Auth and Kerberos now, users will sooner or
>> > later ask for OIDC, LDAP, SAML,... I acknowledge that Kerberos is widely
>> > used in the corporate, on-premises context, but isn't the focus moving
>> more
>> > towards more web-friendly standards like OIDC/OAuth 2.0? If we only
>> want to
>> > support a single protocol, there is an argument to be made that it
>> should
>> > be OIDC and Dex [1,2] as a bridge to everything else. Have OIDC or
>> OAuth2
>> > been considered instead of Kerberos? How do you see the market moving?
>> But
>> > as I said before, in my opinion we can generate more value by investing
>> > into 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-16 Thread Konstantin Knauf
Hi Gabor,

> However representing Kerberos as completely new feature is not true
> because it's already in since Flink makes authentication at least with
> HDFS and Hbase through Kerberos.

True, that is one way to look at it, but there are differences, too:
Control Plane vs Data Plane, Core vs Connectors.

> Adding OIDC or OAuth2 has the exact same concerns what you've guys just
> raised. Why exactly these? If you think this would be beneficial we can
> discuss it in detail

That's exactly my point. Once we start adding authx support, we will sooner
or later discuss other options besides Kerberos, too. A user who would like
to use OAuth can not easily use Kerberos, right?
That is one of the reasons I am skeptical about adding initial authx
support.

> Related authorization you've mentioned it can be complicated over time.
> Can you show us an example? We've knowledge with couple of open source
> components but authorization was never a horror complex story. I
> personally have the most experience with Spark which I think is quite
> simple and stable. Users can be viewers/admins and jobs started by others
> can't be modified. If you can share an example over-complication we can
> discuss on facts.

Authorization is a new aspect that needs to be considered for every
addition to the REST API. In the future users might ask for additional
roles (e.g. an editor), user-defined roles and you've already mentioned
job-level permissions yourself. And keep in mind that there might also be
larger additions in the future like the flink-sql-gateway. Contributions
like this become more expensive the more aspects we need to consider.

In general, I believe, it is important that the community focuses its
efforts where we can generate the most value to the user and - personally -
I don't think there is much to gain by extending Flink's scope in that
direction. Of course, this is not black and white and there are other valid
opinions.

Thanks,

Konstantin

On Wed, Jun 16, 2021 at 7:38 PM Gabor Somogyi 
wrote:

> Hi Konstantin,
>
> Thanks for the response. Related new feature introduction in case of Basic
> auth I tend to agree, anything else can be chosen.
>
> However representing Kerberos as completely new feature is not true because
> it's already in since Flink makes authentication at least with HDFS and
> Hbase through Kerberos.
> The main problem with the actual Kerberos implementation is that it
> contains several bugs and only partially implemented. Following your
> suggestion can we agree that we
> skip the Basic auth implementation and finish an already started Kerberos
> story by adding History Server and Job Dashboard authentication?
>
> Adding OIDC or OAuth2 has the exact same concerns what you've guys just
> raised. Why exactly these? If you think this would be beneficial we can
> discuss it in detail
> but as a side story it would be good to finish a halfway done Kerberos
> story.
>
> Related authorization you've mentioned it can be complicated over time. Can
> you show us an example? We've knowledge with couple of open source
> components
> but authorization was never a horror complex story. I personally have the
> most experience with Spark which I think is quite simple and stable. Users
> can be viewers/admins
> and jobs started by others can't be modified. If you can share an example
> over-complication we can discuss on facts.
>
> Thank you in advance!
>
> BR,
> G
>
>
> On Wed, Jun 16, 2021 at 5:42 PM Konstantin Knauf 
> wrote:
>
> > Hi everyone,
> >
> > sorry for joining late and thanks for the insightful discussion.
> >
> > In general, I'd personally prefer not to increase the surface area of
> > Apache Flink unless there is a good reason. It seems we all agree that
> > authx is not part of the core value proposition of Apache Flink, so if we
> > can delegate this problem to a more specialized tool, I am in favor of
> > that. Apache Flink is already huge and a lot of work goes into
> maintenance,
> > so I personally have become more sensitive to this aspect over time.
> >
> > If we add support for Basic Auth and Kerberos now, users will sooner or
> > later ask for OIDC, LDAP, SAML,... I acknowledge that Kerberos is widely
> > used in the corporate, on-premises context, but isn't the focus moving
> more
> > towards more web-friendly standards like OIDC/OAuth 2.0? If we only want
> to
> > support a single protocol, there is an argument to be made that it should
> > be OIDC and Dex [1,2] as a bridge to everything else. Have OIDC or OAuth2
> > been considered instead of Kerberos? How do you see the market moving?
> But
> > as I said before, in my opinion we can generate more value by investing
> > into other areas of Apache Flink.
> >
> > Authorization also has the potential to become more fine-grained and
> > complex over time: you already mentioned restricting the actions that a
> > specific user can do in a cluster.
> >
> > Cheers,
> >
> > Konstantin
> >
> > [1] https://github.com/dexidp/dex
> > [2] 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-16 Thread Gabor Somogyi
Hi Konstantin,

Thanks for the response. Regarding the introduction of new features, I tend
to agree in the case of Basic auth; anything else can be chosen instead.

However, presenting Kerberos as a completely new feature is not accurate:
it is already in Flink, which authenticates at least with HDFS and HBase
through Kerberos.
The main problem with the current Kerberos implementation is that it
contains several bugs and is only partially implemented. Following your
suggestion, can we agree that we skip the Basic auth implementation and
finish the already started Kerberos story by adding History Server and Job
Dashboard authentication?

Adding OIDC or OAuth2 raises the exact same concerns that you have just
raised. Why exactly these? If you think this would be beneficial we can
discuss it in detail, but separately from that it would be good to finish
the half-done Kerberos story.

Regarding authorization, you mentioned that it can become complicated over
time. Can you show us an example? We have experience with a couple of open
source components, and authorization was never a horribly complex story
there. I personally have the most experience with Spark, where I think it
is quite simple and stable: users can be viewers or admins, and jobs
started by others can't be modified. If you can share an example of such
over-complication, we can discuss it based on facts.
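
To make the viewer/admin model described above concrete, here is a minimal
Python sketch. It is an illustration only, not Spark's or Flink's actual
implementation: the names Role, User, Job, can_view and can_modify are
hypothetical, and it assumes that every authenticated user may view every
job and that admins may modify any job.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    ADMIN = "admin"


@dataclass(frozen=True)
class User:
    name: str
    role: Role


@dataclass(frozen=True)
class Job:
    job_id: str
    owner: str  # name of the user who started the job


def can_view(user: User, job: Job) -> bool:
    # Assumption: every authenticated user is at least a viewer of every job.
    return True


def can_modify(user: User, job: Job) -> bool:
    # Jobs started by others can't be modified.
    # Assumption: admins are exempt from that rule.
    return user.role is Role.ADMIN or user.name == job.owner
```

With only two roles and one ownership rule, the whole policy fits in two
small functions; the open question in this thread is whether it would stay
that small over time.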

Thank you in advance!

BR,
G


On Wed, Jun 16, 2021 at 5:42 PM Konstantin Knauf  wrote:

> Hi everyone,
>
> sorry for joining late and thanks for the insightful discussion.
>
> In general, I'd personally prefer not to increase the surface area of
> Apache Flink unless there is a good reason. It seems we all agree that
> authx is not part of the core value proposition of Apache Flink, so if we
> can delegate this problem to a more specialized tool, I am in favor of
> that. Apache Flink is already huge and a lot of work goes into maintenance,
> so I personally have become more sensitive to this aspect over time.
>
> If we add support for Basic Auth and Kerberos now, users will sooner or
> later ask for OIDC, LDAP, SAML,... I acknowledge that Kerberos is widely
> used in the corporate, on-premises context, but isn't the focus moving more
> towards more web-friendly standards like OIDC/OAuth 2.0? If we only want to
> support a single protocol, there is an argument to be made that it should
> be OIDC and Dex [1,2] as a bridge to everything else. Have OIDC or OAuth2
> been considered instead of Kerberos? How do you see the market moving? But
> as I said before, in my opinion we can generate more value by investing
> into other areas of Apache Flink.
>
> Authorization also has the potential to become more fine-grained and
> complex over time: you already mentioned restricting the actions that a
> specific user can do in a cluster.
>
> Cheers,
>
> Konstantin
>
> [1] https://github.com/dexidp/dex
> [2] https://github.com/dexidp/dex/issues/1903
>
>
> On Wed, Jun 16, 2021 at 11:44 AM Gabor Somogyi 
> wrote:
>
>> Hi Till,
>>
>> Did you have the chance to take a look at the doc? Not yet seen any
>> update.
>>
>> BR,
>> G
>>
>>
>> On Wed, Jun 9, 2021 at 1:43 PM Till Rohrmann 
>> wrote:
>>
>> > Thanks for the update Gabor. I'll take a look and respond in the
>> document.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Wed, Jun 9, 2021 at 12:59 PM Gabor Somogyi <
>> gabor.g.somo...@gmail.com>
>> > wrote:
>> >
>> >> Hi Till,
>> >>
>> >> Your proxy suggestion has been considered in-depth and updated the FLIP
>> >> accordingly.
>> >> We've considered 2 proxy implementation (Nginx and Squid) but according
>> >> to our analysis and testing it's not suitable for the mentioned
>> use-cases.
>> >> Please take a look at the rejected alternatives for detailed
>> explanation.
>> >>
>> >> Thanks for your time in advance!
>> >>
>> >> BR,
>> >> G
>> >>
>> >>
>> >> On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann 
>> >> wrote:
>> >>
>> >>> As I've said I am not a security expert and that's why I have to ask
>> for
>> >>> clarification, Gabor. You are saying that if we configure a
>> truststore for
>> >>> the REST endpoint with a single trusted certificate which has been
>> >>> generated by the operator of the Flink cluster, then the attacker can
>> >>> generate a new certificate, sign it and then talk to the Flink
>> cluster if
>> >>> he has access to the node on which the REST endpoint runs? My
>> understanding
>> >>> was that you need the corresponding private key which in my proposed
>> setup
>> >>> would be under the control of the operator as well (e.g. stored in a
>> >>> keystore on the same machine but guarded by some secret). That way
>> (if I am
>> >>> not mistaken), only the entity which has access to the keystore is
>> able to
>> >>> talk to the Flink cluster.
>> >>>
>> >>> Maybe we are also getting our wires crossed here and are talking about
>> >>> different things.
>> >>>
>> >>> Thanks for listing the pros and cons of Kerberos. Concerning what
>> other
>> >>> authentication mechanisms are used in the industry, I am not 100%
>> sure.
>> >>>
>> >>> 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-16 Thread Konstantin Knauf
Hi everyone,

sorry for joining late and thanks for the insightful discussion.

In general, I'd personally prefer not to increase the surface area of
Apache Flink unless there is a good reason. It seems we all agree that
authx is not part of the core value proposition of Apache Flink, so if we
can delegate this problem to a more specialized tool, I am in favor of
that. Apache Flink is already huge and a lot of work goes into maintenance,
so I personally have become more sensitive to this aspect over time.

If we add support for Basic Auth and Kerberos now, users will sooner or
later ask for OIDC, LDAP, SAML,... I acknowledge that Kerberos is widely
used in the corporate, on-premises context, but isn't the focus moving more
towards more web-friendly standards like OIDC/OAuth 2.0? If we only want to
support a single protocol, there is an argument to be made that it should
be OIDC and Dex [1,2] as a bridge to everything else. Have OIDC or OAuth2
been considered instead of Kerberos? How do you see the market moving? But
as I said before, in my opinion we can generate more value by investing
into other areas of Apache Flink.

Authorization also has the potential to become more fine-grained and
complex over time: you already mentioned restricting the actions that a
specific user can do in a cluster.

Cheers,

Konstantin

[1] https://github.com/dexidp/dex
[2] https://github.com/dexidp/dex/issues/1903


On Wed, Jun 16, 2021 at 11:44 AM Gabor Somogyi 
wrote:

> Hi Till,
>
> Did you have the chance to take a look at the doc? Not yet seen any update.
>
> BR,
> G
>
>
> On Wed, Jun 9, 2021 at 1:43 PM Till Rohrmann  wrote:
>
> > Thanks for the update Gabor. I'll take a look and respond in the
> document.
> >
> > Cheers,
> > Till
> >
> > On Wed, Jun 9, 2021 at 12:59 PM Gabor Somogyi  >
> > wrote:
> >
> >> Hi Till,
> >>
> >> Your proxy suggestion has been considered in-depth and updated the FLIP
> >> accordingly.
> >> We've considered 2 proxy implementation (Nginx and Squid) but according
> >> to our analysis and testing it's not suitable for the mentioned
> use-cases.
> >> Please take a look at the rejected alternatives for detailed
> explanation.
> >>
> >> Thanks for your time in advance!
> >>
> >> BR,
> >> G
> >>
> >>
> >> On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann 
> >> wrote:
> >>
> >>> As I've said I am not a security expert and that's why I have to ask
> for
> >>> clarification, Gabor. You are saying that if we configure a truststore
> for
> >>> the REST endpoint with a single trusted certificate which has been
> >>> generated by the operator of the Flink cluster, then the attacker can
> >>> generate a new certificate, sign it and then talk to the Flink cluster
> if
> >>> he has access to the node on which the REST endpoint runs? My
> understanding
> >>> was that you need the corresponding private key which in my proposed
> setup
> >>> would be under the control of the operator as well (e.g. stored in a
> >>> keystore on the same machine but guarded by some secret). That way (if
> I am
> >>> not mistaken), only the entity which has access to the keystore is
> able to
> >>> talk to the Flink cluster.
> >>>
> >>> Maybe we are also getting our wires crossed here and are talking about
> >>> different things.
> >>>
> >>> Thanks for listing the pros and cons of Kerberos. Concerning what other
> >>> authentication mechanisms are used in the industry, I am not 100% sure.
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>> On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi <
> gabor.g.somo...@gmail.com>
> >>> wrote:
> >>>
> >>>> > I did not mean for the user to sign its own certificates but for the
> >>>> operator of the cluster. Once the user request hits the proxy, it should no
> >>>> longer be under his control. I think I do not fully understand yet why this
> >>>> would not work.
> >>>> I said it's not solving the authentication problem over any proxy. Even
> >>>> if the operator is signing the certificate one can have access to an
> >>>> internal node.
> >>>> Such case anybody can craft certificates which is accepted by the
> >>>> server. When it's accepted a bad guy can cancel jobs causing huge impacts.
> >>>>
> >>>> > Also, I am missing a bit the comparison of Kerberos to other
> >>>> authentication mechanisms and why they were rejected in favour of Kerberos.
> >>>> PROS:
> >>>> * Since it's not depending on cloud provider and/or k8s or bare-metal
> >>>> etc. deployment it's the biggest plus
> >>>> * Centralized with tools and no need to write tons of tools around
> >>>> * There are clients/tools on almost all OS-es and several languages
> >>>> * Super huge users are using it for years in production w/o huge issues
> >>>> * Provides cross-realm trust possibility amongst other features
> >>>> * Several open source components using it which could increase
> >>>> compatibility
> >>>>
> >>>> CONS:
> >>>> * Not everybody using kerberos
> >>>> * It would increase the code footprint but this is true for many
>  

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-16 Thread Gabor Somogyi
Hi Till,

Did you have a chance to take a look at the doc? I have not seen any update
yet.

BR,
G


On Wed, Jun 9, 2021 at 1:43 PM Till Rohrmann  wrote:

> Thanks for the update Gabor. I'll take a look and respond in the document.
>
> Cheers,
> Till
>
> On Wed, Jun 9, 2021 at 12:59 PM Gabor Somogyi 
> wrote:
>
>> Hi Till,
>>
>> Your proxy suggestion has been considered in-depth and updated the FLIP
>> accordingly.
>> We've considered 2 proxy implementation (Nginx and Squid) but according
>> to our analysis and testing it's not suitable for the mentioned use-cases.
>> Please take a look at the rejected alternatives for detailed explanation.
>>
>> Thanks for your time in advance!
>>
>> BR,
>> G
>>
>>
>> On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann 
>> wrote:
>>
>>> As I've said I am not a security expert and that's why I have to ask for
>>> clarification, Gabor. You are saying that if we configure a truststore for
>>> the REST endpoint with a single trusted certificate which has been
>>> generated by the operator of the Flink cluster, then the attacker can
>>> generate a new certificate, sign it and then talk to the Flink cluster if
>>> he has access to the node on which the REST endpoint runs? My understanding
>>> was that you need the corresponding private key which in my proposed setup
>>> would be under the control of the operator as well (e.g. stored in a
>>> keystore on the same machine but guarded by some secret). That way (if I am
>>> not mistaken), only the entity which has access to the keystore is able to
>>> talk to the Flink cluster.
>>>
>>> Maybe we are also getting our wires crossed here and are talking about
>>> different things.
>>>
>>> Thanks for listing the pros and cons of Kerberos. Concerning what other
>>> authentication mechanisms are used in the industry, I am not 100% sure.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi 
>>> wrote:
>>>
>>>> > I did not mean for the user to sign its own certificates but for the
>>>> operator of the cluster. Once the user request hits the proxy, it should no
>>>> longer be under his control. I think I do not fully understand yet why this
>>>> would not work.
>>>> I said it's not solving the authentication problem over any proxy. Even
>>>> if the operator is signing the certificate one can have access to an
>>>> internal node.
>>>> Such case anybody can craft certificates which is accepted by the
>>>> server. When it's accepted a bad guy can cancel jobs causing huge impacts.
>>>>
>>>> > Also, I am missing a bit the comparison of Kerberos to other
>>>> authentication mechanisms and why they were rejected in favour of Kerberos.
>>>> PROS:
>>>> * Since it's not depending on cloud provider and/or k8s or bare-metal
>>>> etc. deployment it's the biggest plus
>>>> * Centralized with tools and no need to write tons of tools around
>>>> * There are clients/tools on almost all OS-es and several languages
>>>> * Super huge users are using it for years in production w/o huge issues
>>>> * Provides cross-realm trust possibility amongst other features
>>>> * Several open source components using it which could increase
>>>> compatibility
>>>>
>>>> CONS:
>>>> * Not everybody using kerberos
>>>> * It would increase the code footprint but this is true for many
>>>> features (as a side note I'm here to maintain it)
>>>>
>>>> Feel free to add your points because it only represents a single
>>>> viewpoint.
>>>> Also if you have any better option for strong authentication please
>>>> share it and we can consider the pros/cons here.
>>>>
>>>> BR,
>>>> G
>>>>
>>>>
>>>> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann
>>>> wrote:
>>>>
>>>>> I did not mean for the user to sign its own certificates but for the
>>>>> operator of the cluster. Once the user request hits the proxy, it should no
>>>>> longer be under his control. I think I do not fully understand yet why this
>>>>> would not work.
>>>>>
>>>>> What I would like to avoid is to add more complexity into Flink if
>>>>> there is an easy solution which fulfills the requirements. That's why I
>>>>> would like to exercise thoroughly through the different alternatives. Also,
>>>>> I am missing a bit the comparison of Kerberos to other authentication
>>>>> mechanisms and why they were rejected in favour of Kerberos.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra  wrote:
>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I think there might be possible alternatives but it seems Kerberos on
>>>>>> the rest endpoint ticks all the right boxes and provides a super clean and
>>>>>> simple solution for strong authentication.
>>>>>>
>>>>>> I wouldn’t even consider sidecar proxies etc if we can solve it in
>>>>>> such a simple way as proposed by G.
>>>>>>
>>>>>> Cheers
>>>>>> Gyula
>>>>>>
>>>>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann
>>>>>> wrote:
>>>>>>
>>>>>>> I am not saying that we shouldn't add a strong authentication
>>>>>>> mechanism if there are good

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-09 Thread Till Rohrmann
Thanks for the update Gabor. I'll take a look and respond in the document.

Cheers,
Till

On Wed, Jun 9, 2021 at 12:59 PM Gabor Somogyi 
wrote:

> Hi Till,
>
> Your proxy suggestion has been considered in-depth and updated the FLIP
> accordingly.
> We've considered 2 proxy implementation (Nginx and Squid) but according to
> our analysis and testing it's not suitable for the mentioned use-cases.
> Please take a look at the rejected alternatives for detailed explanation.
>
> Thanks for your time in advance!
>
> BR,
> G
>
>
> On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann  wrote:
>
>> As I've said I am not a security expert and that's why I have to ask for
>> clarification, Gabor. You are saying that if we configure a truststore for
>> the REST endpoint with a single trusted certificate which has been
>> generated by the operator of the Flink cluster, then the attacker can
>> generate a new certificate, sign it and then talk to the Flink cluster if
>> he has access to the node on which the REST endpoint runs? My understanding
>> was that you need the corresponding private key which in my proposed setup
>> would be under the control of the operator as well (e.g. stored in a
>> keystore on the same machine but guarded by some secret). That way (if I am
>> not mistaken), only the entity which has access to the keystore is able to
>> talk to the Flink cluster.
>>
>> Maybe we are also getting our wires crossed here and are talking about
>> different things.
>>
>> Thanks for listing the pros and cons of Kerberos. Concerning what other
>> authentication mechanisms are used in the industry, I am not 100% sure.
>>
>> Cheers,
>> Till
>>
>> On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi 
>> wrote:
>>
>>> > I did not mean for the user to sign its own certificates but for the
>>> operator of the cluster. Once the user request hits the proxy, it should no
>>> longer be under his control. I think I do not fully understand yet why this
>>> would not work.
>>> I said it's not solving the authentication problem over any proxy. Even
>>> if the operator is signing the certificate one can have access to an
>>> internal node.
>>> Such case anybody can craft certificates which is accepted by the
>>> server. When it's accepted a bad guy can cancel jobs causing huge impacts.
>>>
>>> > Also, I am missing a bit the comparison of Kerberos to other
>>> authentication mechanisms and why they were rejected in favour of Kerberos.
>>> PROS:
>>> * Since it's not depending on cloud provider and/or k8s or bare-metal
>>> etc. deployment it's the biggest plus
>>> * Centralized with tools and no need to write tons of tools around
>>> * There are clients/tools on almost all OS-es and several languages
>>> * Super huge users are using it for years in production w/o huge issues
>>> * Provides cross-realm trust possibility amongst other features
>>> * Several open source components using it which could increase
>>> compatibility
>>>
>>> CONS:
>>> * Not everybody using kerberos
>>> * It would increase the code footprint but this is true for many
>>> features (as a side note I'm here to maintain it)
>>>
>>> Feel free to add your points because it only represents a single
>>> viewpoint.
>>> Also if you have any better option for strong authentication please
>>> share it and we can consider the pros/cons here.
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann 
>>> wrote:
>>>
>>>> I did not mean for the user to sign its own certificates but for the
>>>> operator of the cluster. Once the user request hits the proxy, it should no
>>>> longer be under his control. I think I do not fully understand yet why this
>>>> would not work.
>>>>
>>>> What I would like to avoid is to add more complexity into Flink if
>>>> there is an easy solution which fulfills the requirements. That's why I
>>>> would like to exercise thoroughly through the different alternatives. Also,
>>>> I am missing a bit the comparison of Kerberos to other authentication
>>>> mechanisms and why they were rejected in favour of Kerberos.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra  wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I think there might be possible alternatives but it seems Kerberos on
>>>>> the rest endpoint ticks all the right boxes and provides a super clean and
>>>>> simple solution for strong authentication.
>>>>>
>>>>> I wouldn’t even consider sidecar proxies etc if we can solve it in
>>>>> such a simple way as proposed by G.
>>>>>
>>>>> Cheers
>>>>> Gyula
>>>>>
>>>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann
>>>>> wrote:
>>>>>
>>>>>> I am not saying that we shouldn't add a strong authentication
>>>>>> mechanism if there are good reasons for it. I primarily would like to
>>>>>> understand the context a bit better in order to give qualified feedback and
>>>>>> come to a good decision. In order to do this, I have the feeling that we
>>>>>> haven't fully considered all available options which are on the table, tbh.

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-09 Thread Gabor Somogyi
Hi Till,

Your proxy suggestion has been considered in depth and the FLIP has been
updated accordingly.
We've considered 2 proxy implementations (Nginx and Squid), but according to
our analysis and testing this approach is not suitable for the mentioned
use-cases.
Please take a look at the rejected alternatives for a detailed explanation.

Thanks for your time in advance!

BR,
G


On Fri, Jun 4, 2021 at 3:31 PM Till Rohrmann  wrote:

> As I've said I am not a security expert and that's why I have to ask for
> clarification, Gabor. You are saying that if we configure a truststore for
> the REST endpoint with a single trusted certificate which has been
> generated by the operator of the Flink cluster, then the attacker can
> generate a new certificate, sign it and then talk to the Flink cluster if
> he has access to the node on which the REST endpoint runs? My understanding
> was that you need the corresponding private key which in my proposed setup
> would be under the control of the operator as well (e.g. stored in a
> keystore on the same machine but guarded by some secret). That way (if I am
> not mistaken), only the entity which has access to the keystore is able to
> talk to the Flink cluster.
>
> Maybe we are also getting our wires crossed here and are talking about
> different things.
>
> Thanks for listing the pros and cons of Kerberos. Concerning what other
> authentication mechanisms are used in the industry, I am not 100% sure.
>
> Cheers,
> Till
>
> On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi 
> wrote:
>
>> > I did not mean for the user to sign its own certificates but for the
>> operator of the cluster. Once the user request hits the proxy, it should no
>> longer be under his control. I think I do not fully understand yet why this
>> would not work.
>> I said it's not solving the authentication problem over any proxy. Even
>> if the operator is signing the certificate one can have access to an
>> internal node.
>> Such case anybody can craft certificates which is accepted by the server.
>> When it's accepted a bad guy can cancel jobs causing huge impacts.
>>
>> > Also, I am missing a bit the comparison of Kerberos to other
>> authentication mechanisms and why they were rejected in favour of Kerberos.
>> PROS:
>> * Since it's not depending on cloud provider and/or k8s or bare-metal
>> etc. deployment it's the biggest plus
>> * Centralized with tools and no need to write tons of tools around
>> * There are clients/tools on almost all OS-es and several languages
>> * Super huge users are using it for years in production w/o huge issues
>> * Provides cross-realm trust possibility amongst other features
>> * Several open source components using it which could increase
>> compatibility
>>
>> CONS:
>> * Not everybody using kerberos
>> * It would increase the code footprint but this is true for many features
>> (as a side note I'm here to maintain it)
>>
>> Feel free to add your points because it only represents a single
>> viewpoint.
>> Also if you have any better option for strong authentication please share
>> it and we can consider the pros/cons here.
>>
>> BR,
>> G
>>
>>
>> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann 
>> wrote:
>>
>>> I did not mean for the user to sign its own certificates but for the
>>> operator of the cluster. Once the user request hits the proxy, it should no
>>> longer be under his control. I think I do not fully understand yet why this
>>> would not work.
>>>
>>> What I would like to avoid is to add more complexity into Flink if there
>>> is an easy solution which fulfills the requirements. That's why I would
>>> like to exercise thoroughly through the different alternatives. Also, I am
>>> missing a bit the comparison of Kerberos to other authentication mechanisms
>>> and why they were rejected in favour of Kerberos.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra  wrote:
>>>
>>>> Hi!
>>>>
>>>> I think there might be possible alternatives but it seems Kerberos on
>>>> the rest endpoint ticks all the right boxes and provides a super clean and
>>>> simple solution for strong authentication.
>>>>
>>>> I wouldn’t even consider sidecar proxies etc if we can solve it in such
>>>> a simple way as proposed by G.
>>>>
>>>> Cheers
>>>> Gyula
>>>>
>>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann
>>>> wrote:
>>>>
>>>>> I am not saying that we shouldn't add a strong authentication
>>>>> mechanism if there are good reasons for it. I primarily would like to
>>>>> understand the context a bit better in order to give qualified feedback and
>>>>> come to a good decision. In order to do this, I have the feeling that we
>>>>> haven't fully considered all available options which are on the table, tbh.
>>>>>
>>>>> Does the problem of certificate expiry also apply for self-signed
>>>>> certificates? If yes, then this should then also be a problem for the
>>>>> internal encryption of Flink's communication. If not, then one could use
>>>>> self-signed certificates with a

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-04 Thread Till Rohrmann
As I've said I am not a security expert and that's why I have to ask for
clarification, Gabor. You are saying that if we configure a truststore for
the REST endpoint with a single trusted certificate which has been
generated by the operator of the Flink cluster, then the attacker can
generate a new certificate, sign it and then talk to the Flink cluster if
he has access to the node on which the REST endpoint runs? My understanding
was that you need the corresponding private key which in my proposed setup
would be under the control of the operator as well (e.g. stored in a
keystore on the same machine but guarded by some secret). That way (if I am
not mistaken), only the entity which has access to the keystore is able to
talk to the Flink cluster.
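
The setup described above can be sketched with Python's standard ssl
module. This is a generic mutual-TLS illustration, not Flink code: the
function name build_rest_server_context and its parameters are
hypothetical, and the file paths are whatever the operator chooses. The
server requires a client certificate and accepts only certificates that
validate against the operator's truststore. During the TLS handshake the
client also has to prove possession of the matching private key, so a copy
of the trusted certificate alone is not enough.

```python
import ssl
from typing import Optional


def build_rest_server_context(
    server_cert: Optional[str] = None,
    server_key: Optional[str] = None,
    truststore_pem: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that requires client certificates."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Abort the handshake unless the client presents a certificate that
    # validates against the certificates loaded below.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if server_cert is not None:
        # The server's own certificate and private key (PEM files).
        ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    if truststore_pem is not None:
        # The single certificate trusted by the operator (PEM file).
        ctx.load_verify_locations(cafile=truststore_pem)
    return ctx
```

What this sketch cannot settle is the point under discussion: whoever can
read the keystore and its secret on that node can authenticate as the
trusted client.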

Maybe we are also getting our wires crossed here and are talking about
different things.

Thanks for listing the pros and cons of Kerberos. Concerning what other
authentication mechanisms are used in the industry, I am not 100% sure.

Cheers,
Till

On Fri, Jun 4, 2021 at 11:09 AM Gabor Somogyi 
wrote:

> > I did not mean for the user to sign its own certificates but for the
> operator of the cluster. Once the user request hits the proxy, it should no
> longer be under his control. I think I do not fully understand yet why this
> would not work.
> I said it's not solving the authentication problem over any proxy. Even if
> the operator is signing the certificate one can have access to an internal
> node.
> Such case anybody can craft certificates which is accepted by the server.
> When it's accepted a bad guy can cancel jobs causing huge impacts.
>
> > Also, I am missing a bit the comparison of Kerberos to other
> authentication mechanisms and why they were rejected in favour of Kerberos.
> PROS:
> * Since it's not depending on cloud provider and/or k8s or bare-metal etc.
> deployment it's the biggest plus
> * Centralized with tools and no need to write tons of tools around
> * There are clients/tools on almost all OS-es and several languages
> * Super huge users are using it for years in production w/o huge issues
> * Provides cross-realm trust possibility amongst other features
> * Several open source components using it which could increase
> compatibility
>
> CONS:
> * Not everybody using kerberos
> * It would increase the code footprint but this is true for many features
> (as a side note I'm here to maintain it)
>
> Feel free to add your points because it only represents a single viewpoint.
> Also if you have any better option for strong authentication please share
> it and we can consider the pros/cons here.
>
> BR,
> G
>
>
> On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann 
> wrote:
>
>> I did not mean for the user to sign its own certificates but for the
>> operator of the cluster. Once the user request hits the proxy, it should no
>> longer be under his control. I think I do not fully understand yet why this
>> would not work.
>>
>> What I would like to avoid is to add more complexity into Flink if there
>> is an easy solution which fulfills the requirements. That's why I would
>> like to exercise thoroughly through the different alternatives. Also, I am
>> missing a bit the comparison of Kerberos to other authentication mechanisms
>> and why they were rejected in favour of Kerberos.
>>
>> Cheers,
>> Till
>>
>> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra  wrote:
>>
>>> Hi!
>>>
>>> I think there might be possible alternatives but it seems Kerberos on
>>> the rest endpoint ticks all the right boxes and provides a super clean and
>>> simple solution for strong authentication.
>>>
>>> I wouldn’t even consider sidecar proxies etc if we can solve it in such
>>> a simple way as proposed by G.
>>>
>>> Cheers
>>> Gyula
>>>
>>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann  wrote:
>>>
>>>> I am not saying that we shouldn't add a strong authentication mechanism
>>>> if there are good reasons for it. I primarily would like to understand the
>>>> context a bit better in order to give qualified feedback and come to a good
>>>> decision. In order to do this, I have the feeling that we haven't fully
>>>> considered all available options which are on the table, tbh.
>>>>
>>>> Does the problem of certificate expiry also apply for self-signed
>>>> certificates? If yes, then this should then also be a problem for the
>>>> internal encryption of Flink's communication. If not, then one could use
>>>> self-signed certificates with a longer validity to solve the mentioned
>>>> issue.
>>>>
>>>> I think you can set up Flink in such a way that you don't have to
>>>> handle all the different certificates. For example, you could deploy Flink
>>>> with a "sidecar proxy" which is responsible for the authentication using an
>>>> arbitrary method (e.g. Kerberos) and then bind the REST endpoint to a local
>>>> network interface. That way, the REST endpoint would only be available
>>>> through the sidecar proxy. Additionally, one could enable SSL for this
>>>> communication. Would this be a

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-04 Thread Gabor Somogyi
> I did not mean for the user to sign its own certificates but for the
> operator of the cluster. Once the user request hits the proxy, it should no
> longer be under his control. I think I do not fully understand yet why this
> would not work.
What I said is that this does not solve the authentication problem over any
proxy. Even if the operator signs the certificate, someone can still gain
access to an internal node.
In such a case anybody can craft certificates that are accepted by the
server, and once they are accepted a bad actor can cancel jobs, causing huge
impact.

> Also, I am missing a bit the comparison of Kerberos to other
> authentication mechanisms and why they were rejected in favour of Kerberos.
PROS:
* It does not depend on the cloud provider or on the deployment type (k8s,
bare-metal, etc.), which is the biggest plus
* It is centralized and comes with tooling, so there is no need to write
tons of tools around it
* There are clients/tools for almost all OSes and several languages
* Very large users have been running it in production for years without
major issues
* It provides cross-realm trust, among other features
* Several open source components use it, which could increase compatibility

CONS:
* Not everybody uses Kerberos
* It would increase the code footprint, but this is true for many features
(as a side note, I'm here to maintain it)

Feel free to add your own points, because this represents only a single
viewpoint.
Also, if you have any better option for strong authentication, please share
it and we can consider the pros/cons here.

BR,
G


On Fri, Jun 4, 2021 at 10:32 AM Till Rohrmann  wrote:

> I did not mean for the user to sign its own certificates but for the
> operator of the cluster. Once the user request hits the proxy, it should no
> longer be under his control. I think I do not fully understand yet why this
> would not work.
>
> What I would like to avoid is to add more complexity into Flink if there
> is an easy solution which fulfills the requirements. That's why I would
> like to exercise thoroughly through the different alternatives. Also, I am
> missing a bit the comparison of Kerberos to other authentication mechanisms
> and why they were rejected in favour of Kerberos.
>
> Cheers,
> Till
>
> On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra  wrote:
>
>> Hi!
>>
>> I think there might be possible alternatives but it seems Kerberos on the
>> rest endpoint ticks all the right boxes and provides a super clean and
>> simple solution for strong authentication.
>>
>> I wouldn’t even consider sidecar proxies etc if we can solve it in such a
>> simple way as proposed by G.
>>
>> Cheers
>> Gyula
>>
>> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann  wrote:
>>
>>> I am not saying that we shouldn't add a strong authentication mechanism
>>> if there are good reasons for it. I primarily would like to understand the
>>> context a bit better in order to give qualified feedback and come to a good
>>> decision. In order to do this, I have the feeling that we haven't fully
>>> considered all available options which are on the table, tbh.
>>>
>>> Does the problem of certificate expiry also apply for self-signed
>>> certificates? If yes, then this should then also be a problem for the
>>> internal encryption of Flink's communication. If not, then one could use
>>> self-signed certificates with a longer validity to solve the mentioned
>>> issue.
>>>
>>> I think you can set up Flink in such a way that you don't have to handle
>>> all the different certificates. For example, you could deploy Flink with a
>>> "sidecar proxy" which is responsible for the authentication using an
>>> arbitrary method (e.g. Kerberos) and then bind the REST endpoint to a local
>>> network interface. That way, the REST endpoint would only be available
>>> through the sidecar proxy. Additionally, one could enable SSL for this
>>> communication. Would this be a solution for the problem?
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi 
>>> wrote:
>>>
>>>> That is an interesting idea, Till.
>>>>
>>>> The main issue with it is that TLS certificates have an expiration
>>>> time, usually they get approved for a couple years. Forcing our users to
>>>> restart jobs to reprovision TLS certificates would be weird when we could
>>>> just implement a single proper strong authentication mechanism instead in a
>>>> couple hundred lines of code. :-)
>>>>
>>>> In many cases it is also impractical to go the TLS mutual route,
>>>> because the Flink Dashboard can end up on any node in the k8s/Yarn cluster
>>>> which means that we need a certificate per node (due to the mutual auth),
>>>> but if we also want to protect the private key of these from users
>>>> accidentally or intentionally leaking them then we need this per user. As
>>>> in we end up managing user*machine number certificates and having to renew
>>>> them periodically, which albeit automatable is unfortunately not yet
>>>> automated in all large organizations.
>>>>
>>>> I fully agree that TLS certificate mutual authentication has its nice
 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-04 Thread Till Rohrmann
I did not mean for the user to sign their own certificates but for the
operator of the cluster to do so. Once the user request hits the proxy, it
should no longer be under the user's control. I think I do not fully
understand yet why this would not work.

What I would like to avoid is adding more complexity to Flink if there is
an easy solution which fulfills the requirements. That's why I would like
to go through the different alternatives thoroughly. Also, I am missing a
comparison of Kerberos with other authentication mechanisms and the reasons
why they were rejected in favour of Kerberos.

Cheers,
Till

On Fri, Jun 4, 2021 at 10:26 AM Gyula Fóra  wrote:

> Hi!
>
> I think there might be possible alternatives but it seems Kerberos on the
> rest endpoint ticks all the right boxes and provides a super clean and
> simple solution for strong authentication.
>
> I wouldn’t even consider sidecar proxies etc if we can solve it in such a
> simple way as proposed by G.
>
> Cheers
> Gyula
>
> On Fri, 4 Jun 2021 at 10:03, Till Rohrmann  wrote:
>
>> I am not saying that we shouldn't add a strong authentication mechanism
>> if there are good reasons for it. I primarily would like to understand the
>> context a bit better in order to give qualified feedback and come to a good
>> decision. In order to do this, I have the feeling that we haven't fully
>> considered all available options which are on the table, tbh.
>>
>> Does the problem of certificate expiry also apply for self-signed
>> certificates? If yes, then this should then also be a problem for the
>> internal encryption of Flink's communication. If not, then one could use
>> self-signed certificates with a longer validity to solve the mentioned
>> issue.
>>
>> I think you can set up Flink in such a way that you don't have to handle
>> all the different certificates. For example, you could deploy Flink with a
>> "sidecar proxy" which is responsible for the authentication using an
>> arbitrary method (e.g. Kerberos) and then bind the REST endpoint to a local
>> network interface. That way, the REST endpoint would only be available
>> through the sidecar proxy. Additionally, one could enable SSL for this
>> communication. Would this be a solution for the problem?
>>
>> Cheers,
>> Till
>>
>> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi 
>> wrote:
>>
>>> That is an interesting idea, Till.
>>>
>>> The main issue with it is that TLS certificates have an expiration time,
>>> usually they get approved for a couple years. Forcing our users to restart
>>> jobs to reprovision TLS certificates would be weird when we could just
>>> implement a single proper strong authentication mechanism instead in a
>>> couple hundred lines of code. :-)
>>>
>>> In many cases it is also impractical to go the TLS mutual route, because
>>> the Flink Dashboard can end up on any node in the k8s/Yarn cluster which
>>> means that we need a certificate per node (due to the mutual auth), but if
>>> we also want to protect the private key of these from users accidentally or
>>> intentionally leaking them then we need this per user. As in we end up
>>> managing user*machine number certificates and having to renew them
>>> periodically, which albeit automatable is unfortunately not yet automated
>>> in all large organizations.
>>>
>>> I fully agree that TLS certificate mutual authentication has its nice
>>> properties, especially at very large (multiple thousand node) clusters -
>>> but it has its own challenges too. Thanks for bringing it up.
>>>
>>> Happy to have this added to the rejected alternative list so that we
>>> have the full picture documented.
>>>
>>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann 
>>> wrote:
>>>
>>>> I guess the idea would then be to let the proxy do the authentication
>>>> job and only forward the request via an SSL mutually encrypted connection
>>>> to the Flink cluster. Would this be possible? The beauty of this setup is
>>>> in my opinion that this setup should work with all kinds of authentication
>>>> mechanisms.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi
>>>> wrote:
>>>>
>>>>> Thanks for giving options to fulfil the need.
>>>>>
>>>>> Users are looking for a solution where users can be identified on the
>>>>> whole cluster and restrict access to resources/actions.
>>>>> A good example for such an action is cancelling other users running
>>>>> jobs.
>>>>>
>>>>> * SSL does provide mutual authentication but when authentication
>>>>> passed there is no user based on restrictions can be made.
>>>>> * The less problematic part is that generating/maintaining short time
>>>>> valid certificates would be a hard (that's the reason KDC like servers
>>>>> exist).
>>>>> Having long time valid certificates would widen the attack surface but
>>>>> since the first concern is there this is just a cosmetic issue.
>>>>>
>>>>> All in all using TLS certificates is not sufficient in these
>>>>> environments unfortunately.
>>>>>
>>>>> BR,
>>>>> G
>>>>>
>>>>>

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-04 Thread Gyula Fóra
Hi!

I think there might be possible alternatives but it seems Kerberos on the
rest endpoint ticks all the right boxes and provides a super clean and
simple solution for strong authentication.

I wouldn’t even consider sidecar proxies etc if we can solve it in such a
simple way as proposed by G.

Cheers
Gyula

On Fri, 4 Jun 2021 at 10:03, Till Rohrmann  wrote:

> I am not saying that we shouldn't add a strong authentication mechanism if
> there are good reasons for it. I primarily would like to understand the
> context a bit better in order to give qualified feedback and come to a good
> decision. In order to do this, I have the feeling that we haven't fully
> considered all available options which are on the table, tbh.
>
> Does the problem of certificate expiry also apply for self-signed
> certificates? If yes, then this should then also be a problem for the
> internal encryption of Flink's communication. If not, then one could use
> self-signed certificates with a longer validity to solve the mentioned
> issue.
>
> I think you can set up Flink in such a way that you don't have to handle
> all the different certificates. For example, you could deploy Flink with a
> "sidecar proxy" which is responsible for the authentication using an
> arbitrary method (e.g. Kerberos) and then bind the REST endpoint to a local
> network interface. That way, the REST endpoint would only be available
> through the sidecar proxy. Additionally, one could enable SSL for this
> communication. Would this be a solution for the problem?
>
> Cheers,
> Till
>
> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi 
> wrote:
>
>> That is an interesting idea, Till.
>>
>> The main issue with it is that TLS certificates have an expiration time,
>> usually they get approved for a couple years. Forcing our users to restart
>> jobs to reprovision TLS certificates would be weird when we could just
>> implement a single proper strong authentication mechanism instead in a
>> couple hundred lines of code. :-)
>>
>> In many cases it is also impractical to go the TLS mutual route, because
>> the Flink Dashboard can end up on any node in the k8s/Yarn cluster which
>> means that we need a certificate per node (due to the mutual auth), but if
>> we also want to protect the private key of these from users accidentally or
>> intentionally leaking them then we need this per user. As in we end up
>> managing user*machine number certificates and having to renew them
>> periodically, which albeit automatable is unfortunately not yet automated
>> in all large organizations.
>>
>> I fully agree that TLS certificate mutual authentication has its nice
>> properties, especially at very large (multiple thousand node) clusters -
>> but it has its own challenges too. Thanks for bringing it up.
>>
>> Happy to have this added to the rejected alternative list so that we have
>> the full picture documented.
>>
>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann 
>> wrote:
>>
>>> I guess the idea would then be to let the proxy do the authentication
>>> job and only forward the request via an SSL mutually encrypted connection
>>> to the Flink cluster. Would this be possible? The beauty of this setup is
>>> in my opinion that this setup should work with all kinds of authentication
>>> mechanisms.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi 
>>> wrote:
>>>
>>>> Thanks for giving options to fulfil the need.
>>>>
>>>> Users are looking for a solution where users can be identified on the
>>>> whole cluster and restrict access to resources/actions.
>>>> A good example for such an action is cancelling other users running
>>>> jobs.
>>>>
>>>> * SSL does provide mutual authentication but when authentication passed
>>>> there is no user based on restrictions can be made.
>>>> * The less problematic part is that generating/maintaining short time
>>>> valid certificates would be a hard (that's the reason KDC like servers
>>>> exist).
>>>> Having long time valid certificates would widen the attack surface but
>>>> since the first concern is there this is just a cosmetic issue.
>>>>
>>>> All in all using TLS certificates is not sufficient in these
>>>> environments unfortunately.
>>>>
>>>> BR,
>>>> G
>>>>
>>>>
>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann
>>>> wrote:
>>>>
>>>>> Thanks for the information Gabor. If it is about securing the
>>>>> communication between the REST client and the REST server, then Flink
>>>>> already supports enabling mutual SSL authentication [1]. Would this be
>>>>> enough to secure the communication and to pass an audit?
>>>>>
>>>>> [1]
>>>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <
>>>>> gabor.g.somo...@gmail.com> wrote:
>>>>>
>>>>>> Hi Till,
>>>>>>
>>>>>> Since I'm working in security area 10+ years let me share my thought.
>>>>>> I would like to

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-04 Thread Gabor Somogyi
Till, thanks for investing time in giving further options.
Marci, thanks for summarizing the use-case point of view.

We've arrived back at one of the original problems. Namely, if an attacker
gets access to a node it's possible to cancel other users' jobs (and more
can be done).
A self-signed certificate is almost no-op authentication in production
environments because any user can sign their own certificate and no trusted
third party is involved.
This problem just can't be solved with SSL, no matter from which point of
view we consider it.

BR,
G


On Fri, Jun 4, 2021 at 10:03 AM Till Rohrmann  wrote:

> I am not saying that we shouldn't add a strong authentication mechanism if
> there are good reasons for it. I primarily would like to understand the
> context a bit better in order to give qualified feedback and come to a good
> decision. In order to do this, I have the feeling that we haven't fully
> considered all available options which are on the table, tbh.
>
> Does the problem of certificate expiry also apply for self-signed
> certificates? If yes, then this should then also be a problem for the
> internal encryption of Flink's communication. If not, then one could use
> self-signed certificates with a longer validity to solve the mentioned
> issue.
>
> I think you can set up Flink in such a way that you don't have to handle
> all the different certificates. For example, you could deploy Flink with a
> "sidecar proxy" which is responsible for the authentication using an
> arbitrary method (e.g. Kerberos) and then bind the REST endpoint to a local
> network interface. That way, the REST endpoint would only be available
> through the sidecar proxy. Additionally, one could enable SSL for this
> communication. Would this be a solution for the problem?
>
> Cheers,
> Till
>
> On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi 
> wrote:
>
>> That is an interesting idea, Till.
>>
>> The main issue with it is that TLS certificates have an expiration time,
>> usually they get approved for a couple years. Forcing our users to restart
>> jobs to reprovision TLS certificates would be weird when we could just
>> implement a single proper strong authentication mechanism instead in a
>> couple hundred lines of code. :-)
>>
>> In many cases it is also impractical to go the TLS mutual route, because
>> the Flink Dashboard can end up on any node in the k8s/Yarn cluster which
>> means that we need a certificate per node (due to the mutual auth), but if
>> we also want to protect the private key of these from users accidentally or
>> intentionally leaking them then we need this per user. As in we end up
>> managing user*machine number certificates and having to renew them
>> periodically, which albeit automatable is unfortunately not yet automated
>> in all large organizations.
>>
>> I fully agree that TLS certificate mutual authentication has its nice
>> properties, especially at very large (multiple thousand node) clusters -
>> but it has its own challenges too. Thanks for bringing it up.
>>
>> Happy to have this added to the rejected alternative list so that we have
>> the full picture documented.
>>
>> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann 
>> wrote:
>>
>>> I guess the idea would then be to let the proxy do the authentication
>>> job and only forward the request via an SSL mutually encrypted connection
>>> to the Flink cluster. Would this be possible? The beauty of this setup is
>>> in my opinion that this setup should work with all kinds of authentication
>>> mechanisms.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi 
>>> wrote:
>>>
>>>> Thanks for giving options to fulfil the need.
>>>>
>>>> Users are looking for a solution where users can be identified on the
>>>> whole cluster and restrict access to resources/actions.
>>>> A good example for such an action is cancelling other users running
>>>> jobs.
>>>>
>>>> * SSL does provide mutual authentication but when authentication passed
>>>> there is no user based on restrictions can be made.
>>>> * The less problematic part is that generating/maintaining short time
>>>> valid certificates would be a hard (that's the reason KDC like servers
>>>> exist).
>>>> Having long time valid certificates would widen the attack surface but
>>>> since the first concern is there this is just a cosmetic issue.
>>>>
>>>> All in all using TLS certificates is not sufficient in these
>>>> environments unfortunately.
>>>>
>>>> BR,
>>>> G
>>>>
>>>>
>>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann
>>>> wrote:
>>>>
>>>>> Thanks for the information Gabor. If it is about securing the
>>>>> communication between the REST client and the REST server, then Flink
>>>>> already supports enabling mutual SSL authentication [1]. Would this be
>>>>> enough to secure the communication and to pass an audit?
>>>>>
>>>>> [1]
>>>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-04 Thread Till Rohrmann
I am not saying that we shouldn't add a strong authentication mechanism if
there are good reasons for it. I primarily would like to understand the
context a bit better in order to give qualified feedback and come to a good
decision. In order to do this, I have the feeling that we haven't fully
considered all available options which are on the table, tbh.

Does the problem of certificate expiry also apply to self-signed
certificates? If yes, then this should also be a problem for the
internal encryption of Flink's communication. If not, then one could use
self-signed certificates with a longer validity to solve the mentioned
issue.

I think you can set up Flink in such a way that you don't have to handle
all the different certificates. For example, you could deploy Flink with a
"sidecar proxy" which is responsible for the authentication using an
arbitrary method (e.g. Kerberos) and then bind the REST endpoint to a local
network interface. That way, the REST endpoint would only be available
through the sidecar proxy. Additionally, one could enable SSL for this
communication. Would this be a solution for the problem?
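
As a rough illustration of the "bind the REST endpoint to a local network
interface" part, the following Python sketch checks whether a configured bind
address is loopback-only. The helper name and the sample addresses are
hypothetical; `rest.bind-address` is mentioned only as the place such a value
would typically come from and should be verified against the Flink docs for
your version.

```python
import ipaddress


def is_loopback_only(bind_address: str) -> bool:
    """Return True if the address is a loopback address such as 127.0.0.1."""
    try:
        return ipaddress.ip_address(bind_address).is_loopback
    except ValueError:
        # Host names such as "localhost" are not resolved in this sketch.
        return False


# Hypothetical values a deployment script might read from `rest.bind-address`.
for candidate in ("127.0.0.1", "::1", "0.0.0.0", "10.0.0.5"):
    print(candidate, is_loopback_only(candidate))
```

A check like this only confirms that the endpoint is not exposed on an
external interface; it says nothing about who can log in to the node itself.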

Cheers,
Till

On Thu, Jun 3, 2021 at 10:46 PM Márton Balassi 
wrote:

> That is an interesting idea, Till.
>
> The main issue with it is that TLS certificates have an expiration time,
> usually they get approved for a couple years. Forcing our users to restart
> jobs to reprovision TLS certificates would be weird when we could just
> implement a single proper strong authentication mechanism instead in a
> couple hundred lines of code. :-)
>
> In many cases it is also impractical to go the TLS mutual route, because
> the Flink Dashboard can end up on any node in the k8s/Yarn cluster which
> means that we need a certificate per node (due to the mutual auth), but if
> we also want to protect the private key of these from users accidentally or
> intentionally leaking them then we need this per user. As in we end up
> managing user*machine number certificates and having to renew them
> periodically, which albeit automatable is unfortunately not yet automated
> in all large organizations.
>
> I fully agree that TLS certificate mutual authentication has its nice
> properties, especially at very large (multiple thousand node) clusters -
> but it has its own challenges too. Thanks for bringing it up.
>
> Happy to have this added to the rejected alternative list so that we have
> the full picture documented.
>
> On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann  wrote:
>
>> I guess the idea would then be to let the proxy do the authentication job
>> and only forward the request via an SSL mutually encrypted connection to
>> the Flink cluster. Would this be possible? The beauty of this setup is in
>> my opinion that this setup should work with all kinds of authentication
>> mechanisms.
>>
>> Cheers,
>> Till
>>
>> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi 
>> wrote:
>>
>>> Thanks for giving options to fulfil the need.
>>>
>>> Users are looking for a solution where users can be identified on the
>>> whole cluster and restrict access to resources/actions.
>>> A good example for such an action is cancelling other users running jobs.
>>>
>>> * SSL does provide mutual authentication but when authentication passed
>>> there is no user based on restrictions can be made.
>>> * The less problematic part is that generating/maintaining short time
>>> valid certificates would be a hard (that's the reason KDC like servers
>>> exist).
>>> Having long time valid certificates would widen the attack surface but
>>> since the first concern is there this is just a cosmetic issue.
>>>
>>> All in all using TLS certificates is not sufficient in these
>>> environments unfortunately.
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann 
>>> wrote:
>>>
>>>> Thanks for the information Gabor. If it is about securing the
>>>> communication between the REST client and the REST server, then Flink
>>>> already supports enabling mutual SSL authentication [1]. Would this be
>>>> enough to secure the communication and to pass an audit?
>>>>
>>>> [1]
>>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi <
>>>> gabor.g.somo...@gmail.com> wrote:
>>>>
>>>>> Hi Till,
>>>>>
>>>>> Since I'm working in security area 10+ years let me share my thought.
>>>>> I would like to emphasise there are experts better than me but I have
>>>>> some
>>>>> basics.
>>>>> The discussion is open and not trying to tell alone things...
>>>>>
>>>>> > I mean if an attacker can get access to one of the machines, then it
>>>>> should also be possible to obtain the right Kerberos token.
>>>>> Not necessarily. For example if one gets access to a specific user's
>>>>> credentials then it's not possible to compromise other user's jobs,
>>>>> data,
>>>>> etc...
>>>>> Security is like an onion,

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-03 Thread Márton Balassi
That is an interesting idea, Till.

The main issue with it is that TLS certificates have an expiration time;
usually they are issued for a couple of years. Forcing our users to restart
jobs to reprovision TLS certificates would be weird when we could just
implement a single proper strong authentication mechanism instead in a
couple hundred lines of code. :-)

In many cases it is also impractical to go the mutual TLS route, because
the Flink Dashboard can end up on any node in the k8s/Yarn cluster, which
means that we need a certificate per node (due to the mutual auth). If we
also want to protect the private keys of these certificates from users
accidentally or intentionally leaking them, then we need this per user. So
we end up managing a user*machine number of certificates and having to renew
them periodically, which, albeit automatable, is unfortunately not yet
automated in all large organizations.
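
To make the user*machine estimate concrete, here is a minimal Python sketch.
The function names, the example counts and the validity period are
hypothetical and only illustrate the arithmetic; they are not taken from any
real deployment.

```python
def certificates_to_manage(users: int, machines: int) -> int:
    """Number of certificates if every user needs one per machine."""
    if users < 0 or machines < 0:
        raise ValueError("counts must be non-negative")
    return users * machines


def renewals_per_year(users: int, machines: int, validity_years: float) -> float:
    """Average number of renewals per year for a given certificate validity."""
    if validity_years <= 0:
        raise ValueError("validity_years must be positive")
    return certificates_to_manage(users, machines) / validity_years


# Hypothetical numbers: 50 users, 200 nodes, certificates valid for 2 years.
total = certificates_to_manage(50, 200)
yearly = renewals_per_year(50, 200, 2)
print(total, yearly)
```

Shortening the validity period makes each certificate less valuable to an
attacker but increases the renewal load proportionally.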

I fully agree that TLS certificate mutual authentication has its nice
properties, especially at very large (multiple thousand node) clusters -
but it has its own challenges too. Thanks for bringing it up.

Happy to have this added to the rejected alternative list so that we have
the full picture documented.

On Thu, Jun 3, 2021 at 5:52 PM Till Rohrmann  wrote:

> I guess the idea would then be to let the proxy do the authentication job
> and only forward the request via an SSL mutually encrypted connection to
> the Flink cluster. Would this be possible? The beauty of this setup is in
> my opinion that this setup should work with all kinds of authentication
> mechanisms.
>
> Cheers,
> Till
>
> On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi 
> wrote:
>
>> Thanks for giving options to fulfil the need.
>>
>> Users are looking for a solution where users can be identified on the
>> whole cluster and restrict access to resources/actions.
>> A good example for such an action is cancelling other users running jobs.
>>
>> * SSL does provide mutual authentication but when authentication passed
>> there is no user based on restrictions can be made.
>> * The less problematic part is that generating/maintaining short time
>> valid certificates would be a hard (that's the reason KDC like servers
>> exist).
>> Having long time valid certificates would widen the attack surface but
>> since the first concern is there this is just a cosmetic issue.
>>
>> All in all using TLS certificates is not sufficient in these environments
>> unfortunately.
>>
>> BR,
>> G
>>
>>
>> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann 
>> wrote:
>>
>>> Thanks for the information Gabor. If it is about securing the
>>> communication between the REST client and the REST server, then Flink
>>> already supports enabling mutual SSL authentication [1]. Would this be
>>> enough to secure the communication and to pass an audit?
>>>
>>> [1]
>>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi 
>>> wrote:
>>>
>>>> Hi Till,
>>>>
>>>> Since I'm working in security area 10+ years let me share my thought.
>>>> I would like to emphasise there are experts better than me but I have
>>>> some
>>>> basics.
>>>> The discussion is open and not trying to tell alone things...
>>>>
>>>> > I mean if an attacker can get access to one of the machines, then it
>>>> should also be possible to obtain the right Kerberos token.
>>>> Not necessarily. For example if one gets access to a specific user's
>>>> credentials then it's not possible to compromise other user's jobs,
>>>> data,
>>>> etc...
>>>> Security is like an onion, the more layers has been added the more time
>>>> an
>>>> attacker needs to proceed.
>>>> At the end of the day if one is in, then most probably can find the way
>>>> but
>>>> this time is normally enough to sysadmins or security experts to
>>>> close down the system and minimize the damage.
>>>>
>>>> The other thing is that all tokens has a timeout and if the token is
>>>> invalid then the attacker can't proceed further.
>>>>
>>>> > Is Kerberos also the standard authentication protocol for Kubernetes
>>>> deployments?
>>>> Kerberos is an industry standard which is cloud/deployment agnostic and
>>>> it
>>>> can be used in any deployments including k8s.
>>>> The main intention is to use kerberos in k8s deployments too since we're
>>>> going this direction as well.
>>>> Please see how Spark does this:
>>>>
>>>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>>>>
>>>> Last but not least the most important reason to add at least one strong
>>>> authentication is that we have users who has
>>>> hard requirements on this. They're doing security audits and if they
>>>> fail
>>>> then it's deal breaking.
>>>> That is why we have added kerberos at the first place. Unfortunately we
>>>> can't name them in this public list, however
>>>> the customers who specifically asked for this were mainly in the

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-03 Thread Till Rohrmann
I guess the idea would then be to let the proxy do the authentication job
and only forward the request via a mutually authenticated SSL connection to
the Flink cluster. Would this be possible? The beauty of this setup is, in
my opinion, that it should work with all kinds of authentication
mechanisms.
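
For readers less familiar with mutual SSL, the server side of such a
connection can be sketched in Python as below. This is only an illustration
of the general technique using the standard `ssl` module, not how Flink's
REST endpoint is configured; the function name and the file names in the
comments are hypothetical.

```python
import ssl


def make_mutual_tls_server_context() -> ssl.SSLContext:
    """Build a server-side TLS context that rejects clients without a certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Require the peer (here: the proxy) to present a certificate.
    context.verify_mode = ssl.CERT_REQUIRED
    # A real setup would also load key material, e.g. with hypothetical files:
    #   context.load_cert_chain("server.pem", "server.key")
    #   context.load_verify_locations("proxy-ca.pem")
    return context


ctx = make_mutual_tls_server_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)
```

Note that this only establishes that the peer holds a certificate trusted by
the server; it does not by itself tell the server which end user issued the
request.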

Cheers,
Till

On Thu, Jun 3, 2021 at 3:12 PM Gabor Somogyi 
wrote:

> Thanks for giving options to fulfil the need.
>
> Users are looking for a solution where users can be identified on the
> whole cluster and restrict access to resources/actions.
> A good example for such an action is cancelling other users running jobs.
>
> * SSL does provide mutual authentication but when authentication passed
> there is no user based on restrictions can be made.
> * The less problematic part is that generating/maintaining short time
> valid certificates would be a hard (that's the reason KDC like servers
> exist).
> Having long time valid certificates would widen the attack surface but
> since the first concern is there this is just a cosmetic issue.
>
> All in all using TLS certificates is not sufficient in these environments
> unfortunately.
>
> BR,
> G
>
>
> On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann 
> wrote:
>
>> Thanks for the information Gabor. If it is about securing the
>> communication between the REST client and the REST server, then Flink
>> already supports enabling mutual SSL authentication [1]. Would this be
>> enough to secure the communication and to pass an audit?
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>>
>> Cheers,
>> Till
>>
>> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi 
>> wrote:
>>
>>> Hi Till,
>>>
>>> Since I'm working in security area 10+ years let me share my thought.
>>> I would like to emphasise there are experts better than me but I have
>>> some
>>> basics.
>>> The discussion is open and not trying to tell alone things...
>>>
>>> > I mean if an attacker can get access to one of the machines, then it
>>> should also be possible to obtain the right Kerberos token.
>>> Not necessarily. For example if one gets access to a specific user's
>>> credentials then it's not possible to compromise other user's jobs, data,
>>> etc...
>>> Security is like an onion, the more layers has been added the more time
>>> an
>>> attacker needs to proceed.
>>> At the end of the day if one is in, then most probably can find the way
>>> but
>>> this time is normally enough to sysadmins or security experts to
>>> close down the system and minimize the damage.
>>>
>>> The other thing is that all tokens has a timeout and if the token is
>>> invalid then the attacker can't proceed further.
>>>
>>> > Is Kerberos also the standard authentication protocol for Kubernetes
>>> deployments?
>>> Kerberos is an industry standard which is cloud/deployment agnostic and
>>> it
>>> can be used in any deployments including k8s.
>>> The main intention is to use kerberos in k8s deployments too since we're
>>> going this direction as well.
>>> Please see how Spark does this:
>>>
>>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>>>
>>> Last but not least the most important reason to add at least one strong
>>> authentication is that we have users who has
>>> hard requirements on this. They're doing security audits and if they fail
>>> then it's deal breaking.
>>> That is why we have added kerberos at the first place. Unfortunately we
>>> can't name them in this public list, however
>>> the customers who specifically asked for this were mainly in the banking
>>> and telco sector.
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann 
>>> wrote:
>>>
>>> > Thanks for updating the document Márton. Why is it that banks will
>>> > consider it more secure if Flink comes with Kerberos authentication
>>> > (assuming a properly secured setup)? I mean if an attacker can get
>>> access
>>> > to one of the machines, then it should also be possible to obtain the
>>> right
>>> > Kerberos token.
>>> >
>>> > I am not an authentication expert and that's why I wanted to ask what
>>> are
>>> > other authentication protocols other than Kerberos? Why did we select
>>> > Kerberos and not any other authentication protocol? Maybe you can list
>>> the
>>> > pros and cons for the different protocols. Is Kerberos also the
>>> standard
>>> > authentication protocol for Kubernetes deployments? If not, what would
>>> be
>>> > the answer when deploying on K8s?
>>> >
>>> > Cheers,
>>> > Till
>>> >
>>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <
>>> gabor.g.somo...@gmail.com>
>>> > wrote:
>>> >
>>> >> Hi team,
>>> >>
>>> >> Happy to be here and hope I can provide quality additions in the
>>> future.
>>> >>
>>> >> Thank you all for helpful the suggestions!
>>> >> Considering them the FLIP has been modified and the work continues on
>>> the
>>> >> already existing Jira.
>>> >>
>>> >> BR,
>>> >> G
>>> >>

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-03 Thread Gabor Somogyi
Thanks for giving options to fulfil the need.

Users are looking for a solution where users can be identified across the whole
cluster and access to resources/actions can be restricted per user.
A good example of such an action is cancelling other users' running jobs.

* SSL does provide mutual authentication, but once authentication has passed
no user-based restrictions can be applied.
* The less problematic part is that generating/maintaining short-lived
certificates would be hard (that's the reason KDC-like servers exist).
Long-lived certificates would widen the attack surface, but
since the first concern already applies this is just a cosmetic issue.

All in all using TLS certificates is not sufficient in these environments
unfortunately.
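
As a minimal sketch of the kind of user-based restriction meant above: once
the server knows which user is calling, it can decide per action. All names
here (Job, ADMINS, may_cancel) are hypothetical and are not part of Flink or
the FLIP.

```python
from dataclasses import dataclass


# Hypothetical model for illustration only; not a Flink API.
@dataclass(frozen=True)
class Job:
    job_id: str
    owner: str


# Assumed set of users who may act on any job.
ADMINS = frozenset({"admin"})


def may_cancel(user: str, job: Job) -> bool:
    """Allow cancellation only for the job's owner or an administrator."""
    return user == job.owner or user in ADMINS


job = Job(job_id="job-1", owner="alice")
print(may_cancel("alice", job))  # the owner asks to cancel their own job
print(may_cancel("bob", job))    # another user asks to cancel alice's job
```

A check like this needs an authenticated user name as input, which is the
piece a certificate-only setup does not hand to the application here.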

BR,
G


On Thu, Jun 3, 2021 at 12:49 PM Till Rohrmann  wrote:

> Thanks for the information Gabor. If it is about securing the
> communication between the REST client and the REST server, then Flink
> already supports enabling mutual SSL authentication [1]. Would this be
> enough to secure the communication and to pass an audit?
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity
>
> Cheers,
> Till
>
> On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi 
> wrote:
>
>> Hi Till,
>>
>> Since I'm working in security area 10+ years let me share my thought.
>> I would like to emphasise there are experts better than me but I have some
>> basics.
>> The discussion is open and not trying to tell alone things...
>>
>> > I mean if an attacker can get access to one of the machines, then it
>> should also be possible to obtain the right Kerberos token.
>> Not necessarily. For example if one gets access to a specific user's
>> credentials then it's not possible to compromise other user's jobs, data,
>> etc...
>> Security is like an onion, the more layers has been added the more time an
>> attacker needs to proceed.
>> At the end of the day if one is in, then most probably can find the way
>> but
>> this time is normally enough to sysadmins or security experts to
>> close down the system and minimize the damage.
>>
>> The other thing is that all tokens has a timeout and if the token is
>> invalid then the attacker can't proceed further.
>>
>> > Is Kerberos also the standard authentication protocol for Kubernetes
>> deployments?
>> Kerberos is an industry standard which is cloud/deployment agnostic and it
>> can be used in any deployments including k8s.
>> The main intention is to use kerberos in k8s deployments too since we're
>> going this direction as well.
>> Please see how Spark does this:
>>
>> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>>
>> Last but not least the most important reason to add at least one strong
>> authentication is that we have users who has
>> hard requirements on this. They're doing security audits and if they fail
>> then it's deal breaking.
>> That is why we have added kerberos at the first place. Unfortunately we
>> can't name them in this public list, however
>> the customers who specifically asked for this were mainly in the banking
>> and telco sector.
>>
>> BR,
>> G
>>
>>
>> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann 
>> wrote:
>>
>> > Thanks for updating the document Márton. Why is it that banks will
>> > consider it more secure if Flink comes with Kerberos authentication
>> > (assuming a properly secured setup)? I mean if an attacker can get
>> access
>> > to one of the machines, then it should also be possible to obtain the
>> right
>> > Kerberos token.
>> >
>> > I am not an authentication expert and that's why I wanted to ask what
>> are
>> > other authentication protocols other than Kerberos? Why did we select
>> > Kerberos and not any other authentication protocol? Maybe you can list
>> the
>> > pros and cons for the different protocols. Is Kerberos also the standard
>> > authentication protocol for Kubernetes deployments? If not, what would
>> be
>> > the answer when deploying on K8s?
>> >
>> > Cheers,
>> > Till
>> >
>> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi <
>> gabor.g.somo...@gmail.com>
>> > wrote:
>> >
>> >> Hi team,
>> >>
>> >> Happy to be here and hope I can provide quality additions in the
>> future.
>> >>
>> >> Thank you all for helpful the suggestions!
>> >> Considering them the FLIP has been modified and the work continues on
>> the
>> >> already existing Jira.
>> >>
>> >> BR,
>> >> G
>> >>
>> >>
>> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <
>> balassi.mar...@gmail.com>
>> >> wrote:
>> >>
>> >>> Thanks, Chesney - I totally missed that. Answered on the ticket too,
>> let
>> >>> us continue there then.
>> >>>
>> >>> Till, I agree that we should keep this codepath as slim as possible.
>> It
>> >>> is an important design decision that we aim to keep the list of
>> >>> authentication protocols to a minimum. We believe that this should
>> not be a
>> >>> primary concern of Flink and a trusted proxy service 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-03 Thread Till Rohrmann
Thanks for the information Gabor. If it is about securing the
communication between the REST client and the REST server, then Flink
already supports enabling mutual SSL authentication [1]. Would this be
enough to secure the communication and to pass an audit?

[1]
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/security/security-ssl/#external--rest-connectivity

Cheers,
Till

On Thu, Jun 3, 2021 at 10:33 AM Gabor Somogyi 
wrote:

> Hi Till,
>
> Since I'm working in security area 10+ years let me share my thought.
> I would like to emphasise there are experts better than me but I have some
> basics.
> The discussion is open and not trying to tell alone things...
>
> > I mean if an attacker can get access to one of the machines, then it
> should also be possible to obtain the right Kerberos token.
> Not necessarily. For example if one gets access to a specific user's
> credentials then it's not possible to compromise other user's jobs, data,
> etc...
> Security is like an onion, the more layers has been added the more time an
> attacker needs to proceed.
> At the end of the day if one is in, then most probably can find the way but
> this time is normally enough to sysadmins or security experts to
> close down the system and minimize the damage.
>
> The other thing is that all tokens has a timeout and if the token is
> invalid then the attacker can't proceed further.
>
> > Is Kerberos also the standard authentication protocol for Kubernetes
> deployments?
> Kerberos is an industry standard which is cloud/deployment agnostic and it
> can be used in any deployments including k8s.
> The main intention is to use kerberos in k8s deployments too since we're
> going this direction as well.
> Please see how Spark does this:
>
> https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes
>
> Last but not least the most important reason to add at least one strong
> authentication is that we have users who has
> hard requirements on this. They're doing security audits and if they fail
> then it's deal breaking.
> That is why we have added kerberos at the first place. Unfortunately we
> can't name them in this public list, however
> the customers who specifically asked for this were mainly in the banking
> and telco sector.
>
> BR,
> G
>
>
> On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann  wrote:
>
> > Thanks for updating the document Márton. Why is it that banks will
> > consider it more secure if Flink comes with Kerberos authentication
> > (assuming a properly secured setup)? I mean if an attacker can get access
> > to one of the machines, then it should also be possible to obtain the
> right
> > Kerberos token.
> >
> > I am not an authentication expert and that's why I wanted to ask what are
> > other authentication protocols other than Kerberos? Why did we select
> > Kerberos and not any other authentication protocol? Maybe you can list
> the
> > pros and cons for the different protocols. Is Kerberos also the standard
> > authentication protocol for Kubernetes deployments? If not, what would be
> > the answer when deploying on K8s?
> >
> > Cheers,
> > Till
> >
> > On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi  >
> > wrote:
> >
> >> Hi team,
> >>
> >> Happy to be here and hope I can provide quality additions in the future.
> >>
> >> Thank you all for helpful the suggestions!
> >> Considering them the FLIP has been modified and the work continues on
> the
> >> already existing Jira.
> >>
> >> BR,
> >> G
> >>
> >>
> >> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi <
> balassi.mar...@gmail.com>
> >> wrote:
> >>
> >>> Thanks, Chesney - I totally missed that. Answered on the ticket too,
> let
> >>> us continue there then.
> >>>
> >>> Till, I agree that we should keep this codepath as slim as possible. It
> >>> is an important design decision that we aim to keep the list of
> >>> authentication protocols to a minimum. We believe that this should not
> be a
> >>> primary concern of Flink and a trusted proxy service (for example
> Apache
> >>> Knox) should be used to enable a multitude of enduser authentication
> >>> mechanisms. The bare minimum of authentication mechanisms to support
> >>> consequently consist of a single strong authentication protocol for
> which
> >>> Kerberos is the enterprise solution and HTTP Basic primary for
> development
> >>> and light-weight scenarios.
> >>>
> >>> Added the above wording to G's doc.
> >>>
> >>>
> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
> >>>
> >>>
> >>>
> >>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler 
> >>> wrote:
> >>>
> >>>> There's a related effort:
> >>>> https://issues.apache.org/jira/browse/FLINK-21108
> >>>>
> >>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
> >>>> > Hi Gabor, welcome to the Flink community!
> >>>> >
> >>>> > Thanks for sharing this proposal with the community Márton. In
> >>>> general, I
> >>>> > agree that authentication is missing and that this is required 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-03 Thread Gabor Somogyi
Hi Till,

Since I have been working in the security area for 10+ years, let me share my
thoughts.
I would like to emphasise that there are better experts than me, but I know
the basics.
The discussion is open and I am not trying to dictate anything on my own...

> I mean if an attacker can get access to one of the machines, then it
> should also be possible to obtain the right Kerberos token.
Not necessarily. For example, if one gets access to a specific user's
credentials, then it's not possible to compromise other users' jobs, data,
etc...
Security is like an onion: the more layers have been added, the more time an
attacker needs to proceed.
At the end of the day, if one is in, they can most probably find a way, but
this time is normally enough for sysadmins or security experts to
close down the system and minimize the damage.

The other thing is that all tokens have a timeout, and if the token is
invalid then the attacker can't proceed further.
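
To make the timeout point concrete, here is a minimal sketch of an expiry
check. The Token type and is_valid function are hypothetical and only
illustrate the idea; real Kerberos tickets are validated by the Kerberos
libraries, not by code like this.

```python
import time
from dataclasses import dataclass
from typing import Optional


# Hypothetical token shape for illustration; not a Kerberos or Flink type.
@dataclass(frozen=True)
class Token:
    user: str
    expires_at: float  # expiry time in seconds since the epoch


def is_valid(token: Token, now: Optional[float] = None) -> bool:
    """Return True only while the token has not yet reached its expiry time."""
    current = time.time() if now is None else now
    return current < token.expires_at


# A stolen token stops working once its lifetime is over.
stolen = Token(user="alice", expires_at=100.0)
print(is_valid(stolen, now=99.0))
print(is_valid(stolen, now=100.0))
```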

> Is Kerberos also the standard authentication protocol for Kubernetes
> deployments?
Kerberos is an industry standard which is cloud/deployment agnostic and it
can be used in any deployment, including k8s.
The main intention is to use Kerberos in k8s deployments too, since we're
going in this direction as well.
Please see how Spark does this:
https://spark.apache.org/docs/latest/security.html#secure-interaction-with-kubernetes

Last but not least, the most important reason to add at least one strong
authentication mechanism is that we have users who have
hard requirements on this. They're doing security audits, and if they fail
then it's a deal breaker.
That is why we added Kerberos in the first place. Unfortunately we
can't name them on this public list; however,
the customers who specifically asked for this were mainly in the banking
and telco sectors.

BR,
G


On Thu, Jun 3, 2021 at 9:20 AM Till Rohrmann  wrote:

> Thanks for updating the document Márton. Why is it that banks will
> consider it more secure if Flink comes with Kerberos authentication
> (assuming a properly secured setup)? I mean if an attacker can get access
> to one of the machines, then it should also be possible to obtain the right
> Kerberos token.
>
> I am not an authentication expert and that's why I wanted to ask what are
> other authentication protocols other than Kerberos? Why did we select
> Kerberos and not any other authentication protocol? Maybe you can list the
> pros and cons for the different protocols. Is Kerberos also the standard
> authentication protocol for Kubernetes deployments? If not, what would be
> the answer when deploying on K8s?
>
> Cheers,
> Till
>
> On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi 
> wrote:
>
>> Hi team,
>>
>> Happy to be here and hope I can provide quality additions in the future.
>>
>> Thank you all for helpful the suggestions!
>> Considering them the FLIP has been modified and the work continues on the
>> already existing Jira.
>>
>> BR,
>> G
>>
>>
>> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi 
>> wrote:
>>
>>> Thanks, Chesney - I totally missed that. Answered on the ticket too, let
>>> us continue there then.
>>>
>>> Till, I agree that we should keep this codepath as slim as possible. It
>>> is an important design decision that we aim to keep the list of
>>> authentication protocols to a minimum. We believe that this should not be a
>>> primary concern of Flink and a trusted proxy service (for example Apache
>>> Knox) should be used to enable a multitude of enduser authentication
>>> mechanisms. The bare minimum of authentication mechanisms to support
>>> consequently consist of a single strong authentication protocol for which
>>> Kerberos is the enterprise solution and HTTP Basic primary for development
>>> and light-weight scenarios.
>>>
>>> Added the above wording to G's doc.
>>>
>>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>>
>>>
>>>
>>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler 
>>> wrote:
>>>
>>>> There's a related effort:
>>>> https://issues.apache.org/jira/browse/FLINK-21108
>>>>
>>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>>>> > Hi Gabor, welcome to the Flink community!
>>>> >
>>>> > Thanks for sharing this proposal with the community Márton. In
>>>> general, I
>>>> > agree that authentication is missing and that this is required for
>>>> using
>>>> > Flink within an enterprise. The thing I am wondering is whether this
>>>> > feature strictly needs to be implemented inside of Flink or whether a
>>>> proxy
>>>> > setup could do the job? Have you considered this option? If yes, then
>>>> it
>>>> > would be good to list it under the point of rejected alternatives.
>>>> >
>>>> > I do see the benefit of implementing this feature inside of Flink if
>>>> many
>>>> > users need it. If not, then it might be easier for the project to not
>>>> > increase the surface area since it makes the overall maintenance
>>>> harder.
>>>> >
>>>> > Cheers,
>>>> > Till
>>>> >
>>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi 
>>>> wrote:
>>>> >
>>>> >> Hi 

Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-03 Thread Till Rohrmann
Thanks for updating the document Márton. Why is it that banks will consider
it more secure if Flink comes with Kerberos authentication (assuming a
properly secured setup)? I mean if an attacker can get access to one of the
machines, then it should also be possible to obtain the right Kerberos
token.

I am not an authentication expert, which is why I wanted to ask: what
authentication protocols are there besides Kerberos? Why did we select
Kerberos and not any other authentication protocol? Maybe you can list the
pros and cons for the different protocols. Is Kerberos also the standard
authentication protocol for Kubernetes deployments? If not, what would be
the answer when deploying on K8s?

Cheers,
Till

On Wed, Jun 2, 2021 at 12:07 PM Gabor Somogyi 
wrote:

> Hi team,
>
> Happy to be here and hope I can provide quality additions in the future.
>
> Thank you all for helpful the suggestions!
> Considering them the FLIP has been modified and the work continues on the
> already existing Jira.
>
> BR,
> G
>
>
> On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi 
> wrote:
>
>> Thanks, Chesney - I totally missed that. Answered on the ticket too, let
>> us continue there then.
>>
>> Till, I agree that we should keep this codepath as slim as possible. It
>> is an important design decision that we aim to keep the list of
>> authentication protocols to a minimum. We believe that this should not be a
>> primary concern of Flink and a trusted proxy service (for example Apache
>> Knox) should be used to enable a multitude of enduser authentication
>> mechanisms. The bare minimum of authentication mechanisms to support
>> consequently consist of a single strong authentication protocol for which
>> Kerberos is the enterprise solution and HTTP Basic primary for development
>> and light-weight scenarios.
>>
>> Added the above wording to G's doc.
>>
>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>
>>
>>
>> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler 
>> wrote:
>>
>>> There's a related effort:
>>> https://issues.apache.org/jira/browse/FLINK-21108
>>>
>>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>>> > Hi Gabor, welcome to the Flink community!
>>> >
>>> > Thanks for sharing this proposal with the community Márton. In
>>> general, I
>>> > agree that authentication is missing and that this is required for
>>> using
>>> > Flink within an enterprise. The thing I am wondering is whether this
>>> > feature strictly needs to be implemented inside of Flink or whether a
>>> proxy
>>> > setup could do the job? Have you considered this option? If yes, then
>>> it
>>> > would be good to list it under the point of rejected alternatives.
>>> >
>>> > I do see the benefit of implementing this feature inside of Flink if
>>> many
>>> > users need it. If not, then it might be easier for the project to not
>>> > increase the surface area since it makes the overall maintenance
>>> harder.
>>> >
>>> > Cheers,
>>> > Till
>>> >
>>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi 
>>> wrote:
>>> >
>>> >> Hi team,
>>> >>
>>> >> Firstly I would like to introduce Gabor or G [1] for short to the
>>> >> community, he is a Spark committer who has recently transitioned to
>>> the
>>> >> Flink Engineering team at Cloudera and is looking forward to
>>> contributing
>>> >> to Apache Flink. Previously G primarily focused on Spark Streaming and
>>> >> security.
>>> >>
>>> >> Based on requests from our customers G has implemented Kerberos and
>>> HTTP
>>> >> Basic Authentication for the Flink Dashboard and HistoryServer.
>>> Previously
>>> >> lacked an authentication story.
>>> >>
>>> >> We are looking to contribute this functionality back to the
>>> community, we
>>> >> believe that given Flink's maturity there should be a common code
>>> solution
>>> >> for this general pattern.
>>> >>
>>> >> We are looking forward to your feedback on G's design. [2]
>>> >>
>>> >> [1] http://gaborsomogyi.com/
>>> >> [2]
>>> >>
>>> >>
>>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>>> >>
>>>
>>>


Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-02 Thread Gabor Somogyi
Hi team,

Happy to be here and hope I can provide quality additions in the future.

Thank you all for the helpful suggestions!
Taking them into account, the FLIP has been modified and the work continues on
the already existing Jira.

BR,
G


On Wed, Jun 2, 2021 at 11:23 AM Márton Balassi 
wrote:

> Thanks, Chesney - I totally missed that. Answered on the ticket too, let
> us continue there then.
>
> Till, I agree that we should keep this codepath as slim as possible. It is
> an important design decision that we aim to keep the list of authentication
> protocols to a minimum. We believe that this should not be a primary
> concern of Flink and a trusted proxy service (for example Apache Knox)
> should be used to enable a multitude of enduser authentication mechanisms.
> The bare minimum of authentication mechanisms to support consequently
> consist of a single strong authentication protocol for which Kerberos is
> the enterprise solution and HTTP Basic primary for development and
> light-weight scenarios.
>
> Added the above wording to G's doc.
>
> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>
>
>
> On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler 
> wrote:
>
>> There's a related effort:
>> https://issues.apache.org/jira/browse/FLINK-21108
>>
>> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
>> > Hi Gabor, welcome to the Flink community!
>> >
>> > Thanks for sharing this proposal with the community Márton. In general,
>> I
>> > agree that authentication is missing and that this is required for using
>> > Flink within an enterprise. The thing I am wondering is whether this
>> > feature strictly needs to be implemented inside of Flink or whether a
>> proxy
>> > setup could do the job? Have you considered this option? If yes, then it
>> > would be good to list it under the point of rejected alternatives.
>> >
>> > I do see the benefit of implementing this feature inside of Flink if
>> many
>> > users need it. If not, then it might be easier for the project to not
>> > increase the surface area since it makes the overall maintenance harder.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi 
>> wrote:
>> >
>> >> Hi team,
>> >>
>> >> Firstly I would like to introduce Gabor or G [1] for short to the
>> >> community, he is a Spark committer who has recently transitioned to the
>> >> Flink Engineering team at Cloudera and is looking forward to
>> contributing
>> >> to Apache Flink. Previously G primarily focused on Spark Streaming and
>> >> security.
>> >>
>> >> Based on requests from our customers G has implemented Kerberos and
>> HTTP
>> >> Basic Authentication for the Flink Dashboard and HistoryServer.
>> Previously
>> >> lacked an authentication story.
>> >>
>> >> We are looking to contribute this functionality back to the community,
>> we
>> >> believe that given Flink's maturity there should be a common code
>> solution
>> >> for this general pattern.
>> >>
>> >> We are looking forward to your feedback on G's design. [2]
>> >>
>> >> [1] http://gaborsomogyi.com/
>> >> [2]
>> >>
>> >>
>> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>> >>
>>
>>


Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-02 Thread Márton Balassi
Thanks, Chesnay - I totally missed that. Answered on the ticket too, let us
continue there then.

Till, I agree that we should keep this codepath as slim as possible. It is
an important design decision that we aim to keep the list of authentication
protocols to a minimum. We believe that this should not be a primary
concern of Flink and a trusted proxy service (for example Apache Knox)
should be used to enable a multitude of end-user authentication mechanisms.
The bare minimum of authentication mechanisms to support consequently
consists of a single strong authentication protocol, for which Kerberos is
the enterprise solution, and HTTP Basic, primarily for development and
lightweight scenarios.

Added the above wording to G's doc.
https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
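
For readers unfamiliar with HTTP Basic, here is a minimal sketch of how a
server can read the credentials from an Authorization header. The function
name parse_basic_auth is hypothetical and this is not the code from the
proposal; it only illustrates why Basic suits development and lightweight
setups: the header is just base64 of "user:password", so it needs TLS
underneath to be safe on the wire.

```python
import base64
import binascii
from typing import Optional, Tuple


def parse_basic_auth(header: str) -> Optional[Tuple[str, str]]:
    """Return (user, password) from a 'Basic <base64>' header, or None if malformed."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        # validate=True rejects characters outside the base64 alphabet.
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # The user name ends at the first colon; the password may contain colons.
    user, sep, password = decoded.partition(":")
    if not sep:
        return None
    return user, password


# Build a header the way a client would, then parse it back.
example_header = "Basic " + base64.b64encode(b"alice:s3cret").decode("ascii")
print(parse_basic_auth(example_header))
```

Checking the returned pair against a credential store is a separate step and
is left out of this sketch.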



On Tue, Jun 1, 2021 at 11:47 AM Chesnay Schepler  wrote:

> There's a related effort:
> https://issues.apache.org/jira/browse/FLINK-21108
>
> On 6/1/2021 10:14 AM, Till Rohrmann wrote:
> > Hi Gabor, welcome to the Flink community!
> >
> > Thanks for sharing this proposal with the community Márton. In general, I
> > agree that authentication is missing and that this is required for using
> > Flink within an enterprise. The thing I am wondering is whether this
> > feature strictly needs to be implemented inside of Flink or whether a
> proxy
> > setup could do the job? Have you considered this option? If yes, then it
> > would be good to list it under the point of rejected alternatives.
> >
> > I do see the benefit of implementing this feature inside of Flink if many
> > users need it. If not, then it might be easier for the project to not
> > increase the surface area since it makes the overall maintenance harder.
> >
> > Cheers,
> > Till
> >
> > On Mon, May 31, 2021 at 4:57 PM Márton Balassi 
> wrote:
> >
> >> Hi team,
> >>
> >> Firstly I would like to introduce Gabor or G [1] for short to the
> >> community, he is a Spark committer who has recently transitioned to the
> >> Flink Engineering team at Cloudera and is looking forward to
> contributing
> >> to Apache Flink. Previously G primarily focused on Spark Streaming and
> >> security.
> >>
> >> Based on requests from our customers G has implemented Kerberos and HTTP
> >> Basic Authentication for the Flink Dashboard and HistoryServer.
> Previously
> >> lacked an authentication story.
> >>
> >> We are looking to contribute this functionality back to the community,
> we
> >> believe that given Flink's maturity there should be a common code
> solution
> >> for this general pattern.
> >>
> >> We are looking forward to your feedback on G's design. [2]
> >>
> >> [1] http://gaborsomogyi.com/
> >> [2]
> >>
> >>
> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
> >>
>
>


Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-01 Thread Chesnay Schepler

There's a related effort: https://issues.apache.org/jira/browse/FLINK-21108

On 6/1/2021 10:14 AM, Till Rohrmann wrote:

Hi Gabor, welcome to the Flink community!

Thanks for sharing this proposal with the community Márton. In general, I
agree that authentication is missing and that this is required for using
Flink within an enterprise. The thing I am wondering is whether this
feature strictly needs to be implemented inside of Flink or whether a proxy
setup could do the job? Have you considered this option? If yes, then it
would be good to list it under the point of rejected alternatives.

I do see the benefit of implementing this feature inside of Flink if many
users need it. If not, then it might be easier for the project to not
increase the surface area since it makes the overall maintenance harder.

Cheers,
Till

On Mon, May 31, 2021 at 4:57 PM Márton Balassi  wrote:


Hi team,

Firstly I would like to introduce Gabor or G [1] for short to the
community, he is a Spark committer who has recently transitioned to the
Flink Engineering team at Cloudera and is looking forward to contributing
to Apache Flink. Previously G primarily focused on Spark Streaming and
security.

Based on requests from our customers G has implemented Kerberos and HTTP
Basic Authentication for the Flink Dashboard and HistoryServer. Previously
lacked an authentication story.

We are looking to contribute this functionality back to the community, we
believe that given Flink's maturity there should be a common code solution
for this general pattern.

We are looking forward to your feedback on G's design. [2]

[1] http://gaborsomogyi.com/
[2]

https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit





Re: [DISCUSS] Dashboard/HistoryServer authentication

2021-06-01 Thread Till Rohrmann
Hi Gabor, welcome to the Flink community!

Thanks for sharing this proposal with the community Márton. In general, I
agree that authentication is missing and that this is required for using
Flink within an enterprise. The thing I am wondering is whether this
feature strictly needs to be implemented inside of Flink or whether a proxy
setup could do the job? Have you considered this option? If yes, then it
would be good to list it under the point of rejected alternatives.

I do see the benefit of implementing this feature inside of Flink if many
users need it. If not, then it might be easier for the project to not
increase the surface area since it makes the overall maintenance harder.

Cheers,
Till

On Mon, May 31, 2021 at 4:57 PM Márton Balassi  wrote:

> Hi team,
>
> Firstly I would like to introduce Gabor or G [1] for short to the
> community, he is a Spark committer who has recently transitioned to the
> Flink Engineering team at Cloudera and is looking forward to contributing
> to Apache Flink. Previously G primarily focused on Spark Streaming and
> security.
>
> Based on requests from our customers G has implemented Kerberos and HTTP
> Basic Authentication for the Flink Dashboard and HistoryServer. Previously
> lacked an authentication story.
>
> We are looking to contribute this functionality back to the community, we
> believe that given Flink's maturity there should be a common code solution
> for this general pattern.
>
> We are looking forward to your feedback on G's design. [2]
>
> [1] http://gaborsomogyi.com/
> [2]
>
> https://docs.google.com/document/d/1NMPeJ9H0G49TGy3AzTVVJVKmYC0okwOtqLTSPnGqzHw/edit
>