Re: [DISCUSS] FLIP-438: Make Flink's Hadoop and YARN configuration probing consistent

2024-04-18 Thread Ferenc Csaky
Hi Venkata krishnan,

My general point was that personally I do not think that the
current implementation is wrong or confusing. And the main thing
here is that how we define consistent in this case is subjective.
>From the proposed point of view, consistent mean we use the same
prefix. But we can consistently use the least required prefix
groups to identify a subsystem property, which is how it works
right now. The prop naming conventions of these dependent systems
are different, so do their prefixes in Flink.

It is very possible I am in minority with my view,
but I do not think duplicating the `yarn` prefix to make it
conform with Hadoop props would be a better UX as it is now, just
different. 

One thing that less opinionated maybe is if the proposed solution
simplifies the property load logic. Currently, Hadoop props from
the Flink conf are parsed in `HadoopUtils` (flink-hadoop-fs
module), while Yarn props in `Utils` (flink-yarn module). Maybe
`org.apache.flink.configuration.Configuration` could have a helper
to extract all prefixed properties to a `Map`, or another
`Configuration` object (the latter could be easily added as a
resource to the dependent system config objects).

That simplification could make the overall prefixed load logic
more clean IMO and that is something that would be useful.

WDYT?

Best,
Ferenc





On Monday, April 15th, 2024 at 20:55, Venkatakrishnan Sowrirajan 
 wrote:

> 
> 
> Sorry for the late reply, Ferenc.
> 
> I understand the rationale behind the current implementation as the problem
> is slightly different b/w yarn (always prefixed with `yarn`) and hadoop (it
> is not guaranteed all `hadoop` configs will be prefixed by `hadoop`)
> configs.
> 
> From the dev UX perspective, it is confusing and only if you really pay
> close attention to the docs it is evident. I understand your point on added
> complexity till Flink-3.0 but if we agree it should be made consistent, it
> has to be done at some point of time right?
> 
> Regards
> Venkata krishnan
> 
> 
> On Wed, Apr 3, 2024 at 4:51 AM Ferenc Csaky ferenc.cs...@pm.me.invalid
> 
> wrote:
> 
> > Hi Venkata,
> > 
> > Thank you for opening the discussion about this!
> > 
> > After taking a look at the YARN and Hadoop configurations, the
> > reason why it was implemented this way is that, in case of YARN,
> > every YARN-specific property is prefixed with "yarn.", so to get
> > the final, YARN-side property it is enough to remove the "flink."
> > prefix.
> > 
> > In case of Hadoop, there are properties that not prefixed with
> > "hadoop.", e.g. "dfs.replication" so to identify and get the
> > Hadoop-side property it is necessary to duplicate the "hadoop" part
> > in the properties.
> > 
> > Taking this into consideration I would personally say -0 to this
> > change. IMO the current behavior can be justified as giving
> > slightly different solutions to slightly different problems, which
> > are well documented. Handling both prefixes would complicate the
> > parsing logic until the APIs can be removed, which as it looks at
> > the moment would only be possible in Flink 3.0, which probably will
> > not happen in the foreseeable future, so I do not see the benefit
> > of the added complexity.
> > 
> > Regarding the FLIP, in the "YARN configuration override example"
> > part, I think you should present an example that works correctly
> > at the moment: "flink.yarn.application.classpath" ->
> > "yarn.application.classpath".
> > 
> > Best,
> > Ferenc
> > 
> > On Friday, March 29th, 2024 at 23:45, Venkatakrishnan Sowrirajan <
> > vsowr...@asu.edu> wrote:
> > 
> > > Hi Flink devs,
> > > 
> > > I would like to start a discussion on FLIP-XXX: Make Flink's Hadoop and
> > > YARN configuration probing consistent
> > 
> > https://urldefense.com/v3/__https://docs.google.com/document/d/1I2jBFI0eVkofAVCAEeajNQRfOqKGJsRfZd54h79AIYc/edit?usp=sharing__;!!IKRxdwAv5BmarQ!d0XJO_mzLCJZNkrjJDMyRGP95zPLW8Cuym88l7CoAUG8aD_KRYJbll3K-q1Ypplyqe6-jcsWq3S8YJqrDMCpK4IhpT4cZPXy$
> > .
> > 
> > > This stems from an earlier discussion thread here
> > 
> > https://urldefense.com/v3/__https://lists.apache.org/thread/l2fh5shbf59fjgbt1h73pmmsqj038ppv__;!!IKRxdwAv5BmarQ!d0XJO_mzLCJZNkrjJDMyRGP95zPLW8Cuym88l7CoAUG8aD_KRYJbll3K-q1Ypplyqe6-jcsWq3S8YJqrDMCpK4IhpW60A99X$
> > .
> > 
> > > This FLIP is proposing to make the configuration probing behavior between
> > > Hadoop and YARN configuration to be consistent.
> > > 
> > > Regards
> > > Venkata krishnan


Re: [DISCUSS] FLIP-438: Make Flink's Hadoop and YARN configuration probing consistent

2024-04-15 Thread Venkatakrishnan Sowrirajan
Sorry for the late reply, Ferenc.

I understand the rationale behind the current implementation as the problem
is slightly different b/w yarn (always prefixed with `yarn`) and hadoop (it
is not guaranteed all `hadoop` configs will be prefixed by `hadoop`)
configs.

>From the dev UX perspective, it is confusing and only if you really pay
close attention to the docs it is evident. I understand your point on added
complexity till Flink-3.0 but if we agree it should be made consistent, it
has to be done at some point of time right?

Regards
Venkata krishnan


On Wed, Apr 3, 2024 at 4:51 AM Ferenc Csaky 
wrote:

> Hi Venkata,
>
> Thank you for opening the discussion about this!
>
> After taking a look at the YARN and Hadoop configurations, the
> reason why it was implemented this way is that, in case of YARN,
> every YARN-specific property is prefixed with "yarn.", so to get
> the final, YARN-side property it is enough to remove the "flink."
> prefix.
>
> In case of Hadoop, there are properties that not prefixed with
> "hadoop.", e.g. "dfs.replication" so to identify and get the
> Hadoop-side property it is necessary to duplicate the "hadoop" part
> in the properties.
>
> Taking this into consideration I would personally say -0 to this
> change. IMO the current behavior can be justified as giving
> slightly different solutions to slightly different problems, which
> are well documented. Handling both prefixes would complicate the
> parsing logic until the APIs can be removed, which as it looks at
> the moment would only be possible in Flink 3.0, which probably will
> not happen in the foreseeable future, so I do not see the benefit
> of the added complexity.
>
> Regarding the FLIP, in the "YARN configuration override example"
> part, I think you should present an example that works correctly
> at the moment: "flink.yarn.application.classpath" ->
> "yarn.application.classpath".
>
> Best,
> Ferenc
>
>
> On Friday, March 29th, 2024 at 23:45, Venkatakrishnan Sowrirajan <
> vsowr...@asu.edu> wrote:
>
> >
> >
> > Hi Flink devs,
> >
> > I would like to start a discussion on FLIP-XXX: Make Flink's Hadoop and
> > YARN configuration probing consistent
> >
> https://urldefense.com/v3/__https://docs.google.com/document/d/1I2jBFI0eVkofAVCAEeajNQRfOqKGJsRfZd54h79AIYc/edit?usp=sharing__;!!IKRxdwAv5BmarQ!d0XJO_mzLCJZNkrjJDMyRGP95zPLW8Cuym88l7CoAUG8aD_KRYJbll3K-q1Ypplyqe6-jcsWq3S8YJqrDMCpK4IhpT4cZPXy$
> .
> >
> > This stems from an earlier discussion thread here
> >
> https://urldefense.com/v3/__https://lists.apache.org/thread/l2fh5shbf59fjgbt1h73pmmsqj038ppv__;!!IKRxdwAv5BmarQ!d0XJO_mzLCJZNkrjJDMyRGP95zPLW8Cuym88l7CoAUG8aD_KRYJbll3K-q1Ypplyqe6-jcsWq3S8YJqrDMCpK4IhpW60A99X$
> .
> >
> >
> > This FLIP is proposing to make the configuration probing behavior between
> > Hadoop and YARN configuration to be consistent.
> >
> > Regards
> > Venkata krishnan
>


Re: [DISCUSS] FLIP-438: Make Flink's Hadoop and YARN configuration probing consistent

2024-04-03 Thread Ferenc Csaky
Hi Venkata,

Thank you for opening the discussion about this!

After taking a look at the YARN and Hadoop configurations, the
reason why it was implemented this way is that, in case of YARN,
every YARN-specific property is prefixed with "yarn.", so to get
the final, YARN-side property it is enough to remove the "flink."
prefix.

In case of Hadoop, there are properties that not prefixed with
"hadoop.", e.g. "dfs.replication" so to identify and get the
Hadoop-side property it is necessary to duplicate the "hadoop" part
in the properties.

Taking this into consideration I would personally say -0 to this
change. IMO the current behavior can be justified as giving
slightly different solutions to slightly different problems, which
are well documented. Handling both prefixes would complicate the
parsing logic until the APIs can be removed, which as it looks at
the moment would only be possible in Flink 3.0, which probably will
not happen in the foreseeable future, so I do not see the benefit
of the added complexity.

Regarding the FLIP, in the "YARN configuration override example"
part, I think you should present an example that works correctly
at the moment: "flink.yarn.application.classpath" ->
"yarn.application.classpath".

Best,
Ferenc


On Friday, March 29th, 2024 at 23:45, Venkatakrishnan Sowrirajan 
 wrote:

> 
> 
> Hi Flink devs,
> 
> I would like to start a discussion on FLIP-XXX: Make Flink's Hadoop and
> YARN configuration probing consistent
> https://docs.google.com/document/d/1I2jBFI0eVkofAVCAEeajNQRfOqKGJsRfZd54h79AIYc/edit?usp=sharing.
> 
> This stems from an earlier discussion thread here
> https://lists.apache.org/thread/l2fh5shbf59fjgbt1h73pmmsqj038ppv.
> 
> 
> This FLIP is proposing to make the configuration probing behavior between
> Hadoop and YARN configuration to be consistent.
> 
> Regards
> Venkata krishnan