Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-25 Thread Xia Sun
Hi Venkat,

Thanks for joining the discussion.
Based on our understanding, there are still a significant number of
existing tasks using Hive. Indeed, many companies are now migrating their
data to the lakehouse, but due to historical reasons, a substantial amount
of data still resides in Hive.

Best,
Xia

On Thu, Apr 25, 2024 at 11:52, Venkatakrishnan Sowrirajan wrote:

> Hi Xia,
>
> +1 on introducing dynamic parallelism inference for HiveSource.
>
> Orthogonal to this discussion, curious, how commonly HiveSource is used
> these days in the industry given the popularity of table formats/sources
> like Iceberg, Hudi and Delta Lake?
>
> Thanks
> Venkat

Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-24 Thread Venkatakrishnan Sowrirajan
Hi Xia,

+1 on introducing dynamic parallelism inference for HiveSource.

Orthogonal to this discussion, I am curious how commonly HiveSource is used
these days in the industry, given the popularity of table formats/sources
like Iceberg, Hudi and Delta Lake.

Thanks
Venkat

On Wed, Apr 24, 2024, 7:41 PM Xia Sun  wrote:

> Hi everyone,
>
> Thanks for all the feedback!
>
> If there are no more comments, I would like to start the vote thread,
> thanks again!
>
> Best,
> Xia

Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-24 Thread Xia Sun
Hi everyone,

Thanks for all the feedback!

If there are no more comments, I would like to start the vote thread,
thanks again!

Best,
Xia

On Thu, Apr 18, 2024 at 21:31, Ahmed Hamdy wrote:

> Hi Xia,
> I have read through the FLIP and discussion and the new version of the FLIP
> looks better.
> +1 for the proposal.
> Best Regards
> Ahmed Hamdy

Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-18 Thread Ahmed Hamdy
Hi Xia,
I have read through the FLIP and discussion and the new version of the FLIP
looks better.
+1 for the proposal.
Best Regards
Ahmed Hamdy


On Thu, 18 Apr 2024 at 12:21, Ron Liu  wrote:

> Hi, Xia
>
> Thanks for updating, looks good to me.
>
> Best,
> Ron

Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-18 Thread Ron Liu
Hi, Xia

Thanks for updating, looks good to me.

Best,
Ron

On Thu, Apr 18, 2024 at 19:11, Xia Sun wrote:

> Hi Ron,
> Yes, presenting it in a table might be more intuitive. I have already added
> the table in the "Public Interfaces | New Config Option" chapter of FLIP.
> PTAL~

Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-18 Thread Xia Sun
Hi Ron,
Yes, presenting it in a table is more intuitive. I have added the table to
the "Public Interfaces | New Config Option" section of the FLIP.
PTAL.

On Thu, Apr 18, 2024 at 18:10, Ron Liu wrote:

> Hi, Xia
>
> Thanks for your reply.
>
> > That means, in terms
> > of priority, `table.exec.hive.infer-source-parallelism` >
> > `table.exec.hive.infer-source-parallelism.mode`.
>
> I am still somewhat confused. If
> `table.exec.hive.infer-source-parallelism` takes priority over
> `table.exec.hive.infer-source-parallelism.mode`, and the default value of
> `table.exec.hive.infer-source-parallelism` is currently true, does that
> mean static parallelism inference always takes effect? Or does this FLIP
> change the default behavior of
> `table.exec.hive.infer-source-parallelism` so that it indicates dynamic
> parallelism inference when enabled?
> I think you should list, in a table in the FLIP, the behaviors of these
> two options when they coexist; only then can users know how dynamic and
> static parallelism inference work.
>
> Best,
> Ron
>
> On Thu, Apr 18, 2024 at 16:33, Xia Sun wrote:
>
> > Hi Ron and Lijie,
> > Thanks for joining the discussion and sharing your suggestions.
> >
> > > the InferMode class should also be introduced in the Public Interfaces
> > > section!
> >
> >
> > Thanks for the reminder, I have now added the InferMode class to the
> > Public Interfaces section as well.
> >
> > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked
> > > through the code that the default value is 1000?
> >
> >
> > I have checked and the default value of
> > `table.exec.hive.infer-source-parallelism.max` is indeed 1000. This has
> > been corrected in the FLIP.
> >
> > > how are `table.exec.hive.infer-source-parallelism` and
> > > `table.exec.hive.infer-source-parallelism.mode` compatible?
> >
> >
> > This is indeed a critical point. The current plan is to deprecate
> > `table.exec.hive.infer-source-parallelism` but still utilize it as the
> > main switch for enabling automatic parallelism inference. That means, in
> > terms of priority, `table.exec.hive.infer-source-parallelism` >
> > `table.exec.hive.infer-source-parallelism.mode`. In future versions, if
> > `table.exec.hive.infer-source-parallelism` is removed, this logic will
> > also need to be revised, leaving only
> > `table.exec.hive.infer-source-parallelism.mode` as the basis for deciding
> > whether to enable parallelism inference. I have also added this
> > description to the FLIP.
> >
> >
> > > In FLIP-367 it is supported to be able to set the Source's parallelism
> > > individually, if in the future HiveSource also supports this feature,
> > > however, the default value of
> > > `table.exec.hive.infer-source-parallelism.mode` is `InferMode.DYNAMIC`,
> > > at this point will the parallelism be dynamically derived or will the
> > > manually set parallelism take effect, and who has the higher priority?
> >
> >
> > From my understanding, 'manually set parallelism' has the higher
> > priority,
> > just like one of the preconditions for the effectiveness of dynamic
> > parallelism inference in the AdaptiveBatchScheduler is that the vertex's
> > parallelism isn't set. I believe whether it's static inference or dynamic
> > inference, the manually set parallelism by the user should be respected.
> >
> > > The `InferMode.NONE` option.
> >
> > Currently, 'adding InferMode.NONE' seems to be the prevailing opinion. I
> > will add InferMode.NONE as one of the Enum options in InferMode class.
> >
> > Best,
> > Xia
> >
> > On Thu, Apr 18, 2024 at 13:50, Lijie Wang wrote:
> >
> > > Thanks for driving the discussion.
> > >
> > > +1 for the proposal and +1 for the `InferMode.NONE` option.
> > >
> > > Best,
> > > Lijie
> > >
> > > On Thu, Apr 18, 2024 at 11:36, Ron Liu wrote:
> > >
> > > > Hi, Xia
> > > >
> > > > Thanks for driving this FLIP.
> > > >
> > > > This proposal looks good to me overall. However, I have the following
> > > > minor questions:
> > > >
> > > > 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode`
> > > > as a new parameter, and the value is the enum class `InferMode`, I
> > > > think the InferMode class should also be introduced in the Public
> > > > Interfaces section!
> > > > 2. You mentioned in FLIP that the default value of
> > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked
> > > > through the code that the default value is 1000?
> > > > 3. I also agree with Muhammet's idea that there is no need to
> > > > introduce the option
> > > > `table.exec.hive.infer-source-parallelism.enabled`, and that
> > > > expanding the InferMode values will fulfill the need. There is
> > > > another issue to consider here though, how are
> > > > `table.exec.hive.infer-source-parallelism` and
> > > > `table.exec.hive.infer-source-parallelism.mode` compatible?
> > > > 4. In FLIP-367 it is supported to be able to set the Source's
> > > > parallelism individually, if in the future HiveSource also supports
> > > > this feature,
> > > > however, the default value 

Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-18 Thread Ron Liu
Hi, Xia

Thanks for your reply.

> That means, in terms
> of priority, `table.exec.hive.infer-source-parallelism` >
> `table.exec.hive.infer-source-parallelism.mode`.

I am still somewhat confused. If
`table.exec.hive.infer-source-parallelism` takes priority over
`table.exec.hive.infer-source-parallelism.mode`, and the default value of
`table.exec.hive.infer-source-parallelism` is currently true, does that
mean static parallelism inference always takes effect? Or does this FLIP
change the default behavior of
`table.exec.hive.infer-source-parallelism` so that it indicates dynamic
parallelism inference when enabled?
I think you should list, in a table in the FLIP, the behaviors of these
two options when they coexist; only then can users know how dynamic and
static parallelism inference work.
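
One possible reading of the priority rule quoted above can be sketched in
Java. This is only an illustration under assumptions, not the connector's
actual code: the class `InferModeResolution` and the method `effectiveMode`
are hypothetical names, and the `STATIC` value is assumed from the thread's
mention of static inference (only `DYNAMIC` and `NONE` are named explicitly).

```java
import java.util.Objects;

/** Hypothetical sketch of how the two options might combine; not Flink code. */
public class InferModeResolution {

    /** Modes discussed in the thread; STATIC is an assumed name. */
    enum InferMode { STATIC, DYNAMIC, NONE }

    /**
     * Resolves the effective inference mode.
     *
     * @param inferSourceParallelism value of the deprecated boolean switch
     *     table.exec.hive.infer-source-parallelism
     * @param mode value of table.exec.hive.infer-source-parallelism.mode
     */
    static InferMode effectiveMode(boolean inferSourceParallelism, InferMode mode) {
        Objects.requireNonNull(mode, "mode");
        // Assumption: the deprecated switch has priority, so when it is
        // false no inference happens regardless of the mode.
        if (!inferSourceParallelism) {
            return InferMode.NONE;
        }
        // Otherwise the mode option decides between static, dynamic and none.
        return mode;
    }

    public static void main(String[] args) {
        // Print every combination of the two options, one per line.
        for (boolean enabled : new boolean[] {true, false}) {
            for (InferMode mode : InferMode.values()) {
                System.out.printf(
                        "infer-source-parallelism=%b, mode=%s -> %s%n",
                        enabled, mode, effectiveMode(enabled, mode));
            }
        }
    }
}
```

Under this reading, setting the deprecated switch to false disables
inference whatever the mode is, and setting it to true leaves the decision
to the mode option.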

Best,
Ron

On Thu, Apr 18, 2024 at 16:33, Xia Sun wrote:

> Hi Ron and Lijie,
> Thanks for joining the discussion and sharing your suggestions.
>
> > the InferMode class should also be introduced in the Public Interfaces
> > section!
>
>
> Thanks for the reminder, I have now added the InferMode class to the Public
> Interfaces section as well.
>
> > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through
> > the code that the default value is 1000?
>
>
> I have checked and the default value of
> `table.exec.hive.infer-source-parallelism.max` is indeed 1000. This has
> been corrected in the FLIP.
>
> > how are `table.exec.hive.infer-source-parallelism` and
> > `table.exec.hive.infer-source-parallelism.mode` compatible?
>
>
> This is indeed a critical point. The current plan is to deprecate
> `table.exec.hive.infer-source-parallelism` but still utilize it as the main
> switch for enabling automatic parallelism inference. That means, in terms
> of priority, `table.exec.hive.infer-source-parallelism` >
> `table.exec.hive.infer-source-parallelism.mode`. In future versions, if
> `table.exec.hive.infer-source-parallelism` is removed, this logic will also
> need to be revised, leaving only
> `table.exec.hive.infer-source-parallelism.mode` as the basis for deciding
> whether to enable parallelism inference. I have also added this description
> to the FLIP.
>
>
> > In FLIP-367 it is supported to be able to set the Source's parallelism
> > individually, if in the future HiveSource also supports this feature,
> > however, the default value of
> > `table.exec.hive.infer-source-parallelism.mode` is `InferMode.DYNAMIC`,
> at
> > this point will the parallelism be dynamically derived or will the
> manually
> > set parallelism take effect, and who has the higher priority?
>
>
> From my understanding, 'manually set parallelism' has the higher priority,
> just like one of the preconditions for the effectiveness of dynamic
> parallelism inference in the AdaptiveBatchScheduler is that the vertex's
> parallelism isn't set. I believe whether it's static inference or dynamic
> inference, the manually set parallelism by the user should be respected.
>
> > The `InferMode.NONE` option.
>
> Currently, 'adding InferMode.NONE' seems to be the prevailing opinion. I
> will add InferMode.NONE as one of the Enum options in InferMode class.
>
> Best,
> Xia
>
> Lijie Wang wrote on Thu, Apr 18, 2024 at 13:50:
>
> > Thanks for driving the discussion.
> >
> > +1 for the proposal and +1 for the `InferMode.NONE` option.
> >
> > Best,
> > Lijie
> >
> > Ron liu wrote on Thu, Apr 18, 2024 at 11:36:
> >
> > > Hi, Xia
> > >
> > > Thanks for driving this FLIP.
> > >
> > > This proposal looks good to me overall. However, I have the following
> > minor
> > > questions:
> > >
> > > 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode` as a
> > new
> > > parameter, and the value is the enum class `InferMode`, I think the
> > > InferMode class should also be introduced in the Public Interfaces
> > section!
> > > 2. You mentioned in FLIP that the default value of
> > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked
> through
> > > the code that the default value is 1000?
> > > 3. I also agree with Muhammet's idea that there is no need to introduce
> > the
> > > option `table.exec.hive.infer-source-parallelism.enabled`, and that
> > > expanding the InferMode values will fulfill the need. There is another
> > > issue to consider here though, how are
> > > `table.exec.hive.infer-source-parallelism` and
> > > `table.exec.hive.infer-source-parallelism.mode` compatible?
> > > 4. In FLIP-367 it is supported to be able to set the Source's
> parallelism
> > > individually, if in the future HiveSource also supports this feature,
> > > however, the default value of
> > > `table.exec.hive.infer-source-parallelism.mode` is `InferMode.
> DYNAMIC`,
> > at
> > > this point will the parallelism be dynamically derived or will the
> > manually
> > > set parallelism take effect, and who has the higher priority?
> > >
> > > Best,
> > > Ron
> > >
> > > Xia Sun wrote on Wed, Apr 17, 2024 at 12:08:
> > >
> > > > Hi Jeyhun, Muhammet,
> > > > Thanks for all the feedback!
> > > >
> > > > > Could you please mention the default values for 

Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-18 Thread Xia Sun
Hi Ron and Lijie,
Thanks for joining the discussion and sharing your suggestions.

> the InferMode class should also be introduced in the Public Interfaces
> section!


Thanks for the reminder, I have now added the InferMode class to the Public
Interfaces section as well.

> `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through
> the code that the default value is 1000?


I have checked and the default value of
`table.exec.hive.infer-source-parallelism.max` is indeed 1000. This has
been corrected in the FLIP.
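
For illustration only, here is a minimal Java sketch of how such an upper
bound could be applied. It assumes the option acts as a simple cap on the
inferred value; `MaxParallelismSketch`, `capParallelism` and `DEFAULT_MAX`
are hypothetical names, not the actual HiveSource code.

```java
// Hypothetical sketch: not the actual HiveSource implementation.
final class MaxParallelismSketch {
    // Default of `table.exec.hive.infer-source-parallelism.max`, as confirmed above.
    static final int DEFAULT_MAX = 1000;

    // Assumption: the option is an upper bound on the inferred parallelism,
    // and a source always runs with at least one subtask.
    static int capParallelism(int inferred, int max) {
        return Math.max(1, Math.min(inferred, max));
    }

    public static void main(String[] args) {
        // An inferred value above the cap is reduced to the cap.
        System.out.println(capParallelism(1024, DEFAULT_MAX));
        // An inferred value below the cap is kept unchanged.
        System.out.println(capParallelism(37, DEFAULT_MAX));
    }
}
```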

> how are `table.exec.hive.infer-source-parallelism` and
> `table.exec.hive.infer-source-parallelism.mode` compatible?


This is indeed a critical point. The current plan is to deprecate
`table.exec.hive.infer-source-parallelism` but still utilize it as the main
switch for enabling automatic parallelism inference. That means, in terms
of priority, `table.exec.hive.infer-source-parallelism` >
`table.exec.hive.infer-source-parallelism.mode`. In future versions, if
`table.exec.hive.infer-source-parallelism` is removed, this logic will also
need to be revised, leaving only
`table.exec.hive.infer-source-parallelism.mode` as the basis for deciding
whether to enable parallelism inference. I have also added this description
to the FLIP.
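
To make the priority rule concrete, here is a rough Java sketch. It assumes
that the deprecated boolean switch simply overrides the mode when it is
false. `DYNAMIC` and `NONE` are the values named in this thread; `STATIC`,
the class name and the method name are assumptions for illustration, not the
actual Flink code.

```java
// Hypothetical sketch of the priority rule; not the actual Flink code.
final class InferModePrioritySketch {
    // DYNAMIC and NONE are named in this thread; STATIC is an assumed name.
    enum InferMode { STATIC, DYNAMIC, NONE }

    // `inferSourceParallelism` stands for `table.exec.hive.infer-source-parallelism`,
    // `mode` for `table.exec.hive.infer-source-parallelism.mode`.
    static InferMode effectiveMode(boolean inferSourceParallelism, InferMode mode) {
        if (!inferSourceParallelism) {
            // The deprecated main switch wins: inference is off regardless of the mode.
            return InferMode.NONE;
        }
        // The switch is on, so the mode decides between static, dynamic or none.
        return mode;
    }

    public static void main(String[] args) {
        // Print the effective mode for every combination of the two options.
        for (boolean enabled : new boolean[] {true, false}) {
            for (InferMode mode : InferMode.values()) {
                System.out.println(enabled + " + " + mode + " -> " + effectiveMode(enabled, mode));
            }
        }
    }
}
```

Once `table.exec.hive.infer-source-parallelism` is removed, the first branch
would disappear and only the mode would remain.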


> In FLIP-367 it is supported to be able to set the Source's parallelism
> individually, if in the future HiveSource also supports this feature,
> however, the default value of
> `table.exec.hive.infer-source-parallelism.mode` is `InferMode.DYNAMIC`, at
> this point will the parallelism be dynamically derived or will the manually
> set parallelism take effect, and who has the higher priority?


From my understanding, manually set parallelism has the higher priority. This
matches the AdaptiveBatchScheduler, where one precondition for dynamic
parallelism inference to take effect is that the vertex's parallelism isn't
set. I believe that, whether inference is static or dynamic, the parallelism
manually set by the user should be respected.
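
As a sketch of that precedence (under the assumption that HiveSource gains a
per-source parallelism setting in the future), the decision could look
roughly like this in Java. All names here, including `STATIC` and the
`Decision` enum, are hypothetical and only illustrate the ordering.

```java
import java.util.OptionalInt;

// Hypothetical sketch: a user-set parallelism is checked before any inference mode.
final class ManualParallelismSketch {
    // DYNAMIC and NONE are named in this thread; STATIC is an assumed name.
    enum InferMode { STATIC, DYNAMIC, NONE }

    enum Decision { USER_DEFINED, STATIC_INFERENCE, DYNAMIC_INFERENCE, NO_INFERENCE }

    static Decision decide(OptionalInt userParallelism, InferMode mode) {
        if (userParallelism.isPresent()) {
            // Manually set parallelism is respected, whatever the inference mode is.
            return Decision.USER_DEFINED;
        }
        switch (mode) {
            case STATIC:
                return Decision.STATIC_INFERENCE;
            case DYNAMIC:
                return Decision.DYNAMIC_INFERENCE;
            default:
                return Decision.NO_INFERENCE;
        }
    }

    public static void main(String[] args) {
        // Parallelism set by the user: inference is skipped.
        System.out.println(decide(OptionalInt.of(8), InferMode.DYNAMIC));
        // Nothing set by the user: the mode decides.
        System.out.println(decide(OptionalInt.empty(), InferMode.DYNAMIC));
    }
}
```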

> The `InferMode.NONE` option.

Currently, 'adding InferMode.NONE' seems to be the prevailing opinion. I
will add InferMode.NONE as one of the Enum options in InferMode class.

Best,
Xia

Lijie Wang wrote on Thu, Apr 18, 2024 at 13:50:

> Thanks for driving the discussion.
>
> +1 for the proposal and +1 for the `InferMode.NONE` option.
>
> Best,
> Lijie
>
> Ron liu wrote on Thu, Apr 18, 2024 at 11:36:
>
> > Hi, Xia
> >
> > Thanks for driving this FLIP.
> >
> > This proposal looks good to me overall. However, I have the following
> minor
> > questions:
> >
> > 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode` as a
> new
> > parameter, and the value is the enum class `InferMode`, I think the
> > InferMode class should also be introduced in the Public Interfaces
> section!
> > 2. You mentioned in FLIP that the default value of
> > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through
> > the code that the default value is 1000?
> > 3. I also agree with Muhammet's idea that there is no need to introduce
> the
> > option `table.exec.hive.infer-source-parallelism.enabled`, and that
> > expanding the InferMode values will fulfill the need. There is another
> > issue to consider here though, how are
> > `table.exec.hive.infer-source-parallelism` and
> > `table.exec.hive.infer-source-parallelism.mode` compatible?
> > 4. In FLIP-367 it is supported to be able to set the Source's parallelism
> > individually, if in the future HiveSource also supports this feature,
> > however, the default value of
> > `table.exec.hive.infer-source-parallelism.mode` is `InferMode.DYNAMIC`,
> at
> > this point will the parallelism be dynamically derived or will the
> manually
> > set parallelism take effect, and who has the higher priority?
> >
> > Best,
> > Ron
> >
> > Xia Sun wrote on Wed, Apr 17, 2024 at 12:08:
> >
> > > Hi Jeyhun, Muhammet,
> > > Thanks for all the feedback!
> > >
> > > > Could you please mention the default values for the new
> configurations
> > > > (e.g., table.exec.hive.infer-source-parallelism.mode,
> > > > table.exec.hive.infer-source-parallelism.enabled,
> > > > etc) ?
> > >
> > >
> > > Thanks for your suggestion. I have supplemented the explanation
> regarding
> > > the default values.
> > >
> > > > Since we are introducing the mode as a configuration option,
> > > > could it make sense to have `InferMode.NONE` option also?
> > > > The `NONE` option would disable the inference.
> > >
> > >
> > > This is a good idea. Looking ahead, it could eliminate the need for
> > > introducing
> > > a new configuration option. I haven't identified any potential
> > > compatibility issues
> > > as yet. If there are no further ideas from others, I'll go ahead and
> > update
> > > the FLIP to
> > > introducing InferMode.NONE.
> > >
> > > Best,
> > > Xia
> > >
> > > Muhammet Orazov wrote on Wed, Apr 17, 2024 at 10:31:
> > >
> > > > Hello Xia,
> > > >
> > > > Thanks for the FLIP!
> > > >
> > > > Since we are introducing 

Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-17 Thread Lijie Wang
Thanks for driving the discussion.

+1 for the proposal and +1 for the `InferMode.NONE` option.

Best,
Lijie

Ron liu wrote on Thu, Apr 18, 2024 at 11:36:

> Hi, Xia
>
> Thanks for driving this FLIP.
>
> This proposal looks good to me overall. However, I have the following minor
> questions:
>
> 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode` as a new
> parameter, and the value is the enum class `InferMode`, I think the
> InferMode class should also be introduced in the Public Interfaces section!
> 2. You mentioned in FLIP that the default value of
> `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through
> the code that the default value is 1000?
> 3. I also agree with Muhammet's idea that there is no need to introduce the
> option `table.exec.hive.infer-source-parallelism.enabled`, and that
> expanding the InferMode values will fulfill the need. There is another
> issue to consider here though, how are
> `table.exec.hive.infer-source-parallelism` and
> `table.exec.hive.infer-source-parallelism.mode` compatible?
> 4. In FLIP-367 it is supported to be able to set the Source's parallelism
> individually, if in the future HiveSource also supports this feature,
> however, the default value of
> `table.exec.hive.infer-source-parallelism.mode` is `InferMode.DYNAMIC`, at
> this point will the parallelism be dynamically derived or will the manually
> set parallelism take effect, and who has the higher priority?
>
> Best,
> Ron
>
> Xia Sun wrote on Wed, Apr 17, 2024 at 12:08:
>
> > Hi Jeyhun, Muhammet,
> > Thanks for all the feedback!
> >
> > > Could you please mention the default values for the new configurations
> > > (e.g., table.exec.hive.infer-source-parallelism.mode,
> > > table.exec.hive.infer-source-parallelism.enabled,
> > > etc) ?
> >
> >
> > Thanks for your suggestion. I have supplemented the explanation regarding
> > the default values.
> >
> > > Since we are introducing the mode as a configuration option,
> > > could it make sense to have `InferMode.NONE` option also?
> > > The `NONE` option would disable the inference.
> >
> >
> > This is a good idea. Looking ahead, it could eliminate the need for
> > introducing
> > a new configuration option. I haven't identified any potential
> > compatibility issues
> > as yet. If there are no further ideas from others, I'll go ahead and
> update
> > the FLIP to
> > introducing InferMode.NONE.
> >
> > Best,
> > Xia
> >
> > Muhammet Orazov wrote on Wed, Apr 17, 2024 at 10:31:
> >
> > > Hello Xia,
> > >
> > > Thanks for the FLIP!
> > >
> > > Since we are introducing the mode as a configuration option,
> > > could it make sense to have `InferMode.NONE` option also?
> > > The `NONE` option would disable the inference.
> > >
> > > This way we deprecate the `table.exec.hive.infer-source-parallelism`
> > > and no additional `table.exec.hive.infer-source-parallelism.enabled`
> > > option is required.
> > >
> > > What do you think?
> > >
> > > Best,
> > > Muhammet
> > >
> > > On 2024-04-16 07:07, Xia Sun wrote:
> > > > Hi everyone,
> > > > I would like to start a discussion on FLIP-445: Support dynamic
> > > > parallelism
> > > > inference for HiveSource[1].
> > > >
> > > > FLIP-379[2] has introduced dynamic source parallelism inference for
> > > > batch
> > > > jobs, which can utilize runtime information to more accurately decide
> > > > the
> > > > source parallelism. As a follow-up task, we plan to implement the
> > > > dynamic
> > > > parallelism inference interface for HiveSource, and also switch the
> > > > default
> > > > static parallelism inference to dynamic parallelism inference.
> > > >
> > > > Looking forward to your feedback and suggestions, thanks.
> > > >
> > > > [1]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource
> > > > [2]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs
> > > >
> > > > Best regards,
> > > > Xia
> > >
> >
>


Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-17 Thread Ron liu
Hi, Xia

Thanks for driving this FLIP.

This proposal looks good to me overall. However, I have the following minor
questions:

1. The FLIP introduces `table.exec.hive.infer-source-parallelism.mode` as a
new option whose value is the enum class `InferMode`. I think the InferMode
class should also be introduced in the Public Interfaces section!
2. The FLIP says the default value of
`table.exec.hive.infer-source-parallelism.max` is 1024, but when I checked
the code the default value is 1000?
3. I also agree with Muhammet's idea that there is no need to introduce the
option `table.exec.hive.infer-source-parallelism.enabled`, and that
expanding the InferMode values will fulfill the need. There is another
issue to consider here though: how are
`table.exec.hive.infer-source-parallelism` and
`table.exec.hive.infer-source-parallelism.mode` compatible?
4. FLIP-367 supports setting the Source's parallelism individually. If
HiveSource also supports this feature in the future, while the default value
of `table.exec.hive.infer-source-parallelism.mode` is `InferMode.DYNAMIC`,
will the parallelism be dynamically derived or will the manually set
parallelism take effect? Which has the higher priority?

Best,
Ron

Xia Sun wrote on Wed, Apr 17, 2024 at 12:08:

> Hi Jeyhun, Muhammet,
> Thanks for all the feedback!
>
> > Could you please mention the default values for the new configurations
> > (e.g., table.exec.hive.infer-source-parallelism.mode,
> > table.exec.hive.infer-source-parallelism.enabled,
> > etc) ?
>
>
> Thanks for your suggestion. I have supplemented the explanation regarding
> the default values.
>
> > Since we are introducing the mode as a configuration option,
> > could it make sense to have `InferMode.NONE` option also?
> > The `NONE` option would disable the inference.
>
>
> This is a good idea. Looking ahead, it could eliminate the need for
> introducing
> a new configuration option. I haven't identified any potential
> compatibility issues
> as yet. If there are no further ideas from others, I'll go ahead and update
> the FLIP to
> introducing InferMode.NONE.
>
> Best,
> Xia
>
> Muhammet Orazov wrote on Wed, Apr 17, 2024 at 10:31:
>
> > Hello Xia,
> >
> > Thanks for the FLIP!
> >
> > Since we are introducing the mode as a configuration option,
> > could it make sense to have `InferMode.NONE` option also?
> > The `NONE` option would disable the inference.
> >
> > This way we deprecate the `table.exec.hive.infer-source-parallelism`
> > and no additional `table.exec.hive.infer-source-parallelism.enabled`
> > option is required.
> >
> > What do you think?
> >
> > Best,
> > Muhammet
> >
> > On 2024-04-16 07:07, Xia Sun wrote:
> > > Hi everyone,
> > > I would like to start a discussion on FLIP-445: Support dynamic
> > > parallelism
> > > inference for HiveSource[1].
> > >
> > > FLIP-379[2] has introduced dynamic source parallelism inference for
> > > batch
> > > jobs, which can utilize runtime information to more accurately decide
> > > the
> > > source parallelism. As a follow-up task, we plan to implement the
> > > dynamic
> > > parallelism inference interface for HiveSource, and also switch the
> > > default
> > > static parallelism inference to dynamic parallelism inference.
> > >
> > > Looking forward to your feedback and suggestions, thanks.
> > >
> > > [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource
> > > [2]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs
> > >
> > > Best regards,
> > > Xia
> >
>


Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-16 Thread Xia Sun
Hi Jeyhun, Muhammet,
Thanks for all the feedback!

> Could you please mention the default values for the new configurations
> (e.g., table.exec.hive.infer-source-parallelism.mode,
> table.exec.hive.infer-source-parallelism.enabled,
> etc) ?


Thanks for your suggestion. I have supplemented the explanation regarding
the default values.

> Since we are introducing the mode as a configuration option,
> could it make sense to have `InferMode.NONE` option also?
> The `NONE` option would disable the inference.


This is a good idea. Looking ahead, it could eliminate the need to introduce
a new configuration option. I haven't identified any potential compatibility
issues yet. If there are no further ideas from others, I'll go ahead and
update the FLIP to introduce InferMode.NONE.

Best,
Xia

Muhammet Orazov wrote on Wed, Apr 17, 2024 at 10:31:

> Hello Xia,
>
> Thanks for the FLIP!
>
> Since we are introducing the mode as a configuration option,
> could it make sense to have `InferMode.NONE` option also?
> The `NONE` option would disable the inference.
>
> This way we deprecate the `table.exec.hive.infer-source-parallelism`
> and no additional `table.exec.hive.infer-source-parallelism.enabled`
> option is required.
>
> What do you think?
>
> Best,
> Muhammet
>
> On 2024-04-16 07:07, Xia Sun wrote:
> > Hi everyone,
> > I would like to start a discussion on FLIP-445: Support dynamic
> > parallelism
> > inference for HiveSource[1].
> >
> > FLIP-379[2] has introduced dynamic source parallelism inference for
> > batch
> > jobs, which can utilize runtime information to more accurately decide
> > the
> > source parallelism. As a follow-up task, we plan to implement the
> > dynamic
> > parallelism inference interface for HiveSource, and also switch the
> > default
> > static parallelism inference to dynamic parallelism inference.
> >
> > Looking forward to your feedback and suggestions, thanks.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource
> > [2]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs
> >
> > Best regards,
> > Xia
>


Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-16 Thread Muhammet Orazov

Hello Xia,

Thanks for the FLIP!

Since we are introducing the mode as a configuration option,
could it make sense to have `InferMode.NONE` option also?
The `NONE` option would disable the inference.

This way we deprecate the `table.exec.hive.infer-source-parallelism`
and no additional `table.exec.hive.infer-source-parallelism.enabled`
option is required.

What do you think?

Best,
Muhammet

On 2024-04-16 07:07, Xia Sun wrote:

> Hi everyone,
> I would like to start a discussion on FLIP-445: Support dynamic parallelism
> inference for HiveSource[1].
>
> FLIP-379[2] has introduced dynamic source parallelism inference for batch
> jobs, which can utilize runtime information to more accurately decide the
> source parallelism. As a follow-up task, we plan to implement the dynamic
> parallelism inference interface for HiveSource, and also switch the default
> static parallelism inference to dynamic parallelism inference.
>
> Looking forward to your feedback and suggestions, thanks.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs
>
> Best regards,
> Xia


Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-16 Thread Jeyhun Karimov
Hi Xia,

Thanks for driving this FLIP. +1 from my side.

I have one comment.
Could you please mention the default values for the new configurations
(e.g., table.exec.hive.infer-source-parallelism.mode,
table.exec.hive.infer-source-parallelism.enabled,
etc.)?

Regards,
Jeyhun

On Tue, Apr 16, 2024 at 9:46 AM Zhu Zhu  wrote:

> Thanks for creating this FLIP. @Xia
>
> +1 for this proposal. Dynamic parallelism inference can be helpful
> to decide a better parallelism. And it's good to unify the settings
> of static & dynamic parallelism inference.
>
> Thanks,
> Zhu
>
>
> Xia Sun wrote on Tue, Apr 16, 2024 at 15:12:
>
> > Hi everyone,
> > I would like to start a discussion on FLIP-445: Support dynamic
> parallelism
> > inference for HiveSource[1].
> >
> > FLIP-379[2] has introduced dynamic source parallelism inference for batch
> > jobs, which can utilize runtime information to more accurately decide the
> > source parallelism. As a follow-up task, we plan to implement the dynamic
> > parallelism inference interface for HiveSource, and also switch the
> default
> > static parallelism inference to dynamic parallelism inference.
> >
> > Looking forward to your feedback and suggestions, thanks.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs
> >
> > Best regards,
> > Xia
> >
>


Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource

2024-04-16 Thread Zhu Zhu
Thanks for creating this FLIP, @Xia.

+1 for this proposal. Dynamic parallelism inference can help decide on a
better parallelism, and it's good to unify the settings of static and
dynamic parallelism inference.

Thanks,
Zhu


Xia Sun wrote on Tue, Apr 16, 2024 at 15:12:

> Hi everyone,
> I would like to start a discussion on FLIP-445: Support dynamic parallelism
> inference for HiveSource[1].
>
> FLIP-379[2] has introduced dynamic source parallelism inference for batch
> jobs, which can utilize runtime information to more accurately decide the
> source parallelism. As a follow-up task, we plan to implement the dynamic
> parallelism inference interface for HiveSource, and also switch the default
> static parallelism inference to dynamic parallelism inference.
>
> Looking forward to your feedback and suggestions, thanks.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs
>
> Best regards,
> Xia
>