Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Hi Venkat, Thanks for joining the discussion. Based on our understanding, there are still a significant number of existing tasks using Hive. Indeed, many companies are now migrating their data to the lakehouse, but due to historical reasons, a substantial amount of data still resides in Hive. Best, Xia Venkatakrishnan Sowrirajan 于2024年4月25日周四 11:52写道: > Hi Xia, > > +1 on introducing dynamic parallelism inference for HiveSource. > > Orthogonal to this discussion, curious, how commonly HiveSource is used > these days in the industry given the popularity of table formats/sources > like Iceberg, Hudi and Delta lake? > > Thanks > Venkat > > On Wed, Apr 24, 2024, 7:41 PM Xia Sun wrote: > > > Hi everyone, > > > > Thanks for all the feedback! > > > > If there are no more comments, I would like to start the vote thread, > > thanks again! > > > > Best, > > Xia > > > > Ahmed Hamdy 于2024年4月18日周四 21:31写道: > > > > > Hi Xia, > > > I have read through the FLIP and discussion and the new version of the > > FLIP > > > looks better. > > > +1 for the proposal. > > > Best Regards > > > Ahmed Hamdy > > > > > > > > > On Thu, 18 Apr 2024 at 12:21, Ron Liu wrote: > > > > > > > Hi, Xia > > > > > > > > Thanks for updating, looks good to me. > > > > > > > > Best, > > > > Ron > > > > > > > > Xia Sun 于2024年4月18日周四 19:11写道: > > > > > > > > > Hi Ron, > > > > > Yes, presenting it in a table might be more intuitive. I have > already > > > > added > > > > > the table in the "Public Interfaces | New Config Option" chapter of > > > FLIP. > > > > > PTAL~ > > > > > > > > > > Ron Liu 于2024年4月18日周四 18:10写道: > > > > > > > > > > > Hi, Xia > > > > > > > > > > > > Thanks for your reply. > > > > > > > > > > > > > That means, in terms > > > > > > of priority, `table.exec.hive.infer-source-parallelism` > > > > > > > `table.exec.hive.infer-source-parallelism.mode`. > > > > > > > > > > > > I still have some confusion, if the > > > > > > `table.exec.hive.infer-source-parallelism` > > > > > > >`table.exec.hive.infer-source-parallelism.mode`, currently > > > > > > `table.exec.hive.infer-source-parallelism` default value is true, > > > that > > > > > > means always static parallelism inference work? Or perhaps after > > this > > > > > FLIP, > > > > > > we changed the default behavior of > > > > > > `table.exec.hive.infer-source-parallelism` to indicate dynamic > > > > > parallelism > > > > > > inference when enabled. > > > > > > I think you should list the various behaviors of these two > options > > > that > > > > > > coexist in FLIP by a table, only then users can know how the > > dynamic > > > > and > > > > > > static parallelism inference work. > > > > > > > > > > > > Best, > > > > > > Ron > > > > > > > > > > > > Xia Sun 于2024年4月18日周四 16:33写道: > > > > > > > > > > > > > Hi Ron and Lijie, > > > > > > > Thanks for joining the discussion and sharing your suggestions. > > > > > > > > > > > > > > > the InferMode class should also be introduced in the Public > > > > > Interfaces > > > > > > > > section! > > > > > > > > > > > > > > > > > > > > > Thanks for the reminder, I have now added the InferMode class > to > > > the > > > > > > Public > > > > > > > Interfaces section as well. > > > > > > > > > > > > > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I > > checked > > > > > > through > > > > > > > > the code that the default value is 1000? > > > > > > > > > > > > > > > > > > > > > I have checked and the default value of > > > > > > > `table.exec.hive.infer-source-parallelism.max` is indeed 1000. > > This > > > > has > > > > > > > been corrected in the FLIP. > > > > > > > > > > > > > > > how are`table.exec.hive.infer-source-parallelism` and > > > > > > > > `table.exec.hive.infer-source-parallelism.mode` compatible? > > > > > > > > > > > > > > > > > > > > > This is indeed a critical point. The current plan is to > deprecate > > > > > > > `table.exec.hive.infer-source-parallelism` but still utilize it > > as > > > > the > > > > > > main > > > > > > > switch for enabling automatic parallelism inference. That > means, > > in > > > > > terms > > > > > > > of priority, `table.exec.hive.infer-source-parallelism` > > > > > > > > `table.exec.hive.infer-source-parallelism.mode`. In future > > > versions, > > > > if > > > > > > > `table.exec.hive.infer-source-parallelism` is removed, this > logic > > > > will > > > > > > also > > > > > > > need to be revised, leaving only > > > > > > > `table.exec.hive.infer-source-parallelism.mode` as the basis > for > > > > > deciding > > > > > > > whether to enable parallelism inference. I have also added this > > > > > > description > > > > > > > to the FLIP. > > > > > > > > > > > > > > > > > > > > > > In FLIP-367 it is supported to be able to set the Source's > > > > > parallelism > > > > > > > > individually, if in the future HiveSource also supports this > > > > feature, > > > > > > > > however, the default value of > > > > > > > > `table.exec.hive.infer-source-parallelism.mode` is > > > > >
Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Hi Xia, +1 on introducing dynamic parallelism inference for HiveSource. Orthogonal to this discussion, curious, how commonly HiveSource is used these days in the industry given the popularity of table formats/sources like Iceberg, Hudi and Delta lake? Thanks Venkat On Wed, Apr 24, 2024, 7:41 PM Xia Sun wrote: > Hi everyone, > > Thanks for all the feedback! > > If there are no more comments, I would like to start the vote thread, > thanks again! > > Best, > Xia > > Ahmed Hamdy 于2024年4月18日周四 21:31写道: > > > Hi Xia, > > I have read through the FLIP and discussion and the new version of the > FLIP > > looks better. > > +1 for the proposal. > > Best Regards > > Ahmed Hamdy > > > > > > On Thu, 18 Apr 2024 at 12:21, Ron Liu wrote: > > > > > Hi, Xia > > > > > > Thanks for updating, looks good to me. > > > > > > Best, > > > Ron > > > > > > Xia Sun 于2024年4月18日周四 19:11写道: > > > > > > > Hi Ron, > > > > Yes, presenting it in a table might be more intuitive. I have already > > > added > > > > the table in the "Public Interfaces | New Config Option" chapter of > > FLIP. > > > > PTAL~ > > > > > > > > Ron Liu 于2024年4月18日周四 18:10写道: > > > > > > > > > Hi, Xia > > > > > > > > > > Thanks for your reply. > > > > > > > > > > > That means, in terms > > > > > of priority, `table.exec.hive.infer-source-parallelism` > > > > > > `table.exec.hive.infer-source-parallelism.mode`. > > > > > > > > > > I still have some confusion, if the > > > > > `table.exec.hive.infer-source-parallelism` > > > > > >`table.exec.hive.infer-source-parallelism.mode`, currently > > > > > `table.exec.hive.infer-source-parallelism` default value is true, > > that > > > > > means always static parallelism inference work? Or perhaps after > this > > > > FLIP, > > > > > we changed the default behavior of > > > > > `table.exec.hive.infer-source-parallelism` to indicate dynamic > > > > parallelism > > > > > inference when enabled. > > > > > I think you should list the various behaviors of these two options > > that > > > > > coexist in FLIP by a table, only then users can know how the > dynamic > > > and > > > > > static parallelism inference work. > > > > > > > > > > Best, > > > > > Ron > > > > > > > > > > Xia Sun 于2024年4月18日周四 16:33写道: > > > > > > > > > > > Hi Ron and Lijie, > > > > > > Thanks for joining the discussion and sharing your suggestions. > > > > > > > > > > > > > the InferMode class should also be introduced in the Public > > > > Interfaces > > > > > > > section! > > > > > > > > > > > > > > > > > > Thanks for the reminder, I have now added the InferMode class to > > the > > > > > Public > > > > > > Interfaces section as well. > > > > > > > > > > > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I > checked > > > > > through > > > > > > > the code that the default value is 1000? > > > > > > > > > > > > > > > > > > I have checked and the default value of > > > > > > `table.exec.hive.infer-source-parallelism.max` is indeed 1000. > This > > > has > > > > > > been corrected in the FLIP. > > > > > > > > > > > > > how are`table.exec.hive.infer-source-parallelism` and > > > > > > > `table.exec.hive.infer-source-parallelism.mode` compatible? > > > > > > > > > > > > > > > > > > This is indeed a critical point. The current plan is to deprecate > > > > > > `table.exec.hive.infer-source-parallelism` but still utilize it > as > > > the > > > > > main > > > > > > switch for enabling automatic parallelism inference. That means, > in > > > > terms > > > > > > of priority, `table.exec.hive.infer-source-parallelism` > > > > > > > `table.exec.hive.infer-source-parallelism.mode`. In future > > versions, > > > if > > > > > > `table.exec.hive.infer-source-parallelism` is removed, this logic > > > will > > > > > also > > > > > > need to be revised, leaving only > > > > > > `table.exec.hive.infer-source-parallelism.mode` as the basis for > > > > deciding > > > > > > whether to enable parallelism inference. I have also added this > > > > > description > > > > > > to the FLIP. > > > > > > > > > > > > > > > > > > > In FLIP-367 it is supported to be able to set the Source's > > > > parallelism > > > > > > > individually, if in the future HiveSource also supports this > > > feature, > > > > > > > however, the default value of > > > > > > > `table.exec.hive.infer-source-parallelism.mode` is > > > > `InferMode.DYNAMIC`, > > > > > > at > > > > > > > this point will the parallelism be dynamically derived or will > > the > > > > > > manually > > > > > > > set parallelism take effect, and who has the higher priority? > > > > > > > > > > > > > > > > > > From my understanding, 'manually set parallelism' has the higher > > > > > priority, > > > > > > just like one of the preconditions for the effectiveness of > dynamic > > > > > > parallelism inference in the AdaptiveBatchScheduler is that the > > > > vertex's > > > > > > parallelism isn't set. I believe whether it's static inference or > > > > dynamic > > > > > > inference, the manually set parallelism by the user
Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Hi everyone, Thanks for all the feedback! If there are no more comments, I would like to start the vote thread, thanks again! Best, Xia Ahmed Hamdy 于2024年4月18日周四 21:31写道: > Hi Xia, > I have read through the FLIP and discussion and the new version of the FLIP > looks better. > +1 for the proposal. > Best Regards > Ahmed Hamdy > > > On Thu, 18 Apr 2024 at 12:21, Ron Liu wrote: > > > Hi, Xia > > > > Thanks for updating, looks good to me. > > > > Best, > > Ron > > > > Xia Sun 于2024年4月18日周四 19:11写道: > > > > > Hi Ron, > > > Yes, presenting it in a table might be more intuitive. I have already > > added > > > the table in the "Public Interfaces | New Config Option" chapter of > FLIP. > > > PTAL~ > > > > > > Ron Liu 于2024年4月18日周四 18:10写道: > > > > > > > Hi, Xia > > > > > > > > Thanks for your reply. > > > > > > > > > That means, in terms > > > > of priority, `table.exec.hive.infer-source-parallelism` > > > > > `table.exec.hive.infer-source-parallelism.mode`. > > > > > > > > I still have some confusion, if the > > > > `table.exec.hive.infer-source-parallelism` > > > > >`table.exec.hive.infer-source-parallelism.mode`, currently > > > > `table.exec.hive.infer-source-parallelism` default value is true, > that > > > > means always static parallelism inference work? Or perhaps after this > > > FLIP, > > > > we changed the default behavior of > > > > `table.exec.hive.infer-source-parallelism` to indicate dynamic > > > parallelism > > > > inference when enabled. > > > > I think you should list the various behaviors of these two options > that > > > > coexist in FLIP by a table, only then users can know how the dynamic > > and > > > > static parallelism inference work. > > > > > > > > Best, > > > > Ron > > > > > > > > Xia Sun 于2024年4月18日周四 16:33写道: > > > > > > > > > Hi Ron and Lijie, > > > > > Thanks for joining the discussion and sharing your suggestions. > > > > > > > > > > > the InferMode class should also be introduced in the Public > > > Interfaces > > > > > > section! > > > > > > > > > > > > > > > Thanks for the reminder, I have now added the InferMode class to > the > > > > Public > > > > > Interfaces section as well. > > > > > > > > > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked > > > > through > > > > > > the code that the default value is 1000? > > > > > > > > > > > > > > > I have checked and the default value of > > > > > `table.exec.hive.infer-source-parallelism.max` is indeed 1000. This > > has > > > > > been corrected in the FLIP. > > > > > > > > > > > how are`table.exec.hive.infer-source-parallelism` and > > > > > > `table.exec.hive.infer-source-parallelism.mode` compatible? > > > > > > > > > > > > > > > This is indeed a critical point. The current plan is to deprecate > > > > > `table.exec.hive.infer-source-parallelism` but still utilize it as > > the > > > > main > > > > > switch for enabling automatic parallelism inference. That means, in > > > terms > > > > > of priority, `table.exec.hive.infer-source-parallelism` > > > > > > `table.exec.hive.infer-source-parallelism.mode`. In future > versions, > > if > > > > > `table.exec.hive.infer-source-parallelism` is removed, this logic > > will > > > > also > > > > > need to be revised, leaving only > > > > > `table.exec.hive.infer-source-parallelism.mode` as the basis for > > > deciding > > > > > whether to enable parallelism inference. I have also added this > > > > description > > > > > to the FLIP. > > > > > > > > > > > > > > > > In FLIP-367 it is supported to be able to set the Source's > > > parallelism > > > > > > individually, if in the future HiveSource also supports this > > feature, > > > > > > however, the default value of > > > > > > `table.exec.hive.infer-source-parallelism.mode` is > > > `InferMode.DYNAMIC`, > > > > > at > > > > > > this point will the parallelism be dynamically derived or will > the > > > > > manually > > > > > > set parallelism take effect, and who has the higher priority? > > > > > > > > > > > > > > > From my understanding, 'manually set parallelism' has the higher > > > > priority, > > > > > just like one of the preconditions for the effectiveness of dynamic > > > > > parallelism inference in the AdaptiveBatchScheduler is that the > > > vertex's > > > > > parallelism isn't set. I believe whether it's static inference or > > > dynamic > > > > > inference, the manually set parallelism by the user should be > > > respected. > > > > > > > > > > > The `InferMode.NONE` option. > > > > > > > > > > Currently, 'adding InferMode.NONE' seems to be the prevailing > > opinion. > > > I > > > > > will add InferMode.NONE as one of the Enum options in InferMode > > class. > > > > > > > > > > Best, > > > > > Xia > > > > > > > > > > Lijie Wang 于2024年4月18日周四 13:50写道: > > > > > > > > > > > Thanks for driving the discussion. > > > > > > > > > > > > +1 for the proposal and +1 for the `InferMode.NONE` option. > > > > > > > > > > > > Best, > > > > > > Lijie > > > > > > > > > > > > Ron liu 于2024年4月18日周四 11:36写道: > >
Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Hi Xia, I have read through the FLIP and discussion and the new version of the FLIP looks better. +1 for the proposal. Best Regards Ahmed Hamdy On Thu, 18 Apr 2024 at 12:21, Ron Liu wrote: > Hi, Xia > > Thanks for updating, looks good to me. > > Best, > Ron > > Xia Sun 于2024年4月18日周四 19:11写道: > > > Hi Ron, > > Yes, presenting it in a table might be more intuitive. I have already > added > > the table in the "Public Interfaces | New Config Option" chapter of FLIP. > > PTAL~ > > > > Ron Liu 于2024年4月18日周四 18:10写道: > > > > > Hi, Xia > > > > > > Thanks for your reply. > > > > > > > That means, in terms > > > of priority, `table.exec.hive.infer-source-parallelism` > > > > `table.exec.hive.infer-source-parallelism.mode`. > > > > > > I still have some confusion, if the > > > `table.exec.hive.infer-source-parallelism` > > > >`table.exec.hive.infer-source-parallelism.mode`, currently > > > `table.exec.hive.infer-source-parallelism` default value is true, that > > > means always static parallelism inference work? Or perhaps after this > > FLIP, > > > we changed the default behavior of > > > `table.exec.hive.infer-source-parallelism` to indicate dynamic > > parallelism > > > inference when enabled. > > > I think you should list the various behaviors of these two options that > > > coexist in FLIP by a table, only then users can know how the dynamic > and > > > static parallelism inference work. > > > > > > Best, > > > Ron > > > > > > Xia Sun 于2024年4月18日周四 16:33写道: > > > > > > > Hi Ron and Lijie, > > > > Thanks for joining the discussion and sharing your suggestions. > > > > > > > > > the InferMode class should also be introduced in the Public > > Interfaces > > > > > section! > > > > > > > > > > > > Thanks for the reminder, I have now added the InferMode class to the > > > Public > > > > Interfaces section as well. > > > > > > > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked > > > through > > > > > the code that the default value is 1000? > > > > > > > > > > > > I have checked and the default value of > > > > `table.exec.hive.infer-source-parallelism.max` is indeed 1000. This > has > > > > been corrected in the FLIP. > > > > > > > > > how are`table.exec.hive.infer-source-parallelism` and > > > > > `table.exec.hive.infer-source-parallelism.mode` compatible? > > > > > > > > > > > > This is indeed a critical point. The current plan is to deprecate > > > > `table.exec.hive.infer-source-parallelism` but still utilize it as > the > > > main > > > > switch for enabling automatic parallelism inference. That means, in > > terms > > > > of priority, `table.exec.hive.infer-source-parallelism` > > > > > `table.exec.hive.infer-source-parallelism.mode`. In future versions, > if > > > > `table.exec.hive.infer-source-parallelism` is removed, this logic > will > > > also > > > > need to be revised, leaving only > > > > `table.exec.hive.infer-source-parallelism.mode` as the basis for > > deciding > > > > whether to enable parallelism inference. I have also added this > > > description > > > > to the FLIP. > > > > > > > > > > > > > In FLIP-367 it is supported to be able to set the Source's > > parallelism > > > > > individually, if in the future HiveSource also supports this > feature, > > > > > however, the default value of > > > > > `table.exec.hive.infer-source-parallelism.mode` is > > `InferMode.DYNAMIC`, > > > > at > > > > > this point will the parallelism be dynamically derived or will the > > > > manually > > > > > set parallelism take effect, and who has the higher priority? > > > > > > > > > > > > From my understanding, 'manually set parallelism' has the higher > > > priority, > > > > just like one of the preconditions for the effectiveness of dynamic > > > > parallelism inference in the AdaptiveBatchScheduler is that the > > vertex's > > > > parallelism isn't set. I believe whether it's static inference or > > dynamic > > > > inference, the manually set parallelism by the user should be > > respected. > > > > > > > > > The `InferMode.NONE` option. > > > > > > > > Currently, 'adding InferMode.NONE' seems to be the prevailing > opinion. > > I > > > > will add InferMode.NONE as one of the Enum options in InferMode > class. > > > > > > > > Best, > > > > Xia > > > > > > > > Lijie Wang 于2024年4月18日周四 13:50写道: > > > > > > > > > Thanks for driving the discussion. > > > > > > > > > > +1 for the proposal and +1 for the `InferMode.NONE` option. > > > > > > > > > > Best, > > > > > Lijie > > > > > > > > > > Ron liu 于2024年4月18日周四 11:36写道: > > > > > > > > > > > Hi, Xia > > > > > > > > > > > > Thanks for driving this FLIP. > > > > > > > > > > > > This proposal looks good to me overall. However, I have the > > following > > > > > minor > > > > > > questions: > > > > > > > > > > > > 1. FLIP introduced > `table.exec.hive.infer-source-parallelism.mode` > > > as a > > > > > new > > > > > > parameter, and the value is the enum class `InferMode`, I think > the > > > > > > InferMode class should also be introduced in
Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Hi, Xia Thanks for updating, looks good to me. Best, Ron Xia Sun 于2024年4月18日周四 19:11写道: > Hi Ron, > Yes, presenting it in a table might be more intuitive. I have already added > the table in the "Public Interfaces | New Config Option" chapter of FLIP. > PTAL~ > > Ron Liu 于2024年4月18日周四 18:10写道: > > > Hi, Xia > > > > Thanks for your reply. > > > > > That means, in terms > > of priority, `table.exec.hive.infer-source-parallelism` > > > `table.exec.hive.infer-source-parallelism.mode`. > > > > I still have some confusion, if the > > `table.exec.hive.infer-source-parallelism` > > >`table.exec.hive.infer-source-parallelism.mode`, currently > > `table.exec.hive.infer-source-parallelism` default value is true, that > > means always static parallelism inference work? Or perhaps after this > FLIP, > > we changed the default behavior of > > `table.exec.hive.infer-source-parallelism` to indicate dynamic > parallelism > > inference when enabled. > > I think you should list the various behaviors of these two options that > > coexist in FLIP by a table, only then users can know how the dynamic and > > static parallelism inference work. > > > > Best, > > Ron > > > > Xia Sun 于2024年4月18日周四 16:33写道: > > > > > Hi Ron and Lijie, > > > Thanks for joining the discussion and sharing your suggestions. > > > > > > > the InferMode class should also be introduced in the Public > Interfaces > > > > section! > > > > > > > > > Thanks for the reminder, I have now added the InferMode class to the > > Public > > > Interfaces section as well. > > > > > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked > > through > > > > the code that the default value is 1000? > > > > > > > > > I have checked and the default value of > > > `table.exec.hive.infer-source-parallelism.max` is indeed 1000. This has > > > been corrected in the FLIP. > > > > > > > how are`table.exec.hive.infer-source-parallelism` and > > > > `table.exec.hive.infer-source-parallelism.mode` compatible? > > > > > > > > > This is indeed a critical point. The current plan is to deprecate > > > `table.exec.hive.infer-source-parallelism` but still utilize it as the > > main > > > switch for enabling automatic parallelism inference. That means, in > terms > > > of priority, `table.exec.hive.infer-source-parallelism` > > > > `table.exec.hive.infer-source-parallelism.mode`. In future versions, if > > > `table.exec.hive.infer-source-parallelism` is removed, this logic will > > also > > > need to be revised, leaving only > > > `table.exec.hive.infer-source-parallelism.mode` as the basis for > deciding > > > whether to enable parallelism inference. I have also added this > > description > > > to the FLIP. > > > > > > > > > > In FLIP-367 it is supported to be able to set the Source's > parallelism > > > > individually, if in the future HiveSource also supports this feature, > > > > however, the default value of > > > > `table.exec.hive.infer-source-parallelism.mode` is > `InferMode.DYNAMIC`, > > > at > > > > this point will the parallelism be dynamically derived or will the > > > manually > > > > set parallelism take effect, and who has the higher priority? > > > > > > > > > From my understanding, 'manually set parallelism' has the higher > > priority, > > > just like one of the preconditions for the effectiveness of dynamic > > > parallelism inference in the AdaptiveBatchScheduler is that the > vertex's > > > parallelism isn't set. I believe whether it's static inference or > dynamic > > > inference, the manually set parallelism by the user should be > respected. > > > > > > > The `InferMode.NONE` option. > > > > > > Currently, 'adding InferMode.NONE' seems to be the prevailing opinion. > I > > > will add InferMode.NONE as one of the Enum options in InferMode class. > > > > > > Best, > > > Xia > > > > > > Lijie Wang 于2024年4月18日周四 13:50写道: > > > > > > > Thanks for driving the discussion. > > > > > > > > +1 for the proposal and +1 for the `InferMode.NONE` option. > > > > > > > > Best, > > > > Lijie > > > > > > > > Ron liu 于2024年4月18日周四 11:36写道: > > > > > > > > > Hi, Xia > > > > > > > > > > Thanks for driving this FLIP. > > > > > > > > > > This proposal looks good to me overall. However, I have the > following > > > > minor > > > > > questions: > > > > > > > > > > 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode` > > as a > > > > new > > > > > parameter, and the value is the enum class `InferMode`, I think the > > > > > InferMode class should also be introduced in the Public Interfaces > > > > section! > > > > > 2. You mentioned in FLIP that the default value of > > > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked > > > through > > > > > the code that the default value is 1000? > > > > > 3. I also agree with Muhammet's idea that there is no need to > > introduce > > > > the > > > > > option `table.exec.hive.infer-source-parallelism.enabled`, and that > > > > > expanding the InferMode values will fulfill the need. There is > >
Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Hi Ron, Yes, presenting it in a table might be more intuitive. I have already added the table in the "Public Interfaces | New Config Option" chapter of FLIP. PTAL~ Ron Liu 于2024年4月18日周四 18:10写道: > Hi, Xia > > Thanks for your reply. > > > That means, in terms > of priority, `table.exec.hive.infer-source-parallelism` > > `table.exec.hive.infer-source-parallelism.mode`. > > I still have some confusion, if the > `table.exec.hive.infer-source-parallelism` > >`table.exec.hive.infer-source-parallelism.mode`, currently > `table.exec.hive.infer-source-parallelism` default value is true, that > means always static parallelism inference work? Or perhaps after this FLIP, > we changed the default behavior of > `table.exec.hive.infer-source-parallelism` to indicate dynamic parallelism > inference when enabled. > I think you should list the various behaviors of these two options that > coexist in FLIP by a table, only then users can know how the dynamic and > static parallelism inference work. > > Best, > Ron > > Xia Sun 于2024年4月18日周四 16:33写道: > > > Hi Ron and Lijie, > > Thanks for joining the discussion and sharing your suggestions. > > > > > the InferMode class should also be introduced in the Public Interfaces > > > section! > > > > > > Thanks for the reminder, I have now added the InferMode class to the > Public > > Interfaces section as well. > > > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked > through > > > the code that the default value is 1000? > > > > > > I have checked and the default value of > > `table.exec.hive.infer-source-parallelism.max` is indeed 1000. This has > > been corrected in the FLIP. > > > > > how are`table.exec.hive.infer-source-parallelism` and > > > `table.exec.hive.infer-source-parallelism.mode` compatible? > > > > > > This is indeed a critical point. The current plan is to deprecate > > `table.exec.hive.infer-source-parallelism` but still utilize it as the > main > > switch for enabling automatic parallelism inference. That means, in terms > > of priority, `table.exec.hive.infer-source-parallelism` > > > `table.exec.hive.infer-source-parallelism.mode`. In future versions, if > > `table.exec.hive.infer-source-parallelism` is removed, this logic will > also > > need to be revised, leaving only > > `table.exec.hive.infer-source-parallelism.mode` as the basis for deciding > > whether to enable parallelism inference. I have also added this > description > > to the FLIP. > > > > > > > In FLIP-367 it is supported to be able to set the Source's parallelism > > > individually, if in the future HiveSource also supports this feature, > > > however, the default value of > > > `table.exec.hive.infer-source-parallelism.mode` is `InferMode.DYNAMIC`, > > at > > > this point will the parallelism be dynamically derived or will the > > manually > > > set parallelism take effect, and who has the higher priority? > > > > > > From my understanding, 'manually set parallelism' has the higher > priority, > > just like one of the preconditions for the effectiveness of dynamic > > parallelism inference in the AdaptiveBatchScheduler is that the vertex's > > parallelism isn't set. I believe whether it's static inference or dynamic > > inference, the manually set parallelism by the user should be respected. > > > > > The `InferMode.NONE` option. > > > > Currently, 'adding InferMode.NONE' seems to be the prevailing opinion. I > > will add InferMode.NONE as one of the Enum options in InferMode class. > > > > Best, > > Xia > > > > Lijie Wang 于2024年4月18日周四 13:50写道: > > > > > Thanks for driving the discussion. > > > > > > +1 for the proposal and +1 for the `InferMode.NONE` option. > > > > > > Best, > > > Lijie > > > > > > Ron liu 于2024年4月18日周四 11:36写道: > > > > > > > Hi, Xia > > > > > > > > Thanks for driving this FLIP. > > > > > > > > This proposal looks good to me overall. However, I have the following > > > minor > > > > questions: > > > > > > > > 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode` > as a > > > new > > > > parameter, and the value is the enum class `InferMode`, I think the > > > > InferMode class should also be introduced in the Public Interfaces > > > section! > > > > 2. You mentioned in FLIP that the default value of > > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked > > through > > > > the code that the default value is 1000? > > > > 3. I also agree with Muhammet's idea that there is no need to > introduce > > > the > > > > option `table.exec.hive.infer-source-parallelism.enabled`, and that > > > > expanding the InferMode values will fulfill the need. There is > another > > > > issue to consider here though, how are > > > > `table.exec.hive.infer-source-parallelism` and > > > > `table.exec.hive.infer-source-parallelism.mode` compatible? > > > > 4. In FLIP-367 it is supported to be able to set the Source's > > parallelism > > > > individually, if in the future HiveSource also supports this feature, > > > > however, the default value
Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Hi, Xia Thanks for your reply. > That means, in terms of priority, `table.exec.hive.infer-source-parallelism` > `table.exec.hive.infer-source-parallelism.mode`. I still have some confusion, if the `table.exec.hive.infer-source-parallelism` >`table.exec.hive.infer-source-parallelism.mode`, currently `table.exec.hive.infer-source-parallelism` default value is true, that means always static parallelism inference work? Or perhaps after this FLIP, we changed the default behavior of `table.exec.hive.infer-source-parallelism` to indicate dynamic parallelism inference when enabled. I think you should list the various behaviors of these two options that coexist in FLIP by a table, only then users can know how the dynamic and static parallelism inference work. Best, Ron Xia Sun 于2024年4月18日周四 16:33写道: > Hi Ron and Lijie, > Thanks for joining the discussion and sharing your suggestions. > > > the InferMode class should also be introduced in the Public Interfaces > > section! > > > Thanks for the reminder, I have now added the InferMode class to the Public > Interfaces section as well. > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through > > the code that the default value is 1000? > > > I have checked and the default value of > `table.exec.hive.infer-source-parallelism.max` is indeed 1000. This has > been corrected in the FLIP. > > > how are`table.exec.hive.infer-source-parallelism` and > > `table.exec.hive.infer-source-parallelism.mode` compatible? > > > This is indeed a critical point. The current plan is to deprecate > `table.exec.hive.infer-source-parallelism` but still utilize it as the main > switch for enabling automatic parallelism inference. That means, in terms > of priority, `table.exec.hive.infer-source-parallelism` > > `table.exec.hive.infer-source-parallelism.mode`. In future versions, if > `table.exec.hive.infer-source-parallelism` is removed, this logic will also > need to be revised, leaving only > `table.exec.hive.infer-source-parallelism.mode` as the basis for deciding > whether to enable parallelism inference. I have also added this description > to the FLIP. > > > > In FLIP-367 it is supported to be able to set the Source's parallelism > > individually, if in the future HiveSource also supports this feature, > > however, the default value of > > `table.exec.hive.infer-source-parallelism.mode` is `InferMode.DYNAMIC`, > at > > this point will the parallelism be dynamically derived or will the > manually > > set parallelism take effect, and who has the higher priority? > > > From my understanding, 'manually set parallelism' has the higher priority, > just like one of the preconditions for the effectiveness of dynamic > parallelism inference in the AdaptiveBatchScheduler is that the vertex's > parallelism isn't set. I believe whether it's static inference or dynamic > inference, the manually set parallelism by the user should be respected. > > > The `InferMode.NONE` option. > > Currently, 'adding InferMode.NONE' seems to be the prevailing opinion. I > will add InferMode.NONE as one of the Enum options in InferMode class. > > Best, > Xia > > Lijie Wang 于2024年4月18日周四 13:50写道: > > > Thanks for driving the discussion. > > > > +1 for the proposal and +1 for the `InferMode.NONE` option. > > > > Best, > > Lijie > > > > Ron liu 于2024年4月18日周四 11:36写道: > > > > > Hi, Xia > > > > > > Thanks for driving this FLIP. > > > > > > This proposal looks good to me overall. However, I have the following > > minor > > > questions: > > > > > > 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode` as a > > new > > > parameter, and the value is the enum class `InferMode`, I think the > > > InferMode class should also be introduced in the Public Interfaces > > section! > > > 2. You mentioned in FLIP that the default value of > > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked > through > > > the code that the default value is 1000? > > > 3. I also agree with Muhammet's idea that there is no need to introduce > > the > > > option `table.exec.hive.infer-source-parallelism.enabled`, and that > > > expanding the InferMode values will fulfill the need. There is another > > > issue to consider here though, how are > > > `table.exec.hive.infer-source-parallelism` and > > > `table.exec.hive.infer-source-parallelism.mode` compatible? > > > 4. In FLIP-367 it is supported to be able to set the Source's > parallelism > > > individually, if in the future HiveSource also supports this feature, > > > however, the default value of > > > `table.exec.hive.infer-source-parallelism.mode` is `InferMode. > DYNAMIC`, > > at > > > this point will the parallelism be dynamically derived or will the > > manually > > > set parallelism take effect, and who has the higher priority? > > > > > > Best, > > > Ron > > > > > > Xia Sun 于2024年4月17日周三 12:08写道: > > > > > > > Hi Jeyhun, Muhammet, > > > > Thanks for all the feedback! > > > > > > > > > Could you please mention the default values for
Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Hi Ron and Lijie, Thanks for joining the discussion and sharing your suggestions. > the InferMode class should also be introduced in the Public Interfaces > section! Thanks for the reminder, I have now added the InferMode class to the Public Interfaces section as well. > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through > the code that the default value is 1000? I have checked and the default value of `table.exec.hive.infer-source-parallelism.max` is indeed 1000. This has been corrected in the FLIP. > how are`table.exec.hive.infer-source-parallelism` and > `table.exec.hive.infer-source-parallelism.mode` compatible? This is indeed a critical point. The current plan is to deprecate `table.exec.hive.infer-source-parallelism` but still utilize it as the main switch for enabling automatic parallelism inference. That means, in terms of priority, `table.exec.hive.infer-source-parallelism` > `table.exec.hive.infer-source-parallelism.mode`. In future versions, if `table.exec.hive.infer-source-parallelism` is removed, this logic will also need to be revised, leaving only `table.exec.hive.infer-source-parallelism.mode` as the basis for deciding whether to enable parallelism inference. I have also added this description to the FLIP. > In FLIP-367 it is supported to be able to set the Source's parallelism > individually, if in the future HiveSource also supports this feature, > however, the default value of > `table.exec.hive.infer-source-parallelism.mode` is `InferMode.DYNAMIC`, at > this point will the parallelism be dynamically derived or will the manually > set parallelism take effect, and who has the higher priority? >From my understanding, 'manually set parallelism' has the higher priority, just like one of the preconditions for the effectiveness of dynamic parallelism inference in the AdaptiveBatchScheduler is that the vertex's parallelism isn't set. I believe whether it's static inference or dynamic inference, the manually set parallelism by the user should be respected. > The `InferMode.NONE` option. Currently, 'adding InferMode.NONE' seems to be the prevailing opinion. I will add InferMode.NONE as one of the Enum options in InferMode class. Best, Xia Lijie Wang 于2024年4月18日周四 13:50写道: > Thanks for driving the discussion. > > +1 for the proposal and +1 for the `InferMode.NONE` option. > > Best, > Lijie > > Ron liu 于2024年4月18日周四 11:36写道: > > > Hi, Xia > > > > Thanks for driving this FLIP. > > > > This proposal looks good to me overall. However, I have the following > minor > > questions: > > > > 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode` as a > new > > parameter, and the value is the enum class `InferMode`, I think the > > InferMode class should also be introduced in the Public Interfaces > section! > > 2. You mentioned in FLIP that the default value of > > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through > > the code that the default value is 1000? > > 3. I also agree with Muhammet's idea that there is no need to introduce > the > > option `table.exec.hive.infer-source-parallelism.enabled`, and that > > expanding the InferMode values will fulfill the need. There is another > > issue to consider here though, how are > > `table.exec.hive.infer-source-parallelism` and > > `table.exec.hive.infer-source-parallelism.mode` compatible? > > 4. In FLIP-367 it is supported to be able to set the Source's parallelism > > individually, if in the future HiveSource also supports this feature, > > however, the default value of > > `table.exec.hive.infer-source-parallelism.mode` is `InferMode. DYNAMIC`, > at > > this point will the parallelism be dynamically derived or will the > manually > > set parallelism take effect, and who has the higher priority? > > > > Best, > > Ron > > > > Xia Sun 于2024年4月17日周三 12:08写道: > > > > > Hi Jeyhun, Muhammet, > > > Thanks for all the feedback! > > > > > > > Could you please mention the default values for the new > configurations > > > > (e.g., table.exec.hive.infer-source-parallelism.mode, > > > > table.exec.hive.infer-source-parallelism.enabled, > > > > etc) ? > > > > > > > > > Thanks for your suggestion. I have supplemented the explanation > regarding > > > the default values. > > > > > > > Since we are introducing the mode as a configuration option, > > > > could it make sense to have `InferMode.NONE` option also? > > > > The `NONE` option would disable the inference. > > > > > > > > > This is a good idea. Looking ahead, it could eliminate the need for > > > introducing > > > a new configuration option. I haven't identified any potential > > > compatibility issues > > > as yet. If there are no further ideas from others, I'll go ahead and > > update > > > the FLIP to > > > introducing InferMode.NONE. > > > > > > Best, > > > Xia > > > > > > Muhammet Orazov 于2024年4月17日周三 10:31写道: > > > > > > > Hello Xia, > > > > > > > > Thanks for the FLIP! > > > > > > > > Since we are introducing
Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Thanks for driving the discussion. +1 for the proposal and +1 for the `InferMode.NONE` option. Best, Lijie Ron liu 于2024年4月18日周四 11:36写道: > Hi, Xia > > Thanks for driving this FLIP. > > This proposal looks good to me overall. However, I have the following minor > questions: > > 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode` as a new > parameter, and the value is the enum class `InferMode`, I think the > InferMode class should also be introduced in the Public Interfaces section! > 2. You mentioned in FLIP that the default value of > `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through > the code that the default value is 1000? > 3. I also agree with Muhammet's idea that there is no need to introduce the > option `table.exec.hive.infer-source-parallelism.enabled`, and that > expanding the InferMode values will fulfill the need. There is another > issue to consider here though, how are > `table.exec.hive.infer-source-parallelism` and > `table.exec.hive.infer-source-parallelism.mode` compatible? > 4. In FLIP-367 it is supported to be able to set the Source's parallelism > individually, if in the future HiveSource also supports this feature, > however, the default value of > `table.exec.hive.infer-source-parallelism.mode` is `InferMode. DYNAMIC`, at > this point will the parallelism be dynamically derived or will the manually > set parallelism take effect, and who has the higher priority? > > Best, > Ron > > Xia Sun 于2024年4月17日周三 12:08写道: > > > Hi Jeyhun, Muhammet, > > Thanks for all the feedback! > > > > > Could you please mention the default values for the new configurations > > > (e.g., table.exec.hive.infer-source-parallelism.mode, > > > table.exec.hive.infer-source-parallelism.enabled, > > > etc) ? > > > > > > Thanks for your suggestion. I have supplemented the explanation regarding > > the default values. > > > > > Since we are introducing the mode as a configuration option, > > > could it make sense to have `InferMode.NONE` option also? > > > The `NONE` option would disable the inference. > > > > > > This is a good idea. Looking ahead, it could eliminate the need for > > introducing > > a new configuration option. I haven't identified any potential > > compatibility issues > > as yet. If there are no further ideas from others, I'll go ahead and > update > > the FLIP to > > introducing InferMode.NONE. > > > > Best, > > Xia > > > > Muhammet Orazov 于2024年4月17日周三 10:31写道: > > > > > Hello Xia, > > > > > > Thanks for the FLIP! > > > > > > Since we are introducing the mode as a configuration option, > > > could it make sense to have `InferMode.NONE` option also? > > > The `NONE` option would disable the inference. > > > > > > This way we deprecate the `table.exec.hive.infer-source-parallelism` > > > and no additional `table.exec.hive.infer-source-parallelism.enabled` > > > option is required. > > > > > > What do you think? > > > > > > Best, > > > Muhammet > > > > > > On 2024-04-16 07:07, Xia Sun wrote: > > > > Hi everyone, > > > > I would like to start a discussion on FLIP-445: Support dynamic > > > > parallelism > > > > inference for HiveSource[1]. > > > > > > > > FLIP-379[2] has introduced dynamic source parallelism inference for > > > > batch > > > > jobs, which can utilize runtime information to more accurately decide > > > > the > > > > source parallelism. As a follow-up task, we plan to implement the > > > > dynamic > > > > parallelism inference interface for HiveSource, and also switch the > > > > default > > > > static parallelism inference to dynamic parallelism inference. > > > > > > > > Looking forward to your feedback and suggestions, thanks. > > > > > > > > [1] > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource > > > > [2] > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs > > > > > > > > Best regards, > > > > Xia > > > > > >
Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Hi, Xia Thanks for driving this FLIP. This proposal looks good to me overall. However, I have the following minor questions: 1. FLIP introduced `table.exec.hive.infer-source-parallelism.mode` as a new parameter, and the value is the enum class `InferMode`, I think the InferMode class should also be introduced in the Public Interfaces section! 2. You mentioned in FLIP that the default value of `table.exec.hive.infer-source-parallelism.max` is 1024, I checked through the code that the default value is 1000? 3. I also agree with Muhammet's idea that there is no need to introduce the option `table.exec.hive.infer-source-parallelism.enabled`, and that expanding the InferMode values will fulfill the need. There is another issue to consider here though, how are `table.exec.hive.infer-source-parallelism` and `table.exec.hive.infer-source-parallelism.mode` compatible? 4. In FLIP-367 it is supported to be able to set the Source's parallelism individually, if in the future HiveSource also supports this feature, however, the default value of `table.exec.hive.infer-source-parallelism.mode` is `InferMode. DYNAMIC`, at this point will the parallelism be dynamically derived or will the manually set parallelism take effect, and who has the higher priority? Best, Ron Xia Sun 于2024年4月17日周三 12:08写道: > Hi Jeyhun, Muhammet, > Thanks for all the feedback! > > > Could you please mention the default values for the new configurations > > (e.g., table.exec.hive.infer-source-parallelism.mode, > > table.exec.hive.infer-source-parallelism.enabled, > > etc) ? > > > Thanks for your suggestion. I have supplemented the explanation regarding > the default values. > > > Since we are introducing the mode as a configuration option, > > could it make sense to have `InferMode.NONE` option also? > > The `NONE` option would disable the inference. > > > This is a good idea. Looking ahead, it could eliminate the need for > introducing > a new configuration option. I haven't identified any potential > compatibility issues > as yet. If there are no further ideas from others, I'll go ahead and update > the FLIP to > introducing InferMode.NONE. > > Best, > Xia > > Muhammet Orazov 于2024年4月17日周三 10:31写道: > > > Hello Xia, > > > > Thanks for the FLIP! > > > > Since we are introducing the mode as a configuration option, > > could it make sense to have `InferMode.NONE` option also? > > The `NONE` option would disable the inference. > > > > This way we deprecate the `table.exec.hive.infer-source-parallelism` > > and no additional `table.exec.hive.infer-source-parallelism.enabled` > > option is required. > > > > What do you think? > > > > Best, > > Muhammet > > > > On 2024-04-16 07:07, Xia Sun wrote: > > > Hi everyone, > > > I would like to start a discussion on FLIP-445: Support dynamic > > > parallelism > > > inference for HiveSource[1]. > > > > > > FLIP-379[2] has introduced dynamic source parallelism inference for > > > batch > > > jobs, which can utilize runtime information to more accurately decide > > > the > > > source parallelism. As a follow-up task, we plan to implement the > > > dynamic > > > parallelism inference interface for HiveSource, and also switch the > > > default > > > static parallelism inference to dynamic parallelism inference. > > > > > > Looking forward to your feedback and suggestions, thanks. > > > > > > [1] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource > > > [2] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs > > > > > > Best regards, > > > Xia > > >
Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Hi Jeyhun, Muhammet, Thanks for all the feedback! > Could you please mention the default values for the new configurations > (e.g., table.exec.hive.infer-source-parallelism.mode, > table.exec.hive.infer-source-parallelism.enabled, > etc) ? Thanks for your suggestion. I have supplemented the explanation regarding the default values. > Since we are introducing the mode as a configuration option, > could it make sense to have `InferMode.NONE` option also? > The `NONE` option would disable the inference. This is a good idea. Looking ahead, it could eliminate the need for introducing a new configuration option. I haven't identified any potential compatibility issues as yet. If there are no further ideas from others, I'll go ahead and update the FLIP to introducing InferMode.NONE. Best, Xia Muhammet Orazov 于2024年4月17日周三 10:31写道: > Hello Xia, > > Thanks for the FLIP! > > Since we are introducing the mode as a configuration option, > could it make sense to have `InferMode.NONE` option also? > The `NONE` option would disable the inference. > > This way we deprecate the `table.exec.hive.infer-source-parallelism` > and no additional `table.exec.hive.infer-source-parallelism.enabled` > option is required. > > What do you think? > > Best, > Muhammet > > On 2024-04-16 07:07, Xia Sun wrote: > > Hi everyone, > > I would like to start a discussion on FLIP-445: Support dynamic > > parallelism > > inference for HiveSource[1]. > > > > FLIP-379[2] has introduced dynamic source parallelism inference for > > batch > > jobs, which can utilize runtime information to more accurately decide > > the > > source parallelism. As a follow-up task, we plan to implement the > > dynamic > > parallelism inference interface for HiveSource, and also switch the > > default > > static parallelism inference to dynamic parallelism inference. > > > > Looking forward to your feedback and suggestions, thanks. > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource > > [2] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs > > > > Best regards, > > Xia >
Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Hello Xia, Thanks for the FLIP! Since we are introducing the mode as a configuration option, could it make sense to have `InferMode.NONE` option also? The `NONE` option would disable the inference. This way we deprecate the `table.exec.hive.infer-source-parallelism` and no additional `table.exec.hive.infer-source-parallelism.enabled` option is required. What do you think? Best, Muhammet On 2024-04-16 07:07, Xia Sun wrote: Hi everyone, I would like to start a discussion on FLIP-445: Support dynamic parallelism inference for HiveSource[1]. FLIP-379[2] has introduced dynamic source parallelism inference for batch jobs, which can utilize runtime information to more accurately decide the source parallelism. As a follow-up task, we plan to implement the dynamic parallelism inference interface for HiveSource, and also switch the default static parallelism inference to dynamic parallelism inference. Looking forward to your feedback and suggestions, thanks. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs Best regards, Xia
Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Hi Xia, Thanks for driving this FLIP. +1 from my side. I have one comment. Could you please mention the default values for the new configurations (e.g., table.exec.hive.infer-source-parallelism.mode, table.exec.hive.infer-source-parallelism.enabled, etc) ? Regards, Jeyhun On Tue, Apr 16, 2024 at 9:46 AM Zhu Zhu wrote: > Thanks for creating this FLIP. @Xia > > +1 for this proposal. Dynamic parallelism inference can be helpful > to decide a better parallelism. And it's good to unify the settings > of static & dynamic parallelism inference. > > Thanks, > Zhu > > > Xia Sun 于2024年4月16日周二 15:12写道: > > > Hi everyone, > > I would like to start a discussion on FLIP-445: Support dynamic > parallelism > > inference for HiveSource[1]. > > > > FLIP-379[2] has introduced dynamic source parallelism inference for batch > > jobs, which can utilize runtime information to more accurately decide the > > source parallelism. As a follow-up task, we plan to implement the dynamic > > parallelism inference interface for HiveSource, and also switch the > default > > static parallelism inference to dynamic parallelism inference. > > > > Looking forward to your feedback and suggestions, thanks. > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource > > [2] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs > > > > Best regards, > > Xia > > >
Re: [DISCUSS] FLIP-445: Support dynamic parallelism inference for HiveSource
Thanks for creating this FLIP. @Xia +1 for this proposal. Dynamic parallelism inference can be helpful to decide a better parallelism. And it's good to unify the settings of static & dynamic parallelism inference. Thanks, Zhu Xia Sun 于2024年4月16日周二 15:12写道: > Hi everyone, > I would like to start a discussion on FLIP-445: Support dynamic parallelism > inference for HiveSource[1]. > > FLIP-379[2] has introduced dynamic source parallelism inference for batch > jobs, which can utilize runtime information to more accurately decide the > source parallelism. As a follow-up task, we plan to implement the dynamic > parallelism inference interface for HiveSource, and also switch the default > static parallelism inference to dynamic parallelism inference. > > Looking forward to your feedback and suggestions, thanks. > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-445%3A+Support+dynamic+parallelism+inference+for+HiveSource > [2] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-379%3A+Dynamic+source+parallelism+inference+for+batch+jobs > > Best regards, > Xia >