[jira] [Updated] (FLINK-23826) Verify optimized scheduler performance for large-scale jobs

2021-08-16 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-23826: Description: This ticket is used to verify the result of FLINK-21110. It should check if large scale

[jira] [Created] (FLINK-23826) Verify optimized scheduler performance for large-scale jobs

2021-08-16 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-23826: --- Summary: Verify optimized scheduler performance for large-scale jobs Key: FLINK-23826 URL: https://issues.apache.org/jira/browse/FLINK-23826 Project: Flink Issue

[jira] [Updated] (FLINK-23806) StackOverflowException can happen if a large scale job failed to acquire enough slots in time

2021-08-16 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-23806: Description: When requested slots are not fulfilled in time, task failure will be triggered and all

[jira] [Assigned] (FLINK-23806) StackOverflowException can happen if a large scale job failed to acquire enough slots in time

2021-08-16 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-23806: --- Assignee: Zhu Zhu > StackOverflowException can happen if a large scale job failed to acquire >

[jira] [Created] (FLINK-23806) StackOverflowException can happen if a large scale job failed to acquire enough slots in time

2021-08-16 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-23806: --- Summary: StackOverflowException can happen if a large scale job failed to acquire enough slots in time Key: FLINK-23806 URL: https://issues.apache.org/jira/browse/FLINK-23806

[jira] [Closed] (FLINK-21110) Optimize scheduler performance for large-scale jobs

2021-08-16 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-21110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-21110. --- Resolution: Done > Optimize scheduler performance for large-scale jobs >

[jira] [Closed] (FLINK-22767) Optimize the initialization of LocalInputPreferredSlotSharingStrategy

2021-08-15 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-22767. --- Resolution: Done Done via 94ae3dc0cb0a0a374b8f15fe49b09cc00ccf4c19 > Optimize the initialization of

[jira] [Closed] (FLINK-22773) Optimize the construction of pipelined regions

2021-08-11 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-22773. --- Resolution: Done Done via 656901beb4617c356840e9ae677ad2c9e65fd8da 2cfbf35649acb78711790ff67f9043835907b8ac
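FLINK-22773 is about how pipelined regions are built. The sketch below is a rough illustration, not Flink's code. It treats a pipelined region as a connected component of vertices joined by pipelined edges, and a union-find pass computes those components in near-linear time. `PipelinedRegionSketch` and its `Edge` type are hypothetical names introduced here. The real implementation may handle additional cases that this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: group vertices connected by pipelined edges using union-find. */
final class PipelinedRegionSketch {

    /** Edge between two vertex indices; pipelined == false means a blocking edge. */
    static final class Edge {
        final int source;
        final int target;
        final boolean pipelined;

        Edge(int source, int target, boolean pipelined) {
            this.source = source;
            this.target = target;
            this.pipelined = pipelined;
        }
    }

    private final int[] parent;

    private PipelinedRegionSketch(int numVertices) {
        parent = new int[numVertices];
        for (int i = 0; i < numVertices; i++) {
            parent[i] = i; // every vertex starts in its own region
        }
    }

    // Find with path halving keeps the trees shallow.
    private int find(int v) {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];
            v = parent[v];
        }
        return v;
    }

    private void union(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra != rb) {
            parent[ra] = rb;
        }
    }

    /** Returns each region as a list of vertex indices in ascending order. */
    static Collection<List<Integer>> computeRegions(int numVertices, List<Edge> edges) {
        PipelinedRegionSketch uf = new PipelinedRegionSketch(numVertices);
        for (Edge e : edges) {
            if (e.pipelined) {
                uf.union(e.source, e.target); // only pipelined edges merge regions
            }
        }
        Map<Integer, List<Integer>> regions = new HashMap<>();
        for (int v = 0; v < numVertices; v++) {
            regions.computeIfAbsent(uf.find(v), k -> new ArrayList<>()).add(v);
        }
        return regions.values();
    }
}
```

In this sketch, a chain 0 -pipelined-> 1 -blocking-> 2 -pipelined-> 3 yields the two regions [0, 1] and [2, 3].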

[jira] [Issue Comment Deleted] (FLINK-22945) StackOverflowException can happen when a large scale job is CANCELING/FAILING

2021-08-11 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22945: Comment: was deleted (was: This issue was labeled "stale-critical" 7 days ago and has not received any

[jira] [Issue Comment Deleted] (FLINK-22945) StackOverflowException can happen when a large scale job is CANCELING/FAILING

2021-08-11 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22945: Comment: was deleted (was: I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-10 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396636#comment-17396636 ] Zhu Zhu edited comment on FLINK-23593 at 8/11/21, 3:01 AM: --- I think I find the

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-10 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396636#comment-17396636 ] Zhu Zhu edited comment on FLINK-23593 at 8/11/21, 2:53 AM: --- I think I find the

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-10 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396636#comment-17396636 ] Zhu Zhu edited comment on FLINK-23593 at 8/11/21, 2:51 AM: --- I think I find the

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-10 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396636#comment-17396636 ] Zhu Zhu edited comment on FLINK-23593 at 8/10/21, 12:09 PM: I think I find

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-10 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396636#comment-17396636 ] Zhu Zhu edited comment on FLINK-23593 at 8/10/21, 12:08 PM: I think I find

[jira] [Commented] (FLINK-23593) Performance regression on 15.07.2021

2021-08-10 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17396636#comment-17396636 ] Zhu Zhu commented on FLINK-23593: - I think I find the cause of the regression. *Cause* The regression

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-09 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17395983#comment-17395983 ] Zhu Zhu edited comment on FLINK-23593 at 8/9/21, 11:24 AM: --- I tried the

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-09 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393950#comment-17393950 ] Zhu Zhu edited comment on FLINK-23593 at 8/9/21, 11:22 AM: --- >> Could the

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-09 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17395983#comment-17395983 ] Zhu Zhu edited comment on FLINK-23593 at 8/9/21, 11:20 AM: --- I tried the

[jira] [Commented] (FLINK-23593) Performance regression on 15.07.2021

2021-08-09 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17395983#comment-17395983 ] Zhu Zhu commented on FLINK-23593: - I tried the benchmarks locally before/after applying FLINK-23372 and

[jira] [Closed] (FLINK-16069) Creation of TaskDeploymentDescriptor can block main thread for long time

2021-08-08 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-16069. --- Resolution: Duplicate > Creation of TaskDeploymentDescriptor can block main thread for long time >

[jira] [Commented] (FLINK-16069) Creation of TaskDeploymentDescriptor can block main thread for long time

2021-08-08 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17395728#comment-17395728 ] Zhu Zhu commented on FLINK-16069: - Thanks for making the improvements and sharing the results!

[jira] [Comment Edited] (FLINK-23593) Performance regression on 15.07.2021

2021-08-05 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393950#comment-17393950 ] Zhu Zhu edited comment on FLINK-23593 at 8/5/21, 12:19 PM: --- >> Could the

[jira] [Commented] (FLINK-23593) Performance regression on 15.07.2021

2021-08-05 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393950#comment-17393950 ] Zhu Zhu commented on FLINK-23593: - >> Could the larger difference between local benchmark vs. cloud be

[jira] [Updated] (FLINK-23172) Links to Task Failure Recovery page on Configuration page are broken

2021-08-05 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-23172: Affects Version/s: 1.13.2 > Links to Task Failure Recovery page on Configuration page are broken >

[jira] [Commented] (FLINK-23593) Performance regression on 15.07.2021

2021-08-05 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393836#comment-17393836 ] Zhu Zhu commented on FLINK-23593: - Thanks for the updates! [~sewen] I think your guess about *Trying to

[jira] [Updated] (FLINK-22674) Provide JobID when apply shuffle resource by ShuffleMaster

2021-08-04 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22674: Fix Version/s: 1.14.0 > Provide JobID when apply shuffle resource by ShuffleMaster >

[jira] [Updated] (FLINK-23214) Make ShuffleMaster a cluster level shared service

2021-08-04 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-23214: Fix Version/s: 1.14.0 > Make ShuffleMaster a cluster level shared service >

[jira] [Closed] (FLINK-23249) Introduce ShuffleMasterContext to ShuffleMaster

2021-08-04 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-23249. --- Resolution: Done Done via 0ee4038ef596b22630bc814f677fa489d3796241 > Introduce ShuffleMasterContext to

[jira] [Updated] (FLINK-22675) Add lifecycle methods to ShuffleMaster

2021-08-04 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22675: Fix Version/s: 1.14.0 > Add lifecycle methods to ShuffleMaster > -- >

[jira] [Closed] (FLINK-23214) Make ShuffleMaster a cluster level shared service

2021-08-04 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-23214. --- Resolution: Done Done via 81e1db3c439c1758dccd1a20f2f6b70120f48ef7 > Make ShuffleMaster a cluster level

[jira] [Closed] (FLINK-22674) Provide JobID when apply shuffle resource by ShuffleMaster

2021-08-04 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-22674. --- Resolution: Done Done via 6bc8399e7f1738ec22cb1082c096269b5106cee5 > Provide JobID when apply shuffle

[jira] [Assigned] (FLINK-22675) Add lifecycle methods to ShuffleMaster

2021-08-04 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-22675: --- Assignee: Yingjie Cao (was: Zhu Zhu) > Add lifecycle methods to ShuffleMaster >

[jira] [Closed] (FLINK-22675) Add lifecycle methods to ShuffleMaster

2021-08-04 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-22675. --- Resolution: Done Done via 80df36b51af791f67126e07e182015ea6ea73fd2 > Add lifecycle methods to

[jira] [Assigned] (FLINK-22674) Provide JobID when apply shuffle resource by ShuffleMaster

2021-08-04 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-22674: --- Assignee: Yingjie Cao > Provide JobID when apply shuffle resource by ShuffleMaster >

[jira] [Assigned] (FLINK-22675) Add lifecycle methods to ShuffleMaster

2021-08-04 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-22675: --- Assignee: Zhu Zhu > Add lifecycle methods to ShuffleMaster >

[jira] [Assigned] (FLINK-23214) Make ShuffleMaster a cluster level shared service

2021-08-04 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-23214: --- Assignee: Yingjie Cao > Make ShuffleMaster a cluster level shared service >

[jira] [Assigned] (FLINK-23249) Introduce ShuffleMasterContext to ShuffleMaster

2021-08-04 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-23249: --- Assignee: Yingjie Cao > Introduce ShuffleMasterContext to ShuffleMaster >

[jira] [Assigned] (FLINK-22910) Refine ShuffleMaster lifecycle management for pluggable shuffle service framework

2021-08-04 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-22910: --- Assignee: Yingjie Cao > Refine ShuffleMaster lifecycle management for pluggable shuffle service >

[jira] [Commented] (FLINK-23590) StreamTaskTest#testProcessWithUnAvailableInput is flaky

2021-08-03 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17392737#comment-17392737 ] Zhu Zhu commented on FLINK-23590: -

[jira] [Closed] (FLINK-23172) Links to Task Failure Recovery page on Configuration page are broken

2021-08-03 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-23172. --- Resolution: Fixed Fixed via 5183b2af9d467708725bd1454a671bc7689159a5

[jira] [Assigned] (FLINK-22767) Optimize the initialization of LocalInputPreferredSlotSharingStrategy

2021-08-03 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-22767: --- Assignee: Zhilong Hong > Optimize the initialization of LocalInputPreferredSlotSharingStrategy >

[jira] [Closed] (FLINK-23599) Remove JobVertex#connectIdInput

2021-08-03 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-23599. --- Resolution: Done Done via ec9ff1ee5e33529260d6a3adfad4b0b34efde55e > Remove JobVertex#connectIdInput >

[jira] [Assigned] (FLINK-23599) Remove JobVertex#connectIdInput

2021-08-03 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-23599: --- Assignee: Zhilong Hong > Remove JobVertex#connectIdInput > --- > >

[jira] [Commented] (FLINK-23593) Performance regression on 15.07.2021

2021-08-03 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17392091#comment-17392091 ] Zhu Zhu commented on FLINK-23593: - I'd like to understand why the regression happens due to FLINK-23372

[jira] [Closed] (FLINK-23354) Limit the size of ShuffleDescriptors in PermanentBlobCache on TaskExecutor

2021-07-29 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-23354. --- Resolution: Done Done via 5c475d41fea3c81557e0d463bed1c94024dd0da5 > Limit the size of ShuffleDescriptors
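FLINK-23354 is about bounding how much space cached ShuffleDescriptors take in the PermanentBlobCache on the TaskExecutor. One generic way to enforce such a bound is least-recently-used eviction by total byte size. The sketch below assumes that policy and uses a hypothetical `SizeLimitedBlobCache` class, so it should not be read as the eviction logic Flink actually adopted.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: a cache bounded by the total byte size of its entries (LRU eviction). */
final class SizeLimitedBlobCache<K> {

    private final long maxTotalBytes;
    private long currentBytes;
    // accessOrder = true: iteration runs from least to most recently accessed entry.
    private final LinkedHashMap<K, byte[]> entries = new LinkedHashMap<>(16, 0.75f, true);

    SizeLimitedBlobCache(long maxTotalBytes) {
        this.maxTotalBytes = maxTotalBytes;
    }

    /** Returns the cached bytes or null; a hit marks the entry as recently used. */
    byte[] get(K key) {
        return entries.get(key);
    }

    void put(K key, byte[] value) {
        byte[] previous = entries.put(key, value);
        if (previous != null) {
            currentBytes -= previous.length;
        }
        currentBytes += value.length;
        evictIfNeeded();
    }

    // Drops least recently used entries until the total size fits the limit again.
    private void evictIfNeeded() {
        Iterator<Map.Entry<K, byte[]>> it = entries.entrySet().iterator();
        while (currentBytes > maxTotalBytes && it.hasNext()) {
            Map.Entry<K, byte[]> eldest = it.next();
            currentBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    long currentBytes() {
        return currentBytes;
    }

    /** Membership check that does not change the access order. */
    boolean contains(K key) {
        return entries.containsKey(key);
    }
}
```

With a 10-byte limit, inserting three 4-byte blobs evicts whichever of the first two was used least recently.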

[jira] [Commented] (FLINK-23172) Links of restart strategy in configuration page is broken

2021-07-28 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17389206#comment-17389206 ] Zhu Zhu commented on FLINK-23172: - Thanks for reporting this problem! [~Thesharing] I have assigned you

[jira] [Assigned] (FLINK-23172) Links of restart strategy in configuration page is broken

2021-07-28 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-23172: --- Assignee: Zhilong Hong > Links of restart strategy in configuration page is broken >

[jira] [Assigned] (FLINK-22773) Optimize the construction of pipelined regions

2021-07-28 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-22773: --- Assignee: Zhilong Hong > Optimize the construction of pipelined regions >

[jira] [Updated] (FLINK-22773) Optimize the construction of pipelined regions

2021-07-28 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22773: Fix Version/s: 1.14.0 > Optimize the construction of pipelined regions >

[jira] [Assigned] (FLINK-23354) Limit the size of ShuffleDescriptors in PermanentBlobCache on TaskExecutor

2021-07-28 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-23354: --- Assignee: Zhilong Hong > Limit the size of ShuffleDescriptors in PermanentBlobCache on

[jira] [Commented] (FLINK-23402) Expose a consistent GlobalDataExchangeMode

2021-07-28 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1730#comment-1730 ] Zhu Zhu commented on FLINK-23402: - +1 for option #2 to rename {{ShuffleMode}} as well as the

[jira] [Closed] (FLINK-23005) Cache the compressed serialized value of ShuffleDescriptors

2021-07-26 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-23005. --- Resolution: Done Done via a3f72f20acd4df1dbdc61e145d1d932f61ca63f8 6812d18c358ce007a5cbcd685f32f59c70b03a49
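FLINK-23005 caches the compressed serialized value of ShuffleDescriptors. The sketch below shows the idea under two assumptions: plain Java serialization and `java.util.zip` deflate compression. The serialization and compression cost is paid once, and the same bytes are then reused for every consumer instead of being re-serialized each time. `CachedCompressedValue` is a hypothetical name, not Flink's class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical sketch: serialize and compress a value once, then reuse the cached bytes. */
final class CachedCompressedValue<T extends Serializable> {

    private final T value;
    private byte[] cachedBytes; // computed lazily, at most once

    CachedCompressedValue(T value) {
        this.value = value;
    }

    /** Returns the compressed serialized form; callers must not modify the returned array. */
    synchronized byte[] getCompressedBytes() throws IOException {
        if (cachedBytes == null) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            // Closing the ObjectOutputStream also finishes the deflate stream.
            try (ObjectOutputStream out = new ObjectOutputStream(new DeflaterOutputStream(bytes))) {
                out.writeObject(value);
            }
            cachedBytes = bytes.toByteArray();
        }
        return cachedBytes;
    }

    /** Receiver side: decompress and deserialize. */
    @SuppressWarnings("unchecked")
    static <V> V decompress(byte[] compressed) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
            return (V) in.readObject();
        }
    }
}
```

Because the bytes are computed once and shared, repeated calls return the same array. That is why the sketch asks callers not to mutate it.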

[jira] [Assigned] (FLINK-23218) Distribute the ShuffleDescriptors via blob server

2021-07-26 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-23218: --- Assignee: Zhilong Hong > Distribute the ShuffleDescriptors via blob server >

[jira] [Closed] (FLINK-23218) Distribute the ShuffleDescriptors via blob server

2021-07-26 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-23218. --- Resolution: Done Done via ee7e9c3b87f6533d6f54361fddc71585d6b8ad61 > Distribute the ShuffleDescriptors via

[jira] [Commented] (FLINK-23479) IncrementalAggregateJsonPlanTest.testIncrementalAggregateWithSumCountDistinctAndRetraction fail

2021-07-26 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17387460#comment-17387460 ] Zhu Zhu commented on FLINK-23479: - another instance:

[jira] [Commented] (FLINK-23470) Use blocking shuffles but pipeline within a slot for batch mode

2021-07-23 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386107#comment-17386107 ] Zhu Zhu commented on FLINK-23470: - Sorry I did not see the discussion in FLINK-23402 until I noticed

[jira] [Commented] (FLINK-23218) Distribute the ShuffleDescriptors via blob server

2021-07-22 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17385644#comment-17385644 ] Zhu Zhu commented on FLINK-23218: - Thanks for confirming! [~trohrmann] And thanks for the explanation

[jira] [Commented] (FLINK-23218) Distribute the ShuffleDescriptors via blob server

2021-07-21 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17385234#comment-17385234 ] Zhu Zhu commented on FLINK-23218: - Just summarize the investigation and discussion, a common blob cache

[jira] [Comment Edited] (FLINK-23218) Distribute the ShuffleDescriptors via blob server

2021-07-21 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384876#comment-17384876 ] Zhu Zhu edited comment on FLINK-23218 at 7/21/21, 1:02 PM: --- I took a look at

[jira] [Commented] (FLINK-23218) Distribute the ShuffleDescriptors via blob server

2021-07-21 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384876#comment-17384876 ] Zhu Zhu commented on FLINK-23218: - I took a look at the code and looks to me that transient blobs are

[jira] [Commented] (FLINK-23218) Distribute the ShuffleDescriptors via blob server

2021-07-21 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384868#comment-17384868 ] Zhu Zhu commented on FLINK-23218: - I think the PR mentioned above should be

[jira] [Closed] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion

2021-07-19 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-22677. --- Resolution: Done Done via 0d099b79fddc5e254884e44f2167c625744079a4 0b28fadccfb6b0d2a85592ced9e98b03a0c2d3bf

[jira] [Assigned] (FLINK-22672) Some enhancements for pluggable shuffle service framework

2021-07-18 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-22672: --- Assignee: Jin Xing > Some enhancements for pluggable shuffle service framework >

[jira] [Closed] (FLINK-22676) The partition tracker should support remote shuffle properly

2021-07-18 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-22676. --- Resolution: Done Done via 62a342b647fc1eac7f87769be92fda798649d6d4 > The partition tracker should support

[jira] [Updated] (FLINK-22676) The partition tracker should support remote shuffle properly

2021-07-18 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22676: Affects Version/s: (was: 1.4) 1.14.0 > The partition tracker should support

[jira] [Updated] (FLINK-22676) The partition tracker should support remote shuffle properly

2021-07-18 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22676: Fix Version/s: 1.14.0 > The partition tracker should support remote shuffle properly >

[jira] [Assigned] (FLINK-22676) The partition tracker should support remote shuffle properly

2021-07-18 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-22676: --- Assignee: Jin Xing > The partition tracker should support remote shuffle properly >

[jira] [Updated] (FLINK-22676) The partition tracker should support remote shuffle properly

2021-07-18 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22676: Affects Version/s: 1.4 > The partition tracker should support remote shuffle properly >

[jira] [Issue Comment Deleted] (FLINK-22017) Regions may never be scheduled when there are cross-region blocking edges

2021-07-15 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22017: Comment: was deleted (was: I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I

[jira] [Issue Comment Deleted] (FLINK-22017) Regions may never be scheduled when there are cross-region blocking edges

2021-07-15 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22017: Comment: was deleted (was: This critical issue is unassigned and itself and all of its Sub-Tasks have

[jira] [Issue Comment Deleted] (FLINK-22017) Regions may never be scheduled when there are cross-region blocking edges

2021-07-15 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22017: Comment: was deleted (was: I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I

[jira] [Issue Comment Deleted] (FLINK-22017) Regions may never be scheduled when there are cross-region blocking edges

2021-07-15 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22017: Comment: was deleted (was: This issue was labeled "stale-critical" 7 ago and has not received any

[jira] [Closed] (FLINK-22017) Regions may never be scheduled when there are cross-region blocking edges

2021-07-15 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-22017. --- Assignee: Zhilong Hong Resolution: Fixed Fixed via d2005268b1eeb0fe928b69c5e56ca54862fbf508

[jira] [Commented] (FLINK-23218) Distribute the ShuffleDescriptors via blob server

2021-07-09 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377963#comment-17377963 ] Zhu Zhu commented on FLINK-23218: - I took another think and 10GB sounds good to me now. If we always

[jira] [Commented] (FLINK-23218) Distribute the ShuffleDescriptors via blob server

2021-07-09 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377961#comment-17377961 ] Zhu Zhu commented on FLINK-23218: - 10GB looks a bit too large to limit the blob size by default. If we

[jira] [Commented] (FLINK-23218) Distribute the ShuffleDescriptors via blob server

2021-07-08 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377811#comment-17377811 ] Zhu Zhu commented on FLINK-23218: - 1. To not affect existing users, I prefer limit to not be too small

[jira] [Closed] (FLINK-15031) Automatically calculate required network memory for fine-grained jobs

2021-07-06 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-15031. --- Fix Version/s: (was: 1.12.0) 1.14.0 Resolution: Fixed Done via

[jira] [Commented] (FLINK-23262) FileReadingWatermarkITCase.testWatermarkEmissionWithChaining fails on azure

2021-07-06 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375376#comment-17375376 ] Zhu Zhu commented on FLINK-23262: - another instance:

[jira] [Commented] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion

2021-07-05 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375218#comment-17375218 ] Zhu Zhu commented on FLINK-22677: - One thing need to mention is that I did not change

[jira] [Commented] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion

2021-07-05 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375217#comment-17375217 ] Zhu Zhu commented on FLINK-22677: - Problems below could happen if enabling partition registration is

[jira] [Assigned] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion

2021-07-02 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-22677: --- Assignee: Zhu Zhu > Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real

[jira] [Updated] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion

2021-07-02 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22677: Affects Version/s: 1.14.0 > Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real

[jira] [Updated] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion

2021-07-02 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22677: Fix Version/s: 1.14.0 > Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real >
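FLINK-22677 asks the scheduler to treat `ShuffleMaster#registerPartitionWithProducer` as truly asynchronous. The sketch below assumes each registration returns a `CompletableFuture`. It starts every registration, combines the futures with `CompletableFuture.allOf`, and continues only in a callback, so the calling thread never blocks on `get()`. `AsyncRegistrationSketch` and the `register` function are hypothetical stand-ins for the scheduler and the shuffle master.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical sketch: register partitions asynchronously and continue only when all are done. */
final class AsyncRegistrationSketch {

    /**
     * Starts one registration per partition and returns a future holding all results in input
     * order. The calling thread is never blocked.
     */
    static <P, D> CompletableFuture<List<D>> registerAll(
            List<P> partitions, Function<P, CompletableFuture<D>> register) {
        List<CompletableFuture<D>> futures = new ArrayList<>();
        for (P partition : partitions) {
            futures.add(register.apply(partition)); // kick off every registration up front
        }
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> {
                    List<D> results = new ArrayList<>();
                    for (CompletableFuture<D> f : futures) {
                        results.add(f.join()); // already completed here, so join() does not block
                    }
                    return results;
                });
    }
}
```

A caller would then chain the next step, such as deployment, with `thenAccept` on the returned future rather than waiting for it.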

[jira] [Closed] (FLINK-22945) StackOverflowException can happen when a large scale job is CANCELING/FAILING

2021-06-30 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-22945. --- Resolution: Fixed Fixed via: master: 5badc356abdcbb3d5cae1fe3f00f1ec18f414d98 1.13:

[jira] [Updated] (FLINK-23172) Links of restart strategy in configuration page is broken

2021-06-29 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-23172: Priority: Major (was: Minor) > Links of restart strategy in configuration page is broken >

[jira] [Updated] (FLINK-23172) Links of restart strategy in configuration page is broken

2021-06-29 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-23172: Issue Type: Bug (was: Technical Debt) > Links of restart strategy in configuration page is broken >

[jira] [Closed] (FLINK-23078) Scheduler Benchmarks not compiling

2021-06-29 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-23078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu closed FLINK-23078. --- Resolution: Fixed Fixed via flink: 439dbfa48122df164780f55da2cb05f64669a247

[jira] [Comment Edited] (FLINK-15031) Automatically calculate required network memory for fine-grained jobs

2021-06-29 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371290#comment-17371290 ] Zhu Zhu edited comment on FLINK-15031 at 6/29/21, 10:52 AM: Discussed with

[jira] [Commented] (FLINK-15031) Automatically calculate required network memory for fine-grained jobs

2021-06-29 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371290#comment-17371290 ] Zhu Zhu commented on FLINK-15031: - Discussed with Till offline. His concern was that the network

[jira] [Comment Edited] (FLINK-15031) Automatically calculate required network memory for fine-grained jobs

2021-06-29 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371207#comment-17371207 ] Zhu Zhu edited comment on FLINK-15031 at 6/29/21, 8:18 AM: --- I think it should

[jira] [Commented] (FLINK-15031) Automatically calculate required network memory for fine-grained jobs

2021-06-29 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371207#comment-17371207 ] Zhu Zhu commented on FLINK-15031: - I think it should be an advanced and experimental config. It can be

[jira] [Commented] (FLINK-15031) Automatically calculate required network memory for fine-grained jobs

2021-06-28 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370577#comment-17370577 ] Zhu Zhu commented on FLINK-15031: - Thanks for reviving this discussion! This improvement is necessary

[jira] [Updated] (FLINK-15031) Automatically calculate required network memory for fine-grained jobs

2021-06-28 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-15031: Summary: Automatically calculate required network memory for fine-grained jobs (was: Automatically

[jira] [Reopened] (FLINK-15031) Automatically calculate required shuffle memory for fine-grained jobs

2021-06-28 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reopened FLINK-15031: - Assignee: Jin Xing (was: Zhu Zhu) > Automatically calculate required shuffle memory for fine-grained

[jira] [Updated] (FLINK-15031) Automatically calculate required shuffle memory for fine-grained jobs

2021-06-28 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-15031: Summary: Automatically calculate required shuffle memory for fine-grained jobs (was: Calculate required

[jira] [Updated] (FLINK-22945) StackOverflowException can happen when a large scale job is CANCELING/FAILING

2021-06-28 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu updated FLINK-22945: Fix Version/s: 1.13.2 1.14.0 > StackOverflowException can happen when a large scale

[jira] [Assigned] (FLINK-22945) StackOverflowException can happen when a large scale job is CANCELING/FAILING

2021-06-28 Thread Zhu Zhu (Jira)
[ https://issues.apache.org/jira/browse/FLINK-22945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhu Zhu reassigned FLINK-22945: --- Assignee: Gen Luo (was: Luo Gen) > StackOverflowException can happen when a large scale job is

<    4   5   6   7   8   9   10   11   12   13   >