Re: ExecutionMode in ExecutionConfig

Yun Tang Fri, 16 Sep 2022 04:41:51 -0700

Hi Hailu,

If you take a look at the history of ExecutionMode [1], apart from the 
refactoring commit, this class was introduced before 2016, when the 
DataSet API had not yet been deprecated.


From my point of view, you should currently set the runtime mode [2] instead of 
the execution mode if you are using Flink as a computation engine.


[1] 
https://github.com/apache/flink/commits/master/flink-core/src/main/java/org/apache/flink/api/common/ExecutionMode.java
[2] 
https://github.com/apache/flink/blob/9d2ae5572897f3e2d9089414261a250cfc2a2ab8/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java#L98

Best
Yun Tang

________________________________
From: zhanghao.c...@outlook.com <zhanghao.c...@outlook.com>
Sent: Thursday, September 15, 2022 0:03
To: Hailu, Andreas <andreas.ha...@gs.com>; user@flink.apache.org 
<user@flink.apache.org>
Subject: Re: ExecutionMode in ExecutionConfig

It was added in Flink 1.14: 
https://nightlies.apache.org/flink/flink-docs-master/zh/release-notes/flink-1.14/#expose-a-consistent-globaldataexchangemode
I'm not sure whether there is a way to change this in 1.13.

Best,
Zhanghao Chen
________________________________
From: Hailu, Andreas <andreas.ha...@gs.com>
Sent: Wednesday, September 14, 2022 23:38
To: zhanghao.c...@outlook.com <zhanghao.c...@outlook.com>; 
user@flink.apache.org <user@flink.apache.org>
Subject: RE: ExecutionMode in ExecutionConfig


I can give this a try. Do you know which Flink version this feature becomes 
available in?



ah



From: zhanghao.c...@outlook.com <zhanghao.c...@outlook.com>
Sent: Wednesday, September 14, 2022 11:10 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; 
user@flink.apache.org
Subject: Re: ExecutionMode in ExecutionConfig



Could you try setting 'execution.batch-shuffle-mode' = 'ALL_EXCHANGES_PIPELINED'? 
It looks like the ExecutionMode in ExecutionConfig has no effect on the 
DataStream API.



The default shuffling behavior for the DataStream API in batch mode is 
'ALL_EXCHANGES_BLOCKING', where upstream and downstream tasks run one after 
the other. In pipelined mode, on the other hand, upstream and downstream tasks 
run simultaneously.
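
As a rough sketch, the setting above could be placed in the Flink configuration file as follows. This assumes Flink 1.14 or later (where 'execution.batch-shuffle-mode' is available) and that the job should run in batch runtime mode; the 'execution.runtime-mode' line is included only for illustration and is an assumption about the setup, not something that has to live in the config file.

```
# Assumption: all sources are bounded, so the job can run with batch semantics.
execution.runtime-mode: BATCH

# Run upstream and downstream tasks at the same time instead of one after the
# other (the default for batch is ALL_EXCHANGES_BLOCKING).
execution.batch-shuffle-mode: ALL_EXCHANGES_PIPELINED
```

Note that with pipelined exchanges the connected tasks need to be running at the same time, so the job will likely need more slots available at once than with blocking exchanges.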





Best,

Zhanghao Chen

________________________________

From: Hailu, Andreas <andreas.ha...@gs.com>
Sent: Wednesday, September 14, 2022 21:37
To: zhanghao.c...@outlook.com <zhanghao.c...@outlook.com>; 
user@flink.apache.org <user@flink.apache.org>
Subject: RE: ExecutionMode in ExecutionConfig



Hi Zhanghao,



That seems different from what I’m referencing, and it is one of my points of 
confusion: the documentation refers to the execution mode as BATCH and 
STREAMING, whereas the code calls this the runtime mode, e.g. 
env.setRuntimeMode(RuntimeExecutionMode.BATCH);



I’m referring to the ExecutionMode in the ExecutionConfig, e.g. 
env.getConfig().setExecutionMode(ExecutionMode.BATCH) or 
env.getConfig().setExecutionMode(ExecutionMode.PIPELINED). I’m not able to find 
documentation on this anywhere.







ah



From: zhanghao.c...@outlook.com <zhanghao.c...@outlook.com>
Sent: Wednesday, September 14, 2022 1:10 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; 
user@flink.apache.org
Subject: Re: ExecutionMode in ExecutionConfig



https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/datastream/execution_mode/
gives a comprehensive description of it.






Best,

Zhanghao Chen

________________________________

From: Hailu, Andreas <andreas.ha...@gs.com>
Sent: Wednesday, September 14, 2022 7:13
To: user@flink.apache.org <user@flink.apache.org>
Subject: ExecutionMode in ExecutionConfig



Hello,



Is there somewhere I can learn more about the details of the effect of 
ExecutionMode in ExecutionConfig on a job? I am trying to sort out some of the 
details, as it seems to work differently between the DataStream API and the 
deprecated DataSet API.



I’ve attached a picture of this job graph - I’m reading from a total of 3 data 
sources – the results of 2 are sent to CoGroup (orange rectangle), and the 
other has its records forwarded to a sink after some basic filter + map 
operations (red rectangle).



The DataSet API’s job graph has all of the operators RUNNING immediately as we 
desire. However, the DataStream API’s job graph only has the DataSource 
operators that are feeding into the CoGroup online, and the remaining operators 
wake up only when the 2 sources have completed. This winds up introducing a lot 
of latency in processing the batch.



Both of these are running in the same environment on the same data with 
identical ExecutionMode configs, just different APIs. I’m attempting to have 
the same behavior between them. I ask about ExecutionMode as I am able to 
replicate this behavior in DataSet by setting the ExecutionMode from the 
default of PIPELINED to BATCH.



Thanks!



best,

ah





________________________________

Your Personal Data: We may collect and process information about you that may 
be subject to data protection laws. For more information about how we use and 
disclose your personal data, how we protect your information, our legal basis 
to use your information, your rights and who you can contact, please refer to: 
www.gs.com/privacy-notices<http://www.gs.com/privacy-notices>



