date:20210430



flinkbot edited a comment on pull request #15821:
URL: https://github.com/apache/flink/pull/15821#issuecomment-830427889


   
   ## CI report:
   
   * 701b87adda8ea30306cf1d99e7aea84f1ded716a Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17478)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] fangyuefy commented on pull request #15817: [FLINK-14393][webui] Add an option to enable/disable cancel job in we…



fangyuefy commented on pull request #15817:
URL: https://github.com/apache/flink/pull/15817#issuecomment-830530227


   > Thanks for creating this PR @fangyuefy. I've tested it, and it works. 
Nicely done. I will address my comments while merging this PR.
   
   Thanks @tillrohrmann  for reviewing the code, I fully agree with the 
modifications and suggestions. This is my first community contribution, what 
else do I need to do to make this PR merge?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15821: [FLINK-21911][table] Add built-in greatest/least functions support



flinkbot edited a comment on pull request #15821:
URL: https://github.com/apache/flink/pull/15821#issuecomment-830427889


   
   ## CI report:
   
   * 701b87adda8ea30306cf1d99e7aea84f1ded716a Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17478)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot commented on pull request #15821: [FLINK-21911][table] Add built-in greatest/least functions support



flinkbot commented on pull request #15821:
URL: https://github.com/apache/flink/pull/15821#issuecomment-830427889


   
   ## CI report:
   
   * 701b87adda8ea30306cf1d99e7aea84f1ded716a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot commented on pull request #15821: [FLINK-21911][table] Add built-in greatest/least functions support



flinkbot commented on pull request #15821:
URL: https://github.com/apache/flink/pull/15821#issuecomment-830424683


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 701b87adda8ea30306cf1d99e7aea84f1ded716a (Fri Apr 30 
22:04:07 UTC 2021)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer of PMC member is required Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] snuyanzin opened a new pull request #15821: [FLINK-21911][table] Add built-in greatest/least functions support



snuyanzin opened a new pull request #15821:
URL: https://github.com/apache/flink/pull/15821


   
   
   ## What is the purpose of the change
   
   The PR implements three functions from the list mentioned in FLINK-21911 : 
`GREATEST`, `LEAST`
   
   ## Brief change log
   
   Implementation, tests and docs for `GREATEST`, `LEAST`
   
   
   ## Verifying this change
   
   - `MiscFunctionsITCase`
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: yes
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? yes
 - If yes, how is the feature documented? docs / JavaDocs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (FLINK-21911) Support GREATEST and LEAST functions in SQL

2021-04-30 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-21911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-21911:
---
Labels: pull-request-available  (was: )

> Support GREATEST and LEAST functions in SQL
> ---
>
> Key: FLINK-21911
> URL: https://issues.apache.org/jira/browse/FLINK-21911
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Reporter: Timo Walther
>Assignee: Sergey Nuyanzin
>Priority: Major
>  Labels: pull-request-available
>
> We should discuss if we want to support math MIN / MAX in Flink SQL. It seems 
> also other vendors do not support it out of the box:
> [https://stackoverflow.com/questions/124417/is-there-a-max-function-in-sql-server-that-takes-two-values-like-math-max-in-ne]
> But it might be quite useful and a common operation in JVM languages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [flink] fdalsotto commented on a change in pull request #15813: [FLINK-22519][flink-python] support tar.gz python archives



fdalsotto commented on a change in pull request #15813:
URL: https://github.com/apache/flink/pull/15813#discussion_r624232556



##
File path: 
flink-python/src/main/java/org/apache/flink/python/env/beam/ProcessPythonEnvironmentManager.java
##
@@ -326,9 +327,21 @@ private void constructArchivesDirectory(Map env) throws IOExcept
 
 // extract archives to archives directory
 for (Map.Entry entry : 
dependencyInfo.getArchives().entrySet()) {
-ZipUtils.extractZipFileWithPermissions(
-entry.getKey(),
-String.join(File.separator, archivesDirectory, 
entry.getValue()));
+String filePath = entry.getKey();
+if (filePath.endsWith(".zip") || filePath.endsWith(".jar")) {

Review comment:
   that'd be extra effort to only support windows installations for a very 
low number of users. not sure if I'd support this




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15768: [FLINK-22451][table] Support (*) as parameter of UDFs in Table API



flinkbot edited a comment on pull request #15768:
URL: https://github.com/apache/flink/pull/15768#issuecomment-826735938


   
   ## CI report:
   
   * dced81c2ffc8c59f9d9311346e71309129aa73cf Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17473)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15820: [FLINK-22535][runtime] CleanUp is invoked for task even when the task…



flinkbot edited a comment on pull request #15820:
URL: https://github.com/apache/flink/pull/15820#issuecomment-830052015


   
   ## CI report:
   
   * c11e6cb0c0817a53ef456462f610e3d079f590f9 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17472)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Comment Edited] (FLINK-10165) JarRunHandler/JarRunRequestBody should allow to pass program arguments as escaped json list

2021-04-30 Thread Matthias (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-10165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17202057#comment-17202057
 ] 

Matthias edited comment on FLINK-10165 at 4/30/21, 7:39 PM:


This issue was visited as part of the Engine team's backlog grooming on Sep 25, 
2020. The relevant code change is 
[JarRunHandler#106|https://github.com/apache/flink/commit/fefe866bad47b1c4a2f92eded19bc7a5059f1277#diff-f83b0a6d9d1ab75ac867244111bb6b15R106].
 It was moved into 
[JarHandlerUtils#207|https://github.com/apache/flink/blob/7381304930a964098df52d9fa79a55241538b301/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/utils/JarHandlerUtils.java#L207]


was (Author: mapohl):
This issue was visited as part of the Engine team's backlog grooming on Sep 25, 
2020. The relevant code change is 
[JarRunHandler#106https://github.com/apache/flink/commit/fefe866bad47b1c4a2f92eded19bc7a5059f1277#diff-f83b0a6d9d1ab75ac867244111bb6b15R106].
 It was moved into 
[JarHandlerUtils#207|https://github.com/apache/flink/blob/7381304930a964098df52d9fa79a55241538b301/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/utils/JarHandlerUtils.java#L207]

> JarRunHandler/JarRunRequestBody should allow to pass program arguments as 
> escaped json list
> ---
>
> Key: FLINK-10165
> URL: https://issues.apache.org/jira/browse/FLINK-10165
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST
>Affects Versions: 1.6.0, 1.10.2, 1.11.2
>Reporter: Maciej Prochniak
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> Currently program arguments are parsed from plain string: 
> [https://github.com/apache/flink/blob/master/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarRunHandler.java#L106]
> It doesn't allow to put quotes or new lines in arguments - in particular it's 
> difficult to pass json as argument. I think it would be good to pass 
> arguments as json list - then jackson would handle escaping. It'd be a bit 
> more problematic for query string parameters... WDYT [~Zentol]?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [flink] flinkbot edited a comment on pull request #15768: [FLINK-22451][table] Support (*) as parameter of UDFs in Table API



flinkbot edited a comment on pull request #15768:
URL: https://github.com/apache/flink/pull/15768#issuecomment-826735938


   
   ## CI report:
   
   * 11787dc2e0c372007bddc148e0d4aec2ed9a275f Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17470)
 
   * dced81c2ffc8c59f9d9311346e71309129aa73cf Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17473)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15789: [FLINK-21181][runtime] Wait for Invokable cancellation before releasing network resources



flinkbot edited a comment on pull request #15789:
URL: https://github.com/apache/flink/pull/15789#issuecomment-827996694


   
   ## CI report:
   
   * 4c5180310bf76e96f2665bf53531eccb1fa86421 UNKNOWN
   * 46e1be2c4832080bf1cb48c509b77cd88872d024 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17468)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (FLINK-19164) Release scripts break other dependency versions unintentionally

2021-04-30 Thread Serhat Soydan (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-19164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serhat Soydan updated FLINK-19164:
--
Labels:   (was: stale-assigned)

> Release scripts break other dependency versions unintentionally
> ---
>
> Key: FLINK-19164
> URL: https://issues.apache.org/jira/browse/FLINK-19164
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Scripts, Release System
>Reporter: Serhat Soydan
>Assignee: Serhat Soydan
>Priority: Minor
>
> All the scripts below has a line to change the old version to new version in 
> pom files.
> [https://github.com/apache/flink/blob/master/tools/change-version.sh#L31]
> [https://github.com/apache/flink/blob/master/tools/releasing/create_release_branch.sh#L60]
> [https://github.com/apache/flink/blob/master/tools/releasing/update_branch_version.sh#L52]
>  
> It works like find & replace so it is prone to unintentional errors. Any 
> dependency with a version equals to "old version" might be automatically 
> changed to "new version". See below to see how to produce a similar case. 
>  
> +How to re-produce the bug:+
>  * Clone/Fork Flink repo and for example checkout version v*1.11.1* 
>  * Apply any changes you need
>  * Run "create_release_branch.sh" script with OLD_VERSION=*1.11.1* 
> NEW_VERSION={color:#de350b}*1.12.0*{color}
>  ** In parent pom.xml, an auto find of maven-dependency-analyzer 
> version will be done automatically and *unintentionally* which will break the 
> build.
>  
> 
> org.apache.maven.shared
> maven-dependency-analyzer
> *1.11.1*
> 
>  
> 
> org.apache.maven.shared
> maven-dependency-analyzer
> {color:#de350b}*1.12.0*{color}
> 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted



[ 
https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337517#comment-17337517
 ] 

Paul Lin edited comment on FLINK-22506 at 4/30/21, 5:09 PM:


[~trohrmann] Thanks a lot for the input. Now I'm suspecting maybe the value of 
`yarn.application-attempt-failures-validity-interval` is too low (I'm using the 
default 10s), given that in my case a retry may take 1 min. I'll investigate 
further, and close the issue if it's a configuration problem. Thanks again! 


was (Author: paul lin):
[~trohrmann] Thanks a lot for the input. Now I'm suspecting maybe the value of 
`yarn.application-attempt-failures-validity-interval` is too low (I'm using the 
default), given that in my case a retry may take 1 min. I'll investigate 
further, and close the issue if it's a configuration problem. Thanks again! 

> YARN job cluster stuck in retrying creating JobManager if savepoint is 
> corrupted
> 
>
> Key: FLINK-22506
> URL: https://issues.apache.org/jira/browse/FLINK-22506
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.11.3
>Reporter: Paul Lin
>Priority: Major
> Attachments: corrupted_savepoint.log, yarn application attempts.png
>
>
> If a non-retryable error (e.g. the savepoint is corrupted or unaccessible) 
> occurs during the initiation of the job manager, the job cluster exits with 
> an error code. But since it does not mark the attempt as failed, it won't be 
> count as a failed attempt, and YARN will keep retrying forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted



[ 
https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337517#comment-17337517
 ] 

Paul Lin edited comment on FLINK-22506 at 4/30/21, 5:08 PM:


[~trohrmann] Thanks a lot for the input. Now I'm suspecting maybe the value of 
`yarn.application-attempt-failures-validity-interval` is too low (I'm using the 
default), given that in my case a retry may take 1 min. I'll investigate 
further, and close the issue if it's a configuration problem. Thanks again! 


was (Author: paul lin):
[~trohrmann] Thanks a lot for the input. Now I'm suspecting maybe the value of 
`yarn.application-attempt-failures-validity-interval` is low (I'm using the 
default), given that in my case a retry may take 1 min. I'll investigate 
further, and close the issue if it's a configuration problem. Thanks again! 

> YARN job cluster stuck in retrying creating JobManager if savepoint is 
> corrupted
> 
>
> Key: FLINK-22506
> URL: https://issues.apache.org/jira/browse/FLINK-22506
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.11.3
>Reporter: Paul Lin
>Priority: Major
> Attachments: corrupted_savepoint.log, yarn application attempts.png
>
>
> If a non-retryable error (e.g. the savepoint is corrupted or unaccessible) 
> occurs during the initiation of the job manager, the job cluster exits with 
> an error code. But since it does not mark the attempt as failed, it won't be 
> count as a failed attempt, and YARN will keep retrying forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted



[ 
https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337517#comment-17337517
 ] 

Paul Lin commented on FLINK-22506:
--

[~trohrmann] Thanks a lot for the input. Now I'm suspecting maybe the value of 
`yarn.application-attempt-failures-validity-interval` is low (I'm using the 
default), given that in my case a retry may take 1 min. I'll investigate 
further, and close the issue if it's a configuration problem. Thanks again! 

> YARN job cluster stuck in retrying creating JobManager if savepoint is 
> corrupted
> 
>
> Key: FLINK-22506
> URL: https://issues.apache.org/jira/browse/FLINK-22506
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.11.3
>Reporter: Paul Lin
>Priority: Major
> Attachments: corrupted_savepoint.log, yarn application attempts.png
>
>
> If a non-retryable error (e.g. the savepoint is corrupted or unaccessible) 
> occurs during the initiation of the job manager, the job cluster exits with 
> an error code. But since it does not mark the attempt as failed, it won't be 
> count as a failed attempt, and YARN will keep retrying forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-10644) Batch Job: Speculative execution



[ 
https://issues.apache.org/jira/browse/FLINK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337515#comment-17337515
 ] 

Till Rohrmann commented on FLINK-10644:
---

Sounds good to me [~wangwj]. Thanks for the update.

> Batch Job: Speculative execution
> 
>
> Key: FLINK-10644
> URL: https://issues.apache.org/jira/browse/FLINK-10644
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: JIN SUN
>Assignee: BoWang
>Priority: Major
>  Labels: stale-assigned
>
> Strugglers/outlier are tasks that run slower than most of the all tasks in a 
> Batch Job, this somehow impact job latency, as pretty much this straggler 
> will be in the critical path of the job and become as the bottleneck.
> Tasks may be slow for various reasons, including hardware degradation, or 
> software mis-configuration, or noise neighboring. It's hard for JM to predict 
> the runtime.
> To reduce the overhead of strugglers, other system such as Hadoop/Tez, Spark 
> has *_speculative execution_*. Speculative execution is a health-check 
> procedure that checks for tasks to be speculated, i.e. running slower in a 
> ExecutionJobVertex than the median of all successfully completed tasks in 
> that EJV, Such slow tasks will be re-submitted to another TM. It will not 
> stop the slow tasks, but run a new copy in parallel. And will kill the others 
> if one of them complete.
> This JIRA is an umbrella to apply this kind of idea in FLINK. Details will be 
> append later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted



[ 
https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337510#comment-17337510
 ] 

Till Rohrmann commented on FLINK-22506:
---

Ok, then I have misunderstood the ticket a bit. I thought that any application 
master failure would be handled as a failed attempt and counts towards the 
{{yarn.application-attempts}}. I don't think that we ever mark an Yarn attempt 
explicitly as failed. Hence, I thought that it should work with 
{{yarn.application-attempts}} and 
{{yarn.application-attempt-failures-validity-interval}}.

> YARN job cluster stuck in retrying creating JobManager if savepoint is 
> corrupted
> 
>
> Key: FLINK-22506
> URL: https://issues.apache.org/jira/browse/FLINK-22506
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.11.3
>Reporter: Paul Lin
>Priority: Major
> Attachments: corrupted_savepoint.log, yarn application attempts.png
>
>
> If a non-retryable error (e.g. the savepoint is corrupted or unaccessible) 
> occurs during the initiation of the job manager, the job cluster exits with 
> an error code. But since it does not mark the attempt as failed, it won't be 
> count as a failed attempt, and YARN will keep retrying forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [flink] flinkbot edited a comment on pull request #15768: [FLINK-22451][table] Support (*) as parameter of UDFs in Table API



flinkbot edited a comment on pull request #15768:
URL: https://github.com/apache/flink/pull/15768#issuecomment-826735938


   
   ## CI report:
   
   * 48814bd4d49167579110479e3f3cecabff6d443a Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17307)
 
   * 11787dc2e0c372007bddc148e0d4aec2ed9a275f Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17470)
 
   * dced81c2ffc8c59f9d9311346e71309129aa73cf Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17473)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Comment Edited] (FLINK-10644) Batch Job: Speculative execution



[ 
https://issues.apache.org/jira/browse/FLINK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337504#comment-17337504
 ] 

wangwj edited comment on FLINK-10644 at 4/30/21, 4:53 PM:
--

Hi, [~trohrmann]  [~wind_ljy]
I think the closest version of Flink to my Blink version I built this feature 
on maybe is 1.7 or 1.8
Though it seems a little far from the latest version of Flink, I found that the 
code which I want to modify is not much different from that after I read the 
code of Blink and Flink (master branch). So, I am confident to contribute this 
issue.  

I think the multi-threading in the ExecutionGraph is two executions finished at 
the same time in a same ExecutionVertex. ExecutionVertex.executionFinished() 
method may be called as two different execution at the same time. Maybe I call 
it "multi-threading" is not very accurate here.

How does the speculative execution play together with other sinks? Does it only 
work for the file based sinks? 
The speculative execution could also support sink to Key-value databases, such 
as Hologres, HBase etc.  In our scenario, the batch job usually writes data 
into Hologres (similar to HBase) or Pangu (similar to HDFS).  

  How does the blacklisting mechanism work? Does it work also for the K8s and 
Mesos integration or only for the Yarn integration?
The blacklist module is a thread that maintains the black machines of this job 
and removes expired elements periodically. Each element in blacklist contains 
IP and timestamp. The timestamp is used to decide whether the elements of the 
blacklist is expired or not.  
My code only supports Yarn integration. But as far as I know, we could use 
nodeaffinity or podaffinity to achieve the same goal with Yarn 
PlacementConstraint in K8s integration. As the mesos integration will be 
deprecated in Flink-1.13, I didn’t consider it.

How much is the change encapsulated by the SchedulerNG interface?
I agree with that the SchedulerNG interface is more or less self-contained, and 
I will consider your proposal carefully in FLIP and coding.  

  In the next step, I'll move on to figure out what changes are needed in Flink 
(master branch) then try to write a POC.
Then I will send e-mail to d...@flink.apache.org to discuss this feature.
Then I will write a FLIP and a vote on it.

Thanks




was (Author: wangwj):
Hi, [~trohrmann]  [~wind_ljy]
I think the closest version of Flink to my Blink version I built this feature 
on maybe is 1.7 or 1.8
Though it seems a little far from the latest version of Flink, I found that the 
code which I want to modify is not much different from that after I read the 
code of Blink and Flink (master branch). So, I am confident to contribute this 
issue.  

I think the multi-threading in the ExecutionGraph is two executions finished at 
the same time in a same ExecutionVertex. ExecutionVertex.executionFinished() 
method may be called as two different execution at the same time. Maybe I call 
it "multi-threading" is not very accurate here.

How does the speculative execution play together with other sinks? Does it only 
work for the file based sinks? 
The speculative execution could also support sink to Key-value databases, such 
as Hologres, HBase etc.  In our scenario, the batch job usually writes data 
into Hologres (similar to HBase) or Pangu (similar to HDFS).  

  How does the blacklisting mechanism work? Does it work also for the K8s and 
Mesos integration or only for the Yarn integration?
The blacklist module is a thread that maintains the black machines of this job 
and removes expired elements periodically. Each element in blacklist contains 
IP and timestamp. The timestamp is used to decide whether the elements of the 
blacklist is expired or not.  
My code only supports Yarn integration. But as far as I know, we could use 
nodeaffinity or podaffinity to achieve the same goal with Yarn 
PlacementConstraint in K8s integration. As the mesos integration will be 
deprecated in Flink-1.13, I didn’t consider it.

How much is the change encapsulated by the SchedulerNG interface?
I agree with that the SchedulerNG interface is more or less self-contained, and 
I will consider your proposal carefully in FLIP and coding.  

  In the next step, I'll move on to figure out what changes are needed in Flink 
(master branch) then write a POC.
Then I will send e-mail to d...@flink.apache.org to discuss this feature.
Then I will write a FLIP and a vote on it.

Thanks



> Batch Job: Speculative execution
> 
>
> Key: FLINK-10644
> URL: https://issues.apache.org/jira/browse/FLINK-10644
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: JIN SUN
>Assignee: BoWang
>Priority: Major
>  Labels: stale-assigned
>
> Strugglers/outlier are tasks

[jira] [Comment Edited] (FLINK-10644) Batch Job: Speculative execution



[ 
https://issues.apache.org/jira/browse/FLINK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337504#comment-17337504
 ] 

wangwj edited comment on FLINK-10644 at 4/30/21, 4:52 PM:
--

Hi, [~trohrmann]  [~wind_ljy]
I think the closest version of Flink to my Blink version I built this feature 
on maybe is 1.7 or 1.8
Though it seems a little far from the latest version of Flink, I found that the 
code which I want to modify is not much different from that after I read the 
code of Blink and Flink (master branch). So, I am confident to contribute this 
issue.  

I think the multi-threading in the ExecutionGraph is two executions finished at 
the same time in a same ExecutionVertex. ExecutionVertex.executionFinished() 
method may be called as two different execution at the same time. Maybe I call 
it "multi-threading" is not very accurate here.

How does the speculative execution play together with other sinks? Does it only 
work for the file based sinks? 
The speculative execution could also support sink to Key-value databases, such 
as Hologres, HBase etc.  In our scenario, the batch job usually writes data 
into Hologres (similar to HBase) or Pangu (similar to HDFS).  

  How does the blacklisting mechanism work? Does it work also for the K8s and 
Mesos integration or only for the Yarn integration?
The blacklist module is a thread that maintains the black machines of this job 
and removes expired elements periodically. Each element in blacklist contains 
IP and timestamp. The timestamp is used to decide whether the elements of the 
blacklist is expired or not.  
My code only supports Yarn integration. But as far as I know, we could use 
nodeaffinity or podaffinity to achieve the same goal with Yarn 
PlacementConstraint in K8s integration. As the mesos integration will be 
deprecated in Flink-1.13, I didn’t consider it.

How much is the change encapsulated by the SchedulerNG interface?
I agree with that the SchedulerNG interface is more or less self-contained, and 
I will consider your proposal carefully in FLIP and coding.  

  In the next step, I'll move on to figure out what changes are needed in Flink 
(master branch) then write a POC.
Then I will send e-mail to d...@flink.apache.org to discuss this feature.
Then I will write a FLIP and a vote on it.

Thanks




was (Author: wangwj):
Hi, [~trohrmann]  [~wind_ljy]
The closest version of Flink to my Blink version I built this feature on is 
1.5.1
Though it seems a little far from the latest version of Flink, I found that the 
code which I want to modify is not much different from that after I read the 
code of Blink and Flink (master branch). So, I am confident to contribute this 
issue.  

I think the multi-threading in the ExecutionGraph is two executions finished at 
the same time in a same ExecutionVertex. ExecutionVertex.executionFinished() 
method may be called as two different execution at the same time. Maybe I call 
it "multi-threading" is not very accurate here.

How does the speculative execution play together with other sinks? Does it only 
work for the file based sinks? 
The speculative execution could also support sink to Key-value databases, such 
as Hologres, HBase etc.  In our scenario, the batch job usually writes data 
into Hologres (similar to HBase) or Pangu (similar to HDFS).  

  How does the blacklisting mechanism work? Does it work also for the K8s and 
Mesos integration or only for the Yarn integration?
The blacklist module is a thread that maintains the black machines of this job 
and removes expired elements periodically. Each element in blacklist contains 
IP and timestamp. The timestamp is used to decide whether the elements of the 
blacklist is expired or not.  
My code only supports Yarn integration. But as far as I know, we could use 
nodeaffinity or podaffinity to achieve the same goal with Yarn 
PlacementConstraint in K8s integration. As the mesos integration will be 
deprecated in Flink-1.13, I didn’t consider it.

How much is the change encapsulated by the SchedulerNG interface?
I agree with that the SchedulerNG interface is more or less self-contained, and 
I will consider your proposal carefully in FLIP and coding.  

  In the next step, I'll move on to figure out what changes are needed in Flink 
(master branch) then write a POC.
Then I will send e-mail to d...@flink.apache.org to discuss this feature.
Then I will write a FLIP and a vote on it.

Thanks



> Batch Job: Speculative execution
> 
>
> Key: FLINK-10644
> URL: https://issues.apache.org/jira/browse/FLINK-10644
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: JIN SUN
>Assignee: BoWang
>Priority: Major
>  Labels: stale-assigned
>
> Strugglers/outlier are tasks that run slower than most

[jira] [Commented] (FLINK-10644) Batch Job: Speculative execution



[ 
https://issues.apache.org/jira/browse/FLINK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337504#comment-17337504
 ] 

wangwj commented on FLINK-10644:


Hi, [~trohrmann]  [~wind_ljy]
The closest version of Flink to my Blink version I built this feature on is 
1.5.1
Though it seems a little far from the latest version of Flink, I found that the 
code which I want to modify is not much different from that after I read the 
code of Blink and Flink (master branch). So, I am confident to contribute this 
issue.  

I think the multi-threading in the ExecutionGraph is two executions finished at 
the same time in a same ExecutionVertex. ExecutionVertex.executionFinished() 
method may be called as two different execution at the same time. Maybe I call 
it "multi-threading" is not very accurate here.

How does the speculative execution play together with other sinks? Does it only 
work for the file based sinks? 
The speculative execution could also support sink to Key-value databases, such 
as Hologres, HBase etc.  In our scenario, the batch job usually writes data 
into Hologres (similar to HBase) or Pangu (similar to HDFS).  

  How does the blacklisting mechanism work? Does it work also for the K8s and 
Mesos integration or only for the Yarn integration?
The blacklist module is a thread that maintains the black machines of this job 
and removes expired elements periodically. Each element in blacklist contains 
IP and timestamp. The timestamp is used to decide whether the elements of the 
blacklist is expired or not.  
My code only supports Yarn integration. But as far as I know, we could use 
nodeaffinity or podaffinity to achieve the same goal with Yarn 
PlacementConstraint in K8s integration. As the mesos integration will be 
deprecated in Flink-1.13, I didn’t consider it.

How much is the change encapsulated by the SchedulerNG interface?
I agree with that the SchedulerNG interface is more or less self-contained, and 
I will consider your proposal carefully in FLIP and coding.  

  In the next step, I'll move on to figure out what changes are needed in Flink 
(master branch) then write a POC.
Then I will send e-mail to d...@flink.apache.org to discuss this feature.
Then I will write a FLIP and a vote on it.

Thanks



> Batch Job: Speculative execution
> 
>
> Key: FLINK-10644
> URL: https://issues.apache.org/jira/browse/FLINK-10644
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Reporter: JIN SUN
>Assignee: BoWang
>Priority: Major
>  Labels: stale-assigned
>
> Strugglers/outlier are tasks that run slower than most of the all tasks in a 
> Batch Job, this somehow impact job latency, as pretty much this straggler 
> will be in the critical path of the job and become as the bottleneck.
> Tasks may be slow for various reasons, including hardware degradation, or 
> software mis-configuration, or noise neighboring. It's hard for JM to predict 
> the runtime.
> To reduce the overhead of strugglers, other system such as Hadoop/Tez, Spark 
> has *_speculative execution_*. Speculative execution is a health-check 
> procedure that checks for tasks to be speculated, i.e. running slower in a 
> ExecutionJobVertex than the median of all successfully completed tasks in 
> that EJV, Such slow tasks will be re-submitted to another TM. It will not 
> stop the slow tasks, but run a new copy in parallel. And will kill the others 
> if one of them complete.
> This JIRA is an umbrella to apply this kind of idea in FLINK. Details will be 
> append later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted



[ 
https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337499#comment-17337499
 ] 

Paul Lin edited comment on FLINK-22506 at 4/30/21, 4:42 PM:


[~knaufk] I've attached the jm logs and the screen shoot of YARN application 
web UI. Please take a look. I first reported the issue as a bug, because I 
think the max number of attempts (which is set to 2) is not respected in this 
case, but I'm fine with making it an improvement.


was (Author: paul lin):
[~knaufk] I've attached the jm logs and the screen shoot yarn application web 
UI. Please take a look. I first reported the issue as a bug, because I think 
the max number of attempts (which is set to 2) is not respected in this case, 
but I'm fine with making it an improvement.

> YARN job cluster stuck in retrying creating JobManager if savepoint is 
> corrupted
> 
>
> Key: FLINK-22506
> URL: https://issues.apache.org/jira/browse/FLINK-22506
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.11.3
>Reporter: Paul Lin
>Priority: Major
> Attachments: corrupted_savepoint.log, yarn application attempts.png
>
>
> If a non-retryable error (e.g. the savepoint is corrupted or unaccessible) 
> occurs during the initiation of the job manager, the job cluster exits with 
> an error code. But since it does not mark the attempt as failed, it won't be 
> count as a failed attempt, and YARN will keep retrying forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted



[ 
https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337499#comment-17337499
 ] 

Paul Lin commented on FLINK-22506:
--

[~knaufk] I've attached the jm logs and the screen shoot yarn application web 
UI. Please take a look. I first reported the issue as a bug, because I think 
the max number of attempts (which is set to 2) is not respected in this case, 
but I'm fine with making it an improvement.

> YARN job cluster stuck in retrying creating JobManager if savepoint is 
> corrupted
> 
>
> Key: FLINK-22506
> URL: https://issues.apache.org/jira/browse/FLINK-22506
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.11.3
>Reporter: Paul Lin
>Priority: Major
> Attachments: corrupted_savepoint.log, yarn application attempts.png
>
>
> If a non-retryable error (e.g. the savepoint is corrupted or unaccessible) 
> occurs during the initiation of the job manager, the job cluster exits with 
> an error code. But since it does not mark the attempt as failed, it won't be 
> count as a failed attempt, and YARN will keep retrying forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted



 [ 
https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Lin updated FLINK-22506:
-
Attachment: yarn application attempts.png

> YARN job cluster stuck in retrying creating JobManager if savepoint is 
> corrupted
> 
>
> Key: FLINK-22506
> URL: https://issues.apache.org/jira/browse/FLINK-22506
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.11.3
>Reporter: Paul Lin
>Priority: Major
> Attachments: corrupted_savepoint.log, yarn application attempts.png
>
>
> If a non-retryable error (e.g. the savepoint is corrupted or unaccessible) 
> occurs during the initiation of the job manager, the job cluster exits with 
> an error code. But since it does not mark the attempt as failed, it won't be 
> count as a failed attempt, and YARN will keep retrying forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted



 [ 
https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Lin updated FLINK-22506:
-
Attachment: corrupted_savepoint.log

> YARN job cluster stuck in retrying creating JobManager if savepoint is 
> corrupted
> 
>
> Key: FLINK-22506
> URL: https://issues.apache.org/jira/browse/FLINK-22506
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.11.3
>Reporter: Paul Lin
>Priority: Major
> Attachments: corrupted_savepoint.log
>
>
> If a non-retryable error (e.g. the savepoint is corrupted or unaccessible) 
> occurs during the initiation of the job manager, the job cluster exits with 
> an error code. But since it does not mark the attempt as failed, it won't be 
> count as a failed attempt, and YARN will keep retrying forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [flink] flinkbot edited a comment on pull request #15820: [FLINK-22535][runtime] CleanUp is invoked for task even when the task…



flinkbot edited a comment on pull request #15820:
URL: https://github.com/apache/flink/pull/15820#issuecomment-830052015


   
   ## CI report:
   
   * c548fc5c79450d898eebc1c6523be98472fe1cdd Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17467)
 
   * c11e6cb0c0817a53ef456462f610e3d079f590f9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17472)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15768: [FLINK-22451][table] Support (*) as parameter of UDFs in Table API



flinkbot edited a comment on pull request #15768:
URL: https://github.com/apache/flink/pull/15768#issuecomment-826735938


   
   ## CI report:
   
   * 48814bd4d49167579110479e3f3cecabff6d443a Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17307)
 
   * 11787dc2e0c372007bddc148e0d4aec2ed9a275f Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17470)
 
   * dced81c2ffc8c59f9d9311346e71309129aa73cf UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15131: [FLINK-21700][security] Add an option to disable credential retrieval on a secure cluster



flinkbot edited a comment on pull request #15131:
URL: https://github.com/apache/flink/pull/15131#issuecomment-794957907


   
   ## CI report:
   
   * 8cb12deac019898bb69f7a5faf7f803dd27d71d0 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17465)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Comment Edited] (FLINK-10644) Batch Job: Speculative execution

[
https://issues.apache.org/jira/browse/FLINK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334427#comment-17334427
]

wangwj edited comment on FLINK-10644 at 4/30/21, 4:15 PM:
--

[~trohrmann]
Hi Till.

I am from the search and recommendation department of Alibaba in China. Happy
to share and discuss my job here.
Our big data processing platform uses Flink Batch to process extremely huge
data every day. Many long-tail tasks are produced everyday and we have to kill
these processes manually, which leads to a poor user experience. So I tried to
solve this problem.

I think that speculative execution means that two executions in a
ExecutionVertex running at a same time, and failover means that two executions
running at two different time. Based on this, I think this feature(speculative
execution) is theoretically achievable. So, I have implemented a speculative
execution for batch job based on Blink, and it had a significant effect in our
product cluster.

I did as follows:

（1）Which kind of ExecutionJobVertex is suitable enable speculative execution
feature in a batch job?
The speculative execution feature correlates with the implementation details of
the region failover. So, I found that a ExecutionJobVertex will enable
speculative execution feature only if all input edges and output edges of this
ExecutionJobVertex are blocking(Condition A).

（2）How to distinguish long-tail task？
I distinguish long-tail task based on the intervals between the current time
and the execution first create/deploying time before it failover. When an
ExecutionJobVertex meets Condition A and a configurable percentage of
executions has been finished in the ExecutionJobVertex, the speculative
execution thread starts to really work. In the ExecutionJobVertex, when the
running time of a execution is greater than a configurable multiple of the
median of the running time of other finished executions, this execution is
defined as long-tail execution.(Condition B)

（3）How to make the speculative execution algorithm more precise?
Baesd on the speculative execution algorithm in Condition B, I solved the
problem of long-tail tasks in our product cluster.
In the next step, we may add the throughput of the task to the speculative
execution algorithm through the heartbeat of TaskManagers with JobManager.

（4）How to schedule another execution in a same ExecutionVertex?
We have changed the currentExecution in ExecutionVertex to a list, which means
that there can be multiple executions in an ExecutionVertex at the same time.
Then we reuse the current scheduling logic to schedule the speculative
execution.

（5）How to make the speculative task run on another machine from the original
execution.
We have implemented a machine-dimensional blacklist per job. The machine IP was
added in the blacklist when an execution is recognized as a long-tail
execution. The blacklist would remove the machine IP when it is out of date.
When the executions are scheduled, we will add information of the blacklist to
yarn PlacementConstraint. In this way, I can ensure that the yarn container is
not on the machines in the blacklist.

（6）How to avoid errors when multiple executions finish at the same time in an
ExecutionVertex?
In ExecutionVertex executionFinished() method, multi-thread synchronization was
used to ensure that only one execution would successfully finished in an
ExecutionVertex. All the other executions will go to the cancellation logic.

（7）How to deal with multiple sink files in one ExecutionVertex when the job is
sink to files?
When batch job will sink to file, we will add an executionAttemptID suffix to
the file name.
Finally, I will delete or rename these files in finalizeOnMaster().
Here we should pay attention to the situation of flink stream job processing
bounded data sets.

（8）In batch job with all-to-all shuffle, how did the downstream original
execution and speculative execution select the ResultSubPartition of the
upstream executions?
Two executions of a upstream ExecutionVertex will produce two ResultPartitions.
When the upstream ExecutionVertex finished, we will update the inputChannel of
down stream execution to the fastest finished execution of upstream.
Here we should pay attention to the situation when the down stream execution
meet DataConsumptionException. It will restart with the upstream execution that
has been finished.

（9）How to display information about speculative task on the Flink web ui?
After I have implemented this feature. When speculative execution runs faster
then original execution, the flink ui will show that this task has been
cancelled. But the result of the job is correct, which is in full compliance
with our expectations.
I don’t know much about the web, I will ask my colleague for help.

[~trohrmann]
My implementation has played a big role in our product cluster in

[jira] [Comment Edited] (FLINK-22509) ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException



[ 
https://issues.apache.org/jira/browse/FLINK-22509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337484#comment-17337484
 ] 

Yang Wang edited comment on FLINK-22509 at 4/30/21, 4:10 PM:
-

[~knaufk] [~rmetzger] I agree with that we should deprecate {{\-m 
yarn-cluster}}, which also means the {{FlinkYarnSessionCli}}. Instead, we need 
to suggest our users to migrate to the client unified executor interface 
{{--target yarn-per-job}}.

However, I hesitate to deprecate it too soon(e.g. 1.13). The only reason why we 
have not deprecated it after introducing the unified executor interface is 
about the CLI compatibility. I am pretty sure a lot of companies are using the 
{{flink run}} to integrate with their deployers in production. Because we do 
not provide a very good client SDK or deploy interfaces. It will take a big 
burden and surprise for them if we do not let them know early enough.


was (Author: fly_in_gis):
[~knaufk] [~rmetzger] I agree with that we should deprecate {{-m 
yarn-cluster}}, which also means the {{FlinkYarnSessionCli}}. Instead, we need 
to suggest our users to migrate to the client unified executor interface 
{{--target yarn-per-job}}.

However, I hesitate to deprecate it too soon(e.g. 1.13). The only reason why we 
have not deprecated it after introducing the unified executor interface is 
about the CLI compatibility. I am pretty sure a lot of companies are using the 
{{flink run}} to integrate with their deployers in production. Because we do 
not provide a very good client SDK or deploy interfaces. It will take a big 
burden and surprise for them if we do not let them know early enough.

> ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException
> 
>
> Key: FLINK-22509
> URL: https://issues.apache.org/jira/browse/FLINK-22509
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.13.0, 1.14.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>
> Submitting a detached, per-job YARN cluster in Flink (like this: 
> {{./bin/flink run -m yarn-cluster -d  
> ./examples/streaming/TopSpeedWindowing.jar}}), leads to the following 
> exception:
> {code}
> 2021-04-28 11:39:00,786 INFO  org.apache.flink.yarn.YarnClusterDescriptor 
>  [] - Found Web Interface 
> ip-172-31-27-232.eu-central-1.compute.internal:45689 of application 
> 'application_1619607372651_0005'.
> Job has been submitted with JobID 5543e81db9c2de78b646088891f23bfc
> Exception in thread "Thread-4" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2570)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2783)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2758)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2638)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1100)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1707)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1688)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}
> The job is still running as expected.
> Detached submission with {{./bin/flink run-application -t yarn-application 
> -d}} works as expected. This is also the documented approach.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (FLINK-22509) ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException



[ 
https://issues.apache.org/jira/browse/FLINK-22509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337484#comment-17337484
 ] 

Yang Wang edited comment on FLINK-22509 at 4/30/21, 4:10 PM:
-

[~knaufk] [~rmetzger] I agree with that we should deprecate {{-m 
yarn-cluster}}, which also means the {{FlinkYarnSessionCli}}. Instead, we need 
to suggest our users to migrate to the client unified executor interface 
{{--target yarn-per-job}}.

However, I hesitate to deprecate it too soon(e.g. 1.13). The only reason why we 
have not deprecated it after introducing the unified executor interface is 
about the CLI compatibility. I am pretty sure a lot of companies are using the 
{{flink run}} to integrate with their deployers in production. Because we do 
not provide a very good client SDK or deploy interfaces. It will take a big 
burden and surprise for them if we do not let them know early enough.


was (Author: fly_in_gis):
[~knaufk] [~rmetzger] I agree with that we should deprecate {{\-m 
yarn-cluster}}, which also means the {{FlinkYarnSessionCli}}. Instead, we need 
to suggest our users to migrate to the client unified executor interface 
{{--target yarn-per-job}}.


 However, I hesitate to deprecate it too soon(e.g. 1.13). The only reason why 
we have not deprecate it after introducing the unified executor interface is 
about the CLI the compatibility. I am pretty sure a lot of companies are using 
the {{flink run}} to integrate with their deployers in production. Because we 
do not provide a very good client SDK or deploy interfaces. It will take a big 
burden and surprise for them if we do not let them know early enough.

> ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException
> 
>
> Key: FLINK-22509
> URL: https://issues.apache.org/jira/browse/FLINK-22509
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.13.0, 1.14.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>
> Submitting a detached, per-job YARN cluster in Flink (like this: 
> {{./bin/flink run -m yarn-cluster -d  
> ./examples/streaming/TopSpeedWindowing.jar}}), leads to the following 
> exception:
> {code}
> 2021-04-28 11:39:00,786 INFO  org.apache.flink.yarn.YarnClusterDescriptor 
>  [] - Found Web Interface 
> ip-172-31-27-232.eu-central-1.compute.internal:45689 of application 
> 'application_1619607372651_0005'.
> Job has been submitted with JobID 5543e81db9c2de78b646088891f23bfc
> Exception in thread "Thread-4" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2570)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2783)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2758)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2638)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1100)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1707)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1688)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}
> The job is still running as expected.
> Detached submission with {{./bin/flink run-application -t yarn-application 
> -d}} works as expected. This is also the documented approach.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (FLINK-22509) ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException



[ 
https://issues.apache.org/jira/browse/FLINK-22509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337484#comment-17337484
 ] 

Yang Wang edited comment on FLINK-22509 at 4/30/21, 4:09 PM:
-

[~knaufk] [~rmetzger] I agree with that we should deprecate {{\-m 
yarn-cluster}}, which also means the {{FlinkYarnSessionCli}}. Instead, we need 
to suggest our users to migrate to the client unified executor interface 
{{--target yarn-per-job}}.


 However, I hesitate to deprecate it too soon(e.g. 1.13). The only reason why 
we have not deprecate it after introducing the unified executor interface is 
about the CLI the compatibility. I am pretty sure a lot of companies are using 
the {{flink run}} to integrate with their deployers in production. Because we 
do not provide a very good client SDK or deploy interfaces. It will take a big 
burden and surprise for them if we do not let them know early enough.


was (Author: fly_in_gis):
[~knaufk] [~rmetzger] I agree with that we should deprecate {{-m 
yarn-cluster}}, which also means the {{FlinkYarnSessionCli}}. Instead, we need 
to suggest our users to migrate to the client unified executor interface 
{{--target yarn-per-job}}.


 However, I hesitate to deprecate it too soon(e.g. 1.13). The only reason why 
we have not deprecate it after introducing the unified executor interface is 
about the CLI the compatibility. I am pretty sure a lot of companies are using 
the {{flink run}} to integrate with their deployers in production. Because we 
do not provide a very good client SDK or deploy interfaces. It will take a big 
burden and surprise for them if we do not let them know early enough.

> ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException
> 
>
> Key: FLINK-22509
> URL: https://issues.apache.org/jira/browse/FLINK-22509
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.13.0, 1.14.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>
> Submitting a detached, per-job YARN cluster in Flink (like this: 
> {{./bin/flink run -m yarn-cluster -d  
> ./examples/streaming/TopSpeedWindowing.jar}}), leads to the following 
> exception:
> {code}
> 2021-04-28 11:39:00,786 INFO  org.apache.flink.yarn.YarnClusterDescriptor 
>  [] - Found Web Interface 
> ip-172-31-27-232.eu-central-1.compute.internal:45689 of application 
> 'application_1619607372651_0005'.
> Job has been submitted with JobID 5543e81db9c2de78b646088891f23bfc
> Exception in thread "Thread-4" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2570)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2783)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2758)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2638)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1100)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1707)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1688)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}
> The job is still running as expected.
> Detached submission with {{./bin/flink run-application -t yarn-application 
> -d}} works as expected. This is also the documented approach.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-22509) ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException



[ 
https://issues.apache.org/jira/browse/FLINK-22509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337484#comment-17337484
 ] 

Yang Wang commented on FLINK-22509:
---

[~knaufk] [~rmetzger] I agree with that we should deprecate {{-m 
yarn-cluster}}, which also means the {{FlinkYarnSessionCli}}. Instead, we need 
to suggest our users to migrate to the client unified executor interface 
{{--target yarn-per-job}}.


 However, I hesitate to deprecate it too soon(e.g. 1.13). The only reason why 
we have not deprecate it after introducing the unified executor interface is 
about the CLI the compatibility. I am pretty sure a lot of companies are using 
the {{flink run}} to integrate with their deployers in production. Because we 
do not provide a very good client SDK or deploy interfaces. It will take a big 
burden and surprise for them if we do not let them know early enough.

> ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException
> 
>
> Key: FLINK-22509
> URL: https://issues.apache.org/jira/browse/FLINK-22509
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.13.0, 1.14.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>
> Submitting a detached, per-job YARN cluster in Flink (like this: 
> {{./bin/flink run -m yarn-cluster -d  
> ./examples/streaming/TopSpeedWindowing.jar}}), leads to the following 
> exception:
> {code}
> 2021-04-28 11:39:00,786 INFO  org.apache.flink.yarn.YarnClusterDescriptor 
>  [] - Found Web Interface 
> ip-172-31-27-232.eu-central-1.compute.internal:45689 of application 
> 'application_1619607372651_0005'.
> Job has been submitted with JobID 5543e81db9c2de78b646088891f23bfc
> Exception in thread "Thread-4" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2570)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2783)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2758)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2638)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1100)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1707)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1688)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}
> The job is still running as expected.
> Detached submission with {{./bin/flink run-application -t yarn-application 
> -d}} works as expected. This is also the documented approach.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [flink] yittg commented on a change in pull request #15768: [FLINK-22451][table] Support (*) as parameter of UDFs in Table API



yittg commented on a change in pull request #15768:
URL: https://github.com/apache/flink/pull/15768#discussion_r623990859



##
File path: 
flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/planner/runtime/stream/table/CalcITCase.scala
##
@@ -529,6 +529,27 @@ class CalcITCase(mode: StateBackendMode) extends 
StreamingWithStateTestBase(mode
 val expected = List("0,0,0", "1,1,1", "2,2,2")
 assertEquals(expected.sorted, sink.getAppendResults.sorted)
   }
+
+  /**
+   * This is an edition of [[CalcITCase.testMap()]] that uses * as one 
argument of the UDFs.
+   */
+  @Test
+  def testUserDefinedFunctionWithStarParameter(): Unit = {
+val ds = env.fromCollection(smallTupleData3).toTable(tEnv, 'a, 'b, 'c)
+  .map(Func23('*)).as("a", "b", "c", "d")
+  .map(Func24('*)).as("a", "b", "c", "d")
+  .map(Func1('b))

Review comment:
   original test case added




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15656: [FLINK-22233][Table SQL / API]Modified the spelling error of word "constant" in source code



flinkbot edited a comment on pull request #15656:
URL: https://github.com/apache/flink/pull/15656#issuecomment-822135263


   
   ## CI report:
   
   * fb69e9a4c5c897e75659c28915f9b9aa08e2f0c3 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17466)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15820: [FLINK-22535][runtime] CleanUp is invoked for task even when the task…



flinkbot edited a comment on pull request #15820:
URL: https://github.com/apache/flink/pull/15820#issuecomment-830052015


   
   ## CI report:
   
   * c548fc5c79450d898eebc1c6523be98472fe1cdd Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17467)
 
   * c11e6cb0c0817a53ef456462f610e3d079f590f9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17472)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15760: [FLINK-19606][table-runtime-blink] Introduce WindowJoinOperator and WindowJoinOperatorBuilder.



flinkbot edited a comment on pull request #15760:
URL: https://github.com/apache/flink/pull/15760#issuecomment-826500326


   
   ## CI report:
   
   * 7fadefe78bb0f8b0d7b61ee2a105e1cb3b5d17b5 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17459)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] yittg commented on a change in pull request #15768: [FLINK-22451][table] Support (*) as parameter of UDFs in Table API



yittg commented on a change in pull request #15768:
URL: https://github.com/apache/flink/pull/15768#discussion_r623972440



##
File path: 
flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/planner/runtime/stream/table/CalcITCase.scala
##
@@ -529,6 +529,27 @@ class CalcITCase(mode: StateBackendMode) extends 
StreamingWithStateTestBase(mode
 val expected = List("0,0,0", "1,1,1", "2,2,2")
 assertEquals(expected.sorted, sink.getAppendResults.sorted)
   }
+
+  /**
+   * This is an edition of [[CalcITCase.testMap()]] that uses * as one 
argument of the UDFs.
+   */
+  @Test
+  def testUserDefinedFunctionWithStarParameter(): Unit = {
+val ds = env.fromCollection(smallTupleData3).toTable(tEnv, 'a, 'b, 'c)
+  .map(Func23('*)).as("a", "b", "c", "d")
+  .map(Func24('*)).as("a", "b", "c", "d")
+  .map(Func1('b))

Review comment:
   hha, i just modified the existed testMap case, let me rewrite the 
original substring case here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #14172: [FLINK-19545][e2e] Add e2e test for native Kubernetes HA



flinkbot edited a comment on pull request #14172:
URL: https://github.com/apache/flink/pull/14172#issuecomment-73215


   
   ## CI report:
   
   * bbe6f4a0e363170970b29269af446f0d93578e5f Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17469)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15820: [FLINK-22535][runtime] CleanUp is invoked for task even when the task…



flinkbot edited a comment on pull request #15820:
URL: https://github.com/apache/flink/pull/15820#issuecomment-830052015


   
   ## CI report:
   
   * c548fc5c79450d898eebc1c6523be98472fe1cdd Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17467)
 
   * c11e6cb0c0817a53ef456462f610e3d079f590f9 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15817: [FLINK-14393][webui] Add an option to enable/disable cancel job in we…



flinkbot edited a comment on pull request #15817:
URL: https://github.com/apache/flink/pull/15817#issuecomment-829959024


   
   ## CI report:
   
   * c14593a5ccd7896b041421ee4d3e433a7afc34ac Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17460)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] akalash commented on pull request #15820: [FLINK-22535][runtime] CleanUp is invoked for task even when the task…



akalash commented on pull request #15820:
URL: https://github.com/apache/flink/pull/15820#issuecomment-830157511


   @gaoyunhaii , @pnowojski, Thanks guys for your comments. I fixed them and 
reorganized my commits.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] akalash commented on a change in pull request #15820: [FLINK-22535][runtime] CleanUp is invoked for task even when the task…



akalash commented on a change in pull request #15820:
URL: https://github.com/apache/flink/pull/15820#discussion_r623948875



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java
##
@@ -609,42 +615,54 @@ private void ensureNotCanceled() {
 
 @Override
 public final void invoke() throws Exception {
-try {
-// Allow invoking method 'invoke' without having to call 'restore' 
before it.
-if (!isRunning) {
-LOG.debug("Restoring during invoke will be called.");
-restore();
-}
+runWithCleanUpOnFail(this::executeInvoke);
+
+cleanUpInvoke();
+}
+
+private void executeInvoke() throws Exception {
+// Allow invoking method 'invoke' without having to call 'restore' 
before it.
+if (!isRunning) {
+LOG.debug("Restoring during invoke will be called.");
+restore();
+}
 
-// final check to exit early before starting to run
-ensureNotCanceled();
+// final check to exit early before starting to run
+ensureNotCanceled();
 
-// let the task do its work
-runMailboxLoop();
+// let the task do its work
+runMailboxLoop();
 
-// if this left the run() method cleanly despite the fact that 
this was canceled,
-// make sure the "clean shutdown" is not attempted
-ensureNotCanceled();
+// if this left the run() method cleanly despite the fact that this 
was canceled,
+// make sure the "clean shutdown" is not attempted
+ensureNotCanceled();
 
-afterInvoke();
+afterInvoke();
+}
+
+private void runWithCleanUpOnFail(RunnableWithException run) throws 
Exception {
+try {
+run.run();
 } catch (Throwable invokeException) {
 failing = !canceled;
 try {
 if (!canceled) {
-cancelTask();
+try {
+cancelTask();
+} catch (Throwable ex) {
+invokeException = firstOrSuppressed(ex, 
invokeException);
+}

Review comment:
   1. done
   2. now it have




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15768: [FLINK-22451][table] Support (*) as parameter of UDFs in Table API



flinkbot edited a comment on pull request #15768:
URL: https://github.com/apache/flink/pull/15768#issuecomment-826735938


   
   ## CI report:
   
   * 48814bd4d49167579110479e3f3cecabff6d443a Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17307)
 
   * 11787dc2e0c372007bddc148e0d4aec2ed9a275f Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17470)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15728: [FLINK-22379][runtime] CheckpointCoordinator checks the state of all …



flinkbot edited a comment on pull request #15728:
URL: https://github.com/apache/flink/pull/15728#issuecomment-824960796


   
   ## CI report:
   
   * d4f85948ee5281b189ba948a8ec512f62587f979 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17458)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] wuchong commented on a change in pull request #15768: [FLINK-22451][table] Support (*) as parameter of UDFs in Table API



wuchong commented on a change in pull request #15768:
URL: https://github.com/apache/flink/pull/15768#discussion_r623933402



##
File path: 
flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/planner/runtime/stream/table/CalcITCase.scala
##
@@ -529,6 +529,27 @@ class CalcITCase(mode: StateBackendMode) extends 
StreamingWithStateTestBase(mode
 val expected = List("0,0,0", "1,1,1", "2,2,2")
 assertEquals(expected.sorted, sink.getAppendResults.sorted)
   }
+
+  /**
+   * This is an edition of [[CalcITCase.testMap()]] that uses * as one 
argument of the UDFs.
+   */
+  @Test
+  def testUserDefinedFunctionWithStarParameter(): Unit = {
+val ds = env.fromCollection(smallTupleData3).toTable(tEnv, 'a, 'b, 'c)
+  .map(Func23('*)).as("a", "b", "c", "d")
+  .map(Func24('*)).as("a", "b", "c", "d")
+  .map(Func1('b))

Review comment:
   Could you also test `where`, `select` too? I remember you tested it in 
the original test. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-18027) ROW value constructor cannot deal with complex expressions

2021-04-30 Thread Jark Wu (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337429#comment-17337429
 ] 

Jark Wu commented on FLINK-18027:
-

Is it possible to register a custom {{ROW}} function to override the built-in 
{{SqlStdOperatorTable.ROW}}? 
I remember we override the {{JSON_VALUE}} function in internal branch, and 
{{JSON_VALUE}} is also defined in parser. 

> ROW value constructor cannot deal with complex expressions
> --
>
> Key: FLINK-18027
> URL: https://issues.apache.org/jira/browse/FLINK-18027
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Benchao Li
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> create table my_source (
> my_row row
> ) with (...);
> create table my_sink (
> my_row row
> ) with (...);
> insert into my_sink
> select ROW(my_row.a, my_row.b) 
> from my_source;{code}
> will throw excepions:
> {code:java}
> Exception in thread "main" org.apache.flink.table.api.SqlParserException: SQL 
> parse failed. Encountered "." at line 1, column 18.Exception in thread "main" 
> org.apache.flink.table.api.SqlParserException: SQL parse failed. Encountered 
> "." at line 1, column 18.Was expecting one of:    ")" ...    "," ...     at 
> org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:56)
>  at 
> org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:64)
>  at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:627)
>  at com.bytedance.demo.KafkaTableSource.main(KafkaTableSource.java:76)Caused 
> by: org.apache.calcite.sql.parser.SqlParseException: Encountered "." at line 
> 1, column 18.Was expecting one of:    ")" ...    "," ...     at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.convertException(FlinkSqlParserImpl.java:416)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.normalizeException(FlinkSqlParserImpl.java:201)
>  at 
> org.apache.calcite.sql.parser.SqlParser.handleException(SqlParser.java:148) 
> at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:163) at 
> org.apache.calcite.sql.parser.SqlParser.parseStmt(SqlParser.java:188) at 
> org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:54)
>  ... 3 moreCaused by: org.apache.flink.sql.parser.impl.ParseException: 
> Encountered "." at line 1, column 18.Was expecting one of:    ")" ...    "," 
> ...     at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.generateParseException(FlinkSqlParserImpl.java:36161)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.jj_consume_token(FlinkSqlParserImpl.java:35975)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.ParenthesizedSimpleIdentifierList(FlinkSqlParserImpl.java:21432)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression3(FlinkSqlParserImpl.java:17164)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression2b(FlinkSqlParserImpl.java:16820)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression2(FlinkSqlParserImpl.java:16861)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression(FlinkSqlParserImpl.java:16792)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SelectExpression(FlinkSqlParserImpl.java:11091)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SelectItem(FlinkSqlParserImpl.java:10293)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SelectList(FlinkSqlParserImpl.java:10267)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SqlSelect(FlinkSqlParserImpl.java:6943)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.LeafQuery(FlinkSqlParserImpl.java:658)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.LeafQueryOrExpr(FlinkSqlParserImpl.java:16775)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.QueryOrExpr(FlinkSqlParserImpl.java:16238)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.OrderedQueryOrExpr(FlinkSqlParserImpl.java:532)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SqlStmt(FlinkSqlParserImpl.java:3761)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SqlStmtEof(FlinkSqlParserImpl.java:3800)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.parseSqlStmtEof(FlinkSqlParserImpl.java:248)
>  at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:161) 
> ... 5 more
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Closed] (FLINK-22424) Writing to already released buffers potentially causing data corruption during job failover/cancellation



 [ 
https://issues.apache.org/jira/browse/FLINK-22424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-22424.
--
Resolution: Fixed

merged to release-1.13 as da26733e484 and 65319b256c8

> Writing to already released buffers potentially causing data corruption 
> during job failover/cancellation
> 
>
> Key: FLINK-22424
> URL: https://issues.apache.org/jira/browse/FLINK-22424
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Network
>Affects Versions: 1.6.4, 1.7.2, 1.8.3, 1.9.3, 1.10.3, 1.11.3, 1.12.2, 
> 1.13.0
>Reporter: Piotr Nowojski
>Assignee: Piotr Nowojski
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.11.4, 1.14.0, 1.13.1, 1.12.4
>
>
> I modified the code to not re-use the same memory segments, but on recycling 
> always free up the segment. And what I have observed is a similar problem as 
> reported in FLINK-21181 ticket, but even more severe:
> {noformat}
> Caused by: java.lang.RuntimeException: segment has been freed
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:109)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:93)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:44)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
>   at 
> org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
>   at 
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:50)
>   at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointStressITCase$ReEmitAll.process(UnalignedCheckpointStressITCase.java:477)
>   at 
> org.apache.flink.test.checkpointing.UnalignedCheckpointStressITCase$ReEmitAll.process(UnalignedCheckpointStressITCase.java:468)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:57)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.functions.InternalIterableProcessWindowFunction.process(InternalIterableProcessWindowFunction.java:32)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.emitWindowContents(WindowOperator.java:577)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onProcessingTime(WindowOperator.java:533)
>   at 
> org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.onProcessingTime(InternalTimerServiceImpl.java:284)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1395)
>   ... 11 more
> Caused by: java.lang.IllegalStateException: segment has been freed
>   at 
> org.apache.flink.core.memory.MemorySegment.put(MemorySegment.java:483)
>   at 
> org.apache.flink.core.memory.MemorySegment.put(MemorySegment.java:1398)
>   at 
> org.apache.flink.runtime.io.network.buffer.BufferBuilder.append(BufferBuilder.java:100)
>   at 
> org.apache.flink.runtime.io.network.buffer.BufferBuilder.appendAndCommit(BufferBuilder.java:82)
>   at 
> org.apache.flink.runtime.io.network.partition.BufferWritingResultPartition.appendUnicastDataForNewRecord(BufferWritingResultPartition.java:250)
>   at 
> org.apache.flink.runtime.io.network.partition.BufferWritingResultPartition.emitRecord(BufferWritingResultPartition.java:142)
>   at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:104)
>   at 
> org.apache.flink.runtime.io.network.api.writer.ChannelSelectorRecordWriter.emit(ChannelSelectorRecordWriter.java:54)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
>   ... 24 more
> {noformat}
> That's happening also during cancellation/job failover. It's failing when 
> trying to write to already `free`'ed up buffer. Without my changes, this code 
> would silently write some data to a buffer that has already been 
> recycled/returned to the pool. If someone else would pick up this buffer, it 
> would easily lead to the data corruption.
> As far as I can tell, the exact reason behind this is that the buffer to 
> which timer attempts to write to, has been released from 
> `ResultSubpartition#onConsumedSubpartition`, causing `BufferConsumer` to be 
> closed (which recycles/frees underlying memory segment ), while matching 
> `BufferBuilder` is still being used...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (FLINK-10644) Batch Job: Speculative execution

[
https://issues.apache.org/jira/browse/FLINK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334427#comment-17334427
]

wangwj edited comment on FLINK-10644 at 4/30/21, 2:40 PM:
--

[~trohrmann]
Hi Till.

I think that speculative execution means that two executions in a
ExecutionVertex running at a same time, and failover means that two tasks
running at two different time. Based on this, I think this feature(speculative
execution) is theoretically achievable. So, I have implemented a speculative
execution for batch job based on Blink, and it had a significant effect in our
product cluster.

I did as follows:

[~trohrmann]
My implementation has played a big role in our product cluster in Alibaba.

[jira] [Updated] (FLINK-20724) Create a http handler for aggregating metrics from whole job



 [ 
https://issues.apache.org/jira/browse/FLINK-20724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-20724:
---
Affects Version/s: 1.13.0
   1.12.3

> Create a http handler for aggregating metrics from whole job
> 
>
> Key: FLINK-20724
> URL: https://issues.apache.org/jira/browse/FLINK-20724
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Affects Versions: 1.13.0, 1.12.3
>Reporter: Piotr Nowojski
>Priority: Major
>  Labels: stale-major
>
> This is an optimisation idea. 
> Create a similar http handler to {{AggregatingSubtasksMetricsHandler}}, but 
> one that would aggregate metrics per task, from all of the job vertices. The 
> new handler would only take {{JobID}} as a parameter. So that Web UI can in 
> one RPC obtain {{max(isBackPressureRatio)}} / 
> {{max(isCausingBackPressureRatio)}} per each task in the job graph.
> This is related to FLINK-14712, where we are invoking more REST calls to get 
> the statistics per each task/node (in order to color the nodes based on the 
> back pressure and busy times). With this new handler, WebUI could make a one 
> REST call to get all the metrics it needs, instead of doing one REST call per 
> every Task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (FLINK-20724) Create a http handler for aggregating metrics from whole job



 [ 
https://issues.apache.org/jira/browse/FLINK-20724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski updated FLINK-20724:
---
Fix Version/s: (was: 1.14.0)

> Create a http handler for aggregating metrics from whole job
> 
>
> Key: FLINK-20724
> URL: https://issues.apache.org/jira/browse/FLINK-20724
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Reporter: Piotr Nowojski
>Priority: Major
>  Labels: stale-major
>
> This is an optimisation idea. 
> Create a similar http handler to {{AggregatingSubtasksMetricsHandler}}, but 
> one that would aggregate metrics per task, from all of the job vertices. The 
> new handler would only take {{JobID}} as a parameter. So that Web UI can in 
> one RPC obtain {{max(isBackPressureRatio)}} / 
> {{max(isCausingBackPressureRatio)}} per each task in the job graph.
> This is related to FLINK-14712, where we are invoking more REST calls to get 
> the statistics per each task/node (in order to color the nodes based on the 
> back pressure and busy times). With this new handler, WebUI could make a one 
> REST call to get all the metrics it needs, instead of doing one REST call per 
> every Task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-22506) YARN job cluster stuck in retrying creating JobManager if savepoint is corrupted



[ 
https://issues.apache.org/jira/browse/FLINK-22506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337421#comment-17337421
 ] 

Paul Lin commented on FLINK-22506:
--

[~knaufk] Actually I'm not using the application mode, and this issue has been 
around for a very long time. I've tried 1.12.1, and the problem still exists.

[~trohrmann] I agree that it's hard to distinguish non-retryable errors from 
the other ones. I think a simple thought to solve the problem is to make the 
attempt failed when an retryable or non-retryable error occurs, and leave YARN 
to decide whether the application should be restarted. The total restarts would 
be restricted by `yarn.application-attempts` and 
`yarn.application-attempt-failures-validity-interval`. 

> YARN job cluster stuck in retrying creating JobManager if savepoint is 
> corrupted
> 
>
> Key: FLINK-22506
> URL: https://issues.apache.org/jira/browse/FLINK-22506
> Project: Flink
>  Issue Type: Improvement
>  Components: Deployment / YARN
>Affects Versions: 1.11.3
>Reporter: Paul Lin
>Priority: Major
>
> If a non-retryable error (e.g. the savepoint is corrupted or unaccessible) 
> occurs during the initiation of the job manager, the job cluster exits with 
> an error code. But since it does not mark the attempt as failed, it won't be 
> count as a failed attempt, and YARN will keep retrying forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Closed] (FLINK-15550) testCancelTaskExceptionAfterTaskMarkedFailed failed on azure



 [ 
https://issues.apache.org/jira/browse/FLINK-15550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-15550.
--
Resolution: Cannot Reproduce

> testCancelTaskExceptionAfterTaskMarkedFailed failed on azure
> 
>
> Key: FLINK-15550
> URL: https://issues.apache.org/jira/browse/FLINK-15550
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Task
>Affects Versions: 1.11.0
>Reporter: Yun Tang
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> Instance: 
> https://dev.azure.com/rmetzger/Flink/_build/results?buildId=4241=ms.vss-test-web.build-test-results-tab=12434=108939=debug
> {code:java}
> java.lang.AssertionError: expected: but was:
>   at 
> org.apache.flink.runtime.taskmanager.TaskTest.testCancelTaskExceptionAfterTaskMarkedFailed(TaskTest.java:525)
> {code}
> {code:java}
> expected: but was:
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [flink] flinkbot edited a comment on pull request #15768: [FLINK-22451][table] Support (*) as parameter of UDFs in Table API



flinkbot edited a comment on pull request #15768:
URL: https://github.com/apache/flink/pull/15768#issuecomment-826735938


   
   ## CI report:
   
   * 48814bd4d49167579110479e3f3cecabff6d443a Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17307)
 
   * 11787dc2e0c372007bddc148e0d4aec2ed9a275f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #14172: [FLINK-19545][e2e] Add e2e test for native Kubernetes HA



flinkbot edited a comment on pull request #14172:
URL: https://github.com/apache/flink/pull/14172#issuecomment-73215


   
   ## CI report:
   
   * 6a74f815c70c03f2b374f4d422e857468bef9f12 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10070)
 
   * bbe6f4a0e363170970b29269af446f0d93578e5f Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17469)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] yittg commented on a change in pull request #15768: [FLINK-22451][table] Support (*) as parameter of UDFs in Table API



yittg commented on a change in pull request #15768:
URL: https://github.com/apache/flink/pull/15768#discussion_r623903170



##
File path: 
flink-table/flink-table-planner-blink/src/test/java/org/apache/flink/table/planner/functions/CallFunctionITCase.java
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.planner.functions;
+
+import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
+import org.apache.flink.table.annotation.DataTypeHint;
+import org.apache.flink.table.annotation.InputGroup;
+import org.apache.flink.table.api.EnvironmentSettings;
+import org.apache.flink.table.api.Table;
+import org.apache.flink.table.api.TableEnvironment;
+import org.apache.flink.table.functions.ScalarFunction;
+import org.apache.flink.test.util.MiniClusterWithClientResource;
+import org.apache.flink.types.Row;
+import org.apache.flink.util.CloseableIterator;
+
+import org.apache.flink.shaded.guava18.com.google.common.collect.Lists;
+
+import org.junit.ClassRule;
+import org.junit.Test;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import static org.apache.flink.table.api.Expressions.$;
+import static org.apache.flink.table.api.Expressions.call;
+import static org.hamcrest.CoreMatchers.equalTo;
+import static org.junit.Assert.assertThat;
+
+/** IT tests for call functions. */
+public class CallFunctionITCase {

Review comment:
   @wuchong  Thanks, i have fixed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (FLINK-22256) Persist checkpoint type information



 [ 
https://issues.apache.org/jira/browse/FLINK-22256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-22256:
--
Description: 
As a user, it is retrospectively difficult to determine what kind of checkpoint 
(i.e. incremental, unaligned, ...) was performed when looking only at the 
persisted checkpoint metadata.

The only way would be to look into the execution configuration of the job which 
might not be available anymore and can be scattered across the application code 
and cluster configuration.

It would be highly beneficial if such information would be part of the 
persisted metadata to not track these external pointers.

 It would also be great to persist the metadata information in a standardized 
format so that external projects don't need to use Flink's metadata serializers 
to access it.

  was:
As a user, it is retrospectively difficult to determine what kind of checkpoint 
(i.e. incremental, unaligned, ...) was performed when looking only at the 
persisted checkpoint metadata.

The only way would be to look into the execution configuration of the job which 
might not be available anymore and can be scattered across the application code 
and cluster configuration.

It would be highly beneficial if such information would be part of the 
persisted metadata to not track these external pointers.

 


> Persist checkpoint type information
> ---
>
> Key: FLINK-22256
> URL: https://issues.apache.org/jira/browse/FLINK-22256
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: Fabian Paul
>Priority: Major
>
> As a user, it is retrospectively difficult to determine what kind of 
> checkpoint (i.e. incremental, unaligned, ...) was performed when looking only 
> at the persisted checkpoint metadata.
> The only way would be to look into the execution configuration of the job 
> which might not be available anymore and can be scattered across the 
> application code and cluster configuration.
> It would be highly beneficial if such information would be part of the 
> persisted metadata to not track these external pointers.
>  It would also be great to persist the metadata information in a standardized 
> format so that external projects don't need to use Flink's metadata 
> serializers to access it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (FLINK-22256) Persist checkpoint type information



 [ 
https://issues.apache.org/jira/browse/FLINK-22256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-22256:
--
Priority: Major  (was: Minor)

> Persist checkpoint type information
> ---
>
> Key: FLINK-22256
> URL: https://issues.apache.org/jira/browse/FLINK-22256
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Reporter: Fabian Paul
>Priority: Major
>
> As a user, it is retrospectively difficult to determine what kind of 
> checkpoint (i.e. incremental, unaligned, ...) was performed when looking only 
> at the persisted checkpoint metadata.
> The only way would be to look into the execution configuration of the job 
> which might not be available anymore and can be scattered across the 
> application code and cluster configuration.
> It would be highly beneficial if such information would be part of the 
> persisted metadata to not track these external pointers.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [flink] flinkbot edited a comment on pull request #15819: [FLINK-22512][Hive] Fix issue that can't call current_timestamp with hive dialect for hive-3.1



flinkbot edited a comment on pull request #15819:
URL: https://github.com/apache/flink/pull/15819#issuecomment-829984795


   
   ## CI report:
   
   * 4c52c0c429eb766f22ac20a71e0dcd80b28ff718 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17457)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #14172: [FLINK-19545][e2e] Add e2e test for native Kubernetes HA



flinkbot edited a comment on pull request #14172:
URL: https://github.com/apache/flink/pull/14172#issuecomment-73215


   
   ## CI report:
   
   * 6a74f815c70c03f2b374f4d422e857468bef9f12 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=10070)
 
   * bbe6f4a0e363170970b29269af446f0d93578e5f UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15818: [FLINK-22539][python][docs] Restructure the Python dependency management documentation



flinkbot edited a comment on pull request #15818:
URL: https://github.com/apache/flink/pull/15818#issuecomment-829970320


   
   ## CI report:
   
   * d44f98b523a36e71a1f265768f4ea55a3aeb28ff Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17461)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15789: [FLINK-21181][runtime] Wait for Invokable cancellation before releasing network resources



flinkbot edited a comment on pull request #15789:
URL: https://github.com/apache/flink/pull/15789#issuecomment-827996694


   
   ## CI report:
   
   * 4c5180310bf76e96f2665bf53531eccb1fa86421 UNKNOWN
   * f87ee85814873dee4ed181eee11df9a6758a916e Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17443)
 
   * 46e1be2c4832080bf1cb48c509b77cd88872d024 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17468)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15817: [FLINK-14393][webui] Add an option to enable/disable cancel job in we…



flinkbot edited a comment on pull request #15817:
URL: https://github.com/apache/flink/pull/15817#issuecomment-829959024


   
   ## CI report:
   
   * 31f01fe2108f5e18f26f50e352d3ed353bd8ffc8 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17455)
 
   * c14593a5ccd7896b041421ee4d3e433a7afc34ac Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17460)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] tillrohrmann commented on pull request #14172: [FLINK-19545][e2e] Add e2e test for native Kubernetes HA



tillrohrmann commented on pull request #14172:
URL: https://github.com/apache/flink/pull/14172#issuecomment-830086650


   Also, it seems that the test does not fully clean up the `minikube` cluster 
and Docker. Not sure whether this is due to my local setup or not.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-19916) Hadoop3 ShutdownHookManager visit closed ClassLoader



[ 
https://issues.apache.org/jira/browse/FLINK-19916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337377#comment-17337377
 ] 

Robert Metzger commented on FLINK-19916:


I've reset the priority to Major since multiple tickets have been opened about 
this issue already.


> Hadoop3 ShutdownHookManager visit closed ClassLoader
> 
>
> Key: FLINK-19916
> URL: https://issues.apache.org/jira/browse/FLINK-19916
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hadoop Compatibility
>Reporter: Jingsong Lee
>Priority: Major
>  Labels: auto-deprioritized-major
>
> {code:java}
> Exception in thread "Thread-10" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:161)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:179)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2780)
>   at 
> org.apache.hadoop.conf.Configuration.getStreamReader(Configuration.java:3036)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2995)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2968)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2848)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1200)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1812)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1789)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}
> This is because Hadoop 3 starts asynchronous threads to execute some shutdown 
> hooks.
>  These hooks are run after the job is executed, as a result, the classloader 
> has been released, but in hooks, configuration still holds the released 
> classloader, so it will fail to throw an exception in this asynchronous 
> thread.
> Now it doesn't affect our function, it just prints the exception stack on the 
> console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (FLINK-19916) Hadoop3 ShutdownHookManager visit closed ClassLoader



 [ 
https://issues.apache.org/jira/browse/FLINK-19916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-19916:
---
Affects Version/s: 1.12.2

> Hadoop3 ShutdownHookManager visit closed ClassLoader
> 
>
> Key: FLINK-19916
> URL: https://issues.apache.org/jira/browse/FLINK-19916
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hadoop Compatibility
>Affects Versions: 1.12.2
>Reporter: Jingsong Lee
>Priority: Major
>  Labels: auto-deprioritized-major
>
> {code:java}
> Exception in thread "Thread-10" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:161)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:179)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2780)
>   at 
> org.apache.hadoop.conf.Configuration.getStreamReader(Configuration.java:3036)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2995)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2968)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2848)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1200)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1812)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1789)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}
> This is because Hadoop 3 starts asynchronous threads to execute some shutdown 
> hooks.
>  These hooks are run after the job is executed, as a result, the classloader 
> has been released, but in hooks, configuration still holds the released 
> classloader, so it will fail to throw an exception in this asynchronous 
> thread.
> Now it doesn't affect our function, it just prints the exception stack on the 
> console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Closed] (FLINK-21914) Trying to access closed classloader



 [ 
https://issues.apache.org/jira/browse/FLINK-21914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-21914.
--
Resolution: Duplicate

> Trying to access closed classloader
> ---
>
> Key: FLINK-21914
> URL: https://issues.apache.org/jira/browse/FLINK-21914
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Deployment / YARN
>Affects Versions: 1.12.2
> Environment: flink: 1.12.2
> hadoop: 3.1.3
> hive: 3.1.2
>  
>Reporter: Spongebob
>Priority: Critical
> Attachments: app.log
>
>
> I am trying to deploy flink application on yarn, but got this exception: 
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>  
> This application tested pass on my local environment. And the application 
> detail is read and write into hive via flink table environment. you can view 
> attachment for yarn log which  source and sink data info was deleted
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-
> leaked-classloader'.
> {code}
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2780)
>   at 
> org.apache.hadoop.conf.Configuration.getStreamReader(Configuration.java:3036)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2995)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2968)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2848)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1200)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1812)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1789)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (FLINK-19916) Hadoop3 ShutdownHookManager visit closed ClassLoader



 [ 
https://issues.apache.org/jira/browse/FLINK-19916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-19916:
---
Priority: Major  (was: Minor)

> Hadoop3 ShutdownHookManager visit closed ClassLoader
> 
>
> Key: FLINK-19916
> URL: https://issues.apache.org/jira/browse/FLINK-19916
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hadoop Compatibility
>Reporter: Jingsong Lee
>Priority: Major
>  Labels: auto-deprioritized-major
>
> {code:java}
> Exception in thread "Thread-10" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:161)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:179)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2780)
>   at 
> org.apache.hadoop.conf.Configuration.getStreamReader(Configuration.java:3036)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2995)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2968)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2848)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1200)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1812)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1789)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}
> This is because Hadoop 3 starts asynchronous threads to execute some shutdown 
> hooks.
>  These hooks are run after the job is executed, as a result, the classloader 
> has been released, but in hooks, configuration still holds the released 
> classloader, so it will fail to throw an exception in this asynchronous 
> thread.
> Now it doesn't affect our function, it just prints the exception stack on the 
> console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-21914) Trying to access closed classloader



[ 
https://issues.apache.org/jira/browse/FLINK-21914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337376#comment-17337376
 ] 

Robert Metzger commented on FLINK-21914:


I would like to close this ticket and track the resolution of the issue in 
FLINK-19916.

> Trying to access closed classloader
> ---
>
> Key: FLINK-21914
> URL: https://issues.apache.org/jira/browse/FLINK-21914
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Deployment / YARN
>Affects Versions: 1.12.2
> Environment: flink: 1.12.2
> hadoop: 3.1.3
> hive: 3.1.2
>  
>Reporter: Spongebob
>Priority: Critical
> Attachments: app.log
>
>
> I am trying to deploy flink application on yarn, but got this exception: 
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>  
> This application tested pass on my local environment. And the application 
> detail is read and write into hive via flink table environment. you can view 
> attachment for yarn log which  source and sink data info was deleted
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-
> leaked-classloader'.
> {code}
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2780)
>   at 
> org.apache.hadoop.conf.Configuration.getStreamReader(Configuration.java:3036)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2995)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2968)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2848)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1200)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1812)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1789)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-21914) Trying to access closed classloader



[ 
https://issues.apache.org/jira/browse/FLINK-21914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337375#comment-17337375
 ] 

Robert Metzger commented on FLINK-21914:


Just deploying on YARN doesn't cause the exceptions on the TaskManagers.
I guess it's because you are running Hive code there.


As far as I can tell, the exception is not nice, but it does not cause any 
further issues, as it is thrown after the shutdown hooks have been executed.

> Trying to access closed classloader
> ---
>
> Key: FLINK-21914
> URL: https://issues.apache.org/jira/browse/FLINK-21914
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Deployment / YARN
>Affects Versions: 1.12.2
> Environment: flink: 1.12.2
> hadoop: 3.1.3
> hive: 3.1.2
>  
>Reporter: Spongebob
>Priority: Critical
> Attachments: app.log
>
>
> I am trying to deploy flink application on yarn, but got this exception: 
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>  
> This application tested pass on my local environment. And the application 
> detail is read and write into hive via flink table environment. you can view 
> attachment for yarn log which  source and sink data info was deleted
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-
> leaked-classloader'.
> {code}
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2780)
>   at 
> org.apache.hadoop.conf.Configuration.getStreamReader(Configuration.java:3036)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2995)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2968)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2848)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1200)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1812)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1789)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (FLINK-21914) Trying to access closed classloader



 [ 
https://issues.apache.org/jira/browse/FLINK-21914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-21914:
---
Component/s: Deployment / YARN

> Trying to access closed classloader
> ---
>
> Key: FLINK-21914
> URL: https://issues.apache.org/jira/browse/FLINK-21914
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive, Deployment / YARN
>Affects Versions: 1.12.2
> Environment: flink: 1.12.2
> hadoop: 3.1.3
> hive: 3.1.2
>  
>Reporter: Spongebob
>Priority: Critical
> Attachments: app.log
>
>
> I am trying to deploy flink application on yarn, but got this exception: 
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>  
> This application tested pass on my local environment. And the application 
> detail is read and write into hive via flink table environment. you can view 
> attachment for yarn log which  source and sink data info was deleted
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-
> leaked-classloader'.
> {code}
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2780)
>   at 
> org.apache.hadoop.conf.Configuration.getStreamReader(Configuration.java:3036)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2995)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2968)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2848)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1200)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1812)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1789)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (FLINK-21914) Trying to access closed classloader



 [ 
https://issues.apache.org/jira/browse/FLINK-21914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-21914:
---
Component/s: (was: Deployment / YARN)
 Connectors / Hive

> Trying to access closed classloader
> ---
>
> Key: FLINK-21914
> URL: https://issues.apache.org/jira/browse/FLINK-21914
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.12.2
> Environment: flink: 1.12.2
> hadoop: 3.1.3
> hive: 3.1.2
>  
>Reporter: Spongebob
>Priority: Critical
> Attachments: app.log
>
>
> I am trying to deploy flink application on yarn, but got this exception: 
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>  
> This application tested pass on my local environment. And the application 
> detail is read and write into hive via flink table environment. you can view 
> attachment for yarn log which  source and sink data info was deleted
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-
> leaked-classloader'.
> {code}
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2780)
>   at 
> org.apache.hadoop.conf.Configuration.getStreamReader(Configuration.java:3036)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2995)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2968)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2848)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1200)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1812)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1789)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-21914) Trying to access closed classloader



[ 
https://issues.apache.org/jira/browse/FLINK-21914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337371#comment-17337371
 ] 

Robert Metzger commented on FLINK-21914:


This is a known issue, reported here FLINK-22509 and here already: FLINK-19916.

The logs you've provided are interesting, because the issue also seem to appear 
on TaskManagers.

> Trying to access closed classloader
> ---
>
> Key: FLINK-21914
> URL: https://issues.apache.org/jira/browse/FLINK-21914
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.12.2
> Environment: flink: 1.12.2
> hadoop: 3.1.3
> hive: 3.1.2
>  
>Reporter: Spongebob
>Priority: Critical
> Attachments: app.log
>
>
> I am trying to deploy flink application on yarn, but got this exception: 
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>  
> This application tested pass on my local environment. And the application 
> detail is read and write into hive via flink table environment. you can view 
> attachment for yarn log which  source and sink data info was deleted
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-
> leaked-classloader'.
> {code}
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2780)
>   at 
> org.apache.hadoop.conf.Configuration.getStreamReader(Configuration.java:3036)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2995)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2968)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2848)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1200)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1812)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1789)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (FLINK-21914) Trying to access closed classloader



 [ 
https://issues.apache.org/jira/browse/FLINK-21914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-21914:
---
Component/s: (was: API / Core)
 Deployment / YARN

> Trying to access closed classloader
> ---
>
> Key: FLINK-21914
> URL: https://issues.apache.org/jira/browse/FLINK-21914
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.12.2
> Environment: flink: 1.12.2
> hadoop: 3.1.3
> hive: 3.1.2
>  
>Reporter: Spongebob
>Priority: Critical
> Attachments: app.log
>
>
> I am trying to deploy flink application on yarn, but got this exception: 
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>  
> This application tested pass on my local environment. And the application 
> detail is read and write into hive via flink table environment. you can view 
> attachment for yarn log which  source and sink data info was deleted
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-
> leaked-classloader'.
> {code}
> Exception in thread "Thread-9" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2780)
>   at 
> org.apache.hadoop.conf.Configuration.getStreamReader(Configuration.java:3036)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2995)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2968)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2848)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1200)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1812)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1789)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-22014) Flink JobManager failed to restart after failure in kubernetes HA setup



[ 
https://issues.apache.org/jira/browse/FLINK-22014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337369#comment-17337369
 ] 

Till Rohrmann commented on FLINK-22014:
---

I couldn't really find anything suspicious in the logs. Would it be possible to 
configure a different filesystem for the HA storage directory [~mlushchytski]? 
Maybe it has something to do with S3. If not, then the problem should also 
arise with a different filesystem.

> Flink JobManager failed to restart after failure in kubernetes HA setup
> ---
>
> Key: FLINK-22014
> URL: https://issues.apache.org/jira/browse/FLINK-22014
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.11.3, 1.12.2, 1.13.0
>Reporter: Mikalai Lushchytski
>Assignee: Till Rohrmann
>Priority: Major
>  Labels: k8s-ha, pull-request-available
> Fix For: 1.11.4, 1.14.0, 1.12.4
>
> Attachments: flink-logs.txt.zip, image-2021-04-19-11-17-58-215.png, 
> scalyr-logs (1).txt
>
>
> After the JobManager pod failed and the new one started, it was not able to 
> recover jobs due to the absence of recovery data in storage - config map 
> pointed at not existing file.
>   
>  Due to this the JobManager pod entered into the `CrashLoopBackOff`state and 
> was not able to recover - each attempt failed with the same error so the 
> whole cluster became unrecoverable and not operating.
>   
>  I had to manually delete the config map and start the jobs again without the 
> save point.
>   
>  If I tried to emulate the failure further by deleting job manager pod 
> manually, the new pod every time recovered well and issue was not 
> reproducible anymore artificially.
>   
>  Below is the failure log:
> {code:java}
> 2021-03-26 08:22:57,925 INFO 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - 
> Starting the SlotManager.
>  2021-03-26 08:22:57,928 INFO 
> org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
> Starting DefaultLeaderRetrievalService with KubernetesLeaderRetrievalDriver
> {configMapName='stellar-flink-cluster-dispatcher-leader'}.
>  2021-03-26 08:22:57,931 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job 
> ids [198c46bac791e73ebcc565a550fa4ff6, 344f5ebc1b5c3a566b4b2837813e4940, 
> 96c4603a0822d10884f7fe536703d811, d9ded24224aab7c7041420b3efc1b6ba] from 
> KubernetesStateHandleStore{configMapName='stellar-flink-cluster-dispatcher-leader'}
> 2021-03-26 08:22:57,933 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Trying to recover job with job id 198c46bac791e73ebcc565a550fa4ff6.
>  2021-03-26 08:22:58,029 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Stopping SessionDispatcherLeaderProcess.
>  2021-03-26 08:28:22,677 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Stopping 
> DefaultJobGraphStore. 2021-03-26 08:28:22,681 ERROR 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error 
> occurred in the cluster entrypoint. java.util.concurrent.CompletionException: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) 
> ~[?:?]
>at java.util.concurrent.CompletableFuture.completeThrowable(Unknown 
> Source) [?:?]
>at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) 
> [?:?]
>at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
>at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
>at java.lang.Thread.run(Unknown Source) [?:?] Caused by: 
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job 
> id 198c46bac791e73ebcc565a550fa4ff6.
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJob(SessionDispatcherLeaderProcess.java:144
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:122
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>at 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:113
>  undefined) ~[flink-dist_2.12-1.12.2.jar:1.12.2] ... 4 more 
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve submitted 
> JobGraph from state

[jira] [Commented] (FLINK-22509) ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException



[ 
https://issues.apache.org/jira/browse/FLINK-22509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337368#comment-17337368
 ] 

Robert Metzger commented on FLINK-22509:


Closing as a duplicate of FLINK-19916.

> ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException
> 
>
> Key: FLINK-22509
> URL: https://issues.apache.org/jira/browse/FLINK-22509
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.13.0, 1.14.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>
> Submitting a detached, per-job YARN cluster in Flink (like this: 
> {{./bin/flink run -m yarn-cluster -d  
> ./examples/streaming/TopSpeedWindowing.jar}}), leads to the following 
> exception:
> {code}
> 2021-04-28 11:39:00,786 INFO  org.apache.flink.yarn.YarnClusterDescriptor 
>  [] - Found Web Interface 
> ip-172-31-27-232.eu-central-1.compute.internal:45689 of application 
> 'application_1619607372651_0005'.
> Job has been submitted with JobID 5543e81db9c2de78b646088891f23bfc
> Exception in thread "Thread-4" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2570)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2783)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2758)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2638)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1100)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1707)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1688)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}
> The job is still running as expected.
> Detached submission with {{./bin/flink run-application -t yarn-application 
> -d}} works as expected. This is also the documented approach.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Closed] (FLINK-22509) ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException



 [ 
https://issues.apache.org/jira/browse/FLINK-22509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger closed FLINK-22509.
--
Resolution: Duplicate

> ./bin/flink run -m yarn-cluster -d submission leads to IllegalStateException
> 
>
> Key: FLINK-22509
> URL: https://issues.apache.org/jira/browse/FLINK-22509
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / YARN
>Affects Versions: 1.13.0, 1.14.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
>Priority: Major
>
> Submitting a detached, per-job YARN cluster in Flink (like this: 
> {{./bin/flink run -m yarn-cluster -d  
> ./examples/streaming/TopSpeedWindowing.jar}}), leads to the following 
> exception:
> {code}
> 2021-04-28 11:39:00,786 INFO  org.apache.flink.yarn.YarnClusterDescriptor 
>  [] - Found Web Interface 
> ip-172-31-27-232.eu-central-1.compute.internal:45689 of application 
> 'application_1619607372651_0005'.
> Job has been submitted with JobID 5543e81db9c2de78b646088891f23bfc
> Exception in thread "Thread-4" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:164)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:183)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2570)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2783)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2758)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2638)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1100)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1707)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1688)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}
> The job is still running as expected.
> Detached submission with {{./bin/flink run-application -t yarn-application 
> -d}} works as expected. This is also the documented approach.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-19916) Hadoop3 ShutdownHookManager visit closed ClassLoader



[ 
https://issues.apache.org/jira/browse/FLINK-19916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337367#comment-17337367
 ] 

Robert Metzger commented on FLINK-19916:


Where did you see this exception?  I found the same issue while submitting a 
job with YARN: FLINK-22509.
It seems that this exception is thrown during the shutdown of the executor 
itself, so ideally all the shutdown actions are completed.

> Hadoop3 ShutdownHookManager visit closed ClassLoader
> 
>
> Key: FLINK-19916
> URL: https://issues.apache.org/jira/browse/FLINK-19916
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hadoop Compatibility
>Reporter: Jingsong Lee
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> {code:java}
> Exception in thread "Thread-10" java.lang.IllegalStateException: Trying to 
> access closed classloader. Please check if you store classloaders directly or 
> indirectly in static fields. If the stacktrace suggests that the leak occurs 
> in a third party library and cannot be fixed immediately, you can disable 
> this check with the configuration 'classloader.check-leaked-classloader'.
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.ensureInner(FlinkUserCodeClassLoaders.java:161)
>   at 
> org.apache.flink.runtime.execution.librarycache.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.getResource(FlinkUserCodeClassLoaders.java:179)
>   at 
> org.apache.hadoop.conf.Configuration.getResource(Configuration.java:2780)
>   at 
> org.apache.hadoop.conf.Configuration.getStreamReader(Configuration.java:3036)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2995)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2968)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2848)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:1200)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1812)
>   at 
> org.apache.hadoop.conf.Configuration.getTimeDuration(Configuration.java:1789)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.getShutdownTimeout(ShutdownHookManager.java:183)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.shutdownExecutor(ShutdownHookManager.java:145)
>   at 
> org.apache.hadoop.util.ShutdownHookManager.access$300(ShutdownHookManager.java:65)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:102)
> {code}
> This is because Hadoop 3 starts asynchronous threads to execute some shutdown 
> hooks.
>  These hooks are run after the job is executed, as a result, the classloader 
> has been released, but in hooks, configuration still holds the released 
> classloader, so it will fail to throw an exception in this asynchronous 
> thread.
> Now it doesn't affect our function, it just prints the exception stack on the 
> console.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-22139) Flink Jobmanager & Task Manger logs are not writing to the logs files



[ 
https://issues.apache.org/jira/browse/FLINK-22139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337366#comment-17337366
 ] 

Till Rohrmann commented on FLINK-22139:
---

[~bhagi__R] I've tested to deploy {{jobmanager-session-deployment-non-ha.yaml}} 
as specified 
[here|https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/#session-cluster-resource-definitions]
 and I could access the logs via {{kubectl logs pod-name}} as well as by 
logging into the container and the accessing the file under {{/opt/flink/log}}. 
I am not sure why it isn't working for you.

> Flink Jobmanager & Task Manger logs are not writing to the logs files
> -
>
> Key: FLINK-22139
> URL: https://issues.apache.org/jira/browse/FLINK-22139
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.12.2
> Environment: on kubernetes flink standalone deployment with 
> jobmanager HA is enabled.
>Reporter: Bhagi
>Priority: Major
>
> Hi Team,
> I am submitting the jobs and restarting the job manager and task manager 
> pods..  Log files are generating with the name task manager and job manager.
> but job manager & task manager log file size is '0', i am not sure any 
> configuration missed..why logs are not writing to their log files..
> # Task Manager pod###
> flink@flink-taskmanager-85b6585b7-hhgl7:~$ ls -lart log/
> total 0
> -rw-r--r-- 1 flink flink  0 Apr  7 09:35 
> flink--taskexecutor-0-flink-taskmanager-85b6585b7-hhgl7.log
> flink@flink-taskmanager-85b6585b7-hhgl7:~$
> ### Jobmanager pod Logs #
> flink@flink-jobmanager-f6db89b7f-lq4ps:~$
> -rw-r--r-- 1 7148739 flink 0 Apr  7 06:36 
> flink--standalonesession-0-flink-jobmanager-f6db89b7f-gtkx5.log
> -rw-r--r-- 1 7148739 flink 0 Apr  7 06:36 
> flink--standalonesession-0-flink-jobmanager-f6db89b7f-wnrfm.log
> -rw-r--r-- 1 7148739 flink 0 Apr  7 06:37 
> flink--standalonesession-0-flink-jobmanager-f6db89b7f-2b2fs.log
> -rw-r--r-- 1 7148739 flink 0 Apr  7 06:37 
> flink--standalonesession-0-flink-jobmanager-f6db89b7f-7kdhh.log
> -rw-r--r-- 1 7148739 flink 0 Apr  7 09:35 
> flink--standalonesession-0-flink-jobmanager-f6db89b7f-twhkt.log
> drwxrwxrwx 2 7148739 flink35 Apr  7 09:35 .
> -rw-r--r-- 1 7148739 flink 0 Apr  7 09:35 
> flink--standalonesession-0-flink-jobmanager-f6db89b7f-lq4ps.log
> flink@flink-jobmanager-f6db89b7f-lq4ps:~$
> I configured log4j.properties for flink
> log4j.properties: |+
> monitorInterval=30
> rootLogger.level = INFO
> rootLogger.appenderRef.file.ref = MainAppender
> logger.flink.name = org.apache.flink
> logger.flink.level = INFO
> logger.akka.name = akka
> logger.akka.level = INFO
> appender.main.name = MainAppender
> appender.main.type = RollingFile
> appender.main.append = true
> appender.main.fileName = ${sys:log.file}
> appender.main.filePattern = ${sys:log.file}.%i
> appender.main.layout.type = PatternLayout
> appender.main.layout.pattern = %d{-MM-dd HH:mm:ss,SSS} %-5p %-60c %x 
> - %m%n
> appender.main.policies.type = Policies
> appender.main.policies.size.type = SizeBasedTriggeringPolicy
> appender.main.policies.size.size = 100MB
> appender.main.policies.startup.type = OnStartupTriggeringPolicy
> appender.main.strategy.type = DefaultRolloverStrategy
> appender.main.strategy.max = ${env:MAX_LOG_FILE_NUMBER:-10}
> logger.netty.name = 
> org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
> logger.netty.level = OFF



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [flink] flinkbot edited a comment on pull request #15820: [FLINK-22535][runtime] CleanUp is invoked for task even when the task…



flinkbot edited a comment on pull request #15820:
URL: https://github.com/apache/flink/pull/15820#issuecomment-830052015


   
   ## CI report:
   
   * c548fc5c79450d898eebc1c6523be98472fe1cdd Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17467)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15656: [FLINK-22233][Table SQL / API]Modified the spelling error of word "constant" in source code



flinkbot edited a comment on pull request #15656:
URL: https://github.com/apache/flink/pull/15656#issuecomment-822135263


   
   ## CI report:
   
   * 6358c33368746cf2c93463ab423f5c9ff11b641b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17410)
 
   * fb69e9a4c5c897e75659c28915f9b9aa08e2f0c3 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17466)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15131: [FLINK-21700][security] Add an option to disable credential retrieval on a secure cluster



flinkbot edited a comment on pull request #15131:
URL: https://github.com/apache/flink/pull/15131#issuecomment-794957907


   
   ## CI report:
   
   * 60812bb2804fd21d56a47a23ca4b42a04300f1c0 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17419)
 
   * 8cb12deac019898bb69f7a5faf7f803dd27d71d0 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17465)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-18027) ROW value constructor cannot deal with complex expressions

2021-04-30 Thread Timo Walther (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337350#comment-17337350
 ] 

Timo Walther commented on FLINK-18027:
--

I would keep the naming consistent with our existing data types. `map_`, 
`array_`, `row_` to find functions easier. Furthermore, we still have 
user-defined structured types which could cause confusion if we call it 
`struct`.

> ROW value constructor cannot deal with complex expressions
> --
>
> Key: FLINK-18027
> URL: https://issues.apache.org/jira/browse/FLINK-18027
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Benchao Li
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> create table my_source (
> my_row row
> ) with (...);
> create table my_sink (
> my_row row
> ) with (...);
> insert into my_sink
> select ROW(my_row.a, my_row.b) 
> from my_source;{code}
> will throw excepions:
> {code:java}
> Exception in thread "main" org.apache.flink.table.api.SqlParserException: SQL 
> parse failed. Encountered "." at line 1, column 18.Exception in thread "main" 
> org.apache.flink.table.api.SqlParserException: SQL parse failed. Encountered 
> "." at line 1, column 18.Was expecting one of:    ")" ...    "," ...     at 
> org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:56)
>  at 
> org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:64)
>  at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:627)
>  at com.bytedance.demo.KafkaTableSource.main(KafkaTableSource.java:76)Caused 
> by: org.apache.calcite.sql.parser.SqlParseException: Encountered "." at line 
> 1, column 18.Was expecting one of:    ")" ...    "," ...     at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.convertException(FlinkSqlParserImpl.java:416)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.normalizeException(FlinkSqlParserImpl.java:201)
>  at 
> org.apache.calcite.sql.parser.SqlParser.handleException(SqlParser.java:148) 
> at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:163) at 
> org.apache.calcite.sql.parser.SqlParser.parseStmt(SqlParser.java:188) at 
> org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:54)
>  ... 3 moreCaused by: org.apache.flink.sql.parser.impl.ParseException: 
> Encountered "." at line 1, column 18.Was expecting one of:    ")" ...    "," 
> ...     at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.generateParseException(FlinkSqlParserImpl.java:36161)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.jj_consume_token(FlinkSqlParserImpl.java:35975)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.ParenthesizedSimpleIdentifierList(FlinkSqlParserImpl.java:21432)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression3(FlinkSqlParserImpl.java:17164)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression2b(FlinkSqlParserImpl.java:16820)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression2(FlinkSqlParserImpl.java:16861)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression(FlinkSqlParserImpl.java:16792)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SelectExpression(FlinkSqlParserImpl.java:11091)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SelectItem(FlinkSqlParserImpl.java:10293)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SelectList(FlinkSqlParserImpl.java:10267)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SqlSelect(FlinkSqlParserImpl.java:6943)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.LeafQuery(FlinkSqlParserImpl.java:658)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.LeafQueryOrExpr(FlinkSqlParserImpl.java:16775)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.QueryOrExpr(FlinkSqlParserImpl.java:16238)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.OrderedQueryOrExpr(FlinkSqlParserImpl.java:532)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SqlStmt(FlinkSqlParserImpl.java:3761)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SqlStmtEof(FlinkSqlParserImpl.java:3800)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.parseSqlStmtEof(FlinkSqlParserImpl.java:248)
>  at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:161) 
> ... 5 more
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (FLINK-21949) Support collect to array aggregate function

2021-04-30 Thread Jark Wu (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-21949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337348#comment-17337348
 ] 

Jark Wu commented on FLINK-21949:
-

The only concern of ARRAY_AGG from my side is it sounds similar to LISTAGG, 
however returns different type (ARRAY vs. STRING). 

LISTAGG is introduced in  SQL-2016:  https://modern-sql.com/feature/listagg

> Support collect to array aggregate function
> ---
>
> Key: FLINK-21949
> URL: https://issues.apache.org/jira/browse/FLINK-21949
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table SQL / API
>Affects Versions: 1.12.0
>Reporter: jiabao sun
>Priority: Minor
>
> Some nosql databases like mongodb and elasticsearch support nested data types.
> Aggregating multiple rows into ARRAY is a common requirement.
> The CollectToArray function is similar to Collect, except that it returns 
> ARRAY instead of MULTISET.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [flink] lirui-apache commented on a change in pull request #15712: [FLINK-22400][hive connect]fix NPE problem when convert flink object for Map



lirui-apache commented on a change in pull request #15712:
URL: https://github.com/apache/flink/pull/15712#discussion_r623836414



##
File path: 
flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/TableEnvHiveConnectorITCase.java
##
@@ -531,6 +532,52 @@ public void testLocationWithComma() throws Exception {
 }
 }
 
+@Test
+public void testReadHiveDataWithEmptyMapForHiveShim20X() throws Exception {
+TableEnvironment tableEnv = getTableEnvWithHiveCatalog();
+
+try {
+// Flink to write parquet file
+String folderURI = TEMPORARY_FOLDER.newFolder().toURI().toString();

Review comment:
   This is already a `tempFolder` member in this test.

##
File path: 
flink-connectors/flink-connector-hive/src/test/java/org/apache/flink/connectors/hive/TableEnvHiveConnectorITCase.java
##
@@ -531,6 +532,52 @@ public void testLocationWithComma() throws Exception {
 }
 }
 
+@Test
+public void testReadHiveDataWithEmptyMapForHiveShim20X() throws Exception {
+TableEnvironment tableEnv = getTableEnvWithHiveCatalog();
+
+try {
+// Flink to write parquet file

Review comment:
   Does this issue only happen with Parquet tables? If not, we can just use 
text table so that the test can be simpler.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] pnowojski commented on a change in pull request #15820: [FLINK-22535][runtime] CleanUp is invoked for task even when the task…



pnowojski commented on a change in pull request #15820:
URL: https://github.com/apache/flink/pull/15820#discussion_r623819184



##
File path: 
flink-streaming-java/src/test/java/org/apache/flink/streaming/runtime/tasks/StreamTaskTest.java
##
@@ -1619,6 +1616,34 @@ public void 
testTaskAvoidHangingAfterSnapshotStateThrownException() throws Excep
 }
 }
 
+@Test
+public void testCleanUpResourcesWhenFailingDuringInit() throws Exception {
+// given: Configured SourceStreamTask with source which fails during 
initialization.
+Configuration taskManagerConfig = new Configuration();
+taskManagerConfig.setString(STATE_BACKEND, 
TestMemoryStateBackendFactory.class.getName());
+
+StreamConfig cfg = new StreamConfig(new Configuration());
+cfg.setStateKeySerializer(mock(TypeSerializer.class));
+cfg.setOperatorID(new OperatorID(4712L, 43L));
+
+cfg.setStreamOperator(new TestStreamSource<>(new InitFailedSource()));
+cfg.setTimeCharacteristic(TimeCharacteristic.ProcessingTime);
+
+try (NettyShuffleEnvironment shuffleEnv = new 
NettyShuffleEnvironmentBuilder().build()) {
+Task task = createTask(SourceStreamTask.class, shuffleEnv, cfg, 
taskManagerConfig);
+
+// when: Task starts.
+task.startTaskThread();
+
+// wait for clean termination.
+task.getExecutingThread().join();
+
+// then: The task should clean up all resources even when it 
failed on init.
+assertEquals(ExecutionState.FAILED, task.getExecutionState());
+assertFalse(InitFailedSource.resourceAllocated);

Review comment:
   nit: rename and invert boolean logic to 
`assertTrue(InitFailedSource.wasClosed)`? It would be more accurate compared to 
"allocated".

##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java
##
@@ -609,42 +615,54 @@ private void ensureNotCanceled() {
 
 @Override
 public final void invoke() throws Exception {
-try {
-// Allow invoking method 'invoke' without having to call 'restore' 
before it.
-if (!isRunning) {
-LOG.debug("Restoring during invoke will be called.");
-restore();
-}
+runWithCleanUpOnFail(this::executeInvoke);
+
+cleanUpInvoke();
+}
+
+private void executeInvoke() throws Exception {
+// Allow invoking method 'invoke' without having to call 'restore' 
before it.
+if (!isRunning) {
+LOG.debug("Restoring during invoke will be called.");
+restore();

Review comment:
   `executeRestore()` + add a test for not swallowing exceptions if 
`restore` hasn't been invoked?

##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java
##
@@ -609,42 +615,54 @@ private void ensureNotCanceled() {
 
 @Override
 public final void invoke() throws Exception {
-try {
-// Allow invoking method 'invoke' without having to call 'restore' 
before it.
-if (!isRunning) {
-LOG.debug("Restoring during invoke will be called.");
-restore();
-}
+runWithCleanUpOnFail(this::executeInvoke);
+
+cleanUpInvoke();
+}
+
+private void executeInvoke() throws Exception {
+// Allow invoking method 'invoke' without having to call 'restore' 
before it.
+if (!isRunning) {
+LOG.debug("Restoring during invoke will be called.");
+restore();
+}
 
-// final check to exit early before starting to run
-ensureNotCanceled();
+// final check to exit early before starting to run
+ensureNotCanceled();
 
-// let the task do its work
-runMailboxLoop();
+// let the task do its work
+runMailboxLoop();
 
-// if this left the run() method cleanly despite the fact that 
this was canceled,
-// make sure the "clean shutdown" is not attempted
-ensureNotCanceled();
+// if this left the run() method cleanly despite the fact that this 
was canceled,
+// make sure the "clean shutdown" is not attempted
+ensureNotCanceled();
 
-afterInvoke();
+afterInvoke();
+}
+
+private void runWithCleanUpOnFail(RunnableWithException run) throws 
Exception {
+try {
+run.run();
 } catch (Throwable invokeException) {
 failing = !canceled;
 try {
 if (!canceled) {
-cancelTask();
+try {
+cancelTask();
+} catch (Throwable ex) {
+invokeException = firstOrSuppressed(ex, 
invokeException);
+}

Review comment:
   1. shouldn't this

[jira] [Commented] (FLINK-18027) ROW value constructor cannot deal with complex expressions

2021-04-30 Thread Jark Wu (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337346#comment-17337346
 ] 

Jark Wu commented on FLINK-18027:
-

What about {{STRUCT(...)}} ? This is also a well-known word and Spark provides 
this function to construct row. 

https://spark.apache.org/docs/latest/api/sql/#struct

> ROW value constructor cannot deal with complex expressions
> --
>
> Key: FLINK-18027
> URL: https://issues.apache.org/jira/browse/FLINK-18027
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Benchao Li
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> create table my_source (
> my_row row
> ) with (...);
> create table my_sink (
> my_row row
> ) with (...);
> insert into my_sink
> select ROW(my_row.a, my_row.b) 
> from my_source;{code}
> will throw excepions:
> {code:java}
> Exception in thread "main" org.apache.flink.table.api.SqlParserException: SQL 
> parse failed. Encountered "." at line 1, column 18.Exception in thread "main" 
> org.apache.flink.table.api.SqlParserException: SQL parse failed. Encountered 
> "." at line 1, column 18.Was expecting one of:    ")" ...    "," ...     at 
> org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:56)
>  at 
> org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:64)
>  at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:627)
>  at com.bytedance.demo.KafkaTableSource.main(KafkaTableSource.java:76)Caused 
> by: org.apache.calcite.sql.parser.SqlParseException: Encountered "." at line 
> 1, column 18.Was expecting one of:    ")" ...    "," ...     at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.convertException(FlinkSqlParserImpl.java:416)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.normalizeException(FlinkSqlParserImpl.java:201)
>  at 
> org.apache.calcite.sql.parser.SqlParser.handleException(SqlParser.java:148) 
> at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:163) at 
> org.apache.calcite.sql.parser.SqlParser.parseStmt(SqlParser.java:188) at 
> org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:54)
>  ... 3 moreCaused by: org.apache.flink.sql.parser.impl.ParseException: 
> Encountered "." at line 1, column 18.Was expecting one of:    ")" ...    "," 
> ...     at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.generateParseException(FlinkSqlParserImpl.java:36161)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.jj_consume_token(FlinkSqlParserImpl.java:35975)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.ParenthesizedSimpleIdentifierList(FlinkSqlParserImpl.java:21432)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression3(FlinkSqlParserImpl.java:17164)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression2b(FlinkSqlParserImpl.java:16820)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression2(FlinkSqlParserImpl.java:16861)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression(FlinkSqlParserImpl.java:16792)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SelectExpression(FlinkSqlParserImpl.java:11091)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SelectItem(FlinkSqlParserImpl.java:10293)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SelectList(FlinkSqlParserImpl.java:10267)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SqlSelect(FlinkSqlParserImpl.java:6943)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.LeafQuery(FlinkSqlParserImpl.java:658)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.LeafQueryOrExpr(FlinkSqlParserImpl.java:16775)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.QueryOrExpr(FlinkSqlParserImpl.java:16238)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.OrderedQueryOrExpr(FlinkSqlParserImpl.java:532)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SqlStmt(FlinkSqlParserImpl.java:3761)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SqlStmtEof(FlinkSqlParserImpl.java:3800)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.parseSqlStmtEof(FlinkSqlParserImpl.java:248)
>  at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:161) 
> ... 5 more
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [flink] flinkbot commented on pull request #15820: [FLINK-22535][runtime] CleanUp is invoked for task even when the task…



flinkbot commented on pull request #15820:
URL: https://github.com/apache/flink/pull/15820#issuecomment-830052015


   
   ## CI report:
   
   * c548fc5c79450d898eebc1c6523be98472fe1cdd UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15789: [FLINK-21181][runtime] Wait for Invokable cancellation before releasing network resources



flinkbot edited a comment on pull request #15789:
URL: https://github.com/apache/flink/pull/15789#issuecomment-827996694


   
   ## CI report:
   
   * 4c5180310bf76e96f2665bf53531eccb1fa86421 UNKNOWN
   * f87ee85814873dee4ed181eee11df9a6758a916e Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17443)
 
   * 46e1be2c4832080bf1cb48c509b77cd88872d024 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15656: [FLINK-22233][Table SQL / API]Modified the spelling error of word "constant" in source code



flinkbot edited a comment on pull request #15656:
URL: https://github.com/apache/flink/pull/15656#issuecomment-822135263


   
   ## CI report:
   
   * 6358c33368746cf2c93463ab423f5c9ff11b641b Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17410)
 
   * fb69e9a4c5c897e75659c28915f9b9aa08e2f0c3 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #15131: [FLINK-21700][security] Add an option to disable credential retrieval on a secure cluster



flinkbot edited a comment on pull request #15131:
URL: https://github.com/apache/flink/pull/15131#issuecomment-794957907


   
   ## CI report:
   
   * 60812bb2804fd21d56a47a23ca4b42a04300f1c0 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=17419)
 
   * 8cb12deac019898bb69f7a5faf7f803dd27d71d0 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] gaoyunhaii commented on a change in pull request #15820: [FLINK-22535][runtime] CleanUp is invoked for task even when the task…



gaoyunhaii commented on a change in pull request #15820:
URL: https://github.com/apache/flink/pull/15820#discussion_r623824848



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java
##
@@ -530,6 +532,10 @@ protected Counter setupNumRecordsInCounter(StreamOperator 
streamOperator) {
 
 @Override
 public void restore() throws Exception {
+runWithCleanUpOnFail(this::executeRestore);

Review comment:
   Would it be better to make `restore` also `final`, and let the 
subclasses to override `executeRestore` instead ? Since the exception handling 
should be always required

##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java
##
@@ -609,42 +615,54 @@ private void ensureNotCanceled() {
 
 @Override
 public final void invoke() throws Exception {
-try {
-// Allow invoking method 'invoke' without having to call 'restore' 
before it.
-if (!isRunning) {
-LOG.debug("Restoring during invoke will be called.");
-restore();
-}
+runWithCleanUpOnFail(this::executeInvoke);
+
+cleanUpInvoke();
+}
+
+private void executeInvoke() throws Exception {
+// Allow invoking method 'invoke' without having to call 'restore' 
before it.
+if (!isRunning) {
+LOG.debug("Restoring during invoke will be called.");
+restore();

Review comment:
   Would it be better to call `executeRestore` here to avoid nested 
exception handling?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-18027) ROW value constructor cannot deal with complex expressions

2021-04-30 Thread Timo Walther (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337323#comment-17337323
 ] 

Timo Walther commented on FLINK-18027:
--

[~jark] what do you think about introducing a new built-in function 
`row_from()` similar to `map_from_arrays` etc. to avoid all these problems with 
Calcite's parser. `ROW()` seems to cause a lot of issues that could actually be 
solved by a simple built-in function.

> ROW value constructor cannot deal with complex expressions
> --
>
> Key: FLINK-18027
> URL: https://issues.apache.org/jira/browse/FLINK-18027
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / API
>Reporter: Benchao Li
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> create table my_source (
> my_row row
> ) with (...);
> create table my_sink (
> my_row row
> ) with (...);
> insert into my_sink
> select ROW(my_row.a, my_row.b) 
> from my_source;{code}
> will throw excepions:
> {code:java}
> Exception in thread "main" org.apache.flink.table.api.SqlParserException: SQL 
> parse failed. Encountered "." at line 1, column 18.Exception in thread "main" 
> org.apache.flink.table.api.SqlParserException: SQL parse failed. Encountered 
> "." at line 1, column 18.Was expecting one of:    ")" ...    "," ...     at 
> org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:56)
>  at 
> org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:64)
>  at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:627)
>  at com.bytedance.demo.KafkaTableSource.main(KafkaTableSource.java:76)Caused 
> by: org.apache.calcite.sql.parser.SqlParseException: Encountered "." at line 
> 1, column 18.Was expecting one of:    ")" ...    "," ...     at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.convertException(FlinkSqlParserImpl.java:416)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.normalizeException(FlinkSqlParserImpl.java:201)
>  at 
> org.apache.calcite.sql.parser.SqlParser.handleException(SqlParser.java:148) 
> at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:163) at 
> org.apache.calcite.sql.parser.SqlParser.parseStmt(SqlParser.java:188) at 
> org.apache.flink.table.planner.calcite.CalciteParser.parse(CalciteParser.java:54)
>  ... 3 moreCaused by: org.apache.flink.sql.parser.impl.ParseException: 
> Encountered "." at line 1, column 18.Was expecting one of:    ")" ...    "," 
> ...     at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.generateParseException(FlinkSqlParserImpl.java:36161)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.jj_consume_token(FlinkSqlParserImpl.java:35975)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.ParenthesizedSimpleIdentifierList(FlinkSqlParserImpl.java:21432)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression3(FlinkSqlParserImpl.java:17164)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression2b(FlinkSqlParserImpl.java:16820)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression2(FlinkSqlParserImpl.java:16861)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.Expression(FlinkSqlParserImpl.java:16792)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SelectExpression(FlinkSqlParserImpl.java:11091)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SelectItem(FlinkSqlParserImpl.java:10293)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SelectList(FlinkSqlParserImpl.java:10267)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SqlSelect(FlinkSqlParserImpl.java:6943)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.LeafQuery(FlinkSqlParserImpl.java:658)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.LeafQueryOrExpr(FlinkSqlParserImpl.java:16775)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.QueryOrExpr(FlinkSqlParserImpl.java:16238)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.OrderedQueryOrExpr(FlinkSqlParserImpl.java:532)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SqlStmt(FlinkSqlParserImpl.java:3761)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.SqlStmtEof(FlinkSqlParserImpl.java:3800)
>  at 
> org.apache.flink.sql.parser.impl.FlinkSqlParserImpl.parseSqlStmtEof(FlinkSqlParserImpl.java:248)
>  at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:161) 
> ... 5 more
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [flink] flinkbot commented on pull request #15820: [FLINK-22535][runtime] CleanUp is invoked for task even when the task…



flinkbot commented on pull request #15820:
URL: https://github.com/apache/flink/pull/15820#issuecomment-830041840


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit c548fc5c79450d898eebc1c6523be98472fe1cdd (Fri Apr 30 
11:53:57 UTC 2021)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer of PMC member is required Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Closed] (FLINK-12710) Unify built-in and user-defined functions in the API modules

2021-04-30 Thread Timo Walther (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-12710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timo Walther closed FLINK-12710.

Resolution: Implemented

Implemented as part of FLINK-20522.

> Unify built-in and user-defined functions in the API modules
> 
>
> Key: FLINK-12710
> URL: https://issues.apache.org/jira/browse/FLINK-12710
> Project: Flink
>  Issue Type: New Feature
>  Components: Table SQL / API
>Reporter: Timo Walther
>Priority: Major
>  Labels: auto-unassigned, pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, there are three completely different stacks of functions: Table 
> API builtins, SQL builtins, and user-defined types.
> Both the Blink and the legacy planner define a separate list of functions and 
> implementations with different type system and type checking logic.
> The long-term goal of this issue is to unify all 6 different stacks into a 
> common one. This includes better support for type inference which relates to 
> FLINK-12251.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[GitHub] [flink] flinkbot edited a comment on pull request #15817: [FLINK-14393][webui] Add an option to enable/disable cancel job in we…