[GitHub] [flink] flinkbot edited a comment on pull request #13277: [FLINK-19089][Connectors / Kafka] Replace ReentrantLock with ReentrantReadWriteLock in ClosableBlockingQueue

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13277:
URL: https://github.com/apache/flink/pull/13277#issuecomment-683239187


   
   ## CI report:
   
   * d6be96d65aff9af67e8af0a2cb3edf68cc084940 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5980)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #13277: [FLINK-19089][Connectors / Kafka] Replace ReentrantLock with ReentrantReadWriteLock in ClosableBlockingQueue

2020-08-28 Thread GitBox


flinkbot commented on pull request #13277:
URL: https://github.com/apache/flink/pull/13277#issuecomment-683239187


   
   ## CI report:
   
   * d6be96d65aff9af67e8af0a2cb3edf68cc084940 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot commented on pull request #13277: [FLINK-19089][Connectors / Kafka] Replace ReentrantLock with ReentrantReadWriteLock in ClosableBlockingQueue

2020-08-28 Thread GitBox


flinkbot commented on pull request #13277:
URL: https://github.com/apache/flink/pull/13277#issuecomment-683238576


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 22af70e728f6824a4c6a11d208ef6bea56318c3e (Sat Aug 29 
05:13:28 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-19089).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   







[jira] [Updated] (FLINK-19089) Replace ReentrantLock with ReentrantReadWriteLock in ClosableBlockingQueue

2020-08-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-19089:
---
Labels: pull-request-available  (was: )

> Replace ReentrantLock with ReentrantReadWriteLock in ClosableBlockingQueue
> --
>
> Key: FLINK-19089
> URL: https://issues.apache.org/jira/browse/FLINK-19089
> Project: Flink
>  Issue Type: Improvement
>Reporter: dugenkui
>Priority: Major
>  Labels: pull-request-available
>
> 1. Replace ReentrantLock with ReentrantReadWriteLock to improve concurrency;
> 2. Use signal instead of signalAll to reduce thread-scheduling overhead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] dugenkui03 opened a new pull request #13277: [FLINK-19089][Connectors / Kafka] Replace ReentrantLock with ReentrantReadWriteLock in ClosableBlockingQueue

2020-08-28 Thread GitBox


dugenkui03 opened a new pull request #13277:
URL: https://github.com/apache/flink/pull/13277


   
   ## What is the purpose of the change
   
   1. Replace `ReentrantLock` with `ReentrantReadWriteLock` to improve 
concurrency;
   
   2. Use `signal` instead of `signalAll` to reduce thread-scheduling overhead.
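The proposed change can be sketched with plain JDK concurrency primitives. This is only an illustrative sketch under stated assumptions, not the actual `ClosableBlockingQueue` patch, and `SketchClosableQueue` is a hypothetical name: read-only accessors take the shared read lock, mutations take the exclusive write lock, and `signal()` wakes a single waiting consumer. Note that a `ReentrantReadWriteLock` only supports `Condition`s on its write lock.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class SketchClosableQueue<E> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Conditions can only be created on the write lock, not the read lock.
    private final Condition nonEmpty = lock.writeLock().newCondition();
    private final ArrayDeque<E> elements = new ArrayDeque<>();
    private boolean open = true;

    void add(E element) {
        lock.writeLock().lock();
        try {
            if (!open) {
                throw new IllegalStateException("queue is closed");
            }
            elements.addLast(element);
            nonEmpty.signal(); // wake one waiting consumer, not all of them
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Blocks until an element is available; returns null once closed and drained. */
    E take() throws InterruptedException {
        lock.writeLock().lock();
        try {
            while (open && elements.isEmpty()) {
                nonEmpty.await();
            }
            return elements.pollFirst();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Read-only accessors can run concurrently under the shared read lock. */
    int size() {
        lock.readLock().lock();
        try {
            return elements.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    void close() {
        lock.writeLock().lock();
        try {
            open = false;
            nonEmpty.signalAll(); // closing must wake every waiter
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Whether the read/write split pays off depends on how read-heavy the access pattern is; under mutation-heavy workloads a plain `ReentrantLock` can be just as fast or faster.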
   







[jira] [Created] (FLINK-19089) Replace ReentrantLock with ReentrantReadWriteLock in ClosableBlockingQueue

2020-08-28 Thread dugenkui (Jira)
dugenkui created FLINK-19089:


 Summary: Replace ReentrantLock with ReentrantReadWriteLock in 
ClosableBlockingQueue
 Key: FLINK-19089
 URL: https://issues.apache.org/jira/browse/FLINK-19089
 Project: Flink
  Issue Type: Improvement
Reporter: dugenkui


1. Replace ReentrantLock with ReentrantReadWriteLock to improve concurrency;

2. Use signal instead of signalAll to reduce thread-scheduling overhead.





[jira] [Closed] (FLINK-14964) Support configure remote flink jar

2020-08-28 Thread Zili Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zili Chen closed FLINK-14964.
-
  Assignee: Zili Chen
Resolution: Won't Do

The original proposal is in 

[https://github.com/apache/flink/pull/10187]

and the patch attached to this issue.

At the time, I wanted to propose supporting a configurable remote library and a 
configurable remote flink jar (which our codebase handles specially) as separate 
features. But the proposal got little response, so I'm not sure it is a common 
requirement, and thus I didn't push it to the master branch.

Still, the code snippets are here, and anyone who later has such a requirement 
can make use of them.

> Support configure remote flink jar
> --
>
> Key: FLINK-14964
> URL: https://issues.apache.org/jira/browse/FLINK-14964
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission, Deployment / YARN
>Reporter: Zili Chen
>Assignee: Zili Chen
>Priority: Major
> Attachments: 
> [XXX]_Enable_configure_shared_libraries_[YYY]_Enable_configure_remote_flink_jar.patch
>
>
> Corresponding discussion happens on 
> [https://lists.apache.org/x/thread.html/a2fef19f6925d43dd2cf3f9a46a2263b20562bd61d4da970f609568d@%3Cdev.flink.apache.org%3E]
> This ticket focuses on supporting a configurable remote flink jar. That is, when a user 
> configures the flink jar as "hdfs://..." we respect that scheme and, instead 
> of uploading it from the local filesystem, register it directly as a YARN 
> {{LocalResource}}.
> This improvement will speed up deployment on YARN.





[jira] [Commented] (FLINK-18934) Idle stream does not advance watermark in connected stream

2020-08-28 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186872#comment-17186872
 ] 

Benchao Li commented on FLINK-18934:


+1 for fixing this issue. We also found this limitation when using the 'temporal 
table function' and fixed it in our internal branch.

> Idle stream does not advance watermark in connected stream
> --
>
> Key: FLINK-18934
> URL: https://issues.apache.org/jira/browse/FLINK-18934
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.11.1
>Reporter: Truong Duc Kien
>Priority: Major
>
> Per the Flink documentation, when a stream is idle, it allows the watermarks of 
> downstream operators to advance. However, when I connect an active data stream 
> with an idle data stream, the output watermark of the CoProcessOperator does 
> not increase.
> Here's a small test that reproduces the problem:
> https://github.com/kien-truong/flink-idleness-testing





[jira] [Commented] (FLINK-18934) Idle stream does not advance watermark in connected stream

2020-08-28 Thread Kenneth William Krugler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186823#comment-17186823
 ] 

Kenneth William Krugler commented on FLINK-18934:
-

On the list, [~dwysakowicz] said:
{quote}Hi Kien,
I am afraid this is a valid bug. I am not 100% sure, but the way I
understand the code, the idleness mechanism applies to input channels,
which matters e.g. when multiple parallel instances shuffle their results
to downstream operators.
In the case of a two-input operator, combining the watermarks of two
different upstream operators happens inside the operator itself.
There we do not have the idleness status. We do not have a status saying that a
whole upstream operator became idle. That's definitely a bug/limitation.
I'm also cc'ing Aljoscha who could maybe confirm my analysis.
Best,
Dawid{quote}

And [~aljoscha] added:
{quote}Yes, I'm afraid this analysis is correct. The StreamOperator, 
AbstractStreamOperator to be specific, computes the combined watermarks from 
both inputs here: 
https://github.com/apache/flink/blob/f0ed29c06d331892a06ee9bddea4173d6300835d/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/operators/AbstractStreamOperator.java#L573.
 The operator layer is not aware of idleness, so it will never notice. 
Idleness only works at the level of inputs and is never forwarded to the 
operator itself.

To fix this we would have to also make operators aware of idleness such that 
they can take this into account when computing the combined output watermark.

Best,
Aljoscha{quote}
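The min-combining step Aljoscha points to can be illustrated with a standalone sketch (the class and method names below are illustrative, not Flink's actual code): because the combined watermark is the minimum of both input watermarks, an idle input that never advances past `Long.MIN_VALUE` pins the operator's output watermark forever.

```java
final class CombinedWatermarkSketch {
    private long input1 = Long.MIN_VALUE;
    private long input2 = Long.MIN_VALUE;

    /** Advances input 1 and returns the new combined watermark. */
    long advanceInput1(long watermark) {
        input1 = Math.max(input1, watermark);
        return combined();
    }

    /** Advances input 2 and returns the new combined watermark. */
    long advanceInput2(long watermark) {
        input2 = Math.max(input2, watermark);
        return combined();
    }

    // The operator emits the minimum of both input watermarks, with no
    // notion of either input being idle.
    private long combined() {
        return Math.min(input1, input2);
    }
}
```

In the reported scenario, the idle stream plays the role of the input that never advances, so the combined value never moves.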


> Idle stream does not advance watermark in connected stream
> --
>
> Key: FLINK-18934
> URL: https://issues.apache.org/jira/browse/FLINK-18934
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.11.1
>Reporter: Truong Duc Kien
>Priority: Major
>
> Per the Flink documentation, when a stream is idle, it allows the watermarks of 
> downstream operators to advance. However, when I connect an active data stream 
> with an idle data stream, the output watermark of the CoProcessOperator does 
> not increase.
> Here's a small test that reproduces the problem:
> https://github.com/kien-truong/flink-idleness-testing





[GitHub] [flink] flinkbot edited a comment on pull request #13265: [FLINK-19055] Wait less time for all memory GC in tests (MemoryManager#verifyEmpty)

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13265:
URL: https://github.com/apache/flink/pull/13265#issuecomment-682059106


   
   ## CI report:
   
   * 92e66f032e705f7ce95b4bcb68c494f4dac0529a Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5977)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13251: [FLINK-14435] Added memory configuration to TaskManagers REST endpoint

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13251:
URL: https://github.com/apache/flink/pull/13251#issuecomment-680904640


   
   ## CI report:
   
   * 0afb282498cadbef9dabb45e6ce6a6dcbc3bc6b7 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5976)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13251: [FLINK-14435] Added memory configuration to TaskManagers REST endpoint

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13251:
URL: https://github.com/apache/flink/pull/13251#issuecomment-680904640


   
   ## CI report:
   
   * 5c299cca964c6a0d5f1e61c2d8379f0c2717622e Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5974)
 
   * 0afb282498cadbef9dabb45e6ce6a6dcbc3bc6b7 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5976)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13265: [FLINK-19055] Wait less time for all memory GC in tests (MemoryManager#verifyEmpty)

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13265:
URL: https://github.com/apache/flink/pull/13265#issuecomment-682059106


   
   ## CI report:
   
   * af69156ea6dbc7513cb7dde023cdf8cc454bec31 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5936)
 
   * 92e66f032e705f7ce95b4bcb68c494f4dac0529a Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5977)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13265: [FLINK-19055] Wait less time for all memory GC in tests (MemoryManager#verifyEmpty)

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13265:
URL: https://github.com/apache/flink/pull/13265#issuecomment-682059106


   
   ## CI report:
   
   * af69156ea6dbc7513cb7dde023cdf8cc454bec31 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5936)
 
   * 92e66f032e705f7ce95b4bcb68c494f4dac0529a UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13251: [FLINK-14435] Added memory configuration to TaskManagers REST endpoint

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13251:
URL: https://github.com/apache/flink/pull/13251#issuecomment-680904640


   
   ## CI report:
   
   * 63794c99aa8dc635bc15ce3e7013241fa291f658 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5969)
 
   * 5c299cca964c6a0d5f1e61c2d8379f0c2717622e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5974)
 
   * 0afb282498cadbef9dabb45e6ce6a6dcbc3bc6b7 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5976)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13216: [FLINK-18999][table-planner-blink][hive] Temporary generic table does…

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13216:
URL: https://github.com/apache/flink/pull/13216#issuecomment-678268420


   
   ## CI report:
   
   * 31eef5a5d1f413435c3a3ccd9764c185b66c1c08 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5970)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-19086) Performance regression 2020-08-28 in globalWindow benchmark

2020-08-28 Thread Roman Khachatryan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186657#comment-17186657
 ] 

Roman Khachatryan commented on FLINK-19086:
---

However, the "old" machine still gives better results for the older commit than 
for the newer one:
{code:java}
642d4fe48c (Aug 26)
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit"
"org.apache.flink.benchmark.WindowBenchmarks.globalWindow","thrpt",1,30,7484.147160,418.570329,"ops/ms"
"org.apache.flink.benchmark.WindowBenchmarks.sessionWindow","thrpt",1,30,921.140648,5.433683,"ops/ms"
"org.apache.flink.benchmark.WindowBenchmarks.slidingWindow","thrpt",1,30,791.730208,4.176828,"ops/ms"
"org.apache.flink.benchmark.WindowBenchmarks.tumblingWindow","thrpt",1,30,4539.169233,105.467920,"ops/ms"

278d2a043 (Aug 28)
"org.apache.flink.benchmark.WindowBenchmarks.globalWindow","thrpt",1,30,7117.974175,202.626669,"ops/ms"
"org.apache.flink.benchmark.WindowBenchmarks.sessionWindow","thrpt",1,30,905.537850,2.796048,"ops/ms"
"org.apache.flink.benchmark.WindowBenchmarks.slidingWindow","thrpt",1,30,788.335585,5.004561,"ops/ms"
"org.apache.flink.benchmark.WindowBenchmarks.tumblingWindow","thrpt",1,30,4303.568852,85.755644,"ops/ms"
{code}

> Performance regression 2020-08-28 in globalWindow benchmark
> --
>
> Key: FLINK-19086
> URL: https://issues.apache.org/jira/browse/FLINK-19086
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks, Runtime / Task
>Affects Versions: 1.12.0
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
> Attachments: Screenshot_2020-08-28_09-58-31.png
>
>
> [http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow=2]
> [http://codespeed.dak8s.net:8000/timeline/#/?exe=1,3=tumblingWindow=2=200=off=on=on]
>  
> The results started to decrease 2 days before the decommissioning of an old 
> Jenkins node.
> The other tests, however, were stable.
>  
> cc: [~pnowojski]





[GitHub] [flink-web] azagrebin opened a new pull request #373: Add blog post: Memory Management improvements for Flink’s JobManager in Apache Flink 1.11

2020-08-28 Thread GitBox


azagrebin opened a new pull request #373:
URL: https://github.com/apache/flink-web/pull/373


   Source in 
[gdoc](https://docs.google.com/document/d/1K7Kte14C6txZdiWNIfLwD7NVbnEKwHzWjLtcNllwQBk).







[jira] [Commented] (FLINK-18603) SQL fails with java.lang.IllegalStateException: No operators defined in streaming topology

2020-08-28 Thread MR HOUSTON PILLAY (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186654#comment-17186654
 ] 

MR HOUSTON PILLAY commented on FLINK-18603:
---

[~afilipchik], I'm running into the same issue. Could you please let me know how 
you resolved your format issues?

> SQL fails with java.lang.IllegalStateException: No operators defined in 
> streaming topology
> --
>
> Key: FLINK-18603
> URL: https://issues.apache.org/jira/browse/FLINK-18603
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Runtime
>Affects Versions: 1.11.0
>Reporter: Alexander Filipchik
>Priority: Major
>
> Hi, was playing with 1.11 and found that code that worked in 1.10.1 fails in 
> 1.11.0 with : 
> {code:java}
> Exception in thread "main" java.lang.IllegalStateException: No operators 
> defined in streaming topology. Cannot generate StreamGraph.
>   at 
> org.apache.flink.table.planner.utils.ExecutorUtils.generateStreamGraph(ExecutorUtils.java:47)
>   at 
> org.apache.flink.table.planner.delegation.StreamExecutor.createPipeline(StreamExecutor.java:47)
>   at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.execute(TableEnvironmentImpl.java:1197)
>   at com.css.flink.avro.confluent.table.SqlTest.main(SqlTest.java:53)
> {code}
> code example:
> {code:java}
> StreamExecutionEnvironment bsEnv = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> EnvironmentSettings bsSettings =
> 
> EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
> StreamTableEnvironment tEnv = StreamTableEnvironment.create(bsEnv, 
> bsSettings);
> String createTable =
> String.format(
> "create table EnrichedOrders ("
> + "name VARCHAR,"
> + "proctime AS PROCTIME()"
> + ") with ("
> + "  'connector.type' = 'kafka',"
> + "  'connector.version' = 'universal',"
> + "  'connector.property-version' = '1',"
> + "  'connector.topic' = '%s',"
> + "  'connector.properties.bootstrap.servers' = '%s',"
> + "  'connector.properties.group.id' = '%s',"
> + "  'connector.startup-mode' = 'earliest-offset',"
> + "  'update-mode' = 'append',"
> + "  'format.type' = 'confluent-avro',"
> + "  'format.schema-registry' = '%s'"
> + ")",
> "avro",
> "broker",
> "testSqlLocal",
> "registry");
> tEnv.executeSql(createTable);
> tEnv.toAppendStream(
> tEnv.sqlQuery(
> "select name, sum(*) "
> + "from EnrichedOrders "
> + "GROUP BY TUMBLE(proctime, INTERVAL '1' MINUTE), name"),
> Row.class)
> .print();
> tEnv.execute("testSql");
> {code}
>  
>  





[GitHub] [flink] XComp commented on pull request #13251: [FLINK-14435] Added memory configuration to TaskManagers REST endpoint

2020-08-28 Thread GitBox


XComp commented on pull request #13251:
URL: https://github.com/apache/flink/pull/13251#issuecomment-682824021


   @tillrohrmann The issues are fixed and the PR builds successfully now.







[GitHub] [flink] flinkbot edited a comment on pull request #13209: [FLINK-18832][datastream] Add compatible check for blocking partition with buffer timeout

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13209:
URL: https://github.com/apache/flink/pull/13209#issuecomment-677744672


   
   ## CI report:
   
   * a343c2c3bf36c97dca7045c65eccbcccfbbef5bf UNKNOWN
   * 3de3f59281d22f21dfbb30cf7d93b1ecd3165a9e Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5968)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-19086) Performance regression 2020-08-28 in globalWindow benchmark

2020-08-28 Thread Roman Khachatryan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186639#comment-17186639
 ] 

Roman Khachatryan commented on FLINK-19086:
---

Locally, I couldn't confirm the regression:

 
{code:java}
642d4fe48c (Aug 26)
Benchmark Mode  Cnt Score Error   Units
WindowBenchmarks.globalWindowthrpt   30  4580.346 ± 215.247  ops/ms
WindowBenchmarks.sessionWindow   thrpt   30   377.308 ±  40.897  ops/ms
WindowBenchmarks.slidingWindow   thrpt   30   448.708 ±  20.665  ops/ms
WindowBenchmarks.tumblingWindow  thrpt   30  3199.008 ±  43.838  ops/ms

e690793c6 (Aug 27)
Benchmark Mode  Cnt Score Error   Units
WindowBenchmarks.globalWindowthrpt   30  4814.900 ± 100.372  ops/ms
WindowBenchmarks.sessionWindow   thrpt   30   590.319 ±   7.214  ops/ms
WindowBenchmarks.slidingWindow   thrpt   30   509.285 ±   4.370  ops/ms
WindowBenchmarks.tumblingWindow  thrpt   30  3014.359 ±  60.543  ops/ms
{code}
 

 

> Performance regression 2020-08-28 in globalWindow benchmark
> --
>
> Key: FLINK-19086
> URL: https://issues.apache.org/jira/browse/FLINK-19086
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks, Runtime / Task
>Affects Versions: 1.12.0
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
> Attachments: Screenshot_2020-08-28_09-58-31.png
>
>
> [http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow=2]
> [http://codespeed.dak8s.net:8000/timeline/#/?exe=1,3=tumblingWindow=2=200=off=on=on]
>  
> The results started to decrease 2 days before the decommissioning of an old 
> Jenkins node.
> The other tests, however, were stable.
>  
> cc: [~pnowojski]





[jira] [Comment Edited] (FLINK-19086) Performance regression 2020-08-28 in globalWindow benchmark

2020-08-28 Thread Roman Khachatryan (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186639#comment-17186639
 ] 

Roman Khachatryan edited comment on FLINK-19086 at 8/28/20, 3:55 PM:
-

Locally, I couldn't confirm the regression:
{code:java}
642d4fe48c (Aug 26)
Benchmark Mode  Cnt Score Error   Units
WindowBenchmarks.globalWindowthrpt   30  4580.346 ± 215.247  ops/ms
WindowBenchmarks.sessionWindow   thrpt   30   377.308 ±  40.897  ops/ms
WindowBenchmarks.slidingWindow   thrpt   30   448.708 ±  20.665  ops/ms
WindowBenchmarks.tumblingWindow  thrpt   30  3199.008 ±  43.838  ops/ms

e690793c6 (Aug 27)
Benchmark Mode  Cnt Score Error   Units
WindowBenchmarks.globalWindowthrpt   30  4814.900 ± 100.372  ops/ms
WindowBenchmarks.sessionWindow   thrpt   30   590.319 ±   7.214  ops/ms
WindowBenchmarks.slidingWindow   thrpt   30   509.285 ±   4.370  ops/ms
WindowBenchmarks.tumblingWindow  thrpt   30  3014.359 ±  60.543  ops/ms
{code}
 


was (Author: roman_khachatryan):
Locally couldn't confirm:

 
{code:java}
642d4fe48c (Aug 26)
Benchmark Mode  Cnt Score Error   Units
WindowBenchmarks.globalWindowthrpt   30  4580.346 ± 215.247  ops/ms
WindowBenchmarks.sessionWindow   thrpt   30   377.308 ±  40.897  ops/ms
WindowBenchmarks.slidingWindow   thrpt   30   448.708 ±  20.665  ops/ms
WindowBenchmarks.tumblingWindow  thrpt   30  3199.008 ±  43.838  ops/ms

e690793c6 (Aug 27)
Benchmark Mode  Cnt Score Error   Units
WindowBenchmarks.globalWindowthrpt   30  4814.900 ± 100.372  ops/ms
WindowBenchmarks.sessionWindow   thrpt   30   590.319 ±   7.214  ops/ms
WindowBenchmarks.slidingWindow   thrpt   30   509.285 ±   4.370  ops/ms
WindowBenchmarks.tumblingWindow  thrpt   30  3014.359 ±  60.543  ops/ms
{code}
 

 

> Performance regression 2020-08-28 in globalWindow benchmark
> --
>
> Key: FLINK-19086
> URL: https://issues.apache.org/jira/browse/FLINK-19086
> Project: Flink
>  Issue Type: Bug
>  Components: Benchmarks, Runtime / Task
>Affects Versions: 1.12.0
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
> Attachments: Screenshot_2020-08-28_09-58-31.png
>
>
> [http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow=2]
> [http://codespeed.dak8s.net:8000/timeline/#/?exe=1,3=tumblingWindow=2=200=off=on=on]
>  
> The results started to decrease 2 days before the decommissioning of an old 
> Jenkins node.
> The other tests, however, were stable.
>  
> cc: [~pnowojski]





[jira] [Updated] (FLINK-19087) ResultPartitionWriter should not expose subpartition but only subpartition-readers

2020-08-28 Thread Kenneth William Krugler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth William Krugler updated FLINK-19087:

Summary: ResultPartitionWriter should not expose subpartition but only 
subpartition-readers  (was: ReaultPartitionWriter should not expose 
subpartition but only subpartition-readers)

> ResultPartitionWriter should not expose subpartition but only 
> subpartition-readers
> --
>
> Key: FLINK-19087
> URL: https://issues.apache.org/jira/browse/FLINK-19087
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
> Fix For: 1.12.0
>
>
> The {{ResultPartitionWriter}} currently gives arbitrary access to the 
> sub-partitions.
> These sub-partitions may not always exist directly, such as in a sort-based 
> shuffle.
> Only access to a reader over a sub-partition's data (the 
> {{ResultSubpartitionView}}) is necessary.
> In the spirit of minimal scope of knowledge, the methods should be scoped to 
> return readers, not the more general sub-partitions.
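The proposed narrowing can be sketched as follows. This is an illustrative sketch with hypothetical names, not Flink's actual interfaces: the writer hands out a reader view over a sub-partition's data rather than the sub-partition itself, so an implementation such as a sort-based shuffle is free to serve the reader from something other than an in-memory sub-partition.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

interface SubpartitionReaderSketch {
    Optional<String> pollNext();
}

final class PartitionWriterSketch {
    private final List<Queue<String>> subpartitions = new ArrayList<>();

    PartitionWriterSketch(int numSubpartitions) {
        for (int i = 0; i < numSubpartitions; i++) {
            subpartitions.add(new ArrayDeque<>());
        }
    }

    void emit(int subpartition, String record) {
        subpartitions.get(subpartition).add(record);
    }

    // Callers get a reader over the sub-partition's data, never the
    // sub-partition itself, so another implementation (e.g. a sort-based
    // shuffle) could serve the reader from spilled, sorted data instead.
    SubpartitionReaderSketch createReader(int subpartition) {
        Queue<String> data = subpartitions.get(subpartition);
        return () -> Optional.ofNullable(data.poll());
    }
}
```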





[jira] [Commented] (FLINK-18828) Terminate jobmanager process with zero exit code to avoid unexpected restarting by K8s

2020-08-28 Thread Ufuk Celebi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186623#comment-17186623
 ] 

Ufuk Celebi commented on FLINK-18828:
-

Thanks for the summary, Till. I think that's a good way to frame this problem.
{quote}One could argue then that users should configure their restart 
strategies to always restart if they don't want to reach a FAILED state.
{quote}
Practically speaking I think it's the only option we currently have in the 
context of Kubernetes standalone deployments.

> Terminate jobmanager process with zero exit code to avoid unexpected 
> restarting by K8s
> --
>
> Key: FLINK-18828
> URL: https://issues.apache.org/jira/browse/FLINK-18828
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.10.1, 1.12.0, 1.11.1
>Reporter: Yang Wang
>Priority: Major
> Fix For: 1.12.0, 1.11.2, 1.10.3
>
>
> Currently, Flink jobmanager process terminates with a non-zero exit code if 
> the job reaches the {{ApplicationStatus.FAILED}}. It is not ideal in K8s 
> deployment, since non-zero exit code will cause unexpected restarting. Also 
> from a framework's perspective, a FAILED job does not mean that Flink has 
> failed and, hence, the return code could still be 0.
> > Note:
> This is a special case for standalone K8s deployment. For 
> standalone/Yarn/Mesos/native K8s, terminating with non-zero exit code is 
> harmless. And a non-zero exit code could help to check the job result quickly.





[GitHub] [flink] flinkbot edited a comment on pull request #13251: [FLINK-14435] Added memory configuration to TaskManagers REST endpoint

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13251:
URL: https://github.com/apache/flink/pull/13251#issuecomment-680904640


   
   ## CI report:
   
   * 78fbaad055ce6f0c9bb59e37347793a08655bb69 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5900)
 
   * 63794c99aa8dc635bc15ce3e7013241fa291f658 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5969)
 
   * 5c299cca964c6a0d5f1e61c2d8379f0c2717622e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5974)
 
   * 0afb282498cadbef9dabb45e6ce6a6dcbc3bc6b7 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5976)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-19088) flink sql 1.11 HbaseTableSource Supports FilterPushDown

2020-08-28 Thread sijun.huang (Jira)
sijun.huang created FLINK-19088:
---

 Summary: flink sql 1.11 HbaseTableSource Supports FilterPushDown
 Key: FLINK-19088
 URL: https://issues.apache.org/jira/browse/FLINK-19088
 Project: Flink
  Issue Type: New Feature
Affects Versions: 1.11.1
 Environment: flink sql 1.11
Reporter: sijun.huang


Hi,

In Flink SQL 1.11, if we create an HBase table via the HBase connector through 
the Hive catalog, querying it makes Flink do a full table scan on the HBase 
table, even if we specify a row key filter.

For detailed info, see the post below:

[http://apache-flink.147419.n8.nabble.com/flink-sql-1-11-hbase-td6652.html]

So I strongly recommend that HbaseTableSource in Flink SQL 1.11 support 
FilterPushDown, to avoid full table scans on HBase tables.

Cheers.





[jira] [Resolved] (FLINK-19021) Cleanups of the ResultPartition components

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-19021.
--
Resolution: Fixed

Fixed through resolution of all subtasks.

> Cleanups of the ResultPartition components
> --
>
> Key: FLINK-19021
> URL: https://issues.apache.org/jira/browse/FLINK-19021
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> This is the umbrella issue for a set of simplifications and cleanups in the 
> {{ResultPartition}} components.
> This cleanup is in preparation for a possible future refactoring.





[jira] [Closed] (FLINK-19021) Cleanups of the ResultPartition components

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-19021.


> Cleanups of the ResultPartition components
> --
>
> Key: FLINK-19021
> URL: https://issues.apache.org/jira/browse/FLINK-19021
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> This is the umbrella issue for a set of simplifications and cleanups in the 
> {{ResultPartition}} components.
> This cleanup is in preparation for a possible future refactoring.





[jira] [Resolved] (FLINK-19047) Move unaligned checkpoint methods from ResultPartition to separate interface.

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-19047.
--
Resolution: Fixed

Fixed in 1.12 via d8250a01081e76a05f07e2f66d3adccffa811c30

> Move unaligned checkpoint methods from ResultPartition to separate interface.
> -
>
> Key: FLINK-19047
> URL: https://issues.apache.org/jira/browse/FLINK-19047
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
> Fix For: 1.12.0
>
>
> All ResultPartitions have the unaligned checkpointing methods, but some do 
> not support checkpoints and throw an {{UnsupportedOperationException}}.
> I suggest to follow the idea of interface segregation to put the methods 
> relating to unaligned checkpoints in a dedicated interface.
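The interface-segregation idea above can be sketched in a few lines. The names are hypothetical (not Flink's real interfaces): checkpoint methods live in a dedicated sub-interface, so partitions that don't support them simply never expose them.

```java
/** Base contract shared by all partitions (illustrative names). */
interface PartitionSketch {
    void emit(String record);
}

/** Only partitions that genuinely support unaligned checkpoints implement this. */
interface CheckpointedPartitionSketch extends PartitionSketch {
    void snapshotInFlightData();
}

class PipelinedPartitionSketch implements CheckpointedPartitionSketch {
    int snapshots; // counts snapshot calls, for illustration only

    @Override
    public void emit(String record) {
        // buffer the record for downstream consumption (elided)
    }

    @Override
    public void snapshotInFlightData() {
        snapshots++;
    }
}

/**
 * A blocking partition never exposes the checkpoint methods, so callers can
 * no longer hit an UnsupportedOperationException at runtime -- the mistake
 * becomes a compile error instead.
 */
class BlockingPartitionSketch implements PartitionSketch {
    @Override
    public void emit(String record) {
        // spill the record to disk (elided)
    }
}
```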





[jira] [Closed] (FLINK-19087) ResultPartitionWriter should not expose subpartition but only subpartition-readers

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-19087.


> ResultPartitionWriter should not expose subpartition but only 
> subpartition-readers
> --
>
> Key: FLINK-19087
> URL: https://issues.apache.org/jira/browse/FLINK-19087
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
> Fix For: 1.12.0
>
>
> The {{ResultPartitionWriter}} currently gives arbitrary access to the 
> sub-partitions.
> These subpartitions may not always exist directly, such as in a sort-based 
> shuffle.
> Necessary is only the access to a reader over a sub-partition's data (the 
> ResultSubpartitionView).
> In the spirit of minimal scope of knowledge, the methods should be scoped to 
> return readers, not the more general subpartitions.





[jira] [Closed] (FLINK-19047) Move unaligned checkpoint methods from ResultPartition to separate interface.

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-19047.


> Move unaligned checkpoint methods from ResultPartition to separate interface.
> -
>
> Key: FLINK-19047
> URL: https://issues.apache.org/jira/browse/FLINK-19047
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
> Fix For: 1.12.0
>
>
> All ResultPartitions have the unaligned checkpointing methods, but some do 
> not support checkpoints and throw an {{UnsupportedOperationException}}.
> I suggest to follow the idea of interface segregation to put the methods 
> relating to unaligned checkpoints in a dedicated interface.





[jira] [Resolved] (FLINK-19087) ResultPartitionWriter should not expose subpartition but only subpartition-readers

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-19087.
--
Resolution: Fixed

Fixed in 1.12 via 95c533c964e5babfee5d901ffea3c9be48e30ccb

> ResultPartitionWriter should not expose subpartition but only 
> subpartition-readers
> --
>
> Key: FLINK-19087
> URL: https://issues.apache.org/jira/browse/FLINK-19087
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
> Fix For: 1.12.0
>
>
> The {{ResultPartitionWriter}} currently gives arbitrary access to the 
> sub-partitions.
> These subpartitions may not always exist directly, such as in a sort-based 
> shuffle.
> Necessary is only the access to a reader over a sub-partition's data (the 
> ResultSubpartitionView).
> In the spirit of minimal scope of knowledge, the methods should be scoped to 
> return readers, not the more general subpartitions.





[GitHub] [flink] flinkbot edited a comment on pull request #13251: [FLINK-14435] Added memory configuration to TaskManagers REST endpoint

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13251:
URL: https://github.com/apache/flink/pull/13251#issuecomment-680904640


   
   ## CI report:
   
   * 78fbaad055ce6f0c9bb59e37347793a08655bb69 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5900)
 
   * 63794c99aa8dc635bc15ce3e7013241fa291f658 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5969)
 
   * 5c299cca964c6a0d5f1e61c2d8379f0c2717622e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5974)
 
   * 0afb282498cadbef9dabb45e6ce6a6dcbc3bc6b7 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Closed] (FLINK-19046) Introduce separate classes for PipelinedResultPartition and BoundedBlockingResultPartition

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-19046.


> Introduce separate classes for PipelinedResultPartition and 
> BoundedBlockingResultPartition
> --
>
> Key: FLINK-19046
> URL: https://issues.apache.org/jira/browse/FLINK-19046
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
> Fix For: 1.12.0
>
>
> Currently, the SubPartition classes are specific to the partition type 
> (pipelined, batched/blocking) but the parent Partition class is shared.
> Given that the partitions behave differently regarding checkpoints, 
> releasing, etc. the code is cleaner separated by introducing dedicated 
> classes for the {{ResultPartitions}} based on the type.
> This is also an important preparation to later have more different 
> implementations, like sort-based shuffles.
> Important: These new classes will not override any performance critical 
> methods (like adding a buffer to the result). They merely specialize certain 
> behaviors around checkpointing and cleanup.
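The split into dedicated classes can be sketched as follows (hypothetical names, not Flink's real classes): the hot-path method stays final in the shared base class, while only lifecycle behavior is specialized per subclass.

```java
/** Shared base class; performance-critical path is final, never overridden. */
abstract class ResultPartitionBase {
    private final StringBuilder data = new StringBuilder();

    /** Hot path: adding a record goes through the base class unconditionally. */
    final void addRecord(String record) {
        data.append(record);
    }

    /** Type-specific lifecycle behavior is specialized per subclass. */
    abstract boolean releasedOnConsumption();
}

class PipelinedResultPartitionSketch extends ResultPartitionBase {
    @Override
    boolean releasedOnConsumption() {
        return true; // streamed once, then released
    }
}

class BoundedBlockingResultPartitionSketch extends ResultPartitionBase {
    @Override
    boolean releasedOnConsumption() {
        return false; // retained until the scheduler no longer needs it
    }
}
```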





[jira] [Closed] (FLINK-19024) Remove unused "releaseMemory" from ResultSubpartition

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-19024.


> Remove unused "releaseMemory" from ResultSubpartition
> -
>
> Key: FLINK-19024
> URL: https://issues.apache.org/jira/browse/FLINK-19024
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Minor
> Fix For: 1.12.0
>
>
> The {{releaseMemory()}} call in the {{ResultSubpartition}} is currently not 
> meaningful for any existing implementation.
> Future versions where memory may have to be released will quite possibly not 
> implement that on a "subpartition" level. For example, a sort-based shuffle 
> has its buffers at the partition level, rather than the subpartition level.
> We should thus remove the {{releaseMemory()}} call from the abstract 
> subpartition interface. Concrete implementations can still release memory on 
> a subpartition level, if needed in the future.





[jira] [Closed] (FLINK-19045) Remove obsolete option 'taskmanager.network.partition.force-release-on-consumption'

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-19045.


> Remove obsolete option 
> 'taskmanager.network.partition.force-release-on-consumption'
> ---
>
> Key: FLINK-19045
> URL: https://issues.apache.org/jira/browse/FLINK-19045
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
> Fix For: 1.12.0
>
>
> This option was a fallback/safeguard option introduced when the fine-grained 
> failover for batch was added.
> With the fine-grained batch failover, the batch result partitions were no 
> longer immediately cleaned up, but retained for recovery until the scheduler 
> decided that they were no longer needed.
> The purpose of the flag was to restore the old "immediate cleanup" behavior 
> should it be needed.
> The batch failover has proven stable and this fallback option can be removed 
> now.





[jira] [Resolved] (FLINK-19046) Introduce separate classes for PipelinedResultPartition and BoundedBlockingResultPartition

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-19046.
--
Resolution: Fixed

Fixed in 1.12 via 91d380788005d35fcb14a2fe66ae6ca9e72e529d

> Introduce separate classes for PipelinedResultPartition and 
> BoundedBlockingResultPartition
> --
>
> Key: FLINK-19046
> URL: https://issues.apache.org/jira/browse/FLINK-19046
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
> Fix For: 1.12.0
>
>
> Currently, the SubPartition classes are specific to the partition type 
> (pipelined, batched/blocking) but the parent Partition class is shared.
> Given that the partitions behave differently regarding checkpoints, 
> releasing, etc. the code is cleaner separated by introducing dedicated 
> classes for the {{ResultPartitions}} based on the type.
> This is also an important preparation to later have more different 
> implementations, like sort-based shuffles.
> Important: These new classes will not override any performance critical 
> methods (like adding a buffer to the result). They merely specialize certain 
> behaviors around checkpointing and cleanup.





[jira] [Resolved] (FLINK-19024) Remove unused "releaseMemory" from ResultSubpartition

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-19024.
--
Resolution: Fixed

Fixed for 1.12 via 0b3f15284d1777925b4bd6c4358fa6dac0867172

> Remove unused "releaseMemory" from ResultSubpartition
> -
>
> Key: FLINK-19024
> URL: https://issues.apache.org/jira/browse/FLINK-19024
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Minor
> Fix For: 1.12.0
>
>
> The {{releaseMemory()}} call in the {{ResultSubpartition}} is currently not 
> meaningful for any existing implementation.
> Future versions where memory may have to be released will quite possibly not 
> implement that on a "subpartition" level. For example, a sort-based shuffle 
> has its buffers at the partition level, rather than the subpartition level.
> We should thus remove the {{releaseMemory()}} call from the abstract 
> subpartition interface. Concrete implementations can still release memory on 
> a subpartition level, if needed in the future.





[jira] [Resolved] (FLINK-19045) Remove obsolete option 'taskmanager.network.partition.force-release-on-consumption'

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-19045.
--
Resolution: Fixed

Fixed for 1.12 via 3ad06e3aacda3923f3c8e40eced430177aecb49d

> Remove obsolete option 
> 'taskmanager.network.partition.force-release-on-consumption'
> ---
>
> Key: FLINK-19045
> URL: https://issues.apache.org/jira/browse/FLINK-19045
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
> Fix For: 1.12.0
>
>
> This option was a fallback/safeguard option introduced when the fine-grained 
> failover for batch was added.
> With the fine-grained batch failover, the batch result partitions were no 
> longer immediately cleaned up, but retained for recovery until the scheduler 
> decided that they were no longer needed.
> The purpose of the flag was to restore the old "immediate cleanup" behavior 
> should it be needed.
> The batch failover has proven stable and this fallback option can be removed 
> now.





[jira] [Closed] (FLINK-19023) Remove pruning of Record Serializer Buffer

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-19023.


> Remove pruning of Record Serializer Buffer
> --
>
> Key: FLINK-19023
> URL: https://issues.apache.org/jira/browse/FLINK-19023
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
> Fix For: 1.12.0
>
>
> Currently, the {{SpanningRecordSerializer}} prunes its internal serialization 
> buffer under special circumstances:
>   - The buffer becomes larger than a certain threshold (5MB)
>   - The full record end lines up exactly with a full buffer length (this 
> change was introduced at some point; it is not clear what its purpose is)
> This optimization virtually never kicks in (because of the second condition) 
> and also seems unnecessary. There is only a single serializer on the sender 
> side, so this will not help to reduce the maximum memory footprint needed in 
> any way.
> NOTE: A similar optimization on the reader side 
> ({{SpillingAdaptiveSpanningRecordDeserializer}}) makes sense, because 
> multiple parallel deserializers run in order to piece together the records 
> when retrieving buffers from the network in arbitrary order. Truncating 
> buffers (or spilling) there helps reduce the maximum required memory 
> footprint.
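The rarely-firing condition can be sketched as a predicate. This is an assumption-laden reconstruction: it assumes the two bullet-point conditions were combined conjunctively (which "because of the second condition" suggests), and the 5 MB threshold comes from the issue text; the class and method names are illustrative.

```java
/** Sketch of the removed pruning check (hypothetical names). */
class SerializerPruningSketch {
    static final int PRUNE_THRESHOLD_BYTES = 5 * 1024 * 1024; // 5 MB

    static boolean shouldPrune(int bufferCapacity, int recordEndOffset) {
        // Prune only when the buffer outgrew the threshold AND the record end
        // lines up exactly with the buffer length -- the second clause makes
        // the combination almost never true in practice.
        return bufferCapacity > PRUNE_THRESHOLD_BYTES
                && recordEndOffset == bufferCapacity;
    }
}
```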





[jira] [Resolved] (FLINK-19023) Remove pruning of Record Serializer Buffer

2020-08-28 Thread Stephan Ewen (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-19023.
--
Resolution: Fixed

Fixed for 1.12 via 2c273f86e41866bc737de1686aa0925a4749671b

> Remove pruning of Record Serializer Buffer
> --
>
> Key: FLINK-19023
> URL: https://issues.apache.org/jira/browse/FLINK-19023
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Network
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
> Fix For: 1.12.0
>
>
> Currently, the {{SpanningRecordSerializer}} prunes its internal serialization 
> buffer under special circumstances:
>   - The buffer becomes larger than a certain threshold (5MB)
>   - The full record end lines up exactly with a full buffer length (this 
> change was introduced at some point; it is not clear what its purpose is)
> This optimization virtually never kicks in (because of the second condition) 
> and also seems unnecessary. There is only a single serializer on the sender 
> side, so this will not help to reduce the maximum memory footprint needed in 
> any way.
> NOTE: A similar optimization on the reader side 
> ({{SpillingAdaptiveSpanningRecordDeserializer}}) makes sense, because 
> multiple parallel deserializers run in order to piece together the records 
> when retrieving buffers from the network in arbitrary order. Truncating 
> buffers (or spilling) there helps reduce the maximum required memory 
> footprint.





[GitHub] [flink] flinkbot edited a comment on pull request #13272: [FLINK-18695][network] Netty fakes heap buffer allocationn with direct buffers

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13272:
URL: https://github.com/apache/flink/pull/13272#issuecomment-682346986


   
   ## CI report:
   
   * d34257206e16c6b9757046cf2c7d71d687ab75eb Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5971)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13209: [FLINK-18832][datastream] Add compatible check for blocking partition with buffer timeout

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13209:
URL: https://github.com/apache/flink/pull/13209#issuecomment-677744672


   
   ## CI report:
   
   * 6b299eb84623c97e80fcd15d64702fb993947186 Azure: 
[CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5966)
 
   * a343c2c3bf36c97dca7045c65eccbcccfbbef5bf UNKNOWN
   * 3de3f59281d22f21dfbb30cf7d93b1ecd3165a9e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5968)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] asfgit closed pull request #13239: [FLINK-19021][network] Various cleanups and refactorings of the ResultPartition

2020-08-28 Thread GitBox


asfgit closed pull request #13239:
URL: https://github.com/apache/flink/pull/13239


   







[jira] [Commented] (FLINK-18828) Terminate jobmanager process with zero exit code to avoid unexpected restarting by K8s

2020-08-28 Thread Till Rohrmann (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186566#comment-17186566
 ] 

Till Rohrmann commented on FLINK-18828:
---

I believe the question here is whether one considers a job reaching the 
{{FAILED}} state a failure of Flink or not. If {{FAILED}} is a valid outcome of 
a job execution, then the return code should be zero. One could argue then that 
users should configure their restart strategies to always restart if they don't 
want to reach a {{FAILED}} state.

If one considers the per-job deployment mode the vehicle to run a Flink job, 
then I can also see that a {{FAILED}} job state can be considered a Flink 
failure and, hence, one should terminate with a non-zero exit code.

Somewhat related to this question is what is causing the job to fail. If it is 
a user code fault, then I would be more inclined to say that {{FAILED}} is a 
valid terminal state with a zero exit code because Flink cannot do anything 
about it. If on the other hand, Flink is causing the job to reach a {{FAILED}} 
state (e.g. if it does not manage to acquire enough resources, the timeouts are 
too tight to run successfully on the infrastructure, a fatal error occurs, 
etc.), then I can see that Flink should terminate with a non-zero exit code 
indicating that it failed.
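The distinction drawn above can be sketched as a mapping from a terminal job outcome to the JobManager's process exit code. All names here are hypothetical (Flink's real status enum is `ApplicationStatus`, and the outcome split below is an assumption from this discussion, not implemented behavior):

```java
/** Illustrative terminal outcomes, splitting "FAILED" by cause as discussed. */
enum JobOutcomeSketch { FINISHED, CANCELED, FAILED_BY_USER_CODE, FAILED_BY_FRAMEWORK }

class ExitCodeSketch {
    static int exitCodeFor(JobOutcomeSketch outcome) {
        switch (outcome) {
            case FAILED_BY_FRAMEWORK:
                // Flink itself failed (resources, timeouts, fatal errors):
                // a non-zero code lets the deployment layer restart the process.
                return 1;
            default:
                // FINISHED, CANCELED, and user-code failures are valid job
                // outcomes; exit 0 so standalone K8s does not restart the pod.
                return 0;
        }
    }
}
```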

> Terminate jobmanager process with zero exit code to avoid unexpected 
> restarting by K8s
> --
>
> Key: FLINK-18828
> URL: https://issues.apache.org/jira/browse/FLINK-18828
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.10.1, 1.12.0, 1.11.1
>Reporter: Yang Wang
>Priority: Major
> Fix For: 1.12.0, 1.11.2, 1.10.3
>
>
> Currently, Flink jobmanager process terminates with a non-zero exit code if 
> the job reaches the {{ApplicationStatus.FAILED}}. It is not ideal in K8s 
> deployment, since non-zero exit code will cause unexpected restarting. Also 
> from a framework's perspective, a FAILED job does not mean that Flink has 
> failed and, hence, the return code could still be 0.
> > Note:
> This is a special case for standalone K8s deployment. For 
> standalone/Yarn/Mesos/native K8s, terminating with non-zero exit code is 
> harmless. And a non-zero exit code could help to check the job result quickly.





[GitHub] [flink] flinkbot edited a comment on pull request #13273: [FLINK-18801][docs][python] Add a "10 minutes to Table API" document under the "Python API" -> "User Guide" -> "Table API" section.

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13273:
URL: https://github.com/apache/flink/pull/13273#issuecomment-682353427


   
   ## CI report:
   
   * f0ae42ddcc931633408535d4df06c35bb56573ca Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5972)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-14964) Support configure remote flink jar

2020-08-28 Thread Andreas Hailu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186562#comment-17186562
 ] 

Andreas Hailu commented on FLINK-14964:
---

[~trohrmann] What's the difference between this feature and 
[FLINK-13938|https://issues.apache.org/jira/browse/FLINK-13938]?

> Support configure remote flink jar
> --
>
> Key: FLINK-14964
> URL: https://issues.apache.org/jira/browse/FLINK-14964
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission, Deployment / YARN
>Reporter: Zili Chen
>Priority: Major
> Attachments: 
> [XXX]_Enable_configure_shared_libraries_[YYY]_Enable_configure_remote_flink_jar.patch
>
>
> Corresponding discussion happens on 
> [https://lists.apache.org/x/thread.html/a2fef19f6925d43dd2cf3f9a46a2263b20562bd61d4da970f609568d@%3Cdev.flink.apache.org%3E]
> This ticket focuses on support configure remote flink jar. That is, when user 
> configure flink jar as "hdfs://..." we also respect the pattern and instead 
> of uploading from local, directly register as YARN's {{LocalResource}}.
> This improvement will speed up the deployment on YARN.





[jira] [Commented] (FLINK-14422) Add metric for network memory

2020-08-28 Thread Matthias (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186559#comment-17186559
 ] 

Matthias commented on FLINK-14422:
--

I'm going to pick up this issue and work on it.

> Add metric for network memory
> -
>
> Key: FLINK-14422
> URL: https://issues.apache.org/jira/browse/FLINK-14422
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Task
>Reporter: lining
>Assignee: Matthias
>Priority: Major
>
> This issue refers to Step 2 in the implementation proposal of 
> [FLIP-102|https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager]





[jira] [Assigned] (FLINK-14422) Add metric for network memory

2020-08-28 Thread Matthias (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias reassigned FLINK-14422:


Assignee: Matthias

> Add metric for network memory
> -
>
> Key: FLINK-14422
> URL: https://issues.apache.org/jira/browse/FLINK-14422
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Task
>Reporter: lining
>Assignee: Matthias
>Priority: Major
>
> This issue refers to Step 2 in the implementation proposal of 
> [FLIP-102|https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager]





[jira] [Updated] (FLINK-14422) Add metric for network memory

2020-08-28 Thread Matthias (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias updated FLINK-14422:
-
Summary: Add metric for network memory  (was: Add metric for shuffle memory)

> Add metric for network memory
> -
>
> Key: FLINK-14422
> URL: https://issues.apache.org/jira/browse/FLINK-14422
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Task
>Reporter: lining
>Priority: Major
>
> This issue refers to Step 2 in the implementation proposal of 
> [FLIP-102|https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager]





[GitHub] [flink] flinkbot edited a comment on pull request #13251: [FLINK-14435] Added memory configuration to TaskManagers REST endpoint

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13251:
URL: https://github.com/apache/flink/pull/13251#issuecomment-680904640


   
   ## CI report:
   
   * 78fbaad055ce6f0c9bb59e37347793a08655bb69 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5900)
 
   * 63794c99aa8dc635bc15ce3e7013241fa291f658 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5969)
 
   * 5c299cca964c6a0d5f1e61c2d8379f0c2717622e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5974)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-14964) Support configure remote flink jar

2020-08-28 Thread Zili Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186555#comment-17186555
 ] 

Zili Chen commented on FLINK-14964:
---

Let me recall the context and hopefully reply tomorrow :P

> Support configure remote flink jar
> --
>
> Key: FLINK-14964
> URL: https://issues.apache.org/jira/browse/FLINK-14964
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission, Deployment / YARN
>Reporter: Zili Chen
>Priority: Major
> Attachments: 
> [XXX]_Enable_configure_shared_libraries_[YYY]_Enable_configure_remote_flink_jar.patch
>
>
> Corresponding discussion happens on 
> [https://lists.apache.org/x/thread.html/a2fef19f6925d43dd2cf3f9a46a2263b20562bd61d4da970f609568d@%3Cdev.flink.apache.org%3E]
> This ticket focuses on supporting a remote flink jar. That is, when the user 
> configures the flink jar as "hdfs://..." we respect the pattern and, instead 
> of uploading it from local, directly register it as YARN's {{LocalResource}}.
> This improvement will speed up the deployment on YARN.





[jira] [Updated] (FLINK-14422) Add metric for shuffle memory

2020-08-28 Thread Matthias (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias updated FLINK-14422:
-
Description: This issue refers to Step 2 in the implementation proposal of 
[FLIP-102|https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager]
  (was: * add getTotalMemorySize and getAvaliableMemorySize in 
NetworkBufferPool 

{code:java}
public long getTotalMemorySize() {
return 1L * getTotalNumberOfMemorySegments() * memorySegmentSize;
}

public long getAvaliableMemorySize() {
return 1L * getNumberOfAvailableMemorySegments() * memorySegmentSize;
}
{code}
 
 * update NettyShuffleMetricFactory#registerShuffleMetrics

{code:java}
private static final String METRIC_TOTAL_MEMORY_SEGMENT_TOTALCAPACITY = 
"TotalMemoryCapacity";
private static final String METRIC_TOTAL_MEMORY_SEGMENT_AVALIABLEMEMORY = 
"AvaliableMemory";
private static void registerShuffleMetrics(
String groupName,
MetricGroup metricGroup,
NetworkBufferPool networkBufferPool) {
MetricGroup networkGroup = metricGroup.addGroup(groupName);
networkGroup.<Integer, Gauge<Integer>>gauge(METRIC_TOTAL_MEMORY_SEGMENT,
networkBufferPool::getTotalNumberOfMemorySegments);
networkGroup.<Integer, Gauge<Integer>>gauge(METRIC_AVAILABLE_MEMORY_SEGMENT,
networkBufferPool::getNumberOfAvailableMemorySegments);
networkGroup.<Long, Gauge<Long>>gauge(METRIC_TOTAL_MEMORY_SEGMENT_TOTALCAPACITY,
networkBufferPool::getTotalMemorySize);
networkGroup.<Long, Gauge<Long>>gauge(METRIC_TOTAL_MEMORY_SEGMENT_AVALIABLEMEMORY,
networkBufferPool::getAvaliableMemorySize);
}
{code})

> Add metric for shuffle memory
> -
>
> Key: FLINK-14422
> URL: https://issues.apache.org/jira/browse/FLINK-14422
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Task
>Reporter: lining
>Priority: Major
>
> This issue refers to Step 2 in the implementation proposal of 
> [FLIP-102|https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager]





[jira] [Closed] (FLINK-14234) All partition consumable events should be notified to SchedulingStrategy (SchedulerNG)

2020-08-28 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-14234.
---
Resolution: Won't Do

Not a must at the moment, so closing it for now.

> All partition consumable events should be notified to SchedulingStrategy 
> (SchedulerNG)
> --
>
> Key: FLINK-14234
> URL: https://issues.apache.org/jira/browse/FLINK-14234
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: Zhu Zhu
>Assignee: Zhu Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {{SchedulingStrategy}} requires partition consumable notification to make 
> scheduling decisions.
> According to {{SchedulingStrategy#onPartitionConsumable}} definition, all 
> partition consumable events should be notified to {{SchedulingStrategy}}, 
> including those from TMs (pipelined partitions consumable for data produced) 
> and from within JM(blocking partitions consumable for producer finished).
> In this way, the LazyFromSourcesSchedulingStrategy does not need to maintain 
> the result partition status by itself, and InputDependencyConstraintChecker 
> can be simplified a lot.
> Besides that, LazyFromSourcesSchedulingStrategy no longer needs to be aware 
> of result partition types (PIPELINED/BLOCKING).
> It would also simplify the input checking for pipelined region scheduling.
> More details see 
> [here|https://github.com/apache/flink/pull/9663#discussion_r326540913].





[jira] [Comment Edited] (FLINK-18828) Terminate jobmanager process with zero exit code to avoid unexpected restarting by K8s

2020-08-28 Thread Ufuk Celebi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186532#comment-17186532
 ] 

Ufuk Celebi edited comment on FLINK-18828 at 8/28/20, 1:19 PM:
---

[~fly_in_gis] Thanks for the pointers. The Flink job would only transition to 
FAILED when the Flink-level restart strategy has been exhausted. In your 
example for fixed-delay with 3 attempts, the first three restarts would _not_ 
cause the container to exit; only on the 4th failure would the job 
transition to FAILED and the container exit.

I think the bigger problem with my proposal to set the policy to Never is that 
it would not restart in other failure scenarios (e.g. OOM killed), so I don't 
think it's a viable option.

All in all, I don't see a good way around this problem without your proposed 
change.

---

Maybe as a follow-up we want to resurrect 
https://issues.apache.org/jira/browse/FLINK-10948? That way, users would at 
least be able to determine the final Flink job status.


was (Author: uce):
[~fly_in_gis] Thanks for the pointers. The Flink job would only transition to 
FAILED when the Flink-level restart strategy has been exhausted. In your 
example for fixed-delay with 3 attempts, the first three restarts would _not_ 
result in the container to exit, but only on the 4th failure would the job 
transition to FAILED and the container exit.

I think the bigger problem with my proposal to set the policy to Never is that 
it would not restart in other failure scenarios (e.g. OOM killed). So overall, 
I don't think it's a viable option.

So overall, I don't see a good way around this problem without your proposed 
change.

---

Maybe as a follow-up we want to resurrect 
https://issues.apache.org/jira/browse/FLINK-10948? That way, users would at 
least be able to determine the final Flink job status.

> Terminate jobmanager process with zero exit code to avoid unexpected 
> restarting by K8s
> --
>
> Key: FLINK-18828
> URL: https://issues.apache.org/jira/browse/FLINK-18828
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.10.1, 1.12.0, 1.11.1
>Reporter: Yang Wang
>Priority: Major
> Fix For: 1.12.0, 1.11.2, 1.10.3
>
>
> Currently, Flink jobmanager process terminates with a non-zero exit code if 
> the job reaches the {{ApplicationStatus.FAILED}}. It is not ideal in K8s 
> deployment, since non-zero exit code will cause unexpected restarting. Also 
> from a framework's perspective, a FAILED job does not mean that Flink has 
> failed and, hence, the return code could still be 0.
> > Note:
> This is a special case for standalone K8s deployment. For 
> standalone/Yarn/Mesos/native K8s, terminating with non-zero exit code is 
> harmless. And a non-zero exit code could help to check the job result quickly.
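The proposed behavior can be reduced to a tiny sketch (illustrative names only: `JobStatus` and the exit-code values here are stand-ins, not Flink's actual `ApplicationStatus` constants). Today a FAILED job yields a non-zero exit code, which a K8s `restartPolicy: OnFailure` treats as a crash; the change would make the JobManager process exit with 0 regardless of the final job status.

```java
// Minimal stand-in for the job-status-to-exit-code mapping (hypothetical).
enum JobStatus { SUCCEEDED, CANCELED, FAILED }

class ExitCodeSketch {
    // Current behavior (sketch): FAILED terminates with a non-zero code,
    // which K8s restartPolicy=OnFailure treats as a crash and restarts.
    static int currentExitCode(JobStatus status) {
        return status == JobStatus.FAILED ? 1 : 0;
    }

    // Proposed behavior: a FAILED *job* does not mean the *framework* failed,
    // so the JobManager process still exits with 0 and K8s leaves it alone.
    static int proposedExitCode(JobStatus status) {
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("current FAILED -> " + currentExitCode(JobStatus.FAILED));
        System.out.println("proposed FAILED -> " + proposedExitCode(JobStatus.FAILED));
    }
}
```

Note that with the proposed mapping the exit code no longer reveals the job result, which is why resurrecting FLINK-10948 (exposing the final job status some other way) is suggested above.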





[jira] [Closed] (FLINK-14233) All task state changes should be notified to SchedulingStrategy (SchedulerNG)

2020-08-28 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-14233.
---
Resolution: Later

Not a must at the moment, so closing it for now.

> All task state changes should be notified to SchedulingStrategy (SchedulerNG)
> -
>
> Key: FLINK-14233
> URL: https://issues.apache.org/jira/browse/FLINK-14233
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: Zhu Zhu
>Priority: Major
>
> {{SchedulingStrategy}} requires task states to make scheduling decisions.
> At the moment, however, only FINISHED and FAILED states are notified. 
> According to {{SchedulingStrategy#onExecutionStateChange}} definition, all 
> task state changes should be notified to {{SchedulingStrategy}}, including 
> those from TMs and from within JM.
> This can be helpful for more flexible scheduling.





[GitHub] [flink] flinkbot edited a comment on pull request #13251: [FLINK-14435] Added memory configuration to TaskManagers REST endpoint

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13251:
URL: https://github.com/apache/flink/pull/13251#issuecomment-680904640


   
   ## CI report:
   
   * 78fbaad055ce6f0c9bb59e37347793a08655bb69 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5900)
 
   * 63794c99aa8dc635bc15ce3e7013241fa291f658 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5969)
 
   * 5c299cca964c6a0d5f1e61c2d8379f0c2717622e UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-14431) Update TaskManager's memory information to match its memory composition

2020-08-28 Thread Matthias (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186542#comment-17186542
 ] 

Matthias commented on FLINK-14431:
--

Just as a remark: I took over the backend side for this issue. I just reached 
out to [~yadongxie] to discuss the REST API interface.

> Update TaskManager's memory information to match its memory composition
> ---
>
> Key: FLINK-14431
> URL: https://issues.apache.org/jira/browse/FLINK-14431
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / REST, Runtime / Task, Runtime / Web Frontend
>Reporter: lining
>Priority: Major
> Attachments: image-2019-10-17-17-58-50-342.png, 
> image-2019-10-17-18-01-09-353.png, image-2019-10-17-18-29-53-329.png, 
> image-2019-10-24-16-19-15-499.png, image-2019-10-24-16-20-23-210.png, 
> image-2019-10-24-16-22-27-360.png, image-2019-12-19-18-09-05-542.png, 
> image-2019-12-19-18-27-38-589.png, image-2019-12-19-18-28-01-447.png
>
>
> h3. Motivation
> There are several shortcomings in how the current (Flink 1.10) TaskManager 
> memory information is shown in the REST API.
> h4. (1) The information from HardwareDescription is difficult to match to the 
> memory composition of the TaskManager in FLIP-49. As the picture below shows:
> !image-2019-12-19-18-09-05-542.png|width=444,height=389!
>  * the meaning of HardwareDescription.sizeOfJvmHeap is unclear.
>  * the user couldn't get the resource configuration of the TaskManager.
> h4. (2) There isn't information for managed memory.
>  * no metric for managed memory.
> h4. (3) There isn't information for shuffle memory
>  * from TaskManagerMetricsInfo's memorySegmentsTotal (i.e. the total number 
> of shuffle segments), the user couldn't get the shuffle memory size.
> h4. (4) The metrics in the TaskManager's metrics page do not correspond to 
> the resource configuration of taskmanager
>  * It is difficult for users to update taskmanager's resource configuration 
> based on metrics because users couldn’t find configuration items related to 
> metrics.
> h3. Proposed Changes
> h4. Add TaskManagerResourceInfo which matches the memory composition 
>  * information from TaskExecutorResourceSpec in flip-49, add it to 
> TaskExecutorRegistration.
> {code:java}
> public class TaskManagerResourceInfo {
> private final double cpuCores;
> private final long frameworkHeap;
> private final long frameworkOffHeap;
> private final long taskHeap;
> private final long taskOffHeap;
> private final long shuffleMemory;
> private final long managedMemory;
> private final long jvmMetaSpace;
> private final long jvmOverhead;
> private final long totalProcessMemory;
> }
> {code}
>  * url: /taskmanagers/:taskmanagerid
>  * response: add
> {code:json}
> resource: {
>   cpuCores: 4,
>   frameworkHeap: 134217728,
>   frameworkOffHeap: 134217728,
>   taskHeap: 181193928,
>   taskOffHeap: 0,
>   shuffleMemory: 33554432,
>   managedMemory: 322122552,
>   jvmMetaSpace: 134217728,
>   jvmOverhead: 134217728,
>   totalProcessMemory: 1073741824
> }
> {code}
> h4. Add shuffle memory metric
>  * add getTotalMemorySize and getAvaliableMemorySize in NetworkBufferPool
> {code:java}
> public long getTotalMemorySize() {
> return 1L * getTotalNumberOfMemorySegments() * memorySegmentSize;
> }
> public long getAvaliableMemorySize() {
> return 1L * getNumberOfAvailableMemorySegments() * memorySegmentSize;
> }{code}
>  * update NettyShuffleMetricFactory#registerShuffleMetrics
> {code:java}
> private static final String METRIC_TOTAL_MEMORY_SEGMENT_TOTALCAPACITY = 
> "TotalMemoryCapacity";
> private static final String METRIC_TOTAL_MEMORY_SEGMENT_AVALIABLEMEMORY = 
> "AvaliableMemory";
> private static void registerShuffleMetrics(
> String groupName,
> MetricGroup metricGroup,
> NetworkBufferPool networkBufferPool) {
> MetricGroup networkGroup = metricGroup.addGroup(groupName);
> networkGroup.<Integer, Gauge<Integer>>gauge(METRIC_TOTAL_MEMORY_SEGMENT,
> networkBufferPool::getTotalNumberOfMemorySegments);
> networkGroup.<Integer, Gauge<Integer>>gauge(METRIC_AVAILABLE_MEMORY_SEGMENT,
> networkBufferPool::getNumberOfAvailableMemorySegments);
> networkGroup.<Long, Gauge<Long>>gauge(METRIC_TOTAL_MEMORY_SEGMENT_TOTALCAPACITY,
> networkBufferPool::getTotalMemorySize);
> networkGroup.<Long, Gauge<Long>>gauge(METRIC_TOTAL_MEMORY_SEGMENT_AVALIABLEMEMORY,
> networkBufferPool::getAvaliableMemorySize);
> }
> {code}
> h4. Add manage memory metric
>  * add default memory type in MemoryManager
> {code:java}
> public static final MemoryType DEFAULT_MEMORY_TYPE = MemoryType.OFF_HEAP;
> {code}
>  * add getManagedMemoryTotal in TaskExecutor:
> 
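The gauge registrations quoted above follow Flink's MetricGroup#gauge pattern of registering a typed value supplier. The sketch below imitates that shape with simplified stand-in Gauge/MetricGroup types (hypothetical and self-contained, not Flink's actual classes) to show how a method reference on the buffer pool becomes a typed gauge:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Simplified stand-ins for Flink's Gauge/MetricGroup (illustrative only).
interface Gauge<T> {
    T getValue();
}

class MetricGroup {
    private final Map<String, Gauge<?>> metrics = new HashMap<>();

    // Mirrors the shape of a gauge registration: the method reference from
    // the snippet above is adapted into a Gauge and stored under a name.
    <T> Gauge<T> gauge(String name, Supplier<T> supplier) {
        Gauge<T> g = supplier::get;
        metrics.put(name, g);
        return g;
    }
}

class GaugeSketch {
    // Stand-in for NetworkBufferPool with fixed example numbers (assumed).
    static final int SEGMENT_SIZE = 32 * 1024;
    static final int TOTAL_SEGMENTS = 1024;

    static long getTotalMemorySize() {
        return 1L * TOTAL_SEGMENTS * SEGMENT_SIZE;
    }

    public static void main(String[] args) {
        MetricGroup networkGroup = new MetricGroup();
        Gauge<Long> total =
                networkGroup.gauge("TotalMemoryCapacity", GaugeSketch::getTotalMemorySize);
        System.out.println("TotalMemoryCapacity = " + total.getValue());  // 33554432
    }
}
```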

[jira] [Commented] (FLINK-18828) Terminate jobmanager process with zero exit code to avoid unexpected restarting by K8s

2020-08-28 Thread Ufuk Celebi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186532#comment-17186532
 ] 

Ufuk Celebi commented on FLINK-18828:
-

[~fly_in_gis] Thanks for the pointers. The Flink job would only transition to 
FAILED when the Flink-level restart strategy has been exhausted. In your 
example for fixed-delay with 3 attempts, the first three restarts would _not_ 
cause the container to exit; only on the 4th failure would the job 
transition to FAILED and the container exit.

I think the bigger problem with my proposal to set the policy to Never is that 
it would not restart in other failure scenarios (e.g. OOM killed), so I don't 
think it's a viable option.

All in all, I don't see a good way around this problem without your proposed 
change.

---

Maybe as a follow-up we want to resurrect 
https://issues.apache.org/jira/browse/FLINK-10948? That way, users would at 
least be able to determine the final Flink job status.

> Terminate jobmanager process with zero exit code to avoid unexpected 
> restarting by K8s
> --
>
> Key: FLINK-18828
> URL: https://issues.apache.org/jira/browse/FLINK-18828
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.10.1, 1.12.0, 1.11.1
>Reporter: Yang Wang
>Priority: Major
> Fix For: 1.12.0, 1.11.2, 1.10.3
>
>
> Currently, Flink jobmanager process terminates with a non-zero exit code if 
> the job reaches the {{ApplicationStatus.FAILED}}. It is not ideal in K8s 
> deployment, since non-zero exit code will cause unexpected restarting. Also 
> from a framework's perspective, a FAILED job does not mean that Flink has 
> failed and, hence, the return code could still be 0.
> > Note:
> This is a special case for standalone K8s deployment. For 
> standalone/Yarn/Mesos/native K8s, terminating with non-zero exit code is 
> harmless. And a non-zero exit code could help to check the job result quickly.





[GitHub] [flink] flinkbot edited a comment on pull request #13276: [FLINK-18604][connectors/HBase] HBase ConnectorDescriptor can not work in Table API

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13276:
URL: https://github.com/apache/flink/pull/13276#issuecomment-682412809


   
   ## CI report:
   
   * 28efe1242dca9036804402d3c0ed1b56bfaf8d8f Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5965)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13272: [FLINK-18695][network] Netty fakes heap buffer allocationn with direct buffers

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13272:
URL: https://github.com/apache/flink/pull/13272#issuecomment-682346986


   
   ## CI report:
   
   * 3ef9f728f6a8005a7f91846bf32df96ee718c626 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5952)
 
   * d34257206e16c6b9757046cf2c7d71d687ab75eb Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5971)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13273: [FLINK-18801][docs][python] Add a "10 minutes to Table API" document under the "Python API" -> "User Guide" -> "Table API" section.

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13273:
URL: https://github.com/apache/flink/pull/13273#issuecomment-682353427


   
   ## CI report:
   
   * 0134fa06171742eb8ae840b2aced9530232bdb04 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5955)
 
   * f0ae42ddcc931633408535d4df06c35bb56573ca Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5972)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13216: [FLINK-18999][table-planner-blink][hive] Temporary generic table does…

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13216:
URL: https://github.com/apache/flink/pull/13216#issuecomment-678268420


   
   ## CI report:
   
   * 6cef4d0e4c7cc7f835bad1594abf700df632d22b Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5799)
 
   * 31eef5a5d1f413435c3a3ccd9764c185b66c1c08 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5970)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Closed] (FLINK-19050) Doc of MAX_DECIMAL_PRECISION should be DECIMAL

2020-08-28 Thread Benchao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benchao Li closed FLINK-19050.
--
Fix Version/s: 1.12.0
   Resolution: Fixed

> Doc of MAX_DECIMAL_PRECISION should be DECIMAL
> --
>
> Key: FLINK-19050
> URL: https://issues.apache.org/jira/browse/FLINK-19050
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.11.1
>Reporter: Pua
>Assignee: Pua
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> {code:java}
> // Define MAX/MIN precision of TIMESTAMP type according to PostgreSQL docs:
> // 
> https://www.postgresql.org/docs/12/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL
> private static final int MAX_DECIMAL_PRECISION = 1000;
> private static final int MIN_DECIMAL_PRECISION = 1;
> {code}
> [https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/dialect/PostgresDialect.java#L43]
> the doc of the decimal precision constants should say DECIMAL, not TIMESTAMP





[jira] [Commented] (FLINK-15031) Calculate required shuffle memory before allocating slots if resources are specified

2020-08-28 Thread Matthias (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186511#comment-17186511
 ] 

Matthias commented on FLINK-15031:
--

Thanks (y)

> Calculate required shuffle memory before allocating slots if resources are 
> specified
> 
>
> Key: FLINK-15031
> URL: https://issues.apache.org/jira/browse/FLINK-15031
> Project: Flink
>  Issue Type: Task
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: Zhu Zhu
>Assignee: Zhu Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In cases where resources are specified, we expect each operator to declare 
> required resources before using them. In this way, no resource related error 
> should happen if resources are not used beyond what was declared. This 
> ensures a deployed task would not fail due to insufficient resources in TM, 
> which may result in unnecessary failures and may even cause a job hanging 
> forever, failing repeatedly on deploying tasks to a TM with insufficient 
> resources.
> Shuffle memory is the last missing piece for this goal at the moment. Minimum 
> network buffers are required by tasks to work. Currently a task may be 
> deployed to a TM with insufficient network buffers and fail on launching.
> To avoid that, we should calculate required network memory for a 
> task/SlotSharingGroup before allocating a slot for it.
> The required shuffle memory can be derived from the number of required 
> network buffers. The number of buffers required by a task (ExecutionVertex) is
> {code:java}
> exclusive buffers for input channels(i.e. numInputChannel * 
> buffersPerChannel) + required buffers for result partition buffer 
> pool(currently is numberOfSubpartitions + 1)
> {code}
> Note that this is for the {{NettyShuffleService}} case. For custom shuffle 
> services, currently there is no way to get the required shuffle memory of a 
> task.
> To make it simple under dynamic slot sharing, the required shuffle memory for 
> a task should be the max required shuffle memory of all {{ExecutionVertex}} 
> of the same {{ExecutionJobVertex}}. And the required shuffle memory for a 
> slot sharing group should be the sum of shuffle memory for each 
> {{ExecutionJobVertex}} instance within.
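The buffer formula quoted above reduces to simple arithmetic. The following sketch (with made-up example numbers; the 32 KiB segment size is Flink's default, the rest are assumptions) computes the per-task buffer count and the resulting shuffle memory:

```java
class ShuffleMemoryCalculator {
    // Buffers needed by one task under NettyShuffleService, per the formula:
    //   numInputChannels * buffersPerChannel  (exclusive input buffers)
    //   + numberOfSubpartitions + 1           (result partition buffer pool)
    static int requiredBuffers(int numInputChannels, int buffersPerChannel,
                               int numberOfSubpartitions) {
        return numInputChannels * buffersPerChannel + numberOfSubpartitions + 1;
    }

    // Shuffle memory = required buffers * memory segment size.
    static long requiredShuffleMemory(int requiredBuffers, int segmentSize) {
        return 1L * requiredBuffers * segmentSize;
    }

    public static void main(String[] args) {
        // Example: 100 input channels, 2 exclusive buffers each,
        // 50 subpartitions, 32 KiB segments.
        int buffers = requiredBuffers(100, 2, 50);              // 251
        long bytes = requiredShuffleMemory(buffers, 32 * 1024); // 8224768
        System.out.println(buffers + " buffers, " + bytes + " bytes");
    }
}
```

Under dynamic slot sharing, the per-slot figure would then be the sum over the ExecutionJobVertex instances in the sharing group, each taken at its maximum per-vertex value, as described above.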





[jira] [Comment Edited] (FLINK-18712) Flink RocksDB statebackend memory leak issue

2020-08-28 Thread Julius Michaelis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186505#comment-17186505
 ] 

Julius Michaelis edited comment on FLINK-18712 at 8/28/20, 12:35 PM:
-

[~lio_sy], have you tried setting {{state.backend.rocksdb.memory.managed: 
false}}, and checked whether it stops that behavior?

(I think I'm seeing something quite similar, I'll try to build a small 
reproducer next week. I'm running into this on bare metal / outside of 
Kubernetes. I'm also seeing some odd metaspace usage, but I'm unsure it's 
related.)


was (Author: caesar):
@Farnight, have you tried setting {{state.backend.rocksdb.memory.managed: 
false}}, and checked whether it stops that behavior?

(I think I'm seeing something quite similar, I'll try to build a small 
reproducer next week. I'm running into this on bare metal / outside of 
Kubernetes. I'm also seeing some odd metaspace usage, but I'm unsure it's 
related.)

> Flink RocksDB statebackend memory leak issue 
> -
>
> Key: FLINK-18712
> URL: https://issues.apache.org/jira/browse/FLINK-18712
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.10.0
>Reporter: Farnight
>Priority: Critical
>
> When using RocksDB as our statebackend, we found it will lead to a memory 
> leak when restarting the job (manually or in a recovery case).
>  
> How to reproduce:
>  # increase the RocksDB blockcache size (e.g. 1G); it is easier to monitor 
> and reproduce.
>  # start a job using the RocksDB statebackend.
>  # when the RocksDB blockcache reaches its maximum size, restart the job, and 
> monitor the memory usage (k8s pod working set) of the TM.
>  # go through steps 2-3 a few more times, and memory will keep rising.
>  
> Any solution or suggestion for this? Thanks!





[jira] [Commented] (FLINK-19050) Doc of MAX_DECIMAL_PRECISION should be DECIMAL

2020-08-28 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-19050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186512#comment-17186512
 ] 

Benchao Li commented on FLINK-19050:


fixed via b5a459c6d11fc7d655e7bcf26a8cea0594a693dd (1.12.0)

> Doc of MAX_DECIMAL_PRECISION should be DECIMAL
> --
>
> Key: FLINK-19050
> URL: https://issues.apache.org/jira/browse/FLINK-19050
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.11.1
>Reporter: Pua
>Assignee: Pua
>Priority: Trivial
>  Labels: pull-request-available
>
> {code:java}
> // Define MAX/MIN precision of TIMESTAMP type according to PostgreSQL docs:
> // 
> https://www.postgresql.org/docs/12/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL
> private static final int MAX_DECIMAL_PRECISION = 1000;
> private static final int MIN_DECIMAL_PRECISION = 1;
> {code}
> [https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/dialect/PostgresDialect.java#L43]
> the doc of the decimal precision constants should say DECIMAL, not TIMESTAMP





[jira] [Assigned] (FLINK-19050) Doc of MAX_DECIMAL_PRECISION should be DECIMAL

2020-08-28 Thread Benchao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-19050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benchao Li reassigned FLINK-19050:
--

Assignee: Pua

> Doc of MAX_DECIMAL_PRECISION should be DECIMAL
> --
>
> Key: FLINK-19050
> URL: https://issues.apache.org/jira/browse/FLINK-19050
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.11.1
>Reporter: Pua
>Assignee: Pua
>Priority: Trivial
>  Labels: pull-request-available
>
> {code:java}
> // Define MAX/MIN precision of TIMESTAMP type according to PostgreSQL docs:
> // 
> https://www.postgresql.org/docs/12/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL
> private static final int MAX_DECIMAL_PRECISION = 1000;
> private static final int MIN_DECIMAL_PRECISION = 1;
> {code}
> [https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/dialect/PostgresDialect.java#L43]
> the doc of the decimal precision constants should say DECIMAL, not TIMESTAMP





[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue

2020-08-28 Thread Yun Tang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186516#comment-17186516
 ] 

Yun Tang commented on FLINK-18712:
--

Some updates: I have contacted [~lio_sy] offline to confirm the steps to 
reproduce the problem, and I'm still debugging. I'll comment once I have 
any findings.

> Flink RocksDB statebackend memory leak issue 
> -
>
> Key: FLINK-18712
> URL: https://issues.apache.org/jira/browse/FLINK-18712
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.10.0
>Reporter: Farnight
>Priority: Critical
>
> When using RocksDB as our statebackend, we found it leads to a memory leak 
> when restarting a job (manually or in a recovery case).
>  
> How to reproduce:
>  # increase the RocksDB block cache size (e.g. 1G); it is easier to monitor and 
> reproduce.
>  # start a job using the RocksDB statebackend.
>  # when the RocksDB block cache reaches its maximum size, restart the job, and 
> monitor the memory usage (k8s pod working set) of the TM.
>  # repeat steps 2-3 a few more times, and memory usage will keep rising.
>  
> Any solution or suggestion for this? Thanks!





[GitHub] [flink-benchmarks] rkhachatryan commented on a change in pull request #3: [FLINK-18905] Provide basic benchmarks for MultipleInputStreamOperator

2020-08-28 Thread GitBox


rkhachatryan commented on a change in pull request #3:
URL: https://github.com/apache/flink-benchmarks/pull/3#discussion_r479246328



##
File path: src/main/java/org/apache/flink/benchmark/MultipleInputBenchmark.java
##
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.benchmark;
+
+import org.apache.flink.api.common.restartstrategy.RestartStrategies;
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
+import org.apache.flink.benchmark.functions.LongSource;
+import org.apache.flink.benchmark.functions.QueuingLongSource;
+import org.apache.flink.streaming.api.datastream.DataStreamSource;
+import org.apache.flink.streaming.api.datastream.MultipleConnectedStreams;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
+import org.apache.flink.streaming.api.operators.AbstractInput;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperatorFactory;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperatorV2;
+import org.apache.flink.streaming.api.operators.Input;
+import org.apache.flink.streaming.api.operators.MultipleInputStreamOperator;
+import org.apache.flink.streaming.api.operators.StreamOperator;
+import org.apache.flink.streaming.api.operators.StreamOperatorParameters;
+import 
org.apache.flink.streaming.api.transformations.MultipleInputTransformation;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.OperationsPerInvocation;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.RunnerException;
+import org.openjdk.jmh.runner.options.Options;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.openjdk.jmh.runner.options.VerboseMode;
+
+import java.util.Arrays;
+import java.util.List;
+
+public class MultipleInputBenchmark extends BenchmarkBase {
+
+   public static final int RECORDS_PER_INVOCATION = 
TwoInputBenchmark.RECORDS_PER_INVOCATION;
+   public static final int ONE_IDLE_RECORDS_PER_INVOCATION = 
TwoInputBenchmark.ONE_IDLE_RECORDS_PER_INVOCATION;
+   public static final long CHECKPOINT_INTERVAL_MS = 
TwoInputBenchmark.CHECKPOINT_INTERVAL_MS;
+
+   public static void main(String[] args)
+   throws RunnerException {
+   Options options = new OptionsBuilder()
+   .verbosity(VerboseMode.NORMAL)
+   .include(".*" + 
MultipleInputBenchmark.class.getSimpleName() + ".*")
+   .build();
+
+   new Runner(options).run();
+   }
+
+   @Benchmark
+   @OperationsPerInvocation(RECORDS_PER_INVOCATION)
+   public void multiInputMapSink(FlinkEnvironmentContext context) throws 
Exception {
+
+   StreamExecutionEnvironment env = context.env;
+
+   env.enableCheckpointing(CHECKPOINT_INTERVAL_MS);
+   env.setParallelism(1);
+   env.setRestartStrategy(RestartStrategies.noRestart());

Review comment:
   I also disabled restarts in `ContinuousFileReaderOperatorBenchmark`, because 
restarts caused it to run indefinitely.
   I think it would be helpful to disable them globally, but that is probably out of 
scope for this PR.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-benchmarks] rkhachatryan commented on a change in pull request #3: [FLINK-18905] Provide basic benchmarks for MultipleInputStreamOperator

2020-08-28 Thread GitBox


rkhachatryan commented on a change in pull request #3:
URL: https://github.com/apache/flink-benchmarks/pull/3#discussion_r479227832



##
File path: src/main/java/org/apache/flink/benchmark/MultipleInputBenchmark.java
##
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.benchmark;
+
+import org.apache.flink.api.common.restartstrategy.RestartStrategies;
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
+import org.apache.flink.benchmark.functions.LongSource;
+import org.apache.flink.benchmark.functions.QueuingLongSource;
+import org.apache.flink.streaming.api.datastream.DataStreamSource;
+import org.apache.flink.streaming.api.datastream.MultipleConnectedStreams;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
+import org.apache.flink.streaming.api.operators.AbstractInput;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperatorFactory;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperatorV2;
+import org.apache.flink.streaming.api.operators.Input;
+import org.apache.flink.streaming.api.operators.MultipleInputStreamOperator;
+import org.apache.flink.streaming.api.operators.StreamOperator;
+import org.apache.flink.streaming.api.operators.StreamOperatorParameters;
+import 
org.apache.flink.streaming.api.transformations.MultipleInputTransformation;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.OperationsPerInvocation;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.RunnerException;
+import org.openjdk.jmh.runner.options.Options;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.openjdk.jmh.runner.options.VerboseMode;
+
+import java.util.Arrays;
+import java.util.List;
+
+public class MultipleInputBenchmark extends BenchmarkBase {
+
+   public static final int RECORDS_PER_INVOCATION = 
TwoInputBenchmark.RECORDS_PER_INVOCATION;
+   public static final int ONE_IDLE_RECORDS_PER_INVOCATION = 
TwoInputBenchmark.ONE_IDLE_RECORDS_PER_INVOCATION;
+   public static final long CHECKPOINT_INTERVAL_MS = 
TwoInputBenchmark.CHECKPOINT_INTERVAL_MS;
+
+   public static void main(String[] args)
+   throws RunnerException {
+   Options options = new OptionsBuilder()
+   .verbosity(VerboseMode.NORMAL)
+   .include(".*" + 
MultipleInputBenchmark.class.getSimpleName() + ".*")
+   .build();
+
+   new Runner(options).run();
+   }
+
+   @Benchmark
+   @OperationsPerInvocation(RECORDS_PER_INVOCATION)
+   public void multiInputMapSink(FlinkEnvironmentContext context) throws 
Exception {
+
+   StreamExecutionEnvironment env = context.env;
+
+   env.enableCheckpointing(CHECKPOINT_INTERVAL_MS);
+   env.setParallelism(1);
+   env.setRestartStrategy(RestartStrategies.noRestart());
+
+   // Setting buffer timeout to 1 is an attempt to improve 
twoInputMapSink benchmark stability.
+   // Without 1ms buffer timeout, some JVM forks are much slower than others, making results
+   // unstable and unreliable.
+   env.setBufferTimeout(1);
+
+   long numRecordsPerInput = RECORDS_PER_INVOCATION / 2;
+   DataStreamSource<Long> source1 = env.addSource(new LongSource(numRecordsPerInput));
+   DataStreamSource<Long> source2 = env.addSource(new LongSource(numRecordsPerInput));
+   connectAndDiscard(env, source1, source2);
+
+   env.execute();
+   }
+
+   @Benchmark
+   @OperationsPerInvocation(ONE_IDLE_RECORDS_PER_INVOCATION)
+   public void multiInputOneIdleMapSink(FlinkEnvironmentContext context) 
throws Exception {
+
+   StreamExecutionEnvironment env = context.env;
+   env.enableCheckpointing(CHECKPOINT_INTERVAL_MS);
+   env.setParallelism(1);
+
+   QueuingLongSource.reset();
+   DataStreamSource source1 = 

[GitHub] [flink-benchmarks] rkhachatryan commented on a change in pull request #3: [FLINK-18905] Provide basic benchmarks for MultipleInputStreamOperator

2020-08-28 Thread GitBox


rkhachatryan commented on a change in pull request #3:
URL: https://github.com/apache/flink-benchmarks/pull/3#discussion_r479216456



##
File path: src/main/java/org/apache/flink/benchmark/MultipleInputBenchmark.java
##
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.benchmark;
+
+import org.apache.flink.api.common.restartstrategy.RestartStrategies;
+import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
+import org.apache.flink.benchmark.functions.LongSource;
+import org.apache.flink.benchmark.functions.QueuingLongSource;
+import org.apache.flink.streaming.api.datastream.DataStreamSource;
+import org.apache.flink.streaming.api.datastream.MultipleConnectedStreams;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.streaming.api.functions.sink.DiscardingSink;
+import org.apache.flink.streaming.api.operators.AbstractInput;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperatorFactory;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperatorV2;
+import org.apache.flink.streaming.api.operators.Input;
+import org.apache.flink.streaming.api.operators.MultipleInputStreamOperator;
+import org.apache.flink.streaming.api.operators.StreamOperator;
+import org.apache.flink.streaming.api.operators.StreamOperatorParameters;
+import 
org.apache.flink.streaming.api.transformations.MultipleInputTransformation;
+import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.OperationsPerInvocation;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.RunnerException;
+import org.openjdk.jmh.runner.options.Options;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.openjdk.jmh.runner.options.VerboseMode;
+
+import java.util.Arrays;
+import java.util.List;
+
+public class MultipleInputBenchmark extends BenchmarkBase {
+
+   public static final int RECORDS_PER_INVOCATION = 
TwoInputBenchmark.RECORDS_PER_INVOCATION;
+   public static final int ONE_IDLE_RECORDS_PER_INVOCATION = 
TwoInputBenchmark.ONE_IDLE_RECORDS_PER_INVOCATION;
+   public static final long CHECKPOINT_INTERVAL_MS = 
TwoInputBenchmark.CHECKPOINT_INTERVAL_MS;
+
+   public static void main(String[] args)
+   throws RunnerException {
+   Options options = new OptionsBuilder()
+   .verbosity(VerboseMode.NORMAL)
+   .include(".*" + 
MultipleInputBenchmark.class.getSimpleName() + ".*")
+   .build();
+
+   new Runner(options).run();
+   }
+
+   @Benchmark
+   @OperationsPerInvocation(RECORDS_PER_INVOCATION)
+   public void multiInputMapSink(FlinkEnvironmentContext context) throws 
Exception {
+
+   StreamExecutionEnvironment env = context.env;
+
+   env.enableCheckpointing(CHECKPOINT_INTERVAL_MS);
+   env.setParallelism(1);
+   env.setRestartStrategy(RestartStrategies.noRestart());
+
+   // Setting buffer timeout to 1 is an attempt to improve 
twoInputMapSink benchmark stability.
+   // Without 1ms buffer timeout, some JVM forks are much slower than others, making results
+   // unstable and unreliable.
+   env.setBufferTimeout(1);
+
+   long numRecordsPerInput = RECORDS_PER_INVOCATION / 2;
+   DataStreamSource<Long> source1 = env.addSource(new LongSource(numRecordsPerInput));
+   DataStreamSource<Long> source2 = env.addSource(new LongSource(numRecordsPerInput));
+   connectAndDiscard(env, source1, source2);
+
+   env.execute();
+   }
+
+   @Benchmark
+   @OperationsPerInvocation(ONE_IDLE_RECORDS_PER_INVOCATION)
+   public void multiInputOneIdleMapSink(FlinkEnvironmentContext context) 
throws Exception {
+
+   StreamExecutionEnvironment env = context.env;
+   env.enableCheckpointing(CHECKPOINT_INTERVAL_MS);
+   env.setParallelism(1);
+
+   QueuingLongSource.reset();
+   DataStreamSource source1 = 

[GitHub] [flink] flinkbot edited a comment on pull request #13273: [FLINK-18801][docs][python] Add a "10 minutes to Table API" document under the "Python API" -> "User Guide" -> "Table API" section.

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13273:
URL: https://github.com/apache/flink/pull/13273#issuecomment-682353427


   
   ## CI report:
   
   * 0134fa06171742eb8ae840b2aced9530232bdb04 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5955)
 
   * f0ae42ddcc931633408535d4df06c35bb56573ca UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13251: [FLINK-14435] Added memory configuration to TaskManagers REST endpoint

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13251:
URL: https://github.com/apache/flink/pull/13251#issuecomment-680904640


   
   ## CI report:
   
   * 78fbaad055ce6f0c9bb59e37347793a08655bb69 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5900)
 
   * 63794c99aa8dc635bc15ce3e7013241fa291f658 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5969)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13272: [FLINK-18695][network] Netty fakes heap buffer allocationn with direct buffers

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13272:
URL: https://github.com/apache/flink/pull/13272#issuecomment-682346986


   
   ## CI report:
   
   * 3ef9f728f6a8005a7f91846bf32df96ee718c626 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5952)
 
   * d34257206e16c6b9757046cf2c7d71d687ab75eb UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13216: [FLINK-18999][table-planner-blink][hive] Temporary generic table does…

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13216:
URL: https://github.com/apache/flink/pull/13216#issuecomment-678268420


   
   ## CI report:
   
   * 6cef4d0e4c7cc7f835bad1594abf700df632d22b Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5799)
 
   * 31eef5a5d1f413435c3a3ccd9764c185b66c1c08 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] gaoyunhaii commented on pull request #13272: [FLINK-18695][network] Netty fakes heap buffer allocationn with direct buffers

2020-08-28 Thread GitBox


gaoyunhaii commented on pull request #13272:
URL: https://github.com/apache/flink/pull/13272#issuecomment-682497831


   Very sorry for not checking the tests in the first place :( I have fixed the 
tests. 







[GitHub] [flink] libenchao closed pull request #13240: [FLINK-19050][Documentation]Doc of MAX_DECIMAL_PRECISION should be DECIMAL

2020-08-28 Thread GitBox


libenchao closed pull request #13240:
URL: https://github.com/apache/flink/pull/13240


   







[GitHub] [flink] libenchao commented on pull request #13240: [FLINK-19050][Documentation]Doc of MAX_DECIMAL_PRECISION should be DECIMAL

2020-08-28 Thread GitBox


libenchao commented on pull request #13240:
URL: https://github.com/apache/flink/pull/13240#issuecomment-682494923


   @zhuxiaoshang Thanks for your contribution, the change LGTM, will merge.
   
   Tips:
   - For such minor changes, we can raise a hotfix PR without a Jira issue.
   - We suggest reaching a consensus in the Jira issue first, before opening 
the PR.







[jira] [Closed] (FLINK-14236) Make LazyFromSourcesSchedulingStrategy do lazy scheduling based on partition state only

2020-08-28 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-14236.
---
Resolution: Won't Do

Not needed, because lazy scheduling will be replaced by pipelined region 
scheduling (FLINK-16430).

> Make LazyFromSourcesSchedulingStrategy do lazy scheduling based on partition 
> state only
> ---
>
> Key: FLINK-14236
> URL: https://issues.apache.org/jira/browse/FLINK-14236
> Project: Flink
>  Issue Type: Sub-task
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: Zhu Zhu
>Priority: Major
>
> The philosophy of lazy_from_sources scheduling is to schedule a task when its 
> inputs are ready, i.e. when needed result partitions are consumable. It 
> actually does not directly care whether a task is finished, though a finished 
> task will lead its partitions to be consumable.
> The {{LazyFromSourcesSchedulingStrategy}} currently does lazy scheduling in 
> two ways:
> 1. trigger scheduling on FINISHED execution state for blocking partition 
> consumers
> 2. trigger scheduling on partition consumable events for pipelined partition 
> consumers
> This makes the scheduling decision a bit messy. And it also requires the 
> {{LazyFromSourcesSchedulingStrategy}} to manage the partition state by 
> itself, to decide when a partition is consumable, which is not good.
> I'd propose to make {{LazyFromSourcesSchedulingStrategy}} do lazy scheduling 
> only based on partition state. This can make the scheduling decision making 
> clearer and relieve {{LazyFromSourcesSchedulingStrategy}} from partition 
> state management.
> This work requires FLINK-14234 so that all partition consumable events can be 
> notified.





[jira] [Commented] (FLINK-18712) Flink RocksDB statebackend memory leak issue

2020-08-28 Thread Julius Michaelis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186505#comment-17186505
 ] 

Julius Michaelis commented on FLINK-18712:
--

@Farnight, have you tried setting {{state.backend.rocksdb.memory.managed: 
false}}, and checked whether it stops that behavior?

(I think I'm seeing something quite similar, I'll try to build a small 
reproducer next week.)
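For reference, the setting mentioned above is applied in flink-conf.yaml. A minimal sketch (the `state.backend` line is an assumption for context; only `state.backend.rocksdb.memory.managed` is the option actually under discussion):

```yaml
# Sketch: disable Flink-managed RocksDB memory so the block cache is sized by
# RocksDB's own options, as suggested above for narrowing down the leak.
state.backend: rocksdb
state.backend.rocksdb.memory.managed: false
```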

> Flink RocksDB statebackend memory leak issue 
> -
>
> Key: FLINK-18712
> URL: https://issues.apache.org/jira/browse/FLINK-18712
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.10.0
>Reporter: Farnight
>Priority: Critical
>
> When using RocksDB as our statebackend, we found it leads to a memory leak 
> when restarting a job (manually or in a recovery case).
>  
> How to reproduce:
>  # increase the RocksDB block cache size (e.g. 1G); it is easier to monitor and 
> reproduce.
>  # start a job using the RocksDB statebackend.
>  # when the RocksDB block cache reaches its maximum size, restart the job, and 
> monitor the memory usage (k8s pod working set) of the TM.
>  # repeat steps 2-3 a few more times, and memory usage will keep rising.
>  
> Any solution or suggestion for this? Thanks!





[GitHub] [flink] WeiZhong94 commented on pull request #13273: [FLINK-18801][docs][python] Add a "10 minutes to Table API" document under the "Python API" -> "User Guide" -> "Table API" section.

2020-08-28 Thread GitBox


WeiZhong94 commented on pull request #13273:
URL: https://github.com/apache/flink/pull/13273#issuecomment-682493403


   @alpinegizmo @dianfu Thanks for your comments! I have updated these docs, 
please take a look. :)







[jira] [Comment Edited] (FLINK-18712) Flink RocksDB statebackend memory leak issue

2020-08-28 Thread Julius Michaelis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186505#comment-17186505
 ] 

Julius Michaelis edited comment on FLINK-18712 at 8/28/20, 12:15 PM:
-

@Farnight, have you tried setting {{state.backend.rocksdb.memory.managed: 
false}}, and checked whether it stops that behavior?

(I think I'm seeing something quite similar, I'll try to build a small 
reproducer next week. I'm running into this on bare metal / outside of 
Kubernetes. I'm also seeing some odd metaspace usage, but I'm unsure it's 
related.)


was (Author: caesar):
@Farnight, have you tried setting {{state.backend.rocksdb.memory.managed: 
false}}, and checked whether it stops that behavior?

(I think I'm seeing something quite similar, I'll try to build a small 
reproducer next week. I'm running into this on bare metal / outside of 
Kubernetes.)

> Flink RocksDB statebackend memory leak issue 
> -
>
> Key: FLINK-18712
> URL: https://issues.apache.org/jira/browse/FLINK-18712
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.10.0
>Reporter: Farnight
>Priority: Critical
>
> When using RocksDB as our statebackend, we found it leads to a memory leak 
> when restarting a job (manually or in a recovery case).
>  
> How to reproduce:
>  # increase the RocksDB block cache size (e.g. 1G); it is easier to monitor and 
> reproduce.
>  # start a job using the RocksDB statebackend.
>  # when the RocksDB block cache reaches its maximum size, restart the job, and 
> monitor the memory usage (k8s pod working set) of the TM.
>  # repeat steps 2-3 a few more times, and memory usage will keep rising.
>  
> Any solution or suggestion for this? Thanks!





[jira] [Comment Edited] (FLINK-18712) Flink RocksDB statebackend memory leak issue

2020-08-28 Thread Julius Michaelis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186505#comment-17186505
 ] 

Julius Michaelis edited comment on FLINK-18712 at 8/28/20, 12:14 PM:
-

@Farnight, have you tried setting {{state.backend.rocksdb.memory.managed: 
false}}, and checked whether it stops that behavior?

(I think I'm seeing something quite similar, I'll try to build a small 
reproducer next week. I'm running into this on bare metal / outside of 
Kubernetes.)


was (Author: caesar):
@Farnight, have you tried setting {{state.backend.rocksdb.memory.managed: 
false}}, and checked whether it stops that behavior?

(I think I'm seeing something quite similar, I'll try to build a small 
reproducer next week.)

> Flink RocksDB statebackend memory leak issue 
> -
>
> Key: FLINK-18712
> URL: https://issues.apache.org/jira/browse/FLINK-18712
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.10.0
>Reporter: Farnight
>Priority: Critical
>
> When using RocksDB as our statebackend, we found it leads to a memory leak 
> when restarting a job (manually or in a recovery case).
>  
> How to reproduce:
>  # increase the RocksDB block cache size (e.g. 1G); it is easier to monitor and 
> reproduce.
>  # start a job using the RocksDB statebackend.
>  # when the RocksDB block cache reaches its maximum size, restart the job, and 
> monitor the memory usage (k8s pod working set) of the TM.
>  # repeat steps 2-3 a few more times, and memory usage will keep rising.
>  
> Any solution or suggestion for this? Thanks!





[GitHub] [flink] flinkbot edited a comment on pull request #13275: [FLINK-19064][hbase] HBaseRowDataInputFormat is leaking resources

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13275:
URL: https://github.com/apache/flink/pull/13275#issuecomment-682391096


   
   ## CI report:
   
   * 8534c89725bb874f6c6126bbe6db4e9ccf64a820 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5961)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13271: [FLINK-19043][docs-zh] Translate the 'Logging' page of 'Debugging & Monitoring' into Chinese

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13271:
URL: https://github.com/apache/flink/pull/13271#issuecomment-682323547


   
   ## CI report:
   
   * 077964070424e9463205a8a789cea37142d709bf Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5967)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13251: [FLINK-14435] Added memory configuration to TaskManagers REST endpoint

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13251:
URL: https://github.com/apache/flink/pull/13251#issuecomment-680904640


   
   ## CI report:
   
   * 78fbaad055ce6f0c9bb59e37347793a08655bb69 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5900)
 
   * 63794c99aa8dc635bc15ce3e7013241fa291f658 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[GitHub] [flink] flinkbot edited a comment on pull request #13109: [FLINK-18808][runtime/metrics] Include side outputs in numRecordsOut metric.

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13109:
URL: https://github.com/apache/flink/pull/13109#issuecomment-671381707


   
   ## CI report:
   
   * 21aeb58971fb6408da21af37a4a1dfee35917484 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5960)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-10429) Redesign Flink Scheduling, introducing dedicated Scheduler component

2020-08-28 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-10429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186488#comment-17186488
 ] 

Zhu Zhu commented on FLINK-10429:
-

I think we can close it after we have finished FLINK-15626 to clean up the 
legacy scheduling.
FLINK-16430 just makes use of this feature and is not a blocker for closing it.

> Redesign Flink Scheduling, introducing dedicated Scheduler component
> 
>
> Key: FLINK-10429
> URL: https://issues.apache.org/jira/browse/FLINK-10429
> Project: Flink
>  Issue Type: New Feature
>  Components: Runtime / Coordination
>Affects Versions: 1.7.0
>Reporter: Stefan Richter
>Priority: Major
>
> This epic tracks the redesign of scheduling in Flink. Scheduling is currently 
> a concern that is scattered across different components, mainly the 
> ExecutionGraph/Execution and the SlotPool. Scheduling also happens only at 
> the granularity of individual tasks, which makes holistic scheduling 
> strategies hard to implement. In this epic we aim to introduce a dedicated 
> Scheduler component that can support use cases like auto-scaling, 
> local recovery, and resource-optimized batch.
> The design for this feature is developed here: 
> https://docs.google.com/document/d/1q7NOqt05HIN-PlKEEPB36JiuU1Iu9fnxxVGJzylhsxU/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] XComp commented on a change in pull request #13251: [FLINK-14435] Added memory configuration to TaskManagers REST endpoint

2020-08-28 Thread GitBox


XComp commented on a change in pull request #13251:
URL: https://github.com/apache/flink/pull/13251#discussion_r479199422



##
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutorMemoryConfiguration.java
##
@@ -0,0 +1,333 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.taskexecutor;
+
+import org.apache.flink.configuration.ConfigOption;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.configuration.MemorySize;
+
+import org.apache.flink.shaded.guava18.com.google.common.base.MoreObjects;
+import 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.annotation.JsonCreator;
+import 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.annotation.JsonInclude;
+import 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.annotation.JsonProperty;
+
+import java.io.Serializable;
+import java.util.Objects;
+
+import static 
org.apache.flink.configuration.TaskManagerOptions.FRAMEWORK_HEAP_MEMORY;
+import static 
org.apache.flink.configuration.TaskManagerOptions.FRAMEWORK_OFF_HEAP_MEMORY;
+import static org.apache.flink.configuration.TaskManagerOptions.JVM_METASPACE;
+import static 
org.apache.flink.configuration.TaskManagerOptions.JVM_OVERHEAD_FRACTION;
+import static 
org.apache.flink.configuration.TaskManagerOptions.JVM_OVERHEAD_MAX;
+import static 
org.apache.flink.configuration.TaskManagerOptions.JVM_OVERHEAD_MIN;
+import static 
org.apache.flink.configuration.TaskManagerOptions.MANAGED_MEMORY_FRACTION;
+import static 
org.apache.flink.configuration.TaskManagerOptions.MANAGED_MEMORY_SIZE;
+import static 
org.apache.flink.configuration.TaskManagerOptions.NETWORK_MEMORY_FRACTION;
+import static 
org.apache.flink.configuration.TaskManagerOptions.NETWORK_MEMORY_MAX;
+import static 
org.apache.flink.configuration.TaskManagerOptions.NETWORK_MEMORY_MIN;
+import static 
org.apache.flink.configuration.TaskManagerOptions.TASK_HEAP_MEMORY;
+import static 
org.apache.flink.configuration.TaskManagerOptions.TASK_OFF_HEAP_MEMORY;
+import static 
org.apache.flink.configuration.TaskManagerOptions.TOTAL_PROCESS_MEMORY;
+
+/**
+ * TaskExecutorConfiguration collects the configuration of a TaskExecutor 
instance.
+ */
+public class TaskExecutorMemoryConfiguration implements Serializable {
+
+   public static final String FIELD_NAME_FRAMEWORK_HEAP = "frameworkHeap";
+   public static final String FIELD_NAME_TASK_HEAP = "taskHeap";
+
+   public static final String FIELD_NAME_FRAMEWORK_OFFHEAP = 
"frameworkOffHeap";
+   public static final String FIELD_NAME_TASK_OFFHEAP = "taskOffHeap";
+
+   public static final String FIELD_NAME_NETWORK_MIN = "networkMin";
+   public static final String FIELD_NAME_NETWORK_MAX = "networkMax";
+   public static final String FIELD_NAME_NETWORK_FRACTION = 
"networkFraction";
+
+   public static final String FIELD_NAME_MANAGED_TOTAL = "managedTotal";
+   public static final String FIELD_NAME_MANAGED_FRACTION = 
"managedFraction";
+
+   public static final String FIELD_NAME_METASPACE_MAX = "metaspaceMax";
+
+   public static final String FIELD_NAME_OVERHEAD_MIN = "overheadMin";
+   public static final String FIELD_NAME_OVERHEAD_MAX = "overheadMax";
+   public static final String FIELD_NAME_OVERHEAD_FRACTION = 
"overheadFraction";
+
+   public static final String FIELD_NAME_MEMORY_TOTAL = "memoryTotal";
+
+   @JsonProperty(FIELD_NAME_FRAMEWORK_HEAP)
+   @JsonInclude
+   private final Long frameworkHeap;
+
+   @JsonProperty(FIELD_NAME_TASK_HEAP)
+   private final Long taskHeap;
+
+   @JsonProperty(FIELD_NAME_FRAMEWORK_OFFHEAP)
+   private final Long frameworkOffHeap;
+
+   @JsonProperty(FIELD_NAME_TASK_OFFHEAP)
+   private final Long taskOffHeap;
+
+   @JsonProperty(FIELD_NAME_NETWORK_MIN)
+   private final Long networkMemoryMin;
+   @JsonProperty(FIELD_NAME_NETWORK_MAX)
+   private final Long networkMemoryMax;
+   @JsonProperty(FIELD_NAME_NETWORK_FRACTION)
+   private final Float networkMemoryFraction;
+
+   @JsonProperty(FIELD_NAME_MANAGED_TOTAL)
+   private final Long managedMemoryTotal;
+ 

[GitHub] [flink] flinkbot edited a comment on pull request #13244: [FLINK-18333][jdbc] UnsignedTypeConversionITCase failed caused by MariaDB4j "Asked to waitFor Program"

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13244:
URL: https://github.com/apache/flink/pull/13244#issuecomment-680535784


   
   ## CI report:
   
   * 9c6afc2e13a7c112c0b65d0fb64b13224ec55efd Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5959)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Closed] (FLINK-15031) Calculate required shuffle memory before allocating slots if resources are specified

2020-08-28 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-15031.
---
Resolution: Later

> Calculate required shuffle memory before allocating slots if resources are 
> specified
> 
>
> Key: FLINK-15031
> URL: https://issues.apache.org/jira/browse/FLINK-15031
> Project: Flink
>  Issue Type: Task
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: Zhu Zhu
>Assignee: Zhu Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In cases where resources are specified, we expect each operator to declare 
> required resources before using them. In this way, no resource related error 
> should happen if resources are not used beyond what was declared. This 
> ensures a deployed task would not fail due to insufficient resources in TM, 
> which may result in unnecessary failures and may even cause a job hanging 
> forever, failing repeatedly on deploying tasks to a TM with insufficient 
> resources.
> Shuffle memory is the last missing piece for this goal at the moment. Minimum 
> network buffers are required by tasks to work. Currently a task can be 
> deployed to a TM with insufficient network buffers and then fail on 
> launching.
> To avoid that, we should calculate required network memory for a 
> task/SlotSharingGroup before allocating a slot for it.
> The required shuffle memory can be derived from the number of required 
> network buffers. The number of buffers required by a task (ExecutionVertex) is
> {code:java}
> exclusive buffers for input channels(i.e. numInputChannel * 
> buffersPerChannel) + required buffers for result partition buffer 
> pool(currently is numberOfSubpartitions + 1)
> {code}
> Note that this is for the {{NettyShuffleService}} case. For custom shuffle 
> services, currently there is no way to get the required shuffle memory of a 
> task.
> To make it simple under dynamic slot sharing, the required shuffle memory for 
> a task should be the max required shuffle memory of all {{ExecutionVertex}} 
> of the same {{ExecutionJobVertex}}. And the required shuffle memory for a 
> slot sharing group should be the sum of shuffle memory for each 
> {{ExecutionJobVertex}} instance within.



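The buffer formula quoted above can be sketched numerically. This is a rough illustration, not Flink code: the helper names are made up, and `buffers_per_channel=2` and the 32 KiB segment size are assumptions (they match Flink's documented defaults for `taskmanager.network.memory.buffers-per-channel` and `taskmanager.memory.segment-size`).

```python
# Rough numeric sketch of the required-buffer formula for the
# NettyShuffleService case (illustrative names, not Flink's API).

def required_network_buffers(num_input_channels, num_subpartitions,
                             buffers_per_channel=2):
    # exclusive buffers for input channels
    exclusive_input = num_input_channels * buffers_per_channel
    # result partition buffer pool currently needs numberOfSubpartitions + 1
    result_pool = num_subpartitions + 1
    return exclusive_input + result_pool

def required_shuffle_memory_bytes(num_buffers, segment_size=32 * 1024):
    # each network buffer occupies one memory segment (assumed 32 KiB)
    return num_buffers * segment_size

buffers = required_network_buffers(num_input_channels=4, num_subpartitions=8)
memory = required_shuffle_memory_bytes(buffers)
# 4 * 2 + (8 + 1) = 17 buffers -> 17 * 32768 = 557056 bytes
```

Under dynamic slot sharing, this per-vertex figure would then be maxed over all ExecutionVertices of the same ExecutionJobVertex and summed across the slot sharing group, as the description states.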


[GitHub] [flink] flinkbot edited a comment on pull request #13209: [FLINK-18832][datastream] Add compatible check for blocking partition with buffer timeout

2020-08-28 Thread GitBox


flinkbot edited a comment on pull request #13209:
URL: https://github.com/apache/flink/pull/13209#issuecomment-677744672


   
   ## CI report:
   
   * 454310e0df591eac9dd81940f3cf690185155c70 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5884)
 
   * 6b299eb84623c97e80fcd15d64702fb993947186 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5966)
 
   * a343c2c3bf36c97dca7045c65eccbcccfbbef5bf UNKNOWN
   * 3de3f59281d22f21dfbb30cf7d93b1ecd3165a9e Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5968)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   







[jira] [Commented] (FLINK-15031) Calculate required shuffle memory before allocating slots if resources are specified

2020-08-28 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186485#comment-17186485
 ] 

Zhu Zhu commented on FLINK-15031:
-

It is needed only after we have enabled users to set non-UNKNOWN resources.
I will close it for now. We can reopen it if it is needed in the future.

> Calculate required shuffle memory before allocating slots if resources are 
> specified
> 
>
> Key: FLINK-15031
> URL: https://issues.apache.org/jira/browse/FLINK-15031
> Project: Flink
>  Issue Type: Task
>  Components: Runtime / Coordination
>Affects Versions: 1.10.0
>Reporter: Zhu Zhu
>Assignee: Zhu Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In cases where resources are specified, we expect each operator to declare 
> required resources before using them. In this way, no resource related error 
> should happen if resources are not used beyond what was declared. This 
> ensures a deployed task would not fail due to insufficient resources in TM, 
> which may result in unnecessary failures and may even cause a job hanging 
> forever, failing repeatedly on deploying tasks to a TM with insufficient 
> resources.
> Shuffle memory is the last missing piece for this goal at the moment. Minimum 
> network buffers are required by tasks to work. Currently a task can be 
> deployed to a TM with insufficient network buffers and then fail on 
> launching.
> To avoid that, we should calculate required network memory for a 
> task/SlotSharingGroup before allocating a slot for it.
> The required shuffle memory can be derived from the number of required 
> network buffers. The number of buffers required by a task (ExecutionVertex) is
> {code:java}
> exclusive buffers for input channels(i.e. numInputChannel * 
> buffersPerChannel) + required buffers for result partition buffer 
> pool(currently is numberOfSubpartitions + 1)
> {code}
> Note that this is for the {{NettyShuffleService}} case. For custom shuffle 
> services, currently there is no way to get the required shuffle memory of a 
> task.
> To make it simple under dynamic slot sharing, the required shuffle memory for 
> a task should be the max required shuffle memory of all {{ExecutionVertex}} 
> of the same {{ExecutionJobVertex}}. And the required shuffle memory for a 
> slot sharing group should be the sum of shuffle memory for each 
> {{ExecutionJobVertex}} instance within.





[jira] [Closed] (FLINK-14607) SharedSlot cannot fulfill pending slot requests before it's completely released

2020-08-28 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-14607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu closed FLINK-14607.
---
Resolution: Won't Fix

> SharedSlot cannot fulfill pending slot requests before it's completely 
> released
> ---
>
> Key: FLINK-14607
> URL: https://issues.apache.org/jira/browse/FLINK-14607
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.9.1, 1.10.0
>Reporter: Zhu Zhu
>Priority: Minor
>
> Currently a pending request can only be fulfilled when a physical 
> slot({{AllocatedSlot}}) becomes available in {{SlotPool}}.
> A shared slot however, cannot be used to fulfill pending requests even if it 
> becomes qualified. This may lead to resource deadlocks in certain cases.
> For example, running job A(parallelism=2) --(pipelined)--> B(parallelism=2) 
> with 1 slot only, all vertices are in the same slot sharing group, here's 
> what may happen:
> 1. Schedule A1 and A2. A1 acquires the only slot, A2's slot request is 
> pending because a slot cannot host 2 instances of the same JobVertex at the 
> same time. Shared slot status: \{A1\}
> 2. A1 produces data and triggers the scheduling of B1. Shared slot status: 
> \{A1, B1\}
> 3. A1 finishes. Shared slot status: \{B1\}
> 4. B1 cannot finish since A2 has not finished, while A2 cannot get launched 
> because no physical slot becomes available, even though the shared slot is 
> qualified to host it now. A resource deadlock happens.
> Maybe we should improve {{SlotSharingManager}}. Once a task slot is released, 
> its root {{MultiTaskSlot}} should be used to try fulfilling existing pending 
> task slots from other pending root slots ({{unresolvedRootSlots}}) in this 
> {{SlotSharingManager}} (i.e. in the same slot sharing group).
> We need to be careful to not cause any failures, and do not violate 
> colocation constraints.
> cc [~trohrmann]



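The four-step scenario above can be replayed with a toy model. This is plain illustrative Python, not Flink's SlotPool or SlotSharingManager (all names are made up): it only shows that if pending requests are retried solely when the physical slot empties completely, A2 stays pending once A1 finishes, even though the shared slot is now qualified to host it.

```python
# Toy model of the FLINK-14607 scenario (not Flink code).

class SharedSlot:
    def __init__(self):
        self.occupants = set()   # task instances currently in the slot
        self.pending = []        # requests waiting for a slot

    def request(self, task):
        vertex = task.split("_")[0]   # "A_1" -> JobVertex "A"
        # a slot cannot host two instances of the same JobVertex
        if any(t.split("_")[0] == vertex for t in self.occupants):
            self.pending.append(task)
        else:
            self.occupants.add(task)

    def finish(self, task):
        self.occupants.discard(task)
        # Modeled behaviour: pending requests are retried only when the
        # physical slot is completely empty, not when the shared slot
        # merely becomes qualified -- the deadlock described above.
        if not self.occupants:
            pending, self.pending = self.pending, []
            for t in pending:
                self.request(t)

slot = SharedSlot()
slot.request("A_1")   # step 1: A1 takes the only slot
slot.request("A_2")   # step 1: A2 pending (same JobVertex as A1)
slot.request("B_1")   # step 2: B1 chains into the shared slot
slot.finish("A_1")    # step 3: slot is {B_1}; A_2 stays pending (step 4)
```

After `finish("A_1")` the slot holds only B1, yet A2 is never retried: B1 waits on A2's data while A2 waits for an empty physical slot.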


[jira] [Commented] (FLINK-14607) SharedSlot cannot fulfill pending slot requests before it's completely released

2020-08-28 Thread Zhu Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17186484#comment-17186484
 ] 

Zhu Zhu commented on FLINK-14607:
-

It would not be a problem. I will close this ticket.

> SharedSlot cannot fulfill pending slot requests before it's completely 
> released
> ---
>
> Key: FLINK-14607
> URL: https://issues.apache.org/jira/browse/FLINK-14607
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.9.1, 1.10.0
>Reporter: Zhu Zhu
>Priority: Minor
>
> Currently a pending request can only be fulfilled when a physical 
> slot({{AllocatedSlot}}) becomes available in {{SlotPool}}.
> A shared slot however, cannot be used to fulfill pending requests even if it 
> becomes qualified. This may lead to resource deadlocks in certain cases.
> For example, running job A(parallelism=2) --(pipelined)--> B(parallelism=2) 
> with 1 slot only, all vertices are in the same slot sharing group, here's 
> what may happen:
> 1. Schedule A1 and A2. A1 acquires the only slot, A2's slot request is 
> pending because a slot cannot host 2 instances of the same JobVertex at the 
> same time. Shared slot status: \{A1\}
> 2. A1 produces data and triggers the scheduling of B1. Shared slot status: 
> \{A1, B1\}
> 3. A1 finishes. Shared slot status: \{B1\}
> 4. B1 cannot finish since A2 has not finished, while A2 cannot get launched 
> because no physical slot becomes available, even though the shared slot is 
> qualified to host it now. A resource deadlock happens.
> Maybe we should improve {{SlotSharingManager}}. Once a task slot is released, 
> its root {{MultiTaskSlot}} should be used to try fulfilling existing pending 
> task slots from other pending root slots ({{unresolvedRootSlots}}) in this 
> {{SlotSharingManager}} (i.e. in the same slot sharing group).
> We need to be careful to not cause any failures, and do not violate 
> colocation constraints.
> cc [~trohrmann]





[jira] [Created] (FLINK-19087) ResultPartitionWriter should not expose subpartition but only subpartition-readers

2020-08-28 Thread Stephan Ewen (Jira)
Stephan Ewen created FLINK-19087:


 Summary: ResultPartitionWriter should not expose subpartition but 
only subpartition-readers
 Key: FLINK-19087
 URL: https://issues.apache.org/jira/browse/FLINK-19087
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Network
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.12.0


The {{ResultPartitionWriter}} currently gives arbitrary access to the 
sub-partitions.

These subpartitions may not always exist directly, such as in a sort-based 
shuffle.
Only access to a reader over a sub-partition's data (the 
ResultSubpartitionView) is necessary.

In the spirit of minimal scope of knowledge, the methods should be scoped to 
return readers, not the more general subpartitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] rkhachatryan commented on a change in pull request #13234: [FLINK-18905][task] Allow SourceOperator chaining with MultipleInputStreamTask

2020-08-28 Thread GitBox


rkhachatryan commented on a change in pull request #13234:
URL: https://github.com/apache/flink/pull/13234#discussion_r479150979



##
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/OperatorChain.java
##
@@ -208,19 +212,97 @@ public OperatorChain(
OperatorChain(
List> allOperatorWrappers,
RecordWriterOutput[] streamOutputs,
-   WatermarkGaugeExposingOutput> 
chainEntryPoint,
-   StreamOperatorWrapper headOperatorWrapper) {
+   WatermarkGaugeExposingOutput> 
mainOperatorOutput,
+   StreamOperatorWrapper mainOperatorWrapper) {
 
this.streamOutputs = checkNotNull(streamOutputs);
-   this.chainEntryPoint = checkNotNull(chainEntryPoint);
+   this.mainOperatorOutput = checkNotNull(mainOperatorOutput);
this.operatorEventDispatcher = null;
 
checkState(allOperatorWrappers != null && 
allOperatorWrappers.size() > 0);
-   this.headOperatorWrapper = checkNotNull(headOperatorWrapper);
+   this.mainOperatorWrapper = checkNotNull(mainOperatorWrapper);
this.tailOperatorWrapper = allOperatorWrappers.get(0);
this.numOperators = allOperatorWrappers.size();
+   this.chainedSources = Collections.emptyMap();
+
+   firstOperatorWrapper = 
linkOperatorWrappers(allOperatorWrappers);
+   }
+
+   private void createChainOutputs(
+   List outEdgesInOrder,
+   
RecordWriterDelegate>> 
recordWriterDelegate,
+   Map chainedConfigs,
+   StreamTask containingTask,
+   Map> streamOutputMap) 
{
+   for (int i = 0; i < outEdgesInOrder.size(); i++) {
+   StreamEdge outEdge = outEdgesInOrder.get(i);
+
+   RecordWriterOutput streamOutput = createStreamOutput(
+   recordWriterDelegate.getRecordWriter(i),
+   outEdge,
+   chainedConfigs.get(outEdge.getSourceId()),
+   containingTask.getEnvironment());
+
+   this.streamOutputs[i] = streamOutput;
+   streamOutputMap.put(outEdge, streamOutput);
+   }
+   }
+
+   private Map createChainedInputs(

Review comment:
   This method as well as `createChainedInputs` is full of compiler 
warnings too - can you fix them please?









[GitHub] [flink] WeiZhong94 commented on a change in pull request #13273: [FLINK-18801][docs][python] Add a "10 minutes to Table API" document under the "Python API" -> "User Guide" -> "Table API" sec

2020-08-28 Thread GitBox


WeiZhong94 commented on a change in pull request #13273:
URL: https://github.com/apache/flink/pull/13273#discussion_r479173568



##
File path: docs/dev/python/user-guide/table/10_minutes_to_table_api.md
##
@@ -0,0 +1,712 @@
+---
+title: "10 Minutes to Table API"
+nav-parent_id: python_tableapi
+nav-pos: 25
+---
+
+
+This document is a short introduction to the PyFlink Table API, intended to 
help novice users quickly understand its basic usage.
+For advanced usage, please refer to other documents in this User Guide.
+
+* This will be replaced by the TOC
+{:toc}
+
+Common Structure of Python Table API Program 
+--------------------------------------------
+
+All Table API and SQL programs for batch and streaming follow the same 
pattern. The following code example shows the common structure of Table API and 
SQL programs.
+
+{% highlight python %}
+
+from pyflink.table import EnvironmentSettings, StreamTableEnvironment
+
+# 1. create a TableEnvironment
+table_env = 
StreamTableEnvironment.create(environment_settings=EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build())
 
+
+# 2. create source Table
+table_env.execute_sql("""
+CREATE TABLE datagen (
+ id INT,
+ data STRING
+) WITH (
+ 'connector' = 'datagen',
+ 'fields.id.kind' = 'sequence',
+ 'fields.id.start' = '1',
+ 'fields.id.end' = '10'
+)
+""")
+
+# 3. create sink Table
+table_env.execute_sql("""
+CREATE TABLE print (
+ id INT,
+ data STRING
+) WITH (
+ 'connector' = 'print'
+)
+""")
+
+# 4. query from the source table and calculate
+# create a Table from a Table API query:
+tapi_result = table_env.from_path("datagen").select("id + 1, data")
+# or create a Table from a SQL query:
+sql_result = table_env.sql_query("SELECT * FROM datagen").select("id + 1, 
data")
+
+# 5. emit query result to sink table
+# emit a Table API result Table to a sink table:
+tapi_result.execute_insert("print").get_job_client().get_job_execution_result().result()
+sql_result.execute_insert("print").get_job_client().get_job_execution_result().result()
+# or emit results via SQL query:
+table_env.execute_sql("INSERT INTO print SELECT * FROM 
datagen").get_job_client().get_job_execution_result().result()
+
+{% endhighlight %}
+
+{% top %}
+
+Create a TableEnvironment
+-
+
+The `TableEnvironment` is a central concept of the Table API and SQL 
integration. The following code example shows how to create a TableEnvironment:
+
+{% highlight python %}
+
+from pyflink.table import EnvironmentSettings, StreamTableEnvironment, 
BatchTableEnvironment
+
+# create a blink streaming TableEnvironment
+table_env = 
StreamTableEnvironment.create(environment_settings=EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build())
+
+# create a blink batch TableEnvironment
+table_env = 
BatchTableEnvironment.create(environment_settings=EnvironmentSettings.new_instance().in_batch_mode().use_blink_planner().build())
+
+# create a flink streaming TableEnvironment
+table_env = 
StreamTableEnvironment.create(environment_settings=EnvironmentSettings.new_instance().in_streaming_mode().use_old_planner().build())
+
+# create a flink batch TableEnvironment
+table_env = 
BatchTableEnvironment.create(environment_settings=EnvironmentSettings.new_instance().in_batch_mode().use_old_planner().build())
+
+{% endhighlight %}
+
+The `TableEnvironment` is responsible for:
+
+* Creating `Table`s
+* Registering `Table`s to the catalog
+* Executing SQL queries
+* Registering user-defined (scalar, table, or aggregation) functions
+* Offering further configuration options
+* Adding Python dependencies to support running Python UDFs on a remote cluster
+* Executing jobs
+
+Currently there are two planners available: the flink planner and the blink planner.
+
+You should explicitly set which planner to use in the current program.
+We recommend using the blink planner whenever possible.
+The blink planner is more powerful in both functionality and performance; the 
flink planner is kept for compatibility.
+
+{% top %}
+
+Create Tables
+-
+
+`Table` is the core component of the Table API. A `Table` represents an 
intermediate result set during a Table API job.
+
+A `Table` is always bound to a specific `TableEnvironment`. It is not possible 
to combine tables of different TableEnvironments in the same query, e.g., to join 
or union them.
+
+### Create From A List Object
+
+You can create a Table from a list object:
+
+{% highlight python %}
+
+# create a blink batch TableEnvironment
+from pyflink.table import EnvironmentSettings, BatchTableEnvironment
+table_env = 
BatchTableEnvironment.create(environment_settings=EnvironmentSettings.new_instance().in_batch_mode().use_blink_planner().build())
+
+table = table_env.from_elements([(1, 'Hi'), (2, 'Hello')])
+table.to_pandas()
+
+{% endhighlight %}
+
+The result is:
+
+{% highlight text %}
+   _1 _2
+0   1 Hi
+1   2  Hello
+{% endhighlight %}
+
+You can also create 
