[jira] [Assigned] (FLINK-20168) Translate page 'Flink Architecture' into Chinese
[ https://issues.apache.org/jira/browse/FLINK-20168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jark Wu reassigned FLINK-20168:
-------------------------------

    Assignee: CaoZhen

> Translate page 'Flink Architecture' into Chinese
> ------------------------------------------------
>
>                 Key: FLINK-20168
>                 URL: https://issues.apache.org/jira/browse/FLINK-20168
>             Project: Flink
>          Issue Type: Improvement
>          Components: chinese-translation, Documentation
>            Reporter: CaoZhen
>            Assignee: CaoZhen
>            Priority: Minor
>
> Translate the page [Flink Architecture|https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html].
> The doc is located in "flink/docs/concepts/flink-architecture.zh.md".

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #14020: [FLINK-20078][coordination] Factor out an ExecutionGraph factory method for DefaultExecutionTopology
flinkbot edited a comment on pull request #14020: URL: https://github.com/apache/flink/pull/14020#issuecomment-724735089 ## CI report: * bb25d7f6325759f7400ae0b5728d41d478784659 UNKNOWN * c7df499afee4829fd30823f277597e04de20a8f2 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9449) * ab964bbe7a026494fca227e41979b74597dc7067 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-17726) Scheduler should take care of tasks directly canceled by TaskManager
[ https://issues.apache.org/jira/browse/FLINK-17726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232595#comment-17232595 ]

Yuan Mei commented on FLINK-17726:
----------------------------------

Double-checked with [~nicholasjiang]; he said he would finish this ticket by the end of this week.

> Scheduler should take care of tasks directly canceled by TaskManager
> --------------------------------------------------------------------
>
>                 Key: FLINK-17726
>                 URL: https://issues.apache.org/jira/browse/FLINK-17726
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Runtime / Task
>    Affects Versions: 1.11.0, 1.12.0
>            Reporter: Zhu Zhu
>            Priority: Critical
>             Fix For: 1.12.0, 1.11.3, 1.13.0
>
> The JobManager will not trigger failure handling when receiving a CANCELED task update.
> This is because CANCELED tasks are usually caused by another FAILED task; these CANCELED tasks will be restarted by the failover process triggered by the FAILED task.
> However, if a task is directly CANCELED by the TaskManager due to its own runtime issue, the task will not be recovered by the JobManager and thus the job would hang.
> This is a potential issue and we should avoid it.
> A possible solution is to let the JobManager treat tasks transitioning to CANCELED from all states except CANCELING as failed tasks.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
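The rule proposed in the last paragraph can be sketched as a small predicate. This is an illustrative sketch only; the enum values mirror Flink's execution states, but the class and method names are hypothetical and not actual Flink scheduler code:

```java
// Hypothetical sketch of the proposed rule: a transition to CANCELED triggers
// failover unless the task was already in CANCELING (i.e. the JobManager itself
// initiated the cancellation). Names are illustrative, not Flink internals.
enum ExecutionState { CREATED, SCHEDULED, DEPLOYING, RUNNING, CANCELING, CANCELED, FAILED, FINISHED }

final class CanceledTaskPolicy {
    /** Returns true if a transition to CANCELED should be treated as a failure. */
    static boolean treatAsFailed(ExecutionState previous, ExecutionState next) {
        return next == ExecutionState.CANCELED && previous != ExecutionState.CANCELING;
    }

    public static void main(String[] args) {
        // TaskManager cancels a RUNNING task on its own: handle as a failure.
        assert treatAsFailed(ExecutionState.RUNNING, ExecutionState.CANCELED);
        // Normal JobManager-initiated cancellation: no failover needed.
        assert !treatAsFailed(ExecutionState.CANCELING, ExecutionState.CANCELED);
    }
}
```

Under this rule a task that the TaskManager cancels directly (e.g. from RUNNING) would go through the regular failover path instead of leaving the job hanging.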
[GitHub] [flink] flinkbot edited a comment on pull request #13991: [FLINK-19983][network] Remove wrong state checking in SortMergeSubpartitionReader
flinkbot edited a comment on pull request #13991: URL: https://github.com/apache/flink/pull/13991#issuecomment-723737863 ## CI report: * 9d1b60190d1e7c588b225301792b63789227a91b Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9332) * c44538769371ce3114274f3322168a9fe18e4875 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-20098) Don't add flink-connector-files to flink-dist
[ https://issues.apache.org/jira/browse/FLINK-20098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232593#comment-17232593 ]

Jark Wu commented on FLINK-20098:
---------------------------------

If we want to solve the "provided" problem, then we may also need to remove the csv and json formats from the dist. That sounds like a regression in the out-of-the-box experience for SQL users. We had a long discussion before and decided to put some lightweight dependencies into the dist. Now, however, users have to add the filesystem jar manually, even though it is a frequently used connector in PoCs.

> Don't add flink-connector-files to flink-dist
> ---------------------------------------------
>
>                 Key: FLINK-20098
>                 URL: https://issues.apache.org/jira/browse/FLINK-20098
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
> We currently add both {{flink-connector-files}} and {{flink-connector-base}} to {{flink-dist}}.
> This implies that users should use the dependency like this:
> {code}
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-connector-files</artifactId>
>   <version>${project.version}</version>
>   <scope>provided</scope>
> </dependency>
> {code}
> which differs from other connectors, where users don't need to specify {{provided}}.
> Also, {{flink-connector-files}} has {{flink-connector-base}} as a provided dependency, which means that examples that use this dependency will not run out of the box in IntelliJ, because transitive provided dependencies are not considered.
> I propose to just remove the dependencies from {{flink-dist}} and let users use the File Connector like any other connector.
> I believe the initial motivation for "providing" the File Connector in {{flink-dist}} was to allow us to use the File Connector under the hood in methods such as {{StreamExecutionEnvironment.readFile(...)}}. We could decide to deprecate and remove those methods, or re-add the File Connector as an explicit (non-provided) dependency again in the future.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
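The "like any other connector" point can be illustrated by how such connectors are normally declared: default (compile) scope, with no {{provided}}. The Kafka connector coordinates below are only an illustration of the pattern:

```xml
<!-- Illustrative only: a typical Flink connector dependency is declared with
     default (compile) scope, without <scope>provided</scope>, because the
     connector is not bundled in flink-dist. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka_2.11</artifactId>
  <version>${flink.version}</version>
</dependency>
```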
[jira] [Comment Edited] (FLINK-20098) Don't add flink-connector-files to flink-dist
[ https://issues.apache.org/jira/browse/FLINK-20098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232593#comment-17232593 ]

Jark Wu edited comment on FLINK-20098 at 11/16/20, 7:53 AM:
------------------------------------------------------------

If we want to solve the "provided" problem, then we may also need to remove the csv and json formats from the dist. That sounds like a regression in the out-of-the-box experience for SQL users. We had a long discussion [1] before and decided to put some lightweight dependencies into the dist. Now, however, users have to add the filesystem jar manually, even though it is a frequently used connector in PoCs.

[1]: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-quot-fat-quot-and-quot-slim-quot-Flink-distributions-td40237i40.html#a42276

was (Author: jark):
If we want to solve the "provided" problem, then we may also need to remove the csv and json formats from the dist. That sounds like a regression in the out-of-the-box experience for SQL users. We had a long discussion before and decided to put some lightweight dependencies into the dist. Now, however, users have to add the filesystem jar manually, even though it is a frequently used connector in PoCs.

> Don't add flink-connector-files to flink-dist
> ---------------------------------------------
>
>                 Key: FLINK-20098
>                 URL: https://issues.apache.org/jira/browse/FLINK-20098
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
> We currently add both {{flink-connector-files}} and {{flink-connector-base}} to {{flink-dist}}.
> This implies that users should use the dependency like this:
> {code}
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-connector-files</artifactId>
>   <version>${project.version}</version>
>   <scope>provided</scope>
> </dependency>
> {code}
> which differs from other connectors, where users don't need to specify {{provided}}.
> Also, {{flink-connector-files}} has {{flink-connector-base}} as a provided dependency, which means that examples that use this dependency will not run out of the box in IntelliJ, because transitive provided dependencies are not considered.
> I propose to just remove the dependencies from {{flink-dist}} and let users use the File Connector like any other connector.
> I believe the initial motivation for "providing" the File Connector in {{flink-dist}} was to allow us to use the File Connector under the hood in methods such as {{StreamExecutionEnvironment.readFile(...)}}. We could decide to deprecate and remove those methods, or re-add the File Connector as an explicit (non-provided) dependency again in the future.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [flink] wangyang0918 commented on a change in pull request #14006: [FLINK-19546][doc] Add documentation for native Kubernetes HA
wangyang0918 commented on a change in pull request #14006:
URL: https://github.com/apache/flink/pull/14006#discussion_r523933632

##########
File path: docs/ops/jobmanager_high_availability.md
##########
@@ -215,6 +215,76 @@ Starting zookeeper daemon on host localhost.
 $ bin/yarn-session.sh -n 2

+## Kubernetes Cluster High Availability
+When running the Flink JobManager as a Kubernetes deployment, the replica count should be configured to 1 or greater.
+* The value `1` means that a new JobManager will be launched to take over leadership if the current one terminates exceptionally.
+* The value `N` (greater than 1) means that multiple JobManagers will be launched simultaneously; one is active and the others are standby. Starting more than one JobManager makes recovery faster.
+
+### Configuration
+{% highlight yaml %}
+kubernetes.cluster-id: <cluster-id>
+high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
+high-availability.storageDir: hdfs:///flink/recovery
+{% endhighlight %}
+
+#### Example: Highly Available Standalone Flink Cluster on Kubernetes
+Both session and job/application clusters support using the Kubernetes high availability service. Users just need to add the following Flink config options to [flink-configuration-configmap.yaml]({{ site.baseurl }}/ops/deployment/kubernetes.html#common-cluster-resource-definitions). All other yamls do not need to be updated.
+
+Note: The filesystem which corresponds to the scheme of your configured HA storage directory must be available to the runtime. Refer to [custom Flink image]({{ site.baseurl }}/ops/deployment/docker.html#customize-flink-image) and [enable plugins]({{ site.baseurl }}/ops/deployment/docker.html#using-plugins) for more information.
+
+{% highlight yaml %}
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: flink-config
+  labels:
+    app: flink
+data:
+  flink-conf.yaml: |+
+    ...
+    kubernetes.cluster-id: <cluster-id>
+    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
+    high-availability.storageDir: hdfs:///flink/recovery
+    restart-strategy: fixed-delay
+    restart-strategy.fixed-delay.attempts: 10
+    ...
+{% endhighlight %}
+
+#### Example: Highly Available Native Kubernetes Cluster
+Use the following command to start a native Flink application cluster on Kubernetes with high availability configured.
+{% highlight bash %}
+$ ./bin/flink run-application -p 8 -t kubernetes-application \
+  -Dkubernetes.cluster-id=<cluster-id> \
+  -Dtaskmanager.memory.process.size=4096m \
+  -Dkubernetes.taskmanager.cpu=2 \
+  -Dtaskmanager.numberOfTaskSlots=4 \
+  -Dkubernetes.container.image=<custom-image-name> \
+  -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory \
+  -Dhigh-availability.storageDir=s3://flink/flink-ha \
+  -Drestart-strategy=fixed-delay -Drestart-strategy.fixed-delay.attempts=10 \
+  -Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-{{site.version}}.jar \
+  -Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-{{site.version}}.jar \
+  local:///opt/flink/examples/streaming/StateMachineExample.jar
+{% endhighlight %}
+
+### High Availability Data Clean Up
+Currently, when a Flink job reaches a terminal state (`FAILED`, `CANCELED`, `FINISHED`), all the HA data, including the metadata in the Kubernetes ConfigMaps and the HA state on the DFS, is cleaned up.
+
+So the following command will only shut down the Flink session cluster and leave all HA-related ConfigMaps and state untouched.
+{% highlight bash %}
+$ echo 'stop' | ./bin/kubernetes-session.sh -Dkubernetes.cluster-id=<cluster-id> -Dexecution.attached=true
+{% endhighlight %}
+
+The following commands will cancel the job in an application or session cluster and effectively remove all of its HA data.
+{% highlight bash %}
+# Cancel a Flink job in the existing session
+$ ./bin/flink cancel -t kubernetes-session -Dkubernetes.cluster-id=<cluster-id> <job-id>
+# Cancel a Flink application
+$ ./bin/flink cancel -t kubernetes-application -Dkubernetes.cluster-id=<cluster-id> <job-id>
+{% endhighlight %}
+
+To keep HA data while restarting the Flink cluster, simply delete the deploy (via `kubectl delete deploy <cluster-id>`). All the Flink cluster related resources will be destroyed (e.g. JobManager Deployment, TaskManager pods, services, Flink conf ConfigMap), as to not occupy the Kubernetes cluster resources. However, the HA-related ConfigMaps do not set the owner reference and will be retained. When restarting the session / application, use `kubernetes-session.sh` or `flink run-application`. All the previously running jobs will recover from the latest checkpoint successfully.

Review comment:
       > I don't understand the subclause ", as to not occupy resources".

What I want to state here is that we will free all the resources allocated from Kubernetes so that they can be used by others. At the same time, we still have the ability to recover the Flink application at any time.
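The retention behavior discussed above hinges on Kubernetes owner references: resources whose `metadata.ownerReferences` point at the JobManager Deployment are garbage-collected when that Deployment is deleted, while the HA ConfigMaps omit that field and therefore survive. A minimal illustration, with a hypothetical ConfigMap name and labels (not guaranteed to match Flink's actual naming scheme):

```yaml
# Hypothetical HA ConfigMap: because it carries no metadata.ownerReferences
# entry, deleting the JobManager Deployment does not cascade to it, so the
# leader/checkpoint metadata it holds is retained for later recovery.
apiVersion: v1
kind: ConfigMap
metadata:
  name: <cluster-id>-dispatcher-leader   # illustrative name
  labels:
    app: <cluster-id>
data: {}                                 # HA metadata would live here
```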
[jira] [Commented] (FLINK-20143) use `yarn.provided.lib.dirs` config deploy job failed in yarn per job mode
[ https://issues.apache.org/jira/browse/FLINK-20143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232590#comment-17232590 ] Kostas Kloudas commented on FLINK-20143: Thanks [~fly_in_gis], feel free to ping me for a review. > use `yarn.provided.lib.dirs` config deploy job failed in yarn per job mode > -- > > Key: FLINK-20143 > URL: https://issues.apache.org/jira/browse/FLINK-20143 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission, Deployment / YARN >Affects Versions: 1.12.0 >Reporter: zhisheng >Priority: Major > Attachments: image-2020-11-13-17-21-47-751.png, > image-2020-11-13-17-22-06-111.png, image-2020-11-13-18-43-55-188.png > > > use follow command deploy flink job to yarn failed > {code:java} > ./bin/flink run -m yarn-cluster -d -ynm flink-1.12-test -ytm 3g -yjm 3g -yD > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib > ./examples/streaming/StateMachineExample.jar > {code} > log: > {code:java} > $ ./bin/flink run -m yarn-cluster -d -ynm flink-1.12-test -ytm 3g -yjm 3g -yD > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib > ./examples/streaming/StateMachineExample.jar$ ./bin/flink run -m yarn-cluster > -d -ynm flink-1.12-test -ytm 3g -yjm 3g -yD > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib > ./examples/streaming/StateMachineExample.jarSLF4J: Class path contains > multiple SLF4J bindings.SLF4J: Found binding in > [jar:file:/data1/app/flink-1.12-SNAPSHOT/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: > Found binding in > [jar:file:/data1/app/hadoop-2.7.3-snappy-32core12disk/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: > Found binding in > [jar:file:/data1/app/hadoop-2.7.3-snappy-32core12disk/share/hadoop/tools/lib/hadoop-aliyun-2.9.2-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: > See http://www.slf4j.org/codes.html#multiple_bindings for an > 
explanation.SLF4J: Actual binding is of type > [org.apache.logging.slf4j.Log4jLoggerFactory]2020-11-13 16:14:30,347 INFO > org.apache.flink.yarn.cli.FlinkYarnSessionCli [] - Dynamic > Property set: > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib2020-11-13 > 16:14:30,347 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli > [] - Dynamic Property set: > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/libUsage with > built-in data generator: StateMachineExample [--error-rate > ] [--sleep ]Usage > with Kafka: StateMachineExample --kafka-topic [--brokers > ]Options for both the above setups: [--backend ] > [--checkpoint-dir ] [--async-checkpoints ] > [--incremental-checkpoints ] [--output OR null for > stdout] > Using standalone source with error rate 0.00 and sleep delay 1 millis > 2020-11-13 16:14:30,706 WARN > org.apache.flink.yarn.configuration.YarnLogConfigUtil [] - The > configuration directory ('/data1/app/flink-1.12-SNAPSHOT/conf') already > contains a LOG4J config file.If you want to use logback, then please delete > or rename the log configuration file.2020-11-13 16:14:30,947 INFO > org.apache.hadoop.yarn.client.AHSProxy [] - Connecting > to Application History server at > FAT-hadoopuat-69117.vm.dc01.tech/10.69.1.17:102002020-11-13 16:14:30,958 INFO > org.apache.flink.yarn.YarnClusterDescriptor [] - No path > for the flink jar passed. Using the location of class > org.apache.flink.yarn.YarnClusterDescriptor to locate the jar2020-11-13 > 16:14:31,065 INFO > org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider [] - Failing > over to rm22020-11-13 16:14:31,130 INFO > org.apache.flink.yarn.YarnClusterDescriptor [] - The > configured JobManager memory is 3072 MB. YARN will allocate 4096 MB to make > up an integer multiple of its minimum allocation memory (2048 MB, configured > via 'yarn.scheduler.minimum-allocation-mb'). 
The extra 1024 MB may not be > used by Flink.2020-11-13 16:14:31,130 INFO > org.apache.flink.yarn.YarnClusterDescriptor [] - The > configured TaskManager memory is 3072 MB. YARN will allocate 4096 MB to make > up an integer multiple of its minimum allocation memory (2048 MB, configured > via 'yarn.scheduler.minimum-allocation-mb'). The extra 1024 MB may not be > used by Flink.2020-11-13 16:14:31,130 INFO > org.apache.flink.yarn.YarnClusterDescriptor [] - Cluster > specification: ClusterSpecification{masterMemoryMB=3072, > taskManagerMemoryMB=3072, slotsPerTaskManager=2}2020-11-13 16:14:31,681 WARN > org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory [] - The > sh
[GitHub] [flink] flinkbot edited a comment on pull request #14075: [FLINK-20163][docs-zh] Translate page "raw format" into Chinese
flinkbot edited a comment on pull request #14075: URL: https://github.com/apache/flink/pull/14075#issuecomment-727598953 ## CI report: * 08214d0608cf8a45395d003dc136a3e51eb1db68 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9607) Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9595) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-20168) Translate page 'Flink Architecture' into Chinese
[ https://issues.apache.org/jira/browse/FLINK-20168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232589#comment-17232589 ]

CaoZhen commented on FLINK-20168:
---------------------------------

Hi [~jark], I want to translate this document. Can you assign it to me?

> Translate page 'Flink Architecture' into Chinese
> ------------------------------------------------
>
>                 Key: FLINK-20168
>                 URL: https://issues.apache.org/jira/browse/FLINK-20168
>             Project: Flink
>          Issue Type: Improvement
>          Components: chinese-translation, Documentation
>            Reporter: CaoZhen
>            Priority: Minor
>
> Translate the page [Flink Architecture|https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html].
> The doc is located in "flink/docs/concepts/flink-architecture.zh.md".

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [flink] wsry commented on pull request #13991: [FLINK-19983][network] Remove wrong state checking in SortMergeSubpartitionReader
wsry commented on pull request #13991:
URL: https://github.com/apache/flink/pull/13991#issuecomment-727793009

@AHeise Thanks for your review and comments. Your concern is reasonable; I have updated my fix. For the current bounded partition, only the Netty thread can release or poll the subpartition read view, so we should never poll a released view. In this case, two things caused the failure: 1) there were redundant data availability notifications; 2) the read buffers were not cleared, which means the read view could still appear available after being released. For the multi-threaded scenario, as I understand it, the root failure cause needs to be propagated to the downstream. If nothing is polled from a read view, we try to find out whether the view has been released and whether there is a failure cause; the logic looks like this:
```
next = reader.getNextBuffer();
if (next == null) {
    if (!reader.isReleased()) {
        // Nothing available yet, but the view is still alive: keep polling.
        continue;
    }
    // The view has been released; propagate the root cause (if any) downstream.
    Throwable cause = reader.getFailureCause();
    if (cause != null) {
        ErrorResponse msg = new ErrorResponse(
                new ProducerFailedException(cause), reader.getReceiverId());
        ctx.writeAndFlush(msg);
    }
}
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[jira] [Commented] (FLINK-20145) Streaming job fails with IllegalStateException: Should only poll priority events
[ https://issues.apache.org/jira/browse/FLINK-20145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232587#comment-17232587 ] Robert Metzger commented on FLINK-20145: Thanks a lot for addressing this so quickly! I will verify the fix. Another question: We have found this issue through manual testing only, and the PR also doesn't contain any new tests. I wonder if we need additional test coverage to verify this fix, and ensure that it won't be broken again in the future. > Streaming job fails with IllegalStateException: Should only poll priority > events > > > Key: FLINK-20145 > URL: https://issues.apache.org/jira/browse/FLINK-20145 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Network >Affects Versions: 1.12.0 >Reporter: Robert Metzger >Assignee: Arvid Heise >Priority: Blocker > Labels: pull-request-available > Fix For: 1.12.0 > > > While testing the 1.12 release, I came across the following failure cause: > {code} > 2020-11-13 09:41:52,110 WARN org.apache.flink.runtime.taskmanager.Task > [] - dynamic filter (3/4)#0 (b977944851531f96e5324e786f055eb7) > switched from RUNNING to FAILED. 
> java.lang.IllegalStateException: Should only poll priority events
>         at org.apache.flink.util.Preconditions.checkState(Preconditions.java:198) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.processPriorityEvents(CheckpointedInputGate.java:116) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:283) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:184) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:577) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:541) ~[flink-dist_2.11-1.12.0.jar:1.12.0]
>         at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722) [flink-dist_2.11-1.12.0.jar:1.12.0]
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:547) [flink-dist_2.11-1.12.0.jar:1.12.0]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]
> {code}
> I have unaligned checkpointing enabled, and the failing operator is a CoFlatMapFunction. The error happened on all four TaskManagers, very soon after job submission. The error doesn't happen when unaligned checkpointing is disabled.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (FLINK-20168) Translate page 'Flink Architecture' into Chinese
CaoZhen created FLINK-20168:
-------------------------------

             Summary: Translate page 'Flink Architecture' into Chinese
                 Key: FLINK-20168
                 URL: https://issues.apache.org/jira/browse/FLINK-20168
             Project: Flink
          Issue Type: Improvement
          Components: chinese-translation, Documentation
            Reporter: CaoZhen


Translate the page [Flink Architecture|https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html].

The doc is located in "flink/docs/concepts/flink-architecture.zh.md".

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Closed] (FLINK-20149) SQLClientKafkaITCase.testKafka failed with "Did not get expected results before timeout."
[ https://issues.apache.org/jira/browse/FLINK-20149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger closed FLINK-20149. -- Resolution: Fixed I reverted https://issues.apache.org/jira/browse/FLINK-20098 and reopened the ticket. > SQLClientKafkaITCase.testKafka failed with "Did not get expected results > before timeout." > - > > Key: FLINK-20149 > URL: https://issues.apache.org/jira/browse/FLINK-20149 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Table SQL / Ecosystem >Affects Versions: 1.12.0 >Reporter: Dian Fu >Assignee: Robert Metzger >Priority: Blocker > Labels: test-stability > Fix For: 1.12.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9558&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529 > {code} > 2020-11-13T13:47:21.6002762Z Nov 13 13:47:21 [ERROR] testKafka[0: > kafka-version:2.4.1 > kafka-sql-version:universal](org.apache.flink.tests.util.kafka.SQLClientKafkaITCase) > Time elapsed: 183.015 s <<< FAILURE! 2020-11-13T13:47:21.6003744Z Nov 13 > 13:47:21 java.lang.AssertionError: Did not get expected results before > timeout. 2020-11-13T13:47:21.6004745Z Nov 13 13:47:21 at > org.junit.Assert.fail(Assert.java:88) 2020-11-13T13:47:21.6005325Z Nov 13 > 13:47:21 at org.junit.Assert.assertTrue(Assert.java:41) > 2020-11-13T13:47:21.6006007Z Nov 13 13:47:21 at > org.apache.flink.tests.util.kafka.SQLClientKafkaITCase.checkCsvResultFile(SQLClientKafkaITCase.java:226) > 2020-11-13T13:47:21.6007091Z Nov 13 13:47:21 at > org.apache.flink.tests.util.kafka.SQLClientKafkaITCase.testKafka(SQLClientKafkaITCase.java:166) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #13900: [FLINK-19949][csv] Unescape CSV format line delimiter character
flinkbot edited a comment on pull request #13900: URL: https://github.com/apache/flink/pull/13900#issuecomment-720989833 ## CI report: * d9e47451664a976692e4e6110bbffa2842f2ae7a UNKNOWN * c9a24a3ac861addff4a26ffa47cffd38db7427ae Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9600) * 67451b7cb7e42729ac9f74b0b1b642ac685954f7 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Comment Edited] (FLINK-20098) Don't add flink-connector-files to flink-dist
[ https://issues.apache.org/jira/browse/FLINK-20098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232585#comment-17232585 ]

Robert Metzger edited comment on FLINK-20098 at 11/16/20, 7:27 AM:
-------------------------------------------------------------------

Reverted in f3f5e316622255b86f416ba5eb1c283562732823
See: https://issues.apache.org/jira/browse/FLINK-20149

was (Author: rmetzger):
Reverted f3f5e316622255b86f416ba5eb1c283562732823
See: https://issues.apache.org/jira/browse/FLINK-20149

> Don't add flink-connector-files to flink-dist
> ---------------------------------------------
>
>                 Key: FLINK-20098
>                 URL: https://issues.apache.org/jira/browse/FLINK-20098
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
> We currently add both {{flink-connector-files}} and {{flink-connector-base}} to {{flink-dist}}.
> This implies that users should use the dependency like this:
> {code}
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-connector-files</artifactId>
>   <version>${project.version}</version>
>   <scope>provided</scope>
> </dependency>
> {code}
> which differs from other connectors, where users don't need to specify {{provided}}.
> Also, {{flink-connector-files}} has {{flink-connector-base}} as a provided dependency, which means that examples that use this dependency will not run out of the box in IntelliJ, because transitive provided dependencies are not considered.
> I propose to just remove the dependencies from {{flink-dist}} and let users use the File Connector like any other connector.
> I believe the initial motivation for "providing" the File Connector in {{flink-dist}} was to allow us to use the File Connector under the hood in methods such as {{StreamExecutionEnvironment.readFile(...)}}. We could decide to deprecate and remove those methods, or re-add the File Connector as an explicit (non-provided) dependency again in the future.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Reopened] (FLINK-20098) Don't add flink-connector-files to flink-dist
[ https://issues.apache.org/jira/browse/FLINK-20098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger reopened FLINK-20098:
------------------------------------

Reverted f3f5e316622255b86f416ba5eb1c283562732823
See: https://issues.apache.org/jira/browse/FLINK-20149

> Don't add flink-connector-files to flink-dist
> ---------------------------------------------
>
>                 Key: FLINK-20098
>                 URL: https://issues.apache.org/jira/browse/FLINK-20098
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataStream
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
> We currently add both {{flink-connector-files}} and {{flink-connector-base}} to {{flink-dist}}.
> This implies that users should use the dependency like this:
> {code}
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-connector-files</artifactId>
>   <version>${project.version}</version>
>   <scope>provided</scope>
> </dependency>
> {code}
> which differs from other connectors, where users don't need to specify {{provided}}.
> Also, {{flink-connector-files}} has {{flink-connector-base}} as a provided dependency, which means that examples that use this dependency will not run out of the box in IntelliJ, because transitive provided dependencies are not considered.
> I propose to just remove the dependencies from {{flink-dist}} and let users use the File Connector like any other connector.
> I believe the initial motivation for "providing" the File Connector in {{flink-dist}} was to allow us to use the File Connector under the hood in methods such as {{StreamExecutionEnvironment.readFile(...)}}. We could decide to deprecate and remove those methods, or re-add the File Connector as an explicit (non-provided) dependency again in the future.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (FLINK-20149) SQLClientKafkaITCase.testKafka failed with "Did not get expected results before timeout."
[ https://issues.apache.org/jira/browse/FLINK-20149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232583#comment-17232583 ] Robert Metzger commented on FLINK-20149: Thanks for looking into it. I will revert FLINK-20098. > SQLClientKafkaITCase.testKafka failed with "Did not get expected results > before timeout." > - > > Key: FLINK-20149 > URL: https://issues.apache.org/jira/browse/FLINK-20149 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Table SQL / Ecosystem >Affects Versions: 1.12.0 >Reporter: Dian Fu >Priority: Blocker > Labels: test-stability > Fix For: 1.12.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9558&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529 > {code} > 2020-11-13T13:47:21.6002762Z Nov 13 13:47:21 [ERROR] testKafka[0: > kafka-version:2.4.1 > kafka-sql-version:universal](org.apache.flink.tests.util.kafka.SQLClientKafkaITCase) > Time elapsed: 183.015 s <<< FAILURE! 2020-11-13T13:47:21.6003744Z Nov 13 > 13:47:21 java.lang.AssertionError: Did not get expected results before > timeout. 2020-11-13T13:47:21.6004745Z Nov 13 13:47:21 at > org.junit.Assert.fail(Assert.java:88) 2020-11-13T13:47:21.6005325Z Nov 13 > 13:47:21 at org.junit.Assert.assertTrue(Assert.java:41) > 2020-11-13T13:47:21.6006007Z Nov 13 13:47:21 at > org.apache.flink.tests.util.kafka.SQLClientKafkaITCase.checkCsvResultFile(SQLClientKafkaITCase.java:226) > 2020-11-13T13:47:21.6007091Z Nov 13 13:47:21 at > org.apache.flink.tests.util.kafka.SQLClientKafkaITCase.testKafka(SQLClientKafkaITCase.java:166) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] AHeise merged pull request #14073: [FLINK-20145][task] Don't expose modifiable PrioritizedDeque.iterator
AHeise merged pull request #14073: URL: https://github.com/apache/flink/pull/14073 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] rmetzger commented on pull request #14044: [FLINK-20098] Don't add flink-connector-files to flink-dist
rmetzger commented on pull request #14044: URL: https://github.com/apache/flink/pull/14044#issuecomment-727789464 Note to all people involved in this PR: The CI run of this PR showed a test break that has been merged to master: https://issues.apache.org/jira/browse/FLINK-20149 I will revert this change and reopen the ticket. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (FLINK-20149) SQLClientKafkaITCase.testKafka failed with "Did not get expected results before timeout."
[ https://issues.apache.org/jira/browse/FLINK-20149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger reassigned FLINK-20149: -- Assignee: Robert Metzger > SQLClientKafkaITCase.testKafka failed with "Did not get expected results > before timeout." > - > > Key: FLINK-20149 > URL: https://issues.apache.org/jira/browse/FLINK-20149 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Table SQL / Ecosystem >Affects Versions: 1.12.0 >Reporter: Dian Fu >Assignee: Robert Metzger >Priority: Blocker > Labels: test-stability > Fix For: 1.12.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9558&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529 > {code} > 2020-11-13T13:47:21.6002762Z Nov 13 13:47:21 [ERROR] testKafka[0: > kafka-version:2.4.1 > kafka-sql-version:universal](org.apache.flink.tests.util.kafka.SQLClientKafkaITCase) > Time elapsed: 183.015 s <<< FAILURE! 2020-11-13T13:47:21.6003744Z Nov 13 > 13:47:21 java.lang.AssertionError: Did not get expected results before > timeout. 2020-11-13T13:47:21.6004745Z Nov 13 13:47:21 at > org.junit.Assert.fail(Assert.java:88) 2020-11-13T13:47:21.6005325Z Nov 13 > 13:47:21 at org.junit.Assert.assertTrue(Assert.java:41) > 2020-11-13T13:47:21.6006007Z Nov 13 13:47:21 at > org.apache.flink.tests.util.kafka.SQLClientKafkaITCase.checkCsvResultFile(SQLClientKafkaITCase.java:226) > 2020-11-13T13:47:21.6007091Z Nov 13 13:47:21 at > org.apache.flink.tests.util.kafka.SQLClientKafkaITCase.testKafka(SQLClientKafkaITCase.java:166) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] wangyang0918 commented on pull request #14006: [FLINK-19546][doc] Add documentation for native Kubernetes HA
wangyang0918 commented on pull request #14006: URL: https://github.com/apache/flink/pull/14006#issuecomment-727789055 I have created a new ticket FLINK-20167 for the follow-up(rework the HA documentation). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] AHeise commented on pull request #14073: [FLINK-20145][task] Don't expose modifiable PrioritizedDeque.iterator
AHeise commented on pull request #14073: URL: https://github.com/apache/flink/pull/14073#issuecomment-727789160 I'm merging, but not closing the ticket, as I couldn't yet confirm whether it's indeed causing the issue. (It's only relevant for union gates.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
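The change named in the PR title ("don't expose a modifiable `PrioritizedDeque.iterator`") can be sketched in a few lines. This is a hypothetical, illustrative stand-in, not Flink's actual `PrioritizedDeque` code: instead of handing out the backing deque's own iterator, the class returns a wrapper that forwards `hasNext()`/`next()` but does not override `remove()`, so callers cannot mutate the internal queue through the iterator.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Illustrative sketch only -- names and structure are not Flink's.
class GuardedDeque<T> {
    private final Deque<T> deque = new ArrayDeque<>();

    void add(T element) {
        deque.add(element);
    }

    // Expose a read-only iteration: the wrapper inherits Iterator's
    // default remove(), which throws UnsupportedOperationException
    // instead of removing the element from the backing deque.
    Iterator<T> iterator() {
        Iterator<T> inner = deque.iterator();
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return inner.hasNext();
            }

            @Override
            public T next() {
                return inner.next();
            }
        };
    }
}
```

Returning a guarded iterator (rather than `Collections.unmodifiableCollection` over a copy) keeps iteration allocation-free while still protecting the queue's internal state.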
[jira] [Created] (FLINK-20167) Rework the high availability documentation
Yang Wang created FLINK-20167:
---------------------------------

             Summary: Rework the high availability documentation
                 Key: FLINK-20167
                 URL: https://issues.apache.org/jira/browse/FLINK-20167
             Project: Flink
          Issue Type: Improvement
          Components: Documentation
            Reporter: Yang Wang


Currently, the [high availability documentation|https://ci.apache.org/projects/flink/flink-docs-master/ops/jobmanager_high_availability.html] and its structure are not ideal. We should rework it together with the documentation about the different deployment modes. Then we will have Deployment -> Standalone -> HA, Deployment -> Yarn -> HA, Deployment -> K8s -> HA, and so on.


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-20164) Add docs for ProcessFunction and Timer in Python DataStream API.
[ https://issues.apache.org/jira/browse/FLINK-20164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuiqiang Chen updated FLINK-20164: --- Summary: Add docs for ProcessFunction and Timer in Python DataStream API. (was: Add docs for ProcessFunction and Timer for Python DataStream API.) > Add docs for ProcessFunction and Timer in Python DataStream API. > > > Key: FLINK-20164 > URL: https://issues.apache.org/jira/browse/FLINK-20164 > Project: Flink > Issue Type: Improvement > Components: API / Python, Documentation >Reporter: Shuiqiang Chen >Assignee: Shuiqiang Chen >Priority: Minor > Fix For: 1.12.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #14075: [FLINK-20163][docs-zh] Translate page "raw format" into Chinese
flinkbot edited a comment on pull request #14075: URL: https://github.com/apache/flink/pull/14075#issuecomment-727598953 ## CI report: * 08214d0608cf8a45395d003dc136a3e51eb1db68 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9595) Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9607) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14077: [FLINK-20141][fs-connector] Translate FileSink document into Chinese
flinkbot edited a comment on pull request #14077: URL: https://github.com/apache/flink/pull/14077#issuecomment-727773204 ## CI report: * 605e600a9ed9a0fec3cdd872dfa6208626f0a087 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9606) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14068: [FLINK-200137][python] Emit timestamps of current records to downstream in PythonFunctionOperator.
flinkbot edited a comment on pull request #14068: URL: https://github.com/apache/flink/pull/14068#issuecomment-726851708 ## CI report: * 5babde60923df8d09e936677752ac1140f69993f Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9592) * 04c250c0c8c9489eb622c91f057a7e5a496f1f35 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9605) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] wangyang0918 commented on a change in pull request #14006: [FLINK-19546][doc] Add documentation for native Kubernetes HA
wangyang0918 commented on a change in pull request #14006: URL: https://github.com/apache/flink/pull/14006#discussion_r523933632 ## File path: docs/ops/jobmanager_high_availability.md ## @@ -215,6 +215,76 @@ Starting zookeeper daemon on host localhost. $ bin/yarn-session.sh -n 2 +## Kubernetes Cluster High Availability +When running Flink JobManager as a Kubernetes deployment, the replica count should be configured to 1 or greater. +* The value `1` means that a new JobManager will be launched to take over leadership if the current one terminates exceptionally. +* The value `N` (greater than 1) means that multiple JobManagers will be launched simultaneously while one is active and others are standby. Starting more than one JobManager will make the recovery faster. + +### Configuration +{% highlight yaml %} +kubernetes.cluster-id: +high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory +high-availability.storageDir: hdfs:///flink/recovery +{% endhighlight %} + + Example: Highly Available Standalone Flink Cluster on Kubernetes +Both session and job/application clusters support using the Kubernetes high availability service. Users just need to add the following Flink config options to [flink-configuration-configmap.yaml]({{ site.baseurl}}/ops/deployment/kubernetes.html#common-cluster-resource-definitions). All other yamls do not need to be updated. + +Note The filesystem which corresponds to the scheme of your configured HA storage directory must be available to the runtime. Refer to [custom Flink image]({{ site.baseurl}}/ops/deployment/docker.html#customize-flink-image) and [enable plugins]({{ site.baseurl}}/ops/deployment/docker.html#using-plugins) for more information. + +{% highlight yaml %} +apiVersion: v1 +kind: ConfigMap +metadata: + name: flink-config + labels: +app: flink +data: + flink-conf.yaml: |+ + ... 
+kubernetes.cluster-id: +high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory +high-availability.storageDir: hdfs:///flink/recovery +restart-strategy: fixed-delay +restart-strategy.fixed-delay.attempts: 10 + ... +{% endhighlight %} + + Example: Highly Available Native Kubernetes Cluster +Using the following command to start a native Flink application cluster on Kubernetes with high availability configured. +{% highlight bash %} +$ ./bin/flink run-application -p 8 -t kubernetes-application \ + -Dkubernetes.cluster-id= \ + -Dtaskmanager.memory.process.size=4096m \ + -Dkubernetes.taskmanager.cpu=2 \ + -Dtaskmanager.numberOfTaskSlots=4 \ + -Dkubernetes.container.image= \ + -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory \ + -Dhigh-availability.storageDir=s3://flink/flink-ha \ + -Drestart-strategy=fixed-delay -Drestart-strategy.fixed-delay.attempts=10 \ + -Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-{{site.version}}.jar \ + -Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-{{site.version}}.jar \ + local:///opt/flink/examples/streaming/StateMachineExample.jar +{% endhighlight %} + +### High Availability Data Clean Up +Currently, when a Flink job reached the terminal state (`FAILED`, `CANCELED`, `FINISHED`), all the HA data, including metadata in Kubernetes ConfigMap and HA state on DFS, will be cleaned up. + +So the following command will only shut down the Flink session cluster and leave all the HA related ConfigMaps, state untouched. +{% highlight bash %} +$ echo 'stop' | ./bin/kubernetes-session.sh -Dkubernetes.cluster-id= -Dexecution.attached=true +{% endhighlight %} + +The following commands will cancel the job in application or session cluster and effectively remove all its HA data. 
+{% highlight bash %} +# Cancel a Flink job in the existing session +$ ./bin/flink cancel -t kubernetes-session -Dkubernetes.cluster-id= +# Cancel a Flink application +$ ./bin/flink cancel -t kubernetes-application -Dkubernetes.cluster-id= +{% endhighlight %} + +To keep HA data while restarting the Flink cluster, simply delete the deploy (via `kubectl delete deploy `). All the Flink cluster related resources will be destroyed (e.g. JobManager Deployment, TaskManager pods, services, Flink conf ConfigMap), as to not occupy the Kubernetes cluster resources. However, HA related ConfigMaps do not set the owner reference and they will be retained. When restarting the session / application, use `kubernetes-session.sh` or `flink run-application`. All the previous suspending running jobs will recover from the latest checkpoint successfully. Review comment: > I don't understand the subclause, as to not occupy resources. What I want to state here is that we will free all the allocated resources from K8s so that they could be used by others. At the same time, we still have the ability to recover the Flink application at any time.
[GitHub] [flink] zhuzhurk commented on a change in pull request #14020: [FLINK-20078][coordination] Factor out an ExecutionGraph factory method for DefaultExecutionTopology
zhuzhurk commented on a change in pull request #14020: URL: https://github.com/apache/flink/pull/14020#discussion_r523932872 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/adapter/DefaultExecutionTopology.java ## @@ -140,6 +115,53 @@ public DefaultSchedulingPipelinedRegion getPipelinedRegionOfVertex(final Executi return pipelinedRegion; } + public static DefaultExecutionTopology fromExecutionGraph(ExecutionGraph executionGraph) { + checkNotNull(executionGraph, "execution graph can not be null"); + + ExecutionGraphIndex executionGraphIndex = computeExecutionGraphIndex( + executionGraph.getAllExecutionVertices(), + executionGraph.getNumberOfExecutionJobVertices()); Review comment: ```suggestion executionGraph.getTotalNumberOfVertices()); ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] guoweiM commented on pull request #14061: [FLINK-20141][fs-connector] Add FileSink documentation
guoweiM commented on pull request #14061: URL: https://github.com/apache/flink/pull/14061#issuecomment-727779205 > The translation is in #14077, @guoweiM could you also have a look at the Chinese translation ? ok. I will take a look. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-12884) FLIP-144: Native Kubernetes HA Service
[ https://issues.apache.org/jira/browse/FLINK-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232568#comment-17232568 ] Yang Wang commented on FLINK-12884:
-----------------------------------
[~ksp0422] Thanks for your suggestion. I second your idea and am trying to add an E2E test to cover the whole process.
* Start a Flink application with HA configured
* The Flink job completes checkpoints successfully
* Kill the JobManager
* A new one should be launched and take over the leadership
* The Flink job should be recovered from the latest checkpoint successfully

> FLIP-144: Native Kubernetes HA Service
> --------------------------------------
>
>                 Key: FLINK-12884
>                 URL: https://issues.apache.org/jira/browse/FLINK-12884
>             Project: Flink
>          Issue Type: New Feature
>          Components: Deployment / Kubernetes, Runtime / Coordination
>            Reporter: MalcolmSanders
>            Assignee: Yang Wang
>            Priority: Major
>             Fix For: 1.12.0
>
>
> Currently Flink only supports a HighAvailabilityService using ZooKeeper. As a result, it requires a ZooKeeper cluster to be deployed on the k8s cluster if our customers need high availability for Flink. If we support a HighAvailabilityService based on native k8s APIs, it will save the effort of ZooKeeper deployment as well as the resources used by the ZooKeeper cluster. It might be especially helpful for customers who run small-scale k8s clusters, so that the Flink HighAvailabilityService does not cause too much overhead on their k8s clusters.
> Previously, [FLINK-11105|https://issues.apache.org/jira/browse/FLINK-11105] proposed a HighAvailabilityService using etcd. As [~NathanHowell] suggested in FLINK-11105, since k8s doesn't expose its own etcd cluster by design (see [Securing etcd clusters|https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#securing-etcd-clusters]), it also requires the deployment of an etcd cluster if Flink uses etcd to achieve HA.


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[GitHub] [flink] metaotao commented on pull request #14075: [FLINK-20163][docs-zh] Translate page "raw format" into Chinese
metaotao commented on pull request #14075: URL: https://github.com/apache/flink/pull/14075#issuecomment-727775897 @flinkbot run azure This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] gaoyunhaii commented on a change in pull request #14061: [FLINK-20141][fs-connector] Add FileSink documentation
gaoyunhaii commented on a change in pull request #14061: URL: https://github.com/apache/flink/pull/14061#discussion_r523923047 ## File path: docs/dev/connectors/streamfile_sink.md ## @@ -765,6 +765,7 @@ and Flink will throw an exception. normal job termination (*e.g.* finite input stream) and termination due to failure, upon normal termination of a job, the last in-progress files will not be transitioned to the "finished" state. +TODO I think this is wrong. Review comment: Why might this be wrong? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] gaoyunhaii edited a comment on pull request #14061: [FLINK-20141][fs-connector] Add FileSink documentation
gaoyunhaii edited a comment on pull request #14061: URL: https://github.com/apache/flink/pull/14061#issuecomment-727768552 The translation is in https://github.com/apache/flink/pull/14077, @guoweiM could you also have a look at the Chinese translation ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #14077: [FLINK-20141][fs-connector] Translate FileSink document into Chinese
flinkbot commented on pull request #14077: URL: https://github.com/apache/flink/pull/14077#issuecomment-727773204 ## CI report: * 605e600a9ed9a0fec3cdd872dfa6208626f0a087 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14076: [FLINK-19775][test] SystemProcessingTimeServiceTest.testImmediateShutdown is instable
flinkbot edited a comment on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727741544 ## CI report: * 6082a1d0c0f4274e553de47fae1285537c38ace8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9602) Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9603) * Unknown: [CANCELED](TBD) * a88211378f8e89d1ed89a22795643210352ebe81 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9604) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14068: [FLINK-200137][python] Emit timestamps of current records to downstream in PythonFunctionOperator.
flinkbot edited a comment on pull request #14068: URL: https://github.com/apache/flink/pull/14068#issuecomment-726851708 ## CI report: * 5babde60923df8d09e936677752ac1140f69993f Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9592) * 04c250c0c8c9489eb622c91f057a7e5a496f1f35 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #14077: [FLINK-20141][fs-connector] Translate FileSink document into Chinese
flinkbot commented on pull request #14077: URL: https://github.com/apache/flink/pull/14077#issuecomment-727769142 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 605e600a9ed9a0fec3cdd872dfa6208626f0a087 (Mon Nov 16 06:36:55 UTC 2020) ✅no warnings Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into to Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] gaoyunhaii edited a comment on pull request #14061: [FLINK-20141][fs-connector] Add FileSink documentation
gaoyunhaii edited a comment on pull request #14061: URL: https://github.com/apache/flink/pull/14061#issuecomment-727768552 The translation is in https://github.com/apache/flink/pull/14077 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] gaoyunhaii commented on pull request #14061: [FLINK-20141][fs-connector] Add FileSink documentation
gaoyunhaii commented on pull request #14061: URL: https://github.com/apache/flink/pull/14061#issuecomment-727768552 The translation is in `https://github.com/apache/flink/pull/14077` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] gaoyunhaii opened a new pull request #14077: Pr14061 add doc zh
gaoyunhaii opened a new pull request #14077: URL: https://github.com/apache/flink/pull/14077 ## What is the purpose of the change Translating the `FileSink` document into Chinese. ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): **no** - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no** - The serializers: **no** - The runtime per-record code paths (performance sensitive): **no** - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: **no** - The S3 file system connector: **no** ## Documentation - Does this pull request introduce a new feature? **no** - If yes, how is the feature documented? **not applicable** This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #14076: [FLINK-19775][test] SystemProcessingTimeServiceTest.testImmediateShutdown is instable
flinkbot edited a comment on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727741544 ## CI report: * 6082a1d0c0f4274e553de47fae1285537c38ace8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9602) Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9603) * a88211378f8e89d1ed89a22795643210352ebe81 UNKNOWN * Unknown: [CANCELED](TBD) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-20143) use `yarn.provided.lib.dirs` config deploy job failed in yarn per job mode
[ https://issues.apache.org/jira/browse/FLINK-20143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232558#comment-17232558 ] Yang Wang commented on FLINK-20143: --- Hmm. I think maybe I find the root cause. When the {{yarn.provided.lib.dirs}} is set to the non-qualified path(e.g. hdfs:///path/of/sharedLib), the {{URI#relativize}} in {{YarnApplicationFileUploader#getAllFilesInProvidedLibDirs}} could not work as expected. [~zhisheng] So for your situation, I guess all the deployment(e.g. yarn-per-job, yarn-application, yarn-session) could not work effectively if you are using non-qualified path. [~kkl0u] Even though we have a work around, specify a qualified path(e.g. hdfs://hdpdev/path/of/sharedLib), I think it is better to fix this issue. I will attach a PR for this ticket. > use `yarn.provided.lib.dirs` config deploy job failed in yarn per job mode > -- > > Key: FLINK-20143 > URL: https://issues.apache.org/jira/browse/FLINK-20143 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission, Deployment / YARN >Affects Versions: 1.12.0 >Reporter: zhisheng >Priority: Major > Attachments: image-2020-11-13-17-21-47-751.png, > image-2020-11-13-17-22-06-111.png, image-2020-11-13-18-43-55-188.png > > > use follow command deploy flink job to yarn failed > {code:java} > ./bin/flink run -m yarn-cluster -d -ynm flink-1.12-test -ytm 3g -yjm 3g -yD > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib > ./examples/streaming/StateMachineExample.jar > {code} > log: > {code:java} > $ ./bin/flink run -m yarn-cluster -d -ynm flink-1.12-test -ytm 3g -yjm 3g -yD > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib > ./examples/streaming/StateMachineExample.jar$ ./bin/flink run -m yarn-cluster > -d -ynm flink-1.12-test -ytm 3g -yjm 3g -yD > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib > ./examples/streaming/StateMachineExample.jarSLF4J: Class path contains > multiple SLF4J bindings.SLF4J: Found 
binding in > [jar:file:/data1/app/flink-1.12-SNAPSHOT/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: > Found binding in > [jar:file:/data1/app/hadoop-2.7.3-snappy-32core12disk/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: > Found binding in > [jar:file:/data1/app/hadoop-2.7.3-snappy-32core12disk/share/hadoop/tools/lib/hadoop-aliyun-2.9.2-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: > See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation.SLF4J: Actual binding is of type > [org.apache.logging.slf4j.Log4jLoggerFactory]2020-11-13 16:14:30,347 INFO > org.apache.flink.yarn.cli.FlinkYarnSessionCli [] - Dynamic > Property set: > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib2020-11-13 > 16:14:30,347 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli > [] - Dynamic Property set: > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/libUsage with > built-in data generator: StateMachineExample [--error-rate > ] [--sleep ]Usage > with Kafka: StateMachineExample --kafka-topic [--brokers > ]Options for both the above setups: [--backend ] > [--checkpoint-dir ] [--async-checkpoints ] > [--incremental-checkpoints ] [--output OR null for > stdout] > Using standalone source with error rate 0.00 and sleep delay 1 millis > 2020-11-13 16:14:30,706 WARN > org.apache.flink.yarn.configuration.YarnLogConfigUtil [] - The > configuration directory ('/data1/app/flink-1.12-SNAPSHOT/conf') already > contains a LOG4J config file.If you want to use logback, then please delete > or rename the log configuration file.2020-11-13 16:14:30,947 INFO > org.apache.hadoop.yarn.client.AHSProxy [] - Connecting > to Application History server at > FAT-hadoopuat-69117.vm.dc01.tech/10.69.1.17:102002020-11-13 16:14:30,958 INFO > org.apache.flink.yarn.YarnClusterDescriptor [] - No path > for the flink jar passed. 
Using the location of class > org.apache.flink.yarn.YarnClusterDescriptor to locate the jar2020-11-13 > 16:14:31,065 INFO > org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider [] - Failing > over to rm22020-11-13 16:14:31,130 INFO > org.apache.flink.yarn.YarnClusterDescriptor [] - The > configured JobManager memory is 3072 MB. YARN will allocate 4096 MB to make > up an integer multiple of its minimum allocation memory (2048 MB, configured > via 'yarn.scheduler.minimum-allocation-mb'). The extra 1024 MB may not be > used by Flink.2020-11-13 16:14:31,130 INFO > org.apache.flink.yarn.YarnClusterDescriptor [] - The > configur
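The relativization failure Yang Wang describes can be reproduced with plain `java.net.URI`: when the configured base directory is non-qualified (empty authority, `hdfs:///...`) but the files listed by the filesystem come back fully qualified, `URI#relativize` silently returns its argument instead of a relative path. A minimal sketch; the paths below are illustrative, not taken from the ticket's logs:

```java
import java.net.URI;

public class RelativizeCheck {

    // Relativize child against base, the operation the uploader relies on
    // to compute a file's name relative to the provided lib directory.
    public static String relativize(String base, String child) {
        return URI.create(base).relativize(URI.create(child)).toString();
    }

    public static void main(String[] args) {
        // Fully-qualified base and child: relativization succeeds.
        System.out.println(relativize(
                "hdfs://hdpdev/flink/lib/",
                "hdfs://hdpdev/flink/lib/flink-dist.jar")); // flink-dist.jar

        // Non-qualified base (empty authority) vs. qualified child: the
        // authorities differ, so relativize() returns the child unchanged,
        // and no relative name can be derived for the file.
        System.out.println(relativize(
                "hdfs:///flink/lib/",
                "hdfs://hdpdev/flink/lib/flink-dist.jar"));
    }
}
```

Per the `java.net.URI` contract, relativization only proceeds when scheme and authority of both URIs are identical, which is why the qualified-path workaround avoids the problem.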
[GitHub] [flink] flinkbot edited a comment on pull request #14076: [FLINK-19775][test] SystemProcessingTimeServiceTest.testImmediateShutdown is instable
flinkbot edited a comment on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727741544 ## CI report: * 6082a1d0c0f4274e553de47fae1285537c38ace8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9602) Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9603) * Unknown: [CANCELED](TBD) * a88211378f8e89d1ed89a22795643210352ebe81 UNKNOWN
[GitHub] [flink] xiaoHoly commented on pull request #14076: [FLINK-19775][test] SystemProcessingTimeServiceTest.testImmediateShutdown is instable
xiaoHoly commented on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727752237 Hi @tillrohrmann, PTAL.
[GitHub] [flink] xiaoHoly commented on pull request #14076: [FLINK-19775][test] SystemProcessingTimeServiceTest.testImmediateShutdown is instable
xiaoHoly commented on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727750065 > @xiaoHoly It will trigger the test automatically when you push a new commit. Usually there is no need to trigger the test manually. Besides, you can use the command `@flinkbot run azure` instead of `run azure re-run the last Azure build` to trigger the tests. Thanks for your advice.
[GitHub] [flink] dianfu commented on pull request #14076: [FLINK-19775][test] SystemProcessingTimeServiceTest.testImmediateShutdown is instable
dianfu commented on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727749697 I'm not quite familiar with this part. cc @tillrohrmann
[GitHub] [flink] dianfu commented on pull request #14076: [FLINK-19775][test] SystemProcessingTimeServiceTest.testImmediateShutdown is instable
dianfu commented on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727748627 @xiaoHoly It will trigger the test automatically when you push a new commit. Usually there is no need to trigger the test manually. Besides, you can use the command `@flinkbot run azure` instead of `run azure re-run the last Azure build` to trigger the tests.
[GitHub] [flink] flinkbot edited a comment on pull request #14076: [FLINK-19775][test] SystemProcessingTimeServiceTest.testImmediateShutdown is instable
flinkbot edited a comment on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727741544 ## CI report: * 6082a1d0c0f4274e553de47fae1285537c38ace8 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9602) Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9603) * Unknown: [CANCELED](TBD)
[GitHub] [flink] xiaoHoly commented on pull request #14076: [FLINK-19775][test] SystemProcessingTimeServiceTest.testImmediateShutdown is instable
xiaoHoly commented on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727747755 @flinkbot run azure re-run the last Azure build
[GitHub] [flink] xiaoHoly commented on pull request #14076: [FLINK-19775][test] SystemProcessingTimeServiceTest.testImmediateShutdown is instable
xiaoHoly commented on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727747601 @flinkbot run travis re-run the last Travis build
[jira] [Commented] (FLINK-19775) SystemProcessingTimeServiceTest.testImmediateShutdown is instable
[ https://issues.apache.org/jira/browse/FLINK-19775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232536#comment-17232536 ] jiawen xiao commented on FLINK-19775: - Hi,[~dian.fu], Looking forward to your follow-up to this pr (https://github.com/apache/flink/pull/14076) > SystemProcessingTimeServiceTest.testImmediateShutdown is instable > - > > Key: FLINK-19775 > URL: https://issues.apache.org/jira/browse/FLINK-19775 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.11.0 >Reporter: Dian Fu >Assignee: jiawen xiao >Priority: Major > Labels: pull-request-available, test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=8131&view=logs&j=d89de3df-4600-5585-dadc-9bbc9a5e661c&t=66b5c59a-0094-561d-0e44-b149dfdd586d > {code} > 2020-10-22T21:12:54.9462382Z [ERROR] > testImmediateShutdown(org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeServiceTest) > Time elapsed: 0.009 s <<< ERROR! > 2020-10-22T21:12:54.9463024Z java.lang.InterruptedException > 2020-10-22T21:12:54.9463331Z at java.lang.Object.wait(Native Method) > 2020-10-22T21:12:54.9463766Z at java.lang.Object.wait(Object.java:502) > 2020-10-22T21:12:54.9464140Z at > org.apache.flink.core.testutils.OneShotLatch.await(OneShotLatch.java:63) > 2020-10-22T21:12:54.9466014Z at > org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeServiceTest.testImmediateShutdown(SystemProcessingTimeServiceTest.java:154) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19775) SystemProcessingTimeServiceTest.testImmediateShutdown is instable
[ https://issues.apache.org/jira/browse/FLINK-19775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-19775: --- Labels: pull-request-available test-stability (was: test-stability) > SystemProcessingTimeServiceTest.testImmediateShutdown is instable > - > > Key: FLINK-19775 > URL: https://issues.apache.org/jira/browse/FLINK-19775 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Affects Versions: 1.11.0 >Reporter: Dian Fu >Assignee: jiawen xiao >Priority: Major > Labels: pull-request-available, test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=8131&view=logs&j=d89de3df-4600-5585-dadc-9bbc9a5e661c&t=66b5c59a-0094-561d-0e44-b149dfdd586d > {code} > 2020-10-22T21:12:54.9462382Z [ERROR] > testImmediateShutdown(org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeServiceTest) > Time elapsed: 0.009 s <<< ERROR! > 2020-10-22T21:12:54.9463024Z java.lang.InterruptedException > 2020-10-22T21:12:54.9463331Z at java.lang.Object.wait(Native Method) > 2020-10-22T21:12:54.9463766Z at java.lang.Object.wait(Object.java:502) > 2020-10-22T21:12:54.9464140Z at > org.apache.flink.core.testutils.OneShotLatch.await(OneShotLatch.java:63) > 2020-10-22T21:12:54.9466014Z at > org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeServiceTest.testImmediateShutdown(SystemProcessingTimeServiceTest.java:154) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot commented on pull request #14076: [FLINK-19775][test] SystemProcessingTimeServiceTest.testImmediateShutdown is instable
flinkbot commented on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727741544 ## CI report: * 6082a1d0c0f4274e553de47fae1285537c38ace8 UNKNOWN
[GitHub] [flink] xiaoHoly commented on pull request #14076: Pr holy patch
xiaoHoly commented on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727739140 @flinkbot run travis re-run the last Travis build
[GitHub] [flink] xiaoHoly commented on pull request #14076: Pr holy patch
xiaoHoly commented on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727739202 @flinkbot run azure re-run the last Azure build
[GitHub] [flink] xiaoHoly commented on pull request #14076: Pr holy patch
xiaoHoly commented on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727737851 Hi @dianfu, I have finished my work on this issue. Could you help me review my code?
[GitHub] [flink] flinkbot commented on pull request #14076: Pr holy patch
flinkbot commented on pull request #14076: URL: https://github.com/apache/flink/pull/14076#issuecomment-727737772 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 6082a1d0c0f4274e553de47fae1285537c38ace8 (Mon Nov 16 05:10:09 UTC 2020) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! * **Invalid pull request title: No valid Jira ID provided** Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier
[GitHub] [flink] xiaoHoly opened a new pull request #14076: Pr holy patch
xiaoHoly opened a new pull request #14076: URL: https://github.com/apache/flink/pull/14076 ## What is the purpose of the change Repair the unstable test SystemProcessingTimeServiceTest.testImmediateShutdown, as described in FLINK-19775. ## Brief change log - change flink/flink-test-utils-parent/flink-test-utils-junit/src/main/java/org/apache/flink/core/testutils/OneShotLatch.java ## Verifying this change This change is already covered by existing tests, such as SystemProcessingTimeServiceTest.testImmediateShutdown. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (no)
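For context, the utility this PR touches is a one-shot latch whose await() blocks on Object.wait(); the stack trace in FLINK-19775 shows the InterruptedException surfacing from exactly such a wait. A simplified, hypothetical latch of that shape (names are illustrative and do not mirror Flink's actual OneShotLatch):

```java
// Simplified sketch of a one-shot latch; field and method names are
// illustrative, not copied from Flink's OneShotLatch implementation.
public class SimpleOneShotLatch {

    private final Object lock = new Object();
    private volatile boolean triggered;

    // Releases every current and future await() caller; a one-shot latch
    // never resets.
    public void trigger() {
        synchronized (lock) {
            triggered = true;
            lock.notifyAll();
        }
    }

    // Blocks until trigger() is called. The while-loop guards against
    // spurious wakeups; a thread interrupt propagates to the caller, which
    // is the exception the flaky test observed when shutdown raced await().
    public void await() throws InterruptedException {
        synchronized (lock) {
            while (!triggered) {
                lock.wait();
            }
        }
    }

    public boolean isTriggered() {
        return triggered;
    }
}
```

The wait/notifyAll pairing under a single monitor is what makes the latch safe; the race the ticket describes lives in how the test coordinates shutdown with a thread parked inside await().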
[GitHub] [flink] flinkbot edited a comment on pull request #13998: [FLINK-20062][hive] ContinuousHiveSplitEnumerator should be lock-free
flinkbot edited a comment on pull request #13998: URL: https://github.com/apache/flink/pull/13998#issuecomment-724001789 ## CI report: * 73aaf9862a1d30857c4605b26a8997eeb11281c1 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9351) * 530e65e61ae94f952ffb682f0573071724da27d6 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9601)
[GitHub] [flink] flinkbot edited a comment on pull request #13998: [FLINK-20062][hive] ContinuousHiveSplitEnumerator should be lock-free
flinkbot edited a comment on pull request #13998: URL: https://github.com/apache/flink/pull/13998#issuecomment-724001789 ## CI report: * 73aaf9862a1d30857c4605b26a8997eeb11281c1 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9351) * 530e65e61ae94f952ffb682f0573071724da27d6 UNKNOWN
[GitHub] [flink] flinkbot edited a comment on pull request #13900: [FLINK-19949][csv] Unescape CSV format line delimiter character
flinkbot edited a comment on pull request #13900: URL: https://github.com/apache/flink/pull/13900#issuecomment-720989833 ## CI report: * d9e47451664a976692e4e6110bbffa2842f2ae7a UNKNOWN * c9a24a3ac861addff4a26ffa47cffd38db7427ae Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9600)
[GitHub] [flink] flinkbot edited a comment on pull request #13900: [FLINK-19949][csv] Unescape CSV format line delimiter character
flinkbot edited a comment on pull request #13900: URL: https://github.com/apache/flink/pull/13900#issuecomment-720989833 ## CI report: * 9dc60ef8af4b2a4e11c749a817b454c983f938aa Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=8969) * d9e47451664a976692e4e6110bbffa2842f2ae7a UNKNOWN * c9a24a3ac861addff4a26ffa47cffd38db7427ae UNKNOWN
[jira] [Comment Edited] (FLINK-20038) Rectify the usage of ResultPartitionType#isPipelined() in partition tracker.
[ https://issues.apache.org/jira/browse/FLINK-20038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232495#comment-17232495 ] Jin Xing edited comment on FLINK-20038 at 11/16/20, 3:56 AM: - Hi [~trohrmann] [~ym] Thanks a lot for your feedback and sorry for late reply, was busy during 11.11 shopping festival support ~ We indeed need a proper design for what we want to support and how it could be mapped to properties. The characteristics of shuffle manner is exposed by ResultPartitionType. We should sort out what should be exposed to scheduling layer, which should be respected; and what are implementation details, which should be kept inside; >From my side, I think there are 3 kinds of properties should be exposed to >scheduling layer: # *_+Writing property+_* – – how the shuffle service is writable, that's what '_hasBackpressure_' indicates, but seems that it's used nowhere and blurred with the property of 'isPipeliened'. In current code, Flink assumes that if a shuffle is '_isPipiplined=true_', it's always '_hasBackpressure=true_', which is not true from the concept. # *_+Reading property+_* – – when the shuffle data is readable, that's what 'isPipelined' and '_isBlocking_' indicates. They are used when dividing PipelinedRegion. My concern is how we define the meaning of a PIpelinedRegion in short. From my understanding, a PipelinedRegion is a set of vertices which should be scheduled together because of back pressure and the internal data flow can be consumed before task finish. If my understanding is correct, when judge whether two vertices should be divided into the same region, should the condition be '_isPipeliend=true && hasBackpressure=true_', rather than the current impl of ([PipelinedRegionComputeUtil|#L158])] (_producedResult.getResultType().isPipelined()_) ? # *+_Data lifecycle_+* – – GC could happen a). after data consumption; b). 
after job finished; c) by recycle manually If above classification is valid, we need to rectify some misuse in scheduling layer and give full respect to shuffle property. was (Author: jinxing6...@126.com): Hi [~trohrmann] [~ym] Thanks a lot for your feedback and sorry for late reply, was busy during 11.11 shopping festival support ~ We indeed need a proper design for what we want to support and how it could be mapped to properties. The characteristics of shuffle manner is exposed by ResultPartitionType. We should sort out what should be exposed to scheduling layer, which should be respected; and what are implementation details, which should be kept inside; >From my side, I think there are 3 kinds of properties should be exposed to >scheduling layer: # *_+Writing property+_* – – when the shuffle service is writable, that's what '_hasBackpressure_' indicates, but seems that it's used nowhere and blurred with the property of 'isPipeliened'. In current code, Flink assumes that if a shuffle is '_isPipiplined=true_', it's always '_hasBackpressure=true_', which is not true from the concept. # *_+Reading property+_* – – when the shuffle data is readable, that's what 'isPipelined' and '_isBlocking_' indicates. They are used when dividing PipelinedRegion. My concern is how we define the meaning of a PIpelinedRegion in short. From my understanding, a PipelinedRegion is a set of vertices which should be scheduled together because of back pressure and the internal data flow can be consumed before task finish. If my understanding is correct, when judge whether two vertices should be divided into the same region, should the condition be '_isPipeliend=true && hasBackpressure=true_', rather than the current impl of ([PipelinedRegionComputeUtil|#L158])] (_producedResult.getResultType().isPipelined()_) ? # *+_Data lifecycle_+* – – GC could happen a). after data consumption; b). 
after job finished; c) by recycle manually If above classification is valid, we need to rectify some misuse in scheduling layer and give full respect to shuffle property. > Rectify the usage of ResultPartitionType#isPipelined() in partition tracker. > > > Key: FLINK-20038 > URL: https://issues.apache.org/jira/browse/FLINK-20038 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination, Runtime / Network >Reporter: Jin Xing >Priority: Major > > After "FLIP-31: Pluggable Shuffle Service", users can extend and plug in new > shuffle manner, thus to benefit different scenarios. New shuffle manner tend > to bring in new abilities which could be leveraged by scheduling layer to > provide better performance. > From my understanding, the characteristics of shuffle manner is exposed by > ResultPartitionType (e.g. isPipelined, isBlocking, hasBackPressure .
[jira] [Comment Edited] (FLINK-20038) Rectify the usage of ResultPartitionType#isPipelined() in partition tracker.
[ https://issues.apache.org/jira/browse/FLINK-20038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232495#comment-17232495 ] Jin Xing edited comment on FLINK-20038 at 11/16/20, 3:54 AM: - Hi [~trohrmann] [~ym] Thanks a lot for your feedback and sorry for late reply, was busy during 11.11 shopping festival support ~ We indeed need a proper design for what we want to support and how it could be mapped to properties. The characteristics of shuffle manner is exposed by ResultPartitionType. We should sort out what should be exposed to scheduling layer, which should be respected; and what are implementation details, which should be kept inside; >From my side, I think there are 3 kinds of properties should be exposed to >scheduling layer: # *_+Writing property+_* – – when the shuffle service is writable, that's what '_hasBackpressure_' indicates, but seems that it's used nowhere and blurred with the property of 'isPipeliened'. In current code, Flink assumes that if a shuffle is '_isPipiplined=true_', it's always '_hasBackpressure=true_', which is not true from the concept. # *_+Reading property+_* – – when the shuffle data is readable, that's what 'isPipelined' and '_isBlocking_' indicates. They are used when dividing PipelinedRegion. My concern is how we define the meaning of a PIpelinedRegion in short. From my understanding, a PipelinedRegion is a set of vertices which should be scheduled together because of back pressure and the internal data flow can be consumed before task finish. If my understanding is correct, when judge whether two vertices should be divided into the same region, should the condition be '_isPipeliend=true && hasBackpressure=true_', rather than the current impl of ([PipelinedRegionComputeUtil|#L158])] (_producedResult.getResultType().isPipelined()_) ? # *+_Data lifecycle_+* – – GC could happen a). after data consumption; b). 
after job finished; c) by recycle manually If above classification is valid, we need to rectify some misuse in scheduling layer and give full respect to shuffle property. was (Author: jinxing6...@126.com): Hi [~trohrmann] [~ym] Thanks a lot for your feedback and sorry for late reply, was busy during 11.11 shopping festival support ~ We indeed need a proper design for what we want to support and how it could be mapped to properties. The characteristics of shuffle manner is exposed by ResultPartitionType. We should sort out what should be exposed to scheduling layer, which should be respected; and what is implementation details, which should be kept inside; >From my side, I think there are 3 kinds of properties should be exposed to >scheduling layer: # *_+Writing property+_* – – when the shuffle service is writable, that's what '_hasBackpressure_' indicates, but seems that it's used nowhere and blurred with the property of 'isPipeliened'. In current code, Flink assumes that if a shuffle is '_isPipiplined=true_', it's always '_hasBackpressure=true_', which is not true from the concept. # *_+Reading property+_* – – when the shuffle data is readable, that's what 'isPipelined' and '_isBlocking_' indicates. They are used when dividing PipelinedRegion. My concern is how we define the meaning of a PIpelinedRegion in short. From my understanding, a PipelinedRegion is a set of vertices which should be scheduled together because of back pressure and the internal data flow can be consumed before task finish. If my understanding is correct, when judge whether two vertices should be divided into the same region, should the condition be '_isPipeliend=true && hasBackpressure=true_', rather than the current impl of ([PipelinedRegionComputeUtil|[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/PipelinedRegionComputeUtil.java#L158])] (_producedResult.getResultType().isPipelined()_) ? 
# *+_Data lifecycle_+* – – GC could happen a). after data consumption; b). after job finished; c) by recycle manually If above classification is valid, we need to rectify some misuse in scheduling layer and give full respect to shuffle property. > Rectify the usage of ResultPartitionType#isPipelined() in partition tracker. > > > Key: FLINK-20038 > URL: https://issues.apache.org/jira/browse/FLINK-20038 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination, Runtime / Network >Reporter: Jin Xing >Priority: Major > > After "FLIP-31: Pluggable Shuffle Service", users can extend and plug in new > shuffle manner, thus to benefit different scenarios. New shuffle manner tend > to bring in new abilities which could be leveraged by scheduling layer to > provide better pe
[jira] [Commented] (FLINK-20038) Rectify the usage of ResultPartitionType#isPipelined() in partition tracker.
[ https://issues.apache.org/jira/browse/FLINK-20038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232495#comment-17232495 ] Jin Xing commented on FLINK-20038: -- Hi [~trohrmann] [~ym] Thanks a lot for your feedback and sorry for late reply, was busy during 11.11 shopping festival support ~ We indeed need a proper design for what we want to support and how it could be mapped to properties. The characteristics of shuffle manner is exposed by ResultPartitionType. We should sort out what should be exposed to scheduling layer, which should be respected; and what is implementation details, which should be kept inside; >From my side, I think there are 3 kinds of properties should be exposed to >scheduling layer: # *_+Writing property+_* – – when the shuffle service is writable, that's what '_hasBackpressure_' indicates, but seems that it's used nowhere and blurred with the property of 'isPipeliened'. In current code, Flink assumes that if a shuffle is '_isPipiplined=true_', it's always '_hasBackpressure=true_', which is not true from the concept. # *_+Reading property+_* – – when the shuffle data is readable, that's what 'isPipelined' and '_isBlocking_' indicates. They are used when dividing PipelinedRegion. My concern is how we define the meaning of a PIpelinedRegion in short. From my understanding, a PipelinedRegion is a set of vertices which should be scheduled together because of back pressure and the internal data flow can be consumed before task finish. If my understanding is correct, when judge whether two vertices should be divided into the same region, should the condition be '_isPipeliend=true && hasBackpressure=true_', rather than the current impl of ([PipelinedRegionComputeUtil|[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/failover/flip1/PipelinedRegionComputeUtil.java#L158])] (_producedResult.getResultType().isPipelined()_) ? 
# *+_Data lifecycle_+* – – GC could happen a). after data consumption; b). after job finished; c) by recycle manually If above classification is valid, we need to rectify some misuse in scheduling layer and give full respect to shuffle property. > Rectify the usage of ResultPartitionType#isPipelined() in partition tracker. > > > Key: FLINK-20038 > URL: https://issues.apache.org/jira/browse/FLINK-20038 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination, Runtime / Network >Reporter: Jin Xing >Priority: Major > > After "FLIP-31: Pluggable Shuffle Service", users can extend and plug in new > shuffle manner, thus to benefit different scenarios. New shuffle manner tend > to bring in new abilities which could be leveraged by scheduling layer to > provide better performance. > From my understanding, the characteristics of shuffle manner is exposed by > ResultPartitionType (e.g. isPipelined, isBlocking, hasBackPressure ...), and > leveraged by scheduling layer to conduct job. But seems that Flink doesn't > provide a way to describe the new characteristics from a plugged in shuffle > manner. I also find that scheduling layer have some weak assumptions on > ResultPartitionType. I detail by below example. > In our internal Flink, we develop a new shuffle manner for batch jobs. > Characteristics can be summarized as below briefly: > 1. Upstream task shuffle writes data to DISK; > 2. Upstream task commits data while producing and notify "consumable" to > downstream BEFORE task finished; > 3. Downstream is notified when upstream data is consumable and can be > scheduled according to resource; > 4. When downstream task failover, only itself needs to be restarted because > upstream data is written into disk and replayable; > We can tell the character of this new shuffle manner as: > a. isPipelined=true – downstream task can consume data before upstream > finished; > b. 
hasBackPressure=false – the upstream task shuffle-writes data to disk and can > finish by itself regardless of whether a downstream task consumes the data in > time. > But the above new ResultPartitionType (isPipelined=true, hasBackPressure=false) > seems to contradict the partition lifecycle management in the current scheduling > layer: > 1. The above new shuffle manner needs the partition tracker for lifecycle > management, but current Flink assumes that ALL "isPipelined=true" result > partitions are released on consumption and will not be taken care of by the > partition tracker > ([link|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/JobMasterPartitionTrackerImpl.java#L66]) > – this assumption does not hold for this case. > From my understanding, the method of ResultPartitionType#isPipel
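The condition proposed in the comment above can be contrasted with the current one in a tiny, self-contained sketch. This is plain Java for illustration only, not Flink code; the method names are made up:

```java
// Illustrative only: contrasts the current region-grouping condition with the
// one proposed above. These helpers are hypothetical, not Flink's API.
public class RegionGroupingSketch {

    // Current implementation (approximately): a producer and its consumer land
    // in the same pipelined region whenever the result is pipelined.
    public static boolean sameRegionCurrent(boolean isPipelined, boolean hasBackPressure) {
        return isPipelined;
    }

    // Proposed condition: group them only when the result is pipelined AND
    // back-pressured, i.e. the tasks genuinely must run at the same time.
    public static boolean sameRegionProposed(boolean isPipelined, boolean hasBackPressure) {
        return isPipelined && hasBackPressure;
    }

    public static void main(String[] args) {
        // The new shuffle manner: pipelined reads, but writes to disk, so no
        // back pressure; the proposed condition keeps the vertices separable.
        System.out.println(sameRegionCurrent(true, false));  // true
        System.out.println(sameRegionProposed(true, false)); // false
    }
}
```

Under the proposed condition, a pipelined-but-disk-backed result no longer forces producer and consumer into one region, which is exactly the scheduling freedom the comment argues for.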
[jira] [Comment Edited] (FLINK-12884) FLIP-144: Native Kubernetes HA Service
[ https://issues.apache.org/jira/browse/FLINK-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232493#comment-17232493 ] Kevin Kwon edited comment on FLINK-12884 at 11/16/20, 3:48 AM: --- Just in my opinion, I think the e2e test should mostly focus on killing the job manager since Zookeeper was used for checkpoint metadata storage aside from leader election (which is innately handled by Kubernetes' pod respawning) was (Author: ksp0422): Just in my opinion, I think the e2e test should mostly focus on killing the job manager since Zookeeper was used for checkpoint metadata storage aside from leader election (which is innately handled by Kubernetes' pod spawning) > FLIP-144: Native Kubernetes HA Service > -- > > Key: FLINK-12884 > URL: https://issues.apache.org/jira/browse/FLINK-12884 > Project: Flink > Issue Type: New Feature > Components: Deployment / Kubernetes, Runtime / Coordination >Reporter: MalcolmSanders >Assignee: Yang Wang >Priority: Major > Fix For: 1.12.0 > > > Currently flink only supports HighAvailabilityService using zookeeper. As a > result, it requires a zookeeper cluster to be deployed on k8s cluster if our > customers needs high availability for flink. If we support > HighAvailabilityService based on native k8s APIs, it will save the efforts of > zookeeper deployment as well as the resources used by zookeeper cluster. It > might be especially helpful for customers who run small-scale k8s clusters so > that flink HighAvailabilityService may not cause too much overhead on k8s > clusters. > Previously [FLINK-11105|https://issues.apache.org/jira/browse/FLINK-11105] > has proposed a HighAvailabilityService using etcd. 
As [~NathanHowell] > suggested in FLINK-11105, since k8s doesn't expose its own etcd cluster by > design (see [Securing etcd > clusters|https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#securing-etcd-clusters]), > it also requires the deployment of etcd cluster if flink uses etcd to > achieve HA. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-12884) FLIP-144: Native Kubernetes HA Service
[ https://issues.apache.org/jira/browse/FLINK-12884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232493#comment-17232493 ] Kevin Kwon commented on FLINK-12884: Just in my opinion, I think the e2e test should mostly focus on killing the job manager since Zookeeper was used for checkpoint metadata storage aside from leader election (which is innately handled by Kubernetes' pod spawning) > FLIP-144: Native Kubernetes HA Service > -- > > Key: FLINK-12884 > URL: https://issues.apache.org/jira/browse/FLINK-12884 > Project: Flink > Issue Type: New Feature > Components: Deployment / Kubernetes, Runtime / Coordination >Reporter: MalcolmSanders >Assignee: Yang Wang >Priority: Major > Fix For: 1.12.0 > > > Currently flink only supports HighAvailabilityService using zookeeper. As a > result, it requires a zookeeper cluster to be deployed on k8s cluster if our > customers needs high availability for flink. If we support > HighAvailabilityService based on native k8s APIs, it will save the efforts of > zookeeper deployment as well as the resources used by zookeeper cluster. It > might be especially helpful for customers who run small-scale k8s clusters so > that flink HighAvailabilityService may not cause too much overhead on k8s > clusters. > Previously [FLINK-11105|https://issues.apache.org/jira/browse/FLINK-11105] > has proposed a HighAvailabilityService using etcd. As [~NathanHowell] > suggested in FLINK-11105, since k8s doesn't expose its own etcd cluster by > design (see [Securing etcd > clusters|https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#securing-etcd-clusters]), > it also requires the deployment of etcd cluster if flink uses etcd to > achieve HA. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-20143) use `yarn.provided.lib.dirs` config deploy job failed in yarn per job mode
[ https://issues.apache.org/jira/browse/FLINK-20143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232488#comment-17232488 ] Yang Wang commented on FLINK-20143: --- [~zhisheng] Could you add the hdfs scheme to {{yarn.provided.lib.dirs}} and try again? For example, {{-yD yarn.provided.lib.dirs=hdfs://hdpdev/flink/flink-1.12-SNAPSHOT/lib}} > use `yarn.provided.lib.dirs` config deploy job failed in yarn per job mode > -- > > Key: FLINK-20143 > URL: https://issues.apache.org/jira/browse/FLINK-20143 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission, Deployment / YARN >Affects Versions: 1.12.0 >Reporter: zhisheng >Priority: Major > Attachments: image-2020-11-13-17-21-47-751.png, > image-2020-11-13-17-22-06-111.png, image-2020-11-13-18-43-55-188.png > > > deploying a flink job to yarn with the following command failed > {code:java} > ./bin/flink run -m yarn-cluster -d -ynm flink-1.12-test -ytm 3g -yjm 3g -yD > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib > ./examples/streaming/StateMachineExample.jar > {code} > log: > {code:java} > $ ./bin/flink run -m yarn-cluster -d -ynm flink-1.12-test -ytm 3g -yjm 3g -yD > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib > ./examples/streaming/StateMachineExample.jar$ ./bin/flink run -m yarn-cluster > -d -ynm flink-1.12-test -ytm 3g -yjm 3g -yD > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib > ./examples/streaming/StateMachineExample.jarSLF4J: Class path contains > multiple SLF4J bindings.SLF4J: Found binding in > [jar:file:/data1/app/flink-1.12-SNAPSHOT/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: > Found binding in > [jar:file:/data1/app/hadoop-2.7.3-snappy-32core12disk/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: > Found binding in > 
[jar:file:/data1/app/hadoop-2.7.3-snappy-32core12disk/share/hadoop/tools/lib/hadoop-aliyun-2.9.2-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]SLF4J: > See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation.SLF4J: Actual binding is of type > [org.apache.logging.slf4j.Log4jLoggerFactory]2020-11-13 16:14:30,347 INFO > org.apache.flink.yarn.cli.FlinkYarnSessionCli [] - Dynamic > Property set: > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib2020-11-13 > 16:14:30,347 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli > [] - Dynamic Property set: > yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/libUsage with > built-in data generator: StateMachineExample [--error-rate > ] [--sleep ]Usage > with Kafka: StateMachineExample --kafka-topic [--brokers > ]Options for both the above setups: [--backend ] > [--checkpoint-dir ] [--async-checkpoints ] > [--incremental-checkpoints ] [--output OR null for > stdout] > Using standalone source with error rate 0.00 and sleep delay 1 millis > 2020-11-13 16:14:30,706 WARN > org.apache.flink.yarn.configuration.YarnLogConfigUtil [] - The > configuration directory ('/data1/app/flink-1.12-SNAPSHOT/conf') already > contains a LOG4J config file.If you want to use logback, then please delete > or rename the log configuration file.2020-11-13 16:14:30,947 INFO > org.apache.hadoop.yarn.client.AHSProxy [] - Connecting > to Application History server at > FAT-hadoopuat-69117.vm.dc01.tech/10.69.1.17:102002020-11-13 16:14:30,958 INFO > org.apache.flink.yarn.YarnClusterDescriptor [] - No path > for the flink jar passed. Using the location of class > org.apache.flink.yarn.YarnClusterDescriptor to locate the jar2020-11-13 > 16:14:31,065 INFO > org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider [] - Failing > over to rm22020-11-13 16:14:31,130 INFO > org.apache.flink.yarn.YarnClusterDescriptor [] - The > configured JobManager memory is 3072 MB. 
YARN will allocate 4096 MB to make > up an integer multiple of its minimum allocation memory (2048 MB, configured > via 'yarn.scheduler.minimum-allocation-mb'). The extra 1024 MB may not be > used by Flink.2020-11-13 16:14:31,130 INFO > org.apache.flink.yarn.YarnClusterDescriptor [] - The > configured TaskManager memory is 3072 MB. YARN will allocate 4096 MB to make > up an integer multiple of its minimum allocation memory (2048 MB, configured > via 'yarn.scheduler.minimum-allocation-mb'). The extra 1024 MB may not be > used by Flink.2020-11-13 16:14:31,130 INFO > org.apache.flink.yarn.YarnClusterDescriptor [] - Cluster > specification: ClusterSpecification{masterMemoryMB=3072, > taskManagerMemoryMB=3072, slotsPer
[jira] [Assigned] (FLINK-20164) Add docs for ProcessFunction and Timer for Python DataStream API.
[ https://issues.apache.org/jira/browse/FLINK-20164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu reassigned FLINK-20164: --- Assignee: Shuiqiang Chen > Add docs for ProcessFunction and Timer for Python DataStream API. > - > > Key: FLINK-20164 > URL: https://issues.apache.org/jira/browse/FLINK-20164 > Project: Flink > Issue Type: Improvement > Components: API / Python, Documentation >Reporter: Shuiqiang Chen >Assignee: Shuiqiang Chen >Priority: Minor > Fix For: 1.12.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #14060: [hotfix][doc] fix the usage example of datagen connector doc
flinkbot edited a comment on pull request #14060: URL: https://github.com/apache/flink/pull/14060#issuecomment-726548680 ## CI report: * deb5686ae2485f1fd700e5f1e69e6bf8aa6e Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9599) Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9549) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #13900: [FLINK-19949][csv] Unescape CSV format line delimiter character
flinkbot edited a comment on pull request #13900: URL: https://github.com/apache/flink/pull/13900#issuecomment-720989833 ## CI report: * 9dc60ef8af4b2a4e11c749a817b454c983f938aa Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=8969) * d9e47451664a976692e4e6110bbffa2842f2ae7a UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (FLINK-20134) Test non-vectorized Python UDAF
[ https://issues.apache.org/jira/browse/FLINK-20134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-20134. --- Resolution: Done [~zhongwei] Thanks a lot~ > Test non-vectorized Python UDAF > --- > > Key: FLINK-20134 > URL: https://issues.apache.org/jira/browse/FLINK-20134 > Project: Flink > Issue Type: Task > Components: API / Python >Affects Versions: 1.12.0 >Reporter: Dian Fu >Assignee: Wei Zhong >Priority: Critical > Fix For: 1.12.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (FLINK-20134) Test non-vectorized Python UDAF
[ https://issues.apache.org/jira/browse/FLINK-20134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu reassigned FLINK-20134: --- Assignee: Wei Zhong > Test non-vectorized Python UDAF > --- > > Key: FLINK-20134 > URL: https://issues.apache.org/jira/browse/FLINK-20134 > Project: Flink > Issue Type: Task > Components: API / Python >Affects Versions: 1.12.0 >Reporter: Dian Fu >Assignee: Wei Zhong >Priority: Critical > Fix For: 1.12.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-20134) Test non-vectorized Python UDAF
[ https://issues.apache.org/jira/browse/FLINK-20134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232484#comment-17232484 ] Wei Zhong commented on FLINK-20134: --- Hi [~dian.fu], I have tested the non-vectorized Python UDAF in the following scenarios: * simple count UDAF * count distinct UDAF with MapView * mixed use of built-in agg functions and non-vectorized Python UDAF > Test non-vectorized Python UDAF > --- > > Key: FLINK-20134 > URL: https://issues.apache.org/jira/browse/FLINK-20134 > Project: Flink > Issue Type: Task > Components: API / Python >Affects Versions: 1.12.0 >Reporter: Dian Fu >Priority: Critical > Fix For: 1.12.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-19852) Managed memory released check can block IterativeTask
[ https://issues.apache.org/jira/browse/FLINK-19852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232483#comment-17232483 ] Xintong Song commented on FLINK-19852: -- [~gaoyunhaii] and I also had a discussion around this issue. We have a similar opinion to yours: the least invasive fix might be to have iterative tasks cache the pages and not re-allocate them on each iteration. > Managed memory released check can block IterativeTask > - > > Key: FLINK-19852 > URL: https://issues.apache.org/jira/browse/FLINK-19852 > Project: Flink > Issue Type: Bug > Components: Runtime / Task >Affects Versions: 1.11.0, 1.10.2, 1.12.0, 1.11.1, 1.11.2 >Reporter: shaomeng.wang >Priority: Critical > Attachments: image-2020-10-28-17-48-28-395.png, > image-2020-10-28-17-48-48-583.png > > > UnsafeMemoryBudget#reserveMemory, called on TempBarrier, needs time to wait > on GC of all allocated/released managed memory at every iteration. > > stack: > !image-2020-10-28-17-48-48-583.png! > new TempBarrier in BatchTask > !image-2020-10-28-17-48-28-395.png! > > This is much slower than before. -- This message was sent by Atlassian Jira (v8.3.4#803005)
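The fix discussed in the comment above (keep pages across iterations instead of releasing and re-reserving them) might look roughly like this. A stdlib-only sketch, not Flink's MemoryManager; class and method names are made up:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stdlib-only sketch of the idea above: an iterative task reuses its pages
// across iterations, so the expensive reserve/release (and any GC wait)
// happens once, not once per iteration.
public class PageCacheSketch {
    static final int PAGE_SIZE = 32 * 1024;

    private final Deque<byte[]> cachedPages = new ArrayDeque<>();
    public int allocations = 0; // counts real allocations, for illustration

    public byte[] acquirePage() {
        byte[] page = cachedPages.poll();
        if (page == null) {
            allocations++;            // only pay the allocation cost once
            page = new byte[PAGE_SIZE];
        }
        return page;
    }

    public void releasePage(byte[] page) {
        cachedPages.push(page);       // keep the page for the next iteration
    }

    public static void main(String[] args) {
        PageCacheSketch task = new PageCacheSketch();
        for (int iteration = 0; iteration < 100; iteration++) {
            byte[] page = task.acquirePage();
            // ... the iteration would use the page here ...
            task.releasePage(page);
        }
        // prints: allocations across 100 iterations: 1
        System.out.println("allocations across 100 iterations: " + task.allocations);
    }
}
```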
[jira] [Commented] (FLINK-20155) java.lang.OutOfMemoryError: Direct buffer memory
[ https://issues.apache.org/jira/browse/FLINK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232482#comment-17232482 ] Xintong Song commented on FLINK-20155: -- Hi [~roeehersh], The problem you described sounds like a direct memory leak to me. A previously finished job may have failed to properly release its direct memory. While the exception is thrown from the pulsar client, the leak may also come from other places. I would suggest looking into a memory dump for unreleased resources from previously finished jobs. {quote}i also notice that even thought i configure 10gb memory, my flink managed memory is much smaller: {quote} That's true. Currently the metrics on this page do not cover all of Flink's memory use cases. The most important part missing is the native memory. There will be an improvement to this page in the upcoming release 1.12. > java.lang.OutOfMemoryError: Direct buffer memory > > > Key: FLINK-20155 > URL: https://issues.apache.org/jira/browse/FLINK-20155 > Project: Flink > Issue Type: Bug > Components: Client / Job Submission >Affects Versions: 1.11.1 >Reporter: roee hershko >Priority: Major > Fix For: 1.11.1 > > Attachments: image-2020-11-13-17-52-54-217.png > > > update: > this issue occurs every time after a job fails; the only way to fix it is to > manually re-create the task manager pods (I am using the flink operator) > > after submitting a job, it runs for a few hours and then the job manager > crashes; when trying to re-create the job I get the following error: > > {code:java} > 2020-11-13 17:44:58org.apache.pulsar.client.admin.PulsarAdminException: > org.apache.pulsar.shade.io.netty.handler.codec.EncoderException: > java.lang.OutOfMemoryError: Direct buffer memoryat > org.apache.pulsar.client.admin.internal.BaseResource.getApiException(BaseResource.java:228) > at > org.apache.pulsar.client.admin.internal.TopicsImpl$7.failed(TopicsImpl.java:324) > at > 
org.apache.pulsar.shade.org.glassfish.jersey.client.JerseyInvocation$4.failed(JerseyInvocation.java:1030) > at > org.apache.pulsar.shade.org.glassfish.jersey.client.ClientRuntime.processFailure(ClientRuntime.java:231) > at > org.apache.pulsar.shade.org.glassfish.jersey.client.ClientRuntime.access$100(ClientRuntime.java:85) > at > org.apache.pulsar.shade.org.glassfish.jersey.client.ClientRuntime$2.lambda$failure$1(ClientRuntime.java:183) > at > org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors$1.call(Errors.java:272) > at > org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors$1.call(Errors.java:268) > at > org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors.process(Errors.java:316) > at > org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors.process(Errors.java:298) > at > org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors.process(Errors.java:268) > at > org.apache.pulsar.shade.org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:312) > at > org.apache.pulsar.shade.org.glassfish.jersey.client.ClientRuntime$2.failure(ClientRuntime.java:183) > at > org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$3.onThrowable(AsyncHttpConnector.java:279) > at > org.apache.pulsar.shade.org.asynchttpclient.netty.NettyResponseFuture.abort(NettyResponseFuture.java:277) > at > org.apache.pulsar.shade.org.asynchttpclient.netty.request.WriteListener.abortOnThrowable(WriteListener.java:50) > at > org.apache.pulsar.shade.org.asynchttpclient.netty.request.WriteListener.operationComplete(WriteListener.java:61) > at > org.apache.pulsar.shade.org.asynchttpclient.netty.request.WriteCompleteListener.operationComplete(WriteCompleteListener.java:28) > at > org.apache.pulsar.shade.org.asynchttpclient.netty.request.WriteCompleteListener.operationComplete(WriteCompleteListener.java:20) > at > org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577) > at > 
org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551) > at > org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490) > at > org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:183) > at > org.apache.pulsar.shade.io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95) > at > org.apache.pulsar.shade.io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30) > at > org.apache.pulsar.shade.org.asynchttpclient.netty.request.NettyRequestSender.writeRequest(NettyRequestSender.java:421) > at > org.
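As a starting point for the kind of investigation suggested above, the JVM itself reports direct-buffer usage through the standard `java.lang.management.BufferPoolMXBean`; polling it before and after jobs finish can show whether direct memory is actually being released. A stdlib-only sketch:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Prints the JVM's buffer-pool statistics. On HotSpot there is a "direct"
// pool (and a "mapped" one); a "direct" count/used that keeps growing after
// jobs finish points at a direct-memory leak like the one described above.
public class DirectBufferUsage {
    public static void main(String[] args) {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.println(pool.getName()
                    + ": count=" + pool.getCount()
                    + ", used=" + pool.getMemoryUsed() + " bytes"
                    + ", capacity=" + pool.getTotalCapacity() + " bytes");
        }
    }
}
```

The same beans are exposed over JMX, so the numbers can also be watched from a remote console without attaching a profiler.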
[jira] [Closed] (FLINK-20133) Test Pandas UDAF
[ https://issues.apache.org/jira/browse/FLINK-20133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-20133. --- Resolution: Done > Test Pandas UDAF > > > Key: FLINK-20133 > URL: https://issues.apache.org/jira/browse/FLINK-20133 > Project: Flink > Issue Type: Task > Components: API / Python >Affects Versions: 1.12.0 >Reporter: Dian Fu >Assignee: Huang Xingbo >Priority: Critical > Fix For: 1.12.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20142) Update the document for CREATE TABLE LIKE that source table from different catalog is supported
[ https://issues.apache.org/jira/browse/FLINK-20142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jark Wu updated FLINK-20142: Description: The confusion from the [USER mailing list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/CREATE-TABLE-LIKE-clause-from-different-catalog-or-database-td39364.html]: Hi, Is it disallowed to refer to a table from different databases or catalogs when someone creates a table? According to [1], there's no way to refer to tables belonging to different databases or catalogs. [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/create.html#create-table Best, Dongwon was: The confusion from the USER mailing list: Hi, Is it disallowed to refer to a table from different databases or catalogs when someone creates a table? According to [1], there's no way to refer to tables belonging to different databases or catalogs. [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/create.html#create-table Best, Dongwon > Update the document for CREATE TABLE LIKE that source table from different > catalog is supported > --- > > Key: FLINK-20142 > URL: https://issues.apache.org/jira/browse/FLINK-20142 > Project: Flink > Issue Type: Improvement > Components: Documentation, Table SQL / API >Affects Versions: 1.11.2 >Reporter: Danny Chen >Priority: Major > > The confusion from the [USER mailing > list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/CREATE-TABLE-LIKE-clause-from-different-catalog-or-database-td39364.html]: > Hi, > Is it disallowed to refer to a table from different databases or catalogs > when someone creates a table? > According to [1], there's no way to refer to tables belonging to different > databases or catalogs. > [1] > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/create.html#create-table > Best, > Dongwon -- This message was sent by Atlassian Jira (v8.3.4#803005)
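The documentation update the ticket asks for boils down to: the LIKE clause accepts a fully qualified `catalog.database.table` name. A hedged sketch of such a statement, here just composed as a string (all identifiers are made up, and the exact connector options are illustrative):

```java
// Builds an example CREATE TABLE ... LIKE statement whose source table lives
// in a different catalog/database. All identifiers here are hypothetical.
public class CreateTableLikeSketch {
    public static String crossCatalogDdl() {
        return String.join("\n",
                "CREATE TABLE derived_table (",
                "  extra_field STRING",
                ") WITH (",
                "  'connector' = 'kafka'",
                ")",
                "LIKE other_catalog.other_db.source_table");
    }

    public static void main(String[] args) {
        System.out.println(crossCatalogDdl());
    }
}
```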
[GitHub] [flink] flinkbot edited a comment on pull request #14060: [hotfix][doc] fix the usage example of datagen connector doc
flinkbot edited a comment on pull request #14060: URL: https://github.com/apache/flink/pull/14060#issuecomment-726548680 ## CI report: * deb5686ae2485f1fd700e5f1e69e6bf8aa6e Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9599) Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9549) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (FLINK-20125) Test memory configuration display in the web ui
[ https://issues.apache.org/jira/browse/FLINK-20125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xintong Song reassigned FLINK-20125: Assignee: Xintong Song > Test memory configuration display in the web ui > --- > > Key: FLINK-20125 > URL: https://issues.apache.org/jira/browse/FLINK-20125 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.12.0 >Reporter: Robert Metzger >Assignee: Xintong Song >Priority: Critical > Fix For: 1.12.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-20149) SQLClientKafkaITCase.testKafka failed with "Did not get expected results before timeout."
[ https://issues.apache.org/jira/browse/FLINK-20149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232458#comment-17232458 ] Dian Fu commented on FLINK-20149: - https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9596&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529 > SQLClientKafkaITCase.testKafka failed with "Did not get expected results > before timeout." > - > > Key: FLINK-20149 > URL: https://issues.apache.org/jira/browse/FLINK-20149 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Table SQL / Ecosystem >Affects Versions: 1.12.0 >Reporter: Dian Fu >Priority: Blocker > Labels: test-stability > Fix For: 1.12.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9558&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529 > {code} > 2020-11-13T13:47:21.6002762Z Nov 13 13:47:21 [ERROR] testKafka[0: > kafka-version:2.4.1 > kafka-sql-version:universal](org.apache.flink.tests.util.kafka.SQLClientKafkaITCase) > Time elapsed: 183.015 s <<< FAILURE! 2020-11-13T13:47:21.6003744Z Nov 13 > 13:47:21 java.lang.AssertionError: Did not get expected results before > timeout. 2020-11-13T13:47:21.6004745Z Nov 13 13:47:21 at > org.junit.Assert.fail(Assert.java:88) 2020-11-13T13:47:21.6005325Z Nov 13 > 13:47:21 at org.junit.Assert.assertTrue(Assert.java:41) > 2020-11-13T13:47:21.6006007Z Nov 13 13:47:21 at > org.apache.flink.tests.util.kafka.SQLClientKafkaITCase.checkCsvResultFile(SQLClientKafkaITCase.java:226) > 2020-11-13T13:47:21.6007091Z Nov 13 13:47:21 at > org.apache.flink.tests.util.kafka.SQLClientKafkaITCase.testKafka(SQLClientKafkaITCase.java:166) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-20166) FlinkKafkaProducerITCase.testRunOutOfProducersInThePool failed with "CheckpointException: Could not complete snapshot 1 for operator MockTask (1/1)#0. Failure reason: Ch
Dian Fu created FLINK-20166: --- Summary: FlinkKafkaProducerITCase.testRunOutOfProducersInThePool failed with "CheckpointException: Could not complete snapshot 1 for operator MockTask (1/1)#0. Failure reason: Checkpoint was declined." Key: FLINK-20166 URL: https://issues.apache.org/jira/browse/FLINK-20166 Project: Flink Issue Type: Bug Components: Connectors / Kafka, Runtime / Checkpointing Affects Versions: 1.12.0 Reporter: Dian Fu Fix For: 1.12.0 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9596&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5 {code} 2020-11-15T22:54:53.4151222Z [ERROR] testRunOutOfProducersInThePool(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase) Time elapsed: 61.224 s <<< ERROR! 2020-11-15T22:54:53.4152487Z org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 1 for operator MockTask (1/1)#0. Failure reason: Checkpoint was declined. 2020-11-15T22:54:53.4153577Zat org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:226) 2020-11-15T22:54:53.4154747Zat org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:158) 2020-11-15T22:54:53.4155760Zat org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:343) 2020-11-15T22:54:53.4156772Zat org.apache.flink.streaming.util.AbstractStreamOperatorTestHarness.snapshotWithLocalState(AbstractStreamOperatorTestHarness.java:638) 2020-11-15T22:54:53.4157865Zat org.apache.flink.streaming.util.AbstractStreamOperatorTestHarness.snapshot(AbstractStreamOperatorTestHarness.java:630) 2020-11-15T22:54:53.4159065Zat org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase.testRunOutOfProducersInThePool(FlinkKafkaProducerITCase.java:527) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20166) FlinkKafkaProducerITCase.testRunOutOfProducersInThePool failed with "CheckpointException: Could not complete snapshot 1 for operator MockTask (1/1)#0. Failure reason: Ch
[ https://issues.apache.org/jira/browse/FLINK-20166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-20166: Labels: test-stability (was: ) > FlinkKafkaProducerITCase.testRunOutOfProducersInThePool failed with > "CheckpointException: Could not complete snapshot 1 for operator MockTask > (1/1)#0. Failure reason: Checkpoint was declined." > > > Key: FLINK-20166 > URL: https://issues.apache.org/jira/browse/FLINK-20166 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Runtime / Checkpointing >Affects Versions: 1.12.0 >Reporter: Dian Fu >Priority: Major > Labels: test-stability > Fix For: 1.12.0 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9596&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5 > {code} > 2020-11-15T22:54:53.4151222Z [ERROR] > testRunOutOfProducersInThePool(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase) > Time elapsed: 61.224 s <<< ERROR! > 2020-11-15T22:54:53.4152487Z > org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete > snapshot 1 for operator MockTask (1/1)#0. Failure reason: Checkpoint was > declined. 
> 2020-11-15T22:54:53.4153577Z at > org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:226) > 2020-11-15T22:54:53.4154747Z at > org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.snapshotState(StreamOperatorStateHandler.java:158) > 2020-11-15T22:54:53.4155760Z at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:343) > 2020-11-15T22:54:53.4156772Z at > org.apache.flink.streaming.util.AbstractStreamOperatorTestHarness.snapshotWithLocalState(AbstractStreamOperatorTestHarness.java:638) > 2020-11-15T22:54:53.4157865Z at > org.apache.flink.streaming.util.AbstractStreamOperatorTestHarness.snapshot(AbstractStreamOperatorTestHarness.java:630) > 2020-11-15T22:54:53.4159065Z at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase.testRunOutOfProducersInThePool(FlinkKafkaProducerITCase.java:527) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot edited a comment on pull request #14060: [hotfix][doc] fix the usage example of datagen connector doc
flinkbot edited a comment on pull request #14060: URL: https://github.com/apache/flink/pull/14060#issuecomment-726548680 ## CI report: * deb5686ae2485f1fd700e5f1e69e6bf8aa6e Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9549) Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=9599) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-16908) FlinkKafkaProducerITCase testScaleUpAfterScalingDown Timeout expired while initializing transactional state in 60000ms.
[ https://issues.apache.org/jira/browse/FLINK-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232450#comment-17232450 ] Dian Fu commented on FLINK-16908: - https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9596&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5 > FlinkKafkaProducerITCase testScaleUpAfterScalingDown Timeout expired while > initializing transactional state in 60000ms. > --- > > Key: FLINK-16908 > URL: https://issues.apache.org/jira/browse/FLINK-16908 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka >Affects Versions: 1.11.0, 1.12.0 >Reporter: Piotr Nowojski >Priority: Major > Labels: test-stability > Fix For: 1.12.0 > > > https://dev.azure.com/rmetzger/Flink/_build/results?buildId=6889&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=f66652e3-384e-5b25-be29-abfea69ea8da > {noformat} > [ERROR] > testScaleUpAfterScalingDown(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase) > Time elapsed: 64.353 s <<< ERROR! > org.apache.kafka.common.errors.TimeoutException: Timeout expired while > initializing transactional state in 60000ms. > {noformat} > After this initial error many other tests (I think all following unit tests) > failed with errors like: > {noformat} > [ERROR] > testFailAndRecoverSameCheckpointTwice(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase) > Time elapsed: 7.895 s <<< FAILURE! > java.lang.AssertionError: Detected producer leak. Thread name: > kafka-producer-network-thread | producer-196 > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase.checkProducerLeak(FlinkKafkaProducerITCase.java:675) > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase.testFailAndRecoverSameCheckpointTwice(FlinkKafkaProducerITCase.java:311) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-20165) YARNSessionFIFOITCase.checkForProhibitedLogContents: Error occurred during initialization of boot layer java.lang.IllegalStateException: Module system already initialized
[ https://issues.apache.org/jira/browse/FLINK-20165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-20165: Labels: test-stability (was: ) > YARNSessionFIFOITCase.checkForProhibitedLogContents: Error occurred during > initialization of boot layer java.lang.IllegalStateException: Module system > already initialized > -- > > Key: FLINK-20165 > URL: https://issues.apache.org/jira/browse/FLINK-20165 > Project: Flink > Issue Type: Bug > Components: Deployment / YARN >Affects Versions: 1.11.3 >Reporter: Dian Fu >Priority: Major > Labels: test-stability > Fix For: 1.11.3 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9597&view=logs&j=298e20ef-7951-5965-0e79-ea664ddc435e&t=8560c56f-9ec1-5c40-4ff5-9d3e882d > {code} > 2020-11-15T22:42:03.3053212Z 22:42:03,303 [ Time-limited test] INFO > org.apache.flink.yarn.YARNSessionFIFOITCase [] - Finished > testDetachedMode() > 2020-11-15T22:42:37.9020133Z [ERROR] Tests run: 5, Failures: 2, Errors: 0, > Skipped: 2, Time elapsed: 67.485 s <<< FAILURE! - in > org.apache.flink.yarn.YARNSessionFIFOSecuredITCase > 2020-11-15T22:42:37.9022015Z [ERROR] > testDetachedMode(org.apache.flink.yarn.YARNSessionFIFOSecuredITCase) Time > elapsed: 12.841 s <<< FAILURE! > 2020-11-15T22:42:37.9023701Z java.lang.AssertionError: > 2020-11-15T22:42:37.9025649Z Found a file > /__w/2/s/flink-yarn-tests/target/flink-yarn-tests-fifo-secured/flink-yarn-tests-fifo-secured-logDir-nm-1_0/application_1605480087188_0002/container_1605480087188_0002_01_02/taskmanager.out > with a prohibited string (one of [Exception, Started > SelectChannelConnector@0.0.0.0:8081]). 
Excerpts: > 2020-11-15T22:42:37.9026730Z [ > 2020-11-15T22:42:37.9027080Z Error occurred during initialization of boot > layer > 2020-11-15T22:42:37.9027623Z java.lang.IllegalStateException: Module system > already initialized > 2020-11-15T22:42:37.9033278Z java.lang.IllegalStateException: Module system > already initialized > 2020-11-15T22:42:37.9033825Z ] > 2020-11-15T22:42:37.9034291Z at org.junit.Assert.fail(Assert.java:88) > 2020-11-15T22:42:37.9034971Z at > org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:479) > 2020-11-15T22:42:37.9035814Z at > org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:83) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-20165) YARNSessionFIFOITCase.checkForProhibitedLogContents: Error occurred during initialization of boot layer java.lang.IllegalStateException: Module system already initialized
Dian Fu created FLINK-20165: --- Summary: YARNSessionFIFOITCase.checkForProhibitedLogContents: Error occurred during initialization of boot layer java.lang.IllegalStateException: Module system already initialized Key: FLINK-20165 URL: https://issues.apache.org/jira/browse/FLINK-20165 Project: Flink Issue Type: Bug Components: Deployment / YARN Affects Versions: 1.11.3 Reporter: Dian Fu Fix For: 1.11.3 https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9597&view=logs&j=298e20ef-7951-5965-0e79-ea664ddc435e&t=8560c56f-9ec1-5c40-4ff5-9d3e882d {code} 2020-11-15T22:42:03.3053212Z 22:42:03,303 [ Time-limited test] INFO org.apache.flink.yarn.YARNSessionFIFOITCase [] - Finished testDetachedMode() 2020-11-15T22:42:37.9020133Z [ERROR] Tests run: 5, Failures: 2, Errors: 0, Skipped: 2, Time elapsed: 67.485 s <<< FAILURE! - in org.apache.flink.yarn.YARNSessionFIFOSecuredITCase 2020-11-15T22:42:37.9022015Z [ERROR] testDetachedMode(org.apache.flink.yarn.YARNSessionFIFOSecuredITCase) Time elapsed: 12.841 s <<< FAILURE! 2020-11-15T22:42:37.9023701Z java.lang.AssertionError: 2020-11-15T22:42:37.9025649Z Found a file /__w/2/s/flink-yarn-tests/target/flink-yarn-tests-fifo-secured/flink-yarn-tests-fifo-secured-logDir-nm-1_0/application_1605480087188_0002/container_1605480087188_0002_01_02/taskmanager.out with a prohibited string (one of [Exception, Started SelectChannelConnector@0.0.0.0:8081]). 
Excerpts: 2020-11-15T22:42:37.9026730Z [ 2020-11-15T22:42:37.9027080Z Error occurred during initialization of boot layer 2020-11-15T22:42:37.9027623Z java.lang.IllegalStateException: Module system already initialized 2020-11-15T22:42:37.9033278Z java.lang.IllegalStateException: Module system already initialized 2020-11-15T22:42:37.9033825Z ] 2020-11-15T22:42:37.9034291Z at org.junit.Assert.fail(Assert.java:88) 2020-11-15T22:42:37.9034971Z at org.apache.flink.yarn.YarnTestBase.ensureNoProhibitedStringInLogFiles(YarnTestBase.java:479) 2020-11-15T22:42:37.9035814Z at org.apache.flink.yarn.YARNSessionFIFOITCase.checkForProhibitedLogContents(YARNSessionFIFOITCase.java:83) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] HuangXingBo commented on pull request #14060: [hotfix][doc] fix the usage example of datagen connector doc
HuangXingBo commented on pull request #14060: URL: https://github.com/apache/flink/pull/14060#issuecomment-727681489 @flinkbot run azure This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-17482) KafkaITCase.testMultipleSourcesOnePartition unstable
[ https://issues.apache.org/jira/browse/FLINK-17482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232444#comment-17232444 ] Dian Fu commented on FLINK-17482: - Another instance on 1.11: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9597&view=logs&j=1fc6e7bf-633c-5081-c32a-9dea24b05730&t=0d9ad4c1-5629-5ffc-10dc-113ca91e23c5 > KafkaITCase.testMultipleSourcesOnePartition unstable > > > Key: FLINK-17482 > URL: https://issues.apache.org/jira/browse/FLINK-17482 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Tests >Affects Versions: 1.11.0, 1.12.0 >Reporter: Robert Metzger >Priority: Major > Labels: test-stability > Fix For: 1.12.0 > > > CI run: > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=454&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=684b1416-4c17-504e-d5ab-97ee44e08a20 > {code} > 07:29:40,472 [main] INFO > org.apache.flink.streaming.connectors.kafka.KafkaTestBase[] - > - > [ERROR] Tests run: 22, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 152.018 s <<< FAILURE! - in > org.apache.flink.streaming.connectors.kafka.KafkaITCase > [ERROR] > testMultipleSourcesOnePartition(org.apache.flink.streaming.connectors.kafka.KafkaITCase) > Time elapsed: 4.257 s <<< FAILURE! > java.lang.AssertionError: Test failed: Job execution failed. 
> at org.junit.Assert.fail(Assert.java:88) > at org.apache.flink.test.util.TestUtils.tryExecute(TestUtils.java:45) > at > org.apache.flink.streaming.connectors.kafka.KafkaConsumerTestBase.runMultipleSourcesOnePartitionExactlyOnceTest(KafkaConsumerTestBase.java:963) > at > org.apache.flink.streaming.connectors.kafka.KafkaITCase.testMultipleSourcesOnePartition(KafkaITCase.java:107) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] XComp commented on a change in pull request #14033: [FLINK-19979][e2e] Add sanity check after bash e2e tests for no leftovers
XComp commented on a change in pull request #14033: URL: https://github.com/apache/flink/pull/14033#discussion_r523823323 ## File path: flink-end-to-end-tests/test-scripts/common.sh ## @@ -822,11 +839,20 @@ internal_run_with_timeout() { ( command_pid=$BASHPID - (sleep "${timeout_in_seconds}" # set a timeout for this command - echo "${command_label:-"The command '${command}'"} (pid: $command_pid) did not finish after $timeout_in_seconds seconds." - eval "${on_failure}" - kill "$command_pid") & watchdog_pid=$! + (# this subshell contains the watchdog +local wakeup_time=$(( ${timeout_in_seconds} + $(date +%s) )) +while true; do + sleep 1 + if [ $wakeup_time -le $(date +%s) ]; then +echo "${command_label:-"The command '${command}'"} (pid: $command_pid) did not finish after $timeout_in_seconds seconds." +eval "${on_failure}" +kill "$command_pid" +pkill -P "$command_pid" Review comment: Ok, I did some more research on it. Looks like it works like that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (FLINK-19250) SplitFetcherManager does not propagate errors correctly
[ https://issues.apache.org/jira/browse/FLINK-19250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232412#comment-17232412 ] Stephan Ewen commented on FLINK-19250: -- Fixed in 1.11.3 (release-1.11) via f220c2486181e4af19b246e9745a4990cd6aa9ce > SplitFetcherManager does not propagate errors correctly > --- > > Key: FLINK-19250 > URL: https://issues.apache.org/jira/browse/FLINK-19250 > Project: Flink > Issue Type: Bug > Components: Connectors / Common >Affects Versions: 1.11.2 >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Blocker > Fix For: 1.12.0, 1.11.3 > > > The first exception that is reported does not lead to a notification, only > successive exceptions do. -- This message was sent by Atlassian Jira (v8.3.4#803005)
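The bug described in FLINK-19250 above (the first reported exception never surfaces, only successive ones do) boils down to completing an error notification on the first failure rather than on later ones. A minimal, hypothetical sketch of the intended "first error wins and is always notified" behavior — the names and types below are illustrative and are not Flink's actual SplitFetcherManager API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only: record the first Throwable reported by a fetcher
// thread and complete the notification future immediately, so the FIRST
// error already wakes up the consumer (the bug was that only later ones did).
public class ErrorPropagationSketch {
    private final AtomicReference<Throwable> firstError = new AtomicReference<>();
    private final CompletableFuture<Void> errorFuture = new CompletableFuture<>();

    public void reportError(Throwable t) {
        // compareAndSet keeps only the first reported error...
        if (firstError.compareAndSet(null, t)) {
            errorFuture.completeExceptionally(t); // ...and it triggers the notification
        }
    }

    public boolean errorNotified() {
        return errorFuture.isCompletedExceptionally();
    }

    public static void main(String[] args) {
        ErrorPropagationSketch manager = new ErrorPropagationSketch();
        manager.reportError(new RuntimeException("fetcher thread died"));
        if (!manager.errorNotified()) {
            throw new AssertionError("first error was lost");
        }
        System.out.println("ok");
    }
}
```

Because the future is completed at the moment the first error is recorded, no window exists in which an error has been stored but the consumer is never woken.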
[jira] [Updated] (FLINK-19251) Avoid confusing queue handling in "SplitReader.handleSplitsChanges()"
[ https://issues.apache.org/jira/browse/FLINK-19251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-19251: - Fix Version/s: 1.11.3 > Avoid confusing queue handling in "SplitReader.handleSplitsChanges()" > - > > Key: FLINK-19251 > URL: https://issues.apache.org/jira/browse/FLINK-19251 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Major > Labels: pull-request-available > Fix For: 1.12.0, 1.11.3 > > > Currently, the method {{SplitReader.handleSplitsChanges()}} is passed a queue > of split changes to handle. The method may decide to handle only a subset of > them and is passed all remaining changes again later. > In practice, this ends up being confusing and problematic: > - It is important to remove the elements from the queue, not accidentally > iterate, or the splits will get handled multiple times > - If the queue is not left empty, the task to handle the changes is > immediately re-enqueued. No other operation can happen before all split > changes from the queue are handled. > A simpler and more efficient contract would be to simply pass a list of split > changes directly and once, for the fetcher to handle. For all implementations > so far, this was sufficient and easier. -- This message was sent by Atlassian Jira (v8.3.4#803005)
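The contrast between the two contracts discussed in FLINK-19251 can be sketched as follows. These are hypothetical simplified types, not the real SplitReader interface: with the queue-based contract the reader must remove what it handles, while the proposed contract hands over all pending changes as a list, exactly once.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Illustrative sketch of the two contracts discussed in FLINK-19251.
public class SplitChangesSketch {

    // Current contract (simplified): the reader drains a shared queue and must
    // remove each handled element via poll(); merely iterating would leave the
    // elements in place and they would be handled again on re-enqueue.
    public static int handleFromQueue(Queue<String> changes) {
        int handled = 0;
        while (changes.poll() != null) {
            handled++;
        }
        return handled;
    }

    // Proposed contract (simplified): the caller passes all pending changes as
    // a list, exactly once, so the reader has no removal or re-enqueue logic.
    public static int handleFromList(List<String> changes) {
        return changes.size();
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>(List.of("add-split-1", "add-split-2"));
        if (handleFromQueue(queue) != 2 || !queue.isEmpty()) {
            throw new AssertionError("queue contract violated");
        }
        if (handleFromList(List.of("add-split-1", "add-split-2")) != 2) {
            throw new AssertionError("list contract violated");
        }
        System.out.println("ok");
    }
}
```

The list variant also removes the failure mode the ticket calls out: there is no queue left non-empty, so the handling task can never starve other operations by being re-enqueued.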
[jira] [Commented] (FLINK-19251) Avoid confusing queue handling in "SplitReader.handleSplitsChanges()"
[ https://issues.apache.org/jira/browse/FLINK-19251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232413#comment-17232413 ] Stephan Ewen commented on FLINK-19251: -- Fixed in 1.11.3 (release-1.11) via 257a0da75feb0cf791de01b0ec43467cc433b955 > Avoid confusing queue handling in "SplitReader.handleSplitsChanges()" > - > > Key: FLINK-19251 > URL: https://issues.apache.org/jira/browse/FLINK-19251 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Major > Labels: pull-request-available > Fix For: 1.12.0, 1.11.3 > > > Currently, the method {{SplitReader.handleSplitsChanges()}} is passed a queue > of split changes to handle. The method may decide to handle only a subset of > them and is passed all remaining changes again later. > In practice, this ends up being confusing and problematic: > - It is important to remove the elements from the queue, not accidentally > iterate, or the splits will get handled multiple times > - If the queue is not left empty, the task to handle the changes is > immediately re-enqueued. No other operation can happen before all split > changes from the queue are handled. > A simpler and more efficient contract would be to simply pass a list of split > changes directly and once, for the fetcher to handle. For all implementations > so far, this was sufficient and easier. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-19245) Set default queue capacity for FLIP-27 source handover queue to 2
[ https://issues.apache.org/jira/browse/FLINK-19245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232411#comment-17232411 ] Stephan Ewen commented on FLINK-19245: -- Fixed in 1.11.3 (release-1.11) via 406aa9f3f647568410ca0d9d27c475ce84a58ece > Set default queue capacity for FLIP-27 source handover queue to 2 > - > > Key: FLINK-19245 > URL: https://issues.apache.org/jira/browse/FLINK-19245 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Major > Fix For: 1.12.0, 1.11.3 > > > Based on initial investigation by [~jqin] a capacity value of two results in > good queue throughput while still keeping the number of elements small. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19250) SplitFetcherManager does not propagate errors correctly
[ https://issues.apache.org/jira/browse/FLINK-19250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-19250: - Fix Version/s: 1.11.3 > SplitFetcherManager does not propagate errors correctly > --- > > Key: FLINK-19250 > URL: https://issues.apache.org/jira/browse/FLINK-19250 > Project: Flink > Issue Type: Bug > Components: Connectors / Common >Affects Versions: 1.11.2 >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Blocker > Fix For: 1.12.0, 1.11.3 > > > The first exception that is reported does not lead to a notification, only > successive exceptions do. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-18128) CoordinatedSourceITCase.testMultipleSources gets stuck
[ https://issues.apache.org/jira/browse/FLINK-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232409#comment-17232409 ] Stephan Ewen commented on FLINK-18128: -- Fixed in 1.11.3 (release-1.11) via e72e48533902fe6a7271310736584e77b64d05b8 > CoordinatedSourceITCase.testMultipleSources gets stuck > -- > > Key: FLINK-18128 > URL: https://issues.apache.org/jira/browse/FLINK-18128 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Task, Tests >Affects Versions: 1.11.0 >Reporter: Robert Metzger >Assignee: Stephan Ewen >Priority: Critical > Labels: pull-request-available, test-stability > Fix For: 1.12.0, 1.11.3 > > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2705&view=logs&j=fc5181b0-e452-5c8f-68de-1097947f6483&t=6b04ca5f-0b52-511d-19c9-52bf0d9fbdfa > {code} > 2020-06-04T11:19:39.6335702Z [INFO] > 2020-06-04T11:19:39.6337440Z [INFO] > --- > 2020-06-04T11:19:39.6338176Z [INFO] T E S T S > 2020-06-04T11:19:39.6339305Z [INFO] > --- > 2020-06-04T11:19:40.1906157Z [INFO] Running > org.apache.flink.connector.base.source.reader.CoordinatedSourceITCase > 2020-06-04T11:34:51.0599860Z > == > 2020-06-04T11:34:51.0603015Z Maven produced no output for 900 seconds. 
> 2020-06-04T11:34:51.0604174Z > == > 2020-06-04T11:34:51.0613908Z > == > 2020-06-04T11:34:51.0615097Z The following Java processes are running (JPS) > 2020-06-04T11:34:51.0616043Z > == > 2020-06-04T11:34:51.0762007Z Picked up JAVA_TOOL_OPTIONS: > -XX:+HeapDumpOnOutOfMemoryError > 2020-06-04T11:34:51.2775341Z 29635 surefirebooter5307550588991461882.jar > 2020-06-04T11:34:51.2931264Z 2100 Launcher > 2020-06-04T11:34:51.3012583Z 32203 Jps > 2020-06-04T11:34:51.3258038Z Picked up JAVA_TOOL_OPTIONS: > -XX:+HeapDumpOnOutOfMemoryError > 2020-06-04T11:34:51.5443730Z > == > 2020-06-04T11:34:51.5445134Z Printing stack trace of Java process 29635 > 2020-06-04T11:34:51.5445984Z > == > 2020-06-04T11:34:51.5528602Z Picked up JAVA_TOOL_OPTIONS: > -XX:+HeapDumpOnOutOfMemoryError > 2020-06-04T11:34:51.9617670Z 2020-06-04 11:34:51 > 2020-06-04T11:34:51.9619131Z Full thread dump OpenJDK 64-Bit Server VM > (25.242-b08 mixed mode): > 2020-06-04T11:34:51.9619732Z > 2020-06-04T11:34:51.9620618Z "Attach Listener" #299 daemon prio=9 os_prio=0 > tid=0x7f4d60001000 nid=0x7e59 waiting on condition [0x] > 2020-06-04T11:34:51.9621720Z java.lang.Thread.State: RUNNABLE > 2020-06-04T11:34:51.9622190Z > 2020-06-04T11:34:51.9623631Z "flink-akka.actor.default-dispatcher-185" #297 > prio=5 os_prio=0 tid=0x7f4ca0003000 nid=0x7db4 waiting on condition > [0x7f4d10136000] > 2020-06-04T11:34:51.9624972Z java.lang.Thread.State: WAITING (parking) > 2020-06-04T11:34:51.9625716Z at sun.misc.Unsafe.park(Native Method) > 2020-06-04T11:34:51.9627072Z - parking to wait for <0x80c557f0> (a > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool) > 2020-06-04T11:34:51.9628593Z at > akka.dispatch.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075) > 2020-06-04T11:34:51.9629649Z at > akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > 2020-06-04T11:34:51.9630825Z at > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 2020-06-04T11:34:51.9631559Z > 
2020-06-04T11:34:51.9633020Z "flink-akka.actor.default-dispatcher-186" #298 > prio=5 os_prio=0 tid=0x7f4d08006800 nid=0x7db3 waiting on condition > [0x7f4d12974000] > 2020-06-04T11:34:51.9634074Z java.lang.Thread.State: WAITING (parking) > 2020-06-04T11:34:51.9634965Z at sun.misc.Unsafe.park(Native Method) > 2020-06-04T11:34:51.9636384Z - parking to wait for <0x80c557f0> (a > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool) > 2020-06-04T11:34:51.9637683Z at > akka.dispatch.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075) > 2020-06-04T11:34:51.9638795Z at > akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > 2020-06-04T11:34:51.9639845Z at > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > 2020-06-04T11:34:51.9640475Z > 2020-06-04T11:34:51.9642008Z "flink-akka.actor.default-
[jira] [Updated] (FLINK-19245) Set default queue capacity for FLIP-27 source handover queue to 2
[ https://issues.apache.org/jira/browse/FLINK-19245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-19245: - Fix Version/s: 1.11.3 > Set default queue capacity for FLIP-27 source handover queue to 2 > - > > Key: FLINK-19245 > URL: https://issues.apache.org/jira/browse/FLINK-19245 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Major > Fix For: 1.12.0, 1.11.3 > > > Based on initial investigation by [~jqin] a capacity value of two results in > good queue throughput while still keeping the number of elements small. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-19223) Simplify Availability Future Model in Base Connector
[ https://issues.apache.org/jira/browse/FLINK-19223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232410#comment-17232410 ] Stephan Ewen commented on FLINK-19223: -- Fixed in 1.11.3 (release-1.11) via 0e821eaa483b0371ac1df1339f0d3c9ad7376976 > Simplify Availability Future Model in Base Connector > > > Key: FLINK-19223 > URL: https://issues.apache.org/jira/browse/FLINK-19223 > Project: Flink > Issue Type: Improvement > Components: Connectors / Common >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Critical > Fix For: 1.12.0 > > > The current model implemented by the {{FutureNotifier}} and the > {{SourceReaderBase}} has a shortcoming: > - It does not support availability notifications where the notification > comes before the check. IN that case the notification is lost. > - One can see the added complexity created by this model also in the > {{SourceReaderBase#isAvailable()}} where the returned future needs to be > "post-processed" and eagerly completed if the reader is in fact available. > This is based on queue size, which makes it hard to have other conditions. > I think we can do something that is both easier and a bit more efficient by > following a similar model as the > {{org.apache.flink.runtime.io.AvailabilityProvider.AvailabilityHelper}}. > Furthermore, I believe we can win more efficiency by integrating this better > with the {{FutureCompletingBlockingQueue}}. > I suggest to do a similar implementation as the {{AvailabilityHelper}} > directly in the {{FutureCompletingBlockingQueue}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
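The "notification before check" problem described in FLINK-19223 above can be sketched with a small availability-aware queue. The code below is an assumed model of the behavior the ticket describes, not Flink's actual FutureCompletingBlockingQueue or AvailabilityHelper: the availability future is completed under the same lock that adds an element, and a fresh incomplete future is only installed while the queue is provably empty, so an early notification can never be lost between "check" and "wait".

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Illustrative sketch: availability state integrated with the queue itself,
// so that no wake-up is lost even when the notification precedes the check.
public class AvailabilityQueueSketch<E> {
    private final Queue<E> elements = new ArrayDeque<>();
    private CompletableFuture<Void> availability = new CompletableFuture<>();

    // Producer side: add an element and complete the current future.
    // Completing an already-complete future is a harmless no-op.
    public synchronized void put(E element) {
        elements.add(element);
        availability.complete(null);
    }

    // Consumer side: take an element; only when the queue is empty afterwards
    // is a new, incomplete future installed -- under the same lock, so no
    // put() can slip in between the emptiness check and the reset.
    public synchronized E poll() {
        E e = elements.poll();
        if (elements.isEmpty()) {
            availability = new CompletableFuture<>();
        }
        return e;
    }

    public synchronized CompletableFuture<Void> getAvailabilityFuture() {
        return availability;
    }

    public static void main(String[] args) {
        AvailabilityQueueSketch<String> q = new AvailabilityQueueSketch<>();
        CompletableFuture<Void> f = q.getAvailabilityFuture();
        q.put("record"); // notification arrives before anyone re-checks
        if (!f.isDone()) throw new AssertionError("notification was lost");
        if (!"record".equals(q.poll())) throw new AssertionError();
        if (q.getAvailabilityFuture().isDone()) throw new AssertionError("should be unavailable again");
        System.out.println("ok");
    }
}
```

This avoids the post-processing of returned futures that the ticket criticizes: availability is derived directly from queue state inside the queue, rather than patched up in the reader.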