[GitHub] [flink-docker] zhuzhurk merged pull request #39: Update Dockerfiles for 1.10.2 release
zhuzhurk merged pull request #39: URL: https://github.com/apache/flink-docker/pull/39 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-docker] zhuzhurk merged pull request #38: Add GPG key for 1.10.2 release and update test script
zhuzhurk merged pull request #38: URL: https://github.com/apache/flink-docker/pull/38
[jira] [Comment Edited] (FLINK-19016) Checksum mismatch when restore from RocksDB
[ https://issues.apache.org/jira/browse/FLINK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181926#comment-17181926 ] Jiayi Liao edited comment on FLINK-19016 at 8/22/20, 4:30 AM: -- [~sewen] All of your assumptions are right. (BTW the local recovery is not enabled.) I can share my thoughts here but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which will invoke the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). Since Flink sets sync mode in the db's options, the CreateFile and its Append operation may not have synchronized the data to disk yet when uploading sst files to HDFS. (If using fsync mode, I think it should be guaranteed.) And the checksum mismatch problem is caused by a partial record (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/db/log_reader.cc#L176]). That's why I suspect that there's something wrong with the sync mode set by Flink. I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. was (Author: wind_ljy): [~sewen] All of your assumptions are right. (BTW the local recovery is not enabled.) I can share my thoughts here but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which will invoke the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). 
Since Flink sets sync mode in the db's options, the CreateFile and its Append operation may not succeed on disk. (If using fsync, I think it should be guaranteed.) And the checksum mismatch problem is caused by a partial record (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/db/log_reader.cc#L176]). That's why I suspect that there's something wrong with the sync mode set by Flink. I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. > Checksum mismatch when restore from RocksDB > --- > > Key: FLINK-19016 > URL: https://issues.apache.org/jira/browse/FLINK-19016 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > The error stack is shown below: > {code:java} > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for > KeyedMapBundleOperator_44cfc1ca74b40bb44eed1f38f72b3ea9_(71/300) from any of > the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 6 more > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught > unexpected exception. 
> at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:333) > at > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:580) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 8 more > Caused by: java.io.IOException: Error while opening RocksDB instance. > at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74) > at > org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214) > at >
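The partial-record failure mode discussed in the comment above is easy to reproduce outside RocksDB. The following is a minimal Python sketch (a hypothetical length+CRC framing, not RocksDB's actual WAL layout) showing how a record whose tail never reached disk surfaces as a checksum mismatch on restore:

```python
import struct
import zlib

def write_record(buf: bytearray, payload: bytes) -> None:
    # Frame: 4-byte length, 4-byte CRC32 of the payload, then the payload.
    buf += struct.pack("<II", len(payload), zlib.crc32(payload)) + payload

def read_records(data: bytes):
    # Yields (payload, ok). A record cut off mid-write shows up as a
    # short read or a CRC mismatch -- the "checksum mismatch" on restore.
    pos = 0
    while pos + 8 <= len(data):
        length, crc = struct.unpack_from("<II", data, pos)
        payload = data[pos + 8 : pos + 8 + length]
        if len(payload) < length:      # partial record: the tail is gone
            yield payload, False
            return
        yield payload, zlib.crc32(payload) == crc
        pos += 8 + length

log = bytearray()
for rec in (b"key1=value1", b"key2=value2"):
    write_record(log, rec)

# Simulate a crash (or an upload taken before the page cache was flushed):
# the last few bytes of the second record never made it to durable storage.
truncated = bytes(log[:-4])
results = list(read_records(truncated))
print(results[0][1], results[1][1])  # True False
```

The intact first record verifies, while the truncated second one fails its check, mirroring what frocksdb's log_reader reports when it hits a partial record.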
[jira] [Commented] (FLINK-18675) Checkpoint not maintaining minimum pause duration between checkpoints
[ https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182236#comment-17182236 ] Dian Fu commented on FLINK-18675: - This seems to be a duplicate of FLINK-18856 and to have been addressed there. If this is the case, we should close this issue. [~raviratnakar] Could you help check whether this is the case? > Checkpoint not maintaining minimum pause duration between checkpoints > - > > Key: FLINK-18675 > URL: https://issues.apache.org/jira/browse/FLINK-18675 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.11.0 > Environment: !image.png! >Reporter: Ravi Bhushan Ratnakar >Priority: Critical > Fix For: 1.12.0, 1.11.2 > > Attachments: image.png > > > I am running a streaming job with Flink 1.11.0 using kubernetes > infrastructure. I have configured checkpointing as below > Interval - 3 minutes > Minimum pause between checkpoints - 3 minutes > Checkpoint timeout - 10 minutes > Checkpointing Mode - Exactly Once > Number of Concurrent Checkpoints - 1 > > Other configs > Time Characteristics - Processing Time > > I am observing an unusual behaviour. *When a checkpoint completes successfully* > *and its end-to-end duration is almost equal to or greater than the Minimum > pause duration, the next checkpoint gets triggered immediately without > maintaining the Minimum pause duration*. Kindly notice this behaviour from > checkpoint id 194 onward in the attached screenshot -- This message was sent by Atlassian Jira (v8.3.4#803005)
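The scheduling semantics the reporter expects can be sketched in a few lines (a simplified model for illustration, not Flink's CheckpointCoordinator code): the next checkpoint should start no earlier than one interval after the last trigger AND no earlier than the minimum pause after the last checkpoint finished, whichever is later.

```python
def next_trigger_time(last_trigger: float, last_completion: float,
                      interval: float, min_pause: float) -> float:
    # The minimum pause is measured from the *completion* of the previous
    # checkpoint, so a slow checkpoint must push the next trigger back.
    return max(last_trigger + interval, last_completion + min_pause)

# Checkpoint triggered at t=0 s, finished at t=180 s (end-to-end duration
# equal to the 180 s interval/min-pause).  The next checkpoint should wait
# until t=360 s, not fire immediately at t=180 s as the report describes.
print(next_trigger_time(0.0, 180.0, 180.0, 180.0))  # 360.0
```

With a fast checkpoint (say, completion at t=60 s) the same formula degenerates to the plain interval, `max(180, 240) = 240`, which matches the normal steady-state cadence.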
[jira] [Closed] (FLINK-18819) All the PubSub tests are failing
[ https://issues.apache.org/jira/browse/FLINK-18819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-18819. --- Fix Version/s: (was: 1.11.2) (was: 1.12.0) Resolution: Cannot Reproduce Closing this issue for now as it doesn't occur any more. > All the PubSub tests are failing > > > Key: FLINK-18819 > URL: https://issues.apache.org/jira/browse/FLINK-18819 > Project: Flink > Issue Type: Bug > Components: Connectors / Google Cloud PubSub, Tests >Affects Versions: 1.12.0, 1.11.2 >Reporter: Dian Fu >Priority: Critical > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5177&view=logs&j=08866332-78f7-59e4-4f7e-49a56faa3179&t=7f606211-1454-543c-70ab-c7a028a1ce8c > {code} > 2020-08-04T21:18:11.4228635Z 63538 [main] INFO > com.spotify.docker.client.DefaultDockerClient [] - Starting container with > Id: a40e3b941371ddd82f4efe0ec6f72b89f36868316e5d8cc52dacc64f0bc79029 > 2020-08-04T21:18:13.1868394Z [ERROR] Tests run: 3, Failures: 1, Errors: 2, > Skipped: 0, Time elapsed: 64.94 s <<< FAILURE! - in > org.apache.flink.streaming.connectors.gcp.pubsub.EmulatedPubSubSinkTest > 2020-08-04T21:18:13.1869974Z [ERROR] > org.apache.flink.streaming.connectors.gcp.pubsub.EmulatedPubSubSinkTest Time > elapsed: 64.938 s <<< FAILURE! 
> 2020-08-04T21:18:13.1870671Z java.lang.AssertionError: We expect 1 port to be > mapped expected:<1> but was:<0> > 2020-08-04T21:18:13.1871219Z at org.junit.Assert.fail(Assert.java:88) > 2020-08-04T21:18:13.1871693Z at > org.junit.Assert.failNotEquals(Assert.java:834) > 2020-08-04T21:18:13.1872167Z at > org.junit.Assert.assertEquals(Assert.java:645) > 2020-08-04T21:18:13.1872808Z at > org.apache.flink.streaming.connectors.gcp.pubsub.emulator.GCloudEmulatorManager.launchDocker(GCloudEmulatorManager.java:141) > 2020-08-04T21:18:13.1873573Z at > org.apache.flink.streaming.connectors.gcp.pubsub.emulator.GCloudUnitTestBase.launchGCloudEmulator(GCloudUnitTestBase.java:45) > 2020-08-04T21:18:13.1874205Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2020-08-04T21:18:13.1874742Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2020-08-04T21:18:13.1875353Z at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2020-08-04T21:18:13.1875912Z at > java.lang.reflect.Method.invoke(Method.java:498) > 2020-08-04T21:18:13.1876472Z at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > 2020-08-04T21:18:13.1877081Z at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 2020-08-04T21:18:13.1877695Z at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > 2020-08-04T21:18:13.1878283Z at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > 2020-08-04T21:18:13.1878876Z at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > 2020-08-04T21:18:13.1879425Z at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) > 2020-08-04T21:18:13.1880172Z at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > 2020-08-04T21:18:13.1880815Z at > 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > 2020-08-04T21:18:13.1881464Z at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > 2020-08-04T21:18:13.1882083Z at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > 2020-08-04T21:18:13.1882736Z at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > 2020-08-04T21:18:13.1883399Z at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > 2020-08-04T21:18:13.1883999Z at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > 2020-08-04T21:18:13.1884593Z at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > 2020-08-04T21:18:13.1884989Z > 2020-08-04T21:18:13.1885496Z [ERROR] > org.apache.flink.streaming.connectors.gcp.pubsub.EmulatedPubSubSinkTest Time > elapsed: 64.939 s <<< ERROR! > 2020-08-04T21:18:13.1886024Z java.lang.NullPointerException > 2020-08-04T21:18:13.1886588Z at > org.apache.flink.streaming.connectors.gcp.pubsub.EmulatedPubSubSinkTest.tearDown(EmulatedPubSubSinkTest.java:62) > 2020-08-04T21:18:13.1887193Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2020-08-04T21:18:13.1887864Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2020-08-04T21:18:13.1888483Z at >
[jira] [Commented] (FLINK-17274) Maven: Premature end of Content-Length delimited message body
[ https://issues.apache.org/jira/browse/FLINK-17274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182233#comment-17182233 ] Dian Fu commented on FLINK-17274: - https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5781&view=logs&j=f0ac5c25-1168-55a5-07ff-0e88223afed9&t=39a61cac-5c62-532f-d2c1-dea450a66708 > Maven: Premature end of Content-Length delimited message body > - > > Key: FLINK-17274 > URL: https://issues.apache.org/jira/browse/FLINK-17274 > Project: Flink > Issue Type: Bug > Components: Build System / Azure Pipelines >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > Labels: test-stability > Fix For: 1.12.0 > > > CI: > https://dev.azure.com/rmetzger/Flink/_build/results?buildId=7786&view=logs&j=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5&t=54421a62-0c80-5aad-3319-094ff69180bb > {code} > [ERROR] Failed to execute goal on project > flink-connector-elasticsearch7_2.11: Could not resolve dependencies for > project > org.apache.flink:flink-connector-elasticsearch7_2.11:jar:1.11-SNAPSHOT: Could > not transfer artifact org.apache.lucene:lucene-sandbox:jar:8.3.0 from/to > alicloud-mvn-mirror > (http://mavenmirror.alicloud.dak8s.net:/repository/maven-central/): GET > request of: org/apache/lucene/lucene-sandbox/8.3.0/lucene-sandbox-8.3.0.jar > from alicloud-mvn-mirror failed: Premature end of Content-Length delimited > message body (expected: 289920; received: 239832) -> [Help 1] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
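The Maven error above is the generic symptom of an HTTP response body arriving shorter than its declared Content-Length. A minimal sketch of the check behind that message (illustrative only, not Maven Wagon's actual code):

```python
def verify_download(expected_length: int, body: bytes) -> None:
    # An HTTP client counts bytes against the Content-Length header; if
    # the connection closes early, the body is truncated and the client
    # reports a premature end rather than silently keeping a partial jar.
    if len(body) < expected_length:
        raise IOError(
            f"Premature end of Content-Length delimited message body "
            f"(expected: {expected_length}; received: {len(body)})")

# Reproduce the failure with the byte counts from the log above.
try:
    verify_download(289920, b"\x00" * 239832)
except IOError as e:
    print(e)
```

This is also why a simple retry usually clears the failure: the truncated transfer is rejected outright instead of being cached as a corrupt artifact.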
[jira] [Commented] (FLINK-16947) ArtifactResolutionException: Could not transfer artifact. Entry [...] has not been leased from this pool
[ https://issues.apache.org/jira/browse/FLINK-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182232#comment-17182232 ] Dian Fu commented on FLINK-16947: - https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5781&view=logs&j=f450c1a5-64b1-5955-e215-49cb1ad5ec88&t=61b612c5-1c7d-5289-27c2-71f332ae98d7 > ArtifactResolutionException: Could not transfer artifact. Entry [...] has > not been leased from this pool > - > > Key: FLINK-16947 > URL: https://issues.apache.org/jira/browse/FLINK-16947 > Project: Flink > Issue Type: Bug > Components: Build System / Azure Pipelines >Reporter: Piotr Nowojski >Assignee: Robert Metzger >Priority: Blocker > Labels: test-stability > Fix For: 1.12.0 > > > https://dev.azure.com/rmetzger/Flink/_build/results?buildId=6982&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5 > Build of flink-metrics-availability-test failed with: > {noformat} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (end-to-end-tests) > on project flink-metrics-availability-test: Unable to generate classpath: > org.apache.maven.artifact.resolver.ArtifactResolutionException: Could not > transfer artifact org.apache.maven.surefire:surefire-grouper:jar:2.22.1 > from/to google-maven-central > (https://maven-central-eu.storage-download.googleapis.com/maven2/): Entry > [id:13][route:{s}->https://maven-central-eu.storage-download.googleapis.com:443][state:null] > has not been leased from this pool > [ERROR] org.apache.maven.surefire:surefire-grouper:jar:2.22.1 > [ERROR] > [ERROR] from the specified remote repositories: > [ERROR] google-maven-central > (https://maven-central-eu.storage-download.googleapis.com/maven2/, > releases=true, snapshots=false), > [ERROR] apache.snapshots (https://repository.apache.org/snapshots, > releases=false, snapshots=true) > [ERROR] Path to dependency: > [ERROR] 1) dummy:dummy:jar:1.0 > [ERROR] 2) 
org.apache.maven.surefire:surefire-junit47:jar:2.22.1 > [ERROR] 3) org.apache.maven.surefire:common-junit48:jar:2.22.1 > [ERROR] 4) org.apache.maven.surefire:surefire-grouper:jar:2.22.1 > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :flink-metrics-availability-test > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-18634) FlinkKafkaProducerITCase.testRecoverCommittedTransaction failed with "Timeout expired after 60000milliseconds while awaiting InitProducerId"
[ https://issues.apache.org/jira/browse/FLINK-18634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182231#comment-17182231 ] Dian Fu commented on FLINK-18634: - https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5780&view=logs&j=1fc6e7bf-633c-5081-c32a-9dea24b05730&t=80a658d1-f7f6-5d93-2758-53ac19fd5b19 > FlinkKafkaProducerITCase.testRecoverCommittedTransaction failed with "Timeout > expired after 60000milliseconds while awaiting InitProducerId" > > > Key: FLINK-18634 > URL: https://issues.apache.org/jira/browse/FLINK-18634 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Tests >Affects Versions: 1.11.0 >Reporter: Dian Fu >Priority: Major > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=4590&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=684b1416-4c17-504e-d5ab-97ee44e08a20 > {code} > 2020-07-17T11:43:47.9693015Z [ERROR] Tests run: 12, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 269.399 s <<< FAILURE! - in > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase > 2020-07-17T11:43:47.9693862Z [ERROR] > testRecoverCommittedTransaction(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase) > Time elapsed: 60.679 s <<< ERROR! > 2020-07-17T11:43:47.9694737Z org.apache.kafka.common.errors.TimeoutException: > org.apache.kafka.common.errors.TimeoutException: Timeout expired after > 60000milliseconds while awaiting InitProducerId > 2020-07-17T11:43:47.9695376Z Caused by: > org.apache.kafka.common.errors.TimeoutException: Timeout expired after > 60000milliseconds while awaiting InitProducerId > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-18634) FlinkKafkaProducerITCase.testRecoverCommittedTransaction failed with "Timeout expired after 60000milliseconds while awaiting InitProducerId"
[ https://issues.apache.org/jira/browse/FLINK-18634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182231#comment-17182231 ] Dian Fu edited comment on FLINK-18634 at 8/22/20, 2:46 AM: --- Instance on master: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5780&view=logs&j=1fc6e7bf-633c-5081-c32a-9dea24b05730&t=80a658d1-f7f6-5d93-2758-53ac19fd5b19 was (Author: dian.fu): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5780&view=logs&j=1fc6e7bf-633c-5081-c32a-9dea24b05730&t=80a658d1-f7f6-5d93-2758-53ac19fd5b19 > FlinkKafkaProducerITCase.testRecoverCommittedTransaction failed with "Timeout > expired after 60000milliseconds while awaiting InitProducerId" > > > Key: FLINK-18634 > URL: https://issues.apache.org/jira/browse/FLINK-18634 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Tests >Affects Versions: 1.11.0 >Reporter: Dian Fu >Priority: Major > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=4590&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=684b1416-4c17-504e-d5ab-97ee44e08a20 > {code} > 2020-07-17T11:43:47.9693015Z [ERROR] Tests run: 12, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 269.399 s <<< FAILURE! - in > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase > 2020-07-17T11:43:47.9693862Z [ERROR] > testRecoverCommittedTransaction(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase) > Time elapsed: 60.679 s <<< ERROR! > 2020-07-17T11:43:47.9694737Z org.apache.kafka.common.errors.TimeoutException: > org.apache.kafka.common.errors.TimeoutException: Timeout expired after > 60000milliseconds while awaiting InitProducerId > 2020-07-17T11:43:47.9695376Z Caused by: > org.apache.kafka.common.errors.TimeoutException: Timeout expired after > 60000milliseconds while awaiting InitProducerId > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-19012) E2E test fails with "Cannot register Closeable, this subtaskCheckpointCoordinator is already closed. Closing argument."
[ https://issues.apache.org/jira/browse/FLINK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182230#comment-17182230 ] Dian Fu commented on FLINK-19012: - Another instance on master: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5780&view=logs&j=739e6eac-8312-5d31-d437-294c4d26fced&t=a68b8d89-50e9-5977-4500-f4fde4f57f9b > E2E test fails with "Cannot register Closeable, this > subtaskCheckpointCoordinator is already closed. Closing argument." > --- > > Key: FLINK-19012 > URL: https://issues.apache.org/jira/browse/FLINK-19012 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Task, Tests >Affects Versions: 1.12.0 >Reporter: Robert Metzger >Priority: Major > Labels: test-stability > > Note: This error occurred in a custom branch with unreviewed changes. I don't > believe my changes affect this error, but I would keep this in mind when > investigating the error: > https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8307&view=logs&j=1f3ed471-1849-5d3c-a34c-19792af4ad16&t=0d2e35fc-a330-5cf2-a012-7267e2667b1d > > {code} > 2020-08-20T20:55:30.2400645Z 2020-08-20 20:55:22,373 INFO > org.apache.flink.runtime.taskmanager.Task[] - Registering > task at network: Source: Sequence Source -> Flat Map -> Sink: Unnamed (1/1) > (cbc357ccb763df2852fee8c4fc7d55f2_0_0) [DEPLOYING]. > 2020-08-20T20:55:30.2402392Z 2020-08-20 20:55:22,401 INFO > org.apache.flink.streaming.runtime.tasks.StreamTask [] - No state > backend has been configured, using default (Memory / JobManager) > MemoryStateBackend (data in heap memory / checkpoints to JobManager) > (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: > 5242880) > 2020-08-20T20:55:30.2404297Z 2020-08-20 20:55:22,413 INFO > org.apache.flink.runtime.taskmanager.Task[] - Source: > Sequence Source -> Flat Map -> Sink: Unnamed (1/1) > (cbc357ccb763df2852fee8c4fc7d55f2_0_0) switched from DEPLOYING to RUNNING. 
> 2020-08-20T20:55:30.2405805Z 2020-08-20 20:55:22,786 INFO > org.apache.flink.streaming.connectors.elasticsearch6.Elasticsearch6ApiCallBridge > [] - Pinging Elasticsearch cluster via hosts [http://127.0.0.1:9200] ... > 2020-08-20T20:55:30.2407027Z 2020-08-20 20:55:22,848 INFO > org.apache.flink.streaming.connectors.elasticsearch6.Elasticsearch6ApiCallBridge > [] - Elasticsearch RestHighLevelClient is connected to > [http://127.0.0.1:9200] > 2020-08-20T20:55:30.2409277Z 2020-08-20 20:55:29,205 INFO > org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestExecutorImpl > [] - Source: Sequence Source -> Flat Map -> Sink: Unnamed (1/1) discarding 0 > drained requests > 2020-08-20T20:55:30.2410690Z 2020-08-20 20:55:29,218 INFO > org.apache.flink.runtime.taskmanager.Task[] - Source: > Sequence Source -> Flat Map -> Sink: Unnamed (1/1) > (cbc357ccb763df2852fee8c4fc7d55f2_0_0) switched from RUNNING to FINISHED. > 2020-08-20T20:55:30.2412187Z 2020-08-20 20:55:29,218 INFO > org.apache.flink.runtime.taskmanager.Task[] - Freeing > task resources for Source: Sequence Source -> Flat Map -> Sink: Unnamed (1/1) > (cbc357ccb763df2852fee8c4fc7d55f2_0_0). > 2020-08-20T20:55:30.2414203Z 2020-08-20 20:55:29,224 INFO > org.apache.flink.runtime.taskexecutor.TaskExecutor [] - > Un-registering task and sending final execution state FINISHED to JobManager > for task Source: Sequence Source -> Flat Map -> Sink: Unnamed (1/1) > cbc357ccb763df2852fee8c4fc7d55f2_0_0. > 2020-08-20T20:55:30.2415602Z 2020-08-20 20:55:29,219 INFO > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - Source: > Sequence Source -> Flat Map -> Sink: Unnamed (1/1) - asynchronous part of > checkpoint 1 could not be completed. > 2020-08-20T20:55:30.2416411Z java.io.UncheckedIOException: > java.io.IOException: Cannot register Closeable, this > subtaskCheckpointCoordinator is already closed. Closing argument. 
> 2020-08-20T20:55:30.2418956Z at > org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.lambda$registerConsumer$2(SubtaskCheckpointCoordinatorImpl.java:468) > ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > 2020-08-20T20:55:30.2420100Z at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:91) > [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > 2020-08-20T20:55:30.2420927Z at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_265] > 2020-08-20T20:55:30.2421455Z at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_265] >
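The exception in this report comes from a close/register race: the asynchronous part of the checkpoint tries to register a resource with a coordinator that a faster task shutdown has already closed. A simplified sketch of that guard pattern (illustrative only, not Flink's actual SubtaskCheckpointCoordinatorImpl):

```python
import io
import threading

class CloseableRegistry:
    # Sketch of the guard behind "Cannot register Closeable, this
    # subtaskCheckpointCoordinator is already closed. Closing argument."
    def __init__(self):
        self._lock = threading.Lock()
        self._closed = False
        self._resources = []

    def register(self, resource):
        with self._lock:
            if self._closed:
                # Too late: close the argument itself so it doesn't leak,
                # then reject the registration ("Closing argument.").
                resource.close()
                raise IOError("Cannot register Closeable: already closed. "
                              "Closing argument.")
            self._resources.append(resource)

    def close(self):
        with self._lock:
            self._closed = True
            for r in self._resources:
                r.close()
            self._resources.clear()

registry = CloseableRegistry()
registry.close()  # the task finished and freed its resources first
try:
    registry.register(io.BytesIO())  # async checkpoint thread arrives late
except IOError as e:
    print(e)
```

Closing the rejected argument instead of leaking it is the point of the pattern: the late-arriving thread's resource is still cleaned up even though the registration fails.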
[jira] [Updated] (FLINK-18855) Translate the "Cluster Execution" page of "Application Development's DataSet API" into Chinese
[ https://issues.apache.org/jira/browse/FLINK-18855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-18855: Fix Version/s: 1.12.0 > Translate the "Cluster Execution" page of "Application Development's DataSet > API" into Chinese > -- > > Key: FLINK-18855 > URL: https://issues.apache.org/jira/browse/FLINK-18855 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.10.1, 1.11.0, 1.11.1 >Reporter: Roc Marshal >Assignee: Roc Marshal >Priority: Major > Labels: documentation, pull-request-available > Fix For: 1.12.0 > > > The markdown file is located in *flink/docs/dev/cluster_execution.zh.md* > The page url is > https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/dev/cluster_execution.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-18855) Translate the "Cluster Execution" page of "Application Development's DataSet API" into Chinese
[ https://issues.apache.org/jira/browse/FLINK-18855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-18855. --- Resolution: Fixed master: 5500435372cdf258e4f0a646089c85389e25e94d > Translate the "Cluster Execution" page of "Application Development's DataSet > API" into Chinese > -- > > Key: FLINK-18855 > URL: https://issues.apache.org/jira/browse/FLINK-18855 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.10.1, 1.11.0, 1.11.1 >Reporter: Roc Marshal >Assignee: Roc Marshal >Priority: Major > Labels: documentation, pull-request-available > > The markdown file is located in *flink/docs/dev/cluster_execution.zh.md* > The page url is > https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/dev/cluster_execution.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] dianfu closed pull request #13173: [FLINK-18855][docs-zh] Translate the "Cluster Execution" page of "Application Development's DataSet API" into Chinese
dianfu closed pull request #13173: URL: https://github.com/apache/flink/pull/13173
[GitHub] [flink] dianfu commented on a change in pull request #13173: [FLINK-18855][docs-zh] Translate the "Cluster Execution" page of "Application Development's DataSet API" into Chinese
dianfu commented on a change in pull request #13173: URL: https://github.com/apache/flink/pull/13173#discussion_r475030431 ## File path: docs/dev/cluster_execution.zh.md ## @@ -25,27 +25,27 @@ under the License. * This will be replaced by the TOC {:toc} -Flink programs can run distributed on clusters of many machines. There -are two ways to send a program to a cluster for execution: +Flink 程序可以分布式运行在多机器集群上。有两种方式可以将程序提交到集群上执行: -## Command Line Interface + -The command line interface lets you submit packaged programs (JARs) to a cluster -(or single machine setup). +## 命令行界面(Interface) -Please refer to the [Command Line Interface]({{ site.baseurl }}/ops/cli.html) documentation for -details. +命令行界面使你可以将打包的程序(JARs)提交到集群(或单机设置)。 -## Remote Environment +有关详细信息,请参阅[命令行界面]({% link ops/cli.zh.md %})文档。 -The remote environment lets you execute Flink Java programs on a cluster -directly. The remote environment points to the cluster on which you want to -execute the program. + + +## 远程环境(Remote Environment) + +远程环境使你可以直接在集群上执行 Flink Java 程序。远程环境指向你要执行程序的群集。 Review comment: 群集 -> 集群 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] RocMarshal commented on pull request #13172: [FLINK-18854][docs-zh] Translate the 'API Migration Guides' page of 'Application Development' into Chinese
RocMarshal commented on pull request #13172: URL: https://github.com/apache/flink/pull/13172#issuecomment-678572640 @Thesharing Could you help me review this PR when you have free time? Thank you.
[GitHub] [flink] RocMarshal commented on pull request #13173: [FLINK-18855][docs-zh] Translate the "Cluster Execution" page of "Application Development's DataSet API" into Chinese
RocMarshal commented on pull request #13173: URL: https://github.com/apache/flink/pull/13173#issuecomment-678571817 @dianfu Could you help me merge this PR if there is nothing inappropriate? Thank you.
[jira] [Closed] (FLINK-18813) Translate the 'Local Installation' page of 'Try Flink' into Chinese
[ https://issues.apache.org/jira/browse/FLINK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-18813. --- Fix Version/s: 1.12.0 Resolution: Fixed master: c9e46694fcf0e84c7b5d9428a31a77683ec0170d > Translate the 'Local Installation' page of 'Try Flink' into Chinese > --- > > Key: FLINK-18813 > URL: https://issues.apache.org/jira/browse/FLINK-18813 > Project: Flink > Issue Type: Task > Components: Documentation >Affects Versions: 1.10.1, 1.11.0, 1.11.1 >Reporter: Roc Marshal >Assignee: Roc Marshal >Priority: Major > Labels: documentaion, pull-request-available > Fix For: 1.12.0 > > > The page url is > [https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/try-flink/local_installation.html] > The markdown file is located in > *flink/docs/try-flink/local_installation.zh.md* -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] dianfu closed pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese
dianfu closed pull request #13089: URL: https://github.com/apache/flink/pull/13089
[GitHub] [flink] dianfu commented on pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese
dianfu commented on pull request #13089: URL: https://github.com/apache/flink/pull/13089#issuecomment-678570567 Merging...
[GitHub] [flink] RocMarshal commented on a change in pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese
RocMarshal commented on a change in pull request #13089: URL: https://github.com/apache/flink/pull/13089#discussion_r475024133 ## File path: docs/try-flink/local_installation.zh.md ## @@ -26,36 +26,35 @@ under the License. {% if site.version contains "SNAPSHOT" %} - NOTE: The Apache Flink community only publishes official builds for - released versions of Apache Flink. + 注意:Apache Flink 社区只发布 Apache Flink 的 release 版本。 - Since you are currently looking at the latest SNAPSHOT - version of the documentation, all version references below will not work. - Please switch the documentation to the latest released version via the release picker which you - find on the left side below the menu. + 由于你当前正在查看的是文档最新的 SNAPSHOT 版本,因此相关内容会被隐藏。请通过左侧菜单底部的版本选择将文档切换到最新的 release 版本。 {% else %} -Follow these few steps to download the latest stable versions and get started. +请按照以下几个步骤下载最新的稳定版本开始使用。 -## Step 1: Download + Review comment: @dianfu See item `4. 标题锚点链接` (heading anchor links) on this page: `https://cwiki.apache.org/confluence/display/FLINK/Flink+Translation+Specifications`
[GitHub] [flink] dianfu commented on a change in pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese
dianfu commented on a change in pull request #13089: URL: https://github.com/apache/flink/pull/13089#discussion_r474762668 ## File path: docs/try-flink/local_installation.zh.md ## @@ -26,36 +26,35 @@ under the License. {% if site.version contains "SNAPSHOT" %} - NOTE: The Apache Flink community only publishes official builds for - released versions of Apache Flink. + 注意:Apache Flink 社区只发布 Apache Flink 的 release 版本。 - Since you are currently looking at the latest SNAPSHOT - version of the documentation, all version references below will not work. - Please switch the documentation to the latest released version via the release picker which you - find on the left side below the menu. + 由于你当前正在查看的是文档最新的 SNAPSHOT 版本,因此相关内容会被隐藏。请通过左侧菜单底部的版本选择将文档切换到最新的 release 版本。 {% else %} -Follow these few steps to download the latest stable versions and get started. +请按照以下几个步骤下载最新的稳定版本开始使用。 -## Step 1: Download + Review comment: Just wondering why we need this line?
[GitHub] [flink] flinkbot edited a comment on pull request #13175: [FLINK-18955][Checkpointing] Add checkpoint path to job startup/restore message
flinkbot edited a comment on pull request #13175: URL: https://github.com/apache/flink/pull/13175#issuecomment-674839926 ## CI report: * 78f6ef8e0aa36ddc08ecd4a6984ec465c0192ec7 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5779) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink-statefun] sjwiesman commented on pull request #132: [FLINK-18979][docs] Update docs to better emphasize remote functions
sjwiesman commented on pull request #132: URL: https://github.com/apache/flink-statefun/pull/132#issuecomment-678432084 Thanks for the review @morsapaes. I've addressed your feedback and also updated the READMEs in a separate commit. I'm going on vacation next week, so I went ahead and squashed my fixup commits.
[GitHub] [flink-statefun] sjwiesman commented on a change in pull request #132: [FLINK-18979][docs] Update docs to better emphasize remote functions
sjwiesman commented on a change in pull request #132: URL: https://github.com/apache/flink-statefun/pull/132#discussion_r474857047 ## File path: docs/index.md ## @@ -23,16 +23,19 @@ specific language governing permissions and limitations under the License. --> -**Stateful Functions** is an open source framework that reduces the complexity of building and orchestrating distributed stateful applications at scale. -It brings together the benefits of stream processing with Apache Flink® and Function-as-a-Service (FaaS) to provide a powerful abstraction for the next generation of event-driven architectures. +Stateful Functions is an API that simplifies the building of **distributed stateful applications** with a **runtime built for serverless architectures**. +It brings together the benefits of stateful stream processing, the processing of large datasets with low latency and bounded resource constraints, Review comment: Hmm, yeah I'll try something
[GitHub] [flink] morsapaes commented on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API
morsapaes commented on pull request #13203: URL: https://github.com/apache/flink/pull/13203#issuecomment-678426366
[GitHub] [flink] sjwiesman commented on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API
sjwiesman commented on pull request #13203: URL: https://github.com/apache/flink/pull/13203#issuecomment-678424666 I'm going to be on vacation for the next week, maybe @morsapaes can take a look at this?
[GitHub] [flink] flinkbot edited a comment on pull request #13175: [FLINK-18955][Checkpointing] Add checkpoint path to job startup/restore message
flinkbot edited a comment on pull request #13175: URL: https://github.com/apache/flink/pull/13175#issuecomment-674839926 ## CI report: * 635f839466122e36674a38a9845c2d6b5eb5c244 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5735) * 78f6ef8e0aa36ddc08ecd4a6984ec465c0192ec7 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5779)
[GitHub] [flink] flinkbot edited a comment on pull request #13217: [FLINK-16866] Make job submission non-blocking
flinkbot edited a comment on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678285884 ## CI report: * 85c9bbac2d9ca07435fbd80d76208fa2a51e5d37 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5776)
[GitHub] [flink] flinkbot edited a comment on pull request #13175: [FLINK-18955][Checkpointing] Add checkpoint path to job startup/restore message
flinkbot edited a comment on pull request #13175: URL: https://github.com/apache/flink/pull/13175#issuecomment-674839926 ## CI report: * 635f839466122e36674a38a9845c2d6b5eb5c244 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5735) * 78f6ef8e0aa36ddc08ecd4a6984ec465c0192ec7 UNKNOWN
[GitHub] [flink] curcur commented on pull request #13175: [FLINK-18955][Checkpointing] Add checkpoint path to job startup/restore message
curcur commented on pull request #13175: URL: https://github.com/apache/flink/pull/13175#issuecomment-678391416 With the updated version: ``` 9213 [flink-akka.actor.default-dispatcher-3] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Restoring job 49eabb33b8dfc353a4ca205f5f02b118 from Checkpoint 1 @ 1598017016795 for 49eabb33b8dfc353a4ca205f5f02b118 located at file:/var/folders/dm/5xn_h6n9135dwy4j27sr65zhgp/T/junit3638159208308156331/junit7166739105409262417/checkpoints/49eabb33b8dfc353a4ca205f5f02b118/chk-1. ```
[GitHub] [flink] curcur commented on a change in pull request #13175: [FLINK-18955][Checkpointing] Add checkpoint path to job startup/restore message
curcur commented on a change in pull request #13175: URL: https://github.com/apache/flink/pull/13175#discussion_r474816716 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CompletedCheckpoint.java ## @@ -320,6 +320,12 @@ void setDiscardCallback(@Nullable CompletedCheckpointStats.DiscardCallback disca @Override public String toString() { - return String.format("Checkpoint %d @ %d for %s", checkpointID, timestamp, job); + return String.format( + "%s %d @ %d for %s located at %s", + props.getCheckpointType(), Review comment: updated :-) I can double-check whether SYNC_SAVEPOINT is user-aware or not.
[GitHub] [flink] curcur commented on a change in pull request #13175: [FLINK-18955][Checkpointing] Add checkpoint path to job startup/restore message
curcur commented on a change in pull request #13175: URL: https://github.com/apache/flink/pull/13175#discussion_r474816716 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CompletedCheckpoint.java ## @@ -320,6 +320,12 @@ void setDiscardCallback(@Nullable CompletedCheckpointStats.DiscardCallback disca @Override public String toString() { - return String.format("Checkpoint %d @ %d for %s", checkpointID, timestamp, job); + return String.format( + "%s %d @ %d for %s located at %s", + props.getCheckpointType(), Review comment: updated :-) Do you mean SYNC_SAVEPOINT is not user-aware? I do not know...
[jira] [Closed] (FLINK-18993) Invoke sanityCheckTotalFlinkMemory method incorrectly in JobManagerFlinkMemoryUtils.java
[ https://issues.apache.org/jira/browse/FLINK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-18993. - Resolution: Fixed Fixed via master: df525b77d29ccd89649a64e5faad96c93f61ca08 1.11.2: 5e9bd610f3f1cb62e00c7baf1f02409c2d017c05 > Invoke sanityCheckTotalFlinkMemory method incorrectly in > JobManagerFlinkMemoryUtils.java > > > Key: FLINK-18993 > URL: https://issues.apache.org/jira/browse/FLINK-18993 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration >Affects Versions: 1.11.0 >Reporter: Peng >Assignee: Peng >Priority: Critical > Labels: pull-request-available > Fix For: 1.12.0, 1.11.2 > > > In deriveFromRequiredFineGrainedOptions method from > JobManagerFlinkMemoryUtils.java, it will call sanityCheckTotalFlinkMemory > method to check heap, off-heap and total memory as below. > {code:java} > sanityCheckTotalFlinkMemory(totalFlinkMemorySize, jvmHeapMemorySize, > totalFlinkMemorySize); > {code} > As I understand it, the third argument should be > {color:#de350b}*offHeapMemorySize.*{color} > {code:java} > sanityCheckTotalFlinkMemory(totalFlinkMemorySize, jvmHeapMemorySize, > offHeapMemorySize); > {code} >
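The bug in FLINK-18993 can be seen with a minimal sketch. The method below is a simplified, hypothetical stand-in for Flink's actual `sanityCheckTotalFlinkMemory` (the real method takes memory-size objects and throws on mismatch); it only illustrates why passing `totalFlinkMemorySize` as the third argument makes the check meaningless:

```java
// Illustrative sketch only: a sanity check requiring that total Flink
// memory equals JVM heap plus off-heap memory. Names and the boolean
// return type are assumptions for this example, not Flink's real API.
public class MemorySanityCheck {
    public static boolean sanityCheckTotalFlinkMemory(long total, long heap, long offHeap) {
        return total == heap + offHeap;
    }

    public static void main(String[] args) {
        long heap = 600, offHeap = 424, total = 1024;
        // Correct invocation: passes because the components add up.
        System.out.println(sanityCheckTotalFlinkMemory(total, heap, offHeap)); // true
        // Buggy invocation from the report: the third argument is the total
        // itself, so the check compares total against heap + total and fails
        // whenever heap > 0.
        System.out.println(sanityCheckTotalFlinkMemory(total, heap, total)); // false
    }
}
```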
[GitHub] [flink] tillrohrmann closed pull request #13198: [FLINK-18993][Runtime]Invoke sanityCheckTotalFlinkMemory method incorrectly in JobManagerFlinkMemoryUtils.java
tillrohrmann closed pull request #13198: URL: https://github.com/apache/flink/pull/13198
[jira] [Assigned] (FLINK-18993) Invoke sanityCheckTotalFlinkMemory method incorrectly in JobManagerFlinkMemoryUtils.java
[ https://issues.apache.org/jira/browse/FLINK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-18993: - Assignee: Peng > Invoke sanityCheckTotalFlinkMemory method incorrectly in > JobManagerFlinkMemoryUtils.java > > > Key: FLINK-18993 > URL: https://issues.apache.org/jira/browse/FLINK-18993 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration >Affects Versions: 1.11.0 >Reporter: Peng >Assignee: Peng >Priority: Critical > Labels: pull-request-available > Fix For: 1.12.0, 1.11.2 > > > In deriveFromRequiredFineGrainedOptions method from > JobManagerFlinkMemoryUtils.java, it will call sanityCheckTotalFlinkMemory > method to check heap, off-heap and total memory as below. > {code:java} > sanityCheckTotalFlinkMemory(totalFlinkMemorySize, jvmHeapMemorySize, > totalFlinkMemorySize); > {code} > As I understand it, the third argument should be > {color:#de350b}*offHeapMemorySize.*{color} > {code:java} > sanityCheckTotalFlinkMemory(totalFlinkMemorySize, jvmHeapMemorySize, > offHeapMemorySize); > {code} >
[jira] [Updated] (FLINK-18993) Invoke sanityCheckTotalFlinkMemory method incorrectly in JobManagerFlinkMemoryUtils.java
[ https://issues.apache.org/jira/browse/FLINK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-18993: -- Fix Version/s: 1.11.2 1.12.0 > Invoke sanityCheckTotalFlinkMemory method incorrectly in > JobManagerFlinkMemoryUtils.java > > > Key: FLINK-18993 > URL: https://issues.apache.org/jira/browse/FLINK-18993 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration >Affects Versions: 1.11.0 >Reporter: Peng >Priority: Critical > Labels: pull-request-available > Fix For: 1.12.0, 1.11.2 > > > In deriveFromRequiredFineGrainedOptions method from > JobManagerFlinkMemoryUtils.java, it will call sanityCheckTotalFlinkMemory > method to check heap, off-heap and total memory as below. > {code:java} > sanityCheckTotalFlinkMemory(totalFlinkMemorySize, jvmHeapMemorySize, > totalFlinkMemorySize); > {code} > As I understand it, the third argument should be > {color:#de350b}*offHeapMemorySize.*{color} > {code:java} > sanityCheckTotalFlinkMemory(totalFlinkMemorySize, jvmHeapMemorySize, > offHeapMemorySize); > {code} >
[jira] [Commented] (FLINK-19011) Parallelize the restore operation in OperatorStateBackend
[ https://issues.apache.org/jira/browse/FLINK-19011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181981#comment-17181981 ] Jiayi Liao commented on FLINK-19011: [~sewen] Mmm... probably you're right, because we have also found other problems caused by union state, like full GC on the JobManager during failover. But I think union state is still useful in some cases. For example, we store the watermark as a value state so that the job can recover with a correct watermark. I'd vote for dropping it if we can find a better replacement for union state. > Parallelize the restore operation in OperatorStateBackend > -- > > Key: FLINK-19011 > URL: https://issues.apache.org/jira/browse/FLINK-19011 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > To restore the states, union state needs to read state handles produced by > all operators. And currently during the restore operation, Flink iterates the > state handles one by one, which could last tens of minutes if the magnitude > of state handles exceeds ten thousand. > To accelerate the process, I propose to parallelize the random reads on HDFS > and deserialization. We can create a runnable for each state handle and let > it return the metadata and deserialized data, which can be aggregated in main > thread.
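The proposal in FLINK-19011 — one task per state handle, reading and deserializing in parallel, with the results aggregated on the main thread — can be sketched roughly as below. This is an illustrative standalone sketch, not Flink's restore code: `readHandle` stands in for the HDFS read plus deserialization, and integers stand in for state handles and their deserialized data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of parallelized state-handle restore: submit one task per handle,
// then collect results on the calling thread, preserving handle order.
public class ParallelRestoreSketch {
    // Stand-in for reading a state handle from HDFS and deserializing it.
    public static Integer readHandle(int handleId) {
        return handleId * 2;
    }

    public static List<Integer> restoreAll(List<Integer> handleIds, int parallelism)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int id : handleIds) {
                futures.add(pool.submit(() -> readHandle(id)));
            }
            // Aggregate on the main thread, as the issue proposes.
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(restoreAll(List.of(1, 2, 3), 2)); // [2, 4, 6]
    }
}
```

The slow random reads overlap across threads while the single-threaded aggregation keeps the restore logic deterministic.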
[jira] [Comment Edited] (FLINK-19016) Checksum mismatch when restore from RocksDB
[ https://issues.apache.org/jira/browse/FLINK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181926#comment-17181926 ] Jiayi Liao edited comment on FLINK-19016 at 8/21/20, 4:26 PM: -- [~sewen] All of your assumptions are right. (BTW, local recovery is not enabled.) I can share my thoughts here, but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which in turn invokes the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). Since Flink sets sync mode in the db's options, the CreateFile and its Append operation may not have synced the data to disk yet. (If using fsync, I think it should be guaranteed.) And the checksum mismatch problem is caused by a partial record (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/db/log_reader.cc#L176]). That's why I suspect that there's something wrong with the sync mode set by Flink. I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. was (Author: wind_ljy): [~sewen] All of your assumptions are right. (BTW, local recovery is not enabled.) I can share my thoughts here, but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which in turn invokes the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). Since Flink sets sync mode in the db's options, the CreateFile and its Append operation may not have synced the data to disk yet. 
(If using fsync, I think it should be guaranteed.) I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. > Checksum mismatch when restore from RocksDB > --- > > Key: FLINK-19016 > URL: https://issues.apache.org/jira/browse/FLINK-19016 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > The error stack is shown below: > {code:java} > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for > KeyedMapBundleOperator_44cfc1ca74b40bb44eed1f38f72b3ea9_(71/300) from any of > the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 6 more > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught > unexpected exception. > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:333) > at > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:580) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 8 more > Caused by: java.io.IOException: Error while opening RocksDB instance. 
> at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74) > at > org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162) > at >
[jira] [Comment Edited] (FLINK-19016) Checksum mismatch when restore from RocksDB
[ https://issues.apache.org/jira/browse/FLINK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181926#comment-17181926 ] Jiayi Liao edited comment on FLINK-19016 at 8/21/20, 4:23 PM: -- [~sewen] All of your assumptions are right. (BTW, local recovery is not enabled.) I can share my thoughts here, but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which in turn invokes the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). Since Flink sets sync mode in the db's options, the CreateFile and its Append operation may not have synced the data to disk yet. (If using fsync, I think it should be guaranteed.) I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. was (Author: wind_ljy): [~sewen] All of your assumptions are right. (BTW, local recovery is not enabled.) I can share my thoughts here, but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which in turn invokes the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). Since Flink uses the sync operation, the CreateFile and its Append operation may not have synced the data to disk yet. (If using fsync, I think it should be guaranteed.) I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. 
> Checksum mismatch when restore from RocksDB > --- > > Key: FLINK-19016 > URL: https://issues.apache.org/jira/browse/FLINK-19016 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > The error stack is shown below: > {code:java} > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for > KeyedMapBundleOperator_44cfc1ca74b40bb44eed1f38f72b3ea9_(71/300) from any of > the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 6 more > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught > unexpected exception. > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:333) > at > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:580) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 8 more > Caused by: java.io.IOException: Error while opening RocksDB instance. 
> at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74) > at > org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148) > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:277) >
[jira] [Created] (FLINK-19024) Remove unused "releaseMemory" from ResultSubpartition
Stephan Ewen created FLINK-19024: Summary: Remove unused "releaseMemory" from ResultSubpartition Key: FLINK-19024 URL: https://issues.apache.org/jira/browse/FLINK-19024 Project: Flink Issue Type: Sub-task Components: Runtime / Network Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.12.0 The {{releaseMemory()}} call in the {{ResultSubpartition}} is currently not meaningful for any existing implementation. Future versions where memory may have to be released will quite possibly not implement that on a "subpartition" level. For example, a sort based shuffle has the buffers on a partition-level, rather than a subpartition level. We should thus remove the {{releaseMemory()}} call from the abstract subpartition interface. Concrete implementations can still release memory on a subpartition level, if needed in the future. -- This message was sent by Atlassian Jira (v8.3.4#803005)
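The proposed interface change can be sketched like this (hypothetical names, not Flink's actual classes): the abstract subpartition drops {{releaseMemory()}}, while a partition that owns the buffers, e.g. for a sort-based shuffle, can still free them at partition level.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the subpartition abstraction stays minimal...
abstract class SubpartitionSketch {
    // note: no releaseMemory() on the abstract interface
    abstract void addRecord(byte[] record);
}

// ...while memory release lives on the partition that owns the buffers.
class SortBasedPartitionSketch {
    private final List<byte[]> buffers = new ArrayList<>();

    void addBuffer(byte[] buffer) {
        buffers.add(buffer);
    }

    // Frees all buffers for the whole partition at once and reports how
    // many bytes were released.
    int releaseMemory() {
        int released = buffers.stream().mapToInt(b -> b.length).sum();
        buffers.clear();
        return released;
    }
}
```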
[GitHub] [flink] flinkbot edited a comment on pull request #13206: [FLINK-18948][python] Add end to end test for Python DataStream API.
flinkbot edited a comment on pull request #13206: URL: https://github.com/apache/flink/pull/13206#issuecomment-677500353 ## CI report: * 5b6041fc76df9a5345ae672cd29e5e3b0be51f73 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5774) Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5764) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] lgo commented on pull request #12345: draft: [FLINK-17971] [Runtime/StateBackends] Add RocksDB SST ingestion for batch writes
lgo commented on pull request #12345: URL: https://github.com/apache/flink/pull/12345#issuecomment-678368305 Unfortunately not. The changes should be code complete, but I did not find the time to test and evaluate perf on a running cluster yet, and had backported it to 1.9.1 for that. Thanks for the reminder on this though, I'm going to try to do that next week when I've got a bunch of free time.
[GitHub] [flink] flinkbot edited a comment on pull request #13024: [FLINK-18742][flink-clients] Some configuration args do not take effect at client
flinkbot edited a comment on pull request #13024: URL: https://github.com/apache/flink/pull/13024#issuecomment-666148245 ## CI report: * eb611d5b39f997fa3e986b5d163cb65a44b4b0ba UNKNOWN * e505da81c290c0ab3f21defb1ef0f3a3f7be69eb Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5773) Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5749) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] dianfu commented on a change in pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese
dianfu commented on a change in pull request #13089: URL: https://github.com/apache/flink/pull/13089#discussion_r474762668 ## File path: docs/try-flink/local_installation.zh.md ## @@ -26,36 +26,35 @@ under the License. {% if site.version contains "SNAPSHOT" %} - NOTE: The Apache Flink community only publishes official builds for - released versions of Apache Flink. + 注意:Apache Flink 社区只发布 Apache Flink 的 release 版本。 - Since you are currently looking at the latest SNAPSHOT - version of the documentation, all version references below will not work. - Please switch the documentation to the latest released version via the release picker which you - find on the left side below the menu. + 由于你当前正在查看的是文档最新的 SNAPSHOT 版本,因此相关内容会被隐藏。请通过左侧菜单底部的版本选择将文档切换到最新的 release 版本。 {% else %} -Follow these few steps to download the latest stable versions and get started. +请按照以下几个步骤下载最新的稳定版本开始使用。 -## Step 1: Download + Review comment: Just wondering why we need this line? I have tried to cherry-pick this PR to release-1.11 locally and found that it could not display the content correctly. After removing this line and also "", etc., everything works well.
[jira] [Created] (FLINK-19023) Remove pruning of Record Serializer Buffer
Stephan Ewen created FLINK-19023: Summary: Remove pruning of Record Serializer Buffer Key: FLINK-19023 URL: https://issues.apache.org/jira/browse/FLINK-19023 Project: Flink Issue Type: Sub-task Components: Runtime / Network Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.12.0 Currently, the {{SpanningRecordSerializer}} prunes its internal serialization buffer under special circumstances: - The buffer becomes larger than a certain threshold (5MB) - The full record end lines up exactly with a full buffer length (this change got introduced at some point, it is not clear what the purpose is) This optimization virtually never kicks in (because of the second condition) and also seems unnecessary. There is only a single serializer on the sender side, so this will not help to reduce the maximum memory footprint needed in any way. NOTE: A similar optimization on the reader side ({{SpillingAdaptiveSpanningRecordDeserializer}}) makes sense, because multiple parallel deserializers run in order to piece together the records when retrieving buffers from the network in arbitrary order. Truncating buffers (or spilling) there helps reduce the maximum required memory footprint.
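The two conditions above can be sketched to show why the optimization virtually never kicks in (hypothetical names, with a ByteBuffer standing in for the serializer's internal buffer; the real check lives in {{SpanningRecordSerializer}}): both must hold at once, and a record end that lines up exactly with the buffer limit is rare.

```java
import java.nio.ByteBuffer;

class PruneSketch {
    static final int THRESHOLD = 5 * 1024 * 1024; // 5 MB, per the ticket

    // Returns true only when the serialization buffer is oversized AND the
    // record end lines up exactly with the buffer's limit; the conjunction
    // makes the prune branch almost unreachable in practice.
    static boolean shouldPrune(ByteBuffer serializationBuffer, int recordEnd) {
        return serializationBuffer.capacity() > THRESHOLD
                && recordEnd == serializationBuffer.limit();
    }
}
```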
[GitHub] [flink] rmetzger commented on a change in pull request #12823: FLINK-18013: Refactor Hadoop utils to a single module
rmetzger commented on a change in pull request #12823: URL: https://github.com/apache/flink/pull/12823#discussion_r474752292 ## File path: flink-hadoop-utils/pom.xml ## @@ -0,0 +1,78 @@ +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" +xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" +xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <parent> + <artifactId>flink-parent</artifactId> + <groupId>org.apache.flink</groupId> + <version>1.12-SNAPSHOT</version> + </parent> + <modelVersion>4.0.0</modelVersion> + + <artifactId>flink-hadoop-utils</artifactId> + + <dependencies> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-common</artifactId> + <version>${hadoop.version}</version> + <scope>provided</scope> + </dependency> + + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-hdfs</artifactId> + <version>${hadoop.version}</version> + <scope>test</scope> + </dependency> + + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-core</artifactId> + <version>${project.version}</version> + </dependency> + + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-test-utils-junit</artifactId> Review comment: why is this not test scoped?
[jira] [Commented] (FLINK-19016) Checksum mismatch when restore from RocksDB
[ https://issues.apache.org/jira/browse/FLINK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181926#comment-17181926 ] Jiayi Liao commented on FLINK-19016: [~sewen] All of your assumptions are right. (BTW the local recovery is not enabled.) I can share my thoughts here but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which will invoke the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). Since Flink uses the sync operation, the CreateFile and its Append operations may not yet be persisted on disk. (If using fsync, I think it should be guaranteed.) I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. > Checksum mismatch when restore from RocksDB > --- > > Key: FLINK-19016 > URL: https://issues.apache.org/jira/browse/FLINK-19016 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > The error stack is shown below: > {code:java} > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for > KeyedMapBundleOperator_44cfc1ca74b40bb44eed1f38f72b3ea9_(71/300) from any of > the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 
6 more > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught > unexpected exception. > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:333) > at > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:580) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 8 more > Caused by: java.io.IOException: Error while opening RocksDB instance. > at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74) > at > org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148) > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:277) > ... 
12 more > Caused by: org.rocksdb.RocksDBException: checksum mismatch > at org.rocksdb.RocksDB.open(Native Method) > at org.rocksdb.RocksDB.open(RocksDB.java:286) > at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:66) > ... 18 more > {code} > The machine went down because of a hardware problem, and then the job cannot > restart successfully anymore. After digging a little bit, I found that > RocksDB in Flink uses sync instead of fsync to synchronize the data with the > disk. With the sync operation, RocksDB cannot guarantee that the current > in-progress file is persisted on disk in takeDBNativeCheckpoint.
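The sync-versus-fsync distinction discussed in this ticket can be illustrated with plain JDK I/O (a sketch, not Flink's or RocksDB's actual code): FileChannel.force(false) persists only file data, roughly like sync/fdatasync, while force(true) also persists file metadata such as the length, roughly like fsync.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DurableWriteSketch {
    // Writes data and forces it to the storage device. With metadata=false
    // this is fdatasync-like (only file content is flushed); with
    // metadata=true it is fsync-like, also flushing file metadata.
    static long writeDurably(Path file, byte[] data, boolean metadata) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(data));
            ch.force(metadata); // false ~ sync/fdatasync, true ~ fsync
            return ch.size();
        }
    }
}
```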
[GitHub] [flink] rmetzger commented on pull request #13217: [FLINK-16866] Make job submission non-blocking
rmetzger commented on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678327440 I assumed the tests to be stable, because my personal CI finished [pretty green](https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8308=results). I will investigate the unstable tests.
[GitHub] [flink] rmetzger edited a comment on pull request #13162: [FLINK-18685][API / DataStream]JobClient.getAccumulators() blocks until streaming job has finished in local environment
rmetzger edited a comment on pull request #13162: URL: https://github.com/apache/flink/pull/13162#issuecomment-678321188 This is the error I'm seeing

```
java.lang.IllegalStateException: MiniCluster is not yet running or has already been shut down.
	at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
	at org.apache.flink.runtime.minicluster.MiniCluster.getDispatcherGatewayFuture(MiniCluster.java:707)
	at org.apache.flink.runtime.minicluster.MiniCluster.runDispatcherCommand(MiniCluster.java:621)
	at org.apache.flink.runtime.minicluster.MiniCluster.getExecutionGraph(MiniCluster.java:607)
	at org.apache.flink.client.program.PerJobMiniClusterFactory$PerJobMiniClusterJobClient.getAccumulators(PerJobMiniClusterFactory.java:182)
	at org.apache.flink.client.program.PerJobMiniClusterFactoryTest.testJobExecution(PerJobMiniClusterFactoryTest.java:67)
```

We could implement the getAccumulator methods as follows

```java
@Override
public CompletableFuture<Map<String, Object>> getAccumulators(ClassLoader classLoader) {
	if (miniCluster.isRunning()) {
		return miniCluster
			.getExecutionGraph(jobID)
			.thenApply(AccessExecutionGraph::getAccumulatorsSerialized)
			.thenApply(accumulators -> {
				try {
					return AccumulatorHelper.deserializeAndUnwrapAccumulators(accumulators, classLoader);
				} catch (Exception e) {
					throw new CompletionException("Cannot deserialize and unwrap accumulators properly.", e);
				}
			});
	} else {
		return getJobExecutionResult(classLoader).thenApply(JobExecutionResult::getAllAccumulatorResults);
	}
}
```

(Disclaimer: I'm not very familiar with that part of the codebase, I might need to ask another committer for a final review)
[GitHub] [flink-web] sjwiesman commented on pull request #371: [hotfix] Add makefile
sjwiesman commented on pull request #371: URL: https://github.com/apache/flink-web/pull/371#issuecomment-678314215 merged in a229405453080b7f70a9d04fae5e6d22129d20bc
[GitHub] [flink-web] sjwiesman closed pull request #371: [hotfix] Add makefile
sjwiesman closed pull request #371: URL: https://github.com/apache/flink-web/pull/371
[jira] [Commented] (FLINK-18797) docs and examples use deprecated forms of keyBy
[ https://issues.apache.org/jira/browse/FLINK-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181913#comment-17181913 ] David Anderson commented on FLINK-18797: [~rmetzger] No, I only looked at keyBy. You're right, we should rework the examples so they don't use any deprecated methods. I wonder if it would be possible to set up something like Checkstyle so that the examples fail if they use deprecated methods. > docs and examples use deprecated forms of keyBy > --- > > Key: FLINK-18797 > URL: https://issues.apache.org/jira/browse/FLINK-18797 > Project: Flink > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 1.11.0, 1.11.1 >Reporter: David Anderson >Assignee: wulei0302 >Priority: Major > Labels: pull-request-available > > The DataStream example at > https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html#example-program > uses > {{keyBy(0)}} > which has been deprecated. There are many other cases of this throughout the > docs: > dev/connectors/cassandra.md > dev/parallel.md > dev/stream/operators/index.md > dev/stream/operators/process_function.md > dev/stream/state/queryable_state.md > dev/stream/state/state.md > dev/types_serialization.md > learn-flink/etl.md > ops/scala_shell.md > and also in a number of examples: > AsyncIOExample.java > SideOutputExample.java > TwitterExample.java > GroupedProcessingTimeWindowExample.java > SessionWindowing.java > TopSpeedWindowing.java > WindowWordCount.java > WordCount.java > TwitterExample.scala > GroupedProcessingTimeWindowExample.scala > SessionWindowing.scala > WindowWordCount.scala > WordCount.scala > There are also some uses of keyBy("string"), which has also been deprecated: > dev/connectors/cassandra.md > dev/stream/operators/index.md > dev/types_serialization.md > learn-flink/etl.md > SocketWindowWordCount.java > SocketWindowWordCount.scala > TopSpeedWindowing.scala
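One possible setup for what is suggested above (a sketch under assumptions; nothing in this thread configures it this way): the Maven compiler plugin can turn deprecation warnings into build failures for the example modules. `-Xlint:deprecation` and `failOnWarning` are real maven-compiler-plugin options, but wiring them into Flink's build is an assumption here.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <compilerArgs>
      <!-- emit a warning for every use of a deprecated API -->
      <arg>-Xlint:deprecation</arg>
    </compilerArgs>
    <!-- fail the examples module if any warning (incl. deprecation) is emitted -->
    <failOnWarning>true</failOnWarning>
  </configuration>
</plugin>
```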
[jira] [Updated] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tartarus updated FLINK-19022: - Description: In my job, the JM could not start normally, and the JM container was finally killed by RM. In the end, I found through debugging that AkkaRpcActor failed to start because the version of yarn in my job was incompatible with the version in the cluster. [AkkaRpcActor exception handling|https://github.com/apache/flink/blob/478c9657fe1240acdc1eb08ad32ea93e08b0cd5e/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L550] I added log printing here, and then found the specific problem. {code:java} 2020-08-21 21:31:16,985 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState [flink-akka.actor.default-dispatcher-4] - Could not start RpcEndpoint resourcemanager. java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterRequestProto;)Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterResponseProto; at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy25.registerApplicationMaster(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:222) at 
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138) at org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:229) at org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:262) at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:204) at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:192) at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:185) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:544) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:169) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107){code} Should we add logs here to help find problems? was: In my task, the JM could not start normally, and the JM container was finally killed by RM. In the end, I found through debugging that AkkaRpcActor failed to start because the version of yarn in my job was incompatible with the version in the cluster. [AkkaRpcActor exception handling|https://github.com/apache/flink/blob/478c9657fe1240acdc1eb08ad32ea93e08b0cd5e/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L550] I added log printing here, and then found the specific problem.
[GitHub] [flink] flinkbot edited a comment on pull request #13214: [FLINK-18938][tableSQL/API] Throw better exception message for quering sink-only connector
flinkbot edited a comment on pull request #13214: URL: https://github.com/apache/flink/pull/13214#issuecomment-678119788 ## CI report: * ef62ec54b70560a9dabc09f020f7386d907aab20 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5771) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181911#comment-17181911 ] tartarus commented on FLINK-19022: -- [~chesnay] [~trohrmann] How about adding log printing here to help quickly find the problem? Please assign it to me, thanks > AkkaRpcActor failed to start but no exception information > - > > Key: FLINK-19022 > URL: https://issues.apache.org/jira/browse/FLINK-19022 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.10.0 >Reporter: tartarus >Priority: Major > > In my task, the JM could not start normally, and the JM container was > finally killed by RM. > In the end, I found through debugging that AkkaRpcActor failed to start because > the version of yarn in my job was incompatible with the version in the > cluster. > [AkkaRpcActor exception > handling|https://github.com/apache/flink/blob/478c9657fe1240acdc1eb08ad32ea93e08b0cd5e/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L550] > I added log printing here, and then found the specific problem. > {code:java} > 2020-08-21 21:31:16,985 ERROR > org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState > [flink-akka.actor.default-dispatcher-4] - Could not start RpcEndpoint > resourcemanager. 
> java.lang.NoSuchMethodError: > org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterRequestProto;)Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterResponseProto; > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy25.registerApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:222) > at > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214) > at > org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138) > at > org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:229) > at > org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:262) > at > org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:204) > at > org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:192) > at > org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:185) > at > 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:544) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:169) > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > at akka.actor.Actor$class.aroundReceive(Actor.scala:517) > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at >
[jira] [Created] (FLINK-19022) AkkaRpcActor failed to start but no exception information
tartarus created FLINK-19022: Summary: AkkaRpcActor failed to start but no exception information Key: FLINK-19022 URL: https://issues.apache.org/jira/browse/FLINK-19022 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.10.0 Reporter: tartarus In my task, the JM could not start normally, and the JM container was finally killed by RM. In the end, I found through debugging that AkkaRpcActor failed to start because the version of yarn in my job was incompatible with the version in the cluster. [AkkaRpcActor exception handling|https://github.com/apache/flink/blob/478c9657fe1240acdc1eb08ad32ea93e08b0cd5e/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L550] I added log printing here, and then found the specific problem. {code:java} 2020-08-21 21:31:16,985 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState [flink-akka.actor.default-dispatcher-4] - Could not start RpcEndpoint resourcemanager. java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterRequestProto;)Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterResponseProto; at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy25.registerApplicationMaster(Unknown 
Source) at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:222) at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138) at org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:229) at org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:262) at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:204) at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:192) at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:185) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:544) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:169) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at 
akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107){code} Should we add logs here to help find problems? -- This message was sent by Atlassian Jira (v8.3.4#803005)
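The missing log line the reporter asks for can be sketched generically: catch the start-up failure, log its cause, then complete the endpoint's future exceptionally instead of failing silently. The names below (`EndpointStarter`, `StartHook`) are illustrative stand-ins, not Flink's actual AkkaRpcActor code:

```java
import java.util.concurrent.CompletableFuture;

// Illustrative sketch (not Flink code): log the cause of a failed endpoint
// start before propagating it, so errors like the NoSuchMethodError above
// show up in the logs instead of the JM dying without a trace.
public class EndpointStarter {

    /** Stand-in for an RpcEndpoint's onStart() hook. */
    interface StartHook {
        void onStart() throws Exception;
    }

    public static CompletableFuture<Void> startEndpoint(String name, StartHook hook) {
        try {
            hook.onStart();
            return CompletableFuture.completedFuture(null);
        } catch (Throwable t) {
            // The kind of log line the reporter added to find the real problem.
            System.err.println("Could not start RpcEndpoint " + name + ": " + t);
            CompletableFuture<Void> failed = new CompletableFuture<>();
            failed.completeExceptionally(t);
            return failed;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> result = startEndpoint("resourcemanager",
                () -> { throw new NoSuchMethodError("registerApplicationMaster"); });
        System.out.println(result.isCompletedExceptionally()); // true
    }
}
```

Logging before completing the future exceptionally matters because `Error` subclasses thrown during start-up may otherwise never surface to any caller.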
[jira] [Created] (FLINK-19021) Cleanups of the ResultPartition components
Stephan Ewen created FLINK-19021: Summary: Cleanups of the ResultPartition components Key: FLINK-19021 URL: https://issues.apache.org/jira/browse/FLINK-19021 Project: Flink Issue Type: Improvement Components: Runtime / Network Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.12.0 This is the umbrella issue for a set of simplifications and cleanups in the {{ResultPartition}} components. This cleanup is in preparation for a possible future refactoring.
[jira] [Commented] (FLINK-18970) Adding Junit TestMarkers
[ https://issues.apache.org/jira/browse/FLINK-18970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181900#comment-17181900 ] Stephan Ewen commented on FLINK-18970: -- This distinction already exists: Files ending in "Test" are unit tests, files ending in "ITCase" are integration tests. > Adding Junit TestMarkers > > > Key: FLINK-18970 > URL: https://issues.apache.org/jira/browse/FLINK-18970 > Project: Flink > Issue Type: Improvement > Components: Test Infrastructure, Tests >Affects Versions: 1.11.1 >Reporter: goutham >Priority: Minor > Fix For: 1.11.1 > > Original Estimate: 10h > Remaining Estimate: 10h > > I am planning to add Test Marker to run the Unit test and Integration test > using markers. > Currently, if you want to run the complete build locally it takes close to 2 > hours. Based on requirement developers can run unit tests only or just > integration or both. > By default, it will run all the tests. > planning to introduce below markers > @Tag("IntegrationTest") > @Tag("UnitTest")
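For reference, JUnit 5's `@Tag` annotation is the standard mechanism for markers like the ones proposed, with the build tool filtering on tags. Since wiring up the real JUnit engine is a build-configuration exercise, the self-contained sketch below only imitates the selection mechanism with a hypothetical `Tag` annotation, to show how tag-based filtering works:

```java
import java.lang.annotation.*;
import java.lang.reflect.Method;
import java.util.*;

// Self-contained illustration of tag-based test selection, imitating what
// JUnit 5 @Tag filtering does. The Tag annotation here is hypothetical.
public class TagFilter {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Tag { String value(); }

    static class MyTests {
        @Tag("UnitTest") public void testParse() {}
        @Tag("IntegrationTest") public void testEndToEnd() {}
    }

    /** Returns the names of test methods carrying the requested tag. */
    static List<String> select(Class<?> testClass, String tag) {
        List<String> selected = new ArrayList<>();
        for (Method m : testClass.getDeclaredMethods()) {
            Tag t = m.getAnnotation(Tag.class);
            if (t != null && t.value().equals(tag)) {
                selected.add(m.getName());
            }
        }
        Collections.sort(selected);
        return selected;
    }

    public static void main(String[] args) {
        System.out.println(select(MyTests.class, "UnitTest"));        // [testParse]
        System.out.println(select(MyTests.class, "IntegrationTest")); // [testEndToEnd]
    }
}
```

Under the actual proposal, a developer would annotate tests with `@Tag("UnitTest")` or `@Tag("IntegrationTest")` and select a subset from the build command line.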
[jira] [Commented] (FLINK-19011) Parallelize the restore operation in OperatorStateBackend
[ https://issues.apache.org/jira/browse/FLINK-19011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181898#comment-17181898 ] Stephan Ewen commented on FLINK-19011: -- Union state has caused us much trouble through the years. I am leaning towards deprecating and dropping the feature. - The new sources should not need it any more, and neither should the new sinks. - The feature is not really scalable; it has an inherent limit worse than broadcast, because it works linearly until a failure and then becomes quadratic (like a broadcast). So it is a non-scalability you discover only when it is too late. What do you think? > Parallelize the restore operation in OperatorStateBackend > -- > > Key: FLINK-19011 > URL: https://issues.apache.org/jira/browse/FLINK-19011 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > To restore the states, union state needs to read state handles produced by > all operators. And currently during the restore operation, Flink iterates the > state handles one by one, which could last tens of minutes if the magnitude > of state handles exceeds ten thousand. > To accelerate the process, I propose to parallelize the random reads on HDFS > and deserialization. We can create a runnable for each state handle and let > it return the metadata and deserialized data, which can be aggregated in the main > thread.
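The proposal quoted above (one task per state handle doing the read and deserialization, results aggregated in the main thread) can be sketched roughly as follows; the `StateHandle` type and `restoreAll` method are illustrative stand-ins, not Flink's actual restore code:

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of the proposed parallel restore: one task per state handle, each
// doing the (simulated) DFS read plus deserialization, with the results
// aggregated in the main thread. Types are stand-ins, not Flink classes.
public class ParallelRestore {

    /** Illustrative stand-in for an operator state handle on DFS. */
    interface StateHandle {
        byte[] read() throws Exception;
    }

    static List<byte[]> restoreAll(List<StateHandle> handles, int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // One task per state handle: reads and deserializes in parallel.
            List<Future<byte[]>> futures = new ArrayList<>();
            for (StateHandle handle : handles) {
                futures.add(pool.submit(handle::read));
            }
            // Aggregate the results in the main thread, in submission order.
            List<byte[]> restored = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                restored.add(f.get());
            }
            return restored;
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool size bounds the number of concurrent DFS reads; collecting the `Future` results in submission order keeps the main-thread aggregation deterministic.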
[jira] [Commented] (FLINK-19016) Checksum mismatch when restore from RocksDB
[ https://issues.apache.org/jira/browse/FLINK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181897#comment-17181897 ] Stephan Ewen commented on FLINK-19016: -- Thanks for reporting this. Can you help us understand this a bit better? (1) I assume that this is not due to local recovery, because the machine failed, as you said. (2) Does the restore come from the remote DFS (HDFS / S3 / ...) in this case? (I assume yes) (3) Is the file already corrupt on the DFS, meaning do repeated failovers always fail? (I assume yes) (4) How can fsync affect whether the upload to DFS is corrupt? If the upload succeeded, then the file could be read locally completely, and it should not matter whether it is fully on disk or only partially in the disk cache. Could you share your thoughts on how fsync can affect this? > Checksum mismatch when restore from RocksDB > --- > > Key: FLINK-19016 > URL: https://issues.apache.org/jira/browse/FLINK-19016 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > The error stack is shown below: > {code:java} > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for > KeyedMapBundleOperator_44cfc1ca74b40bb44eed1f38f72b3ea9_(71/300) from any of > the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 6 more > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught > unexpected exception.
> at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:333) > at > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:580) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 8 more > Caused by: java.io.IOException: Error while opening RocksDB instance. > at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74) > at > org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148) > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:277) > ... 
12 more > Caused by: org.rocksdb.RocksDBException: checksum mismatch > at org.rocksdb.RocksDB.open(Native Method) > at org.rocksdb.RocksDB.open(RocksDB.java:286) > at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:66) > ... 18 more > {code} > The machine went down because of a hardware problem, and afterwards the job could not > restart successfully anymore. After digging a little bit, I found that > RocksDB in Flink uses sync instead of fsync to synchronize the data with the > disk. With the sync operation, RocksDB cannot guarantee that the current > in-progress file has been persisted on disk in takeDBNativeCheckpoint.
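For context on the sync/fsync question: in POSIX terms, `fdatasync` (the "sync" mode discussed here) flushes only file data, while `fsync` also flushes file metadata. Java's closest analogue is `FileChannel.force(boolean metaData)`. A minimal sketch, not Flink or RocksDB code, of forcing a file fully to stable storage before it would be uploaded:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

// Sketch: write a file and force it to stable storage before it is handed
// to an uploader, the guarantee the reporter suspects is missing here.
public class DurableWrite {

    static void writeDurably(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // force(true) flushes data AND metadata (closest to fsync);
            // force(false) flushes data only (closest to fdatasync / "sync").
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("ckpt", ".sst");
        writeDurably(tmp, "checkpoint-bytes".getBytes());
        System.out.println(Files.size(tmp)); // prints 16
        Files.delete(tmp);
    }
}
```

Note that durability of the local file matters only for reads that bypass the page cache; as the comment above points out, a successful read-and-upload should see the same bytes either way, which is why the question remains open.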
[GitHub] [flink] morsapaes commented on a change in pull request #13193: [FLINK-18918][python][docs] Add dedicated connector documentation for Python Table API
morsapaes commented on a change in pull request #13193: URL: https://github.com/apache/flink/pull/13193#discussion_r474700929 ## File path: docs/dev/python/user-guide/table/python_table_api_connectors.md ## @@ -0,0 +1,194 @@ +--- +title: "Connectors" +nav-parent_id: python_tableapi +nav-pos: 130 +--- + + + +This page describes how to use connectors in PyFlink and highlights the differences between using connectors in PyFlink vs Java/Scala. + +* This will be replaced by the TOC +{:toc} + +Note: For general connector information and common configuration, please refer to the corresponding [Java/Scala documentation]({{ site.baseurl }}/dev/table/connectors/index.html). + +## Download connector and format jars + +For both connectors and formats, implementations are available as jars that need to be specified as job [dependencies]({{ site.baseurl }}/dev/python/user-guide/table/dependency_management.html). + +{% highlight python %} + +table_env.get_config().get_configuration().set_string("pipeline.jars", "file:///my/jar/path/connector.jar;file:///my/jar/path/json.jar") + +{% endhighlight %} + +## How to use connectors + +In PyFlink's Table API, DDL is the recommended way to define sources and sinks, executed via the `execute_sql()` method on the `TableEnvironment`. This makes the table available for use by the application.
+ +{% highlight python %} + +source_ddl = """ +CREATE TABLE source_table( +a VARCHAR, +b INT +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', + 'connector.topic' = 'source_topic', + 'connector.properties.bootstrap.servers' = 'kafka:9092', + 'connector.properties.group.id' = 'test_3', + 'connector.startup-mode' = 'latest-offset', + 'format.type' = 'json' +) +""" + +sink_ddl = """ +CREATE TABLE sink_table( +a VARCHAR +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', + 'connector.topic' = 'sink_topic', + 'connector.properties.bootstrap.servers' = 'kafka:9092', + 'format.type' = 'json' +) +""" + +t_env.execute_sql(source_ddl) +t_env.execute_sql(sink_ddl) + +t_env.sql_query("select a from source_table") \ +.insert_into("sink_table") + +{% endhighlight %} + +Below is a complete example of how to use the Kafka and Json format in PyFlink. + +{% highlight python %} + +from pyflink.datastream import StreamExecutionEnvironment, TimeCharacteristic +from pyflink.table import StreamTableEnvironment, EnvironmentSettings + + +def log_processing(): +env = StreamExecutionEnvironment.get_execution_environment() +env_settings = EnvironmentSettings.Builder().use_blink_planner().build() +t_env = StreamTableEnvironment.create(stream_execution_environment=env, environment_settings=env_settings) + t_env.get_config().get_configuration().set_boolean("python.fn-execution.memory.managed", True) +# specify connector and format jars +t_env.get_config().get_configuration().set_string("pipeline.jars", "file:///my/jar/path/connector.jar;file:///my/jar/path/json.jar") + +source_ddl = """ +CREATE TABLE source_table( +a VARCHAR, +b INT +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', + 'connector.topic' = 'source_topic', + 'connector.properties.bootstrap.servers' = 'kafka:9092', + 'connector.properties.group.id' = 'test_3', + 'connector.startup-mode' = 'latest-offset', + 'format.type' = 'json' +) +""" + +sink_ddl = """ 
+CREATE TABLE sink_table( +a VARCHAR +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', + 'connector.topic' = 'sink_topic', + 'connector.properties.bootstrap.servers' = 'kafka:9092', + 'format.type' = 'json' +) +""" + +t_env.execute_sql(source_ddl) +t_env.execute_sql(sink_ddl) + +t_env.sql_query("select a from source_table") \ Review comment: Just a personal preference/suggestion, for readability. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-web] sjwiesman opened a new pull request #371: [hotfix] Add makefile
sjwiesman opened a new pull request #371: URL: https://github.com/apache/flink-web/pull/371 Adds a makefile to add explicit commands for rebuilding vs running locally.
[GitHub] [flink] flinkbot edited a comment on pull request #13217: [FLINK-16866] Make job submission non-blocking
flinkbot edited a comment on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678285884 ## CI report: * 85c9bbac2d9ca07435fbd80d76208fa2a51e5d37 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5776) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] morsapaes commented on a change in pull request #13193: [FLINK-18918][python][docs] Add dedicated connector documentation for Python Table API
morsapaes commented on a change in pull request #13193: URL: https://github.com/apache/flink/pull/13193#discussion_r474666933 ## File path: docs/dev/python/user-guide/table/python_table_api_connectors.md ## @@ -0,0 +1,194 @@ +--- +title: "Connectors" +nav-parent_id: python_tableapi +nav-pos: 130 +--- + + + +This page describes how to use connectors in PyFlink and highlights the different parts between using connectors in PyFlink vs Java/Scala. + +* This will be replaced by the TOC +{:toc} + +NoteFor general connector information and common configuration, please refer to the corresponding [Java/Scala documentation]({{ site.baseurl }}/dev/table/connectors/index.html). + +## Download connector and format jars + +For both connectors and formats, implementations are available as jars that need to be specified as job [dependencies]({{ site.baseurl }}/dev/python/user-guide/table/dependency_management.html). Review comment: Here, it would be nice to add a short sentence explaining that Flink is a Java/Scala-based project, just for the sake of completeness and so users who are not familiar with Flink understand why they have to deal with JARs in a Python program. ## File path: docs/dev/python/user-guide/table/python_table_api_connectors.md ## @@ -0,0 +1,194 @@ +--- +title: "Connectors" +nav-parent_id: python_tableapi +nav-pos: 130 +--- + + + +This page describes how to use connectors in PyFlink and highlights the different parts between using connectors in PyFlink vs Java/Scala. + +* This will be replaced by the TOC +{:toc} + +NoteFor general connector information and common configuration, please refer to the corresponding [Java/Scala documentation]({{ site.baseurl }}/dev/table/connectors/index.html). + +## Download connector and format jars + +For both connectors and formats, implementations are available as jars that need to be specified as job [dependencies]({{ site.baseurl }}/dev/python/user-guide/table/dependency_management.html). 
+ +{% highlight python %} + +table_env.get_config().get_configuration().set_string("pipeline.jars", "file:///my/jar/path/connector.jar;file:///my/jar/path/json.jar") + +{% endhighlight %} + +## How to use connectors + +In PyFink's Table API, DDL is the recommended way to define sources and sinks, executed via the `execute_sql()` method on the `TableEnvironment`. This makes the table available for use by the application. + +{% highlight python %} + +source_ddl = """ +CREATE TABLE source_table( +a VARCHAR, +b INT +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', + 'connector.topic' = 'source_topic', + 'connector.properties.bootstrap.servers' = 'kafka:9092', + 'connector.properties.group.id' = 'test_3', + 'connector.startup-mode' = 'latest-offset', + 'format.type' = 'json' +) +""" + +sink_ddl = """ +CREATE TABLE sink_table( +a VARCHAR +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', + 'connector.topic' = 'sink_topic', + 'connector.properties.bootstrap.servers' = 'kafka:9092', + 'format.type' = 'json' +) +""" + +t_env.execute_sql(source_ddl) +t_env.execute_sql(sink_ddl) + +t_env.sql_query("select a from source_table") \ Review comment: ```suggestion t_env.sql_query("SELECT a FROM source_table") \ ``` ## File path: docs/dev/python/user-guide/table/python_table_api_connectors.md ## @@ -0,0 +1,194 @@ +--- +title: "Connectors" +nav-parent_id: python_tableapi +nav-pos: 130 +--- + + + +This page describes how to use connectors in PyFlink and highlights the different parts between using connectors in PyFlink vs Java/Scala. + +* This will be replaced by the TOC +{:toc} + +NoteFor general connector information and common configuration, please refer to the corresponding [Java/Scala documentation]({{ site.baseurl }}/dev/table/connectors/index.html). 
+ +## Download connector and format jars + +For both connectors and formats, implementations are available as jars that need to be specified as job [dependencies]({{ site.baseurl }}/dev/python/user-guide/table/dependency_management.html). + +{% highlight python %} + +table_env.get_config().get_configuration().set_string("pipeline.jars", "file:///my/jar/path/connector.jar;file:///my/jar/path/json.jar") + +{% endhighlight %} + +## How to use connectors + +In PyFink's Table API, DDL is the recommended way to define sources and sinks, executed via the `execute_sql()` method on the `TableEnvironment`. This makes the table available for use by the application. + +{% highlight python %} + +source_ddl = """ +CREATE TABLE source_table( +a VARCHAR, +b INT +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', +
[GitHub] [flink] flinkbot edited a comment on pull request #13144: [FLINK-15853][hive][table-planner-blink] Use the new type inference f…
flinkbot edited a comment on pull request #13144: URL: https://github.com/apache/flink/pull/13144#issuecomment-673924067 ## CI report: * 9b69c04b6a987bf81e21fd8a7c062fb72fe7ccb5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5769)
[jira] [Created] (FLINK-19020) Add more metrics around async operations and backpressure
Igal Shilman created FLINK-19020: Summary: Add more metrics around async operations and backpressure Key: FLINK-19020 URL: https://issues.apache.org/jira/browse/FLINK-19020 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Igal Shilman
[jira] [Created] (FLINK-19019) Add HDFS / S3 / GCS support to Flink-StateFun Docker image.
Igal Shilman created FLINK-19019: Summary: Add HDFS / S3 / GCS support to Flink-StateFun Docker image. Key: FLINK-19019 URL: https://issues.apache.org/jira/browse/FLINK-19019 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Igal Shilman The 3rd-party filesystem support is not enabled by default. To make checkpointing work in S3 / GCS etc., we need to add the relevant plugins.
[jira] [Commented] (FLINK-18797) docs and examples use deprecated forms of keyBy
[ https://issues.apache.org/jira/browse/FLINK-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181894#comment-17181894 ] Robert Metzger commented on FLINK-18797: Thanks a lot for filing a ticket for this [~alpinegizmo]. I noticed that there are more deprecated methods, such as {{.writeAsText()}} or {{.assignTimestampsAndWatermarks()}} which are used in the examples. Did you file tickets for these as well? > docs and examples use deprecated forms of keyBy > --- > > Key: FLINK-18797 > URL: https://issues.apache.org/jira/browse/FLINK-18797 > Project: Flink > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 1.11.0, 1.11.1 >Reporter: David Anderson >Assignee: wulei0302 >Priority: Major > Labels: pull-request-available > > The DataStream example at > https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html#example-program > uses > {{keyBy(0)}} > which has been deprecated. There are many other cases of this throughout the > docs: > dev/connectors/cassandra.md > dev/parallel.md > dev/stream/operators/index.md > dev/stream/operators/process_function.md > dev/stream/state/queryable_state.md > dev/stream/state/state.md > dev/types_serialization.md > learn-flink/etl.md > ops/scala_shell.md > and also in a number of examples: > AsyncIOExample.java > SideOutputExample.java > TwitterExample.java > GroupedProcessingTimeWindowExample.java > SessionWindowing.java > TopSpeedWindowing.java > WindowWordCount.java > WordCount.java > TwitterExample.scala > GroupedProcessingTimeWindowExample.scala > SessionWindowing.scala > WindowWordCount.scala > WordCount.scala > There are also some uses of keyBy("string"), which has also been deprecated: > dev/connectors/cassandra.md > dev/stream/operators/index.md > dev/types_serialization.md > learn-flink/etl.md > SocketWindowWordCount.java > SocketWindowWordCount.scala > TopSpeedWindowing.scala -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot commented on pull request #13217: [FLINK-16866] Make job submission non-blocking
flinkbot commented on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678285884 ## CI report: * 85c9bbac2d9ca07435fbd80d76208fa2a51e5d37 UNKNOWN
[GitHub] [flink] rmetzger edited a comment on pull request #13217: [FLINK-16866] Make job submission non-blocking
rmetzger edited a comment on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678283618 Note: I'm still investigating one last unstable test `FunctionITCase.testInvalidUseOfTableFunction()`
[GitHub] [flink] rmetzger commented on pull request #13217: [FLINK-16866] Make job submission non-blocking
rmetzger commented on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678283618 Note: I'm still reviewing one last unstable test `FunctionITCase.testInvalidUseOfTableFunction()`
[jira] [Comment Edited] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181871#comment-17181871 ] Matthias edited comment on FLINK-19005 at 8/21/20, 1:09 PM: Thanks for the update. A quick diff shows already that there is a growing number of classes generated through reflection: {code:bash} # analysis on the "after 1 execution" heap dump FLINK-19005 grep -o "class .*" after_1/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 287 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 150 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings 33 org.apache.flink.shaded.hadoop2.com.google.common.cache.LocalCache 31 akka.remote.serialization.MiscMessageSerializer {code} {code:bash} # analysis on the "after 10 executions" heap dump FLINK-19005 grep -o "class .*" after_10/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 575 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 223 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 49 com.sun.proxy. 
41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 36 jdk.internal.reflect.GeneratedConstructorAccessor 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings {code} was (Author: mapohl): Thanks for the update: A quick diff shows already that there is a growing number of classes generated through reflection: {code:bash} # analysis on the "after 1 execution" heap dump FLINK-19005 grep -o "class .*" after_1/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 287 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 150 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings 33 org.apache.flink.shaded.hadoop2.com.google.common.cache.LocalCache 31 akka.remote.serialization.MiscMessageSerializer {code} {code:bash} # analysis on the "after 10 executions" heap dump FLINK-19005 grep -o "class .*" after_10/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 575 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 223 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 49 com.sun.proxy. 
41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 36 jdk.internal.reflect.GeneratedConstructorAccessor 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings {code} > used metaspace grow on every execution > -- > > Key: FLINK-19005 > URL: https://issues.apache.org/jira/browse/FLINK-19005 > Project: Flink > Issue Type: Bug > Components: API / DataSet, Client / Job Submission >Affects Versions: 1.11.1 >Reporter: Guillermo Sánchez >Assignee: Chesnay Schepler >Priority: Major > Attachments: heap_dump_after_10_executions.zip, > heap_dump_after_1_execution.zip > > > Hi ! > Im running a 1.11.1 flink cluster, where I execute batch jobs made with > DataSet API. > I submit these jobs every day to calculate daily data. > In every execution, cluster's used metaspace increase by 7MB and its never > released. > This ends up with an OutOfMemoryError caused by Metaspace every 15 days and i > need to restart the cluster to clean the metaspace > taskmanager.memory.jvm-metaspace.size is set to 512mb > Any idea of what could be causing this metaspace grow and why is it not > released ? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
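The normalization in the shell pipeline above (cut at `$`, strip the trailing counter of generated classes, count occurrences) can be mirrored in Java for anyone post-processing heap-dump class lists programmatically; a small sketch:

```java
import java.util.*;
import java.util.regex.Pattern;

// Java rendition of the shell pipeline used in the comment: normalize class
// names by dropping the $-suffix and the trailing generated-class counter,
// then count occurrences, so growth between heap dumps becomes visible.
public class ClassHistogram {

    private static final Pattern TRAILING_DIGITS = Pattern.compile("[0-9]+$");

    static String normalize(String className) {
        String base = className.split("\\$", 2)[0];          // cut -d'$' -f1
        return TRAILING_DIGITS.matcher(base).replaceAll(""); // sed 's/[0-9]*$//'
    }

    /** Counts normalized class names from a heap-dump class list. */
    static Map<String, Integer> histogram(List<String> classNames) {
        Map<String, Integer> counts = new HashMap<>();
        for (String c : classNames) {
            counts.merge(normalize(c), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> dump = Arrays.asList(
                "jdk.internal.reflect.GeneratedSerializationConstructorAccessor287",
                "jdk.internal.reflect.GeneratedSerializationConstructorAccessor288",
                "jdk.internal.reflect.GeneratedMethodAccessor42");
        System.out.println(histogram(dump));
    }
}
```

Comparing the histograms of two dumps then shows directly which generated-class families grow per execution.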
[jira] [Comment Edited] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181871#comment-17181871 ] Matthias edited comment on FLINK-19005 at 8/21/20, 1:09 PM: Thanks for the update: A quick diff shows already that there is a growing number of classes generated through reflection: {code:bash} # analysis on the "after 1 execution" heap dump FLINK-19005 grep -o "class .*" after_1/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 287 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 150 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings 33 org.apache.flink.shaded.hadoop2.com.google.common.cache.LocalCache 31 akka.remote.serialization.MiscMessageSerializer {code} {code:bash} # analysis on the "after 10 executions" heap dump FLINK-19005 grep -o "class .*" after_10/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 575 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 223 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 49 com.sun.proxy. 
41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 36 jdk.internal.reflect.GeneratedConstructorAccessor 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings {code} was (Author: mapohl): Thanks for the update: A quick diff shows already that there is a growing number of class generated through reflection: {code:bash} # analysis on the "after 1 execution" heap dump FLINK-19005 grep -o "class .*" after_1/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 287 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 150 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings 33 org.apache.flink.shaded.hadoop2.com.google.common.cache.LocalCache 31 akka.remote.serialization.MiscMessageSerializer {code} {code:bash} # analysis on the "after 10 executions" heap dump FLINK-19005 grep -o "class .*" after_10/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 575 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 223 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 49 com.sun.proxy. 
41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 36 jdk.internal.reflect.GeneratedConstructorAccessor 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings {code} > used metaspace grow on every execution > -- > > Key: FLINK-19005 > URL: https://issues.apache.org/jira/browse/FLINK-19005 > Project: Flink > Issue Type: Bug > Components: API / DataSet, Client / Job Submission >Affects Versions: 1.11.1 >Reporter: Guillermo Sánchez >Assignee: Chesnay Schepler >Priority: Major > Attachments: heap_dump_after_10_executions.zip, > heap_dump_after_1_execution.zip > > > Hi! > I'm running a 1.11.1 Flink cluster, where I execute batch jobs built with the > DataSet API. > I submit these jobs every day to calculate daily data. > On every execution, the cluster's used metaspace increases by 7MB and is never > released. > This ends with an OutOfMemoryError caused by Metaspace every 15 days, and I > need to restart the cluster to clear the metaspace. > taskmanager.memory.jvm-metaspace.size is set to 512mb. > Any idea what could be causing this metaspace growth and why it is not > released? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181871#comment-17181871 ] Matthias commented on FLINK-19005: -- Thanks for the update: A quick diff shows already that there is a growing number of class generated through reflection: {code:bash} # analysis on the "after 1 execution" heap dump FLINK-19005 grep -o "class .*" after_1/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 287 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 150 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings 33 org.apache.flink.shaded.hadoop2.com.google.common.cache.LocalCache 31 akka.remote.serialization.MiscMessageSerializer {code} {code:bash} # analysis on the "after 10 executions" heap dump FLINK-19005 grep -o "class .*" after_10/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 575 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 223 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 49 com.sun.proxy. 
41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 36 jdk.internal.reflect.GeneratedConstructorAccessor 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
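The class-name histogram used in the comments above can be wrapped in a small reusable function for comparing dumps. A minimal sketch, assuming the same class-listing export format as in the original commands; the sample file path and contents below are synthetic placeholders:

```shell
# Per-class histogram of a heap-dump class listing. The final sed strips
# trailing digits so generated classes such as GeneratedMethodAccessor12
# and GeneratedMethodAccessor57 fall into one bucket.
histogram() {
  grep -o "class .*" "$1" \
    | sed -e 's~class ~~g' \
    | cut -d'$' -f1 \
    | sed 's/[0-9]*$//g' \
    | sort | uniq -c | sort -rn
}

# Synthetic sample: the two generated variants of Foo collapse into one bucket.
printf 'class Foo$12\nclass Foo$34\nclass Bar\n' > /tmp/sample_dump.txt
histogram /tmp/sample_dump.txt
```

Running `histogram` over the after-1 and after-10 exports and diffing the two outputs makes the growing buckets (here, the `Generated*Accessor` classes) stand out immediately.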
[jira] [Created] (FLINK-19018) Add connection timeout to remote functions.
Igal Shilman created FLINK-19018: Summary: Add connection timeout to remote functions. Key: FLINK-19018 URL: https://issues.apache.org/jira/browse/FLINK-19018 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Igal Shilman Currently we only have a total request timeout, which also incorporates the connection timeout. -- This message was sent by Atlassian Jira (v8.3.4#803005)
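The distinction being requested is the same one curl exposes through two separate flags, which makes for a quick illustration (the target address below is a deliberately non-routable placeholder, not anything from the issue):

```shell
# --connect-timeout caps only the TCP/TLS connect phase; --max-time caps
# the entire exchange. Against a non-routable address the connect phase
# cannot complete, so curl gives up after about 1s instead of burning the
# full 5s request budget (typically exit code 28, or 7 if the route is
# rejected outright).
curl --connect-timeout 1 --max-time 5 -s http://10.255.255.1/ -o /dev/null
echo "curl exit code: $?"
```

Separating the two timeouts is what lets an unreachable endpoint fail fast with a distinct error, rather than being indistinguishable from a slow response.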
[GitHub] [flink] flinkbot edited a comment on pull request #13216: [FLINK-18999][table-planner-blink][hive] Temporary generic table does…
flinkbot edited a comment on pull request #13216: URL: https://github.com/apache/flink/pull/13216#issuecomment-678268420 ## CI report: * 1df875a397cf4d7cf68e46f695a01ef6734b3af1 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5775) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #13217: [FLINK-16866] Make job submission non-blocking
flinkbot commented on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678277886 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 85c9bbac2d9ca07435fbd80d76208fa2a51e5d37 (Fri Aug 21 13:00:46 UTC 2020) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #13141: [FLINK-18852] Fix StreamScan doesn't inherit parallelism from input in legacy planner
flinkbot edited a comment on pull request #13141: URL: https://github.com/apache/flink/pull/13141#issuecomment-673492124 ## CI report: * 0e9c4198095239ebee07a466a5e23de1a60809ac Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5768) Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5581) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-16866) Make job submission non-blocking
[ https://issues.apache.org/jira/browse/FLINK-16866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-16866: --- Labels: pull-request-available (was: ) > Make job submission non-blocking > > > Key: FLINK-16866 > URL: https://issues.apache.org/jira/browse/FLINK-16866 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.9.2, 1.10.0, 1.11.0 >Reporter: Till Rohrmann >Assignee: Robert Metzger >Priority: Critical > Labels: pull-request-available > Fix For: 1.12.0 > > > Currently, Flink waits to acknowledge a job submission until the > corresponding {{JobManager}} has been created. Since its creation also > involves the creation of the {{ExecutionGraph}} and potential FS operations, > it can take a bit of time. If the user has configured too low a > {{web.timeout}}, the submission can time out, only reporting a > {{TimeoutException}} to the user. > I propose to change the notion of job submission slightly. Instead of waiting > until the {{JobManager}} has been created, a job submission is complete once > all job-relevant files have been uploaded to the {{Dispatcher}} and the > {{Dispatcher}} has been told about it. Creating the {{JobManager}} will then > belong to the actual job execution. Consequently, if problems occur while > creating the {{JobManager}}, it will result in a job failure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] rmetzger opened a new pull request #13217: [FLINK-16866] Make job submission non-blocking
rmetzger opened a new pull request #13217: URL: https://github.com/apache/flink/pull/13217 ## What is the purpose of the change This changes the semantics of job submission: Instead of completing the `Dispatcher.submitJob()` future after all initialization has happened (which can potentially involve calling external systems etc.), the `.submitJob()` call now returns as soon as the job has been accepted by the Dispatcher. The benefit of this change is that users will see the root cause of a submission timeout, instead of an akka.ask.timeout. ## Brief change log - Introduce a `DispatcherJob` abstraction that manages the job in a new `INITIALIZING` state - Change web frontend to cope with initializing jobs - Change clients to submit & poll ## Verifying this change This PR introduces various new tests for verification. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: yes - The S3 file system connector: no ## Documentation This change is transparent to the user and doesn't need a documentation update. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
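The "submit & poll" client flow from the change log can be sketched as a loop. This is only an illustration: `job_initialized` and the three-poll threshold below are invented stand-ins for the real REST status call, not Flink's actual API.

```shell
# Stub for "ask the Dispatcher whether the job has left INITIALIZING".
# In the real flow this would be a status request; here the job
# "initializes" after the third poll so the loop terminates deterministically.
POLLS=0
job_initialized() {
  POLLS=$((POLLS + 1))
  [ "$POLLS" -ge 3 ]
}

# The submission call has already returned at this point; the client now
# polls for initialization instead of blocking on one long submit call.
until job_initialized; do
  sleep 1   # back off between status polls
done
echo "job left INITIALIZING after $POLLS polls"
```

The point of the change is exactly this shift: a failure during initialization surfaces as an observable job failure from the polling side, rather than as a client-side ask timeout.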
[jira] [Commented] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181864#comment-17181864 ] Guillermo Sánchez commented on FLINK-19005: --- I have uploaded the fixed heap_dump_after_1_execution.zip. Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-18959) Fail to archiveExecutionGraph because job is not finished when dispatcher close
[ https://issues.apache.org/jira/browse/FLINK-18959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181861#comment-17181861 ] Till Rohrmann edited comment on FLINK-18959 at 8/21/20, 12:48 PM: -- I agree that we should fix this issue [~Jiangang]. The thing I am asking myself is why it has been done differently in the first place because the original behaviour has been like you described it in your solution proposal. That's why I've pulled in Aljoscha and Tison who worked on the change. What you could try to do is to revert the {{MiniDispatcher}} changes and then see whether some tests fail [~Jiangang]. was (Author: till.rohrmann): I agree that we should fix this issue [~Jiangang]. The thing I am asking myself is why it has been done differently in the first place because the original behaviour has been like you described it in your solution proposal. That's why I've pulled in Aljoscha and Tison who worked on the change. > Fail to archiveExecutionGraph because job is not finished when dispatcher > close > --- > > Key: FLINK-18959 > URL: https://issues.apache.org/jira/browse/FLINK-18959 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.10.0, 1.12.0, 1.11.1 >Reporter: Liu >Priority: Critical > Fix For: 1.12.0, 1.11.2, 1.10.3 > > Attachments: flink-debug-log > > > When job is cancelled, we expect to see it in flink's history server. But I > can not see my job after it is cancelled. > After digging into the problem, I find that the function > archiveExecutionGraph is not executed. Below is the brief log: > {panel:title=log} > 2020-08-14 15:10:06,406 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph > [flink-akka.actor.default-dispatcher- 15] - Job EtlAndWindow > (6f784d4cc5bae88a332d254b21660372) switched from state RUNNING to CANCELLING. 
> 2020-08-14 15:10:06,415 DEBUG > org.apache.flink.runtime.dispatcher.MiniDispatcher > [flink-akka.actor.default-dispatcher-3] - Shutting down per-job cluster > because the job was canceled. > 2020-08-14 15:10:06,629 INFO > org.apache.flink.runtime.dispatcher.MiniDispatcher > [flink-akka.actor.default-dispatcher-3] - Stopping dispatcher > akka.tcp://flink@bjfk-c9865.yz02:38663/user/dispatcher. > 2020-08-14 15:10:06,629 INFO > org.apache.flink.runtime.dispatcher.MiniDispatcher > [flink-akka.actor.default-dispatcher-3] - Stopping all currently running jobs > of dispatcher akka.tcp://flink@bjfk-c9865.yz02:38663/user/dispatcher. > 2020-08-14 15:10:06,631 INFO org.apache.flink.runtime.jobmaster.JobMaster > [flink-akka.actor.default-dispatcher-29] - Stopping the JobMaster for job > EtlAndWindow(6f784d4cc5bae88a332d254b21660372). > 2020-08-14 15:10:06,632 DEBUG org.apache.flink.runtime.jobmaster.JobMaster > [flink-akka.actor.default-dispatcher-29] - Disconnect TaskExecutor > container_e144_1590060720089_2161_01_06 because: Stopping JobMaster for > job EtlAndWindow(6f784d4cc5bae88a332d254b21660372). > 2020-08-14 15:10:06,646 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph > [flink-akka.actor.default-dispatcher-29] - Job EtlAndWindow > (6f784d4cc5bae88a332d254b21660372) switched from state CANCELLING to CANCELED. > 2020-08-14 15:10:06,664 DEBUG > org.apache.flink.runtime.dispatcher.MiniDispatcher > [flink-akka.actor.default-dispatcher-4] - There is a newer JobManagerRunner > for the job 6f784d4cc5bae88a332d254b21660372. > {panel} > From the log, we can see that the job is not finished when the dispatcher closes. The > process is as follows: > * Receive the cancel command and send it to all tasks asynchronously. > * In MiniDispatcher, begin shutting down the per-job cluster. > * Stop the dispatcher and remove the job. > * The job is cancelled and the callback is executed in startJobManagerRunner.
> * Because the job was removed before, currentJobManagerRunner is null, which > does not equal the original jobManagerRunner. In this case, > archivedExecutionGraph will not be uploaded. > In normal cases, I find that the job is cancelled first and then the dispatcher is > stopped, so that archivedExecutionGraph succeeds. But I think that the > order is not constrained and it is hard to know which comes first. > Above is what I suspected. If so, then we should fix it. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-18959) Fail to archiveExecutionGraph because job is not finished when dispatcher close
[ https://issues.apache.org/jira/browse/FLINK-18959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181861#comment-17181861 ] Till Rohrmann commented on FLINK-18959: --- I agree that we should fix this issue [~Jiangang]. The thing I am asking myself is why it has been done differently in the first place because the original behaviour has been like you described it in your solution proposal. That's why I've pulled in Aljoscha and Tison who worked on the change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19017) Increases the visibility of remote function call failures and retries
[ https://issues.apache.org/jira/browse/FLINK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman updated FLINK-19017: - Summary: Increases the visibility of remote function call failures and retries (was: Increases the visibility of a remote function call retry) > Increases the visibility of remote function call failures and retries > --- > > Key: FLINK-19017 > URL: https://issues.apache.org/jira/browse/FLINK-19017 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > > Right now it is difficult to see if a remote function call is being retried; > this issue aims to improve that situation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guillermo Sánchez updated FLINK-19005: -- Attachment: heap_dump_after_1_execution.zip -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guillermo Sánchez updated FLINK-19005: -- Attachment: (was: heap_dump_after_1_execution.zip) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot commented on pull request #13216: [FLINK-18999][table-planner-blink][hive] Temporary generic table does…
flinkbot commented on pull request #13216: URL: https://github.com/apache/flink/pull/13216#issuecomment-678268420 ## CI report: * 1df875a397cf4d7cf68e46f695a01ef6734b3af1 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-19017) Increases the visibility of a remote function call retry
Igal Shilman created FLINK-19017: Summary: Increases the visibility of a remote function call retry Key: FLINK-19017 URL: https://issues.apache.org/jira/browse/FLINK-19017 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Igal Shilman Assignee: Igal Shilman Right now it is difficult to see if a remote function call is being retried; this issue aims to improve that situation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181848#comment-17181848 ] Matthias edited comment on FLINK-19005 at 8/21/20, 12:22 PM: - Hi [~gestevez], thanks for getting back to us. It looks like {{heap_dump_after_1_executions.zip}} contains the {{heap_dump_after_10_execution.zip}}. May you update the {{heap_dump_after_1_executions.zip}} attachment? was (Author: mapohl): Hi [~gestevez], thanks for getting back to us. It looks like {{heap_dump_after_10_executions.zip}} contains the {{heap_dump_after_1_execution.zip}}. May you update the {{heap_dump_after_10_executions.zip}} attachment? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot commented on pull request #13216: [FLINK-18999][table-planner-blink][hive] Temporary generic table does…
flinkbot commented on pull request #13216: URL: https://github.com/apache/flink/pull/13216#issuecomment-678262611 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 1df875a397cf4d7cf68e46f695a01ef6734b3af1 (Fri Aug 21 12:22:33 UTC 2020) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #13209: [FLINK-18832][datastream] Add compatible check for blocking partition with buffer timeout
flinkbot edited a comment on pull request #13209: URL: https://github.com/apache/flink/pull/13209#issuecomment-677744672 ## CI report: * 22f17388c9b9beba64aa8ab4bcd2c0ae6d344308 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5770) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #13206: [FLINK-18948][python] Add end to end test for Python DataStream API.
flinkbot edited a comment on pull request #13206: URL: https://github.com/apache/flink/pull/13206#issuecomment-677500353
## CI report:
* 5b6041fc76df9a5345ae672cd29e5e3b0be51f73
  * Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5774)
  * Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5764)
[GitHub] [flink] flinkbot edited a comment on pull request #13050: [FLINK-18750][table] SqlValidatorException thrown when select from a …
flinkbot edited a comment on pull request #13050: URL: https://github.com/apache/flink/pull/13050#issuecomment-667904442
## CI report:
* efe2b4b092cbce31dee74b4261ca7a20904b2000 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5766)
[jira] [Commented] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181848#comment-17181848 ]
Matthias commented on FLINK-19005:
--
Hi [~gestevez], thanks for getting back to us. It looks like {{heap_dump_after_10_executions.zip}} contains the {{heap_dump_after_1_execution.zip}}. Could you update the {{heap_dump_after_10_executions.zip}} attachment?

> used metaspace grow on every execution
> --
> Key: FLINK-19005
> URL: https://issues.apache.org/jira/browse/FLINK-19005
> Project: Flink
> Issue Type: Bug
> Components: API / DataSet, Client / Job Submission
> Affects Versions: 1.11.1
> Reporter: Guillermo Sánchez
> Assignee: Chesnay Schepler
> Priority: Major
> Attachments: heap_dump_after_10_executions.zip, heap_dump_after_1_execution.zip
>
> Hi!
> I'm running a 1.11.1 Flink cluster, where I execute batch jobs made with the DataSet API.
> I submit these jobs every day to calculate daily data.
> On every execution, the cluster's used metaspace increases by 7 MB and is never released.
> This ends up with an OutOfMemoryError caused by Metaspace every 15 days, and I need to restart the cluster to clean the metaspace.
> taskmanager.memory.jvm-metaspace.size is set to 512mb
> Any idea what could be causing this metaspace growth and why it is not released?
--
This message was sent by Atlassian Jira (v8.3.4#803005)
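The metaspace cap mentioned in the report is a cluster-level setting in flink-conf.yaml. A minimal sketch of the relevant entry (the 512m value is the one quoted in the issue; the comments are explanatory, not from the report):

```yaml
# flink-conf.yaml — JVM Metaspace limit for each TaskManager process.
# Class metadata from every submitted job is allocated here; if user-code
# classloaders are never garbage-collected, this pool grows with each
# submission until the JVM throws "OutOfMemoryError: Metaspace".
taskmanager.memory.jvm-metaspace.size: 512m
```

Raising the limit only delays the error if classes leak on every submission; the heap dumps attached to the issue are what allow tracking down which references keep the per-job classloaders alive.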
[jira] [Updated] (FLINK-18999) Temporary generic table doesn't work with HiveCatalog
[ https://issues.apache.org/jira/browse/FLINK-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-18999:
---
Labels: pull-request-available (was: )

> Temporary generic table doesn't work with HiveCatalog
> -
> Key: FLINK-18999
> URL: https://issues.apache.org/jira/browse/FLINK-18999
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Hive
> Reporter: Rui Li
> Assignee: Rui Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.12.0
>
> Suppose the current catalog is a {{HiveCatalog}}. If a user creates a temporary generic table, this table cannot be accessed in SQL queries and will hit an exception like:
> {noformat}
> Caused by: org.apache.hadoop.hive.metastore.api.NoSuchObjectException: DB.TBL table not found
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:55064) ~[hive-exec-2.3.4.jar:2.3.4]
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:55032) ~[hive-exec-2.3.4.jar:2.3.4]
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result.read(ThriftHiveMetastore.java:54963) ~[hive-exec-2.3.4.jar:2.3.4]
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[hive-exec-2.3.4.jar:2.3.4]
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563) ~[hive-exec-2.3.4.jar:2.3.4]
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550) ~[hive-exec-2.3.4.jar:2.3.4]
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1344) ~[hive-exec-2.3.4.jar:2.3.4]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
> at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169) ~[hive-exec-2.3.4.jar:2.3.4]
> at com.sun.proxy.$Proxy28.getTable(Unknown Source) ~[?:?]
> at org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper.getTable(HiveMetastoreClientWrapper.java:112) ~[flink-connector-hive_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.connectors.hive.HiveTableSource.initAllPartitions(HiveTableSource.java:415) ~[flink-connector-hive_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.connectors.hive.HiveTableSource.getDataStream(HiveTableSource.java:171) ~[flink-connector-hive_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.PhysicalLegacyTableSourceScan.getSourceTransformation(PhysicalLegacyTableSourceScan.scala:82) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacyTableSourceScan.translateToPlanInternal(StreamExecLegacyTableSourceScan.scala:98) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacyTableSourceScan.translateToPlanInternal(StreamExecLegacyTableSourceScan.scala:63) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacyTableSourceScan.translateToPlan(StreamExecLegacyTableSourceScan.scala:63) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToTransformation(StreamExecLegacySink.scala:158) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToPlanInternal(StreamExecLegacySink.scala:82) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToPlanInternal(StreamExecLegacySink.scala:48) >
[GitHub] [flink] lirui-apache opened a new pull request #13216: [FLINK-18999][table-planner-blink][hive] Temporary generic table does…
lirui-apache opened a new pull request #13216: URL: https://github.com/apache/flink/pull/13216
…n't work with HiveCatalog
## What is the purpose of the change
Fix the issue that a generic temporary table can't be used with a Hive catalog.
## Brief change log
- Avoid using the catalog table factory for temporary tables
- Add test cases
## Verifying this change
Existing and added test cases
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? N/A
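For context, the failure mode this PR addresses can be sketched in Flink SQL. The catalog, database, and table names below are illustrative, not taken from the PR; the sketch assumes the session's current catalog is a registered HiveCatalog:

```sql
-- Assume a HiveCatalog registered under the (hypothetical) name `myhive`
-- is made the current catalog of the session.
USE CATALOG myhive;

-- A temporary *generic* (non-Hive) table: it lives only in the session's
-- catalog manager and is never persisted to the Hive Metastore.
CREATE TEMPORARY TABLE tmp_src (
  id   INT,
  name STRING
) WITH (
  'connector' = 'datagen'
);

-- Before the fix, resolving this query went through the Hive catalog's
-- table factory, which looked the table up in the Metastore and failed
-- with NoSuchObjectException (see the stack trace in FLINK-18999).
SELECT id, name FROM tmp_src;
```

The change log above matches this reading: temporary tables should be planned with the generic table factory regardless of which catalog is current, since they are session-scoped and unknown to the Metastore.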
[GitHub] [flink] flinkbot edited a comment on pull request #13206: [FLINK-18948][python] Add end to end test for Python DataStream API.
flinkbot edited a comment on pull request #13206: URL: https://github.com/apache/flink/pull/13206#issuecomment-677500353
## CI report:
* 5b6041fc76df9a5345ae672cd29e5e3b0be51f73
  * Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5764)
  * Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5774)
[GitHub] [flink] flinkbot edited a comment on pull request #13212: [FLINK-18973][docs-zh] Translate the 'History Server' page of 'Debugging & Monitoring' into Chinese
flinkbot edited a comment on pull request #13212: URL: https://github.com/apache/flink/pull/13212#issuecomment-678069566
## CI report:
* 0371b8be3b5a368623bf8dc553f94fd57cf0c60e Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5765)