[GitHub] [flink-docker] zhuzhurk merged pull request #39: Update Dockerfiles for 1.10.2 release
zhuzhurk merged pull request #39: URL: https://github.com/apache/flink-docker/pull/39 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-docker] zhuzhurk merged pull request #38: Add GPG key for 1.10.2 release and update test script
zhuzhurk merged pull request #38: URL: https://github.com/apache/flink-docker/pull/38
[jira] [Comment Edited] (FLINK-19016) Checksum mismatch when restore from RocksDB
[ https://issues.apache.org/jira/browse/FLINK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181926#comment-17181926 ] Jiayi Liao edited comment on FLINK-19016 at 8/22/20, 4:30 AM: -- [~sewen] All of your assumptions are right. (BTW the local recovery is not enabled.) I can share my thoughts here but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which will invoke the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). Since Flink sets sync mode in the db's options, the CreateFile and its Append operation may not have synchronized the data to disk yet when uploading sst files to HDFS. (If using fsync mode, I think it should be guaranteed.) And the checksum mismatch problem is caused by a partial record (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/db/log_reader.cc#L176]). That's why I suspect that there's something wrong with the sync mode set by Flink. I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. was (Author: wind_ljy): [~sewen] All of your assumptions are right. (BTW the local recovery is not enabled.) I can share my thoughts here but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which will invoke the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). 
Since Flink sets sync mode in the db's options, the CreateFile and its Append operation may not succeed on disk. (If using fsync, I think it should be guaranteed.) And the checksum mismatch problem is caused by a partial record (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/db/log_reader.cc#L176]). That's why I suspect that there's something wrong with the sync mode set by Flink. I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. > Checksum mismatch when restore from RocksDB > --- > > Key: FLINK-19016 > URL: https://issues.apache.org/jira/browse/FLINK-19016 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > The error stack is shown below: > {code:java} > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for > KeyedMapBundleOperator_44cfc1ca74b40bb44eed1f38f72b3ea9_(71/300) from any of > the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 6 more > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught > unexpected exception. 
> at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:333) > at > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:580) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 8 more > Caused by: java.io.IOException: Error while opening RocksDB instance. > at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74) > at > org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214) > at >
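The partial-record failure mode discussed in the comment above is easy to reproduce outside RocksDB. The following is a minimal Python sketch (a hypothetical length+CRC framing, not RocksDB's actual WAL layout) showing how a record whose tail never reached disk surfaces as a checksum mismatch on restore:

```python
import struct
import zlib

def write_record(buf: bytearray, payload: bytes) -> None:
    # Frame: 4-byte length, 4-byte CRC32 of the payload, then the payload.
    buf += struct.pack("<II", len(payload), zlib.crc32(payload)) + payload

def read_records(data: bytes):
    # Yields (payload, ok). A record cut off mid-write shows up as a
    # short read or a CRC mismatch -- the "checksum mismatch" on restore.
    pos = 0
    while pos + 8 <= len(data):
        length, crc = struct.unpack_from("<II", data, pos)
        payload = data[pos + 8 : pos + 8 + length]
        if len(payload) < length:      # partial record: the tail is gone
            yield payload, False
            return
        yield payload, zlib.crc32(payload) == crc
        pos += 8 + length

log = bytearray()
for rec in (b"key1=value1", b"key2=value2"):
    write_record(log, rec)

# Simulate a crash (or an upload taken before the page cache was flushed):
# the last few bytes of the second record never made it to durable storage.
truncated = bytes(log[:-4])
results = list(read_records(truncated))
print(results[0][1], results[1][1])  # True False
```

The intact first record verifies, while the truncated second one fails its check, mirroring what frocksdb's log_reader reports when it hits a partial record.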
[jira] [Commented] (FLINK-18675) Checkpoint not maintaining minimum pause duration between checkpoints
[ https://issues.apache.org/jira/browse/FLINK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182236#comment-17182236 ] Dian Fu commented on FLINK-18675: - This seems to be a duplicate of FLINK-18856 and to have been addressed there. If this is the case, we should close this issue. [~raviratnakar] Could you help check whether this is the case? > Checkpoint not maintaining minimum pause duration between checkpoints > - > > Key: FLINK-18675 > URL: https://issues.apache.org/jira/browse/FLINK-18675 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.11.0 > Environment: !image.png! >Reporter: Ravi Bhushan Ratnakar >Priority: Critical > Fix For: 1.12.0, 1.11.2 > > Attachments: image.png > > > I am running a streaming job with Flink 1.11.0 using kubernetes > infrastructure. I have configured checkpointing as below > Interval - 3 minutes > Minimum pause between checkpoints - 3 minutes > Checkpoint timeout - 10 minutes > Checkpointing Mode - Exactly Once > Number of Concurrent Checkpoints - 1 > > Other configs > Time Characteristics - Processing Time > > I am observing an unusual behaviour. *When a checkpoint completes successfully* > *and its end-to-end duration is almost equal to or greater than the Minimum > pause duration, the next checkpoint gets triggered immediately without > maintaining the Minimum pause duration*. Kindly notice this behaviour from > checkpoint id 194 onward in the attached screenshot -- This message was sent by Atlassian Jira (v8.3.4#803005)
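The scheduling semantics the reporter expects can be sketched in a few lines (a simplified model for illustration, not Flink's CheckpointCoordinator code): the next checkpoint should start no earlier than one interval after the last trigger AND no earlier than the minimum pause after the last checkpoint finished, whichever is later.

```python
def next_trigger_time(last_trigger: float, last_completion: float,
                      interval: float, min_pause: float) -> float:
    # The minimum pause is measured from the *completion* of the previous
    # checkpoint, so a slow checkpoint must push the next trigger back.
    return max(last_trigger + interval, last_completion + min_pause)

# Checkpoint triggered at t=0 s, finished at t=180 s (end-to-end duration
# equal to the 180 s interval/min-pause).  The next checkpoint should wait
# until t=360 s, not fire immediately at t=180 s as the report describes.
print(next_trigger_time(0.0, 180.0, 180.0, 180.0))  # 360.0
```

With a fast checkpoint (say, completion at t=60 s) the same formula degenerates to the plain interval, `max(180, 240) = 240`, which matches the normal steady-state cadence.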
[jira] [Closed] (FLINK-18819) All the PubSub tests are failing
[ https://issues.apache.org/jira/browse/FLINK-18819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-18819. --- Fix Version/s: (was: 1.11.2) (was: 1.12.0) Resolution: Cannot Reproduce Closing this issue for now as it doesn't occur any more. > All the PubSub tests are failing > > > Key: FLINK-18819 > URL: https://issues.apache.org/jira/browse/FLINK-18819 > Project: Flink > Issue Type: Bug > Components: Connectors / Google Cloud PubSub, Tests >Affects Versions: 1.12.0, 1.11.2 >Reporter: Dian Fu >Priority: Critical > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5177&view=logs&j=08866332-78f7-59e4-4f7e-49a56faa3179&t=7f606211-1454-543c-70ab-c7a028a1ce8c > {code} > 2020-08-04T21:18:11.4228635Z 63538 [main] INFO > com.spotify.docker.client.DefaultDockerClient [] - Starting container with > Id: a40e3b941371ddd82f4efe0ec6f72b89f36868316e5d8cc52dacc64f0bc79029 > 2020-08-04T21:18:13.1868394Z [ERROR] Tests run: 3, Failures: 1, Errors: 2, > Skipped: 0, Time elapsed: 64.94 s <<< FAILURE! - in > org.apache.flink.streaming.connectors.gcp.pubsub.EmulatedPubSubSinkTest > 2020-08-04T21:18:13.1869974Z [ERROR] > org.apache.flink.streaming.connectors.gcp.pubsub.EmulatedPubSubSinkTest Time > elapsed: 64.938 s <<< FAILURE! 
> 2020-08-04T21:18:13.1870671Z java.lang.AssertionError: We expect 1 port to be > mapped expected:<1> but was:<0> > 2020-08-04T21:18:13.1871219Z at org.junit.Assert.fail(Assert.java:88) > 2020-08-04T21:18:13.1871693Z at > org.junit.Assert.failNotEquals(Assert.java:834) > 2020-08-04T21:18:13.1872167Z at > org.junit.Assert.assertEquals(Assert.java:645) > 2020-08-04T21:18:13.1872808Z at > org.apache.flink.streaming.connectors.gcp.pubsub.emulator.GCloudEmulatorManager.launchDocker(GCloudEmulatorManager.java:141) > 2020-08-04T21:18:13.1873573Z at > org.apache.flink.streaming.connectors.gcp.pubsub.emulator.GCloudUnitTestBase.launchGCloudEmulator(GCloudUnitTestBase.java:45) > 2020-08-04T21:18:13.1874205Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2020-08-04T21:18:13.1874742Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2020-08-04T21:18:13.1875353Z at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2020-08-04T21:18:13.1875912Z at > java.lang.reflect.Method.invoke(Method.java:498) > 2020-08-04T21:18:13.1876472Z at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > 2020-08-04T21:18:13.1877081Z at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > 2020-08-04T21:18:13.1877695Z at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > 2020-08-04T21:18:13.1878283Z at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > 2020-08-04T21:18:13.1878876Z at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > 2020-08-04T21:18:13.1879425Z at > org.junit.runners.ParentRunner.run(ParentRunner.java:363) > 2020-08-04T21:18:13.1880172Z at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > 2020-08-04T21:18:13.1880815Z at > 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > 2020-08-04T21:18:13.1881464Z at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > 2020-08-04T21:18:13.1882083Z at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > 2020-08-04T21:18:13.1882736Z at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > 2020-08-04T21:18:13.1883399Z at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > 2020-08-04T21:18:13.1883999Z at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > 2020-08-04T21:18:13.1884593Z at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > 2020-08-04T21:18:13.1884989Z > 2020-08-04T21:18:13.1885496Z [ERROR] > org.apache.flink.streaming.connectors.gcp.pubsub.EmulatedPubSubSinkTest Time > elapsed: 64.939 s <<< ERROR! > 2020-08-04T21:18:13.1886024Z java.lang.NullPointerException > 2020-08-04T21:18:13.1886588Z at > org.apache.flink.streaming.connectors.gcp.pubsub.EmulatedPubSubSinkTest.tearDown(EmulatedPubSubSinkTest.java:62) > 2020-08-04T21:18:13.1887193Z at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2020-08-04T21:18:13.1887864Z at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2020-08-04T21:18:13.1888483Z at >
[jira] [Commented] (FLINK-17274) Maven: Premature end of Content-Length delimited message body
[ https://issues.apache.org/jira/browse/FLINK-17274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182233#comment-17182233 ] Dian Fu commented on FLINK-17274: - https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5781&view=logs&j=f0ac5c25-1168-55a5-07ff-0e88223afed9&t=39a61cac-5c62-532f-d2c1-dea450a66708 > Maven: Premature end of Content-Length delimited message body > - > > Key: FLINK-17274 > URL: https://issues.apache.org/jira/browse/FLINK-17274 > Project: Flink > Issue Type: Bug > Components: Build System / Azure Pipelines >Reporter: Robert Metzger >Assignee: Robert Metzger >Priority: Critical > Labels: test-stability > Fix For: 1.12.0 > > > CI: > https://dev.azure.com/rmetzger/Flink/_build/results?buildId=7786&view=logs&j=52b61abe-a3cc-5bde-cc35-1bbe89bb7df5&t=54421a62-0c80-5aad-3319-094ff69180bb > {code} > [ERROR] Failed to execute goal on project > flink-connector-elasticsearch7_2.11: Could not resolve dependencies for > project > org.apache.flink:flink-connector-elasticsearch7_2.11:jar:1.11-SNAPSHOT: Could > not transfer artifact org.apache.lucene:lucene-sandbox:jar:8.3.0 from/to > alicloud-mvn-mirror > (http://mavenmirror.alicloud.dak8s.net:/repository/maven-central/): GET > request of: org/apache/lucene/lucene-sandbox/8.3.0/lucene-sandbox-8.3.0.jar > from alicloud-mvn-mirror failed: Premature end of Content-Length delimited > message body (expected: 289920; received: 239832) -> [Help 1] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
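The Maven error above is the generic symptom of an HTTP response body arriving shorter than its declared Content-Length. A minimal sketch of the check behind that message (illustrative only, not Maven Wagon's actual code):

```python
def verify_download(expected_length: int, body: bytes) -> None:
    # An HTTP client counts bytes against the Content-Length header; if
    # the connection closes early, the body is truncated and the client
    # reports a premature end rather than silently keeping a partial jar.
    if len(body) < expected_length:
        raise IOError(
            f"Premature end of Content-Length delimited message body "
            f"(expected: {expected_length}; received: {len(body)})")

# Reproduce the failure with the byte counts from the log above.
try:
    verify_download(289920, b"\x00" * 239832)
except IOError as e:
    print(e)
```

This is also why a simple retry usually clears the failure: the truncated transfer is rejected outright instead of being cached as a corrupt artifact.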
[jira] [Commented] (FLINK-16947) ArtifactResolutionException: Could not transfer artifact. Entry [...] has not been leased from this pool
[ https://issues.apache.org/jira/browse/FLINK-16947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182232#comment-17182232 ] Dian Fu commented on FLINK-16947: - https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5781&view=logs&j=f450c1a5-64b1-5955-e215-49cb1ad5ec88&t=61b612c5-1c7d-5289-27c2-71f332ae98d7 > ArtifactResolutionException: Could not transfer artifact. Entry [...] has > not been leased from this pool > - > > Key: FLINK-16947 > URL: https://issues.apache.org/jira/browse/FLINK-16947 > Project: Flink > Issue Type: Bug > Components: Build System / Azure Pipelines >Reporter: Piotr Nowojski >Assignee: Robert Metzger >Priority: Blocker > Labels: test-stability > Fix For: 1.12.0 > > > https://dev.azure.com/rmetzger/Flink/_build/results?buildId=6982&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5 > Build of flink-metrics-availability-test failed with: > {noformat} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (end-to-end-tests) > on project flink-metrics-availability-test: Unable to generate classpath: > org.apache.maven.artifact.resolver.ArtifactResolutionException: Could not > transfer artifact org.apache.maven.surefire:surefire-grouper:jar:2.22.1 > from/to google-maven-central > (https://maven-central-eu.storage-download.googleapis.com/maven2/): Entry > [id:13][route:{s}->https://maven-central-eu.storage-download.googleapis.com:443][state:null] > has not been leased from this pool > [ERROR] org.apache.maven.surefire:surefire-grouper:jar:2.22.1 > [ERROR] > [ERROR] from the specified remote repositories: > [ERROR] google-maven-central > (https://maven-central-eu.storage-download.googleapis.com/maven2/, > releases=true, snapshots=false), > [ERROR] apache.snapshots (https://repository.apache.org/snapshots, > releases=false, snapshots=true) > [ERROR] Path to dependency: > [ERROR] 1) dummy:dummy:jar:1.0 > [ERROR] 2) 
org.apache.maven.surefire:surefire-junit47:jar:2.22.1 > [ERROR] 3) org.apache.maven.surefire:common-junit48:jar:2.22.1 > [ERROR] 4) org.apache.maven.surefire:surefire-grouper:jar:2.22.1 > [ERROR] -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :flink-metrics-availability-test > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-18634) FlinkKafkaProducerITCase.testRecoverCommittedTransaction failed with "Timeout expired after 60000milliseconds while awaiting InitProducerId"
[ https://issues.apache.org/jira/browse/FLINK-18634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182231#comment-17182231 ] Dian Fu commented on FLINK-18634: - https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5780&view=logs&j=1fc6e7bf-633c-5081-c32a-9dea24b05730&t=80a658d1-f7f6-5d93-2758-53ac19fd5b19 > FlinkKafkaProducerITCase.testRecoverCommittedTransaction failed with "Timeout > expired after 60000milliseconds while awaiting InitProducerId" > > > Key: FLINK-18634 > URL: https://issues.apache.org/jira/browse/FLINK-18634 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Tests >Affects Versions: 1.11.0 >Reporter: Dian Fu >Priority: Major > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=4590&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=684b1416-4c17-504e-d5ab-97ee44e08a20 > {code} > 2020-07-17T11:43:47.9693015Z [ERROR] Tests run: 12, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 269.399 s <<< FAILURE! - in > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase > 2020-07-17T11:43:47.9693862Z [ERROR] > testRecoverCommittedTransaction(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase) > Time elapsed: 60.679 s <<< ERROR! > 2020-07-17T11:43:47.9694737Z org.apache.kafka.common.errors.TimeoutException: > org.apache.kafka.common.errors.TimeoutException: Timeout expired after > 60000milliseconds while awaiting InitProducerId > 2020-07-17T11:43:47.9695376Z Caused by: > org.apache.kafka.common.errors.TimeoutException: Timeout expired after > 60000milliseconds while awaiting InitProducerId > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-18634) FlinkKafkaProducerITCase.testRecoverCommittedTransaction failed with "Timeout expired after 60000milliseconds while awaiting InitProducerId"
[ https://issues.apache.org/jira/browse/FLINK-18634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182231#comment-17182231 ] Dian Fu edited comment on FLINK-18634 at 8/22/20, 2:46 AM: --- Instance on master: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5780&view=logs&j=1fc6e7bf-633c-5081-c32a-9dea24b05730&t=80a658d1-f7f6-5d93-2758-53ac19fd5b19 was (Author: dian.fu): https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5780&view=logs&j=1fc6e7bf-633c-5081-c32a-9dea24b05730&t=80a658d1-f7f6-5d93-2758-53ac19fd5b19 > FlinkKafkaProducerITCase.testRecoverCommittedTransaction failed with "Timeout > expired after 60000milliseconds while awaiting InitProducerId" > > > Key: FLINK-18634 > URL: https://issues.apache.org/jira/browse/FLINK-18634 > Project: Flink > Issue Type: Bug > Components: Connectors / Kafka, Tests >Affects Versions: 1.11.0 >Reporter: Dian Fu >Priority: Major > Labels: test-stability > > https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=4590&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=684b1416-4c17-504e-d5ab-97ee44e08a20 > {code} > 2020-07-17T11:43:47.9693015Z [ERROR] Tests run: 12, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 269.399 s <<< FAILURE! - in > org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase > 2020-07-17T11:43:47.9693862Z [ERROR] > testRecoverCommittedTransaction(org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerITCase) > Time elapsed: 60.679 s <<< ERROR! > 2020-07-17T11:43:47.9694737Z org.apache.kafka.common.errors.TimeoutException: > org.apache.kafka.common.errors.TimeoutException: Timeout expired after > 60000milliseconds while awaiting InitProducerId > 2020-07-17T11:43:47.9695376Z Caused by: > org.apache.kafka.common.errors.TimeoutException: Timeout expired after > 60000milliseconds while awaiting InitProducerId > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-19012) E2E test fails with "Cannot register Closeable, this subtaskCheckpointCoordinator is already closed. Closing argument."
[ https://issues.apache.org/jira/browse/FLINK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182230#comment-17182230 ] Dian Fu commented on FLINK-19012: - Another instance on master: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=5780&view=logs&j=739e6eac-8312-5d31-d437-294c4d26fced&t=a68b8d89-50e9-5977-4500-f4fde4f57f9b > E2E test fails with "Cannot register Closeable, this > subtaskCheckpointCoordinator is already closed. Closing argument." > --- > > Key: FLINK-19012 > URL: https://issues.apache.org/jira/browse/FLINK-19012 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing, Runtime / Task, Tests >Affects Versions: 1.12.0 >Reporter: Robert Metzger >Priority: Major > Labels: test-stability > > Note: This error occurred in a custom branch with unreviewed changes. I don't > believe my changes affect this error, but I would keep this in mind when > investigating the error: > https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8307&view=logs&j=1f3ed471-1849-5d3c-a34c-19792af4ad16&t=0d2e35fc-a330-5cf2-a012-7267e2667b1d > > {code} > 2020-08-20T20:55:30.2400645Z 2020-08-20 20:55:22,373 INFO > org.apache.flink.runtime.taskmanager.Task[] - Registering > task at network: Source: Sequence Source -> Flat Map -> Sink: Unnamed (1/1) > (cbc357ccb763df2852fee8c4fc7d55f2_0_0) [DEPLOYING]. > 2020-08-20T20:55:30.2402392Z 2020-08-20 20:55:22,401 INFO > org.apache.flink.streaming.runtime.tasks.StreamTask [] - No state > backend has been configured, using default (Memory / JobManager) > MemoryStateBackend (data in heap memory / checkpoints to JobManager) > (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: > 5242880) > 2020-08-20T20:55:30.2404297Z 2020-08-20 20:55:22,413 INFO > org.apache.flink.runtime.taskmanager.Task[] - Source: > Sequence Source -> Flat Map -> Sink: Unnamed (1/1) > (cbc357ccb763df2852fee8c4fc7d55f2_0_0) switched from DEPLOYING to RUNNING. 
> 2020-08-20T20:55:30.2405805Z 2020-08-20 20:55:22,786 INFO > org.apache.flink.streaming.connectors.elasticsearch6.Elasticsearch6ApiCallBridge > [] - Pinging Elasticsearch cluster via hosts [http://127.0.0.1:9200] ... > 2020-08-20T20:55:30.2407027Z 2020-08-20 20:55:22,848 INFO > org.apache.flink.streaming.connectors.elasticsearch6.Elasticsearch6ApiCallBridge > [] - Elasticsearch RestHighLevelClient is connected to > [http://127.0.0.1:9200] > 2020-08-20T20:55:30.2409277Z 2020-08-20 20:55:29,205 INFO > org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestExecutorImpl > [] - Source: Sequence Source -> Flat Map -> Sink: Unnamed (1/1) discarding 0 > drained requests > 2020-08-20T20:55:30.2410690Z 2020-08-20 20:55:29,218 INFO > org.apache.flink.runtime.taskmanager.Task[] - Source: > Sequence Source -> Flat Map -> Sink: Unnamed (1/1) > (cbc357ccb763df2852fee8c4fc7d55f2_0_0) switched from RUNNING to FINISHED. > 2020-08-20T20:55:30.2412187Z 2020-08-20 20:55:29,218 INFO > org.apache.flink.runtime.taskmanager.Task[] - Freeing > task resources for Source: Sequence Source -> Flat Map -> Sink: Unnamed (1/1) > (cbc357ccb763df2852fee8c4fc7d55f2_0_0). > 2020-08-20T20:55:30.2414203Z 2020-08-20 20:55:29,224 INFO > org.apache.flink.runtime.taskexecutor.TaskExecutor [] - > Un-registering task and sending final execution state FINISHED to JobManager > for task Source: Sequence Source -> Flat Map -> Sink: Unnamed (1/1) > cbc357ccb763df2852fee8c4fc7d55f2_0_0. > 2020-08-20T20:55:30.2415602Z 2020-08-20 20:55:29,219 INFO > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - Source: > Sequence Source -> Flat Map -> Sink: Unnamed (1/1) - asynchronous part of > checkpoint 1 could not be completed. > 2020-08-20T20:55:30.2416411Z java.io.UncheckedIOException: > java.io.IOException: Cannot register Closeable, this > subtaskCheckpointCoordinator is already closed. Closing argument. 
> 2020-08-20T20:55:30.2418956Z at > org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.lambda$registerConsumer$2(SubtaskCheckpointCoordinatorImpl.java:468) > ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > 2020-08-20T20:55:30.2420100Z at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:91) > [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT] > 2020-08-20T20:55:30.2420927Z at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_265] > 2020-08-20T20:55:30.2421455Z at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_265] >
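The exception in this report comes from a close/register race: the asynchronous part of the checkpoint tries to register a resource with a coordinator that a faster task shutdown has already closed. A simplified sketch of that guard pattern (illustrative only, not Flink's actual SubtaskCheckpointCoordinatorImpl):

```python
import io
import threading

class CloseableRegistry:
    # Sketch of the guard behind "Cannot register Closeable, this
    # subtaskCheckpointCoordinator is already closed. Closing argument."
    def __init__(self):
        self._lock = threading.Lock()
        self._closed = False
        self._resources = []

    def register(self, resource):
        with self._lock:
            if self._closed:
                # Too late: close the argument itself so it doesn't leak,
                # then reject the registration ("Closing argument.").
                resource.close()
                raise IOError("Cannot register Closeable: already closed. "
                              "Closing argument.")
            self._resources.append(resource)

    def close(self):
        with self._lock:
            self._closed = True
            for r in self._resources:
                r.close()
            self._resources.clear()

registry = CloseableRegistry()
registry.close()  # the task finished and freed its resources first
try:
    registry.register(io.BytesIO())  # async checkpoint thread arrives late
except IOError as e:
    print(e)
```

Closing the rejected argument instead of leaking it is the point of the pattern: the late-arriving thread's resource is still cleaned up even though the registration fails.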
[jira] [Updated] (FLINK-18855) Translate the "Cluster Execution" page of "Application Development's DataSet API" into Chinese
[ https://issues.apache.org/jira/browse/FLINK-18855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated FLINK-18855: Fix Version/s: 1.12.0 > Translate the "Cluster Execution" page of "Application Development's DataSet > API" into Chinese > -- > > Key: FLINK-18855 > URL: https://issues.apache.org/jira/browse/FLINK-18855 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.10.1, 1.11.0, 1.11.1 >Reporter: Roc Marshal >Assignee: Roc Marshal >Priority: Major > Labels: documentation, pull-request-available > Fix For: 1.12.0 > > > The markdown file is located in *flink/docs/dev/cluster_execution.zh.md* > The page url is > https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/dev/cluster_execution.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (FLINK-18855) Translate the "Cluster Execution" page of "Application Development's DataSet API" into Chinese
[ https://issues.apache.org/jira/browse/FLINK-18855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-18855. --- Resolution: Fixed master: 5500435372cdf258e4f0a646089c85389e25e94d > Translate the "Cluster Execution" page of "Application Development's DataSet > API" into Chinese > -- > > Key: FLINK-18855 > URL: https://issues.apache.org/jira/browse/FLINK-18855 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.10.1, 1.11.0, 1.11.1 >Reporter: Roc Marshal >Assignee: Roc Marshal >Priority: Major > Labels: documentation, pull-request-available > > The markdown file is located in *flink/docs/dev/cluster_execution.zh.md* > The page url is > https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/dev/cluster_execution.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] dianfu closed pull request #13173: [FLINK-18855][docs-zh] Translate the "Cluster Execution" page of "Application Development's DataSet API" into Chinese
dianfu closed pull request #13173: URL: https://github.com/apache/flink/pull/13173
[GitHub] [flink] dianfu commented on a change in pull request #13173: [FLINK-18855][docs-zh] Translate the "Cluster Execution" page of "Application Development's DataSet API" into Chinese
dianfu commented on a change in pull request #13173: URL: https://github.com/apache/flink/pull/13173#discussion_r475030431 ## File path: docs/dev/cluster_execution.zh.md ## @@ -25,27 +25,27 @@ under the License. * This will be replaced by the TOC {:toc} -Flink programs can run distributed on clusters of many machines. There -are two ways to send a program to a cluster for execution: +Flink 程序可以分布式运行在多机器集群上。有两种方式可以将程序提交到集群上执行: -## Command Line Interface + -The command line interface lets you submit packaged programs (JARs) to a cluster -(or single machine setup). +## 命令行界面(Interface) -Please refer to the [Command Line Interface]({{ site.baseurl }}/ops/cli.html) documentation for -details. +命令行界面使你可以将打包的程序(JARs)提交到集群(或单机设置)。 -## Remote Environment +有关详细信息,请参阅[命令行界面]({% link ops/cli.zh.md %})文档。 -The remote environment lets you execute Flink Java programs on a cluster -directly. The remote environment points to the cluster on which you want to -execute the program. + + +## 远程环境(Remote Environment) + +远程环境使你可以直接在集群上执行 Flink Java 程序。远程环境指向你要执行程序的群集。 Review comment: 群集 -> 集群 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] RocMarshal commented on pull request #13172: [FLINK-18854][docs-zh] Translate the 'API Migration Guides' page of 'Application Development' into Chinese
RocMarshal commented on pull request #13172: URL: https://github.com/apache/flink/pull/13172#issuecomment-678572640 @Thesharing Could you help me review this PR when you have free time? Thank you.
[GitHub] [flink] RocMarshal commented on pull request #13173: [FLINK-18855][docs-zh] Translate the "Cluster Execution" page of "Application Development's DataSet API" into Chinese
RocMarshal commented on pull request #13173: URL: https://github.com/apache/flink/pull/13173#issuecomment-678571817 @dianfu Could you help me merge this PR if there is nothing inappropriate? Thank you.
[jira] [Closed] (FLINK-18813) Translate the 'Local Installation' page of 'Try Flink' into Chinese
[ https://issues.apache.org/jira/browse/FLINK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-18813. --- Fix Version/s: 1.12.0 Resolution: Fixed master: c9e46694fcf0e84c7b5d9428a31a77683ec0170d > Translate the 'Local Installation' page of 'Try Flink' into Chinese > --- > > Key: FLINK-18813 > URL: https://issues.apache.org/jira/browse/FLINK-18813 > Project: Flink > Issue Type: Task > Components: Documentation >Affects Versions: 1.10.1, 1.11.0, 1.11.1 >Reporter: Roc Marshal >Assignee: Roc Marshal >Priority: Major > Labels: documentaion, pull-request-available > Fix For: 1.12.0 > > > The page url is > [https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/try-flink/local_installation.html] > The markdown file is located in > *flink/docs/try-flink/local_installation.zh.md* -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] dianfu closed pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese
dianfu closed pull request #13089: URL: https://github.com/apache/flink/pull/13089
[GitHub] [flink] dianfu commented on pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese
dianfu commented on pull request #13089: URL: https://github.com/apache/flink/pull/13089#issuecomment-678570567 Merging...
[GitHub] [flink] RocMarshal commented on a change in pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese
RocMarshal commented on a change in pull request #13089: URL: https://github.com/apache/flink/pull/13089#discussion_r475024133 ## File path: docs/try-flink/local_installation.zh.md ## @@ -26,36 +26,35 @@ under the License. {% if site.version contains "SNAPSHOT" %} - NOTE: The Apache Flink community only publishes official builds for - released versions of Apache Flink. + 注意:Apache Flink 社区只发布 Apache Flink 的 release 版本。 - Since you are currently looking at the latest SNAPSHOT - version of the documentation, all version references below will not work. - Please switch the documentation to the latest released version via the release picker which you - find on the left side below the menu. + 由于你当前正在查看的是文档最新的 SNAPSHOT 版本,因此相关内容会被隐藏。请通过左侧菜单底部的版本选择将文档切换到最新的 release 版本。 {% else %} -Follow these few steps to download the latest stable versions and get started. +请按照以下几个步骤下载最新的稳定版本开始使用。 -## Step 1: Download + Review comment: @dianfu See item `4. 标题锚点链接` (heading anchor links) on this page: `https://cwiki.apache.org/confluence/display/FLINK/Flink+Translation+Specifications`
[GitHub] [flink] dianfu commented on a change in pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese
dianfu commented on a change in pull request #13089: URL: https://github.com/apache/flink/pull/13089#discussion_r474762668 ## File path: docs/try-flink/local_installation.zh.md ## @@ -26,36 +26,35 @@ under the License. {% if site.version contains "SNAPSHOT" %} - NOTE: The Apache Flink community only publishes official builds for - released versions of Apache Flink. + 注意:Apache Flink 社区只发布 Apache Flink 的 release 版本。 - Since you are currently looking at the latest SNAPSHOT - version of the documentation, all version references below will not work. - Please switch the documentation to the latest released version via the release picker which you - find on the left side below the menu. + 由于你当前正在查看的是文档最新的 SNAPSHOT 版本,因此相关内容会被隐藏。请通过左侧菜单底部的版本选择将文档切换到最新的 release 版本。 {% else %} -Follow these few steps to download the latest stable versions and get started. +请按照以下几个步骤下载最新的稳定版本开始使用。 -## Step 1: Download + Review comment: Just wondering why we need this line?
[GitHub] [flink] flinkbot edited a comment on pull request #13175: [FLINK-18955][Checkpointing] Add checkpoint path to job startup/restore message
flinkbot edited a comment on pull request #13175: URL: https://github.com/apache/flink/pull/13175#issuecomment-674839926 ## CI report: * 78f6ef8e0aa36ddc08ecd4a6984ec465c0192ec7 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5779) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink-statefun] sjwiesman commented on pull request #132: [FLINK-18979][docs] Update docs to better emphasize remote functions
sjwiesman commented on pull request #132: URL: https://github.com/apache/flink-statefun/pull/132#issuecomment-678432084 Thanks for the review @morsapaes. I've addressed your feedback and also updated the READMEs in a separate commit. I'm going on vacation next week, so I went ahead and squashed my fixup commits.
[GitHub] [flink-statefun] sjwiesman commented on a change in pull request #132: [FLINK-18979][docs] Update docs to better emphasize remote functions
sjwiesman commented on a change in pull request #132: URL: https://github.com/apache/flink-statefun/pull/132#discussion_r474857047 ## File path: docs/index.md ## @@ -23,16 +23,19 @@ specific language governing permissions and limitations under the License. --> -**Stateful Functions** is an open source framework that reduces the complexity of building and orchestrating distributed stateful applications at scale. -It brings together the benefits of stream processing with Apache Flink® and Function-as-a-Service (FaaS) to provide a powerful abstraction for the next generation of event-driven architectures. +Stateful Functions is an API that simplifies the building of **distributed stateful applications** with a **runtime built for serverless architectures**. +It brings together the benefits of stateful stream processing, the processing of large datasets with low latency and bounded resource constraints, Review comment: Hmm, yeah I'll try something
[GitHub] [flink] morsapaes commented on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API
morsapaes commented on pull request #13203: URL: https://github.com/apache/flink/pull/13203#issuecomment-678426366
[GitHub] [flink] sjwiesman commented on pull request #13203: [FLINK-18984][python][docs] Add tutorial documentation for Python DataStream API
sjwiesman commented on pull request #13203: URL: https://github.com/apache/flink/pull/13203#issuecomment-678424666 I'm going to be on vacation for the next week, maybe @morsapaes can take a look at this?
[GitHub] [flink] flinkbot edited a comment on pull request #13175: [FLINK-18955][Checkpointing] Add checkpoint path to job startup/restore message
flinkbot edited a comment on pull request #13175: URL: https://github.com/apache/flink/pull/13175#issuecomment-674839926 ## CI report: * 635f839466122e36674a38a9845c2d6b5eb5c244 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5735) * 78f6ef8e0aa36ddc08ecd4a6984ec465c0192ec7 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5779)
[GitHub] [flink] flinkbot edited a comment on pull request #13217: [FLINK-16866] Make job submission non-blocking
flinkbot edited a comment on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678285884 ## CI report: * 85c9bbac2d9ca07435fbd80d76208fa2a51e5d37 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5776)
[GitHub] [flink] flinkbot edited a comment on pull request #13175: [FLINK-18955][Checkpointing] Add checkpoint path to job startup/restore message
flinkbot edited a comment on pull request #13175: URL: https://github.com/apache/flink/pull/13175#issuecomment-674839926 ## CI report: * 635f839466122e36674a38a9845c2d6b5eb5c244 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5735) * 78f6ef8e0aa36ddc08ecd4a6984ec465c0192ec7 UNKNOWN
[GitHub] [flink] curcur commented on pull request #13175: [FLINK-18955][Checkpointing] Add checkpoint path to job startup/restore message
curcur commented on pull request #13175: URL: https://github.com/apache/flink/pull/13175#issuecomment-678391416 With the updated version: ``` 9213 [flink-akka.actor.default-dispatcher-3] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Restoring job 49eabb33b8dfc353a4ca205f5f02b118 from Checkpoint 1 @ 1598017016795 for 49eabb33b8dfc353a4ca205f5f02b118 located at file:/var/folders/dm/5xn_h6n9135dwy4j27sr65zhgp/T/junit3638159208308156331/junit7166739105409262417/checkpoints/49eabb33b8dfc353a4ca205f5f02b118/chk-1. ```
[GitHub] [flink] curcur commented on a change in pull request #13175: [FLINK-18955][Checkpointing] Add checkpoint path to job startup/restore message
curcur commented on a change in pull request #13175: URL: https://github.com/apache/flink/pull/13175#discussion_r474816716 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CompletedCheckpoint.java ## @@ -320,6 +320,12 @@ void setDiscardCallback(@Nullable CompletedCheckpointStats.DiscardCallback disca @Override public String toString() { - return String.format("Checkpoint %d @ %d for %s", checkpointID, timestamp, job); + return String.format( + "%s %d @ %d for %s located at %s", + props.getCheckpointType(), Review comment: updated :-) I can double-check whether SYNC_SAVEPOINT is user-aware or not.
[GitHub] [flink] curcur commented on a change in pull request #13175: [FLINK-18955][Checkpointing] Add checkpoint path to job startup/restore message
curcur commented on a change in pull request #13175: URL: https://github.com/apache/flink/pull/13175#discussion_r474816716 ## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CompletedCheckpoint.java ## @@ -320,6 +320,12 @@ void setDiscardCallback(@Nullable CompletedCheckpointStats.DiscardCallback disca @Override public String toString() { - return String.format("Checkpoint %d @ %d for %s", checkpointID, timestamp, job); + return String.format( + "%s %d @ %d for %s located at %s", + props.getCheckpointType(), Review comment: updated :-) Do you mean SYNC_SAVEPOINT is not user-aware? I do not know...
[jira] [Closed] (FLINK-18993) Invoke sanityCheckTotalFlinkMemory method incorrectly in JobManagerFlinkMemoryUtils.java
[ https://issues.apache.org/jira/browse/FLINK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-18993. - Resolution: Fixed Fixed via master: df525b77d29ccd89649a64e5faad96c93f61ca08 1.11.2: 5e9bd610f3f1cb62e00c7baf1f02409c2d017c05 > Invoke sanityCheckTotalFlinkMemory method incorrectly in > JobManagerFlinkMemoryUtils.java > > > Key: FLINK-18993 > URL: https://issues.apache.org/jira/browse/FLINK-18993 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration >Affects Versions: 1.11.0 >Reporter: Peng >Assignee: Peng >Priority: Critical > Labels: pull-request-available > Fix For: 1.12.0, 1.11.2 > > > In deriveFromRequiredFineGrainedOptions method from > JobManagerFlinkMemoryUtils.java, it will call sanityCheckTotalFlinkMemory > method to check heap, off-heap and total memory as below. > {code:java} > sanityCheckTotalFlinkMemory(totalFlinkMemorySize, jvmHeapMemorySize, > totalFlinkMemorySize); > {code} > As I understand it, the third argument should be > {color:#de350b}*offHeapMemorySize.*{color} > {code:java} > sanityCheckTotalFlinkMemory(totalFlinkMemorySize, jvmHeapMemorySize, > offHeapMemorySize); > {code} >
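The bug in FLINK-18993 can be seen with a minimal sketch. The method below is a simplified, hypothetical stand-in for Flink's actual `sanityCheckTotalFlinkMemory` (the real method takes memory-size objects and throws on mismatch); it only illustrates why passing `totalFlinkMemorySize` as the third argument makes the check meaningless:

```java
// Illustrative sketch only: a sanity check requiring that total Flink
// memory equals JVM heap plus off-heap memory. Names and the boolean
// return type are assumptions for this example, not Flink's real API.
public class MemorySanityCheck {
    public static boolean sanityCheckTotalFlinkMemory(long total, long heap, long offHeap) {
        return total == heap + offHeap;
    }

    public static void main(String[] args) {
        long heap = 600, offHeap = 424, total = 1024;
        // Correct invocation: passes because the components add up.
        System.out.println(sanityCheckTotalFlinkMemory(total, heap, offHeap)); // true
        // Buggy invocation from the report: the third argument is the total
        // itself, so the check compares total against heap + total and fails
        // whenever heap > 0.
        System.out.println(sanityCheckTotalFlinkMemory(total, heap, total)); // false
    }
}
```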
[GitHub] [flink] tillrohrmann closed pull request #13198: [FLINK-18993][Runtime]Invoke sanityCheckTotalFlinkMemory method incorrectly in JobManagerFlinkMemoryUtils.java
tillrohrmann closed pull request #13198: URL: https://github.com/apache/flink/pull/13198
[jira] [Assigned] (FLINK-18993) Invoke sanityCheckTotalFlinkMemory method incorrectly in JobManagerFlinkMemoryUtils.java
[ https://issues.apache.org/jira/browse/FLINK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann reassigned FLINK-18993: - Assignee: Peng > Invoke sanityCheckTotalFlinkMemory method incorrectly in > JobManagerFlinkMemoryUtils.java > > > Key: FLINK-18993 > URL: https://issues.apache.org/jira/browse/FLINK-18993 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration >Affects Versions: 1.11.0 >Reporter: Peng >Assignee: Peng >Priority: Critical > Labels: pull-request-available > Fix For: 1.12.0, 1.11.2 > > > In deriveFromRequiredFineGrainedOptions method from > JobManagerFlinkMemoryUtils.java, it will call sanityCheckTotalFlinkMemory > method to check heap, off-heap and total memory as below. > {code:java} > sanityCheckTotalFlinkMemory(totalFlinkMemorySize, jvmHeapMemorySize, > totalFlinkMemorySize); > {code} > As I understand it, the third argument should be > {color:#de350b}*offHeapMemorySize.*{color} > {code:java} > sanityCheckTotalFlinkMemory(totalFlinkMemorySize, jvmHeapMemorySize, > offHeapMemorySize); > {code} >
[jira] [Updated] (FLINK-18993) Invoke sanityCheckTotalFlinkMemory method incorrectly in JobManagerFlinkMemoryUtils.java
[ https://issues.apache.org/jira/browse/FLINK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-18993: -- Fix Version/s: 1.11.2 1.12.0 > Invoke sanityCheckTotalFlinkMemory method incorrectly in > JobManagerFlinkMemoryUtils.java > > > Key: FLINK-18993 > URL: https://issues.apache.org/jira/browse/FLINK-18993 > Project: Flink > Issue Type: Bug > Components: Runtime / Configuration >Affects Versions: 1.11.0 >Reporter: Peng >Priority: Critical > Labels: pull-request-available > Fix For: 1.12.0, 1.11.2 > > > In deriveFromRequiredFineGrainedOptions method from > JobManagerFlinkMemoryUtils.java, it will call sanityCheckTotalFlinkMemory > method to check heap, off-heap and total memory as below. > {code:java} > sanityCheckTotalFlinkMemory(totalFlinkMemorySize, jvmHeapMemorySize, > totalFlinkMemorySize); > {code} > As I understand it, the third argument should be > {color:#de350b}*offHeapMemorySize.*{color} > {code:java} > sanityCheckTotalFlinkMemory(totalFlinkMemorySize, jvmHeapMemorySize, > offHeapMemorySize); > {code} >
[jira] [Commented] (FLINK-19011) Parallelize the restore operation in OperatorStateBackend
[ https://issues.apache.org/jira/browse/FLINK-19011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181981#comment-17181981 ] Jiayi Liao commented on FLINK-19011: [~sewen] Mmm... probably you're right, because we have also found other problems caused by union state, like full GC on the JobManager during failover. But I think union state is still useful in some cases. For example, we store the watermark as a value state so that the job can recover with a correct watermark. I'd vote for dropping it if we can find a better replacement for union state. > Parallelize the restore operation in OperatorStateBackend > -- > > Key: FLINK-19011 > URL: https://issues.apache.org/jira/browse/FLINK-19011 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > To restore the states, union state needs to read state handles produced by > all operators. And currently during the restore operation, Flink iterates the > state handles one by one, which could last tens of minutes if the magnitude > of state handles exceeds ten thousand. > To accelerate the process, I propose to parallelize the random reads on HDFS > and deserialization. We can create a runnable for each state handle and let > it return the metadata and deserialized data, which can be aggregated in main > thread.
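The proposal in FLINK-19011 — one task per state handle, reading and deserializing in parallel, with the results aggregated on the main thread — can be sketched roughly as below. This is an illustrative standalone sketch, not Flink's restore code: `readHandle` stands in for the HDFS read plus deserialization, and integers stand in for state handles and their deserialized data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of parallelized state-handle restore: submit one task per handle,
// then collect results on the calling thread, preserving handle order.
public class ParallelRestoreSketch {
    // Stand-in for reading a state handle from HDFS and deserializing it.
    public static Integer readHandle(int handleId) {
        return handleId * 2;
    }

    public static List<Integer> restoreAll(List<Integer> handleIds, int parallelism)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int id : handleIds) {
                futures.add(pool.submit(() -> readHandle(id)));
            }
            // Aggregate on the main thread, as the issue proposes.
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(restoreAll(List.of(1, 2, 3), 2)); // [2, 4, 6]
    }
}
```

The slow random reads overlap across threads while the single-threaded aggregation keeps the restore logic deterministic.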
[jira] [Comment Edited] (FLINK-19016) Checksum mismatch when restore from RocksDB
[ https://issues.apache.org/jira/browse/FLINK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181926#comment-17181926 ] Jiayi Liao edited comment on FLINK-19016 at 8/21/20, 4:26 PM: -- [~sewen] All of your assumptions are right. (BTW, local recovery is not enabled.) I can share my thoughts here, but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which in turn invokes the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). Since Flink sets sync mode in the db's options, the CreateFile and its Append operation may not have synced the data to disk yet. (If using fsync, I think it should be guaranteed.) And the checksum mismatch problem is caused by a partial record (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/db/log_reader.cc#L176]). That's why I suspect that there's something wrong with the sync mode set by Flink. I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. was (Author: wind_ljy): [~sewen] All of your assumptions are right. (BTW, local recovery is not enabled.) I can share my thoughts here, but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which in turn invokes the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). Since Flink sets sync mode in the db's options, the CreateFile and its Append operation may not have synced the data to disk yet. 
(If using fsync, I think it should be guaranteed.) I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. > Checksum mismatch when restore from RocksDB > --- > > Key: FLINK-19016 > URL: https://issues.apache.org/jira/browse/FLINK-19016 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > The error stack is shown below: > {code:java} > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for > KeyedMapBundleOperator_44cfc1ca74b40bb44eed1f38f72b3ea9_(71/300) from any of > the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 6 more > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught > unexpected exception. > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:333) > at > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:580) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 8 more > Caused by: java.io.IOException: Error while opening RocksDB instance. 
> at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74) > at > org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162) > at >
[jira] [Comment Edited] (FLINK-19016) Checksum mismatch when restore from RocksDB
[ https://issues.apache.org/jira/browse/FLINK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181926#comment-17181926 ] Jiayi Liao edited comment on FLINK-19016 at 8/21/20, 4:23 PM: -- [~sewen] All of your assumptions are right. (BTW, local recovery is not enabled.) I can share my thoughts here, but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which in turn invokes the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). Since Flink sets sync mode in the db's options, the CreateFile and its Append operation may not have synced the data to disk yet. (If using fsync, I think it should be guaranteed.) I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. was (Author: wind_ljy): [~sewen] All of your assumptions are right. (BTW, local recovery is not enabled.) I can share my thoughts here, but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which in turn invokes the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). Since Flink uses the sync operation, the CreateFile and its Append operation may not have synced the data to disk yet. (If using fsync, I think it should be guaranteed.) I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. 
> Checksum mismatch when restore from RocksDB > --- > > Key: FLINK-19016 > URL: https://issues.apache.org/jira/browse/FLINK-19016 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > The error stack is shown below: > {code:java} > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for > KeyedMapBundleOperator_44cfc1ca74b40bb44eed1f38f72b3ea9_(71/300) from any of > the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 6 more > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught > unexpected exception. > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:333) > at > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:580) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 8 more > Caused by: java.io.IOException: Error while opening RocksDB instance. 
> at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74) > at > org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148) > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:277) >
[jira] [Created] (FLINK-19024) Remove unused "releaseMemory" from ResultSubpartition
Stephan Ewen created FLINK-19024: Summary: Remove unused "releaseMemory" from ResultSubpartition Key: FLINK-19024 URL: https://issues.apache.org/jira/browse/FLINK-19024 Project: Flink Issue Type: Sub-task Components: Runtime / Network Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.12.0 The {{releaseMemory()}} call in the {{ResultSubpartition}} is currently not meaningful for any existing implementation. Future versions where memory may have to be released will quite possibly not implement that on a "subpartition" level. For example, a sort based shuffle has the buffers on a partition-level, rather than a subpartition level. We should thus remove the {{releaseMemory()}} call from the abstract subpartition interface. Concrete implementations can still release memory on a subpartition level, if needed in the future. -- This message was sent by Atlassian Jira (v8.3.4#803005)
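The proposed interface change can be sketched like this (hypothetical names, not Flink's actual classes): the abstract subpartition drops {{releaseMemory()}}, while a partition that owns the buffers, e.g. for a sort-based shuffle, can still free them at partition level.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the subpartition abstraction stays minimal...
abstract class SubpartitionSketch {
    // note: no releaseMemory() on the abstract interface
    abstract void addRecord(byte[] record);
}

// ...while memory release lives on the partition that owns the buffers.
class SortBasedPartitionSketch {
    private final List<byte[]> buffers = new ArrayList<>();

    void addBuffer(byte[] buffer) {
        buffers.add(buffer);
    }

    // Frees all buffers for the whole partition at once and reports how
    // many bytes were released.
    int releaseMemory() {
        int released = buffers.stream().mapToInt(b -> b.length).sum();
        buffers.clear();
        return released;
    }
}
```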
[GitHub] [flink] flinkbot edited a comment on pull request #13206: [FLINK-18948][python] Add end to end test for Python DataStream API.
flinkbot edited a comment on pull request #13206: URL: https://github.com/apache/flink/pull/13206#issuecomment-677500353 ## CI report: * 5b6041fc76df9a5345ae672cd29e5e3b0be51f73 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5774) Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5764) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] lgo commented on pull request #12345: draft: [FLINK-17971] [Runtime/StateBackends] Add RocksDB SST ingestion for batch writes
lgo commented on pull request #12345: URL: https://github.com/apache/flink/pull/12345#issuecomment-678368305 Unfortunately not. The changes should be code complete, but I did not find the time to test and evaluate perf on a running cluster yet, and had backported it to 1.9.1 for that. Thanks for the reminder on this though, I'm going to try to do that next week when I've got a bunch of free time.
[GitHub] [flink] flinkbot edited a comment on pull request #13024: [FLINK-18742][flink-clients] Some configuration args do not take effect at client
flinkbot edited a comment on pull request #13024: URL: https://github.com/apache/flink/pull/13024#issuecomment-666148245 ## CI report: * eb611d5b39f997fa3e986b5d163cb65a44b4b0ba UNKNOWN * e505da81c290c0ab3f21defb1ef0f3a3f7be69eb Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5773) Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5749) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] dianfu commented on a change in pull request #13089: [FLINK-18813][docs-zh] Translate the 'Local Installation' page of 'Try Flink' into Chinese
dianfu commented on a change in pull request #13089: URL: https://github.com/apache/flink/pull/13089#discussion_r474762668 ## File path: docs/try-flink/local_installation.zh.md ## @@ -26,36 +26,35 @@ under the License. {% if site.version contains "SNAPSHOT" %} - NOTE: The Apache Flink community only publishes official builds for - released versions of Apache Flink. + 注意:Apache Flink 社区只发布 Apache Flink 的 release 版本。 - Since you are currently looking at the latest SNAPSHOT - version of the documentation, all version references below will not work. - Please switch the documentation to the latest released version via the release picker which you - find on the left side below the menu. + 由于你当前正在查看的是文档最新的 SNAPSHOT 版本,因此相关内容会被隐藏。请通过左侧菜单底部的版本选择将文档切换到最新的 release 版本。 {% else %} -Follow these few steps to download the latest stable versions and get started. +请按照以下几个步骤下载最新的稳定版本开始使用。 -## Step 1: Download + Review comment: Just wondering why we need this line? I have tried to cherry-pick this PR to release-1.11 locally and found that it could not display the content correctly. After removing this line and also "", etc., everything works well.
[jira] [Created] (FLINK-19023) Remove pruning of Record Serializer Buffer
Stephan Ewen created FLINK-19023: Summary: Remove pruning of Record Serializer Buffer Key: FLINK-19023 URL: https://issues.apache.org/jira/browse/FLINK-19023 Project: Flink Issue Type: Sub-task Components: Runtime / Network Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.12.0 Currently, the {{SpanningRecordSerializer}} prunes its internal serialization buffer under special circumstances: - The buffer becomes larger than a certain threshold (5MB) - The full record end lines up exactly with a full buffer length (this change got introduced at some point, it is not clear what the purpose is) This optimization virtually never kicks in (because of the second condition) and also seems unnecessary. There is only a single serializer on the sender side, so this will not help to reduce the maximum memory footprint needed in any way. NOTE: A similar optimization on the reader side ({{SpillingAdaptiveSpanningRecordDeserializer}}) makes sense, because multiple parallel deserializers run in order to piece together the records when retrieving buffers from the network in arbitrary order. Truncating buffers (or spilling) there helps reduce the maximum required memory footprint.
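The two conditions above can be sketched to show why the optimization virtually never kicks in (hypothetical names, with a ByteBuffer standing in for the serializer's internal buffer; the real check lives in {{SpanningRecordSerializer}}): both must hold at once, and a record end that lines up exactly with the buffer limit is rare.

```java
import java.nio.ByteBuffer;

class PruneSketch {
    static final int THRESHOLD = 5 * 1024 * 1024; // 5 MB, per the ticket

    // Returns true only when the serialization buffer is oversized AND the
    // record end lines up exactly with the buffer's limit; the conjunction
    // makes the prune branch almost unreachable in practice.
    static boolean shouldPrune(ByteBuffer serializationBuffer, int recordEnd) {
        return serializationBuffer.capacity() > THRESHOLD
                && recordEnd == serializationBuffer.limit();
    }
}
```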
[GitHub] [flink] rmetzger commented on a change in pull request #12823: FLINK-18013: Refactor Hadoop utils to a single module
rmetzger commented on a change in pull request #12823: URL: https://github.com/apache/flink/pull/12823#discussion_r474752292 ## File path: flink-hadoop-utils/pom.xml ## @@ -0,0 +1,78 @@ +<?xml version="1.0" encoding="UTF-8"?> +<project xmlns="http://maven.apache.org/POM/4.0.0" +xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" +xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <parent> + <artifactId>flink-parent</artifactId> + <groupId>org.apache.flink</groupId> + <version>1.12-SNAPSHOT</version> + </parent> + <modelVersion>4.0.0</modelVersion> + + <artifactId>flink-hadoop-utils</artifactId> + + <dependencies> + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-common</artifactId> + <version>${hadoop.version}</version> + <scope>provided</scope> + </dependency> + + <dependency> + <groupId>org.apache.hadoop</groupId> + <artifactId>hadoop-hdfs</artifactId> + <version>${hadoop.version}</version> + <scope>test</scope> + </dependency> + + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-core</artifactId> + <version>${project.version}</version> + </dependency> + + <dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-test-utils-junit</artifactId> Review comment: why is this not test scoped?
[jira] [Commented] (FLINK-19016) Checksum mismatch when restore from RocksDB
[ https://issues.apache.org/jira/browse/FLINK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181926#comment-17181926 ] Jiayi Liao commented on FLINK-19016: [~sewen] All of your assumptions are right. (BTW the local recovery is not enabled.) I can share my thoughts here but I'm not sure that I'm completely right. During RocksDB's takeDBNativeCheckpoint, RocksDB will invoke {{create_file_cb}} on the current in-progress file (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/utilities/checkpoint/checkpoint_impl.cc#L292]), which will invoke the CreateFile function (see [here|https://github.com/dataArtisans/frocksdb/blob/49bc897d5d768026f1eb816d960c1f2383396ef4/util/file_util.cc#L70]). Since Flink uses the sync operation, the CreateFile and its Append operations may not yet be persisted on disk. (If using fsync, I think it should be guaranteed.) I'm not an expert on Linux and RocksDB, please correct me if I'm wrong. > Checksum mismatch when restore from RocksDB > --- > > Key: FLINK-19016 > URL: https://issues.apache.org/jira/browse/FLINK-19016 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > The error stack is shown below: > {code:java} > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for > KeyedMapBundleOperator_44cfc1ca74b40bb44eed1f38f72b3ea9_(71/300) from any of > the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 
6 more > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught > unexpected exception. > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:333) > at > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:580) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 8 more > Caused by: java.io.IOException: Error while opening RocksDB instance. > at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74) > at > org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148) > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:277) > ... 
12 more > Caused by: org.rocksdb.RocksDBException: checksum mismatch > at org.rocksdb.RocksDB.open(Native Method) > at org.rocksdb.RocksDB.open(RocksDB.java:286) > at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:66) > ... 18 more > {code} > The machine went down because of a hardware problem, and then the job cannot > restart successfully anymore. After digging a little bit, I found that > RocksDB in Flink uses sync instead of fsync to synchronize the data with the > disk. With the sync operation, RocksDB cannot guarantee that the current > in-progress file is persisted on disk in takeDBNativeCheckpoint.
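The sync-versus-fsync distinction discussed in this ticket can be illustrated with plain JDK I/O (a sketch, not Flink's or RocksDB's actual code): FileChannel.force(false) persists only file data, roughly like sync/fdatasync, while force(true) also persists file metadata such as the length, roughly like fsync.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class DurableWriteSketch {
    // Writes data and forces it to the storage device. With metadata=false
    // this is fdatasync-like (only file content is flushed); with
    // metadata=true it is fsync-like, also flushing file metadata.
    static long writeDurably(Path file, byte[] data, boolean metadata) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(data));
            ch.force(metadata); // false ~ sync/fdatasync, true ~ fsync
            return ch.size();
        }
    }
}
```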
[GitHub] [flink] rmetzger commented on pull request #13217: [FLINK-16866] Make job submission non-blocking
rmetzger commented on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678327440 I assumed the tests to be stable, because my personal CI finished [pretty green](https://dev.azure.com/rmetzger/Flink/_build/results?buildId=8308=results). I will investigate the unstable tests.
[GitHub] [flink] rmetzger edited a comment on pull request #13162: [FLINK-18685][API / DataStream]JobClient.getAccumulators() blocks until streaming job has finished in local environment
rmetzger edited a comment on pull request #13162: URL: https://github.com/apache/flink/pull/13162#issuecomment-678321188 This is the error I'm seeing

```
java.lang.IllegalStateException: MiniCluster is not yet running or has already been shut down.
	at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
	at org.apache.flink.runtime.minicluster.MiniCluster.getDispatcherGatewayFuture(MiniCluster.java:707)
	at org.apache.flink.runtime.minicluster.MiniCluster.runDispatcherCommand(MiniCluster.java:621)
	at org.apache.flink.runtime.minicluster.MiniCluster.getExecutionGraph(MiniCluster.java:607)
	at org.apache.flink.client.program.PerJobMiniClusterFactory$PerJobMiniClusterJobClient.getAccumulators(PerJobMiniClusterFactory.java:182)
	at org.apache.flink.client.program.PerJobMiniClusterFactoryTest.testJobExecution(PerJobMiniClusterFactoryTest.java:67)
```

We could implement the getAccumulator methods as follows

```java
@Override
public CompletableFuture<Map<String, Object>> getAccumulators(ClassLoader classLoader) {
	if (miniCluster.isRunning()) {
		return miniCluster
			.getExecutionGraph(jobID)
			.thenApply(AccessExecutionGraph::getAccumulatorsSerialized)
			.thenApply(accumulators -> {
				try {
					return AccumulatorHelper.deserializeAndUnwrapAccumulators(accumulators, classLoader);
				} catch (Exception e) {
					throw new CompletionException("Cannot deserialize and unwrap accumulators properly.", e);
				}
			});
	} else {
		return getJobExecutionResult(classLoader).thenApply(JobExecutionResult::getAllAccumulatorResults);
	}
}
```

(Disclaimer: I'm not very familiar with that part of the codebase, I might need to ask another committer for a final review)
[GitHub] [flink-web] sjwiesman commented on pull request #371: [hotfix] Add makefile
sjwiesman commented on pull request #371: URL: https://github.com/apache/flink-web/pull/371#issuecomment-678314215 merged in a229405453080b7f70a9d04fae5e6d22129d20bc
[GitHub] [flink-web] sjwiesman closed pull request #371: [hotfix] Add makefile
sjwiesman closed pull request #371: URL: https://github.com/apache/flink-web/pull/371
[jira] [Commented] (FLINK-18797) docs and examples use deprecated forms of keyBy
[ https://issues.apache.org/jira/browse/FLINK-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181913#comment-17181913 ] David Anderson commented on FLINK-18797: [~rmetzger] No, I only looked at keyBy. You're right, we should rework the examples so they don't use any deprecated methods. I wonder if it would be possible to set up something like Checkstyle so that the examples fail if they use deprecated methods. > docs and examples use deprecated forms of keyBy > --- > > Key: FLINK-18797 > URL: https://issues.apache.org/jira/browse/FLINK-18797 > Project: Flink > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 1.11.0, 1.11.1 >Reporter: David Anderson >Assignee: wulei0302 >Priority: Major > Labels: pull-request-available > > The DataStream example at > https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html#example-program > uses > {{keyBy(0)}} > which has been deprecated. There are many other cases of this throughout the > docs: > dev/connectors/cassandra.md > dev/parallel.md > dev/stream/operators/index.md > dev/stream/operators/process_function.md > dev/stream/state/queryable_state.md > dev/stream/state/state.md > dev/types_serialization.md > learn-flink/etl.md > ops/scala_shell.md > and also in a number of examples: > AsyncIOExample.java > SideOutputExample.java > TwitterExample.java > GroupedProcessingTimeWindowExample.java > SessionWindowing.java > TopSpeedWindowing.java > WindowWordCount.java > WordCount.java > TwitterExample.scala > GroupedProcessingTimeWindowExample.scala > SessionWindowing.scala > WindowWordCount.scala > WordCount.scala > There are also some uses of keyBy("string"), which has also been deprecated: > dev/connectors/cassandra.md > dev/stream/operators/index.md > dev/types_serialization.md > learn-flink/etl.md > SocketWindowWordCount.java > SocketWindowWordCount.scala > TopSpeedWindowing.scala
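One possible setup for what is suggested above (a sketch under assumptions; nothing in this thread configures it this way): the Maven compiler plugin can turn deprecation warnings into build failures for the example modules. `-Xlint:deprecation` and `failOnWarning` are real maven-compiler-plugin options, but wiring them into Flink's build is an assumption here.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <compilerArgs>
      <!-- emit a warning for every use of a deprecated API -->
      <arg>-Xlint:deprecation</arg>
    </compilerArgs>
    <!-- fail the examples module if any warning (incl. deprecation) is emitted -->
    <failOnWarning>true</failOnWarning>
  </configuration>
</plugin>
```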
[jira] [Updated] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tartarus updated FLINK-19022: - Description: In my job, the JM could not start normally, and the JM container was finally killed by RM. In the end, I found through debugging that AkkaRpcActor failed to start because the version of yarn in my job was incompatible with the version in the cluster. [AkkaRpcActor exception handling|https://github.com/apache/flink/blob/478c9657fe1240acdc1eb08ad32ea93e08b0cd5e/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L550] I added log printing here, and then found the specific problem. {code:java} 2020-08-21 21:31:16,985 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState [flink-akka.actor.default-dispatcher-4] - Could not start RpcEndpoint resourcemanager. java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterRequestProto;)Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterResponseProto; at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy25.registerApplicationMaster(Unknown Source) at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:222) at 
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138) at org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:229) at org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:262) at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:204) at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:192) at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:185) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:544) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:169) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107){code} Should we add logs here to help find problems? was: In my task, the JM could not start normally, and the JM container was finally killed by RM. In the end, I found through debugging that AkkaRpcActor failed to start because the version of yarn in my job was incompatible with the version in the cluster. [AkkaRpcActor exception handling|https://github.com/apache/flink/blob/478c9657fe1240acdc1eb08ad32ea93e08b0cd5e/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L550] I added log printing here, and then found the specific problem.
[GitHub] [flink] flinkbot edited a comment on pull request #13214: [FLINK-18938][tableSQL/API] Throw better exception message for quering sink-only connector
flinkbot edited a comment on pull request #13214: URL: https://github.com/apache/flink/pull/13214#issuecomment-678119788 ## CI report: * ef62ec54b70560a9dabc09f020f7386d907aab20 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5771) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181911#comment-17181911 ] tartarus commented on FLINK-19022: -- [~chesnay] [~trohrmann] How about adding log printing here to help quickly find the problem? Please assign it to me, thanks > AkkaRpcActor failed to start but no exception information > - > > Key: FLINK-19022 > URL: https://issues.apache.org/jira/browse/FLINK-19022 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.10.0 >Reporter: tartarus >Priority: Major > > In my task, the JM could not start normally, and the JM container was > finally killed by RM. > In the end, I found through debugging that AkkaRpcActor failed to start because > the version of yarn in my job was incompatible with the version in the > cluster. > [AkkaRpcActor exception > handling|https://github.com/apache/flink/blob/478c9657fe1240acdc1eb08ad32ea93e08b0cd5e/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L550] > I added log printing here, and then found the specific problem. > {code:java} > 2020-08-21 21:31:16,985 ERROR > org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState > [flink-akka.actor.default-dispatcher-4] - Could not start RpcEndpoint > resourcemanager. 
> java.lang.NoSuchMethodError: > org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterRequestProto;)Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterResponseProto; > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy25.registerApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:222) > at > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214) > at > org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138) > at > org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:229) > at > org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:262) > at > org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:204) > at > org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:192) > at > org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:185) > at > 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:544) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:169) > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > at akka.actor.Actor$class.aroundReceive(Actor.scala:517) > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at >
[jira] [Created] (FLINK-19022) AkkaRpcActor failed to start but no exception information
tartarus created FLINK-19022: Summary: AkkaRpcActor failed to start but no exception information Key: FLINK-19022 URL: https://issues.apache.org/jira/browse/FLINK-19022 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.10.0 Reporter: tartarus In my task, the JM could not start normally, and the JM container was finally killed by RM. In the end, I found through debugging that AkkaRpcActor failed to start because the version of yarn in my job was incompatible with the version in the cluster. [AkkaRpcActor exception handling|https://github.com/apache/flink/blob/478c9657fe1240acdc1eb08ad32ea93e08b0cd5e/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L550] I added log printing here, and then found the specific problem. {code:java} 2020-08-21 21:31:16,985 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState [flink-akka.actor.default-dispatcher-4] - Could not start RpcEndpoint resourcemanager. java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterRequestProto;)Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterResponseProto; at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy25.registerApplicationMaster(Unknown 
Source) at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:222) at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138) at org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:229) at org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:262) at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:204) at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:192) at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:185) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:544) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:169) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at 
akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107){code} Should we add logs here to help find problems? -- This message was sent by Atlassian Jira (v8.3.4#803005)
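The missing log line the reporter asks for can be sketched generically: catch the start-up failure, log its cause, then complete the endpoint's future exceptionally instead of failing silently. The names below (`EndpointStarter`, `StartHook`) are illustrative stand-ins, not Flink's actual AkkaRpcActor code:

```java
import java.util.concurrent.CompletableFuture;

// Illustrative sketch (not Flink code): log the cause of a failed endpoint
// start before propagating it, so errors like the NoSuchMethodError above
// show up in the logs instead of the JM dying without a trace.
public class EndpointStarter {

    /** Stand-in for an RpcEndpoint's onStart() hook. */
    interface StartHook {
        void onStart() throws Exception;
    }

    public static CompletableFuture<Void> startEndpoint(String name, StartHook hook) {
        try {
            hook.onStart();
            return CompletableFuture.completedFuture(null);
        } catch (Throwable t) {
            // The kind of log line the reporter added to find the real problem.
            System.err.println("Could not start RpcEndpoint " + name + ": " + t);
            CompletableFuture<Void> failed = new CompletableFuture<>();
            failed.completeExceptionally(t);
            return failed;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> result = startEndpoint("resourcemanager",
                () -> { throw new NoSuchMethodError("registerApplicationMaster"); });
        System.out.println(result.isCompletedExceptionally()); // true
    }
}
```

Logging before completing the future exceptionally matters because `Error` subclasses thrown during start-up may otherwise never surface to any caller.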
[jira] [Created] (FLINK-19021) Cleanups of the ResultPartition components
Stephan Ewen created FLINK-19021: Summary: Cleanups of the ResultPartition components Key: FLINK-19021 URL: https://issues.apache.org/jira/browse/FLINK-19021 Project: Flink Issue Type: Improvement Components: Runtime / Network Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.12.0 This is the umbrella issue for a set of simplifications and cleanups in the {{ResultPartition}} components. This cleanup is in preparation for a possible future refactoring.
[jira] [Commented] (FLINK-18970) Adding Junit TestMarkers
[ https://issues.apache.org/jira/browse/FLINK-18970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181900#comment-17181900 ] Stephan Ewen commented on FLINK-18970: -- This distinction already exists: Files ending in "Test" are unit tests, files ending in "ITCase" are integration tests. > Adding Junit TestMarkers > > > Key: FLINK-18970 > URL: https://issues.apache.org/jira/browse/FLINK-18970 > Project: Flink > Issue Type: Improvement > Components: Test Infrastructure, Tests >Affects Versions: 1.11.1 >Reporter: goutham >Priority: Minor > Fix For: 1.11.1 > > Original Estimate: 10h > Remaining Estimate: 10h > > I am planning to add Test Marker to run the Unit test and Integration test > using markers. > Currently, if you want to run the complete build locally it takes close to 2 > hours. Based on requirement developers can run unit tests only or just > integration or both. > By default, it will run all the tests. > planning to introduce below markers > @Tag("IntegrationTest") > @Tag("UnitTest")
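For reference, JUnit 5's `@Tag` annotation is the standard mechanism for markers like the ones proposed, with the build tool filtering on tags. Since wiring up the real JUnit engine is a build-configuration exercise, the self-contained sketch below only imitates the selection mechanism with a hypothetical `Tag` annotation, to show how tag-based filtering works:

```java
import java.lang.annotation.*;
import java.lang.reflect.Method;
import java.util.*;

// Self-contained illustration of tag-based test selection, imitating what
// JUnit 5 @Tag filtering does. The Tag annotation here is hypothetical.
public class TagFilter {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Tag { String value(); }

    static class MyTests {
        @Tag("UnitTest") public void testParse() {}
        @Tag("IntegrationTest") public void testEndToEnd() {}
    }

    /** Returns the names of test methods carrying the requested tag. */
    static List<String> select(Class<?> testClass, String tag) {
        List<String> selected = new ArrayList<>();
        for (Method m : testClass.getDeclaredMethods()) {
            Tag t = m.getAnnotation(Tag.class);
            if (t != null && t.value().equals(tag)) {
                selected.add(m.getName());
            }
        }
        Collections.sort(selected);
        return selected;
    }

    public static void main(String[] args) {
        System.out.println(select(MyTests.class, "UnitTest"));        // [testParse]
        System.out.println(select(MyTests.class, "IntegrationTest")); // [testEndToEnd]
    }
}
```

Under the actual proposal, a developer would annotate tests with `@Tag("UnitTest")` or `@Tag("IntegrationTest")` and select a subset from the build command line.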
[jira] [Commented] (FLINK-19011) Parallelize the restore operation in OperatorStateBackend
[ https://issues.apache.org/jira/browse/FLINK-19011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181898#comment-17181898 ] Stephan Ewen commented on FLINK-19011: -- Union state has caused us much trouble through the years. I am leaning towards deprecating and dropping the feature. - The new sources should not need it any more, and neither should the new sinks. - The feature is not really scalable; it has an inherent limit worse than broadcast, because it works linearly until a failure and then becomes quadratic (like a broadcast). So it is a non-scalability you discover only when it is too late. What do you think? > Parallelize the restore operation in OperatorStateBackend > -- > > Key: FLINK-19011 > URL: https://issues.apache.org/jira/browse/FLINK-19011 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > To restore the states, union state needs to read state handles produced by > all operators. And currently during the restore operation, Flink iterates the > state handles one by one, which could last tens of minutes if the magnitude > of state handles exceeds ten thousand. > To accelerate the process, I propose to parallelize the random reads on HDFS > and deserialization. We can create a runnable for each state handle and let > it return the metadata and deserialized data, which can be aggregated in the main > thread.
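The proposal quoted above (one task per state handle doing the read and deserialization, results aggregated in the main thread) can be sketched roughly as follows; the `StateHandle` type and `restoreAll` method are illustrative stand-ins, not Flink's actual restore code:

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of the proposed parallel restore: one task per state handle, each
// doing the (simulated) DFS read plus deserialization, with the results
// aggregated in the main thread. Types are stand-ins, not Flink classes.
public class ParallelRestore {

    /** Illustrative stand-in for an operator state handle on DFS. */
    interface StateHandle {
        byte[] read() throws Exception;
    }

    static List<byte[]> restoreAll(List<StateHandle> handles, int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // One task per state handle: reads and deserializes in parallel.
            List<Future<byte[]>> futures = new ArrayList<>();
            for (StateHandle handle : handles) {
                futures.add(pool.submit(handle::read));
            }
            // Aggregate the results in the main thread, in submission order.
            List<byte[]> restored = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                restored.add(f.get());
            }
            return restored;
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool size bounds the number of concurrent DFS reads; collecting the `Future` results in submission order keeps the main-thread aggregation deterministic.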
[jira] [Commented] (FLINK-19016) Checksum mismatch when restore from RocksDB
[ https://issues.apache.org/jira/browse/FLINK-19016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181897#comment-17181897 ] Stephan Ewen commented on FLINK-19016: -- Thanks for reporting this. Can you help us understand this a bit better? (1) I assume that this is not due to local recovery, because the machine failed, as you said. (2) Does the restore come from the remote DFS (HDFS / S3 / ...) in this case? (I assume yes) (3) Is the file already corrupt on the DFS, meaning do repeated failovers always fail? (I assume yes) (4) How can fsync affect whether the upload to DFS is corrupt? If the upload succeeded, then the file could be read locally completely, and it should not matter whether it is fully on disk or only partially in the disk cache. Could you share your thoughts on how fsync can affect this? > Checksum mismatch when restore from RocksDB > --- > > Key: FLINK-19016 > URL: https://issues.apache.org/jira/browse/FLINK-19016 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.11.1 >Reporter: Jiayi Liao >Priority: Major > > The error stack is shown below: > {code:java} > Caused by: org.apache.flink.util.FlinkException: Could not restore keyed > state backend for > KeyedMapBundleOperator_44cfc1ca74b40bb44eed1f38f72b3ea9_(71/300) from any of > the 1 provided restore options. > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:307) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:135) > ... 6 more > Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught > unexpected exception.
> at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:333) > at > org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:580) > at > org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:291) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142) > at > org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121) > ... 8 more > Caused by: java.io.IOException: Error while opening RocksDB instance. > at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:74) > at > org.apache.flink.contrib.streaming.state.restore.AbstractRocksDBRestoreOperation.openDB(AbstractRocksDBRestoreOperation.java:131) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:214) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:188) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithoutRescaling(RocksDBIncrementalRestoreOperation.java:162) > at > org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:148) > at > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:277) > ... 
12 more > Caused by: org.rocksdb.RocksDBException: checksum mismatch > at org.rocksdb.RocksDB.open(Native Method) > at org.rocksdb.RocksDB.open(RocksDB.java:286) > at > org.apache.flink.contrib.streaming.state.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:66) > ... 18 more > {code} > The machine went down because of a hardware problem, and afterwards the job could not > restart successfully anymore. After digging a little bit, I found that > RocksDB in Flink uses sync instead of fsync to synchronize the data with the > disk. With the sync operation, RocksDB cannot guarantee that the current > in-progress file has been persisted on disk in takeDBNativeCheckpoint.
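For context on the sync/fsync question: in POSIX terms, `fdatasync` (the "sync" mode discussed here) flushes only file data, while `fsync` also flushes file metadata. Java's closest analogue is `FileChannel.force(boolean metaData)`. A minimal sketch, not Flink or RocksDB code, of forcing a file fully to stable storage before it would be uploaded:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

// Sketch: write a file and force it to stable storage before it is handed
// to an uploader, the guarantee the reporter suspects is missing here.
public class DurableWrite {

    static void writeDurably(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // force(true) flushes data AND metadata (closest to fsync);
            // force(false) flushes data only (closest to fdatasync / "sync").
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("ckpt", ".sst");
        writeDurably(tmp, "checkpoint-bytes".getBytes());
        System.out.println(Files.size(tmp)); // prints 16
        Files.delete(tmp);
    }
}
```

Note that durability of the local file matters only for reads that bypass the page cache; as the comment above points out, a successful read-and-upload should see the same bytes either way, which is why the question remains open.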
[GitHub] [flink] morsapaes commented on a change in pull request #13193: [FLINK-18918][python][docs] Add dedicated connector documentation for Python Table API
morsapaes commented on a change in pull request #13193: URL: https://github.com/apache/flink/pull/13193#discussion_r474700929 ## File path: docs/dev/python/user-guide/table/python_table_api_connectors.md ## @@ -0,0 +1,194 @@ +--- +title: "Connectors" +nav-parent_id: python_tableapi +nav-pos: 130 +--- + + + +This page describes how to use connectors in PyFlink and highlights the differences between using connectors in PyFlink vs Java/Scala. + +* This will be replaced by the TOC +{:toc} + +Note: For general connector information and common configuration, please refer to the corresponding [Java/Scala documentation]({{ site.baseurl }}/dev/table/connectors/index.html). + +## Download connector and format jars + +For both connectors and formats, implementations are available as jars that need to be specified as job [dependencies]({{ site.baseurl }}/dev/python/user-guide/table/dependency_management.html). + +{% highlight python %} + +table_env.get_config().get_configuration().set_string("pipeline.jars", "file:///my/jar/path/connector.jar;file:///my/jar/path/json.jar") + +{% endhighlight %} + +## How to use connectors + +In PyFlink's Table API, DDL is the recommended way to define sources and sinks, executed via the `execute_sql()` method on the `TableEnvironment`. This makes the table available for use by the application.
+ +{% highlight python %} + +source_ddl = """ +CREATE TABLE source_table( +a VARCHAR, +b INT +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', + 'connector.topic' = 'source_topic', + 'connector.properties.bootstrap.servers' = 'kafka:9092', + 'connector.properties.group.id' = 'test_3', + 'connector.startup-mode' = 'latest-offset', + 'format.type' = 'json' +) +""" + +sink_ddl = """ +CREATE TABLE sink_table( +a VARCHAR +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', + 'connector.topic' = 'sink_topic', + 'connector.properties.bootstrap.servers' = 'kafka:9092', + 'format.type' = 'json' +) +""" + +t_env.execute_sql(source_ddl) +t_env.execute_sql(sink_ddl) + +t_env.sql_query("select a from source_table") \ +.insert_into("sink_table") + +{% endhighlight %} + +Below is a complete example of how to use the Kafka and Json format in PyFlink. + +{% highlight python %} + +from pyflink.datastream import StreamExecutionEnvironment, TimeCharacteristic +from pyflink.table import StreamTableEnvironment, EnvironmentSettings + + +def log_processing(): +env = StreamExecutionEnvironment.get_execution_environment() +env_settings = EnvironmentSettings.Builder().use_blink_planner().build() +t_env = StreamTableEnvironment.create(stream_execution_environment=env, environment_settings=env_settings) + t_env.get_config().get_configuration().set_boolean("python.fn-execution.memory.managed", True) +# specify connector and format jars +t_env.get_config().get_configuration().set_string("pipeline.jars", "file:///my/jar/path/connector.jar;file:///my/jar/path/json.jar") + +source_ddl = """ +CREATE TABLE source_table( +a VARCHAR, +b INT +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', + 'connector.topic' = 'source_topic', + 'connector.properties.bootstrap.servers' = 'kafka:9092', + 'connector.properties.group.id' = 'test_3', + 'connector.startup-mode' = 'latest-offset', + 'format.type' = 'json' +) +""" + +sink_ddl = """ 
+CREATE TABLE sink_table( +a VARCHAR +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', + 'connector.topic' = 'sink_topic', + 'connector.properties.bootstrap.servers' = 'kafka:9092', + 'format.type' = 'json' +) +""" + +t_env.execute_sql(source_ddl) +t_env.execute_sql(sink_ddl) + +t_env.sql_query("select a from source_table") \ Review comment: Just a personal preference/suggestion, for readability. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink-web] sjwiesman opened a new pull request #371: [hotfix] Add makefile
sjwiesman opened a new pull request #371: URL: https://github.com/apache/flink-web/pull/371 Adds a makefile to add explicit commands for rebuilding vs running locally.
[GitHub] [flink] flinkbot edited a comment on pull request #13217: [FLINK-16866] Make job submission non-blocking
flinkbot edited a comment on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678285884 ## CI report: * 85c9bbac2d9ca07435fbd80d76208fa2a51e5d37 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5776) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build
[GitHub] [flink] morsapaes commented on a change in pull request #13193: [FLINK-18918][python][docs] Add dedicated connector documentation for Python Table API
morsapaes commented on a change in pull request #13193: URL: https://github.com/apache/flink/pull/13193#discussion_r474666933 ## File path: docs/dev/python/user-guide/table/python_table_api_connectors.md ## @@ -0,0 +1,194 @@ +--- +title: "Connectors" +nav-parent_id: python_tableapi +nav-pos: 130 +--- + + + +This page describes how to use connectors in PyFlink and highlights the different parts between using connectors in PyFlink vs Java/Scala. + +* This will be replaced by the TOC +{:toc} + +NoteFor general connector information and common configuration, please refer to the corresponding [Java/Scala documentation]({{ site.baseurl }}/dev/table/connectors/index.html). + +## Download connector and format jars + +For both connectors and formats, implementations are available as jars that need to be specified as job [dependencies]({{ site.baseurl }}/dev/python/user-guide/table/dependency_management.html). Review comment: Here, it would be nice to add a short sentence explaining that Flink is a Java/Scala-based project, just for the sake of completeness and so users who are not familiar with Flink understand why they have to deal with JARs in a Python program. ## File path: docs/dev/python/user-guide/table/python_table_api_connectors.md ## @@ -0,0 +1,194 @@ +--- +title: "Connectors" +nav-parent_id: python_tableapi +nav-pos: 130 +--- + + + +This page describes how to use connectors in PyFlink and highlights the different parts between using connectors in PyFlink vs Java/Scala. + +* This will be replaced by the TOC +{:toc} + +NoteFor general connector information and common configuration, please refer to the corresponding [Java/Scala documentation]({{ site.baseurl }}/dev/table/connectors/index.html). + +## Download connector and format jars + +For both connectors and formats, implementations are available as jars that need to be specified as job [dependencies]({{ site.baseurl }}/dev/python/user-guide/table/dependency_management.html). 
+ +{% highlight python %} + +table_env.get_config().get_configuration().set_string("pipeline.jars", "file:///my/jar/path/connector.jar;file:///my/jar/path/json.jar") + +{% endhighlight %} + +## How to use connectors + +In PyFink's Table API, DDL is the recommended way to define sources and sinks, executed via the `execute_sql()` method on the `TableEnvironment`. This makes the table available for use by the application. + +{% highlight python %} + +source_ddl = """ +CREATE TABLE source_table( +a VARCHAR, +b INT +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', + 'connector.topic' = 'source_topic', + 'connector.properties.bootstrap.servers' = 'kafka:9092', + 'connector.properties.group.id' = 'test_3', + 'connector.startup-mode' = 'latest-offset', + 'format.type' = 'json' +) +""" + +sink_ddl = """ +CREATE TABLE sink_table( +a VARCHAR +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', + 'connector.topic' = 'sink_topic', + 'connector.properties.bootstrap.servers' = 'kafka:9092', + 'format.type' = 'json' +) +""" + +t_env.execute_sql(source_ddl) +t_env.execute_sql(sink_ddl) + +t_env.sql_query("select a from source_table") \ Review comment: ```suggestion t_env.sql_query("SELECT a FROM source_table") \ ``` ## File path: docs/dev/python/user-guide/table/python_table_api_connectors.md ## @@ -0,0 +1,194 @@ +--- +title: "Connectors" +nav-parent_id: python_tableapi +nav-pos: 130 +--- + + + +This page describes how to use connectors in PyFlink and highlights the different parts between using connectors in PyFlink vs Java/Scala. + +* This will be replaced by the TOC +{:toc} + +NoteFor general connector information and common configuration, please refer to the corresponding [Java/Scala documentation]({{ site.baseurl }}/dev/table/connectors/index.html). 
+ +## Download connector and format jars + +For both connectors and formats, implementations are available as jars that need to be specified as job [dependencies]({{ site.baseurl }}/dev/python/user-guide/table/dependency_management.html). + +{% highlight python %} + +table_env.get_config().get_configuration().set_string("pipeline.jars", "file:///my/jar/path/connector.jar;file:///my/jar/path/json.jar") + +{% endhighlight %} + +## How to use connectors + +In PyFink's Table API, DDL is the recommended way to define sources and sinks, executed via the `execute_sql()` method on the `TableEnvironment`. This makes the table available for use by the application. + +{% highlight python %} + +source_ddl = """ +CREATE TABLE source_table( +a VARCHAR, +b INT +) WITH ( + 'connector.type' = 'kafka', + 'connector.version' = 'universal', +
[GitHub] [flink] flinkbot edited a comment on pull request #13144: [FLINK-15853][hive][table-planner-blink] Use the new type inference f…
flinkbot edited a comment on pull request #13144: URL: https://github.com/apache/flink/pull/13144#issuecomment-673924067 ## CI report: * 9b69c04b6a987bf81e21fd8a7c062fb72fe7ccb5 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5769)
[jira] [Created] (FLINK-19020) Add more metrics around async operations and backpressure
Igal Shilman created FLINK-19020: Summary: Add more metrics around async operations and backpressure Key: FLINK-19020 URL: https://issues.apache.org/jira/browse/FLINK-19020 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Igal Shilman
[jira] [Created] (FLINK-19019) Add HDFS / S3 / GCS support to Flink-StateFun Docker image.
Igal Shilman created FLINK-19019: Summary: Add HDFS / S3 / GCS support to Flink-StateFun Docker image. Key: FLINK-19019 URL: https://issues.apache.org/jira/browse/FLINK-19019 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Igal Shilman The 3rd-party filesystem support is not enabled by default. To make checkpointing work in S3 / GCS etc., we need to add the relevant plugins.
[jira] [Commented] (FLINK-18797) docs and examples use deprecated forms of keyBy
[ https://issues.apache.org/jira/browse/FLINK-18797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181894#comment-17181894 ] Robert Metzger commented on FLINK-18797: Thanks a lot for filing a ticket for this [~alpinegizmo]. I noticed that there are more deprecated methods, such as {{.writeAsText()}} or {{.assignTimestampsAndWatermarks()}} which are used in the examples. Did you file tickets for these as well? > docs and examples use deprecated forms of keyBy > --- > > Key: FLINK-18797 > URL: https://issues.apache.org/jira/browse/FLINK-18797 > Project: Flink > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 1.11.0, 1.11.1 >Reporter: David Anderson >Assignee: wulei0302 >Priority: Major > Labels: pull-request-available > > The DataStream example at > https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html#example-program > uses > {{keyBy(0)}} > which has been deprecated. There are many other cases of this throughout the > docs: > dev/connectors/cassandra.md > dev/parallel.md > dev/stream/operators/index.md > dev/stream/operators/process_function.md > dev/stream/state/queryable_state.md > dev/stream/state/state.md > dev/types_serialization.md > learn-flink/etl.md > ops/scala_shell.md > and also in a number of examples: > AsyncIOExample.java > SideOutputExample.java > TwitterExample.java > GroupedProcessingTimeWindowExample.java > SessionWindowing.java > TopSpeedWindowing.java > WindowWordCount.java > WordCount.java > TwitterExample.scala > GroupedProcessingTimeWindowExample.scala > SessionWindowing.scala > WindowWordCount.scala > WordCount.scala > There are also some uses of keyBy("string"), which has also been deprecated: > dev/connectors/cassandra.md > dev/stream/operators/index.md > dev/types_serialization.md > learn-flink/etl.md > SocketWindowWordCount.java > SocketWindowWordCount.scala > TopSpeedWindowing.scala -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot commented on pull request #13217: [FLINK-16866] Make job submission non-blocking
flinkbot commented on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678285884 ## CI report: * 85c9bbac2d9ca07435fbd80d76208fa2a51e5d37 UNKNOWN
[GitHub] [flink] rmetzger edited a comment on pull request #13217: [FLINK-16866] Make job submission non-blocking
rmetzger edited a comment on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678283618 Note: I'm still investigating one last unstable test `FunctionITCase.testInvalidUseOfTableFunction()`
[GitHub] [flink] rmetzger commented on pull request #13217: [FLINK-16866] Make job submission non-blocking
rmetzger commented on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678283618 Note: I'm still reviewing one last unstable test `FunctionITCase.testInvalidUseOfTableFunction()`
[jira] [Comment Edited] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181871#comment-17181871 ] Matthias edited comment on FLINK-19005 at 8/21/20, 1:09 PM: Thanks for the update. A quick diff shows already that there is a growing number of classes generated through reflection: {code:bash} # analysis on the "after 1 execution" heap dump FLINK-19005 grep -o "class .*" after_1/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 287 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 150 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings 33 org.apache.flink.shaded.hadoop2.com.google.common.cache.LocalCache 31 akka.remote.serialization.MiscMessageSerializer {code} {code:bash} # analysis on the "after 10 executions" heap dump FLINK-19005 grep -o "class .*" after_10/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 575 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 223 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 49 com.sun.proxy. 
41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 36 jdk.internal.reflect.GeneratedConstructorAccessor 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings {code} was (Author: mapohl): Thanks for the update: A quick diff shows already that there is a growing number of classes generated through reflection: {code:bash} # analysis on the "after 1 execution" heap dump FLINK-19005 grep -o "class .*" after_1/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 287 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 150 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings 33 org.apache.flink.shaded.hadoop2.com.google.common.cache.LocalCache 31 akka.remote.serialization.MiscMessageSerializer {code} {code:bash} # analysis on the "after 10 executions" heap dump FLINK-19005 grep -o "class .*" after_10/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 575 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 223 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 49 com.sun.proxy. 
41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 36 jdk.internal.reflect.GeneratedConstructorAccessor 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings {code} > used metaspace grow on every execution > -- > > Key: FLINK-19005 > URL: https://issues.apache.org/jira/browse/FLINK-19005 > Project: Flink > Issue Type: Bug > Components: API / DataSet, Client / Job Submission >Affects Versions: 1.11.1 >Reporter: Guillermo Sánchez >Assignee: Chesnay Schepler >Priority: Major > Attachments: heap_dump_after_10_executions.zip, > heap_dump_after_1_execution.zip > > > Hi ! > Im running a 1.11.1 flink cluster, where I execute batch jobs made with > DataSet API. > I submit these jobs every day to calculate daily data. > In every execution, cluster's used metaspace increase by 7MB and its never > released. > This ends up with an OutOfMemoryError caused by Metaspace every 15 days and i > need to restart the cluster to clean the metaspace > taskmanager.memory.jvm-metaspace.size is set to 512mb > Any idea of what could be causing this metaspace grow and why is it not > released ? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
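The normalization in the shell pipeline above (cut at `$`, strip the trailing counter of generated classes, count occurrences) can be mirrored in Java for anyone post-processing heap-dump class lists programmatically; a small sketch:

```java
import java.util.*;
import java.util.regex.Pattern;

// Java rendition of the shell pipeline used in the comment: normalize class
// names by dropping the $-suffix and the trailing generated-class counter,
// then count occurrences, so growth between heap dumps becomes visible.
public class ClassHistogram {

    private static final Pattern TRAILING_DIGITS = Pattern.compile("[0-9]+$");

    static String normalize(String className) {
        String base = className.split("\\$", 2)[0];          // cut -d'$' -f1
        return TRAILING_DIGITS.matcher(base).replaceAll(""); // sed 's/[0-9]*$//'
    }

    /** Counts normalized class names from a heap-dump class list. */
    static Map<String, Integer> histogram(List<String> classNames) {
        Map<String, Integer> counts = new HashMap<>();
        for (String c : classNames) {
            counts.merge(normalize(c), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> dump = Arrays.asList(
                "jdk.internal.reflect.GeneratedSerializationConstructorAccessor287",
                "jdk.internal.reflect.GeneratedSerializationConstructorAccessor288",
                "jdk.internal.reflect.GeneratedMethodAccessor42");
        System.out.println(histogram(dump));
    }
}
```

Comparing the histograms of two dumps then shows directly which generated-class families grow per execution.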
[jira] [Comment Edited] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181871#comment-17181871 ] Matthias edited comment on FLINK-19005 at 8/21/20, 1:09 PM: Thanks for the update: A quick diff shows already that there is a growing number of classes generated through reflection: {code:bash} # analysis on the "after 1 execution" heap dump FLINK-19005 grep -o "class .*" after_1/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 287 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 150 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings 33 org.apache.flink.shaded.hadoop2.com.google.common.cache.LocalCache 31 akka.remote.serialization.MiscMessageSerializer {code} {code:bash} # analysis on the "after 10 executions" heap dump FLINK-19005 grep -o "class .*" after_10/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 575 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 223 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 49 com.sun.proxy. 
41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 36 jdk.internal.reflect.GeneratedConstructorAccessor 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings {code} was (Author: mapohl): Thanks for the update: A quick diff shows already that there is a growing number of class generated through reflection: {code:bash} # analysis on the "after 1 execution" heap dump FLINK-19005 grep -o "class .*" after_1/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 287 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 150 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings 33 org.apache.flink.shaded.hadoop2.com.google.common.cache.LocalCache 31 akka.remote.serialization.MiscMessageSerializer {code} {code:bash} # analysis on the "after 10 executions" heap dump FLINK-19005 grep -o "class .*" after_10/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 575 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 223 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 49 com.sun.proxy. 
41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 36 jdk.internal.reflect.GeneratedConstructorAccessor 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings {code} > used metaspace grow on every execution > -- > > Key: FLINK-19005 > URL: https://issues.apache.org/jira/browse/FLINK-19005 > Project: Flink > Issue Type: Bug > Components: API / DataSet, Client / Job Submission >Affects Versions: 1.11.1 >Reporter: Guillermo Sánchez >Assignee: Chesnay Schepler >Priority: Major > Attachments: heap_dump_after_10_executions.zip, > heap_dump_after_1_execution.zip > > > Hi! > I'm running a 1.11.1 Flink cluster, where I execute batch jobs built with the > DataSet API. > I submit these jobs every day to calculate daily data. > On every execution, the cluster's used metaspace increases by 7MB and is never > released. > This ends with an OutOfMemoryError caused by Metaspace every 15 days, and I > need to restart the cluster to clear the metaspace. > taskmanager.memory.jvm-metaspace.size is set to 512mb. > Any idea what could be causing this metaspace growth and why it is not > released? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181871#comment-17181871 ] Matthias commented on FLINK-19005: -- Thanks for the update: A quick diff shows already that there is a growing number of class generated through reflection: {code:bash} # analysis on the "after 1 execution" heap dump FLINK-19005 grep -o "class .*" after_1/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 287 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 150 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings 33 org.apache.flink.shaded.hadoop2.com.google.common.cache.LocalCache 31 akka.remote.serialization.MiscMessageSerializer {code} {code:bash} # analysis on the "after 10 executions" heap dump FLINK-19005 grep -o "class .*" after_10/out.html | sed -e 's~class ~~g' -e 's~~~g' | cut -d'$' -f1 | sed 's/[0-9]*$//g'| sort | uniq -c | sort -rn | head 575 jdk.internal.reflect.GeneratedSerializationConstructorAccessor 223 jdk.internal.reflect.GeneratedMethodAccessor 136 oracle.jdbc.driver.Redirector 49 com.sun.proxy. 
41 org.apache.flink.shaded.guava18.com.google.common.cache.LocalCache 37 akka.remote.WireFormats 37 akka.remote.EndpointManager 36 jdk.internal.reflect.GeneratedConstructorAccessor 35 org.apache.flink.shaded.curator.org.apache.curator.shaded.com.google.common.cache.LocalCache 35 akka.remote.RemoteSettings {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
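The class-name histogram used in the comments above can be wrapped in a small reusable function for comparing dumps. A minimal sketch, assuming the same class-listing export format as in the original commands; the sample file path and contents below are synthetic placeholders:

```shell
# Per-class histogram of a heap-dump class listing. The final sed strips
# trailing digits so generated classes such as GeneratedMethodAccessor12
# and GeneratedMethodAccessor57 fall into one bucket.
histogram() {
  grep -o "class .*" "$1" \
    | sed -e 's~class ~~g' \
    | cut -d'$' -f1 \
    | sed 's/[0-9]*$//g' \
    | sort | uniq -c | sort -rn
}

# Synthetic sample: the two generated variants of Foo collapse into one bucket.
printf 'class Foo$12\nclass Foo$34\nclass Bar\n' > /tmp/sample_dump.txt
histogram /tmp/sample_dump.txt
```

Running `histogram` over the after-1 and after-10 exports and diffing the two outputs makes the growing buckets (here, the `Generated*Accessor` classes) stand out immediately.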
[jira] [Created] (FLINK-19018) Add connection timeout to remote functions.
Igal Shilman created FLINK-19018: Summary: Add connection timeout to remote functions. Key: FLINK-19018 URL: https://issues.apache.org/jira/browse/FLINK-19018 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Igal Shilman Currently we only have a total request timeout, which also incorporates the connection timeout. -- This message was sent by Atlassian Jira (v8.3.4#803005)
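The distinction being requested is the same one curl exposes through two separate flags, which makes for a quick illustration (the target address below is a deliberately non-routable placeholder, not anything from the issue):

```shell
# --connect-timeout caps only the TCP/TLS connect phase; --max-time caps
# the entire exchange. Against a non-routable address the connect phase
# cannot complete, so curl gives up after about 1s instead of burning the
# full 5s request budget (typically exit code 28, or 7 if the route is
# rejected outright).
curl --connect-timeout 1 --max-time 5 -s http://10.255.255.1/ -o /dev/null
echo "curl exit code: $?"
```

Separating the two timeouts is what lets an unreachable endpoint fail fast with a distinct error, rather than being indistinguishable from a slow response.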
[GitHub] [flink] flinkbot edited a comment on pull request #13216: [FLINK-18999][table-planner-blink][hive] Temporary generic table does…
flinkbot edited a comment on pull request #13216: URL: https://github.com/apache/flink/pull/13216#issuecomment-678268420 ## CI report: * 1df875a397cf4d7cf68e46f695a01ef6734b3af1 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5775) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot commented on pull request #13217: [FLINK-16866] Make job submission non-blocking
flinkbot commented on pull request #13217: URL: https://github.com/apache/flink/pull/13217#issuecomment-678277886 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 85c9bbac2d9ca07435fbd80d76208fa2a51e5d37 (Fri Aug 21 13:00:46 UTC 2020) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #13141: [FLINK-18852] Fix StreamScan doesn't inherit parallelism from input in legacy planner
flinkbot edited a comment on pull request #13141: URL: https://github.com/apache/flink/pull/13141#issuecomment-673492124 ## CI report: * 0e9c4198095239ebee07a466a5e23de1a60809ac Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5768) Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5581) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (FLINK-16866) Make job submission non-blocking
[ https://issues.apache.org/jira/browse/FLINK-16866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-16866: --- Labels: pull-request-available (was: ) > Make job submission non-blocking > > > Key: FLINK-16866 > URL: https://issues.apache.org/jira/browse/FLINK-16866 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.9.2, 1.10.0, 1.11.0 >Reporter: Till Rohrmann >Assignee: Robert Metzger >Priority: Critical > Labels: pull-request-available > Fix For: 1.12.0 > > > Currently, Flink waits to acknowledge a job submission until the > corresponding {{JobManager}} has been created. Since its creation also > involves the creation of the {{ExecutionGraph}} and potential FS operations, > it can take a bit of time. If the user has configured too low a > {{web.timeout}}, the submission can time out, only reporting a > {{TimeoutException}} to the user. > I propose to change the notion of job submission slightly. Instead of waiting > until the {{JobManager}} has been created, a job submission is complete once > all job-relevant files have been uploaded to the {{Dispatcher}} and the > {{Dispatcher}} has been told about it. Creating the {{JobManager}} will then > belong to the actual job execution. Consequently, if problems occur while > creating the {{JobManager}}, it will result in a job failure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] rmetzger opened a new pull request #13217: [FLINK-16866] Make job submission non-blocking
rmetzger opened a new pull request #13217: URL: https://github.com/apache/flink/pull/13217 ## What is the purpose of the change This changes the semantics of job submission: Instead of completing the `Dispatcher.submitJob()` future after all initialization has happened (which can potentially involve calling external systems etc.), the `.submitJob()` call now returns as soon as the job has been accepted by the Dispatcher. The benefit of this change is that users will see the root cause of a submission timeout, instead of an akka.ask.timeout. ## Brief change log - Introduce a `DispatcherJob` abstraction that manages the job in a new `INITIALIZING` state - Change web frontend to cope with initializing jobs - Change clients to submit & poll ## Verifying this change This PR introduces various new tests for verification. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: yes - The S3 file system connector: no ## Documentation This change is transparent to the user and doesn't need a documentation update. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
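The "submit & poll" client flow from the change log can be sketched as a loop. This is only an illustration: `job_initialized` and the three-poll threshold below are invented stand-ins for the real REST status call, not Flink's actual API.

```shell
# Stub for "ask the Dispatcher whether the job has left INITIALIZING".
# In the real flow this would be a status request; here the job
# "initializes" after the third poll so the loop terminates deterministically.
POLLS=0
job_initialized() {
  POLLS=$((POLLS + 1))
  [ "$POLLS" -ge 3 ]
}

# The submission call has already returned at this point; the client now
# polls for initialization instead of blocking on one long submit call.
until job_initialized; do
  sleep 1   # back off between status polls
done
echo "job left INITIALIZING after $POLLS polls"
```

The point of the change is exactly this shift: a failure during initialization surfaces as an observable job failure from the polling side, rather than as a client-side ask timeout.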
[jira] [Commented] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181864#comment-17181864 ] Guillermo Sánchez commented on FLINK-19005: --- I have uploaded the fixed heap_dump_after_1_execution.zip. Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-18959) Fail to archiveExecutionGraph because job is not finished when dispatcher close
[ https://issues.apache.org/jira/browse/FLINK-18959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181861#comment-17181861 ] Till Rohrmann edited comment on FLINK-18959 at 8/21/20, 12:48 PM: -- I agree that we should fix this issue [~Jiangang]. The thing I am asking myself is why it has been done differently in the first place because the original behaviour has been like you described it in your solution proposal. That's why I've pulled in Aljoscha and Tison who worked on the change. What you could try to do is to revert the {{MiniDispatcher}} changes and then see whether some tests fail [~Jiangang]. was (Author: till.rohrmann): I agree that we should fix this issue [~Jiangang]. The thing I am asking myself is why it has been done differently in the first place because the original behaviour has been like you described it in your solution proposal. That's why I've pulled in Aljoscha and Tison who worked on the change. > Fail to archiveExecutionGraph because job is not finished when dispatcher > close > --- > > Key: FLINK-18959 > URL: https://issues.apache.org/jira/browse/FLINK-18959 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.10.0, 1.12.0, 1.11.1 >Reporter: Liu >Priority: Critical > Fix For: 1.12.0, 1.11.2, 1.10.3 > > Attachments: flink-debug-log > > > When job is cancelled, we expect to see it in flink's history server. But I > can not see my job after it is cancelled. > After digging into the problem, I find that the function > archiveExecutionGraph is not executed. Below is the brief log: > {panel:title=log} > 2020-08-14 15:10:06,406 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph > [flink-akka.actor.default-dispatcher- 15] - Job EtlAndWindow > (6f784d4cc5bae88a332d254b21660372) switched from state RUNNING to CANCELLING. 
> 2020-08-14 15:10:06,415 DEBUG > org.apache.flink.runtime.dispatcher.MiniDispatcher > [flink-akka.actor.default-dispatcher-3] - Shutting down per-job cluster > because the job was canceled. > 2020-08-14 15:10:06,629 INFO > org.apache.flink.runtime.dispatcher.MiniDispatcher > [flink-akka.actor.default-dispatcher-3] - Stopping dispatcher > akka.tcp://flink@bjfk-c9865.yz02:38663/user/dispatcher. > 2020-08-14 15:10:06,629 INFO > org.apache.flink.runtime.dispatcher.MiniDispatcher > [flink-akka.actor.default-dispatcher-3] - Stopping all currently running jobs > of dispatcher akka.tcp://flink@bjfk-c9865.yz02:38663/user/dispatcher. > 2020-08-14 15:10:06,631 INFO org.apache.flink.runtime.jobmaster.JobMaster > [flink-akka.actor.default-dispatcher-29] - Stopping the JobMaster for job > EtlAndWindow(6f784d4cc5bae88a332d254b21660372). > 2020-08-14 15:10:06,632 DEBUG org.apache.flink.runtime.jobmaster.JobMaster > [flink-akka.actor.default-dispatcher-29] - Disconnect TaskExecutor > container_e144_1590060720089_2161_01_06 because: Stopping JobMaster for > job EtlAndWindow(6f784d4cc5bae88a332d254b21660372). > 2020-08-14 15:10:06,646 INFO > org.apache.flink.runtime.executiongraph.ExecutionGraph > [flink-akka.actor.default-dispatcher-29] - Job EtlAndWindow > (6f784d4cc5bae88a332d254b21660372) switched from state CANCELLING to CANCELED. > 2020-08-14 15:10:06,664 DEBUG > org.apache.flink.runtime.dispatcher.MiniDispatcher > [flink-akka.actor.default-dispatcher-4] - There is a newer JobManagerRunner > for the job 6f784d4cc5bae88a332d254b21660372. > {panel} > From the log, we can see that the job is not finished when the dispatcher closes. The > process is as follows: > * Receive the cancel command and send it to all tasks asynchronously. > * In MiniDispatcher, begin shutting down the per-job cluster. > * Stop the dispatcher and remove the job. > * The job is cancelled and the callback is executed in startJobManagerRunner.
> * Because the job was removed before, currentJobManagerRunner is null, which > does not equal the original jobManagerRunner. In this case, > archivedExecutionGraph will not be uploaded. > In normal cases, I find that the job is cancelled first and then the dispatcher is > stopped, so that archivedExecutionGraph succeeds. But I think that the > order is not constrained and it is hard to know which comes first. > Above is what I suspected. If so, then we should fix it. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-18959) Fail to archiveExecutionGraph because job is not finished when dispatcher close
[ https://issues.apache.org/jira/browse/FLINK-18959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181861#comment-17181861 ] Till Rohrmann commented on FLINK-18959: --- I agree that we should fix this issue [~Jiangang]. The thing I am asking myself is why it has been done differently in the first place because the original behaviour has been like you described it in your solution proposal. That's why I've pulled in Aljoscha and Tison who worked on the change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19017) Increases the visibility of remote function call failures and retries
[ https://issues.apache.org/jira/browse/FLINK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igal Shilman updated FLINK-19017: - Summary: Increases the visibility of remote function call failures and retries (was: Increases the visibility of a remote function call retry) > Increases the visibility of remote function call failures and retries > --- > > Key: FLINK-19017 > URL: https://issues.apache.org/jira/browse/FLINK-19017 > Project: Flink > Issue Type: Improvement > Components: Stateful Functions >Reporter: Igal Shilman >Assignee: Igal Shilman >Priority: Major > > Right now it is difficult to see if a remote function call is being retried; > this issue aims to improve that situation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guillermo Sánchez updated FLINK-19005: -- Attachment: heap_dump_after_1_execution.zip -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guillermo Sánchez updated FLINK-19005: -- Attachment: (was: heap_dump_after_1_execution.zip) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot commented on pull request #13216: [FLINK-18999][table-planner-blink][hive] Temporary generic table does…
flinkbot commented on pull request #13216: URL: https://github.com/apache/flink/pull/13216#issuecomment-678268420 ## CI report: * 1df875a397cf4d7cf68e46f695a01ef6734b3af1 UNKNOWN Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (FLINK-19017) Increases the visibility of a remote function call retry
Igal Shilman created FLINK-19017: Summary: Increases the visibility of a remote function call retry Key: FLINK-19017 URL: https://issues.apache.org/jira/browse/FLINK-19017 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Igal Shilman Assignee: Igal Shilman Right now it is difficult to see if a remote function call is being retried; this issue aims to improve that situation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181848#comment-17181848 ] Matthias edited comment on FLINK-19005 at 8/21/20, 12:22 PM: - Hi [~gestevez], thanks for getting back to us. It looks like {{heap_dump_after_1_executions.zip}} contains the {{heap_dump_after_10_execution.zip}}. May you update the {{heap_dump_after_1_executions.zip}} attachment? was (Author: mapohl): Hi [~gestevez], thanks for getting back to us. It looks like {{heap_dump_after_10_executions.zip}} contains the {{heap_dump_after_1_execution.zip}}. May you update the {{heap_dump_after_10_executions.zip}} attachment? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [flink] flinkbot commented on pull request #13216: [FLINK-18999][table-planner-blink][hive] Temporary generic table does…
flinkbot commented on pull request #13216: URL: https://github.com/apache/flink/pull/13216#issuecomment-678262611 Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review. ## Automated Checks Last check on commit 1df875a397cf4d7cf68e46f695a01ef6734b3af1 (Fri Aug 21 12:22:33 UTC 2020) **Warnings:** * No documentation files were touched! Remember to keep the Flink docs up to date! Mention the bot in a comment to re-run the automated checks. ## Review Progress * ❓ 1. The [description] looks good. * ❓ 2. There is [consensus] that the contribution should go into Flink. * ❓ 3. Needs [attention] from. * ❓ 4. The change fits into the overall [architecture]. * ❓ 5. Overall code [quality] is good. Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required Bot commands The @flinkbot bot supports the following commands: - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`) - `@flinkbot approve all` to approve all aspects - `@flinkbot approve-until architecture` to approve everything until `architecture` - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention - `@flinkbot disapprove architecture` to remove an approval you gave earlier This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #13209: [FLINK-18832][datastream] Add compatible check for blocking partition with buffer timeout
flinkbot edited a comment on pull request #13209: URL: https://github.com/apache/flink/pull/13209#issuecomment-677744672 ## CI report: * 22f17388c9b9beba64aa8ab4bcd2c0ae6d344308 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5770) Bot commands The @flinkbot bot supports the following commands: - `@flinkbot run travis` re-run the last Travis build - `@flinkbot run azure` re-run the last Azure build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] flinkbot edited a comment on pull request #13206: [FLINK-18948][python] Add end to end test for Python DataStream API.
flinkbot edited a comment on pull request #13206: URL: https://github.com/apache/flink/pull/13206#issuecomment-677500353
## CI report:
* 5b6041fc76df9a5345ae672cd29e5e3b0be51f73
  * Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5774)
  * Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5764)
[GitHub] [flink] flinkbot edited a comment on pull request #13050: [FLINK-18750][table] SqlValidatorException thrown when select from a …
flinkbot edited a comment on pull request #13050: URL: https://github.com/apache/flink/pull/13050#issuecomment-667904442
## CI report:
* efe2b4b092cbce31dee74b4261ca7a20904b2000 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5766)
[jira] [Commented] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181848#comment-17181848 ]
Matthias commented on FLINK-19005:
--
Hi [~gestevez], thanks for getting back to us. It looks like {{heap_dump_after_10_executions.zip}} contains the {{heap_dump_after_1_execution.zip}}. Could you update the {{heap_dump_after_10_executions.zip}} attachment?

> used metaspace grow on every execution
> --
> Key: FLINK-19005
> URL: https://issues.apache.org/jira/browse/FLINK-19005
> Project: Flink
> Issue Type: Bug
> Components: API / DataSet, Client / Job Submission
> Affects Versions: 1.11.1
> Reporter: Guillermo Sánchez
> Assignee: Chesnay Schepler
> Priority: Major
> Attachments: heap_dump_after_10_executions.zip, heap_dump_after_1_execution.zip
>
> Hi!
> I'm running a 1.11.1 Flink cluster, where I execute batch jobs made with the DataSet API.
> I submit these jobs every day to calculate daily data.
> On every execution, the cluster's used metaspace increases by 7 MB and is never released.
> This ends up with an OutOfMemoryError caused by Metaspace every 15 days, and I need to restart the cluster to clean the metaspace.
> taskmanager.memory.jvm-metaspace.size is set to 512mb
> Any idea what could be causing this metaspace growth and why it is not released?
--
This message was sent by Atlassian Jira (v8.3.4#803005)
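The metaspace cap mentioned in the report is a cluster-level setting in flink-conf.yaml. A minimal sketch of the relevant entry (the 512m value is the one quoted in the issue; the comments are explanatory, not from the report):

```yaml
# flink-conf.yaml — JVM Metaspace limit for each TaskManager process.
# Class metadata from every submitted job is allocated here; if user-code
# classloaders are never garbage-collected, this pool grows with each
# submission until the JVM throws "OutOfMemoryError: Metaspace".
taskmanager.memory.jvm-metaspace.size: 512m
```

Raising the limit only delays the error if classes leak on every submission; the heap dumps attached to the issue are what allow tracking down which references keep the per-job classloaders alive.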
[jira] [Updated] (FLINK-18999) Temporary generic table doesn't work with HiveCatalog
[ https://issues.apache.org/jira/browse/FLINK-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-18999:
---
Labels: pull-request-available (was: )

> Temporary generic table doesn't work with HiveCatalog
> -
> Key: FLINK-18999
> URL: https://issues.apache.org/jira/browse/FLINK-18999
> Project: Flink
> Issue Type: Bug
> Components: Connectors / Hive
> Reporter: Rui Li
> Assignee: Rui Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.12.0
>
> Suppose the current catalog is a {{HiveCatalog}}. If a user creates a temporary generic table, this table cannot be accessed in SQL queries and will hit an exception like:
> {noformat}
> Caused by: org.apache.hadoop.hive.metastore.api.NoSuchObjectException: DB.TBL table not found
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:55064) ~[hive-exec-2.3.4.jar:2.3.4]
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:55032) ~[hive-exec-2.3.4.jar:2.3.4]
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result.read(ThriftHiveMetastore.java:54963) ~[hive-exec-2.3.4.jar:2.3.4]
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[hive-exec-2.3.4.jar:2.3.4]
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563) ~[hive-exec-2.3.4.jar:2.3.4]
> at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550) ~[hive-exec-2.3.4.jar:2.3.4]
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1344) ~[hive-exec-2.3.4.jar:2.3.4]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_181]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
> at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169) ~[hive-exec-2.3.4.jar:2.3.4]
> at com.sun.proxy.$Proxy28.getTable(Unknown Source) ~[?:?]
> at org.apache.flink.table.catalog.hive.client.HiveMetastoreClientWrapper.getTable(HiveMetastoreClientWrapper.java:112) ~[flink-connector-hive_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.connectors.hive.HiveTableSource.initAllPartitions(HiveTableSource.java:415) ~[flink-connector-hive_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.connectors.hive.HiveTableSource.getDataStream(HiveTableSource.java:171) ~[flink-connector-hive_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.PhysicalLegacyTableSourceScan.getSourceTransformation(PhysicalLegacyTableSourceScan.scala:82) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacyTableSourceScan.translateToPlanInternal(StreamExecLegacyTableSourceScan.scala:98) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacyTableSourceScan.translateToPlanInternal(StreamExecLegacyTableSourceScan.scala:63) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacyTableSourceScan.translateToPlan(StreamExecLegacyTableSourceScan.scala:63) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToTransformation(StreamExecLegacySink.scala:158) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToPlanInternal(StreamExecLegacySink.scala:82) ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToPlanInternal(StreamExecLegacySink.scala:48) >
[GitHub] [flink] lirui-apache opened a new pull request #13216: [FLINK-18999][table-planner-blink][hive] Temporary generic table does…
lirui-apache opened a new pull request #13216: URL: https://github.com/apache/flink/pull/13216
…n't work with HiveCatalog
## What is the purpose of the change
Fix the issue that a generic temporary table can't be used with a Hive catalog.
## Brief change log
- Avoid using the catalog table factory for temporary tables
- Add test cases
## Verifying this change
Existing and added test cases
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? no
- If yes, how is the feature documented? N/A
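For context, the failure mode this PR addresses can be sketched in Flink SQL. The catalog, database, and table names below are illustrative, not taken from the PR; the sketch assumes the session's current catalog is a registered HiveCatalog:

```sql
-- Assume a HiveCatalog registered under the (hypothetical) name `myhive`
-- is made the current catalog of the session.
USE CATALOG myhive;

-- A temporary *generic* (non-Hive) table: it lives only in the session's
-- catalog manager and is never persisted to the Hive Metastore.
CREATE TEMPORARY TABLE tmp_src (
  id   INT,
  name STRING
) WITH (
  'connector' = 'datagen'
);

-- Before the fix, resolving this query went through the Hive catalog's
-- table factory, which looked the table up in the Metastore and failed
-- with NoSuchObjectException (see the stack trace in FLINK-18999).
SELECT id, name FROM tmp_src;
```

The change log above matches this reading: temporary tables should be planned with the generic table factory regardless of which catalog is current, since they are session-scoped and unknown to the Metastore.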
[GitHub] [flink] flinkbot edited a comment on pull request #13206: [FLINK-18948][python] Add end to end test for Python DataStream API.
flinkbot edited a comment on pull request #13206: URL: https://github.com/apache/flink/pull/13206#issuecomment-677500353
## CI report:
* 5b6041fc76df9a5345ae672cd29e5e3b0be51f73
  * Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5764)
  * Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5774)
[GitHub] [flink] flinkbot edited a comment on pull request #13212: [FLINK-18973][docs-zh] Translate the 'History Server' page of 'Debugging & Monitoring' into Chinese
flinkbot edited a comment on pull request #13212: URL: https://github.com/apache/flink/pull/13212#issuecomment-678069566
## CI report:
* 0371b8be3b5a368623bf8dc553f94fd57cf0c60e Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=5765)