[jira] [Commented] (FLINK-9185) Potential null dereference in PrioritizedOperatorSubtaskState#resolvePrioritizedAlternatives
[ https://issues.apache.org/jira/browse/FLINK-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491491#comment-16491491 ]

Ted Yu commented on FLINK-9185:
-------------------------------

Can someone assign this to Stephen? Thanks

> Potential null dereference in PrioritizedOperatorSubtaskState#resolvePrioritizedAlternatives
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-9185
>                 URL: https://issues.apache.org/jira/browse/FLINK-9185
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Minor
>
> {code}
> if (alternative != null
>     && alternative.hasState()
>     && alternative.size() == 1
>     && approveFun.apply(reference, alternative.iterator().next())) {
> {code}
> The return value from approveFun.apply would be unboxed.
> We should check that the return value is not null.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
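The bug described above is the implicit unboxing in the `&&` chain: if `approveFun.apply(...)` returns a null `Boolean`, evaluating it as a condition throws a `NullPointerException`. A minimal sketch of the failure mode and the null-safe fix (the `APPROVE` function here is hypothetical, not the actual one in `PrioritizedOperatorSubtaskState`):

```java
import java.util.function.BiFunction;

public class UnboxingCheck {

    // Hypothetical approval function standing in for approveFun; a boxed
    // Boolean result may legitimately be null (e.g. a failed map lookup).
    static final BiFunction<String, String, Boolean> APPROVE = (ref, alt) -> null;

    // Unsafe: the Boolean result is auto-unboxed by the boolean return type,
    // so a null result throws NullPointerException here.
    static boolean unsafe(String reference, String alternative) {
        return APPROVE.apply(reference, alternative);
    }

    // Safe: comparing against Boolean.TRUE treats a null result as "not approved".
    static boolean safe(String reference, String alternative) {
        return Boolean.TRUE.equals(APPROVE.apply(reference, alternative));
    }
}
```

The `Boolean.TRUE.equals(...)` idiom is the usual one-line fix, since `equals` accepts null without throwing.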
[jira] [Commented] (FLINK-8430) Implement stream-stream non-window full outer join
[ https://issues.apache.org/jira/browse/FLINK-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491482#comment-16491482 ]

ASF GitHub Bot commented on FLINK-8430:
---------------------------------------

GitHub user hequn8128 opened a pull request:

    https://github.com/apache/flink/pull/6079

    [FLINK-8430] [table] Implement stream-stream non-window full outer join

    ## What is the purpose of the change

    Support stream-stream non-window full outer join.

    ## Brief change log

    - Add full join process functions, including `NonWindowFullJoin` and `NonWindowFullJoinWithNonEquiPredicates`.
    - Add IT/UT/Harness tests for full outer join.
    - Update the documentation of stream-stream joins.

    ## Verifying this change

    This change added tests and can be verified as follows:

    - Added integration tests for full join with or without non-equal predicates.
    - Added harness tests for full join with or without non-equal predicates.
    - Added tests for the AccMode generated by full join.

    ## Does this pull request potentially affect one of the following parts:

    - Dependencies (does it add or upgrade a dependency): (no)
    - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
    - The serializers: (no)
    - The runtime per-record code paths (performance sensitive): (no)
    - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
    - The S3 file system connector: (no)

    ## Documentation

    - Does this pull request introduce a new feature? (yes)
    - If yes, how is the feature documented? (docs)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hequn8128/flink 8430

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6079.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #6079

commit 8b9e8c09aae7d72fae3a6c6bed3327376029a5a2
Author: hequn8128
Date:   2018-05-26T02:29:51Z

    [FLINK-8430] [table] Implement stream-stream non-window full outer join

> Implement stream-stream non-window full outer join
> --------------------------------------------------
>
>                 Key: FLINK-8430
>                 URL: https://issues.apache.org/jira/browse/FLINK-8430
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: Hequn Cheng
>            Assignee: Hequn Cheng
>            Priority: Major
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (FLINK-5505) Harmonize ZooKeeper configuration parameters
[ https://issues.apache.org/jira/browse/FLINK-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

vinoyang reassigned FLINK-5505:
-------------------------------

    Assignee: vinoyang

> Harmonize ZooKeeper configuration parameters
> --------------------------------------------
>
>                 Key: FLINK-5505
>                 URL: https://issues.apache.org/jira/browse/FLINK-5505
>             Project: Flink
>          Issue Type: Improvement
>          Components: Mesos
>    Affects Versions: 1.2.0, 1.3.0
>            Reporter: Till Rohrmann
>            Assignee: vinoyang
>            Priority: Trivial
>             Fix For: 1.5.1
>
> Since Flink users don't necessarily know all of the Mesos terminology, and a JobManager also runs as a task, I would like to rename some of Flink's Mesos configuration parameters. I would propose the following changes:
> {{mesos.initial-tasks}} => {{mesos.initial-taskmanagers}}
> {{mesos.maximum-failed-tasks}} => {{mesos.maximum-failed-taskmanagers}}
> {{mesos.resourcemanager.artifactserver.\*}} => {{mesos.artifactserver.*}}
> {{mesos.resourcemanager.framework.\*}} => {{mesos.framework.*}}
> {{mesos.resourcemanager.tasks.\*}} => {{mesos.taskmanager.*}}
> {{recovery.zookeeper.path.mesos-workers}} => {{mesos.high-availability.zookeeper.path.mesos-workers}}
> What do you think [~eronwright]?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-8944) Use ListShards for shard discovery in the flink kinesis connector
[ https://issues.apache.org/jira/browse/FLINK-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491434#comment-16491434 ]

ASF GitHub Bot commented on FLINK-8944:
---------------------------------------

Github user kailashhd commented on the issue:

    https://github.com/apache/flink/pull/5992

    Please note I rebased my changes onto master, as I was having some failures when building just the flink-connector-kinesis maven project. I hope this will not cause problems with reviewing the code changes. Sorry for the inconvenience if it does. :(

> Use ListShards for shard discovery in the flink kinesis connector
> -----------------------------------------------------------------
>
>                 Key: FLINK-8944
>                 URL: https://issues.apache.org/jira/browse/FLINK-8944
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Kailash Hassan Dayanand
>            Priority: Minor
>
> Currently the DescribeStream AWS API used to get the list of shards has restricted rate limits on AWS (5 requests per second per account). This is problematic when running multiple Flink jobs on the same account, since each subtask calls DescribeStream. Changing this to ListShards will provide more flexibility on rate limits, as ListShards has a limit of 100 requests per second per data stream.
> More details on the mailing list: https://goo.gl/mRXjKh

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
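The pagination contract behind this change can be sketched independently of the AWS SDK: `ListShards` returns one page of shards plus a `NextToken`, and the caller loops until no token is returned. The types below (`Page`, `ShardLister`) are hypothetical stand-ins for the SDK's `ListShardsRequest`/`ListShardsResult`, used only to show the loop shape:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one ListShards response page: some shard ids plus
// a nextToken that is null on the last page.
class Page {
    final List<String> shardIds;
    final String nextToken;

    Page(List<String> shardIds, String nextToken) {
        this.shardIds = shardIds;
        this.nextToken = nextToken;
    }
}

interface ShardLister {
    // A null token requests the first page.
    Page listShards(String nextToken);
}

public class ShardDiscovery {

    // Drain every page by following nextToken until the service stops returning one.
    public static List<String> allShards(ShardLister client) {
        List<String> shards = new ArrayList<>();
        String token = null;
        do {
            Page page = client.listShards(token);
            shards.addAll(page.shardIds);
            token = page.nextToken;
        } while (token != null);
        return shards;
    }
}
```

Each discovery pass still issues one request per page, but the per-stream limit (100/s) rather than the per-account DescribeStream limit (5/s) governs how many jobs can poll concurrently.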
[jira] [Commented] (FLINK-8944) Use ListShards for shard discovery in the flink kinesis connector
[ https://issues.apache.org/jira/browse/FLINK-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491432#comment-16491432 ]

ASF GitHub Bot commented on FLINK-8944:
---------------------------------------

Github user kailashhd commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5992#discussion_r191033584

    --- Diff: flink-connectors/flink-connector-kinesis/src/test/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxyTest.java ---
    @@ -26,20 +29,60 @@
     import com.amazonaws.ClientConfigurationFactory;
     import com.amazonaws.services.kinesis.AmazonKinesis;
     import com.amazonaws.services.kinesis.model.ExpiredIteratorException;
    +import com.amazonaws.services.kinesis.model.ListShardsRequest;
    +import com.amazonaws.services.kinesis.model.ListShardsResult;
     import com.amazonaws.services.kinesis.model.ProvisionedThroughputExceededException;
    +import com.amazonaws.services.kinesis.model.Shard;
    +import org.hamcrest.Description;
    +import org.hamcrest.TypeSafeDiagnosingMatcher;
    +import org.junit.Assert;
     import org.junit.Test;
     import org.powermock.reflect.Whitebox;
    +import java.util.ArrayList;
    +import java.util.Arrays;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.List;
     import java.util.Properties;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +import static org.hamcrest.MatcherAssert.assertThat;
    +import static org.hamcrest.collection.IsCollectionWithSize.hasSize;
    +import static org.hamcrest.collection.IsIterableContainingInAnyOrder.containsInAnyOrder;
     import static org.junit.Assert.assertEquals;
     import static org.junit.Assert.assertFalse;
     import static org.junit.Assert.assertTrue;
    +import static org.mockito.Matchers.argThat;
    +import static org.mockito.Mockito.doReturn;
    +import static org.mockito.Mockito.mock;
     /**
      * Test for methods in the {@link KinesisProxy} class.
      */
     public class KinesisProxyTest {
    +	private static final String NEXT_TOKEN = "NextToken";
    +	private static final String fakeStreamName = "fake-stream";
    +	private Set<String> shardIdSet;
    +	private List<Shard> shards;
    +
    +	protected static HashMap<String, String> createInitialSubscribedStreamsToLastDiscoveredShardsState(List<String> streams) {
    +		HashMap<String, String> initial = new HashMap<>();
    +		for (String stream : streams) {
    +			initial.put(stream, null);
    +		}
    +		return initial;
    +	}
    +
    +	private static ListShardsRequestMatcher initialListShardsRequestMatcher() {
    +		return new ListShardsRequestMatcher(null, null);
    +	}
    +
    +	private static ListShardsRequestMatcher listShardsNextToken(final String nextToken) {
    --- End diff --

    Done

> Use ListShards for shard discovery in the flink kinesis connector
> -----------------------------------------------------------------
>
>                 Key: FLINK-8944
>                 URL: https://issues.apache.org/jira/browse/FLINK-8944
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Kailash Hassan Dayanand
>            Priority: Minor
>
> Currently the DescribeStream AWS API used to get the list of shards has restricted rate limits on AWS (5 requests per second per account). This is problematic when running multiple Flink jobs on the same account, since each subtask calls DescribeStream. Changing this to ListShards will provide more flexibility on rate limits, as ListShards has a limit of 100 requests per second per data stream.
> More details on the mailing list: https://goo.gl/mRXjKh

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-8944) Use ListShards for shard discovery in the flink kinesis connector
[ https://issues.apache.org/jira/browse/FLINK-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491433#comment-16491433 ]

ASF GitHub Bot commented on FLINK-8944:
---------------------------------------

Github user kailashhd commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5992#discussion_r191033585

    --- Diff: flink-connectors/flink-connector-kinesis/src/test/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxyTest.java ---
    @@ -26,20 +29,60 @@
     import com.amazonaws.ClientConfigurationFactory;
     import com.amazonaws.services.kinesis.AmazonKinesis;
     import com.amazonaws.services.kinesis.model.ExpiredIteratorException;
    +import com.amazonaws.services.kinesis.model.ListShardsRequest;
    +import com.amazonaws.services.kinesis.model.ListShardsResult;
     import com.amazonaws.services.kinesis.model.ProvisionedThroughputExceededException;
    +import com.amazonaws.services.kinesis.model.Shard;
    +import org.hamcrest.Description;
    +import org.hamcrest.TypeSafeDiagnosingMatcher;
    +import org.junit.Assert;
     import org.junit.Test;
     import org.powermock.reflect.Whitebox;
    +import java.util.ArrayList;
    +import java.util.Arrays;
    +import java.util.HashMap;
    +import java.util.HashSet;
    +import java.util.List;
     import java.util.Properties;
    +import java.util.Set;
    +import java.util.stream.Collectors;
    +import static org.hamcrest.MatcherAssert.assertThat;
    +import static org.hamcrest.collection.IsCollectionWithSize.hasSize;
    +import static org.hamcrest.collection.IsIterableContainingInAnyOrder.containsInAnyOrder;
     import static org.junit.Assert.assertEquals;
     import static org.junit.Assert.assertFalse;
     import static org.junit.Assert.assertTrue;
    +import static org.mockito.Matchers.argThat;
    +import static org.mockito.Mockito.doReturn;
    +import static org.mockito.Mockito.mock;
     /**
      * Test for methods in the {@link KinesisProxy} class.
      */
     public class KinesisProxyTest {
    +	private static final String NEXT_TOKEN = "NextToken";
    +	private static final String fakeStreamName = "fake-stream";
    +	private Set<String> shardIdSet;
    +	private List<Shard> shards;
    --- End diff --

    Done

> Use ListShards for shard discovery in the flink kinesis connector
> -----------------------------------------------------------------
>
>                 Key: FLINK-8944
>                 URL: https://issues.apache.org/jira/browse/FLINK-8944
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Kailash Hassan Dayanand
>            Priority: Minor
>
> Currently the DescribeStream AWS API used to get the list of shards has restricted rate limits on AWS (5 requests per second per account). This is problematic when running multiple Flink jobs on the same account, since each subtask calls DescribeStream. Changing this to ListShards will provide more flexibility on rate limits, as ListShards has a limit of 100 requests per second per data stream.
> More details on the mailing list: https://goo.gl/mRXjKh

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (FLINK-9440) Allow cancelation and reset of timers
Elias Levy created FLINK-9440:
---------------------------------

             Summary: Allow cancelation and reset of timers
                 Key: FLINK-9440
                 URL: https://issues.apache.org/jira/browse/FLINK-9440
             Project: Flink
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.4.2
            Reporter: Elias Levy

Currently the {{TimerService}} allows one to register timers, but it is not possible to delete a timer or to reset a timer to a new value. If one wishes to reset a timer, one must also handle the previously inserted timer callbacks and ignore them. It would be useful if the API allowed one to remove and reset timers.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
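Until such an API exists, the usual workaround is exactly what the reporter describes: keep the latest registered timestamp in state and ignore callbacks for any other timestamp. A self-contained sketch of that pattern (plain Java with no Flink types; `ResettableTimer` is a hypothetical helper modeling the idea, not Flink API):

```java
public class ResettableTimer {

    private Long activeTimestamp; // null means no timer is pending
    private int firedCount;

    // Registering again simply overwrites the active timestamp, "resetting" the timer.
    public void register(long timestamp) {
        activeTimestamp = timestamp;
    }

    public void cancel() {
        activeTimestamp = null;
    }

    // Invoked by the underlying timer service for every timer ever registered;
    // anything but the currently active timestamp is treated as cancelled.
    public void onTimer(long timestamp) {
        if (activeTimestamp == null || activeTimestamp != timestamp) {
            return; // stale or cancelled timer: ignore the callback
        }
        activeTimestamp = null;
        firedCount++;
    }

    public int firedCount() {
        return firedCount;
    }
}
```

In a real {{ProcessFunction}} the active timestamp would live in keyed state so it survives restores, but the filtering logic in {{onTimer}} is the same.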
[jira] [Updated] (FLINK-9344) Support INTERSECT and INTERSECT ALL for streaming
[ https://issues.apache.org/jira/browse/FLINK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruidong Li updated FLINK-9344: -- Description: support non-window intersect and non-window intersect all for both SQL and TableAPI (was: support intersect and intersect all for both SQL and TableAPI) > Support INTERSECT and INTERSECT ALL for streaming > - > > Key: FLINK-9344 > URL: https://issues.apache.org/jira/browse/FLINK-9344 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Ruidong Li >Assignee: Ruidong Li >Priority: Major > > support non-window intersect and non-window intersect all for both SQL and > TableAPI -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9344) Support INTERSECT and INTERSECT ALL for streaming
[ https://issues.apache.org/jira/browse/FLINK-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490976#comment-16490976 ]

ASF GitHub Bot commented on FLINK-9344:
---------------------------------------

Github user Xpray commented on the issue:

    https://github.com/apache/flink/pull/5998

    Thanks for the review @walterddr @fhueske, I've updated the PR.

> Support INTERSECT and INTERSECT ALL for streaming
> -------------------------------------------------
>
>                 Key: FLINK-9344
>                 URL: https://issues.apache.org/jira/browse/FLINK-9344
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Ruidong Li
>            Assignee: Ruidong Li
>            Priority: Major
>
> support intersect and intersect all for both SQL and TableAPI

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[GitHub] flink issue #5998: [FLINK-9344] [TableAPI & SQL] Support INTERSECT and INTER...
Github user Xpray commented on the issue: https://github.com/apache/flink/pull/5998 Thanks for the review @walterddr @fhueske , I've updated the PR. ---
[jira] [Updated] (FLINK-8770) CompletedCheckPoints stored on ZooKeeper is not up-to-date, when JobManager is restarted it fails to recover the job due to "checkpoint FileNotFound exception"
[ https://issues.apache.org/jira/browse/FLINK-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-8770:
---------------------------------
    Fix Version/s:     (was: 1.5.0)
                   1.5.1

> CompletedCheckPoints stored on ZooKeeper is not up-to-date, when JobManager is restarted it fails to recover the job due to "checkpoint FileNotFound exception"
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-8770
>                 URL: https://issues.apache.org/jira/browse/FLINK-8770
>             Project: Flink
>          Issue Type: Bug
>          Components: Local Runtime
>    Affects Versions: 1.4.0
>            Reporter: Xinyang Gao
>            Priority: Critical
>             Fix For: 1.5.1
>
>         Attachments: flink-test-jobmanager-3-b2dm8.log
>
> Hi, I am running a Flink cluster (1 JobManager + 6 TaskManagers) with HA mode on OpenShift. I have enabled Chaos Monkey, which kills either the JobManager or one of the TaskManagers every 5 minutes; the ZooKeeper quorum is stable, with no chaos monkey enabled. Flink reads data from one Kafka topic and writes data into another Kafka topic. Checkpointing is enabled with a 1000 ms interval, and state.checkpoints.num-retained is set to 10. I am using a PVC for the state backend (checkpoint, recovery, etc.), so the checkpoints and states are persistent. The restart strategy for the Flink jobmanager DeploymentConfig is recreate, which means it kills the old container of the jobmanager before it restarts the new one.
> I ran the chaos test for one day at first, and I have seen the exception:
> org.apache.flink.util.FlinkException: Could not retrieve checkpoint *** from state handle under /***. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
> The root cause is a checkpoint FileNotFound. Due to the above error, the Flink job then keeps restarting for a few hours and cannot be restarted successfully.
> After further investigation, I found the following facts in my PVC:
>
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint0ee95157de00
> -rw-r--r--. 1 flink root 11379 Feb 23 01:51 completedCheckpoint498d0952cf00
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint650fe5b021fe
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint66634149683e
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint67f24c3b018e
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint6f64ebf0ae64
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint906ebe1fb337
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint98b79ea14b09
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpointa0d1070e0b6c
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpointbd3a9ba50322
> -rw-r--r--. 1 flink root 11355 Feb 22 17:31 completedCheckpointd433b5e108be
> -rw-r--r--. 1 flink root 11379 Feb 22 22:56 completedCheckpointdd0183ed092b
> -rw-r--r--. 1 flink root 11379 Feb 22 00:00 completedCheckpointe0a5146c3d81
> -rw-r--r--. 1 flink root 11331 Feb 22 17:06 completedCheckpointec82f3ebc2ad
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpointf86e460f6720
>
> The latest 10 checkpoints were created at about 02:10, if you ignore the old checkpoints which were not deleted successfully (which I do not care too much about).
> However, when checking on ZooKeeper, I see the following in the flink/checkpoints path (I only list one, but the other 9 are similar):
> cZxid = 0x160001ff5d
> (serialized org.apache.flink.runtime.state.RetrievableStreamStateHandle wrapping a FileStateHandle for file:/mnt/flink-test/recovery/completedCheckpointd004a3753870)
> ctime = Fri Feb 23 02:08:18 UTC 2018
> mZxid = 0x160001ff5d
> mtime = Fri Feb 23 02:08:18 UTC 2018
> pZxid = 0x1d0c6d
> cversion = 31
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 492
>
> So the latest completedCheckpoint state stored on ZooKeeper is from about 02:08, which implies that the completed checkpoints at
[jira] [Updated] (FLINK-9301) NotSoMiniClusterIterations job fails on travis
[ https://issues.apache.org/jira/browse/FLINK-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9301: - Fix Version/s: (was: 1.5.0) 1.5.1 > NotSoMiniClusterIterations job fails on travis > -- > > Key: FLINK-9301 > URL: https://issues.apache.org/jira/browse/FLINK-9301 > Project: Flink > Issue Type: Improvement > Components: Tests >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Assignee: Andrey Zagrebin >Priority: Critical > Fix For: 1.5.1 > > > The high-parallelism-iterations-test fails on travis. After starting ~55 > taskmanagers all memory is used and further memory allocations fail. > I'm currently letting it run another time, if it fails again I will disable > the test temporarily. > https://travis-ci.org/zentol/flink-ci/builds/375189790 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8247) Support Hadoop-free variant of Flink on Mesos
[ https://issues.apache.org/jira/browse/FLINK-8247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8247: - Fix Version/s: (was: 1.5.0) 1.5.1 > Support Hadoop-free variant of Flink on Mesos > - > > Key: FLINK-8247 > URL: https://issues.apache.org/jira/browse/FLINK-8247 > Project: Flink > Issue Type: Bug > Components: Mesos > Affects Versions: 1.4.0 > Reporter: Eron Wright > Assignee: Eron Wright > Priority: Major > Fix For: 1.4.3, 1.5.1 > > > In Hadoop-free mode, Hadoop isn't on the classpath. The Mesos job manager > normally uses the Hadoop UserGroupInformation class to overlay a user context > (`HADOOP_USER_NAME`) for the task managers. > Detect the absence of Hadoop and skip over the `HadoopUserOverlay`, similar > to the logic in `HadoopModuleFactory`. This may require the introduction > of an overlay factory. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8744) Add annotations for documentation common/advanced options
[ https://issues.apache.org/jira/browse/FLINK-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8744: - Fix Version/s: (was: 1.5.0) 1.5.1 > Add annotations for documentation common/advanced options > - > > Key: FLINK-8744 > URL: https://issues.apache.org/jira/browse/FLINK-8744 > Project: Flink > Issue Type: New Feature > Components: Configuration, Documentation >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Trivial > Fix For: 1.5.1 > > > The {{ConfigDocsGenerator}} only generates [the full configuration > reference|https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#full-reference]. > The > [common|https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#common-options] > and > [advanced|https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#advanced-options] > sections are still manually defined in {{config.md}} and thus prone to being > outdated / out-of-sync with the full reference. > I propose adding {{@Common}}/{{@Advanced}} annotations based on which we > could generate these sections as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7697) Add metrics for Elasticsearch Sink
[ https://issues.apache.org/jira/browse/FLINK-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7697: - Fix Version/s: (was: 1.5.0) 1.5.1 > Add metrics for Elasticsearch Sink > -- > > Key: FLINK-7697 > URL: https://issues.apache.org/jira/browse/FLINK-7697 > Project: Flink > Issue Type: Wish > Components: ElasticSearch Connector > Reporter: Hai Zhou > Assignee: Hai Zhou > Priority: Critical > Fix For: 1.5.1 > > > We should add metrics to track events written to the ElasticsearchSink, > e.g.: > * number of successful bulk sends > * number of documents inserted > * number of documents updated > * number of document version conflicts -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9328) RocksDBStateBackend might use PlaceholderStreamStateHandle to restore due to StateBackendTestBase class not registering snapshots in some UTs
[ https://issues.apache.org/jira/browse/FLINK-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9328: - Fix Version/s: (was: 1.5.0) 1.5.1 > RocksDBStateBackend might use PlaceholderStreamStateHandle to restore due to > StateBackendTestBase class not registering snapshots in some UTs > - > > Key: FLINK-9328 > URL: https://issues.apache.org/jira/browse/FLINK-9328 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing > Affects Versions: 1.4.2 > Reporter: Yun Tang > Priority: Minor > Fix For: 1.5.1 > > > Currently, the StateBackendTestBase class does not register snapshots to the > SharedStateRegistry in the testValueState, testListState, testReducingState, > testFoldingState and testMapState UTs, which may cause RocksDBStateBackend to > restore from a PlaceholderStreamStateHandle during the 2nd restore procedure if > one specific sst file existed in both the 1st snapshot and the 2nd snapshot > handle. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-5018) Make source idle timeout user configurable
[ https://issues.apache.org/jira/browse/FLINK-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-5018: - Fix Version/s: (was: 1.5.0) 1.5.1 > Make source idle timeout user configurable > -- > > Key: FLINK-5018 > URL: https://issues.apache.org/jira/browse/FLINK-5018 > Project: Flink > Issue Type: Sub-task > Components: DataStream API >Reporter: Tzu-Li (Gordon) Tai >Priority: Major > Fix For: 1.5.1 > > > There are 2 cases where sources are considered idle and should emit an idle > {{StreamStatus}} downstream, taking Kafka consumer as example: > - The source instance was not assigned any partitions > - The source instance was assigned partitions, but they currently don't have > any data. > For the second case, we can only consider it idle after a timeout threshold. > It would be good to make this timeout user configurable besides a default > value. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8779) ClassLoaderITCase.testKMeansJobWithCustomClassLoader fails on Travis
[ https://issues.apache.org/jira/browse/FLINK-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8779: - Fix Version/s: (was: 1.5.0) 1.5.1 > ClassLoaderITCase.testKMeansJobWithCustomClassLoader fails on Travis > > > Key: FLINK-8779 > URL: https://issues.apache.org/jira/browse/FLINK-8779 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: test-stability > Fix For: 1.5.1 > > > The {{ClassLoaderITCase.testKMeansJobWithCustomClassLoader}} fails on Travis > by producing no output for 300s. This might indicate a test instability or a > problem with Flink which was recently introduced. > https://api.travis-ci.org/v3/job/344427688/log.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9215) TaskManager Releasing - org.apache.flink.util.FlinkException
[ https://issues.apache.org/jira/browse/FLINK-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9215: - Fix Version/s: (was: 1.5.0) 1.5.1 > TaskManager Releasing - org.apache.flink.util.FlinkException > - > > Key: FLINK-9215 > URL: https://issues.apache.org/jira/browse/FLINK-9215 > Project: Flink > Issue Type: Bug > Components: Cluster Management, ResourceManager, Streaming >Affects Versions: 1.5.0 >Reporter: Bob Lau >Assignee: Sihua Zhou >Priority: Major > Fix For: 1.5.1 > > > The exception stack is as follows: > {code:java} > // code placeholder > {"root-exception":" > org.apache.flink.util.FlinkException: Releasing TaskManager > 0d87aa6fa99a6c12e36775b1d6bceb19. > at > org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManagerInternal(SlotPool.java:1067) > at > org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManager(SlotPool.java:1050) > at sun.reflect.GeneratedMethodAccessor110.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:210) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:132) > at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544) > at akka.actor.Actor$class.aroundReceive(Actor.scala:502) > at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) > at akka.actor.ActorCell.invoke(ActorCell.scala:495) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) > at akka.dispatch.Mailbox.run(Mailbox.scala:224) > at akka.dispatch.Mailbox.exec(Mailbox.scala:234) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > ","timestamp":1524106438997, > "all-exceptions":[{"exception":" > org.apache.flink.util.FlinkException: Releasing TaskManager > 0d87aa6fa99a6c12e36775b1d6bceb19. > at > org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManagerInternal(SlotPool.java:1067) > at > org.apache.flink.runtime.jobmaster.slotpool.SlotPool.releaseTaskManager(SlotPool.java:1050) > at sun.reflect.GeneratedMethodAccessor110.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:210) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154) > at > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:132) > at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544) > at akka.actor.Actor$class.aroundReceive(Actor.scala:502) > at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526) > at akka.actor.ActorCell.invoke(ActorCell.scala:495) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257) > at akka.dispatch.Mailbox.run(Mailbox.scala:224) > at akka.dispatch.Mailbox.exec(Mailbox.scala:234) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > ","task":"async wait operator > 
(14/20)","location":"slave1:60199","timestamp":1524106438996 > }],"truncated":false} > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8073) Test instability FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint()
[ https://issues.apache.org/jira/browse/FLINK-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8073: - Fix Version/s: (was: 1.5.0) 1.5.1 > Test instability > FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint() > - > > Key: FLINK-8073 > URL: https://issues.apache.org/jira/browse/FLINK-8073 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Tests >Affects Versions: 1.4.0, 1.5.0 >Reporter: Kostas Kloudas >Priority: Critical > Labels: test-stability > Fix For: 1.5.1 > > > Travis log: https://travis-ci.org/kl0u/flink/jobs/301985988 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9004) Cluster test: Run general purpose job with failures with Yarn session
[ https://issues.apache.org/jira/browse/FLINK-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9004: - Fix Version/s: (was: 1.5.0) 1.5.1 > Cluster test: Run general purpose job with failures with Yarn session > - > > Key: FLINK-9004 > URL: https://issues.apache.org/jira/browse/FLINK-9004 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Assignee: Gary Yao >Priority: Blocker > Fix For: 1.5.1 > > > Similar to FLINK-8973, we should run the general purpose job (FLINK-8971) on > a Yarn session cluster and simulate failures. > The job jar should be ill-packaged, meaning that we include too many > dependencies in the user jar. We should include the Scala library, Hadoop and > Flink itself to verify that there are no class loading issues. > The general purpose job should run with misbehavior activated. Additionally, > we should simulate at least the following failure scenarios: > * Kill Flink processes > * Kill connection to storage system for checkpoints and jobs > * Simulate network partition > We should run the test at least with the following state backend: RocksDB > incremental async and checkpointing to S3. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-6369) Better support for overlay networks
[ https://issues.apache.org/jira/browse/FLINK-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6369: - Fix Version/s: (was: 1.5.0) 1.5.1 > Better support for overlay networks > --- > > Key: FLINK-6369 > URL: https://issues.apache.org/jira/browse/FLINK-6369 > Project: Flink > Issue Type: Improvement > Components: Docker, Network >Affects Versions: 1.2.0 >Reporter: Patrick Lucas >Priority: Major > Fix For: 1.5.1 > > > Running Flink in an environment that utilizes an overlay network > (containerized environments like Kubernetes or Docker Compose, or cloud > platforms like AWS VPC) poses various challenges related to networking. > The core problem is that in these environments, applications are frequently > addressed by a name different from that with which the application sees > itself. > For instance, it is plausible that the Flink UI (served by the Jobmanager) is > accessed via an ELB, which poses a problem in HA mode when the non-leader UI > returns an HTTP redirect to the leader, but the user may not be able to > connect directly to the leader. > Or, if a user is using [Docker > Compose|https://github.com/apache/flink/blob/aa21f853ab0380ec1f68ae1d0b7c8d9268da4533/flink-contrib/docker-flink/docker-compose.yml], > they cannot submit a job via the CLI since there is a mismatch between the > name used to address the Jobmanager and what the Jobmanager perceives as its > hostname. (see \[1] below for more detail) > > h3. Problems and proposed solutions > There are four instances of this issue that I've run into so far: > h4. Jobmanagers must be addressed by the same name they are configured with > due to limitations of Akka > Akka enforces that messages it receives are addressed with the hostname it is > configured with. Newer versions of Akka (>= 2.4) than what Flink uses > (2.3-custom) have support for accepting messages with the "wrong" hostname, > but it is limited to a single "external" hostname. 
> In many environments, it is likely that not all parties that want to connect > to the Jobmanager have the same way of addressing it (e.g. the ELB example > above). Other similarly-used protocols like HTTP generally don't have this > restriction: if you connect on a socket and send a well-formed message, the > system assumes that it is the desired recipient. > One solution is to not use Akka at all when communicating with the cluster > from the outside, perhaps using an HTTP API instead. This would be somewhat > involved, and probably best left as a longer-term goal. > A more immediate solution would be to override this behavior within Flakka, > the custom fork of Akka currently in use by Flink. I'm not sure how much > effort this would take. > h4. The Jobmanager needs to be able to address the Taskmanagers for e.g. > metrics collection > Having the Taskmanagers register themselves by IP is probably the best > solution here. It's a reasonable assumption that IPs can always be used for > communication between the nodes of a single cluster. Asking that each > Taskmanager container have a resolvable hostname is unreasonable. > h4. Jobmanagers in HA mode send HTTP redirects to URLs that aren't externally > resolvable/routable > If multiple Jobmanagers are used in HA mode, HTTP requests to non-leaders > (such as if you put a Kubernetes Service in front of all Jobmanagers in a > cluster) get redirected to the (supposed) hostname of the leader, but this is > potentially unresolvable/unroutable externally. > Enabling non-leader Jobmanagers to proxy API calls to the leader would solve > this. The non-leaders could even serve static asset requests (e.g. for css or > js files) directly. > h4. Queryable state requests involve direct communication with Taskmanagers > Currently, queryable state requests involve communication between the client > and the Jobmanager (for key partitioning lookups) and between the client and > all Taskmanagers. 
> If the client is inside the network (as would be common in production > use-cases where high-volume lookups are required) this is a non-issue, but > problems crop up if the client is outside the network. > For the communication with the Jobmanager, a similar solution as above can be > used: if all Jobmanagers can service all key partitioning lookup requests > (e.g. by proxying) then a simple Service can be used. > The story is a bit different for the Taskmanagers. The partitioning lookup to > the Jobmanager would return the name of the particular Taskmanager that owned > the desired data, but that name (likely an IP, as proposed in the second > section above) is not necessarily resolvable/routable from the client. > In the context of Kubernetes, where individual containers are generally not >
[jira] [Updated] (FLINK-8336) YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability
[ https://issues.apache.org/jira/browse/FLINK-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8336: - Fix Version/s: (was: 1.5.0) 1.5.1 > YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability > --- > > Key: FLINK-8336 > URL: https://issues.apache.org/jira/browse/FLINK-8336 > Project: Flink > Issue Type: Bug > Components: FileSystem, Tests, YARN >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Assignee: Nico Kruber >Priority: Critical > Labels: test-stability > Fix For: 1.4.3, 1.5.1 > > > The {{YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3}} fails on > Travis. I suspect that this has something to do with the consistency > guarantees S3 gives us. > https://travis-ci.org/tillrohrmann/flink/jobs/323930297 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9013) Document yarn.containers.vcores only being effective when adapting YARN config
[ https://issues.apache.org/jira/browse/FLINK-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9013: - Fix Version/s: (was: 1.5.0) 1.5.1 > Document yarn.containers.vcores only being effective when adapting YARN config > -- > > Key: FLINK-9013 > URL: https://issues.apache.org/jira/browse/FLINK-9013 > Project: Flink > Issue Type: Improvement > Components: Documentation, YARN >Affects Versions: 1.5.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Fix For: 1.5.1 > > > Even after specifying {{yarn.containers.vcores}} and having Flink request > such a container from YARN, it may not take these into account at all and > return a container with 1 vcore. > The YARN configuration needs to be adapted to take the vcores into account, > e.g. by setting the {{FairScheduler}} in {{yarn-site.xml}}: > {code} > <property> > <name>yarn.resourcemanager.scheduler.class</name> > <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value> > </property> > {code} > This fact should be documented at least at the configuration parameter > documentation of {{yarn.containers.vcores}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8046) ContinuousFileMonitoringFunction wrongly ignores files with exact same timestamp
[ https://issues.apache.org/jira/browse/FLINK-8046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8046: - Fix Version/s: (was: 1.5.0) 1.5.1 > ContinuousFileMonitoringFunction wrongly ignores files with exact same > timestamp > > > Key: FLINK-8046 > URL: https://issues.apache.org/jira/browse/FLINK-8046 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.2 >Reporter: Juan Miguel Cejuela >Priority: Major > Labels: stream > Fix For: 1.5.1 > > Original Estimate: 24h > Remaining Estimate: 24h > > The current monitoring of files sets the internal variable > `globalModificationTime` to filter out files that are "older". However, the > current test (to check "older") does > `boolean shouldIgnore = modificationTime <= globalModificationTime;` (from > `shouldIgnore`) > The comparison should strictly be SMALLER (NOT smaller or equal). The method > documentation also states "This happens if the modification time of the file > is _smaller_ than...". > The equality acceptance for "older" causes files with the exact same > timestamp to be ignored. The behavior is also non-deterministic, as the first > file to be accepted ("first" being pretty much random) causes the rest of the > files with the same timestamp to be ignored. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
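The boundary condition is easy to demonstrate in isolation. The sketch below is a simplified stand-in for the check, not the actual ContinuousFileMonitoringFunction code; the method names are illustrative:

```java
public class ModTimeFilter {
    // Buggy check: equal timestamps are treated as "older" and dropped.
    static boolean shouldIgnoreBuggy(long modificationTime, long globalModificationTime) {
        return modificationTime <= globalModificationTime;
    }

    // Fixed check: only strictly older files are ignored.
    static boolean shouldIgnoreFixed(long modificationTime, long globalModificationTime) {
        return modificationTime < globalModificationTime;
    }

    public static void main(String[] args) {
        long global = 1_000L;
        // A file with the exact same timestamp as the watermark:
        System.out.println(shouldIgnoreBuggy(1_000L, global)); // true  -> file wrongly dropped
        System.out.println(shouldIgnoreFixed(1_000L, global)); // false -> file processed
    }
}
```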
[jira] [Updated] (FLINK-8343) Add support for job cluster deployment
[ https://issues.apache.org/jira/browse/FLINK-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8343: - Fix Version/s: (was: 1.5.0) 1.5.1 > Add support for job cluster deployment > -- > > Key: FLINK-8343 > URL: https://issues.apache.org/jira/browse/FLINK-8343 > Project: Flink > Issue Type: Sub-task > Components: Client >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Major > Labels: flip-6 > Fix For: 1.5.1 > > > For Flip-6 we have to enable a different job cluster deployment. The > difference is that we directly submit the job when we deploy the Flink > cluster instead of following a two step approach (first deployment and then > submission). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7915) Verify functionality of RollingSinkSecuredITCase
[ https://issues.apache.org/jira/browse/FLINK-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7915: - Fix Version/s: (was: 1.5.0) 1.5.1 > Verify functionality of RollingSinkSecuredITCase > > > Key: FLINK-7915 > URL: https://issues.apache.org/jira/browse/FLINK-7915 > Project: Flink > Issue Type: Task > Components: Tests >Affects Versions: 1.4.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: test-stability > Fix For: 1.5.1 > > > I recently stumbled across the test {{RollingSinkSecuredITCase}} which will > only be executed for Hadoop version {{>= 3}}. When trying to run it from > IntelliJ I immediately ran into a class-not-found exception for > {{jdbm.helpers.CachePolicy}}, and even after fixing this problem, the test > would not run because it complained about wrong security settings. > I think we should check whether this test works at all and, if not, we > should remove or replace it with something that works. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7550) Give names to REST client/server for clearer logging.
[ https://issues.apache.org/jira/browse/FLINK-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7550: - Fix Version/s: (was: 1.5.0) 1.5.1 > Give names to REST client/server for clearer logging. > - > > Key: FLINK-7550 > URL: https://issues.apache.org/jira/browse/FLINK-7550 > Project: Flink > Issue Type: Improvement > Components: REST >Affects Versions: 1.4.0 >Reporter: Kostas Kloudas >Priority: Major > Fix For: 1.5.1 > > > This issue proposes to give names to the entities composing a REST-ful > service and use these names when logging messages. This will help debugging. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9109) Add flink modify command to documentation
[ https://issues.apache.org/jira/browse/FLINK-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9109: - Fix Version/s: (was: 1.5.0) 1.5.1 > Add flink modify command to documentation > - > > Key: FLINK-9109 > URL: https://issues.apache.org/jira/browse/FLINK-9109 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Major > Labels: flip-6 > Fix For: 1.5.1 > > > We should add documentation for the {{flink modify}} command. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8302) Support shift_left and shift_right in TableAPI
[ https://issues.apache.org/jira/browse/FLINK-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8302: - Fix Version/s: (was: 1.5.0) 1.5.1 > Support shift_left and shift_right in TableAPI > -- > > Key: FLINK-8302 > URL: https://issues.apache.org/jira/browse/FLINK-8302 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: DuBin >Priority: Major > Labels: features > Fix For: 1.5.1 > Original Estimate: 48h > Remaining Estimate: 48h > > Add shift_left and shift_right support in TableAPI: shift_left(input, n) acts > as input << n, and shift_right(input, n) acts as input >> n. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
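A minimal sketch of the intended semantics (the Java method names below are hypothetical; the issue only specifies the behavior in terms of Java's shift operators):

```java
public class ShiftFunctions {
    // Proposed semantics: shift_left(input, n) == input << n,
    // shift_right(input, n) == input >> n (arithmetic shift, preserving sign).
    static int shiftLeft(int input, int n)  { return input << n; }
    static int shiftRight(int input, int n) { return input >> n; }

    public static void main(String[] args) {
        System.out.println(shiftLeft(3, 2));   // 12
        System.out.println(shiftRight(-8, 1)); // -4 (sign bit preserved)
    }
}
```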
[jira] [Updated] (FLINK-9046) Improve error messages when receiving unexpected messages.
[ https://issues.apache.org/jira/browse/FLINK-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9046: - Fix Version/s: (was: 1.5.0) 1.5.1 > Improve error messages when receiving unexpected messages. > -- > > Key: FLINK-9046 > URL: https://issues.apache.org/jira/browse/FLINK-9046 > Project: Flink > Issue Type: Bug > Components: REST, Webfrontend >Affects Versions: 1.5.0 >Reporter: Kostas Kloudas >Priority: Major > Fix For: 1.5.1 > > > Currently, in many cases, when a REST handler receives an unexpected > message, e.g. for a job that does not exist, it logs the full stack trace, > often with misleading messages. This can happen for example if we launch a > cluster, connect to the WebUI and monitor a job, then kill the cluster while > not shutting down the WebUI and start a new cluster on the same port. In this > case the WebUI will keep asking for the previous job and the > JobDetailsHandler will log the following: > > {code:java} > 2018-03-21 14:27:24,319 ERROR > org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Exception > occurred in REST handler. 
> org.apache.flink.runtime.rest.NotFoundException: Job > 548afad8217f4a18db7f50e60d48885a not found > at > org.apache.flink.runtime.rest.handler.job.AbstractExecutionGraphHandler.lambda$handleRequest$1(AbstractExecutionGraphHandler.java:90) > at > java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) > at > java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) > at > org.apache.flink.runtime.rest.handler.legacy.ExecutionGraphCache.lambda$getExecutionGraph$0(ExecutionGraphCache.java:133) > at > java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) > at > java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) > at > org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:755) > at akka.dispatch.OnComplete.internal(Future.scala:258) > at akka.dispatch.OnComplete.internal(Future.scala:256) > at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186) > at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) > at > org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83) > at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44) > at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252) > at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534) > at > akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:20) > at > 
akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:18) > at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:436) > at scala.concurrent.Future$$anonfun$andThen$1.apply(Future.scala:435) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36) > at > akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) > at > akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91) > at > akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) > at > akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) > at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) > at > akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107){code} > > The same holds when asking through the WebUI for the logs of TM. In this > case, the logs will contain: > {code:java} > 2018-03-21 12:26:05,096 ERROR > org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerDetailsHandler - > Implementation error: Unhandled exception. >
[jira] [Updated] (FLINK-8731) TwoInputStreamTaskTest flaky on travis
[ https://issues.apache.org/jira/browse/FLINK-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8731: - Fix Version/s: (was: 1.5.0) 1.5.1 > TwoInputStreamTaskTest flaky on travis > -- > > Key: FLINK-8731 > URL: https://issues.apache.org/jira/browse/FLINK-8731 > Project: Flink > Issue Type: Bug > Components: Streaming, Tests >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Priority: Critical > Labels: test-stability > Fix For: 1.5.1 > > > https://travis-ci.org/zentol/flink/builds/344307861 > {code} > Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.479 sec <<< > FAILURE! - in org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest > testOpenCloseAndTimestamps(org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest) > Time elapsed: 0.05 sec <<< ERROR! > java.lang.Exception: error in task > at > org.apache.flink.streaming.runtime.tasks.StreamTaskTestHarness.waitForTaskCompletion(StreamTaskTestHarness.java:250) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskTestHarness.waitForTaskCompletion(StreamTaskTestHarness.java:233) > at > org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testOpenCloseAndTimestamps(TwoInputStreamTaskTest.java:99) > Caused by: org.mockito.exceptions.misusing.WrongTypeOfReturnValue: > Boolean cannot be returned by getChannelIndex() > getChannelIndex() should return int > *** > If you're unsure why you're getting above error read on. > Due to the nature of the syntax above problem might occur because: > 1. This exception *might* occur in wrongly written multi-threaded tests. >Please refer to Mockito FAQ on limitations of concurrency testing. > 2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub > spies - >- with doReturn|Throw() family of methods. More in javadocs for > Mockito.spy() method. 
> at > org.apache.flink.runtime.io.network.partition.consumer.UnionInputGate.waitAndGetNextInputGate(UnionInputGate.java:212) > at > org.apache.flink.runtime.io.network.partition.consumer.UnionInputGate.getNextBufferOrEvent(UnionInputGate.java:158) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:164) > at > org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:292) > at > org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run(TwoInputStreamTask.java:115) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:308) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskTestHarness$TaskThread.run(StreamTaskTestHarness.java:437) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8971) Create general purpose testing job
[ https://issues.apache.org/jira/browse/FLINK-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8971: - Fix Version/s: (was: 1.5.0) 1.5.1 > Create general purpose testing job > -- > > Key: FLINK-8971 > URL: https://issues.apache.org/jira/browse/FLINK-8971 > Project: Flink > Issue Type: Task > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Critical > Fix For: 1.5.1 > > > In order to write better end-to-end tests we need a general purpose testing > job which comprises as many Flink aspects as possible. These include > different types for records and state, user defined components, state types > and operators. > The job should allow to activate a certain misbehavior, such as slowing > certain paths down or throwing exceptions to simulate failures. > The job should come with a data generator which generates input data such > that the job can verify its own behavior. This includes the state as well as > the input/output records. > We already have the [heavily misbehaved > job|https://github.com/rmetzger/flink-testing-jobs/blob/flink-1.4/basic-jobs/src/main/java/com/dataartisans/heavymisbehaved/HeavyMisbehavedJob.java] > which simulates some misbehavior. There is also the [state machine > job|https://github.com/dataArtisans/flink-testing-jobs/tree/master/streaming-state-machine] > which can verify itself for invalid state changes which indicate data loss. > We should incorporate their characteristics into the new general purpose job. 
> Additionally, the general purpose job should contain the following aspects: > * Job containing a sliding window aggregation > * At least one generic Kryo type > * At least one generic Avro type > * At least one Avro specific record type > * At least one input type for which we register a Kryo serializer > * At least one input type for which we provide a user defined serializer > * At least one state type for which we provide a user defined serializer > * At least one state type which uses the AvroSerializer > * Include an operator with ValueState > * Value state changes should be verified (e.g. predictable series of values) > * Include an operator with operator state > * Include an operator with broadcast state > * Broadcast state changes should be verified (e.g. predictable series of > values) > * Include union state > * User defined watermark assigner > The job should be made available in the {{flink-end-to-end-tests}} module. > This issue is intended to serve as an umbrella issue for developing and > extending this job. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord
[ https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6022: - Fix Version/s: (was: 1.5.0) 1.5.1 > Don't serialise Schema when serialising Avro GenericRecord > -- > > Key: FLINK-6022 > URL: https://issues.apache.org/jira/browse/FLINK-6022 > Project: Flink > Issue Type: Improvement > Components: Type Serialization System >Reporter: Robert Metzger >Assignee: Stephan Ewen >Priority: Major > Fix For: 1.5.1 > > > Currently, Flink is serializing the schema for each Avro GenericRecord in the > stream. > This leads to a lot of overhead over the wire/disk + high serialization costs. > Therefore, I'm proposing to improve the support for GenericRecord in Flink by > shipping the schema to each serializer through the AvroTypeInformation. > Then, we can only support GenericRecords with the same type per stream, but > the performance will be much better. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
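The overhead claim can be illustrated with a toy serializer. This is deliberately not Avro code, just a sketch of why embedding the schema with every record inflates each payload compared to configuring the serializer with the schema once:

```java
import java.nio.charset.StandardCharsets;

public class SchemaOverheadSketch {
    // Serializer that ships the schema string with every record (per-record overhead).
    static byte[] withSchema(String schema, String record) {
        return (schema + "|" + record).getBytes(StandardCharsets.UTF_8);
    }

    // Serializer configured with the schema up front: only the record bytes travel.
    static byte[] schemaless(String record) {
        return record.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String schema = "{\"type\":\"record\",\"name\":\"User\","
                + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";
        String record = "42";
        // The schema dwarfs the record; multiplied over a stream, that is
        // the wire/disk overhead the issue wants to eliminate.
        System.out.println(withSchema(schema, record).length);
        System.out.println(schemaless(record).length);
    }
}
```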
[jira] [Updated] (FLINK-6997) SavepointITCase fails in master branch sometimes
[ https://issues.apache.org/jira/browse/FLINK-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6997: - Fix Version/s: (was: 1.5.0) 1.5.1 > SavepointITCase fails in master branch sometimes > > > Key: FLINK-6997 > URL: https://issues.apache.org/jira/browse/FLINK-6997 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.3.0, 1.5.0 >Reporter: Ted Yu >Priority: Critical > Labels: test-stability > Fix For: 1.5.1 > > > I got the following test failure (with commit > a0b781461bcf8c2f1d00b93464995f03eda589f1) > {code} > testSavepointForJobWithIteration(org.apache.flink.test.checkpointing.SavepointITCase) > Time elapsed: 8.129 sec <<< ERROR! > java.io.IOException: java.lang.Exception: Failed to complete savepoint > at > org.apache.flink.runtime.testingUtils.TestingCluster.triggerSavepoint(TestingCluster.scala:342) > at > org.apache.flink.runtime.testingUtils.TestingCluster.triggerSavepoint(TestingCluster.scala:316) > at > org.apache.flink.test.checkpointing.SavepointITCase.testSavepointForJobWithIteration(SavepointITCase.java:827) > Caused by: java.lang.Exception: Failed to complete savepoint > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:821) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:805) > at > org.apache.flink.runtime.concurrent.impl.FlinkFuture$5.onComplete(FlinkFuture.java:272) > at akka.dispatch.OnComplete.internal(Future.scala:247) > at akka.dispatch.OnComplete.internal(Future.scala:245) > at akka.dispatch.japi$CallbackBridge.apply(Future.scala:175) > at akka.dispatch.japi$CallbackBridge.apply(Future.scala:172) > at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) > at > akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55) > at > 
akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91) > at > akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) > at > akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91) > at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) > at > akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.lang.Exception: Failed to trigger savepoint: Not all required > tasks are currently running. 
> at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.triggerSavepoint(CheckpointCoordinator.java:382) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:800) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.testingUtils.TestingJobManagerLike$$anonfun$handleTestingMessage$1.applyOrElse(TestingJobManagerLike.scala:95) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:38) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-5125) ContinuousFileProcessingCheckpointITCase is Flaky
[ https://issues.apache.org/jira/browse/FLINK-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-5125: - Fix Version/s: (was: 1.5.0) 1.5.1 > ContinuousFileProcessingCheckpointITCase is Flaky > - > > Key: FLINK-5125 > URL: https://issues.apache.org/jira/browse/FLINK-5125 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Reporter: Aljoscha Krettek >Assignee: Kostas Kloudas >Priority: Critical > Labels: test-stability > Fix For: 1.5.1 > > > This is the travis log: > https://api.travis-ci.org/jobs/177402367/log.txt?deansi=true > The relevant sections is: > {code} > Running org.apache.flink.test.checkpointing.CoStreamCheckpointingITCase > Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.571 sec - > in org.apache.flink.test.exampleJavaPrograms.EnumTriangleBasicITCase > Running org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.704 sec - > in org.apache.flink.test.checkpointing.CoStreamCheckpointingITCase > Running > org.apache.flink.test.checkpointing.EventTimeAllWindowCheckpointingITCase > Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.805 sec - > in org.apache.flink.test.checkpointing.EventTimeAllWindowCheckpointingITCase > Running > org.apache.flink.test.checkpointing.ContinuousFileProcessingCheckpointITCase > org.apache.flink.client.program.ProgramInvocationException: The program > execution failed: Job execution failed. 
> at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) > at > org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:392) > at > org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.executeRemotely(RemoteStreamEnvironment.java:209) > at > org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.execute(RemoteStreamEnvironment.java:173) > at org.apache.flink.test.util.TestUtils.tryExecute(TestUtils.java:32) > at > org.apache.flink.test.checkpointing.StreamFaultToleranceTestBase.runCheckpointedProgram(StreamFaultToleranceTestBase.java:106) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155) > at >
[jira] [Updated] (FLINK-7009) dogstatsd mode in statsd reporter
[ https://issues.apache.org/jira/browse/FLINK-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7009: - Fix Version/s: (was: 1.5.0) 1.5.1 > dogstatsd mode in statsd reporter > - > > Key: FLINK-7009 > URL: https://issues.apache.org/jira/browse/FLINK-7009 > Project: Flink > Issue Type: Improvement > Components: Metrics >Affects Versions: 1.4.0 > Environment: org.apache.flink.metrics.statsd.StatsDReporter >Reporter: David Brinegar >Priority: Major > Fix For: 1.5.1 > > > The current statsd reporter can only report a subset of Flink metrics owing > to the manner in which Flink variables are handled, mainly around invalid > characters and overly long metric names. As an option, it would be quite useful to > have a stricter dogstatsd-compliant output. Dogstatsd metrics are tagged, > should be less than 200 characters including tag names and values, be > alphanumeric + underbar, delimited by periods. As a further pragmatic > restriction, negative and other invalid values should be ignored rather than > sent to the backend. These restrictions play well with a broad set of > collectors and time series databases. > This mode would: > * convert output to ASCII alphanumeric characters with underbar, delimited by > periods. Runs of invalid characters within a metric segment would be > collapsed to a single underbar. 
> * report all Flink variables as tags > * compress overly long segments, say over 50 chars, to a symbolic > representation of the metric name, to preserve the unique metric time series > but avoid downstream truncation > * compress 32 character Flink IDs like tm_id, task_id, job_id, > task_attempt_id, to the first 8 characters, again to preserve enough > distinction amongst metrics while trimming up to 96 characters from the metric > * remove object references from names, such as the instance hash id of the > serializer > * drop negative or invalid numeric values such as "n/a", "-1" which is used > for unknowns like JVM.Memory.NonHeap.Max, and "-9223372036854775808" which is > used for unknowns like currentLowWaterMark > With these in place, it becomes quite reasonable to support LatencyGauge > metrics as well. > One idea for symbolic compression is to take the first 10 valid characters > plus a hash of the long name. For example, a value like this operator_name: > {code:java} > TriggerWindow(TumblingProcessingTimeWindows(5000), > ReducingStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.PojoSerializer@f3395ffa, > > reduceFunction=org.apache.flink.streaming.examples.socket.SocketWindowWordCount$1@4201c465}, > ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java-301)) > {code} > would first drop the instance references. 
The stable version would be: > > {code:java} > TriggerWindow(TumblingProcessingTimeWindows(5000), > ReducingStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.PojoSerializer, > > reduceFunction=org.apache.flink.streaming.examples.socket.SocketWindowWordCount$1}, > ProcessingTimeTrigger(), WindowedStream.reduce(WindowedStream.java-301)) > {code} > and then the compressed name would be the first ten valid characters plus the > hash of the stable string: > {code} > TriggerWin_d8c007da > {code} > This is just one way of dealing with unruly default names, the main point > would be to preserve the metrics so they are valid, avoid truncation, and can > be aggregated along other dimensions even if this particular dimension is > hard to parse after the compression. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
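The symbolic-compression idea above (first ten valid characters plus a hash of the stable name) can be sketched as follows. The class name, the 50-character threshold, and the use of CRC32 are illustrative assumptions, not an actual reporter implementation, so the hash need not match the `d8c007da` shown in the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative sketch of the proposed dogstatsd-style name handling.
// Names and constants here are hypothetical.
public class MetricNameCompressor {

    // Collapse runs of characters outside [A-Za-z0-9_] into a single underbar.
    public static String sanitize(String segment) {
        return segment.replaceAll("[^A-Za-z0-9_]+", "_");
    }

    // Segments at or under maxLen are kept as sanitized; longer ones are
    // compressed to the first 10 valid characters plus an 8-hex-digit
    // hash of the full stable name, preserving uniqueness while avoiding
    // downstream truncation.
    public static String compress(String segment, int maxLen) {
        String clean = sanitize(segment);
        if (clean.length() <= maxLen) {
            return clean;
        }
        CRC32 crc = new CRC32();
        crc.update(segment.getBytes(StandardCharsets.UTF_8));
        String prefix = clean.substring(0, Math.min(10, clean.length()));
        return prefix + "_" + String.format("%08x", crc.getValue());
    }
}
```

A long operator name such as the TriggerWindow example above would come out as its first ten valid characters plus a stable 8-character suffix, so the time series stays unique even though the dimension is no longer human-parseable.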
[jira] [Updated] (FLINK-9032) Deprecate ConfigConstants#YARN_CONTAINER_START_COMMAND_TEMPLATE
[ https://issues.apache.org/jira/browse/FLINK-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9032: - Fix Version/s: (was: 1.5.0) 1.5.1 > Deprecate ConfigConstants#YARN_CONTAINER_START_COMMAND_TEMPLATE > --- > > Key: FLINK-9032 > URL: https://issues.apache.org/jira/browse/FLINK-9032 > Project: Flink > Issue Type: Improvement > Components: Configuration, Documentation >Affects Versions: 1.5.0 >Reporter: Hai Zhou >Assignee: Hai Zhou >Priority: Major > Fix For: 1.6.0, 1.5.1 > > > This constant ("yarn.container-start-command-template") has disappeared from > the [1.5.0-SNAPSHOT > docs|https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html]. > We should restore it, and I think it should be renamed > "containerized.start-command-template". > [~Zentol], what do you think? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8067) User code ClassLoader not set before calling ProcessingTimeCallback
[ https://issues.apache.org/jira/browse/FLINK-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8067: - Fix Version/s: (was: 1.5.0) 1.5.1 > User code ClassLoader not set before calling ProcessingTimeCallback > --- > > Key: FLINK-8067 > URL: https://issues.apache.org/jira/browse/FLINK-8067 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 1.4.0 >Reporter: Gary Yao >Priority: Minor > Fix For: 1.5.1 > > > The user code ClassLoader is not set as the context ClassLoader for the > thread invoking {{ProcessingTimeCallback#onProcessingTime(long timestamp)}}: > https://github.com/apache/flink/blob/84a07a34ac22af14f2dd0319447ca5f45de6d0bb/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L222 > This is problematic, for example, if user code dynamically loads classes in > {{ProcessFunction#onTimer(long timestamp, OnTimerContext ctx, Collector > out)}} using the current thread's context ClassLoader (also see FLINK-8005). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
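The fix for the issue above is the standard set-and-restore pattern for the thread's context ClassLoader around the callback invocation. A minimal sketch, with a hypothetical `TimeCallback` standing in for Flink's `ProcessingTimeCallback`:

```java
// Sketch: install the user-code ClassLoader before invoking the callback,
// and restore the previous one afterwards (even if the callback throws).
public class ContextClassLoaderUtil {

    /** Hypothetical stand-in for ProcessingTimeCallback. */
    public interface TimeCallback {
        void onProcessingTime(long timestamp);
    }

    public static void invokeWithClassLoader(
            ClassLoader userCodeClassLoader, TimeCallback callback, long timestamp) {
        final Thread thread = Thread.currentThread();
        final ClassLoader previous = thread.getContextClassLoader();
        try {
            thread.setContextClassLoader(userCodeClassLoader);
            callback.onProcessingTime(timestamp);
        } finally {
            // Restore unconditionally so later work on this thread is unaffected.
            thread.setContextClassLoader(previous);
        }
    }
}
```

With this in place, user code that loads classes via `Thread.currentThread().getContextClassLoader()` inside `onTimer` sees the user-code ClassLoader rather than the system one.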
[jira] [Updated] (FLINK-7727) Extend logging in file server handlers
[ https://issues.apache.org/jira/browse/FLINK-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7727: - Fix Version/s: (was: 1.5.0) 1.5.1 > Extend logging in file server handlers > -- > > Key: FLINK-7727 > URL: https://issues.apache.org/jira/browse/FLINK-7727 > Project: Flink > Issue Type: Improvement > Components: REST, Webfrontend >Affects Versions: 1.4.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Fix For: 1.5.1 > > > The file server handlers check several failure conditions but don't log > anything (like the path), making debugging difficult. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7981) Bump commons-lang3 version to 3.6
[ https://issues.apache.org/jira/browse/FLINK-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7981: - Fix Version/s: (was: 1.5.0) 1.5.1 > Bump commons-lang3 version to 3.6 > - > > Key: FLINK-7981 > URL: https://issues.apache.org/jira/browse/FLINK-7981 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.4.0 >Reporter: Hai Zhou >Assignee: Hai Zhou >Priority: Trivial > Fix For: 1.5.1 > > > Update commons-lang3 from 3.3.2 to 3.6. > {{SerializationUtils.clone()}} in commons-lang3 (< 3.5) has a bug that breaks > thread safety: it sometimes gets stuck due to a race condition when > initializing a hash map. > See https://issues.apache.org/jira/browse/LANG-1251. > **Other** > [BEAM-2481: Update commons-lang3 dependency to version > 3.6|https://issues.apache.org/jira/browse/BEAM-2481] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
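For context, `SerializationUtils.clone()` is essentially a deep copy through Java serialization. A minimal sketch of that mechanism in plain Java, without the lazily initialized class-lookup map whose unsynchronized initialization caused the LANG-1251 race:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Bare-bones equivalent of a serialization-based deep clone; it keeps no
// shared mutable state, so it has no thread-safety hazard to begin with.
public final class SerializationClone {

    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T clone(T object) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(object);
            }
            try (ObjectInputStream ois =
                    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (T) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("clone failed", e);
        }
    }
}
```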
[jira] [Updated] (FLINK-8417) Support STSAssumeRoleSessionCredentialsProvider in FlinkKinesisConsumer
[ https://issues.apache.org/jira/browse/FLINK-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8417: - Fix Version/s: (was: 1.5.0) 1.5.1 > Support STSAssumeRoleSessionCredentialsProvider in FlinkKinesisConsumer > --- > > Key: FLINK-8417 > URL: https://issues.apache.org/jira/browse/FLINK-8417 > Project: Flink > Issue Type: New Feature > Components: Kinesis Connector >Reporter: Tzu-Li (Gordon) Tai >Priority: Major > Fix For: 1.5.1 > > > As discussed in ML: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-Connectors-With-Temporary-Credentials-td17734.html. > Users need the functionality to access cross-account AWS Kinesis streams, > using AWS Temporary Credentials [1]. > We should add support for > {{AWSConfigConstants.CredentialsProvider.STSAssumeRole}}, which internally > would use the {{STSAssumeRoleSessionCredentialsProvider}} [2] in > {{AWSUtil#getCredentialsProvider(Properties)}}. > [1] https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp.html > [2] > https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/STSAssumeRoleSessionCredentialsProvider.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8707) Excessive amount of files opened by flink task manager
[ https://issues.apache.org/jira/browse/FLINK-8707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8707: - Fix Version/s: (was: 1.5.0) 1.5.1 > Excessive amount of files opened by flink task manager > -- > > Key: FLINK-8707 > URL: https://issues.apache.org/jira/browse/FLINK-8707 > Project: Flink > Issue Type: Bug > Components: TaskManager >Affects Versions: 1.3.2 > Environment: NAME="Red Hat Enterprise Linux Server" > VERSION="7.3 (Maipo)" > Two boxes, each with a Job Manager & Task Manager, using Zookeeper for HA. > flink.yaml below with some settings (removed exact box names) etc: > env.log.dir: ...some dir...residing on the same box > env.pid.dir: some dir...residing on the same box > metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter > metrics.reporters: jmx > state.backend: filesystem > state.backend.fs.checkpointdir: file:///some_nfs_mount > state.checkpoints.dir: file:///some_nfs_mount > state.checkpoints.num-retained: 3 > high-availability.cluster-id: /tst > high-availability.storageDir: file:///some_nfs_mount/ha > high-availability: zookeeper > high-availability.zookeeper.path.root: /flink > high-availability.zookeeper.quorum: ...list of zookeeper boxes > env.java.opts.jobmanager: ...some extra jar args > jobmanager.archive.fs.dir: some dir...residing on the same box > jobmanager.web.submit.enable: true > jobmanager.web.tmpdir: some dir...residing on the same box > env.java.opts.taskmanager: some extra jar args > taskmanager.tmp.dirs: some dir...residing on the same box/var/tmp > taskmanager.network.memory.min: 1024MB > taskmanager.network.memory.max: 2048MB > blob.storage.directory: some dir...residing on the same box >Reporter: Alexander Gardner >Priority: Critical > Fix For: 1.5.1 > > Attachments: AfterRunning-3-jobs-Box2-TM-JCONSOLE.png, > AfterRunning-3-jobs-TM-FDs-BOX2.jpg, AfterRunning-3-jobs-lsof-p.box2-TM, > AfterRunning-3-jobs-lsof.box2-TM, AterRunning-3-jobs-Box1-TM-JCONSOLE.png, > 
box1-jobmgr-lsof, box1-taskmgr-lsof, box2-jobmgr-lsof, box2-taskmgr-lsof, > ll.txt, ll.txt, lsof.txt, lsof.txt, lsofp.txt, lsofp.txt > > > The job manager has fewer FDs than the task manager. > > Hi > A support alert indicated that there were a lot of open files for the boxes > running Flink. > There were 4 Flink jobs that were dormant but had consumed a number of msgs > from Kafka using the FlinkKafkaConsumer010. > A simple general lsof: > $ lsof | wc -l -> returned 153114 open file descriptors. > Focusing on the TaskManager process (process ID = 12154): > $ lsof | grep 12154 | wc -l -> returned 129322 open FDs > $ lsof -p 12154 | wc -l -> returned 531 FDs > There were 228 threads running for the task manager. > > Drilling down a bit further, looking at a_inode and FIFO entries: > $ lsof -p 12154 | grep a_inode | wc -l = 100 FDs > $ lsof -p 12154 | grep FIFO | wc -l = 200 FDs > $ /proc/12154/maps = 920 entries. > Apart from lsof identifying lots of JARs and SOs being referenced, there were > also 244 child processes for the task manager process. > I noticed a creep of file descriptors in each environment...are the above > figures deemed excessive for the number of FDs in use? I know Flink uses Netty - > is it using a separate Selector for reads & writes? > Additionally, Flink uses memory-mapped files or direct byte buffers; are these > skewing the numbers of FDs shown? > Example of one child process ID 6633: > java 12154 6633 dfdev 387u a_inode 0,9 0 5869 [eventpoll] > java 12154 6633 dfdev 388r FIFO 0,8 0t0 459758080 pipe > java 12154 6633 dfdev 389w FIFO 0,8 0t0 459758080 pipe > Lastly, I cannot yet identify the reason for the creep in FDs even though Flink is > pretty dormant or has dormant jobs. Production nodes are not experiencing > excessive amounts of throughput yet either. > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7816) Support Scala 2.12 closures and Java 8 lambdas in ClosureCleaner
[ https://issues.apache.org/jira/browse/FLINK-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7816: - Fix Version/s: (was: 1.5.0) 1.5.1 > Support Scala 2.12 closures and Java 8 lambdas in ClosureCleaner > > > Key: FLINK-7816 > URL: https://issues.apache.org/jira/browse/FLINK-7816 > Project: Flink > Issue Type: Sub-task > Components: Scala API >Reporter: Aljoscha Krettek >Priority: Major > Fix For: 1.5.1 > > > We have the same problem as Spark: SPARK-14540 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8946) TaskManager stops sending metrics after JobManager failover
[ https://issues.apache.org/jira/browse/FLINK-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8946: - Fix Version/s: (was: 1.5.0) 1.5.1 > TaskManager stops sending metrics after JobManager failover > -- > > Key: FLINK-8946 > URL: https://issues.apache.org/jira/browse/FLINK-8946 > Project: Flink > Issue Type: Bug > Components: Metrics, TaskManager >Affects Versions: 1.4.2 >Reporter: Truong Duc Kien >Assignee: vinoyang >Priority: Critical > Fix For: 1.5.1 > > > Running in Yarn-standalone mode, when the JobManager performs a failover, > all TaskManagers inherited from the previous JobManager stop sending > metrics to the new JobManager and other registered metric reporters. > > A cursory glance reveals that these lines of code might be the cause: > [https://github.com/apache/flink/blob/a3478fdfa0f792104123fefbd9bdf01f5029de51/flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala#L1082-L1086] > Perhaps the TaskManager closes its metrics group when disassociating from the > JobManager, but does not create a new one on fail-over re-association? > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-6856) Eagerly check if serializer config snapshots are deserializable when snapshotting
[ https://issues.apache.org/jira/browse/FLINK-6856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6856: - Fix Version/s: (was: 1.5.0) 1.5.1 > Eagerly check if serializer config snapshots are deserializable when > snapshotting > - > > Key: FLINK-6856 > URL: https://issues.apache.org/jira/browse/FLINK-6856 > Project: Flink > Issue Type: Improvement > Components: State Backends, Checkpointing >Affects Versions: 1.3.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai >Priority: Critical > Fix For: 1.5.1 > > > Currently, if serializer config snapshots are not deserializable (for > example, if the user did not correctly include the deserialization empty > constructor, or the read / write methods are simply wrongly implemented), > user's would only be able to find out this when restoring from the snapshot. > We could eagerly do a check for this when snapshotting, and fail with a good > message indicating that the config snapshot can not be deserialized. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
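The proposed eager check amounts to a write-then-read-back round trip at snapshot time, so a broken read/write implementation fails with a clear message when the snapshot is taken instead of at restore time. A minimal sketch, where `Writable` is a hypothetical stand-in for the config snapshot's read/write contract:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of an eager deserializability check: serialize, then immediately
// attempt to deserialize into a fresh instance, failing fast on errors.
public class EagerSnapshotCheck {

    /** Hypothetical stand-in for a serializer config snapshot's contract. */
    public interface Writable {
        void write(DataOutputStream out) throws IOException;
        void read(DataInputStream in) throws IOException;
    }

    public static void checkRoundTrip(Writable snapshot, Writable freshInstance) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            snapshot.write(new DataOutputStream(bos));
            freshInstance.read(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        } catch (Exception e) {
            throw new IllegalStateException(
                "Serializer config snapshot could not be deserialized; "
                    + "check its read/write methods and empty constructor.", e);
        }
    }
}
```

In the real system the fresh instance would be created reflectively via the snapshot's empty constructor, which is exactly the part users most often forget; a missing constructor would then surface here as well.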
[jira] [Updated] (FLINK-7982) Bump commons-configuration to 2.2
[ https://issues.apache.org/jira/browse/FLINK-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7982: - Fix Version/s: (was: 1.5.0) 1.5.1 > Bump commons-configuration to 2.2 > - > > Key: FLINK-7982 > URL: https://issues.apache.org/jira/browse/FLINK-7982 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.4.0 >Reporter: Hai Zhou >Assignee: Hai Zhou >Priority: Major > Fix For: 1.5.1 > > > We currently depend on > {{org.apache.commons:commons-configuration}} (version 1.7, Sep 2011); > update to > {{org.apache.commons:commons-configuration2:2.2}}. > Reference Hadoop: > [Hadoop Common: HADOOP-14648 - Bump commons-configuration2 to > 2.1.1|https://issues.apache.org/jira/browse/HADOOP-14648] > [Hadoop Common: HADOOP-13660 - Upgrade commons-configuration version to > 2.1|https://issues.apache.org/jira/browse/HADOOP-13660] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-5505) Harmonize ZooKeeper configuration parameters
[ https://issues.apache.org/jira/browse/FLINK-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-5505: - Fix Version/s: (was: 1.5.0) 1.5.1 > Harmonize ZooKeeper configuration parameters > > > Key: FLINK-5505 > URL: https://issues.apache.org/jira/browse/FLINK-5505 > Project: Flink > Issue Type: Improvement > Components: Mesos >Affects Versions: 1.2.0, 1.3.0 >Reporter: Till Rohrmann >Priority: Trivial > Fix For: 1.5.1 > > > Since Flink users don't necessarily know all of the Mesos terminology and a > JobManager runs also as a task, I would like to rename some of Flink's Mesos > configuration parameters. I would propose the following changes: > {{mesos.initial-tasks}} => {{mesos.initial-taskmanagers}} > {{mesos.maximum-failed-tasks}} => {{mesos.maximum-failed-taskmanagers}} > {{mesos.resourcemanager.artifactserver.\*}} => {{mesos.artifactserver.*}} > {{mesos.resourcemanager.framework.\*}} => {{mesos.framework.*}} > {{mesos.resourcemanager.tasks.\*}} => {{mesos.taskmanager.*}} > {{recovery.zookeeper.path.mesos-workers}} => > {{mesos.high-availability.zookeeper.path.mesos-workers}} > What do you think [~eronwright]? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7582) Rest client may buffer responses indefinitely under heavy load
[ https://issues.apache.org/jira/browse/FLINK-7582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7582: - Fix Version/s: (was: 1.5.0) 1.5.1 > Rest client may buffer responses indefinitely under heavy load > -- > > Key: FLINK-7582 > URL: https://issues.apache.org/jira/browse/FLINK-7582 > Project: Flink > Issue Type: Bug > Components: REST >Affects Versions: 1.4.0 >Reporter: Chesnay Schepler >Priority: Major > Fix For: 1.5.1 > > > The RestClient uses an executor for sending requests and parsing responses. > Under heavy load, i.e. lots of requests being sent, the executor may be used > exclusively for sending requests. The responses that are received by the > netty threads are thus never parsed and are buffered in memory, until either > requests stop coming in or all memory is used up. > We should let the netty receiver thread do the parsing as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-6437) Move history server configuration to a separate file
[ https://issues.apache.org/jira/browse/FLINK-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6437: - Fix Version/s: (was: 1.5.0) 1.5.1 > Move history server configuration to a separate file > > > Key: FLINK-6437 > URL: https://issues.apache.org/jira/browse/FLINK-6437 > Project: Flink > Issue Type: Improvement > Components: History Server >Affects Versions: 1.3.0 >Reporter: Stephan Ewen >Priority: Major > Fix For: 1.5.1 > > > I suggest to keep the {{flink-conf.yaml}} leaner by moving configuration of > the History Server to a different file. > In general, I would propose to move configurations of separate, independent > and optional components to individual config files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7881) flink can't be deployed on yarn with ha
[ https://issues.apache.org/jira/browse/FLINK-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7881: - Fix Version/s: (was: 1.5.0) 1.5.1 > flink can't be deployed on yarn with ha > > > Key: FLINK-7881 > URL: https://issues.apache.org/jira/browse/FLINK-7881 > Project: Flink > Issue Type: Bug > Components: YARN >Affects Versions: 1.3.2 >Reporter: deng >Priority: Critical > Fix For: 1.5.1 > Attachments: screenshot-1.png, screenshot-2.png > > > I start yarn-session.sh on YARN, but it can't read the HDFS logical URL. It > always connects to hdfs://master:8020, but it should be 9000; my HDFS defaultFS is > hdfs://master. > I have configured YARN_CONF_DIR and HADOOP_CONF_DIR, but it didn't work. > Is it a bug? I use flink-1.3.0-bin-hadoop27-scala_2.10 > 2017-10-20 11:00:05,395 DEBUG org.apache.hadoop.ipc.Client > - IPC Client (1035144464) connection to > startdt/173.16.5.215:8020 from admin: closed > 2017-10-20 11:00:05,398 ERROR > org.apache.flink.yarn.YarnApplicationMasterRunner - YARN > Application Master initialization failed > java.net.ConnectException: Call From spark3/173.16.5.216 to master:8020 > failed on connection exception: java.net.ConnectException: Connection > refused; For more details see: > http://wiki.apache.org/hadoop/ConnectionRefused > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732) > at org.apache.hadoop.ipc.Client.call(Client.java:1479) > at org.apache.hadoop.ipc.Client.call(Client.java:1412) > at > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108) > at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305) > at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-4052) Unstable test ConnectionUtilsTest
[ https://issues.apache.org/jira/browse/FLINK-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-4052: - Fix Version/s: (was: 1.5.0) 1.5.1 > Unstable test ConnectionUtilsTest > - > > Key: FLINK-4052 > URL: https://issues.apache.org/jira/browse/FLINK-4052 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.0.2, 1.3.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Critical > Labels: test-stability > Fix For: 1.1.0, 1.5.1 > > > The error is the following: > {code} > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics:59 > expected: but > was: > {code} > The probable cause for the failure is that the test tries to select an unused > closed port (from the ephemeral range), and then assumes that all connections > to that port fail. > If a concurrent test actually uses that port, connections to the port will > succeed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
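The race described above can be illustrated in plain Java: probing for a free ephemeral port and then closing it gives no guarantee the port stays unused, whereas keeping the socket bound for the duration of the test does. The class and method names below are illustrative, not the test's actual code:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Why the test is flaky: a port that was free at probe time may be bound by
// a concurrent test a moment later, so "connections to this port must fail"
// is not a safe assumption.
public class PortPicking {

    // Racy pattern: the port is only known free at the instant of the probe.
    public static int probeFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    // Safer pattern: keep the socket bound; no other process can take the
    // port while the caller holds it open.
    public static ServerSocket holdFreePort() throws IOException {
        return new ServerSocket(0);
    }
}
```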
[jira] [Updated] (FLINK-7551) Add VERSION to the REST urls.
[ https://issues.apache.org/jira/browse/FLINK-7551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7551: - Fix Version/s: (was: 1.5.0) 1.5.1 > Add VERSION to the REST urls. > -- > > Key: FLINK-7551 > URL: https://issues.apache.org/jira/browse/FLINK-7551 > Project: Flink > Issue Type: Improvement > Components: REST >Affects Versions: 1.4.0 >Reporter: Kostas Kloudas >Priority: Major > Fix For: 1.5.1 > > > This is to guarantee that we can update the REST API without breaking > existing third-party clients. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8718) Non-parallel DataStreamSource does not set max parallelism
[ https://issues.apache.org/jira/browse/FLINK-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8718: - Fix Version/s: (was: 1.5.0) 1.5.1 > Non-parallel DataStreamSource does not set max parallelism > -- > > Key: FLINK-8718 > URL: https://issues.apache.org/jira/browse/FLINK-8718 > Project: Flink > Issue Type: Bug > Components: DataStream API, Streaming >Affects Versions: 1.5.0 >Reporter: Gary Yao >Assignee: Gary Yao >Priority: Major > Fix For: 1.5.1 > > > {{org.apache.flink.streaming.api.datastream.DataStreamSource}} does not set > {{maxParallelism}} to 1 if it is non-parallel. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8803) Mini Cluster Shutdown with HA unstable, causing test failures
[ https://issues.apache.org/jira/browse/FLINK-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8803: - Fix Version/s: (was: 1.5.0) 1.5.1 > Mini Cluster Shutdown with HA unstable, causing test failures > - > > Key: FLINK-8803 > URL: https://issues.apache.org/jira/browse/FLINK-8803 > Project: Flink > Issue Type: Bug > Components: Tests >Reporter: Stephan Ewen >Priority: Critical > Fix For: 1.5.1 > > > When the {{FlinkMiniCluster}} is created for HA tests with ZooKeeper, the > shutdown is unstable. > It looks like ZooKeeper may be shut down before the JobManager is shut down, > causing the shutdown procedure of the JobManager (specifically > {{ZooKeeperSubmittedJobGraphStore.removeJobGraph}}) to block until tests time > out. > Full log: https://api.travis-ci.org/v3/job/346853707/log.txt > Note that no ZK threads are alive any more, seems ZK is shut down already. > Relevant Stack Traces: > {code} > "main" #1 prio=5 os_prio=0 tid=0x7f973800a800 nid=0x43b4 waiting on > condition [0x7f973eb0b000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x8966cf18> (a > scala.concurrent.impl.Promise$CompletionLatch) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:212) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:222) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:157) > at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169) > at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169) > at > 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.ready(package.scala:169) > at > org.apache.flink.runtime.minicluster.FlinkMiniCluster.startInternalShutdown(FlinkMiniCluster.scala:469) > at > org.apache.flink.runtime.minicluster.FlinkMiniCluster.stop(FlinkMiniCluster.scala:435) > at > org.apache.flink.runtime.minicluster.FlinkMiniCluster.closeAsync(FlinkMiniCluster.scala:719) > at > org.apache.flink.test.util.MiniClusterResource.after(MiniClusterResource.java:104) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:50) > ... > {code} > {code} > "flink-akka.actor.default-dispatcher-2" #1012 prio=5 os_prio=0 > tid=0x7f97394fa800 nid=0x3328 waiting on condition [0x7f971db29000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x87f82a70> (a > java.util.concurrent.CountDownLatch$Sync) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) > at > org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:336) > at > org.apache.flink.shaded.curator.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) > at > org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:241) > at > org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:225) > at > org.apache.flink.shaded.curator.org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:35) > at > 
org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.release(ZooKeeperStateHandleStore.java:478) > at > org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.releaseAndTryRemove(ZooKeeperStateHandleStore.java:435) > at > org.apache.flink.runtime.zookeeper.ZooKeeperStateHandleStore.releaseAndTryRemove(ZooKeeperStateHandleStore.java:405) > at > org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.removeJobGraph(ZooKeeperSubmittedJobGraphStore.java:266) > - locked <0x807f4258> (a java.lang.Object) > at >
[jira] [Updated] (FLINK-9142) Lower the minimum number of buffers for incoming channels to 1
[ https://issues.apache.org/jira/browse/FLINK-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9142: - Fix Version/s: (was: 1.5.0) 1.5.1 > Lower the minimum number of buffers for incoming channels to 1 > -- > > Key: FLINK-9142 > URL: https://issues.apache.org/jira/browse/FLINK-9142 > Project: Flink > Issue Type: Sub-task > Components: Network >Affects Versions: 1.5.0, 1.6.0 >Reporter: Nico Kruber >Priority: Major > Fix For: 1.5.1 > > > Even if we make the floating buffers optional, we still require > {{taskmanager.network.memory.buffers-per-channel}} number of (exclusive) > buffers per incoming channel with credit-based flow control while without, > the minimum was 1 and only the maximum number of buffers was influenced by > this parameter. > {{taskmanager.network.memory.buffers-per-channel}} is set to {{2}} by default > with the argumentation that this way we will have one buffer available for > netty to process while a worker thread is processing/deserializing the other > buffer. While this seems reasonable, it does increase our minimum > requirements. Instead, we could probably live with {{1}} exclusive buffer and > up to {{gate.getNumberOfInputChannels() * (networkBuffersPerChannel - 1) + > extraNetworkBuffersPerGate}} floating buffers. That way we will have the same > memory footprint as before with only slightly changed behaviour. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
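The sizing change FLINK-9142 proposes can be illustrated with plain arithmetic. The channel count and per-gate extra below are illustrative, not Flink defaults; only `buffersPerChannel = 2` comes from the ticket:

```java
public class BufferBudget {
    public static void main(String[] args) {
        int channels = 100;             // illustrative gate size
        int buffersPerChannel = 2;      // taskmanager.network.memory.buffers-per-channel default
        int extraBuffersPerGate = 8;    // illustrative floating buffers per gate

        // Current credit-based minimum: every incoming channel needs all its
        // exclusive buffers before the task can run.
        int currentMin = channels * buffersPerChannel;

        // Proposed: one exclusive buffer per channel, the remainder floating.
        int proposedMin = channels * 1;
        int proposedFloatingMax =
                channels * (buffersPerChannel - 1) + extraBuffersPerGate;

        System.out.println("currentMin=" + currentMin);
        System.out.println("proposedMin=" + proposedMin
                + " floatingMax=" + proposedFloatingMax);
    }
}
```

With these numbers the hard requirement drops from 200 buffers to 100, while the total available budget stays comparable because the difference moves into the floating pool.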
[jira] [Updated] (FLINK-8101) Elasticsearch 6.x support
[ https://issues.apache.org/jira/browse/FLINK-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8101: - Fix Version/s: (was: 1.5.0) 1.5.1 > Elasticsearch 6.x support > - > > Key: FLINK-8101 > URL: https://issues.apache.org/jira/browse/FLINK-8101 > Project: Flink > Issue Type: New Feature > Components: ElasticSearch Connector >Affects Versions: 1.4.0 >Reporter: Hai Zhou >Priority: Major > Fix For: 1.5.1 > > > Recently, elasticsearch 6.0.0 was released: > https://www.elastic.co/blog/elasticsearch-6-0-0-released > The minimum version of ES6 compatible Elasticsearch Java Client is 5.6.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8416) Kinesis consumer doc examples should demonstrate preferred default credentials provider
[ https://issues.apache.org/jira/browse/FLINK-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8416: - Fix Version/s: (was: 1.5.0) 1.5.1 > Kinesis consumer doc examples should demonstrate preferred default > credentials provider > --- > > Key: FLINK-8416 > URL: https://issues.apache.org/jira/browse/FLINK-8416 > Project: Flink > Issue Type: Improvement > Components: Documentation, Kinesis Connector >Reporter: Tzu-Li (Gordon) Tai >Priority: Major > Fix For: 1.4.3, 1.3.4, 1.5.1 > > > The Kinesis consumer docs > [here](https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/connectors/kinesis.html#kinesis-consumer) > demonstrate providing credentials by explicitly supplying the AWS Access ID > and Key. > The always preferred approach for AWS, unless running locally, is to > automatically fetch the shipped credentials from the AWS environment. > That is actually the default behaviour of the Kinesis consumer, so the docs > should demonstrate that more clearly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
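A sketch of what the docs could demonstrate instead for FLINK-8416: configuring the consumer to pick up credentials from the AWS environment rather than hard-coding keys. Only plain `java.util.Properties` is used here; the `aws.*` property keys follow the Kinesis connector's convention but should be checked against the connector documentation before use:

```java
import java.util.Properties;

public class KinesisCredentialsSketch {
    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("aws.region", "eu-west-1");
        // Preferred default: let the consumer fetch credentials from the
        // environment (instance profile, env vars, profile file) automatically.
        config.setProperty("aws.credentials.provider", "AUTO");
        // Explicit access keys (what the docs currently lead with) should be
        // the exception, e.g. for purely local testing.
        System.out.println(config.getProperty("aws.credentials.provider"));
    }
}
```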
[jira] [Updated] (FLINK-9010) NoResourceAvailableException with FLIP-6
[ https://issues.apache.org/jira/browse/FLINK-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9010: - Fix Version/s: (was: 1.5.0) 1.5.1 > NoResourceAvailableException with FLIP-6 > - > > Key: FLINK-9010 > URL: https://issues.apache.org/jira/browse/FLINK-9010 > Project: Flink > Issue Type: Bug > Components: ResourceManager >Affects Versions: 1.5.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Critical > Labels: flip-6 > Fix For: 1.5.1 > > > I was trying to run a bigger program with 400 slots (100 TMs, 2 slots each) > with FLIP-6 mode and a checkpointing interval of 1000 and got the following > exception: > {code} > 2018-03-16 03:41:20,154 INFO org.apache.flink.yarn.YarnResourceManager > - Received new container: > container_1521038088305_0257_01_000101 - Remaining pending container > requests: 302 > 2018-03-16 03:41:20,154 INFO org.apache.flink.yarn.YarnResourceManager > - TaskExecutor container_1521038088305_0257_01_000101 will be > started with container size 8192 MB, JVM heap size 5120 MB, JVM direct memory > limit 3072 MB > 2018-03-16 03:41:20,154 INFO org.apache.flink.yarn.YarnResourceManager > - TM:remote keytab path obtained null > 2018-03-16 03:41:20,154 INFO org.apache.flink.yarn.YarnResourceManager > - TM:remote keytab principal obtained null > 2018-03-16 03:41:20,154 INFO org.apache.flink.yarn.YarnResourceManager > - TM:remote yarn conf path obtained null > 2018-03-16 03:41:20,154 INFO org.apache.flink.yarn.YarnResourceManager > - TM:remote krb5 path obtained null > 2018-03-16 03:41:20,155 INFO org.apache.flink.yarn.Utils > - Copying from > file:/mnt/yarn/usercache/hadoop/appcache/application_1521038088305_0257/container_1521038088305_0257_01_01/3cd0c7d7-ccc1-4680-83a5-54e64dd637bc-taskmanager-conf.yaml > to > hdfs://ip-172-31-1-91.eu-west-1.compute.internal:8020/user/hadoop/.flink/application_1521038088305_0257/3cd0c7d7-ccc1-4680-83a5-54e64dd637bc-taskmanager-conf.yaml > 2018-03-16 
03:41:20,165 INFO org.apache.flink.yarn.YarnResourceManager > - Prepared local resource for modified yaml: resource { scheme: > "hdfs" host: "ip-172-31-1-91.eu-west-1.compute.internal" port: 8020 file: > "/user/hadoop/.flink/application_1521038088305_0257/3cd0c7d7-ccc1-4680-83a5-54e64dd637bc-taskmanager-conf.yaml" > } size: 595 timestamp: 1521171680164 type: FILE visibility: APPLICATION > 2018-03-16 03:41:20,168 INFO org.apache.flink.yarn.YarnResourceManager > - Creating container launch context for TaskManagers > 2018-03-16 03:41:20,168 INFO org.apache.flink.yarn.YarnResourceManager > - Starting TaskManagers with command: $JAVA_HOME/bin/java > -Xms5120m -Xmx5120m -XX:MaxDirectMemorySize=3072m > -Dlog.file=/taskmanager.log > -Dlogback.configurationFile=file:./logback.xml > -Dlog4j.configuration=file:./log4j.properties > org.apache.flink.yarn.YarnTaskExecutorRunner --configDir . 1> > /taskmanager.out 2> /taskmanager.err > 2018-03-16 03:41:20,176 INFO > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - > Opening proxy : ip-172-31-3-221.eu-west-1.compute.internal:8041 > 2018-03-16 03:41:20,180 INFO org.apache.flink.yarn.YarnResourceManager > - Received new container: > container_1521038088305_0257_01_000102 - Remaining pending container > requests: 301 > 2018-03-16 03:41:20,180 INFO org.apache.flink.yarn.YarnResourceManager > - TaskExecutor container_1521038088305_0257_01_000102 will be > started with container size 8192 MB, JVM heap size 5120 MB, JVM direct memory > limit 3072 MB > 2018-03-16 03:41:20,180 INFO org.apache.flink.yarn.YarnResourceManager > - TM:remote keytab path obtained null > 2018-03-16 03:41:20,180 INFO org.apache.flink.yarn.YarnResourceManager > - TM:remote keytab principal obtained null > 2018-03-16 03:41:20,180 INFO org.apache.flink.yarn.YarnResourceManager > - TM:remote yarn conf path obtained null > 2018-03-16 03:41:20,180 INFO org.apache.flink.yarn.YarnResourceManager > - TM:remote krb5 path obtained null > 2018-03-16 
03:41:20,181 INFO org.apache.flink.yarn.Utils > - Copying from > file:/mnt/yarn/usercache/hadoop/appcache/application_1521038088305_0257/container_1521038088305_0257_01_01/6766be70-82f7-4999-a371-11c27527fb6e-taskmanager-conf.yaml > to >
[jira] [Updated] (FLINK-7351) test instability in JobClientActorRecoveryITCase#testJobClientRecovery
[ https://issues.apache.org/jira/browse/FLINK-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7351: - Fix Version/s: (was: 1.5.0) 1.5.1 > test instability in JobClientActorRecoveryITCase#testJobClientRecovery > -- > > Key: FLINK-7351 > URL: https://issues.apache.org/jira/browse/FLINK-7351 > Project: Flink > Issue Type: Bug > Components: Job-Submission, Tests >Affects Versions: 1.4.0, 1.3.2 >Reporter: Nico Kruber >Priority: Critical > Labels: test-stability > Fix For: 1.5.1 > > > On a 16-core VM, the following test failed during {{mvn clean verify}} > {code} > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 22.814 sec > <<< FAILURE! - in org.apache.flink.runtime.client.JobClientActorRecoveryITCase > testJobClientRecovery(org.apache.flink.runtime.client.JobClientActorRecoveryITCase) > Time elapsed: 21.299 sec <<< ERROR! > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:933) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: > org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: > Not enough free slots available to run the job. You can decrease the operator > parallelism or increase the number of slots per TaskManager in the > configuration. Resources available to scheduler: Number of instances=0, total > number of slots=0, available slots=0 > at > org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334) > at > org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139) > at > org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368) > at > org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309) > at > org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596) > at > org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450) > at > org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834) > at > org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) > at > 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8357) enable rolling in default log settings
[ https://issues.apache.org/jira/browse/FLINK-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8357: - Fix Version/s: (was: 1.5.0) 1.5.1 > enable rolling in default log settings > -- > > Key: FLINK-8357 > URL: https://issues.apache.org/jira/browse/FLINK-8357 > Project: Flink > Issue Type: Improvement > Components: Logging >Reporter: Xu Mingmin >Assignee: mingleizhang >Priority: Major > Fix For: 1.5.1 > > > The release packages use {{org.apache.log4j.FileAppender}} for log4j and > {{ch.qos.logback.core.FileAppender}} for logback, which can result in very > large log files. > For most cases, if not all, we need to enable rotation in a production > cluster, and I suppose it's a good idea to make rotation the default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9253) Make buffer count per InputGate always #channels*buffersPerChannel + ExclusiveBuffers
[ https://issues.apache.org/jira/browse/FLINK-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9253: - Fix Version/s: (was: 1.5.0) 1.5.1 > Make buffer count per InputGate always #channels*buffersPerChannel + > ExclusiveBuffers > - > > Key: FLINK-9253 > URL: https://issues.apache.org/jira/browse/FLINK-9253 > Project: Flink > Issue Type: Sub-task > Components: Network >Affects Versions: 1.5.0, 1.6.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Fix For: 1.5.1 > > > The credit-based flow control path assigns exclusive buffers only to remote > channels (which makes sense since local channels don't use any own buffers). > However, this is a bit intransparent with respect to how much data may be in > buffers since this depends on the actual schedule of the job and not the job > graph. > By adapting the floating buffers to use a maximum of > {{#channels*buffersPerChannel + floatingBuffersPerGate - #exclusiveBuffers}}, > we would be channel-type agnostic and keep the old behaviour. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
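The invariant FLINK-9253 is after — the same per-gate buffer budget no matter how many channels happen to be local versus remote — can be sketched with plain arithmetic (all constants are illustrative):

```java
public class GateBufferCount {
    public static void main(String[] args) {
        int channels = 10, buffersPerChannel = 2, floatingPerGate = 8;
        // Target: a fixed budget per gate, independent of channel types.
        int target = channels * buffersPerChannel + floatingPerGate;
        for (int remote : new int[]{0, 4, 10}) {
            // Only remote channels receive exclusive buffers.
            int exclusive = remote * buffersPerChannel;
            // Proposed: top the gate up with floating buffers to the target.
            int floating = target - exclusive;
            System.out.println("remote=" + remote + " exclusive=" + exclusive
                    + " floating=" + floating + " total=" + (exclusive + floating));
        }
    }
}
```

Whatever the schedule assigns, the total stays at 28 buffers here, which is the "channel-type agnostic" behaviour the ticket asks for.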
[jira] [Updated] (FLINK-8408) YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3n unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8408: - Fix Version/s: (was: 1.5.0) 1.5.1 > YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3n unstable on Travis > -- > > Key: FLINK-8408 > URL: https://issues.apache.org/jira/browse/FLINK-8408 > Project: Flink > Issue Type: Bug > Components: FileSystem, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Assignee: Nico Kruber >Priority: Critical > Labels: test-stability > Fix For: 1.4.3, 1.5.1 > > > The {{YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3n}} is unstable > on Travis. > https://travis-ci.org/tillrohrmann/flink/jobs/327216460 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9098) ClassLoaderITCase unstable
[ https://issues.apache.org/jira/browse/FLINK-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9098: - Fix Version/s: (was: 1.5.0) 1.5.1 > ClassLoaderITCase unstable > -- > > Key: FLINK-9098 > URL: https://issues.apache.org/jira/browse/FLINK-9098 > Project: Flink > Issue Type: Bug > Components: Tests >Reporter: Stephan Ewen >Priority: Critical > Fix For: 1.5.1 > > > The some savepoint disposal seems to fail, after that all successive tests > fail because there are not anymore enough slots. > Full log: https://api.travis-ci.org/v3/job/356900367/log.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8820) FlinkKafkaConsumer010 reads too many bytes
[ https://issues.apache.org/jira/browse/FLINK-8820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8820: - Fix Version/s: (was: 1.5.0) 1.5.1 > FlinkKafkaConsumer010 reads too many bytes > -- > > Key: FLINK-8820 > URL: https://issues.apache.org/jira/browse/FLINK-8820 > Project: Flink > Issue Type: Bug > Components: Kafka Connector >Affects Versions: 1.4.0 >Reporter: Fabian Hueske >Priority: Critical > Fix For: 1.5.1 > > > A user reported that the FlinkKafkaConsumer010 very rarely consumes too many > bytes, i.e., the returned message is too large. The application is running > for about a year and the problem started to occur after upgrading to Flink > 1.4.0. > The user made a good effort in debugging the problem but was not able to > reproduce it in a controlled environment. It seems that the data is correctly > stored in Kafka. > Here's the thread on the user mailing list for a detailed > description of the problem and analysis so far: > https://lists.apache.org/thread.html/1d62f616d275e9e23a5215ddf7f5466051be7ea96897d827232fcb4e@%3Cuser.flink.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8418) Kafka08ITCase.testStartFromLatestOffsets() times out on Travis
[ https://issues.apache.org/jira/browse/FLINK-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8418: - Fix Version/s: (was: 1.5.0) 1.5.1 > Kafka08ITCase.testStartFromLatestOffsets() times out on Travis > -- > > Key: FLINK-8418 > URL: https://issues.apache.org/jira/browse/FLINK-8418 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Tests >Affects Versions: 1.4.0, 1.5.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai >Priority: Critical > Labels: test-stability > Fix For: 1.4.3, 1.5.1 > > > Instance: https://travis-ci.org/kl0u/flink/builds/327733085 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-5789) Make Bucketing Sink independent of Hadoop's FileSystem
[ https://issues.apache.org/jira/browse/FLINK-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-5789: - Fix Version/s: (was: 1.5.0) 1.5.1 > Make Bucketing Sink independent of Hadoop's FileSystem > -- > > Key: FLINK-5789 > URL: https://issues.apache.org/jira/browse/FLINK-5789 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Affects Versions: 1.2.0, 1.1.4 >Reporter: Stephan Ewen >Priority: Major > Fix For: 1.5.1 > > > The {{BucketingSink}} is hard wired to Hadoop's FileSystem, bypassing Flink's > file system abstraction. > This causes several issues: > - The bucketing sink will behave differently than other file sinks with > respect to configuration > - Directly supported file systems (not through hadoop) like the MapR File > System do not work in the same way with the BucketingSink as other file > systems > - The previous point is all the more problematic in the effort to make > Hadoop an optional dependency and within other stacks (Mesos, Kubernetes, > AWS, GCE, Azure) with ideally no Hadoop dependency. > We should port the {{BucketingSink}} to use Flink's FileSystem classes. > To support the *truncate* functionality that is needed for the exactly-once > semantics of the Bucketing Sink, we should extend Flink's FileSystem > abstraction to have the methods > - {{boolean supportsTruncate()}} > - {{void truncate(Path, long)}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
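A sketch of how the truncate extension proposed in FLINK-5789 could be used on recovery. Only the two method names come from the ticket; the interface, the fallback branch, and all other names are illustrative, not actual Flink code:

```java
import java.io.IOException;

// Hypothetical interface mirroring the methods the ticket proposes.
interface TruncatableFileSystem {
    boolean supportsTruncate();
    void truncate(String path, long length) throws IOException;
}

public class TruncateSketch {
    // Exactly-once recovery: cut the in-progress part file back to the last
    // checkpointed length, or fall back when the file system cannot truncate.
    static void restoreToValidLength(TruncatableFileSystem fs, String path, long validLength)
            throws IOException {
        if (fs.supportsTruncate()) {
            fs.truncate(path, validLength);
            System.out.println("truncated " + path + " to " + validLength);
        } else {
            System.out.println("fallback: record valid length " + validLength + " for " + path);
        }
    }

    public static void main(String[] args) throws IOException {
        TruncatableFileSystem noTruncate = new TruncatableFileSystem() {
            public boolean supportsTruncate() { return false; }
            public void truncate(String path, long length) { }
        };
        restoreToValidLength(noTruncate, "/tmp/part-0", 42L);
    }
}
```

The capability check is the point of the design: file systems that cannot truncate still need a well-defined recovery path instead of a hard failure.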
[jira] [Updated] (FLINK-8521) FlinkKafkaProducer011ITCase.testRunOutOfProducersInThePool timed out on Travis
[ https://issues.apache.org/jira/browse/FLINK-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8521: - Fix Version/s: (was: 1.5.0) 1.5.1 > FlinkKafkaProducer011ITCase.testRunOutOfProducersInThePool timed out on Travis > -- > > Key: FLINK-8521 > URL: https://issues.apache.org/jira/browse/FLINK-8521 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Assignee: Piotr Nowojski >Priority: Critical > Labels: test-stability > Fix For: 1.4.3, 1.5.1 > > > The {{FlinkKafkaProducer011ITCase.testRunOutOfProducersInThePool}} timed out > on Travis with producing no output for longer than 300s. > > https://travis-ci.org/tillrohrmann/flink/jobs/334642014 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-6571) InfiniteSource in SourceStreamOperatorTest should deal with InterruptedExceptions
[ https://issues.apache.org/jira/browse/FLINK-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-6571: - Fix Version/s: (was: 1.5.0) 1.5.1 > InfiniteSource in SourceStreamOperatorTest should deal with > InterruptedExceptions > - > > Key: FLINK-6571 > URL: https://issues.apache.org/jira/browse/FLINK-6571 > Project: Flink > Issue Type: Improvement > Components: Tests >Affects Versions: 1.3.0, 1.4.0, 1.5.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Critical > Labels: test-stability > Fix For: 1.5.1 > > > So this is a new one: i got a failing test > ({{testNoMaxWatermarkOnAsyncStop}}) due to an uncatched InterruptedException. > {code} > [00:28:15] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 0.828 sec <<< FAILURE! - in > org.apache.flink.streaming.runtime.operators.StreamSourceOperatorTest > [00:28:15] > testNoMaxWatermarkOnAsyncStop(org.apache.flink.streaming.runtime.operators.StreamSourceOperatorTest) > Time elapsed: 0 sec <<< ERROR! > [00:28:15] java.lang.InterruptedException: sleep interrupted > [00:28:15]at java.lang.Thread.sleep(Native Method) > [00:28:15]at > org.apache.flink.streaming.runtime.operators.StreamSourceOperatorTest$InfiniteSource.run(StreamSourceOperatorTest.java:343) > [00:28:15]at > org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87) > [00:28:15]at > org.apache.flink.streaming.runtime.operators.StreamSourceOperatorTest.testNoMaxWatermarkOnAsyncStop(StreamSourceOperatorTest.java:176) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7964) Add Apache Kafka 1.0 connector
[ https://issues.apache.org/jira/browse/FLINK-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7964: - Fix Version/s: (was: 1.5.0) 1.5.1 > Add Apache Kafka 1.0 connector > -- > > Key: FLINK-7964 > URL: https://issues.apache.org/jira/browse/FLINK-7964 > Project: Flink > Issue Type: Improvement > Components: Kafka Connector >Affects Versions: 1.4.0 >Reporter: Hai Zhou >Assignee: Hai Zhou >Priority: Major > Fix For: 1.5.1 > > > Kafka 1.0.0 is no mere bump of the version number. The Apache Kafka Project > Management Committee has packed a number of valuable enhancements into the > release. Here is a summary of a few of them: > * Since its introduction in version 0.10, the Streams API has become hugely > popular among Kafka users, including the likes of Pinterest, Rabobank, > Zalando, and The New York Times. In 1.0, the API continues to evolve at a > healthy pace. To begin with, the builder API has been improved (KIP-120). A > new API has been added to expose the state of active tasks at runtime > (KIP-130). The new cogroup API makes it much easier to deal with partitioned > aggregates with fewer StateStores and fewer moving parts in your code > (KIP-150). Debuggability gets easier with enhancements to the print() and > writeAsText() methods (KIP-160). And if that’s not enough, check out KIP-138 > and KIP-161 too. For more on streams, check out the Apache Kafka Streams > documentation, including some helpful new tutorial videos. > * Operating Kafka at scale requires that the system remain observable, and to > make that easier, we’ve made a number of improvements to metrics. These are > too many to summarize without becoming tedious, but Connect metrics have been > significantly improved (KIP-196), a litany of new health check metrics are > now exposed (KIP-188), and we now have a global topic and partition count > (KIP-168). Check out KIP-164 and KIP-187 for even more. 
> * We now support Java 9, leading, among other things, to significantly faster > TLS and CRC32C implementations. Over-the-wire encryption will be faster now, > which will keep Kafka fast and compute costs low when encryption is enabled. > * In keeping with the security theme, KIP-152 cleans up the error handling on > Simple Authentication Security Layer (SASL) authentication attempts. > Previously, some authentication error conditions were indistinguishable from > broker failures and were not logged in a clear way. This is cleaner now. > * Kafka can now tolerate disk failures better. Historically, JBOD storage > configurations have not been recommended, but the architecture has > nevertheless been tempting: after all, why not rely on Kafka’s own > replication mechanism to protect against storage failure rather than using > RAID? With KIP-112, Kafka now handles disk failure more gracefully. A single > disk failure in a JBOD broker will not bring the entire broker down; rather, > the broker will continue serving any log files that remain on functioning > disks. > * Since release 0.11.0, the idempotent producer (which is the producer used > in the presence of a transaction, which of course is the producer we use for > exactly-once processing) required max.in.flight.requests.per.connection to be > equal to one. As anyone who has written or tested a wire protocol can attest, > this put an upper bound on throughput. Thanks to KAFKA-5949, this can now be > as large as five, relaxing the throughput constraint quite a bit. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8135) Add description to MessageParameter
[ https://issues.apache.org/jira/browse/FLINK-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8135: - Fix Version/s: (was: 1.5.0) 1.5.1 > Add description to MessageParameter > --- > > Key: FLINK-8135 > URL: https://issues.apache.org/jira/browse/FLINK-8135 > Project: Flink > Issue Type: Improvement > Components: Documentation, REST >Reporter: Chesnay Schepler >Assignee: Andrei >Priority: Major > Fix For: 1.5.1 > > > For documentation purposes we should add an {{getDescription()}} method to > the {{MessageParameter}} class, describing what this particular parameter is > used for and which values are accepted. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
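A minimal sketch of the `getDescription()` addition proposed in FLINK-8135. The class below is a self-contained stand-in, not the actual Flink REST `MessageParameter`; the parameter key and description text are invented for illustration:

```java
// Stand-in for the REST message-parameter base class; a docs generator can
// now emit key and description together for each parameter.
abstract class MessageParameter {
    final String key;
    MessageParameter(String key) { this.key = key; }
    abstract String getDescription();
}

public class DescribedParamDemo {
    public static void main(String[] args) {
        MessageParameter jobId = new MessageParameter("jobid") {
            String getDescription() {
                return "32-character hexadecimal string identifying a job.";
            }
        };
        System.out.println(jobId.key + ": " + jobId.getDescription());
    }
}
```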
[jira] [Updated] (FLINK-8410) Kafka consumer's committedOffsets gauge metric is prematurely set
[ https://issues.apache.org/jira/browse/FLINK-8410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8410: - Fix Version/s: (was: 1.5.0) 1.5.1 > Kafka consumer's committedOffsets gauge metric is prematurely set > > > Key: FLINK-8410 > URL: https://issues.apache.org/jira/browse/FLINK-8410 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Metrics >Affects Versions: 1.4.0, 1.3.2, 1.5.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai >Priority: Major > Fix For: 1.4.3, 1.3.4, 1.5.1 > > > The {{committedOffset}} metric gauge value is set too early. It is set here: > https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-0.9/src/main/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka09Fetcher.java#L236 > This sets the committed offset before the actual commit happens, and the timing > of that commit varies depending on whether offsets are committed automatically at > a periodic interval or on checkpoints. Moreover, in the 0.9+ consumers, the {{KafkaConsumerThread}} may > choose to supersede some commit attempts if the commit takes longer than the > commit interval. > While the offset committed back to Kafka is not a critical value used by the > consumer, it would be best to have more accurate values for a Flink-shipped > metric. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9257) End-to-end tests print "All tests PASS" even if an individual test script returns a non-zero exit code
[ https://issues.apache.org/jira/browse/FLINK-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9257: - Fix Version/s: (was: 1.5.0) 1.5.1 > End-to-end tests print "All tests PASS" even if an individual test script > returns a non-zero exit code > -- > > Key: FLINK-9257 > URL: https://issues.apache.org/jira/browse/FLINK-9257 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Florian Schmidt >Assignee: Florian Schmidt >Priority: Critical > Fix For: 1.5.1 > > > In some cases the test-suite exits with a non-zero exit code but still prints > "All tests PASS" to stdout. This happens because of how the test runner works, > which is roughly as follows > # Either run-nightly-tests.sh or run-precommit-tests.sh executes a suite of > tests consisting of one or more bash scripts. > # As soon as one of those bash scripts exits with a non-zero exit code, the > tests won't continue to run and the test-suite will also exit with a non-zero > exit code. > # *During the cleanup hook (trap cleanup EXIT in common.sh) it is checked > whether there are non-empty out files or log files with certain > exceptions. If a test fails with a non-zero exit code, but does not produce any > exceptions or .out files, this will still print "All tests PASS" to stdout, > even though not all tests passed* > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
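The cleanup-hook problem described in step 3 can be sketched in bash. This is a hypothetical minimal script, not Flink's actual common.sh: it shows an EXIT trap that inspects the real exit code instead of relying only on the contents of log/out files.

```shell
#!/usr/bin/env bash
# Minimal sketch of the fix direction (not the actual common.sh):
# the EXIT trap should honor the script's real exit code rather than
# only checking for non-empty .out files or logged exceptions.

cleanup () {
  exit_code=$?  # exit code of the last command before the trap fired
  if [ "$exit_code" -ne 0 ]; then
    echo "One or more tests FAILED (exit code $exit_code)"
  else
    echo "All tests PASS"
  fi
}

# Run cleanup whenever the script exits, for any reason.
trap cleanup EXIT
```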
[jira] [Updated] (FLINK-8768) Change NettyMessageDecoder to inherit from LengthFieldBasedFrameDecoder
[ https://issues.apache.org/jira/browse/FLINK-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8768: - Fix Version/s: (was: 1.5.0) 1.5.1 > Change NettyMessageDecoder to inherit from LengthFieldBasedFrameDecoder > --- > > Key: FLINK-8768 > URL: https://issues.apache.org/jira/browse/FLINK-8768 > Project: Flink > Issue Type: Improvement > Components: Network >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Major > Fix For: 1.5.1 > > > Let {{NettyMessageDecoder}} inherit from {{LengthFieldBasedFrameDecoder}} > instead of being an additional step in the pipeline. This not only removes > overhead in the pipeline itself but also allows us to override the > {{#extractFrame()}} method to restore the old Netty 4.0.27 behaviour for > non-credit based code paths, which had a bug with Netty >= 4.0.28. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
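For readers unfamiliar with {{LengthFieldBasedFrameDecoder}}, the framing it performs can be sketched without Netty. The class and method below are hypothetical, pure-JDK illustrations of the length-field idea (a 4-byte length prefix followed by the payload), not Flink's NettyMessageDecoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: the framing performed by a length-field-based frame
 *  decoder. Netty's real decoder does this incrementally on a ByteBuf;
 *  inheriting from it lets NettyMessageDecoder skip a pipeline stage. */
public class LengthFieldFraming {

    /** Split a buffer of [4-byte length][payload] records into payloads. */
    static List<byte[]> decode(ByteBuffer in) {
        List<byte[]> frames = new ArrayList<>();
        while (in.remaining() >= 4) {
            in.mark();
            int length = in.getInt();
            if (in.remaining() < length) { // partial frame: wait for more bytes
                in.reset();
                break;
            }
            byte[] frame = new byte[length];
            in.get(frame);
            frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length).put(payload);
        buf.flip();
        System.out.println(new String(decode(buf).get(0), StandardCharsets.UTF_8));
    }
}
```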
[jira] [Updated] (FLINK-9193) Deprecate non-well-defined output methods on DataStream
[ https://issues.apache.org/jira/browse/FLINK-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9193: - Fix Version/s: (was: 1.5.0) 1.5.1 > Deprecate non-well-defined output methods on DataStream > --- > > Key: FLINK-9193 > URL: https://issues.apache.org/jira/browse/FLINK-9193 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek >Priority: Major > Fix For: 1.5.1 > > > Some output methods on {{DataStream}} that write text to files are not safe > to use in a streaming program as they have no consistency guarantees. They > are: > - {{writeAsText()}} > - {{writeAsCsv()}} > - {{writeToSocket()}} > - {{writeUsingOutputFormat()}} > Along with those we should also deprecate the {{SinkFunctions}} that they use. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9355) Simplify configuration of local recovery to a simple on/off
[ https://issues.apache.org/jira/browse/FLINK-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9355: - Fix Version/s: (was: 1.5.0) 1.5.1 > Simplify configuration of local recovery to a simple on/off > --- > > Key: FLINK-9355 > URL: https://issues.apache.org/jira/browse/FLINK-9355 > Project: Flink > Issue Type: Improvement > Components: State Backends, Checkpointing >Affects Versions: 1.5.0, 1.6.0 >Reporter: Stefan Richter >Assignee: Stefan Richter >Priority: Major > Fix For: 1.5.1 > > > We can change the configuration of local recovery to a simple > enabled/disabled choice. In the future, if a backend can offer different > implementations of local recovery, the backend can offer its own config > constants as an expert option. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8825) Disallow new String() without charset in checkstyle
[ https://issues.apache.org/jira/browse/FLINK-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8825: - Fix Version/s: (was: 1.5.0) 1.5.1 > Disallow new String() without charset in checkstyle > --- > > Key: FLINK-8825 > URL: https://issues.apache.org/jira/browse/FLINK-8825 > Project: Flink > Issue Type: Bug > Components: Tests >Reporter: Aljoscha Krettek >Assignee: vinoyang >Priority: Major > Fix For: 1.5.1 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
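A sketch of what such a checkstyle rule would push code toward: new String(bytes) uses the platform default charset, while naming the charset makes decoding deterministic. The class below is a hypothetical illustration, not Flink code.

```java
import java.nio.charset.StandardCharsets;

/** Why banning new String() without a charset helps: the no-charset
 *  constructors silently use the platform default encoding, so results
 *  can differ between machines and locales. */
public class CharsetExplicit {

    static String decodeUtf8(byte[] bytes) {
        // Preferred: the charset is named explicitly, so behavior is
        // identical on every JVM.
        return new String(bytes, StandardCharsets.UTF_8);
        // Discouraged (what the rule would flag): new String(bytes),
        // which depends on the JVM's default charset.
    }

    public static void main(String[] args) {
        byte[] eAcute = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 bytes for 'é'
        System.out.println(decodeUtf8(eAcute));
    }
}
```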
[jira] [Updated] (FLINK-8452) BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter instable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8452: - Fix Version/s: (was: 1.5.0) 1.5.1 > BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter instable on > Travis > > > Key: FLINK-8452 > URL: https://issues.apache.org/jira/browse/FLINK-8452 > Project: Flink > Issue Type: Bug > Components: DataStream API, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: test-stability > Fix For: 1.5.1 > > > The {{BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter}} > seems to be instable on Travis: > > https://travis-ci.org/tillrohrmann/flink/jobs/330261310 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7916) Remove NetworkStackThroughputITCase
[ https://issues.apache.org/jira/browse/FLINK-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7916: - Fix Version/s: (was: 1.5.0) 1.5.1 > Remove NetworkStackThroughputITCase > --- > > Key: FLINK-7916 > URL: https://issues.apache.org/jira/browse/FLINK-7916 > Project: Flink > Issue Type: Task > Components: Tests >Affects Versions: 1.4.0 >Reporter: Till Rohrmann >Priority: Major > Fix For: 1.5.1 > > > Flink's code base contains the {{NetworkStackThroughputITCase}} which is not > really a test. Moreover, it is marked as {{Ignored}}. I propose to remove this > test because it is more of a benchmark. We could think about creating a > benchmark project where we move these kinds of "tests". > In general I think we should remove ignored tests if they won't be fixed > immediately. The danger is far too high that we forget about them and then > only keep the maintenance burden. This is especially true for the above > mentioned test case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8623) ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8623: - Fix Version/s: (was: 1.5.0) 1.5.1 > ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics unstable on > Travis > > > Key: FLINK-8623 > URL: https://issues.apache.org/jira/browse/FLINK-8623 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Critical > Labels: test-stability > Fix For: 1.4.3, 1.5.1 > > > {{ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics}} fails on > Travis: https://travis-ci.org/apache/flink/jobs/33932 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9360) HA end-to-end nightly test takes more than 15 min in Travis CI
[ https://issues.apache.org/jira/browse/FLINK-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9360: - Fix Version/s: (was: 1.5.0) 1.5.1 > HA end-to-end nightly test takes more than 15 min in Travis CI > -- > > Key: FLINK-9360 > URL: https://issues.apache.org/jira/browse/FLINK-9360 > Project: Flink > Issue Type: Test > Components: Tests >Affects Versions: 1.5.0 >Reporter: Andrey Zagrebin >Priority: Major > Labels: E2E, Nightly > Fix For: 1.5.1 > > > We have not discussed how long the nightly tests should run. Currently > overall testing build time is around the limit of free Travis CI VM (50 min). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8160) Extend OperatorHarness to expose metrics
[ https://issues.apache.org/jira/browse/FLINK-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8160: - Fix Version/s: (was: 1.5.0) 1.5.1 > Extend OperatorHarness to expose metrics > > > Key: FLINK-8160 > URL: https://issues.apache.org/jira/browse/FLINK-8160 > Project: Flink > Issue Type: Improvement > Components: Metrics, Streaming >Reporter: Chesnay Schepler >Assignee: Tuo Wang >Priority: Major > Fix For: 1.5.1 > > > To better test interactions between operators and metrics the harness should > expose the metrics registered by the operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8192) Properly annotate APIs of all Flink connectors with @Public / @PublicEvolving / @Internal
[ https://issues.apache.org/jira/browse/FLINK-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8192: - Fix Version/s: (was: 1.5.0) 1.5.1 > Properly annotate APIs of all Flink connectors with @Public / @PublicEvolving > / @Internal > - > > Key: FLINK-8192 > URL: https://issues.apache.org/jira/browse/FLINK-8192 > Project: Flink > Issue Type: Improvement > Components: Streaming Connectors >Reporter: Tzu-Li (Gordon) Tai >Priority: Major > Fix For: 1.5.1 > > > Currently, the APIs of the Flink connectors have absolutely no annotations on > whether their usage is {{Public}} / {{PublicEvolving}} / or {{Internal}}. > We have, for example, instances in the past where a user was mistakenly using > an abstract internal base class in the Elasticsearch connector. > This JIRA tracks the coverage of API usage annotation for all Flink shipped > connectors. Ideally, a separate subtask should be created for each individual > connector. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8161) Flakey YARNSessionCapacitySchedulerITCase on Travis
[ https://issues.apache.org/jira/browse/FLINK-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8161: - Fix Version/s: (was: 1.5.0) 1.5.1 > Flakey YARNSessionCapacitySchedulerITCase on Travis > --- > > Key: FLINK-8161 > URL: https://issues.apache.org/jira/browse/FLINK-8161 > Project: Flink > Issue Type: Bug > Components: Tests, YARN >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Critical > Labels: test-stability > Fix For: 1.5.1 > > Attachments: 24547.10.tar.gz, > api.travis-ci.org_v3_job_380488917_log.tar.xz > > > The {{YARNSessionCapacitySchedulerITCase}} spuriously fails on Travis because > it now contains {{2017-11-25 22:49:49,204 WARN > akka.remote.transport.netty.NettyTransport- Remote > connection to [null] failed with java.nio.channels.NotYetConnectedException}} > from time to time in the logs. I suspect that this is due to switching from > Flakka to Akka 2.4.0. In order to solve this problem I propose to add this > log statement to the whitelisted log statements. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9187) add prometheus pushgateway reporter
[ https://issues.apache.org/jira/browse/FLINK-9187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9187: - Fix Version/s: (was: 1.5.0) 1.5.1 > add prometheus pushgateway reporter > --- > > Key: FLINK-9187 > URL: https://issues.apache.org/jira/browse/FLINK-9187 > Project: Flink > Issue Type: New Feature > Components: Metrics >Affects Versions: 1.4.2 >Reporter: lamber-ken >Priority: Minor > Labels: features > Fix For: 1.5.1 > > > Make Flink able to send metrics to Prometheus via the pushgateway. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8907) Unable to start the process by start-cluster.sh
[ https://issues.apache.org/jira/browse/FLINK-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8907: - Fix Version/s: (was: 1.5.0) 1.5.1 > Unable to start the process by start-cluster.sh > --- > > Key: FLINK-8907 > URL: https://issues.apache.org/jira/browse/FLINK-8907 > Project: Flink > Issue Type: Bug > Components: Startup Shell Scripts >Reporter: mingleizhang >Priority: Minor > Fix For: 1.5.1 > > > I built Flink from the latest code on the master branch and got the > flink-1.6-SNAPSHOT binary. When I run ./start-cluster.sh with the default > flink-conf.yaml, the processes do not start and I get the error below. > {code:java} > [root@ricezhang-pjhzf bin]# ./start-cluster.sh > >> Starting cluster. > >> Starting standalonesession daemon on host ricezhang-pjhzf.vclound.com. > >> : Temporary failure in name resolutionost > {code} > *By the way, I built this Flink on a Windows computer and copied the > {{flink-1.6-SNAPSHOT}} folder to CentOS.* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7386) Flink Elasticsearch 5 connector is not compatible with Elasticsearch 5.2+ client
[ https://issues.apache.org/jira/browse/FLINK-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7386: - Fix Version/s: (was: 1.5.0) 1.5.1 > Flink Elasticsearch 5 connector is not compatible with Elasticsearch 5.2+ > client > > > Key: FLINK-7386 > URL: https://issues.apache.org/jira/browse/FLINK-7386 > Project: Flink > Issue Type: Improvement > Components: ElasticSearch Connector >Reporter: Dawid Wysakowicz >Assignee: Fang Yong >Priority: Critical > Fix For: 1.5.1 > > > In Elasticsearch 5.2.0 client the class {{BulkProcessor}} was refactored and > has no longer the method {{add(ActionRequest)}}. > For more info see: https://github.com/elastic/elasticsearch/pull/20109 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8163) NonHAQueryableStateFsBackendITCase test getting stuck on Travis
[ https://issues.apache.org/jira/browse/FLINK-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8163: - Fix Version/s: (was: 1.5.0) 1.5.1 > NonHAQueryableStateFsBackendITCase test getting stuck on Travis > --- > > Key: FLINK-8163 > URL: https://issues.apache.org/jira/browse/FLINK-8163 > Project: Flink > Issue Type: Bug > Components: Queryable State, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Assignee: Kostas Kloudas >Priority: Critical > Labels: test-stability > Fix For: 1.5.1 > > > The {{NonHAQueryableStateFsBackendITCase}} tests seems to get stuck on Travis > producing no output for 300s. > https://travis-ci.org/tillrohrmann/flink/jobs/307988209 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7829) Remove (or at least deprecate) DataStream.writeToFile/Csv
[ https://issues.apache.org/jira/browse/FLINK-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7829: - Fix Version/s: (was: 1.5.0) 1.5.1 > Remove (or at least deprecate) DataStream.writeToFile/Csv > - > > Key: FLINK-7829 > URL: https://issues.apache.org/jira/browse/FLINK-7829 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Aljoscha Krettek >Priority: Major > Fix For: 1.5.1 > > > These methods are seductive for users but they should never actually use them > in a production streaming job. For those cases the {{BucketingSink}} should > be used. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8822) RotateLogFile may not work well when sed version is below 4.2
[ https://issues.apache.org/jira/browse/FLINK-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8822: - Fix Version/s: (was: 1.5.0) 1.5.1 > RotateLogFile may not work well when sed version is below 4.2 > - > > Key: FLINK-8822 > URL: https://issues.apache.org/jira/browse/FLINK-8822 > Project: Flink > Issue Type: Bug >Affects Versions: 1.4.0 >Reporter: Xin Liu >Priority: Major > Fix For: 1.5.1 > > > In bin/config.sh, rotateLogFilesWithPrefix() uses an extended regular > expression to process file names with "sed -E", but when the sed version is > below 4.2 this fails with "sed: invalid option -- 'E'" > and log file rotation won't work: there will be only one log file no matter > what $MAX_LOG_FILE_NUMBER is set to. > Using sed -r instead may be more suitable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
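The portability point can be illustrated with a small example (assuming GNU sed, where -r predates 4.2 while -E was only documented later); the log file name and rename pattern are hypothetical, not taken from config.sh.

```shell
#!/usr/bin/env bash
# -E and -r both enable extended regular expressions, but old GNU sed
# (< 4.2) only understands -r, so -r is the more portable spelling there.
# Hypothetical rotation-style rename on a log file name:
echo "flink-taskmanager.log.3" | sed -r 's/\.log\.([0-9]+)$/.log.\1.old/'
```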
[jira] [Updated] (FLINK-8672) Support continuous processing in CSV table source
[ https://issues.apache.org/jira/browse/FLINK-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8672: - Fix Version/s: (was: 1.5.0) 1.5.1 > Support continuous processing in CSV table source > - > > Key: FLINK-8672 > URL: https://issues.apache.org/jira/browse/FLINK-8672 > Project: Flink > Issue Type: New Feature > Components: Table API SQL >Reporter: Aljoscha Krettek >Priority: Major > Fix For: 1.5.1 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7565) Add support for HTTP 1.1 (Chunked transfer encoding) to Flink web UI
[ https://issues.apache.org/jira/browse/FLINK-7565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7565: - Fix Version/s: (was: 1.5.0) 1.5.1 > Add support for HTTP 1.1 (Chunked transfer encoding) to Flink web UI > > > Key: FLINK-7565 > URL: https://issues.apache.org/jira/browse/FLINK-7565 > Project: Flink > Issue Type: Improvement > Components: Webfrontend >Reporter: Robert Metzger >Priority: Major > Fix For: 1.5.1 > > > I tried sending some REST calls to Flink's web UI, using an HTTP 1.1 client > (Jersey-client 2.23.1) with chunked transfer encoding (which means the > content-length header is not set). > Flink's web UI treats those requests as empty requests (with no body), > leading to errors and exceptions. > It's probably "just" an upgrade of netty-router or so to fix this. > I've also found a good workaround for my use case (by setting > setChunkedEncodingEnabled(false)) (See also > https://stackoverflow.com/questions/18157218/jersey-2-0-content-length-not-set) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7064) test instability in WordCountMapreduceITCase
[ https://issues.apache.org/jira/browse/FLINK-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-7064: - Fix Version/s: (was: 1.5.0) 1.5.1 > test instability in WordCountMapreduceITCase > > > Key: FLINK-7064 > URL: https://issues.apache.org/jira/browse/FLINK-7064 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.4.0 >Reporter: Nico Kruber >Priority: Critical > Labels: test-stability > Fix For: 1.5.1 > > > Although already mentioned in FLINK-7004, this does not seem fixed yet and > apparently now became an instable test: > {code} > Running org.apache.flink.test.hadoop.mapred.WordCountMapredITCase > Inflater has been closed > java.lang.NullPointerException: Inflater has been closed > at java.util.zip.Inflater.ensureOpen(Inflater.java:389) > at java.util.zip.Inflater.inflate(Inflater.java:257) > at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:154) > at java.io.BufferedReader.readLine(BufferedReader.java:317) > at java.io.BufferedReader.readLine(BufferedReader.java:382) > at > javax.xml.parsers.FactoryFinder.findJarServiceProvider(FactoryFinder.java:319) > at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:255) > at > javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:121) > at > org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2467) > at > org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2444) > at > org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2361) > at 
org.apache.hadoop.conf.Configuration.get(Configuration.java:968) > at > org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2010) > at org.apache.hadoop.mapred.JobConf.(JobConf.java:423) > at > org.apache.flink.hadoopcompatibility.HadoopInputs.readHadoopFile(HadoopInputs.java:63) > at > org.apache.flink.test.hadoop.mapred.WordCountMapredITCase.internalRun(WordCountMapredITCase.java:79) > at > org.apache.flink.test.hadoop.mapred.WordCountMapredITCase.testProgram(WordCountMapredITCase.java:67) > at > org.apache.flink.test.util.JavaProgramTestBase.testJobWithObjectReuse(JavaProgramTestBase.java:127) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173) > at >
[jira] [Updated] (FLINK-8439) Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop
[ https://issues.apache.org/jira/browse/FLINK-8439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-8439: - Fix Version/s: (was: 1.5.0) 1.5.1 > Document using a custom AWS Credentials Provider with flink-s3-fs-hadoop > > > Key: FLINK-8439 > URL: https://issues.apache.org/jira/browse/FLINK-8439 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Dyana Rose >Priority: Critical > Fix For: 1.4.3, 1.5.1 > > > This came up when using s3 as the file system backend and running under > ECS. > With no credentials in the container, hadoop-aws will default to EC2 instance > level credentials when accessing S3. However, when running under ECS, you will > generally want to default to the task definition's IAM role. > In this case you need to set the hadoop property > {code:java} > fs.s3a.aws.credentials.provider{code} > to one or more fully qualified class names. See the [hadoop-aws > docs|https://github.com/apache/hadoop/blob/1ba491ff907fc5d2618add980734a3534e2be098/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md] > This works as expected when you add this setting to flink-conf.yaml, but there > is a further 'gotcha.' Because the AWS SDK is shaded, the actual full class > name for, in this case, the ContainerCredentialsProvider is > {code:java} > org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code} > > meaning the full setting is: > {code:java} > fs.s3a.aws.credentials.provider: > org.apache.flink.fs.s3hadoop.shaded.com.amazonaws.auth.ContainerCredentialsProvider{code} > If you instead set it to the unshaded class name you will see a very > confusing error stating that the ContainerCredentialsProvider doesn't > implement AWSCredentialsProvider (which it most certainly does.) 
> Adding this information (how to specify alternate Credential Providers, and > the name space gotcha) to the [AWS deployment > docs|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/aws.html] > would be useful to anyone else using S3. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9159) Sanity check default timeout values
[ https://issues.apache.org/jira/browse/FLINK-9159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9159: - Fix Version/s: (was: 1.5.0) 1.5.1 > Sanity check default timeout values > --- > > Key: FLINK-9159 > URL: https://issues.apache.org/jira/browse/FLINK-9159 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Assignee: Fabian Hueske >Priority: Blocker > Fix For: 1.5.1 > > > Check that the default timeout values for resource release are sanely chosen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-9258) ConcurrentModificationException in ComponentMetricGroup.getAllVariables
[ https://issues.apache.org/jira/browse/FLINK-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-9258: - Fix Version/s: (was: 1.5.0) 1.5.1 > ConcurrentModificationException in ComponentMetricGroup.getAllVariables > --- > > Key: FLINK-9258 > URL: https://issues.apache.org/jira/browse/FLINK-9258 > Project: Flink > Issue Type: Bug > Components: Metrics >Affects Versions: 1.4.2 >Reporter: Narayanan Arunachalam >Assignee: Chesnay Schepler >Priority: Major > Fix For: 1.4.3, 1.5.1 > > > Seeing this exception at the job startup time. Looks like there is a race > condition when the metrics variables are constructed. > The error is intermittent. > Exception in thread "main" > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:897) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:840) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:840) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415) > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) > 
at java.util.HashMap$EntryIterator.next(HashMap.java:1471) > at java.util.HashMap$EntryIterator.next(HashMap.java:1469) > at java.util.HashMap.putMapEntries(HashMap.java:511) > at java.util.HashMap.putAll(HashMap.java:784) > at > org.apache.flink.runtime.metrics.groups.ComponentMetricGroup.getAllVariables(ComponentMetricGroup.java:63) > at > org.apache.flink.runtime.metrics.groups.ComponentMetricGroup.getAllVariables(ComponentMetricGroup.java:63) > at > com.netflix.spaas.metrics.MetricsReporterRegistry.getTags(MetricsReporterRegistry.java:147) > at > com.netflix.spaas.metrics.MetricsReporterRegistry.mergeWithSourceAndSinkTags(MetricsReporterRegistry.java:170) > at > com.netflix.spaas.metrics.MetricsReporterRegistry.addReporter(MetricsReporterRegistry.java:75) > at > com.netflix.spaas.nfflink.connector.kafka.source.Kafka010Consumer.createFetcher(Kafka010Consumer.java:69) > at > org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:549) > at > org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:86) > at > org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55) > at > org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:94) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
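A sketch of why the {{HashMap.putAll}} in this stack trace can throw, and one common workaround. The class below is a hypothetical illustration, not Flink's actual fix for FLINK-9258: HashMap iteration is fail-fast, so merging a map that another thread mutates concurrently can throw ConcurrentModificationException, while copying a snapshot under a shared lock avoids the race.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentModificationException;

/** Illustrative only — not Flink's fix. Shows the fail-fast behavior behind
 *  the stack trace above and a snapshot-under-lock workaround for code
 *  shaped like getAllVariables(). */
public class VariablesSnapshot {

    /** Copy the map while holding its monitor, so writers that synchronize
     *  on the same lock cannot mutate it mid-copy. */
    static Map<String, String> snapshotOf(Map<String, String> variables) {
        synchronized (variables) {
            return new HashMap<>(variables);
        }
    }

    public static void main(String[] args) {
        Map<String, String> variables = new HashMap<>();
        variables.put("<host>", "worker-1");
        variables.put("<task_id>", "42");

        // Mutating a HashMap while iterating it triggers the same fail-fast
        // check that putAll hit in the stack trace above:
        try {
            for (String key : variables.keySet()) {
                variables.put("<new>", "value"); // structural modification
            }
        } catch (ConcurrentModificationException expected) {
            System.out.println("fail-fast iteration detected the mutation");
        }

        // Safe: readers work on an immutable-by-convention snapshot.
        System.out.println(snapshotOf(variables).size());
    }
}
```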