[jira] [Commented] (FLINK-8787) Deploying FLIP-6 YARN session with HA fails
[ https://issues.apache.org/jira/browse/FLINK-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378174#comment-16378174 ]

ASF GitHub Bot commented on FLINK-8787:
---

Github user GJL commented on the issue:

https://github.com/apache/flink/pull/5584

test failures due to
```
// logback + log4j, with/out krb5, different JVM opts
// IMPORTANT: Be aware that we are using side effects here to modify the created YarnClusterDescriptor,
// because we have a reference to the ClusterDescriptor's configuration which we modify continuously
```

> Deploying FLIP-6 YARN session with HA fails
> ---
>
> Key: FLINK-8787
> URL: https://issues.apache.org/jira/browse/FLINK-8787
> Project: Flink
> Issue Type: Bug
> Components: Client, YARN
> Affects Versions: 1.5.0
> Environment: emr-5.12.0
> Hadoop distribution: Amazon 2.8.3
> Applications: ZooKeeper 3.4.10
> Reporter: Gary Yao
> Assignee: Gary Yao
> Priority: Blocker
> Labels: flip-6
> Fix For: 1.5.0
>
> Starting a YARN session with HA in FLIP-6 mode fails with an exception.
> Commit: 5e3fa4403f518dd6d3fe9970fe8ca55871add7c9
> Command to start YARN session:
> {noformat}
> export HADOOP_CLASSPATH=`hadoop classpath`
> HADOOP_CONF_DIR=/etc/hadoop/conf bin/yarn-session.sh -d -n 1 -s 1 -jm 2048 -tm 2048
> {noformat}
> Stacktrace:
> {noformat}
> java.lang.reflect.UndeclaredThrowableException
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)
> at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790)
> Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn connection information.
> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612)
> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> ... 2 more
> Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader address and leader session ID.
> at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116)
> at org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405)
> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:589)
> ... 6 more
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [6 milliseconds]
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
> at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
> at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> at scala.concurrent.Await$.result(package.scala:190)
> at scala.concurrent.Await.result(package.scala)
> at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:114)
> ... 8 more
>
> The program finished with the following exception:
> java.lang.reflect.UndeclaredThrowableException
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)
> at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790)
> Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn connection information.
> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612)
> at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> ... 2 more
> Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader address and leader session ID.
> at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116)
> at org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405)
> at
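For context, a ZooKeeper-based HA session like the one this report describes is enabled through `flink-conf.yaml`. The keys below are the standard Flink HA options; the quorum address and storage path are placeholders, not values taken from the reporter's EMR environment:

```yaml
# Illustrative flink-conf.yaml fragment for ZooKeeper-backed HA
# (host name and storage path are placeholders, not from this report)
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-host:2181
high-availability.storageDir: hdfs:///flink/ha/
```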
[GitHub] flink issue #5584: [FLINK-8787][flip6] WIP Deploying FLIP-6 YARN session wit...
Github user GJL commented on the issue:

https://github.com/apache/flink/pull/5584

test failures due to
```
// logback + log4j, with/out krb5, different JVM opts
// IMPORTANT: Be aware that we are using side effects here to modify the created YarnClusterDescriptor,
// because we have a reference to the ClusterDescriptor's configuration which we modify continuously
```

---
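The pitfall the comment points at is aliasing: a descriptor that stores a reference to a shared configuration object observes every later mutation of that object. A minimal sketch of the effect, with invented class names rather than the actual Flink `YarnClusterDescriptor` API:

```java
import java.util.HashMap;
import java.util.Map;

// Demonstrates the aliasing side effect described in the review comment:
// the descriptor keeps a reference to the caller's configuration map,
// so external mutations are visible inside the descriptor afterwards.
class DescriptorAliasingDemo {
    static class Descriptor {
        private final Map<String, String> conf;
        Descriptor(Map<String, String> conf) {
            this.conf = conf; // stores the reference, no defensive copy
        }
        String get(String key) {
            return conf.get(key);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mode", "flip6");
        Descriptor d = new Descriptor(conf);
        conf.put("mode", "legacy"); // side effect reaches into the descriptor
        System.out.println(d.get("mode")); // reflects the later mutation
    }
}
```

A defensive copy in the constructor (`this.conf = new HashMap<>(conf);`) would decouple the descriptor from subsequent changes, at the cost of not seeing intentional continuous updates like the test relies on.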
[jira] [Commented] (FLINK-6352) FlinkKafkaConsumer should support to use timestamp to set up start offset
[ https://issues.apache.org/jira/browse/FLINK-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378162#comment-16378162 ]

ASF GitHub Bot commented on FLINK-6352:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/5282

@aljoscha sorry about the leftover merge markers, I've fixed them now.

> FlinkKafkaConsumer should support to use timestamp to set up start offset
> -
>
> Key: FLINK-6352
> URL: https://issues.apache.org/jira/browse/FLINK-6352
> Project: Flink
> Issue Type: Improvement
> Components: Kafka Connector
> Reporter: Fang Yong
> Assignee: Fang Yong
> Priority: Blocker
> Fix For: 1.5.0
>
> Currently "auto.offset.reset" is used to initialize the start offset of FlinkKafkaConsumer, and the value should be earliest/latest/none. This method can only make the job consume from the beginning or from the most recent data, but cannot start consuming from a specific Kafka offset.
> So, there should be a configuration item (such as "flink.source.start.time", with the format "-MM-dd HH:mm:ss") that allows users to configure the initial offset of Kafka. The behavior of "flink.source.start.time" is as follows:
> 1) job starts from a checkpoint / savepoint
> a> the offset of the partition can be restored from the checkpoint/savepoint; "flink.source.start.time" will be ignored.
> b> there is no checkpoint/savepoint for the partition (for example, the partition is newly added); "flink.source.start.time" will be used to initialize the offset of the partition
> 2) job has no checkpoint / savepoint; "flink.source.start.time" is used to initialize the offset of Kafka
> a> "flink.source.start.time" is valid: use it to set the offset of Kafka
> b> "flink.source.start.time" is out of range: same as the current behavior with no initial offset, get Kafka's current offset and start reading

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
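The 1a/1b/2a/2b rules above amount to a small precedence decision per partition. A hypothetical helper sketching that precedence (not Flink's actual API; the class, method, and parameters are invented for illustration):

```java
import java.util.Optional;

// Sketch of the per-partition offset-initialization precedence proposed in
// FLINK-6352. Hypothetical helper, not part of Flink's Kafka connector.
class StartOffsetResolver {
    /**
     * @param checkpointedOffset offset restored from a checkpoint/savepoint, if any
     * @param offsetForStartTime offset looked up for "flink.source.start.time", if in range
     * @param currentOffset      the broker's current offset (fallback)
     */
    static long resolve(Optional<Long> checkpointedOffset,
                        Optional<Long> offsetForStartTime,
                        long currentOffset) {
        // 1a) restored from checkpoint/savepoint: the configured start time is ignored
        if (checkpointedOffset.isPresent()) {
            return checkpointedOffset.get();
        }
        // 1b / 2a) no restored offset: use the offset resolved from the start time
        if (offsetForStartTime.isPresent()) {
            return offsetForStartTime.get();
        }
        // 2b) start time out of range: fall back to the broker's current offset
        return currentOffset;
    }
}
```

For example, a partition with a checkpointed offset always resumes from it, even when a start time is configured.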
[GitHub] flink issue #5282: [FLINK-6352] [kafka] Timestamp-based offset configuration...
Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/5282

@aljoscha sorry about the leftover merge markers, I've fixed them now.

---
[jira] [Updated] (FLINK-6997) SavepointITCase fails in master branch sometimes
[ https://issues.apache.org/jira/browse/FLINK-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-6997:
Priority: Blocker (was: Critical)

> SavepointITCase fails in master branch sometimes
>
>
> Key: FLINK-6997
> URL: https://issues.apache.org/jira/browse/FLINK-6997
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.3.0, 1.5.0
> Reporter: Ted Yu
> Priority: Blocker
> Labels: test-stability
> Fix For: 1.5.0
>
> I got the following test failure (with commit a0b781461bcf8c2f1d00b93464995f03eda589f1)
> {code}
> testSavepointForJobWithIteration(org.apache.flink.test.checkpointing.SavepointITCase) Time elapsed: 8.129 sec <<< ERROR!
> java.io.IOException: java.lang.Exception: Failed to complete savepoint
> at org.apache.flink.runtime.testingUtils.TestingCluster.triggerSavepoint(TestingCluster.scala:342)
> at org.apache.flink.runtime.testingUtils.TestingCluster.triggerSavepoint(TestingCluster.scala:316)
> at org.apache.flink.test.checkpointing.SavepointITCase.testSavepointForJobWithIteration(SavepointITCase.java:827)
> Caused by: java.lang.Exception: Failed to complete savepoint
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:821)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:805)
> at org.apache.flink.runtime.concurrent.impl.FlinkFuture$5.onComplete(FlinkFuture.java:272)
> at akka.dispatch.OnComplete.internal(Future.scala:247)
> at akka.dispatch.OnComplete.internal(Future.scala:245)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:175)
> at akka.dispatch.japi$CallbackBridge.apply(Future.scala:172)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
> at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
> at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
> at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
> at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.Exception: Failed to trigger savepoint: Not all required tasks are currently running.
> at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.triggerSavepoint(CheckpointCoordinator.java:382)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:800)
> at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> at org.apache.flink.runtime.testingUtils.TestingJobManagerLike$$anonfun$handleTestingMessage$1.applyOrElse(TestingJobManagerLike.scala:95)
> at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
> at org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:38)
> at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
> at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> {code}
[jira] [Updated] (FLINK-6645) JobMasterTest failed on Travis
[ https://issues.apache.org/jira/browse/FLINK-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-6645:
Fix Version/s: 1.5.0

> JobMasterTest failed on Travis
> --
>
> Key: FLINK-6645
> URL: https://issues.apache.org/jira/browse/FLINK-6645
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.4.0
> Reporter: Chesnay Schepler
> Priority: Blocker
> Labels: test-stability
> Fix For: 1.5.0
>
> {code}
> Failed tests:
> JobMasterTest.testHeartbeatTimeoutWithResourceManager:220
> Wanted but not invoked:
> resourceManagerGateway.disconnectJobManager(
> be75c925204aede002136b15238f88b4,
>
> );
> -> at org.apache.flink.runtime.jobmaster.JobMasterTest.testHeartbeatTimeoutWithResourceManager(JobMasterTest.java:220)
> However, there were other interactions with this mock:
> resourceManagerGateway.registerJobManager(
> 82d774de-ed78-4670-9623-2fc6638fbbf9,
> 11b045ea-8b2b-4df0-aa02-ef4922dfc632,
> jm,
> "b442340a-7d3d-49d8-b440-1a93a5b43bb6",
> be75c925204aede002136b15238f88b4,
> 100 ms
> );
> -> at org.apache.flink.runtime.jobmaster.JobMaster$ResourceManagerConnection$1.invokeRegistration(JobMaster.java:1051)
> {code}
[jira] [Updated] (FLINK-6645) JobMasterTest failed on Travis
[ https://issues.apache.org/jira/browse/FLINK-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-6645:
Priority: Blocker (was: Major)
[jira] [Updated] (FLINK-7064) test instability in WordCountMapreduceITCase
[ https://issues.apache.org/jira/browse/FLINK-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-7064:
Fix Version/s: 1.5.0

> test instability in WordCountMapreduceITCase
>
>
> Key: FLINK-7064
> URL: https://issues.apache.org/jira/browse/FLINK-7064
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.4.0
> Reporter: Nico Kruber
> Priority: Blocker
> Labels: test-stability
> Fix For: 1.5.0
>
> Although already mentioned in FLINK-7004, this does not seem fixed yet and apparently has now become an unstable test:
> {code}
> Running org.apache.flink.test.hadoop.mapred.WordCountMapredITCase
> Inflater has been closed
> java.lang.NullPointerException: Inflater has been closed
> at java.util.zip.Inflater.ensureOpen(Inflater.java:389)
> at java.util.zip.Inflater.inflate(Inflater.java:257)
> at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
> at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
> at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
> at java.io.InputStreamReader.read(InputStreamReader.java:184)
> at java.io.BufferedReader.fill(BufferedReader.java:154)
> at java.io.BufferedReader.readLine(BufferedReader.java:317)
> at java.io.BufferedReader.readLine(BufferedReader.java:382)
> at javax.xml.parsers.FactoryFinder.findJarServiceProvider(FactoryFinder.java:319)
> at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:255)
> at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:121)
> at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2467)
> at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2444)
> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2361)
> at org.apache.hadoop.conf.Configuration.get(Configuration.java:968)
> at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2010)
> at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:423)
> at org.apache.flink.hadoopcompatibility.HadoopInputs.readHadoopFile(HadoopInputs.java:63)
> at org.apache.flink.test.hadoop.mapred.WordCountMapredITCase.internalRun(WordCountMapredITCase.java:79)
> at org.apache.flink.test.hadoop.mapred.WordCountMapredITCase.testProgram(WordCountMapredITCase.java:67)
> at org.apache.flink.test.util.JavaProgramTestBase.testJobWithObjectReuse(JavaProgramTestBase.java:127)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
> at
[jira] [Updated] (FLINK-7064) test instability in WordCountMapreduceITCase
[ https://issues.apache.org/jira/browse/FLINK-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-7064:
Priority: Blocker (was: Critical)
[jira] [Updated] (FLINK-8034) ProcessFailureCancelingITCase.testCancelingOnProcessFailure failing on Travis
[ https://issues.apache.org/jira/browse/FLINK-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-8034:
Priority: Blocker (was: Critical)

> ProcessFailureCancelingITCase.testCancelingOnProcessFailure failing on Travis
> -
>
> Key: FLINK-8034
> URL: https://issues.apache.org/jira/browse/FLINK-8034
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.5.0
> Reporter: Till Rohrmann
> Priority: Blocker
> Labels: test-stability
> Fix For: 1.5.0
>
> The {{ProcessFailureCancelingITCase.testCancelingOnProcessFailure}} is failing on Travis spuriously.
> https://travis-ci.org/tillrohrmann/flink/jobs/299075703
[jira] [Updated] (FLINK-7733) test instability in Kafka end-to-end test (RetriableCommitFailedException)
[ https://issues.apache.org/jira/browse/FLINK-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-7733:
Priority: Blocker (was: Critical)

> test instability in Kafka end-to-end test (RetriableCommitFailedException)
> --
>
> Key: FLINK-7733
> URL: https://issues.apache.org/jira/browse/FLINK-7733
> Project: Flink
> Issue Type: Bug
> Components: Kafka Connector, Tests
> Affects Versions: 1.4.0
> Reporter: Nico Kruber
> Priority: Blocker
> Labels: test-stability
> Fix For: 1.5.0
>
> In a branch with unrelated changes, the Kafka end-to-end test fails with the following strange entries in the log.
> {code}
> 2017-09-27 17:50:36,777 INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 0.10.2.1
> 2017-09-27 17:50:36,778 INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : e89bffd6b2eff799
> 2017-09-27 17:50:38,492 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Discovered coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 2147483647 rack: null) for group myconsumer.
> 2017-09-27 17:50:38,511 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 2147483647 rack: null) dead for group myconsumer
> 2017-09-27 17:50:38,618 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Discovered coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 2147483647 rack: null) for group myconsumer.
> 2017-09-27 17:50:41,525 INFO org.apache.flink.runtime.state.DefaultOperatorStateBackend - DefaultOperatorStateBackend snapshot (In-Memory Stream Factory, synchronous part) in thread Thread[Async calls on Source: Custom Source -> Map -> Sink: Unnamed (1/1),5,Flink Task Threads] took 481 ms.
> 2017-09-27 17:50:41,598 INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 2147483647 rack: null) dead for group myconsumer
> 2017-09-27 17:50:41,600 WARN org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher - Committing offsets to Kafka failed. This does not compromise Flink's checkpoints.
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing offsets.
> Caused by: org.apache.kafka.common.errors.DisconnectException
> 2017-09-27 17:50:41,608 ERROR org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase - Async Kafka commit failed.
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing offsets.
> Caused by: org.apache.kafka.common.errors.DisconnectException
> {code}
> https://travis-ci.org/NicoK/flink/jobs/280477275
[jira] [Updated] (FLINK-8163) NonHAQueryableStateFsBackendITCase test getting stuck on Travis
[ https://issues.apache.org/jira/browse/FLINK-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-8163:
Priority: Blocker (was: Critical)

> NonHAQueryableStateFsBackendITCase test getting stuck on Travis
> ---
>
> Key: FLINK-8163
> URL: https://issues.apache.org/jira/browse/FLINK-8163
> Project: Flink
> Issue Type: Bug
> Components: Queryable State, Tests
> Affects Versions: 1.5.0
> Reporter: Till Rohrmann
> Assignee: Kostas Kloudas
> Priority: Blocker
> Labels: test-stability
> Fix For: 1.5.0
>
> The {{NonHAQueryableStateFsBackendITCase}} tests seem to get stuck on Travis, producing no output for 300s.
> https://travis-ci.org/tillrohrmann/flink/jobs/307988209
[jira] [Updated] (FLINK-8034) ProcessFailureCancelingITCase.testCancelingOnProcessFailure failing on Travis
[ https://issues.apache.org/jira/browse/FLINK-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-8034:
Fix Version/s: 1.5.0
[jira] [Updated] (FLINK-7733) test instability in Kafka end-to-end test (RetriableCommitFailedException)
[ https://issues.apache.org/jira/browse/FLINK-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-7733:
Fix Version/s: 1.5.0
[jira] [Updated] (FLINK-8336) YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability
[ https://issues.apache.org/jira/browse/FLINK-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8336: Priority: Blocker (was: Critical) > YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability > --- > > Key: FLINK-8336 > URL: https://issues.apache.org/jira/browse/FLINK-8336 > Project: Flink > Issue Type: Bug > Components: FileSystem, Tests, YARN >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.2 > > > The {{YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3}} fails on > Travis. I suspect that this has something to do with the consistency > guarantees S3 gives us. > https://travis-ci.org/tillrohrmann/flink/jobs/323930297 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8073) Test instability FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint()
[ https://issues.apache.org/jira/browse/FLINK-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8073: Fix Version/s: 1.5.0 > Test instability > FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint() > - > > Key: FLINK-8073 > URL: https://issues.apache.org/jira/browse/FLINK-8073 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Tests >Affects Versions: 1.4.0, 1.5.0 >Reporter: Kostas Kloudas >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0 > > > Travis log: https://travis-ci.org/kl0u/flink/jobs/301985988 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8337) GatherSumApplyITCase.testConnectedComponentsWithObjectReuseDisabled instable
[ https://issues.apache.org/jira/browse/FLINK-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8337: Priority: Blocker (was: Critical) > GatherSumApplyITCase.testConnectedComponentsWithObjectReuseDisabled instable > > > Key: FLINK-8337 > URL: https://issues.apache.org/jira/browse/FLINK-8337 > Project: Flink > Issue Type: Bug > Components: Gelly, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.2 > > > The {{GatherSumApplyITCase.testConnectedComponentsWithObjectReuseDisabled}} > fails on Travis. It looks as if a sub partition has not been registered at > the task event dispatcher. > https://travis-ci.org/apache/flink/jobs/323930301 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8073) Test instability FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint()
[ https://issues.apache.org/jira/browse/FLINK-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8073: Priority: Blocker (was: Critical) > Test instability > FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint() > - > > Key: FLINK-8073 > URL: https://issues.apache.org/jira/browse/FLINK-8073 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Tests >Affects Versions: 1.4.0, 1.5.0 >Reporter: Kostas Kloudas >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0 > > > Travis log: https://travis-ci.org/kl0u/flink/jobs/301985988 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8163) NonHAQueryableStateFsBackendITCase test getting stuck on Travis
[ https://issues.apache.org/jira/browse/FLINK-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8163: Fix Version/s: 1.5.0 > NonHAQueryableStateFsBackendITCase test getting stuck on Travis > --- > > Key: FLINK-8163 > URL: https://issues.apache.org/jira/browse/FLINK-8163 > Project: Flink > Issue Type: Bug > Components: Queryable State, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Assignee: Kostas Kloudas >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0 > > > The {{NonHAQueryableStateFsBackendITCase}} tests seems to get stuck on Travis > producing no output for 300s. > https://travis-ci.org/tillrohrmann/flink/jobs/307988209 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8460) UtilsTest.testYarnFlinkResourceManagerJobManagerLostLeadership instable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8460: Fix Version/s: 1.5.0 > UtilsTest.testYarnFlinkResourceManagerJobManagerLostLeadership instable on > Travis > - > > Key: FLINK-8460 > URL: https://issues.apache.org/jira/browse/FLINK-8460 > Project: Flink > Issue Type: Bug > Components: Tests, YARN >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0 > > > There is a test instability in > {{UtilsTest.testYarnFlinkResourceManagerJobManagerLostLeadership}} on Travis. > > https://api.travis-ci.org/v3/job/330406729/log.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8460) UtilsTest.testYarnFlinkResourceManagerJobManagerLostLeadership instable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8460: Priority: Blocker (was: Critical) > UtilsTest.testYarnFlinkResourceManagerJobManagerLostLeadership instable on > Travis > - > > Key: FLINK-8460 > URL: https://issues.apache.org/jira/browse/FLINK-8460 > Project: Flink > Issue Type: Bug > Components: Tests, YARN >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0 > > > There is a test instability in > {{UtilsTest.testYarnFlinkResourceManagerJobManagerLostLeadership}} on Travis. > > https://api.travis-ci.org/v3/job/330406729/log.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8402) HadoopS3FileSystemITCase.testDirectoryListing fails on Travis
[ https://issues.apache.org/jira/browse/FLINK-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8402: Priority: Blocker (was: Critical) > HadoopS3FileSystemITCase.testDirectoryListing fails on Travis > - > > Key: FLINK-8402 > URL: https://issues.apache.org/jira/browse/FLINK-8402 > Project: Flink > Issue Type: Bug > Components: FileSystem, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.2 > > > The test {{HadoopS3FileSystemITCase.testDirectoryListing}} fails on Travis. > https://travis-ci.org/tillrohrmann/flink/jobs/327021175 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8408) YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3n unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8408: Priority: Blocker (was: Critical) > YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3n unstable on Travis > -- > > Key: FLINK-8408 > URL: https://issues.apache.org/jira/browse/FLINK-8408 > Project: Flink > Issue Type: Bug > Components: FileSystem, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.2 > > > The {{YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3n}} is unstable > on Travis. > https://travis-ci.org/tillrohrmann/flink/jobs/327216460 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8452) BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter instable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8452: Priority: Blocker (was: Critical) > BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter instable on > Travis > > > Key: FLINK-8452 > URL: https://issues.apache.org/jira/browse/FLINK-8452 > Project: Flink > Issue Type: Bug > Components: DataStream API, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0 > > > The {{BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter}} > seems to be instable on Travis: > > https://travis-ci.org/tillrohrmann/flink/jobs/330261310 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8452) BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter instable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8452: Fix Version/s: 1.5.0 > BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter instable on > Travis > > > Key: FLINK-8452 > URL: https://issues.apache.org/jira/browse/FLINK-8452 > Project: Flink > Issue Type: Bug > Components: DataStream API, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0 > > > The {{BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter}} > seems to be instable on Travis: > > https://travis-ci.org/tillrohrmann/flink/jobs/330261310 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8418) Kafka08ITCase.testStartFromLatestOffsets() times out on Travis
[ https://issues.apache.org/jira/browse/FLINK-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8418: Priority: Blocker (was: Critical) > Kafka08ITCase.testStartFromLatestOffsets() times out on Travis > -- > > Key: FLINK-8418 > URL: https://issues.apache.org/jira/browse/FLINK-8418 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Tests >Affects Versions: 1.4.0, 1.5.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.2 > > > Instance: https://travis-ci.org/kl0u/flink/builds/327733085 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8521) FlinkKafkaProducer011ITCase.testRunOutOfProducersInThePool timed out on Travis
[ https://issues.apache.org/jira/browse/FLINK-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8521: Priority: Blocker (was: Critical) > FlinkKafkaProducer011ITCase.testRunOutOfProducersInThePool timed out on Travis > -- > > Key: FLINK-8521 > URL: https://issues.apache.org/jira/browse/FLINK-8521 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.2 > > > The {{FlinkKafkaProducer011ITCase.testRunOutOfProducersInThePool}} timed out > on Travis with producing no output for longer than 300s. > > https://travis-ci.org/tillrohrmann/flink/jobs/334642014 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8512) HAQueryableStateFsBackendITCase unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8512: Priority: Blocker (was: Critical) > HAQueryableStateFsBackendITCase unstable on Travis > -- > > Key: FLINK-8512 > URL: https://issues.apache.org/jira/browse/FLINK-8512 > Project: Flink > Issue Type: Bug > Components: Queryable State, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.2 > > > {{HAQueryableStateFsBackendITCase}} is unstable on Travis. > In the logs one can see that there is an {{AssertionError}} in the > {{globalEventExecutor-1-1}} {{Thread}}. This indicates that assertions are > not properly propagated and simply swallowed. > > https://travis-ci.org/apache/flink/jobs/333250401 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8517) StaticlyNestedIterationsITCase.testJobWithoutObjectReuse unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8517: Priority: Blocker (was: Critical) > StaticlyNestedIterationsITCase.testJobWithoutObjectReuse unstable on Travis > --- > > Key: FLINK-8517 > URL: https://issues.apache.org/jira/browse/FLINK-8517 > Project: Flink > Issue Type: Bug > Components: DataSet API, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0, 1.4.2 > > > The {{StaticlyNestedIterationsITCase.testJobWithoutObjectReuse}} test case > fails on Travis. This exception might be relevant: > {code:java} > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:891) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.lang.IllegalStateException: Partition > 557b069f2b89f8ba599e6ab0956a3f5a@58f1a6b7d8ae10b9141f17c08d06cecb not > registered at task event 
dispatcher. > at > org.apache.flink.runtime.io.network.TaskEventDispatcher.subscribeToEvent(TaskEventDispatcher.java:107) > at > org.apache.flink.runtime.iterative.task.IterationHeadTask.initSuperstepBarrier(IterationHeadTask.java:242) > at > org.apache.flink.runtime.iterative.task.IterationHeadTask.run(IterationHeadTask.java:266) > at > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) > at java.lang.Thread.run(Thread.java:748){code} > > https://api.travis-ci.org/v3/job/60156/log.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8731) TwoInputStreamTaskTest flaky on travis
[ https://issues.apache.org/jira/browse/FLINK-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8731: Issue Type: Bug (was: Improvement) > TwoInputStreamTaskTest flaky on travis > -- > > Key: FLINK-8731 > URL: https://issues.apache.org/jira/browse/FLINK-8731 > Project: Flink > Issue Type: Bug > Components: Streaming, Tests >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0 > > > https://travis-ci.org/zentol/flink/builds/344307861 > {code} > Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.479 sec <<< > FAILURE! - in org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest > testOpenCloseAndTimestamps(org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest) > Time elapsed: 0.05 sec <<< ERROR! > java.lang.Exception: error in task > at > org.apache.flink.streaming.runtime.tasks.StreamTaskTestHarness.waitForTaskCompletion(StreamTaskTestHarness.java:250) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskTestHarness.waitForTaskCompletion(StreamTaskTestHarness.java:233) > at > org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testOpenCloseAndTimestamps(TwoInputStreamTaskTest.java:99) > Caused by: org.mockito.exceptions.misusing.WrongTypeOfReturnValue: > Boolean cannot be returned by getChannelIndex() > getChannelIndex() should return int > *** > If you're unsure why you're getting above error read on. > Due to the nature of the syntax above problem might occur because: > 1. This exception *might* occur in wrongly written multi-threaded tests. >Please refer to Mockito FAQ on limitations of concurrency testing. > 2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub > spies - >- with doReturn|Throw() family of methods. More in javadocs for > Mockito.spy() method. 
> at > org.apache.flink.runtime.io.network.partition.consumer.UnionInputGate.waitAndGetNextInputGate(UnionInputGate.java:212) > at > org.apache.flink.runtime.io.network.partition.consumer.UnionInputGate.getNextBufferOrEvent(UnionInputGate.java:158) > at > org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:164) > at > org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:292) > at > org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run(TwoInputStreamTask.java:115) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:308) > at > org.apache.flink.streaming.runtime.tasks.StreamTaskTestHarness$TaskThread.run(StreamTaskTestHarness.java:437) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8792) bad semantic of method name MessageQueryParameter.convertStringToValue
[ https://issues.apache.org/jira/browse/FLINK-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378136#comment-16378136 ] ASF GitHub Bot commented on FLINK-8792: --- GitHub user yanghua opened a pull request: https://github.com/apache/flink/pull/5587 [FLINK-8792][REST] bad semantic of method name MessageQueryParameter.convertStringToValue ## What is the purpose of the change *This pull request renames a method so that it has a clearer semantic* ## Brief change log - *changed method name from `convertStringToValue` to `convertValueToString`* ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented?
(not documented) You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanghua/flink FLINK-8792 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5587.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5587 commit b9907188e66785cc86d6e3b7c0a825a52b86eb28 Author: vinoyang Date: 2018-02-27T06:43:52Z [FLINK-8792][REST] bad semantic of method name MessageQueryParameter.convertStringToValue > bad semantic of method name MessageQueryParameter.convertStringToValue > > > Key: FLINK-8792 > URL: https://issues.apache.org/jira/browse/FLINK-8792 > Project: Flink > Issue Type: Improvement > Components: REST >Reporter: vinoyang >Assignee: vinoyang >Priority: Minor > > the method name > {code:java} > MessageQueryParameter.convertStringToValue > {code} > should be > {code:java} > convertValueToString > {code} > or > {code:java} > convertStringFromValue{code} > I think > {code:java} > convertValueToString > {code} > would be better. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
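The rename is purely about the direction implied by the method name: a method that serializes a typed value into its query-string form should not be called `convertStringToValue`. A minimal Java sketch of the convention (the `CountQueryParameter` class and its field are illustrative assumptions, not Flink's actual REST API):

```java
// Hypothetical sketch of the naming convention discussed in FLINK-8792.
// "StringToValue" should parse, "ValueToString" should serialize; the
// complaint is that the old name used the parse direction for the
// serialize method.
abstract class MessageQueryParameter<X> {
    // parse direction: query-string token -> typed value
    public abstract X convertStringToValue(String value);

    // serialize direction: typed value -> query-string token
    public abstract String convertValueToString(X value);
}

// Illustrative concrete parameter for an integer-valued query field.
class CountQueryParameter extends MessageQueryParameter<Integer> {
    @Override
    public Integer convertStringToValue(String value) {
        return Integer.valueOf(value);
    }

    @Override
    public String convertValueToString(Integer value) {
        return value.toString();
    }
}
```

With this convention each name states its own input and output, so a reader never has to open the implementation to know which way the conversion runs.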
[jira] [Commented] (FLINK-8689) Add runtime support of distinct filter using MapView
[ https://issues.apache.org/jira/browse/FLINK-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378122#comment-16378122 ] ASF GitHub Bot commented on FLINK-8689: --- Github user hequn8128 commented on a diff in the pull request: https://github.com/apache/flink/pull/#discussion_r170828036 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/AggregateUtil.scala --- @@ -1393,6 +1393,21 @@ object AggregateUtil { throw new TableException(s"unsupported Function: '${unSupported.getName}'") } } + + // create distinct accumulator delegate + if (aggregateCall.isDistinct) { --- End diff -- Sql will be verified by calcite to exclude single DISTINCT during sql parse phase, so maybe we don't have to consider single distinct here. > Add runtime support of distinct filter using MapView > - > > Key: FLINK-8689 > URL: https://issues.apache.org/jira/browse/FLINK-8689 > Project: Flink > Issue Type: Sub-task >Reporter: Rong Rong >Assignee: Rong Rong >Priority: Major > > This ticket should cover distinct aggregate function support to codegen for > *AggregateCall*, where *isDistinct* fields is set to true. > This can be verified using the following SQL, which is not currently > producing correct results. > {code:java} > SELECT > a, > SUM(b) OVER (PARTITION BY a ORDER BY proctime ROWS BETWEEN 5 PRECEDING AND > CURRENT ROW) > FROM > MyTable{code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8689) Add runtime support of distinct filter using MapView
[ https://issues.apache.org/jira/browse/FLINK-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378121#comment-16378121 ] ASF GitHub Bot commented on FLINK-8689: --- Github user hequn8128 commented on a diff in the pull request: https://github.com/apache/flink/pull/#discussion_r170828027 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/functions/aggfunctions/DistinctAggDelegateFunction.scala --- @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.table.functions.aggfunctions + +import java.util + +import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation} +import org.apache.flink.api.java.typeutils.{PojoField, PojoTypeInfo} +import org.apache.flink.table.api.dataview.MapView +import org.apache.flink.table.dataview.MapViewTypeInfo +import org.apache.flink.table.functions.AggregateFunction + +class DistinctAccumulator[E, ACC] (var mapView: MapView[E, Integer], var realAcc: ACC) { + def this() { +this(null, null.asInstanceOf[ACC]) + } + + def getRealAcc: ACC = realAcc + + def canEqual(a: Any): Boolean = a.isInstanceOf[DistinctAccumulator[E, ACC]] + + override def equals(that: Any): Boolean = +that match { + case that: DistinctAccumulator[E, ACC] => that.canEqual(this) && +this.mapView == that.mapView + case _ => false +} + +} + +class DistinctAggDelegateFunction[E, ACC](elementTypeInfo: TypeInformation[_], + var realAgg: AggregateFunction[_, ACC]) + extends AggregateFunction[util.Map[E, Integer], DistinctAccumulator[E, ACC]] { + + def getRealAgg: AggregateFunction[_, ACC] = realAgg + + override def createAccumulator(): DistinctAccumulator[E, ACC] = { +new DistinctAccumulator[E, ACC]( + new MapView[E, Integer]( +elementTypeInfo.asInstanceOf[TypeInformation[E]], +BasicTypeInfo.INT_TYPE_INFO), + realAgg.createAccumulator()) + } + + def accumulate(acc: DistinctAccumulator[E, ACC], element: E): Boolean = { +if (element != null) { + if (acc.mapView.contains(element)) { +acc.mapView.put(element, acc.mapView.get(element) + 1) +false + } else { +acc.mapView.put(element, 1) +true + } +} else { + false +} + } + + def accumulate(acc: DistinctAccumulator[E, ACC], element: E, count: Int): Boolean = { +if (element != null) { + if (acc.mapView.contains(element)) { +acc.mapView.put(element, acc.mapView.get(element) + count) +false + } else { +acc.mapView.put(element, count) +true + } +} else { + false +} + } + + def retract(acc: DistinctAccumulator[E, ACC], element: E): 
Boolean = { +if (element != null) { + val count = acc.mapView.get(element) + if (count == 1) { +acc.mapView.remove(element) +true + } else { +acc.mapView.put(element, count - 1) +false + } +} else { + false +} + } + + def resetAccumulator(acc: DistinctAccumulator[E, ACC]): Unit = { +acc.mapView.clear() + } + + override def getValue(acc: DistinctAccumulator[E, ACC]): util.Map[E, Integer] = { +acc.mapView.map + } + + override def getAccumulatorType: TypeInformation[DistinctAccumulator[E, ACC]] = { +val clazz = classOf[DistinctAccumulator[E, ACC]] +val pojoFields = new util.ArrayList[PojoField] +pojoFields.add(new PojoField(clazz.getDeclaredField("mapView"), + new MapViewTypeInfo[E, Integer]( +elementTypeInfo.asInstanceOf[TypeInformation[E]], BasicTypeInfo.INT_TYPE_INFO))) +pojoFields.add(new
[GitHub] flink pull request #5555: [FLINK-8689][table]Add runtime support of distinct...
Github user hequn8128 commented on a diff in the pull request: https://github.com/apache/flink/pull/#discussion_r170828027 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/functions/aggfunctions/DistinctAggDelegateFunction.scala --- @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.table.functions.aggfunctions + +import java.util + +import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation} +import org.apache.flink.api.java.typeutils.{PojoField, PojoTypeInfo} +import org.apache.flink.table.api.dataview.MapView +import org.apache.flink.table.dataview.MapViewTypeInfo +import org.apache.flink.table.functions.AggregateFunction + +class DistinctAccumulator[E, ACC] (var mapView: MapView[E, Integer], var realAcc: ACC) { + def this() { +this(null, null.asInstanceOf[ACC]) + } + + def getRealAcc: ACC = realAcc + + def canEqual(a: Any): Boolean = a.isInstanceOf[DistinctAccumulator[E, ACC]] + + override def equals(that: Any): Boolean = +that match { + case that: DistinctAccumulator[E, ACC] => that.canEqual(this) && +this.mapView == that.mapView + case _ => false +} + +} + +class DistinctAggDelegateFunction[E, ACC](elementTypeInfo: TypeInformation[_], + var realAgg: AggregateFunction[_, ACC]) + extends AggregateFunction[util.Map[E, Integer], DistinctAccumulator[E, ACC]] { + + def getRealAgg: AggregateFunction[_, ACC] = realAgg + + override def createAccumulator(): DistinctAccumulator[E, ACC] = { +new DistinctAccumulator[E, ACC]( + new MapView[E, Integer]( +elementTypeInfo.asInstanceOf[TypeInformation[E]], +BasicTypeInfo.INT_TYPE_INFO), + realAgg.createAccumulator()) + } + + def accumulate(acc: DistinctAccumulator[E, ACC], element: E): Boolean = { +if (element != null) { + if (acc.mapView.contains(element)) { +acc.mapView.put(element, acc.mapView.get(element) + 1) +false + } else { +acc.mapView.put(element, 1) +true + } +} else { + false +} + } + + def accumulate(acc: DistinctAccumulator[E, ACC], element: E, count: Int): Boolean = { +if (element != null) { + if (acc.mapView.contains(element)) { +acc.mapView.put(element, acc.mapView.get(element) + count) +false + } else { +acc.mapView.put(element, count) +true + } +} else { + false +} + } + + def retract(acc: DistinctAccumulator[E, ACC], element: E): 
Boolean = { +if (element != null) { + val count = acc.mapView.get(element) + if (count == 1) { +acc.mapView.remove(element) +true + } else { +acc.mapView.put(element, count - 1) +false + } +} else { + false +} + } + + def resetAccumulator(acc: DistinctAccumulator[E, ACC]): Unit = { +acc.mapView.clear() + } + + override def getValue(acc: DistinctAccumulator[E, ACC]): util.Map[E, Integer] = { +acc.mapView.map + } + + override def getAccumulatorType: TypeInformation[DistinctAccumulator[E, ACC]] = { +val clazz = classOf[DistinctAccumulator[E, ACC]] +val pojoFields = new util.ArrayList[PojoField] +pojoFields.add(new PojoField(clazz.getDeclaredField("mapView"), + new MapViewTypeInfo[E, Integer]( +elementTypeInfo.asInstanceOf[TypeInformation[E]], BasicTypeInfo.INT_TYPE_INFO))) +pojoFields.add(new PojoField(clazz.getDeclaredField("realAcc"), + realAgg.getAccumulatorType)) +new PojoTypeInfo[DistinctAccumulator[E, ACC]](clazz, pojoFields) --- End diff -- Make sense. ---
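The accumulate/retract logic in the `DistinctAccumulator` diff above is a reference-counting map: `accumulate` reports `true` only for the first occurrence of an element, and `retract` reports `true` only when the last occurrence is removed, so the wrapped aggregate sees each distinct value exactly once. A standalone Java sketch of that counting scheme (Flink's `MapView` and type-info machinery replaced by a plain `HashMap` for illustration; the null-count guard in `retract` is a defensive addition not present in the Scala diff):

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the reference-counting scheme used by the
// DistinctAccumulator in the diff above.
class DistinctCounter<E> {
    private final Map<E, Integer> counts = new HashMap<>();

    // Returns true iff this is the first occurrence of the element,
    // i.e. the wrapped aggregate should accumulate it.
    boolean accumulate(E element) {
        if (element == null) {
            return false; // nulls are ignored, as in the Scala diff
        }
        Integer prev = counts.get(element);
        if (prev == null) {
            counts.put(element, 1);
            return true;
        }
        counts.put(element, prev + 1);
        return false;
    }

    // Returns true iff the last occurrence was removed,
    // i.e. the wrapped aggregate should retract it.
    boolean retract(E element) {
        if (element == null) {
            return false;
        }
        Integer prev = counts.get(element);
        if (prev == null) {
            return false; // defensive: the Scala diff assumes presence
        }
        if (prev == 1) {
            counts.remove(element);
            return true;
        }
        counts.put(element, prev - 1);
        return false;
    }
}
```

Because the boolean result gates the call to the wrapped aggregate, duplicates within the window neither double-count on accumulate nor prematurely retract while other copies remain.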
[jira] [Updated] (FLINK-8792) bad semantic of method name MessageQueryParameter.convertStringToValue
[ https://issues.apache.org/jira/browse/FLINK-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated FLINK-8792: Description: the method name {code:java} MessageQueryParameter.convertStringToValue {code} should be {code:java} convertValueToString {code} or {code:java} convertStringFromValue{code} I think {code:java} convertValueToString {code} would be better. was:the method name > bad semantic of method name MessageQueryParameter.convertStringToValue > > > Key: FLINK-8792 > URL: https://issues.apache.org/jira/browse/FLINK-8792 > Project: Flink > Issue Type: Improvement > Components: REST >Reporter: vinoyang >Assignee: vinoyang >Priority: Minor > > the method name > {code:java} > MessageQueryParameter.convertStringToValue > {code} > should be > {code:java} > convertValueToString > {code} > or > {code:java} > convertStringFromValue{code} > I think > {code:java} > convertValueToString > {code} > would be better. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8792) bad semantic of method name MessageQueryParameter.convertStringToValue
[ https://issues.apache.org/jira/browse/FLINK-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated FLINK-8792: Description: the method name (was: the ) > bad semantic of method name MessageQueryParameter.convertStringToValue > > > Key: FLINK-8792 > URL: https://issues.apache.org/jira/browse/FLINK-8792 > Project: Flink > Issue Type: Improvement > Components: REST >Reporter: vinoyang >Assignee: vinoyang >Priority: Minor > > the method name -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8792) bad semantic of method MessageQueryParameter.convertStringToValue
[ https://issues.apache.org/jira/browse/FLINK-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated FLINK-8792: Description: the > bad semantic of method MessageQueryParameter.convertStringToValue > --- > > Key: FLINK-8792 > URL: https://issues.apache.org/jira/browse/FLINK-8792 > Project: Flink > Issue Type: Improvement > Components: REST >Reporter: vinoyang >Assignee: vinoyang >Priority: Minor > > the -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8792) bad semantic of method name MessageQueryParameter.convertStringToValue
[ https://issues.apache.org/jira/browse/FLINK-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated FLINK-8792: Summary: bad semantic of method name MessageQueryParameter.convertStringToValue (was: bad semantic of method MessageQueryParameter.convertStringToValue ) > bad semantic of method name MessageQueryParameter.convertStringToValue > > > Key: FLINK-8792 > URL: https://issues.apache.org/jira/browse/FLINK-8792 > Project: Flink > Issue Type: Improvement > Components: REST >Reporter: vinoyang >Assignee: vinoyang >Priority: Minor > > the -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-8543) Output Stream closed at org.apache.hadoop.fs.s3a.S3AOutputStream.checkOpen
[ https://issues.apache.org/jira/browse/FLINK-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek closed FLINK-8543. --- Resolution: Fixed Fix Version/s: 1.4.3 Fixed on release-1.4 in b74c705157fef3e0d305fdd5bf1a006ae0a98666 Fixed on master in 915213c7afaf3f9d04c240f43d88710280d844e3 > Output Stream closed at org.apache.hadoop.fs.s3a.S3AOutputStream.checkOpen > -- > > Key: FLINK-8543 > URL: https://issues.apache.org/jira/browse/FLINK-8543 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Affects Versions: 1.4.0 > Environment: IBM Analytics Engine - > [https://console.bluemix.net/docs/services/AnalyticsEngine/index.html#introduction] > The cluster is based on Hortonworks Data Platform 2.6.2. The following > components are made available. > Apache Spark 2.1.1 Hadoop 2.7.3 > Apache Livy 0.3.0 > Knox 0.12.0 > Ambari 2.5.2 > Anaconda with Python 2.7.13 and 3.5.2 > Jupyter Enterprise Gateway 0.5.0 > HBase 1.1.2 * > Hive 1.2.1 * > Oozie 4.2.0 * > Flume 1.5.2 * > Tez 0.7.0 * > Pig 0.16.0 * > Sqoop 1.4.6 * > Slider 0.92.0 * >Reporter: chris snow >Priority: Blocker > Fix For: 1.4.3, 1.5.0 > > Attachments: Screen Shot 2018-01-30 at 18.34.51.png > > > I'm hitting an issue with my BucketingSink from a streaming job. > > {code:java} > return new BucketingSink>(path) > .setWriter(writer) > .setBucketer(new DateTimeBucketer Object>>(formatString)); > {code} > > I can see that a few files have run into issues with uploading to S3: > !Screen Shot 2018-01-30 at 18.34.51.png! > The Flink console output is showing an exception being thrown by > S3AOutputStream, so I've grabbed the S3AOutputStream class from my cluster > and added some additional logging to the checkOpen() method to log the 'key' > just before the exception is thrown: > > {code:java} > /* > * Decompiled with CFR. 
> */
> package org.apache.hadoop.fs.s3a;
> import com.amazonaws.AmazonClientException;
> import com.amazonaws.event.ProgressListener;
> import com.amazonaws.services.s3.model.ObjectMetadata;
> import com.amazonaws.services.s3.model.PutObjectRequest;
> import com.amazonaws.services.s3.transfer.Upload;
> import com.amazonaws.services.s3.transfer.model.UploadResult;
> import java.io.BufferedOutputStream;
> import java.io.File;
> import java.io.FileOutputStream;
> import java.io.IOException;
> import java.io.InterruptedIOException;
> import java.io.OutputStream;
> import java.util.concurrent.atomic.AtomicBoolean;
> import org.apache.hadoop.classification.InterfaceAudience;
> import org.apache.hadoop.classification.InterfaceStability;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.s3a.ProgressableProgressListener;
> import org.apache.hadoop.fs.s3a.S3AFileSystem;
> import org.apache.hadoop.fs.s3a.S3AUtils;
> import org.apache.hadoop.util.Progressable;
> import org.slf4j.Logger;
> @InterfaceAudience.Private
> @InterfaceStability.Evolving
> public class S3AOutputStream extends OutputStream {
>     private final OutputStream backupStream;
>     private final File backupFile;
>     private final AtomicBoolean closed = new AtomicBoolean(false);
>     private final String key;
>     private final Progressable progress;
>     private final S3AFileSystem fs;
>     public static final Logger LOG = S3AFileSystem.LOG;
>     public S3AOutputStream(Configuration conf, S3AFileSystem fs, String key, Progressable progress) throws IOException {
>         this.key = key;
>         this.progress = progress;
>         this.fs = fs;
>         this.backupFile = fs.createTmpFileForWrite("output-", -1, conf);
>         LOG.debug("OutputStream for key '{}' writing to tempfile: {}", (Object)key, (Object)this.backupFile);
>         this.backupStream = new BufferedOutputStream(new FileOutputStream(this.backupFile));
>     }
>     void checkOpen() throws IOException {
>         if (!this.closed.get()) return;
>         // vv-- Additional logging --vvv
>         LOG.error("OutputStream for key '{}' closed.", (Object)this.key);
>         throw new IOException("Output Stream closed");
>     }
>     @Override
>     public void flush() throws IOException {
>         this.checkOpen();
>         this.backupStream.flush();
>     }
>     @Override
>     public void close() throws IOException {
>         if (this.closed.getAndSet(true)) {
>             return;
>         }
>         this.backupStream.close();
>         LOG.debug("OutputStream for key '{}' closed. Now beginning upload", (Object)this.key);
>         try {
>             ObjectMetadata om = this.fs.newObjectMetadata(this.backupFile.length());
>             Upload upload =
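The `checkOpen()`/`close()` pair quoted above is the standard close-once guard: an `AtomicBoolean` is flipped exactly once via `getAndSet(true)`, so the one-time upload in `close()` cannot run twice, and any later write fails fast with the "Output Stream closed" error seen in the report. A stripped-down sketch of the pattern (the class `CloseOnceStream` is hypothetical, not the actual Hadoop class):

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Close-once guard, as used by S3AOutputStream: getAndSet(true) ensures the
// finalization in close() runs exactly once even if close() is called twice,
// and checkOpen() makes any write after close() fail fast with a clear error.
public class CloseOnceStream {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    void checkOpen() throws IOException {
        if (closed.get()) {
            throw new IOException("Output Stream closed");
        }
    }

    public void write(byte[] data) throws IOException {
        checkOpen();
        // ... buffer the data ...
    }

    public void close() {
        if (closed.getAndSet(true)) {
            return; // already closed; skip the one-time finalization
        }
        // ... one-time finalization, e.g. the S3 upload ...
    }

    public static void main(String[] args) {
        CloseOnceStream s = new CloseOnceStream();
        s.close();
        s.close(); // idempotent: the second close is a no-op
        try {
            s.write(new byte[]{1});
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Seeing "Output Stream closed" therefore means something wrote to (or flushed) the stream after `close()` had already run, which is why logging the key in `checkOpen()` helps pinpoint the offending file.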
[jira] [Created] (FLINK-8792) bad semantic of method MessageQueryParameter.convertStringToValue
vinoyang created FLINK-8792: --- Summary: bad semantic of method MessageQueryParameter.convertStringToValue Key: FLINK-8792 URL: https://issues.apache.org/jira/browse/FLINK-8792 Project: Flink Issue Type: Improvement Components: REST Reporter: vinoyang Assignee: vinoyang -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8756) Support ClusterClient.getAccumulators() in RestClusterClient
[ https://issues.apache.org/jira/browse/FLINK-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378093#comment-16378093 ] ASF GitHub Bot commented on FLINK-8756: --- Github user yanghua commented on the issue: https://github.com/apache/flink/pull/5573 @GJL I have refactored the code and test case, could you please review the new commit? Thanks. > Support ClusterClient.getAccumulators() in RestClusterClient > > > Key: FLINK-8756 > URL: https://issues.apache.org/jira/browse/FLINK-8756 > Project: Flink > Issue Type: Improvement > Components: Client >Affects Versions: 1.5.0 >Reporter: Aljoscha Krettek >Assignee: vinoyang >Priority: Blocker > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #5573: [FLINK-8756][Client] Support ClusterClient.getAccumulator...
Github user yanghua commented on the issue: https://github.com/apache/flink/pull/5573 @GJL I have refactored the code and test case, could you please review the new commit? Thanks. ---
[jira] [Closed] (FLINK-8044) Introduce scheduling mechanism to satisfy both state locality and input
[ https://issues.apache.org/jira/browse/FLINK-8044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sihua Zhou closed FLINK-8044. - Resolution: Duplicate Release Note: Already fixed by Stefan Richter. > Introduce scheduling mechanism to satisfy both state locality and input > --- > > Key: FLINK-8044 > URL: https://issues.apache.org/jira/browse/FLINK-8044 > Project: Flink > Issue Type: New Feature > Components: Scheduler >Affects Versions: 1.4.0 >Reporter: Sihua Zhou >Priority: Major > > For local recovery we need a scheduler that satisfies both state locality and > input, but the current scheduler allocates resources in topological order, which can > produce bad results when allocating based on both state and input, so some revision > of the scheduler is necessary now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8620) Enable shipping custom artifacts to BlobStore and accessing them through DistributedCache
[ https://issues.apache.org/jira/browse/FLINK-8620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377993#comment-16377993 ] ASF GitHub Bot commented on FLINK-8620: --- Github user ifndef-SleePy commented on a diff in the pull request: https://github.com/apache/flink/pull/5580#discussion_r170804963 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java --- @@ -314,6 +315,9 @@ public static TaskExecutor startTaskManager( TaskManagerConfiguration taskManagerConfiguration = TaskManagerConfiguration.fromConfiguration(configuration); + final FileCache fileCache = new FileCache(taskManagerServicesConfiguration.getTmpDirPaths(), --- End diff -- It seems that the `fileCache` here is not used at all. > Enable shipping custom artifacts to BlobStore and accessing them through > DistributedCache > - > > Key: FLINK-8620 > URL: https://issues.apache.org/jira/browse/FLINK-8620 > Project: Flink > Issue Type: New Feature >Reporter: Dawid Wysakowicz >Assignee: Dawid Wysakowicz >Priority: Major > > We should be able to distribute custom files to taskmanagers. To do that we > can store those files in BlobStore and later on access them in TaskManagers > through DistributedCache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #5580: [FLINK-8620] Enable shipping custom files to BlobS...
Github user ifndef-SleePy commented on a diff in the pull request: https://github.com/apache/flink/pull/5580#discussion_r170804963 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java --- @@ -314,6 +315,9 @@ public static TaskExecutor startTaskManager( TaskManagerConfiguration taskManagerConfiguration = TaskManagerConfiguration.fromConfiguration(configuration); + final FileCache fileCache = new FileCache(taskManagerServicesConfiguration.getTmpDirPaths(), --- End diff -- It seems that the `fileCache` here is not used at all. ---
[jira] [Commented] (FLINK-8245) Add some interval between logs when requested resources are not available
[ https://issues.apache.org/jira/browse/FLINK-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377964#comment-16377964 ] Wind commented on FLINK-8245: - Could anyone assign this to me, if possible? > Add some interval between logs when requested resources are not available > - > > Key: FLINK-8245 > URL: https://issues.apache.org/jira/browse/FLINK-8245 > Project: Flink > Issue Type: Improvement >Reporter: Wind >Priority: Minor > Labels: starter > > Maybe we can add a Thread.sleep(1000) after this log? > We get plenty of repeated logs, and the disk will soon run out of space when the > YARN cluster is short of resources. Is that acceptable? > {code:java} > LOG.info("Deployment took more than 60 seconds. Please check if the requested > resources are available in the YARN cluster"); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8245) Add some interval between logs when requested resources are not available
[ https://issues.apache.org/jira/browse/FLINK-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wind updated FLINK-8245: Labels: starter (was: ) > Add some interval between logs when requested resources are not available > - > > Key: FLINK-8245 > URL: https://issues.apache.org/jira/browse/FLINK-8245 > Project: Flink > Issue Type: Improvement >Reporter: Wind >Priority: Minor > Labels: starter > > Maybe we can add a Thread.sleep(1000) after this log? > We get plenty of repeated logs, and the disk will soon run out of space when the > YARN cluster is short of resources. Is that acceptable? > {code:java} > LOG.info("Deployment took more than 60 seconds. Please check if the requested > resources are available in the YARN cluster"); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
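The suggestion above is about not flooding the log while waiting for resources. Instead of an unconditional `Thread.sleep(1000)`, the log call itself can be rate-limited so it fires at most once per interval without blocking the caller. A hedged sketch in plain Java (the class `ThrottledLogger` is hypothetical, not Flink code):

```java
// Decides whether a repeated message should be emitted now, allowing at most
// one emission per interval. This avoids both flooding the log and blocking
// the calling thread with Thread.sleep().
public class ThrottledLogger {
    private final long intervalMillis;
    private boolean everLogged = false;
    private long lastEmitMillis;

    public ThrottledLogger(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Returns true if the message should be logged now; drops it otherwise. */
    public synchronized boolean shouldLog(long nowMillis) {
        if (!everLogged || nowMillis - lastEmitMillis >= intervalMillis) {
            everLogged = true;
            lastEmitMillis = nowMillis;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ThrottledLogger t = new ThrottledLogger(60_000);
        System.out.println(t.shouldLog(0));      // true: first message
        System.out.println(t.shouldLog(30_000)); // false: within the interval
        System.out.println(t.shouldLog(60_000)); // true: interval elapsed
    }
}
```

In real use, `nowMillis` would be `System.currentTimeMillis()`; it is passed in here so the behavior is easy to test deterministically.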
[jira] [Updated] (FLINK-8753) Introduce savepoint that goes through the incremental checkpoint path
[ https://issues.apache.org/jira/browse/FLINK-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sihua Zhou updated FLINK-8753: -- Summary: Introduce savepoint that goes through the incremental checkpoint path (was: Introduce Incremental savepoint) > Introduce savepoint that goes through the incremental checkpoint path > -- > > Key: FLINK-8753 > URL: https://issues.apache.org/jira/browse/FLINK-8753 > Project: Flink > Issue Type: New Feature > Components: State Backends, Checkpointing >Affects Versions: 1.5.0 >Reporter: Sihua Zhou >Assignee: Sihua Zhou >Priority: Major > > Right now, a savepoint goes through the full checkpoint path, and taking a savepoint > can be slow. In our production, for some long-running jobs it often takes > more than 10 minutes to complete a savepoint, which is unacceptable for a real-time > job, so currently we have to fall back to externalized checkpoints instead. But an externalized > checkpoint can lag behind the latest state by up to the checkpoint interval. So I propose to introduce an incremental > savepoint which goes through the incremental checkpoint path. > Any advice would be appreciated! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (FLINK-8753) Introduce Incremental savepoint
[ https://issues.apache.org/jira/browse/FLINK-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377908#comment-16377908 ] Sihua Zhou edited comment on FLINK-8753 at 2/27/18 2:26 AM: [~StephanEwen] Thanks for your reply. Indeed, what I am trying to achieve is just a faster savepoint that does not need to iterate over all records one by one (along with some condition checks that make it slow for huge data). And yes, what you described is very close to what I wanted, but the reason I didn't use the word `checkpoint` is that a checkpoint doesn't guarantee support for rescaling (this can be found in the [flink-doc|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#difference-to-savepoints] and the comment in this PR [5490|https://github.com/apache/flink/pull/5490]), which is usually the reason we trigger a savepoint. An interesting thing I found is that, in the implementation, checkpoints also support rescaling; I checked that both in code and in practice ... I wonder whether the "archive checkpoint" that you mentioned guarantees support for rescaling? About the implementation, I think maybe this issue's title is incorrect ... I just want to implement a savepoint that goes through the incremental checkpoint path but treats the `baseSstFile` as empty (which looks like just uploading the local RocksDB snapshot to DFS)... was (Author: sihuazhou): [~StephanEwen] Thanks for your reply. Indeed, what I am trying to achieve is just a faster savepoint that does not to iterate all records one by one (along with some condition check that make it slow for huge data). 
And yes what you are described is very close to what I wanted but I didn't use the word `checkpoint` is that: checkpoint doesn't guarantee to support rescaling (this can be found on [flink-doc|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#difference-to-savepoints] and the comment in this PR [5490|https://github.com/apache/flink/pull/5490]), which is always the purpose that we trigger a savepoint. An interesting thing I found is that, in the implementation checkpoint also support rescaling, I checked that both in code and in practice ... I wonder whether the "archive checkpoint" guarantee to support rescaling? At bout the implementation, I think maybe this issue's title incorrect ... I just want to implement the save point which go though the incremental checkpoint path but treat the `baseSstFile` as empty ( which is look like just submit the local RocksDB snapshot on to DFS). > Introduce Incremental savepoint > --- > > Key: FLINK-8753 > URL: https://issues.apache.org/jira/browse/FLINK-8753 > Project: Flink > Issue Type: New Feature > Components: State Backends, Checkpointing >Affects Versions: 1.5.0 >Reporter: Sihua Zhou >Assignee: Sihua Zhou >Priority: Major > > Right now, savepoint goes through the full checkpoint path, take a savepoint > could be slowly. In our production, for some long term job it often costs > more than 10min to complete a savepoint which is unacceptable for a real time > job, so we have to turn back to use the externalized checkpoint instead > currently. But the externalized checkpoint has a time interval (checkpoint > interval) between the last time. So I proposal to introduce the increment > savepoint which goes through the increment checkpoint path. > Any advice would be appreciated! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8753) Introduce Incremental savepoint
[ https://issues.apache.org/jira/browse/FLINK-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377908#comment-16377908 ] Sihua Zhou commented on FLINK-8753: --- [~StephanEwen] Thanks for your reply. Indeed, what I am trying to achieve is just a faster savepoint that does not need to iterate over all records one by one (along with some condition checks that make it slow for huge data). And yes, what you described is very close to what I wanted, but the reason I didn't use the word `checkpoint` is that a checkpoint doesn't guarantee support for rescaling (this can be found in the [flink-doc|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#difference-to-savepoints] and the comment in this PR [5490|https://github.com/apache/flink/pull/5490]), which is usually the reason we trigger a savepoint. An interesting thing I found is that, in the implementation, checkpoints also support rescaling; I checked that both in code and in practice ... I wonder whether the "archive checkpoint" guarantees support for rescaling? About the implementation, I think maybe this issue's title is incorrect ... I just want to implement a savepoint that goes through the incremental checkpoint path but treats the `baseSstFile` as empty (which looks like just uploading the local RocksDB snapshot to DFS). > Introduce Incremental savepoint > --- > > Key: FLINK-8753 > URL: https://issues.apache.org/jira/browse/FLINK-8753 > Project: Flink > Issue Type: New Feature > Components: State Backends, Checkpointing >Affects Versions: 1.5.0 >Reporter: Sihua Zhou >Assignee: Sihua Zhou >Priority: Major > > Right now, a savepoint goes through the full checkpoint path, and taking a savepoint > can be slow. In our production, for some long-running jobs it often takes > more than 10 minutes to complete a savepoint, which is unacceptable for a real-time > job, so currently we have to fall back to externalized checkpoints instead. 
But an externalized checkpoint can lag behind the latest state by up to the checkpoint > interval. So I propose to introduce an incremental > savepoint which goes through the incremental checkpoint path. > Any advice would be appreciated! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
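The incremental checkpoint path discussed above uploads only state files that are not already on DFS; with the `baseSstFile` set treated as empty (as proposed), everything is uploaded, but still as a bulk file copy rather than a record-by-record iteration. A toy sketch of that set difference (hypothetical names, not Flink's actual backend code):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class IncrementalUpload {
    // Given the files of the current local snapshot and the files already
    // shipped by a previous checkpoint, only the difference must be uploaded.
    // With an empty base set, everything is uploaded, which is still just a
    // bulk file copy rather than a record-by-record scan of the state.
    static Set<String> filesToUpload(Set<String> localSnapshot, Set<String> alreadyOnDfs) {
        Set<String> diff = new LinkedHashSet<>(localSnapshot);
        diff.removeAll(alreadyOnDfs);
        return diff;
    }

    public static void main(String[] args) {
        Set<String> local = new LinkedHashSet<>(Arrays.asList("001.sst", "002.sst", "003.sst"));
        Set<String> base = new HashSet<>(Arrays.asList("001.sst", "002.sst"));
        System.out.println(filesToUpload(local, base));            // only the new file
        System.out.println(filesToUpload(local, new HashSet<>())); // full upload
    }
}
```

This is why such a savepoint can be much faster than the full path for large state: its cost is proportional to the snapshot's file sizes, not to the number of records.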
[jira] [Comment Edited] (FLINK-7795) Utilize error-prone to discover common coding mistakes
[ https://issues.apache.org/jira/browse/FLINK-7795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345955#comment-16345955 ] Ted Yu edited comment on FLINK-7795 at 2/27/18 2:08 AM: error-prone has JDK 8 dependency. was (Author: yuzhih...@gmail.com): error-prone has JDK 8 dependency . > Utilize error-prone to discover common coding mistakes > -- > > Key: FLINK-7795 > URL: https://issues.apache.org/jira/browse/FLINK-7795 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Ted Yu >Priority: Major > > http://errorprone.info/ is a tool which detects common coding mistakes. > We should incorporate into Flink build process. > Here are the dependencies: > {code} > > com.google.errorprone > error_prone_annotation > ${error-prone.version} > provided > > > > com.google.auto.service > auto-service > 1.0-rc3 > true > > > com.google.errorprone > error_prone_check_api > ${error-prone.version} > provided > > > com.google.code.findbugs > jsr305 > > > > > com.google.errorprone > javac > 9-dev-r4023-3 > provided > > > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7897) Consider using nio.Files for file deletion in TransientBlobCleanupTask
[ https://issues.apache.org/jira/browse/FLINK-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated FLINK-7897: -- Description: nio.Files#delete() provides a better clue as to why the deletion may fail: https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#delete(java.nio.file.Path) Depending on the potential exception (FileNotFound), the call to localFile.exists() may be skipped. was: nio.Files#delete() provides a better clue as to why the deletion may fail: https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#delete(java.nio.file.Path) Depending on the potential exception (FileNotFound), the call to localFile.exists() may be skipped. > Consider using nio.Files for file deletion in TransientBlobCleanupTask > -- > > Key: FLINK-7897 > URL: https://issues.apache.org/jira/browse/FLINK-7897 > Project: Flink > Issue Type: Improvement > Components: Local Runtime >Reporter: Ted Yu >Priority: Minor > > nio.Files#delete() provides a better clue as to why the deletion may fail: > https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#delete(java.nio.file.Path) > Depending on the potential exception (FileNotFound), the call to > localFile.exists() may be skipped. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
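For context on the suggestion above: `java.io.File.delete()` only returns a boolean, while `java.nio.file.Files.delete()` throws a typed exception (`NoSuchFileException`, `DirectoryNotEmptyException`, ...) that explains the failure, and `Files.deleteIfExists()` folds the existence check into a single call. A small illustrative sketch (not the Flink cleanup task itself):

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;

public class NioDelete {
    // Deletes a file and reports *why* deletion failed, unlike File.delete(),
    // which only returns false. deleteIfExists() also removes the need for a
    // separate exists() check (and the race window between the two calls).
    static boolean tryDelete(Path path) {
        try {
            return Files.deleteIfExists(path); // false if the file was already gone
        } catch (DirectoryNotEmptyException e) {
            System.err.println("Not empty: " + e.getFile());
            return false;
        } catch (IOException e) {
            System.err.println("Deletion failed: " + e);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("blob-", ".tmp");
        System.out.println(tryDelete(tmp)); // file existed and was deleted
        System.out.println(tryDelete(tmp)); // already gone: no exception, just false
    }
}
```

With `deleteIfExists()`, a file removed concurrently by another cleanup pass is simply reported as "already gone" rather than surfacing as a spurious error.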
[jira] [Commented] (FLINK-8335) Upgrade hbase connector dependency to 1.4.1
[ https://issues.apache.org/jira/browse/FLINK-8335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377898#comment-16377898 ] ASF GitHub Bot commented on FLINK-8335: --- Github user tedyu commented on the issue: https://github.com/apache/flink/pull/5488 Should be good after 1.4.1 is filled in > Upgrade hbase connector dependency to 1.4.1 > --- > > Key: FLINK-8335 > URL: https://issues.apache.org/jira/browse/FLINK-8335 > Project: Flink > Issue Type: Improvement > Components: Batch Connectors and Input/Output Formats >Reporter: Ted Yu >Priority: Minor > > hbase 1.4.1 has been released. > 1.4.0 shows speed improvement over previous 1.x releases. > http://search-hadoop.com/m/HBase/YGbbBxedD1Mnm8t?subj=Re+VOTE+The+second+HBase+1+4+0+release+candidate+RC1+is+available > This issue is to upgrade the dependency to 1.4.1 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #5488: [FLINK-8335] [hbase] Upgrade hbase connector dependency t...
Github user tedyu commented on the issue: https://github.com/apache/flink/pull/5488 Should be good after 1.4.1 is filled in ---
[jira] [Updated] (FLINK-8554) Upgrade AWS SDK
[ https://issues.apache.org/jira/browse/FLINK-8554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated FLINK-8554: -- Description: AWS SDK 1.11.271 fixes a lot of bugs. One of which would exhibit the following: {code} Caused by: java.lang.NullPointerException at com.amazonaws.metrics.AwsSdkMetrics.getRegion(AwsSdkMetrics.java:729) at com.amazonaws.metrics.MetricAdmin.getRegion(MetricAdmin.java:67) at sun.reflect.GeneratedMethodAccessor132.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) {code} was: AWS SDK 1.11.271 fixes a lot of bugs. One of which would exhibit the following: {code} Caused by: java.lang.NullPointerException at com.amazonaws.metrics.AwsSdkMetrics.getRegion(AwsSdkMetrics.java:729) at com.amazonaws.metrics.MetricAdmin.getRegion(MetricAdmin.java:67) at sun.reflect.GeneratedMethodAccessor132.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) {code} > Upgrade AWS SDK > --- > > Key: FLINK-8554 > URL: https://issues.apache.org/jira/browse/FLINK-8554 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Priority: Minor > > AWS SDK 1.11.271 fixes a lot of bugs. > One of which would exhibit the following: > {code} > Caused by: java.lang.NullPointerException > at com.amazonaws.metrics.AwsSdkMetrics.getRegion(AwsSdkMetrics.java:729) > at com.amazonaws.metrics.MetricAdmin.getRegion(MetricAdmin.java:67) > at sun.reflect.GeneratedMethodAccessor132.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8720) Logging exception with S3 connector and BucketingSink
[ https://issues.apache.org/jira/browse/FLINK-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377835#comment-16377835 ] dejan miljkovic commented on FLINK-8720: Thanks for the suggestions. Removing the Hadoop dependencies from the jar did solve the problem!!! The application is built on Flink 1.4.1. It looks like more work is needed on the class loading logic. I noticed that the application works in IntelliJ with some versions of the jars but does not work when submitted to the cluster. I tried the "parent-first" option, but that required adding more dependencies in pom.xml. I did not proceed with this because it would affect other applications that are deployed. Once more, thanks a lot for the response and solution. > Logging exception with S3 connector and BucketingSink > - > > Key: FLINK-8720 > URL: https://issues.apache.org/jira/browse/FLINK-8720 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Affects Versions: 1.4.1 >Reporter: dejan miljkovic >Priority: Critical > > Trying to stream data to S3. The code works from IntelliJ. When submitting the code > through the UI on my machine (single node cluster started by the start-cluster.sh > script), the stack trace below is produced. > > Below is the link to the simple test app that is streaming data to S3. > [https://github.com/dmiljkovic/test-flink-bucketingsink-s3] > The behavior is a bit different but the same error is produced. The job works only > once. If the job is submitted a second time, the stack trace below is produced. If I > restart the cluster, the job works, but only for the first time. 
> > > org.apache.commons.logging.LogConfigurationException: > java.lang.IllegalAccessError: > org/apache/commons/logging/impl/LogFactoryImpl$3 (Caused by > java.lang.IllegalAccessError: > org/apache/commons/logging/impl/LogFactoryImpl$3) > at > org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:637) > at > org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336) > at > org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310) > at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685) > at > org.apache.http.impl.conn.PoolingClientConnectionManager.(PoolingClientConnectionManager.java:76) > at > org.apache.http.impl.conn.PoolingClientConnectionManager.(PoolingClientConnectionManager.java:102) > at > org.apache.http.impl.conn.PoolingClientConnectionManager.(PoolingClientConnectionManager.java:88) > at > org.apache.http.impl.conn.PoolingClientConnectionManager.(PoolingClientConnectionManager.java:96) > at > com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26) > at > com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96) > at com.amazonaws.http.AmazonHttpClient.(AmazonHttpClient.java:158) > at > com.amazonaws.AmazonWebServiceClient.(AmazonWebServiceClient.java:119) > at > com.amazonaws.services.s3.AmazonS3Client.(AmazonS3Client.java:389) > at > com.amazonaws.services.s3.AmazonS3Client.(AmazonS3Client.java:371) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:235) > at > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1206) > at > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:411) > at > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:355) > at > 
org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178) > at > org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160) > at > org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96) > at > org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:258) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:694) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:682) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.IllegalAccessError: > org/apache/commons/logging/impl/LogFactoryImpl$3 > at > org.apache.commons.logging.impl.LogFactoryImpl.getParentClassLoader(LogFactoryImpl.java:700) > at >
[jira] [Commented] (FLINK-7477) Use "hadoop classpath" to augment classpath when available
[ https://issues.apache.org/jira/browse/FLINK-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377679#comment-16377679 ] Ken Krugler commented on FLINK-7477: The odd stuff (some of which might be bogus)... # I had to explicitly add {{kryo.serializers}} as a dependency. # Ditto for {{org.jdom:jdom}}, which our {{Tika}} dependency should have pulled in transitively, but it was missing. # A bunch of stuff with getting integration tests working (including {{maven-failsafe-plugin}} and {{build-helper-maven-plugin}} among others), but that just happened to be at the same time as the AWS client class issue, so unrelated. Not sure how different our (non-flink) shaded exclusion list wound up being from "regular" Flink, here's what it is now: {code:java} log4j:log4j org.scala-lang:scala-library org.scala-lang:scala-compiler org.scala-lang:scala-reflect com.data-artisans:flakka-actor_* com.data-artisans:flakka-remote_* com.data-artisans:flakka-slf4j_* io.netty:netty-all io.netty:netty commons-fileupload:commons-fileupload org.apache.avro:avro commons-collections:commons-collections org.codehaus.jackson:jackson-core-asl org.codehaus.jackson:jackson-mapper-asl com.thoughtworks.paranamer:paranamer org.xerial.snappy:snappy-java org.apache.commons:commons-compress org.tukaani:xz com.esotericsoftware.kryo:kryo com.esotericsoftware.minlog:minlog org.objenesis:objenesis com.twitter:chill_* com.twitter:chill-java commons-lang:commons-lang junit:junit org.apache.commons:commons-lang3 org.slf4j:slf4j-api org.slf4j:slf4j-log4j12 log4j:log4j org.apache.commons:commons-math org.apache.sling:org.apache.sling.commons.json commons-logging:commons-logging commons-codec:commons-codec stax:stax-api com.typesafe:config org.uncommons.maths:uncommons-maths com.github.scopt:scopt_* commons-io:commons-io commons-cli:commons-cli {code} > Use "hadoop classpath" to augment classpath when available > -- > > Key: FLINK-7477 > URL: 
https://issues.apache.org/jira/browse/FLINK-7477 > Project: Flink > Issue Type: Bug > Components: Startup Shell Scripts >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek >Priority: Major > Fix For: 1.4.0 > > > Currently, some cloud environments don't properly put the Hadoop jars into > {{HADOOP_CLASSPATH}} (or don't set {{HADOOP_CLASSPATH}}) at all. We should > check in {{config.sh}} if the {{hadoop}} binary is on the path and augment > our {{INTERNAL_HADOOP_CLASSPATHS}} with the result of {{hadoop classpath}} in > our scripts. > This will improve the out-of-box experience of users that otherwise have to > manually set {{HADOOP_CLASSPATH}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8791) Fix documentation on how to link dependencies
[ https://issues.apache.org/jira/browse/FLINK-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377662#comment-16377662 ] ASF GitHub Bot commented on FLINK-8791: --- GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/5586 [FLINK-8791] [docs] Fix documentation about configuring dependencies This fixes the incorrect docs currently under [Linking with Flink](https://ci.apache.org/projects/flink/flink-docs-master/dev/linking_with_flink.html) and [Linking with Optional Modules](https://ci.apache.org/projects/flink/flink-docs-master/dev/linking.html). This adds redirects from the two old incorrect pages to the new correct page. /cc @twalthr and @alpinegizmo - happy to hear your thoughts. For any corrections to spelling errors and formatting fixes, could you add a pull request against my pull request branch (https://github.com/StephanEwen/incubator-flink/tree/docs_dependencies)? You can merge this pull request into a Git repository by running: $ git pull https://github.com/StephanEwen/incubator-flink docs_dependencies Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5586.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5586 commit 2bdda86170c25d0a9fadca65d33077b53b0734fc Author: Stephan Ewen Date: 2018-02-26T15:41:24Z [FLINK-8791] [docs] Fix documentation about configuring dependencies > Fix documentation on how to link dependencies > - > > Key: FLINK-8791 > URL: https://issues.apache.org/jira/browse/FLINK-8791 > Project: Flink > Issue Type: Bug > Components: Documentation >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Major > Fix For: 1.5.0 > > > The documentation in "Linking with Flink" and "Linking with Optional > Dependencies" is very outdated and gives wrong advice to users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8791) Fix documentation on how to link dependencies
Stephan Ewen created FLINK-8791: --- Summary: Fix documentation on how to link dependencies Key: FLINK-8791 URL: https://issues.apache.org/jira/browse/FLINK-8791 Project: Flink Issue Type: Bug Components: Documentation Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.5.0 The documentation in "Linking with Flink" and "Linking with Optional Dependencies" is very outdated and gives wrong advice to users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8753) Introduce Incremental savepoint
[ https://issues.apache.org/jira/browse/FLINK-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377654#comment-16377654 ] Stephan Ewen commented on FLINK-8753: - It sounds like what you are trying to achieve is closer to a checkpoint than to a savepoint: - checkpoint is potentially manually triggered - checkpoint is not automatically removed once it is subsumed, but it is retained - could call it "archive checkpoint" or "detached checkpoint". The tricky question to me is: if this thing is incremental, then there needs to be some bookkeeping on how many references are made to the individual shared state chunks. Someone would still need to hold a reference in the shared state registry, or the state chunks will be removed once all other checkpoints stop referencing them. Alternatively, one could mark the chunks as 'detached', meaning they are not reference counted any more, but always kept. Then the question is, how can one determine how to clean the checkpoint up? The only way I can imagine this to work in practice is on file systems that support hard links - in that case, the hard links do the ref counting for you. > Introduce Incremental savepoint > --- > > Key: FLINK-8753 > URL: https://issues.apache.org/jira/browse/FLINK-8753 > Project: Flink > Issue Type: New Feature > Components: State Backends, Checkpointing >Affects Versions: 1.5.0 >Reporter: Sihua Zhou >Assignee: Sihua Zhou >Priority: Major > > Right now, a savepoint goes through the full checkpoint path, so taking a savepoint > can be slow. In our production, for some long-term jobs it often takes > more than 10 min to complete a savepoint, which is unacceptable for a real-time > job, so we currently have to fall back to using externalized checkpoints > instead. But an externalized checkpoint can lag behind the current state by up to > one checkpoint interval. So I propose to introduce an incremental > savepoint which goes through the incremental checkpoint path. 
> Any advice would be appreciated! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
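The reference-counting bookkeeping discussed above can be sketched in a few lines. This is a hypothetical illustration, not Flink's actual SharedStateRegistry API - the class name, chunk IDs, and the "detached" flag are assumptions made for the sake of the example: chunks are deleted only when their reference count drops to zero, unless they have been detached (pinned), in which case they are always kept.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch (NOT Flink's actual SharedStateRegistry) of ref-counted
 * shared state chunks. A "detached" chunk survives even with zero references,
 * which is the behavior discussed for archive/detached checkpoints.
 */
public class ChunkRegistry {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final Map<String, Boolean> detached = new HashMap<>();

    /** A checkpoint or savepoint referencing the chunk increments its count. */
    public void register(String chunkId) {
        refCounts.merge(chunkId, 1, Integer::sum);
        detached.putIfAbsent(chunkId, false);
    }

    /** Pin the chunk so it is kept even when no checkpoint references it. */
    public void detach(String chunkId) {
        detached.put(chunkId, true);
    }

    /** Drop one reference; returns true if the chunk may be physically deleted. */
    public boolean unregister(String chunkId) {
        int remaining = refCounts.merge(chunkId, -1, Integer::sum);
        return remaining <= 0 && !detached.getOrDefault(chunkId, false);
    }
}
```

On a file system with hard links, the `detach` call would roughly correspond to creating an extra hard link: the file system then does this counting for you.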
[jira] [Commented] (FLINK-8770) CompletedCheckPoints stored on ZooKeeper is not up-to-date, when JobManager is restarted it fails to recover the job due to "checkpoint FileNotFound exception"
[ https://issues.apache.org/jira/browse/FLINK-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377623#comment-16377623 ] Stephan Ewen commented on FLINK-8770: - Thank you for reporting this. [~till.rohrmann] - do you have any intuition why this might happen? > CompletedCheckPoints stored on ZooKeeper is not up-to-date, when JobManager > is restarted it fails to recover the job due to "checkpoint FileNotFound > exception" > --- > > Key: FLINK-8770 > URL: https://issues.apache.org/jira/browse/FLINK-8770 > Project: Flink > Issue Type: Bug > Components: Local Runtime >Affects Versions: 1.4.0 >Reporter: Xinyang Gao >Priority: Blocker > Fix For: 1.5.0 > > > Hi, I am running a Flink cluster (1 JobManager + 6 TaskManagers) with HA mode > on OpenShift. I have enabled Chaos Monkey, which kills either the JobManager or > one of the TaskManagers every 5 minutes; the ZooKeeper quorum is stable, with no > Chaos Monkey enabled. Flink reads data from one Kafka topic and writes data > into another Kafka topic. Checkpointing is enabled, with a 1000 ms interval. > state.checkpoints.num-retained is set to 10. I am using a PVC for the state backend > (checkpoint, recovery, etc.), so the checkpoints and states are persistent. > The restart strategy for the Flink jobmanager DeploymentConfig is > recreate, which means it will kill the old container of the jobmanager before it restarts the new one. > I ran the Chaos test for one day at first; however, I have seen the exception: > org.apache.flink.util.FlinkException: Could not retrieve > checkpoint *** from state handle under /***. This indicates that the > retrieved state handle is broken. Try cleaning the state handle store. > The root cause is a checkpoint FileNotFound exception. > The Flink job then keeps restarting for a few > hours, and due to the above error it cannot be restarted successfully. 
> After further investigation, I have found the > following facts in my PVC: > > -rw-r--r--. 1 flink root 11379 Feb 23 02:10 > completedCheckpoint0ee95157de00 > -rw-r--r--. 1 flink root 11379 Feb 23 01:51 completedCheckpoint498d0952cf00 > -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint650fe5b021fe > -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint66634149683e > -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint67f24c3b018e > -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint6f64ebf0ae64 > -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint906ebe1fb337 > -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint98b79ea14b09 > -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpointa0d1070e0b6c > -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpointbd3a9ba50322 > -rw-r--r--. 1 flink root 11355 Feb 22 17:31 completedCheckpointd433b5e108be > -rw-r--r--. 1 flink root 11379 Feb 22 22:56 completedCheckpointdd0183ed092b > -rw-r--r--. 1 flink root 11379 Feb 22 00:00 completedCheckpointe0a5146c3d81 > -rw-r--r--. 1 flink root 11331 Feb 22 17:06 completedCheckpointec82f3ebc2ad > -rw-r--r--. 
1 flink root 11379 Feb 23 02:11 > completedCheckpointf86e460f6720 > > The latest 10 checkpoints were created at about 02:10, if you > ignore the old checkpoints which were not deleted successfully (which I do > not care too much about). > > However, when checking on ZooKeeper, I see the following in > the flink/checkpoints path (I only list one, but the other 9 are similar): > cZxid = 0x160001ff5d > ��sr;org.apache.flink.runtime.state.RetrievableStreamStateHandle�U�+LwrappedStreamStateHandlet2Lorg/apache/flink/runtime/state/StreamStateHandle;xpsr9org.apache.flink.runtime.state.filesystem.FileStateHandle�u�b�▒▒J > > stateSizefilePathtLorg/apache/flink/core/fs/Path;xp,ssrorg.apache.flink.core.fs.PathLuritLjava/net/URI;xpsr > > java.net.URI�x.C�I�LstringtLjava/lang/String;xpt=file:/mnt/flink-test/recovery/completedCheckpointd004a3753870x > [zk: localhost:2181(CONNECTED) 7] ctime = Fri Feb 23 02:08:18 UTC 2018 > mZxid = 0x160001ff5d > mtime = Fri Feb 23 02:08:18 UTC 2018 > pZxid = 0x1d0c6d > cversion = 31 > dataVersion = 0 > aclVersion = 0 > ephemeralOwner = 0x0 > dataLength = 492 > > so the latest completedCheckpoint status stored on ZooKeeper is at about > 02:08, which implies that the completed checkpoints at
[jira] [Commented] (FLINK-8784) Share network buffer pool among different partition operators
[ https://issues.apache.org/jira/browse/FLINK-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377622#comment-16377622 ] Stephan Ewen commented on FLINK-8784: - The choice not to share the network buffer pool, but to partition it, was made to guard against starvation effects. Flink can never know in which order or at what speed any connection will progress (that depends on the behavior of the user code), so reserving enough memory for each connection to be able to do its maximum work makes it safe (against deadlocks and against bad performance). For streaming jobs, I am not sure we should change that. For batch jobs, we can probably improve the behavior through better scheduling rather than through adjusting the network stack memory model. > Share network buffer pool among different partition operators > - > > Key: FLINK-8784 > URL: https://issues.apache.org/jira/browse/FLINK-8784 > Project: Flink > Issue Type: Improvement > Components: TaskManager >Reporter: Garrett Li >Priority: Major > > A logical network connection exists for each point-to-point exchange of data > over the network, which typically happens at repartitioning or broadcasting > steps (shuffle phase). > Currently Flink divides the network buffer pool evenly among all partition > operators. Sometimes this can waste too many network buffers, since we might > have a lot of repartitioning and broadcasting in one Flink job, and NOT > all of the partition operators in the same job have the same traffic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
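The trade-off discussed above can be illustrated with a toy calculation. The class and method names below are hypothetical, not Flink's actual NetworkBufferPool API: under an even split, every local pool gets a fixed, guaranteed share of buffers, which rules out starvation and deadlock but leaves the share of a low-traffic connection idle.

```java
/**
 * Illustration (hypothetical API, not Flink's NetworkBufferPool) of evenly
 * partitioning a fixed number of network buffers among local buffer pools.
 */
public class BufferPartitioning {

    /** Buffers guaranteed to each of the given pools under an even split. */
    public static int evenShare(int totalBuffers, int pools) {
        if (pools <= 0) {
            throw new IllegalArgumentException("need at least one pool");
        }
        // Integer division: any remainder simply stays unassigned.
        return totalBuffers / pools;
    }
}
```

With many partition operators the guaranteed share shrinks quickly (e.g. 2048 buffers over 100 pools leaves only 20 per pool), which is the waste the issue describes - but a shared pool would let a fast connection starve a slow one.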
[jira] [Commented] (FLINK-8543) Output Stream closed at org.apache.hadoop.fs.s3a.S3AOutputStream.checkOpen
[ https://issues.apache.org/jira/browse/FLINK-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377510#comment-16377510 ] ASF GitHub Bot commented on FLINK-8543: --- Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/5563 merged > Output Stream closed at org.apache.hadoop.fs.s3a.S3AOutputStream.checkOpen > -- > > Key: FLINK-8543 > URL: https://issues.apache.org/jira/browse/FLINK-8543 > Project: Flink > Issue Type: Bug > Components: Streaming Connectors >Affects Versions: 1.4.0 > Environment: IBM Analytics Engine - > [https://console.bluemix.net/docs/services/AnalyticsEngine/index.html#introduction] > The cluster is based on Hortonworks Data Platform 2.6.2. The following > components are made available. > Apache Spark 2.1.1 Hadoop 2.7.3 > Apache Livy 0.3.0 > Knox 0.12.0 > Ambari 2.5.2 > Anaconda with Python 2.7.13 and 3.5.2 > Jupyter Enterprise Gateway 0.5.0 > HBase 1.1.2 * > Hive 1.2.1 * > Oozie 4.2.0 * > Flume 1.5.2 * > Tez 0.7.0 * > Pig 0.16.0 * > Sqoop 1.4.6 * > Slider 0.92.0 * >Reporter: chris snow >Priority: Blocker > Fix For: 1.5.0 > > Attachments: Screen Shot 2018-01-30 at 18.34.51.png > > > I'm hitting an issue with my BucketingSink from a streaming job. > > {code:java} > return new BucketingSink>(path) > .setWriter(writer) > .setBucketer(new DateTimeBucketer Object>>(formatString)); > {code} > > I can see that a few files have run into issues with uploading to S3: > !Screen Shot 2018-01-30 at 18.34.51.png! > The Flink console output is showing an exception being thrown by > S3AOutputStream, so I've grabbed the S3AOutputStream class from my cluster > and added some additional logging to the checkOpen() method to log the 'key' > just before the exception is thrown: > > {code:java} > /* > * Decompiled with CFR. 
> */ > package org.apache.hadoop.fs.s3a; > import com.amazonaws.AmazonClientException; > import com.amazonaws.event.ProgressListener; > import com.amazonaws.services.s3.model.ObjectMetadata; > import com.amazonaws.services.s3.model.PutObjectRequest; > import com.amazonaws.services.s3.transfer.Upload; > import com.amazonaws.services.s3.transfer.model.UploadResult; > import java.io.BufferedOutputStream; > import java.io.File; > import java.io.FileOutputStream; > import java.io.IOException; > import java.io.InterruptedIOException; > import java.io.OutputStream; > import java.util.concurrent.atomic.AtomicBoolean; > import org.apache.hadoop.classification.InterfaceAudience; > import org.apache.hadoop.classification.InterfaceStability; > import org.apache.hadoop.conf.Configuration; > import org.apache.hadoop.fs.s3a.ProgressableProgressListener; > import org.apache.hadoop.fs.s3a.S3AFileSystem; > import org.apache.hadoop.fs.s3a.S3AUtils; > import org.apache.hadoop.util.Progressable; > import org.slf4j.Logger; > @InterfaceAudience.Private > @InterfaceStability.Evolving > public class S3AOutputStream > extends OutputStream { > private final OutputStream backupStream; > private final File backupFile; > private final AtomicBoolean closed = new AtomicBoolean(false); > private final String key; > private final Progressable progress; > private final S3AFileSystem fs; > public static final Logger LOG = S3AFileSystem.LOG; > public S3AOutputStream(Configuration conf, S3AFileSystem fs, String key, > Progressable progress) throws IOException { > this.key = key; > this.progress = progress; > this.fs = fs; > this.backupFile = fs.createTmpFileForWrite("output-", -1, conf); > LOG.debug("OutputStream for key '{}' writing to tempfile: {}", > (Object)key, (Object)this.backupFile); > this.backupStream = new BufferedOutputStream(new > FileOutputStream(this.backupFile)); > } > void checkOpen() throws IOException { > if (!this.closed.get()) return; > // vv-- Additional logging --vvv > 
LOG.error("OutputStream for key '{}' closed.", (Object)this.key); > throw new IOException("Output Stream closed"); > } > @Override > public void flush() throws IOException { > this.checkOpen(); > this.backupStream.flush(); > } > @Override > public void close() throws IOException { > if (this.closed.getAndSet(true)) { > return; > } > this.backupStream.close(); > LOG.debug("OutputStream for key '{}' closed. Now beginning upload", > (Object)this.key); > try { > ObjectMetadata om = > this.fs.newObjectMetadata(this.backupFile.length()); > Upload upload = > this.fs.putObject(this.fs.newPutObjectRequest(this.key, om,
[jira] [Commented] (FLINK-8543) Output Stream closed at org.apache.hadoop.fs.s3a.S3AOutputStream.checkOpen
[ https://issues.apache.org/jira/browse/FLINK-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377506#comment-16377506 ] ASF GitHub Bot commented on FLINK-8543: --- Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/5563
[GitHub] flink issue #5563: [FLINK-8543] Don't call super.close() in AvroKeyValueSink...
Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/5563 merged ---
[GitHub] flink pull request #5563: [FLINK-8543] Don't call super.close() in AvroKeyVa...
Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/5563 ---
[jira] [Commented] (FLINK-8703) Migrate tests from LocalFlinkMiniCluster to MiniClusterResource
[ https://issues.apache.org/jira/browse/FLINK-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377472#comment-16377472 ] Chesnay Schepler commented on FLINK-8703: - second batch merged to master in 3e056b34f7be817e0d0eee612b6ae44891e33501 > Migrate tests from LocalFlinkMiniCluster to MiniClusterResource > --- > > Key: FLINK-8703 > URL: https://issues.apache.org/jira/browse/FLINK-8703 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Aljoscha Krettek >Assignee: Chesnay Schepler >Priority: Blocker > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8645) Support convenient extension of parent-first ClassLoader patterns
[ https://issues.apache.org/jira/browse/FLINK-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377470#comment-16377470 ] ASF GitHub Bot commented on FLINK-8645: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5544 > Support convenient extension of parent-first ClassLoader patterns > - > > Key: FLINK-8645 > URL: https://issues.apache.org/jira/browse/FLINK-8645 > Project: Flink > Issue Type: Improvement > Components: Configuration >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Fix For: 1.5.0 > > > The option {{classloader.parent-first-patterns}} defines a list of class > patterns that should always be loaded through the parent class-loader. The > default value contains all classes that are effectively required to be loaded > that way for Flink to function. > This list cannot be extended in a convenient way, as one would have to > manually copy the existing default and append new entries. This makes the > configuration brittle in light of version upgrades where we may extend the > default, and also obfuscates the configuration a bit. > I propose to separate this option into > {{classloader.parent-first-patterns.base}}, which subsumes the existing > option, and {{classloader.parent-first-patterns.append}}, which is > automatically appended to the base. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
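The base/append split proposed above boils down to list concatenation plus class-name prefix matching. The sketch below uses hypothetical method names for illustration, not Flink's actual configuration API:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch (method names are illustrative, not Flink's API) of combining the
 * built-in parent-first patterns with user-appended ones, and of deciding
 * whether a class should be loaded parent-first via prefix matching.
 */
public class ParentFirstPatterns {

    /** Effective patterns = base (shipped default) + user-appended entries. */
    public static List<String> effectivePatterns(List<String> base, List<String> append) {
        List<String> all = new ArrayList<>(base);
        all.addAll(append);
        return all;
    }

    /** A class is parent-first if its name starts with any configured pattern. */
    public static boolean isParentFirst(String className, List<String> patterns) {
        return patterns.stream().anyMatch(className::startsWith);
    }
}
```

Because appending never removes a base entry, upgrading Flink can safely extend the default list without breaking a user's appended patterns - the brittleness described in the issue disappears.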
[jira] [Commented] (FLINK-8593) Latency metric docs are outdated
[ https://issues.apache.org/jira/browse/FLINK-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377471#comment-16377471 ] ASF GitHub Bot commented on FLINK-8593: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5484 > Latency metric docs are outdated > > > Key: FLINK-8593 > URL: https://issues.apache.org/jira/browse/FLINK-8593 > Project: Flink > Issue Type: Improvement > Components: Documentation, Metrics >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Fix For: 1.5.0 > > > I missed to update the latency metric documentation while working on > FLINK-7608. The docs should be updated to contain the new naming scheme and > that it is a job-level metric. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8703) Migrate tests from LocalFlinkMiniCluster to MiniClusterResource
[ https://issues.apache.org/jira/browse/FLINK-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377469#comment-16377469 ] ASF GitHub Bot commented on FLINK-8703: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5542
[jira] [Commented] (FLINK-8596) Custom command line code does not correctly catch errors
[ https://issues.apache.org/jira/browse/FLINK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377468#comment-16377468 ] ASF GitHub Bot commented on FLINK-8596: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5543 > Custom command line code does not correctly catch errors > > > Key: FLINK-8596 > URL: https://issues.apache.org/jira/browse/FLINK-8596 > Project: Flink > Issue Type: Bug > Components: Client >Affects Versions: 1.5.0 >Reporter: Aljoscha Krettek >Assignee: Chesnay Schepler >Priority: Blocker > Fix For: 1.5.0 > > > The code that is trying to load the YARN CLI is catching {{Exception}} but if > the YARN CLI cannot be loaded a {{java.lang.NoClassDefFoundError}} is thrown: > https://github.com/apache/flink/blob/0e20b613087e1b326e05674e3d532ea4aa444bc3/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java#L1067 > This means that Flink does not work in Hadoop-free mode. We should fix this > and also add an end-to-end test that verifies that Hadoop-free mode works. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
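The bug described above comes down to a Java subtlety: {{NoClassDefFoundError}} is an {{Error}}, not an {{Exception}}, so a {{catch (Exception e)}} never sees it. A minimal sketch of the fix - the helper name is illustrative, not Flink's actual CliFrontend code - probes for an optional class and also catches {{LinkageError}}, the superclass of {{NoClassDefFoundError}}:

```java
/**
 * Sketch of probing for an optional class (e.g. a YARN CLI) without failing
 * in Hadoop-free setups. Catching only Exception is not enough, because a
 * missing transitive dependency surfaces as NoClassDefFoundError (an Error).
 */
public class OptionalClassLoading {

    public static boolean isAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError extends LinkageError: it is thrown when the
            // class itself resolves but one of its dependencies is missing.
            return false;
        }
    }
}
```

With this check the caller can fall back to the default command line instead of crashing, which is exactly the Hadoop-free behavior the issue asks for.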
[GitHub] flink pull request #5543: [FLINK-8596][CLI] Also catch NoClassDefFoundErrors
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5543 ---
[GitHub] flink pull request #5484: [FLINK-8593][metrics] Update latency metric docs
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5484 ---
[GitHub] flink pull request #5544: [FLINK-8645][configuration] Split classloader.pare...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5544 ---
[GitHub] flink pull request #5542: [FLINK-8703][tests] Migrate tests from LocalFlinkM...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5542 ---
[jira] [Closed] (FLINK-8645) Support convenient extension of parent-first ClassLoader patterns
[ https://issues.apache.org/jira/browse/FLINK-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-8645. --- Resolution: Fixed Fix Version/s: 1.5.0 master: 50aea889bd88d5ce4a1569569176705f9973a08c
[jira] [Closed] (FLINK-8596) Custom command line code does not correctly catch errors
[ https://issues.apache.org/jira/browse/FLINK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-8596. --- Resolution: Fixed master: f3bfd22b41acd4c16f71d6d69f7983abc63a1e71
[jira] [Closed] (FLINK-8593) Latency metric docs are outdated
[ https://issues.apache.org/jira/browse/FLINK-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-8593. --- Resolution: Fixed master: a2d1d084b90f0f47b91aa372525f01a382c892e7
[GitHub] flink pull request #5585: Release 1.4
Github user ArtHustonHitachi closed the pull request at: https://github.com/apache/flink/pull/5585 ---
[GitHub] flink pull request #5585: Release 1.4
GitHub user ArtHustonHitachi opened a pull request: https://github.com/apache/flink/pull/5585 Release 1.4 *Thank you very much for contributing to Apache Flink - we are happy that you want to help us improve Flink. To help the community review your contribution in the best possible way, please go through the checklist below, which will get the contribution into a shape in which it can be best reviewed.* *Please understand that we do not do this to make contributions to Flink a hassle. In order to uphold a high standard of quality for code contributions, while at the same time managing a large number of contributions, we need contributors to prepare the contributions well, and give reviewers enough contextual information for the review. Please also understand that contributions that do not follow this guide will take longer to review and thus typically be picked up with lower priority by the community.* ## Contribution Checklist - Make sure that the pull request corresponds to a [JIRA issue](https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are made for typos in JavaDoc or documentation files, which need no JIRA issue. - Name the pull request in the form "[FLINK-] [component] Title of the pull request", where *FLINK-* should be replaced by the actual issue number. Skip *component* if you are unsure about which is the best component. Typo fixes that have no associated JIRA issue should be named following this pattern: `[hotfix] [docs] Fix typo in event time introduction` or `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`. - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review. - Make sure that the change passes the automated tests, i.e., `mvn clean verify` passes. You can set up Travis CI to do that following [this guide](http://flink.apache.org/contribute-code.html#best-practices). 
- Each pull request should address only one issue, not mix up code from multiple issues. - Each commit in the pull request has a meaningful commit message (including the JIRA id) - Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below. **(The sections below can be removed for hotfixes of typos)** ## What is the purpose of the change *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)* ## Brief change log *(for example:)* - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact* - *Deployments RPC transmits only the blob storage reference* - *TaskManagers retrieve the TaskInfo from the blob cache* ## Verifying this change *(Please pick either of the following options)* This change is a trivial rework / code cleanup without any test coverage. *(or)* This change is already covered by existing tests, such as *(please describe tests)*. 
*(or)* This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end deployment with large payloads (100MB)* - *Extended integration test for recovery after master (JobManager) failure* - *Added test that validates that TaskInfo is transferred only once across recoveries* - *Manually verified the change by running a 4 node cluser with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) - The serializers: (yes / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / no / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know) - The S3 file system connector: (yes / no / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented) You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/flink release-1.4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5585.patch To
[jira] [Commented] (FLINK-8787) Deploying FLIP-6 YARN session with HA fails
[ https://issues.apache.org/jira/browse/FLINK-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377273#comment-16377273 ] ASF GitHub Bot commented on FLINK-8787: --- Github user GJL commented on a diff in the pull request: https://github.com/apache/flink/pull/5584#discussion_r170679952 --- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java --- @@ -186,7 +187,7 @@ public YarnClient getYarnClient() { protected abstract String getYarnJobClusterEntrypoint(); public Configuration getFlinkConfiguration() { - return flinkConfiguration; + return new UnmodifiableConfiguration(flinkConfiguration); --- End diff -- not needed, `flinkConfiguration` is unmodifiable already > Deploying FLIP-6 YARN session with HA fails > --- > > Key: FLINK-8787 > URL: https://issues.apache.org/jira/browse/FLINK-8787 > Project: Flink > Issue Type: Bug > Components: Client, YARN >Affects Versions: 1.5.0 > Environment: emr-5.12.0 > Hadoop distribution: Amazon 2.8.3 > Applications: ZooKeeper 3.4.10 >Reporter: Gary Yao >Assignee: Gary Yao >Priority: Blocker > Labels: flip-6 > Fix For: 1.5.0 > > > Starting a YARN session with HA in FLIP-6 mode fails with an exception. > Commit: 5e3fa4403f518dd6d3fe9970fe8ca55871add7c9 > Command to start YARN session: > {noformat} > export HADOOP_CLASSPATH=`hadoop classpath` > HADOOP_CONF_DIR=/etc/hadoop/conf bin/yarn-session.sh -d -n 1 -s 1 -jm 2048 > -tm 2048 > {noformat} > Stacktrace: > {noformat} > java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854) > at > org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790) > Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn > connection information. 
> at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612) > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > ... 2 more > Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: > Could not retrieve the leader address and leader session ID. > at > org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116) > at > org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405) > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:589) > ... 6 more > Caused by: java.util.concurrent.TimeoutException: Futures timed out after > [6 milliseconds] > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223) > at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:190) > at scala.concurrent.Await.result(package.scala) > at > org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:114) > ... 
8 more > > The program finished with the following exception: > java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854) > at > org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790) > Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn > connection information. > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612) > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > ... 2 more > Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: > Could not retrieve the leader address and leader session ID. > at >
[GitHub] flink pull request #5584: [FLINK-8787][flip6] WIP
Github user GJL commented on a diff in the pull request: https://github.com/apache/flink/pull/5584#discussion_r170679952 --- Diff: flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java --- @@ -186,7 +187,7 @@ public YarnClient getYarnClient() { protected abstract String getYarnJobClusterEntrypoint(); public Configuration getFlinkConfiguration() { - return flinkConfiguration; + return new UnmodifiableConfiguration(flinkConfiguration); --- End diff -- not needed, `flinkConfiguration` is unmodifiable already ---
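The pattern debated in the diff above — a getter defending shared state by handing out a read-only view of the configuration — can be sketched with a toy stand-in. `ConfigView` below is an illustrative class, not Flink's `Configuration`/`UnmodifiableConfiguration`; note that such a view rejects writes but still reflects later writes made through the underlying object, which is exactly why the PR's side-effect-based approach needed the configuration reference to stay live.

```java
import java.util.HashMap;
import java.util.Map;

// Toy read-only configuration view: writes throw, reads delegate to the
// live backing object. Simplified stand-in, not the real Flink classes.
public class ConfigView {
    private final Map<String, String> values = new HashMap<>();

    public void setString(String key, String value) { values.put(key, value); }

    public String getString(String key, String dflt) { return values.getOrDefault(key, dflt); }

    /** Returns a view that rejects mutation but sees the live data. */
    public ConfigView asUnmodifiable() {
        ConfigView backing = this;
        return new ConfigView() {
            @Override
            public void setString(String key, String value) {
                throw new UnsupportedOperationException("configuration is read-only");
            }
            @Override
            public String getString(String key, String dflt) {
                return backing.getString(key, dflt);
            }
        };
    }
}
```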
[jira] [Commented] (FLINK-8787) Deploying FLIP-6 YARN session with HA fails
[ https://issues.apache.org/jira/browse/FLINK-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377267#comment-16377267 ] ASF GitHub Bot commented on FLINK-8787: --- GitHub user GJL opened a pull request: https://github.com/apache/flink/pull/5584 [FLINK-8787][flip6] WIP WIP ## What is the purpose of the change *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)* cc: @tillrohrmann ## Brief change log *(for example:)* - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact* - *Deployments RPC transmits only the blob storage reference* - *TaskManagers retrieve the TaskInfo from the blob cache* ## Verifying this change *(Please pick either of the following options)* This change is a trivial rework / code cleanup without any test coverage. *(or)* This change is already covered by existing tests, such as *(please describe tests)*. 
*(or)* This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end deployment with large payloads (100MB)* - *Extended integration test for recovery after master (JobManager) failure* - *Added test that validates that TaskInfo is transferred only once across recoveries* - *Manually verified the change by running a 4 node cluser with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) - The serializers: (yes / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / no / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know) - The S3 file system connector: (yes / no / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / no) - If yes, how is the feature documented? 
(not applicable / docs / JavaDocs / not documented) You can merge this pull request into a Git repository by running: $ git pull https://github.com/GJL/flink FLINK-8787 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5584.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5584 commit 0cde09add17106f09e1f44b2a73400ea14a9eb21 Author: gyaoDate: 2018-02-26T17:52:18Z [hotfix] Add requireNonNull validation to Configuration copy constructor commit 8f826c673bafd8228bb25da20e7dd40384fae971 Author: gyao Date: 2018-02-26T17:54:34Z [FLINK-8787][flip6] Ensure that zk namespace configuration reaches RestClusterClient > Deploying FLIP-6 YARN session with HA fails > --- > > Key: FLINK-8787 > URL: https://issues.apache.org/jira/browse/FLINK-8787 > Project: Flink > Issue Type: Bug > Components: Client, YARN >Affects Versions: 1.5.0 > Environment: emr-5.12.0 > Hadoop distribution: Amazon 2.8.3 > Applications: ZooKeeper 3.4.10 >Reporter: Gary Yao >Assignee: Gary Yao >Priority: Blocker > Labels: flip-6 > Fix For: 1.5.0 > > > Starting a YARN session with HA in FLIP-6 mode fails with an exception. > Commit: 5e3fa4403f518dd6d3fe9970fe8ca55871add7c9 > Command to start YARN session: > {noformat} > export HADOOP_CLASSPATH=`hadoop classpath` > HADOOP_CONF_DIR=/etc/hadoop/conf bin/yarn-session.sh -d -n 1 -s 1 -jm 2048 > -tm 2048 > {noformat} > Stacktrace: > {noformat} > java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854) > at > org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790) > Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn > connection information. 
> at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612) > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790) > at
[GitHub] flink pull request #5584: [FLINK-8787][flip6] WIP
GitHub user GJL opened a pull request: https://github.com/apache/flink/pull/5584 [FLINK-8787][flip6] WIP WIP ## What is the purpose of the change *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)* cc: @tillrohrmann ## Brief change log *(for example:)* - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact* - *Deployments RPC transmits only the blob storage reference* - *TaskManagers retrieve the TaskInfo from the blob cache* ## Verifying this change *(Please pick either of the following options)* This change is a trivial rework / code cleanup without any test coverage. *(or)* This change is already covered by existing tests, such as *(please describe tests)*. *(or)* This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end deployment with large payloads (100MB)* - *Extended integration test for recovery after master (JobManager) failure* - *Added test that validates that TaskInfo is transferred only once across recoveries* - *Manually verified the change by running a 4 node cluser with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) - The serializers: (yes / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / no / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know) - The S3 file system connector: (yes / no / don't know) ## Documentation - Does this pull 
request introduce a new feature? (yes / no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented) You can merge this pull request into a Git repository by running: $ git pull https://github.com/GJL/flink FLINK-8787 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5584.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5584 commit 0cde09add17106f09e1f44b2a73400ea14a9eb21 Author: gyaoDate: 2018-02-26T17:52:18Z [hotfix] Add requireNonNull validation to Configuration copy constructor commit 8f826c673bafd8228bb25da20e7dd40384fae971 Author: gyao Date: 2018-02-26T17:54:34Z [FLINK-8787][flip6] Ensure that zk namespace configuration reaches RestClusterClient ---
[jira] [Updated] (FLINK-8787) Deploying FLIP-6 YARN session with HA fails
[ https://issues.apache.org/jira/browse/FLINK-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Yao updated FLINK-8787: Description: Starting a YARN session with HA in FLIP-6 mode fails with an exception. Commit: 5e3fa4403f518dd6d3fe9970fe8ca55871add7c9 Command to start YARN session: {noformat} export HADOOP_CLASSPATH=`hadoop classpath` HADOOP_CONF_DIR=/etc/hadoop/conf bin/yarn-session.sh -d -n 1 -s 1 -jm 2048 -tm 2048 {noformat} Stacktrace: {noformat} java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854) at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790) Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn connection information. at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612) at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) ... 2 more Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader address and leader session ID. at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116) at org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405) at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:589) ... 
6 more Caused by: java.util.concurrent.TimeoutException: Futures timed out after [6 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:190) at scala.concurrent.Await.result(package.scala) at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:114) ... 8 more The program finished with the following exception: java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854) at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790) Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn connection information. at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612) at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) ... 2 more Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader address and leader session ID. at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116) at org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405) at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:589) ... 
6 more Caused by: java.util.concurrent.TimeoutException: Futures timed out after [6 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:190) at scala.concurrent.Await.result(package.scala) at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:114) ... 8 more {noformat} was: Starting a YARN session in FLIP-6 mode fails with an exception. Commit: 5e3fa4403f518dd6d3fe9970fe8ca55871add7c9 Command to start YARN session: {noformat} export HADOOP_CLASSPATH=`hadoop classpath` HADOOP_CONF_DIR=/etc/hadoop/conf bin/yarn-session.sh -d -n 1 -s 1 -jm 2048 -tm 2048 {noformat} Stacktrace: {noformat} java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854) at
[jira] [Updated] (FLINK-8787) Deploying FLIP-6 YARN session with HA fails
[ https://issues.apache.org/jira/browse/FLINK-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Yao updated FLINK-8787: Summary: Deploying FLIP-6 YARN session with HA fails (was: Deploying FLIP-6 YARN session fails) > Deploying FLIP-6 YARN session with HA fails > --- > > Key: FLINK-8787 > URL: https://issues.apache.org/jira/browse/FLINK-8787 > Project: Flink > Issue Type: Bug > Components: Client, YARN >Affects Versions: 1.5.0 > Environment: emr-5.12.0 > Hadoop distribution: Amazon 2.8.3 > Applications: ZooKeeper 3.4.10 >Reporter: Gary Yao >Assignee: Gary Yao >Priority: Blocker > Labels: flip-6 > Fix For: 1.5.0 > > > Starting a YARN session in FLIP-6 mode fails with an exception. > Commit: 5e3fa4403f518dd6d3fe9970fe8ca55871add7c9 > Command to start YARN session: > {noformat} > export HADOOP_CLASSPATH=`hadoop classpath` > HADOOP_CONF_DIR=/etc/hadoop/conf bin/yarn-session.sh -d -n 1 -s 1 -jm 2048 > -tm 2048 > {noformat} > Stacktrace: > {noformat} > java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854) > at > org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790) > Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn > connection information. > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612) > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > ... 2 more > Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: > Could not retrieve the leader address and leader session ID. 
> at > org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116) > at > org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405) > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:589) > ... 6 more > Caused by: java.util.concurrent.TimeoutException: Futures timed out after > [6 milliseconds] > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223) > at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:190) > at scala.concurrent.Await.result(package.scala) > at > org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:114) > ... 8 more > > The program finished with the following exception: > java.lang.reflect.UndeclaredThrowableException > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854) > at > org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790) > Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn > connection information. > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612) > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) > ... 
2 more > Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: > Could not retrieve the leader address and leader session ID. > at > org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116) > at > org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405) > at > org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:589) > ... 6 more > Caused by: java.util.concurrent.TimeoutException: Futures timed out after > [6 milliseconds] > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223) > at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190) >
[jira] [Commented] (FLINK-6352) FlinkKafkaConsumer should support to use timestamp to set up start offset
[ https://issues.apache.org/jira/browse/FLINK-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377178#comment-16377178 ] ASF GitHub Bot commented on FLINK-6352: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/5282#discussion_r170663674 --- Diff: flink-connectors/flink-connector-kafka-0.10/src/test/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka010FetcherTest.java --- @@ -129,9 +129,14 @@ public Void answer(InvocationOnMock invocation) { schema, new Properties(), 0L, +<<< HEAD --- End diff -- Leftover merge markers? > FlinkKafkaConsumer should support to use timestamp to set up start offset > - > > Key: FLINK-6352 > URL: https://issues.apache.org/jira/browse/FLINK-6352 > Project: Flink > Issue Type: Improvement > Components: Kafka Connector >Reporter: Fang Yong >Assignee: Fang Yong >Priority: Blocker > Fix For: 1.5.0 > > > Currently "auto.offset.reset" is used to initialize the start offset of > FlinkKafkaConsumer, and the value should be earliest/latest/none. This method > can only let the job consume the beginning or the most recent data, but can > not specify the specific offset from which Kafka should begin to consume. > So, there should be a configuration item (such as > "flink.source.start.time" and the format is "-MM-dd HH:mm:ss") that > allows the user to configure the initial offset of Kafka. The action of > "flink.source.start.time" is as follows: > 1) job start from checkpoint / savepoint > a> offset of partition can be restored from checkpoint/savepoint, > "flink.source.start.time" will be ignored. 
> b> there's no checkpoint/savepoint for the partition (For example, this > partition is newly increased), the "flink.kafka.start.time" will be used to > initialize the offset of the partition > 2) job has no checkpoint / savepoint, the "flink.source.start.time" is used > to initialize the offset of the kafka > a> the "flink.source.start.time" is valid, use it to set the offset of kafka > b> the "flink.source.start.time" is out-of-range, the same as it does > currently with no initial offset, get kafka's current offset and start reading -- This message was sent by Atlassian JIRA (v7.6.3#76005)
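The resolution rule sketched in cases 2a/2b above — start from the first record at or after the configured time, and fall back to the current (latest) offset when the timestamp is out of range — could look like the following. The in-memory `NavigableMap` index is an illustration of the lookup that Kafka's `offsetsForTimes()` provides per partition; it is not Flink's actual implementation.

```java
import java.util.Map;
import java.util.NavigableMap;

// Sketch of timestamp-based start-offset resolution for one partition.
public class TimestampStartOffset {

    /**
     * @param timestampIndex record timestamp -> first offset at or after it
     * @param startTimestamp the configured start time (epoch millis)
     * @param latestOffset   fallback when the timestamp is out of range
     */
    public static long resolveStartOffset(NavigableMap<Long, Long> timestampIndex,
                                          long startTimestamp, long latestOffset) {
        Map.Entry<Long, Long> entry = timestampIndex.ceilingEntry(startTimestamp);
        // Out of range (start time is later than every record): behave like
        // "no initial offset", i.e. start from the current latest offset.
        return entry != null ? entry.getValue() : latestOffset;
    }
}
```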
[jira] [Commented] (FLINK-8737) Creating a union of UnionGate instances will fail with UnsupportedOperationException when retrieving buffers
[ https://issues.apache.org/jira/browse/FLINK-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377176#comment-16377176 ] Nico Kruber commented on FLINK-8737: We ([~uce] and I) could not think of any way a user can create a union of union input gates by defining a program to run on Flink, and did not find cases where we actually do this in the code base, or reasons to do so in the future. Just in case we missed something, [~StephanEwen] can you confirm that creating a recursive union of union input gates is indeed dead code and not needed? > Creating a union of UnionGate instances will fail with > UnsupportedOperationException when retrieving buffers > > > Key: FLINK-8737 > URL: https://issues.apache.org/jira/browse/FLINK-8737 > Project: Flink > Issue Type: Sub-task > Components: Network >Reporter: Nico Kruber >Priority: Blocker > Fix For: 1.5.0 > > > FLINK-8589 introduced a new polling method but did not implement > {{UnionInputGate#pollNextBufferOrEvent()}}. This prevents UnionGate instances > from containing a UnionGate instance which is explicitly allowed by its > documentation and interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (FLINK-8737) Creating a union of UnionGate instances will fail with UnsupportedOperationException when retrieving buffers
[ https://issues.apache.org/jira/browse/FLINK-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber reassigned FLINK-8737: -- Assignee: Nico Kruber > Creating a union of UnionGate instances will fail with > UnsupportedOperationException when retrieving buffers > > > Key: FLINK-8737 > URL: https://issues.apache.org/jira/browse/FLINK-8737 > Project: Flink > Issue Type: Sub-task > Components: Network >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Blocker > Fix For: 1.5.0 > > > FLINK-8589 introduced a new polling method but did not implement > {{UnionInputGate#pollNextBufferOrEvent()}}. This prevents UnionGate instances > from containing a UnionGate instance which is explicitly allowed by its > documentation and interface. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #5282: [FLINK-6352] [kafka] Timestamp-based offset config...
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/5282#discussion_r170663674 --- Diff: flink-connectors/flink-connector-kafka-0.10/src/test/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka010FetcherTest.java --- @@ -129,9 +129,14 @@ public Void answer(InvocationOnMock invocation) { schema, new Properties(), 0L, +<<< HEAD --- End diff -- Leftover merge markers? ---
[jira] [Commented] (FLINK-8737) Creating a union of UnionGate instances will fail with UnsupportedOperationException when retrieving buffers
[ https://issues.apache.org/jira/browse/FLINK-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377172#comment-16377172 ] ASF GitHub Bot commented on FLINK-8737: --- GitHub user NicoK opened a pull request: https://github.com/apache/flink/pull/5583 [FLINK-8737][network] disallow creating a union of UnionGate instances ## What is the purpose of the change Recently, the `InputGate#pollNextBufferOrEvent()` was added but not implemented in `UnionInputGate`. It is, however, used in `UnionInputGate#getNextBufferOrEvent()` and thus any `UnionInputGate` containing a `UnionInputGate` would have failed already. There should be no use case for wiring up inputs this way. Therefore, fail early when trying to construct this. ## Brief change log - throw an `UnsupportedOperationException` when trying to use a `UnionInputGate` for input to a `UnionInputGate` ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): **no** - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no** - The serializers: **no** - The runtime per-record code paths (performance sensitive): **no** - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no** - The S3 file system connector: **no** ## Documentation - Does this pull request introduce a new feature? **no** - If yes, how is the feature documented? 
**JavaDocs** You can merge this pull request into a Git repository by running: $ git pull https://github.com/NicoK/flink flink-8737 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5583.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5583 commit 198bf145e2a4b81327ae0c1e58c5bf4f8a5da650 Author: Nico Kruber Date: 2018-02-26T16:50:10Z [hotfix][network] minor improvements in UnionInputGate commit 6eff3e396b5a98a1029e3f42b5e78abfb10da7a6 Author: Nico Kruber Date: 2018-02-26T16:52:37Z [FLINK-8737][network] disallow creating a union of UnionInputGate instances Recently, pollNextBufferOrEvent() was added but not implemented; however, it is used in getNextBufferOrEvent(), and thus any UnionInputGate containing a UnionInputGate would have failed already. There should be no use case for wiring up inputs this way. Therefore, fail early when trying to construct this. > Creating a union of UnionGate instances will fail with > UnsupportedOperationException when retrieving buffers > > > Key: FLINK-8737 > URL: https://issues.apache.org/jira/browse/FLINK-8737 > Project: Flink > Issue Type: Sub-task > Components: Network >Reporter: Nico Kruber >Priority: Blocker > Fix For: 1.5.0 > > > FLINK-8589 introduced a new polling method but did not implement > {{UnionInputGate#pollNextBufferOrEvent()}}. This prevents UnionGate instances > from containing a UnionGate instance which is explicitly allowed by its > documentation and interface.
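The fail-early construction check that this PR describes can be sketched as follows. This is a simplified, self-contained stand-in — Flink's real classes carry channel bookkeeping and a much richer InputGate interface — but it shows the shape of the change: reject a nested UnionInputGate at construction time instead of failing later when a buffer is polled.

```java
// Simplified stand-ins for the Flink classes; not the actual implementation.
class InputGate {
    // In Flink, pollNextBufferOrEvent() was added by FLINK-8589 but never
    // implemented in UnionInputGate, so nesting unions failed at runtime.
}

class UnionInputGate extends InputGate {
    UnionInputGate(InputGate... inputGates) {
        for (InputGate gate : inputGates) {
            // Fail early at construction instead of later during polling.
            if (gate instanceof UnionInputGate) {
                throw new UnsupportedOperationException(
                        "Cannot union a UnionInputGate with another UnionInputGate.");
            }
        }
    }
}
```

A plain gate can still be unioned as before; only wiring a union into another union is rejected.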
[jira] [Updated] (FLINK-8770) CompletedCheckPoints stored on ZooKeeper is not up-to-date, when JobManager is restarted it fails to recover the job due to "checkpoint FileNotFound exception"
[ https://issues.apache.org/jira/browse/FLINK-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8770: Fix Version/s: 1.5.0 > CompletedCheckPoints stored on ZooKeeper is not up-to-date, when JobManager > is restarted it fails to recover the job due to "checkpoint FileNotFound > exception" > --- > > Key: FLINK-8770 > URL: https://issues.apache.org/jira/browse/FLINK-8770 > Project: Flink > Issue Type: Bug > Components: Local Runtime >Affects Versions: 1.4.0 >Reporter: Xinyang Gao >Priority: Blocker > Fix For: 1.5.0 > >
> Hi, I am running a Flink cluster (1 JobManager + 6 TaskManagers) with HA mode on OpenShift. I have enabled Chaos Monkey, which kills either the JobManager or one of the TaskManagers every 5 minutes; the ZooKeeper quorum is stable, with no Chaos Monkey enabled on it. Flink reads data from one Kafka topic and writes data into another Kafka topic. Checkpointing is enabled with a 1000 ms interval, and state.checkpoints.num-retained is set to 10. I am using a PVC for the state backend (checkpoints, recovery, etc.), so the checkpoints and states are persistent. The restart strategy for the Flink JobManager DeploymentConfig is "recreate", which means it kills the old JobManager container before it starts the new one.
> I first ran the Chaos test for one day, and saw the exception: org.apache.flink.util.FlinkException: Could not retrieve checkpoint *** from state handle under /***. This indicates that the retrieved state handle is broken. Try cleaning the state handle store. The root cause is a checkpoint FileNotFound. The Flink job then kept restarting for a few hours and, due to the above error, could not be restarted successfully.
> After further investigation, I found the following files in my PVC:
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint0ee95157de00
> -rw-r--r--. 1 flink root 11379 Feb 23 01:51 completedCheckpoint498d0952cf00
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint650fe5b021fe
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint66634149683e
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint67f24c3b018e
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint6f64ebf0ae64
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint906ebe1fb337
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint98b79ea14b09
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpointa0d1070e0b6c
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpointbd3a9ba50322
> -rw-r--r--. 1 flink root 11355 Feb 22 17:31 completedCheckpointd433b5e108be
> -rw-r--r--. 1 flink root 11379 Feb 22 22:56 completedCheckpointdd0183ed092b
> -rw-r--r--. 1 flink root 11379 Feb 22 00:00 completedCheckpointe0a5146c3d81
> -rw-r--r--. 1 flink root 11331 Feb 22 17:06 completedCheckpointec82f3ebc2ad
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpointf86e460f6720
> The latest 10 checkpoints were created at about 02:10, ignoring the old checkpoints that were not deleted successfully (which I do not care too much about).
> However, when checking ZooKeeper, I see the following in the flink/checkpoints path (I only list one entry, but the other 9 are similar):
> cZxid = 0x160001ff5d
> ��sr;org.apache.flink.runtime.state.RetrievableStreamStateHandle�U�+LwrappedStreamStateHandlet2Lorg/apache/flink/runtime/state/StreamStateHandle;xpsr9org.apache.flink.runtime.state.filesystem.FileStateHandle�u�b�▒▒J stateSizefilePathtLorg/apache/flink/core/fs/Path;xp,ssrorg.apache.flink.core.fs.PathLuritLjava/net/URI;xpsr java.net.URI�x.C�I�LstringtLjava/lang/String;xpt=file:/mnt/flink-test/recovery/completedCheckpointd004a3753870x
> [zk: localhost:2181(CONNECTED) 7] ctime = Fri Feb 23 02:08:18 UTC 2018
> mZxid = 0x160001ff5d
> mtime = Fri Feb 23 02:08:18 UTC 2018
> pZxid = 0x1d0c6d
> cversion = 31
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 492
> So the latest completedCheckpoint state stored on ZooKeeper is from about 02:08, which implies that the completed checkpoints at 02:10 somehow were not successfully submitted to ZooKeeper, so when it tries
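For context, a ZooKeeper-HA setup with retained checkpoints like the one described in this report would be configured roughly along these lines (a sketch, not the reporter's actual configuration; the quorum address is a placeholder, and the storage path merely echoes the path visible in the stack data above):

```yaml
# Sketch of an HA + retained-checkpoints flink-conf.yaml fragment.
# zk-host:2181 is a placeholder quorum address.
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-host:2181
high-availability.storageDir: file:///mnt/flink-test/recovery
state.checkpoints.num-retained: 10
```

With this setup, the checkpoint data lives in the storage directory while ZooKeeper holds only lightweight state handles pointing at those files — which is why a stale handle in ZooKeeper manifests as a FileNotFound on recovery.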
[jira] [Updated] (FLINK-8779) ClassLoaderITCase.testKMeansJobWithCustomClassLoader fails on Travis
[ https://issues.apache.org/jira/browse/FLINK-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8779: Priority: Blocker (was: Critical) > ClassLoaderITCase.testKMeansJobWithCustomClassLoader fails on Travis > > > Key: FLINK-8779 > URL: https://issues.apache.org/jira/browse/FLINK-8779 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0 > > > The {{ClassLoaderITCase.testKMeansJobWithCustomClassLoader}} test fails on Travis > by producing no output for 300s. This might indicate a test instability or a > problem with Flink that was recently introduced. > https://api.travis-ci.org/v3/job/344427688/log.txt
[jira] [Updated] (FLINK-8783) Test instability SlotPoolRpcTest.testExtraSlotsAreKept
[ https://issues.apache.org/jira/browse/FLINK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8783: Priority: Blocker (was: Critical) > Test instability SlotPoolRpcTest.testExtraSlotsAreKept > -- > > Key: FLINK-8783 > URL: https://issues.apache.org/jira/browse/FLINK-8783 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Gary Yao >Priority: Blocker > Fix For: 1.5.0 > > > [https://travis-ci.org/GJL/flink/jobs/346206290] > {noformat} > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.784 sec <<< > FAILURE! - in org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest > testExtraSlotsAreKept(org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest) > Time elapsed: 0.016 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest.testExtraSlotsAreKept(SlotPoolRpcTest.java:267) > {noformat} > I reproduced this in IntelliJ by configuring 50 consecutive runs of > {{testExtraSlotsAreKept}}. On my machine the 8th execution fails sporadically. > commit: eeac022f0538e0979e6ad4eb06a2d1031cbd0146
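Gary's repeated-run technique can also be reproduced outside IntelliJ with a plain loop around the test body. The sketch below is generic; runTestBody() is a hypothetical placeholder for the suspect test logic, not part of Flink's test suite:

```java
// Generic sketch for surfacing a flaky test by running it many times in a row.
// runTestBody() is a hypothetical placeholder, not part of Flink's test suite.
public class RepeatRunner {
    static int attemptsRun = 0;

    static void runTestBody(int attempt) {
        // placeholder: invoke the suspect test logic and its assertions here
        attemptsRun++;
    }

    public static void main(String[] args) {
        for (int attempt = 1; attempt <= 50; attempt++) {
            try {
                runTestBody(attempt);
            } catch (AssertionError e) {
                // Report which attempt failed, then fail the whole run.
                System.out.println("Failed on attempt " + attempt + ": " + e);
                throw e;
            }
        }
        System.out.println("All 50 attempts passed");
    }
}
```

Reporting the failing attempt number helps distinguish a deterministic bug (always the same attempt) from a timing-dependent one (a different attempt each run), as in this ticket.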
[jira] [Updated] (FLINK-8783) Test instability SlotPoolRpcTest.testExtraSlotsAreKept
[ https://issues.apache.org/jira/browse/FLINK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8783: Fix Version/s: 1.5.0 > Test instability SlotPoolRpcTest.testExtraSlotsAreKept > -- > > Key: FLINK-8783 > URL: https://issues.apache.org/jira/browse/FLINK-8783 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.5.0 >Reporter: Gary Yao >Priority: Blocker > Fix For: 1.5.0 > > > [https://travis-ci.org/GJL/flink/jobs/346206290] > {noformat} > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.784 sec <<< > FAILURE! - in org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest > testExtraSlotsAreKept(org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest) > Time elapsed: 0.016 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest.testExtraSlotsAreKept(SlotPoolRpcTest.java:267) > {noformat} > I reproduced this in IntelliJ by configuring 50 consecutive runs of > {{testExtraSlotsAreKept}}. On my machine the 8th execution fails sporadically. > commit: eeac022f0538e0979e6ad4eb06a2d1031cbd0146
[jira] [Updated] (FLINK-8789) StoppableFunctions sometimes not detected
[ https://issues.apache.org/jira/browse/FLINK-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8789: Fix Version/s: 1.5.0 > StoppableFunctions sometimes not detected > - > > Key: FLINK-8789 > URL: https://issues.apache.org/jira/browse/FLINK-8789 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Priority: Blocker > Fix For: 1.5.0 > > > While porting the {{TimestampITCase}} I got a strange exception in the logs > when running {{testWatermarkPropagationNoFinalWatermarkOnStop}}: > {code:java} > Caused by: java.lang.UnsupportedOperationException: Stopping not supported by > task Source: Custom Source (1/1) (87f44b422280be896bdf44766ebeb53e). > at org.apache.flink.runtime.taskmanager.Task.stopExecution(Task.java:965) > at > org.apache.flink.runtime.taskexecutor.TaskExecutor.stopTask(TaskExecutor.java:549) > ... 18 more{code} > This source, however, implements the {{StoppableFunction}} interface. > This also happens against a legacy cluster.
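The capability that the task layer fails to detect here can be illustrated with simplified stand-ins for the interfaces involved. These mirror Flink's SourceFunction and StoppableFunction in shape only; they are not the actual API:

```java
// Simplified stand-ins for the Flink interfaces involved; not the actual API.
interface SourceFunction<T> {
    void run() throws Exception;
    void cancel();
}

// Mirrors Flink's StoppableFunction: a graceful-stop capability the runtime
// is supposed to detect via an instanceof-style check on the user function.
interface StoppableFunction {
    void stop();
}

// A source that supports graceful stopping. The bug report says the task
// layer sometimes fails to detect this capability and throws
// UnsupportedOperationException when a stop is requested.
class StoppableSource implements SourceFunction<Long>, StoppableFunction {
    private volatile boolean running = true;

    @Override
    public void run() throws Exception {
        while (running) {
            // emit elements here in a real source
            Thread.sleep(10);
        }
    }

    @Override
    public void cancel() { running = false; }

    @Override
    public void stop() { running = false; }
}
```

Since the source class clearly implements the interface, the exception suggests the detection happens against the wrong object (e.g. a wrapping invokable) rather than the user function itself.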