[jira] [Commented] (FLINK-8787) Deploying FLIP-6 YARN session with HA fails

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378174#comment-16378174
 ] 

ASF GitHub Bot commented on FLINK-8787:
---

Github user GJL commented on the issue:

https://github.com/apache/flink/pull/5584
  
test failures due to 
```
// logback + log4j, with/out krb5, different JVM opts
// IMPORTANT: Be aware that we are using side effects here to modify the created YarnClusterDescriptor,
// because we have a reference to the ClusterDescriptor's configuration which we modify continuously
```
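
The hazard the comment describes is generic: the descriptor keeps a reference 
to a mutable configuration instead of a defensive copy, so every later write 
through that reference changes the already-created descriptor. A minimal 
stand-alone Java sketch of the pattern (Config and Descriptor are illustrative 
stand-ins, not Flink's actual Configuration and YarnClusterDescriptor):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-ins, not the real Flink classes.
class Config {
    private final Map<String, String> values = new HashMap<>();
    void set(String k, String v) { values.put(k, v); }
    String get(String k) { return values.get(k); }
}

class Descriptor {
    private final Config config; // holds the caller's instance, no defensive copy
    Descriptor(Config config) { this.config = config; }
    String mode() { return config.get("mode"); }
}

public class SideEffectDemo {
    public static void main(String[] args) {
        Config shared = new Config();
        shared.set("mode", "session");
        Descriptor descriptor = new Descriptor(shared);

        shared.set("mode", "ha"); // a later test mutates the shared config ...

        System.out.println(descriptor.mode()); // ... and prints "ha", not "session"
    }
}
```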


> Deploying FLIP-6 YARN session with HA fails
> ---
>
> Key: FLINK-8787
> URL: https://issues.apache.org/jira/browse/FLINK-8787
> Project: Flink
>  Issue Type: Bug
>  Components: Client, YARN
>Affects Versions: 1.5.0
> Environment: emr-5.12.0
> Hadoop distribution: Amazon 2.8.3
> Applications: ZooKeeper 3.4.10
>Reporter: Gary Yao
>Assignee: Gary Yao
>Priority: Blocker
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> Starting a YARN session with HA in FLIP-6 mode fails with an exception.
> Commit: 5e3fa4403f518dd6d3fe9970fe8ca55871add7c9
> Command to start YARN session:
> {noformat}
> export HADOOP_CLASSPATH=`hadoop classpath`
> HADOOP_CONF_DIR=/etc/hadoop/conf bin/yarn-session.sh -d -n 1 -s 1 -jm 2048 -tm 2048
> {noformat}
> Stacktrace:
> {noformat}
> java.lang.reflect.UndeclaredThrowableException
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)
>  at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790)
> Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn 
> connection information.
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>  ... 2 more
> Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: 
> Could not retrieve the leader address and leader session ID.
>  at 
> org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116)
>  at 
> org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:589)
>  ... 6 more
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
> [6 milliseconds]
>  at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>  at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>  at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
>  at 
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>  at scala.concurrent.Await$.result(package.scala:190)
>  at scala.concurrent.Await.result(package.scala)
>  at 
> org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:114)
>  ... 8 more
> 
>  The program finished with the following exception:
> java.lang.reflect.UndeclaredThrowableException
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)
>  at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790)
> Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn 
> connection information.
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>  ... 2 more
> Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: 
> Could not retrieve the leader address and leader session ID.
>  at 
> org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116)
>  at 
> org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405)
>  at 
> 


[jira] [Commented] (FLINK-6352) FlinkKafkaConsumer should support to use timestamp to set up start offset

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378162#comment-16378162
 ] 

ASF GitHub Bot commented on FLINK-6352:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/5282
  
@aljoscha sorry about the leftover merge markers, I've fixed them now.


> FlinkKafkaConsumer should support to use timestamp to set up start offset
> -
>
> Key: FLINK-6352
> URL: https://issues.apache.org/jira/browse/FLINK-6352
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Fang Yong
>Assignee: Fang Yong
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently "auto.offset.reset" is used to initialize the start offset of 
> FlinkKafkaConsumer, and the value should be earliest/latest/none. This method 
> can only let the job comsume the beginning or the most recent data, but can 
> not specify the specific offset of Kafka began to consume. 
> So, there should be a configuration item (such as 
> "flink.source.start.time" and the format is "-MM-dd HH:mm:ss") that 
> allows user to configure the initial offset of Kafka. The action of 
> "flink.source.start.time" is as follows:
> 1) the job starts from a checkpoint / savepoint
>   a> the offset of the partition can be restored from the checkpoint/savepoint, 
> so "flink.source.start.time" will be ignored.
>   b> there is no checkpoint/savepoint state for the partition (for example, 
> the partition was newly added), so "flink.source.start.time" will be used to 
> initialize the offset of that partition
> 2) the job has no checkpoint / savepoint, so "flink.source.start.time" is used 
> to initialize the Kafka offset
>   a> "flink.source.start.time" is valid: use it to set the Kafka offset
>   b> "flink.source.start.time" is out of range: behave as the consumer does 
> today with no initial offset, i.e. take Kafka's current offset and start reading
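> A sketch of the mechanism such a configuration item could build on: the plain 
> Kafka consumer API (0.10.1+) already maps a timestamp to per-partition offsets 
> via offsetsForTimes. This is an illustrative, hand-written example, not the 
> actual Flink change; the class and method names below are made up.
> {code:java}
> import java.text.SimpleDateFormat;
> import java.util.HashMap;
> import java.util.Map;
> 
> import org.apache.kafka.clients.consumer.KafkaConsumer;
> import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
> import org.apache.kafka.common.PartitionInfo;
> import org.apache.kafka.common.TopicPartition;
> 
> public class TimestampStartOffsets {
> 
>     // Seeks every partition of the topic to the first offset whose record
>     // timestamp is >= startTime ("yyyy-MM-dd HH:mm:ss"). Partitions for which
>     // the timestamp is out of range keep their default position (case 2b above).
>     static void seekToTimestamp(KafkaConsumer<?, ?> consumer, String topic, String startTime)
>             throws Exception {
>         long ts = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(startTime).getTime();
> 
>         Map<TopicPartition, Long> request = new HashMap<>();
>         for (PartitionInfo p : consumer.partitionsFor(topic)) {
>             request.put(new TopicPartition(topic, p.partition()), ts);
>         }
> 
>         consumer.assign(request.keySet());
>         Map<TopicPartition, OffsetAndTimestamp> found = consumer.offsetsForTimes(request);
>         for (Map.Entry<TopicPartition, OffsetAndTimestamp> e : found.entrySet()) {
>             if (e.getValue() != null) {
>                 consumer.seek(e.getKey(), e.getValue().offset()); // case 2a above
>             }
>         }
>     }
> }
> {code}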



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Updated] (FLINK-6997) SavepointITCase fails in master branch sometimes

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-6997:

Priority: Blocker  (was: Critical)

> SavepointITCase fails in master branch sometimes
> 
>
> Key: FLINK-6997
> URL: https://issues.apache.org/jira/browse/FLINK-6997
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.3.0, 1.5.0
>Reporter: Ted Yu
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> I got the following test failure (with commit 
> a0b781461bcf8c2f1d00b93464995f03eda589f1)
> {code}
> testSavepointForJobWithIteration(org.apache.flink.test.checkpointing.SavepointITCase)
>   Time elapsed: 8.129 sec  <<< ERROR!
> java.io.IOException: java.lang.Exception: Failed to complete savepoint
>   at 
> org.apache.flink.runtime.testingUtils.TestingCluster.triggerSavepoint(TestingCluster.scala:342)
>   at 
> org.apache.flink.runtime.testingUtils.TestingCluster.triggerSavepoint(TestingCluster.scala:316)
>   at 
> org.apache.flink.test.checkpointing.SavepointITCase.testSavepointForJobWithIteration(SavepointITCase.java:827)
> Caused by: java.lang.Exception: Failed to complete savepoint
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:821)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:805)
>   at 
> org.apache.flink.runtime.concurrent.impl.FlinkFuture$5.onComplete(FlinkFuture.java:272)
>   at akka.dispatch.OnComplete.internal(Future.scala:247)
>   at akka.dispatch.OnComplete.internal(Future.scala:245)
>   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:175)
>   at akka.dispatch.japi$CallbackBridge.apply(Future.scala:172)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
>   at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at 
> akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.Exception: Failed to trigger savepoint: Not all required 
> tasks are currently running.
>   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.triggerSavepoint(CheckpointCoordinator.java:382)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:800)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> org.apache.flink.runtime.testingUtils.TestingJobManagerLike$$anonfun$handleTestingMessage$1.applyOrElse(TestingJobManagerLike.scala:95)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
>   at 
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:38)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>   at org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-6645) JobMasterTest failed on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-6645:

Fix Version/s: 1.5.0

> JobMasterTest failed on Travis
> --
>
> Key: FLINK-6645
> URL: https://issues.apache.org/jira/browse/FLINK-6645
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> {code}
> Failed tests: 
>   JobMasterTest.testHeartbeatTimeoutWithResourceManager:220 
> Wanted but not invoked:
> resourceManagerGateway.disconnectJobManager(
> be75c925204aede002136b15238f88b4,
> 
> );
> -> at 
> org.apache.flink.runtime.jobmaster.JobMasterTest.testHeartbeatTimeoutWithResourceManager(JobMasterTest.java:220)
> However, there were other interactions with this mock:
> resourceManagerGateway.registerJobManager(
> 82d774de-ed78-4670-9623-2fc6638fbbf9,
> 11b045ea-8b2b-4df0-aa02-ef4922dfc632,
> jm,
> "b442340a-7d3d-49d8-b440-1a93a5b43bb6",
> be75c925204aede002136b15238f88b4,
> 100 ms
> );
> -> at 
> org.apache.flink.runtime.jobmaster.JobMaster$ResourceManagerConnection$1.invokeRegistration(JobMaster.java:1051)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-6645) JobMasterTest failed on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-6645:

Priority: Blocker  (was: Major)

> JobMasterTest failed on Travis
> --
>
> Key: FLINK-6645
> URL: https://issues.apache.org/jira/browse/FLINK-6645
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.4.0
>Reporter: Chesnay Schepler
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> {code}
> Failed tests: 
>   JobMasterTest.testHeartbeatTimeoutWithResourceManager:220 
> Wanted but not invoked:
> resourceManagerGateway.disconnectJobManager(
> be75c925204aede002136b15238f88b4,
> 
> );
> -> at 
> org.apache.flink.runtime.jobmaster.JobMasterTest.testHeartbeatTimeoutWithResourceManager(JobMasterTest.java:220)
> However, there were other interactions with this mock:
> resourceManagerGateway.registerJobManager(
> 82d774de-ed78-4670-9623-2fc6638fbbf9,
> 11b045ea-8b2b-4df0-aa02-ef4922dfc632,
> jm,
> "b442340a-7d3d-49d8-b440-1a93a5b43bb6",
> be75c925204aede002136b15238f88b4,
> 100 ms
> );
> -> at 
> org.apache.flink.runtime.jobmaster.JobMaster$ResourceManagerConnection$1.invokeRegistration(JobMaster.java:1051)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-7064) test instability in WordCountMapreduceITCase

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-7064:

Fix Version/s: 1.5.0

> test instability in WordCountMapreduceITCase
> 
>
> Key: FLINK-7064
> URL: https://issues.apache.org/jira/browse/FLINK-7064
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.4.0
>Reporter: Nico Kruber
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> Although already mentioned in FLINK-7004, this does not seem to be fixed yet 
> and has apparently now become an unstable test:
> {code}
> Running org.apache.flink.test.hadoop.mapred.WordCountMapredITCase
> Inflater has been closed
> java.lang.NullPointerException: Inflater has been closed
>   at java.util.zip.Inflater.ensureOpen(Inflater.java:389)
>   at java.util.zip.Inflater.inflate(Inflater.java:257)
>   at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:154)
>   at java.io.BufferedReader.readLine(BufferedReader.java:317)
>   at java.io.BufferedReader.readLine(BufferedReader.java:382)
>   at 
> javax.xml.parsers.FactoryFinder.findJarServiceProvider(FactoryFinder.java:319)
>   at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:255)
>   at 
> javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:121)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2467)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2444)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2361)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:968)
>   at 
> org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2010)
>   at org.apache.hadoop.mapred.JobConf.(JobConf.java:423)
>   at 
> org.apache.flink.hadoopcompatibility.HadoopInputs.readHadoopFile(HadoopInputs.java:63)
>   at 
> org.apache.flink.test.hadoop.mapred.WordCountMapredITCase.internalRun(WordCountMapredITCase.java:79)
>   at 
> org.apache.flink.test.hadoop.mapred.WordCountMapredITCase.testProgram(WordCountMapredITCase.java:67)
>   at 
> org.apache.flink.test.util.JavaProgramTestBase.testJobWithObjectReuse(JavaProgramTestBase.java:127)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
>   at 
> 

[jira] [Updated] (FLINK-7064) test instability in WordCountMapreduceITCase

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-7064:

Priority: Blocker  (was: Critical)

> test instability in WordCountMapreduceITCase
> 
>
> Key: FLINK-7064
> URL: https://issues.apache.org/jira/browse/FLINK-7064
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.4.0
>Reporter: Nico Kruber
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> Although already mentioned in FLINK-7004, this does not seem to be fixed yet 
> and has apparently now become an unstable test:
> {code}
> Running org.apache.flink.test.hadoop.mapred.WordCountMapredITCase
> Inflater has been closed
> java.lang.NullPointerException: Inflater has been closed
>   at java.util.zip.Inflater.ensureOpen(Inflater.java:389)
>   at java.util.zip.Inflater.inflate(Inflater.java:257)
>   at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>   at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>   at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>   at java.io.InputStreamReader.read(InputStreamReader.java:184)
>   at java.io.BufferedReader.fill(BufferedReader.java:154)
>   at java.io.BufferedReader.readLine(BufferedReader.java:317)
>   at java.io.BufferedReader.readLine(BufferedReader.java:382)
>   at 
> javax.xml.parsers.FactoryFinder.findJarServiceProvider(FactoryFinder.java:319)
>   at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:255)
>   at 
> javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:121)
>   at 
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2467)
>   at 
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2444)
>   at 
> org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2361)
>   at org.apache.hadoop.conf.Configuration.get(Configuration.java:968)
>   at 
> org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2010)
>   at org.apache.hadoop.mapred.JobConf.(JobConf.java:423)
>   at 
> org.apache.flink.hadoopcompatibility.HadoopInputs.readHadoopFile(HadoopInputs.java:63)
>   at 
> org.apache.flink.test.hadoop.mapred.WordCountMapredITCase.internalRun(WordCountMapredITCase.java:79)
>   at 
> org.apache.flink.test.hadoop.mapred.WordCountMapredITCase.testProgram(WordCountMapredITCase.java:67)
>   at 
> org.apache.flink.test.util.JavaProgramTestBase.testJobWithObjectReuse(JavaProgramTestBase.java:127)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
>   at 
> 

[jira] [Updated] (FLINK-8034) ProcessFailureCancelingITCase.testCancelingOnProcessFailure failing on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8034:

Priority: Blocker  (was: Critical)

> ProcessFailureCancelingITCase.testCancelingOnProcessFailure failing on Travis
> -
>
> Key: FLINK-8034
> URL: https://issues.apache.org/jira/browse/FLINK-8034
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> The {{ProcessFailureCancelingITCase.testCancelingOnProcessFailure}} test is 
> failing spuriously on Travis.
> https://travis-ci.org/tillrohrmann/flink/jobs/299075703



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-7733) test instability in Kafka end-to-end test (RetriableCommitFailedException)

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-7733:

Priority: Blocker  (was: Critical)

> test instability in Kafka end-to-end test (RetriableCommitFailedException)
> --
>
> Key: FLINK-7733
> URL: https://issues.apache.org/jira/browse/FLINK-7733
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.4.0
>Reporter: Nico Kruber
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> In a branch with unrelated changes, the Kafka end-to-end test fails with the 
> following strange entries in the log.
> {code}
> 2017-09-27 17:50:36,777 INFO  org.apache.kafka.common.utils.AppInfoParser 
>   - Kafka version : 0.10.2.1
> 2017-09-27 17:50:36,778 INFO  org.apache.kafka.common.utils.AppInfoParser 
>   - Kafka commitId : e89bffd6b2eff799
> 2017-09-27 17:50:38,492 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Discovered 
> coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 
> 2147483647 rack: null) for group myconsumer.
> 2017-09-27 17:50:38,511 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking 
> the coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 
> 2147483647 rack: null) dead for group myconsumer
> 2017-09-27 17:50:38,618 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Discovered 
> coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 
> 2147483647 rack: null) for group myconsumer.
> 2017-09-27 17:50:41,525 INFO  
> org.apache.flink.runtime.state.DefaultOperatorStateBackend- 
> DefaultOperatorStateBackend snapshot (In-Memory Stream Factory, synchronous 
> part) in thread Thread[Async calls on Source: Custom Source -> Map -> Sink: 
> Unnamed (1/1),5,Flink Task Threads] took 481 ms.
> 2017-09-27 17:50:41,598 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking 
> the coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 
> 2147483647 rack: null) dead for group myconsumer
> 2017-09-27 17:50:41,600 WARN  
> org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  - 
> Committing offsets to Kafka failed. This does not compromise Flink's 
> checkpoints.
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset 
> commit failed with a retriable exception. You should retry committing offsets.
> Caused by: org.apache.kafka.common.errors.DisconnectException
> 2017-09-27 17:50:41,608 ERROR 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  - Async 
> Kafka commit failed.
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset 
> commit failed with a retriable exception. You should retry committing offsets.
> Caused by: org.apache.kafka.common.errors.DisconnectException
> {code}
> https://travis-ci.org/NicoK/flink/jobs/280477275



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8163) NonHAQueryableStateFsBackendITCase test getting stuck on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8163:

Priority: Blocker  (was: Critical)

> NonHAQueryableStateFsBackendITCase test getting stuck on Travis
> ---
>
> Key: FLINK-8163
> URL: https://issues.apache.org/jira/browse/FLINK-8163
> Project: Flink
>  Issue Type: Bug
>  Components: Queryable State, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Kostas Kloudas
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> The {{NonHAQueryableStateFsBackendITCase}} test seems to get stuck on Travis, 
> producing no output for 300s.
> https://travis-ci.org/tillrohrmann/flink/jobs/307988209



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8034) ProcessFailureCancelingITCase.testCancelingOnProcessFailure failing on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8034:

Fix Version/s: 1.5.0

> ProcessFailureCancelingITCase.testCancelingOnProcessFailure failing on Travis
> -
>
> Key: FLINK-8034
> URL: https://issues.apache.org/jira/browse/FLINK-8034
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> The {{ProcessFailureCancelingITCase.testCancelingOnProcessFailure}} test is 
> failing spuriously on Travis.
> https://travis-ci.org/tillrohrmann/flink/jobs/299075703



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-7733) test instability in Kafka end-to-end test (RetriableCommitFailedException)

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-7733:

Fix Version/s: 1.5.0

> test instability in Kafka end-to-end test (RetriableCommitFailedException)
> --
>
> Key: FLINK-7733
> URL: https://issues.apache.org/jira/browse/FLINK-7733
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.4.0
>Reporter: Nico Kruber
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> In a branch with unrelated changes, the Kafka end-to-end test fails with the 
> following strange entries in the log.
> {code}
> 2017-09-27 17:50:36,777 INFO  org.apache.kafka.common.utils.AppInfoParser 
>   - Kafka version : 0.10.2.1
> 2017-09-27 17:50:36,778 INFO  org.apache.kafka.common.utils.AppInfoParser 
>   - Kafka commitId : e89bffd6b2eff799
> 2017-09-27 17:50:38,492 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Discovered 
> coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 
> 2147483647 rack: null) for group myconsumer.
> 2017-09-27 17:50:38,511 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking 
> the coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 
> 2147483647 rack: null) dead for group myconsumer
> 2017-09-27 17:50:38,618 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Discovered 
> coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 
> 2147483647 rack: null) for group myconsumer.
> 2017-09-27 17:50:41,525 INFO  
> org.apache.flink.runtime.state.DefaultOperatorStateBackend- 
> DefaultOperatorStateBackend snapshot (In-Memory Stream Factory, synchronous 
> part) in thread Thread[Async calls on Source: Custom Source -> Map -> Sink: 
> Unnamed (1/1),5,Flink Task Threads] took 481 ms.
> 2017-09-27 17:50:41,598 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking 
> the coordinator testing-gce-b6810926-7003-4390-9002-178739bb4946:9092 (id: 
> 2147483647 rack: null) dead for group myconsumer
> 2017-09-27 17:50:41,600 WARN  
> org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  - 
> Committing offsets to Kafka failed. This does not compromise Flink's 
> checkpoints.
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset 
> commit failed with a retriable exception. You should retry committing offsets.
> Caused by: org.apache.kafka.common.errors.DisconnectException
> 2017-09-27 17:50:41,608 ERROR 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  - Async 
> Kafka commit failed.
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset 
> commit failed with a retriable exception. You should retry committing offsets.
> Caused by: org.apache.kafka.common.errors.DisconnectException
> {code}
> https://travis-ci.org/NicoK/flink/jobs/280477275



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8336) YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8336:

Priority: Blocker  (was: Critical)

> YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability
> ---
>
> Key: FLINK-8336
> URL: https://issues.apache.org/jira/browse/FLINK-8336
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystem, Tests, YARN
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0, 1.4.2
>
>
> The {{YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3}} fails on 
> Travis. I suspect that this has something to do with the consistency 
> guarantees S3 gives us.
> https://travis-ci.org/tillrohrmann/flink/jobs/323930297



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8073) Test instability FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint()

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8073:

Fix Version/s: 1.5.0

> Test instability 
> FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint()
> -
>
> Key: FLINK-8073
> URL: https://issues.apache.org/jira/browse/FLINK-8073
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Kostas Kloudas
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> Travis log: https://travis-ci.org/kl0u/flink/jobs/301985988



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8337) GatherSumApplyITCase.testConnectedComponentsWithObjectReuseDisabled instable

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8337:

Priority: Blocker  (was: Critical)

> GatherSumApplyITCase.testConnectedComponentsWithObjectReuseDisabled instable
> 
>
> Key: FLINK-8337
> URL: https://issues.apache.org/jira/browse/FLINK-8337
> Project: Flink
>  Issue Type: Bug
>  Components: Gelly, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0, 1.4.2
>
>
> The {{GatherSumApplyITCase.testConnectedComponentsWithObjectReuseDisabled}} 
> fails on Travis. It looks as if a sub partition has not been registered at 
> the task event dispatcher.
> https://travis-ci.org/apache/flink/jobs/323930301



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8073) Test instability FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint()

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8073:

Priority: Blocker  (was: Critical)

> Test instability 
> FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint()
> -
>
> Key: FLINK-8073
> URL: https://issues.apache.org/jira/browse/FLINK-8073
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Kostas Kloudas
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> Travis log: https://travis-ci.org/kl0u/flink/jobs/301985988



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8163) NonHAQueryableStateFsBackendITCase test getting stuck on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8163:

Fix Version/s: 1.5.0

> NonHAQueryableStateFsBackendITCase test getting stuck on Travis
> ---
>
> Key: FLINK-8163
> URL: https://issues.apache.org/jira/browse/FLINK-8163
> Project: Flink
>  Issue Type: Bug
>  Components: Queryable State, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Kostas Kloudas
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> The {{NonHAQueryableStateFsBackendITCase}} test seems to get stuck on Travis, 
> producing no output for 300s.
> https://travis-ci.org/tillrohrmann/flink/jobs/307988209



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8460) UtilsTest.testYarnFlinkResourceManagerJobManagerLostLeadership instable on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8460:

Fix Version/s: 1.5.0

> UtilsTest.testYarnFlinkResourceManagerJobManagerLostLeadership instable on 
> Travis
> -
>
> Key: FLINK-8460
> URL: https://issues.apache.org/jira/browse/FLINK-8460
> Project: Flink
>  Issue Type: Bug
>  Components: Tests, YARN
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> There is a test instability in 
> {{UtilsTest.testYarnFlinkResourceManagerJobManagerLostLeadership}} on Travis.
>  
> https://api.travis-ci.org/v3/job/330406729/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8460) UtilsTest.testYarnFlinkResourceManagerJobManagerLostLeadership instable on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8460:

Priority: Blocker  (was: Critical)

> UtilsTest.testYarnFlinkResourceManagerJobManagerLostLeadership instable on 
> Travis
> -
>
> Key: FLINK-8460
> URL: https://issues.apache.org/jira/browse/FLINK-8460
> Project: Flink
>  Issue Type: Bug
>  Components: Tests, YARN
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> There is a test instability in 
> {{UtilsTest.testYarnFlinkResourceManagerJobManagerLostLeadership}} on Travis.
>  
> https://api.travis-ci.org/v3/job/330406729/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8402) HadoopS3FileSystemITCase.testDirectoryListing fails on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8402:

Priority: Blocker  (was: Critical)

> HadoopS3FileSystemITCase.testDirectoryListing fails on Travis
> -
>
> Key: FLINK-8402
> URL: https://issues.apache.org/jira/browse/FLINK-8402
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystem, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0, 1.4.2
>
>
> The test {{HadoopS3FileSystemITCase.testDirectoryListing}} fails on Travis.
> https://travis-ci.org/tillrohrmann/flink/jobs/327021175



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8408) YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3n unstable on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8408:

Priority: Blocker  (was: Critical)

> YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3n unstable on Travis
> --
>
> Key: FLINK-8408
> URL: https://issues.apache.org/jira/browse/FLINK-8408
> Project: Flink
>  Issue Type: Bug
>  Components: FileSystem, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0, 1.4.2
>
>
> The {{YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3n}} is unstable 
> on Travis.
> https://travis-ci.org/tillrohrmann/flink/jobs/327216460



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8452) BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter instable on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8452:

Priority: Blocker  (was: Critical)

> BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter instable on 
> Travis
> 
>
> Key: FLINK-8452
> URL: https://issues.apache.org/jira/browse/FLINK-8452
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> The {{BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter}} test 
> seems to be unstable on Travis:
>  
> https://travis-ci.org/tillrohrmann/flink/jobs/330261310



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8452) BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter instable on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8452:

Fix Version/s: 1.5.0

> BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter instable on 
> Travis
> 
>
> Key: FLINK-8452
> URL: https://issues.apache.org/jira/browse/FLINK-8452
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> The {{BucketingSinkTest.testNonRollingAvroKeyValueWithCompressionWriter}} test 
> seems to be unstable on Travis:
>  
> https://travis-ci.org/tillrohrmann/flink/jobs/330261310



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8418) Kafka08ITCase.testStartFromLatestOffsets() times out on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8418:

Priority: Blocker  (was: Critical)

> Kafka08ITCase.testStartFromLatestOffsets() times out on Travis
> --
>
> Key: FLINK-8418
> URL: https://issues.apache.org/jira/browse/FLINK-8418
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0, 1.4.2
>
>
> Instance: https://travis-ci.org/kl0u/flink/builds/327733085



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8521) FlinkKafkaProducer011ITCase.testRunOutOfProducersInThePool timed out on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8521:

Priority: Blocker  (was: Critical)

> FlinkKafkaProducer011ITCase.testRunOutOfProducersInThePool timed out on Travis
> --
>
> Key: FLINK-8521
> URL: https://issues.apache.org/jira/browse/FLINK-8521
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0, 1.4.2
>
>
> The {{FlinkKafkaProducer011ITCase.testRunOutOfProducersInThePool}} test timed 
> out on Travis, producing no output for longer than 300s.
>  
> https://travis-ci.org/tillrohrmann/flink/jobs/334642014



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8512) HAQueryableStateFsBackendITCase unstable on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8512:

Priority: Blocker  (was: Critical)

> HAQueryableStateFsBackendITCase unstable on Travis
> --
>
> Key: FLINK-8512
> URL: https://issues.apache.org/jira/browse/FLINK-8512
> Project: Flink
>  Issue Type: Bug
>  Components: Queryable State, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0, 1.4.2
>
>
> {{HAQueryableStateFsBackendITCase}} is unstable on Travis.
> In the logs one can see that there is an {{AssertionError}} in the 
> {{globalEventExecutor-1-1}} {{Thread}}. This indicates that assertion failures 
> on that thread are not properly propagated but simply swallowed.
>  
> https://travis-ci.org/apache/flink/jobs/333250401
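> The swallowing problem is generic to multi-threaded tests: an 
> {{AssertionError}} thrown on a non-test thread (here 
> {{globalEventExecutor-1-1}}) never reaches the JUnit thread. A minimal, 
> framework-agnostic Java sketch of the usual remedy, capturing the failure and 
> re-throwing it on the test thread (the names are illustrative, not the actual 
> test code):
> {code:java}
> import java.util.concurrent.atomic.AtomicReference;
> 
> public class ThreadAssertionDemo {
>     public static void main(String[] args) throws InterruptedException {
>         AtomicReference<Throwable> failure = new AtomicReference<>();
> 
>         Thread worker = new Thread(() -> {
>             try {
>                 // Stand-in for an assertion made on a non-test thread.
>                 throw new AssertionError("failed on worker thread");
>             } catch (Throwable t) {
>                 failure.set(t); // capture instead of letting the thread swallow it
>             }
>         });
>         worker.start();
>         worker.join();
> 
>         // Re-throw on the main (test) thread so the run actually fails.
>         Throwable t = failure.get();
>         if (t != null) {
>             throw new AssertionError("worker thread failed", t);
>         }
>     }
> }
> {code}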



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8517) StaticlyNestedIterationsITCase.testJobWithoutObjectReuse unstable on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8517:

Priority: Blocker  (was: Critical)

> StaticlyNestedIterationsITCase.testJobWithoutObjectReuse unstable on Travis
> ---
>
> Key: FLINK-8517
> URL: https://issues.apache.org/jira/browse/FLINK-8517
> Project: Flink
>  Issue Type: Bug
>  Components: DataSet API, Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0, 1.4.2
>
>
> The {{StaticlyNestedIterationsITCase.testJobWithoutObjectReuse}} test case 
> fails on Travis. This exception might be relevant:
> {code:java}
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:891)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.IllegalStateException: Partition 
> 557b069f2b89f8ba599e6ab0956a3f5a@58f1a6b7d8ae10b9141f17c08d06cecb not 
> registered at task event dispatcher.
>   at 
> org.apache.flink.runtime.io.network.TaskEventDispatcher.subscribeToEvent(TaskEventDispatcher.java:107)
>   at 
> org.apache.flink.runtime.iterative.task.IterationHeadTask.initSuperstepBarrier(IterationHeadTask.java:242)
>   at 
> org.apache.flink.runtime.iterative.task.IterationHeadTask.run(IterationHeadTask.java:266)
>   at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>   at java.lang.Thread.run(Thread.java:748){code}
>  
> https://api.travis-ci.org/v3/job/60156/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8731) TwoInputStreamTaskTest flaky on travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8731:

Issue Type: Bug  (was: Improvement)

> TwoInputStreamTaskTest flaky on travis
> --
>
> Key: FLINK-8731
> URL: https://issues.apache.org/jira/browse/FLINK-8731
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming, Tests
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> https://travis-ci.org/zentol/flink/builds/344307861
> {code}
> Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.479 sec <<< 
> FAILURE! - in org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest
> testOpenCloseAndTimestamps(org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest)
>   Time elapsed: 0.05 sec  <<< ERROR!
> java.lang.Exception: error in task
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskTestHarness.waitForTaskCompletion(StreamTaskTestHarness.java:250)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskTestHarness.waitForTaskCompletion(StreamTaskTestHarness.java:233)
>   at 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTaskTest.testOpenCloseAndTimestamps(TwoInputStreamTaskTest.java:99)
> Caused by: org.mockito.exceptions.misusing.WrongTypeOfReturnValue: 
> Boolean cannot be returned by getChannelIndex()
> getChannelIndex() should return int
> ***
> If you're unsure why you're getting above error read on.
> Due to the nature of the syntax above problem might occur because:
> 1. This exception *might* occur in wrongly written multi-threaded tests.
>Please refer to Mockito FAQ on limitations of concurrency testing.
> 2. A spy is stubbed using when(spy.foo()).then() syntax. It is safer to stub 
> spies - 
>- with doReturn|Throw() family of methods. More in javadocs for 
> Mockito.spy() method.
>   at 
> org.apache.flink.runtime.io.network.partition.consumer.UnionInputGate.waitAndGetNextInputGate(UnionInputGate.java:212)
>   at 
> org.apache.flink.runtime.io.network.partition.consumer.UnionInputGate.getNextBufferOrEvent(UnionInputGate.java:158)
>   at 
> org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:164)
>   at 
> org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:292)
>   at 
> org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run(TwoInputStreamTask.java:115)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:308)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTaskTestHarness$TaskThread.run(StreamTaskTestHarness.java:437)
> {code}
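> For reference, the stubbing style that Mockito's message recommends, which 
> does not evaluate the stubbed method while stubbing, looks as follows (a 
> generic sketch against a hypothetical {{Channel}} interface, not the actual 
> test code):
> {code:java}
> import static org.mockito.Mockito.doReturn;
> import static org.mockito.Mockito.mock;
> 
> public class StubbingStyleDemo {
> 
>     // Hypothetical stand-in for the mocked input-channel type.
>     public interface Channel {
>         int getChannelIndex();
>     }
> 
>     public static void main(String[] args) {
>         Channel channel = mock(Channel.class);
> 
>         // doReturn(..).when(..) does not invoke the stubbed method itself,
>         // so it is the safer form for spies and under concurrency.
>         doReturn(0).when(channel).getChannelIndex();
> 
>         System.out.println(channel.getChannelIndex()); // prints 0
>     }
> }
> {code}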



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8792) bad semantic of method name MessageQueryParameter.convertStringToValue

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378136#comment-16378136
 ] 

ASF GitHub Bot commented on FLINK-8792:
---

GitHub user yanghua opened a pull request:

https://github.com/apache/flink/pull/5587

[FLINK-8792][REST] bad semantic of method name 
MessageQueryParameter.convertStringToValue

## What is the purpose of the change

*This pull request changes a method name to give it a correct semantic*

## Brief change log

  - *changed method name from `convertStringToValue` to 
`convertValueToString`*

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  - The S3 file system connector: (no)

## Documentation

  - Does this pull request introduce a new feature? (no)
  - If yes, how is the feature documented? (not documented)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanghua/flink FLINK-8792

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5587.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5587


commit b9907188e66785cc86d6e3b7c0a825a52b86eb28
Author: vinoyang 
Date:   2018-02-27T06:43:52Z

[FLINK-8792][REST] bad semantic of method name 
MessageQueryParameter.convertStringToValue




>  bad semantic of method name MessageQueryParameter.convertStringToValue 
> 
>
> Key: FLINK-8792
> URL: https://issues.apache.org/jira/browse/FLINK-8792
> Project: Flink
>  Issue Type: Improvement
>  Components: REST
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Minor
>
> the method name
> {code:java}
> MessageQueryParameter.convertStringToValue
> {code}
> should be 
> {code:java}
> convertValueToString
> {code}
> or
> {code:java}
> convertStringFromValue{code}
>  I think 
> {code:java}
> convertValueToString
> {code}
> would be better.
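> To illustrate the naming, a simplified sketch of a query parameter with the 
> conversion pair named consistently (a toy model, not Flink's actual 
> {{MessageQueryParameter}}):
> {code:java}
> // Toy model: the direction of each conversion is in the method name.
> abstract class QueryParameter<X> {
> 
>     // String -> value, used when parsing an incoming request.
>     abstract X convertStringToValue(String value);
> 
>     // Value -> String, used when rendering a request URL. This is the method
>     // that was misleadingly named convertStringToValue before the rename.
>     abstract String convertValueToString(X value);
> }
> 
> class IntQueryParameter extends QueryParameter<Integer> {
>     @Override
>     Integer convertStringToValue(String value) {
>         return Integer.valueOf(value);
>     }
> 
>     @Override
>     String convertValueToString(Integer value) {
>         return String.valueOf(value);
>     }
> }
> {code}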



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Commented] (FLINK-8689) Add runtime support of distinct filter using MapView

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378122#comment-16378122
 ] 

ASF GitHub Bot commented on FLINK-8689:
---

Github user hequn8128 commented on a diff in the pull request:

https://github.com/apache/flink/pull/#discussion_r170828036
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/AggregateUtil.scala
 ---
@@ -1393,6 +1393,21 @@ object AggregateUtil {
        throw new TableException(s"unsupported Function: '${unSupported.getName}'")
      }
    }
+
+  // create distinct accumulator delegate
+  if (aggregateCall.isDistinct) {
--- End diff --

The SQL will be verified by Calcite, which excludes single DISTINCT during the 
SQL parse phase, so maybe we don't have to consider single DISTINCT here.


> Add runtime support of distinct filter using MapView 
> -
>
> Key: FLINK-8689
> URL: https://issues.apache.org/jira/browse/FLINK-8689
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Rong Rong
>Assignee: Rong Rong
>Priority: Major
>
> This ticket should cover distinct aggregate function support to codegen for 
> *AggregateCall*, where the *isDistinct* field is set to true.
> This can be verified using the following SQL, which is not currently 
> producing correct results.
> {code:java}
> SELECT
>   a,
>   SUM(b) OVER (PARTITION BY a ORDER BY proctime ROWS BETWEEN 5 PRECEDING AND 
> CURRENT ROW)
> FROM
>   MyTable{code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8689) Add runtime support of distinct filter using MapView

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378121#comment-16378121
 ] 

ASF GitHub Bot commented on FLINK-8689:
---

Github user hequn8128 commented on a diff in the pull request:

https://github.com/apache/flink/pull/#discussion_r170828027
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/functions/aggfunctions/DistinctAggDelegateFunction.scala
 ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.functions.aggfunctions
+
+import java.util
+
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
+import org.apache.flink.api.java.typeutils.{PojoField, PojoTypeInfo}
+import org.apache.flink.table.api.dataview.MapView
+import org.apache.flink.table.dataview.MapViewTypeInfo
+import org.apache.flink.table.functions.AggregateFunction
+
+class DistinctAccumulator[E, ACC](var mapView: MapView[E, Integer], var realAcc: ACC) {
+  def this() {
+    this(null, null.asInstanceOf[ACC])
+  }
+
+  def getRealAcc: ACC = realAcc
+
+  def canEqual(a: Any): Boolean = a.isInstanceOf[DistinctAccumulator[E, ACC]]
+
+  override def equals(that: Any): Boolean =
+    that match {
+      case that: DistinctAccumulator[E, ACC] => that.canEqual(this) &&
+        this.mapView == that.mapView
+      case _ => false
+    }
+
+}
+
+class DistinctAggDelegateFunction[E, ACC](elementTypeInfo: TypeInformation[_],
+                                          var realAgg: AggregateFunction[_, ACC])
+  extends AggregateFunction[util.Map[E, Integer], DistinctAccumulator[E, ACC]] {
+
+  def getRealAgg: AggregateFunction[_, ACC] = realAgg
+
+  override def createAccumulator(): DistinctAccumulator[E, ACC] = {
+    new DistinctAccumulator[E, ACC](
+      new MapView[E, Integer](
+        elementTypeInfo.asInstanceOf[TypeInformation[E]],
+        BasicTypeInfo.INT_TYPE_INFO),
+      realAgg.createAccumulator())
+  }
+
+  def accumulate(acc: DistinctAccumulator[E, ACC], element: E): Boolean = {
+    if (element != null) {
+      if (acc.mapView.contains(element)) {
+        acc.mapView.put(element, acc.mapView.get(element) + 1)
+        false
+      } else {
+        acc.mapView.put(element, 1)
+        true
+      }
+    } else {
+      false
+    }
+  }
+
+  def accumulate(acc: DistinctAccumulator[E, ACC], element: E, count: Int): Boolean = {
+    if (element != null) {
+      if (acc.mapView.contains(element)) {
+        acc.mapView.put(element, acc.mapView.get(element) + count)
+        false
+      } else {
+        acc.mapView.put(element, count)
+        true
+      }
+    } else {
+      false
+    }
+  }
+
+  def retract(acc: DistinctAccumulator[E, ACC], element: E): Boolean = {
+    if (element != null) {
+      val count = acc.mapView.get(element)
+      if (count == 1) {
+        acc.mapView.remove(element)
+        true
+      } else {
+        acc.mapView.put(element, count - 1)
+        false
+      }
+    } else {
+      false
+    }
+  }
+
+  def resetAccumulator(acc: DistinctAccumulator[E, ACC]): Unit = {
+    acc.mapView.clear()
+  }
+
+  override def getValue(acc: DistinctAccumulator[E, ACC]): util.Map[E, Integer] = {
+    acc.mapView.map
+  }
+
+  override def getAccumulatorType: TypeInformation[DistinctAccumulator[E, ACC]] = {
+    val clazz = classOf[DistinctAccumulator[E, ACC]]
+    val pojoFields = new util.ArrayList[PojoField]
+    pojoFields.add(new PojoField(clazz.getDeclaredField("mapView"),
+      new MapViewTypeInfo[E, Integer](
+        elementTypeInfo.asInstanceOf[TypeInformation[E]], BasicTypeInfo.INT_TYPE_INFO)))
+    pojoFields.add(new PojoField(clazz.getDeclaredField("realAcc"),
+      realAgg.getAccumulatorType))
+    new PojoTypeInfo[DistinctAccumulator[E, ACC]](clazz, pojoFields)
--- End diff --

Makes sense.

[GitHub] flink pull request #5555: [FLINK-8689][table]Add runtime support of distinct...

2018-02-26 Thread hequn8128
Github user hequn8128 commented on a diff in the pull request:

https://github.com/apache/flink/pull/#discussion_r170828036
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/aggregate/AggregateUtil.scala
 ---
@@ -1393,6 +1393,21 @@ object AggregateUtil {
        throw new TableException(s"unsupported Function: '${unSupported.getName}'")
      }
    }
+
+  // create distinct accumulator delegate
+  if (aggregateCall.isDistinct) {
--- End diff --

The SQL will be verified by Calcite, which excludes single DISTINCT during the 
SQL parse phase, so maybe we don't have to consider single DISTINCT here.


---


[GitHub] flink pull request #5555: [FLINK-8689][table]Add runtime support of distinct...

2018-02-26 Thread hequn8128
Github user hequn8128 commented on a diff in the pull request:

https://github.com/apache/flink/pull/#discussion_r170828027
  
--- Diff: 
flink-libraries/flink-table/src/main/scala/org/apache/flink/table/functions/aggfunctions/DistinctAggDelegateFunction.scala
 ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.table.functions.aggfunctions
+
+import java.util
+
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
+import org.apache.flink.api.java.typeutils.{PojoField, PojoTypeInfo}
+import org.apache.flink.table.api.dataview.MapView
+import org.apache.flink.table.dataview.MapViewTypeInfo
+import org.apache.flink.table.functions.AggregateFunction
+
+class DistinctAccumulator[E, ACC](var mapView: MapView[E, Integer], var realAcc: ACC) {
+  def this() {
+    this(null, null.asInstanceOf[ACC])
+  }
+
+  def getRealAcc: ACC = realAcc
+
+  def canEqual(a: Any): Boolean = a.isInstanceOf[DistinctAccumulator[E, ACC]]
+
+  override def equals(that: Any): Boolean =
+    that match {
+      case that: DistinctAccumulator[E, ACC] => that.canEqual(this) &&
+        this.mapView == that.mapView
+      case _ => false
+    }
+
+}
+
+class DistinctAggDelegateFunction[E, ACC](elementTypeInfo: TypeInformation[_],
+                                          var realAgg: AggregateFunction[_, ACC])
+  extends AggregateFunction[util.Map[E, Integer], DistinctAccumulator[E, ACC]] {
+
+  def getRealAgg: AggregateFunction[_, ACC] = realAgg
+
+  override def createAccumulator(): DistinctAccumulator[E, ACC] = {
+    new DistinctAccumulator[E, ACC](
+      new MapView[E, Integer](
+        elementTypeInfo.asInstanceOf[TypeInformation[E]],
+        BasicTypeInfo.INT_TYPE_INFO),
+      realAgg.createAccumulator())
+  }
+
+  def accumulate(acc: DistinctAccumulator[E, ACC], element: E): Boolean = {
+    if (element != null) {
+      if (acc.mapView.contains(element)) {
+        acc.mapView.put(element, acc.mapView.get(element) + 1)
+        false
+      } else {
+        acc.mapView.put(element, 1)
+        true
+      }
+    } else {
+      false
+    }
+  }
+
+  def accumulate(acc: DistinctAccumulator[E, ACC], element: E, count: Int): Boolean = {
+    if (element != null) {
+      if (acc.mapView.contains(element)) {
+        acc.mapView.put(element, acc.mapView.get(element) + count)
+        false
+      } else {
+        acc.mapView.put(element, count)
+        true
+      }
+    } else {
+      false
+    }
+  }
+
+  def retract(acc: DistinctAccumulator[E, ACC], element: E): Boolean = {
+    if (element != null) {
+      val count = acc.mapView.get(element)
+      if (count == 1) {
+        acc.mapView.remove(element)
+        true
+      } else {
+        acc.mapView.put(element, count - 1)
+        false
+      }
+    } else {
+      false
+    }
+  }
+
+  def resetAccumulator(acc: DistinctAccumulator[E, ACC]): Unit = {
+    acc.mapView.clear()
+  }
+
+  override def getValue(acc: DistinctAccumulator[E, ACC]): util.Map[E, Integer] = {
+    acc.mapView.map
+  }
+
+  override def getAccumulatorType: TypeInformation[DistinctAccumulator[E, ACC]] = {
+    val clazz = classOf[DistinctAccumulator[E, ACC]]
+    val pojoFields = new util.ArrayList[PojoField]
+    pojoFields.add(new PojoField(clazz.getDeclaredField("mapView"),
+      new MapViewTypeInfo[E, Integer](
+        elementTypeInfo.asInstanceOf[TypeInformation[E]], BasicTypeInfo.INT_TYPE_INFO)))
+    pojoFields.add(new PojoField(clazz.getDeclaredField("realAcc"),
+      realAgg.getAccumulatorType))
+    new PojoTypeInfo[DistinctAccumulator[E, ACC]](clazz, pojoFields)
--- End diff --

Makes sense.


---
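
For readers following the diff above, here is a compact Java rendering of the
distinct-counting scheme, assuming a plain HashMap in place of Flink's
state-backed MapView (the class name is illustrative):

{code:java}
import java.util.HashMap;
import java.util.Map;

final class DistinctCounter<E> {

    // element -> number of copies currently accumulated
    private final Map<E, Integer> counts = new HashMap<>();

    // Returns true iff the element is seen for the first time, i.e. the
    // wrapped "real" aggregate should also accumulate it.
    boolean accumulate(E element) {
        if (element == null) {
            return false;
        }
        Integer previous = counts.put(element, counts.getOrDefault(element, 0) + 1);
        return previous == null;
    }

    // Returns true iff the last copy of the element is removed, i.e. the
    // wrapped aggregate should also retract it.
    boolean retract(E element) {
        if (element == null) {
            return false;
        }
        Integer count = counts.get(element);
        if (count == null) {
            return false; // retract without a matching accumulate; ignore
        }
        if (count == 1) {
            counts.remove(element);
            return true;
        }
        counts.put(element, count - 1);
        return false;
    }
}
{code}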


[jira] [Updated] (FLINK-8792) bad semantic of method name MessageQueryParameter.convertStringToValue

2018-02-26 Thread vinoyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated FLINK-8792:

Description: 
the method name
{code:java}
MessageQueryParameter.convertStringToValue
{code}
should be 
{code:java}
convertValueToString
{code}
or
{code:java}
convertStringFromValue{code}
 I think 
{code:java}
convertValueToString
{code}
would be better.

  was:the method name 


>  bad semantic of method name MessageQueryParameter.convertStringToValue 
> 
>
> Key: FLINK-8792
> URL: https://issues.apache.org/jira/browse/FLINK-8792
> Project: Flink
>  Issue Type: Improvement
>  Components: REST
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Minor
>
> the method name
> {code:java}
> MessageQueryParameter.convertStringToValue
> {code}
> should be 
> {code:java}
> convertValueToString
> {code}
> or
> {code:java}
> convertStringFromValue{code}
>  I think 
> {code:java}
> convertValueToString
> {code}
> would be better.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8792) bad semantic of method name MessageQueryParameter.convertStringToValue

2018-02-26 Thread vinoyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated FLINK-8792:

Description: the method name   (was: the )

>  bad semantic of method name MessageQueryParameter.convertStringToValue 
> 
>
> Key: FLINK-8792
> URL: https://issues.apache.org/jira/browse/FLINK-8792
> Project: Flink
>  Issue Type: Improvement
>  Components: REST
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Minor
>
> the method name 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8792) bad semantic of method MessageQueryParameter.convertStringToValue

2018-02-26 Thread vinoyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated FLINK-8792:

Description: the 

>  bad semantic of method MessageQueryParameter.convertStringToValue 
> ---
>
> Key: FLINK-8792
> URL: https://issues.apache.org/jira/browse/FLINK-8792
> Project: Flink
>  Issue Type: Improvement
>  Components: REST
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Minor
>
> the 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8792) bad semantic of method name MessageQueryParameter.convertStringToValue

2018-02-26 Thread vinoyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang updated FLINK-8792:

Summary:  bad semantic of method name 
MessageQueryParameter.convertStringToValue   (was:  bad semantic of method 
MessageQueryParameter.convertStringToValue )

>  bad semantic of method name MessageQueryParameter.convertStringToValue 
> 
>
> Key: FLINK-8792
> URL: https://issues.apache.org/jira/browse/FLINK-8792
> Project: Flink
>  Issue Type: Improvement
>  Components: REST
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Minor
>
> the 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-8543) Output Stream closed at org.apache.hadoop.fs.s3a.S3AOutputStream.checkOpen

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed FLINK-8543.
---
   Resolution: Fixed
Fix Version/s: 1.4.3

Fixed on release-1.4 in
b74c705157fef3e0d305fdd5bf1a006ae0a98666

Fixed on master in
915213c7afaf3f9d04c240f43d88710280d844e3

> Output Stream closed at org.apache.hadoop.fs.s3a.S3AOutputStream.checkOpen
> --
>
> Key: FLINK-8543
> URL: https://issues.apache.org/jira/browse/FLINK-8543
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.4.0
> Environment: IBM Analytics Engine - 
> [https://console.bluemix.net/docs/services/AnalyticsEngine/index.html#introduction]
> The cluster is based on Hortonworks Data Platform 2.6.2. The following 
> components are made available.
> Apache Spark 2.1.1 Hadoop 2.7.3
> Apache Livy 0.3.0
> Knox 0.12.0
> Ambari 2.5.2
> Anaconda with Python 2.7.13 and 3.5.2 
> Jupyter Enterprise Gateway 0.5.0 
> HBase 1.1.2 * 
> Hive 1.2.1 *
> Oozie 4.2.0 *
> Flume 1.5.2 * 
> Tez 0.7.0 * 
> Pig 0.16.0 * 
> Sqoop 1.4.6 * 
> Slider 0.92.0 * 
>Reporter: chris snow
>Priority: Blocker
> Fix For: 1.4.3, 1.5.0
>
> Attachments: Screen Shot 2018-01-30 at 18.34.51.png
>
>
> I'm hitting an issue with my BucketingSink from a streaming job.
>  
> {code:java}
> return new BucketingSink<Tuple2<String, Object>>(path)
>   .setWriter(writer)
>   .setBucketer(new DateTimeBucketer<Tuple2<String, Object>>(formatString));
> {code}
>  
> I can see that a few files have run into issues with uploading to S3:
> !Screen Shot 2018-01-30 at 18.34.51.png!   
> The Flink console output is showing an exception being thrown by 
> S3AOutputStream, so I've grabbed the S3AOutputStream class from my cluster 
> and added some additional logging to the checkOpen() method to log the 'key' 
> just before the exception is thrown:
>  
> {code:java}
> /*
>  * Decompiled with CFR.
>  */
> package org.apache.hadoop.fs.s3a;
> import com.amazonaws.AmazonClientException;
> import com.amazonaws.event.ProgressListener;
> import com.amazonaws.services.s3.model.ObjectMetadata;
> import com.amazonaws.services.s3.model.PutObjectRequest;
> import com.amazonaws.services.s3.transfer.Upload;
> import com.amazonaws.services.s3.transfer.model.UploadResult;
> import java.io.BufferedOutputStream;
> import java.io.File;
> import java.io.FileOutputStream;
> import java.io.IOException;
> import java.io.InterruptedIOException;
> import java.io.OutputStream;
> import java.util.concurrent.atomic.AtomicBoolean;
> import org.apache.hadoop.classification.InterfaceAudience;
> import org.apache.hadoop.classification.InterfaceStability;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.s3a.ProgressableProgressListener;
> import org.apache.hadoop.fs.s3a.S3AFileSystem;
> import org.apache.hadoop.fs.s3a.S3AUtils;
> import org.apache.hadoop.util.Progressable;
> import org.slf4j.Logger;
> @InterfaceAudience.Private
> @InterfaceStability.Evolving
> public class S3AOutputStream
> extends OutputStream {
> private final OutputStream backupStream;
> private final File backupFile;
> private final AtomicBoolean closed = new AtomicBoolean(false);
> private final String key;
> private final Progressable progress;
> private final S3AFileSystem fs;
> public static final Logger LOG = S3AFileSystem.LOG;
> public S3AOutputStream(Configuration conf, S3AFileSystem fs, String key, 
> Progressable progress) throws IOException {
> this.key = key;
> this.progress = progress;
> this.fs = fs;
> this.backupFile = fs.createTmpFileForWrite("output-", -1, conf);
> LOG.debug("OutputStream for key '{}' writing to tempfile: {}", 
> (Object)key, (Object)this.backupFile);
> this.backupStream = new BufferedOutputStream(new 
> FileOutputStream(this.backupFile));
> }
> void checkOpen() throws IOException {
> if (!this.closed.get()) return;
> // vv-- Additional logging --vvv
> LOG.error("OutputStream for key '{}' closed.", (Object)this.key);
> throw new IOException("Output Stream closed");
> }
> @Override
> public void flush() throws IOException {
> this.checkOpen();
> this.backupStream.flush();
> }
> @Override
> public void close() throws IOException {
> if (this.closed.getAndSet(true)) {
> return;
> }
> this.backupStream.close();
> LOG.debug("OutputStream for key '{}' closed. Now beginning upload", 
> (Object)this.key);
> try {
> ObjectMetadata om = 
> this.fs.newObjectMetadata(this.backupFile.length());
> Upload upload = 
> 

[jira] [Created] (FLINK-8792) bad semantic of method MessageQueryParameter.convertStringToValue

2018-02-26 Thread vinoyang (JIRA)
vinoyang created FLINK-8792:
---

 Summary:  bad semantic of method 
MessageQueryParameter.convertStringToValue 
 Key: FLINK-8792
 URL: https://issues.apache.org/jira/browse/FLINK-8792
 Project: Flink
  Issue Type: Improvement
  Components: REST
Reporter: vinoyang
Assignee: vinoyang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8756) Support ClusterClient.getAccumulators() in RestClusterClient

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378093#comment-16378093
 ] 

ASF GitHub Bot commented on FLINK-8756:
---

Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/5573
  
 @GJL I have refactored the code and test case, could you please review the 
new commit? Thanks.


> Support ClusterClient.getAccumulators() in RestClusterClient
> 
>
> Key: FLINK-8756
> URL: https://issues.apache.org/jira/browse/FLINK-8756
> Project: Flink
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.5.0
>Reporter: Aljoscha Krettek
>Assignee: vinoyang
>Priority: Blocker
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5573: [FLINK-8756][Client] Support ClusterClient.getAccumulator...

2018-02-26 Thread yanghua
Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/5573
  
 @GJL I have refactored the code and test case, could you please review the 
new commit? Thanks.


---


[jira] [Closed] (FLINK-8044) Introduce scheduling mechanism to satisfy both state locality and input

2018-02-26 Thread Sihua Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sihua Zhou closed FLINK-8044.
-
  Resolution: Duplicate
Release Note: Already fixed by Stefan Richter.

> Introduce scheduling mechanism to satisfy both state locality and input
> ---
>
> Key: FLINK-8044
> URL: https://issues.apache.org/jira/browse/FLINK-8044
> Project: Flink
>  Issue Type: New Feature
>  Components: Scheduler
>Affects Versions: 1.4.0
>Reporter: Sihua Zhou
>Priority: Major
>
> For local recovery we need a scheduler to satisfy both state locality and 
> input, but the current scheduler allocates resources in topological order. 
> This can produce bad results if we allocate based on both state and input, 
> so some revision of the scheduler is necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8620) Enable shipping custom artifacts to BlobStore and accessing them through DistributedCache

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377993#comment-16377993
 ] 

ASF GitHub Bot commented on FLINK-8620:
---

Github user ifndef-SleePy commented on a diff in the pull request:

https://github.com/apache/flink/pull/5580#discussion_r170804963
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java
 ---
@@ -314,6 +315,9 @@ public static TaskExecutor startTaskManager(
 
        TaskManagerConfiguration taskManagerConfiguration = TaskManagerConfiguration.fromConfiguration(configuration);

+       final FileCache fileCache = new FileCache(taskManagerServicesConfiguration.getTmpDirPaths(),
--- End diff --

It seems that the `fileCache` here is not used at all.


> Enable shipping custom artifacts to BlobStore and accessing them through 
> DistributedCache
> -
>
> Key: FLINK-8620
> URL: https://issues.apache.org/jira/browse/FLINK-8620
> Project: Flink
>  Issue Type: New Feature
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Major
>
> We should be able to distribute custom files to taskmanagers. To do that we 
> can store those files in BlobStore and later on access them in TaskManagers 
> through DistributedCache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5580: [FLINK-8620] Enable shipping custom files to BlobS...

2018-02-26 Thread ifndef-SleePy
Github user ifndef-SleePy commented on a diff in the pull request:

https://github.com/apache/flink/pull/5580#discussion_r170804963
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java
 ---
@@ -314,6 +315,9 @@ public static TaskExecutor startTaskManager(
 
        TaskManagerConfiguration taskManagerConfiguration = TaskManagerConfiguration.fromConfiguration(configuration);

+       final FileCache fileCache = new FileCache(taskManagerServicesConfiguration.getTmpDirPaths(),
--- End diff --

It seems that the `fileCache` here is not used at all.


---


[jira] [Commented] (FLINK-8245) Add some interval between logs when requested resources are not available

2018-02-26 Thread Wind (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377964#comment-16377964
 ] 

Wind commented on FLINK-8245:
-

Could anyone assign this to me, if possible?

> Add some interval between logs when requested resources are not available
> -
>
> Key: FLINK-8245
> URL: https://issues.apache.org/jira/browse/FLINK-8245
> Project: Flink
>  Issue Type: Improvement
>Reporter: Wind
>Priority: Minor
>  Labels: starter
>
> Maybe we can add a Thread.sleep(1000) after this log?
> I get plenty of repeated log lines, and the disk will soon run out of space 
> when the YARN cluster is short of resources. Is that acceptable?
> {code:java}
> LOG.info("Deployment took more than 60 seconds. Please check if the requested 
> resources are available in the YARN cluster");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
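
A minimal Java sketch of the throttling idea, assuming an SLF4J Logger and
hypothetical class and method names. Checking an interval instead of calling
Thread.sleep(1000) keeps the calling thread unblocked:

{code:java}
import org.slf4j.Logger;

final class ThrottledDeploymentWarning {

    // Emit the warning at most once per minute; the interval is illustrative.
    private static final long LOG_INTERVAL_NANOS = 60_000_000_000L;

    private boolean loggedOnce = false;
    private long lastLogNanos;

    void maybeWarn(Logger log) {
        long now = System.nanoTime();
        // Log on the first call and then at most once per interval, instead
        // of sleeping between attempts.
        if (!loggedOnce || now - lastLogNanos >= LOG_INTERVAL_NANOS) {
            loggedOnce = true;
            lastLogNanos = now;
            log.info("Deployment took more than 60 seconds. Please check if the"
                + " requested resources are available in the YARN cluster");
        }
    }
}
{code}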


[jira] [Updated] (FLINK-8245) Add some interval between logs when requested resources are not available

2018-02-26 Thread Wind (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wind updated FLINK-8245:

Labels: starter  (was: )

> Add some interval between logs when requested resources are not available
> -
>
> Key: FLINK-8245
> URL: https://issues.apache.org/jira/browse/FLINK-8245
> Project: Flink
>  Issue Type: Improvement
>Reporter: Wind
>Priority: Minor
>  Labels: starter
>
> Maybe we can add a Thread.sleep(1000) after this log?
> I get plenty of repeated log lines, and the disk will soon run out of space 
> when the YARN cluster is short of resources. Is that acceptable?
> {code:java}
> LOG.info("Deployment took more than 60 seconds. Please check if the requested 
> resources are available in the YARN cluster");
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8753) Introduce savepoint that goes through the incremental checkpoint path

2018-02-26 Thread Sihua Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sihua Zhou updated FLINK-8753:
--
Summary: Introduce savepoint that goes through the incremental checkpoint path 
 (was: Introduce Incremental savepoint)

> Introduce savepoint that goes through the incremental checkpoint path
> --
>
> Key: FLINK-8753
> URL: https://issues.apache.org/jira/browse/FLINK-8753
> Project: Flink
>  Issue Type: New Feature
>  Components: State Backends, Checkpointing
>Affects Versions: 1.5.0
>Reporter: Sihua Zhou
>Assignee: Sihua Zhou
>Priority: Major
>
> Right now, a savepoint goes through the full checkpoint path, so taking a 
> savepoint can be slow. In our production environment, for some long-running 
> jobs it often takes more than 10 minutes to complete a savepoint, which is 
> unacceptable for a real-time job, so we currently have to fall back to 
> externalized checkpoints instead. But an externalized checkpoint can lag 
> behind the latest state by up to one checkpoint interval. So I propose to 
> introduce an incremental savepoint which goes through the incremental 
> checkpoint path.
> Any advice would be appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-8753) Introduce Incremental savepoint

2018-02-26 Thread Sihua Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377908#comment-16377908
 ] 

Sihua Zhou edited comment on FLINK-8753 at 2/27/18 2:26 AM:


[~StephanEwen] Thanks for your reply. Indeed, what I am trying to achieve is 
just a faster savepoint that does not need to iterate over all records one by 
one (along with some condition checks that make it slow for huge data sets). 
What you described is very close to what I want. The reason I didn't use the 
word `checkpoint` is that a checkpoint is not guaranteed to support rescaling 
(this can be found in the 
[flink-doc|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#difference-to-savepoints]
 and the comment in PR [5490|https://github.com/apache/flink/pull/5490]), and 
rescaling is usually the reason we trigger a savepoint in the first place. An 
interesting thing I found is that, in the current implementation, checkpoints 
do support rescaling; I checked that both in the code and in practice ... I 
wonder whether the "archive checkpoint" that you mentioned would be guaranteed 
to support rescaling?

About the implementation, I think maybe this issue's title is incorrect ... I 
just want to implement a savepoint which goes through the incremental 
checkpoint path but treats the `baseSstFile` as empty (which effectively just 
uploads the local RocksDB snapshot to DFS)...


was (Author: sihuazhou):
[~StephanEwen] Thanks for your reply. Indeed, what I am trying to achieve is 
just a faster savepoint that does not  to iterate all records one by one (along 
with some condition check that make it slow for huge data). And yes what you 
are described is very close to what I wanted but I didn't use the word 
`checkpoint` is that: checkpoint doesn't guarantee to support rescaling (this 
can be found on 
[flink-doc|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#difference-to-savepoints]
 and the comment in this PR [5490|https://github.com/apache/flink/pull/5490]), 
which is always the purpose that we trigger a savepoint. An interesting thing I 
found is that, in the implementation checkpoint also support rescaling, I 
checked that both in code and in practice ... I wonder whether the "archive 
checkpoint" guarantee to support rescaling? 

At bout the implementation, I think maybe this issue's title incorrect ... I 
just want to implement the save point which go though the incremental 
checkpoint path but treat the `baseSstFile` as empty ( which is look like just 
submit the local RocksDB snapshot on to DFS).

> Introduce Incremental savepoint
> ---
>
> Key: FLINK-8753
> URL: https://issues.apache.org/jira/browse/FLINK-8753
> Project: Flink
>  Issue Type: New Feature
>  Components: State Backends, Checkpointing
>Affects Versions: 1.5.0
>Reporter: Sihua Zhou
>Assignee: Sihua Zhou
>Priority: Major
>
> Right now, a savepoint goes through the full checkpoint path, so taking a 
> savepoint can be slow. In our production environment, for some long-running 
> jobs it often takes more than 10 minutes to complete a savepoint, which is 
> unacceptable for a real-time job, so we currently have to fall back to 
> externalized checkpoints instead. But an externalized checkpoint can lag 
> behind the latest state by up to one checkpoint interval. So I propose to 
> introduce an incremental savepoint which goes through the incremental 
> checkpoint path.
> Any advice would be appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8753) Introduce Incremental savepoint

2018-02-26 Thread Sihua Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377908#comment-16377908
 ] 

Sihua Zhou commented on FLINK-8753:
---

[~StephanEwen] Thanks for your reply. Indeed, what I am trying to achieve is 
just a faster savepoint that does not iterate over all records one by one 
(along with some condition checks that make it slow for huge data sets). 
What you described is very close to what I want. The reason I didn't use the 
word `checkpoint` is that a checkpoint is not guaranteed to support rescaling 
(this can be found in the 
[flink-doc|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#difference-to-savepoints]
 and the comment in PR [5490|https://github.com/apache/flink/pull/5490]), and 
rescaling is usually the reason we trigger a savepoint in the first place. An 
interesting thing I found is that, in the current implementation, checkpoints 
do support rescaling; I checked that both in the code and in practice ... I 
wonder whether the "archive checkpoint" would be guaranteed to support 
rescaling?

About the implementation, I think maybe this issue's title is incorrect ... I 
just want to implement a savepoint which goes through the incremental 
checkpoint path but treats the `baseSstFile` as empty (which effectively just 
uploads the local RocksDB snapshot to DFS).

> Introduce Incremental savepoint
> ---
>
> Key: FLINK-8753
> URL: https://issues.apache.org/jira/browse/FLINK-8753
> Project: Flink
>  Issue Type: New Feature
>  Components: State Backends, Checkpointing
>Affects Versions: 1.5.0
>Reporter: Sihua Zhou
>Assignee: Sihua Zhou
>Priority: Major
>
> Right now, a savepoint goes through the full checkpoint path, so taking a 
> savepoint can be slow. In our production environment, for some long-running 
> jobs it often takes more than 10 minutes to complete a savepoint, which is 
> unacceptable for a real-time job, so we currently have to fall back to 
> externalized checkpoints instead. But an externalized checkpoint can lag 
> behind the latest state by up to one checkpoint interval. So I propose to 
> introduce an incremental savepoint which goes through the incremental 
> checkpoint path.
> Any advice would be appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-7795) Utilize error-prone to discover common coding mistakes

2018-02-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345955#comment-16345955
 ] 

Ted Yu edited comment on FLINK-7795 at 2/27/18 2:08 AM:


error-prone has JDK 8 dependency.


was (Author: yuzhih...@gmail.com):
error-prone has JDK 8 dependency .

> Utilize error-prone to discover common coding mistakes
> --
>
> Key: FLINK-7795
> URL: https://issues.apache.org/jira/browse/FLINK-7795
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Ted Yu
>Priority: Major
>
> http://errorprone.info/ is a tool which detects common coding mistakes.
> We should incorporate it into the Flink build process.
> Here are the dependencies:
> {code}
> <dependency>
>   <groupId>com.google.errorprone</groupId>
>   <artifactId>error_prone_annotation</artifactId>
>   <version>${error-prone.version}</version>
>   <scope>provided</scope>
> </dependency>
> <dependency>
>   <groupId>com.google.auto.service</groupId>
>   <artifactId>auto-service</artifactId>
>   <version>1.0-rc3</version>
>   <optional>true</optional>
> </dependency>
> <dependency>
>   <groupId>com.google.errorprone</groupId>
>   <artifactId>error_prone_check_api</artifactId>
>   <version>${error-prone.version}</version>
>   <scope>provided</scope>
>   <exclusions>
>     <exclusion>
>       <groupId>com.google.code.findbugs</groupId>
>       <artifactId>jsr305</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> <dependency>
>   <groupId>com.google.errorprone</groupId>
>   <artifactId>javac</artifactId>
>   <version>9-dev-r4023-3</version>
>   <scope>provided</scope>
> </dependency>
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-7897) Consider using nio.Files for file deletion in TransientBlobCleanupTask

2018-02-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-7897:
--
Description: 
nio.Files#delete() provides a better clue as to why a deletion may fail:

https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#delete(java.nio.file.Path)

Depending on the potential exception (FileNotFound), the call to 
localFile.exists() may be skipped.

  was:
nio.Files#delete() provides better clue as to why the deletion may fail:

https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#delete(java.nio.file.Path)


Depending on the potential exception (FileNotFound), the call to 
localFile.exists() may be skipped.


> Consider using nio.Files for file deletion in TransientBlobCleanupTask
> --
>
> Key: FLINK-7897
> URL: https://issues.apache.org/jira/browse/FLINK-7897
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Reporter: Ted Yu
>Priority: Minor
>
> nio.Files#delete() provides a better clue as to why a deletion may fail:
> https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#delete(java.nio.file.Path)
> Depending on the potential exception (FileNotFound), the call to 
> localFile.exists() may be skipped.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
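
A small, self-contained Java sketch of the suggested switch from
java.io.File#delete to java.nio.file.Files#delete; the path below is
illustrative, and this is not Flink's actual TransientBlobCleanupTask code:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BlobDeleteSketch {

    public static void main(String[] args) {
        Path localFile = Paths.get("/tmp/blob-cache/some-transient-blob");
        try {
            Files.delete(localFile);
        } catch (NoSuchFileException e) {
            // Covers the old !localFile.exists() pre-check: nothing to delete.
        } catch (IOException e) {
            // Unlike java.io.File#delete's bare boolean, the exception carries
            // the concrete failure cause (permissions, directory not empty, ...).
            System.err.println("Could not delete " + localFile + ": " + e);
        }
    }
}
{code}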


[jira] [Commented] (FLINK-8335) Upgrade hbase connector dependency to 1.4.1

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377898#comment-16377898
 ] 

ASF GitHub Bot commented on FLINK-8335:
---

Github user tedyu commented on the issue:

https://github.com/apache/flink/pull/5488
  
Should be good after 1.4.1 is filled in


> Upgrade hbase connector dependency to 1.4.1
> ---
>
> Key: FLINK-8335
> URL: https://issues.apache.org/jira/browse/FLINK-8335
> Project: Flink
>  Issue Type: Improvement
>  Components: Batch Connectors and Input/Output Formats
>Reporter: Ted Yu
>Priority: Minor
>
> hbase 1.4.1 has been released.
> 1.4.0 shows speed improvement over previous 1.x releases.
> http://search-hadoop.com/m/HBase/YGbbBxedD1Mnm8t?subj=Re+VOTE+The+second+HBase+1+4+0+release+candidate+RC1+is+available
> This issue is to upgrade the dependency to 1.4.1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5488: [FLINK-8335] [hbase] Upgrade hbase connector dependency t...

2018-02-26 Thread tedyu
Github user tedyu commented on the issue:

https://github.com/apache/flink/pull/5488
  
Should be good after 1.4.1 is filled in


---


[jira] [Updated] (FLINK-8554) Upgrade AWS SDK

2018-02-26 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-8554:
--
Description: 
AWS SDK 1.11.271 fixes a lot of bugs.

One of which would exhibit the following:
{code}
Caused by: java.lang.NullPointerException
at com.amazonaws.metrics.AwsSdkMetrics.getRegion(AwsSdkMetrics.java:729)
at com.amazonaws.metrics.MetricAdmin.getRegion(MetricAdmin.java:67)
at sun.reflect.GeneratedMethodAccessor132.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
{code}

  was:
AWS SDK 1.11.271 fixes a lot of bugs.

One of which would exhibit the following:

{code}
Caused by: java.lang.NullPointerException
at com.amazonaws.metrics.AwsSdkMetrics.getRegion(AwsSdkMetrics.java:729)
at com.amazonaws.metrics.MetricAdmin.getRegion(MetricAdmin.java:67)
at sun.reflect.GeneratedMethodAccessor132.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
{code}


> Upgrade AWS SDK
> ---
>
> Key: FLINK-8554
> URL: https://issues.apache.org/jira/browse/FLINK-8554
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ted Yu
>Priority: Minor
>
> AWS SDK 1.11.271 fixes a lot of bugs.
> One of which would exhibit the following:
> {code}
> Caused by: java.lang.NullPointerException
>   at com.amazonaws.metrics.AwsSdkMetrics.getRegion(AwsSdkMetrics.java:729)
>   at com.amazonaws.metrics.MetricAdmin.getRegion(MetricAdmin.java:67)
>   at sun.reflect.GeneratedMethodAccessor132.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8720) Logging exception with S3 connector and BucketingSink

2018-02-26 Thread dejan miljkovic (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377835#comment-16377835
 ] 

dejan miljkovic commented on FLINK-8720:


Thanks for the suggestions. Removing the Hadoop dependencies from the jar did 
solve the problem!

The application is built on Flink 1.4.1. It looks like more work is needed on 
the class-loading logic. I noticed that the application works in IntelliJ with 
some versions of the jars but does not work when submitted to the cluster.

I tried the "parent-first" option, but that required adding more dependencies 
to pom.xml. I did not proceed with this because it would affect other 
applications that are already deployed.

Once more, thanks a lot for the response and solution.

> Logging exception with S3 connector and BucketingSink
> -
>
> Key: FLINK-8720
> URL: https://issues.apache.org/jira/browse/FLINK-8720
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.4.1
>Reporter: dejan miljkovic
>Priority: Critical
>
> Trying to stream data to S3. The code works from IntelliJ. When submitting the 
> code through the UI on my machine (a single-node cluster started by the 
> start-cluster.sh script), the stack trace below is produced.
>  
> Below is the link to the simple test app that is streaming data to S3. 
> [https://github.com/dmiljkovic/test-flink-bucketingsink-s3]
> The behavior is a bit different, but the same error is produced. The job works 
> only once: if the job is submitted a second time, the stack trace below is 
> produced. If I restart the cluster, the job works, but only the first time.
>  
>  
> org.apache.commons.logging.LogConfigurationException: 
> java.lang.IllegalAccessError: 
> org/apache/commons/logging/impl/LogFactoryImpl$3 (Caused by 
> java.lang.IllegalAccessError: 
> org/apache/commons/logging/impl/LogFactoryImpl$3)
>   at 
> org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:637)
>   at 
> org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
>   at 
> org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
>   at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
>   at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.(PoolingClientConnectionManager.java:76)
>   at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.(PoolingClientConnectionManager.java:102)
>   at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.(PoolingClientConnectionManager.java:88)
>   at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.(PoolingClientConnectionManager.java:96)
>   at 
> com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:26)
>   at 
> com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96)
>   at com.amazonaws.http.AmazonHttpClient.(AmazonHttpClient.java:158)
>   at 
> com.amazonaws.AmazonWebServiceClient.(AmazonWebServiceClient.java:119)
>   at 
> com.amazonaws.services.s3.AmazonS3Client.(AmazonS3Client.java:389)
>   at 
> com.amazonaws.services.s3.AmazonS3Client.(AmazonS3Client.java:371)
>   at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:235)
>   at 
> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1206)
>   at 
> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:411)
>   at 
> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:355)
>   at 
> org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
>   at 
> org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
>   at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:258)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:694)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:682)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalAccessError: 
> org/apache/commons/logging/impl/LogFactoryImpl$3
>   at 
> org.apache.commons.logging.impl.LogFactoryImpl.getParentClassLoader(LogFactoryImpl.java:700)
>   at 
> 

[jira] [Commented] (FLINK-7477) Use "hadoop classpath" to augment classpath when available

2018-02-26 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377679#comment-16377679
 ] 

Ken Krugler commented on FLINK-7477:


The odd stuff (some of which might be bogus)...
 # I had to explicitly add {{kryo.serializers}} as a dependency.
 # Ditto for {{org.jdom:jdom}}, which our {{Tika}} dependency should have 
pulled in transitively, but it was missing.
 # A bunch of stuff with getting integration tests working (including 
{{maven-failsafe-plugin}} and {{build-helper-maven-plugin}} among others), but 
that just happened to be at the same time as the AWS client class issue, so 
unrelated.

Not sure how different our (non-flink) shaded exclusion list wound up being 
from "regular" Flink, here's what it is now:

 
{code:java}
log4j:log4j
org.scala-lang:scala-library
org.scala-lang:scala-compiler
org.scala-lang:scala-reflect
com.data-artisans:flakka-actor_*
com.data-artisans:flakka-remote_*
com.data-artisans:flakka-slf4j_*
io.netty:netty-all
io.netty:netty
commons-fileupload:commons-fileupload
org.apache.avro:avro
commons-collections:commons-collections
org.codehaus.jackson:jackson-core-asl
org.codehaus.jackson:jackson-mapper-asl
com.thoughtworks.paranamer:paranamer
org.xerial.snappy:snappy-java
org.apache.commons:commons-compress
org.tukaani:xz
com.esotericsoftware.kryo:kryo
com.esotericsoftware.minlog:minlog
org.objenesis:objenesis
com.twitter:chill_*
com.twitter:chill-java
commons-lang:commons-lang
junit:junit
org.apache.commons:commons-lang3
org.slf4j:slf4j-api
org.slf4j:slf4j-log4j12
log4j:log4j
org.apache.commons:commons-math
org.apache.sling:org.apache.sling.commons.json
commons-logging:commons-logging
commons-codec:commons-codec
stax:stax-api
com.typesafe:config
org.uncommons.maths:uncommons-maths
com.github.scopt:scopt_*
commons-io:commons-io
commons-cli:commons-cli
{code}

> Use "hadoop classpath" to augment classpath when available
> --
>
> Key: FLINK-7477
> URL: https://issues.apache.org/jira/browse/FLINK-7477
> Project: Flink
>  Issue Type: Bug
>  Components: Startup Shell Scripts
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
> Fix For: 1.4.0
>
>
> Currently, some cloud environments don't properly put the Hadoop jars into 
> {{HADOOP_CLASSPATH}} (or don't set {{HADOOP_CLASSPATH}}) at all. We should 
> check in {{config.sh}} if the {{hadoop}} binary is on the path and augment 
> our {{INTERNAL_HADOOP_CLASSPATHS}} with the result of {{hadoop classpath}} in 
> our scripts.
> This will improve the out-of-box experience of users that otherwise have to 
> manually set {{HADOOP_CLASSPATH}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8791) Fix documentation on how to link dependencies

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377662#comment-16377662
 ] 

ASF GitHub Bot commented on FLINK-8791:
---

GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/5586

[FLINK-8791] [docs] Fix documentation about configuring dependencies

This fixes the incorrect docs currently under [Linking with 
Flink](https://ci.apache.org/projects/flink/flink-docs-master/dev/linking_with_flink.html)
 and [Linking with Optional 
Modules](https://ci.apache.org/projects/flink/flink-docs-master/dev/linking.html)

This adds redirects from the two old incorrect pages to the new correct 
page.

/cc @twalthr and @alpinegizmo happy to hear your thoughts.

For any corrections to spelling errors and formatting fixes, could you add 
a pull request against my pull request branch 
(https://github.com/StephanEwen/incubator-flink/tree/docs_dependencies)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink docs_dependencies

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5586.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5586


commit 2bdda86170c25d0a9fadca65d33077b53b0734fc
Author: Stephan Ewen 
Date:   2018-02-26T15:41:24Z

[FLINK-8791] [docs] Fix documentation about configuring dependencies




> Fix documentation on how to link dependencies
> -
>
> Key: FLINK-8791
> URL: https://issues.apache.org/jira/browse/FLINK-8791
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
> Fix For: 1.5.0
>
>
> The documentation in "Linking with Flink" and "Linking with Optional 
> Dependencies" is very outdated and gives wrong advise to users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5586: [FLINK-8791] [docs] Fix documentation about config...

2018-02-26 Thread StephanEwen
GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/5586

[FLINK-8791] [docs] Fix documentation about configuring dependencies

This fixes the incorrect docs currently under [Linking with 
Flink](https://ci.apache.org/projects/flink/flink-docs-master/dev/linking_with_flink.html)
 and [Linking with Optional 
Modules](https://ci.apache.org/projects/flink/flink-docs-master/dev/linking.html)

This adds redirects from the two old incorrect pages to the new correct 
page.

/cc @twalthr and @alpinegizmo happy to hear your thoughts.

For any corrections to spelling errors and formatting fixes, could you add 
a pull request against my pull request branch 
(https://github.com/StephanEwen/incubator-flink/tree/docs_dependencies)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink docs_dependencies

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5586.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5586


commit 2bdda86170c25d0a9fadca65d33077b53b0734fc
Author: Stephan Ewen 
Date:   2018-02-26T15:41:24Z

[FLINK-8791] [docs] Fix documentation about configuring dependencies




---


[jira] [Created] (FLINK-8791) Fix documentation on how to link dependencies

2018-02-26 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-8791:
---

 Summary: Fix documentation on how to link dependencies
 Key: FLINK-8791
 URL: https://issues.apache.org/jira/browse/FLINK-8791
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.5.0


The documentation in "Linking with Flink" and "Linking with Optional 
Dependencies" is very outdated and gives wrong advise to users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8753) Introduce Incremental savepoint

2018-02-26 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377654#comment-16377654
 ] 

Stephan Ewen commented on FLINK-8753:
-

It sounds like what you are trying to achieve is closer to a checkpoint than to 
a savepoint:

  - the checkpoint is potentially manually triggered
  - the checkpoint is not automatically removed once it is subsumed, but is 
retained
  - one could call it an "archive checkpoint" or "detached checkpoint".

The tricky question to me is: if this thing is incremental, then there needs to 
be some bookkeeping on how many references are made to the individual shared 
state chunks. Someone would still need to hold a reference in the shared state 
registry, or the state chunks will be removed once all other checkpoints stop 
referencing them.

Alternatively, one could mark the chunks as 'detached', meaning they are not 
reference counted any more, but always kept. Then the question is, how can one 
determine how to clean the checkpoint up? The only way I can imagine this to 
work in practice is on file systems that support hard links - in that case, the 
hard links do the ref counting for you.
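
A highly simplified, hypothetical Java sketch of the bookkeeping problem
described above; Flink's actual SharedStateRegistry is more involved, and this
only shows why a detached checkpoint must keep holding a reference:

{code:java}
import java.util.HashMap;
import java.util.Map;

final class SharedChunkRegistrySketch {

    // chunk id -> number of checkpoints currently referencing it
    private final Map<String, Integer> refCounts = new HashMap<>();

    void register(String chunkId) {
        refCounts.merge(chunkId, 1, Integer::sum);
    }

    void unregister(String chunkId) {
        Integer count = refCounts.get(chunkId);
        if (count == null) {
            return; // unknown chunk, nothing to do
        }
        if (count == 1) {
            // Last reference gone: without an extra reference held by a
            // detached/archived checkpoint, the chunk is deleted here.
            refCounts.remove(chunkId);
            deletePhysically(chunkId);
        } else {
            refCounts.put(chunkId, count - 1);
        }
    }

    private void deletePhysically(String chunkId) {
        System.out.println("deleting chunk " + chunkId);
    }
}
{code}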


> Introduce Incremental savepoint
> ---
>
> Key: FLINK-8753
> URL: https://issues.apache.org/jira/browse/FLINK-8753
> Project: Flink
>  Issue Type: New Feature
>  Components: State Backends, Checkpointing
>Affects Versions: 1.5.0
>Reporter: Sihua Zhou
>Assignee: Sihua Zhou
>Priority: Major
>
> Right now, a savepoint goes through the full checkpoint path, so taking a 
> savepoint can be slow. In our production environment, for some long-running 
> jobs it often takes more than 10 minutes to complete a savepoint, which is 
> unacceptable for a real-time job, so we currently have to fall back to 
> externalized checkpoints instead. But an externalized checkpoint can lag 
> behind the latest state by up to one checkpoint interval. So I propose to 
> introduce an incremental savepoint which goes through the incremental 
> checkpoint path.
> Any advice would be appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8770) CompletedCheckPoints stored on ZooKeeper is not up-to-date, when JobManager is restarted it fails to recover the job due to "checkpoint FileNotFound exception"

2018-02-26 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377623#comment-16377623
 ] 

Stephan Ewen commented on FLINK-8770:
-

Thank you for reporting this.

[~till.rohrmann] - do you have any intuition why this might happen?

> CompletedCheckPoints stored on ZooKeeper is not up-to-date, when JobManager 
> is restarted it fails to recover the job due to "checkpoint FileNotFound 
> exception"
> ---
>
> Key: FLINK-8770
> URL: https://issues.apache.org/jira/browse/FLINK-8770
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 1.4.0
>Reporter: Xinyang Gao
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Hi, I am running a Flink cluster (1 JobManager + 6 TaskManagers) in HA mode 
> on OpenShift. I have enabled Chaos Monkey, which kills either the JobManager 
> or one of the TaskManagers every 5 minutes; the ZooKeeper quorum is stable, 
> with no Chaos Monkey enabled for it. Flink reads data from one Kafka topic 
> and writes data into another Kafka topic. Checkpointing is enabled, with a 
> 1000 ms interval. state.checkpoints.num-retained is set to 10. I am using a 
> PVC for the state backend (checkpoints, recovery, etc.), so the checkpoints 
> and states are persistent. 
> The restart strategy for the Flink jobmanager DeploymentConfig is recreate, 
> which means it will kill the old jobmanager container before it starts the 
> new one.
> I first ran the chaos test for one day; however, I have seen the exception:
> org.apache.flink.util.FlinkException: Could not retrieve checkpoint *** from 
> state handle under /***. This indicates that the retrieved state handle is 
> broken. Try cleaning the state handle store. The root cause is a checkpoint 
> FileNotFound exception.
> The Flink job then keeps restarting for a few hours, and due to the above 
> error it cannot be restarted successfully.
> After further investigation, I found the following facts in my PVC:
>  
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 
> completedCheckpoint0ee95157de00
> -rw-r--r--. 1 flink root 11379 Feb 23 01:51 completedCheckpoint498d0952cf00
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint650fe5b021fe
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint66634149683e
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint67f24c3b018e
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint6f64ebf0ae64
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint906ebe1fb337
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint98b79ea14b09
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpointa0d1070e0b6c
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpointbd3a9ba50322
> -rw-r--r--. 1 flink root 11355 Feb 22 17:31 completedCheckpointd433b5e108be
> -rw-r--r--. 1 flink root 11379 Feb 22 22:56 completedCheckpointdd0183ed092b
> -rw-r--r--. 1 flink root 11379 Feb 22 00:00 completedCheckpointe0a5146c3d81
> -rw-r--r--. 1 flink root 11331 Feb 22 17:06 completedCheckpointec82f3ebc2ad
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 
> completedCheckpointf86e460f6720
>  
> The latest 10 checkpoints were created at about 02:10, if you ignore the old 
> checkpoints that were not deleted successfully (which I do not care much 
> about).
>  
> However, when checking ZooKeeper, I see the following in the 
> flink/checkpoints path (I only list one entry, but the other 9 are similar):
> cZxid = 0x160001ff5d
> ��sr;org.apache.flink.runtime.state.RetrievableStreamStateHandle�U�+LwrappedStreamStateHandlet2Lorg/apache/flink/runtime/state/StreamStateHandle;xpsr9org.apache.flink.runtime.state.filesystem.FileStateHandle�u�b�▒▒J
>  
> stateSizefilePathtLorg/apache/flink/core/fs/Path;xp,ssrorg.apache.flink.core.fs.PathLuritLjava/net/URI;xpsr
>  
> java.net.URI�x.C�I�LstringtLjava/lang/String;xpt=file:/mnt/flink-test/recovery/completedCheckpointd004a3753870x
> [zk: localhost:2181(CONNECTED) 7] ctime = Fri Feb 23 02:08:18 UTC 2018
> mZxid = 0x160001ff5d
> mtime = Fri Feb 23 02:08:18 UTC 2018
> pZxid = 0x1d0c6d
> cversion = 31
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 492
>  
> so the latest completedCheckpoints status stored on ZooKeeper is at about 
> 02:08, which implies that the completed 
> checkpoints at 

[jira] [Commented] (FLINK-8784) Share network buffer pool among different partition operators

2018-02-26 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377622#comment-16377622
 ] 

Stephan Ewen commented on FLINK-8784:
-

The choice not to share the network buffer pool, but to partition it, was made 
to guard against starvation effects. Flink can never know in which order or at 
what speed any connection will progress (that depends on the behavior of the 
user code), so reserving enough memory for each connection to be able to do 
its maximum work makes it safe (against deadlocks and against bad performance).

For streaming jobs, I am not sure we should change that.

For batch jobs, we can probably improve the behavior more through better 
scheduling than through adjusting the network stack memory model.
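
To make the partitioning concrete, here is an arithmetic sketch of a per-gate 
buffer budget; this is illustrative only (not Flink API), with the constants 
assumed from the defaults of taskmanager.network.memory.buffers-per-channel 
and taskmanager.network.memory.floating-buffers-per-gate:

{code:java}
// Illustrative only: each input gate reserves a fixed budget up front,
// so no connection can starve another of buffers.
class GateBufferBudget {
    static int buffersForGate(int channels) {
        final int exclusivePerChannel = 2; // assumed default, reserved per channel
        final int floatingPerGate = 8;     // assumed default, shared within the gate
        return channels * exclusivePerChannel + floatingPerGate;
    }
}
{code}

Sharing one pool across gates instead would let a slow or stalled consumer 
hold on to buffers that other connections need, which is exactly the 
starvation risk described above.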

> Share network buffer pool among different partition operators
> -
>
> Key: FLINK-8784
> URL: https://issues.apache.org/jira/browse/FLINK-8784
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Reporter: Garrett Li
>Priority: Major
>
> A logical network connection exists for each point-to-point exchange of data 
> over the network, which typically happens at repartitioning or broadcasting 
> steps (shuffle phase).
> Currently, Flink divides the network buffer pool evenly among all partition 
> operators. This can waste many network buffers, since a single Flink job 
> might contain a lot of repartitioning and broadcasting, and NOT 
> all of the partition operators in the same job have the same traffic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8543) Output Stream closed at org.apache.hadoop.fs.s3a.S3AOutputStream.checkOpen

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377510#comment-16377510
 ] 

ASF GitHub Bot commented on FLINK-8543:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/5563
  
merged


> Output Stream closed at org.apache.hadoop.fs.s3a.S3AOutputStream.checkOpen
> --
>
> Key: FLINK-8543
> URL: https://issues.apache.org/jira/browse/FLINK-8543
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.4.0
> Environment: IBM Analytics Engine - 
> [https://console.bluemix.net/docs/services/AnalyticsEngine/index.html#introduction]
> The cluster is based on Hortonworks Data Platform 2.6.2. The following 
> components are made available.
> Apache Spark 2.1.1 Hadoop 2.7.3
> Apache Livy 0.3.0
> Knox 0.12.0
> Ambari 2.5.2
> Anaconda with Python 2.7.13 and 3.5.2 
> Jupyter Enterprise Gateway 0.5.0 
> HBase 1.1.2 * 
> Hive 1.2.1 *
> Oozie 4.2.0 *
> Flume 1.5.2 * 
> Tez 0.7.0 * 
> Pig 0.16.0 * 
> Sqoop 1.4.6 * 
> Slider 0.92.0 * 
>Reporter: chris snow
>Priority: Blocker
> Fix For: 1.5.0
>
> Attachments: Screen Shot 2018-01-30 at 18.34.51.png
>
>
> I'm hitting an issue with my BucketingSink from a streaming job.
>  
> {code:java}
> return new BucketingSink<Tuple2<String, Object>>(path)
>  .setWriter(writer)
>  .setBucketer(new DateTimeBucketer<Tuple2<String, Object>>(formatString));
> {code}
>  
> I can see that a few files have run into issues with uploading to S3:
> !Screen Shot 2018-01-30 at 18.34.51.png!   
> The Flink console output is showing an exception being thrown by 
> S3AOutputStream, so I've grabbed the S3AOutputStream class from my cluster 
> and added some additional logging to the checkOpen() method to log the 'key' 
> just before the exception is thrown:
>  
> {code:java}
> /*
>  * Decompiled with CFR.
>  */
> package org.apache.hadoop.fs.s3a;
> import com.amazonaws.AmazonClientException;
> import com.amazonaws.event.ProgressListener;
> import com.amazonaws.services.s3.model.ObjectMetadata;
> import com.amazonaws.services.s3.model.PutObjectRequest;
> import com.amazonaws.services.s3.transfer.Upload;
> import com.amazonaws.services.s3.transfer.model.UploadResult;
> import java.io.BufferedOutputStream;
> import java.io.File;
> import java.io.FileOutputStream;
> import java.io.IOException;
> import java.io.InterruptedIOException;
> import java.io.OutputStream;
> import java.util.concurrent.atomic.AtomicBoolean;
> import org.apache.hadoop.classification.InterfaceAudience;
> import org.apache.hadoop.classification.InterfaceStability;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.s3a.ProgressableProgressListener;
> import org.apache.hadoop.fs.s3a.S3AFileSystem;
> import org.apache.hadoop.fs.s3a.S3AUtils;
> import org.apache.hadoop.util.Progressable;
> import org.slf4j.Logger;
> @InterfaceAudience.Private
> @InterfaceStability.Evolving
> public class S3AOutputStream
> extends OutputStream {
> private final OutputStream backupStream;
> private final File backupFile;
> private final AtomicBoolean closed = new AtomicBoolean(false);
> private final String key;
> private final Progressable progress;
> private final S3AFileSystem fs;
> public static final Logger LOG = S3AFileSystem.LOG;
> public S3AOutputStream(Configuration conf, S3AFileSystem fs, String key, 
> Progressable progress) throws IOException {
> this.key = key;
> this.progress = progress;
> this.fs = fs;
> this.backupFile = fs.createTmpFileForWrite("output-", -1, conf);
> LOG.debug("OutputStream for key '{}' writing to tempfile: {}", 
> (Object)key, (Object)this.backupFile);
> this.backupStream = new BufferedOutputStream(new 
> FileOutputStream(this.backupFile));
> }
> void checkOpen() throws IOException {
> if (!this.closed.get()) return;
> // vv-- Additional logging --vvv
> LOG.error("OutputStream for key '{}' closed.", (Object)this.key);
> throw new IOException("Output Stream closed");
> }
> @Override
> public void flush() throws IOException {
> this.checkOpen();
> this.backupStream.flush();
> }
> @Override
> public void close() throws IOException {
> if (this.closed.getAndSet(true)) {
> return;
> }
> this.backupStream.close();
> LOG.debug("OutputStream for key '{}' closed. Now beginning upload", 
> (Object)this.key);
> try {
> ObjectMetadata om = 
> this.fs.newObjectMetadata(this.backupFile.length());
> Upload upload = 
> this.fs.putObject(this.fs.newPutObjectRequest(this.key, om, 

[jira] [Commented] (FLINK-8543) Output Stream closed at org.apache.hadoop.fs.s3a.S3AOutputStream.checkOpen

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377506#comment-16377506
 ] 

ASF GitHub Bot commented on FLINK-8543:
---

Github user aljoscha closed the pull request at:

https://github.com/apache/flink/pull/5563


> Output Stream closed at org.apache.hadoop.fs.s3a.S3AOutputStream.checkOpen
> --
>
> Key: FLINK-8543
> URL: https://issues.apache.org/jira/browse/FLINK-8543
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Affects Versions: 1.4.0
> Environment: IBM Analytics Engine - 
> [https://console.bluemix.net/docs/services/AnalyticsEngine/index.html#introduction]
> The cluster is based on Hortonworks Data Platform 2.6.2. The following 
> components are made available.
> Apache Spark 2.1.1 Hadoop 2.7.3
> Apache Livy 0.3.0
> Knox 0.12.0
> Ambari 2.5.2
> Anaconda with Python 2.7.13 and 3.5.2 
> Jupyter Enterprise Gateway 0.5.0 
> HBase 1.1.2 * 
> Hive 1.2.1 *
> Oozie 4.2.0 *
> Flume 1.5.2 * 
> Tez 0.7.0 * 
> Pig 0.16.0 * 
> Sqoop 1.4.6 * 
> Slider 0.92.0 * 
>Reporter: chris snow
>Priority: Blocker
> Fix For: 1.5.0
>
> Attachments: Screen Shot 2018-01-30 at 18.34.51.png
>
>
> I'm hitting an issue with my BucketingSink from a streaming job.
>  
> {code:java}
> return new BucketingSink<Tuple2<String, Object>>(path)
>  .setWriter(writer)
>  .setBucketer(new DateTimeBucketer<Tuple2<String, Object>>(formatString));
> {code}
>  
> I can see that a few files have run into issues with uploading to S3:
> !Screen Shot 2018-01-30 at 18.34.51.png!   
> The Flink console output is showing an exception being thrown by 
> S3AOutputStream, so I've grabbed the S3AOutputStream class from my cluster 
> and added some additional logging to the checkOpen() method to log the 'key' 
> just before the exception is thrown:
>  
> {code:java}
> /*
>  * Decompiled with CFR.
>  */
> package org.apache.hadoop.fs.s3a;
> import com.amazonaws.AmazonClientException;
> import com.amazonaws.event.ProgressListener;
> import com.amazonaws.services.s3.model.ObjectMetadata;
> import com.amazonaws.services.s3.model.PutObjectRequest;
> import com.amazonaws.services.s3.transfer.Upload;
> import com.amazonaws.services.s3.transfer.model.UploadResult;
> import java.io.BufferedOutputStream;
> import java.io.File;
> import java.io.FileOutputStream;
> import java.io.IOException;
> import java.io.InterruptedIOException;
> import java.io.OutputStream;
> import java.util.concurrent.atomic.AtomicBoolean;
> import org.apache.hadoop.classification.InterfaceAudience;
> import org.apache.hadoop.classification.InterfaceStability;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.s3a.ProgressableProgressListener;
> import org.apache.hadoop.fs.s3a.S3AFileSystem;
> import org.apache.hadoop.fs.s3a.S3AUtils;
> import org.apache.hadoop.util.Progressable;
> import org.slf4j.Logger;
> @InterfaceAudience.Private
> @InterfaceStability.Evolving
> public class S3AOutputStream
> extends OutputStream {
> private final OutputStream backupStream;
> private final File backupFile;
> private final AtomicBoolean closed = new AtomicBoolean(false);
> private final String key;
> private final Progressable progress;
> private final S3AFileSystem fs;
> public static final Logger LOG = S3AFileSystem.LOG;
> public S3AOutputStream(Configuration conf, S3AFileSystem fs, String key, 
> Progressable progress) throws IOException {
> this.key = key;
> this.progress = progress;
> this.fs = fs;
> this.backupFile = fs.createTmpFileForWrite("output-", -1, conf);
> LOG.debug("OutputStream for key '{}' writing to tempfile: {}", 
> (Object)key, (Object)this.backupFile);
> this.backupStream = new BufferedOutputStream(new 
> FileOutputStream(this.backupFile));
> }
> void checkOpen() throws IOException {
> if (!this.closed.get()) return;
> // vv-- Additional logging --vvv
> LOG.error("OutputStream for key '{}' closed.", (Object)this.key);
> throw new IOException("Output Stream closed");
> }
> @Override
> public void flush() throws IOException {
> this.checkOpen();
> this.backupStream.flush();
> }
> @Override
> public void close() throws IOException {
> if (this.closed.getAndSet(true)) {
> return;
> }
> this.backupStream.close();
> LOG.debug("OutputStream for key '{}' closed. Now beginning upload", 
> (Object)this.key);
> try {
> ObjectMetadata om = 
> this.fs.newObjectMetadata(this.backupFile.length());
> Upload upload = 
> this.fs.putObject(this.fs.newPutObjectRequest(this.key, om, 

[GitHub] flink issue #5563: [FLINK-8543] Don't call super.close() in AvroKeyValueSink...

2018-02-26 Thread aljoscha
Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/5563
  
merged


---


[GitHub] flink pull request #5563: [FLINK-8543] Don't call super.close() in AvroKeyVa...

2018-02-26 Thread aljoscha
Github user aljoscha closed the pull request at:

https://github.com/apache/flink/pull/5563


---


[jira] [Commented] (FLINK-8703) Migrate tests from LocalFlinkMiniCluster to MiniClusterResource

2018-02-26 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377472#comment-16377472
 ] 

Chesnay Schepler commented on FLINK-8703:
-

second batch merged to master in 3e056b34f7be817e0d0eee612b6ae44891e33501

> Migrate tests from LocalFlinkMiniCluster to MiniClusterResource
> ---
>
> Key: FLINK-8703
> URL: https://issues.apache.org/jira/browse/FLINK-8703
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Aljoscha Krettek
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8645) Support convenient extension of parent-first ClassLoader patterns

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377470#comment-16377470
 ] 

ASF GitHub Bot commented on FLINK-8645:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5544


> Support convenient extension of parent-first ClassLoader patterns
> -
>
> Key: FLINK-8645
> URL: https://issues.apache.org/jira/browse/FLINK-8645
> Project: Flink
>  Issue Type: Improvement
>  Components: Configuration
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.5.0
>
>
> The option {{classloader.parent-first-patterns}} defines a list of class 
> patterns that should always be loaded through the parent class-loader. The 
> default value contains all classes that are effectively required to be loaded 
> that way for Flink to function.
> This list cannot be extended in a convenient way, as one would have to 
> manually copy the existing default and append new entries. This makes the 
> configuration brittle in light of version upgrades where we may extend the 
> default, and also obfuscates the configuration a bit.
> I propose to separate this option into 
> {{classloader.parent-first-patterns.base}}, which subsumes the existing 
> option, and {{classloader.parent-first-patterns.append}} which is 
> automatically appended to the base.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8593) Latency metric docs are outdated

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377471#comment-16377471
 ] 

ASF GitHub Bot commented on FLINK-8593:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5484


> Latency metric docs are outdated
> 
>
> Key: FLINK-8593
> URL: https://issues.apache.org/jira/browse/FLINK-8593
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, Metrics
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.5.0
>
>
> I forgot to update the latency metric documentation while working on 
> FLINK-7608. The docs should be updated to reflect the new naming scheme and 
> the fact that it is now a job-level metric.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8703) Migrate tests from LocalFlinkMiniCluster to MiniClusterResource

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377469#comment-16377469
 ] 

ASF GitHub Bot commented on FLINK-8703:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5542


> Migrate tests from LocalFlinkMiniCluster to MiniClusterResource
> ---
>
> Key: FLINK-8703
> URL: https://issues.apache.org/jira/browse/FLINK-8703
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Aljoscha Krettek
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8596) Custom command line code does not correctly catch errors

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377468#comment-16377468
 ] 

ASF GitHub Bot commented on FLINK-8596:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5543


> Custom command line code does not correctly catch errors
> 
>
> Key: FLINK-8596
> URL: https://issues.apache.org/jira/browse/FLINK-8596
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.5.0
>Reporter: Aljoscha Krettek
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.5.0
>
>
> The code that is trying to load the YARN CLI is catching {{Exception}} but if 
> the YARN CLI cannot be loaded a {{java.lang.NoClassDefFoundError}} is thrown: 
> https://github.com/apache/flink/blob/0e20b613087e1b326e05674e3d532ea4aa444bc3/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java#L1067
> This means that Flink does not work in Hadoop-free mode. We should fix this 
> and also add an end-to-end test that verifies that Hadoop-free mode works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5543: [FLINK-8596][CLI] Also catch NoClassDefFoundErrors

2018-02-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5543


---


[GitHub] flink pull request #5484: [FLINK-8593][metrics] Update latency metric docs

2018-02-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5484


---


[GitHub] flink pull request #5544: [FLINK-8645][configuration] Split classloader.pare...

2018-02-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5544


---


[GitHub] flink pull request #5542: [FLINK-8703][tests] Migrate tests from LocalFlinkM...

2018-02-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5542


---


[jira] [Closed] (FLINK-8645) Support convenient extension of parent-first ClassLoader patterns

2018-02-26 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-8645.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

master: 50aea889bd88d5ce4a1569569176705f9973a08c 
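
For anyone reading this from the archive, a hedged example of the resulting 
configuration; the base value is illustrative of the shipped default (not 
verbatim), and com.example.shaded. is a hypothetical user addition:

{noformat}
# shipped default, stays untouched across version upgrades
classloader.parent-first-patterns.base: java.;scala.;org.apache.flink.;javax.annotation.;org.slf4j;org.apache.log4j
# user additions are appended instead of copied over the default
classloader.parent-first-patterns.append: com.example.shaded.
{noformat}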

> Support convenient extension of parent-first ClassLoader patterns
> -
>
> Key: FLINK-8645
> URL: https://issues.apache.org/jira/browse/FLINK-8645
> Project: Flink
>  Issue Type: Improvement
>  Components: Configuration
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.5.0
>
>
> The option {{classloader.parent-first-patterns}} defines a list of class 
> patterns that should always be loaded through the parent class-loader. The 
> default value contains all classes that are effectively required to be loaded 
> that way for Flink to function.
> This list cannot be extended in a convenient way, as one would have to 
> manually copy the existing default and append new entries. This makes the 
> configuration brittle in light of version upgrades where we may extend the 
> default, and also obfuscates the configuration a bit.
> I propose to separate this option into 
> {{classloader.parent-first-patterns.base}}, which subsumes the existing 
> option, and {{classloader.parent-first-patterns.append}} which is 
> automatically appended to the base.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-8596) Custom command line code does not correctly catch errors

2018-02-26 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-8596.
---
Resolution: Fixed

master: f3bfd22b41acd4c16f71d6d69f7983abc63a1e71
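
The shape of the fix, as a hedged sketch; the identifiers are abbreviated 
from CliFrontend and this is not the verbatim patch:

{code:java}
// NoClassDefFoundError is an Error, not an Exception, so a plain
// catch (Exception e) misses it when the YARN classes are absent
// (Hadoop-free mode). Catch both and fall back to the default CLI.
try {
    customCommandLines.add(loadCustomCommandLine(flinkYarnSessionCliClass, config));
} catch (NoClassDefFoundError | Exception e) {
    LOG.warn("Could not load CLI class {}.", flinkYarnSessionCliClass, e);
}
{code}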

> Custom command line code does not correctly catch errors
> 
>
> Key: FLINK-8596
> URL: https://issues.apache.org/jira/browse/FLINK-8596
> Project: Flink
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.5.0
>Reporter: Aljoscha Krettek
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.5.0
>
>
> The code that is trying to load the YARN CLI is catching {{Exception}} but if 
> the YARN CLI cannot be loaded a {{java.lang.NoClassDefFoundError}} is thrown: 
> https://github.com/apache/flink/blob/0e20b613087e1b326e05674e3d532ea4aa444bc3/flink-clients/src/main/java/org/apache/flink/client/cli/CliFrontend.java#L1067
> This means that Flink does not work in Hadoop-free mode. We should fix this 
> and also add an end-to-end test that verifies that Hadoop-free mode works.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-8593) Latency metric docs are outdated

2018-02-26 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler closed FLINK-8593.
---
Resolution: Fixed

master: a2d1d084b90f0f47b91aa372525f01a382c892e7

> Latency metric docs are outdated
> 
>
> Key: FLINK-8593
> URL: https://issues.apache.org/jira/browse/FLINK-8593
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, Metrics
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
> Fix For: 1.5.0
>
>
> I forgot to update the latency metric documentation while working on 
> FLINK-7608. The docs should be updated to reflect the new naming scheme and 
> the fact that it is now a job-level metric.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5585: Release 1.4

2018-02-26 Thread ArtHustonHitachi
Github user ArtHustonHitachi closed the pull request at:

https://github.com/apache/flink/pull/5585


---


[GitHub] flink pull request #5585: Release 1.4

2018-02-26 Thread ArtHustonHitachi
GitHub user ArtHustonHitachi opened a pull request:

https://github.com/apache/flink/pull/5585

Release 1.4

*Thank you very much for contributing to Apache Flink - we are happy that 
you want to help us improve Flink. To help the community review your 
contribution in the best possible way, please go through the checklist below, 
which will get the contribution into a shape in which it can be best reviewed.*

*Please understand that we do not do this to make contributions to Flink a 
hassle. In order to uphold a high standard of quality for code contributions, 
while at the same time managing a large number of contributions, we need 
contributors to prepare the contributions well, and give reviewers enough 
contextual information for the review. Please also understand that 
contributions that do not follow this guide will take longer to review and thus 
typically be picked up with lower priority by the community.*

## Contribution Checklist

  - Make sure that the pull request corresponds to a [JIRA 
issue](https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are 
made for typos in JavaDoc or documentation files, which need no JIRA issue.
  
  - Name the pull request in the form "[FLINK-XXXX] [component] Title of 
the pull request", where *FLINK-XXXX* should be replaced by the actual issue 
number. Skip *component* if you are unsure about which is the best component.
  Typo fixes that have no associated JIRA issue should be named following 
this pattern: `[hotfix] [docs] Fix typo in event time introduction` or 
`[hotfix] [javadocs] Expand JavaDoc for PunctuatedWatermarkGenerator`.

  - Fill out the template below to describe the changes contributed by the 
pull request. That will give reviewers the context they need to do the review.
  
  - Make sure that the change passes the automated tests, i.e., `mvn clean 
verify` passes. You can set up Travis CI to do that following [this 
guide](http://flink.apache.org/contribute-code.html#best-practices).

  - Each pull request should address only one issue, not mix up code from 
multiple issues.
  
  - Each commit in the pull request has a meaningful commit message 
(including the JIRA id)

  - Once all items of the checklist are addressed, remove the above text 
and this checklist, leaving only the filled out template below.


**(The sections below can be removed for hotfixes of typos)**

## What is the purpose of the change

*(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*


## Brief change log

*(for example:)*
  - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
  - *Deployments RPC transmits only the blob storage reference*
  - *TaskManagers retrieve the TaskInfo from the blob cache*


## Verifying this change

*(Please pick either of the following options)*

This change is a trivial rework / code cleanup without any test coverage.

*(or)*

This change is already covered by existing tests, such as *(please describe 
tests)*.

*(or)*

This change added tests and can be verified as follows:

*(example:)*
  - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
  - *Extended integration test for recovery after master (JobManager) 
failure*
  - *Added test that validates that TaskInfo is transferred only once 
across recoveries*
  - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (yes / no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
  - The serializers: (yes / no / don't know)
  - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  - The S3 file system connector: (yes / no / don't know)

## Documentation

  - Does this pull request introduce a new feature? (yes / no)
  - If yes, how is the feature documented? (not applicable / docs / 
JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/flink release-1.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5585.patch

To 

[jira] [Commented] (FLINK-8787) Deploying FLIP-6 YARN session with HA fails

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377273#comment-16377273
 ] 

ASF GitHub Bot commented on FLINK-8787:
---

Github user GJL commented on a diff in the pull request:

https://github.com/apache/flink/pull/5584#discussion_r170679952
  
--- Diff: 
flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java
 ---
@@ -186,7 +187,7 @@ public YarnClient getYarnClient() {
protected abstract String getYarnJobClusterEntrypoint();
 
public Configuration getFlinkConfiguration() {
-   return flinkConfiguration;
+   return new UnmodifiableConfiguration(flinkConfiguration);
--- End diff --

not needed, `flinkConfiguration` is unmodifiable already
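
For context, a quick sketch of the guarantee being discussed, assuming the 
semantics of `org.apache.flink.configuration.UnmodifiableConfiguration`:

```java
Configuration conf = new Configuration();
conf.setString("high-availability", "zookeeper");

// read-only view: reads pass through, mutating calls are expected to
// throw UnsupportedOperationException
Configuration readOnly = new UnmodifiableConfiguration(conf);
readOnly.getString("high-availability", "none"); // returns "zookeeper"
```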


> Deploying FLIP-6 YARN session with HA fails
> ---
>
> Key: FLINK-8787
> URL: https://issues.apache.org/jira/browse/FLINK-8787
> Project: Flink
>  Issue Type: Bug
>  Components: Client, YARN
>Affects Versions: 1.5.0
> Environment: emr-5.12.0
> Hadoop distribution: Amazon 2.8.3
> Applications: ZooKeeper 3.4.10
>Reporter: Gary Yao
>Assignee: Gary Yao
>Priority: Blocker
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> Starting a YARN session with HA in FLIP-6 mode fails with an exception.
> Commit: 5e3fa4403f518dd6d3fe9970fe8ca55871add7c9
> Command to start YARN session:
> {noformat}
> export HADOOP_CLASSPATH=`hadoop classpath`
> HADOOP_CONF_DIR=/etc/hadoop/conf bin/yarn-session.sh -d -n 1 -s 1 -jm 2048 
> -tm 2048
> {noformat}
> Stacktrace:
> {noformat}
> java.lang.reflect.UndeclaredThrowableException
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)
>  at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790)
> Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn 
> connection information.
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>  ... 2 more
> Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: 
> Could not retrieve the leader address and leader session ID.
>  at 
> org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116)
>  at 
> org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:589)
>  ... 6 more
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
> [6 milliseconds]
>  at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>  at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>  at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
>  at 
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>  at scala.concurrent.Await$.result(package.scala:190)
>  at scala.concurrent.Await.result(package.scala)
>  at 
> org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:114)
>  ... 8 more
> 
>  The program finished with the following exception:
> java.lang.reflect.UndeclaredThrowableException
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)
>  at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790)
> Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn 
> connection information.
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>  ... 2 more
> Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: 
> Could not retrieve the leader address and leader session ID.
>  at 
> 

[GitHub] flink pull request #5584: [FLINK-8787][flip6] WIP

2018-02-26 Thread GJL
Github user GJL commented on a diff in the pull request:

https://github.com/apache/flink/pull/5584#discussion_r170679952
  
--- Diff: 
flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java
 ---
@@ -186,7 +187,7 @@ public YarnClient getYarnClient() {
protected abstract String getYarnJobClusterEntrypoint();
 
public Configuration getFlinkConfiguration() {
-   return flinkConfiguration;
+   return new UnmodifiableConfiguration(flinkConfiguration);
--- End diff --

not needed, `flinkConfiguration` is unmodifiable already


---


[jira] [Commented] (FLINK-8787) Deploying FLIP-6 YARN session with HA fails

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377267#comment-16377267
 ] 

ASF GitHub Bot commented on FLINK-8787:
---

GitHub user GJL opened a pull request:

https://github.com/apache/flink/pull/5584

[FLINK-8787][flip6] WIP

WIP

## What is the purpose of the change

*(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*

cc: @tillrohrmann 

## Brief change log

*(for example:)*
  - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
  - *Deployments RPC transmits only the blob storage reference*
  - *TaskManagers retrieve the TaskInfo from the blob cache*


## Verifying this change

*(Please pick either of the following options)*

This change is a trivial rework / code cleanup without any test coverage.

*(or)*

This change is already covered by existing tests, such as *(please describe 
tests)*.

*(or)*

This change added tests and can be verified as follows:

*(example:)*
  - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
  - *Extended integration test for recovery after master (JobManager) 
failure*
  - *Added test that validates that TaskInfo is transferred only once 
across recoveries*
  - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (yes / no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
  - The serializers: (yes / no / don't know)
  - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  - The S3 file system connector: (yes / no / don't know)

## Documentation

  - Does this pull request introduce a new feature? (yes / no)
  - If yes, how is the feature documented? (not applicable / docs / 
JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/GJL/flink FLINK-8787

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5584.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5584


commit 0cde09add17106f09e1f44b2a73400ea14a9eb21
Author: gyao 
Date:   2018-02-26T17:52:18Z

[hotfix] Add requireNonNull validation to Configuration copy constructor

commit 8f826c673bafd8228bb25da20e7dd40384fae971
Author: gyao 
Date:   2018-02-26T17:54:34Z

[FLINK-8787][flip6] Ensure that zk namespace configuration reaches 
RestClusterClient




> Deploying FLIP-6 YARN session with HA fails
> ---
>
> Key: FLINK-8787
> URL: https://issues.apache.org/jira/browse/FLINK-8787
> Project: Flink
>  Issue Type: Bug
>  Components: Client, YARN
>Affects Versions: 1.5.0
> Environment: emr-5.12.0
> Hadoop distribution: Amazon 2.8.3
> Applications: ZooKeeper 3.4.10
>Reporter: Gary Yao
>Assignee: Gary Yao
>Priority: Blocker
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> Starting a YARN session with HA in FLIP-6 mode fails with an exception.
> Commit: 5e3fa4403f518dd6d3fe9970fe8ca55871add7c9
> Command to start YARN session:
> {noformat}
> export HADOOP_CLASSPATH=`hadoop classpath`
> HADOOP_CONF_DIR=/etc/hadoop/conf bin/yarn-session.sh -d -n 1 -s 1 -jm 2048 
> -tm 2048
> {noformat}
> Stacktrace:
> {noformat}
> java.lang.reflect.UndeclaredThrowableException
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)
>  at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790)
> Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn 
> connection information.
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790)
>  at 

[GitHub] flink pull request #5584: [FLINK-8787][flip6] WIP

2018-02-26 Thread GJL
GitHub user GJL opened a pull request:

https://github.com/apache/flink/pull/5584

[FLINK-8787][flip6] WIP

WIP

## What is the purpose of the change

*(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*

cc: @tillrohrmann 

## Brief change log

*(for example:)*
  - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
  - *Deployments RPC transmits only the blob storage reference*
  - *TaskManagers retrieve the TaskInfo from the blob cache*


## Verifying this change

*(Please pick either of the following options)*

This change is a trivial rework / code cleanup without any test coverage.

*(or)*

This change is already covered by existing tests, such as *(please describe 
tests)*.

*(or)*

This change added tests and can be verified as follows:

*(example:)*
  - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
  - *Extended integration test for recovery after master (JobManager) 
failure*
  - *Added test that validates that TaskInfo is transferred only once 
across recoveries*
  - *Manually verified the change by running a 4 node cluster with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (yes / no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
  - The serializers: (yes / no / don't know)
  - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  - The S3 file system connector: (yes / no / don't know)

## Documentation

  - Does this pull request introduce a new feature? (yes / no)
  - If yes, how is the feature documented? (not applicable / docs / 
JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/GJL/flink FLINK-8787

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5584.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5584


commit 0cde09add17106f09e1f44b2a73400ea14a9eb21
Author: gyao 
Date:   2018-02-26T17:52:18Z

[hotfix] Add requireNonNull validation to Configuration copy constructor

commit 8f826c673bafd8228bb25da20e7dd40384fae971
Author: gyao 
Date:   2018-02-26T17:54:34Z

[FLINK-8787][flip6] Ensure that zk namespace configuration reaches 
RestClusterClient




---


[jira] [Updated] (FLINK-8787) Deploying FLIP-6 YARN session with HA fails

2018-02-26 Thread Gary Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Yao updated FLINK-8787:

Description: 
Starting a YARN session with HA in FLIP-6 mode fails with an exception.

Commit: 5e3fa4403f518dd6d3fe9970fe8ca55871add7c9

Command to start YARN session:
{noformat}
export HADOOP_CLASSPATH=`hadoop classpath`
HADOOP_CONF_DIR=/etc/hadoop/conf bin/yarn-session.sh -d -n 1 -s 1 -jm 2048 -tm 
2048
{noformat}

Stacktrace:
{noformat}
java.lang.reflect.UndeclaredThrowableException
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)
 at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
 at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790)
Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn 
connection information.
 at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612)
 at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
 ... 2 more
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: 
Could not retrieve the leader address and leader session ID.
 at 
org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116)
 at 
org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405)
 at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:589)
 ... 6 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
[6 milliseconds]
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
 at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
 at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
 at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 at scala.concurrent.Await$.result(package.scala:190)
 at scala.concurrent.Await.result(package.scala)
 at 
org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:114)
 ... 8 more


 The program finished with the following exception:

java.lang.reflect.UndeclaredThrowableException
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)
 at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
 at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790)
Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn 
connection information.
 at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612)
 at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
 ... 2 more
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: 
Could not retrieve the leader address and leader session ID.
 at 
org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116)
 at 
org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405)
 at 
org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:589)
 ... 6 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
[6 milliseconds]
 at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
 at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
 at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
 at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 at scala.concurrent.Await$.result(package.scala:190)
 at scala.concurrent.Await.result(package.scala)
 at 
org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:114)
 ... 8 more

{noformat}

  was:
Starting a YARN session in FLIP-6 mode fails with an exception.

Commit: 5e3fa4403f518dd6d3fe9970fe8ca55871add7c9

Command to start YARN session:
{noformat}
export HADOOP_CLASSPATH=`hadoop classpath`
HADOOP_CONF_DIR=/etc/hadoop/conf bin/yarn-session.sh -d -n 1 -s 1 -jm 2048 -tm 
2048
{noformat}

Stacktrace:
{noformat}
java.lang.reflect.UndeclaredThrowableException
 at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)
 at 

[jira] [Updated] (FLINK-8787) Deploying FLIP-6 YARN session with HA fails

2018-02-26 Thread Gary Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Yao updated FLINK-8787:

Summary: Deploying FLIP-6 YARN session with HA fails  (was: Deploying 
FLIP-6 YARN session fails)

> Deploying FLIP-6 YARN session with HA fails
> ---
>
> Key: FLINK-8787
> URL: https://issues.apache.org/jira/browse/FLINK-8787
> Project: Flink
>  Issue Type: Bug
>  Components: Client, YARN
>Affects Versions: 1.5.0
> Environment: emr-5.12.0
> Hadoop distribution: Amazon 2.8.3
> Applications: ZooKeeper 3.4.10
>Reporter: Gary Yao
>Assignee: Gary Yao
>Priority: Blocker
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> Starting a YARN session in FLIP-6 mode fails with an exception.
> Commit: 5e3fa4403f518dd6d3fe9970fe8ca55871add7c9
> Command to start YARN session:
> {noformat}
> export HADOOP_CLASSPATH=`hadoop classpath`
> HADOOP_CONF_DIR=/etc/hadoop/conf bin/yarn-session.sh -d -n 1 -s 1 -jm 2048 
> -tm 2048
> {noformat}
> Stacktrace:
> {noformat}
> java.lang.reflect.UndeclaredThrowableException
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)
>  at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790)
> Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn 
> connection information.
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>  ... 2 more
> Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: 
> Could not retrieve the leader address and leader session ID.
>  at 
> org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116)
>  at 
> org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:589)
>  ... 6 more
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
> [6 milliseconds]
>  at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>  at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>  at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
>  at 
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>  at scala.concurrent.Await$.result(package.scala:190)
>  at scala.concurrent.Await.result(package.scala)
>  at 
> org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:114)
>  ... 8 more
> 
>  The program finished with the following exception:
> java.lang.reflect.UndeclaredThrowableException
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)
>  at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:790)
> Caused by: org.apache.flink.util.FlinkException: Could not write the Yarn 
> connection information.
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:612)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:790)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>  ... 2 more
> Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: 
> Could not retrieve the leader address and leader session ID.
>  at 
> org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderConnectionInfo(LeaderRetrievalUtils.java:116)
>  at 
> org.apache.flink.client.program.rest.RestClusterClient.getClusterConnectionInfo(RestClusterClient.java:405)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:589)
>  ... 6 more
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
> [6 milliseconds]
>  at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>  at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>  at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
> 

[jira] [Commented] (FLINK-6352) FlinkKafkaConsumer should support to use timestamp to set up start offset

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377178#comment-16377178
 ] 

ASF GitHub Bot commented on FLINK-6352:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/5282#discussion_r170663674
  
--- Diff: 
flink-connectors/flink-connector-kafka-0.10/src/test/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka010FetcherTest.java
 ---
@@ -129,9 +129,14 @@ public Void answer(InvocationOnMock invocation) {
schema,
new Properties(),
0L,
+<<<<<<< HEAD
--- End diff --

Leftover merge markers?


> FlinkKafkaConsumer should support to use timestamp to set up start offset
> -
>
> Key: FLINK-6352
> URL: https://issues.apache.org/jira/browse/FLINK-6352
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Fang Yong
>Assignee: Fang Yong
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently "auto.offset.reset" is used to initialize the start offset of 
> FlinkKafkaConsumer, and the value should be earliest/latest/none. This method 
> can only let the job comsume the beginning or the most recent data, but can 
> not specify the specific offset of Kafka began to consume. 
> So, there should be a configuration item (such as 
> "flink.source.start.time" and the format is "-MM-dd HH:mm:ss") that 
> allows user to configure the initial offset of Kafka. The action of 
> "flink.source.start.time" is as follows:
> 1) job start from checkpoint / savepoint
>   a> offset of partition can be restored from checkpoint/savepoint,  
> "flink.source.start.time" will be ignored.
>   b> there's no checkpoint/savepoint for the partition (For example, this 
> partition is newly increased), the "flink.kafka.start.time" will be used to 
> initialize the offset of the partition
> 2) job has no checkpoint / savepoint, the "flink.source.start.time" is used 
> to initialize the offset of the kafka
>   a> the "flink.source.start.time" is valid, use it to set the offset of kafka
>   b> the "flink.source.start.time" is out-of-range, the same as it does 
> currently with no initial offset, get kafka's current offset and start reading
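
As an illustration of the feature under discussion, a hedged sketch of 
starting from a timestamp; the setter name and package locations follow the 
API proposed in this PR and may differ in the final merge:

{code:java}
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker

FlinkKafkaConsumer010<String> consumer =
    new FlinkKafkaConsumer010<>("topic", new SimpleStringSchema(), props);

// Start from records at or after this epoch-millis timestamp; offsets
// restored from a checkpoint/savepoint still take precedence.
consumer.setStartFromTimestamp(1519660800000L);
{code}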



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8737) Creating a union of UnionGate instances will fail with UnsupportedOperationException when retrieving buffers

2018-02-26 Thread Nico Kruber (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377176#comment-16377176
 ] 

Nico Kruber commented on FLINK-8737:


We ([~uce] and I) could not think of any way a user could create a union of 
union input gates by defining a program to run on Flink, and we did not find 
any place in the code base where we actually do this today, nor any reason to 
do so in the future. Just in case we missed something, [~StephanEwen], can you 
confirm that creating a recursive union of union input gates is indeed dead 
code and not needed?

> Creating a union of UnionGate instances will fail with 
> UnsupportedOperationException when retrieving buffers
> 
>
> Key: FLINK-8737
> URL: https://issues.apache.org/jira/browse/FLINK-8737
> Project: Flink
>  Issue Type: Sub-task
>  Components: Network
>Reporter: Nico Kruber
>Priority: Blocker
> Fix For: 1.5.0
>
>
> FLINK-8589 introduced a new polling method but did not implement 
> {{UnionInputGate#pollNextBufferOrEvent()}}. This prevents UnionGate instances 
> from containing a UnionGate instance, which is explicitly allowed by its 
> documentation and interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-8737) Creating a union of UnionGate instances will fail with UnsupportedOperationException when retrieving buffers

2018-02-26 Thread Nico Kruber (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nico Kruber reassigned FLINK-8737:
--

Assignee: Nico Kruber

> Creating a union of UnionGate instances will fail with 
> UnsupportedOperationException when retrieving buffers
> 
>
> Key: FLINK-8737
> URL: https://issues.apache.org/jira/browse/FLINK-8737
> Project: Flink
>  Issue Type: Sub-task
>  Components: Network
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
> Fix For: 1.5.0
>
>
> FLINK-8589 introduced a new polling method but did not implement 
> {{UnionInputGate#pollNextBufferOrEvent()}}. This prevents UnionGate instances 
> from containing a UnionGate instance, which is explicitly allowed by its 
> documentation and interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5282: [FLINK-6352] [kafka] Timestamp-based offset config...

2018-02-26 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/5282#discussion_r170663674
  
--- Diff: 
flink-connectors/flink-connector-kafka-0.10/src/test/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka010FetcherTest.java
 ---
@@ -129,9 +129,14 @@ public Void answer(InvocationOnMock invocation) {
schema,
new Properties(),
0L,
+<<<<<<< HEAD
--- End diff --

Leftover merge markers?


---


[jira] [Commented] (FLINK-8737) Creating a union of UnionGate instances will fail with UnsupportedOperationException when retrieving buffers

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377172#comment-16377172
 ] 

ASF GitHub Bot commented on FLINK-8737:
---

GitHub user NicoK opened a pull request:

https://github.com/apache/flink/pull/5583

[FLINK-8737][network] disallow creating a union of UnionGate instances

## What is the purpose of the change

Recently, the `InputGate#pollNextBufferOrEvent()` was added but not 
implemented in `UnionInputGate`. It is, however, used in 
`UnionInputGate#getNextBufferOrEvent()` and thus any `UnionInputGate` 
containing a `UnionInputGate` would have failed already. There should be no use 
case for wiring up inputs this way. Therefore, fail early when trying to 
construct this.
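
A hedged sketch of the early-failure guard (the constructor shape is assumed, 
not the verbatim change):

```java
public UnionInputGate(InputGate... inputGates) {
    for (InputGate gate : inputGates) {
        if (gate instanceof UnionInputGate) {
            // pollNextBufferOrEvent() is not implemented here, so a nested
            // union would fail at runtime anyway; fail fast at construction
            throw new UnsupportedOperationException(
                "Cannot union a UnionInputGate with another input gate.");
        }
    }
    // ... original wiring of the constituent gates ...
}
```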

## Brief change log

- throw an `UnsupportedOperationException` when trying to use a 
`UnionInputGate` for input to a `UnionInputGate`

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): **no**
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **no**
  - The serializers: **no**
  - The runtime per-record code paths (performance sensitive): **no**
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
  - The S3 file system connector: **no**

## Documentation

  - Does this pull request introduce a new feature? **no**
  - If yes, how is the feature documented? **JavaDocs**


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/NicoK/flink flink-8737

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5583.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5583


commit 198bf145e2a4b81327ae0c1e58c5bf4f8a5da650
Author: Nico Kruber 
Date:   2018-02-26T16:50:10Z

[hotfix][network] minor improvements in UnionInputGate

commit 6eff3e396b5a98a1029e3f42b5e78abfb10da7a6
Author: Nico Kruber 
Date:   2018-02-26T16:52:37Z

[FLINK-8737][network] disallow creating a union of UnionInputGate instances

Recently, pollNextBufferOrEvent() was added but not implemented; it is, however,
used in getNextBufferOrEvent(), so any UnionInputGate containing a UnionInputGate
would already have failed. There should be no use case for wiring up inputs
this way. Therefore, fail early when trying to construct this.




> Creating a union of UnionGate instances will fail with 
> UnsupportedOperationException when retrieving buffers
> 
>
> Key: FLINK-8737
> URL: https://issues.apache.org/jira/browse/FLINK-8737
> Project: Flink
>  Issue Type: Sub-task
>  Components: Network
>Reporter: Nico Kruber
>Priority: Blocker
> Fix For: 1.5.0
>
>
> FLINK-8589 introduced a new polling method but did not implement 
> {{UnionInputGate#pollNextBufferOrEvent()}}. This prevents UnionGate instances 
> from containing another UnionGate instance, even though this is explicitly 
> allowed by the documentation and the interface.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5583: [FLINK-8737][network] disallow creating a union of...

2018-02-26 Thread NicoK
GitHub user NicoK opened a pull request:

https://github.com/apache/flink/pull/5583

[FLINK-8737][network] disallow creating a union of UnionGate instances

## What is the purpose of the change

Recently, `InputGate#pollNextBufferOrEvent()` was added but not implemented in 
`UnionInputGate`. It is, however, used in `UnionInputGate#getNextBufferOrEvent()`, 
so any `UnionInputGate` containing another `UnionInputGate` would already have 
failed. There should be no use case for wiring up inputs this way; therefore, 
fail early when trying to construct such a union.

## Brief change log

- throw an `UnsupportedOperationException` when trying to use a 
`UnionInputGate` as input to a `UnionInputGate`

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): **no**
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **no**
  - The serializers: **no**
  - The runtime per-record code paths (performance sensitive): **no**
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
  - The S3 file system connector: **no**

## Documentation

  - Does this pull request introduce a new feature? **no**
  - If yes, how is the feature documented? **JavaDocs**


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/NicoK/flink flink-8737

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5583.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5583


commit 198bf145e2a4b81327ae0c1e58c5bf4f8a5da650
Author: Nico Kruber 
Date:   2018-02-26T16:50:10Z

[hotfix][network] minor improvements in UnionInputGate

commit 6eff3e396b5a98a1029e3f42b5e78abfb10da7a6
Author: Nico Kruber 
Date:   2018-02-26T16:52:37Z

[FLINK-8737][network] disallow creating a union of UnionInputGate instances

Recently, pollNextBufferOrEvent() was added but not implemented; it is, however,
used in getNextBufferOrEvent(), so any UnionInputGate containing a UnionInputGate
would already have failed. There should be no use case for wiring up inputs
this way. Therefore, fail early when trying to construct this.




---


[jira] [Updated] (FLINK-8770) CompletedCheckPoints stored on ZooKeeper is not up-to-date, when JobManager is restarted it fails to recover the job due to "checkpoint FileNotFound exception"

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8770:

Fix Version/s: 1.5.0

> CompletedCheckPoints stored on ZooKeeper is not up-to-date, when JobManager 
> is restarted it fails to recover the job due to "checkpoint FileNotFound 
> exception"
> ---
>
> Key: FLINK-8770
> URL: https://issues.apache.org/jira/browse/FLINK-8770
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 1.4.0
>Reporter: Xinyang Gao
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Hi, I am running a Flink cluster (1 JobManager + 6 TaskManagers) with HA mode 
> on OpenShift, I have enabled Chaos Monkey which kills either JobManager or 
> one of the TaskManager in every 5 minutes, ZooKeeper quorum is stable with no 
> chaos monkey enabled. Flink reads data from one Kafka topic and writes data 
> into another Kafka topic. Checkpoint surely is enabled, with 1000ms interval. 
> state.checkpoints.num-retained is set to 10. I am using PVC for state backend 
> (checkpoint, recovery, etc), so the checkpoints and states are persistent. 
> The restart strategy for the Flink jobmanager DeploymentConfig is recreate, 
> which means it will kill the old jobmanager container before it starts the 
> new one.
> I ran the chaos test for one day at first; however, I saw the exception 
> org.apache.flink.util.FlinkException: Could not retrieve checkpoint *** from 
> state handle under /***. This indicates that the retrieved state handle is 
> broken. Try cleaning the state handle store. The root cause is a checkpoint 
> FileNotFound exception.
> The Flink job then kept restarting for a few hours and, due to the above 
> error, could not be restarted successfully.
> After further investigation, I found the following facts in my PVC:
>  
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint0ee95157de00
> -rw-r--r--. 1 flink root 11379 Feb 23 01:51 completedCheckpoint498d0952cf00
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint650fe5b021fe
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint66634149683e
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint67f24c3b018e
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpoint6f64ebf0ae64
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint906ebe1fb337
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpoint98b79ea14b09
> -rw-r--r--. 1 flink root 11379 Feb 23 02:10 completedCheckpointa0d1070e0b6c
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpointbd3a9ba50322
> -rw-r--r--. 1 flink root 11355 Feb 22 17:31 completedCheckpointd433b5e108be
> -rw-r--r--. 1 flink root 11379 Feb 22 22:56 completedCheckpointdd0183ed092b
> -rw-r--r--. 1 flink root 11379 Feb 22 00:00 completedCheckpointe0a5146c3d81
> -rw-r--r--. 1 flink root 11331 Feb 22 17:06 completedCheckpointec82f3ebc2ad
> -rw-r--r--. 1 flink root 11379 Feb 23 02:11 completedCheckpointf86e460f6720
>  
> The latest 10 checkpoints were created at about 02:10, ignoring the old 
> checkpoints that were not deleted successfully (which I do not care much 
> about).
>  
> However, when checking on ZooKeeper, I see the following in the 
> flink/checkpoints path (I only list one, but the other 9 are similar):
> cZxid = 0x160001ff5d
> ��sr;org.apache.flink.runtime.state.RetrievableStreamStateHandle�U�+LwrappedStreamStateHandlet2Lorg/apache/flink/runtime/state/StreamStateHandle;xpsr9org.apache.flink.runtime.state.filesystem.FileStateHandle�u�b�▒▒J
>  
> stateSizefilePathtLorg/apache/flink/core/fs/Path;xp,ssrorg.apache.flink.core.fs.PathLuritLjava/net/URI;xpsr
>  
> java.net.URI�x.C�I�LstringtLjava/lang/String;xpt=file:/mnt/flink-test/recovery/completedCheckpointd004a3753870x
> [zk: localhost:2181(CONNECTED) 7] ctime = Fri Feb 23 02:08:18 UTC 2018
> mZxid = 0x160001ff5d
> mtime = Fri Feb 23 02:08:18 UTC 2018
> pZxid = 0x1d0c6d
> cversion = 31
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 492
> So the latest completedCheckpoint status stored on ZooKeeper is from about 
> 02:08, which implies that the completed checkpoints at 02:10 somehow were not 
> successfully submitted to ZooKeeper, so when it tries
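
For illustration, a minimal, self-contained sketch of the failure mode described 
above, assuming (as the znode dump suggests) that ZooKeeper keeps only a 
serialized file-path pointer to each completed checkpoint; the names are 
hypothetical, not Flink's actual classes:

{code:java}
import java.io.File;
import java.io.FileNotFoundException;

// The znode stores a pointer (a file path), not the checkpoint data itself.
class FilePointerHandle {
    private final String filePath; // e.g. /mnt/flink-test/recovery/completedCheckpoint...

    FilePointerHandle(String filePath) {
        this.filePath = filePath;
    }

    File retrieve() throws FileNotFoundException {
        File file = new File(filePath);
        if (!file.exists()) {
            // The reported situation: the pointer written at 02:08 outlives the
            // file, which retained-checkpoints cleanup deleted once the newer
            // 02:10 checkpoints completed.
            throw new FileNotFoundException(filePath);
        }
        return file;
    }
}
{code}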

[jira] [Updated] (FLINK-8779) ClassLoaderITCase.testKMeansJobWithCustomClassLoader fails on Travis

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8779:

Priority: Blocker  (was: Critical)

> ClassLoaderITCase.testKMeansJobWithCustomClassLoader fails on Travis
> 
>
> Key: FLINK-8779
> URL: https://issues.apache.org/jira/browse/FLINK-8779
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> The {{ClassLoaderITCase.testKMeansJobWithCustomClassLoader}} test fails on 
> Travis by producing no output for 300s. This might indicate a test instability 
> or a recently introduced problem in Flink.
> https://api.travis-ci.org/v3/job/344427688/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8783) Test instability SlotPoolRpcTest.testExtraSlotsAreKept

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8783:

Priority: Blocker  (was: Critical)

> Test instability SlotPoolRpcTest.testExtraSlotsAreKept
> --
>
> Key: FLINK-8783
> URL: https://issues.apache.org/jira/browse/FLINK-8783
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Gary Yao
>Priority: Blocker
> Fix For: 1.5.0
>
>
> [https://travis-ci.org/GJL/flink/jobs/346206290]
>  {noformat}
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.784 sec <<< 
> FAILURE! - in org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest
> testExtraSlotsAreKept(org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest)
>  Time elapsed: 0.016 sec <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<1>
>  at org.junit.Assert.fail(Assert.java:88)
>  at org.junit.Assert.failNotEquals(Assert.java:834)
>  at org.junit.Assert.assertEquals(Assert.java:645)
>  at org.junit.Assert.assertEquals(Assert.java:631)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest.testExtraSlotsAreKept(SlotPoolRpcTest.java:267)
> {noformat}
> I reproduced this in IntelliJ by configuring 50 consecutive runs of 
> {{testExtraSlotsAreKept}}. On my machine, the 8th execution fails sporadically 
> (one scripted way to run a test repeatedly is sketched below).
> commit: eeac022f0538e0979e6ad4eb06a2d1031cbd0146
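
A minimal JUnit 4 sketch of such consecutive runs, as an alternative to the IDE 
run configuration (illustration only, not part of the actual test class):

{code:java}
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

public class SlotPoolRpcRepeatExample {

    /** JUnit 4 rule that runs each test body a fixed number of times in a row. */
    static class RepeatRule implements TestRule {
        private final int times;

        RepeatRule(int times) {
            this.times = times;
        }

        @Override
        public Statement apply(final Statement base, Description description) {
            return new Statement() {
                @Override
                public void evaluate() throws Throwable {
                    for (int i = 0; i < times; i++) {
                        base.evaluate(); // stops at the first sporadic failure
                    }
                }
            };
        }
    }

    @Rule
    public final RepeatRule repeat = new RepeatRule(50);

    @Test
    public void testExtraSlotsAreKept() {
        // flaky test body under investigation goes here
    }
}
{code}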



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8783) Test instability SlotPoolRpcTest.testExtraSlotsAreKept

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8783:

Fix Version/s: 1.5.0

> Test instability SlotPoolRpcTest.testExtraSlotsAreKept
> --
>
> Key: FLINK-8783
> URL: https://issues.apache.org/jira/browse/FLINK-8783
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Gary Yao
>Priority: Blocker
> Fix For: 1.5.0
>
>
> [https://travis-ci.org/GJL/flink/jobs/346206290]
>  {noformat}
> Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.784 sec <<< 
> FAILURE! - in org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest
> testExtraSlotsAreKept(org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest)
>  Time elapsed: 0.016 sec <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<1>
>  at org.junit.Assert.fail(Assert.java:88)
>  at org.junit.Assert.failNotEquals(Assert.java:834)
>  at org.junit.Assert.assertEquals(Assert.java:645)
>  at org.junit.Assert.assertEquals(Assert.java:631)
>  at 
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolRpcTest.testExtraSlotsAreKept(SlotPoolRpcTest.java:267)
> {noformat}
> I reproduced this in IntelliJ by configuring 50 consecutive runs of 
> {{testExtraSlotsAreKept}}. On my machine, the 8th execution fails sporadically.
> commit: eeac022f0538e0979e6ad4eb06a2d1031cbd0146



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8789) StoppableFunctions sometimes not detected

2018-02-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8789:

Fix Version/s: 1.5.0

> StoppableFunctions sometimes not detected
> -
>
> Key: FLINK-8789
> URL: https://issues.apache.org/jira/browse/FLINK-8789
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.5.0
>
>
> While porting the {{TimestampITCase}} I got a strange exception in the logs 
> when running {{testWatermarkPropagationNoFinalWatermarkOnStop}}:
> {code:java}
> Caused by: java.lang.UnsupportedOperationException: Stopping not supported by 
> task Source: Custom Source (1/1) (87f44b422280be896bdf44766ebeb53e).
>     at org.apache.flink.runtime.taskmanager.Task.stopExecution(Task.java:965)
>     at 
> org.apache.flink.runtime.taskexecutor.TaskExecutor.stopTask(TaskExecutor.java:549)
>     ... 18 more{code}
> This source, however, implements the {{StoppableFunction}} interface.
> This also happens against a legacy cluster.
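
For reference, a sketch of a source wired up the way the description implies, 
implementing both interfaces (package names assumed per Flink 1.5; treat them 
as assumptions):

{code:java}
import org.apache.flink.api.common.functions.StoppableFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class StoppableSource implements SourceFunction<Long>, StoppableFunction {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        long value = 0;
        while (running) {
            // emit under the checkpoint lock, as usual for source functions
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(value++);
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    @Override
    public void stop() {
        // graceful stop: exit the run loop without emitting a final watermark
        running = false;
    }
}
{code}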



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

