[jira] [Updated] (FLINK-6985) Remove bugfix version from docs title

2017-06-22 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-6985:
---
Summary: Remove bugfix version from docs title  (was: Remove minor version 
from docs title)

> Remove bugfix version from docs title
> -
>
> Key: FLINK-6985
> URL: https://issues.apache.org/jira/browse/FLINK-6985
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>Priority: Minor
>
> The docs HTML title contains the minor version of the corresponding release. 
> This can be confusing as we build the docs nightly from the respective 
> release branch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-6985) Remove minor version from docs title

2017-06-22 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-6985:
--

 Summary: Remove minor version from docs title
 Key: FLINK-6985
 URL: https://issues.apache.org/jira/browse/FLINK-6985
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Priority: Minor


The docs HTML title contains the minor version of the corresponding release. 
This can be confusing as we build the docs nightly from the respective release 
branch.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-6952) Add link to Javadocs

2017-06-20 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-6952:
--

 Summary: Add link to Javadocs
 Key: FLINK-6952
 URL: https://issues.apache.org/jira/browse/FLINK-6952
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Priority: Minor


The project webpage and the docs are missing links to the Javadocs.

I think we should add them as part of the external links at the bottom of the 
doc navigation (above "Project Page").

In the same manner we could add a link to the Scaladocs, but if I remember 
correctly there was a problem with the build of the Scaladocs. Correct, 
[~aljoscha]?




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (FLINK-5777) Pass savepoint information to CheckpointingOperation

2017-05-15 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5777.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Fixed in 6e7a91741708a2b167a2bbca5dda5b2059df5e18.

> Pass savepoint information to CheckpointingOperation
> 
>
> Key: FLINK-5777
> URL: https://issues.apache.org/jira/browse/FLINK-5777
> Project: Flink
>  Issue Type: Sub-task
>  Components: State Backends, Checkpointing
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 1.3.0
>
>
> In order to make savepoints self contained in a single directory, we need to 
> pass some information to {{StreamTask#CheckpointingOperation}}.
> I propose to extend the {{CheckpointMetaData}} for this.
> We currently have some overlap with CheckpointMetaData, CheckpointMetrics, 
> and manually passed checkpoint ID and checkpoint timestamps. We should 
> restrict CheckpointMetaData to the integral meta data that needs to be passed 
> to StreamTask#CheckpointingOperation.
> This means that we move the CheckpointMetrics out of the CheckpointMetaData 
> and the BarrierBuffer/BarrierTracker create CheckpointMetrics separately and 
> send it back with the acknowledge message.
> CheckpointMetaData should be extended with the following properties:
> - boolean isSavepoint
> - String targetDirectory
> There are two code paths that lead to the CheckpointingOperation:
> 1. From CheckpointCoordinator via RPC to StreamTask#triggerCheckpoint
> - Execution#triggerCheckpoint(long, long) 
> => triggerCheckpoint(CheckpointMetaData)
> - TaskManagerGateway#triggerCheckpoint(ExecutionAttemptID, JobID, long, long) 
> => TaskManagerGateway#triggerCheckpoint(ExecutionAttemptID, JobID, 
> CheckpointMetaData)
> - Task#triggerCheckpointBarrier(long, long) =>  
> Task#triggerCheckpointBarrier(CheckpointMetaData)
> 2. From intermediate streams via the CheckpointBarrier to  
> StreamTask#triggerCheckpointOnBarrier
> - triggerCheckpointOnBarrier(CheckpointMetaData)
> => triggerCheckpointOnBarrier(CheckpointMetaData, CheckpointMetrics)
> - CheckpointBarrier(long, long) => CheckpointBarrier(CheckpointMetaData)
> - AcknowledgeCheckpoint(CheckpointMetaData)
> => AcknowledgeCheckpoint(long, CheckpointMetrics)
> The state backends provide another stream factory that is called in 
> CheckpointingOperation when the meta data indicates savepoint. The state 
> backends can choose whether they return the regular checkpoint stream factory 
> in that case or a special one for savepoints. That way backends that don’t 
> checkpoint to a file system can special case savepoints easily.
> - FsStateBackend: return special FsCheckpointStreamFactory with different 
> directory layout
> - MemoryStateBackend: return regular checkpoint stream factory 
> (MemCheckpointStreamFactory) => The _metadata file will contain all state as 
> the state handles are part of it



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5778) Split FileStateHandle into fileName and basePath

2017-05-08 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16000443#comment-16000443
 ] 

Ufuk Celebi commented on FLINK-5778:


No, I don't think so. I was waiting for FLINK-5823 
(https://github.com/apache/flink/pull/3522) to be merged/rebased in order to 
rebase my PR on top of that one.

> Split FileStateHandle into fileName and basePath
> 
>
> Key: FLINK-5778
> URL: https://issues.apache.org/jira/browse/FLINK-5778
> Project: Flink
>  Issue Type: Sub-task
>  Components: State Backends, Checkpointing
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>
> Store the statePath as a basePath and a fileName and allow to overwrite the 
> basePath. We cannot overwrite the base path as long as the state handle is 
> still in flight and not persisted. Otherwise we risk a resource leak.
> We need this in order to be able to relocate savepoints.
> {code}
> interface RelativeBaseLocationStreamStateHandle {
>void clearBaseLocation();
>void setBaseLocation(String baseLocation);
> }
> {code}
> FileStateHandle should implement this and the SavepointSerializer should 
> forward the calls when a savepoint is stored or loaded, clear before store 
> and set after load.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5440) Misleading error message when migrating and scaling down from savepoint

2017-05-05 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998371#comment-15998371
 ] 

Ufuk Celebi commented on FLINK-5440:


Unfortunately, I haven't followed the most recent development in this area. If 
the error message has been changed, feel free to close this issue.

> Misleading error message when migrating and scaling down from savepoint
> ---
>
> Key: FLINK-5440
> URL: https://issues.apache.org/jira/browse/FLINK-5440
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Reporter: Ufuk Celebi
>Priority: Minor
>
> When resuming from an 1.1 savepoint with 1.2 and reducing the parallelism 
> (and correctly setting the max parallelism), the error message says something 
> about a missing operator which is misleading. Restoring from the same 
> savepoint with the savepoint parallelism works as expected.
> Instead it should state that this kind of operation is not possible. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5439) Adjust max parallelism when migrating

2017-05-05 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998369#comment-15998369
 ] 

Ufuk Celebi commented on FLINK-5439:


Unfortunately, I haven't followed the most recent development in this area. If 
you think it has been fixed, feel free to close it.

> Adjust max parallelism when migrating
> -
>
> Key: FLINK-5439
> URL: https://issues.apache.org/jira/browse/FLINK-5439
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Reporter: Ufuk Celebi
>Priority: Minor
>
> When migrating from v1 savepoints which don't have the notion of a max 
> parallelism, the job needs to explicitly set the max parallelism to the 
> parallelism of the savepoint.
> [~stefanrichte...@gmail.com] If this not trivially implemented, let's close 
> this as won't fix.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-6337) Remove the buffer provider from PartitionRequestServerHandler

2017-05-02 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-6337.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Fixed in 464d6f5 (master).

> Remove the buffer provider from PartitionRequestServerHandler
> -
>
> Key: FLINK-6337
> URL: https://issues.apache.org/jira/browse/FLINK-6337
> Project: Flink
>  Issue Type: Improvement
>  Components: Network
>Reporter: zhijiang
>Assignee: zhijiang
>Priority: Minor
> Fix For: 1.3.0
>
>
> Currently, {{PartitionRequestServerHandler}} will create a 
> {{LocalBufferPool}} when the channel is registered. The {{LocalBufferPool}} 
> is only used to get segment size for creating read view in 
> {{SpillableSubpartition}}, and the buffers in the pool will not be used all 
> the time, so it will waste the buffer resource of global pool.
> We would like to remove the {{LocalBufferPool}} from the 
> {{PartitionRequestServerHandler}}, and the {{LocalBufferPool}} in 
> {{ResultPartition}} can also provide the segment size for creating sub 
> partition view.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5125) ContinuousFileProcessingCheckpointITCase is Flaky

2017-05-02 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992746#comment-15992746
 ] 

Ufuk Celebi commented on FLINK-5125:


https://api.travis-ci.org/jobs/227891378/log.txt?deansi=true

> ContinuousFileProcessingCheckpointITCase is Flaky
> -
>
> Key: FLINK-5125
> URL: https://issues.apache.org/jira/browse/FLINK-5125
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Reporter: Aljoscha Krettek
>Assignee: Kostas Kloudas
>  Labels: test-stability
>
> This is the travis log: 
> https://api.travis-ci.org/jobs/177402367/log.txt?deansi=true
> The relevant sections is:
> {code}
> Running org.apache.flink.test.checkpointing.CoStreamCheckpointingITCase
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.571 sec - 
> in org.apache.flink.test.exampleJavaPrograms.EnumTriangleBasicITCase
> Running org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.704 sec - 
> in org.apache.flink.test.checkpointing.CoStreamCheckpointingITCase
> Running 
> org.apache.flink.test.checkpointing.EventTimeAllWindowCheckpointingITCase
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.805 sec - 
> in org.apache.flink.test.checkpointing.EventTimeAllWindowCheckpointingITCase
> Running 
> org.apache.flink.test.checkpointing.ContinuousFileProcessingCheckpointITCase
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Job execution failed.
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
>   at 
> org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:392)
>   at 
> org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.executeRemotely(RemoteStreamEnvironment.java:209)
>   at 
> org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.execute(RemoteStreamEnvironment.java:173)
>   at org.apache.flink.test.util.TestUtils.tryExecute(TestUtils.java:32)
>   at 
> org.apache.flink.test.checkpointing.StreamFaultToleranceTestBase.runCheckpointedProgram(StreamFaultToleranceTestBase.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
>   at 
> 

[jira] [Commented] (FLINK-4052) Unstable test ConnectionUtilsTest

2017-05-02 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992741#comment-15992741
 ] 

Ufuk Celebi commented on FLINK-4052:


Happened again here: 
https://api.travis-ci.org/jobs/227891375/log.txt?deansi=true

> Unstable test ConnectionUtilsTest
> -
>
> Key: FLINK-4052
> URL: https://issues.apache.org/jira/browse/FLINK-4052
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.0.2, 1.3.0
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Critical
>  Labels: test-stability
> Fix For: 1.1.0
>
>
> The error is the following:
> {code}
> ConnectionUtilsTest.testReturnLocalHostAddressUsingHeuristics:59 
> expected: but 
> was:
> {code}
> The probable cause for the failure is that the test tries to select an unused 
> closed port (from the ephemeral range), and then assumes that all connections 
> to that port fail.
> If a concurrent test actually uses that port, connections to the port will 
> succeed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6337) Remove the buffer provider from PartitionRequestServerHandler

2017-04-25 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982936#comment-15982936
 ] 

Ufuk Celebi commented on FLINK-6337:


Just double checked this and I think you are right. We currently use the buffer 
provider only for the buffer size which doesn't make sense. +1 to remove. I 
agree that the server handler is not a good place for it anyways (I think that 
was a work around in the initial version that is not needed any more). Could 
you make this refactoring an independent pull request so that we can review it 
easily?

> Remove the buffer provider from PartitionRequestServerHandler
> -
>
> Key: FLINK-6337
> URL: https://issues.apache.org/jira/browse/FLINK-6337
> Project: Flink
>  Issue Type: Improvement
>  Components: Network
>Reporter: zhijiang
>Assignee: zhijiang
>Priority: Minor
>
> Currently, {{PartitionRequestServerHandler}} will create a 
> {{LocalBufferPool}} when the channel is registered. The {{LocalBufferPool}} 
> is only used to get segment size for creating read view in 
> {{SpillableSubpartition}}, and the buffers in the pool will not be used all 
> the time, so it will waste the buffer resource of global pool.
> We would like to remove the {{LocalBufferPool}} from the 
> {{PartitionRequestServerHandler}}, and the {{LocalBufferPool}} in 
> {{ResultPartition}} can also provide the segment size for creating sub 
> partition view.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6337) Remove the buffer provider from PartitionRequestServerHandler

2017-04-25 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982545#comment-15982545
 ] 

Ufuk Celebi commented on FLINK-6337:


Hey [~zjwang]! I think the buffer pool is required for spillable result 
partitions (when creating SpillableSubpartitionView). When they are consumed, a 
buffer provider needs to be given to the partition to copy the buffers to. I 
think this is not implemented in an efficient way at the moment (ideally the 
copy should not be necessary), but the proper implementation would require some 
more involved refactorings.

> Remove the buffer provider from PartitionRequestServerHandler
> -
>
> Key: FLINK-6337
> URL: https://issues.apache.org/jira/browse/FLINK-6337
> Project: Flink
>  Issue Type: Improvement
>  Components: Network
>Reporter: zhijiang
>Assignee: zhijiang
>Priority: Minor
>
> Currently, {{PartitionRequestServerHandler}} will create a 
> {{LocalBufferPool}} when the channel is registered. The {{LocalBufferPool}} 
> is only used to get segment size for creating read view in 
> {{SpillableSubpartition}}, and the buffers in the pool will not be used all 
> the time, so it will waste the buffer resource of global pool.
> We would like to remove the {{LocalBufferPool}} from the 
> {{PartitionRequestServerHandler}}, and the {{LocalBufferPool}} in 
> {{ResultPartition}} can also provide the segment size for creating sub 
> partition view.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3031) Consistent Shutdown of Streaming Jobs

2017-04-03 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3031.
--
Resolution: Invalid

Yes

> Consistent Shutdown of Streaming Jobs
> -
>
> Key: FLINK-3031
> URL: https://issues.apache.org/jira/browse/FLINK-3031
> Project: Flink
>  Issue Type: Bug
>  Components: DataStream API
>Reporter: Matthias J. Sax
>
> Depends on FLINK-2111
> When a streaming job is shut down cleanly via "stop", a last consistent 
> snapshot should be collected. This snapshot could be used to resume a job 
> later on.
> See mail archive for more details of the discussion: 
> https://mail-archives.apache.org/mod_mbox/flink-dev/201511.mbox/%3CCA%2Bfaj9xDFAUG_zi%3D%3DE2H8s-8R4cn8ZBDON_hf%2B1Rud5pJqvZ4A%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-6194) More broken links in docs

2017-03-27 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-6194.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Merged in 037b5cb (master).

> More broken links in docs
> -
>
> Key: FLINK-6194
> URL: https://issues.apache.org/jira/browse/FLINK-6194
> Project: Flink
>  Issue Type: Bug
>Reporter: Patrick Lucas
>Assignee: Patrick Lucas
> Fix For: 1.3.0
>
>
> My script noticed a few broken links that made it into the docs. I'll fix 
> them up.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-6182) Fix possible NPE in SourceStreamTask

2017-03-24 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-6182.
--
   Resolution: Fixed
Fix Version/s: 1.2.1
   1.3.0

Fixed in b703a24 (release-1.2), 4b19e27 (master).

> Fix possible NPE in SourceStreamTask
> 
>
> Key: FLINK-6182
> URL: https://issues.apache.org/jira/browse/FLINK-6182
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Reporter: Ufuk Celebi
>Priority: Minor
> Fix For: 1.3.0, 1.2.1
>
>
> If SourceStreamTask is cancelled before being invoked, `headOperator` is not 
> set yet, which leads to an NPE. This is not critical but leads to noisy logs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6182) Fix possible NPE in SourceStreamTask

2017-03-24 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-6182:
--

 Summary: Fix possible NPE in SourceStreamTask
 Key: FLINK-6182
 URL: https://issues.apache.org/jira/browse/FLINK-6182
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Reporter: Ufuk Celebi
Priority: Minor


If SourceStreamTask is cancelled before being invoked, `headOperator` is not 
set yet, which leads to an NPE. This is not critical but leads to noisy logs.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6175) HistoryServerTest.testFullArchiveLifecycle fails

2017-03-23 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-6175:
--

 Summary: HistoryServerTest.testFullArchiveLifecycle fails
 Key: FLINK-6175
 URL: https://issues.apache.org/jira/browse/FLINK-6175
 Project: Flink
  Issue Type: Test
  Components: Tests, Webfrontend
Reporter: Ufuk Celebi
Assignee: Chesnay Schepler


https://s3.amazonaws.com/archive.travis-ci.org/jobs/213933823/log.txt

{code}
estFullArchiveLifecycle(org.apache.flink.runtime.webmonitor.history.HistoryServerTest)
  Time elapsed: 2.162 sec  <<< FAILURE!
java.lang.AssertionError: /joboverview.json did not contain valid json
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertNotNull(Assert.java:712)
at 
org.apache.flink.runtime.webmonitor.history.HistoryServerTest.testFullArchiveLifecycle(HistoryServerTest.java:98)
{code}

Happened on a branch with unrelated changes [~Zentol].



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-6170) Some checkpoint metrics rely on latest stat snapshot

2017-03-22 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-6170.
--
   Resolution: Fixed
Fix Version/s: 1.2.1
   1.3.0

Fixed in d0695c0 (master), 7fbb115 (release-1.2).

> Some checkpoint metrics rely on latest stat snapshot
> 
>
> Key: FLINK-6170
> URL: https://issues.apache.org/jira/browse/FLINK-6170
> Project: Flink
>  Issue Type: Bug
>  Components: Metrics, State Backends, Checkpointing, Webfrontend
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 1.3.0, 1.2.1
>
>
> Some checkpoint metrics use the latest stats snapshot to get the returned 
> metric value. These snapshots are only updated when the {{WebRuntimeMonitor}} 
> actually requests some stats (web UI or REST API).
> In practice, this means that these metrics are only updated when users are 
> browsing the web UI.
> Instead of relying on the latest snapshot, the checkpoint metrics should be 
> directly updated via the completion callbacks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-6171) Some checkpoint metrics rely on latest stat snapshot

2017-03-22 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-6171.
--
Resolution: Duplicate

> Some checkpoint metrics rely on latest stat snapshot
> 
>
> Key: FLINK-6171
> URL: https://issues.apache.org/jira/browse/FLINK-6171
> Project: Flink
>  Issue Type: Bug
>  Components: Metrics, State Backends, Checkpointing, Webfrontend
>Reporter: Ufuk Celebi
>
> Some checkpoint metrics use the latest stats snapshot to get the returned 
> metric value. These snapshots are only updated when the {{WebRuntimeMonitor}} 
> actually requests some stats (web UI or REST API).
> In practice, this means that these metrics are only updated when users are 
> browsing the web UI.
> Instead of relying on the latest snapshot, the checkpoint metrics should be 
> directly updated via the completion callbacks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (FLINK-6170) Some checkpoint metrics rely on latest stat snapshot

2017-03-22 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi reassigned FLINK-6170:
--

Assignee: Ufuk Celebi

> Some checkpoint metrics rely on latest stat snapshot
> 
>
> Key: FLINK-6170
> URL: https://issues.apache.org/jira/browse/FLINK-6170
> Project: Flink
>  Issue Type: Bug
>  Components: Metrics, State Backends, Checkpointing, Webfrontend
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>
> Some checkpoint metrics use the latest stats snapshot to get the returned 
> metric value. These snapshots are only updated when the {{WebRuntimeMonitor}} 
> actually requests some stats (web UI or REST API).
> In practice, this means that these metrics are only updated when users are 
> browsing the web UI.
> Instead of relying on the latest snapshot, the checkpoint metrics should be 
> directly updated via the completion callbacks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6171) Some checkpoint metrics rely on latest stat snapshot

2017-03-22 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-6171:
--

 Summary: Some checkpoint metrics rely on latest stat snapshot
 Key: FLINK-6171
 URL: https://issues.apache.org/jira/browse/FLINK-6171
 Project: Flink
  Issue Type: Bug
  Components: Metrics, State Backends, Checkpointing, Webfrontend
Reporter: Ufuk Celebi


Some checkpoint metrics use the latest stats snapshot to get the returned 
metric value. These snapshots are only updated when the {{WebRuntimeMonitor}} 
actually requests some stats (web UI or REST API).

In practice, this means that these metrics are only updated when users are 
browsing the web UI.

Instead of relying on the latest snapshot, the checkpoint metrics should be 
directly updated via the completion callbacks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6170) Some checkpoint metrics rely on latest stat snapshot

2017-03-22 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-6170:
--

 Summary: Some checkpoint metrics rely on latest stat snapshot
 Key: FLINK-6170
 URL: https://issues.apache.org/jira/browse/FLINK-6170
 Project: Flink
  Issue Type: Bug
  Components: Metrics, State Backends, Checkpointing, Webfrontend
Reporter: Ufuk Celebi


Some checkpoint metrics use the latest stats snapshot to get the returned 
metric value. These snapshots are only updated when the {{WebRuntimeMonitor}} 
actually requests some stats (web UI or REST API).

In practice, this means that these metrics are only updated when users are 
browsing the web UI.

Instead of relying on the latest snapshot, the checkpoint metrics should be 
directly updated via the completion callbacks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-6167) Consider removing ArchivedExecutionGraph classes

2017-03-22 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936454#comment-15936454
 ] 

Ufuk Celebi commented on FLINK-6167:


I think we should wait to see whether the FileSystem based approach is well 
received. In the future, we might want to get rid of it or at least provide a 
RPC based variant.

> Consider removing ArchivedExecutionGraph classes
> 
>
> Key: FLINK-6167
> URL: https://issues.apache.org/jira/browse/FLINK-6167
> Project: Flink
>  Issue Type: Improvement
>  Components: JobManager
>Affects Versions: 1.3.0
>Reporter: Chesnay Schepler
>Priority: Minor
>
> The Archived* versions of the ExecutionGraph classes (ExecutionGraph, 
> ExecutionJobVertex, ExecutionVertex, Execution) were originally intended to 
> provide a serializable object that can be transferred to the History Server.
> The revised implementation of the history server however no longer requires 
> them.
> As such we could either remove them, or keep them for testing purposes 
> (instead of mocking) as they simplify the testing of the web-interface 
> handlers quite a lot, which would however require keeping the Access* 
> interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6127) Add MissingDeprecatedCheck to checkstyle

2017-03-20 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-6127:
--

 Summary: Add MissingDeprecatedCheck to checkstyle
 Key: FLINK-6127
 URL: https://issues.apache.org/jira/browse/FLINK-6127
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Reporter: Ufuk Celebi
Priority: Minor


We should add the MissingDeprecatedCheck to our checkstyle rules to help 
avoiding deprecations without JavaDocs mentioning why the deprecation happened.

http://checkstyle.sourceforge.net/apidocs/com/puppycrawl/tools/checkstyle/checks/annotation/MissingDeprecatedCheck.html




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-6122) add TravisCI build status to README.md

2017-03-20 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-6122.
--
   Resolution: Implemented
Fix Version/s: (was: 1.2.1)

Implemented in 486f724 (master).

> add TravisCI build status to README.md
> --
>
> Key: FLINK-6122
> URL: https://issues.apache.org/jira/browse/FLINK-6122
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 1.2.0
>Reporter: Bowen Li
>Assignee: Bowen Li
>Priority: Minor
> Fix For: 1.3.0
>
>
> Add TravisCI build status to README in github repo. Expectation is to have 
> something like https://github.com/apache/incubator-airflow



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-5635) Improve Docker tooling to make it easier to build images and launch Flink via Docker tools

2017-03-15 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5635.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Fixed in 227478b, a3627f2 (master).


> Improve Docker tooling to make it easier to build images and launch Flink via 
> Docker tools
> --
>
> Key: FLINK-5635
> URL: https://issues.apache.org/jira/browse/FLINK-5635
> Project: Flink
>  Issue Type: Improvement
>  Components: Docker
>Affects Versions: 1.2.0
>Reporter: Jamie Grier
>Assignee: Patrick Lucas
> Fix For: 1.3.0
>
>
> This is a bit of a catch-all ticket for general improvements to the Flink on 
> Docker experience.
> Things to improve:
>   - Make it possible to build a Docker image from your own flink-dist 
> directory as well as official releases.
>   - Make it possible to override the image name so a user can more easily 
> publish these images to their Docker repository
>   - Provide scripts that show how to properly run on Docker Swarm or similar 
> environments with overlay networking (Kubernetes) without using host 
> networking.
>   - Log to stdout rather than to files.
>   - Work properly with docker-compose for local deployment as well as 
> production deployments (Swarm/k8s)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5985) Flink treats every task as stateful (making topology changes impossible)

2017-03-10 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904869#comment-15904869
 ] 

Ufuk Celebi commented on FLINK-5985:


Thanks again for reporting this. You are right, we accidentally changed the 
behaviour between 1.1 and 1.2. This is a critical issue.

In 1.1 we did not serialize stateless tasks as part of the savepoint and 
therefore never took them into account on savepoint loading. With the recent 
refactorings in 1.2 we now serialize stateless tasks this as "zero length 
state". This makes the stateless operators part of the savepoint we try to load 
the state back to the new job.

I think there a couple of easy ways to fix this, but 
[~stefanrichte...@gmail.com] probably has the definitive answer here.

I think this issue warrants a 1.2.1 release asap as users cannot change their 
topologies if they didn't specify a UID for each operator.


> Flink treats every task as stateful (making topology changes impossible)
> 
>
> Key: FLINK-5985
> URL: https://issues.apache.org/jira/browse/FLINK-5985
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Gyula Fora
>Priority: Critical
>
> It seems  that Flink treats every Task as stateful so changing the topology 
> is not possible without setting uid on every single operator.
> If the topology has an iteration this is virtually impossible (or at least 
> gets super hacky)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5985) Flink treats every task as stateful (making topology changes impossible)

2017-03-10 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904779#comment-15904779
 ] 

Ufuk Celebi commented on FLINK-5985:


Thanks for taking the time to create the example, Gyula.

> Flink treats every task as stateful (making topology changes impossible)
> 
>
> Key: FLINK-5985
> URL: https://issues.apache.org/jira/browse/FLINK-5985
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Gyula Fora
>Priority: Critical
>
> It seems  that Flink treats every Task as stateful so changing the topology 
> is not possible without setting uid on every single operator.
> If the topology has an iteration this is virtually impossible (or at least 
> gets super hacky)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-6002) Documentation: 'MacOS X' under 'Download and Start Flink' in Quickstart page is not rendered correctly

2017-03-09 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-6002.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Fixed in 2a32af8 (1.2), 8d92f86 (master).

> Documentation: 'MacOS X' under 'Download and Start Flink' in Quickstart page 
> is not rendered correctly
> --
>
> Key: FLINK-6002
> URL: https://issues.apache.org/jira/browse/FLINK-6002
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.2.0
>Reporter: Bowen Li
>Priority: Trivial
> Fix For: 1.3.0, 1.2.1
>
>
> On 
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/quickstart/setup_quickstart.html#setup-download-and-start-flink
>  , command lines in "MacOS X" part are not rendered correctly.
> This is because the markdown is misformatted - it doesn't leave a blank line 
> between text and code block.
> So the fix is simple - add a blank line between text and code block.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5985) Flink treats every task as stateful (making topology changes impossible)

2017-03-08 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901356#comment-15901356
 ] 

Ufuk Celebi commented on FLINK-5985:


Is it possible to share the driver program (privately works as well)? I could 
then try to reproduce the issue with dummy data.



> Flink treats every task as stateful (making topology changes impossible)
> 
>
> Key: FLINK-5985
> URL: https://issues.apache.org/jira/browse/FLINK-5985
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Gyula Fora
>Priority: Critical
>
> It seems  that Flink treats every Task as stateful so changing the topology 
> is not possible without setting uid on every single operator.
> If the topology has an iteration this is virtually impossible (or at least 
> gets super hacky)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5999) MiniClusterITCase.runJobWithMultipleRpcServices fails

2017-03-08 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5999:
--

 Summary: MiniClusterITCase.runJobWithMultipleRpcServices fails
 Key: FLINK-5999
 URL: https://issues.apache.org/jira/browse/FLINK-5999
 Project: Flink
  Issue Type: Test
  Components: Distributed Coordination, Tests
Reporter: Ufuk Celebi


In a branch with unrelated changes to the web frontend I've seen the following 
test fail:

{code}
runJobWithMultipleRpcServices(org.apache.flink.runtime.minicluster.MiniClusterITCase)
  Time elapsed: 1.145 sec  <<< ERROR!
java.util.ConcurrentModificationException: null
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
at java.util.HashMap$ValueIterator.next(HashMap.java:1458)
at 
org.apache.flink.runtime.resourcemanager.JobLeaderIdService.clear(JobLeaderIdService.java:114)
at 
org.apache.flink.runtime.resourcemanager.JobLeaderIdService.stop(JobLeaderIdService.java:92)
at 
org.apache.flink.runtime.resourcemanager.ResourceManager.shutDown(ResourceManager.java:182)
at 
org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDownInternally(ResourceManagerRunner.java:83)
at 
org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDown(ResourceManagerRunner.java:78)
at 
org.apache.flink.runtime.minicluster.MiniCluster.shutdownInternally(MiniCluster.java:313)
at 
org.apache.flink.runtime.minicluster.MiniCluster.shutdown(MiniCluster.java:281)
at 
org.apache.flink.runtime.minicluster.MiniClusterITCase.runJobWithMultipleRpcServices(MiniClusterITCase.java:72)
{code}




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5999) MiniClusterITCase.runJobWithMultipleRpcServices fails

2017-03-08 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901330#comment-15901330
 ] 

Ufuk Celebi commented on FLINK-5999:


https://s3.amazonaws.com/archive.travis-ci.org/jobs/208981739/log.txt

> MiniClusterITCase.runJobWithMultipleRpcServices fails
> -
>
> Key: FLINK-5999
> URL: https://issues.apache.org/jira/browse/FLINK-5999
> Project: Flink
>  Issue Type: Test
>  Components: Distributed Coordination, Tests
>Reporter: Ufuk Celebi
>  Labels: test-stability
>
> In a branch with unrelated changes to the web frontend I've seen the 
> following test fail:
> {code}
> runJobWithMultipleRpcServices(org.apache.flink.runtime.minicluster.MiniClusterITCase)
>   Time elapsed: 1.145 sec  <<< ERROR!
> java.util.ConcurrentModificationException: null
>   at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
>   at java.util.HashMap$ValueIterator.next(HashMap.java:1458)
>   at 
> org.apache.flink.runtime.resourcemanager.JobLeaderIdService.clear(JobLeaderIdService.java:114)
>   at 
> org.apache.flink.runtime.resourcemanager.JobLeaderIdService.stop(JobLeaderIdService.java:92)
>   at 
> org.apache.flink.runtime.resourcemanager.ResourceManager.shutDown(ResourceManager.java:182)
>   at 
> org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDownInternally(ResourceManagerRunner.java:83)
>   at 
> org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDown(ResourceManagerRunner.java:78)
>   at 
> org.apache.flink.runtime.minicluster.MiniCluster.shutdownInternally(MiniCluster.java:313)
>   at 
> org.apache.flink.runtime.minicluster.MiniCluster.shutdown(MiniCluster.java:281)
>   at 
> org.apache.flink.runtime.minicluster.MiniClusterITCase.runJobWithMultipleRpcServices(MiniClusterITCase.java:72)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5780) Extend ConfigOption with descriptions

2017-03-08 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901053#comment-15901053
 ] 

Ufuk Celebi commented on FLINK-5780:


[~dawidwys] Thanks for picking this up.

> Extend ConfigOption with descriptions
> -
>
> Key: FLINK-5780
> URL: https://issues.apache.org/jira/browse/FLINK-5780
> Project: Flink
>  Issue Type: Sub-task
>  Components: Core, Documentation
>Reporter: Ufuk Celebi
>Assignee: Dawid Wysakowicz
>
> The {{ConfigOption}} type is meant to replace the flat {{ConfigConstants}}. 
> As part of automating the generation of a docs config page we need to extend  
> {{ConfigOption}} with description fields.
> From the ML discussion, these could be:
> {code}
> void shortDescription(String);
> void longDescription(String);
> {code}
> In practice, the description string should contain HTML/Markdown.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5985) Flink treats every task as stateful (making topology changes impossible)

2017-03-08 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901040#comment-15901040
 ] 

Ufuk Celebi commented on FLINK-5985:


I fully agree that this whole UID business is very tricky and intransparent 
right now.

Could you:
1) Post the Exception you get
2) Run the job before the savepoint with DEBUG logging for 
org.apache.flink.streaming.api.graph
3) Run the job with which you want to restore with DEBUG logging for 
org.apache.flink.streaming.api.graph?

Either this is a bug or we are overlooking a stateful task that does not have a 
UID set. You can set a UID after the fact with 1.2 via `setUIDHash` to the 
String of the JobVertexID.


> Flink treats every task as stateful (making topology changes impossible)
> 
>
> Key: FLINK-5985
> URL: https://issues.apache.org/jira/browse/FLINK-5985
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Gyula Fora
>Priority: Critical
>
> It seems  that Flink treats every Task as stateful so changing the topology 
> is not possible without setting uid on every single operator.
> If the topology has an iteration this is virtually impossible (or at least 
> gets super hacky)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5996) Jobmanager HA should not crash on lost ZK node

2017-03-08 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901032#comment-15901032
 ] 

Ufuk Celebi commented on FLINK-5996:


Just to understand this correctly, eventually the JM recovers from this, but 
only after suspending all running jobs and reconnecting to another/the same 
host, correct?


> Jobmanager HA should not crash on lost ZK node
> --
>
> Key: FLINK-5996
> URL: https://issues.apache.org/jira/browse/FLINK-5996
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 1.2.0
>Reporter: Gyula Fora
>
> Even if there are multiple zk hosts configured the jobmanager crashes if one 
> of them is lost:
> org.apache.flink.shaded.org.apache.curator.CuratorConnectionLossException: 
> KeeperErrorCode = ConnectionLoss
>   at 
> org.apache.flink.shaded.org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
>   at 
> org.apache.flink.shaded.org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
>   at 
> org.apache.flink.shaded.org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
>   at 
> org.apache.flink.shaded.org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:477)
>   at 
> org.apache.flink.shaded.org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:302)
>   at 
> org.apache.flink.shaded.org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:291)
>   at 
> org.apache.flink.shaded.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at 
> org.apache.flink.shaded.org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:288)
>   at 
> org.apache.flink.shaded.org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:279)
>   at 
> org.apache.flink.shaded.org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:41)
>   at 
> org.apache.flink.shaded.org.apache.curator.framework.recipes.shared.SharedValue.readValue(SharedValue.java:244)
>   at 
> org.apache.flink.shaded.org.apache.curator.framework.recipes.shared.SharedValue.access$100(SharedValue.java:44)
>   at 
> org.apache.flink.shaded.org.apache.curator.framework.recipes.shared.SharedValue$1.process(SharedValue.java:61)
>   at 
> org.apache.flink.shaded.org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:67)
>   at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
>   at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> We should have some retry logic there



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5993) Add section about Azure deployment

2017-03-08 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5993:
--

 Summary: Add section about Azure deployment 
 Key: FLINK-5993
 URL: https://issues.apache.org/jira/browse/FLINK-5993
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Ufuk Celebi


Running Flink on Azure can lead to unexpected problems. From the user mailing 
list:
{quote}
For anyone seeing this thread in the future, we managed to solve the issue. The 
problem was in the Azure storage SDK.
Flink is using Hadoop 2.7, so we used version 2.7.3 of the Hadoop-azure 
package. This package uses version 2.0.0 of the azure-storage package, dated 
from 2014. It has several bugs that were since fixed, specifically one where 
the socket timeout was infinite. We updated this package to version 5.0.0 and 
everything is working smoothly now.
{quote}
(http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-checkpointing-gets-stuck-td11776.html)

At least this caveat should be covered.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5985) Flink treats every task as stateful (making topology changes impossible)

2017-03-08 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900973#comment-15900973
 ] 

Ufuk Celebi commented on FLINK-5985:


It should be possible to change the topology if you set the UIDs *only* for the 
stateful operators, because these are the only ones that are part of the 
savepoint state.

If you don't set any UIDs, changing the topology will result in a changed auto 
generated UID for the stateful operators as well. That's the situation you are 
describing here, right?


> Flink treats every task as stateful (making topology changes impossible)
> 
>
> Key: FLINK-5985
> URL: https://issues.apache.org/jira/browse/FLINK-5985
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
>Reporter: Gyula Fora
>Priority: Critical
>
> It seems  that Flink treats every Task as stateful so changing the topology 
> is not possible without setting uid on every single operator.
> If the topology has an iteration this is virtually impossible (or at least 
> gets super hacky)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-5965) Typo on DropWizard wrappers

2017-03-06 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5965.
--
   Resolution: Fixed
Fix Version/s: 1.2.1
   1.3.0

Fixed in 32a55a2 (release-1.2), 62517ca (master).

> Typo on DropWizard wrappers
> ---
>
> Key: FLINK-5965
> URL: https://issues.apache.org/jira/browse/FLINK-5965
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Francisco Sokol
>Priority: Trivial
> Fix For: 1.3.0, 1.2.1
>
>
> The metrics page, in the monitoring section, has two small typos in the 
> example code:
> - `DropWizardHistogramWrapper` should be  `DropwizardHistogramWrapper`
> - `DropWizardMeterWrapper` should be `DropwizardMeterWrapper`



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-5402) Fails AkkaRpcServiceTest#testTerminationFuture

2017-03-01 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5402.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Fixed in 72f18dc (master).

> Fails AkkaRpcServiceTest#testTerminationFuture
> --
>
> Key: FLINK-5402
> URL: https://issues.apache.org/jira/browse/FLINK-5402
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.2.0
> Environment: macOS
>Reporter: Anton Solovev
>Assignee: Dawid Wysakowicz
>  Labels: test-stability
> Fix For: 1.3.0
>
>
> {code}
> testTerminationFuture(org.apache.flink.runtime.rpc.akka.AkkaRpcServiceTest)  
> Time elapsed: 1.013 sec  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 1000 
> milliseconds
>   at sun.misc.Unsafe.park(Native Method)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>   at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>   at 
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>   at scala.concurrent.Await$.result(package.scala:107)
>   at akka.remote.Remoting.start(Remoting.scala:179)
>   at 
> akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
>   at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:620)
>   at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:617)
>   at akka.actor.ActorSystemImpl._start(ActorSystem.scala:617)
>   at akka.actor.ActorSystemImpl.start(ActorSystem.scala:634)
>   at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
>   at akka.actor.ActorSystem$.apply(ActorSystem.scala:119)
>   at akka.actor.ActorSystem$.create(ActorSystem.scala:67)
>   at 
> org.apache.flink.runtime.akka.AkkaUtils$.createActorSystem(AkkaUtils.scala:104)
>   at 
> org.apache.flink.runtime.akka.AkkaUtils$.createDefaultActorSystem(AkkaUtils.scala:114)
>   at 
> org.apache.flink.runtime.akka.AkkaUtils.createDefaultActorSystem(AkkaUtils.scala)
>   at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcServiceTest.testTerminationFuture(AkkaRpcServiceTest.java:134)
> {code} in org.apache.flink.runtime.rpc.akka.AkkaRpcServiceTest while testing 
> current master 1.2.0 branch 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-5389) Fails #testAnswerFailureWhenSavepointReadFails

2017-03-01 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5389.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Fixed in b59c14e (master).

> Fails #testAnswerFailureWhenSavepointReadFails
> --
>
> Key: FLINK-5389
> URL: https://issues.apache.org/jira/browse/FLINK-5389
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
> Environment: macOS sierra
>Reporter: Anton Solovev
>Assignee: Ufuk Celebi
>  Labels: test
> Fix For: 1.3.0
>
>
> {{testAnswerFailureWhenSavepointReadFails}} fails in {{JobSubmitTest}} when  
> {{timeout}} is set to 5000ms, but when 6000ms it pass



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-2573) Add Kerberos test case

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-2573.
--
Resolution: Fixed

All sub issues are resolved

> Add Kerberos test case
> --
>
> Key: FLINK-2573
> URL: https://issues.apache.org/jira/browse/FLINK-2573
> Project: Flink
>  Issue Type: Test
>  Components: Distributed Coordination, Local Runtime
>Affects Versions: 0.9, 0.10.0
>Reporter: Maximilian Michels
>
> The Kerberos support has been tested manually in a cluster environment but 
> programmatic test cases are missing. An automated test case could be 
> implemented against Hadoop's MiniKDC.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-4634) TaskStopTest.testStopExecution() times out

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-4634.
--
Resolution: Cannot Reproduce

> TaskStopTest.testStopExecution() times out
> --
>
> Key: FLINK-4634
> URL: https://issues.apache.org/jira/browse/FLINK-4634
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Reporter: Robert Metzger
>  Labels: test-stability
>
> The mentioned test failed in: 
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/160407887/log.txt
> {code}
> testStopExecution(org.apache.flink.runtime.taskmanager.TaskStopTest)  Time 
> elapsed: 10.619 sec  <<< ERROR!
> java.lang.Exception: test timed out after 1 milliseconds
>   at org.junit.internal.runners.MethodRoadie$1.run(MethodRoadie.java:77)
>   at 
> org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:96)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.executeTest(PowerMockJUnit44RunnerDelegateImpl.java:294)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTestInSuper(PowerMockJUnit47RunnerDelegateImpl.java:127)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTest(PowerMockJUnit47RunnerDelegateImpl.java:82)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runBeforesThenTestThenAfters(PowerMockJUnit44RunnerDelegateImpl.java:282)
>   at 
> org.junit.internal.runners.MethodRoadie.runWithTimeout(MethodRoadie.java:57)
>   at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:47)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.invokeTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:207)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.runMethods(PowerMockJUnit44RunnerDelegateImpl.java:146)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$1.run(PowerMockJUnit44RunnerDelegateImpl.java:120)
>   at 
> org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:33)
>   at 
> org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:45)
>   at 
> org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.run(PowerMockJUnit44RunnerDelegateImpl.java:118)
>   at 
> org.powermock.modules.junit4.common.internal.impl.JUnit4TestSuiteChunkerImpl.run(JUnit4TestSuiteChunkerImpl.java:104)
>   at 
> org.powermock.modules.junit4.common.internal.impl.AbstractCommonPowerMockRunner.run(AbstractCommonPowerMockRunner.java:53)
>   at 
> org.powermock.modules.junit4.PowerMockRunner.run(PowerMockRunner.java:53)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-3704) JobManagerHAProcessFailureBatchRecoveryITCase.testJobManagerProcessFailure unstable

2017-02-28 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888456#comment-15888456
 ] 

Ufuk Celebi commented on FLINK-3704:


[~dernasherbrezon] No, I think it is as Till says an issue with the test 
sharing a ZooKeeper instance

> JobManagerHAProcessFailureBatchRecoveryITCase.testJobManagerProcessFailure 
> unstable
> ---
>
> Key: FLINK-3704
> URL: https://issues.apache.org/jira/browse/FLINK-3704
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Reporter: Robert Metzger
>  Labels: test-stability
>
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/120882840/log.txt
> {code}
> testJobManagerProcessFailure[1](org.apache.flink.test.recovery.JobManagerHAProcessFailureBatchRecoveryITCase)
>   Time elapsed: 9.302 sec  <<< ERROR!
> java.io.IOException: Actor at 
> akka.tcp://flink@127.0.0.1:55591/user/jobmanager not reachable. Please make 
> sure that the actor is running and its port is reachable.
>   at 
> org.apache.flink.runtime.akka.AkkaUtils$.getActorRef(AkkaUtils.scala:384)
>   at org.apache.flink.runtime.akka.AkkaUtils.getActorRef(AkkaUtils.scala)
>   at 
> org.apache.flink.test.recovery.JobManagerHAProcessFailureBatchRecoveryITCase.testJobManagerProcessFailure(JobManagerHAProcessFailureBatchRecoveryITCase.java:290)
> Caused by: akka.actor.ActorNotFound: Actor not found for: 
> ActorSelection[Anchor(akka.tcp://flink@127.0.0.1:55591/), 
> Path(/user/jobmanager)]
>   at 
> akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
>   at 
> akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at 
> akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
>   at 
> akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
>   at 
> akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
>   at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
>   at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248)
>   at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:267)
>   at akka.actor.EmptyLocalActorRef.specialHandle(ActorRef.scala:508)
>   at akka.actor.DeadLetterActorRef.specialHandle(ActorRef.scala:541)
>   at akka.actor.DeadLetterActorRef.$bang(ActorRef.scala:531)
>   at 
> akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef.$bang(RemoteActorRefProvider.scala:87)
>   at akka.remote.EndpointWriter.postStop(Endpoint.scala:561)
>   at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
>   at akka.remote.EndpointActor.aroundPostStop(Endpoint.scala:415)
>   at 
> akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
>   at 
> akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:172)
>   at akka.actor.ActorCell.terminate(ActorCell.scala:369)
>   at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:462)
>   at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
>   at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3812) Kafka09ITCase testAllDeletes fails on Travis

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3812.
--
Resolution: Cannot Reproduce

> Kafka09ITCase testAllDeletes fails on Travis
> 
>
> Key: FLINK-3812
> URL: https://issues.apache.org/jira/browse/FLINK-3812
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Ufuk Celebi
>  Labels: test-stability
>
> {code}
> Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 123.878 sec 
> <<< FAILURE! - in org.apache.flink.streaming.connectors.kafka.Kafka09ITCase
> testAllDeletes(org.apache.flink.streaming.connectors.kafka.Kafka09ITCase)  
> Time elapsed: 3.553 sec  <<< ERROR!
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Job execution failed.
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:381)
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:355)
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:348)
>   at 
> org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.executeRemotely(RemoteStreamEnvironment.java:208)
>   at 
> org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.execute(RemoteStreamEnvironment.java:172)
>   at 
> org.apache.flink.streaming.connectors.kafka.KafkaConsumerTestBase.runAllDeletesTest(KafkaConsumerTestBase.java:1056)
>   at 
> org.apache.flink.streaming.connectors.kafka.Kafka09ITCase.testAllDeletes(Kafka09ITCase.java:110)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:806)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:752)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:752)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.Exception: Failed to send data to Kafka: This server 
> does not host this topic-partition.
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.checkErroneous(FlinkKafkaProducerBase.java:287)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.close(FlinkKafkaProducerBase.java:276)
>   at 
> org.apache.flink.api.common.functions.util.FunctionUtils.closeFunction(FunctionUtils.java:45)
>   at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.close(AbstractUdfStreamOperator.java:98)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.closeAllOperators(StreamTask.java:329)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:238)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:579)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: 
> This server does not host this topic-partition.
> {code}
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/125565203/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-4071) JobManagerHAJobGraphRecoveryITCase.testJobManagerCleanUp failed on Travis

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-4071.
--
Resolution: Cannot Reproduce

I think this has been fixed with the introduction of the SUSPENDED state.

> JobManagerHAJobGraphRecoveryITCase.testJobManagerCleanUp failed on Travis
> -
>
> Key: FLINK-4071
> URL: https://issues.apache.org/jira/browse/FLINK-4071
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Ufuk Celebi
>Priority: Critical
>  Labels: test-stability
>
> The test case {{JobManagerHAJobGraphRecoveryITCase.testJobManagerCleanUp}} 
> failed on Travis.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/137498380/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-3990) ZooKeeperLeaderElectionTest kills the JVM

2017-02-28 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888414#comment-15888414
 ] 

Ufuk Celebi commented on FLINK-3990:


This seems to be unrelated to the mentioned tests. I think we can close this 
issue as it didn't occur since June 16. What do you think?

> ZooKeeperLeaderElectionTest kills the JVM
> -
>
> Key: FLINK-3990
> URL: https://issues.apache.org/jira/browse/FLINK-3990
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Stephan Ewen
>Priority: Critical
>  Labels: test-stability
>
> Frequently, the {{ZooKeeperLeaderElectionTest}} causes the JVM to exit, 
> failing the build.
> https://api.travis-ci.org/jobs/133759814/log.txt?deansi=true



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5389) Fails #testAnswerFailureWhenSavepointReadFails

2017-02-28 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888400#comment-15888400
 ] 

Ufuk Celebi commented on FLINK-5389:


I would increase the timeout here and in FLINK-5402.

> Fails #testAnswerFailureWhenSavepointReadFails
> --
>
> Key: FLINK-5389
> URL: https://issues.apache.org/jira/browse/FLINK-5389
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Affects Versions: 1.2.0
> Environment: macOS sierra
>Reporter: Anton Solovev
>Assignee: Ufuk Celebi
>  Labels: test
>
> {{testAnswerFailureWhenSavepointReadFails}} fails in {{JobSubmitTest}} when  
> {{timeout}} is set to 5000ms, but when 6000ms it pass



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-4060) WebRuntimeMonitorITCase.testStandaloneWebRuntimeMonitor failed on Travis

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-4060.
--
Resolution: Cannot Reproduce

> WebRuntimeMonitorITCase.testStandaloneWebRuntimeMonitor failed on Travis
> 
>
> Key: FLINK-4060
> URL: https://issues.apache.org/jira/browse/FLINK-4060
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
>
> The test case {{WebRuntimeMonitorITCase.testStandaloneWebRuntimeMonitor}} 
> failed on Travis with
> {code}
> WebRuntimeMonitorITCase.testStandaloneWebRuntimeMonitor:113 expected:<200 OK> 
> but was:<503 Service Unavailable>
> {code}
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/137191030/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5125) ContinuousFileProcessingCheckpointITCase is Flaky

2017-02-28 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888387#comment-15888387
 ] 

Ufuk Celebi commented on FLINK-5125:


Any progress on this [~kkl0u]]?

> ContinuousFileProcessingCheckpointITCase is Flaky
> -
>
> Key: FLINK-5125
> URL: https://issues.apache.org/jira/browse/FLINK-5125
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Kostas Kloudas
>  Labels: test-stability
>
> This is the travis log: 
> https://api.travis-ci.org/jobs/177402367/log.txt?deansi=true
> The relevant sections is:
> {code}
> Running org.apache.flink.test.checkpointing.CoStreamCheckpointingITCase
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.571 sec - 
> in org.apache.flink.test.exampleJavaPrograms.EnumTriangleBasicITCase
> Running org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.704 sec - 
> in org.apache.flink.test.checkpointing.CoStreamCheckpointingITCase
> Running 
> org.apache.flink.test.checkpointing.EventTimeAllWindowCheckpointingITCase
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.805 sec - 
> in org.apache.flink.test.checkpointing.EventTimeAllWindowCheckpointingITCase
> Running 
> org.apache.flink.test.checkpointing.ContinuousFileProcessingCheckpointITCase
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Job execution failed.
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
>   at 
> org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:392)
>   at 
> org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.executeRemotely(RemoteStreamEnvironment.java:209)
>   at 
> org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.execute(RemoteStreamEnvironment.java:173)
>   at org.apache.flink.test.util.TestUtils.tryExecute(TestUtils.java:32)
>   at 
> org.apache.flink.test.checkpointing.StreamFaultToleranceTestBase.runCheckpointedProgram(StreamFaultToleranceTestBase.java:106)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
>   at 
> 

[jira] [Closed] (FLINK-3212) JobManagerCheckpointRecoveryITCase

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3212.
--
Resolution: Cannot Reproduce

> JobManagerCheckpointRecoveryITCase
> --
>
> Key: FLINK-3212
> URL: https://issues.apache.org/jira/browse/FLINK-3212
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: test-stability
>
> {noformat}
> Tests in error: 
> JobManagerCheckpointRecoveryITCase.testCheckpointRecoveryFailure:354 » 
> IllegalState
> JobManagerCheckpointRecoveryITCase.testCheckpointedStreamingSumProgram:192 » 
> IO
> {noformat}
> https://travis-ci.org/mjsax/flink/jobs/101407273



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-4153) ExecutionGraphRestartTest Fails

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-4153.
--
Resolution: Cannot Reproduce

Respective test code changed since this issue was created and it was not 
reported again or happened on the test infrastructure. 

> ExecutionGraphRestartTest Fails
> ---
>
> Key: FLINK-4153
> URL: https://issues.apache.org/jira/browse/FLINK-4153
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Affects Versions: 1.1.0
>Reporter: Aljoscha Krettek
>Assignee: Ufuk Celebi
>  Labels: test-stability
>
> The test failed with this output:
> {code}
> Tests run: 11, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 240.854 sec 
> <<< FAILURE! - in 
> org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest
> testRestartAutomatically(org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest)
>   Time elapsed: 120.085 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.restartAfterFailure(ExecutionGraphRestartTest.java:806)
> at 
> org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testRestartAutomatically(ExecutionGraphRestartTest.java:210)
> testFailingExecutionAfterRestart(org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest)
>   Time elapsed: 120.018 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at 
> org.apache.flink.runtime.executiongraph.ExecutionGraphRestartTest.testFailingExecutionAfterRestart(ExecutionGraphRestartTest.java:530)
> {code}
> Unfortunately, this failed locally so the above output is the only output I 
> have for the failure.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-4995) YarnFlinkResourceManagerTest JobManager Lost Leadership test failed

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-4995.
--
Resolution: Cannot Reproduce

Test removed

> YarnFlinkResourceManagerTest JobManager Lost Leadership test failed
> ---
>
> Key: FLINK-4995
> URL: https://issues.apache.org/jira/browse/FLINK-4995
> Project: Flink
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.1.3
>Reporter: Ufuk Celebi
>  Labels: test-stability
>
> {code}
> Running org.apache.flink.yarn.YarnFlinkResourceManagerTest
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.621 sec <<< 
> FAILURE! - in org.apache.flink.yarn.YarnFlinkResourceManagerTest
> testYarnFlinkResourceManagerJobManagerLostLeadership(org.apache.flink.yarn.YarnFlinkResourceManagerTest)
>   Time elapsed: 0.397 sec  <<< FAILURE!
> java.lang.AssertionError: assertion failed: expected class 
> org.apache.flink.runtime.messages.Acknowledge, found class 
> org.apache.flink.runtime.clusterframework.messages.RegisterResourceManager
>   at scala.Predef$.assert(Predef.scala:165)
>   at 
> akka.testkit.TestKitBase$class.expectMsgClass_internal(TestKit.scala:424)
>   at akka.testkit.TestKitBase$class.expectMsgClass(TestKit.scala:419)
>   at akka.testkit.TestKit.expectMsgClass(TestKit.scala:718)
>   at akka.testkit.JavaTestKit.expectMsgClass(JavaTestKit.java:408)
>   at 
> org.apache.flink.yarn.YarnFlinkResourceManagerTest$1.(YarnFlinkResourceManagerTest.java:179)
>   at 
> org.apache.flink.yarn.YarnFlinkResourceManagerTest.testYarnFlinkResourceManagerJobManagerLostLeadership(YarnFlinkResourceManagerTest.java:90)
> {code}
> https://travis-ci.org/uce/flink/jobs/172552415
> Failed in a branch with an unrelated change in TaskTest.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3571) SavepointITCase.testRestoreFailure fails on Travis

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3571.
--
Resolution: Cannot Reproduce

Savepoint code changed quite a bit since this issue was created. I've not seen 
this fail recently. If it occurs again, I will re-open/create a new issue.

> SavepointITCase.testRestoreFailure fails on Travis
> --
>
> Key: FLINK-3571
> URL: https://issues.apache.org/jira/browse/FLINK-3571
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Reporter: Till Rohrmann
>Assignee: Ufuk Celebi
>Priority: Critical
>  Labels: test-stability
>
> The test case {{SavepointITCase.testRestoreFailure}} failed on Travis [1]
> [1] https://s3.amazonaws.com/archive.travis-ci.org/jobs/113084521/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-4397) Unstable test SlotCountExceedingParallelismTest.tearDown

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-4397.
--
Resolution: Cannot Reproduce

> Unstable test SlotCountExceedingParallelismTest.tearDown
> 
>
> Key: FLINK-4397
> URL: https://issues.apache.org/jira/browse/FLINK-4397
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination, Metrics
>Reporter: Kostas Kloudas
>  Labels: test-stability
>
> An instance can be found here:
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/152392524/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3331) Test instability: LocalFlinkMiniClusterITCase.testLocalFlinkMiniClusterWithMultipleTaskManagers

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3331.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.

> Test instability: 
> LocalFlinkMiniClusterITCase.testLocalFlinkMiniClusterWithMultipleTaskManagers
> ---
>
> Key: FLINK-3331
> URL: https://issues.apache.org/jira/browse/FLINK-3331
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Reporter: Robert Metzger
>  Labels: test-stability
>
> {code}
> testLocalFlinkMiniClusterWithMultipleTaskManagers(org.apache.flink.test.runtime.minicluster.LocalFlinkMiniClusterITCase)
>   Time elapsed: 11.022 sec  <<< FAILURE!
> java.lang.AssertionError: assertion failed: expected 3, found 2
>   at scala.Predef$.assert(Predef.scala:179)
>   at akka.testkit.TestKitBase$class.expectMsg_internal(TestKit.scala:339)
>   at akka.testkit.TestKitBase$class.expectMsg(TestKit.scala:324)
>   at akka.testkit.TestKit.expectMsg(TestKit.scala:718)
>   at akka.testkit.JavaTestKit.expectMsgEquals(JavaTestKit.java:389)
>   at 
> org.apache.flink.test.runtime.minicluster.LocalFlinkMiniClusterITCase$1$1.run(LocalFlinkMiniClusterITCase.java:78)
>   at akka.testkit.JavaTestKit$Within$1.apply(JavaTestKit.java:232)
>   at akka.testkit.TestKitBase$class.within(TestKit.scala:296)
>   at akka.testkit.TestKit.within(TestKit.scala:718)
>   at akka.testkit.TestKitBase$class.within(TestKit.scala:310)
>   at akka.testkit.TestKit.within(TestKit.scala:718)
>   at akka.testkit.JavaTestKit$Within.(JavaTestKit.java:230)
>   at 
> org.apache.flink.test.runtime.minicluster.LocalFlinkMiniClusterITCase$1$1.(LocalFlinkMiniClusterITCase.java:70)
>   at 
> org.apache.flink.test.runtime.minicluster.LocalFlinkMiniClusterITCase$1.(LocalFlinkMiniClusterITCase.java:70)
>   at 
> org.apache.flink.test.runtime.minicluster.LocalFlinkMiniClusterITCase.testLocalFlinkMiniClusterWithMultipleTaskManagers(LocalFlinkMiniClusterITCase.java:67)
> LocalFlinkMiniClusterITCase.testLocalFlinkMiniClusterWithMultipleTaskManagers:67
>  assertion failed: expected 3, found 2
> {code}
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/106868167/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3214) WindowCheckpointingITCase

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3214.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.

> WindowCheckpointingITCase
> -
>
> Key: FLINK-3214
> URL: https://issues.apache.org/jira/browse/FLINK-3214
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: test-stability
>
> No Output for 300 seconds. Build got canceled.
> https://travis-ci.org/apache/flink/jobs/101407292



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3238) EventTimeAllWindowCheckpointingITCase.testSlidingTimeWindow()

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3238.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.

> EventTimeAllWindowCheckpointingITCase.testSlidingTimeWindow()
> -
>
> Key: FLINK-3238
> URL: https://issues.apache.org/jira/browse/FLINK-3238
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Reporter: Matthias J. Sax
>Assignee: Aljoscha Krettek
>Priority: Critical
>  Labels: test-stability
>
> "Maven produced no output for 300 seconds."
> https://travis-ci.org/mjsax/flink/jobs/102475719



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3344) EventTimeWindowCheckpointingITCase.testPreAggregatedTumblingTimeWindow

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3344.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.

> EventTimeWindowCheckpointingITCase.testPreAggregatedTumblingTimeWindow
> --
>
> Key: FLINK-3344
> URL: https://issues.apache.org/jira/browse/FLINK-3344
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Reporter: Matthias J. Sax
>Assignee: Aljoscha Krettek
>Priority: Critical
>  Labels: test-stability
>
> https://travis-ci.org/apache/flink/jobs/107198388
> https://travis-ci.org/mjsax/flink/jobs/107198383
> {noformat}
> Maven produced no output for 300 seconds.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3356) JobClientActorRecoveryITCase.testJobClientRecovery

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3356.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.

> JobClientActorRecoveryITCase.testJobClientRecovery
> --
>
> Key: FLINK-3356
> URL: https://issues.apache.org/jira/browse/FLINK-3356
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: test-stability
>
> https://travis-ci.org/apache/flink/jobs/107597706
> https://travis-ci.org/mjsax/flink/jobs/107597700
> {noformat}
> Tests in error: 
>   JobClientActorRecoveryITCase.testJobClientRecovery:135 » Timeout Futures 
> timed...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3199) KafkaITCase.testOneToOneSources

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3199.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report (Kafka 
code changed quite a bit in the mean time). I would like to close this and 
re-open if it occurs again.

> KafkaITCase.testOneToOneSources
> ---
>
> Key: FLINK-3199
> URL: https://issues.apache.org/jira/browse/FLINK-3199
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: test-stability
>
> https://travis-ci.org/mjsax/flink/jobs/100167558
> {noformat}
> Failed tests: 
> KafkaITCase.testOneToOneSources:96->KafkaConsumerTestBase.runOneToOneExactlyOnceTest:521->KafkaTestBase.tryExecute:318
>  Test failed: The program execution failed: Job execution failed.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3142) CheckpointCoordinatorTest fails

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3142.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.

> CheckpointCoordinatorTest fails
> ---
>
> Key: FLINK-3142
> URL: https://issues.apache.org/jira/browse/FLINK-3142
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: test-stability
>
> https://travis-ci.org/mjsax/flink/jobs/95439203
> {noformat}
> Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 63.713 sec 
> <<< FAILURE! - in 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinatorTest
> testMaxConcurrentAttempsWithSubsumption(org.apache.flink.runtime.checkpoint.CheckpointCoordinatorTest)
>   Time elapsed: 60.145 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertNotNull(Assert.java:621)
>   at org.junit.Assert.assertNotNull(Assert.java:631)
>   at 
> org.apache.flink.runtime.checkpoint.CheckpointCoordinatorTest.testMaxConcurrentAttempsWithSubsumption(CheckpointCoordinatorTest.java:946)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3053) SuccessAfterNetworkBuffersFailureITCase.testSuccessfulProgramAfterFailure

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3053.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.

> SuccessAfterNetworkBuffersFailureITCase.testSuccessfulProgramAfterFailure
> -
>
> Key: FLINK-3053
> URL: https://issues.apache.org/jira/browse/FLINK-3053
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: test-stability
>
> https://travis-ci.org/apache/flink/jobs/92091437
> {noformat}
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 16.09 sec <<< 
> FAILURE! - in 
> org.apache.flink.test.misc.SuccessAfterNetworkBuffersFailureITCase
> testSuccessfulProgramAfterFailure(org.apache.flink.test.misc.SuccessAfterNetworkBuffersFailureITCase)
>   Time elapsed: 16.08 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.flink.test.misc.SuccessAfterNetworkBuffersFailureITCase.testSuccessfulProgramAfterFailure(SuccessAfterNetworkBuffersFailureITCase.java:72)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3095) Test ZooKeeperLeaderElectionTest.testZooKeeperReelectionWithReplacement unstable

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3095.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.


> Test ZooKeeperLeaderElectionTest.testZooKeeperReelectionWithReplacement 
> unstable
> 
>
> Key: FLINK-3095
> URL: https://issues.apache.org/jira/browse/FLINK-3095
> Project: Flink
>  Issue Type: Bug
>  Components: JobManager
>Affects Versions: 1.0.0
>Reporter: Robert Metzger
>  Labels: test-stability
>
> The test recently failed for me: 
> https://travis-ci.org/apache/flink/jobs/94155701



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-2911) AccumulatingAlignedProcessingTimeWindowOperatorTest.checkpointRestoreWithPendingWindowTumbling

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-2911.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.

> AccumulatingAlignedProcessingTimeWindowOperatorTest.checkpointRestoreWithPendingWindowTumbling
> --
>
> Key: FLINK-2911
> URL: https://issues.apache.org/jira/browse/FLINK-2911
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: test-stability
>
> https://travis-ci.org/mjsax/flink/jobs/87181219
> {noformat}
> Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.277 sec 
> <<< FAILURE! - in 
> org.apache.flink.streaming.runtime.operators.windowing.AccumulatingAlignedProcessingTimeWindowOperatorTest
> checkpointRestoreWithPendingWindowTumbling(org.apache.flink.streaming.runtime.operators.windowing.AccumulatingAlignedProcessingTimeWindowOperatorTest)
>   Time elapsed: 1.511 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1000> but was:<1001>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.flink.streaming.runtime.operators.windowing.AccumulatingAlignedProcessingTimeWindowOperatorTest.checkpointRestoreWithPendingWindowTumbling(AccumulatingAlignedProcessingTimeWindowOperatorTest.java:603)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-2965) KafkaITCase failure

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-2965.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report (Kafka 
code changed quite a bit). I would like to close this and re-open if it occurs 
again.

> KafkaITCase failure
> ---
>
> Key: FLINK-2965
> URL: https://issues.apache.org/jira/browse/FLINK-2965
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming, Streaming Connectors
>Affects Versions: 0.10.0
>Reporter: Gyula Fora
>  Labels: test-stability
>
> I found an interesting failure in the KafkaITCase, I am not sure if this 
> happened before.
> It received a duplicate record and failed on that (not the usual zookeeper 
> timeout thing)
> Logs are here: 
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/89171477/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-2771) IterateTest.testSimpleIteration fails on Travis

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-2771.
--
Resolution: Invalid

Test has been removed.

> IterateTest.testSimpleIteration fails on Travis
> ---
>
> Key: FLINK-2771
> URL: https://issues.apache.org/jira/browse/FLINK-2771
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 0.10.0
>Reporter: Till Rohrmann
>Assignee: Rekha Joshi
>Priority: Critical
>  Labels: test-stability
>
> The {{IterateTest.testSimpleIteration}} failed on Travis with
> {code}
> Failed tests: 
>   IterateTest.testSimpleIteration:384 null
> {code}
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/81986242/log.txt 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-2719) ProcessFailureStreamingRecoveryITCase>AbstractProcessFailureRecoveryTest.testTaskManagerProcessFailure failed on Travis

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-2719.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.

> ProcessFailureStreamingRecoveryITCase>AbstractProcessFailureRecoveryTest.testTaskManagerProcessFailure
>  failed on Travis
> ---
>
> Key: FLINK-2719
> URL: https://issues.apache.org/jira/browse/FLINK-2719
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination, Local Runtime
>Affects Versions: 0.10.0
>Reporter: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
>
> The test case 
> {{ProcessFailureStreamingRecoveryITCase>AbstractProcessFailureRecoveryTest.testTaskManagerProcessFailure}}
>  failed on travis with the following exception
> {code}
> Failed tests: 
>   
> ProcessFailureStreamingRecoveryITCase>AbstractProcessFailureRecoveryTest.testTaskManagerProcessFailure:211
>  The program encountered a FileNotFoundException : File does not exist: 
> /tmp/cbe4a9aa-3b9a-455d-b7b4-a9abf7c2d9d5/03801d139e79e850249e386ffd89c13ca727bcd8
> {code}
> Most likely, this is a problem of the Travis infrastructure that we could not 
> create the temp file. Maybe we should harden this.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/81028955/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-2718) LeaserChangeCleanupTest sometimes fails

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-2718.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.

> LeaserChangeCleanupTest sometimes fails
> ---
>
> Key: FLINK-2718
> URL: https://issues.apache.org/jira/browse/FLINK-2718
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.10.0
>Reporter: Márton Balassi
>Priority: Critical
>  Labels: test-stability
>
> The test failed with a futures timeout on travis:
> LeaderChangeStateCleanupTest.testReelectionOfSameJobManager:245 » Timeout 
> Futu...
> Complete log:
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/81294357/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-2618) ExternalSortITCase failure

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-2618.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.

> ExternalSortITCase failure
> --
>
> Key: FLINK-2618
> URL: https://issues.apache.org/jira/browse/FLINK-2618
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Reporter: Sachin Goel
>  Labels: test-stability
>
> {{ExternalSortITCase.testSpillingSortWithIntermediateMerge}} fails with the 
> exception:
> {code}
> org.apache.flink.types.NullKeyFieldException: Field 0 is null, but expected 
> to hold a key.
>   at 
> org.apache.flink.api.common.typeutils.record.RecordComparator.setReference(RecordComparator.java:212)
>   at 
> org.apache.flink.api.common.typeutils.record.RecordComparator.setReference(RecordComparator.java:40)
>   at 
> org.apache.flink.runtime.operators.sort.MergeIterator$HeadStream.nextHead(MergeIterator.java:127)
>   at 
> org.apache.flink.runtime.operators.sort.MergeIterator.next(MergeIterator.java:88)
>   at 
> org.apache.flink.runtime.operators.sort.MergeIterator.next(MergeIterator.java:69)
>   at 
> org.apache.flink.runtime.operators.sort.ExternalSortITCase.testSpillingSortWithIntermediateMerge(ExternalSortITCase.java:301)
> {code}
> Here is the build log: 
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/78579607/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-2504) ExternalSortLargeRecordsITCase.testSortWithLongAndShortRecordsMixed failed spuriously

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-2504.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.


> ExternalSortLargeRecordsITCase.testSortWithLongAndShortRecordsMixed failed 
> spuriously
> -
>
> Key: FLINK-2504
> URL: https://issues.apache.org/jira/browse/FLINK-2504
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 0.10.0
>Reporter: Till Rohrmann
>Priority: Critical
>  Labels: test-stability
>
> The test 
> {{ExternalSortLargeRecordsITCase.testSortWithLongAndShortRecordsMixed}} 
> failed in one of my Travis builds: 
> https://travis-ci.org/tillrohrmann/flink/jobs/74881883



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-2612) ZooKeeperLeaderElectionITCase failure

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-2612.
--
Resolution: Cannot Reproduce

Did not occur again and a lot of time passed since the original report. I would 
like to close this and re-open if it occurs again.

> ZooKeeperLeaderElectionITCase failure
> -
>
> Key: FLINK-2612
> URL: https://issues.apache.org/jira/browse/FLINK-2612
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Coordination
>Reporter: Sachin Goel
>  Labels: test-stability
>
> {{testTaskManagerRegistrationAtReelectedLeader}} fails with the following 
> exception:
> {code}
> java.lang.IllegalArgumentException: Multiple entries with same key: 
> InstanceSpec{dataDirectory=/tmp/1441269883020-4, port=46897, 
> electionPort=58417, quorumPort=56164, deleteDataDirectoryOnClose=true, 
> serverId=20, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@b731=[InstanceSpec{dataDirectory=/tmp/1441269883019-0,
>  port=60852, electionPort=49197, quorumPort=59952, 
> deleteDataDirectoryOnClose=true, serverId=11, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@edb4, 
> InstanceSpec{dataDirectory=/tmp/1441269883019-1, port=47963, 
> electionPort=45591, quorumPort=54337, deleteDataDirectoryOnClose=true, 
> serverId=12, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@bb5b, 
> InstanceSpec{dataDirectory=/tmp/1441269883019-2, port=40075, 
> electionPort=44064, quorumPort=43944, deleteDataDirectoryOnClose=true, 
> serverId=13, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@9c8b, 
> InstanceSpec{dataDirectory=/tmp/1441269883019-3, port=33070, 
> electionPort=60533, quorumPort=36048, deleteDataDirectoryOnClose=true, 
> serverId=14, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@812e, 
> InstanceSpec{dataDirectory=/tmp/1441269883019-4, port=46897, 
> electionPort=44653, quorumPort=49755, deleteDataDirectoryOnClose=true, 
> serverId=15, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@b731, 
> InstanceSpec{dataDirectory=/tmp/1441269883020-0, port=50473, 
> electionPort=45215, quorumPort=54810, deleteDataDirectoryOnClose=true, 
> serverId=16, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@c529, 
> InstanceSpec{dataDirectory=/tmp/1441269883020-1, port=60594, 
> electionPort=60502, quorumPort=48875, deleteDataDirectoryOnClose=true, 
> serverId=17, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@ecb2, 
> InstanceSpec{dataDirectory=/tmp/1441269883020-2, port=38484, 
> electionPort=47168, quorumPort=40916, deleteDataDirectoryOnClose=true, 
> serverId=18, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@9654, 
> InstanceSpec{dataDirectory=/tmp/1441269883020-3, port=48396, 
> electionPort=48709, quorumPort=46917, deleteDataDirectoryOnClose=true, 
> serverId=19, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@bd0c, 
> InstanceSpec{dataDirectory=/tmp/1441269883020-4, port=46897, 
> electionPort=58417, quorumPort=56164, deleteDataDirectoryOnClose=true, 
> serverId=20, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@b731] and 
> InstanceSpec{dataDirectory=/tmp/1441269883019-4, port=46897, 
> electionPort=44653, quorumPort=49755, deleteDataDirectoryOnClose=true, 
> serverId=15, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@b731=[InstanceSpec{dataDirectory=/tmp/1441269883019-0,
>  port=60852, electionPort=49197, quorumPort=59952, 
> deleteDataDirectoryOnClose=true, serverId=11, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@edb4, 
> InstanceSpec{dataDirectory=/tmp/1441269883019-1, port=47963, 
> electionPort=45591, quorumPort=54337, deleteDataDirectoryOnClose=true, 
> serverId=12, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@bb5b, 
> InstanceSpec{dataDirectory=/tmp/1441269883019-2, port=40075, 
> electionPort=44064, quorumPort=43944, deleteDataDirectoryOnClose=true, 
> serverId=13, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@9c8b, 
> InstanceSpec{dataDirectory=/tmp/1441269883019-3, port=33070, 
> electionPort=60533, quorumPort=36048, deleteDataDirectoryOnClose=true, 
> serverId=14, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@812e, 
> InstanceSpec{dataDirectory=/tmp/1441269883019-4, port=46897, 
> electionPort=44653, quorumPort=49755, deleteDataDirectoryOnClose=true, 
> serverId=15, tickTime=-1, maxClientCnxns=-1} 
> org.apache.curator.test.InstanceSpec@b731, 
> InstanceSpec{dataDirectory=/tmp/1441269883020-0, port=50473, 
> electionPort=45215, quorumPort=54810, deleteDataDirectoryOnClose=true, 
> serverId=16, 

[jira] [Closed] (FLINK-2803) Add test case for Flink's memory allocation

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-2803.
--
Resolution: Unresolved

> Add test case for Flink's memory allocation
> ---
>
> Key: FLINK-2803
> URL: https://issues.apache.org/jira/browse/FLINK-2803
> Project: Flink
>  Issue Type: Test
>  Components: Distributed Coordination, Startup Shell Scripts, YARN
>Affects Versions: 0.9, 0.10.0
>Reporter: Maximilian Michels
>Priority: Minor
>
> We need a test case which checks the correct memory settings for heap and 
> off-heap memory allocation.
> Memory is calculated in
> 1. The startup scripts ({{taskmanager.sh}})
> 2. The ({{TaskManager}})
> 3. The YARN {{ApplicationMasterActor}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-5328) ConnectedComponentsITCase testJobWithoutObjectReuse fails

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5328.
--
Resolution: Fixed

Didn't occur in a while and probably fixed as part of the recent FS 
refactorings.

> ConnectedComponentsITCase testJobWithoutObjectReuse fails
> -
>
> Key: FLINK-5328
> URL: https://issues.apache.org/jira/browse/FLINK-5328
> Project: Flink
>  Issue Type: Test
>  Components: Local Runtime
>Reporter: Ufuk Celebi
>  Labels: test-stability
> Attachments: connectedComponentsFailure.txt
>
>
> I've seen this fail a couple of times now: 
> ConnectedComponentsITCase#testJobWithoutObjectReuse.
> {code}
> Job execution failed.
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:900)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:843)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.io.IOException: Stream Closed
>   at java.io.FileInputStream.readBytes(Native Method)
>   at java.io.FileInputStream.read(FileInputStream.java:272)
>   at 
> org.apache.flink.core.fs.local.LocalDataInputStream.read(LocalDataInputStream.java:72)
>   at 
> org.apache.flink.core.fs.FSDataInputStreamWrapper.read(FSDataInputStreamWrapper.java:59)
>   at 
> org.apache.flink.api.common.io.DelimitedInputFormat.fillBuffer(DelimitedInputFormat.java:662)
>   at 
> org.apache.flink.api.common.io.DelimitedInputFormat.readLine(DelimitedInputFormat.java:556)
>   at 
> org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:522)
>   at 
> org.apache.flink.api.java.io.CsvInputFormat.nextRecord(CsvInputFormat.java:78)
>   at 
> org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:166)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:654)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Complete logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-2910) Combine tests for binary graph operators

2017-02-28 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888269#comment-15888269
 ] 

Ufuk Celebi commented on FLINK-2910:


Is this issue still relevant or can it be closed?

> Combine tests for binary graph operators
> 
>
> Key: FLINK-2910
> URL: https://issues.apache.org/jira/browse/FLINK-2910
> Project: Flink
>  Issue Type: Test
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
>Priority: Minor
>
> Atm, testing a binary operator (i.e. union and difference) is done in two 
> similar tests: one is testing the expected vertex set and one the expected 
> edge set. This can be combined in one test per operator using 
> {{LocalCollectionOutputFormat<>}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-4655) Add tests for validation of Expressions

2017-02-28 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888260#comment-15888260
 ] 

Ufuk Celebi commented on FLINK-4655:


[~twalthr], [~fhueske] is this issue still relevant?

> Add tests for validation of Expressions
> ---
>
> Key: FLINK-4655
> URL: https://issues.apache.org/jira/browse/FLINK-4655
> Project: Flink
>  Issue Type: Test
>  Components: Table API & SQL
>Reporter: Timo Walther
>
> Currently, it is only tested if Table API expressions work if the input is 
> correct. The validation method of expressions is not tested. The 
> {{ExpressionTestBase}} should be extended to provide means to also test 
> invalid expressions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-1702) Authenticate via Kerberos from the client only

2017-02-28 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888256#comment-15888256
 ] 

Ufuk Celebi commented on FLINK-1702:


[~tzulitai] Can we close this issue?

> Authenticate via Kerberos from the client only
> --
>
> Key: FLINK-1702
> URL: https://issues.apache.org/jira/browse/FLINK-1702
> Project: Flink
>  Issue Type: Improvement
>  Components: Security
>Reporter: Maximilian Michels
>Priority: Minor
> Fix For: 1.0.0
>
>
> FLINK-1504 implemented support for Kerberos authentication for HDFS in Flink. 
> Currently, the authentication has to be performed on every node when the job 
> or task manager comes up. That implies that all nodes are already 
> authenticated using Kerberos.
> For the Hadoop security mechanism it would be sufficient if the Client 
> authenticated once using Kerberos and received the Hadoop DelegationToken. 
> This Token could then be passed to all nodes. It would be renewed using the 
> Hadoop security mechanisms.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-2172) Stabilize SocketOutputFormatTest

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-2172.
--
Resolution: Won't Fix

SocketOutputFormatTest has been removed.

> Stabilize SocketOutputFormatTest
> 
>
> Key: FLINK-2172
> URL: https://issues.apache.org/jira/browse/FLINK-2172
> Project: Flink
>  Issue Type: Test
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Márton Balassi
>
> As for a resolution of FLINK-2139 I am adding tests for the core streaming 
> outputformats. Added a skeleton for the socket output too, but found that it 
> was unstable and disabled it for now for that reason. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-5587) AsyncWaitOperatorTest timed out on Travis

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5587.
--
   Resolution: Fixed
Fix Version/s: 1.2.0
   1.3.0

> AsyncWaitOperatorTest timed out on Travis
> -
>
> Key: FLINK-5587
> URL: https://issues.apache.org/jira/browse/FLINK-5587
> Project: Flink
>  Issue Type: Test
>  Components: DataStream API, Local Runtime
>Reporter: Ufuk Celebi
>  Labels: test-stability
> Fix For: 1.3.0, 1.2.0
>
>
> The Maven watch dog script cancelled the build and printed a stack trace for 
> {{AsyncWaitOperatorTest.testOperatorChainWithProcessingTime(AsyncWaitOperatorTest.java:379)}}.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/192441719/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3903) Homebrew Installation

2017-02-28 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3903.
--
   Resolution: Fixed
Fix Version/s: 1.2.1
   1.3.0

Fixed in 90173b7 (release-1.2), 87b9077 (master).

> Homebrew Installation
> -
>
> Key: FLINK-3903
> URL: https://issues.apache.org/jira/browse/FLINK-3903
> Project: Flink
>  Issue Type: Task
>  Components: Build System, Documentation
>Reporter: Eron Wright 
>Priority: Minor
>  Labels: starter
> Fix For: 1.3.0, 1.2.1
>
>
> Recently I submitted a formula for apache-flink to the 
> [homebrew|http://brew.sh/] project.   Homebrew simplifies installation on Mac:
> {code}
> $ brew install apache-flink
> ...
> $ flink --version
> Version: 1.0.2, Commit ID: d39af15
> {code}
> Updates to the formula are adhoc at the moment.  I opened this issue to 
> formalize updating homebrew into the release process.  I drafted a procedure 
> doc here:
> [https://gist.github.com/EronWright/b62bd3b192a15be4c200a2542f7c9376]
>  
> Please also consider updating the website documentation to suggest homebrew 
> as an alternate installation method for Mac users.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5928) Externalized checkpoints overwritting each other

2017-02-27 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5928:
--

 Summary: Externalized checkpoints overwritting each other
 Key: FLINK-5928
 URL: https://issues.apache.org/jira/browse/FLINK-5928
 Project: Flink
  Issue Type: Bug
  Components: State Backends, Checkpointing
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Priority: Critical


I noticed that PR #3346 accidentally broke externalized checkpoints by using a 
fixed meta data file name. We should restore the old behaviour with creating 
random files and double check why no test caught this.

This will likely superseded by upcoming changes from [~StephanEwen] to use 
metadata streams on the JobManager.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5926) Show state backend information in web UI

2017-02-27 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5926:
--

 Summary: Show state backend information in web UI
 Key: FLINK-5926
 URL: https://issues.apache.org/jira/browse/FLINK-5926
 Project: Flink
  Issue Type: Improvement
  Components: Webfrontend
Reporter: Ufuk Celebi
Priority: Minor


With #3411 and follow ups, the state backends will be available at the job 
manager via the snapshot settings. We should extend the 
checkpoints/configuration tab in the web UI to also show the configured state 
backend.

https://github.com/apache/flink/pull/3411




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5923) Test instability in SavepointITCase testTriggerSavepointAndResume

2017-02-27 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5923:
--

 Summary: Test instability in SavepointITCase 
testTriggerSavepointAndResume
 Key: FLINK-5923
 URL: https://issues.apache.org/jira/browse/FLINK-5923
 Project: Flink
  Issue Type: Test
  Components: Tests
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi


https://s3.amazonaws.com/archive.travis-ci.org/jobs/205042538/log.txt

{code}
Failed tests: 
  SavepointITCase.testTriggerSavepointAndResume:258 Checkpoints directory not 
cleaned up: 
[/tmp/junit1029044621247843839/junit7338507921051602138/checkpoints/47fa12635d098bdafd52def453e6d66c/chk-4]
 expected:<0> but was:<1>
{code}

I think this is due to a race in the test. When shutting down the cluster it 
can happen that in progress checkpoints linger around.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5778) Split FileStateHandle into fileName and basePath

2017-02-26 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884816#comment-15884816
 ] 

Ufuk Celebi commented on FLINK-5778:


I think your suggestion makes a lot of sense. I will do it as you propose and 
handle this at the savepoint serializer.

> Split FileStateHandle into fileName and basePath
> 
>
> Key: FLINK-5778
> URL: https://issues.apache.org/jira/browse/FLINK-5778
> Project: Flink
>  Issue Type: Sub-task
>  Components: State Backends, Checkpointing
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>
> Store the statePath as a basePath and a fileName and allow to overwrite the 
> basePath. We cannot overwrite the base path as long as the state handle is 
> still in flight and not persisted. Otherwise we risk a resource leak.
> We need this in order to be able to relocate savepoints.
> {code}
> interface RelativeBaseLocationStreamStateHandle {
>void clearBaseLocation();
>void setBaseLocation(String baseLocation);
> }
> {code}
> FileStateHandle should implement this and the SavepointSerializer should 
> forward the calls when a savepoint is stored or loaded, clear before store 
> and set after load.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-5894) HA docs are misleading re: state backends

2017-02-23 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5894.
--
   Resolution: Fixed
Fix Version/s: 1.2.1
   1.3.0

db3c5f3 (release-1.2), 234b905 (master).

> HA docs are misleading re: state backends
> -
>
> Key: FLINK-5894
> URL: https://issues.apache.org/jira/browse/FLINK-5894
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: David Anderson
> Fix For: 1.3.0, 1.2.1
>
>
> Towards the end of 
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/jobmanager_high_availability.html#configuration
>  it says "Currently, only the file system state backend is supported in HA 
> mode." 
> The state handles are written to the FileSystem and a reference to them is 
> kept in ZooKeeper. So it's actually independent of the backend being used.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-5876) Mention Scala type fallacies for queryable state client serializers

2017-02-22 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5876.
--
   Resolution: Fixed
Fix Version/s: 1.2.1
   1.3.0

> Mention Scala type fallacies for queryable state client serializers
> ---
>
> Key: FLINK-5876
> URL: https://issues.apache.org/jira/browse/FLINK-5876
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Ufuk Celebi
> Fix For: 1.3.0, 1.2.1
>
>
> FLINK-5801 shows a very hard to debug issue when querying state from the 
> Scala API that can result in serializer mismatches between the client and the 
> job.
> We should mention that the type serializers should be created via the Scala 
> macros.
> {code}
> import org.apache.flink.streaming.api.scala._
> ...
> val keySerializer = createTypeInformation[Long].createSerializer(new 
> ExecutionConfig)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5876) Mention Scala type fallacies for queryable state client serializers

2017-02-22 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878123#comment-15878123
 ] 

Ufuk Celebi commented on FLINK-5876:


Fixed in 15bb424 (release-1.2), 8e1775a (master).

> Mention Scala type fallacies for queryable state client serializers
> ---
>
> Key: FLINK-5876
> URL: https://issues.apache.org/jira/browse/FLINK-5876
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Ufuk Celebi
> Fix For: 1.3.0, 1.2.1
>
>
> FLINK-5801 shows a very hard to debug issue when querying state from the 
> Scala API that can result in serializer mismatches between the client and the 
> job.
> We should mention that the type serializers should be created via the Scala 
> macros.
> {code}
> import org.apache.flink.streaming.api.scala._
> ...
> val keySerializer = createTypeInformation[Long].createSerializer(new 
> ExecutionConfig)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5801) Queryable State from Scala job/client fails with key of type Long

2017-02-21 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876416#comment-15876416
 ] 

Ufuk Celebi commented on FLINK-5801:


The root cause of this a mismatch in the key serializers. The client Scala code 
creates a generic KryoSerializer, because the scala.Long type cannot be 
captured at runtime.

A work around is to use a macro when creating the serializers:

{code}
import org.apache.flink.streaming.api.scala._

...
val keySerializer = createTypeInformation[Long].createSerializer(new 
ExecutionConfig)
{code}

I created FLINK-5876 for adding a note to the docs. Sorry for the inconvenience 
this has caused.

> Queryable State from Scala job/client fails with key of type Long
> -
>
> Key: FLINK-5801
> URL: https://issues.apache.org/jira/browse/FLINK-5801
> Project: Flink
>  Issue Type: Bug
>  Components: Queryable State
>Affects Versions: 1.2.0
> Environment: Flink 1.2.0
> Scala 2.10
>Reporter: Patrick Lucas
>Assignee: Ufuk Celebi
> Attachments: OrderFulfillment.scala, OrderFulfillmentStateQuery.scala
>
>
> While working on a demo Flink job, to try out Queryable State, I exposed some 
> state of type Long -> custom class via the Query server. However, the query 
> server returned an exception when I tried to send a query:
> {noformat}
> Exception in thread "main" java.lang.RuntimeException: Failed to query state 
> backend for query 0. Caused by: java.io.IOException: Unable to deserialize 
> key and namespace. This indicates a mismatch in the key/namespace serializers 
> used by the KvState instance and this access.
>   at 
> org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer.deserializeKeyAndNamespace(KvStateRequestSerializer.java:392)
>   at 
> org.apache.flink.runtime.state.heap.AbstractHeapState.getSerializedValue(AbstractHeapState.java:130)
>   at 
> org.apache.flink.runtime.query.netty.KvStateServerHandler$AsyncKvStateQueryTask.run(KvStateServerHandler.java:220)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.EOFException
>   at 
> org.apache.flink.runtime.util.DataInputDeserializer.readLong(DataInputDeserializer.java:217)
>   at 
> org.apache.flink.api.common.typeutils.base.LongSerializer.deserialize(LongSerializer.java:69)
>   at 
> org.apache.flink.api.common.typeutils.base.LongSerializer.deserialize(LongSerializer.java:27)
>   at 
> org.apache.flink.runtime.query.netty.message.KvStateRequestSerializer.deserializeKeyAndNamespace(KvStateRequestSerializer.java:379)
>   ... 7 more
>   at 
> org.apache.flink.runtime.query.netty.KvStateServerHandler$AsyncKvStateQueryTask.run(KvStateServerHandler.java:257)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I banged my head against this for a while, then per [~jgrier]'s suggestion I 
> tried simply changing the key from Long to String (modifying the two 
> {{keyBy}} calls and the {{keySerializer}} {{TypeHint}} in the attached code) 
> and it started working perfectly.
> cc [~uce]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5876) Mention Scala type fallacies for queryable state client serializers

2017-02-21 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5876:
--

 Summary: Mention Scala type fallacies for queryable state client 
serializers
 Key: FLINK-5876
 URL: https://issues.apache.org/jira/browse/FLINK-5876
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Ufuk Celebi


FLINK-5801 shows a very hard to debug issue when querying state from the Scala 
API that can result in serializer mismatches between the client and the job.

We should mention that the type serializers should be created via the Scala 
macros.

{code}
import org.apache.flink.streaming.api.scala._
...
val keySerializer = createTypeInformation[Long].createSerializer(new 
ExecutionConfig)
{code}




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-4803) Job Cancel can hang forever waiting for OutputFormat.close()

2017-02-21 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-4803.
--
Resolution: Duplicate

> Job Cancel can hang forever waiting for OutputFormat.close()
> 
>
> Key: FLINK-4803
> URL: https://issues.apache.org/jira/browse/FLINK-4803
> Project: Flink
>  Issue Type: Bug
>  Components: Batch Connectors and Input/Output Formats
>Affects Versions: 1.1.1
>Reporter: Shannon Carey
>
> If the Flink job uses a badly-behaved OutputFormat (in this example, a 
> HadoopOutputFormat containing a CqlBulkOutputFormat), where the close() 
> method blocks forever, it is impossible to cancel the Flink job even though 
> the blocked thread would respond to an interrupt. The stack traces below show 
> the state of the important threads when a job is canceled and the 
> OutputFormat is blocking forever inside of close().
> I suggest that `DataSinkTask.cancel()` method be updated to add a timeout on 
> `this.format.close()`. When the timeout is reached, the Task thread should be 
> interrupted.
> {code}
> "Canceler for DataSink 
> (org.apache.flink.api.scala.hadoop.mapred.HadoopOutputFormat@13bf17a2) (2/5)" 
> #6422 daemon prio=5 os_prio=0 tid=0x7fb7e42f nid=0x34f3 waiting for 
> monitor entry [0x7fb7be079000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.flink.api.java.hadoop.mapred.HadoopOutputFormatBase.close(HadoopOutputFormatBase.java:158)
> - waiting to lock <0x0006bae5f788> (a java.lang.Object)
> at 
> org.apache.flink.runtime.operators.DataSinkTask.cancel(DataSinkTask.java:268)
> at 
> org.apache.flink.runtime.taskmanager.Task$TaskCanceler.run(Task.java:1149)
> at java.lang.Thread.run(Thread.java:745)
> "DataSink 
> (org.apache.flink.api.scala.hadoop.mapred.HadoopOutputFormat@13bf17a2) (2/5)" 
> #6410 daemon prio=5 os_prio=0 tid=0x7fb7e79a4800 nid=0x2ad8 waiting on 
> condition [0x7fb7bdf78000]
>java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x0006c5ab5e20> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
> at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
> at 
> java.util.concurrent.SynchronousQueue.offer(SynchronousQueue.java:895)
> at 
> org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter.put(SSTableSimpleUnsortedWriter.java:194)
> at 
> org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter.sync(SSTableSimpleUnsortedWriter.java:180)
> at 
> org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter.close(SSTableSimpleUnsortedWriter.java:156)
> at 
> org.apache.cassandra.io.sstable.CQLSSTableWriter.close(CQLSSTableWriter.java:275)
> at 
> org.apache.cassandra.hadoop.AbstractBulkRecordWriter.close(AbstractBulkRecordWriter.java:133)
> at 
> org.apache.cassandra.hadoop.AbstractBulkRecordWriter.close(AbstractBulkRecordWriter.java:126)
> at 
> org.apache.flink.api.java.hadoop.mapred.HadoopOutputFormatBase.close(HadoopOutputFormatBase.java:158)
> - locked <0x0006bae5f788> (a java.lang.Object)
> at 
> org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:234)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-3457) Link to Apache Flink meetups from the 'Community' section of the website

2017-02-20 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3457.
--
Resolution: Fixed

Fixed in dafb7f8 (asf-site).

> Link to Apache Flink meetups from the 'Community' section of the website
> 
>
> Key: FLINK-3457
> URL: https://issues.apache.org/jira/browse/FLINK-3457
> Project: Flink
>  Issue Type: Task
>  Components: Project Website
>Reporter: Slim Baltagi
>Assignee: Gabor Horvath
>Priority: Trivial
>
> Now with the number of Apache Flink meetups increasing worldwide, it is 
> helpful to add a link to Apache Flink meetups 
> http://www.meetup.com/topics/apache-flink/ to the community section of 
> https://flink.apache.org/community.html so visitors can conveniently find 
> them  right from the Apache Flink website. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-5842) Wrong 'since' version for ElasticSearch 5.x connector

2017-02-20 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5842.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

0bdf3a7 (master).

> Wrong 'since' version for ElasticSearch 5.x connector
> -
>
> Key: FLINK-5842
> URL: https://issues.apache.org/jira/browse/FLINK-5842
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation, Streaming Connectors
>Affects Versions: 1.3.0
>Reporter: Dawid Wysakowicz
>Assignee: Patrick Lucas
> Fix For: 1.3.0
>
>
> The documentation claims that ElasticSearch 5.x is supported since Flink 
> 1.2.0 which is not true, as the support was merged after 1.2.0.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (FLINK-5781) Generation HTML from ConfigOption

2017-02-20 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi reassigned FLINK-5781:
--

Assignee: (was: Ufuk Celebi)

> Generation HTML from ConfigOption
> -
>
> Key: FLINK-5781
> URL: https://issues.apache.org/jira/browse/FLINK-5781
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Reporter: Ufuk Celebi
>
> Use the ConfigOption instances to generate a HTML page that we can use to 
> include in the docs configuration page.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (FLINK-5779) Auto generate configuration docs

2017-02-20 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi reassigned FLINK-5779:
--

Assignee: (was: Ufuk Celebi)

> Auto generate configuration docs
> 
>
> Key: FLINK-5779
> URL: https://issues.apache.org/jira/browse/FLINK-5779
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Ufuk Celebi
>
> As per discussion on the mailing list we need to improve on the configuration 
> documentation page 
> (http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-Organizing-Documentation-for-Configuration-Options-td15773.html).
> We decided to try to (semi) automate this in order to not get of sync in the 
> future.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (FLINK-5780) Extend ConfigOption with descriptions

2017-02-20 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi reassigned FLINK-5780:
--

Assignee: (was: Ufuk Celebi)

> Extend ConfigOption with descriptions
> -
>
> Key: FLINK-5780
> URL: https://issues.apache.org/jira/browse/FLINK-5780
> Project: Flink
>  Issue Type: Sub-task
>  Components: Core, Documentation
>Reporter: Ufuk Celebi
>
> The {{ConfigOption}} type is meant to replace the flat {{ConfigConstants}}. 
> As part of automating the generation of a docs config page we need to extend  
> {{ConfigOption}} with description fields.
> From the ML discussion, these could be:
> {code}
> void shortDescription(String);
> void longDescription(String);
> {code}
> In practice, the description string should contain HTML/Markdown.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-5837) improve readability of the queryable state docs

2017-02-18 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5837.
--
   Resolution: Fixed
Fix Version/s: 1.2.1
   1.3.0

Fixed in b21f9d1 (release-1.2), 70475b3 (master).

> improve readability of the queryable state docs
> ---
>
> Key: FLINK-5837
> URL: https://issues.apache.org/jira/browse/FLINK-5837
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: David Anderson
>Assignee: David Anderson
>Priority: Minor
> Fix For: 1.3.0, 1.2.1
>
>
> I'll propose some edits relating to grammar and language usage.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (FLINK-5825) In yarn mode, a small pic can not be loaded

2017-02-17 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-5825.
--
   Resolution: Fixed
Fix Version/s: 1.2.1
   1.3.0

Fixed in a2b0816 (master), 69d453b (release-1.2).

> In yarn mode, a small pic can not be loaded
> ---
>
> Key: FLINK-5825
> URL: https://issues.apache.org/jira/browse/FLINK-5825
> Project: Flink
>  Issue Type: Bug
>  Components: Webfrontend, YARN
>Reporter: Tao Wang
>Priority: Minor
> Fix For: 1.3.0, 1.2.1
>
>
> In yarn mode, the web frontend url is accessed from yarn in format like 
> "http://spark-91-206:8088/proxy/application_1487122678902_0015/;, and the 
> running job page's url is 
> "http://spark-91-206:8088/proxy/application_1487122678902_0015/#/jobs/9440a129ea5899c16e7c1a7e8f2897b3;.
> One .png file called "horizontal.png", which is very small can not be loaded 
> in that mode, because in "index.styl" it is cited as absolute path.
> We should make the path relative.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5824) Fix String/byte conversions without explicit encoding

2017-02-17 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871455#comment-15871455
 ] 

Ufuk Celebi commented on FLINK-5824:


[~dawidwys] I've just assigned it to you.

[~StephanEwen] [~stefanrichte...@gmail.com] Stephan mentioned that this also 
affects savepoints. Do we want to do something special there or simply fix it 
(possibly breaking savepoints)?

> Fix String/byte conversions without explicit encoding
> -
>
> Key: FLINK-5824
> URL: https://issues.apache.org/jira/browse/FLINK-5824
> Project: Flink
>  Issue Type: Bug
>  Components: Python API, Queryable State, State Backends, 
> Checkpointing, Webfrontend
>Reporter: Ufuk Celebi
>Assignee: Dawid Wysakowicz
>Priority: Blocker
>
> In a couple of places we convert Strings to bytes and bytes back to Strings 
> without explicitly specifying an encoding. This can lead to problems when 
> client and server default encodings differ.
> The task of this JIRA is to go over the whole project and look for 
> conversions where we don't specify an encoding and fix it to specify UTF-8 
> explicitly.
> For starters, we can {{grep -R 'getBytes()' .}}, which already reveals many 
> problematic places.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (FLINK-5824) Fix String/byte conversions without explicit encoding

2017-02-17 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi reassigned FLINK-5824:
--

Assignee: Dawid Wysakowicz

> Fix String/byte conversions without explicit encoding
> -
>
> Key: FLINK-5824
> URL: https://issues.apache.org/jira/browse/FLINK-5824
> Project: Flink
>  Issue Type: Bug
>  Components: Python API, Queryable State, State Backends, 
> Checkpointing, Webfrontend
>Reporter: Ufuk Celebi
>Assignee: Dawid Wysakowicz
>Priority: Blocker
>
> In a couple of places we convert Strings to bytes and bytes back to Strings 
> without explicitly specifying an encoding. This can lead to problems when 
> client and server default encodings differ.
> The task of this JIRA is to go over the whole project and look for 
> conversions where we don't specify an encoding and fix it to specify UTF-8 
> explicitly.
> For starters, we can {{grep -R 'getBytes()' .}}, which already reveals many 
> problematic places.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (FLINK-5824) Fix String/byte conversions without explicit encoding

2017-02-16 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-5824:
---
Summary: Fix String/byte conversions without explicit encoding  (was: Fix 
String/byte conversion without explicit encodings)

> Fix String/byte conversions without explicit encoding
> -
>
> Key: FLINK-5824
> URL: https://issues.apache.org/jira/browse/FLINK-5824
> Project: Flink
>  Issue Type: Bug
>  Components: Python API, Queryable State, State Backends, 
> Checkpointing, Webfrontend
>Reporter: Ufuk Celebi
>
> In a couple of places we convert Strings to bytes and bytes back to Strings 
> without explicitly specifying an encoding. This can lead to problems when 
> client and server default encodings differ.
> The task of this JIRA is to go over the whole project and look for 
> conversions where we don't specify an encoding and fix it to specify UTF-8 
> explicitly.
> For starters, we can {{grep -R 'getBytes()' .}}, which already reveals many 
> problematic places.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-5824) Fix String/byte conversion without explicit encodings

2017-02-16 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-5824:
--

 Summary: Fix String/byte conversion without explicit encodings
 Key: FLINK-5824
 URL: https://issues.apache.org/jira/browse/FLINK-5824
 Project: Flink
  Issue Type: Bug
  Components: Python API, Queryable State, State Backends, 
Checkpointing, Webfrontend
Reporter: Ufuk Celebi


In a couple of places we convert Strings to bytes and bytes back to Strings 
without explicitly specifying an encoding. This can lead to problems when 
client and server default encodings differ.

The task of this JIRA is to go over the whole project and look for conversions 
where we don't specify an encoding and fix it to specify UTF-8 explicitly.

For starters, we can {{grep -R 'getBytes()' .}}, which already reveals many 
problematic places.





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


<    1   2   3   4   5   6   7   8   9   10   >