[jira] [Commented] (FLINK-8802) Concurrent serialization without duplicating serializers in state server.

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405813#comment-16405813
 ] 

ASF GitHub Bot commented on FLINK-8802:
---

Github user kl0u commented on the issue:

https://github.com/apache/flink/pull/5691
  
That is because the cache cleanup happens on a thread other than the ones 
accessing the serializers, and ThreadLocal does not have a clear() method 
that you can call from another thread to wipe all the state held in it. 
Each thread can only clear its own state.
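
To make that concrete, here is a minimal sketch of the map-based cache (the field and types mirror the diff quoted below; the helper methods and the {{KvStateInfo#duplicate()}} usage are assumptions for illustration):
{code:java}
// Per-thread cache of duplicated (possibly stateful) serializers.
// Unlike a ThreadLocal, any thread can wipe the whole map.
private final ConcurrentMap<Thread, KvStateInfo<K, N, V>> serializerCache =
        new ConcurrentHashMap<>(4);

KvStateInfo<K, N, V> serializersForCurrentThread() {
    // Assumes KvStateInfo#duplicate() deep-copies the stateful serializers.
    return serializerCache.computeIfAbsent(
            Thread.currentThread(), thread -> stateInfo.duplicate());
}

void clearSerializerCache() {
    // Callable from the cleanup thread; a ThreadLocal has no cross-thread
    // clear() -- each thread can only remove() its own value.
    serializerCache.clear();
}
{code}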

> On Mar 20, 2018, at 3:34 AM, sihua zhou  wrote:
> 
> @sihuazhou commented on this pull request.
> 
> In 
flink-runtime/src/main/java/org/apache/flink/runtime/state/internal/InternalQueryableKvState.java
 :
> 
> > private final boolean areSerializersStateless;
>  
> - private final ConcurrentMap<Thread, KvStateInfo<K, N, V>> serializerCache;
> + private final ConcurrentMap<Thread, KvStateInfo<K, N, V>> serializerCache = new ConcurrentHashMap<>(4);
>  
> nit: just wonder why didn't use ThreadLocal<KvStateInfo<K, N, V>> provided by JDK...
> 
> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub, or mute the thread.
> 




> Concurrent serialization without duplicating serializers in state server.
> -
>
> Key: FLINK-8802
> URL: https://issues.apache.org/jira/browse/FLINK-8802
> Project: Flink
>  Issue Type: Bug
>  Components: Queryable State
>Affects Versions: 1.5.0
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>Priority: Blocker
> Fix For: 1.5.0
>
>
> The `getSerializedValue()` method may be called by multiple threads, but the 
> serializers are not duplicated, which may lead to exceptions being thrown when 
> a serializer is stateful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5691: [FLINK-8802] [QS] Fix concurrent access to non-duplicated...

2018-03-19 Thread kl0u
Github user kl0u commented on the issue:

https://github.com/apache/flink/pull/5691
  
That is because the cache cleanup happens on a thread other than the ones 
accessing the serializers, and ThreadLocal does not have a clear() method 
that you can call from another thread to wipe all the state held in it. 
Each thread can only clear its own state.

> On Mar 20, 2018, at 3:34 AM, sihua zhou  wrote:
> 
> @sihuazhou commented on this pull request.
> 
> In 
flink-runtime/src/main/java/org/apache/flink/runtime/state/internal/InternalQueryableKvState.java
 :
> 
> > private final boolean areSerializersStateless;
>  
> - private final ConcurrentMap<Thread, KvStateInfo<K, N, V>> serializerCache;
> + private final ConcurrentMap<Thread, KvStateInfo<K, N, V>> serializerCache = new ConcurrentHashMap<>(4);
>  
> nit: just wonder why didn't use ThreadLocal<KvStateInfo<K, N, V>> provided by JDK...
> 
> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub, or mute the thread.
> 




---


[jira] [Commented] (FLINK-9018) Unclosed snapshotCloseableRegistry in RocksDBKeyedStateBackend#FullSnapshotStrategy#performSnapshot

2018-03-19 Thread Sihua Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405781#comment-16405781
 ] 

Sihua Zhou commented on FLINK-9018:
---

Since this is a minor change, I covered it in 
[5716|https://github.com/apache/flink/pull/5716].

> Unclosed snapshotCloseableRegistry in 
> RocksDBKeyedStateBackend#FullSnapshotStrategy#performSnapshot
> ---
>
> Key: FLINK-9018
> URL: https://issues.apache.org/jira/browse/FLINK-9018
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
>   final CloseableRegistry snapshotCloseableRegistry = new 
> CloseableRegistry();
>   if (kvStateInformation.isEmpty()) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Asynchronous RocksDB snapshot performed on empty keyed 
> state at {}. Returning null.",
> timestamp);
> }
> return DoneFuture.of(SnapshotResult.empty());
>   }
> {code}
> If the method returns in the above if block, snapshotCloseableRegistry is not 
> closed.
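
One minimal way to avoid the leak is to construct the registry only after the early-return check; whether the actual change in [5716|https://github.com/apache/flink/pull/5716] takes exactly this shape is an assumption:

{code:java}
if (kvStateInformation.isEmpty()) {
    if (LOG.isDebugEnabled()) {
        LOG.debug("Asynchronous RocksDB snapshot performed on empty keyed "
            + "state at {}. Returning null.", timestamp);
    }
    return DoneFuture.of(SnapshotResult.empty());
}

// Only created on the path that actually registers closeables with it.
final CloseableRegistry snapshotCloseableRegistry = new CloseableRegistry();
{code}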



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (FLINK-9018) Unclosed snapshotCloseableRegistry in RocksDBKeyedStateBackend#FullSnapshotStrategy#performSnapshot

2018-03-19 Thread Sihua Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sihua Zhou reassigned FLINK-9018:
-

Assignee: Sihua Zhou

> Unclosed snapshotCloseableRegistry in 
> RocksDBKeyedStateBackend#FullSnapshotStrategy#performSnapshot
> ---
>
> Key: FLINK-9018
> URL: https://issues.apache.org/jira/browse/FLINK-9018
> Project: Flink
>  Issue Type: Bug
>  Components: State Backends, Checkpointing
>Reporter: Ted Yu
>Assignee: Sihua Zhou
>Priority: Minor
>
> {code}
>   final CloseableRegistry snapshotCloseableRegistry = new 
> CloseableRegistry();
>   if (kvStateInformation.isEmpty()) {
> if (LOG.isDebugEnabled()) {
>   LOG.debug("Asynchronous RocksDB snapshot performed on empty keyed 
> state at {}. Returning null.",
> timestamp);
> }
> return DoneFuture.of(SnapshotResult.empty());
>   }
> {code}
> If the method returns in the above if block, snapshotCloseableRegistry is not 
> closed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8620) Enable shipping custom artifacts to BlobStore and accessing them through DistributedCache

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405778#comment-16405778
 ] 

ASF GitHub Bot commented on FLINK-8620:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/5580
  
So the current state of this PR doesn't work with the Python API because it 
uploads complete directories? Is that really used/needed by the Python API?

I would like to have this feature in 1.5, if possible. This PR also adds an 
end-to-end test, which should make it very robust against future changes.


> Enable shipping custom artifacts to BlobStore and accessing them through 
> DistributedCache
> -
>
> Key: FLINK-8620
> URL: https://issues.apache.org/jira/browse/FLINK-8620
> Project: Flink
>  Issue Type: New Feature
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Major
>
> We should be able to distribute custom files to taskmanagers. To do that we 
> can store those files in BlobStore and later on access them in TaskManagers 
> through DistributedCache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5580: [FLINK-8620] Enable shipping custom files to BlobStore an...

2018-03-19 Thread aljoscha
Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/5580
  
So the current state of this PR doesn't work with the Python API because it 
uploads complete directories? Is that really used/needed by the Python API?

I would like to have this feature in 1.5, if possible. This PR also adds an 
end-to-end test, which should make it very robust against future changes.


---


[jira] [Commented] (FLINK-8886) Job isolation via scheduling in shared cluster

2018-03-19 Thread Elias Levy (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405753#comment-16405753
 ] 

Elias Levy commented on FLINK-8886:
---

Thanks.  I am aware of YARN, but we do not want to manage YARN just to run Flink.  
If we wanted to go down the route of using a generic resource manager, we'd use 
Mesos, but we don't have a need to do so.

As I alluded to, we already run multiple Flink clusters in standalone mode to 
gain isolation, but this is wasteful, as the JM is usually lightly loaded and 
you need at least two of them for high-availability.  So it would be 
useful to share JMs while jobs are isolated to TMs, which is what I am 
proposing.

As for the complexity of the proposal, I'd argue that it is relatively 
lightweight in the restrictive mode.  Just permit the TM to register a set of 
tags placed in the config file, then have the scheduler only schedule jobs on 
TMs that have an exact match for the tags. 

> Job isolation via scheduling in shared cluster
> --
>
> Key: FLINK-8886
> URL: https://issues.apache.org/jira/browse/FLINK-8886
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Coordination, Local Runtime, Scheduler
>Affects Versions: 1.5.0
>Reporter: Elias Levy
>Priority: Major
>
> Flink's TaskManager executes tasks from different jobs as threads within the 
> same JVM.  We prefer to isolate different jobs in their own JVM.  Thus, we must 
> use different TMs for different jobs.  As the scheduler currently allocates 
> task slots within a TM to tasks from different jobs, that means we 
> must stand up one cluster per job.  This is wasteful, as it requires at least 
> two JobManagers per cluster for high-availability, and the JMs have low 
> utilization.
> Additionally, different jobs may require different resources.  Some jobs are 
> compute heavy.  Some are IO heavy (lots of state in RocksDB).  At the moment 
> the scheduler treats all TMs as equivalent, except possibly in their number 
> of available task slots.  Thus, one is required to stand up multiple clusters 
> if there is a need for different types of TMs.
>  
> It would be useful if one could specify requirements on a job, such that its 
> tasks are only scheduled on a subset of TMs.  Properly configured, that would 
> permit isolation of jobs in a shared cluster and scheduling of jobs with 
> specific resource needs.
>  
> One possible implementation is to specify a set of tags in the TM config file, 
> which the TMs use when registering with the JM, and another set of tags 
> configured within the job or supplied when submitting the job.  The scheduler 
> could then match the tags in the job with the tags in the TMs.  In a 
> restrictive mode the scheduler would assign a job's task to a TM only if all 
> tags match.  In a relaxed mode the scheduler could assign a job's task to a TM 
> if there is a partial match, while giving preference to a more accurate match.
>  
>  
>  
>  
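
To illustrate the two proposed modes, here is a hypothetical sketch of the matching logic (none of these names exist in Flink; they are purely illustrative):

{code:java}
import java.util.Set;

final class TagMatcher {

    /** Restrictive mode: assign a job task to a TM only if the TM carries every job tag. */
    static boolean restrictiveMatch(Set<String> jobTags, Set<String> tmTags) {
        return tmTags.containsAll(jobTags);
    }

    /** Relaxed mode: rank TMs by tag overlap and prefer the highest score. */
    static long relaxedScore(Set<String> jobTags, Set<String> tmTags) {
        return jobTags.stream().filter(tmTags::contains).count();
    }
}
{code}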



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8415) Unprotected access to recordsToSend in LongRecordWriterThread#shutdown()

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405750#comment-16405750
 ] 

ASF GitHub Bot commented on FLINK-8415:
---

GitHub user yanghua opened a pull request:

https://github.com/apache/flink/pull/5724

[FLINK-8415] Unprotected access to recordsToSend in 
LongRecordWriterThread#shutdown()

## What is the purpose of the change

*This pull request marks a method with the `synchronized` keyword.*


## Brief change log

  - *marked method `LongRecordWriterThread#shutdown` as `synchronized`*


## Verifying this change

This change is already covered by existing tests, such as 
*(StreamNetworkThroughputBenchmark#tearDown)*.


## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (yes / **no**)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
  - The serializers: (yes / **no** / don't know)
  - The runtime per-record code paths (performance sensitive): (yes / 
**no** / don't know)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
  - The S3 file system connector: (yes / **no** / don't know)

## Documentation

  - Does this pull request introduce a new feature? (yes / **no**)
  - If yes, how is the feature documented? (not applicable / docs / 
JavaDocs / **not documented**)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanghua/flink FLINK-8415

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5724.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5724


commit 52979c23ccdc3f868b0cd9af61292f36f67151fa
Author: yanghua 
Date:   2018-03-20T03:21:42Z

[FLINK-8415] Unprotected access to recordsToSend in 
LongRecordWriterThread#shutdown()




> Unprotected access to recordsToSend in LongRecordWriterThread#shutdown()
> 
>
> Key: FLINK-8415
> URL: https://issues.apache.org/jira/browse/FLINK-8415
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Minor
>
> {code}
>   public void shutdown() {
> running = false;
> recordsToSend.complete(0L);
> {code}
> In other methods, access to recordsToSend is protected by synchronized 
> keyword.
> shutdown() should do the same.
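
The fix proposed in PR #5724 amounts to taking the same monitor in shutdown(); a minimal sketch:

{code:java}
// Guards recordsToSend with the same monitor the other methods use.
public synchronized void shutdown() {
    running = false;
    recordsToSend.complete(0L);
}
{code}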



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5724: [FLINK-8415] Unprotected access to recordsToSend i...

2018-03-19 Thread yanghua
GitHub user yanghua opened a pull request:

https://github.com/apache/flink/pull/5724

[FLINK-8415] Unprotected access to recordsToSend in 
LongRecordWriterThread#shutdown()

## What is the purpose of the change

*This pull request marks a method with the `synchronized` keyword.*


## Brief change log

  - *marked method `LongRecordWriterThread#shutdown` as `synchronized`*


## Verifying this change

This change is already covered by existing tests, such as 
*(StreamNetworkThroughputBenchmark#tearDown)*.


## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (yes / **no**)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
  - The serializers: (yes / **no** / don't know)
  - The runtime per-record code paths (performance sensitive): (yes / 
**no** / don't know)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
  - The S3 file system connector: (yes / **no** / don't know)

## Documentation

  - Does this pull request introduce a new feature? (yes / **no**)
  - If yes, how is the feature documented? (not applicable / docs / 
JavaDocs / **not documented**)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanghua/flink FLINK-8415

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5724.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5724


commit 52979c23ccdc3f868b0cd9af61292f36f67151fa
Author: yanghua 
Date:   2018-03-20T03:21:42Z

[FLINK-8415] Unprotected access to recordsToSend in 
LongRecordWriterThread#shutdown()




---


[jira] [Commented] (FLINK-9019) Unclosed closeableRegistry in StreamTaskStateInitializerImpl#rawOperatorStateInputs

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405739#comment-16405739
 ] 

ASF GitHub Bot commented on FLINK-9019:
---

Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/5723
  
cc @tillrohrmann 


> Unclosed closeableRegistry in 
> StreamTaskStateInitializerImpl#rawOperatorStateInputs
> ---
>
> Key: FLINK-9019
> URL: https://issues.apache.org/jira/browse/FLINK-9019
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Minor
>
> {code}
>  final CloseableRegistry closeableRegistry = new CloseableRegistry();
> ...
>  if (rawOperatorState != null) {
> ...
>   }
> }
> return CloseableIterable.empty();
> {code}
> If rawOperatorState is null, closeableRegistry would be left unclosed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5723: [FLINK-9019] Unclosed closeableRegistry in StreamTaskStat...

2018-03-19 Thread yanghua
Github user yanghua commented on the issue:

https://github.com/apache/flink/pull/5723
  
cc @tillrohrmann 


---


[jira] [Assigned] (FLINK-8394) Lack of synchronization accessing expectedRecord in ReceiverThread#shutdown

2018-03-19 Thread vinoyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned FLINK-8394:
---

Assignee: vinoyang

> Lack of synchronization accessing expectedRecord in ReceiverThread#shutdown
> ---
>
> Key: FLINK-8394
> URL: https://issues.apache.org/jira/browse/FLINK-8394
> Project: Flink
>  Issue Type: Test
>  Components: Streaming
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Minor
>
> {code}
>   public void shutdown() {
> running = false;
> interrupt();
> expectedRecord.complete(0L);
> {code}
> Access to expectedRecord should be protected by synchronization, as done on 
> other methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8886) Job isolation via scheduling in shared cluster

2018-03-19 Thread vinoyang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405723#comment-16405723
 ] 

vinoyang commented on FLINK-8886:
-

[~elevy] YARN has a feature named 'Node Label' (available since Apache Hadoop 2.6) 
that can match your requirements. In a previous issue, *FLINK-7836*, I added 
support for specifying a YARN node label expression through the Flink client. So 
you can run your job on YARN.

Or, for standalone cluster mode, you can just split a big cluster into some 
smaller clusters based on dimensions such as job, resource, or business.

I think introducing this feature into the standalone cluster would make the 
scheduler heavier and more complex, because the standalone cluster's scheduler 
is not good at resource assignment and management. YARN and Mesos would be 
better choices. 

[~till.rohrmann] What's your opinion?

> Job isolation via scheduling in shared cluster
> --
>
> Key: FLINK-8886
> URL: https://issues.apache.org/jira/browse/FLINK-8886
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Coordination, Local Runtime, Scheduler
>Affects Versions: 1.5.0
>Reporter: Elias Levy
>Priority: Major
>
> Flink's TaskManager executes tasks from different jobs as threads within the 
> same JVM.  We prefer to isolate different jobs in their own JVM.  Thus, we must 
> use different TMs for different jobs.  As the scheduler currently allocates 
> task slots within a TM to tasks from different jobs, that means we 
> must stand up one cluster per job.  This is wasteful, as it requires at least 
> two JobManagers per cluster for high-availability, and the JMs have low 
> utilization.
> Additionally, different jobs may require different resources.  Some jobs are 
> compute heavy.  Some are IO heavy (lots of state in RocksDB).  At the moment 
> the scheduler treats all TMs as equivalent, except possibly in their number 
> of available task slots.  Thus, one is required to stand up multiple clusters 
> if there is a need for different types of TMs.
>  
> It would be useful if one could specify requirements on a job, such that its 
> tasks are only scheduled on a subset of TMs.  Properly configured, that would 
> permit isolation of jobs in a shared cluster and scheduling of jobs with 
> specific resource needs.
>  
> One possible implementation is to specify a set of tags in the TM config file, 
> which the TMs use when registering with the JM, and another set of tags 
> configured within the job or supplied when submitting the job.  The scheduler 
> could then match the tags in the job with the tags in the TMs.  In a 
> restrictive mode the scheduler would assign a job's task to a TM only if all 
> tags match.  In a relaxed mode the scheduler could assign a job's task to a TM 
> if there is a partial match, while giving preference to a more accurate match.
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8394) Lack of synchronization accessing expectedRecord in ReceiverThread#shutdown

2018-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-8394:
--
Description: 
{code}
  public void shutdown() {
running = false;
interrupt();
expectedRecord.complete(0L);
{code}

Access to expectedRecord should be protected by synchronization, as done on 
other methods.

  was:
{code}
  public void shutdown() {
running = false;
interrupt();
expectedRecord.complete(0L);
{code}
Access to expectedRecord should be protected by synchronization, as done on 
other methods.


> Lack of synchronization accessing expectedRecord in ReceiverThread#shutdown
> ---
>
> Key: FLINK-8394
> URL: https://issues.apache.org/jira/browse/FLINK-8394
> Project: Flink
>  Issue Type: Test
>  Components: Streaming
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
>   public void shutdown() {
> running = false;
> interrupt();
> expectedRecord.complete(0L);
> {code}
> Access to expectedRecord should be protected by synchronization, as done on 
> other methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8802) Concurrent serialization without duplicating serializers in state server.

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405713#comment-16405713
 ] 

ASF GitHub Bot commented on FLINK-8802:
---

Github user sihuazhou commented on a diff in the pull request:

https://github.com/apache/flink/pull/5691#discussion_r175640731
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/state/internal/InternalQueryableKvState.java
 ---
@@ -16,46 +16,42 @@
  * limitations under the License.
  */
 
-package org.apache.flink.runtime.query;
+package org.apache.flink.runtime.state.internal;
 
-import org.apache.flink.annotation.Internal;
 import org.apache.flink.annotation.VisibleForTesting;
-import org.apache.flink.runtime.state.internal.InternalKvState;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.runtime.query.KvStateInfo;
 import org.apache.flink.util.Preconditions;
 
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ConcurrentMap;
 
 /**
- * An entry holding the {@link InternalKvState} along with its {@link 
KvStateInfo}.
+ * An abstract base class to be subclassed by states that are expected to 
be queryable.
+ * Its main task is to keep a "thread-local" copy of the different 
serializers (if needed).
  *
  * @param <K> The type of key the state is associated to
  * @param <N> The type of the namespace the state is associated to
  * @param <V> The type of values kept internally in state
  */
-@Internal
-public class KvStateEntry<K, N, V> {
+public abstract class InternalQueryableKvState<K, N, V> implements InternalKvState<K, N, V> {
 
-   private final InternalKvState<K, N, V> state;
private final KvStateInfo<K, N, V> stateInfo;
-
private final boolean areSerializersStateless;
 
-   private final ConcurrentMap<Thread, KvStateInfo<K, N, V>> serializerCache;
+   private final ConcurrentMap<Thread, KvStateInfo<K, N, V>> serializerCache = new ConcurrentHashMap<>(4);
 
--- End diff --

nit: just wonder why didn't use ThreadLocal<KvStateInfo<K, N, V>> provided by JDK...


> Concurrent serialization without duplicating serializers in state server.
> -
>
> Key: FLINK-8802
> URL: https://issues.apache.org/jira/browse/FLINK-8802
> Project: Flink
>  Issue Type: Bug
>  Components: Queryable State
>Affects Versions: 1.5.0
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>Priority: Blocker
> Fix For: 1.5.0
>
>
> The `getSerializedValue()` method may be called by multiple threads, but the 
> serializers are not duplicated, which may lead to exceptions being thrown when 
> a serializer is stateful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5691: [FLINK-8802] [QS] Fix concurrent access to non-dup...

2018-03-19 Thread sihuazhou
Github user sihuazhou commented on a diff in the pull request:

https://github.com/apache/flink/pull/5691#discussion_r175640731
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/state/internal/InternalQueryableKvState.java
 ---
@@ -16,46 +16,42 @@
  * limitations under the License.
  */
 
-package org.apache.flink.runtime.query;
+package org.apache.flink.runtime.state.internal;
 
-import org.apache.flink.annotation.Internal;
 import org.apache.flink.annotation.VisibleForTesting;
-import org.apache.flink.runtime.state.internal.InternalKvState;
+import org.apache.flink.api.common.typeutils.TypeSerializer;
+import org.apache.flink.runtime.query.KvStateInfo;
 import org.apache.flink.util.Preconditions;
 
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ConcurrentMap;
 
 /**
- * An entry holding the {@link InternalKvState} along with its {@link 
KvStateInfo}.
+ * An abstract base class to be subclassed by states that are expected to 
be queryable.
+ * Its main task is to keep a "thread-local" copy of the different 
serializers (if needed).
  *
  * @param <K> The type of key the state is associated to
  * @param <N> The type of the namespace the state is associated to
  * @param <V> The type of values kept internally in state
  */
-@Internal
-public class KvStateEntry<K, N, V> {
+public abstract class InternalQueryableKvState<K, N, V> implements InternalKvState<K, N, V> {
 
-   private final InternalKvState<K, N, V> state;
private final KvStateInfo<K, N, V> stateInfo;
-
private final boolean areSerializersStateless;
 
-   private final ConcurrentMap<Thread, KvStateInfo<K, N, V>> serializerCache;
+   private final ConcurrentMap<Thread, KvStateInfo<K, N, V>> serializerCache = new ConcurrentHashMap<>(4);
 
--- End diff --

nit: just wonder why didn't use ThreadLocal<KvStateInfo<K, N, V>> provided by JDK...


---


[jira] [Assigned] (FLINK-8415) Unprotected access to recordsToSend in LongRecordWriterThread#shutdown()

2018-03-19 Thread vinoyang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

vinoyang reassigned FLINK-8415:
---

Assignee: vinoyang

> Unprotected access to recordsToSend in LongRecordWriterThread#shutdown()
> 
>
> Key: FLINK-8415
> URL: https://issues.apache.org/jira/browse/FLINK-8415
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Minor
>
> {code}
>   public void shutdown() {
> running = false;
> recordsToSend.complete(0L);
> {code}
> In other methods, access to recordsToSend is protected by synchronized 
> keyword.
> shutdown() should do the same.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9019) Unclosed closeableRegistry in StreamTaskStateInitializerImpl#rawOperatorStateInputs

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405696#comment-16405696
 ] 

ASF GitHub Bot commented on FLINK-9019:
---

GitHub user yanghua opened a pull request:

https://github.com/apache/flink/pull/5723

[FLINK-9019] Unclosed closeableRegistry in 
StreamTaskStateInitializerImpl#rawOperatorStateInputs

## What is the purpose of the change

*This pull request fixes a resource (`CloseableRegistry`) leak.*


## Brief change log

  - *create the `CloseableRegistry` instance only when necessary*

## Verifying this change

This change is already covered by existing tests, such as 
*StreamTaskStateInitializerImplTest*.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (yes / **no**)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
  - The serializers: (yes / **no** / don't know)
  - The runtime per-record code paths (performance sensitive): (yes / 
**no** / don't know)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
  - The S3 file system connector: (yes / **no** / don't know)

## Documentation

  - Does this pull request introduce a new feature? (yes / **no**)
  - If yes, how is the feature documented? (not applicable / docs / 
JavaDocs / **not documented**)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanghua/flink FLINK-9019

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5723


commit 447c4a56b43d3c0737ad54e4e91846ea54ff7205
Author: yanghua 
Date:   2018-03-20T02:02:21Z

[FLINK-9019] Unclosed closeableRegistry in 
StreamTaskStateInitializerImpl#rawOperatorStateInputs




> Unclosed closeableRegistry in 
> StreamTaskStateInitializerImpl#rawOperatorStateInputs
> ---
>
> Key: FLINK-9019
> URL: https://issues.apache.org/jira/browse/FLINK-9019
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Minor
>
> {code}
>  final CloseableRegistry closeableRegistry = new CloseableRegistry();
> ...
>  if (rawOperatorState != null) {
> ...
>   }
> }
> return CloseableIterable.empty();
> {code}
> If rawOperatorState is null, closeableRegistry would be left unclosed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5723: [FLINK-9019] Unclosed closeableRegistry in StreamT...

2018-03-19 Thread yanghua
GitHub user yanghua opened a pull request:

https://github.com/apache/flink/pull/5723

[FLINK-9019] Unclosed closeableRegistry in 
StreamTaskStateInitializerImpl#rawOperatorStateInputs

## What is the purpose of the change

*This pull request fixes a resource (`CloseableRegistry`) leak.*


## Brief change log

  - *create the `CloseableRegistry` instance only when necessary*

## Verifying this change

This change is already covered by existing tests, such as 
*StreamTaskStateInitializerImplTest*.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (yes / **no**)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
  - The serializers: (yes / **no** / don't know)
  - The runtime per-record code paths (performance sensitive): (yes / 
**no** / don't know)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
  - The S3 file system connector: (yes / **no** / don't know)

## Documentation

  - Does this pull request introduce a new feature? (yes / **no**)
  - If yes, how is the feature documented? (not applicable / docs / 
JavaDocs / **not documented**)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanghua/flink FLINK-9019

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5723


commit 447c4a56b43d3c0737ad54e4e91846ea54ff7205
Author: yanghua 
Date:   2018-03-20T02:02:21Z

[FLINK-9019] Unclosed closeableRegistry in 
StreamTaskStateInitializerImpl#rawOperatorStateInputs




---


[jira] [Commented] (FLINK-8946) TaskManager stop sending metrics after JobManager failover

2018-03-19 Thread Truong Duc Kien (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405692#comment-16405692
 ] 

Truong Duc Kien commented on FLINK-8946:


Sorry, we haven't had time to test with Flip-6 yet.

> TaskManager stop sending metrics after JobManager failover
> --
>
> Key: FLINK-8946
> URL: https://issues.apache.org/jira/browse/FLINK-8946
> Project: Flink
>  Issue Type: Bug
>  Components: Metrics, TaskManager
>Affects Versions: 1.4.2
>Reporter: Truong Duc Kien
>Assignee: vinoyang
>Priority: Critical
> Fix For: 1.5.0
>
>
> Running in Yarn-standalone mode, when the JobManager performs a failover, 
> all TaskManagers that are inherited from the previous JobManager stop sending 
> metrics to the new JobManager and other registered metric reporters.
>  
> A cursory glance reveals that these lines of code might be the cause:
> [https://github.com/apache/flink/blob/a3478fdfa0f792104123fefbd9bdf01f5029de51/flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala#L1082-L1086]
> Perhaps the TaskManager closes its metrics group when disassociating from the 
> JobManager, but does not create a new one on fail-over association?
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9019) Unclosed closeableRegistry in StreamTaskStateInitializerImpl#rawOperatorStateInputs

2018-03-19 Thread vinoyang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405688#comment-16405688
 ] 

vinoyang commented on FLINK-9019:
-

Sorry, I was busy and had not joined the discussion. I agree with [~sihuazhou]: 
here the _closeableRegistry_ is used in an anonymous implementation of 
*CloseableIterable* that serves as the return value. I think 
we could fix this issue by simply moving the construction into the if block, 
like this:
{code:java}
if (rawOperatorState != null) {
    final CloseableRegistry closeableRegistry = new CloseableRegistry();

    return new CloseableIterable<StatePartitionStreamProvider>() {
        // ... same anonymous implementation as before, registering its
        // streams with (and closing) closeableRegistry
    };
}

return CloseableIterable.empty();
{code}
 

> Unclosed closeableRegistry in 
> StreamTaskStateInitializerImpl#rawOperatorStateInputs
> ---
>
> Key: FLINK-9019
> URL: https://issues.apache.org/jira/browse/FLINK-9019
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Minor
>
> {code}
>  final CloseableRegistry closeableRegistry = new CloseableRegistry();
> ...
>  if (rawOperatorState != null) {
> ...
>   }
> }
> return CloseableIterable.empty();
> {code}
> If rawOperatorState is null, closeableRegistry would be left unclosed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-5486) Lack of synchronization in BucketingSink#handleRestoredBucketState()

2018-03-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351986#comment-16351986
 ] 

Ted Yu edited comment on FLINK-5486 at 3/19/18 9:12 PM:


Can this get more review, please?


was (Author: yuzhih...@gmail.com):
Can this get more review, please ?

> Lack of synchronization in BucketingSink#handleRestoredBucketState()
> 
>
> Key: FLINK-5486
> URL: https://issues.apache.org/jira/browse/FLINK-5486
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming Connectors
>Reporter: Ted Yu
>Assignee: mingleizhang
>Priority: Major
> Fix For: 1.3.4
>
>
> Here is related code:
> {code}
>   
> handlePendingFilesForPreviousCheckpoints(bucketState.pendingFilesPerCheckpoint);
>   synchronized (bucketState.pendingFilesPerCheckpoint) {
> bucketState.pendingFilesPerCheckpoint.clear();
>   }
> {code}
> The handlePendingFilesForPreviousCheckpoints() call should be enclosed inside 
> the synchronization block. Otherwise during the processing of 
> handlePendingFilesForPreviousCheckpoints(), some entries of the map may be 
> cleared.
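
A sketch of the suggested fix, moving the call under the same monitor:

{code:java}
synchronized (bucketState.pendingFilesPerCheckpoint) {
    // Process and clear under one lock, so no entries are removed while
    // handlePendingFilesForPreviousCheckpoints() is still iterating.
    handlePendingFilesForPreviousCheckpoints(bucketState.pendingFilesPerCheckpoint);
    bucketState.pendingFilesPerCheckpoint.clear();
}
{code}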



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-7795) Utilize error-prone to discover common coding mistakes

2018-03-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345955#comment-16345955
 ] 

Ted Yu edited comment on FLINK-7795 at 3/19/18 9:09 PM:


error-prone has JDK 8 dependency .


was (Author: yuzhih...@gmail.com):
error-prone has JDK 8 dependency.

> Utilize error-prone to discover common coding mistakes
> --
>
> Key: FLINK-7795
> URL: https://issues.apache.org/jira/browse/FLINK-7795
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Reporter: Ted Yu
>Priority: Major
>
> http://errorprone.info/ is a tool which detects common coding mistakes.
> We should incorporate it into the Flink build process.
> Here are the dependencies:
> {code}
> <dependency>
>   <groupId>com.google.errorprone</groupId>
>   <artifactId>error_prone_annotation</artifactId>
>   <version>${error-prone.version}</version>
>   <scope>provided</scope>
> </dependency>
> <dependency>
>   <groupId>com.google.auto.service</groupId>
>   <artifactId>auto-service</artifactId>
>   <version>1.0-rc3</version>
>   <optional>true</optional>
> </dependency>
> <dependency>
>   <groupId>com.google.errorprone</groupId>
>   <artifactId>error_prone_check_api</artifactId>
>   <version>${error-prone.version}</version>
>   <scope>provided</scope>
>   <exclusions>
>     <exclusion>
>       <groupId>com.google.code.findbugs</groupId>
>       <artifactId>jsr305</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
> <dependency>
>   <groupId>com.google.errorprone</groupId>
>   <artifactId>javac</artifactId>
>   <version>9-dev-r4023-3</version>
>   <scope>provided</scope>
> </dependency>
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8661) Replace Collections.EMPTY_MAP with Collections.emptyMap()

2018-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-8661:
--
Description: The use of Collections.EMPTY_SET and Collections.EMPTY_MAP 
often causes unchecked assignment and it should be replaced with 
Collections.emptySet() and Collections.emptyMap() .  (was: The use of 
Collections.EMPTY_SET and Collections.EMPTY_MAP often causes unchecked 
assignment and it should be replaced with Collections.emptySet() and 
Collections.emptyMap().)

> Replace Collections.EMPTY_MAP with Collections.emptyMap()
> -
>
> Key: FLINK-8661
> URL: https://issues.apache.org/jira/browse/FLINK-8661
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: mingleizhang
>Priority: Minor
>
> The use of Collections.EMPTY_SET and Collections.EMPTY_MAP often causes 
> unchecked assignment warnings; they should be replaced with 
> Collections.emptySet() and Collections.emptyMap().
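
A short illustration of the difference:

{code:java}
import java.util.Collections;
import java.util.Map;

class EmptyMapExample {
    // Raw-typed constant: compiles only with an "unchecked assignment" warning.
    static final Map<String, Integer> RAW = Collections.EMPTY_MAP;

    // Generic factory method: the type argument is inferred, no warning.
    static final Map<String, Integer> TYPED = Collections.emptyMap();
}
{code}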



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8941) SpanningRecordSerializationTest fails on Travis

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8941:

Fix Version/s: (was: 1.6.0)

> SpanningRecordSerializationTest fails on Travis
> ---
>
> Key: FLINK-8941
> URL: https://issues.apache.org/jira/browse/FLINK-8941
> Project: Flink
>  Issue Type: Improvement
>  Components: Network, Tests
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> https://travis-ci.org/zentol/flink/jobs/353217791
> {code:java}
> testHandleMixedLargeRecords(org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializationTest)
>   Time elapsed: 1.992 sec  <<< ERROR!
> java.nio.channels.ClosedChannelException: null
>   at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:199)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer$SpanningWrapper.addNextChunkFromMemorySegment(SpillingAdaptiveSpanningRecordDeserializer.java:528)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer$SpanningWrapper.access$200(SpillingAdaptiveSpanningRecordDeserializer.java:430)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:75)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializationTest.testSerializationRoundTrip(SpanningRecordSerializationTest.java:143)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializationTest.testSerializationRoundTrip(SpanningRecordSerializationTest.java:109)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializationTest.testHandleMixedLargeRecords(SpanningRecordSerializationTest.java:98){code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9013) Document yarn.containers.vcores only being effective when adapting YARN config

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-9013:

Affects Version/s: (was: 1.6.0)

> Document yarn.containers.vcores only being effective when adapting YARN config
> --
>
> Key: FLINK-9013
> URL: https://issues.apache.org/jira/browse/FLINK-9013
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.5.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
> Fix For: 1.5.0
>
>
> Even after specifying {{yarn.containers.vcores}} and having Flink request 
> such a container from YARN, YARN may not take the vcores into account at all 
> and may return a container with 1 vcore.
> The YARN configuration needs to be adapted to take the vcores into account, 
> e.g. by setting the {{FairScheduler}} in {{yarn-site.xml}}:
> {code}
> 
>   yarn.resourcemanager.scheduler.class
>   
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
> 
> {code}
> This fact should be documented, at least in the configuration parameter 
> documentation of {{yarn.containers.vcores}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9010) NoResourceAvailableException with FLIP-6

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-9010:

Affects Version/s: (was: 1.6.0)

> NoResourceAvailableException with FLIP-6 
> -
>
> Key: FLINK-9010
> URL: https://issues.apache.org/jira/browse/FLINK-9010
> Project: Flink
>  Issue Type: Bug
>  Components: ResourceManager
>Affects Versions: 1.5.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> I was trying to run a bigger program with 400 slots (100 TMs, 2 slots each) 
> with FLIP-6 mode and a checkpointing interval of 1000 and got the following 
> exception:
> {code}
> 2018-03-16 03:41:20,154 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Received new container: 
> container_1521038088305_0257_01_000101 - Remaining pending container 
> requests: 302
> 2018-03-16 03:41:20,154 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TaskExecutor container_1521038088305_0257_01_000101 will be 
> started with container size 8192 MB, JVM heap size 5120 MB, JVM direct memory 
> limit 3072 MB
> 2018-03-16 03:41:20,154 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote keytab path obtained null
> 2018-03-16 03:41:20,154 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote keytab principal obtained null
> 2018-03-16 03:41:20,154 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote yarn conf path obtained null
> 2018-03-16 03:41:20,154 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote krb5 path obtained null
> 2018-03-16 03:41:20,155 INFO  org.apache.flink.yarn.Utils 
>   - Copying from 
> file:/mnt/yarn/usercache/hadoop/appcache/application_1521038088305_0257/container_1521038088305_0257_01_01/3cd0c7d7-ccc1-4680-83a5-54e64dd637bc-taskmanager-conf.yaml
>  to 
> hdfs://ip-172-31-1-91.eu-west-1.compute.internal:8020/user/hadoop/.flink/application_1521038088305_0257/3cd0c7d7-ccc1-4680-83a5-54e64dd637bc-taskmanager-conf.yaml
> 2018-03-16 03:41:20,165 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Prepared local resource for modified yaml: resource { scheme: 
> "hdfs" host: "ip-172-31-1-91.eu-west-1.compute.internal" port: 8020 file: 
> "/user/hadoop/.flink/application_1521038088305_0257/3cd0c7d7-ccc1-4680-83a5-54e64dd637bc-taskmanager-conf.yaml"
>  } size: 595 timestamp: 1521171680164 type: FILE visibility: APPLICATION
> 2018-03-16 03:41:20,168 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Creating container launch context for TaskManagers
> 2018-03-16 03:41:20,168 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Starting TaskManagers with command: $JAVA_HOME/bin/java 
> -Xms5120m -Xmx5120m -XX:MaxDirectMemorySize=3072m  
> -Dlog.file=/taskmanager.log 
> -Dlogback.configurationFile=file:./logback.xml 
> -Dlog4j.configuration=file:./log4j.properties 
> org.apache.flink.yarn.YarnTaskExecutorRunner --configDir . 1> 
> /taskmanager.out 2> /taskmanager.err
> 2018-03-16 03:41:20,176 INFO  
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy  - 
> Opening proxy : ip-172-31-3-221.eu-west-1.compute.internal:8041
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Received new container: 
> container_1521038088305_0257_01_000102 - Remaining pending container 
> requests: 301
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TaskExecutor container_1521038088305_0257_01_000102 will be 
> started with container size 8192 MB, JVM heap size 5120 MB, JVM direct memory 
> limit 3072 MB
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote keytab path obtained null
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote keytab principal obtained null
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote yarn conf path obtained null
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote krb5 path obtained null
> 2018-03-16 03:41:20,181 INFO  org.apache.flink.yarn.Utils 
>   - Copying from 
> file:/mnt/yarn/usercache/hadoop/appcache/application_1521038088305_0257/container_1521038088305_0257_01_01/6766be70-82f7-4999-a371-11c27527fb6e-taskmanager-conf.yaml
>  to 
> 

[jira] [Updated] (FLINK-9012) Shaded Hadoop S3A end-to-end test failing with S3 bucket in Ireland

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-9012:

Affects Version/s: (was: 1.6.0)

> Shaded Hadoop S3A end-to-end test failing with S3 bucket in Ireland
> ---
>
> Key: FLINK-9012
> URL: https://issues.apache.org/jira/browse/FLINK-9012
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
> Fix For: 1.5.0
>
>
> https://api.travis-ci.org/v3/job/354259892/log.txt
> {code}
> Found AWS bucket [secure], running Shaded Hadoop S3A e2e tests.
> Flink dist directory: /home/travis/build/NicoK/flink/build-target
> TEST_DATA_DIR: 
> /home/travis/build/NicoK/flink/flink-end-to-end-tests/test-scripts/temp-test-directory-05775180416
>   % Total% Received % Xferd  Average Speed   TimeTime Time  
> Current
>  Dload  Upload   Total   SpentLeft  Speed
>   0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
>   0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
>  91   4930   4490 0   2476  0 --:--:-- --:--:-- --:--:--  2467
> 
> <Error><Code>TemporaryRedirect</Code><Message>Please re-send this request to 
> the specified temporary endpoint. Continue to use the original request 
> endpoint for future 
> requests.</Message><Bucket>[secure]</Bucket><Endpoint>[secure].s3-eu-west-1.amazonaws.com</Endpoint><RequestId>1FCEC82C3EBF7C7E</RequestId><HostId>NG5dxnXQ0Y5mK2X/m3bU+Z7Fqt0mNVL2JsjyVjGZUmpDmNuBDfKJACh7VI6tCTYEZsw65W057lA=</HostId></Error>
> Starting cluster.
> Starting standalonesession daemon on host 
> travis-job-087822e3-2f4c-46b7-b9bd-b6d4aba6136c.
> Starting taskexecutor daemon on host 
> travis-job-087822e3-2f4c-46b7-b9bd-b6d4aba6136c.
> Dispatcher/TaskManagers are not yet up
> Waiting for dispatcher REST endpoint to come up...
> Dispatcher/TaskManagers are not yet up
> Waiting for dispatcher REST endpoint to come up...
> Dispatcher/TaskManagers are not yet up
> Waiting for dispatcher REST endpoint to come up...
> Dispatcher/TaskManagers are not yet up
> Waiting for dispatcher REST endpoint to come up...
> Dispatcher/TaskManagers are not yet up
> Waiting for dispatcher REST endpoint to come up...
> Waiting for dispatcher REST endpoint to come up...
> Dispatcher REST endpoint is up.
> Starting execution of program
> 
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: Could not 
> retrieve the execution result.
>   at 
> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:246)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:458)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:446)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:86)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:398)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:780)
>   at 
> org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:274)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:209)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1019)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1095)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>   at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1095)
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to 
> submit JobGraph.
>   at 
> org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$4(RestClusterClient.java:341)
>   at 
> 

[jira] [Updated] (FLINK-9027) Web UI does not cleanup temporary files

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-9027:

Affects Version/s: (was: 1.6.0)

> Web UI does not cleanup temporary files
> ---
>
> Key: FLINK-9027
> URL: https://issues.apache.org/jira/browse/FLINK-9027
> Project: Flink
>  Issue Type: Bug
>  Components: Webfrontend
>Affects Versions: 1.5.0
>Reporter: Nico Kruber
>Priority: Blocker
> Fix For: 1.5.0
>
>
> The web UI creates two directories in {{java.io.tmpdir}}, namely 
> {{flink-web-}} and {{flink-web-upload-}}, and neither is cleaned up 
> when running {{start-cluster.sh}} and {{stop-cluster.sh}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9010) NoResourceAvailableException with FLIP-6

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-9010:

Fix Version/s: (was: 1.6.0)

> NoResourceAvailableException with FLIP-6 
> -
>
> Key: FLINK-9010
> URL: https://issues.apache.org/jira/browse/FLINK-9010
> Project: Flink
>  Issue Type: Bug
>  Components: ResourceManager
>Affects Versions: 1.5.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> I was trying to run a bigger program with 400 slots (100 TMs, 2 slots each) 
> with FLIP-6 mode and a checkpointing interval of 1000 and got the following 
> exception:
> {code}
> 2018-03-16 03:41:20,154 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Received new container: 
> container_1521038088305_0257_01_000101 - Remaining pending container 
> requests: 302
> 2018-03-16 03:41:20,154 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TaskExecutor container_1521038088305_0257_01_000101 will be 
> started with container size 8192 MB, JVM heap size 5120 MB, JVM direct memory 
> limit 3072 MB
> 2018-03-16 03:41:20,154 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote keytab path obtained null
> 2018-03-16 03:41:20,154 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote keytab principal obtained null
> 2018-03-16 03:41:20,154 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote yarn conf path obtained null
> 2018-03-16 03:41:20,154 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote krb5 path obtained null
> 2018-03-16 03:41:20,155 INFO  org.apache.flink.yarn.Utils 
>   - Copying from 
> file:/mnt/yarn/usercache/hadoop/appcache/application_1521038088305_0257/container_1521038088305_0257_01_01/3cd0c7d7-ccc1-4680-83a5-54e64dd637bc-taskmanager-conf.yaml
>  to 
> hdfs://ip-172-31-1-91.eu-west-1.compute.internal:8020/user/hadoop/.flink/application_1521038088305_0257/3cd0c7d7-ccc1-4680-83a5-54e64dd637bc-taskmanager-conf.yaml
> 2018-03-16 03:41:20,165 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Prepared local resource for modified yaml: resource { scheme: 
> "hdfs" host: "ip-172-31-1-91.eu-west-1.compute.internal" port: 8020 file: 
> "/user/hadoop/.flink/application_1521038088305_0257/3cd0c7d7-ccc1-4680-83a5-54e64dd637bc-taskmanager-conf.yaml"
>  } size: 595 timestamp: 1521171680164 type: FILE visibility: APPLICATION
> 2018-03-16 03:41:20,168 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Creating container launch context for TaskManagers
> 2018-03-16 03:41:20,168 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Starting TaskManagers with command: $JAVA_HOME/bin/java 
> -Xms5120m -Xmx5120m -XX:MaxDirectMemorySize=3072m  
> -Dlog.file=/taskmanager.log 
> -Dlogback.configurationFile=file:./logback.xml 
> -Dlog4j.configuration=file:./log4j.properties 
> org.apache.flink.yarn.YarnTaskExecutorRunner --configDir . 1> 
> /taskmanager.out 2> /taskmanager.err
> 2018-03-16 03:41:20,176 INFO  
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy  - 
> Opening proxy : ip-172-31-3-221.eu-west-1.compute.internal:8041
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Received new container: 
> container_1521038088305_0257_01_000102 - Remaining pending container 
> requests: 301
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TaskExecutor container_1521038088305_0257_01_000102 will be 
> started with container size 8192 MB, JVM heap size 5120 MB, JVM direct memory 
> limit 3072 MB
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote keytab path obtained null
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote keytab principal obtained null
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote yarn conf path obtained null
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote krb5 path obtained null
> 2018-03-16 03:41:20,181 INFO  org.apache.flink.yarn.Utils 
>   - Copying from 
> file:/mnt/yarn/usercache/hadoop/appcache/application_1521038088305_0257/container_1521038088305_0257_01_01/6766be70-82f7-4999-a371-11c27527fb6e-taskmanager-conf.yaml
>  to 
> 

[jira] [Updated] (FLINK-8941) SpanningRecordSerializationTest fails on Travis

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8941:

Affects Version/s: (was: 1.6.0)

> SpanningRecordSerializationTest fails on Travis
> ---
>
> Key: FLINK-8941
> URL: https://issues.apache.org/jira/browse/FLINK-8941
> Project: Flink
>  Issue Type: Improvement
>  Components: Network, Tests
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> https://travis-ci.org/zentol/flink/jobs/353217791
> {code:java}
> testHandleMixedLargeRecords(org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializationTest)
>   Time elapsed: 1.992 sec  <<< ERROR!
> java.nio.channels.ClosedChannelException: null
>   at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:199)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer$SpanningWrapper.addNextChunkFromMemorySegment(SpillingAdaptiveSpanningRecordDeserializer.java:528)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer$SpanningWrapper.access$200(SpillingAdaptiveSpanningRecordDeserializer.java:430)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:75)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializationTest.testSerializationRoundTrip(SpanningRecordSerializationTest.java:143)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializationTest.testSerializationRoundTrip(SpanningRecordSerializationTest.java:109)
>   at 
> org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializationTest.testHandleMixedLargeRecords(SpanningRecordSerializationTest.java:98){code}
>  
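For context, the {{ClosedChannelException}} above is standard {{java.nio}} behavior: any write on a 
{{FileChannel}} after {{close()}} fails this way, which points to the spilling channel being closed 
while a chunk is still being added. A minimal standalone reproduction:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ClosedChannelDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("spill", ".bin");
        FileChannel channel = FileChannel.open(tmp, StandardOpenOption.WRITE);
        channel.close();
        // Writing after close() throws java.nio.channels.ClosedChannelException,
        // the same exception reported in the stack trace above.
        channel.write(ByteBuffer.wrap(new byte[]{1, 2, 3}));
    }
}
{code}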



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9011) YarnResourceManager spamming log file at INFO level

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-9011:

Affects Version/s: (was: 1.6.0)

> YarnResourceManager spamming log file at INFO level
> ---
>
> Key: FLINK-9011
> URL: https://issues.apache.org/jira/browse/FLINK-9011
> Project: Flink
>  Issue Type: Bug
>  Components: ResourceManager, YARN
>Affects Versions: 1.5.0
>Reporter: Nico Kruber
>Assignee: vinoyang
>Priority: Blocker
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> For every requested resource, the {{YarnResourceManager}} spams the log with 
> log-level INFO and the following messages:
> {code}
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Received new container: 
> container_1521038088305_0257_01_000102 - Remaining pending container 
> requests: 301
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TaskExecutor container_1521038088305_0257_01_000102 will be 
> started with container size 8192 MB, JVM heap size 5120 MB, JVM direct memory 
> limit 3072 MB
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote keytab path obtained null
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote keytab principal obtained null
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote yarn conf path obtained null
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote krb5 path obtained null
> 2018-03-16 03:41:20,181 INFO  org.apache.flink.yarn.Utils 
>   - Copying from 
> file:/mnt/yarn/usercache/hadoop/appcache/application_1521038088305_0257/container_1521038088305_0257_01_01/6766be70-82f7-4999-a371-11c27527fb6e-taskmanager-conf.yaml
>  to 
> hdfs://ip-172-31-1-91.eu-west-1.compute.internal:8020/user/hadoop/.flink/application_1521038088305_0257/6766be70-82f7-4999-a371-11c27527fb6e-taskmanager-conf.yaml
> 2018-03-16 03:41:20,190 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Prepared local resource for modified yaml: resource { scheme: 
> "hdfs" host: "ip-172-31-1-91.eu-west-1.compute.internal" port: 8020 file: 
> "/user/hadoop/.flink/application_1521038088305_0257/6766be70-82f7-4999-a371-11c27527fb6e-taskmanager-conf.yaml"
>  } size: 595 timestamp: 1521171680190 type: FILE visibility: APPLICATION
> 2018-03-16 03:41:20,194 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Creating container launch context for TaskManagers
> 2018-03-16 03:41:20,194 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Starting TaskManagers with command: $JAVA_HOME/bin/java 
> -Xms5120m -Xmx5120m -XX:MaxDirectMemorySize=3072m  
> -Dlog.file=/taskmanager.log 
> -Dlogback.configurationFile=file:./logback.xml 
> -Dlog4j.configuration=file:./log4j.properties 
> org.apache.flink.yarn.YarnTaskExecutorRunner --configDir . 1> 
> /taskmanager.out 2> /taskmanager.err
> {code}
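A minimal sketch of the kind of fix this implies, namely demoting the per-container bookkeeping 
output to DEBUG (method and parameter names are hypothetical, not the actual patch):

{code:java}
// Hedged sketch: keep the information available for debugging, but stop
// emitting one INFO block per requested container on large deployments.
private void logContainerAllocation(ContainerId containerId, int pendingRequests) {
    log.debug("Received new container: {} - Remaining pending container requests: {}",
        containerId, pendingRequests);
}
{code}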



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9014) Adapt BackPressureStatsTracker to work with credit-based flow control

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-9014:

Affects Version/s: (was: 1.6.0)

> Adapt BackPressureStatsTracker to work with credit-based flow control
> -
>
> Key: FLINK-9014
> URL: https://issues.apache.org/jira/browse/FLINK-9014
> Project: Flink
>  Issue Type: Sub-task
>  Components: Network, Webfrontend
>Affects Versions: 1.5.0
>Reporter: Nico Kruber
>Priority: Critical
> Fix For: 1.5.0
>
>
> The {{BackPressureStatsTracker}} relies on sampling threads being blocked in 
> {{LocalBufferPool#requestBufferBuilderBlocking}} to indicate backpressure but 
> with credit-based flow control, we are also back-pressured if we did not get 
> any credits (yet).
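A minimal sketch of the requested adaptation (all names hypothetical): treat a sampled thread as 
back-pressured either when it is blocked requesting a buffer, or when credit-based flow control 
has left its channel without credits.

{code:java}
// Hedged sketch, not the actual tracker logic.
static boolean isBackPressured(StackTraceElement[] sample, int availableCredits) {
    for (StackTraceElement frame : sample) {
        if (frame.getClassName().endsWith("LocalBufferPool")
                && "requestBufferBuilderBlocking".equals(frame.getMethodName())) {
            return true; // blocked waiting for a network buffer
        }
    }
    return availableCredits == 0; // waiting for credits is back pressure too
}
{code}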



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9011) YarnResourceManager spamming log file at INFO level

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-9011:

Fix Version/s: (was: 1.6.0)

> YarnResourceManager spamming log file at INFO level
> ---
>
> Key: FLINK-9011
> URL: https://issues.apache.org/jira/browse/FLINK-9011
> Project: Flink
>  Issue Type: Bug
>  Components: ResourceManager, YARN
>Affects Versions: 1.5.0
>Reporter: Nico Kruber
>Assignee: vinoyang
>Priority: Blocker
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> For every requested resource, the {{YarnResourceManager}} spams the log with 
> log-level INFO and the following messages:
> {code}
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Received new container: 
> container_1521038088305_0257_01_000102 - Remaining pending container 
> requests: 301
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TaskExecutor container_1521038088305_0257_01_000102 will be 
> started with container size 8192 MB, JVM heap size 5120 MB, JVM direct memory 
> limit 3072 MB
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote keytab path obtained null
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote keytab principal obtained null
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote yarn conf path obtained null
> 2018-03-16 03:41:20,180 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - TM:remote krb5 path obtained null
> 2018-03-16 03:41:20,181 INFO  org.apache.flink.yarn.Utils 
>   - Copying from 
> file:/mnt/yarn/usercache/hadoop/appcache/application_1521038088305_0257/container_1521038088305_0257_01_01/6766be70-82f7-4999-a371-11c27527fb6e-taskmanager-conf.yaml
>  to 
> hdfs://ip-172-31-1-91.eu-west-1.compute.internal:8020/user/hadoop/.flink/application_1521038088305_0257/6766be70-82f7-4999-a371-11c27527fb6e-taskmanager-conf.yaml
> 2018-03-16 03:41:20,190 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Prepared local resource for modified yaml: resource { scheme: 
> "hdfs" host: "ip-172-31-1-91.eu-west-1.compute.internal" port: 8020 file: 
> "/user/hadoop/.flink/application_1521038088305_0257/6766be70-82f7-4999-a371-11c27527fb6e-taskmanager-conf.yaml"
>  } size: 595 timestamp: 1521171680190 type: FILE visibility: APPLICATION
> 2018-03-16 03:41:20,194 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Creating container launch context for TaskManagers
> 2018-03-16 03:41:20,194 INFO  org.apache.flink.yarn.YarnResourceManager   
>   - Starting TaskManagers with command: $JAVA_HOME/bin/java 
> -Xms5120m -Xmx5120m -XX:MaxDirectMemorySize=3072m  
> -Dlog.file=/taskmanager.log 
> -Dlogback.configurationFile=file:./logback.xml 
> -Dlog4j.configuration=file:./log4j.properties 
> org.apache.flink.yarn.YarnTaskExecutorRunner --configDir . 1> 
> /taskmanager.out 2> /taskmanager.err
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9027) Web UI does not cleanup temporary files

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-9027:

Fix Version/s: (was: 1.6.0)

> Web UI does not cleanup temporary files
> ---
>
> Key: FLINK-9027
> URL: https://issues.apache.org/jira/browse/FLINK-9027
> Project: Flink
>  Issue Type: Bug
>  Components: Webfrontend
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Nico Kruber
>Priority: Blocker
> Fix For: 1.5.0
>
>
> The web UI creates two directories in {{java.io.tmp}}, namely 
> {{flink-web-}} and {{flink-web-upload-}} and both are not cleaned 
> up (running {{start-cluster.sh}} and {{stop-cluster.sh}}).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9014) Adapt BackPressureStatsTracker to work with credit-based flow control

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-9014:

Fix Version/s: (was: 1.6.0)

> Adapt BackPressureStatsTracker to work with credit-based flow control
> -
>
> Key: FLINK-9014
> URL: https://issues.apache.org/jira/browse/FLINK-9014
> Project: Flink
>  Issue Type: Sub-task
>  Components: Network, Webfrontend
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Nico Kruber
>Priority: Critical
> Fix For: 1.5.0
>
>
> The {{BackPressureStatsTracker}} relies on sampling threads being blocked in 
> {{LocalBufferPool#requestBufferBuilderBlocking}} to indicate backpressure but 
> with credit-based flow control, we are also back-pressured if we did not get 
> any credits (yet).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9013) Document yarn.containers.vcores only being effective when adapting YARN config

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-9013:

Fix Version/s: (was: 1.6.0)

> Document yarn.containers.vcores only being effective when adapting YARN config
> --
>
> Key: FLINK-9013
> URL: https://issues.apache.org/jira/browse/FLINK-9013
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Major
> Fix For: 1.5.0
>
>
> Even after specifying {{yarn.containers.vcores}} and having Flink request 
> such a container from YARN, YARN may not take the vcores into account at all 
> and return a container with 1 vcore.
> The YARN configuration needs to be adapted to take the vcores into account, 
> e.g. by setting the {{FairScheduler}} in {{yarn-site.xml}}:
> {code}
> 
>   yarn.resourcemanager.scheduler.class
>   
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
> 
> {code}
> This fact should be documented, at least in the configuration parameter 
> documentation of {{yarn.containers.vcores}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9012) Shaded Hadoop S3A end-to-end test failing with S3 bucket in Ireland

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-9012:

Fix Version/s: (was: 1.6.0)

> Shaded Hadoop S3A end-to-end test failing with S3 bucket in Ireland
> ---
>
> Key: FLINK-9012
> URL: https://issues.apache.org/jira/browse/FLINK-9012
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
> Fix For: 1.5.0
>
>
> https://api.travis-ci.org/v3/job/354259892/log.txt
> {code}
> Found AWS bucket [secure], running Shaded Hadoop S3A e2e tests.
> Flink dist directory: /home/travis/build/NicoK/flink/build-target
> TEST_DATA_DIR: 
> /home/travis/build/NicoK/flink/flink-end-to-end-tests/test-scripts/temp-test-directory-05775180416
>   % Total% Received % Xferd  Average Speed   TimeTime Time  
> Current
>  Dload  Upload   Total   SpentLeft  Speed
>   0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
>   0 00 00 0  0  0 --:--:-- --:--:-- --:--:-- 0
>  91   4930   4490 0   2476  0 --:--:-- --:--:-- --:--:--  2467
> 
> TemporaryRedirect: Please re-send this request to the specified temporary 
> endpoint. Continue to use the original request endpoint for future requests. 
> (bucket: [secure], endpoint: [secure].s3-eu-west-1.amazonaws.com, request id: 
> 1FCEC82C3EBF7C7E, host id: NG5dxnXQ0Y5mK2X/m3bU+Z7Fqt0mNVL2JsjyVjGZUmpDmNuBDfKJACh7VI6tCTYEZsw65W057lA=)
> Starting cluster.
> Starting standalonesession daemon on host 
> travis-job-087822e3-2f4c-46b7-b9bd-b6d4aba6136c.
> Starting taskexecutor daemon on host 
> travis-job-087822e3-2f4c-46b7-b9bd-b6d4aba6136c.
> Dispatcher/TaskManagers are not yet up
> Waiting for dispatcher REST endpoint to come up...
> Dispatcher/TaskManagers are not yet up
> Waiting for dispatcher REST endpoint to come up...
> Dispatcher/TaskManagers are not yet up
> Waiting for dispatcher REST endpoint to come up...
> Dispatcher/TaskManagers are not yet up
> Waiting for dispatcher REST endpoint to come up...
> Dispatcher/TaskManagers are not yet up
> Waiting for dispatcher REST endpoint to come up...
> Waiting for dispatcher REST endpoint to come up...
> Dispatcher REST endpoint is up.
> Starting execution of program
> 
>  The program finished with the following exception:
> org.apache.flink.client.program.ProgramInvocationException: Could not 
> retrieve the execution result.
>   at 
> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:246)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:458)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:446)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:86)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
>   at 
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:398)
>   at 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:780)
>   at 
> org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:274)
>   at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:209)
>   at 
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1019)
>   at 
> org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1095)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>   at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>   at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1095)
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to 
> submit JobGraph.
>   at 
> org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$4(RestClusterClient.java:341)
>   at 
> 

[jira] [Commented] (FLINK-8959) Port AccumulatorLiveITCase to flip6

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405348#comment-16405348
 ] 

ASF GitHub Bot commented on FLINK-8959:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5719#discussion_r175566292
  
--- Diff: 
flink-tests/src/test/java/org/apache/flink/test/accumulators/AccumulatorLiveITCase.java
 ---
@@ -18,292 +18,188 @@
 
 package org.apache.flink.test.accumulators;
 
-import org.apache.flink.api.common.JobExecutionResult;
-import org.apache.flink.api.common.JobID;
 import org.apache.flink.api.common.Plan;
-import org.apache.flink.api.common.accumulators.Accumulator;
 import org.apache.flink.api.common.accumulators.IntCounter;
 import org.apache.flink.api.common.functions.RichFlatMapFunction;
 import org.apache.flink.api.common.io.OutputFormat;
 import org.apache.flink.api.java.DataSet;
 import org.apache.flink.api.java.ExecutionEnvironment;
-import org.apache.flink.api.java.LocalEnvironment;
+import org.apache.flink.client.program.ClusterClient;
 import org.apache.flink.configuration.AkkaOptions;
-import org.apache.flink.configuration.ConfigConstants;
 import org.apache.flink.configuration.Configuration;
+import org.apache.flink.configuration.HeartbeatManagerOptions;
+import org.apache.flink.core.testutils.MultiShotLatch;
 import org.apache.flink.optimizer.DataStatistics;
 import org.apache.flink.optimizer.Optimizer;
 import org.apache.flink.optimizer.plan.OptimizedPlan;
 import org.apache.flink.optimizer.plantranslate.JobGraphGenerator;
-import org.apache.flink.runtime.akka.AkkaUtils;
-import org.apache.flink.runtime.akka.ListeningBehaviour;
-import org.apache.flink.runtime.executiongraph.ExecutionAttemptID;
-import org.apache.flink.runtime.instance.ActorGateway;
-import org.apache.flink.runtime.instance.AkkaActorGateway;
 import org.apache.flink.runtime.jobgraph.JobGraph;
-import org.apache.flink.runtime.messages.JobManagerMessages;
-import org.apache.flink.runtime.testingUtils.TestingCluster;
-import org.apache.flink.runtime.testingUtils.TestingJobManagerMessages;
-import org.apache.flink.runtime.testingUtils.TestingTaskManagerMessages;
 import org.apache.flink.runtime.testingUtils.TestingUtils;
 import org.apache.flink.streaming.api.datastream.DataStream;
 import 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.test.util.MiniClusterResource;
+import org.apache.flink.testutils.category.Flip6;
 import org.apache.flink.util.Collector;
 import org.apache.flink.util.TestLogger;
 
-import akka.actor.ActorRef;
-import akka.actor.ActorSystem;
-import akka.pattern.Patterns;
-import akka.testkit.JavaTestKit;
-import akka.util.Timeout;
-import org.junit.After;
 import org.junit.Before;
+import org.junit.ClassRule;
 import org.junit.Test;
+import org.junit.experimental.categories.Category;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;
-import java.util.concurrent.TimeUnit;
 
-import scala.concurrent.Await;
-import scala.concurrent.Future;
-import scala.concurrent.duration.FiniteDuration;
-
-import static org.junit.Assert.fail;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
 
 /**
- * Tests the availability of accumulator results during runtime. The test 
case tests a user-defined
- * accumulator and Flink's internal accumulators for two consecutive tasks.
- *
- * CHAINED[Source -> Map] -> Sink
- *
- * Checks are performed as the elements arrive at the operators. Checks 
consist of a message sent by
- * the task to the task manager which notifies the job manager and sends 
the current accumulators.
- * The task blocks until the test has been notified about the current 
accumulator values.
- *
- * A barrier between the operators ensures that that pipelining is 
disabled for the streaming test.
- * The batch job reads the records one at a time. The streaming code 
buffers the records beforehand;
- * that's why exact guarantees about the number of records read are very 
hard to make. Thus, why we
- * check for an upper bound of the elements read.
+ * Tests the availability of accumulator results during runtime.
  */
+@Category(Flip6.class)
 public class AccumulatorLiveITCase extends TestLogger {
 
private static final Logger LOG = 
LoggerFactory.getLogger(AccumulatorLiveITCase.class);
 
-   private static ActorSystem system;
-   private static ActorGateway 

[GitHub] flink pull request #5719: [FLINK-8959][tests] Port AccumulatorLiveITCase to ...

2018-03-19 Thread zentol
Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/5719#discussion_r175566292
  
--- Diff: 
flink-tests/src/test/java/org/apache/flink/test/accumulators/AccumulatorLiveITCase.java
 ---
@@ -18,292 +18,188 @@
 
 package org.apache.flink.test.accumulators;
 
-import org.apache.flink.api.common.JobExecutionResult;
-import org.apache.flink.api.common.JobID;
 import org.apache.flink.api.common.Plan;
-import org.apache.flink.api.common.accumulators.Accumulator;
 import org.apache.flink.api.common.accumulators.IntCounter;
 import org.apache.flink.api.common.functions.RichFlatMapFunction;
 import org.apache.flink.api.common.io.OutputFormat;
 import org.apache.flink.api.java.DataSet;
 import org.apache.flink.api.java.ExecutionEnvironment;
-import org.apache.flink.api.java.LocalEnvironment;
+import org.apache.flink.client.program.ClusterClient;
 import org.apache.flink.configuration.AkkaOptions;
-import org.apache.flink.configuration.ConfigConstants;
 import org.apache.flink.configuration.Configuration;
+import org.apache.flink.configuration.HeartbeatManagerOptions;
+import org.apache.flink.core.testutils.MultiShotLatch;
 import org.apache.flink.optimizer.DataStatistics;
 import org.apache.flink.optimizer.Optimizer;
 import org.apache.flink.optimizer.plan.OptimizedPlan;
 import org.apache.flink.optimizer.plantranslate.JobGraphGenerator;
-import org.apache.flink.runtime.akka.AkkaUtils;
-import org.apache.flink.runtime.akka.ListeningBehaviour;
-import org.apache.flink.runtime.executiongraph.ExecutionAttemptID;
-import org.apache.flink.runtime.instance.ActorGateway;
-import org.apache.flink.runtime.instance.AkkaActorGateway;
 import org.apache.flink.runtime.jobgraph.JobGraph;
-import org.apache.flink.runtime.messages.JobManagerMessages;
-import org.apache.flink.runtime.testingUtils.TestingCluster;
-import org.apache.flink.runtime.testingUtils.TestingJobManagerMessages;
-import org.apache.flink.runtime.testingUtils.TestingTaskManagerMessages;
 import org.apache.flink.runtime.testingUtils.TestingUtils;
 import org.apache.flink.streaming.api.datastream.DataStream;
 import 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.test.util.MiniClusterResource;
+import org.apache.flink.testutils.category.Flip6;
 import org.apache.flink.util.Collector;
 import org.apache.flink.util.TestLogger;
 
-import akka.actor.ActorRef;
-import akka.actor.ActorSystem;
-import akka.pattern.Patterns;
-import akka.testkit.JavaTestKit;
-import akka.util.Timeout;
-import org.junit.After;
 import org.junit.Before;
+import org.junit.ClassRule;
 import org.junit.Test;
+import org.junit.experimental.categories.Category;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;
-import java.util.concurrent.TimeUnit;
 
-import scala.concurrent.Await;
-import scala.concurrent.Future;
-import scala.concurrent.duration.FiniteDuration;
-
-import static org.junit.Assert.fail;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
 
 /**
- * Tests the availability of accumulator results during runtime. The test 
case tests a user-defined
- * accumulator and Flink's internal accumulators for two consecutive tasks.
- *
- * CHAINED[Source -> Map] -> Sink
- *
- * Checks are performed as the elements arrive at the operators. Checks 
consist of a message sent by
- * the task to the task manager which notifies the job manager and sends 
the current accumulators.
- * The task blocks until the test has been notified about the current 
accumulator values.
- *
- * A barrier between the operators ensures that that pipelining is 
disabled for the streaming test.
- * The batch job reads the records one at a time. The streaming code 
buffers the records beforehand;
- * that's why exact guarantees about the number of records read are very 
hard to make. Thus, why we
- * check for an upper bound of the elements read.
+ * Tests the availability of accumulator results during runtime.
  */
+@Category(Flip6.class)
 public class AccumulatorLiveITCase extends TestLogger {
 
private static final Logger LOG = 
LoggerFactory.getLogger(AccumulatorLiveITCase.class);
 
-   private static ActorSystem system;
-   private static ActorGateway jobManagerGateway;
-   private static ActorRef taskManager;
-
-   private static JobID jobID;
-   private static JobGraph jobGraph;
-
// name of user accumulator
private static final String ACCUMULATOR_NAME = 

[jira] [Commented] (FLINK-8937) Refactor flink end-to-end test skeleton

2018-03-19 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405343#comment-16405343
 ] 

Aljoscha Krettek commented on FLINK-8937:
-

That sounds good. Is that even possible, though, with registering methods and 
whatnot?

It might be a good idea to scrap the current scripts and instead move to 
something like Python.

> Refactor flink end-to-end test skeleton
> ---
>
> Key: FLINK-8937
> URL: https://issues.apache.org/jira/browse/FLINK-8937
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Reporter: Florian Schmidt
>Assignee: Florian Schmidt
>Priority: Major
>
> I was thinking about refactoring the end-to-end tests skeleton. The following 
> ideas came to mind:
> - Allow each individual test script to specify some setup and cleanup method 
> that gets executed before and after each test, especially also if that test 
> fails and the EXIT trap is triggered
> - Refactor adding a new test script into a function that provides uniform 
> error handling
> - Add helper methods for common tasks like waiting for a pattern to appear in 
> some log file (e.g. to ensure that a task is running)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8073) Test instability FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint()

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405298#comment-16405298
 ] 

ASF GitHub Bot commented on FLINK-8073:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/5718
  
I don't want to make a fuss over a single test, so for the purposes of this 
test it's fine to just remove the timeout.

In general, however, I think it would be pretty neat if we added such a 
`Timeout` to the `TestLogger` class. I see the problem of debugging tests that 
have timeouts, but removing timeouts altogether is the wrong conclusion imo. 
Instead, we could implement the `Timeout` such that it doesn't fail the test if 
a certain profile/property is set.

With this we keep the benefits of test timeouts (CI service independence, 
less special behavior locally vs CI, a fixed upper bound for test times which is 
particularly useful for new tests, a "guarantee" that the test terminates) while 
still allowing debugging. In fact we may end up improving the debugging 
situation by consolidating how timeouts are implemented instead of each test 
rolling its own solution that you can't disable.

The Travis watchdog _cannot_ be removed as it covers the entire Maven 
process, which from time to time locks up outside of tests. That said, the fact 
that we have to rely on the Travis watchdog _to ensure that tests terminate_ is 
a bad sign. Not to mention that it already forced us to introduce workarounds 
to make tests "compatible", like the Kafka tests and others that print output 
for the sole purpose of not triggering it.
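A minimal sketch of that idea: a timeout rule on the shared test base class that a system property 
can disable for debugging (the property name is hypothetical):

{code:java}
import java.util.concurrent.TimeUnit;
import org.junit.Rule;
import org.junit.rules.TestRule;
import org.junit.rules.Timeout;

public class TestLogger {
    // Hedged sketch, not the actual TestLogger: with
    // -Dflink.tests.disable-timeouts (hypothetical) set, fall back to a no-op
    // rule so a debugger can pause the test; otherwise enforce an upper bound.
    @Rule
    public final TestRule timeout =
        Boolean.getBoolean("flink.tests.disable-timeouts")
            ? (statement, description) -> statement // no-op: run the test as-is
            : Timeout.builder().withTimeout(10, TimeUnit.MINUTES).build();
}
{code}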



> Test instability 
> FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint()
> -
>
> Key: FLINK-8073
> URL: https://issues.apache.org/jira/browse/FLINK-8073
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Kostas Kloudas
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> Travis log: https://travis-ci.org/kl0u/flink/jobs/301985988



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Updated] (FLINK-8886) Job isolation via scheduling in shared cluster

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8886:

Component/s: Local Runtime
 Distributed Coordination

> Job isolation via scheduling in shared cluster
> --
>
> Key: FLINK-8886
> URL: https://issues.apache.org/jira/browse/FLINK-8886
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Coordination, Local Runtime, Scheduler
>Affects Versions: 1.5.0
>Reporter: Elias Levy
>Priority: Major
>
> Flink's TaskManager executes tasks from different jobs within the same JVM as 
> threads.  We prefer to isolate different jobs in their own JVMs.  Thus, we must 
> use different TMs for different jobs.  As the scheduler currently allocates 
> task slots within a TM to tasks from different jobs, that means we must stand 
> up one cluster per job.  This is wasteful, as it requires at least 
> two JobManagers per cluster for high availability, and the JMs have low 
> utilization.
> Additionally, different jobs may require different resources.  Some jobs are 
> compute heavy.  Some are IO heavy (lots of state in RocksDB).  At the moment 
> the scheduler treats all TMs as equivalent, except possibly in their number 
> of available task slots.  Thus, one is required to stand up multiple clusters 
> if there is a need for different types of TMs.
>  
> It would be useful if one could specify requirements on jobs, such that they 
> are only scheduled on a subset of TMs.  Properly configured, that would 
> permit isolation of jobs in a shared cluster and scheduling of jobs with 
> specific resource needs.
>  
> One possible implementation is to specify a set of tags in the TM config file 
> which the TMs use when registering with the JM, and another set of tags 
> configured within the job or supplied when submitting the job.  The scheduler 
> could then match the tags of the job with the tags of the TMs.  In a 
> restrictive mode the scheduler would assign a job's task to a TM only if all 
> tags match.  In a relaxed mode the scheduler could assign a job's task to a TM 
> if there is a partial match, while giving preference to a more accurate match.
>  
>  
>  
>  
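A minimal sketch of the restrictive and relaxed matching described in the last paragraph (all names 
are hypothetical, not Flink API):

{code:java}
import java.util.HashSet;
import java.util.Set;

// Hedged sketch of tag matching between a job's requested tags and a TM's tags.
final class TagMatcher {

    /** Restrictive mode: every tag requested by the job must be present on the TM. */
    static boolean matchesStrict(Set<String> jobTags, Set<String> tmTags) {
        return tmTags.containsAll(jobTags);
    }

    /** Relaxed mode: score by overlap so the scheduler can prefer closer matches. */
    static int matchScore(Set<String> jobTags, Set<String> tmTags) {
        Set<String> overlap = new HashSet<>(jobTags);
        overlap.retainAll(tmTags);
        return overlap.size();
    }
}
{code}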



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8939) Provide better support for saving streaming data to s3

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-8939:

Component/s: Streaming Connectors

> Provide better support for saving streaming data to s3
> --
>
> Key: FLINK-8939
> URL: https://issues.apache.org/jira/browse/FLINK-8939
> Project: Flink
>  Issue Type: Improvement
>  Components: DataStream API, Streaming Connectors
>Reporter: chris snow
>Priority: Major
> Attachments: 18652BC0-DD67-42C3-9A33-12F7BC10F9F3.png
>
>
> Flink seems to struggle saving data to s3 due to the lack of a truncate 
> method, and in my test this resulted in lots of files with a .valid-length 
> suffix
> I’m using a bucketing sink:
> {code:java}
> return new BucketingSink<Tuple2<String, Object>>(path)
> .setWriter(writer)
> .setBucketer(new DateTimeBucketer<Tuple2<String, Object>>(formatString));{code}
>  
> !18652BC0-DD67-42C3-9A33-12F7BC10F9F3.png!
> See also, the discussion in the comments here: 
> https://issues.apache.org/jira/browse/FLINK-8543?filter=-2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9020) Move test projects of end-to-end tests in separate modules

2018-03-19 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405189#comment-16405189
 ] 

Aljoscha Krettek commented on FLINK-9020:
-

Only seeing this now but it makes a lot of sense! 

> Move test projects of end-to-end tests in separate modules
> --
>
> Key: FLINK-9020
> URL: https://issues.apache.org/jira/browse/FLINK-9020
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Florian Schmidt
>Assignee: Florian Schmidt
>Priority: Major
>
> I would like to propose moving each test case in the end-to-end tests into 
> its own module. The reason is that currently we are building all jars for the 
> tests from one pom.xml, which makes it hard to have specific tests for 
> certain build types (e.g. examples derived from the flink quickstart 
> archetype).
> For the current state this would mean
> - change packaging from flink-end-to-end-tests from jar to pom
> - refactor the classloader example to be in its own module



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9027) Web UI does not cleanup temporary files

2018-03-19 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-9027:
--

 Summary: Web UI does not cleanup temporary files
 Key: FLINK-9027
 URL: https://issues.apache.org/jira/browse/FLINK-9027
 Project: Flink
  Issue Type: Bug
  Components: Webfrontend
Affects Versions: 1.5.0, 1.6.0
Reporter: Nico Kruber
 Fix For: 1.5.0, 1.6.0


The web UI creates two directories in {{java.io.tmp}}, namely 
{{flink-web-}} and {{flink-web-upload-}} and both are not cleaned 
up (running {{start-cluster.sh}} and {{stop-cluster.sh}}).
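A minimal sketch of one possible fix, assuming the web monitor keeps references to the directories 
it created ({{webRootDir}} and {{webUploadDir}} are hypothetical fields):

{code:java}
// Hedged sketch: best-effort removal of the temporary web directories on JVM
// shutdown, so start/stop cycles do not leave flink-web-* directories behind.
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    try {
        org.apache.flink.util.FileUtils.deleteDirectory(webRootDir);
        org.apache.flink.util.FileUtils.deleteDirectory(webUploadDir);
    } catch (java.io.IOException e) {
        // nothing sensible to do during shutdown
    }
}));
{code}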



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-9025) Problem with convert String to Row

2018-03-19 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-9025.

Resolution: Not A Problem

The {{Row}} data type was not designed for flexible schemas.

Closing as "Not a problem".

> Problem with convert String to Row
> --
>
> Key: FLINK-9025
> URL: https://issues.apache.org/jira/browse/FLINK-9025
> Project: Flink
>  Issue Type: Bug
>  Components: Java API
>Reporter: ChenDonghe
>Priority: Major
>
> When we convert a JSON string into a Row and set the type of each field in 
> the row, code below:
> {code:java}
> // code placeholder
> DataStream<String> stream = env.addSource(consumer.getInstance(sourceTopic, 
> new SimpleStringSchema()).setStartFromLatest());
> DataStream<Row> dataStreamRow = stream.map(new 
> ConvertDataStream()).returns(typeinfo);
> {code}
> This conversion calls the copy function of the RowSerializer class, code 
> below:
>  
> {code:java}
> // public final class RowSerializer extends TypeSerializer<Row>
> @Override
> public Row copy(Row from) {
>int len = fieldSerializers.length;
>if (from.getArity() != len) {
>   throw new RuntimeException("Row arity of from does not match 
> serializers.");
>}
>Row result = new Row(len);
>for (int i = 0; i < len; i++) {
>   Object fromField = from.getField(i);
>   if (fromField != null) {
>  Object copy = fieldSerializers[i].copy(fromField);
>  result.setField(i, copy);
>   } else {
>  result.setField(i, null);
>   }
>}
>return result;
> }
> {code}
> The JSON string messages from Kafka are converted to the Row type. In this 
> process, the RowSerializer copies the from-row to a new result-row, but the 
> result-row fields are of Object type. For example, field0 of the first 
> message from Kafka is of Integer type and field0 of the second message is of 
> Long type; if we declare field0 as Long, we hope Integer can be made 
> compatible, but we got an exception: cannot convert Integer to Long. So we 
> hope the RowSerializer copy function can be more flexible; it could act like 
> a table, so that we can insert an Integer value into a Long-typed field.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9025) Problem with convert String to Row

2018-03-19 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405162#comment-16405162
 ] 

Fabian Hueske commented on FLINK-9025:
--

Rows have a schema with fixed data types like most types in Flink.
This behavior was deliberately chosen to improve performance. Since the {{Row}} 
type is used internally by Flink's SQL feature and Table API, we cannot make 
the serialization and copy logic heavier without performance implications.

If you'd like to use the {{Row}} data type, you should ensure that all input 
records follow the same schema.
If that is not possible, you should choose a different data type.

Closing this issue as "Not a problem".
Feel free to reopen if you'd like to discuss the issue further or reach out to 
the user mailing list.
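A minimal sketch of that advice for the reporter's case: normalize numeric fields to the declared 
type inside the map function before building the {{Row}} (the helper below is hypothetical):

{code:java}
// Hedged sketch: coerce any numeric JSON value to Long so that every produced
// Row matches the schema declared via returns(typeinfo).
private static Object asLong(Object jsonValue) {
    if (jsonValue instanceof Number) {
        return ((Number) jsonValue).longValue(); // Integer, Long, ... all become Long
    }
    return jsonValue;
}

// Usage inside the map function, before emitting the row:
// row.setField(0, asLong(parsedField0));
{code}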

> Problem with convert String to Row
> --
>
> Key: FLINK-9025
> URL: https://issues.apache.org/jira/browse/FLINK-9025
> Project: Flink
>  Issue Type: Bug
>  Components: Java API
>Reporter: ChenDonghe
>Priority: Major
>
> When we convert a JSON string into a Row and set the type of each field in 
> the row, code below:
> {code:java}
> // code placeholder
> DataStream<String> stream = env.addSource(consumer.getInstance(sourceTopic, 
> new SimpleStringSchema()).setStartFromLatest());
> DataStream<Row> dataStreamRow = stream.map(new 
> ConvertDataStream()).returns(typeinfo);
> {code}
> This conversion calls the copy function of the RowSerializer class, code 
> below:
>  
> {code:java}
> // public final class RowSerializer extends TypeSerializer<Row>
> @Override
> public Row copy(Row from) {
>int len = fieldSerializers.length;
>if (from.getArity() != len) {
>   throw new RuntimeException("Row arity of from does not match 
> serializers.");
>}
>Row result = new Row(len);
>for (int i = 0; i < len; i++) {
>   Object fromField = from.getField(i);
>   if (fromField != null) {
>  Object copy = fieldSerializers[i].copy(fromField);
>  result.setField(i, copy);
>   } else {
>  result.setField(i, null);
>   }
>}
>return result;
> }
> {code}
> The JSON string messages from Kafka are converted to the Row type. In this 
> process, the RowSerializer copies the from-row to a new result-row, but the 
> result-row fields are of Object type. For example, field0 of the first 
> message from Kafka is of Integer type and field0 of the second message is of 
> Long type; if we declare field0 as Long, we hope Integer can be made 
> compatible, but we got an exception: cannot convert Integer to Long. So we 
> hope the RowSerializer copy function can be more flexible; it could act like 
> a table, so that we can insert an Integer value into a Long-typed field.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9026) Unregister finished tasks from TaskManagerMetricGroup and close it

2018-03-19 Thread Sihua Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405130#comment-16405130
 ] 

Sihua Zhou commented on FLINK-9026:
---

Hi [~till.rohrmann], have you already started working on this? Otherwise, I'd 
like to take this ticket.

> Unregister finished tasks from TaskManagerMetricGroup and close it
> --
>
> Key: FLINK-9026
> URL: https://issues.apache.org/jira/browse/FLINK-9026
> Project: Flink
>  Issue Type: Bug
>  Components: Metrics
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Till Rohrmann
>Priority: Major
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> We should unregister {{Tasks}} from the {{TaskManagerMetricGroup}} when they 
> have reached a final state. Moreover, we should close the 
> {{TaskManagerMetricGroup}} either in the {{TaskExecutor#postStop}} method or 
> let the caller do this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9026) Unregister finished tasks from TaskManagerMetricGroup and close it

2018-03-19 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-9026:


 Summary: Unregister finished tasks from TaskManagerMetricGroup and 
close it
 Key: FLINK-9026
 URL: https://issues.apache.org/jira/browse/FLINK-9026
 Project: Flink
  Issue Type: Bug
  Components: Metrics
Affects Versions: 1.5.0, 1.6.0
Reporter: Till Rohrmann
 Fix For: 1.5.0


We should unregister {{Tasks}} from the {{TaskManagerMetricGroup}} when they 
have reached a final state. Moreover, we should close the 
{{TaskManagerMetricGroup}} either in the {{TaskExecutor#postStop}} method or 
let the caller do this.
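A minimal sketch of the second half of this, assuming {{taskManagerMetricGroup}} is the 
corresponding field of the {{TaskExecutor}} (closing a metric group unregisters its metrics from 
the reporters):

{code:java}
// Hedged sketch, not the actual patch.
@Override
public CompletableFuture<Void> postStop() {
    // Close the TaskManagerMetricGroup so its metrics (and those of any
    // remaining task groups) are unregistered from the reporters.
    taskManagerMetricGroup.close();
    return super.postStop();
}
{code}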



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8946) TaskManager stop sending metrics after JobManager failover

2018-03-19 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405101#comment-16405101
 ] 

Till Rohrmann commented on FLINK-8946:
--

Is this also a problem for Flip-6 [~kien_truong]?

> TaskManager stop sending metrics after JobManager failover
> --
>
> Key: FLINK-8946
> URL: https://issues.apache.org/jira/browse/FLINK-8946
> Project: Flink
>  Issue Type: Bug
>  Components: Metrics, TaskManager
>Affects Versions: 1.4.2
>Reporter: Truong Duc Kien
>Assignee: vinoyang
>Priority: Critical
> Fix For: 1.5.0
>
>
> Running in YARN standalone mode, when the JobManager performs a failover, 
> all TaskManagers that are inherited from the previous JobManager will not send 
> metrics to the new JobManager and other registered metric reporters.
>  
> A cursory glance reveals that these lines of code might be the cause:
> [https://github.com/apache/flink/blob/a3478fdfa0f792104123fefbd9bdf01f5029de51/flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala#L1082-L1086]
> Perhaps the TaskManager closes its metrics group when disassociating from the 
> JobManager, but does not create a new one on fail-over association?
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8976) End-to-end test: Resume with different parallelism

2018-03-19 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405096#comment-16405096
 ] 

Till Rohrmann commented on FLINK-8976:
--

My oversight. It should also work with incremental checkpoints. At least it 
worked for me when trying it out. I will correct the description.

> End-to-end test: Resume with different parallelism
> --
>
> Key: FLINK-8976
> URL: https://issues.apache.org/jira/browse/FLINK-8976
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Similar to FLINK-8975, we should have an end-to-end test which resumes a job 
> with a different parallelism after taking 
> a) a savepoint
> b) from the last retained checkpoint (this won't work with RocksDB 
> incremental checkpoints) 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8976) End-to-end test: Resume with different parallelism

2018-03-19 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-8976:
-
Description: 
Similar to FLINK-8975, we should have an end-to-end test which resumes a job 
with a different parallelism after taking 
a) a savepoint
b) from the last retained checkpoint

  was:
Similar to FLINK-8975, we should have an end-to-end test which resumes a job 
with a different parallelism after taking 
a) a savepoint
b) from the last retained checkpoint (this won't work with RocksDB incremental 
checkpoints) 


> End-to-end test: Resume with different parallelism
> --
>
> Key: FLINK-8976
> URL: https://issues.apache.org/jira/browse/FLINK-8976
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Till Rohrmann
>Assignee: Tzu-Li (Gordon) Tai
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Similar to FLINK-8975, we should have an end-to-end test which resumes a job 
> with a different parallelism after taking 
> a) a savepoint
> b) from the last retained checkpoint



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9025) Problem with convert String to Row

2018-03-19 Thread ChenDonghe (JIRA)
ChenDonghe created FLINK-9025:
-

 Summary: Problem with convert String to Row
 Key: FLINK-9025
 URL: https://issues.apache.org/jira/browse/FLINK-9025
 Project: Flink
  Issue Type: Bug
  Components: Java API
Reporter: ChenDonghe


When we convert a JSON string into a Row and set the type of each field in the 
row, code below:
{code:java}
// code placeholder
DataStream<String> stream = env.addSource(consumer.getInstance(sourceTopic, new 
SimpleStringSchema()).setStartFromLatest());

DataStream<Row> dataStreamRow = stream.map(new 
ConvertDataStream()).returns(typeinfo);
{code}
This conversion calls the copy function of the RowSerializer class, code below:

 
{code:java}
// public final class RowSerializer extends TypeSerializer<Row>

@Override
public Row copy(Row from) {
   int len = fieldSerializers.length;

   if (from.getArity() != len) {
  throw new RuntimeException("Row arity of from does not match 
serializers.");
   }

   Row result = new Row(len);
   for (int i = 0; i < len; i++) {
  Object fromField = from.getField(i);
  if (fromField != null) {
 Object copy = fieldSerializers[i].copy(fromField);
 result.setField(i, copy);
  } else {
 result.setField(i, null);
  }
   }
   return result;
}
{code}
The JSON string messages from Kafka are converted to the Row type. In this 
process, the RowSerializer copies the from-row to a new result-row, but the 
result-row fields are of Object type. For example, field0 of the first message 
from Kafka is of Integer type and field0 of the second message is of Long type; 
if we declare field0 as Long, we hope Integer can be made compatible, but we 
got an exception: cannot convert Integer to Long. So we hope the RowSerializer 
copy function can be more flexible; it could act like a table, so that we can 
insert an Integer value into a Long-typed field.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-9021) org.apache.flink.table.codegen.CodeGenException: Unsupported call: TUMBLE

2018-03-19 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-9021.

Resolution: Not A Problem

You can implement this functionality with a user-defined aggregate function 
(see docs).

Closing this issue.

Feel free to reopen if you think it needs further discussion.
Cheers, Fabian
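A minimal sketch of such a user-defined aggregate function for the {{count(distinct userId)}} 
part, assuming the Table API's {{AggregateFunction}} base class; the naive set-based accumulator 
only suits moderate key cardinality:

{code:java}
import java.util.HashSet;
import java.util.Set;
import org.apache.flink.table.functions.AggregateFunction;

// Hedged sketch of a distinct-count aggregate.
public class CountDistinct extends AggregateFunction<Long, CountDistinct.DistinctAcc> {

    public static class DistinctAcc {
        public Set<Integer> seen = new HashSet<>();
    }

    @Override
    public DistinctAcc createAccumulator() {
        return new DistinctAcc();
    }

    // Called by the runtime for every input row of the group/window.
    public void accumulate(DistinctAcc acc, Integer userId) {
        if (userId != null) {
            acc.seen.add(userId);
        }
    }

    @Override
    public Long getValue(DistinctAcc acc) {
        return (long) acc.seen.size();
    }
}
{code}

It could be registered via {{tableEnv.registerFunction("countDistinct", new CountDistinct())}} and 
then used as {{countDistinct(userId)}} in the windowed query.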

> org.apache.flink.table.codegen.CodeGenException: Unsupported call: TUMBLE 
> --
>
> Key: FLINK-9021
> URL: https://issues.apache.org/jira/browse/FLINK-9021
> Project: Flink
>  Issue Type: Bug
>  Components: Table API & SQL
>Affects Versions: 1.4.2
> Environment: java 8
> flink 1.4.2
> scala 2.11
>Reporter: yizhou
>Priority: Major
>
> I have a stream like this: {{<_time(timestamp), uri(string), userId(int)>}}. 
> The {{_time}} attribute is rowtime and I register it as a table:
> {{tableEnv.registerDataStream("userVisitPage", stream, "_time.rowtime, 
> uri,userId");}}
> Then I query the table:
> {code:java}
> final String sql = "SELECT tumble_start(_time, interval '10' second) as 
> timestart, " 
>+ " count(distinct userId) as uv, "
>+ " uri as uri, "
>+ " count(1) as pv " 
>+  "FROM userVisitPage " 
>+ "GROUP BY tumble(_time, interval '10' second), uri";
> final Table table = tableEnv.sqlQuery(sql);
> tableEnv.toRetractStream(table, Row.class);
> {code}
> but the following exception occurs:
>  
>  
> {code:java}
> 2018-03-19 19:30:53,881 ERROR 
> [com.qunhe.logcomplex.oceanus.util.TaskSubmitter] - main - submit task failed 
> org.apache.flink.table.codegen.CodeGenException: Unsupported call: TUMBLE If 
> you think this function should be supported, you can create an issue and 
> start a discussion for it. at 
> org.apache.flink.table.codegen.CodeGenerator$$anonfun$visitCall$3.apply(CodeGenerator.scala:1006)
>  at 
> org.apache.flink.table.codegen.CodeGenerator$$anonfun$visitCall$3.apply(CodeGenerator.scala:1006)
>  at scala.Option.getOrElse(Option.scala:121) at 
> org.apache.flink.table.codegen.CodeGenerator.visitCall(CodeGenerator.scala:1006)
>  at 
> org.apache.flink.table.codegen.CodeGenerator.visitCall(CodeGenerator.scala:67)
>  at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) at 
> org.apache.flink.table.codegen.CodeGenerator.generateExpression(CodeGenerator.scala:234)
>  at 
> org.apache.flink.table.codegen.CodeGenerator$$anonfun$7.apply(CodeGenerator.scala:321)
>  at 
> org.apache.flink.table.codegen.CodeGenerator$$anonfun$7.apply(CodeGenerator.scala:321)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
> org.apache.flink.table.codegen.CodeGenerator.generateResultExpression(CodeGenerator.scala:321)
>  at 
> org.apache.flink.table.plan.nodes.CommonCalc$class.generateFunction(CommonCalc.scala:44)
>  at 
> org.apache.flink.table.plan.nodes.datastream.DataStreamCalc.generateFunction(DataStreamCalc.scala:43)
>  at 
> org.apache.flink.table.plan.nodes.datastream.DataStreamCalc.translateToPlan(DataStreamCalc.scala:116)
>  at 
> org.apache.flink.table.plan.nodes.datastream.DataStreamGroupAggregate.translateToPlan(DataStreamGroupAggregate.scala:113)
>  at 
> org.apache.flink.table.plan.nodes.datastream.DataStreamGroupAggregate.translateToPlan(DataStreamGroupAggregate.scala:113)
>  at 
> org.apache.flink.table.plan.nodes.datastream.DataStreamCalc.translateToPlan(DataStreamCalc.scala:97)
>  at 
> org.apache.flink.table.api.StreamTableEnvironment.translateToCRow(StreamTableEnvironment.scala:837)
>  at 
> org.apache.flink.table.api.StreamTableEnvironment.translate(StreamTableEnvironment.scala:764)
>  at 
> org.apache.flink.table.api.StreamTableEnvironment.translate(StreamTableEnvironment.scala:734)
>  at 
> org.apache.flink.table.api.java.StreamTableEnvironment.toRetractStream(StreamTableEnvironment.scala:414)
>  at 
> org.apache.flink.table.api.java.StreamTableEnvironment.toRetractStream(StreamTableEnvironment.scala:357
> {code}
>  
> How can I implement this query?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9019) Unclosed closeableRegistry in StreamTaskStateInitializerImpl#rawOperatorStateInputs

2018-03-19 Thread Sihua Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405049#comment-16405049
 ] 

Sihua Zhou commented on FLINK-9019:
---

I think we can't use try-with-resources here, because {{closeableRegistry}} will 
be wrapped by an anonymous class as the return value of 
`rawOperatorStateInputs()`. But I agree that closing an empty 
{{closeableRegistry}} here would make the code safer.
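A minimal sketch of that suggestion, abbreviated to the shape quoted in the issue: close the 
still-empty registry on the fallback path.

{code:java}
final CloseableRegistry closeableRegistry = new CloseableRegistry();
// ...
if (rawOperatorState != null) {
    // ... return an anonymous CloseableIterable that takes ownership of
    // closeableRegistry and closes it together with the opened streams
}
// Nothing was registered on this path, so the empty registry can be closed
// right away before the fallback return (hedged sketch, not the actual patch).
try {
    closeableRegistry.close();
} catch (IOException ignored) {
    // best effort
}
return CloseableIterable.empty();
{code}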

> Unclosed closeableRegistry in 
> StreamTaskStateInitializerImpl#rawOperatorStateInputs
> ---
>
> Key: FLINK-9019
> URL: https://issues.apache.org/jira/browse/FLINK-9019
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Minor
>
> {code}
>  final CloseableRegistry closeableRegistry = new CloseableRegistry();
> ...
>  if (rawOperatorState != null) {
> ...
>   }
> }
> return CloseableIterable.empty();
> {code}
> If rawOperatorState is null, closeableRegistry would be left unclosed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-8906) Flip6DefaultCLI is not tested in org.apache.flink.client.cli tests

2018-03-19 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann resolved FLINK-8906.
--
Resolution: Fixed

Fixed via
master: 463b922abebc77420d0ec990ba18ad83ea33c0e4
1.5.0: 0ba8ed6ba8b2f17de79e05207dc14226504b8be9

> Flip6DefaultCLI is not tested in org.apache.flink.client.cli tests
> --
>
> Key: FLINK-8906
> URL: https://issues.apache.org/jira/browse/FLINK-8906
> Project: Flink
>  Issue Type: Bug
>  Components: Client, Tests
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> Various tests in {{org.apache.flink.client.cli}} only test with the 
> {{DefaultCLI}} but should also test {{Flip6DefaultCLI}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (FLINK-8905) RestClusterClient#getMaxSlots returns 0

2018-03-19 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann resolved FLINK-8905.
--
Resolution: Fixed

Fixed via
master: 52475b3478e78f12d5e2a9ecb10e2bf3d5133687
1.5.0: 76b0eb670921065ea2348cebbb62779d7fba6351

> RestClusterClient#getMaxSlots returns 0
> ---
>
> Key: FLINK-8905
> URL: https://issues.apache.org/jira/browse/FLINK-8905
> Project: Flink
>  Issue Type: Bug
>  Components: Client, Cluster Management, REST
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> {{RestClusterClient#getMaxSlots()}} returns 0 although it should return 
> {{-1}} if the value is unknown.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5671: [FLINK-8906][flip6][tests] also test Flip6DefaultC...

2018-03-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5671


---


[jira] [Commented] (FLINK-8906) Flip6DefaultCLI is not tested in org.apache.flink.client.cli tests

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405014#comment-16405014
 ] 

ASF GitHub Bot commented on FLINK-8906:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/5671


> Flip6DefaultCLI is not tested in org.apache.flink.client.cli tests
> --
>
> Key: FLINK-8906
> URL: https://issues.apache.org/jira/browse/FLINK-8906
> Project: Flink
>  Issue Type: Bug
>  Components: Client, Tests
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Nico Kruber
>Assignee: Nico Kruber
>Priority: Blocker
>  Labels: flip-6
> Fix For: 1.5.0
>
>
> Various tests in {{org.apache.flink.client.cli}} only test with the 
> {{DefaultCLI}} but should also test {{Flip6DefaultCLI}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8988) End-to-end test: Cassandra connector

2018-03-19 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405004#comment-16405004
 ] 

Till Rohrmann commented on FLINK-8988:
--

{{CassandraConnectorITCase}} is indeed already quite good. If I'm not mistaken, 
these tests don't cover the failure case yet. Thus, an e2e test which fails 
before succeeding and then checks for the correct output could still be 
helpful. What do you think, [~suez1224]?

> End-to-end test: Cassandra connector
> 
>
> Key: FLINK-8988
> URL: https://issues.apache.org/jira/browse/FLINK-8988
> Project: Flink
>  Issue Type: Sub-task
>  Components: Cassandra Connector, Tests
>Reporter: Till Rohrmann
>Priority: Major
>
> In order to test the integration with Cassandra, we should add an end-to-end 
> test which tests the Cassandra connector. In order to do this, we need to add 
> a script/function which sets up a {{Cassandra}} cluster. Then we can run a 
> simple job writing information to {{Cassandra}} using the 
> {{CassandraRowWriteAheadSink}} and the {{CassandraTupleWriteAheadSink}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9019) Unclosed closeableRegistry in StreamTaskStateInitializerImpl#rawOperatorStateInputs

2018-03-19 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405003#comment-16405003
 ] 

Ted Yu commented on FLINK-9019:
---

Since CloseableRegistry implements Closeable, try-with-resources should be used 
for the instance.

> Unclosed closeableRegistry in 
> StreamTaskStateInitializerImpl#rawOperatorStateInputs
> ---
>
> Key: FLINK-9019
> URL: https://issues.apache.org/jira/browse/FLINK-9019
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: vinoyang
>Priority: Minor
>
> {code}
>  final CloseableRegistry closeableRegistry = new CloseableRegistry();
> ...
>  if (rawOperatorState != null) {
> ...
>   }
> }
> return CloseableIterable.empty();
> {code}
> If rawOperatorState is null, closeableRegistry would be left unclosed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8562) Fix YARNSessionFIFOSecuredITCase

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405002#comment-16405002
 ] 

ASF GitHub Bot commented on FLINK-8562:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/5416
  
The problem, @suez1224, is that in Flip-6 mode we won't allocate 2 containers 
if no job is submitted. Thus, we would have to wait for a different signal in 
Flip-6 mode.


> Fix YARNSessionFIFOSecuredITCase
> 
>
> Key: FLINK-8562
> URL: https://issues.apache.org/jira/browse/FLINK-8562
> Project: Flink
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Shuyi Chen
>Assignee: Shuyi Chen
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently, YARNSessionFIFOSecuredITCase will not fail even if the current 
> Flink YARN Kerberos integration is failing in production. Please see 
> FLINK-8275.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5416: [FLINK-8562] [Security] Fix YARNSessionFIFOSecuredITCase

2018-03-19 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/5416
  
The problem, @suez1224, is that in Flip-6 mode we won't allocate 2 containers 
if no job is submitted. Thus, we would have to wait for a different signal in 
Flip-6 mode.


---


[jira] [Commented] (FLINK-8562) Fix YARNSessionFIFOSecuredITCase

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405000#comment-16405000
 ] 

ASF GitHub Bot commented on FLINK-8562:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/5416#discussion_r175483617
  
--- Diff: 
flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java ---
@@ -206,7 +209,7 @@ public void checkClusterEmpty() throws IOException, 
YarnException {
}
}
 
-   flinkConfiguration = new 
org.apache.flink.configuration.Configuration();
+   flinkConfiguration = GlobalConfiguration.loadConfiguration();
--- End diff --

Can't we read the vanilla configuration in `YarnTestBase#start:474`, then 
configure the keytab and the principal, and write this `Configuration` out via 
`BootstrapTool#writeConfiguration`? We could then simply set 
`flinkConfiguration` to this `Configuration`. What do you think?


> Fix YARNSessionFIFOSecuredITCase
> 
>
> Key: FLINK-8562
> URL: https://issues.apache.org/jira/browse/FLINK-8562
> Project: Flink
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Shuyi Chen
>Assignee: Shuyi Chen
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently, YARNSessionFIFOSecuredITCase will not fail even if the current 
> Flink YARN Kerberos integration is failing in production. Please see 
> FLINK-8275.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5416: [FLINK-8562] [Security] Fix YARNSessionFIFOSecured...

2018-03-19 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/5416#discussion_r175483617
  
--- Diff: 
flink-yarn-tests/src/test/java/org/apache/flink/yarn/YarnTestBase.java ---
@@ -206,7 +209,7 @@ public void checkClusterEmpty() throws IOException, 
YarnException {
}
}
 
-   flinkConfiguration = new 
org.apache.flink.configuration.Configuration();
+   flinkConfiguration = GlobalConfiguration.loadConfiguration();
--- End diff --

Can't we read the vanilla configuration in `YarnTestBase#start:474`, then 
configure the keytab and the principal, and write this `Configuration` out via 
`BootstrapTool#writeConfiguration`? We could then simply set 
`flinkConfiguration` to this `Configuration`. What do you think?


---


[jira] [Commented] (FLINK-8562) Fix YARNSessionFIFOSecuredITCase

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404988#comment-16404988
 ] 

ASF GitHub Bot commented on FLINK-8562:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/5416#discussion_r175480428
  
--- Diff: 
flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOSecuredITCase.java
 ---
@@ -97,6 +98,15 @@ public static void teardownSecureCluster() throws 
Exception {
SecureTestEnvironment.cleanup();
}
 
+   @Override
+   public void testDetachedMode() throws InterruptedException, IOException 
{
+   super.testDetachedMode();
+   ensureStringInNamedLogFiles(new String[]{"Login successful for 
user", "using keytab file"},
--- End diff --

Thanks for the pointer :-)


> Fix YARNSessionFIFOSecuredITCase
> 
>
> Key: FLINK-8562
> URL: https://issues.apache.org/jira/browse/FLINK-8562
> Project: Flink
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Shuyi Chen
>Assignee: Shuyi Chen
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently, YARNSessionFIFOSecuredITCase will not fail even if the current 
> Flink YARN Kerberos integration is failing in production. Please see 
> FLINK-8275.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5416: [FLINK-8562] [Security] Fix YARNSessionFIFOSecured...

2018-03-19 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/5416#discussion_r175480428
  
--- Diff: 
flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOSecuredITCase.java
 ---
@@ -97,6 +98,15 @@ public static void teardownSecureCluster() throws 
Exception {
SecureTestEnvironment.cleanup();
}
 
+   @Override
+   public void testDetachedMode() throws InterruptedException, IOException 
{
+   super.testDetachedMode();
+   ensureStringInNamedLogFiles(new String[]{"Login successful for 
user", "using keytab file"},
--- End diff --

Thanks for the pointer :-)


---


[jira] [Commented] (FLINK-8562) Fix YARNSessionFIFOSecuredITCase

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404987#comment-16404987
 ] 

ASF GitHub Bot commented on FLINK-8562:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/5416#discussion_r175480192
  
--- Diff: 
flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOITCase.java 
---
@@ -105,7 +105,7 @@ public void testDetachedMode() throws 
InterruptedException, IOException {
}
 
//additional sleep for the JM/TM to start and establish 
connection
-   sleep(2000);
+   sleep(3000);
--- End diff --

Hmm, isn't this timeout increase a little bit arbitrary? Can we implement 
some better waiting strategy, like waiting for the respective log statement, 
for example?
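
For instance, a hedged sketch of such a waiting strategy (the log directory, 
expected line, and timeout are hypothetical); it polls the logs in the spirit 
of the ensureStringInNamedLogFiles check instead of a fixed sleep:

{code}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class LogWaitSketch {

	/** Polls all files under logDir until one of them contains the expected line. */
	static void waitForLogStatement(Path logDir, String expected, long timeoutMillis)
			throws IOException, InterruptedException {
		final long deadline = System.currentTimeMillis() + timeoutMillis;
		while (System.currentTimeMillis() < deadline) {
			try (Stream<Path> files = Files.walk(logDir)) {
				if (files.filter(Files::isRegularFile).anyMatch(f -> fileContains(f, expected))) {
					return;
				}
			}
			Thread.sleep(100L); // short poll interval, bounded by the overall deadline
		}
		throw new AssertionError("'" + expected + "' did not appear in " + logDir);
	}

	private static boolean fileContains(Path file, String expected) {
		try (Stream<String> lines = Files.lines(file)) {
			return lines.anyMatch(line -> line.contains(expected));
		} catch (IOException | UncheckedIOException e) {
			return false; // the file may still be being written; retry next round
		}
	}
}
{code}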


> Fix YARNSessionFIFOSecuredITCase
> 
>
> Key: FLINK-8562
> URL: https://issues.apache.org/jira/browse/FLINK-8562
> Project: Flink
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.5.0, 1.6.0
>Reporter: Shuyi Chen
>Assignee: Shuyi Chen
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently, YARNSessionFIFOSecuredITCase will not fail even if the current 
> Flink YARN Kerberos integration is failing in production. Please see 
> FLINK-8275.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5416: [FLINK-8562] [Security] Fix YARNSessionFIFOSecured...

2018-03-19 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/5416#discussion_r175480192
  
--- Diff: 
flink-yarn-tests/src/test/java/org/apache/flink/yarn/YARNSessionFIFOITCase.java 
---
@@ -105,7 +105,7 @@ public void testDetachedMode() throws 
InterruptedException, IOException {
}
 
//additional sleep for the JM/TM to start and establish 
connection
-   sleep(2000);
+   sleep(3000);
--- End diff --

Hmm, isn't this timeout increase a little bit arbitrary? Can we implement 
some better waiting strategy, like waiting for the respective log statement, 
for example?


---


[jira] [Commented] (FLINK-7270) Add support for dynamic properties to cluster entry point

2018-03-19 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404984#comment-16404984
 ] 

Till Rohrmann commented on FLINK-7270:
--

{{env.java.opts}} is a configuration option you can use to specify the JVM 
options with which Flink processes are started. If that is not working 
properly, then please open a corresponding JIRA ticket, [~juanmirocks].

This ticket is about parsing the dynamic properties with which a Flink 
component is started. These values should override the configuration settings 
read from the global configuration, as done in {{MesosJobClusterEntrypoint}}, 
for example.
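
As a rough illustration only (not the actual {{ClusterEntrypoint}} code, and 
the helper names are made up), dynamic properties of the form -Dkey=value 
could be extracted and overlaid on the loaded values like this:

{code}
import java.util.HashMap;
import java.util.Map;

final class DynamicPropertiesSketch {

	/** Collects -Dkey=value arguments; later values win, mirroring CLI order. */
	static Map<String, String> parseDynamicProperties(String[] args) {
		final Map<String, String> props = new HashMap<>();
		for (String arg : args) {
			final int eq = arg.indexOf('=');
			if (arg.startsWith("-D") && eq > 2) {
				props.put(arg.substring(2, eq), arg.substring(eq + 1));
			}
		}
		return props;
	}

	/** Dynamic properties override what was read from the global configuration. */
	static Map<String, String> overlay(Map<String, String> loaded, Map<String, String> dynamic) {
		final Map<String, String> merged = new HashMap<>(loaded);
		merged.putAll(dynamic);
		return merged;
	}
}
{code}

Each merged entry would then be applied to the loaded {{Configuration}} so 
that command-line values take precedence.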

> Add support for dynamic properties to cluster entry point
> -
>
> Key: FLINK-7270
> URL: https://issues.apache.org/jira/browse/FLINK-7270
> Project: Flink
>  Issue Type: Improvement
>  Components: Cluster Management
>Affects Versions: 1.3.1
>Reporter: Till Rohrmann
>Assignee: Fang Yong
>Priority: Minor
>  Labels: flip-6
>
> We should respect dynamic properties when starting the {{ClusterEntrypoint}}. 
> This basically means extracting them from the passed command line arguments 
> and then adding them to the loaded {{Configuration}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-7897) Consider using nio.Files for file deletion in TransientBlobCleanupTask

2018-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-7897:
--
Description: 
nio.Files#delete() provides a better clue as to why the deletion may fail:

https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#delete(java.nio.file.Path)


Depending on the potential exception (FileNotFound), the call to 
localFile.exists() may be skipped.

  was:
nio.Files#delete() provides a better clue as to why the deletion may fail:

https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#delete(java.nio.file.Path)

Depending on the potential exception (FileNotFound), the call to 
localFile.exists() may be skipped.


> Consider using nio.Files for file deletion in TransientBlobCleanupTask
> --
>
> Key: FLINK-7897
> URL: https://issues.apache.org/jira/browse/FLINK-7897
> Project: Flink
>  Issue Type: Improvement
>  Components: Local Runtime
>Reporter: Ted Yu
>Priority: Minor
>
> nio.Files#delete() provides a better clue as to why the deletion may fail:
> https://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html#delete(java.nio.file.Path)
> Depending on the potential exception (FileNotFound), the call to 
> localFile.exists() may be skipped.
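
For illustration, a minimal sketch (method and variable names assumed) of how 
the NoSuchFileException thrown by Files.delete() can replace the exists() 
check:

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

final class DeleteSketch {

	/** Files.delete() reports why a deletion failed via specific exceptions. */
	static void deleteLocalFile(Path localFile) {
		try {
			Files.delete(localFile);
		} catch (NoSuchFileException e) {
			// the file is already gone; this replaces a prior exists() check
		} catch (IOException e) {
			// other causes, e.g. DirectoryNotEmptyException or missing permissions
			System.err.println("Failed to delete " + localFile + ": " + e);
		}
	}
}
{code}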



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8415) Unprotected access to recordsToSend in LongRecordWriterThread#shutdown()

2018-03-19 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated FLINK-8415:
--
Description: 
{code}
  public void shutdown() {
running = false;
recordsToSend.complete(0L);
{code}

In other methods, access to recordsToSend is protected by the synchronized keyword.

shutdown() should do the same.

  was:
{code}
  public void shutdown() {
running = false;
recordsToSend.complete(0L);
{code}
In other methods, access to recordsToSend is protected by the synchronized keyword.

shutdown() should do the same.


> Unprotected access to recordsToSend in LongRecordWriterThread#shutdown()
> 
>
> Key: FLINK-8415
> URL: https://issues.apache.org/jira/browse/FLINK-8415
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
>   public void shutdown() {
> running = false;
> recordsToSend.complete(0L);
> {code}
> In other methods, access to recordsToSend is protected by the synchronized 
> keyword.
> shutdown() should do the same.
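
A minimal sketch of the suggested fix; the class shape is assumed from the 
snippet above, not copied from Flink's LongRecordWriterThread:

{code}
import java.util.concurrent.CompletableFuture;

final class WriterThreadSketch {

	private final CompletableFuture<Long> recordsToSend = new CompletableFuture<>();
	private volatile boolean running = true;

	public synchronized void shutdown() {
		running = false;
		// now guarded by the same monitor as the other accessors
		recordsToSend.complete(0L);
	}

	public synchronized void setRecordsToSend(long records) {
		recordsToSend.complete(records);
	}
}
{code}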



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8073) Test instability FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint()

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404972#comment-16404972
 ] 

ASF GitHub Bot commented on FLINK-8073:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/5718
  
I agree with @pnowojski. The watchdog and Travis itself should time out the 
test for us if things take too long.


> Test instability 
> FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint()
> -
>
> Key: FLINK-8073
> URL: https://issues.apache.org/jira/browse/FLINK-8073
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Kostas Kloudas
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> Travis log: https://travis-ci.org/kl0u/flink/jobs/301985988



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5718: [FLINK-8073][kafka-tests] Disable timeout in tests

2018-03-19 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/5718
  
I agree with @pnowojski. The watchdog and Travis itself should time out the 
test for us if things take too long.


---


[GitHub] flink issue #5708: [FLINK-8984][network] Drop taskmanager.exactly-once.block...

2018-03-19 Thread pnowojski
Github user pnowojski commented on the issue:

https://github.com/apache/flink/pull/5708
  
Yes, it's a valid setting, but it has worse performance and the whole 
credit-based flow control loses its purpose. The 
`taskmanager.exactly-once.blocking.data.enabled` setting was intended as a 
fallback option in case there is a bug in the new code path, but this is 
achieved by using `taskmanager.network.credit-based-flow-control.enabled` in 
two places (and both of those places are about using 
`credit-based-flow-control`).

In other words, we do not think there is much value in allowing 
`taskmanager.network.credit-based-flow-control.enabled` to be enabled but 
`taskmanager.exactly-once.blocking.data.enabled` disabled, and it would make 
the configuration more complex/confusing for the users.


---


[jira] [Commented] (FLINK-8948) RescalingITCase fails on Travis

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404948#comment-16404948
 ] 

ASF GitHub Bot commented on FLINK-8948:
---

Github user pnowojski commented on the issue:

https://github.com/apache/flink/pull/5710
  
thanks!


> RescalingITCase fails on Travis
> ---
>
> Key: FLINK-8948
> URL: https://issues.apache.org/jira/browse/FLINK-8948
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing, Tests, Travis
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Piotr Nowojski
>Priority: Blocker
> Fix For: 1.5.0
>
>
> https://travis-ci.org/apache/flink/jobs/353468272
> {code}
> testSavepointRescalingInKeyedStateDerivedMaxParallelism[0](org.apache.flink.test.checkpointing.RescalingITCase)
>   Time elapsed: 1.858 sec  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:891)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.IllegalStateException: null
>   at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
>   at 
> org.apache.flink.runtime.io.network.buffer.BufferBuilder.finish(BufferBuilder.java:105)
>   at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.closeBufferBuilder(RecordWriter.java:218)
>   at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.clearBuffers(RecordWriter.java:178)
>   at 
> org.apache.flink.streaming.runtime.io.StreamRecordWriter.close(StreamRecordWriter.java:99)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.close(RecordWriterOutput.java:161)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.releaseOutputs(OperatorChain.java:239)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:402)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5710: [FLINK-8948][runtime] Fix IllegalStateException when clos...

2018-03-19 Thread pnowojski
Github user pnowojski commented on the issue:

https://github.com/apache/flink/pull/5710
  
thanks!


---


[jira] [Commented] (FLINK-8073) Test instability FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint()

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404944#comment-16404944
 ] 

ASF GitHub Bot commented on FLINK-8073:
---

Github user pnowojski commented on the issue:

https://github.com/apache/flink/pull/5718
  
I think that timeouts in the IDE only make things worse:
1. you cannot debug, since a longer debugging session will fail;
2. in case of deadlocks, locally you can always manually dump/view stack 
traces, and you can also take a full thread/memory dump and/or attach a 
debugger.

On Travis you cannot debug in any way, so automated actions make sense (and 
they preserve some Travis resources for other builds).

Furthermore, such a rule would have to be implemented in all of our tests, 
not only here. Also, IMO, if you want to replace/remove `travis_mvn_watchdog` 
with such a rule, that is out of scope of this ticket and should be discussed 
separately (I would be against it), so that it doesn't hinder this ticket.

When I originally created those tests (with `@Test(timeout = )`), I copied, 
without thinking, from a test that was printing something to the logs once 
every checkpoint; that output kept resetting the Travis watchdog, so one could 
argue it required this special-case timeout. This commit makes the tests more 
consistent with the rest of the test suite.


> Test instability 
> FlinkKafkaProducer011ITCase.testScaleDownBeforeFirstCheckpoint()
> -
>
> Key: FLINK-8073
> URL: https://issues.apache.org/jira/browse/FLINK-8073
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Kostas Kloudas
>Priority: Blocker
>  Labels: test-stability
> Fix For: 1.5.0
>
>
> Travis log: https://travis-ci.org/kl0u/flink/jobs/301985988



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5718: [FLINK-8073][kafka-tests] Disable timeout in tests

2018-03-19 Thread pnowojski
Github user pnowojski commented on the issue:

https://github.com/apache/flink/pull/5718
  
I think that timeouts in the IDE only make things worse:
1. you cannot debug, since a longer debugging session will fail;
2. in case of deadlocks, locally you can always manually dump/view stack 
traces, and you can also take a full thread/memory dump and/or attach a 
debugger.

On Travis you cannot debug in any way, so automated actions make sense (and 
they preserve some Travis resources for other builds).

Furthermore, such a rule would have to be implemented in all of our tests, 
not only here. Also, IMO, if you want to replace/remove `travis_mvn_watchdog` 
with such a rule, that is out of scope of this ticket and should be discussed 
separately (I would be against it), so that it doesn't hinder this ticket.

When I originally created those tests (with `@Test(timeout = )`), I copied, 
without thinking, from a test that was printing something to the logs once 
every checkpoint; that output kept resetting the Travis watchdog, so one could 
argue it required this special-case timeout. This commit makes the tests more 
consistent with the rest of the test suite.


---


[jira] [Updated] (FLINK-9024) IntelliJ IDEA support run and debug Flink code

2018-03-19 Thread chillon.m (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chillon.m updated FLINK-9024:
-
Issue Type: Improvement  (was: Wish)

> IntelliJ IDEA support run and debug Flink code
> --
>
> Key: FLINK-9024
> URL: https://issues.apache.org/jira/browse/FLINK-9024
> Project: Flink
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: chillon.m
>Priority: Trivial
> Attachments: 10086390.png
>
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html]
>   
>  IntelliJ IDEA comes bundled with the Eclipse compiler.
>  We can debug and run Flink code with some configuration steps. !10086390.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9024) IntelliJ IDEA support run and debug Flink code

2018-03-19 Thread chillon.m (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chillon.m updated FLINK-9024:
-
Issue Type: Wish  (was: Bug)

> IntelliJ IDEA support run and debug Flink code
> --
>
> Key: FLINK-9024
> URL: https://issues.apache.org/jira/browse/FLINK-9024
> Project: Flink
>  Issue Type: Wish
>  Components: Documentation
>Reporter: chillon.m
>Priority: Trivial
> Attachments: 10086390.png
>
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html]
>   
>  IntelliJ IDEA comes bundled with the Eclipse compiler.
>  We can debug and run Flink code with some configuration steps. !10086390.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8959) Port AccumulatorLiveITCase to flip6

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404932#comment-16404932
 ] 

ASF GitHub Bot commented on FLINK-8959:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/5719#discussion_r175462181
  
--- Diff: 
flink-tests/src/test/java/org/apache/flink/test/accumulators/AccumulatorLiveITCase.java
 ---
@@ -18,292 +18,188 @@
 
 package org.apache.flink.test.accumulators;
 
-import org.apache.flink.api.common.JobExecutionResult;
-import org.apache.flink.api.common.JobID;
 import org.apache.flink.api.common.Plan;
-import org.apache.flink.api.common.accumulators.Accumulator;
 import org.apache.flink.api.common.accumulators.IntCounter;
 import org.apache.flink.api.common.functions.RichFlatMapFunction;
 import org.apache.flink.api.common.io.OutputFormat;
 import org.apache.flink.api.java.DataSet;
 import org.apache.flink.api.java.ExecutionEnvironment;
-import org.apache.flink.api.java.LocalEnvironment;
+import org.apache.flink.client.program.ClusterClient;
 import org.apache.flink.configuration.AkkaOptions;
-import org.apache.flink.configuration.ConfigConstants;
 import org.apache.flink.configuration.Configuration;
+import org.apache.flink.configuration.HeartbeatManagerOptions;
+import org.apache.flink.core.testutils.MultiShotLatch;
 import org.apache.flink.optimizer.DataStatistics;
 import org.apache.flink.optimizer.Optimizer;
 import org.apache.flink.optimizer.plan.OptimizedPlan;
 import org.apache.flink.optimizer.plantranslate.JobGraphGenerator;
-import org.apache.flink.runtime.akka.AkkaUtils;
-import org.apache.flink.runtime.akka.ListeningBehaviour;
-import org.apache.flink.runtime.executiongraph.ExecutionAttemptID;
-import org.apache.flink.runtime.instance.ActorGateway;
-import org.apache.flink.runtime.instance.AkkaActorGateway;
 import org.apache.flink.runtime.jobgraph.JobGraph;
-import org.apache.flink.runtime.messages.JobManagerMessages;
-import org.apache.flink.runtime.testingUtils.TestingCluster;
-import org.apache.flink.runtime.testingUtils.TestingJobManagerMessages;
-import org.apache.flink.runtime.testingUtils.TestingTaskManagerMessages;
 import org.apache.flink.runtime.testingUtils.TestingUtils;
 import org.apache.flink.streaming.api.datastream.DataStream;
 import 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.test.util.MiniClusterResource;
+import org.apache.flink.testutils.category.Flip6;
 import org.apache.flink.util.Collector;
 import org.apache.flink.util.TestLogger;
 
-import akka.actor.ActorRef;
-import akka.actor.ActorSystem;
-import akka.pattern.Patterns;
-import akka.testkit.JavaTestKit;
-import akka.util.Timeout;
-import org.junit.After;
 import org.junit.Before;
+import org.junit.ClassRule;
 import org.junit.Test;
+import org.junit.experimental.categories.Category;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;
-import java.util.concurrent.TimeUnit;
 
-import scala.concurrent.Await;
-import scala.concurrent.Future;
-import scala.concurrent.duration.FiniteDuration;
-
-import static org.junit.Assert.fail;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
 
 /**
- * Tests the availability of accumulator results during runtime. The test 
case tests a user-defined
- * accumulator and Flink's internal accumulators for two consecutive tasks.
- *
- * CHAINED[Source -> Map] -> Sink
- *
- * Checks are performed as the elements arrive at the operators. Checks 
consist of a message sent by
- * the task to the task manager which notifies the job manager and sends 
the current accumulators.
- * The task blocks until the test has been notified about the current 
accumulator values.
- *
- * A barrier between the operators ensures that that pipelining is 
disabled for the streaming test.
- * The batch job reads the records one at a time. The streaming code 
buffers the records beforehand;
- * that's why exact guarantees about the number of records read are very 
hard to make. Thus, why we
- * check for an upper bound of the elements read.
+ * Tests the availability of accumulator results during runtime.
  */
+@Category(Flip6.class)
 public class AccumulatorLiveITCase extends TestLogger {
 
private static final Logger LOG = 
LoggerFactory.getLogger(AccumulatorLiveITCase.class);
 
-   private static ActorSystem system;
-   private static ActorGateway 

[GitHub] flink pull request #5719: [FLINK-8959][tests] Port AccumulatorLiveITCase to ...

2018-03-19 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/5719#discussion_r175462181
  
--- Diff: 
flink-tests/src/test/java/org/apache/flink/test/accumulators/AccumulatorLiveITCase.java
 ---
@@ -18,292 +18,188 @@
 
 package org.apache.flink.test.accumulators;
 
-import org.apache.flink.api.common.JobExecutionResult;
-import org.apache.flink.api.common.JobID;
 import org.apache.flink.api.common.Plan;
-import org.apache.flink.api.common.accumulators.Accumulator;
 import org.apache.flink.api.common.accumulators.IntCounter;
 import org.apache.flink.api.common.functions.RichFlatMapFunction;
 import org.apache.flink.api.common.io.OutputFormat;
 import org.apache.flink.api.java.DataSet;
 import org.apache.flink.api.java.ExecutionEnvironment;
-import org.apache.flink.api.java.LocalEnvironment;
+import org.apache.flink.client.program.ClusterClient;
 import org.apache.flink.configuration.AkkaOptions;
-import org.apache.flink.configuration.ConfigConstants;
 import org.apache.flink.configuration.Configuration;
+import org.apache.flink.configuration.HeartbeatManagerOptions;
+import org.apache.flink.core.testutils.MultiShotLatch;
 import org.apache.flink.optimizer.DataStatistics;
 import org.apache.flink.optimizer.Optimizer;
 import org.apache.flink.optimizer.plan.OptimizedPlan;
 import org.apache.flink.optimizer.plantranslate.JobGraphGenerator;
-import org.apache.flink.runtime.akka.AkkaUtils;
-import org.apache.flink.runtime.akka.ListeningBehaviour;
-import org.apache.flink.runtime.executiongraph.ExecutionAttemptID;
-import org.apache.flink.runtime.instance.ActorGateway;
-import org.apache.flink.runtime.instance.AkkaActorGateway;
 import org.apache.flink.runtime.jobgraph.JobGraph;
-import org.apache.flink.runtime.messages.JobManagerMessages;
-import org.apache.flink.runtime.testingUtils.TestingCluster;
-import org.apache.flink.runtime.testingUtils.TestingJobManagerMessages;
-import org.apache.flink.runtime.testingUtils.TestingTaskManagerMessages;
 import org.apache.flink.runtime.testingUtils.TestingUtils;
 import org.apache.flink.streaming.api.datastream.DataStream;
 import 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.test.util.MiniClusterResource;
+import org.apache.flink.testutils.category.Flip6;
 import org.apache.flink.util.Collector;
 import org.apache.flink.util.TestLogger;
 
-import akka.actor.ActorRef;
-import akka.actor.ActorSystem;
-import akka.pattern.Patterns;
-import akka.testkit.JavaTestKit;
-import akka.util.Timeout;
-import org.junit.After;
 import org.junit.Before;
+import org.junit.ClassRule;
 import org.junit.Test;
+import org.junit.experimental.categories.Category;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;
-import java.util.concurrent.TimeUnit;
 
-import scala.concurrent.Await;
-import scala.concurrent.Future;
-import scala.concurrent.duration.FiniteDuration;
-
-import static org.junit.Assert.fail;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
 
 /**
- * Tests the availability of accumulator results during runtime. The test 
case tests a user-defined
- * accumulator and Flink's internal accumulators for two consecutive tasks.
- *
- * CHAINED[Source -> Map] -> Sink
- *
- * Checks are performed as the elements arrive at the operators. Checks 
consist of a message sent by
- * the task to the task manager which notifies the job manager and sends 
the current accumulators.
- * The task blocks until the test has been notified about the current 
accumulator values.
- *
- * A barrier between the operators ensures that that pipelining is 
disabled for the streaming test.
- * The batch job reads the records one at a time. The streaming code 
buffers the records beforehand;
- * that's why exact guarantees about the number of records read are very 
hard to make. Thus, why we
- * check for an upper bound of the elements read.
+ * Tests the availability of accumulator results during runtime.
  */
+@Category(Flip6.class)
 public class AccumulatorLiveITCase extends TestLogger {
 
private static final Logger LOG = 
LoggerFactory.getLogger(AccumulatorLiveITCase.class);
 
-   private static ActorSystem system;
-   private static ActorGateway jobManagerGateway;
-   private static ActorRef taskManager;
-
-   private static JobID jobID;
-   private static JobGraph jobGraph;
-
// name of user accumulator
private static final String ACCUMULATOR_NAME = 

[jira] [Closed] (FLINK-8947) Timeout handler issue

2018-03-19 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed FLINK-8947.
---
Resolution: Not A Problem

This is working as intended. The bounded extractor will only emit a higher 
watermark when it "sees" new data. You can write a custom watermark assigner 
derived from {{AssignerWithPeriodicWatermarks}} that also advances the 
event-time/watermark when not seeing new data.
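
For illustration, a hedged sketch of what such an assigner could look like, 
assuming the 1.4-era {{AssignerWithPeriodicWatermarks}} interface (class and 
field names are made up):

{code}
import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

public class IdleAdvancingAssigner<T> implements AssignerWithPeriodicWatermarks<T> {

	private final long maxOutOfOrdernessMillis;

	public IdleAdvancingAssigner(long maxOutOfOrdernessMillis) {
		this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
	}

	@Override
	public long extractTimestamp(T element, long previousElementTimestamp) {
		// ingestion-style timestamps, matching the snippet in the ticket
		return System.currentTimeMillis();
	}

	@Override
	public Watermark getCurrentWatermark() {
		// invoked periodically (see setAutoWatermarkInterval); the watermark
		// advances with the wall clock even when no new elements arrive, so
		// CEP timeouts can fire on an idle stream
		return new Watermark(System.currentTimeMillis() - maxOutOfOrdernessMillis);
	}
}
{code}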

> Timeout handler issue
> -
>
> Key: FLINK-8947
> URL: https://issues.apache.org/jira/browse/FLINK-8947
> Project: Flink
>  Issue Type: Bug
>  Components: CEP
>Affects Versions: 1.4.2
>Reporter: dhiraj prajapati
>Priority: Critical
>
> The issue is the same as FLINK-5753.
> I am using event time and have set a watermark interval. Still, I have 
> observed that the timeout executes only after the next event:
> if the first event appears, the second event does not appear in the stream, 
> and *no new events appear in the stream*, the timeout handler is not executed.
> Expected result: the timeout handler should be executed even if there are no 
> new events in the stream.
>  
> My code snippet:
> DataStream dataStream = env.socketTextStream("localhost", 1212);
> dataStream.getExecutionConfig().setAutoWatermarkInterval(100L);
> dataStream = dataStream.assignTimestampsAndWatermarks(new 
> BoundedOutOfOrdernessTimestampExtractor<JSONObject>(
>  Time.seconds(0)) {
> private static final long serialVersionUID = 4969170359023055566L;
> @Override
>  public long extractTimestamp(JSONObject event) {
>  return System.currentTimeMillis();
>  }
>  });



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-9024) IntelliJ IDEA support run and debug Flink code

2018-03-19 Thread chillon.m (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404893#comment-16404893
 ] 

chillon.m edited comment on FLINK-9024 at 3/19/18 2:42 PM:
---

[~Zentol] It gave me the illusion that only Eclipse can run and debug Flink 
code when reading 
[https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html].

Please add a doc on how to develop Flink code with IntelliJ IDEA.


was (Author: chillon_m):
[~Zentol] It gave me the illusion that only Eclipse can run and debug Flink 
code when reading 
[https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html].

Please add a doc on how to develop Flink code with IntelliJ IDEA; it is not 
only Eclipse that can debug and run Flink code.

 

> IntelliJ IDEA support run and debug Flink code
> --
>
> Key: FLINK-9024
> URL: https://issues.apache.org/jira/browse/FLINK-9024
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: chillon.m
>Priority: Trivial
> Attachments: 10086390.png
>
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html]
>   
>  IntelliJ IDEA comes bundled with the Eclipse compiler.
>  We can debug and run Flink code with some configuration steps. !10086390.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-9024) IntelliJ IDEA support run and debug Flink code

2018-03-19 Thread chillon.m (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404893#comment-16404893
 ] 

chillon.m edited comment on FLINK-9024 at 3/19/18 2:39 PM:
---

[~Zentol] It gave me the illusion that only Eclipse can run and debug Flink 
code when reading 
[https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html].

Please add a doc on how to develop Flink code with IntelliJ IDEA; it is not 
only Eclipse that can debug and run Flink code.

 


was (Author: chillon_m):
[~Zentol] Please add a doc on 
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html 
about how to develop Flink code with IntelliJ IDEA. It is not only Eclipse 
that can debug and run Flink code.

> IntelliJ IDEA support run and debug Flink code
> --
>
> Key: FLINK-9024
> URL: https://issues.apache.org/jira/browse/FLINK-9024
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: chillon.m
>Priority: Trivial
> Attachments: 10086390.png
>
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html]
>   
>  IntelliJ IDEA comes bundled with the Eclipse compiler.
>  We can debug and run Flink code with some configuration steps. !10086390.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-9024) IntelliJ IDEA support run and debug Flink code

2018-03-19 Thread chillon.m (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404893#comment-16404893
 ] 

chillon.m edited comment on FLINK-9024 at 3/19/18 2:23 PM:
---

[~Zentol] Please add a doc on 
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html 
about how to develop Flink code with IntelliJ IDEA. It is not only Eclipse 
that can debug and run Flink code.


was (Author: chillon_m):
[~Zentol] Please add a doc on 
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html 
about how to develop Flink code with IntelliJ IDEA.

> IntelliJ IDEA support run and debug Flink code
> --
>
> Key: FLINK-9024
> URL: https://issues.apache.org/jira/browse/FLINK-9024
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: chillon.m
>Priority: Trivial
> Attachments: 10086390.png
>
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html]
>   
>  IntelliJ IDEA comes bundled with the Eclipse compiler.
>  We can debug and run Flink code with some configuration steps. !10086390.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9024) IntelliJ IDEA support run and debug Flink code

2018-03-19 Thread chillon.m (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404893#comment-16404893
 ] 

chillon.m commented on FLINK-9024:
--

[~Zentol] Please add a doc on 
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html 
about how to develop Flink code with IntelliJ IDEA.

> IntelliJ IDEA support run and debug Flink code
> --
>
> Key: FLINK-9024
> URL: https://issues.apache.org/jira/browse/FLINK-9024
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: chillon.m
>Priority: Trivial
> Attachments: 10086390.png
>
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html]
>   
>  IntelliJ IDEA comes bundled with the Eclipse compiler.
>  We can debug and run Flink code with some configuration steps. !10086390.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8935) Implement remaining methods in MiniClusterClient

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404891#comment-16404891
 ] 

ASF GitHub Bot commented on FLINK-8935:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/5690
  
merging...


> Implement remaining methods in MiniClusterClient
> 
>
> Key: FLINK-8935
> URL: https://issues.apache.org/jira/browse/FLINK-8935
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink issue #5690: [FLINK-8935][tests] Extend MiniClusterClient

2018-03-19 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/5690
  
merging...


---


[jira] [Commented] (FLINK-8948) RescalingITCase fails on Travis

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404884#comment-16404884
 ] 

ASF GitHub Bot commented on FLINK-8948:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/5710
  
merging.


> RescalingITCase fails on Travis
> ---
>
> Key: FLINK-8948
> URL: https://issues.apache.org/jira/browse/FLINK-8948
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing, Tests, Travis
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Piotr Nowojski
>Priority: Blocker
> Fix For: 1.5.0
>
>
> https://travis-ci.org/apache/flink/jobs/353468272
> {code}
> testSavepointRescalingInKeyedStateDerivedMaxParallelism[0](org.apache.flink.test.checkpointing.RescalingITCase)
>   Time elapsed: 1.858 sec  <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:891)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.IllegalStateException: null
>   at 
> org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
>   at 
> org.apache.flink.runtime.io.network.buffer.BufferBuilder.finish(BufferBuilder.java:105)
>   at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.closeBufferBuilder(RecordWriter.java:218)
>   at 
> org.apache.flink.runtime.io.network.api.writer.RecordWriter.clearBuffers(RecordWriter.java:178)
>   at 
> org.apache.flink.streaming.runtime.io.StreamRecordWriter.close(StreamRecordWriter.java:99)
>   at 
> org.apache.flink.streaming.runtime.io.RecordWriterOutput.close(RecordWriterOutput.java:161)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.releaseOutputs(OperatorChain.java:239)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:402)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8958) Port TaskCancelAsyncProducerConsumerITCase to flip6

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404887#comment-16404887
 ] 

ASF GitHub Bot commented on FLINK-8958:
---

GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/5722

[FLINK-8958][tests] Port TaskCancelAsyncProducerConsumerITCase to flip6

## What is the purpose of the change

This PR ports the `TaskCancelAsyncProducerConsumerITCase` to flip6. The 
existing test was renamed to `TaskCancelAsyncProducerConsumerITCase`, and a 
ported copy was added.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink 8958

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5722.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5722


commit 9df4a4ff1fadc12f37c94659ae13fc6ad1623d2d
Author: zentol 
Date:   2018-03-19T14:16:18Z

[FLINK-8958][tests] Port TaskCancelAsyncProducerConsumerITCase to flip6




> Port TaskCancelAsyncProducerConsumerITCase to flip6
> ---
>
> Key: FLINK-8958
> URL: https://issues.apache.org/jira/browse/FLINK-8958
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] flink pull request #5722: [FLINK-8958][tests] Port TaskCancelAsyncProducerCo...

2018-03-19 Thread zentol
GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/5722

[FLINK-8958][tests] Port TaskCancelAsyncProducerConsumerITCase to flip6

## What is the purpose of the change

This PR ports the `TaskCancelAsyncProducerConsumerITCase` to flip6. The 
existing test was renamed to `TaskCancelAsyncProducerConsumerITCase`, and a 
ported copy was added.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink 8958

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/5722.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5722


commit 9df4a4ff1fadc12f37c94659ae13fc6ad1623d2d
Author: zentol 
Date:   2018-03-19T14:16:18Z

[FLINK-8958][tests] Port TaskCancelAsyncProducerConsumerITCase to flip6




---


[GitHub] flink issue #5710: [FLINK-8948][runtime] Fix IllegalStateException when clos...

2018-03-19 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/5710
  
merging.


---


[jira] [Assigned] (FLINK-8958) Port TaskCancelAsyncProducerConsumerITCase to flip6

2018-03-19 Thread Chesnay Schepler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler reassigned FLINK-8958:
---

Assignee: Chesnay Schepler

> Port TaskCancelAsyncProducerConsumerITCase to flip6
> ---
>
> Key: FLINK-8958
> URL: https://issues.apache.org/jira/browse/FLINK-8958
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.5.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Blocker
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-9024) IntelliJ IDEA support run and debug Flink code

2018-03-19 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404882#comment-16404882
 ] 

Chesnay Schepler edited comment on FLINK-9024 at 3/19/18 2:15 PM:
--

Please explain what problem you are experiencing.


was (Author: zentol):
You have to upload the picture as an attachment.

Also, please explain what problem you are experiencing.

> IntelliJ IDEA support run and debug Flink code
> --
>
> Key: FLINK-9024
> URL: https://issues.apache.org/jira/browse/FLINK-9024
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: chillon.m
>Priority: Trivial
> Attachments: 10086390.png
>
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html]
>   
>  IntelliJ IDEA comes bundled with the Eclipse compiler.
>  We can debug and run Flink code with some configuration steps. !10086390.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-9024) IntelliJ IDEA support run and debug Flink code

2018-03-19 Thread chillon.m (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chillon.m updated FLINK-9024:
-
Description: 
[https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html]
  
 IntelliJ IDEA comes bundled with the Eclipse compiler.
 We can debug and run Flink code with some configuration steps. !10086390.png!
 

  was:
[https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html]
  
 IntelliJ IDEA comes bundled with the Eclipse compiler.
 We can debug and run Flink code with some configuration steps. !10086390.png!
   !10086390.png!


> IntelliJ IDEA support run and debug Flink code
> --
>
> Key: FLINK-9024
> URL: https://issues.apache.org/jira/browse/FLINK-9024
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: chillon.m
>Priority: Trivial
> Attachments: 10086390.png
>
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html]
>   
>  IntelliJ IDEA comes bundled with the Eclipse compiler.
>  We can debug and run Flink code with some configuration steps. !10086390.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9024) IntelliJ IDEA support run and debug Flink code

2018-03-19 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404882#comment-16404882
 ] 

Chesnay Schepler commented on FLINK-9024:
-

You have to upload the picture as an attachment.

Also, please explain what problem you are experiencing.

> IntelliJ IDEA support run and debug Flink code
> --
>
> Key: FLINK-9024
> URL: https://issues.apache.org/jira/browse/FLINK-9024
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Reporter: chillon.m
>Priority: Trivial
> Attachments: 10086390.png
>
>
> [https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/java8.html]
>   
>  IntelliJ IDEA comes bundled with the Eclipse compiler.
>  We can debug and run Flink code with some configuration steps. !10086390.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

