[jira] [Commented] (FLINK-2445) Add tests for HadoopOutputFormats

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210737#comment-15210737
 ] 

ASF GitHub Bot commented on FLINK-2445:
---

Github user mliesenberg closed the pull request at:

https://github.com/apache/flink/pull/1625


> Add tests for HadoopOutputFormats
> ---------------------------------
>
> Key: FLINK-2445
> URL: https://issues.apache.org/jira/browse/FLINK-2445
> Project: Flink
>  Issue Type: Test
>  Components: Hadoop Compatibility, Tests
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Fabian Hueske
>Assignee: Martin Liesenberg
>  Labels: starter
> Fix For: 1.1.0, 1.0.1
>
>
> The HadoopOutputFormats and HadoopOutputFormatBase classes are not 
> sufficiently covered by unit tests.
> We need tests that ensure that the methods of the wrapped Hadoop 
> OutputFormats are correctly called. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2445] Add tests for HadoopOutputFormats

2016-03-24 Thread mliesenberg
Github user mliesenberg closed the pull request at:

https://github.com/apache/flink/pull/1625


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2445) Add tests for HadoopOutputFormats

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210724#comment-15210724
 ] 

ASF GitHub Bot commented on FLINK-2445:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1625#issuecomment-200961463
  
Hi @mliesenberg, I merged this PR but it wasn't correctly closed. Can you 
do it? Thanks 


> Add tests for HadoopOutputFormats
> ---------------------------------
>
> Key: FLINK-2445
> URL: https://issues.apache.org/jira/browse/FLINK-2445
> Project: Flink
>  Issue Type: Test
>  Components: Hadoop Compatibility, Tests
>Affects Versions: 0.9.1, 0.10.0
>Reporter: Fabian Hueske
>Assignee: Martin Liesenberg
>  Labels: starter
> Fix For: 1.1.0, 1.0.1
>
>
> The HadoopOutputFormats and HadoopOutputFormatBase classes are not 
> sufficiently covered by unit tests.
> We need tests that ensure that the methods of the wrapped Hadoop 
> OutputFormats are correctly called. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2445] Add tests for HadoopOutputFormats

2016-03-24 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1625#issuecomment-200961463
  
Hi @mliesenberg, I merged this PR but it wasn't correctly closed. Can you 
do it? Thanks 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3659] Allow ConnectedStreams to Be Keye...

2016-03-24 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200951040
  
Thanks for describing this.

This has quite some big implications, as far as I can see. The state in 
the connected stream is now a "broadcast state", not partitioned state, so allowing 
key/value state access on it probably breaks some ongoing efforts, like 
scaling.

How about a cleaner separation of these things:
  1. The connected streams (fully partitioned or not)
  2. Broadcast inputs, which are similar to the "side inputs" in Cloud 
Dataflow or the broadcast variables in the DataSet API.

That gives us:
  - clean semantics and behavior that users can work with
  - no overcomplication of the key/value abstraction
  - the ability to checkpoint the broadcast state only once (rather than 
on each parallel subtask). 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3659) Allow ConnectedStreams to Be Keyed on Only One Side

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210655#comment-15210655
 ] 

ASF GitHub Bot commented on FLINK-3659:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200951040
  
Thanks for describing this.

This has quite some big implications, as far as I can see. The state in 
the connected stream is now a "broadcast state", not partitioned state, so allowing 
key/value state access on it probably breaks some ongoing efforts, like 
scaling.

How about a cleaner separation of these things:
  1. The connected streams (fully partitioned or not)
  2. Broadcast inputs, which are similar to the "side inputs" in Cloud 
Dataflow or the broadcast variables in the DataSet API.

That gives us:
  - clean semantics and behavior that users can work with
  - no overcomplication of the key/value abstraction
  - the ability to checkpoint the broadcast state only once (rather than 
on each parallel subtask). 



> Allow ConnectedStreams to Be Keyed on Only One Side
> ---------------------------------------------------
>
> Key: FLINK-3659
> URL: https://issues.apache.org/jira/browse/FLINK-3659
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> Currently, we only allow partitioned state when both inputs are keyed (with 
> the same key type). I think a valid use case is to have only one side keyed 
> and have the other side broadcast to publish some updates that are relevant 
> for all keys.
> When relaxing the requirement to have only one side keyed we must still 
> ensure that the same key is used if both sides are keyed.
> [~gyfora] Did you try this with what you're working on?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3659) Allow ConnectedStreams to Be Keyed on Only One Side

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210628#comment-15210628
 ] 

ASF GitHub Bot commented on FLINK-3659:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200945681
  
I like @aljoscha's idea to separate the user state access more explicitly 
from its implementation. Having an accessor would also allow us to get rid of 
the swapping of the actual state objects which are wrapped by the `State` 
objects. The implementation of a `StateBackend` would then no longer have to be 
spread out over the `KvState` classes. This, in turn, would make it easier to 
integrate the notion of virtual state partitions/shards into `StateBackends`. 
So in general, I think it would simplify our current `StateBackend` 
implementations noticeably.


> Allow ConnectedStreams to Be Keyed on Only One Side
> ---------------------------------------------------
>
> Key: FLINK-3659
> URL: https://issues.apache.org/jira/browse/FLINK-3659
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> Currently, we only allow partitioned state when both inputs are keyed (with 
> the same key type). I think a valid use case is to have only one side keyed 
> and have the other side broadcast to publish some updates that are relevant 
> for all keys.
> When relaxing the requirement to have only one side keyed we must still 
> ensure that the same key is used if both sides are keyed.
> [~gyfora] Did you try this with what you're working on?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3659] Allow ConnectedStreams to Be Keye...

2016-03-24 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200945681
  
I like @aljoscha's idea to separate the user state access more explicitly 
from its implementation. Having an accessor would also allow us to get rid of 
the swapping of the actual state objects which are wrapped by the `State` 
objects. The implementation of a `StateBackend` would then no longer have to be 
spread out over the `KvState` classes. This, in turn, would make it easier to 
integrate the notion of virtual state partitions/shards into `StateBackends`. 
So in general, I think it would simplify our current `StateBackend` 
implementations noticeably.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3547) Add support for streaming projection, selection, and union

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210612#comment-15210612
 ] 

ASF GitHub Bot commented on FLINK-3547:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1820#issuecomment-200942437
  
I've rebased on master and will merge once travis turns green :)


> Add support for streaming projection, selection, and union
> -----------------------------------------------------------
>
> Key: FLINK-3547
> URL: https://issues.apache.org/jira/browse/FLINK-3547
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3547] add support for streaming filter,...

2016-03-24 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1820#issuecomment-200942437
  
I've rebased on master and will merge once travis turns green :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3667) Generalize client<->cluster communication

2016-03-24 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210591#comment-15210591
 ] 

Till Rohrmann commented on FLINK-3667:
---

Sounds like a good idea. +1 :-)

> Generalize client<->cluster communication
> -----------------------------------------
>
> Key: FLINK-3667
> URL: https://issues.apache.org/jira/browse/FLINK-3667
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>
> Here are some notes I took when inspecting the client<->cluster classes with 
> regard to future integration of other resource management frameworks in 
> addition to Yarn (e.g. Mesos).
> {noformat}
> 1 Cluster Client Abstraction
> 
> 1.1 Status Quo
> ──
> 1.1.1 FlinkYarnClient
> ╌
>   • Holds the cluster configuration (Flink-specific and Yarn-specific)
>   • Contains the deploy() method to deploy the cluster
>   • Creates the Hadoop Yarn client
>   • Receives the initial job manager address
>   • Bootstraps the FlinkYarnCluster
> 1.1.2 FlinkYarnCluster
> ╌╌
>   • Wrapper around the Hadoop Yarn client
>   • Queries cluster for status updates
>   • Lifecycle methods to start and shut down the cluster
>   • Flink specific features like shutdown after job completion
> 1.1.3 ApplicationClient
> ╌╌╌
>   • Acts as a middle-man for asynchronous cluster communication
>   • Designed to communicate with Yarn, not used in Standalone mode
> 1.1.4 CliFrontend
> ╌
>   • Deeply integrated with FlinkYarnClient and FlinkYarnCluster
>   • Constantly distinguishes between Yarn and Standalone mode
>   • Would be nice to have a general abstraction in place
> 1.1.5 Client
> 
>   • Job submission and Job related actions, agnostic of resource framework
> 1.2 Proposal
> 
> 1.2.1 ClusterConfig (before: AbstractFlinkYarnClient)
> ╌
>   • Extensible cluster-agnostic config
>   • May be extended by specific cluster, e.g. YarnClusterConfig
> 1.2.2 ClusterClient (before: AbstractFlinkYarnClient)
> ╌
>   • Deals with cluster (RM) specific communication
>   • Exposes framework agnostic information
>   • YarnClusterClient, MesosClusterClient, StandaloneClusterClient
> 1.2.3 FlinkCluster (before: AbstractFlinkYarnCluster)
> ╌
>   • Basic interface to communicate with a running cluster
>   • Receives the ClusterClient for cluster-specific communication
>   • Should not have to care about the specific implementations of the
> client
> 1.2.4 ApplicationClient
> ╌╌╌
>   • Can be changed to work cluster-agnostic (first steps already in
> FLINK-3543)
> 1.2.5 CliFrontend
> ╌
>   • The CliFrontend never has to differentiate between different
> cluster types after it has determined which cluster class to load.
>   • Base class handles framework agnostic command line arguments
>   • Pluggables for Yarn, Mesos handle specific commands
> {noformat}
> I would like to create/refactor the affected classes to set us up for a more 
> flexible client side resource management abstraction.
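
A hedged sketch of what the proposed split could look like as Java types, using only the names from the notes above (ClusterConfig, ClusterClient, FlinkCluster); these are illustrative interfaces, not actual Flink code:

{code}
// Hedged sketch of the proposed client<->cluster abstraction. Illustrative
// interfaces only, not actual Flink code.
class ClusterConfig { /* extensible, cluster-agnostic settings */ }

class YarnClusterConfig extends ClusterConfig { /* Yarn-specific additions */ }

interface ClusterClient {
    // Deals with resource-manager-specific communication and returns a
    // handle to the running cluster; exposes framework-agnostic information.
    FlinkCluster deploy(ClusterConfig config) throws Exception;
}

interface FlinkCluster {
    String getJobManagerAddress();  // initial job manager address
    void shutdown();                // life-cycle method
}

// Concrete clients (YarnClusterClient, MesosClusterClient,
// StandaloneClusterClient) stay behind the ClusterClient interface, so the
// CliFrontend never has to distinguish between cluster types after it has
// decided which cluster class to load.
{code}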



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3659) Allow ConnectedStreams to Be Keyed on Only One Side

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210586#comment-15210586
 ] 

ASF GitHub Bot commented on FLINK-3659:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200933436
  
To elaborate on this: state right now works well if you stick to the 
(admittedly somewhat hidden) rules. That is, you should only access state if 
there is a key available.

If there is no key available, the behavior changes in unexpected ways depending 
on which state backend is used and on the capabilities of the key serializer. For 
example, consider access to `ValueState` in `open()`. For mem/fs state: 
`ValueState.value()` works and returns the default value, while 
`ValueState.update()` will throw an NPE. For RocksDB state: neither method works 
if the key serializer cannot handle null values; if it can, then both methods 
will change state for the `null` key.

For these reasons I would like to change the semantics of state such that 
the user always has to call `getState` (or a similar method) and that the 
returned accessor object is documented to be valid only for the duration of the 
processing method. Right now, the user can wreak all kinds of havoc by 
down-casting the returned `State` object. We currently have a very simple system 
that works if the user keeps to the rules and also makes things go fast. If we 
want to make it more restrictive we will lose some performance, of course.

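A minimal sketch of the access pattern described above, assuming an illustrative `RichMapFunction` and state descriptor (this is not code from the PR):

{code}
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;

public class StatefulMapper extends RichMapFunction<String, String> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) throws Exception {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class, 0L));

        // No key is set here. Per the comment above:
        //  - mem/fs backends: value() returns the default, update() throws an NPE
        //  - RocksDB: both fail unless the key serializer handles null values,
        //    in which case both silently operate on state for the "null" key
        Long risky = count.value();
    }

    @Override
    public String map(String value) throws Exception {
        // Inside a processing method a key is set, so state access is well defined.
        count.update(count.value() + 1);
        return value + ":" + count.value();
    }
}
{code}
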

> Allow ConnectedStreams to Be Keyed on Only One Side
> ---------------------------------------------------
>
> Key: FLINK-3659
> URL: https://issues.apache.org/jira/browse/FLINK-3659
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> Currently, we only allow partitioned state when both inputs are keyed (with 
> the same key type). I think a valid use case is to have only one side keyed 
> and have the other side broadcast to publish some updates that are relevant 
> for all keys.
> When relaxing the requirement to have only one side keyed we must still 
> ensure that the same key is used if both sides are keyed.
> [~gyfora] Did you try this with what you're working on?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3659] Allow ConnectedStreams to Be Keye...

2016-03-24 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200933436
  
To elaborate on this: state right now works well if you stick to the 
(admittedly somewhat hidden) rules. That is, you should only access state if 
there is a key available.

If there is no key available, the behavior changes in unexpected ways depending 
on which state backend is used and on the capabilities of the key serializer. For 
example, consider access to `ValueState` in `open()`. For mem/fs state: 
`ValueState.value()` works and returns the default value, while 
`ValueState.update()` will throw an NPE. For RocksDB state: neither method works 
if the key serializer cannot handle null values; if it can, then both methods 
will change state for the `null` key.

For these reasons I would like to change the semantics of state such that 
the user always has to call `getState` (or a similar method) and that the 
returned accessor object is documented to be valid only for the duration of the 
processing method. Right now, the user can wreak all kinds of havoc by 
down-casting the returned `State` object. We currently have a very simple system 
that works if the user keeps to the rules and also makes things go fast. If we 
want to make it more restrictive we will lose some performance, of course.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (FLINK-3639) Add methods and utilities to register DataSets and Tables in the TableEnvironment

2016-03-24 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-3639.
---
   Resolution: Implemented
Fix Version/s: 1.1.0

> Add methods and utilities to register DataSets and Tables in the 
> TableEnvironment
> ----------------------------------------------------------------------------------
>
> Key: FLINK-3639
> URL: https://issues.apache.org/jira/browse/FLINK-3639
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
> Fix For: 1.1.0
>
>
> In order to make tables queryable from SQL we need to register them under a 
> unique name in the TableEnvironment.
> [This design 
> document|https://docs.google.com/document/d/1sITIShmJMGegzAjGqFuwiN_iw1urwykKsLiacokxSw0/edit]
>  describes the proposed API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3639) Add methods and utilities to register DataSets and Tables in the TableEnvironment

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210528#comment-15210528
 ] 

ASF GitHub Bot commented on FLINK-3639:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1827


> Add methods and utilities to register DataSets and Tables in the 
> TableEnvironment
> ----------------------------------------------------------------------------------
>
> Key: FLINK-3639
> URL: https://issues.apache.org/jira/browse/FLINK-3639
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> In order to make tables queryable from SQL we need to register them under a 
> unique name in the TableEnvironment.
> [This design 
> document|https://docs.google.com/document/d/1sITIShmJMGegzAjGqFuwiN_iw1urwykKsLiacokxSw0/edit]
>  describes the proposed API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3639] add methods for registering datas...

2016-03-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1827


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3257) Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210494#comment-15210494
 ] 

ASF GitHub Bot commented on FLINK-3257:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1668#issuecomment-200909639
  
The core idea of this is very good, and the illustration is very nice.

After an offline chat with @senorcarbone, we concluded that the remaining 
problem here is the way this currently integrates with the timeout-based 
termination detection.

Which brings us to the point that we should (in my opinion) change the way 
that loops terminate. It should probably be based on end-of-stream events, to 
make it deterministic and not susceptible to delays.

The question now is: does it make sense to do the termination change first 
and base this on top of it, or to merge this irrespective of that?


> Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
> -------------------------------------------------------------------
>
> Key: FLINK-3257
> URL: https://issues.apache.org/jira/browse/FLINK-3257
> Project: Flink
>  Issue Type: Improvement
>Reporter: Paris Carbone
>Assignee: Paris Carbone
>
> The current snapshotting algorithm cannot support cycles in the execution 
> graph. An alternative scheme can potentially include records in-transit 
> through the back-edges of a cyclic execution graph (ABS [1]) to achieve the 
> same guarantees.
> One straightforward implementation of ABS for cyclic graphs can work as 
> follows along the lines:
> 1) Upon triggering a barrier in an IterationHead from the TaskManager start 
> block output and start upstream backup of all records forwarded from the 
> respective IterationSink.
> 2) The IterationSink should eventually forward the current snapshotting epoch 
> barrier to the IterationSource.
> 3) Upon receiving a barrier from the IterationSink, the IterationSource 
> should finalize the snapshot, unblock its output and emit all records 
> in-transit in FIFO order and continue the usual execution.
> --
> Upon restart the IterationSource should emit all records from the injected 
> snapshot first and then continue its usual execution.
> Several optimisations and slight variations can be potentially achieved but 
> this can be the initial implementation take.
> [1] http://arxiv.org/abs/1506.08603

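To make the three steps above concrete, here is a hedged pseudocode sketch of the upstream-backup idea; all names are illustrative stubs, not the pull request's actual code:

{code}
// Hedged pseudocode of the upstream-backup scheme in steps 1-3 above. All
// names are illustrative stubs, not the actual code of the pull request.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class LoopSnapshotSketch {

    private final List<Object> backup = new ArrayList<>();
    private boolean logging;

    // 1) A barrier is triggered at the iteration head: block regular output
    //    and start backing up every record arriving over the feedback edge.
    void onCheckpointBarrierTriggered() {
        logging = true;
    }

    void onFeedbackRecord(Object record, Consumer<Object> output) {
        if (logging) {
            backup.add(record);   // record is in transit within the loop
        } else {
            output.accept(record);
        }
    }

    // 2) The barrier travels around the loop via the iteration sink, and
    // 3) once it returns, the backup holds exactly the loop's in-transit
    //    records: snapshot them, unblock, re-emit in FIFO order, resume.
    void onBarrierReturnedFromLoop(Consumer<Object> output) {
        snapshot(backup);
        logging = false;
        for (Object record : backup) {
            output.accept(record);
        }
        backup.clear();
    }

    private void snapshot(List<Object> records) {
        // write the backed-up records into the checkpoint / state backend
    }
}
{code}
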


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3257] Add Exactly-Once Processing Guara...

2016-03-24 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1668#issuecomment-200909639
  
The core idea of this is very good, and the illustration is very nice.

After an offline chat with @senorcarbone, we concluded that the remaining 
problem here is the way this currently integrates with the timeout-based 
termination detection.

Which brings us to the point that we should (in my opinion) change the way 
that loops terminate. It should probably be based on end-of-stream events, to 
make it deterministic and not susceptible to delays.

The question now is: does it make sense to do the termination change first 
and base this on top of it, or to merge this irrespective of that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3659] Allow ConnectedStreams to Be Keye...

2016-03-24 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200907475
  
@StephanEwen 

This PR does not change the behaviour of any existing Flink applications.

It does, however, now allow users to key only one input of a CoMap/CoFlatMap 
function, for instance:
`input1.keyBy(...).connect(input2.broadcast())`

This was previously impossible, which probably blocks many existing 
use cases (it is actually blocking one of my own applications that I am trying 
to build on Flink) where one input does not have associated state.

After this change, the key-value state defined by the keyed input stream 
works as expected.

The one unfortunate behaviour is that users can still call 
`state.value()` for inputs from the non-keyed stream, and the behaviour there is 
not clearly defined: if there was already input from the other side, it returns 
the state for the last key; otherwise it will probably throw a 
NullPointerException.

I think this is acceptable behaviour for the time being, because well-written 
programs will work as expected. We can think about how we want to 
handle the non-keyed input later, but that will probably involve changing many 
things in the KvState backends so they can do this properly. This problem 
already exists in Flink, as state access outside of the processing method is 
not well defined.

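A hedged, self-contained sketch of the pattern the PR enables: the first input is keyed, the second is broadcast, and keyed state is only touched for elements of the keyed side. The sources, types, and counting logic are illustrative assumptions, not code from the PR:

{code}
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
import org.apache.flink.util.Collector;

public class OneSideKeyedExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Long>> events =
                env.fromElements(Tuple2.of("a", 1L), Tuple2.of("b", 2L), Tuple2.of("a", 3L));
        DataStream<String> updates = env.fromElements("rule-1", "rule-2");

        events.keyBy(0)                      // only this side is keyed
              .connect(updates.broadcast())  // the other side is broadcast
              .flatMap(new RichCoFlatMapFunction<Tuple2<String, Long>, String, String>() {

                  private transient ValueState<Long> sum;

                  @Override
                  public void open(Configuration parameters) {
                      sum = getRuntimeContext().getState(
                              new ValueStateDescriptor<>("sum", Long.class, 0L));
                  }

                  @Override
                  public void flatMap1(Tuple2<String, Long> event, Collector<String> out) throws Exception {
                      // keyed side: a key is available, state access is well defined
                      sum.update(sum.value() + event.f1);
                      out.collect(event.f0 + " -> " + sum.value());
                  }

                  @Override
                  public void flatMap2(String update, Collector<String> out) {
                      // broadcast side: no key available, so keyed state is not touched
                      out.collect("received update " + update);
                  }
              })
              .print();

        env.execute("one-side-keyed connect");
    }
}
{code}
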

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3659) Allow ConnectedStreams to Be Keyed on Only One Side

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210480#comment-15210480
 ] 

ASF GitHub Bot commented on FLINK-3659:
---

Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200907475
  
@StephanEwen 

This PR does not change the behaviour of any existing Flink applications.

It does, however, now allow users to key only one input of a CoMap/CoFlatMap 
function, for instance:
`input1.keyBy(...).connect(input2.broadcast())`

This was previously impossible, which probably blocks many existing 
use cases (it is actually blocking one of my own applications that I am trying 
to build on Flink) where one input does not have associated state.

After this change, the key-value state defined by the keyed input stream 
works as expected.

The one unfortunate behaviour is that users can still call 
`state.value()` for inputs from the non-keyed stream, and the behaviour there is 
not clearly defined: if there was already input from the other side, it returns 
the state for the last key; otherwise it will probably throw a 
NullPointerException.

I think this is acceptable behaviour for the time being, because well-written 
programs will work as expected. We can think about how we want to 
handle the non-keyed input later, but that will probably involve changing many 
things in the KvState backends so they can do this properly. This problem 
already exists in Flink, as state access outside of the processing method is 
not well defined.


> Allow ConnectedStreams to Be Keyed on Only One Side
> ---------------------------------------------------
>
> Key: FLINK-3659
> URL: https://issues.apache.org/jira/browse/FLINK-3659
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> Currently, we only allow partitioned state when both inputs are keyed (with 
> the same key type). I think a valid use case is to have only one side keyed 
> and have the other side broadcast to publish some updates that are relevant 
> for all keys.
> When relaxing the requirement to have only one side keyed we must still 
> ensure that the same key is used if both sides are keyed.
> [~gyfora] Did you try this with what you're working on?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3659) Allow ConnectedStreams to Be Keyed on Only One Side

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210412#comment-15210412
 ] 

ASF GitHub Bot commented on FLINK-3659:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200887743
  
Can someone elaborate on the semantics?

I am against merging something that changes semantics and has zero 
description.


> Allow ConnectedStreams to Be Keyed on Only One Side
> ---------------------------------------------------
>
> Key: FLINK-3659
> URL: https://issues.apache.org/jira/browse/FLINK-3659
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> Currently, we only allow partitioned state when both inputs are keyed (with 
> the same key type). I think a valid use case is to have only one side keyed 
> and have the other side broadcast to publish some updates that are relevant 
> for all keys.
> When relaxing the requirement to have only one side keyed we must still 
> ensure that the same key is used if both sides are keyed.
> [~gyfora] Did you try this with what you're working on?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3664) Create a method to easily Summarize a DataSet

2016-03-24 Thread Todd Lisonbee (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210409#comment-15210409
 ] 

Todd Lisonbee commented on FLINK-3664:
---

Hi Fabian, thanks for the feedback.

Your first 3 comments all make sense - agreed.

On distinct counts, I thought about it but wasn't sure, so I left it out for 
now. For an approximation, the best idea I had was to choose some arbitrary 
cutoff, maybe 100, and then report the exact number of distinct values if there 
are fewer than 100, or just say "100+" if there are more. This would be nice for 
categorical variables that happen to have fewer than 100 different values. But 
with enough rows and columns it could be expensive (even if Tuple is currently 
limited to 22 fields), or at least relatively more expensive than the other 
calculations, and there isn't a perfect magic number. I didn't like this idea 
all the way (a rough sketch of it is below).

Do you know of a nice way to approximate distinct counts?

Thanks.

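A rough sketch of the capped exact count described above, assuming a hypothetical helper class (not part of any proposed API):

{code}
// Hedged sketch of the capped exact count: track distinct values exactly up
// to an arbitrary cutoff, then only report "100+". The class and method names
// are illustrative, not a proposed API.
import java.util.HashSet;
import java.util.Set;

public class CappedDistinctCount {

    private static final int CUTOFF = 100;

    private final Set<Object> seen = new HashSet<>();
    private boolean overflowed;

    public void add(Object value) {
        if (!overflowed && seen.add(value) && seen.size() > CUTOFF) {
            overflowed = true;
            seen.clear();  // stop paying memory once the cutoff is exceeded
        }
    }

    public String result() {
        return overflowed ? CUTOFF + "+" : String.valueOf(seen.size());
    }
}
{code}
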
> Create a method to easily Summarize a DataSet
> ---------------------------------------------
>
> Key: FLINK-3664
> URL: https://issues.apache.org/jira/browse/FLINK-3664
> Project: Flink
>  Issue Type: Improvement
>Reporter: Todd Lisonbee
> Attachments: DataSet-Summary-Design-March2016-v1.txt
>
>
> Here is an example:
> {code}
> /**
>  * Summarize a DataSet of Tuples by collecting single-pass statistics for
>  * all columns
>  */
> public Tuple summarize()
>
> DataSet<Tuple3<Double, String, Boolean>> input = // [...]
> Tuple3<NumericSummary, StringSummary, BooleanSummary> summary = input.summarize()
>
> summary.getField(0).stddev()
> summary.getField(1).maxStringLength()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3659] Allow ConnectedStreams to Be Keye...

2016-03-24 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200887743
  
Can someone elaborate on the semantics?

I am against merging something that changes semantics and has zero 
description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3659) Allow ConnectedStreams to Be Keyed on Only One Side

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210406#comment-15210406
 ] 

ASF GitHub Bot commented on FLINK-3659:
---

Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200886203
  
I think you can go ahead and merge this if no one has any objections :)


> Allow ConnectedStreams to Be Keyed on Only One Side
> ---------------------------------------------------
>
> Key: FLINK-3659
> URL: https://issues.apache.org/jira/browse/FLINK-3659
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> Currently, we only allow partitioned state when both inputs are keyed (with 
> the same key type). I think a valid use case is to have only one side keyed 
> and have the other side broadcast to publish some updates that are relevant 
> for all keys.
> When relaxing the requirement to have only one side keyed we must still 
> ensure that the same key is used if both sides are keyed.
> [~gyfora] Did you try this with what you're working on?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3659] Allow ConnectedStreams to Be Keye...

2016-03-24 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200886203
  
I think you can go ahead and merge this if no one has any objections :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3659] Allow ConnectedStreams to Be Keye...

2016-03-24 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200888107
  
For example, how does keyed state work for the input side that is not key 
partitioned? How is the key found? How is partitioning guaranteed?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3659) Allow ConnectedStreams to Be Keyed on Only One Side

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210413#comment-15210413
 ] 

ASF GitHub Bot commented on FLINK-3659:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1831#issuecomment-200888107
  
For example, how does keyed state work for the input side that is not key 
partitioned? How is the key found? How is partitioning guaranteed?


> Allow ConnectedStreams to Be Keyed on Only One Side
> ---------------------------------------------------
>
> Key: FLINK-3659
> URL: https://issues.apache.org/jira/browse/FLINK-3659
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> Currently, we only allow partitioned state when both inputs are keyed (with 
> the same key type). I think a valid use case is to have only one side keyed 
> and have the other side broadcast to publish some updates that are relevant 
> for all keys.
> When relaxing the requirement to have only one side keyed we must still 
> ensure that the same key is used if both sides are keyed.
> [~gyfora] Did you try this with what you're working on?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3257] Add Exactly-Once Processing Guara...

2016-03-24 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1668#discussion_r57327419
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamIterationHead.java
 ---
@@ -17,43 +17,80 @@
 
 package org.apache.flink.streaming.runtime.tasks;
 
-import java.util.concurrent.ArrayBlockingQueue;
-import java.util.concurrent.BlockingQueue;
-import java.util.concurrent.TimeUnit;
-
 import org.apache.flink.annotation.Internal;
 import org.apache.flink.api.common.JobID;
+import org.apache.flink.core.memory.DataInputViewStreamWrapper;
+import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
+import org.apache.flink.runtime.io.network.api.CheckpointBarrier;
+import org.apache.flink.runtime.state.AbstractStateBackend;
+import org.apache.flink.runtime.state.StreamStateHandle;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
 import org.apache.flink.streaming.api.watermark.Watermark;
-import org.apache.flink.streaming.runtime.io.RecordWriterOutput;
 import org.apache.flink.streaming.runtime.io.BlockingQueueBroker;
+import org.apache.flink.streaming.runtime.io.RecordWriterOutput;
 import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import 
org.apache.flink.streaming.runtime.streamrecord.StreamRecordSerializer;
+import org.apache.flink.types.Either;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.ArrayBlockingQueue;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.TimeUnit;
+
 @Internal
-public class StreamIterationHead extends OneInputStreamTask 
{
+public class StreamIterationHead extends OneInputStreamTask {
 
private static final Logger LOG = 
LoggerFactory.getLogger(StreamIterationHead.class);
 
private volatile boolean running = true;
 
-   // 

-   
+   /**
+* A flag that is on during the duration of a checkpoint. While 
onSnapshot is true the iteration head has to perform
+* upstream backup of all records in transit within the loop.
+*/
+   private volatile boolean onSnapshot = false;
--- End diff --

Always accessed in lock scope, no need for `volatile`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3613) Add standard deviation, mean, variance to list of Aggregations

2016-03-24 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210344#comment-15210344
 ] 

Fabian Hueske commented on FLINK-3613:
---

Hi [~tlisonbee], thanks for the detailed design doc! I'll give some feedback 
soon. 

> Add standard deviation, mean, variance to list of Aggregations
> --------------------------------------------------------------
>
> Key: FLINK-3613
> URL: https://issues.apache.org/jira/browse/FLINK-3613
> Project: Flink
>  Issue Type: Improvement
>Reporter: Todd Lisonbee
>Priority: Minor
> Attachments: DataSet-Aggregation-Design-March2016-v1.txt
>
>
> Implement standard deviation, mean, variance for 
> org.apache.flink.api.java.aggregation.Aggregations
> Ideally implementation should be single pass and numerically stable.
> References:
> "Scalable and Numerically Stable Descriptive Statistics in SystemML", Tian et 
> al, International Conference on Data Engineering 2012
> http://dl.acm.org/citation.cfm?id=2310392
> "The Kahan summation algorithm (also known as compensated summation) reduces 
> the numerical errors that occur when adding a sequence of finite precision 
> floating point numbers. Numerical errors arise due to truncation and 
> rounding. These errors can lead to numerical instability when calculating 
> variance."
> https://en.wikipedia.org/wiki/Kahan_summation_algorithm

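As one concrete illustration of a single-pass, numerically stable computation in the spirit of the references above, here is a hedged sketch of Welford's online mean/variance algorithm (illustrative code, not the proposed Flink aggregation implementation):

{code}
// Hedged sketch of Welford's online algorithm: single-pass, numerically
// stable mean/variance. Illustrative code only.
public final class OnlineStats {

    private long n;
    private double mean;
    private double m2;  // running sum of squared deviations from the mean

    public void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);  // uses the updated mean: stable in one pass
    }

    public double mean()     { return mean; }
    public double variance() { return n > 1 ? m2 / (n - 1) : 0.0; }  // sample variance
    public double stddev()   { return Math.sqrt(variance()); }

    public static void main(String[] args) {
        OnlineStats s = new OnlineStats();
        for (double v : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            s.add(v);
        }
        System.out.printf("mean=%.2f stddev=%.2f%n", s.mean(), s.stddev());
    }
}
{code}
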


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3664) Create a method to easily Summarize a DataSet

2016-03-24 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210343#comment-15210343
 ] 

Fabian Hueske commented on FLINK-3664:
---

Hi [~tlisonbee], I like this proposal a lot. It would be a great feature to add.

A few comments on your proposal:
- If possible, use a {{CombineFunction}} instead of a {{GroupCombineFunction}}. 
A {{CombineFunction}} can be executed using a hash-based execution strategy, 
whereas a {{GroupCombineFunction}} must be evaluated with a sort-based strategy. 
Hash-based combiners are not supported yet but will hopefully be added soon 
(see PR #1517); a sketch of the interface difference follows after this list.
- We add new functionality such as {{summarize()}} to {{DataSetUtils}} and not 
directly to {{DataSet}}. {{DataSetUtils}} serves as a kind of staging area for 
new features where we are not sure yet whether we want to add them to the core 
API.
- It is certainly possible to implement the feature as a 
{{CustomUnaryOperation}}, but it is also fine to directly call the API methods.
- Do you think it makes sense to include (approximate) distinct counts for 
integer and String values, or should this be a separate method?

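A hedged sketch of the interface difference referred to above, assuming the single-method `CombineFunction` shape of that era and a hypothetical `Summary` partial-aggregate type:

{code}
// Hedged sketch: CombineFunction's single-method shape (one combined result
// per group) is what permits a hash-based combine strategy, in contrast to
// the Collector-based GroupCombineFunction. "Summary" is a hypothetical
// partial-aggregate type, not Flink API.
import org.apache.flink.api.common.functions.CombineFunction;

public class SummaryCombiner implements CombineFunction<SummaryCombiner.Summary, SummaryCombiner.Summary> {

    /** Hypothetical partial aggregate: running count and sum for one column. */
    public static class Summary {
        long count;
        double sum;
        void merge(Summary other) { count += other.count; sum += other.sum; }
    }

    @Override
    public Summary combine(Iterable<Summary> partials) throws Exception {
        Summary merged = new Summary();
        for (Summary s : partials) {
            merged.merge(s);  // associative merge, safe to pre-aggregate anywhere
        }
        return merged;
    }
}
{code}
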
> Create a method to easily Summarize a DataSet
> ---------------------------------------------
>
> Key: FLINK-3664
> URL: https://issues.apache.org/jira/browse/FLINK-3664
> Project: Flink
>  Issue Type: Improvement
>Reporter: Todd Lisonbee
> Attachments: DataSet-Summary-Design-March2016-v1.txt
>
>
> Here is an example:
> {code}
> /**
>  * Summarize a DataSet of Tuples by collecting single-pass statistics for
>  * all columns
>  */
> public Tuple summarize()
>
> DataSet<Tuple3<Double, String, Boolean>> input = // [...]
> Tuple3<NumericSummary, StringSummary, BooleanSummary> summary = input.summarize()
>
> summary.getField(0).stddev()
> summary.getField(1).maxStringLength()
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3257) Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210328#comment-15210328
 ] 

ASF GitHub Bot commented on FLINK-3257:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1668#discussion_r57327419
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamIterationHead.java
 ---
@@ -17,43 +17,80 @@
 
 package org.apache.flink.streaming.runtime.tasks;
 
-import java.util.concurrent.ArrayBlockingQueue;
-import java.util.concurrent.BlockingQueue;
-import java.util.concurrent.TimeUnit;
-
 import org.apache.flink.annotation.Internal;
 import org.apache.flink.api.common.JobID;
+import org.apache.flink.core.memory.DataInputViewStreamWrapper;
+import org.apache.flink.core.memory.DataOutputViewStreamWrapper;
+import org.apache.flink.runtime.io.network.api.CheckpointBarrier;
+import org.apache.flink.runtime.state.AbstractStateBackend;
+import org.apache.flink.runtime.state.StreamStateHandle;
+import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
+import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
 import org.apache.flink.streaming.api.watermark.Watermark;
-import org.apache.flink.streaming.runtime.io.RecordWriterOutput;
 import org.apache.flink.streaming.runtime.io.BlockingQueueBroker;
+import org.apache.flink.streaming.runtime.io.RecordWriterOutput;
 import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
+import 
org.apache.flink.streaming.runtime.streamrecord.StreamRecordSerializer;
+import org.apache.flink.types.Either;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.ArrayBlockingQueue;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.TimeUnit;
+
 @Internal
-public class StreamIterationHead extends OneInputStreamTask 
{
+public class StreamIterationHead extends OneInputStreamTask {
 
private static final Logger LOG = 
LoggerFactory.getLogger(StreamIterationHead.class);
 
private volatile boolean running = true;
 
-   // 

-   
+   /**
+* A flag that is on during the duration of a checkpoint. While 
onSnapshot is true the iteration head has to perform
+* upstream backup of all records in transit within the loop.
+*/
+   private volatile boolean onSnapshot = false;
--- End diff --

Always accessed in lock scope, no need for `volatile`


> Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
> -------------------------------------------------------------------
>
> Key: FLINK-3257
> URL: https://issues.apache.org/jira/browse/FLINK-3257
> Project: Flink
>  Issue Type: Improvement
>Reporter: Paris Carbone
>Assignee: Paris Carbone
>
> The current snapshotting algorithm cannot support cycles in the execution 
> graph. An alternative scheme can potentially include records in-transit 
> through the back-edges of a cyclic execution graph (ABS [1]) to achieve the 
> same guarantees.
> One straightforward implementation of ABS for cyclic graphs can work as 
> follows along the lines:
> 1) Upon triggering a barrier in an IterationHead from the TaskManager start 
> block output and start upstream backup of all records forwarded from the 
> respective IterationSink.
> 2) The IterationSink should eventually forward the current snapshotting epoch 
> barrier to the IterationSource.
> 3) Upon receiving a barrier from the IterationSink, the IterationSource 
> should finalize the snapshot, unblock its output and emit all records 
> in-transit in FIFO order and continue the usual execution.
> --
> Upon restart the IterationSource should emit all records from the injected 
> snapshot first and then continue its usual execution.
> Several optimisations and slight variations can be potentially achieved but 
> this can be the initial implementation take.
> [1] http://arxiv.org/abs/1506.08603



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3667) Generalize client<->cluster communication

2016-03-24 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3667:
---

 Summary: Generalize client<->cluster communication
 Key: FLINK-3667
 URL: https://issues.apache.org/jira/browse/FLINK-3667
 Project: Flink
  Issue Type: Improvement
  Components: YARN Client
Reporter: Maximilian Michels
Assignee: Maximilian Michels


Here are some notes I took when inspecting the client<->cluster classes with 
regard to future integration of other resource management frameworks in 
addition to Yarn (e.g. Mesos).

{noformat}

1 Cluster Client Abstraction


1.1 Status Quo
──

1.1.1 FlinkYarnClient
╌

  • Holds the cluster configuration (Flink-specific and Yarn-specific)
  • Contains the deploy() method to deploy the cluster
  • Creates the Hadoop Yarn client
  • Receives the initial job manager address
  • Bootstraps the FlinkYarnCluster


1.1.2 FlinkYarnCluster
╌╌

  • Wrapper around the Hadoop Yarn client
  • Queries cluster for status updates
  • Lifecycle methods to start and shut down the cluster
  • Flink specific features like shutdown after job completion


1.1.3 ApplicationClient
╌╌╌

  • Acts as a middle-man for asynchronous cluster communication
  • Designed to communicate with Yarn, not used in Standalone mode


1.1.4 CliFrontend
╌

  • Deeply integrated with FlinkYarnClient and FlinkYarnCluster
  • Constantly distinguishes between Yarn and Standalone mode
  • Would be nice to have a general abstraction in place


1.1.5 Client


  • Job submission and Job related actions, agnostic of resource framework


1.2 Proposal


1.2.1 ClusterConfig (before: AbstractFlinkYarnClient)
╌

  • Extensible cluster-agnostic config
  • May be extended by specific cluster, e.g. YarnClusterConfig


1.2.2 ClusterClient (before: AbstractFlinkYarnClient)
╌

  • Deals with cluster (RM) specific communication
  • Exposes framework agnostic information
  • YarnClusterClient, MesosClusterClient, StandaloneClusterClient


1.2.3 FlinkCluster (before: AbstractFlinkYarnCluster)
╌

  • Basic interface to communicate with a running cluster
  • Receives the ClusterClient for cluster-specific communication
  • Should not have to care about the specific implementations of the
client


1.2.4 ApplicationClient
╌╌╌

  • Can be changed to work cluster-agnostic (first steps already in
FLINK-3543)


1.2.5 CliFrontend
╌

  • The CliFrontend never has to differentiate between different
cluster types after it has determined which cluster class to load.
  • Base class handles framework agnostic command line arguments
  • Pluggables for Yarn, Mesos handle specific commands


{noformat}

I would like to create/refactor the affected classes to set us up for a more 
flexible client side resource management abstraction.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3633) Job submission silently fails when using user code types

2016-03-24 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210320#comment-15210320
 ] 

Till Rohrmann commented on FLINK-3633:
---

Yes that should work. Will try to update the PR. Thanks for the pointer :-)




> Job submission silently fails when using user code types
> --------------------------------------------------------
>
> Key: FLINK-3633
> URL: https://issues.apache.org/jira/browse/FLINK-3633
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Blocker
> Fix For: 1.1.0
>
>
> With the changes introduced by FLINK-3327, it is no longer possible to run 
> remote Flink jobs which work on user code types. The reason is that the 
> {{ExecutionConfig}} is now directly stored in the {{JobGraph}}, which is sent 
> as an Akka message to the {{JobManager}}. By default, user code types are 
> automatically detected and registered in the {{ExecutionConfig}}. When 
> deserializing a {{JobGraph}} whose {{ExecutionConfig}} contains user code 
> classes, the user code class loader is consequently required. However, Akka 
> does not have access to it and uses the system class loader. This causes Akka 
> to silently discard the {{SubmitJob}} message, which cannot be deserialized 
> because of a {{ClassNotFoundException}}.
> I propose to not send the {{ExecutionConfig}} explicitly with the 
> {{JobGraph}} and, thus, to partially revert the changes to before FLINK-3327. 
> Before, the {{ExecutionConfig}} was serialized into the job configuration and 
> deserialized on the {{TaskManager}} using the proper user code class loader.
> In order to reproduce the problem you can submit the following job to a 
> remote cluster.
> {code}
> public class Job {
>   public static class CustomType {
>     private final int value;
>
>     public CustomType(int value) {
>       this.value = value;
>     }
>
>     @Override
>     public String toString() {
>       return "CustomType(" + value + ")";
>     }
>   }
>
>   public static void main(String[] args) throws Exception {
>     ExecutionEnvironment env =
>       ExecutionEnvironment.createRemoteEnvironment(Address, Port, PathToJar);
>     env.getConfig().disableAutoTypeRegistration();
>     DataSet<Integer> input = env.fromElements(1, 2, 3, 4, 5);
>
>     DataSet<CustomType> customTypes = input.map(new MapFunction<Integer, CustomType>() {
>       @Override
>       public CustomType map(Integer integer) throws Exception {
>         return new CustomType(integer);
>       }
>     });
>
>     customTypes.print();
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3639] add methods for registering datas...

2016-03-24 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1827#issuecomment-200864343
  
Merging!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3651) Fix faulty RollingSink Restore

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210309#comment-15210309
 ] 

ASF GitHub Bot commented on FLINK-3651:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1830#issuecomment-200863307
  
Yes, what @rmetzger said is correct. Without the fix and the increase of 
parallelism it will be a flaky test that still succeeds sometimes.


> Fix faulty RollingSink Restore
> ------------------------------
>
> Key: FLINK-3651
> URL: https://issues.apache.org/jira/browse/FLINK-3651
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> The RollingSink restore logic has a bug where the sink for subtask index 1 
> also removes files for subtask index 11 because the regex that checks for the 
> file name also matches that one. Adding the suffix to the regex should solve 
> the problem because then the regex for 1 will only match files for subtask 
> index 1.

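A hedged illustration of the regex issue described above, with a hypothetical file-name scheme ("<prefix>-<subtaskIndex>-<part>"), not the actual RollingSink pattern:

{code}
// Hedged illustration of the bug: subtask 1's pattern must not claim
// subtask 11's files. File names here are hypothetical.
import java.util.regex.Pattern;

public class RollingSinkRegexSketch {
    public static void main(String[] args) {
        String fileOfSubtask11 = "part-11-0";

        // Faulty: anchoring only on the subtask-index prefix also matches 11
        Pattern faulty = Pattern.compile("part-1.*");
        System.out.println(faulty.matcher(fileOfSubtask11).matches());  // true (bug)

        // Fixed: including the suffix after the index disambiguates
        Pattern fixed = Pattern.compile("part-1-.*");
        System.out.println(fixed.matcher(fileOfSubtask11).matches());   // false
    }
}
{code}
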


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3651] Fix faulty RollingSink Restore

2016-03-24 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1830#issuecomment-200863307
  
Yes, what @rmetzger said is correct. Without the fix and the increase of 
parallelism it will be a flaky test that still succeeds sometimes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210282#comment-15210282
 ] 

ASF GitHub Bot commented on FLINK-1745:
---

Github user danielblazevski commented on the pull request:

https://github.com/apache/flink/pull/1220#issuecomment-200854252
  
@tillrohrmann @chiwanpark done: polished up KNN.scala and made some minor 
changes, e.g. expanding the description of the parameters at the beginning of 
KNN.scala.

Looking forward to doing the approximate version. I ran some tests last 
week of the pure Scala z-value kNN and it looks promising 
(https://github.com/danielblazevski/zknn-scala)


> Add exact k-nearest-neighbours algorithm to machine learning library
> ---------------------------------------------------------------------
>
> Key: FLINK-1745
> URL: https://issues.apache.org/jira/browse/FLINK-1745
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Daniel Blazevski
>  Labels: ML, Starter
>
> Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial, 
> it is still used as a means to classify data and to do regression. This issue 
> focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as 
> proposed in [2].
> Could be a starter task.
> Resources:
> [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm]
> [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]

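For reference, a hedged, non-distributed Java sketch of exact kNN lookup (the FlinkML implementation under review is Scala and distributed via H-BNLJ/H-BRJ; this just illustrates the basic computation):

{code}
// Hedged, non-distributed sketch of exact kNN: brute-force distance
// computation plus a sort. All names are illustrative.
import java.util.Arrays;
import java.util.Comparator;

public class ExactKnnSketch {

    static double squaredDistance(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            d += diff * diff;
        }
        return d;
    }

    /** Returns the indices of the k training points closest to the query. */
    static Integer[] kNearest(double[][] training, double[] query, int k) {
        Integer[] idx = new Integer[training.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> squaredDistance(training[i], query)));
        return Arrays.copyOf(idx, k);
    }

    public static void main(String[] args) {
        double[][] train = {{0, 0}, {1, 1}, {5, 5}, {6, 5}};
        System.out.println(Arrays.toString(kNearest(train, new double[]{5.5, 5.0}, 2)));  // [2, 3]
    }
}
{code}
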


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1745] Add exact k-nearest-neighbours al...

2016-03-24 Thread danielblazevski
Github user danielblazevski commented on the pull request:

https://github.com/apache/flink/pull/1220#issuecomment-200854252
  
@tillrohrmann @chiwanpark done, polished up KNN.scala and made some minor 
changes -- e.g. expanded the description of the parameters at the beginning of 
KNN.scala.

Looking forward to doing the approximate version.  I ran some tests last 
week of the pure Scala z-value KNN and it looks promising 
(https://github.com/danielblazevski/zknn-scala)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3257) Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210279#comment-15210279
 ] 

ASF GitHub Bot commented on FLINK-3257:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1668#discussion_r57321347
  
--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java ---
@@ -450,112 +450,121 @@ else if (operator != null) {
 		}
 	}
 
-	@Override
-	public boolean triggerCheckpoint(final long checkpointId, final long timestamp) throws Exception {
-		LOG.debug("Starting checkpoint {} on task {}", checkpointId, getName());
-
-		synchronized (lock) {
-			if (isRunning) {
-
-				// since both state checkpointing and downstream barrier emission occurs in this
-				// lock scope, they are an atomic operation regardless of the order in which they occur
-				// we immediately emit the checkpoint barriers, so the downstream operators can start
-				// their checkpoint work as soon as possible
-				operatorChain.broadcastCheckpointBarrier(checkpointId, timestamp);
-
-				// now draw the state snapshot
-				final StreamOperator<?>[] allOperators = operatorChain.getAllOperators();
-				final StreamTaskState[] states = new StreamTaskState[allOperators.length];
+	/**
+	 * Checkpoints all operator states of the current StreamTask. 
+	 * Thread-safety must be handled outside the scope of this function
+	 */
+	protected boolean checkpointStatesInternal(final long checkpointId, long timestamp) throws Exception {
--- End diff --

When rebasing we have to double-check that nothing changed in this method 
when it is called from `triggerCheckpoint` etc.


> Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
> ---
>
> Key: FLINK-3257
> URL: https://issues.apache.org/jira/browse/FLINK-3257
> Project: Flink
>  Issue Type: Improvement
>Reporter: Paris Carbone
>Assignee: Paris Carbone
>
> The current snapshotting algorithm cannot support cycles in the execution 
> graph. An alternative scheme can potentially include records in-transit 
> through the back-edges of a cyclic execution graph (ABS [1]) to achieve the 
> same guarantees.
> One straightforward implementation of ABS for cyclic graphs can work as 
> follows along the lines:
> 1) Upon triggering a barrier in an IterationHead from the TaskManager start 
> block output and start upstream backup of all records forwarded from the 
> respective IterationSink.
> 2) The IterationSink should eventually forward the current snapshotting epoch 
> barrier to the IterationSource.
> 3) Upon receiving a barrier from the IterationSink, the IterationSource 
> should finalize the snapshot, unblock its output and emit all records 
> in-transit in FIFO order and continue the usual execution.
> --
> Upon restart the IterationSource should emit all records from the injected 
> snapshot first and then continue its usual execution.
> Several optimisations and slight variations can be potentially achieved but 
> this can be the initial implementation take.
> [1] http://arxiv.org/abs/1506.08603
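
As a rough illustration of steps 1-3, here is a toy, non-Flink sketch of the upstream-backup idea: records arriving over the feedback edge are logged once a barrier is triggered and replayed in FIFO order after the barrier returns. All names are illustrative; none of this is the runtime's actual code.

{code}
import java.util.ArrayDeque;
import java.util.Queue;

public class FeedbackBackupSketch<T> {

    private final Queue<T> inFlightLog = new ArrayDeque<>();
    private boolean logging = false;

    // Step 1: barrier from the TaskManager starts the upstream backup.
    void onBarrierFromTaskManager() {
        logging = true;
    }

    // Records in transit through the back-edge are logged instead of forwarded.
    void onFeedbackRecord(T record) {
        if (logging) {
            inFlightLog.add(record);
        } else {
            emit(record);
        }
    }

    // Step 3: barrier returned through the feedback channel; the log becomes part
    // of the snapshot, is replayed in FIFO order, and normal execution resumes.
    Iterable<T> onBarrierFromFeedback() {
        logging = false;
        Queue<T> snapshot = new ArrayDeque<>(inFlightLog);
        for (T record : inFlightLog) {
            emit(record);
        }
        inFlightLog.clear();
        return snapshot;
    }

    void emit(T record) {
        // forward downstream (no-op in this sketch)
    }
}
{code}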



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3257] Add Exactly-Once Processing Guara...

2016-03-24 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1668#discussion_r57321347
  
--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java ---
@@ -450,112 +450,121 @@ else if (operator != null) {
 		}
 	}
 
-	@Override
-	public boolean triggerCheckpoint(final long checkpointId, final long timestamp) throws Exception {
-		LOG.debug("Starting checkpoint {} on task {}", checkpointId, getName());
-
-		synchronized (lock) {
-			if (isRunning) {
-
-				// since both state checkpointing and downstream barrier emission occurs in this
-				// lock scope, they are an atomic operation regardless of the order in which they occur
-				// we immediately emit the checkpoint barriers, so the downstream operators can start
-				// their checkpoint work as soon as possible
-				operatorChain.broadcastCheckpointBarrier(checkpointId, timestamp);
-
-				// now draw the state snapshot
-				final StreamOperator<?>[] allOperators = operatorChain.getAllOperators();
-				final StreamTaskState[] states = new StreamTaskState[allOperators.length];
+	/**
+	 * Checkpoints all operator states of the current StreamTask. 
+	 * Thread-safety must be handled outside the scope of this function
+	 */
+	protected boolean checkpointStatesInternal(final long checkpointId, long timestamp) throws Exception {
--- End diff --

When rebasing we have to double-check that nothing changed in this method 
when it is called from `triggerCheckpoint` etc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3666) Remove Nephele references

2016-03-24 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-3666:
---

 Summary: Remove Nephele references
 Key: FLINK-3666
 URL: https://issues.apache.org/jira/browse/FLINK-3666
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Chesnay Schepler
Priority: Trivial


There still exist a few references to nephele which should be removed:

{code}
flink\docs\setup\local_setup.md:
   79  $ tail log/flink-*-jobmanager-*.log
   80  INFO ... - Initializing memory manager with 409 megabytes of memory
   81: INFO ... - Trying to load org.apache.flinknephele.jobmanager.scheduler.local.LocalScheduler as scheduler
   82  INFO ... - Setting up web info server, using web-root directory ...
   83: INFO ... - Web info server will display information about nephele job-manager on localhost, port 8081.
   84  INFO ... - Starting web info server for JobManager on port 8081
   85  ~~~
   ..
  118  $ cd flink
  119  $ bin/start-local.sh
  120: Starting Nephele job manager
  121  ~~~
{code}

{code}
flink\flink-runtime\src\main\java\org\apache\flink\runtime\operators\TaskContext.java:
   70:  AbstractInvokable getOwningNepheleTask();
{code}

{code}
flink\flink-runtime\src\main\java\org\apache\flink\runtime\operators\BatchTask.java:
 1149	 * @param message The main message for the log.
 1150	 * @param taskName The name of the task.
 1151:	 * @param parent The nephele task that contains the code producing the message.
 1152	 *
 1153	 * @return The string for logging.
 
 1254	 */
 1255	@SuppressWarnings("unchecked")
 1256:	public static <T> Collector<T> initOutputs(AbstractInvokable nepheleTask, ClassLoader cl, TaskConfig config,
 1257			List<ChainedDriver<?, ?>> chainedTasksTarget,
 1258			List<RecordWriter<?>> eventualOutputs,
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3257] Add Exactly-Once Processing Guara...

2016-03-24 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1668#discussion_r57320231
  
--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java ---
@@ -450,112 +450,121 @@ else if (operator != null) {
 		}
 	}
 
-	@Override
-	public boolean triggerCheckpoint(final long checkpointId, final long timestamp) throws Exception {
-		LOG.debug("Starting checkpoint {} on task {}", checkpointId, getName());
-
-		synchronized (lock) {
-			if (isRunning) {
-
-				// since both state checkpointing and downstream barrier emission occurs in this
-				// lock scope, they are an atomic operation regardless of the order in which they occur
-				// we immediately emit the checkpoint barriers, so the downstream operators can start
-				// their checkpoint work as soon as possible
-				operatorChain.broadcastCheckpointBarrier(checkpointId, timestamp);
-
-				// now draw the state snapshot
-				final StreamOperator<?>[] allOperators = operatorChain.getAllOperators();
-				final StreamTaskState[] states = new StreamTaskState[allOperators.length];
+	/**
+	 * Checkpoints all operator states of the current StreamTask. 
+	 * Thread-safety must be handled outside the scope of this function
+	 */
+	protected boolean checkpointStatesInternal(final long checkpointId, long timestamp) throws Exception {
--- End diff --

Regarding the JavaDocs:
- The idiomatic style is to have a short description and then a blank line 
for more details (the first line will be displayed as a summary in the IDE etc.)
- The `of the current StreamTask` is clear from context
- The Thread-safety part should be more explicit, for instance `The caller 
has to make sure to call this method in scope of the task's checkpoint lock` (see the sketch below).
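
Put together, the suggested JavaDoc shape could look as follows; the method name follows the `drawStateSnapshot` proposal from the same review and is illustrative only.

{code}
abstract class StreamTaskSketch {

    /**
     * Draws a snapshot of all operator states.
     *
     * <p>The caller has to make sure to call this method in scope of the
     * task's checkpoint lock.
     */
    protected abstract boolean drawStateSnapshot(long checkpointId, long timestamp) throws Exception;
}
{code}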


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3257) Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210270#comment-15210270
 ] 

ASF GitHub Bot commented on FLINK-3257:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1668#discussion_r57320231
  
--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java ---
@@ -450,112 +450,121 @@ else if (operator != null) {
 		}
 	}
 
-	@Override
-	public boolean triggerCheckpoint(final long checkpointId, final long timestamp) throws Exception {
-		LOG.debug("Starting checkpoint {} on task {}", checkpointId, getName());
-
-		synchronized (lock) {
-			if (isRunning) {
-
-				// since both state checkpointing and downstream barrier emission occurs in this
-				// lock scope, they are an atomic operation regardless of the order in which they occur
-				// we immediately emit the checkpoint barriers, so the downstream operators can start
-				// their checkpoint work as soon as possible
-				operatorChain.broadcastCheckpointBarrier(checkpointId, timestamp);
-
-				// now draw the state snapshot
-				final StreamOperator<?>[] allOperators = operatorChain.getAllOperators();
-				final StreamTaskState[] states = new StreamTaskState[allOperators.length];
+	/**
+	 * Checkpoints all operator states of the current StreamTask. 
+	 * Thread-safety must be handled outside the scope of this function
+	 */
+	protected boolean checkpointStatesInternal(final long checkpointId, long timestamp) throws Exception {
--- End diff --

Regarding the JavaDocs:
- The idiomatic style is to have a short description and then a blank line 
for more details (the first line will be displayed as a summary in the IDE etc.)
- The `of the current StreamTask` is clear from context
- The Thread-safety part should be more explicit, for instance `The caller 
has to make sure to call this method in scope of the task's checkpoint lock`.


> Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
> ---
>
> Key: FLINK-3257
> URL: https://issues.apache.org/jira/browse/FLINK-3257
> Project: Flink
>  Issue Type: Improvement
>Reporter: Paris Carbone
>Assignee: Paris Carbone
>
> The current snapshotting algorithm cannot support cycles in the execution 
> graph. An alternative scheme can potentially include records in-transit 
> through the back-edges of a cyclic execution graph (ABS [1]) to achieve the 
> same guarantees.
> One straightforward implementation of ABS for cyclic graphs can work as 
> follows along the lines:
> 1) Upon triggering a barrier in an IterationHead from the TaskManager start 
> block output and start upstream backup of all records forwarded from the 
> respective IterationSink.
> 2) The IterationSink should eventually forward the current snapshotting epoch 
> barrier to the IterationSource.
> 3) Upon receiving a barrier from the IterationSink, the IterationSource 
> should finalize the snapshot, unblock its output and emit all records 
> in-transit in FIFO order and continue the usual execution.
> --
> Upon restart the IterationSource should emit all records from the injected 
> snapshot first and then continue its usual execution.
> Several optimisations and slight variations can be potentially achieved but 
> this can be the initial implementation take.
> [1] http://arxiv.org/abs/1506.08603



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3257) Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210263#comment-15210263
 ] 

ASF GitHub Bot commented on FLINK-3257:
---

Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1668#discussion_r57319737
  
--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java ---
@@ -450,112 +450,121 @@ else if (operator != null) {
 		}
 	}
 
-	@Override
-	public boolean triggerCheckpoint(final long checkpointId, final long timestamp) throws Exception {
-		LOG.debug("Starting checkpoint {} on task {}", checkpointId, getName());
-
-		synchronized (lock) {
-			if (isRunning) {
-
-				// since both state checkpointing and downstream barrier emission occurs in this
-				// lock scope, they are an atomic operation regardless of the order in which they occur
-				// we immediately emit the checkpoint barriers, so the downstream operators can start
-				// their checkpoint work as soon as possible
-				operatorChain.broadcastCheckpointBarrier(checkpointId, timestamp);
-
-				// now draw the state snapshot
-				final StreamOperator<?>[] allOperators = operatorChain.getAllOperators();
-				final StreamTaskState[] states = new StreamTaskState[allOperators.length];
+	/**
+	 * Checkpoints all operator states of the current StreamTask. 
+	 * Thread-safety must be handled outside the scope of this function
+	 */
+	protected boolean checkpointStatesInternal(final long checkpointId, long timestamp) throws Exception {
--- End diff --

What about naming this as in the comments `drawStateSnapshot`? That it is 
internal is more or less communicated by the fact that it is a `protected` 
method.


> Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
> ---
>
> Key: FLINK-3257
> URL: https://issues.apache.org/jira/browse/FLINK-3257
> Project: Flink
>  Issue Type: Improvement
>Reporter: Paris Carbone
>Assignee: Paris Carbone
>
> The current snapshotting algorithm cannot support cycles in the execution 
> graph. An alternative scheme can potentially include records in-transit 
> through the back-edges of a cyclic execution graph (ABS [1]) to achieve the 
> same guarantees.
> One straightforward implementation of ABS for cyclic graphs can work as 
> follows along the lines:
> 1) Upon triggering a barrier in an IterationHead from the TaskManager start 
> block output and start upstream backup of all records forwarded from the 
> respective IterationSink.
> 2) The IterationSink should eventually forward the current snapshotting epoch 
> barrier to the IterationSource.
> 3) Upon receiving a barrier from the IterationSink, the IterationSource 
> should finalize the snapshot, unblock its output and emit all records 
> in-transit in FIFO order and continue the usual execution.
> --
> Upon restart the IterationSource should emit all records from the injected 
> snapshot first and then continue its usual execution.
> Several optimisations and slight variations can be potentially achieved but 
> this can be the initial implementation take.
> [1] http://arxiv.org/abs/1506.08603



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3257] Add Exactly-Once Processing Guara...

2016-03-24 Thread uce
Github user uce commented on a diff in the pull request:

https://github.com/apache/flink/pull/1668#discussion_r57319737
  
--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java ---
@@ -450,112 +450,121 @@ else if (operator != null) {
 		}
 	}
 
-	@Override
-	public boolean triggerCheckpoint(final long checkpointId, final long timestamp) throws Exception {
-		LOG.debug("Starting checkpoint {} on task {}", checkpointId, getName());
-
-		synchronized (lock) {
-			if (isRunning) {
-
-				// since both state checkpointing and downstream barrier emission occurs in this
-				// lock scope, they are an atomic operation regardless of the order in which they occur
-				// we immediately emit the checkpoint barriers, so the downstream operators can start
-				// their checkpoint work as soon as possible
-				operatorChain.broadcastCheckpointBarrier(checkpointId, timestamp);
-
-				// now draw the state snapshot
-				final StreamOperator<?>[] allOperators = operatorChain.getAllOperators();
-				final StreamTaskState[] states = new StreamTaskState[allOperators.length];
+	/**
+	 * Checkpoints all operator states of the current StreamTask. 
+	 * Thread-safety must be handled outside the scope of this function
+	 */
+	protected boolean checkpointStatesInternal(final long checkpointId, long timestamp) throws Exception {
--- End diff --

What about naming this as in the comments `drawStateSnapshot`? That it is 
internal is more or less communicated by the fact that it is a `protected` 
method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3657) Change access of DataSetUtils.countElements() to 'public'

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210261#comment-15210261
 ] 

ASF GitHub Bot commented on FLINK-3657:
---

Github user smarthi commented on the pull request:

https://github.com/apache/flink/pull/1829#issuecomment-200848523
  
@uce @fhueske to add more context to this PR, we are in the final stretch of 
a planned 0.12.0 Mahout release that adds Flink as a distributed engine for 
Samsara linear algebra. I will be demoing the Mahout-Flink integration in my 
talk at ApacheCon, Vancouver, May 9-11. If this can't make it into 1.0.1, I guess 
we need to go with a redundant clone in Mahout until it becomes available in 
a future Flink release, which we would like to avoid.


> Change access of DataSetUtils.countElements() to 'public' 
> --
>
> Key: FLINK-3657
> URL: https://issues.apache.org/jira/browse/FLINK-3657
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.0.1
>
>
> The access of DataSetUtils.countElements() is presently 'private'; change 
> it to be 'public'. We happened to be replicating the functionality in our 
> project and realized the method already existed in Flink.
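
For context, a hypothetical usage once the method is public; the per-partition {{(subtask index, count)}} result shape is an assumption based on how the private helper backs {{zipWithIndex}}, not a confirmed signature.

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.DataSetUtils;

public class CountElementsExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Long> input = env.generateSequence(1, 1000);

        // Hypothetical once public: one (subtask index, element count) tuple per partition.
        DataSet<Tuple2<Integer, Long>> counts = DataSetUtils.countElements(input);
        counts.print();
    }
}
{code}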



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3657: Change access of DataSetUtils.coun...

2016-03-24 Thread smarthi
Github user smarthi commented on the pull request:

https://github.com/apache/flink/pull/1829#issuecomment-200848523
  
@uce @fhueske to add more context to this PR, we are in the final stretch of 
a planned 0.12.0 Mahout release that adds Flink as a distributed engine for 
Samsara linear algebra. I will be demoing the Mahout-Flink integration in my 
talk at ApacheCon, Vancouver, May 9-11. If this can't make it into 1.0.1, I guess 
we need to go with a redundant clone in Mahout until it becomes available in 
a future Flink release, which we would like to avoid.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3633] Fix user code de/serialization in...

2016-03-24 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1818#issuecomment-200830880
  
The other programs that are executed in `ClassLoaderITCase` are contained 
in the `org.apache.flink.test.classloading.jar` package (search for 
`KMeansForTest` for example). They are then packaged by the assembly plugin 
(see `pom.xml` of `flink-tests`). The `ClassLoaderITCase` sets up a cluster and 
submits the assembled JARs.

Can we just follow this setup?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3633) Job submission silently fails when using user code types

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210201#comment-15210201
 ] 

ASF GitHub Bot commented on FLINK-3633:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1818#issuecomment-200830880
  
The other programs that are executed in `ClassLoaderITCase` are contained 
in the `org.apache.flink.test.classloading.jar` package (search for 
`KMeansForTest` for example). They are then packaged by the assembly plugin 
(see `pom.xml` of `flink-tests`). The `ClassLoaderITCase` sets up a cluster and 
submits the assembled JARs.

Can we just follow this setup?


> Job submission silently fails when using user code types
> 
>
> Key: FLINK-3633
> URL: https://issues.apache.org/jira/browse/FLINK-3633
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Blocker
> Fix For: 1.1.0
>
>
> With the changes introduced by FLINK-3327, it is no longer possible to run 
> remote Flink jobs which work on user code types. The reason is that now the 
> {{ExecutionConfig}} is directly stored in the {{JobGraph}} which is sent as 
> an Akka message to the {{JobManager}}. By default, user code types are 
> automatically detected and registered in the {{ExecutionConfig}}. When 
> deserializing a {{JobGraph}} whose {{ExecutionConfig}} contains user code 
> classes, the user code class loader is consequently required. However, Akka 
> does not have access to it and uses the system class loader. This causes 
> Akka to silently discard the {{SubmitJob}} message, which cannot be deserialized 
> because of a {{ClassNotFoundException}}.
> I propose to not send the {{ExecutionConfig}} explicitly with the 
> {{JobGraph}} and, thus, to partially revert the changes to before FLINK-3327. 
> Before, the {{ExecutionConfig}} was serialized into the job configuration and 
> deserialized on the {{TaskManager}} using the proper user code class loader.
> In order to reproduce the problem you can submit the following job to a 
> remote cluster.
> {code}
> public class Job {
> 	public static class CustomType {
> 		private final int value;
> 
> 		public CustomType(int value) {
> 			this.value = value;
> 		}
> 
> 		@Override
> 		public String toString() {
> 			return "CustomType(" + value + ")";
> 		}
> 	}
> 
> 	public static void main(String[] args) throws Exception {
> 		ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(Address, Port, PathToJar);
> 		env.getConfig().disableAutoTypeRegistration();
> 		DataSet<Integer> input = env.fromElements(1, 2, 3, 4, 5);
> 		DataSet<CustomType> customTypes = input.map(new MapFunction<Integer, CustomType>() {
> 			@Override
> 			public CustomType map(Integer integer) throws Exception {
> 				return new CustomType(integer);
> 			}
> 		});
> 		customTypes.print();
> 	}
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3633) Job submission silently fails when using user code types

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210180#comment-15210180
 ] 

ASF GitHub Bot commented on FLINK-3633:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1818#issuecomment-200821794
  
I agree that having a dedicated class loading test would be nicer. However, 
we would need to build a jar containing the test job before executing the test, 
right? So the easiest solution for that would be a new module. But that is also 
really ugly. @uce do you have a better idea for that?


> Job submission silently fails when using user code types
> 
>
> Key: FLINK-3633
> URL: https://issues.apache.org/jira/browse/FLINK-3633
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Blocker
> Fix For: 1.1.0
>
>
> With the changes introduced by FLINK-3327, it is no longer possible to run 
> remote Flink jobs which work on user code types. The reason is that now the 
> {{ExecutionConfig}} is directly stored in the {{JobGraph}} which is sent as 
> an Akka message to the {{JobManager}}. By default, user code types are 
> automatically detected and registered in the {{ExecutionConfig}}. When 
> deserializing a {{JobGraph}} whose {{ExecutionConfig}} contains user code 
> classes, the user code class loader is consequently required. However, Akka 
> does not have access to it and uses the system class loader. This causes 
> Akka to silently discard the {{SubmitJob}} message, which cannot be deserialized 
> because of a {{ClassNotFoundException}}.
> I propose to not send the {{ExecutionConfig}} explicitly with the 
> {{JobGraph}} and, thus, to partially revert the changes to before FLINK-3327. 
> Before, the {{ExecutionConfig}} was serialized into the job configuration and 
> deserialized on the {{TaskManager}} using the proper user code class loader.
> In order to reproduce the problem you can submit the following job to a 
> remote cluster.
> {code}
> public class Job {
> 	public static class CustomType {
> 		private final int value;
> 
> 		public CustomType(int value) {
> 			this.value = value;
> 		}
> 
> 		@Override
> 		public String toString() {
> 			return "CustomType(" + value + ")";
> 		}
> 	}
> 
> 	public static void main(String[] args) throws Exception {
> 		ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(Address, Port, PathToJar);
> 		env.getConfig().disableAutoTypeRegistration();
> 		DataSet<Integer> input = env.fromElements(1, 2, 3, 4, 5);
> 		DataSet<CustomType> customTypes = input.map(new MapFunction<Integer, CustomType>() {
> 			@Override
> 			public CustomType map(Integer integer) throws Exception {
> 				return new CustomType(integer);
> 			}
> 		});
> 		customTypes.print();
> 	}
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3633] Fix user code de/serialization in...

2016-03-24 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1818#issuecomment-200821794
  
I agree that having a dedicated class loading test would be nicer. However, 
we would need to build a jar containing the test job before executing the test, 
right? So the easiest solution for that would be a new module. But that is also 
really ugly. @uce do you have a better idea for that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-2935) Allow scala shell to read yarn properties

2016-03-24 Thread Chiwan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chiwan Park closed FLINK-2935.
--
   Resolution: Implemented
Fix Version/s: 1.1.0

Implemented via 5108f6875f102e613d70cc90fd13d269465f2bff.

> Allow scala shell to read yarn properties
> -
>
> Key: FLINK-2935
> URL: https://issues.apache.org/jira/browse/FLINK-2935
> Project: Flink
>  Issue Type: Improvement
>  Components: Scala Shell
>Affects Versions: 0.9.1
>Reporter: Johannes
>Assignee: Chiwan Park
>Priority: Minor
>  Labels: easyfix
> Fix For: 1.1.0
>
>
> Currently, the deployment of Flink via YARN and the Scala shell are not linked.
> When deploying a YARN session, the class
> bq. org.apache.flink.client.CliFrontend
> creates a 
> bq. .yarn-properties-$username
> file with the connection properties.
> There should be a way to have the Scala shell automatically read this file if 
> wanted as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2935) Allow scala shell to read yarn properties

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210165#comment-15210165
 ] 

ASF GitHub Bot commented on FLINK-2935:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1500


> Allow scala shell to read yarn properties
> -
>
> Key: FLINK-2935
> URL: https://issues.apache.org/jira/browse/FLINK-2935
> Project: Flink
>  Issue Type: Improvement
>  Components: Scala Shell
>Affects Versions: 0.9.1
>Reporter: Johannes
>Assignee: Chiwan Park
>Priority: Minor
>  Labels: easyfix
>
> Currently, the deployment of Flink via YARN and the Scala shell are not linked.
> When deploying a YARN session, the class
> bq. org.apache.flink.client.CliFrontend
> creates a 
> bq. .yarn-properties-$username
> file with the connection properties.
> There should be a way to have the Scala shell automatically read this file if 
> wanted as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2935] [scala-shell] Allow Scala shell t...

2016-03-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1500


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2935) Allow scala shell to read yarn properties

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210134#comment-15210134
 ] 

ASF GitHub Bot commented on FLINK-2935:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1500#issuecomment-200804302
  
Oh, thanks for pointing it out, @mxm. Then I'll merge this.


> Allow scala shell to read yarn properties
> -
>
> Key: FLINK-2935
> URL: https://issues.apache.org/jira/browse/FLINK-2935
> Project: Flink
>  Issue Type: Improvement
>  Components: Scala Shell
>Affects Versions: 0.9.1
>Reporter: Johannes
>Assignee: Chiwan Park
>Priority: Minor
>  Labels: easyfix
>
> Currently, the deployment of Flink via YARN and the Scala shell are not linked.
> When deploying a YARN session, the class
> bq. org.apache.flink.client.CliFrontend
> creates a 
> bq. .yarn-properties-$username
> file with the connection properties.
> There should be a way to have the Scala shell automatically read this file if 
> wanted as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3657) Change access of DataSetUtils.countElements() to 'public'

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210133#comment-15210133
 ] 

ASF GitHub Bot commented on FLINK-3657:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1829#issuecomment-200804264
  
@uce, API stability is not an issue. The method was `private` before and 
hence not part of the accessible API. IMO, the question is rather whether we want to add 
new features in a bugfix release.


> Change access of DataSetUtils.countElements() to 'public' 
> --
>
> Key: FLINK-3657
> URL: https://issues.apache.org/jira/browse/FLINK-3657
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.0.1
>
>
> The access of DataSetUtils.countElements() is presently 'private'; change 
> it to be 'public'. We happened to be replicating the functionality in our 
> project and realized the method already existed in Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2935] [scala-shell] Allow Scala shell t...

2016-03-24 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/1500#issuecomment-200804302
  
Oh, thanks for pointing it out, @mxm. Then I'll merge this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: FLINK-3657: Change access of DataSetUtils.coun...

2016-03-24 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1829#issuecomment-200804264
  
@uce, API stability is not an issue. The method was `private` before and 
hence not part of the accessible API. IMO, the question is rather whether we want to add 
new features in a bugfix release.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3639) Add methods and utilities to register DataSets and Tables in the TableEnvironment

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210129#comment-15210129
 ] 

ASF GitHub Bot commented on FLINK-3639:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1827#issuecomment-200803230
  
Changes look good! +1 to merge


> Add methods and utilities to register DataSets and Tables in the 
> TableEnvironment
> -
>
> Key: FLINK-3639
> URL: https://issues.apache.org/jira/browse/FLINK-3639
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> In order to make tables queryable from SQL we need to register them under a 
> unique name in the TableEnvironment.
> [This design 
> document|https://docs.google.com/document/d/1sITIShmJMGegzAjGqFuwiN_iw1urwykKsLiacokxSw0/edit]
>  describes the proposed API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3639] add methods for registering datas...

2016-03-24 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1827#issuecomment-200803230
  
Changes look good! +1 to merge


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: FLINK-3657: Change access of DataSetUtils.coun...

2016-03-24 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1829#issuecomment-200803091
  
As a side note (I didn't look at the code changes): We can't rename the 
method for `1.0.1` as `DataSetUtils` is annotated with `PublicEvolving`, 
meaning that we are only allowed to make changes between minor versions, that is, for 
`1.1.0`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3657) Change access of DataSetUtils.countElements() to 'public'

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210127#comment-15210127
 ] 

ASF GitHub Bot commented on FLINK-3657:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1829#issuecomment-200803091
  
As a side note (I didn't look at the code changes): We can't rename the 
method for `1.0.1` as `DataSetUtils` is annotated with `PublicEvolving`, 
meaning that we are only allowed to make changes between minor versions, that is, for 
`1.1.0`.


> Change access of DataSetUtils.countElements() to 'public' 
> --
>
> Key: FLINK-3657
> URL: https://issues.apache.org/jira/browse/FLINK-3657
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.0.1
>
>
> The access of DataSetUtils.countElements() is presently 'private'; change 
> it to be 'public'. We happened to be replicating the functionality in our 
> project and realized the method already existed in Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3544] Introduce ResourceManager compone...

2016-03-24 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1741#issuecomment-200803051
  
Continuing my review now, but don't block this on me. If I find anything 
crucial, I will open a pull request against master if it is merged by then.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3544) ResourceManager runtime components

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210126#comment-15210126
 ] 

ASF GitHub Bot commented on FLINK-3544:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1741#issuecomment-200803051
  
Continuing my review now, but don't block this on me. If I find anything 
crucial, I will open a pull request against master if it is merged by then.


> ResourceManager runtime components
> --
>
> Key: FLINK-3544
> URL: https://issues.apache.org/jira/browse/FLINK-3544
> Project: Flink
>  Issue Type: Sub-task
>  Components: ResourceManager
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3547] add support for streaming filter,...

2016-03-24 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1820#issuecomment-200802410
  
Thanks for the update! +1 to merge :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3547) Add support for streaming projection, selection, and union

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210123#comment-15210123
 ] 

ASF GitHub Bot commented on FLINK-3547:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1820#issuecomment-200802410
  
Thanks for the update! +1 to merge :-)


> Add support for streaming projection, selection, and union
> --
>
> Key: FLINK-3547
> URL: https://issues.apache.org/jira/browse/FLINK-3547
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3633) Job submission silently fails when using user code types

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210119#comment-15210119
 ] 

ASF GitHub Bot commented on FLINK-3633:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1818#issuecomment-200802077
  
The changes look good and make the parts where user code is serialized very 
clear. :+1: 

I've verified that the `ScalaShellITCase`, with the change in this PR, fails 
on the current master. I'm wondering whether it is possible to make the test 
for this more explicit by adding a JAR with the example program outlined in the 
JIRA issue to the `ClassLoaderITCase`.

In any case, +1 to merge. It's your call whether you add a further class 
loader test or not.





> Job submission silently fails when using user code types
> 
>
> Key: FLINK-3633
> URL: https://issues.apache.org/jira/browse/FLINK-3633
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Till Rohrmann
>Priority: Blocker
> Fix For: 1.1.0
>
>
> With the changes introduced by FLINK-3327, it is no longer possible to run 
> remote Flink jobs which work on user code types. The reason is that now the 
> {{ExecutionConfig}} is directly stored in the {{JobGraph}} which is sent as 
> an Akka message to the {{JobManager}}. By default, user code types are 
> automatically detected and registered in the {{ExecutionConfig}}. When 
> deserializing a {{JobGraph}} whose {{ExecutionConfig}} contains user code 
> classes, the user code class loader is consequently required. However, Akka 
> does not have access to it and uses the system class loader. This causes 
> Akka to silently discard the {{SubmitJob}} message, which cannot be deserialized 
> because of a {{ClassNotFoundException}}.
> I propose to not send the {{ExecutionConfig}} explicitly with the 
> {{JobGraph}} and, thus, to partially revert the changes to before FLINK-3327. 
> Before, the {{ExecutionConfig}} was serialized into the job configuration and 
> deserialized on the {{TaskManager}} using the proper user code class loader.
> In order to reproduce the problem you can submit the following job to a 
> remote cluster.
> {code}
> public class Job {
> 	public static class CustomType {
> 		private final int value;
> 
> 		public CustomType(int value) {
> 			this.value = value;
> 		}
> 
> 		@Override
> 		public String toString() {
> 			return "CustomType(" + value + ")";
> 		}
> 	}
> 
> 	public static void main(String[] args) throws Exception {
> 		ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(Address, Port, PathToJar);
> 		env.getConfig().disableAutoTypeRegistration();
> 		DataSet<Integer> input = env.fromElements(1, 2, 3, 4, 5);
> 		DataSet<CustomType> customTypes = input.map(new MapFunction<Integer, CustomType>() {
> 			@Override
> 			public CustomType map(Integer integer) throws Exception {
> 				return new CustomType(integer);
> 			}
> 		});
> 		customTypes.print();
> 	}
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3633] Fix user code de/serialization in...

2016-03-24 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1818#issuecomment-200802077
  
The changes look good and make the parts where user code is serialized very 
clear. :+1: 

I've verified that the `ScalaShellITCase`, with the change in this PR, fails 
on the current master. I'm wondering whether it is possible to make the test 
for this more explicit by adding a JAR with the example program outlined in the 
JIRA issue to the `ClassLoaderITCase`.

In any case, +1 to merge. It's your call whether you add a further class 
loader test or not.





---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2946) Add orderBy() to Table API

2016-03-24 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210118#comment-15210118
 ] 

Fabian Hueske commented on FLINK-2946:
--

Hi [~dawidwys], thanks a lot for working on this issue!
I had a look at your branch. You're definitely on the right track. Here are a 
few comments:

- The Table API syntax looks good.
- In {{Table.orderBy()}} you should not extract aggregations, etc. Instead, 
check that the expressions match the following patterns ({{Table.as()}} does 
similar checks; see the sketch after this list):
-- {{UnresolvedFieldReference}}
-- {{Asc(UnresolvedFieldReference)}}
-- {{Desc(UnresolvedFieldReference)}}
-- We can add support for more complex expressions and order by position later.
- Add asc() to {{RexNodeTranslator}}.
- I just realized that Flink's range partitioning lacks support to define sort 
orders for partition keys. We need to add this to make global sorting work 
correctly. I added FLINK-3665 to address this issue.
- We do not need to range-partition if the parallelism of the input is 1 (check 
{{inputDs.getParallelism() == 1}}).

I'll be out for vacation for about two weeks. Not sure if I can follow up on 
this until I am back.
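
A self-contained Java sketch of such a pattern check; the expression class names mirror the comment above, while the placeholder types and accessors are assumptions for illustration, not the actual Table API internals.

{code}
public class OrderByValidationSketch {

    // Tiny placeholder types standing in for the real Table API expression classes.
    interface Expression {}
    static class UnresolvedFieldReference implements Expression {
        final String name;
        UnresolvedFieldReference(String name) { this.name = name; }
    }
    static class Asc implements Expression {
        final Expression child;
        Asc(Expression child) { this.child = child; }
    }
    static class Desc implements Expression {
        final Expression child;
        Desc(Expression child) { this.child = child; }
    }

    // Accept only field references, optionally wrapped in Asc(...) or Desc(...).
    static void validateOrderBy(Expression... exprs) {
        for (Expression expr : exprs) {
            boolean valid = expr instanceof UnresolvedFieldReference
                || (expr instanceof Asc && ((Asc) expr).child instanceof UnresolvedFieldReference)
                || (expr instanceof Desc && ((Desc) expr).child instanceof UnresolvedFieldReference);
            if (!valid) {
                throw new IllegalArgumentException(
                    "orderBy() only supports (Asc/Desc-wrapped) field references: " + expr);
            }
        }
    }

    public static void main(String[] args) {
        validateOrderBy(new Asc(new UnresolvedFieldReference("a")), new UnresolvedFieldReference("b"));  // ok
        try {
            validateOrderBy(new Asc(new Asc(new UnresolvedFieldReference("a"))));  // rejected
        } catch (IllegalArgumentException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
{code}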

> Add orderBy() to Table API
> --
>
> Key: FLINK-2946
> URL: https://issues.apache.org/jira/browse/FLINK-2946
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Reporter: Timo Walther
>Assignee: Dawid Wysakowicz
>
> In order to implement a FLINK-2099 prototype that uses the Table API's code 
> generation facilities, the Table API needs a sorting feature.
> I would implement it in the next days. Ideas on how to implement such a sorting 
> feature are very welcome. Is there a more efficient way than 
> {{.sortPartition(...).setParallelism(1)}}? Is it better to sort locally on the 
> nodes first and finally sort on one node afterwards?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3665) Range partitioning lacks support to define sort orders

2016-03-24 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-3665:


 Summary: Range partitioning lacks support to define sort orders
 Key: FLINK-3665
 URL: https://issues.apache.org/jira/browse/FLINK-3665
 Project: Flink
  Issue Type: Improvement
  Components: DataSet API
Affects Versions: 1.0.0
Reporter: Fabian Hueske
 Fix For: 1.1.0


{{DataSet.partitionByRange()}} does not allow specifying the sort order of 
fields. This is fine if range partitioning is used to counter skewed 
partitions. 
However, it is not sufficient if range partitioning is used to sort a data set 
in parallel. 

Since {{DataSet.partitionByRange()}} is {{@Public}} API and cannot be easily 
changed, I propose to add a method {{withOrders(Order... orders)}} to 
{{PartitionOperator}}. The method should throw an exception if the partitioning 
method of {{PartitionOperator}} is not range partitioning.
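
A sketch of how the proposed method could be used for a parallel global sort; {{withOrders()}} does not exist yet, so this is illustrative only and will not compile against current Flink.

{code}
import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class RangeSortSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple2<Integer, String>> data = env.fromElements(
                Tuple2.of(3, "c"), Tuple2.of(1, "a"), Tuple2.of(2, "b"));

        data.partitionByRange(0)
            // Proposed in this issue, not yet available: align the range boundaries
            // with a sort direction (and throw for non-range partitioning).
            .withOrders(Order.DESCENDING)
            // A local sort per partition then yields a globally sorted result.
            .sortPartition(0, Order.DESCENDING)
            .print();
    }
}
{code}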



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2821) Change Akka configuration to allow accessing actors from different URLs

2016-03-24 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-2821:
--
Fix Version/s: 1.1.0

> Change Akka configuration to allow accessing actors from different URLs
> ---
>
> Key: FLINK-2821
> URL: https://issues.apache.org/jira/browse/FLINK-2821
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Reporter: Robert Metzger
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Akka expects the actor's URL to match exactly.
> Cases where users were complaining about this have been pointed out here: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-trying-to-access-JM-through-proxy-td3018.html
>   - Proxy routing (as described there: send to the proxy URL, the receiver 
> recognizes only the original URL)
>   - Using hostname / IP interchangeably does not work (we solved this by 
> always putting IP addresses into URLs, never hostnames)
>   - Binding to multiple interfaces (any local 0.0.0.0) does not work. Still 
> no solution to that (but it seems not too much of a restriction)
> I am aware that this is not possible due to Akka, so it is actually not a 
> Flink bug. But I think we should track the resolution of the issue here 
> anyway because it's affecting our users' satisfaction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3651) Fix faulty RollingSink Restore

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210100#comment-15210100
 ] 

ASF GitHub Bot commented on FLINK-3651:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1830#issuecomment-200796710
  
I think the test fails only sporadically, depending on the speed at which 
tasks are deleting the files.
If you are lucky, everything is correct.


> Fix faulty RollingSink Restore
> --
>
> Key: FLINK-3651
> URL: https://issues.apache.org/jira/browse/FLINK-3651
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> The RollingSink restore logic has a bug where the sink for subtask index 1 
> also removes files for subtask index 11 because the regex that checks for the 
> file name also matches that one. Adding the suffix to the regex should solve 
> the problem because then the regex for 1 will only match files for subtask 
> index 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3651] Fix faulty RollingSink Restore

2016-03-24 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1830#issuecomment-200796710
  
I think the test fails only sporadically, depending on the speed at which 
tasks are deleting the files.
If you are lucky, everything is correct.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3651) Fix faulty RollingSink Restore

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210096#comment-15210096
 ] 

ASF GitHub Bot commented on FLINK-3651:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1830#issuecomment-200795802
  
Good catch! The changes look good to me.

I tried running the adjusted `RollingSinkFaultToleranceITCase` and 
`RollingSinkFaultTolerance2ITCase` w/o the fix in `RollingSink` and they still 
succeeded.

Is this expected? Is there a way we can test this differently?


> Fix faulty RollingSink Restore
> --
>
> Key: FLINK-3651
> URL: https://issues.apache.org/jira/browse/FLINK-3651
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> The RollingSink restore logic has a bug where the sink for subtask index 1 
> also removes files for subtask index 11 because the regex that checks for the 
> file name also matches that one. Adding the suffix to the regex should solve 
> the problem because then the regex for 1 will only match files for subtask 
> index 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3651] Fix faulty RollingSink Restore

2016-03-24 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1830#issuecomment-200795802
  
Good catch! The changes look good to me.

I tried running the adjusted `RollingSinkFaultToleranceITCase` and 
`RollingSinkFaultTolerance2ITCase` w/o the fix in `RollingSink` and they still 
succeeded.

Is this expected? Is there a way we can test this differently?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3524) Provide a JSONDeserialisationSchema in the kafka connector package

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210093#comment-15210093
 ] 

ASF GitHub Bot commented on FLINK-3524:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1834#issuecomment-200795303
  
+1 to merge


> Provide a JSONDeserialisationSchema in the kafka connector package
> --
>
> Key: FLINK-3524
> URL: https://issues.apache.org/jira/browse/FLINK-3524
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>  Labels: starter
>
> (I don't want to include this in 1.0.0)
> Currently, there is no standardized way of parsing JSON data from a Kafka 
> stream. I see a lot of users using JSON in their topics. It would make things 
> easier for our users to provide a deserialization schema for them.
> I suggest using the Jackson library because we already have it as a 
> dependency in Flink and it allows parsing from a byte[].
> I would suggest providing the following classes:
>  - JSONDeserializationSchema()
>  - JSONKeyValueDeserializationSchema(bool includeMetadata)
> The second variant should produce a record like this:
> {code}
> {"key": "keydata",
> "value": "valuedata",
> "metadata": {"offset": 123, "topic": "", "partition": 2 }
> }
> {code}
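
A minimal sketch of how such a schema could parse the raw bytes with Jackson; the class name and shape are illustrative, not the final API:

{code}
import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

// Illustrative only: turns a Kafka record's byte[] payload into a Jackson tree.
public class JsonDeserializationSketch {

    private final ObjectMapper mapper = new ObjectMapper();

    public ObjectNode deserialize(byte[] message) throws IOException {
        // Jackson can read directly from a byte[], as noted in the issue.
        // The cast assumes the root of each record is a JSON object.
        return (ObjectNode) mapper.readTree(message);
    }
}
{code}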



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3524] [kafka] Add JSONDeserializationSc...

2016-03-24 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1834#issuecomment-200795303
  
+1 to merge


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3658) Allow the FlinkKafkaProducer to send data to multiple topics

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210088#comment-15210088
 ] 

ASF GitHub Bot commented on FLINK-3658:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1832


> Allow the FlinkKafkaProducer to send data to multiple topics
> 
>
> Key: FLINK-3658
> URL: https://issues.apache.org/jira/browse/FLINK-3658
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 1.1.0
>
>
> Currently, the FlinkKafkaProducer sends all events to one topic defined 
> when creating the producer.
> We could allow users to send messages to multiple topics by extending the 
> {{KeyedSerializationSchema}} with a method {{public String getTargetTopic(T 
> element)}} which overrides the default topic if the return value is not null.
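
A rough sketch of what a routing schema could look like once the method exists; it assumes the Kafka connector's {{KeyedSerializationSchema}} interface with the proposed method added, and the routing rule is made up for illustration:

{code}
import java.nio.charset.StandardCharsets;

import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;

// Routes each record to a per-element topic; returning null keeps the
// producer's default topic, as described in the issue.
public class TopicRoutingSchema implements KeyedSerializationSchema<String> {

    @Override
    public byte[] serializeKey(String element) {
        return null; // records without a key
    }

    @Override
    public byte[] serializeValue(String element) {
        return element.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String getTargetTopic(String element) {
        // Hypothetical rule: send alerts to a dedicated topic.
        return element.startsWith("ALERT") ? "alerts" : null;
    }
}
{code}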



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3658) Allow the FlinkKafkaProducer to send data to multiple topics

2016-03-24 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-3658.
---
   Resolution: Fixed
Fix Version/s: 1.1.0

Resolved in master for 1.1.0 
http://git-wip-us.apache.org/repos/asf/flink/commit/c77e5ece

> Allow the FlinkKafkaProducer to send data to multiple topics
> 
>
> Key: FLINK-3658
> URL: https://issues.apache.org/jira/browse/FLINK-3658
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 1.1.0
>
>
> Currently, the FlinkKafkaProducer sends all events to one topic defined 
> when creating the producer.
> We could allow users to send messages to multiple topics by extending the 
> {{KeyedSerializationSchema}} with a method {{public String getTargetTopic(T 
> element)}} which overrides the default topic if the return value is not null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3658][Kafka] Allow producing into multi...

2016-03-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1832


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2609] [streaming] auto-register types

2016-03-24 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1833#issuecomment-200793560
  
+1 to merge


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2609) Automatic type registration is only called from the batch execution environment

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210086#comment-15210086
 ] 

ASF GitHub Bot commented on FLINK-2609:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1833#issuecomment-200793560
  
+1 to merge


> Automatic type registration is only called from the batch execution 
> environment
> ---
>
> Key: FLINK-2609
> URL: https://issues.apache.org/jira/browse/FLINK-2609
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10.0
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
>
> Kryo types in the streaming API are quite expensive to serialize because they 
> are not automatically registered with Kryo.
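
Until the streaming environment performs this registration automatically, types can be registered by hand; a minimal sketch, with MyPojo standing in for a user type:

{code}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KryoRegistrationExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Registering the type up front gives it a compact Kryo id instead of
        // writing the full class name with every record.
        env.getConfig().registerKryoType(MyPojo.class);
    }

    // Placeholder for a user type that falls back to Kryo serialization.
    public static class MyPojo {
        public java.util.HashMap<String, Integer> counts;
    }
}
{code}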



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3655) Allow comma-separated or multiple directories to be specified for FileInputFormat

2016-03-24 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210081#comment-15210081
 ] 

Robert Metzger commented on FLINK-3655:
---

Thank you for opening a JIRA for this feature request.
I think it's a good idea and it shouldn't be too difficult to implement.

> Allow comma-separated or multiple directories to be specified for 
> FileInputFormat
> -
>
> Key: FLINK-3655
> URL: https://issues.apache.org/jira/browse/FLINK-3655
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Gna Phetsarath
>Priority: Minor
>  Labels: starter
>
> Allow comma-separated or multiple directories to be specified for 
> FileInputFormat so that a DataSource will process the directories 
> sequentially.
>
> env.readFile("/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/*")
> in Scala
>env.readFile(paths: Seq[String])
> or 
>   env.readFile(path: String, otherPaths: String*)
> Wildcard support would be a bonus.
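
Until such an API exists, a similar effect can be sketched by unioning one source per directory; note this reads the directories as one parallel job rather than strictly sequentially (paths are placeholders):

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class MultiPathRead {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // One source per directory, unioned into a single DataSet.
        DataSet<String> data = env.readTextFile("/data/2016/01/01")
                .union(env.readTextFile("/data/2016/01/02"))
                .union(env.readTextFile("/data/2016/01/03"));

        data.print();
    }
}
{code}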



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3655) Allow comma-separated or multiple directories to be specified for FileInputFormat

2016-03-24 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-3655:
--
Labels: starter  (was: )

> Allow comma-separated or multiple directories to be specified for 
> FileInputFormat
> -
>
> Key: FLINK-3655
> URL: https://issues.apache.org/jira/browse/FLINK-3655
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Gna Phetsarath
>Priority: Minor
>  Labels: starter
>
> Allow comma-separated or multiple directories to be specified for 
> FileInputFormat so that a DataSource will process the directories 
> sequentially.
>
> env.readFile("/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/*")
> in Scala
>env.readFile(paths: Seq[String])
> or 
>   env.readFile(path: String, otherPaths: String*)
> Wildcard support would be a bonus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3663) FlinkKafkaConsumerBase.logPartitionInfo is missing a log marker

2016-03-24 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3663.
--
   Resolution: Fixed
Fix Version/s: 1.0.1
   1.1.0

Fixed in f0b6ba4 (master), 085f7b2 (release-1.0).

> FlinkKafkaConsumerBase.logPartitionInfo is missing a log marker
> ---
>
> Key: FLINK-3663
> URL: https://issues.apache.org/jira/browse/FLINK-3663
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Niels Zeilemaker
>Priority: Trivial
> Fix For: 1.1.0, 1.0.1
>
>
> While debugging a Flink Kafka app I noticed that the logPartitionInfo method 
> is broken. It's missing a log marker, and hence the string buffer is never logged.
> I can create a pull request fixing the problem if necessary.
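
The bug pattern, reduced to a minimal standalone sketch (logger name and message are illustrative):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LogMarkerDemo {
    private static final Logger LOG = LoggerFactory.getLogger(LogMarkerDemo.class);

    public static void main(String[] args) {
        StringBuilder partitions = new StringBuilder("topic-0, topic-1");

        // Broken: no "{}" placeholder, so SLF4J silently drops the argument.
        LOG.info("Consumer is reading partitions: ", partitions);

        // Fixed: the placeholder makes the buffer appear in the output.
        LOG.info("Consumer is reading partitions: {}", partitions);
    }
}
{code}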



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3663) FlinkKafkaConsumerBase.logPartitionInfo is missing a log marker

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210074#comment-15210074
 ] 

ASF GitHub Bot commented on FLINK-3663:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1835


> FlinkKafkaConsumerBase.logPartitionInfo is missing a log marker
> ---
>
> Key: FLINK-3663
> URL: https://issues.apache.org/jira/browse/FLINK-3663
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Niels Zeilemaker
>Priority: Trivial
>
> While debugging a Flink Kafka app I noticed that the logPartitionInfo method 
> is broken. It's missing a log marker, and hence the string buffer is never logged.
> I can create a pull request fixing the problem if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3663: FlinkKafkaConsumerBase.logPartitio...

2016-03-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1835


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3663) FlinkKafkaConsumerBase.logPartitionInfo is missing a log marker

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210072#comment-15210072
 ] 

ASF GitHub Bot commented on FLINK-3663:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1835#issuecomment-200787034
  
I'm merging this


> FlinkKafkaConsumerBase.logPartitionInfo is missing a log marker
> ---
>
> Key: FLINK-3663
> URL: https://issues.apache.org/jira/browse/FLINK-3663
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.0.0
>Reporter: Niels Zeilemaker
>Priority: Trivial
>
> While debugging a Flink Kafka app I noticed that the logPartitionInfo method 
> is broken. It's missing a log marker, and hence the string buffer is never logged.
> I can create a pull request fixing the problem if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3653) recovery.zookeeper.storageDir is not documented on the configuration page

2016-03-24 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi closed FLINK-3653.
--
   Resolution: Fixed
Fix Version/s: 1.0.1

Fixed in 5357ebe (release-1.0), 27dfd86 (master).

> recovery.zookeeper.storageDir is not documented on the configuration page
> -
>
> Key: FLINK-3653
> URL: https://issues.apache.org/jira/browse/FLINK-3653
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Assignee: Stefano Baghino
>Priority: Minor
> Fix For: 1.1.0, 1.0.1
>
>
> The {{recovery.zookeeper.storageDir}} option is documented on the HA page but 
> is missing from the configuration page. Since it's required for HA, I think it 
> would be a good idea to have it on both pages.
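
For reference, a minimal sketch of setting the option programmatically; the storage path is a placeholder:

{code}
import org.apache.flink.configuration.Configuration;

public class HaConfigSketch {
    public static void main(String[] args) {
        Configuration config = new Configuration();

        // ZooKeeper-based HA needs a file system path where JobManager
        // metadata is persisted; ZooKeeper itself only keeps a pointer to it.
        config.setString("recovery.mode", "zookeeper");
        config.setString("recovery.zookeeper.storageDir", "hdfs:///flink/recovery");
    }
}
{code}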



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3653) recovery.zookeeper.storageDir is not documented on the configuration page

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210070#comment-15210070
 ] 

ASF GitHub Bot commented on FLINK-3653:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1828


> recovery.zookeeper.storageDir is not documented on the configuration page
> -
>
> Key: FLINK-3653
> URL: https://issues.apache.org/jira/browse/FLINK-3653
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Assignee: Stefano Baghino
>Priority: Minor
> Fix For: 1.1.0, 1.0.1
>
>
> The {{recovery.zookeeper.storageDir}} option is documented on the HA page but 
> is missing from the configuration page. Since it's required for HA, I think it 
> would be a good idea to have it on both pages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3653] recovery.zookeeper.storageDir is ...

2016-03-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1828


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3653) recovery.zookeeper.storageDir is not documented on the configuration page

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210068#comment-15210068
 ] 

ASF GitHub Bot commented on FLINK-3653:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1828#issuecomment-200786158
  
Thanks! I'm merging this.


> recovery.zookeeper.storageDir is not documented on the configuration page
> -
>
> Key: FLINK-3653
> URL: https://issues.apache.org/jira/browse/FLINK-3653
> Project: Flink
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Assignee: Stefano Baghino
>Priority: Minor
> Fix For: 1.1.0
>
>
> The {{recovery.zookeeper.storageDir}} option is documented on the HA page but 
> is missing from the configuration page. Since it's required for HA, I think it 
> would be a good idea to have it on both pages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3653] recovery.zookeeper.storageDir is ...

2016-03-24 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/1828#issuecomment-200786158
  
Thanks! I'm merging this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3547] add support for streaming filter,...

2016-03-24 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1820#issuecomment-200784086
  
Thanks. I've addressed your comments!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3547) Add support for streaming projection, selection, and union

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210065#comment-15210065
 ] 

ASF GitHub Bot commented on FLINK-3547:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1820#issuecomment-200784086
  
Thanks. I've addressed your comments!


> Add support for streaming projection, selection, and union
> --
>
> Key: FLINK-3547
> URL: https://issues.apache.org/jira/browse/FLINK-3547
> Project: Flink
>  Issue Type: Sub-task
>  Components: Table API
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2935) Allow scala shell to read yarn properties

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210043#comment-15210043
 ] 

ASF GitHub Bot commented on FLINK-2935:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1500#issuecomment-200770572
  
Thanks @chiwanpark! As far as I can see, there are no overlapping classes 
between the two pull requests.


> Allow scala shell to read yarn properties
> -
>
> Key: FLINK-2935
> URL: https://issues.apache.org/jira/browse/FLINK-2935
> Project: Flink
>  Issue Type: Improvement
>  Components: Scala Shell
>Affects Versions: 0.9.1
>Reporter: Johannes
>Assignee: Chiwan Park
>Priority: Minor
>  Labels: easyfix
>
> Currently the deployment of Flink via YARN and the Scala shell are not linked.
> When deploying a YARN session, the class
> bq. org.apache.flink.client.CliFrontend
> creates a 
> bq. .yarn-properties-$username
> file with the connection properties.
> There should be a way to have the Scala shell automatically read this file if 
> wanted as well.
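
For context, assuming the file keeps its java.util.Properties format, reading it amounts to something like the following sketch (the path and the key name are placeholders):

{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class YarnPropertiesReader {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();

        // Placeholder path; the real file is written per user by the CLI.
        try (FileInputStream in = new FileInputStream("/tmp/.yarn-properties-alice")) {
            props.load(in);
        }

        // Hypothetical key name, for illustration only.
        System.out.println(props.getProperty("jobManager"));
    }
}
{code}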



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2935] [scala-shell] Allow Scala shell t...

2016-03-24 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1500#issuecomment-200770572
  
Thanks @chiwanpark! As far as I can see, there are no overlapping classes 
between the two pull requests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-3544] Introduce ResourceManager compone...

2016-03-24 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1741#issuecomment-200769986
  
I've incorporated the changes and the tests pass. I would like to merge the 
pull request. Please let me know if there are still pending code reviews.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3639) Add methods and utilities to register DataSets and Tables in the TableEnvironment

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210039#comment-15210039
 ] 

ASF GitHub Bot commented on FLINK-3639:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1827#issuecomment-200769959
  
Thanks. I've addressed your comments!


> Add methods and utilities to register DataSets and Tables in the 
> TableEnvironment
> -
>
> Key: FLINK-3639
> URL: https://issues.apache.org/jira/browse/FLINK-3639
> Project: Flink
>  Issue Type: New Feature
>  Components: Table API
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>
> In order to make tables queryable from SQL we need to register them under a 
> unique name in the TableEnvironment.
> [This design 
> document|https://docs.google.com/document/d/1sITIShmJMGegzAjGqFuwiN_iw1urwykKsLiacokxSw0/edit]
>  describes the proposed API.
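
A rough sketch of the registration flow the design document proposes; the runnable part below only builds a DataSet, while the registration calls are quoted from the proposal and may change before they land:

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class TableRegistrationSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Integer, String>> orders = env.fromElements(
                new Tuple2<>(1, "apples"), new Tuple2<>(2, "pears"));

        // Proposed usage per the design document (names may change):
        //   TableEnvironment tableEnv = ...; // obtained for this environment
        //   tableEnv.registerDataSet("orders", orders);
        //   Table result = tableEnv.sql("SELECT * FROM orders");
    }
}
{code}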



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3544) ResourceManager runtime components

2016-03-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210037#comment-15210037
 ] 

ASF GitHub Bot commented on FLINK-3544:
---

Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/1741#discussion_r57295865
  
--- Diff: flink-core/src/main/java/org/apache/flink/configuration/ConfigConstants.java ---
@@ -240,6 +246,37 @@
 	 */
 	public static final String FS_STREAM_OPENING_TIMEOUT_KEY = "taskmanager.runtime.fs_timeout";
 
+	// ------ Common Resource Framework Configuration (YARN & Mesos) ------
+
+	/**
+	 * Percentage of heap space to remove from containers (YARN / Mesos), to compensate
+	 * for other JVM memory usage.
+	 */
+	public static final String CONTAINERED_HEAP_CUTOFF_RATIO = "containered.heap-cutoff-ratio";
+
+	/**
+	 * Minimum amount of heap memory to remove in containers, as a safety margin.
+	 */
+	public static final String CONTAINERED_HEAP_CUTOFF_MIN = "containered.heap-cutoff-min";
+
+	/**
+	 * Prefix for passing custom environment variables to Flink's master process.
+	 * For example for passing LD_LIBRARY_PATH as an env variable to the AppMaster, set:
+	 * yarn.application-master.env.LD_LIBRARY_PATH: "/usr/lib/native"
+	 * in the flink-conf.yaml.
+	 */
+	public static final String CONTAINERED_MASTER_ENV_PREFIX = "containered.application-master.env.";
+
+	/**
+	 * Similar to the {@see CONTAINERED_MASTER_ENV_PREFIX}, this configuration prefix allows
+	 * setting custom environment variables for the workers (TaskManagers)
+	 */
+	public static final String CONTAINERED_TASK_MANAGER_ENV_PREFIX = "containered.taskmanager.env.";
--- End diff --

I thought about these prefixes again. I think they make sense, but we could 
possibly change them before the release in a follow-up.
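
For illustration, this is how such a prefix is typically consumed: every key that starts with the prefix becomes an environment variable of the launched container. A minimal standalone sketch, using a plain map rather than Flink's actual configuration classes:

{code}
import java.util.HashMap;
import java.util.Map;

public class EnvPrefixDemo {
    static final String PREFIX = "containered.taskmanager.env.";

    public static void main(String[] args) {
        Map<String, String> flinkConf = new HashMap<>();
        flinkConf.put(PREFIX + "LD_LIBRARY_PATH", "/usr/lib/native");
        flinkConf.put("taskmanager.heap.mb", "1024"); // unrelated key, ignored

        Map<String, String> containerEnv = new HashMap<>();
        for (Map.Entry<String, String> e : flinkConf.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                // Strip the prefix; the remainder is the env variable name.
                containerEnv.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        System.out.println(containerEnv); // {LD_LIBRARY_PATH=/usr/lib/native}
    }
}
{code}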


> ResourceManager runtime components
> --
>
> Key: FLINK-3544
> URL: https://issues.apache.org/jira/browse/FLINK-3544
> Project: Flink
>  Issue Type: Sub-task
>  Components: ResourceManager
>Affects Versions: 1.1.0
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

