[GitHub] flink pull request: [FLINK-2324] [streaming] Partitioned state che...

2015-07-29 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/937#issuecomment-125872157
  
Merging your PR fixed the exactly-once guarantees @StephanEwen , great!

I also added an extra test for Partitioned states.

Do you think it is okay to leave the StreamCheckpointingITCase how I 
modified it? Now it tests all the different checkpointing mechanisms together. 
If you add a test for the Checkpointed interface itself, this should be good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2324) Rework partitioned state storage

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645645#comment-14645645
 ] 

ASF GitHub Bot commented on FLINK-2324:
---

Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/937#issuecomment-125872157
  
Merging your PR fixed the exactly-once guarantees @StephanEwen , great!

I also added an extra test for Partitioned states.

Do you think it is okay to leave the StreamCheckpointingITCase how I 
modified it? Now it tests all the different checkpointing mechanisms together. 
If you add a test for the Checkpointed interface itself, this should be good.


> Rework partitioned state storage
> 
>
> Key: FLINK-2324
> URL: https://issues.apache.org/jira/browse/FLINK-2324
> Project: Flink
>  Issue Type: Improvement
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>
> Partitioned states are currently stored per-key in statehandles. This is 
> alright for in-memory storage but is very inefficient for HDFS. 
> The logic behind the current mechanism is that this approach provides a way 
> to repartition a state without fetching the data from the external storage 
> and only manipulating handles.
> We should come up with a solution that can achieve both.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1901) Create sample operator for Dataset

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645658#comment-14645658
 ] 

ASF GitHub Bot commented on FLINK-1901:
---

GitHub user ChengXiangLi opened a pull request:

https://github.com/apache/flink/pull/949

[FLINK-1901] [core] Create sample operator for Dataset.

This PR includes:
1. 4 random sampler implementations for different sampling strategies.
2. sample operator for DataSet Java API.
3. random sampler unit test.
4. sample operator Java API integration test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ChengXiangLi/flink FLINK-1901

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #949


commit f7ba8779b8d6a6d66ab5d4e2435a70e220b1e0fc
Author: chengxiang li 
Date:   2015-07-22T03:38:13Z

[FLINK-1901] [core] Create sample operator for Dataset.




> Create sample operator for Dataset
> --
>
> Key: FLINK-1901
> URL: https://issues.apache.org/jira/browse/FLINK-1901
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Theodore Vasiloudis
>Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of 
> other machine learning algorithms we need to have a way to take a random 
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset, 
> choose the relative size of the sample, and set a seed for reproducibility.
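A minimal sketch of the requirement described above (a fraction-sized sample with a fixed seed), written in plain Java rather than against the DataSet API. The class and method names are hypothetical and are not the samplers from the pull request; this variant covers only sampling without replacement, by keeping each element independently with probability `fraction`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration; not one of the sampler classes from the pull request. */
final class BernoulliSampleSketch {

    /**
     * Keeps each element independently with probability {@code fraction},
     * so no element is picked twice (sampling without replacement).
     * The same seed always yields the same sample for the same input.
     */
    static <T> List<T> sample(List<T> input, double fraction, long seed) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        Random random = new Random(seed);
        List<T> result = new ArrayList<>();
        for (T element : input) {
            // nextDouble() is uniform in [0, 1), so fraction = 1.0 keeps everything
            if (random.nextDouble() < fraction) {
                result.add(element);
            }
        }
        return result;
    }
}
```

The sample size is only `fraction * n` in expectation; a fixed-size sample would need a different strategy, such as reservoir sampling.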





[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...

2015-07-29 Thread ChengXiangLi
GitHub user ChengXiangLi opened a pull request:

https://github.com/apache/flink/pull/949

[FLINK-1901] [core] Create sample operator for Dataset.

This PR includes:
1. 4 random sampler implementations for different sampling strategies.
2. sample operator for DataSet Java API.
3. random sampler unit test.
4. sample operator Java API integration test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ChengXiangLi/flink FLINK-1901

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #949


commit f7ba8779b8d6a6d66ab5d4e2435a70e220b1e0fc
Author: chengxiang li 
Date:   2015-07-22T03:38:13Z

[FLINK-1901] [core] Create sample operator for Dataset.






[jira] [Closed] (FLINK-2418) Add an end-to-end streaming fault tolerance test for the Checkpointed interface

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2418.
---

> Add an end-to-end  streaming fault tolerance test for the Checkpointed 
> interface
> 
>
> Key: FLINK-2418
> URL: https://issues.apache.org/jira/browse/FLINK-2418
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> The current test lacks the following:
>   - Does not validate exactly-once counts after the partitioning step
>   - Does not cover all cases with the {{Checkpointed}} interface, but uses a 
> mix of by-key state and explicitly Checkpointed state
>   - The test uses a non-fault-tolerant deprecated infinite reducer.





[jira] [Resolved] (FLINK-2418) Add an end-to-end streaming fault tolerance test for the Checkpointed interface

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2418.
-
Resolution: Fixed

Fixed via a0556efb233f15c6985d17886372a8b4b00392b2

> Add an end-to-end  streaming fault tolerance test for the Checkpointed 
> interface
> 
>
> Key: FLINK-2418
> URL: https://issues.apache.org/jira/browse/FLINK-2418
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> The current test lacks the following:
>   - Does not validate exactly-once counts after the partitioning step
>   - Does not cover all cases with the {{Checkpointed}} interface, but uses a 
> mix of by-key state and explicitly Checkpointed state
>   - The test uses a non-fault-tolerant deprecated infinite reducer.





[jira] [Closed] (FLINK-2421) StreamRecordSerializer incorrectly duplicates and misses tests

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2421.
---

> StreamRecordSerializer incorrectly duplicates and misses tests
> --
>
> Key: FLINK-2421
> URL: https://issues.apache.org/jira/browse/FLINK-2421
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> The duplication method of the {{StreamRecordSerializer}} does not respect the 
> duplication of the internal type serializer.
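A minimal sketch of what "respecting the duplication of the internal type serializer" could look like. The interface below is a simplified, hypothetical stand-in and not Flink's `TypeSerializer`; the point is only that a wrapping serializer must duplicate its inner serializer instead of returning itself when the inner one is stateful.

```java
/** Hypothetical, simplified stand-in for a type serializer; not Flink's TypeSerializer. */
interface SerializerSketch<T> {
    SerializerSketch<T> duplicate();
}

/** An inner serializer with mutable state, so instances must not be shared across threads. */
final class StatefulInnerSerializer implements SerializerSketch<String> {
    final StringBuilder scratch = new StringBuilder();

    @Override
    public SerializerSketch<String> duplicate() {
        return new StatefulInnerSerializer();
    }
}

/** A wrapper (think of a stream record serializer) around an inner serializer. */
final class WrappingSerializer<T> implements SerializerSketch<T> {
    final SerializerSketch<T> inner;

    WrappingSerializer(SerializerSketch<T> inner) {
        this.inner = inner;
    }

    @Override
    public WrappingSerializer<T> duplicate() {
        SerializerSketch<T> innerCopy = inner.duplicate();
        // A stateless inner serializer may return itself; then sharing the wrapper is safe too.
        return innerCopy == inner ? this : new WrappingSerializer<>(innerCopy);
    }
}
```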





[jira] [Closed] (FLINK-2420) OutputFlush thread in stream writers does not propagate exceptions

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2420.
---

> OutputFlush thread in stream writers does not propagate exceptions
> --
>
> Key: FLINK-2420
> URL: https://issues.apache.org/jira/browse/FLINK-2420
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> The output flush thread only throws exceptions that it encountered. This 
> simply lets the thread die, when the exceptions reach the root of the stack.
> The exceptions never reach the actual writer code. That way, exceptions that 
> only happen on flush (or the last flush) would never be detected.
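One common way to address this kind of problem, sketched under assumptions: the flusher thread stores the first failure in a shared reference, and the writer checks that reference on every write and on close. All names below are hypothetical; this is not the code from the actual fix.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch; class and method names are illustrative, not Flink's. */
final class FlushErrorPropagationSketch {

    /** First failure seen by the background flusher, or null if there was none. */
    private final AtomicReference<Throwable> flusherFailure = new AtomicReference<>();

    /** Called from the flusher thread instead of letting the exception escape and kill the thread. */
    void reportFlusherFailure(Throwable t) {
        flusherFailure.compareAndSet(null, t);
    }

    /** Called by the writer on each emit and on close, so the failure surfaces in the writer code. */
    void checkFlusherFailure() throws IOException {
        Throwable t = flusherFailure.get();
        if (t != null) {
            throw new IOException("Output flusher failed: " + t.getMessage(), t);
        }
    }

    /** Simulates one failing flush on a separate thread; returns true if the writer sees the error. */
    static boolean demo() {
        FlushErrorPropagationSketch writer = new FlushErrorPropagationSketch();
        Thread flusher = new Thread(() -> {
            try {
                throw new IOException("simulated flush failure");
            } catch (Throwable t) {
                writer.reportFlusherFailure(t);
            }
        });
        flusher.start();
        try {
            flusher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            writer.checkFlusherFailure();
            return false;
        } catch (IOException expected) {
            return true;
        }
    }
}
```

Checking on close matters for the case named in the issue: a failure that happens only on the last flush would otherwise go unnoticed.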





[jira] [Resolved] (FLINK-2420) OutputFlush thread in stream writers does not propagate exceptions

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2420.
-
Resolution: Fixed

Fixed via 8ba321332b994579f387add8bd0855bd29cb33ec

> OutputFlush thread in stream writers does not propagate exceptions
> --
>
> Key: FLINK-2420
> URL: https://issues.apache.org/jira/browse/FLINK-2420
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> The output flush thread only throws exceptions that it encountered. This 
> simply lets the thread die, when the exceptions reach the root of the stack.
> The exceptions never reach the actual writer code. That way, exceptions that 
> only happen on flush (or the last flush) would never be detected.





[jira] [Resolved] (FLINK-2421) StreamRecordSerializer incorrectly duplicates and misses tests

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2421.
-
Resolution: Fixed

Fixed via 2d237e18a2f7cf21721340933c505bb518c4fc66

> StreamRecordSerializer incorrectly duplicates and misses tests
> --
>
> Key: FLINK-2421
> URL: https://issues.apache.org/jira/browse/FLINK-2421
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> The duplication method of the {{StreamRecordSerializer}} does not respect the 
> duplication of the internal type serializer.





[jira] [Closed] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2406.
---

> Abstract BarrierBuffer to an exchangeable BarrierHandler
> 
>
> Key: FLINK-2406
> URL: https://issues.apache.org/jira/browse/FLINK-2406
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> We need to make the Checkpoint handling pluggable, to allow us to use 
> different implementations:
>   - BarrierBuffer for "exactly once" processing. This inevitably introduces a 
> bit of latency.
>   - BarrierTracker for "at least once" processing, with no added latency.





[jira] [Resolved] (FLINK-2402) Add a non-blocking BarrierTracker

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2402.
-
Resolution: Implemented

Implemented in 8f87b7164b644ea8f1708f7eb76567e58341b224

> Add a non-blocking BarrierTracker
> -
>
> Key: FLINK-2402
> URL: https://issues.apache.org/jira/browse/FLINK-2402
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> This issue would add a new tracker for barriers that simply tracks what 
> barriers have been observed from which inputs. It never blocks off buffers.
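A minimal sketch of such a tracker, assuming barriers carry a checkpoint ID and inputs are numbered channels. The class and method names are hypothetical and this is not the Flink implementation; it only records which channels have delivered the barrier for each pending checkpoint and reports when all of them have.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a non-blocking barrier tracker; not the Flink class. */
final class BarrierTrackerSketch {

    private final int numChannels;

    /** For each pending checkpoint ID, the set of channels that already delivered its barrier. */
    private final Map<Long, BitSet> pending = new HashMap<>();

    BarrierTrackerSketch(int numChannels) {
        this.numChannels = numChannels;
    }

    /**
     * Records a barrier and returns true once the barrier for this checkpoint
     * has been observed on every channel. Nothing is buffered or blocked here.
     */
    boolean onBarrier(long checkpointId, int channelIndex) {
        if (channelIndex < 0 || channelIndex >= numChannels) {
            throw new IllegalArgumentException("unknown channel " + channelIndex);
        }
        BitSet seen = pending.computeIfAbsent(checkpointId, id -> new BitSet(numChannels));
        seen.set(channelIndex);
        if (seen.cardinality() == numChannels) {
            pending.remove(checkpointId);
            return true;
        }
        return false;
    }
}
```

Because records from channels that already delivered a barrier keep flowing, state snapshots may include records from after the barrier, which is why this style of tracking gives "at least once" rather than "exactly once" semantics.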





[jira] [Closed] (FLINK-2402) Add a non-blocking BarrierTracker

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2402.
---

> Add a non-blocking BarrierTracker
> -
>
> Key: FLINK-2402
> URL: https://issues.apache.org/jira/browse/FLINK-2402
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> This issue would add a new tracker for barriers that simply tracks what 
> barriers have been observed from which inputs. It never blocks off buffers.





[jira] [Resolved] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2406.
-
Resolution: Fixed

Fixed via 0579f90bab165a7df336163eb9d6337267020029

> Abstract BarrierBuffer to an exchangeable BarrierHandler
> 
>
> Key: FLINK-2406
> URL: https://issues.apache.org/jira/browse/FLINK-2406
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> We need to make the Checkpoint handling pluggable, to allow us to use 
> different implementations:
>   - BarrierBuffer for "exactly once" processing. This inevitably introduces a 
> bit of latency.
>   - BarrierTracker for "at least once" processing, with no added latency.





[GitHub] flink pull request: [FLINK-2399] Version checks for Job Manager an...

2015-07-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/945#issuecomment-125886476
  
Using `getClass().getPackage().getImplementationVersion()` would be a 
decent first approach then, I guess. The critical part seems to be the 
Client-to-JobManager communication.

How about the following as a first step: The client sends its version 
together with the `SubmitJob` message (just add a field there). The JobManager 
would check the version and respond with a failure, if it does not match. You 
can probably make the JobManager part very simple, no need to add extra 
constructor parameters, etc.

That way, the change would be minimally invasive, and we could see how well 
it addresses the issues, and whether we should extend this to other messages as 
well.
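A rough sketch of the proposed first step, under assumptions: the version is read from the jar manifest via `getImplementationVersion()`, and the JobManager compares the version carried in the submit message with its own. The class and method names are hypothetical; how the field would be added to `SubmitJob` is not shown.

```java
/** Hypothetical sketch; names are assumptions, not Flink's actual classes. */
final class VersionCheckSketch {

    /**
     * Reads the version from the jar manifest. The manifest entry is absent
     * when classes are not loaded from a packaged jar (e.g. in an IDE),
     * so a placeholder is returned in that case.
     */
    static String localVersion() {
        Package pkg = VersionCheckSketch.class.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return (version == null) ? "<unknown>" : version;
    }

    /**
     * Returns null if the versions match, otherwise a failure message
     * that the JobManager could send back to the client.
     */
    static String checkClientVersion(String clientVersion, String jobManagerVersion) {
        if (jobManagerVersion.equals(clientVersion)) {
            return null;
        }
        return "Version mismatch: client is " + clientVersion
                + ", JobManager is " + jobManagerVersion;
    }
}
```

Whether an unknown version should count as a mismatch is a design decision the comment above leaves open.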




[jira] [Commented] (FLINK-2399) Fail when actor versions don't match

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645737#comment-14645737
 ] 

ASF GitHub Bot commented on FLINK-2399:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/945#issuecomment-125886476
  
Using `getClass().getPackage().getImplementationVersion()` would be a 
decent first approach then, I guess. The critical part seems to be the 
Client-to-JobManager communication.

How about the following as a first step: The client sends its version 
together with the `SubmitJob` message (just add a field there). The JobManager 
would check the version and respond with a failure, if it does not match. You 
can probably make the JobManager part very simple, no need to add extra 
constructor parameters, etc.

That way, the change would be minimally invasive, and we could see how well 
it addresses the issues, and whether we should extend this to other messages as 
well.


> Fail when actor versions don't match
> 
>
> Key: FLINK-2399
> URL: https://issues.apache.org/jira/browse/FLINK-2399
> Project: Flink
>  Issue Type: Improvement
>  Components: JobManager, TaskManager
>Affects Versions: 0.9, master
>Reporter: Ufuk Celebi
>Assignee: Sachin Goel
>Priority: Minor
> Fix For: 0.10
>
>
> Problem: there can be subtle errors when actors from different Flink versions 
> communicate with each other, for example when an old client (e.g. Flink 0.9) 
> communicates with a new JobManager (e.g. Flink 0.10-SNAPSHOT).
> We can check that the versions match on first communication between the 
> actors and fail if they don't match.





[jira] [Commented] (FLINK-2424) InstantiationUtil.serializeObject(Object) does not close output stream

2015-07-29 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645740#comment-14645740
 ] 

Stephan Ewen commented on FLINK-2424:
-

oha, good one!

> InstantiationUtil.serializeObject(Object) does not close output stream
> --
>
> Key: FLINK-2424
> URL: https://issues.apache.org/jira/browse/FLINK-2424
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.9, master
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
> Fix For: 0.10, 0.9.1
>
>
> The utility method InstantiationUtil.serializeObject(Object) does not close 
> the created output stream.
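A minimal sketch of how such a utility can make sure the streams are closed, using try-with-resources. This is a standalone illustration with hypothetical names, not the actual `InstantiationUtil` code or the fix that was committed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

/** Hypothetical standalone sketch; not Flink's InstantiationUtil. */
final class SerializeObjectSketch {

    /** Serializes the object; both streams are closed even if writeObject throws. */
    static byte[] serializeObject(Object o) throws IOException {
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(o);
            oos.flush(); // make sure buffered bytes reach baos before reading it
            return baos.toByteArray();
        }
    }

    /** Helper for trying it out: serializes and deserializes, wrapping checked exceptions. */
    static Object roundTrip(Object o) {
        try {
            byte[] bytes = serializeObject(o);
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```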





[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...

2015-07-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/938#issuecomment-125887226
  
It failed when queued channel data was still present at the end of the job. 
This happens only with slow consumers. Apparently, the local machines are 
usually fast enough to never leave queued data, and Travis is the slow one 
again ;-)




[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645744#comment-14645744
 ] 

ASF GitHub Bot commented on FLINK-2406:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/938#issuecomment-125887226
  
It failed when queued channel data was still present at the end of the job. 
This happens only with slow consumers. Apparently, the local machines are 
usually fast enough to never leave queued data, and Travis is the slow one 
again ;-)


> Abstract BarrierBuffer to an exchangeable BarrierHandler
> 
>
> Key: FLINK-2406
> URL: https://issues.apache.org/jira/browse/FLINK-2406
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> We need to make the Checkpoint handling pluggable, to allow us to use 
> different implementations:
>   - BarrierBuffer for "exactly once" processing. This inevitably introduces a 
> bit of latency.
>   - BarrierTracker for "at least once" processing, with no added latency.





[jira] [Commented] (FLINK-2391) Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645748#comment-14645748
 ] 

ASF GitHub Bot commented on FLINK-2391:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/940#issuecomment-125887557
  
Will merge this...


> Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws 
> java.lang.NullPointerException
> --
>
> Key: FLINK-2391
> URL: https://issues.apache.org/jira/browse/FLINK-2391
> Project: Flink
>  Issue Type: Bug
>  Components: flink-contrib
>Affects Versions: 0.10
> Environment: win7 32bit;linux
>Reporter: Huang Wei
>  Labels: features
> Fix For: 0.10
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The exception is thrown at FlinkOutputFieldsDeclarer.java:160 (package 
> FlinkOutputFieldsDeclarer).
> Code: fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i));
> In this line, the variable this.outputSchema may be null.





[GitHub] flink pull request: [FLINK-2391]Fix Storm-compatibility FlinkTopol...

2015-07-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/940#issuecomment-125887557
  
Will merge this...




[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645749#comment-14645749
 ] 

ASF GitHub Bot commented on FLINK-2406:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/938#issuecomment-125887588
  
I really love Travis :heart:


> Abstract BarrierBuffer to an exchangeable BarrierHandler
> 
>
> Key: FLINK-2406
> URL: https://issues.apache.org/jira/browse/FLINK-2406
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> We need to make the Checkpoint handling pluggable, to allow us to use 
> different implementations:
>   - BarrierBuffer for "exactly once" processing. This inevitably introduces a 
> bit of latency.
>   - BarrierTracker for "at least once" processing, with no added latency.





[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...

2015-07-29 Thread uce
Github user uce commented on the pull request:

https://github.com/apache/flink/pull/938#issuecomment-125887588
  
I really love Travis :heart:




[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...

2015-07-29 Thread StephanEwen
Github user StephanEwen closed the pull request at:

https://github.com/apache/flink/pull/938




[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645761#comment-14645761
 ] 

ASF GitHub Bot commented on FLINK-2406:
---

Github user StephanEwen closed the pull request at:

https://github.com/apache/flink/pull/938


> Abstract BarrierBuffer to an exchangeable BarrierHandler
> 
>
> Key: FLINK-2406
> URL: https://issues.apache.org/jira/browse/FLINK-2406
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> We need to make the Checkpoint handling pluggable, to allow us to use 
> different implementations:
>   - BarrierBuffer for "exactly once" processing. This inevitably introduces a 
> bit of latency.
>   - BarrierTracker for "at least once" processing, with no added latency.





[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645760#comment-14645760
 ] 

ASF GitHub Bot commented on FLINK-2406:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/938#issuecomment-125889092
  
Manually merged in 8f87b7164b644ea8f1708f7eb76567e58341b224


> Abstract BarrierBuffer to an exchangeable BarrierHandler
> 
>
> Key: FLINK-2406
> URL: https://issues.apache.org/jira/browse/FLINK-2406
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> We need to make the Checkpoint handling pluggable, to allow us to use 
> different implementations:
>   - BarrierBuffer for "exactly once" processing. This inevitably introduces a 
> bit of latency.
>   - BarrierTracker for "at least once" processing, with no added latency.





[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...

2015-07-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/938#issuecomment-125889092
  
Manually merged in 8f87b7164b644ea8f1708f7eb76567e58341b224




[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...

2015-07-29 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/935#discussion_r35741171
  
--- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/typeutils/EnumValueSerializer.scala ---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.scala.typeutils
+
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.common.typeutils.base.IntSerializer
+import org.apache.flink.core.memory.{DataInputView, DataOutputView}
+
+/**
+ * Serializer for [[Enumeration]] values.
+ */
+class EnumValueSerializer[E <: Enumeration](val enum: E) extends TypeSerializer[E#Value] {
+
+  type T = E#Value
+
+  val intSerializer = new IntSerializer()
+
+  override def duplicate: EnumValueSerializer[E] = this
+
+  override def createInstance: T = enum(0)
+
+  override def isImmutableType: Boolean = true
+
+  override def getLength: Int = intSerializer.getLength
+
+  override def copy(from: T): T = enum.apply(from.id)
+
+  override def copy(from: T, reuse: T): T = copy(from)
+
+  override def copy(src: DataInputView, tgt: DataOutputView): Unit = intSerializer.copy(src, tgt)
+
+  override def serialize(v: T, tgt: DataOutputView): Unit = intSerializer.serialize(v.id, tgt)
+
+  override def deserialize(source: DataInputView): T = enum(intSerializer.deserialize(source))
+
+  override def deserialize(reuse: T, source: DataInputView): T = deserialize(source)
+
+  override def equals(obj: Any): Boolean = {
--- End diff --

Can you also override `hashCode()` here? Keeps it consistent with 
overriding `equals()`
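For context, a small Java sketch of why the two overrides belong together: hash-based collections look objects up by `hashCode()` first, so two serializers that are `equals()` but hash differently would not be found in a `HashSet` or `HashMap`. The class below is hypothetical and only mirrors the idea of a serializer identified by the enumeration it handles.

```java
import java.util.Objects;

/** Hypothetical illustration of a consistent equals/hashCode pair; not the class under review. */
final class EnumSerializerIdentitySketch {

    /** The enumeration type this (imaginary) serializer is bound to. */
    private final Class<?> enumClass;

    EnumSerializerIdentitySketch(Class<?> enumClass) {
        this.enumClass = Objects.requireNonNull(enumClass);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof EnumSerializerIdentitySketch)) {
            return false;
        }
        return enumClass.equals(((EnumSerializerIdentitySketch) obj).enumClass);
    }

    /** Derived from the same field that equals() compares, so equal objects hash equally. */
    @Override
    public int hashCode() {
        return enumClass.hashCode();
    }
}
```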




[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645766#comment-14645766
 ] 

ASF GitHub Bot commented on FLINK-2231:
---

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/935#discussion_r35741171
  
--- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/typeutils/EnumValueSerializer.scala ---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.scala.typeutils
+
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.common.typeutils.base.IntSerializer
+import org.apache.flink.core.memory.{DataInputView, DataOutputView}
+
+/**
+ * Serializer for [[Enumeration]] values.
+ */
+class EnumValueSerializer[E <: Enumeration](val enum: E) extends TypeSerializer[E#Value] {
+
+  type T = E#Value
+
+  val intSerializer = new IntSerializer()
+
+  override def duplicate: EnumValueSerializer[E] = this
+
+  override def createInstance: T = enum(0)
+
+  override def isImmutableType: Boolean = true
+
+  override def getLength: Int = intSerializer.getLength
+
+  override def copy(from: T): T = enum.apply(from.id)
+
+  override def copy(from: T, reuse: T): T = copy(from)
+
+  override def copy(src: DataInputView, tgt: DataOutputView): Unit = intSerializer.copy(src, tgt)
+
+  override def serialize(v: T, tgt: DataOutputView): Unit = intSerializer.serialize(v.id, tgt)
+
+  override def deserialize(source: DataInputView): T = enum(intSerializer.deserialize(source))
+
+  override def deserialize(reuse: T, source: DataInputView): T = deserialize(source)
+
+  override def equals(obj: Any): Boolean = {
--- End diff --

Can you also override `hashCode()` here? Keeps it consistent with 
overriding `equals()`
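
To illustrate the review comment, here is a minimal plain-Java analogue (not the Scala class under review) in which `equals()` and `hashCode()` are overridden together. The class name `OrdinalSerializer` and its single field are hypothetical; the sketch only shows the contract that two equal objects must report the same hash code.

```java
import java.util.Objects;

// Hypothetical stand-in for a serializer that is identified by the enum class it handles.
final class OrdinalSerializer<E extends Enum<E>> {
    private final Class<E> enumClass;

    OrdinalSerializer(Class<E> enumClass) {
        this.enumClass = Objects.requireNonNull(enumClass);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof OrdinalSerializer)) {
            return false;
        }
        // Two serializers are equal when they handle the same enum class.
        return enumClass.equals(((OrdinalSerializer<?>) obj).enumClass);
    }

    @Override
    public int hashCode() {
        // Derived from the same field that equals() compares.
        return enumClass.hashCode();
    }
}
```

Without the `hashCode()` override, two equal serializers would most likely land in different buckets of a `HashMap` or `HashSet`, so lookups by an equal instance could fail.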


> Create a Serializer for Scala Enumerations
> --
>
> Key: FLINK-2231
> URL: https://issues.apache.org/jira/browse/FLINK-2231
> Project: Flink
>  Issue Type: Improvement
>  Components: Scala API
>Reporter: Stephan Ewen
>Assignee: Alexander Alexandrov
>
> Scala Enumerations are currently serialized with Kryo, but should be 
> efficiently serialized by just writing the {{initial}}.
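
The idea of writing only an integer per value, as the diff does with `v.id`, can be sketched in plain Java. This is an illustration under assumptions, not Flink code: `EnumIntCodec` is a hypothetical helper, and it uses the Java enum `ordinal()` where the Scala serializer uses the Enumeration id.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: encodes an enum value as a single 4-byte integer.
final class EnumIntCodec {

    // Writes only the ordinal of the value, nothing else.
    static <E extends Enum<E>> byte[] write(E value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value.ordinal());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the ordinal back and looks up the matching constant.
    static <E extends Enum<E>> E read(byte[] data, Class<E> type) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return type.getEnumConstants()[in.readInt()];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The encoding is compact, but it is only stable as long as the constants keep their order between writer and reader.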



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...

2015-07-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/935#issuecomment-125890350
  
Other than the one comment, this looks good.

+1




[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645773#comment-14645773
 ] 

ASF GitHub Bot commented on FLINK-2231:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/935#issuecomment-125890350
  
Other than the one comment, this looks good.

+1


> Create a Serializer for Scala Enumerations
> --
>
> Key: FLINK-2231
> URL: https://issues.apache.org/jira/browse/FLINK-2231
> Project: Flink
>  Issue Type: Improvement
>  Components: Scala API
>Reporter: Stephan Ewen
>Assignee: Alexander Alexandrov
>
> Scala Enumerations are currently serialized with Kryo, but should be 
> efficiently serialized by just writing the {{initial}}.





[GitHub] flink pull request: [FLINK-2419] Add test for sinks after keyBy an...

2015-07-29 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/947#issuecomment-125894533
  
If you give a +1 @rmetzger I will merge this




[jira] [Commented] (FLINK-2419) DataStream sinks lose key information

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645779#comment-14645779
 ] 

ASF GitHub Bot commented on FLINK-2419:
---

Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/947#issuecomment-125894533
  
If you give a +1 @rmetzger I will merge this


> DataStream sinks lose key information
> -
>
> Key: FLINK-2419
> URL: https://issues.apache.org/jira/browse/FLINK-2419
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Blocker
> Fix For: 0.10
>
>
> If the user applies an addSink method after keyBy, the sink will not have the 
> information of the key as the transform method is bypassed.
> This makes it impossible to use partitioned states with sinks.





[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...

2015-07-29 Thread ChengXiangLi
Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/949#issuecomment-125898830
  
Previously, I planned to leave the sample Scala API to a separate PR, as I am 
not very familiar with Scala. But the failed test shows that Flink has a test 
to make sure Scala and Java have the same API, so I will try to add the Scala 
API and an integration test later.




[jira] [Commented] (FLINK-1901) Create sample operator for Dataset

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645803#comment-14645803
 ] 

ASF GitHub Bot commented on FLINK-1901:
---

Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/949#issuecomment-125898830
  
Previously, I planned to leave the sample Scala API to a separate PR, as I am 
not very familiar with Scala. But the failed test shows that Flink has a test 
to make sure Scala and Java have the same API, so I will try to add the Scala 
API and an integration test later.


> Create sample operator for Dataset
> --
>
> Key: FLINK-1901
> URL: https://issues.apache.org/jira/browse/FLINK-1901
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Theodore Vasiloudis
>Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of 
> other machine learning algorithms we need to have a way to take a random 
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset, 
> choose the relative size of the sample, and set a seed for reproducibility.





[jira] [Created] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment

2015-07-29 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2425:
---

 Summary: Give access to TaskManager config and hostname in the 
Runtime Environment
 Key: FLINK-2425
 URL: https://issues.apache.org/jira/browse/FLINK-2425
 Project: Flink
  Issue Type: Improvement
  Components: TaskManager
Affects Versions: 0.10
Reporter: Stephan Ewen
 Fix For: 0.10


The RuntimeEnvironment (that is used by the operators to access the context) 
should give access to the TaskManager's configuration, to allow reading config 
values.





[jira] [Created] (FLINK-2426) Create a read-only variant of the Configuration

2015-07-29 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2426:
---

 Summary: Create a read-only variant of the Configuration
 Key: FLINK-2426
 URL: https://issues.apache.org/jira/browse/FLINK-2426
 Project: Flink
  Issue Type: Sub-task
  Components: Core
Affects Versions: 0.10
Reporter: Stephan Ewen
 Fix For: 0.10


In order to give the TaskManager configuration to runtime components (or even 
user code), it should be wrapped in an {{UnmodifiableConfiguration}}.





[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment

2015-07-29 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645817#comment-14645817
 ] 

Robert Metzger commented on FLINK-2425:
---

+1 That is something that users have requested

> Give access to TaskManager config and hostname in the Runtime Environment
> -
>
> Key: FLINK-2425
> URL: https://issues.apache.org/jira/browse/FLINK-2425
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: 0.10
>Reporter: Stephan Ewen
> Fix For: 0.10
>
>
> The RuntimeEnvironment (that is used by the operators to access the context) 
> should give access to the TaskManager's configuration, to allow reading 
> config values.





[GitHub] flink pull request: [FLINK-2387] add streaming test case for live ...

2015-07-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/926#issuecomment-125902465
  
How does this PR relate to the recent improvements on the stability of the 
live accumulator tests?




[jira] [Commented] (FLINK-2387) Add test for live accumulators in Streaming

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645821#comment-14645821
 ] 

ASF GitHub Bot commented on FLINK-2387:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/926#issuecomment-125902465
  
How does this PR relate to the recent improvements on the stability of the 
live accumulator tests?


> Add test for live accumulators in Streaming
> ---
>
> Key: FLINK-2387
> URL: https://issues.apache.org/jira/browse/FLINK-2387
> Project: Flink
>  Issue Type: Test
>  Components: Streaming
>Reporter: Maximilian Michels
>






[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645822#comment-14645822
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-125903682
  
Okay, it took me a while, but I actually walked through this and would like 
to merge it soon.
To make this functionality configurable, I opened an issue to give runtime 
operators access to the TaskManager config, so that we can define a switch to 
turn this on and off.

I would actually turn it on by default, it seems to work well as far as I 
have tried it.


> Use BloomFilter to minimize probe side records which are spilled to disk in 
> Hybrid-Hash-Join
> 
>
> Key: FLINK-2240
> URL: https://issues.apache.org/jira/browse/FLINK-2240
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Minor
>
> In Hybrid-Hash-Join, when the small table does not fit into memory, part of 
> the small table data is spilled to disk, and the corresponding partition of 
> the big table data is spilled to disk in the probe phase as well. If we build 
> a BloomFilter while spilling the small table to disk during the build phase, 
> and use it to filter the big table records that would otherwise be spilled to 
> disk, this may greatly reduce the size of the spilled big table file and save 
> the disk I/O cost of writing and later reading it.
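
The filtering step described above can be sketched with a small Bloom filter over join keys. This is a minimal illustration under assumptions, not the code from the PR: `ProbeFilterSketch`, its sizing parameters, and the hash mixing are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical Bloom filter over the keys of the spilled build-side partition.
final class ProbeFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ProbeFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Build phase: called for every build-side key that is spilled to disk.
    void add(Object key) {
        int h1 = key.hashCode();
        int h2 = mix(h1);
        for (int i = 0; i < numHashes; i++) {
            bits.set(Math.floorMod(h1 + i * h2, numBits));
        }
    }

    // False means the key was definitely never added; true means it may have been.
    boolean mightContain(Object key) {
        int h1 = key.hashCode();
        int h2 = mix(h1);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                return false;
            }
        }
        return true;
    }

    // Probe phase: only keys that may have a spilled build-side match need to be spilled too.
    <K> List<K> filterProbeSide(List<K> probeKeys) {
        List<K> candidates = new ArrayList<>();
        for (K key : probeKeys) {
            if (mightContain(key)) {
                candidates.add(key);
            }
        }
        return candidates;
    }

    // Second hash derived from the first; forced odd so the step is never zero.
    private static int mix(int h) {
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h | 1;
    }
}
```

A Bloom filter never reports a false negative, so no matching probe record is dropped; false positives only mean that a few non-matching records are still written to disk.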





[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-07-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-125903682
  
Okay, it took me a while, but I actually walked through this and would like 
to merge it soon.
To make this functionality configurable, I opened an issue to give runtime 
operators access to the TaskManager config, so that we can define a switch to 
turn this on and off.

I would actually turn it on by default, it seems to work well as far as I 
have tried it.




[jira] [Created] (FLINK-2427) Allow the BarrierBuffer to maintain multiple queues of blocked inputs

2015-07-29 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2427:
---

 Summary: Allow the BarrierBuffer to maintain multiple queues of 
blocked inputs
 Key: FLINK-2427
 URL: https://issues.apache.org/jira/browse/FLINK-2427
 Project: Flink
  Issue Type: Sub-task
  Components: Streaming
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.10


In corner cases (dropped barriers due to failures/startup races), this is 
required for proper operation.





[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...

2015-07-29 Thread ChengXiangLi
Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-125914138
  
Thanks for pointing that out, Stephan. I didn't notice the configuration issue 
before.




[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645842#comment-14645842
 ] 

ASF GitHub Bot commented on FLINK-2240:
---

Github user ChengXiangLi commented on the pull request:

https://github.com/apache/flink/pull/888#issuecomment-125914138
  
Thanks for pointing that out, Stephan. I didn't notice the configuration issue 
before.


> Use BloomFilter to minimize probe side records which are spilled to disk in 
> Hybrid-Hash-Join
> 
>
> Key: FLINK-2240
> URL: https://issues.apache.org/jira/browse/FLINK-2240
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Reporter: Chengxiang Li
>Assignee: Chengxiang Li
>Priority: Minor
>
> In Hybrid-Hash-Join, when the small table does not fit into memory, part of 
> the small table data is spilled to disk, and the corresponding partition of 
> the big table data is spilled to disk in the probe phase as well. If we build 
> a BloomFilter while spilling the small table to disk during the build phase, 
> and use it to filter the big table records that would otherwise be spilled to 
> disk, this may greatly reduce the size of the spilled big table file and save 
> the disk I/O cost of writing and later reading it.





[jira] [Commented] (FLINK-2152) Provide zipWithIndex utility in flink-contrib

2015-07-29 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645895#comment-14645895
 ] 

Robert Metzger commented on FLINK-2152:
---

Yes, we either have to use a concurrent list or create a copy of it.
The elements in the broadcast set are shared among the parallel instances.
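
The "create a copy" option can be sketched as follows, assuming the problematic operation is an in-place mutation such as sorting. `SharedListSketch` is a hypothetical helper and the `shared` list merely stands in for a broadcast set; no Flink API is used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: works on a private copy of data shared between parallel tasks.
final class SharedListSketch {

    // Returns the sorted elements without touching the shared list itself.
    static List<Integer> sortedCopy(List<Integer> shared) {
        List<Integer> local = new ArrayList<>(shared); // private copy for this task
        Collections.sort(local);                       // mutation stays local
        return local;
    }
}
```

Sorting the shared list directly would let one parallel instance reorder elements while another is iterating over them; the copy costs extra memory per task but avoids that interference.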

> Provide zipWithIndex utility in flink-contrib
> -
>
> Key: FLINK-2152
> URL: https://issues.apache.org/jira/browse/FLINK-2152
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: Robert Metzger
>Assignee: Andra Lungu
>Priority: Trivial
>  Labels: starter
> Fix For: 0.10
>
>
> We should provide a simple utility method for zipping elements in a data set 
> with a dense index.
> It's up for discussion whether we want it directly in the API or if we should 
> provide it only as a utility from {{flink-contrib}}.
> I would put it in {{flink-contrib}}.
> See my answer on SO: 
> http://stackoverflow.com/questions/30596556/zipwithindex-on-apache-flink
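
One way to obtain a dense index is to count the elements of each partition first and then number every partition starting at its offset. The sketch below shows this on plain lists; it is an assumption about the approach, not the utility that ended up in Flink, and `ZipWithIndexSketch` and `Indexed` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assigns dense, zero-based indices across partitions.
final class ZipWithIndexSketch {

    // Pair of an index and the original element.
    static final class Indexed<T> {
        final long index;
        final T value;

        Indexed(long index, T value) {
            this.index = index;
            this.value = value;
        }
    }

    static <T> List<List<Indexed<T>>> zipWithIndex(List<List<T>> partitions) {
        // Pass 1: the start offset of a partition is the number of elements before it.
        long[] offsets = new long[partitions.size()];
        long running = 0;
        for (int p = 0; p < partitions.size(); p++) {
            offsets[p] = running;
            running += partitions.get(p).size();
        }

        // Pass 2: every partition numbers its own elements, starting at its offset.
        List<List<Indexed<T>>> result = new ArrayList<>();
        for (int p = 0; p < partitions.size(); p++) {
            List<Indexed<T>> indexed = new ArrayList<>();
            long next = offsets[p];
            for (T element : partitions.get(p)) {
                indexed.add(new Indexed<>(next++, element));
            }
            result.add(indexed);
        }
        return result;
    }
}
```

In a distributed setting the per-partition counts would have to be shared with all parallel instances before the second pass, which is where the shared-list concern from the comment above comes in.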





[jira] [Commented] (FLINK-2152) Provide zipWithIndex utility in flink-contrib

2015-07-29 Thread Andra Lungu (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645904#comment-14645904
 ] 

Andra Lungu commented on FLINK-2152:


Hey,

Since this is Johannes Günther's finding, I suggest he simply opens a PR with 
what worked for him :)
Nice catch BTW!

Cheers,
Andra

> Provide zipWithIndex utility in flink-contrib
> -
>
> Key: FLINK-2152
> URL: https://issues.apache.org/jira/browse/FLINK-2152
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API
>Reporter: Robert Metzger
>Assignee: Andra Lungu
>Priority: Trivial
>  Labels: starter
> Fix For: 0.10
>
>
> We should provide a simple utility method for zipping elements in a data set 
> with a dense index.
> It's up for discussion whether we want it directly in the API or if we should 
> provide it only as a utility from {{flink-contrib}}.
> I would put it in {{flink-contrib}}.
> See my answer on SO: 
> http://stackoverflow.com/questions/30596556/zipwithindex-on-apache-flink





[jira] [Assigned] (FLINK-2407) Add an API switch to select between "exactly once" and "at least once" fault tolerance

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen reassigned FLINK-2407:
---

Assignee: Stephan Ewen

> Add an API switch to select between "exactly once" and "at least once" fault 
> tolerance
> --
>
> Key: FLINK-2407
> URL: https://issues.apache.org/jira/browse/FLINK-2407
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> Based on the addition of the BarrierTracker, we can add a switch to choose 
> between the two modes "exactly once" and "at least once".





[GitHub] flink pull request: [FLINK-2391]Fix Storm-compatibility FlinkTopol...

2015-07-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/940




[jira] [Commented] (FLINK-2391) Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645954#comment-14645954
 ] 

ASF GitHub Bot commented on FLINK-2391:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/940


> Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws 
> java.lang.NullPointerException
> --
>
> Key: FLINK-2391
> URL: https://issues.apache.org/jira/browse/FLINK-2391
> Project: Flink
>  Issue Type: Bug
>  Components: flink-contrib
>Affects Versions: 0.10
> Environment: win7 32bit;linux
>Reporter: Huang Wei
>  Labels: features
> Fix For: 0.10
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> A NullPointerException is thrown at FlinkOutputFieldsDeclarer.java:160 (class 
> FlinkOutputFieldsDeclarer).
> Code: fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i));
> On this line, the variable this.outputSchema may be null.
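
A guard for the situation described above might look like the following sketch. It is not the patch that was merged: `GroupingIndexSketch` is a hypothetical class, and a plain `List<String>` stands in for the real output schema type.

```java
import java.util.List;

// Hypothetical stand-in for a declarer whose output schema may not have been set.
final class GroupingIndexSketch {
    // null means that no output schema was declared.
    private final List<String> outputSchema;

    GroupingIndexSketch(List<String> outputSchema) {
        this.outputSchema = outputSchema;
    }

    int[] groupingFieldIndexes(List<String> groupingFields) {
        // Fail with a descriptive message instead of a bare NullPointerException.
        if (outputSchema == null) {
            throw new IllegalStateException(
                "No output schema declared; cannot resolve grouping fields " + groupingFields);
        }
        int[] fieldIndexes = new int[groupingFields.size()];
        for (int i = 0; i < fieldIndexes.length; i++) {
            fieldIndexes[i] = outputSchema.indexOf(groupingFields.get(i));
        }
        return fieldIndexes;
    }
}
```

Whether the right reaction is an exception, a default, or skipping the grouping entirely depends on how the caller uses the result, which the report does not say.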





[GitHub] flink pull request: Updated method documentation in joinDataSet.sc...

2015-07-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/909#issuecomment-125933965
  
Merging this...




[jira] [Closed] (FLINK-2391) Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-2391.
---

> Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws 
> java.lang.NullPointerException
> --
>
> Key: FLINK-2391
> URL: https://issues.apache.org/jira/browse/FLINK-2391
> Project: Flink
>  Issue Type: Bug
>  Components: flink-contrib
>Affects Versions: 0.10
> Environment: win7 32bit;linux
>Reporter: Huang Wei
>  Labels: features
> Fix For: 0.10
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> A NullPointerException is thrown at FlinkOutputFieldsDeclarer.java:160 (class 
> FlinkOutputFieldsDeclarer).
> Code: fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i));
> On this line, the variable this.outputSchema may be null.





[jira] [Resolved] (FLINK-2391) Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-2391.
-
Resolution: Fixed

Fixed in ada9037bef760d46a4c3be2177e04bd72e620dad

Thank you for the patch!

> Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws 
> java.lang.NullPointerException
> --
>
> Key: FLINK-2391
> URL: https://issues.apache.org/jira/browse/FLINK-2391
> Project: Flink
>  Issue Type: Bug
>  Components: flink-contrib
>Affects Versions: 0.10
> Environment: win7 32bit;linux
>Reporter: Huang Wei
>  Labels: features
> Fix For: 0.10
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> A NullPointerException is thrown at FlinkOutputFieldsDeclarer.java:160 (class 
> FlinkOutputFieldsDeclarer).
> Code: fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i));
> On this line, the variable this.outputSchema may be null.





[GitHub] flink pull request: Updated method documentation in joinDataSet.sc...

2015-07-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/909




[GitHub] flink pull request: Updated method documentation in joinDataSet.sc...

2015-07-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/909#issuecomment-125934297
  
Manually merged. Thank you for the patch!




[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...

2015-07-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/935




[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645974#comment-14645974
 ] 

ASF GitHub Bot commented on FLINK-2231:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/935


> Create a Serializer for Scala Enumerations
> --
>
> Key: FLINK-2231
> URL: https://issues.apache.org/jira/browse/FLINK-2231
> Project: Flink
>  Issue Type: Improvement
>  Components: Scala API
>Reporter: Stephan Ewen
>Assignee: Alexander Alexandrov
>
> Scala Enumerations are currently serialized with Kryo, but should be 
> efficiently serialized by just writing the {{initial}}.





[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...

2015-07-29 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/935#issuecomment-125936680
  
Merged, thanks for your work. :smile: 




[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645977#comment-14645977
 ] 

ASF GitHub Bot commented on FLINK-2231:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/935#issuecomment-125936680
  
Merged, thanks for your work. :smile: 


> Create a Serializer for Scala Enumerations
> --
>
> Key: FLINK-2231
> URL: https://issues.apache.org/jira/browse/FLINK-2231
> Project: Flink
>  Issue Type: Improvement
>  Components: Scala API
>Reporter: Stephan Ewen
>Assignee: Alexander Alexandrov
> Fix For: 0.10
>
>
> Scala Enumerations are currently serialized with Kryo, but should be 
> efficiently serialized by just writing the {{initial}}.





[jira] [Closed] (FLINK-2231) Create a Serializer for Scala Enumerations

2015-07-29 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed FLINK-2231.
---
   Resolution: Fixed
Fix Version/s: 0.10

Implemented in 
https://github.com/apache/flink/commit/a34c9416790e0cddedb3f2518fd0bea2331cbcc0

> Create a Serializer for Scala Enumerations
> --
>
> Key: FLINK-2231
> URL: https://issues.apache.org/jira/browse/FLINK-2231
> Project: Flink
>  Issue Type: Improvement
>  Components: Scala API
>Reporter: Stephan Ewen
>Assignee: Alexander Alexandrov
> Fix For: 0.10
>
>
> Scala Enumerations are currently serialized with Kryo, but should be 
> efficiently serialized by just writing the {{initial}}.





[jira] [Assigned] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment

2015-07-29 Thread Sachin Goel (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goel reassigned FLINK-2425:
--

Assignee: Sachin Goel

> Give access to TaskManager config and hostname in the Runtime Environment
> -
>
> Key: FLINK-2425
> URL: https://issues.apache.org/jira/browse/FLINK-2425
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> The RuntimeEnvironment (that is used by the operators to access the context) 
> should give access to the TaskManager's configuration, to allow reading 
> config values.





[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment

2015-07-29 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645984#comment-14645984
 ] 

Stephan Ewen commented on FLINK-2425:
-

Could you describe in a few lines how you mean to implement this?

> Give access to TaskManager config and hostname in the Runtime Environment
> -
>
> Key: FLINK-2425
> URL: https://issues.apache.org/jira/browse/FLINK-2425
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> The RuntimeEnvironment (that is used by the operators to access the context) 
> should give access to the TaskManager's configuration, to allow reading 
> config values.





[jira] [Created] (FLINK-2428) Clean up unused properties in StreamConfig

2015-07-29 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2428:
---

 Summary: Clean up unused properties in StreamConfig
 Key: FLINK-2428
 URL: https://issues.apache.org/jira/browse/FLINK-2428
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.10
Reporter: Stephan Ewen
Priority: Minor


There is a multitude of unused properties in the {{StreamConfig}}, which should 
be removed, if no longer relevant.





[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment

2015-07-29 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645993#comment-14645993
 ] 

Sachin Goel commented on FLINK-2425:


The readonly configuration will take the actual configuration as an argument in 
its constructor and provide all access functions of Configuration, while 
keeping it private.
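
A minimal sketch of the wrapper described above, assuming a simple map-backed configuration. `MutableConfig` and `ReadOnlyConfig` are hypothetical names used only for illustration; they are not Flink's actual classes, and the real accessor set would mirror whatever `Configuration` offers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a mutable configuration class (not Flink's Configuration).
class MutableConfig {
    private final Map<String, String> values = new HashMap<>();

    void setString(String key, String value) {
        values.put(key, value);
    }

    String getString(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }

    int getInteger(String key, int defaultValue) {
        String raw = values.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw);
    }
}

// Read-only view: takes the actual configuration in its constructor,
// keeps it private, and exposes only the access (getter) functions.
final class ReadOnlyConfig {
    private final MutableConfig delegate;

    ReadOnlyConfig(MutableConfig delegate) {
        this.delegate = delegate;
    }

    String getString(String key, String defaultValue) {
        return delegate.getString(key, defaultValue);
    }

    int getInteger(String key, int defaultValue) {
        return delegate.getInteger(key, defaultValue);
    }
}
```

Note that a view like this still reflects later changes made through the wrapped object; if a frozen snapshot is wanted, the constructor would have to copy the values instead.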

> Give access to TaskManager config and hostname in the Runtime Environment
> -
>
> Key: FLINK-2425
> URL: https://issues.apache.org/jira/browse/FLINK-2425
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> The RuntimeEnvironment (that is used by the operators to access the context) 
> should give access to the TaskManager's configuration, to allow to read 
> config values.





[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment

2015-07-29 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645997#comment-14645997
 ] 

Stephan Ewen commented on FLINK-2425:
-

Ah, I think you are in the wrong issue ;-) That would be here [FLINK-2426]


> Give access to TaskManager config and hostname in the Runtime Environment
> -
>
> Key: FLINK-2425
> URL: https://issues.apache.org/jira/browse/FLINK-2425
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> The RuntimeEnvironment (that is used by the operators to access the context) 
> should give access to the TaskManager's configuration, to allow to read 
> config values.





[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment

2015-07-29 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646003#comment-14646003
 ] 

Sachin Goel commented on FLINK-2425:


Yes. But this will certainly use the readonly configuration defined there. 
After that it's just a matter of passing it to the RuntimeEnvironment which can 
give access to the user by passing it through the DistributedContext. 
Or is there a better way? :')

> Give access to TaskManager config and hostname in the Runtime Environment
> -
>
> Key: FLINK-2425
> URL: https://issues.apache.org/jira/browse/FLINK-2425
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> The RuntimeEnvironment (that is used by the operators to access the context) 
> should give access to the TaskManager's configuration, to allow to read 
> config values.





[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment

2015-07-29 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646019#comment-14646019
 ] 

Stephan Ewen commented on FLINK-2425:
-

The TaskManager configuration object only needs to go into the RuntimeEnvironment; 
we only need it in the internal operators. Exposing it to the user functions 
would be a separate matter, though probably useful as well.

> Give access to TaskManager config and hostname in the Runtime Environment
> -
>
> Key: FLINK-2425
> URL: https://issues.apache.org/jira/browse/FLINK-2425
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> The RuntimeEnvironment (that is used by the operators to access the context) 
> should give access to the TaskManager's configuration, to allow to read 
> config values.





[GitHub] flink pull request: Cascading changes for compatibility

2015-07-29 Thread mxm
GitHub user mxm opened a pull request:

https://github.com/apache/flink/pull/950

Cascading changes for compatibility

@fhueske and I are working on getting Cascading to run on top of Flink. 
These two commits introduce changes that were necessary to make the translation 
possible.

Up for debate is the second commit, which loads the user classloader before 
calling `configure` on an InputFormat. This is necessary because Cascading has 
its own interface that needs to be wrapped inside a Flink `InputFormat`. 
Usually, all dependencies are loaded upon instantiation of the class. However, 
Cascading loads its own input format while `configure(config)` is called (via 
the class name inside the passed config). Without the changes, this leads to a 
ClassNotFoundException.

Let me know what you think about it. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mxm/flink cascading-dev

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/950.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #950


commit 433fec2a90b5e868d0c3b7fa258b85c32345b954
Author: Fabian Hueske 
Date:   2015-07-16T22:31:09Z

[cascading] add getJobConf() to HadoopInputSplit

commit a81582c3cf59952381ce5fb9e15adeb775fcbff7
Author: Maximilian Michels 
Date:   2015-07-29T12:51:14Z

[cascading] load user classloader when configuring InputFormat






[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment

2015-07-29 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646023#comment-14646023
 ] 

Sachin Goel commented on FLINK-2425:


Should I provide access to the configuration via the RuntimeContext as well? Since 
the user cannot modify it, there can't be any harm in doing so.
Access to config parameters might help the user write a better program.

> Give access to TaskManager config and hostname in the Runtime Environment
> -
>
> Key: FLINK-2425
> URL: https://issues.apache.org/jira/browse/FLINK-2425
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> The RuntimeEnvironment (that is used by the operators to access the context) 
> should give access to the TaskManager's configuration, to allow to read 
> config values.





[jira] [Created] (FLINK-2429) Remove the "enableCheckpointing()" without interval variant

2015-07-29 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-2429:
---

 Summary: Remove the "enableCheckpointing()" without interval 
variant
 Key: FLINK-2429
 URL: https://issues.apache.org/jira/browse/FLINK-2429
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Stephan Ewen








[jira] [Updated] (FLINK-2429) Remove the "enableCheckpointing()" without interval variant

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-2429:

Issue Type: Wish  (was: Bug)

> Remove the "enableCheckpointing()" without interval variant
> ---
>
> Key: FLINK-2429
> URL: https://issues.apache.org/jira/browse/FLINK-2429
> Project: Flink
>  Issue Type: Wish
>  Components: Streaming
>Reporter: Stephan Ewen
>
> It is not very obvious what the default checkpointing interval. Also, when 
> somebody activates 





[jira] [Updated] (FLINK-2429) Remove the "enableCheckpointing()" without interval variant

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-2429:

Description: 
I think it is not very obvious what the default checkpointing interval is.
Also, when somebody activates checkpointing, shouldn't they think about what 
they want in terms of frequency and recovery latency tradeoffs?

  was:It is not very obvious what the default checkpointing interval. Also, 
when somebody activates 


> Remove the "enableCheckpointing()" without interval variant
> ---
>
> Key: FLINK-2429
> URL: https://issues.apache.org/jira/browse/FLINK-2429
> Project: Flink
>  Issue Type: Wish
>  Components: Streaming
>Reporter: Stephan Ewen
>Priority: Minor
>
> I think it is not very obvious what the default checkpointing interval is.
> Also, when somebody activates checkpointing, shouldn't they think about what 
> they want in terms of frequency and recovery latency tradeoffs?





[jira] [Updated] (FLINK-2429) Remove the "enableCheckpointing()" without interval variant

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-2429:

Priority: Minor  (was: Major)

> Remove the "enableCheckpointing()" without interval variant
> ---
>
> Key: FLINK-2429
> URL: https://issues.apache.org/jira/browse/FLINK-2429
> Project: Flink
>  Issue Type: Wish
>  Components: Streaming
>Reporter: Stephan Ewen
>Priority: Minor
>
> It is not very obvious what the default checkpointing interval. Also, when 
> somebody activates 





[jira] [Updated] (FLINK-2429) Remove the "enableCheckpointing()" without interval variant

2015-07-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-2429:

Description: It is not very obvious what the default checkpointing 
interval. Also, when somebody activates 

> Remove the "enableCheckpointing()" without interval variant
> ---
>
> Key: FLINK-2429
> URL: https://issues.apache.org/jira/browse/FLINK-2429
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Stephan Ewen
>
> It is not very obvious what the default checkpointing interval. Also, when 
> somebody activates 





[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment

2015-07-29 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646029#comment-14646029
 ] 

Stephan Ewen commented on FLINK-2425:
-

How about opening a pull request for the RuntimeEnvironment first? This is 
needed soon, the RuntimeContext is not urgent.

> Give access to TaskManager config and hostname in the Runtime Environment
> -
>
> Key: FLINK-2425
> URL: https://issues.apache.org/jira/browse/FLINK-2425
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> The RuntimeEnvironment (that is used by the operators to access the context) 
> should give access to the TaskManager's configuration, to allow to read 
> config values.





[jira] [Commented] (FLINK-2387) Add test for live accumulators in Streaming

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646039#comment-14646039
 ] 

ASF GitHub Bot commented on FLINK-2387:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/926#issuecomment-125955678
  
Sorry for the plain description. This pull request adds a test for the 
streaming part of the live accumulators, i.e. it makes sure that user-defined 
and Flink-internal accumulators also work in streaming programs. The current 
test only covers the batch side.

The stability of the live accumulator tests should not be affected by this 
pull request. It uses the same technique as the current (improved) live 
accumulator tests.


> Add test for live accumulators in Streaming
> ---
>
> Key: FLINK-2387
> URL: https://issues.apache.org/jira/browse/FLINK-2387
> Project: Flink
>  Issue Type: Test
>  Components: Streaming
>Reporter: Maximilian Michels
>






[GitHub] flink pull request: [FLINK-2387] add streaming test case for live ...

2015-07-29 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/926#issuecomment-125955678
  
Sorry for the plain description. This pull request adds a test for the 
streaming part of the live accumulators, i.e. it makes sure that user-defined 
and Flink-internal accumulators also work in streaming programs. The current 
test only covers the batch side.

The stability of the live accumulator tests should not be affected by this 
pull request. It uses the same technique as the current (improved) live 
accumulator tests.




[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...

2015-07-29 Thread StephanEwen
GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/951

[FLINK-2407] [streaming] Add an API switch to choose between "exactly once" 
and "at least once".

Adds a switch to choose between **exactly once** and **at least once** 
checkpointing mode.

Exactly Once
==
Sets the checkpointing mode to "exactly once". This mode means that the 
system will checkpoint the operator and user function state in such a way that, 
upon recovery, every record will be reflected exactly once in the operator 
state.

For example, if a user function counts the number of elements in a stream, 
this number will consistently be equal to the number of actual elements in the 
stream, regardless of failures and recovery.

Note that this does not mean that each record flows through the streaming 
data flow only once. It means that upon recovery, the state of 
operators/functions is restored such that the resumed data streams pick up 
exactly after the last modification to the state.
 
Note that this mode does not guarantee exactly-once behavior in the 
interaction with external systems (only state in Flink's operators and user 
functions). The reason for that is that a certain level of "collaboration" is 
required between two systems to achieve exactly-once guarantees. However, for 
certain systems, connectors can be written that facilitate this collaboration.

This mode sustains high throughput. Depending on the data flow graph and 
operations, this mode may increase the record latency, because operators need 
to align their input streams, in order to create a consistent snapshot point. 
The latency increase for simple dataflows (no repartitioning) is negligible. 
For simple dataflows with repartitioning, the average latency remains small, 
but the slowest records typically have an increased latency.


At Least Once
===

Sets the checkpointing mode to "at least once". This mode means that the 
system will checkpoint the operator and user function state in a simpler way. 
Upon failure and recovery, some records may be reflected multiple times in the 
operator state.

For example, if a user function counts the number of elements in a stream, 
this number will be equal to, or larger than, the actual number of elements in 
the stream, in the presence of failure and recovery.

This mode has minimal impact on latency and may be preferable in very-low 
latency scenarios, where a sustained very-low latency (such as a few 
milliseconds) is needed, and where occasional duplicate messages (on recovery) 
do not matter.
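
The counting example for both modes can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Flink code: a single source replays from the offset stored in the checkpoint, and `recordsCountedPastBarrier` (a hypothetical parameter) models how many records beyond that offset were already included in the operator's snapshot. With aligned inputs ("exactly once") that number is 0; without alignment ("at least once") it may be larger.

```java
// Toy model of a counting function across one failure and recovery.
// All names here are hypothetical and exist only for illustration.
final class CheckpointModeSketch {

    /**
     * @param totalRecords              number of records in the stream
     * @param checkpointOffset          source position stored in the checkpoint
     * @param recordsCountedPastBarrier records after that position which the
     *                                  operator's snapshot already reflects
     *                                  (0 models the aligned, exactly-once case)
     * @return the count held by the operator after recovery and full replay
     */
    static long countAfterRecovery(long totalRecords,
                                   long checkpointOffset,
                                   long recordsCountedPastBarrier) {
        // State restored from the checkpoint.
        long count = checkpointOffset + recordsCountedPastBarrier;
        // The source replays every record from the checkpointed offset onwards.
        for (long record = checkpointOffset; record < totalRecords; record++) {
            count++;
        }
        return count;
    }
}
```

In this model, `countAfterRecovery(100, 40, 0)` yields the true element count, while a non-zero third argument yields a count that exceeds it by exactly that many duplicates.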

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink at_least_once_switch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/951.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #951


commit b089efa6ef688d61f41463d644768845393a913b
Author: Stephan Ewen 
Date:   2015-07-29T12:12:42Z

[FLINK-2407] [streaming] Add an API switch to choose between "exactly once" 
and "at least once".

commit 71bd9c01aa24c9420766c9c79ff80618341b8e69
Author: Stephan Ewen 
Date:   2015-07-29T12:49:23Z

[hotfix] Code cleanups in the StreamConfig






[jira] [Commented] (FLINK-2407) Add an API switch to select between "exactly once" and "at least once" fault tolerance

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646072#comment-14646072
 ] 

ASF GitHub Bot commented on FLINK-2407:
---

GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/951

[FLINK-2407] [streaming] Add an API switch to choose between "exactly once" 
and "at least once".

Adds a switch to choose between **exactly once** and **at least once** 
checkpointing mode.

Exactly Once
==
Sets the checkpointing mode to "exactly once". This mode means that the 
system will checkpoint the operator and user function state in such a way that, 
upon recovery, every record will be reflected exactly once in the operator 
state.

For example, if a user function counts the number of elements in a stream, 
this number will consistently be equal to the number of actual elements in the 
stream, regardless of failures and recovery.

Note that this does not mean that each record flows through the streaming 
data flow only once. It means that upon recovery, the state of 
operators/functions is restored such that the resumed data streams pick up 
exactly after the last modification to the state.
 
Note that this mode does not guarantee exactly-once behavior in the 
interaction with external systems (only state in Flink's operators and user 
functions). The reason for that is that a certain level of "collaboration" is 
required between two systems to achieve exactly-once guarantees. However, for 
certain systems, connectors can be written that facilitate this collaboration.

This mode sustains high throughput. Depending on the data flow graph and 
operations, this mode may increase the record latency, because operators need 
to align their input streams, in order to create a consistent snapshot point. 
The latency increase for simple dataflows (no repartitioning) is negligible. 
For simple dataflows with repartitioning, the average latency remains small, 
but the slowest records typically have an increased latency.


At Least Once
===

Sets the checkpointing mode to "at least once". This mode means that the 
system will checkpoint the operator and user function state in a simpler way. 
Upon failure and recovery, some records may be reflected multiple times in the 
operator state.

For example, if a user function counts the number of elements in a stream, 
this number will be equal to, or larger than, the actual number of elements in 
the stream, in the presence of failure and recovery.

This mode has minimal impact on latency and may be preferable in very-low 
latency scenarios, where a sustained very-low latency (such as a few 
milliseconds) is needed, and where occasional duplicate messages (on recovery) 
do not matter.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink at_least_once_switch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/951.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #951


commit b089efa6ef688d61f41463d644768845393a913b
Author: Stephan Ewen 
Date:   2015-07-29T12:12:42Z

[FLINK-2407] [streaming] Add an API switch to choose between "exactly once" 
and "at least once".

commit 71bd9c01aa24c9420766c9c79ff80618341b8e69
Author: Stephan Ewen 
Date:   2015-07-29T12:49:23Z

[hotfix] Code cleanups in the StreamConfig




> Add an API switch to select between "exactly once" and "at least once" fault 
> tolerance
> --
>
> Key: FLINK-2407
> URL: https://issues.apache.org/jira/browse/FLINK-2407
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> Based on the addition of the BarrierTracker, we can add a switch to choose 
> between the two modes "exactly once" and "at least once".





[GitHub] flink pull request: Cascading changes for compatibility

2015-07-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/950#issuecomment-125962637
  
For robustness, can you restore the thread context classloader to the 
original one in a finally clause?

Otherwise +1
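
A sketch of the pattern requested above, assuming the configure call can be wrapped in a `Runnable`. The helper name `ContextClassLoaderUtil` is hypothetical and not part of the pull request; only `Thread.getContextClassLoader()` and `Thread.setContextClassLoader()` are standard Java API.

```java
// Hypothetical helper: runs an action with a given context classloader and
// always restores the original one, even if the action throws.
final class ContextClassLoaderUtil {

    static void runWithContextClassLoader(ClassLoader userClassLoader, Runnable action) {
        Thread current = Thread.currentThread();
        ClassLoader original = current.getContextClassLoader();
        current.setContextClassLoader(userClassLoader);
        try {
            action.run();
        } finally {
            // Restore in finally so a failing configure() cannot leak the
            // user classloader into unrelated code on this thread.
            current.setContextClassLoader(original);
        }
    }
}
```

A caller would pass the user code classloader and something like `() -> format.configure(config)` as the action, where `format` and `config` stand for whatever objects are in scope at the call site.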




[jira] [Commented] (FLINK-2387) Add test for live accumulators in Streaming

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646081#comment-14646081
 ] 

ASF GitHub Bot commented on FLINK-2387:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/926#issuecomment-125963497
  
Looks good


> Add test for live accumulators in Streaming
> ---
>
> Key: FLINK-2387
> URL: https://issues.apache.org/jira/browse/FLINK-2387
> Project: Flink
>  Issue Type: Test
>  Components: Streaming
>Reporter: Maximilian Michels
>






[GitHub] flink pull request: [FLINK-2387] add streaming test case for live ...

2015-07-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/926#issuecomment-125963497
  
Looks good




[GitHub] flink pull request: [FLINK-1927][py] Operator distribution rework

2015-07-29 Thread mxm
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/931#discussion_r35763245
  
--- Diff: 
flink-staging/flink-language-binding/flink-python/src/main/java/org/apache/flink/languagebinding/api/java/python/streaming/PythonStreamer.java
 ---
@@ -114,31 +97,12 @@ public void run() {
 
Runtime.getRuntime().addShutdownHook(shutdownThread);
 
-   socket = server.accept();
-   in = socket.getInputStream();
-   out = socket.getOutputStream();
-
-   byte[] opSize = new byte[4];
-   putInt(opSize, 0, operator.length);
-   out.write(opSize, 0, 4);
-   out.write(operator, 0, operator.length);
-
-   byte[] meta = importString.toString().getBytes("utf-8");
-   putInt(opSize, 0, meta.length);
-   out.write(opSize, 0, 4);
-   out.write(meta, 0, meta.length);
-
-   byte[] input = inputFilePath.getBytes("utf-8");
-   putInt(opSize, 0, input.length);
-   out.write(opSize, 0, 4);
-   out.write(input, 0, input.length);
-
-   byte[] output = outputFilePath.getBytes("utf-8");
-   putInt(opSize, 0, output.length);
-   out.write(opSize, 0, 4);
-   out.write(output, 0, output.length);
-
-   out.flush();
+   process.getOutputStream().write("operator\n".getBytes());
+   process.getOutputStream().write(("" + server.getLocalPort() + "\n").getBytes());
+   process.getOutputStream().write((id + "\n").getBytes());
+   process.getOutputStream().write((inputFilePath + "\n").getBytes());
+   process.getOutputStream().write((outputFilePath + "\n").getBytes());
+   process.getOutputStream().flush();
--- End diff --

We could reuse `process.getOutputStream()` here by saving it to a variable.
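
A rough sketch of what that could look like, written as a standalone helper so it can be tried in isolation. `HandshakeWriter` and `writeHandshake` are hypothetical names, the type of `id` is assumed to be `String`, and the explicit UTF-8 charset is a deliberate swap for the platform-default `getBytes()` used in the diff.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: writes the newline-separated handshake to a stream
// that is obtained once and reused for every write.
final class HandshakeWriter {

    static void writeHandshake(OutputStream out,
                               int port,
                               String id,
                               String inputFilePath,
                               String outputFilePath) throws IOException {
        out.write("operator\n".getBytes(StandardCharsets.UTF_8));
        out.write((port + "\n").getBytes(StandardCharsets.UTF_8));
        out.write((id + "\n").getBytes(StandardCharsets.UTF_8));
        out.write((inputFilePath + "\n").getBytes(StandardCharsets.UTF_8));
        out.write((outputFilePath + "\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
    }
}
```

At the call site this would amount to `OutputStream out = process.getOutputStream();` followed by a single call with `server.getLocalPort()` and the existing fields.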




[jira] [Commented] (FLINK-1927) [Py] Rework operator distribution

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646088#comment-14646088
 ] 

ASF GitHub Bot commented on FLINK-1927:
---

Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/931#discussion_r35763245
  
--- Diff: 
flink-staging/flink-language-binding/flink-python/src/main/java/org/apache/flink/languagebinding/api/java/python/streaming/PythonStreamer.java
 ---
@@ -114,31 +97,12 @@ public void run() {
 
Runtime.getRuntime().addShutdownHook(shutdownThread);
 
-   socket = server.accept();
-   in = socket.getInputStream();
-   out = socket.getOutputStream();
-
-   byte[] opSize = new byte[4];
-   putInt(opSize, 0, operator.length);
-   out.write(opSize, 0, 4);
-   out.write(operator, 0, operator.length);
-
-   byte[] meta = importString.toString().getBytes("utf-8");
-   putInt(opSize, 0, meta.length);
-   out.write(opSize, 0, 4);
-   out.write(meta, 0, meta.length);
-
-   byte[] input = inputFilePath.getBytes("utf-8");
-   putInt(opSize, 0, input.length);
-   out.write(opSize, 0, 4);
-   out.write(input, 0, input.length);
-
-   byte[] output = outputFilePath.getBytes("utf-8");
-   putInt(opSize, 0, output.length);
-   out.write(opSize, 0, 4);
-   out.write(output, 0, output.length);
-
-   out.flush();
+   process.getOutputStream().write("operator\n".getBytes());
+   process.getOutputStream().write(("" + server.getLocalPort() + "\n").getBytes());
+   process.getOutputStream().write((id + "\n").getBytes());
+   process.getOutputStream().write((inputFilePath + "\n").getBytes());
+   process.getOutputStream().write((outputFilePath + "\n").getBytes());
+   process.getOutputStream().flush();
--- End diff --

We could reuse `process.getOutputStream()` here by saving it to a variable.


> [Py] Rework operator distribution
> -
>
> Key: FLINK-1927
> URL: https://issues.apache.org/jira/browse/FLINK-1927
> Project: Flink
>  Issue Type: Improvement
>  Components: Python API
>Affects Versions: 0.9
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Minor
> Fix For: 0.9
>
>
> Currently, the python operator is created when executing the python plan 
> file, serialized using dill and saved as a byte[] in the java function. It is 
> then deserialized at runtime on each node.
> The current implementation is fairly hacky, and imposes certain limitations 
> that make it hard to work with. Chaining, or generally saving other 
> user-code, always requires a separate deserialization step after 
> deserializing the operator.
> These issues can be easily circumvented by rebuilding the (python) plan on 
> each node, instead of serializing the operator. The plan creation is 
> deterministic, and every operator is uniquely identified by an ID that is 
> already known to the java function.
> This change will allow us to easily support custom serializers.





[jira] [Commented] (FLINK-1927) [Py] Rework operator distribution

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646216#comment-14646216
 ] 

ASF GitHub Bot commented on FLINK-1927:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/931#issuecomment-125979125
  
Thanks for the pull request @zentol!

+1 for removing the dill library. As far as I can see, we handle all the 
serialization ourselves now. We only used the Dill library to serialize the 
user-defined function along with the operator. Now, the operator is 
extracted from the plan, which has been distributed in the Python files to the 
nodes beforehand. The plan is only sent once to generate the Java execution 
plan. The old behavior was to serialize the operator, pass it to Java, and send 
it back again during execution. Performance-wise the new implementation could 
even be a bit faster.

+1 for explicitly passing the file paths. Java and Python have different 
ways to determine temporary file paths and this has been a problem in the 
past on some platforms.

Your changes are sensible and this looks good to merge.


> [Py] Rework operator distribution
> -
>
> Key: FLINK-1927
> URL: https://issues.apache.org/jira/browse/FLINK-1927
> Project: Flink
>  Issue Type: Improvement
>  Components: Python API
>Affects Versions: 0.9
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Minor
> Fix For: 0.9
>
>
> Currently, the python operator is created when executing the python plan 
> file, serialized using dill and saved as a byte[] in the java function. It is 
> then deserialized at runtime on each node.
> The current implementation is fairly hacky, and imposes certain limitations 
> that make it hard to work with. Chaining, or generally saving other 
> user-code, always requires a separate deserialization step after 
> deserializing the operator.
> These issues can be easily circumvented by rebuilding the (python) plan on 
> each node, instead of serializing the operator. The plan creation is 
> deterministic, and every operator is uniquely identified by an ID that is 
> already known to the java function.
> This change will allow us to easily support custom serializers.





[GitHub] flink pull request: [FLINK-1927][py] Operator distribution rework

2015-07-29 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/931#issuecomment-125979125
  
Thanks for the pull request @zentol!

+1 for removing the dill library. As far as I can see, we handle all the 
serialization ourselves now. We only used the dill library to serialize the 
user-defined function along with the operator. Now, the operator is 
extracted from the plan, which has been distributed in the Python files to the 
nodes beforehand. The plan is only sent once to generate the Java execution 
plan. The old behavior was to serialize the operator, pass it to Java, and send 
it back again during execution. Performance-wise the new implementation could 
even be a bit faster.

+1 for explicitly passing the file paths. Java and Python have different 
ways to determine temporary file paths and this has been a problem in the 
past on some platforms.

Your changes are sensible and this looks good to merge.

Changes look sensible and good to me.
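
The idea discussed in this review (rebuild the plan deterministically on each node and look the operator up by its ID, rather than shipping a serialized operator) can be sketched roughly as follows. This is only an illustration in Java; `PlanSketch`, `buildPlan` and `operatorFor` are hypothetical names and not part of the actual Python API code.

```java
// Hypothetical sketch: every node rebuilds the same plan and resolves
// operators by ID, so no serialized operator has to be shipped around.
final class PlanSketch {
    // Operators indexed by the ID assigned during plan construction.
    private final java.util.Map<Integer, java.util.function.UnaryOperator<String>> operators =
            new java.util.HashMap<>();
    private int nextId = 0;

    // Registers an operator and returns its ID; IDs depend only on call order.
    int add(java.util.function.UnaryOperator<String> op) {
        int id = nextId++;
        operators.put(id, op);
        return id;
    }

    // Resolves an operator from the ID that the Java side already knows.
    java.util.function.UnaryOperator<String> operatorFor(int id) {
        java.util.function.UnaryOperator<String> op = operators.get(id);
        if (op == null) {
            throw new IllegalArgumentException("Unknown operator ID: " + id);
        }
        return op;
    }

    // Deterministic plan construction: the same calls, in the same order,
    // on every node, so the same ID always maps to the same operator.
    static PlanSketch buildPlan() {
        PlanSketch plan = new PlanSketch();
        plan.add(String::trim);        // receives ID 0
        plan.add(String::toUpperCase); // receives ID 1
        return plan;
    }
}
```

Because plan construction is deterministic, two independently built `PlanSketch` instances agree on which operator belongs to which ID, which is the property the rework relies on.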


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1927][py] Operator distribution rework

2015-07-29 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/931#issuecomment-125983186
  
Thanks for the review @mxm .

I've addressed the cosmetic issue you mentioned, and added a small fix for 
a separate issue as well (error reporting was partially broken).




[jira] [Commented] (FLINK-1927) [Py] Rework operator distribution

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646239#comment-14646239
 ] 

ASF GitHub Bot commented on FLINK-1927:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/931#issuecomment-125983186
  
Thanks for the review @mxm .

I've addressed the cosmetic issue you mentioned, and added a small fix for 
a separate issue as well (error reporting was partially broken).


> [Py] Rework operator distribution
> -
>
> Key: FLINK-1927
> URL: https://issues.apache.org/jira/browse/FLINK-1927
> Project: Flink
>  Issue Type: Improvement
>  Components: Python API
>Affects Versions: 0.9
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Minor
> Fix For: 0.9
>
>
> Currently, the python operator is created when executing the python plan 
> file, serialized using dill and saved as a byte[] in the java function. It is 
> then deserialized at runtime on each node.
> The current implementation is fairly hacky, and imposes certain limitations 
> that make it hard to work with. Chaining, or generally saving other 
> user-code, always requires a separate deserialization step after 
> deserializing the operator.
> These issues can be easily circumvented by rebuilding the (python) plan on 
> each node, instead of serializing the operator. The plan creation is 
> deterministic, and every operator is uniquely identified by an ID that is 
> already known to the java function.
> This change will allow us to easily support custom serializers.





[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646251#comment-14646251
 ] 

ASF GitHub Bot commented on FLINK-2425:
---

GitHub user sachingoel0101 opened a pull request:

https://github.com/apache/flink/pull/952

[FLINK-2425]Provide access to task manager configuration from 
RuntimeEnvironment

Also fixes [FLINK-2426]: Define an UnmodifiableConfiguration class which 
doesn't allow modifications to the underlying configuration object.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sachingoel0101/flink flink-2426

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/952.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #952


commit a8b0debbd60dd4eeb01e007e0b736f32f3724c71
Author: Sachin Goel 
Date:   2015-07-29T13:21:58Z

[FLINK-2425]Provide access to task manager configuration inside 
RuntimeEnvironment




> Give access to TaskManager config and hostname in the Runtime Environment
> -
>
> Key: FLINK-2425
> URL: https://issues.apache.org/jira/browse/FLINK-2425
> Project: Flink
>  Issue Type: Improvement
>  Components: TaskManager
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> The RuntimeEnvironment (that is used by the operators to access the context) 
> should give access to the TaskManager's configuration, so that operators can 
> read config values.





[GitHub] flink pull request: [FLINK-2425]Provide access to task manager con...

2015-07-29 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request:

https://github.com/apache/flink/pull/952

[FLINK-2425]Provide access to task manager configuration from 
RuntimeEnvironment

Also fixes [FLINK-2426]: Define an UnmodifiableConfiguration class which 
doesn't allow modifications to the underlying configuration object.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sachingoel0101/flink flink-2426

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/952.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #952


commit a8b0debbd60dd4eeb01e007e0b736f32f3724c71
Author: Sachin Goel 
Date:   2015-07-29T13:21:58Z

[FLINK-2425]Provide access to task manager configuration inside 
RuntimeEnvironment
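
As a rough illustration of the read-only configuration mentioned in this pull request, the sketch below shows one way to hand out a configuration whose setters are disabled. `SketchConfiguration` and `UnmodifiableSketchConfiguration` are hypothetical stand-ins, not Flink's classes, and the sketch assumes a snapshot copy of the values is acceptable; the actual patch may wrap the underlying object differently.

```java
// Hypothetical stand-in for a mutable key/value configuration.
class SketchConfiguration {
    protected final java.util.Map<String, String> values = new java.util.HashMap<>();

    SketchConfiguration() {}

    // Copy constructor: takes a snapshot of another configuration's values.
    SketchConfiguration(SketchConfiguration other) {
        values.putAll(other.values);
    }

    String getString(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }

    void setString(String key, String value) {
        values.put(key, value);
    }
}

// Read-only variant: reads work as usual, every write is rejected.
final class UnmodifiableSketchConfiguration extends SketchConfiguration {
    UnmodifiableSketchConfiguration(SketchConfiguration source) {
        super(source); // copies the values directly, without calling setString
    }

    @Override
    void setString(String key, String value) {
        throw new UnsupportedOperationException("Configuration is read-only");
    }
}
```

Handing the read-only variant to user code means an operator can read config values but cannot change what other tasks see.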






[GitHub] flink pull request: Cascading changes for compatibility

2015-07-29 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/950#issuecomment-125988920
  
Yes, I've updated the pull request.




[GitHub] flink pull request: [FLINK-2399] Version checks for Job Manager an...

2015-07-29 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/945#issuecomment-125991194
  
1. Added version checks between JobClient and JobManager. 
2. Versions are now read from the Configuration class, since flink-core 
gets built first and the version is readily available there, which allows an 
exact comparison.
3. Added a dummy version string so that tests can pass when run from an IDE.
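
The check described in the list above amounts to comparing the two version strings on first contact and failing fast on a mismatch. A minimal sketch, assuming an exact string comparison; `VersionCheckSketch`, `checkVersions` and `VersionMismatchException` are hypothetical names and not the code in this pull request:

```java
// Hypothetical sketch of a fail-fast version check between two actors.
final class VersionCheckSketch {

    // Raised when client and manager report different versions.
    static final class VersionMismatchException extends RuntimeException {
        VersionMismatchException(String message) {
            super(message);
        }
    }

    // Compares the versions exactly; any difference is treated as fatal.
    static void checkVersions(String clientVersion, String managerVersion) {
        if (!clientVersion.equals(managerVersion)) {
            throw new VersionMismatchException(
                    "Client version " + clientVersion
                            + " does not match JobManager version " + managerVersion);
        }
    }
}
```

For example, `checkVersions("0.9", "0.10-SNAPSHOT")` would throw, while two identical version strings pass silently.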




[jira] [Commented] (FLINK-2399) Fail when actor versions don't match

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646279#comment-14646279
 ] 

ASF GitHub Bot commented on FLINK-2399:
---

Github user sachingoel0101 commented on the pull request:

https://github.com/apache/flink/pull/945#issuecomment-125991194
  
1. Added version checks between JobClient and JobManager. 
2. Versions are now read from the Configuration class, since flink-core 
gets built first and the version is readily available there, which allows an 
exact comparison.
3. Added a dummy version string so that tests can pass when run from an IDE.


> Fail when actor versions don't match
> 
>
> Key: FLINK-2399
> URL: https://issues.apache.org/jira/browse/FLINK-2399
> Project: Flink
>  Issue Type: Improvement
>  Components: JobManager, TaskManager
>Affects Versions: 0.9, master
>Reporter: Ufuk Celebi
>Assignee: Sachin Goel
>Priority: Minor
> Fix For: 0.10
>
>
> Problem: there can be subtle errors when actors from different Flink versions 
> communicate with each other, for example when an old client (e.g. Flink 0.9) 
> communicates with a new JobManager (e.g. Flink 0.10-SNAPSHOT).
> We can check that the versions match on first communication between the 
> actors and fail if they don't match.





[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...

2015-07-29 Thread gyfora
Github user gyfora commented on a diff in the pull request:

https://github.com/apache/flink/pull/951#discussion_r35778711
  
--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java ---
@@ -269,10 +270,19 @@ private void setVertexConfig(Integer vertexID, StreamConfig config,
         config.setNumberOfOutputs(nonChainableOutputs.size());
         config.setNonChainedOutputs(nonChainableOutputs);
         config.setChainedOutputs(chainableOutputs);
-        config.setStateMonitoring(streamGraph.isCheckpointingEnabled());
-        config.setStateHandleProvider(streamGraph.getStateHandleProvider());
+
+        config.setCheckpointingEnabled(streamGraph.isCheckpointingEnabled());
+        if (streamGraph.isCheckpointingEnabled()) {
+            config.setCheckpointMode(streamGraph.getCheckpointingMode());
+            config.setStateHandleProvider(streamGraph.getStateHandleProvider());
+        } else {
+            // the at least once input handler is slightly cheaper (in the absence of checkpoints),
+            // so we use that one if checkpointing is not enabled
+            config.setCheckpointMode(CheckpointingMode.AT_LEAST_ONCE);
--- End diff --

So we set an at_least_once handler even though we won't receive any 
barriers? Why not add an OFF mode which will use a "no-op barrier handler" or 
skip the barrier handler altogether? 




[jira] [Commented] (FLINK-2407) Add an API switch to select between "exactly once" and "at least once" fault tolerance

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646336#comment-14646336
 ] 

ASF GitHub Bot commented on FLINK-2407:
---

Github user gyfora commented on a diff in the pull request:

https://github.com/apache/flink/pull/951#discussion_r35778711
  
--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java ---
@@ -269,10 +270,19 @@ private void setVertexConfig(Integer vertexID, StreamConfig config,
         config.setNumberOfOutputs(nonChainableOutputs.size());
         config.setNonChainedOutputs(nonChainableOutputs);
         config.setChainedOutputs(chainableOutputs);
-        config.setStateMonitoring(streamGraph.isCheckpointingEnabled());
-        config.setStateHandleProvider(streamGraph.getStateHandleProvider());
+
+        config.setCheckpointingEnabled(streamGraph.isCheckpointingEnabled());
+        if (streamGraph.isCheckpointingEnabled()) {
+            config.setCheckpointMode(streamGraph.getCheckpointingMode());
+            config.setStateHandleProvider(streamGraph.getStateHandleProvider());
+        } else {
+            // the at least once input handler is slightly cheaper (in the absence of checkpoints),
+            // so we use that one if checkpointing is not enabled
+            config.setCheckpointMode(CheckpointingMode.AT_LEAST_ONCE);
--- End diff --

So we set an at_least_once handler even though we won't receive any 
barriers? Why not add an OFF mode which will use a "no-op barrier handler" or 
skip the barrier handler altogether? 


> Add an API switch to select between "exactly once" and "at least once" fault 
> tolerance
> --
>
> Key: FLINK-2407
> URL: https://issues.apache.org/jira/browse/FLINK-2407
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> Based on the addition of the BarrierTracker, we can add a switch to choose 
> between the two modes "exactly once" and "at least once".





[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...

2015-07-29 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/951#issuecomment-126005919
  
Aside from my minor comment, this looks very good :+1: 




[jira] [Commented] (FLINK-2407) Add an API switch to select between "exactly once" and "at least once" fault tolerance

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646341#comment-14646341
 ] 

ASF GitHub Bot commented on FLINK-2407:
---

Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/951#issuecomment-126005919
  
Aside from my minor comment, this looks very good :+1: 


> Add an API switch to select between "exactly once" and "at least once" fault 
> tolerance
> --
>
> Key: FLINK-2407
> URL: https://issues.apache.org/jira/browse/FLINK-2407
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> Based on the addition of the BarrierTracker, we can add a switch to choose 
> between the two modes "exactly once" and "at least once".





[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...

2015-07-29 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/951#issuecomment-126007290
  
Yeah, I thought about an "off handler". Turns out that the BarrierTracker 
is almost like an "off handler" when no barriers arrive.
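
To make the point above concrete: a tracker that only counts barriers, and never blocks or buffers records, has nothing to do when no barriers arrive. The following is a simplified illustration under that assumption, not Flink's BarrierTracker; `BarrierTrackerSketch` and its methods are hypothetical names.

```java
// Hypothetical, simplified barrier tracker for "at least once" mode:
// it counts barriers per checkpoint but never holds back records.
final class BarrierTrackerSketch {
    private final int numChannels;
    // Checkpoint ID -> number of barriers seen so far for that checkpoint.
    private final java.util.Map<Long, Integer> pending = new java.util.HashMap<>();
    // Checkpoints for which a barrier arrived on every input channel.
    private final java.util.List<Long> completed = new java.util.ArrayList<>();

    BarrierTrackerSketch(int numChannels) {
        this.numChannels = numChannels;
    }

    // Records pass straight through; no channel is ever blocked.
    <T> T onRecord(T record) {
        return record;
    }

    // Counts one barrier; the checkpoint completes once all channels reported.
    void onBarrier(long checkpointId) {
        int seen = pending.merge(checkpointId, 1, Integer::sum);
        if (seen == numChannels) {
            pending.remove(checkpointId);
            completed.add(checkpointId);
        }
    }

    java.util.List<Long> completedCheckpoints() {
        return completed;
    }

    // True when no checkpoint is in progress, e.g. when checkpointing is off.
    boolean isIdle() {
        return pending.isEmpty();
    }
}
```

With checkpointing disabled, `onBarrier` is never called, so the tracker stays idle and each record costs only the pass-through call, which is why a dedicated "off" handler would add little.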




[jira] [Commented] (FLINK-2407) Add an API switch to select between "exactly once" and "at least once" fault tolerance

2015-07-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646350#comment-14646350
 ] 

ASF GitHub Bot commented on FLINK-2407:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/951#issuecomment-126007290
  
Yeah, I thought about an "off handler". Turns out that the BarrierTracker 
is almost like an "off handler" when no barriers arrive.


> Add an API switch to select between "exactly once" and "at least once" fault 
> tolerance
> --
>
> Key: FLINK-2407
> URL: https://issues.apache.org/jira/browse/FLINK-2407
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.10
>
>
> Based on the addition of the BarrierTracker, we can add a switch to choose 
> between the two modes "exactly once" and "at least once".





[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...

2015-07-29 Thread gyfora
Github user gyfora commented on the pull request:

https://github.com/apache/flink/pull/951#issuecomment-126008434
  
Yes, I double checked and you are right. This is practically as lightweight 
as it gets. :)

+1 to merge



