[GitHub] flink pull request: [FLINK-2425]Provide access to task manager con...
GitHub user sachingoel0101 opened a pull request:

    https://github.com/apache/flink/pull/952

    [FLINK-2425] Provide access to task manager configuration from RuntimeEnvironment

Also fixes [FLINK-2426]: Define an UnmodifiableConfiguration class which doesn't allow modifications to the underlying configuration object.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sachingoel0101/flink flink-2426

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/952.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #952

commit a8b0debbd60dd4eeb01e007e0b736f32f3724c71
Author: Sachin Goel <sachingoel0...@gmail.com>
Date:   2015-07-29T13:21:58Z

    [FLINK-2425] Provide access to task manager configuration inside RuntimeEnvironment

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
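For readers unfamiliar with FLINK-2426, the idea of an unmodifiable configuration can be sketched roughly as below. This is a hypothetical illustration, not Flink's actual class: a plain `Map` stands in for Flink's `Configuration`, and all names here (`UnmodifiableConfigurationSketch`, `getString`, `setString`) are stand-ins for illustration only.

```java
// Hypothetical sketch of an unmodifiable configuration wrapper (FLINK-2426).
// A Map stands in for Flink's Configuration; reads pass through, writes fail.
import java.util.HashMap;
import java.util.Map;

class UnmodifiableConfigurationSketch {

    /** Read-only view over a backing key/value configuration. */
    static final class UnmodifiableConfiguration {
        private final Map<String, String> backing;

        UnmodifiableConfiguration(Map<String, String> backing) {
            this.backing = backing;
        }

        String getString(String key, String defaultValue) {
            return backing.getOrDefault(key, defaultValue);
        }

        // Every mutator throws instead of touching the underlying object.
        void setString(String key, String value) {
            throw new UnsupportedOperationException("The configuration is unmodifiable.");
        }
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("taskmanager.heap.mb", "512");
        UnmodifiableConfiguration conf = new UnmodifiableConfiguration(raw);
        System.out.println(conf.getString("taskmanager.heap.mb", "0")); // prints 512
        try {
            conf.setString("taskmanager.heap.mb", "1024");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected"); // writes are refused
        }
    }
}
```

The benefit of handing such a wrapper to operators is that the TaskManager's configuration can be shared without defensive copying.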
[jira] [Resolved] (FLINK-2424) InstantiationUtil.serializeObject(Object) does not close output stream
[ https://issues.apache.org/jira/browse/FLINK-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi resolved FLINK-2424.
--------------------------------
    Resolution: Fixed

Fixed via 7bd57d7 (0.10), c756fe8 (0.9.1).

> InstantiationUtil.serializeObject(Object) does not close output stream
> ----------------------------------------------------------------------
> Key: FLINK-2424
> URL: https://issues.apache.org/jira/browse/FLINK-2424
> Project: Flink
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.9, master
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
> Fix For: 0.10, 0.9.1
>
> The utility method InstantiationUtil.serializeObject(Object) does not close the created output stream.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
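The kind of fix FLINK-2424 describes can be sketched with try-with-resources, which guarantees the stream is closed even if serialization throws. This is a stand-in for `InstantiationUtil.serializeObject(Object)`, not the actual commit; the class name `SerializeUtil` is hypothetical.

```java
// Sketch of closing the stream created during object serialization (FLINK-2424).
// try-with-resources closes (and thereby flushes) the ObjectOutputStream on
// every exit path, including exceptions thrown by writeObject.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class SerializeUtil {

    public static byte[] serializeObject(Object o) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(o);
        } // oos.close() happens here, flushing all buffered bytes into baos
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serializeObject("hello");
        System.out.println(bytes.length > 0); // prints true
    }
}
```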
[jira] [Commented] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646383#comment-14646383 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126014037

    Manually merged in b211a62111aa3c558586874d0ec5b168e6bb31f1

> Add an API switch to select between exactly once and at least once fault tolerance
> ----------------------------------------------------------------------------------
> Key: FLINK-2407
> URL: https://issues.apache.org/jira/browse/FLINK-2407
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> Based on the addition of the BarrierTracker, we can add a switch to choose between the two modes exactly once and at least once.
[jira] [Closed] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen closed FLINK-2407.
-------------------------------
[jira] [Resolved] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen resolved FLINK-2407.
---------------------------------
    Resolution: Fixed

Fixed in b211a62111aa3c558586874d0ec5b168e6bb31f1
[jira] [Commented] (FLINK-2399) Fail when actor versions don't match
[ https://issues.apache.org/jira/browse/FLINK-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646279#comment-14646279 ]

ASF GitHub Bot commented on FLINK-2399:
---------------------------------------

Github user sachingoel0101 commented on the pull request:

    https://github.com/apache/flink/pull/945#issuecomment-125991194

    1. Added version checks between JobClient and JobManager.
    2. Versions are accessed from the Configuration class now, since flink-core gets built first and the version is readily available, leading to exact verification.
    3. Added a dummy version string to allow passing of tests in the IDE.

> Fail when actor versions don't match
> ------------------------------------
> Key: FLINK-2399
> URL: https://issues.apache.org/jira/browse/FLINK-2399
> Project: Flink
> Issue Type: Improvement
> Components: JobManager, TaskManager
> Affects Versions: 0.9, master
> Reporter: Ufuk Celebi
> Assignee: Sachin Goel
> Priority: Minor
> Fix For: 0.10
>
> Problem: there can be subtle errors when actors from different Flink versions communicate with each other, for example when an old client (e.g. Flink 0.9) communicates with a new JobManager (e.g. Flink 0.10-SNAPSHOT). We can check that the versions match on first communication between the actors and fail if they don't match.
[jira] [Commented] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646350#comment-14646350 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126007290

    Yeah, I thought about an off handler. Turns out that the BarrierTracker is almost like an off handler when no barriers arrive.
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126007290

    Yeah, I thought about an off handler. Turns out that the BarrierTracker is almost like an off handler when no barriers arrive.
[jira] [Commented] (FLINK-2324) Rework partitioned state storage
[ https://issues.apache.org/jira/browse/FLINK-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646391#comment-14646391 ]

ASF GitHub Bot commented on FLINK-2324:
---------------------------------------

Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/937#issuecomment-126015036

    Any concerns against merging this? @senorcarbone?

> Rework partitioned state storage
> --------------------------------
> Key: FLINK-2324
> URL: https://issues.apache.org/jira/browse/FLINK-2324
> Project: Flink
> Issue Type: Improvement
> Reporter: Gyula Fora
> Assignee: Gyula Fora
>
> Partitioned states are currently stored per-key in statehandles. This is alright for in-memory storage but is very inefficient for HDFS. The logic behind the current mechanism is that this approach provides a way to repartition a state without fetching the data from the external storage and only manipulating handles. We should come up with a solution that can achieve both.
[jira] [Deleted] (FLINK-2430) Potential race condition when restart all is called for a Twill runnable
[ https://issues.apache.org/jira/browse/FLINK-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Saputra deleted FLINK-2430:
---------------------------------

> Potential race condition when restart all is called for a Twill runnable
> ------------------------------------------------------------------------
> Key: FLINK-2430
> URL: https://issues.apache.org/jira/browse/FLINK-2430
> Project: Flink
> Issue Type: Bug
> Reporter: Henry Saputra
>
> When sending restart instance to all for a particular TwillRunnable, it could hit a race condition where the heartbeat thread runs right after all containers have been released, which trips this check:
>
>     // Looks for containers requests.
>     if (provisioning.isEmpty() && runnableContainerRequests.isEmpty() && runningContainers.isEmpty()) {
>         LOG.info("All containers completed. Shutting down application master.");
>         break;
>     }
>
> This could happen when all running containers are empty and the new runnableContainerRequests has not been added yet.
[GitHub] flink pull request: [FLINK-2324] [streaming] Partitioned state che...
Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/937#issuecomment-126015036

    Any concerns against merging this? @senorcarbone?
[GitHub] flink pull request: FLINK-2322 Unclosed stream may leak resource
Github user tedyu commented on the pull request:

    https://github.com/apache/flink/pull/928#issuecomment-126021923

    Anything I can do to move this forward?
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126005919

    Aside from my minor comment, this looks very good :+1:
[jira] [Commented] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646341#comment-14646341 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126005919

    Aside from my minor comment, this looks very good :+1:
[jira] [Closed] (FLINK-2430) Potential race condition when restart all is called for a Twill runnable
[ https://issues.apache.org/jira/browse/FLINK-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Saputra closed FLINK-2430.
--------------------------------
    Resolution: Invalid

Pop sorry wrong project =(
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126014037

    Manually merged in b211a62111aa3c558586874d0ec5b168e6bb31f1
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646251#comment-14646251 ]

ASF GitHub Bot commented on FLINK-2425:
---------------------------------------

GitHub user sachingoel0101 opened a pull request:

    https://github.com/apache/flink/pull/952

    [FLINK-2425] Provide access to task manager configuration from RuntimeEnvironment

> Give access to TaskManager config and hostname in the Runtime Environment
> -------------------------------------------------------------------------
> Key: FLINK-2425
> URL: https://issues.apache.org/jira/browse/FLINK-2425
> Project: Flink
> Issue Type: Improvement
> Components: TaskManager
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Sachin Goel
> Fix For: 0.10
>
> The RuntimeEnvironment (that is used by the operators to access the context) should give access to the TaskManager's configuration, to allow to read config values.
[jira] [Created] (FLINK-2430) Potential race condition when restart all is called for a Twill runnable
Henry Saputra created FLINK-2430:
---------------------------------

    Summary: Potential race condition when restart all is called for a Twill runnable
    Key: FLINK-2430
    URL: https://issues.apache.org/jira/browse/FLINK-2430
    Project: Flink
    Issue Type: Bug
    Affects Versions: 0.6-incubating
    Reporter: Henry Saputra
[GitHub] flink pull request: Cascading changes for compatibility
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/950#issuecomment-125988920

    Yes, I've updated the pull request.
[jira] [Commented] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646336#comment-14646336 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user gyfora commented on a diff in the pull request:

    https://github.com/apache/flink/pull/951#discussion_r35778711

    --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java ---
    @@ -269,10 +270,19 @@ private void setVertexConfig(Integer vertexID, StreamConfig config,
         config.setNumberOfOutputs(nonChainableOutputs.size());
         config.setNonChainedOutputs(nonChainableOutputs);
         config.setChainedOutputs(chainableOutputs);
    -    config.setStateMonitoring(streamGraph.isCheckpointingEnabled());
    -    config.setStateHandleProvider(streamGraph.getStateHandleProvider());
    +
    +    config.setCheckpointingEnabled(streamGraph.isCheckpointingEnabled());
    +    if (streamGraph.isCheckpointingEnabled()) {
    +        config.setCheckpointMode(streamGraph.getCheckpointingMode());
    +        config.setStateHandleProvider(streamGraph.getStateHandleProvider());
    +    } else {
    +        // the at least once input handler is slightly cheaper (in the absence of checkpoints),
    +        // so we use that one if checkpointing is not enabled
    +        config.setCheckpointMode(CheckpointingMode.AT_LEAST_ONCE);
    --- End diff --

    So we set an at_least_once handler even though we won't receive any barriers? Why not add an OFF mode which will use a no-op barrier handler or skip the barrier handler altogether?
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user gyfora commented on a diff in the pull request:

    https://github.com/apache/flink/pull/951#discussion_r35778711

    So we set an at_least_once handler even though we won't receive any barriers? Why not add an OFF mode which will use a no-op barrier handler or skip the barrier handler altogether?
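The point made in this thread, that an at-least-once handler with no barriers degenerates into a pure pass-through, can be illustrated with a small sketch. The types and method below (`Element`, `Record`, `Barrier`, `atLeastOnceHandle`) are hypothetical stand-ins, not Flink's actual barrier-handler API.

```java
// Illustrative sketch: an at-least-once handler forwards every record
// immediately and never buffers waiting for barrier alignment. When
// checkpointing is disabled and no barriers arrive, the barrier branch is
// dead code, so the handler behaves exactly like a no-op pass-through.
import java.util.ArrayList;
import java.util.List;

class BarrierHandlerSketch {

    interface Element { }

    static final class Record implements Element {
        final String payload;
        Record(String payload) { this.payload = payload; }
    }

    static final class Barrier implements Element { }

    /** Forwards records in order; barriers are only observed, never blocked on. */
    static List<String> atLeastOnceHandle(List<Element> input) {
        List<String> forwarded = new ArrayList<>();
        for (Element e : input) {
            if (e instanceof Record) {
                forwarded.add(((Record) e).payload); // no blocking, no buffering
            }
            // A Barrier would only be counted here; with checkpointing off,
            // this case simply never occurs.
        }
        return forwarded;
    }

    public static void main(String[] args) {
        List<Element> stream = new ArrayList<>();
        stream.add(new Record("a"));
        stream.add(new Record("b"));
        // With no barriers in the stream, the handler is a pure pass-through.
        System.out.println(atLeastOnceHandle(stream)); // prints [a, b]
    }
}
```

This is why reusing the at-least-once mode when checkpointing is disabled is practically as cheap as a dedicated OFF mode would be.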
[jira] [Commented] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646384#comment-14646384 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user StephanEwen closed the pull request at:

    https://github.com/apache/flink/pull/951
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126008434

    Yes, I double checked and you are right. This is practically as lightweight as it gets. :) +1 to merge
[jira] [Commented] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646357#comment-14646357 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126008434

    Yes, I double checked and you are right. This is practically as lightweight as it gets. :) +1 to merge
[jira] [Commented] (FLINK-2322) Unclosed stream may leak resource
[ https://issues.apache.org/jira/browse/FLINK-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646439#comment-14646439 ]

ASF GitHub Bot commented on FLINK-2322:
---------------------------------------

Github user tedyu commented on the pull request:

    https://github.com/apache/flink/pull/928#issuecomment-126021923

    Anything I can do to move this forward?

> Unclosed stream may leak resource
> ---------------------------------
> Key: FLINK-2322
> URL: https://issues.apache.org/jira/browse/FLINK-2322
> Project: Flink
> Issue Type: Bug
> Reporter: Ted Yu
> Labels: starter
>
> In UdfAnalyzerUtils.java:
> {code}
> ClassReader cr = new ClassReader(Thread.currentThread().getContextClassLoader()
>     .getResourceAsStream(internalClassName.replace('.', '/') + ".class"));
> {code}
> The stream returned by getResourceAsStream() should be closed upon exit of findMethodNode().
>
> In ParameterTool#fromPropertiesFile():
> {code}
> props.load(new FileInputStream(propertiesFile));
> {code}
> The FileInputStream should be closed before returning.
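Both leaks described in FLINK-2322 have the same shape of fix: wrap the stream in try-with-resources so it is closed on every exit path. The sketch below mirrors the report's method names but is a self-contained stand-in, not Flink's actual code; the class name `StreamCloseFix` is hypothetical.

```java
// Sketch of the fixes FLINK-2322 asks for: streams opened inside a method
// are wrapped in try-with-resources so they are closed even on exceptions.
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Properties;

class StreamCloseFix {

    // Mirrors ParameterTool#fromPropertiesFile: close the FileInputStream
    // before returning instead of leaking it.
    public static Properties fromPropertiesFile(String propertiesFile) throws IOException {
        Properties props = new Properties();
        try (FileInputStream fis = new FileInputStream(propertiesFile)) {
            props.load(fis);
        } // fis.close() runs here on both the normal and exceptional path
        return props;
    }

    // Mirrors the UdfAnalyzerUtils case: close the class-file resource stream
    // after reading rather than leaving it open.
    public static byte[] readClassBytes(String internalClassName) throws IOException {
        String resource = internalClassName.replace('.', '/') + ".class";
        try (InputStream in = Thread.currentThread()
                .getContextClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Class resource not found: " + resource);
            }
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("flink-2322-demo", ".properties");
        Files.write(f, Collections.singletonList("example=true"));
        Properties props = fromPropertiesFile(f.toString());
        System.out.println(props.getProperty("example")); // prints true
    }
}
```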
[GitHub] flink pull request: [FLINK-2419] Add test for sinks after keyBy an...
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/947
[jira] [Commented] (FLINK-2324) Rework partitioned state storage
[ https://issues.apache.org/jira/browse/FLINK-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646641#comment-14646641 ]

ASF GitHub Bot commented on FLINK-2324:
---------------------------------------

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/937#issuecomment-126070524

    I was too busy but I would like to take a quick look. I will look at it tomorrow morning.
[GitHub] flink pull request: [FLINK-2324] [streaming] Partitioned state che...
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/937#issuecomment-126070524

    I was too busy but I would like to take a quick look. I will look at it tomorrow morning.
[GitHub] flink pull request: [FLINK-2205] Fix confusing entries in JM UI Jo...
Github user ebautistabar commented on the pull request:

    https://github.com/apache/flink/pull/927#issuecomment-126099697

    As the new UI didn't show job configurations yet, I have created another PR to deal with it, which also includes the display changes requested in FLINK-2205.
[jira] [Commented] (FLINK-2205) Confusing entries in JM Webfrontend Job Configuration section
[ https://issues.apache.org/jira/browse/FLINK-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646789#comment-14646789 ]

ASF GitHub Bot commented on FLINK-2205:
---------------------------------------

Github user ebautistabar commented on the pull request:

    https://github.com/apache/flink/pull/927#issuecomment-126099697

    As the new UI didn't show job configurations yet, I have created another PR to deal with it, which also includes the display changes requested in FLINK-2205.

> Confusing entries in JM Webfrontend Job Configuration section
> -------------------------------------------------------------
> Key: FLINK-2205
> URL: https://issues.apache.org/jira/browse/FLINK-2205
> Project: Flink
> Issue Type: Bug
> Components: Webfrontend
> Affects Versions: 0.9
> Reporter: Fabian Hueske
> Priority: Minor
> Labels: starter
>
> The Job Configuration section of the job history / analyze page of the JobManager webinterface contains two confusing entries:
> - {{Number of execution retries}} is actually the maximum number of retries and should be renamed accordingly. The default value is -1 and should be changed to deactivated (or 0).
> - {{Job parallelism}} which is -1 by default. A parallelism of -1 is not very meaningful. It would be better to show something like auto.
[jira] [Created] (FLINK-2431) [py] refactor PlanBinder/OperationInfo
Chesnay Schepler created FLINK-2431:
------------------------------------

    Summary: [py] refactor PlanBinder/OperationInfo
    Key: FLINK-2431
    URL: https://issues.apache.org/jira/browse/FLINK-2431
    Project: Flink
    Issue Type: Improvement
    Components: Python API
    Reporter: Chesnay Schepler
    Assignee: Chesnay Schepler
    Priority: Minor

These two classes deserve a restructuring to become more readable and consistent with PythonPlanBinder/PythonOperationInfo.
[GitHub] flink pull request: [FLINK-2357] [web dashboard] Shows job configu...
GitHub user ebautistabar opened a pull request:

    https://github.com/apache/flink/pull/953

    [FLINK-2357] [web dashboard] Shows job configuration in new dashboard

I wanted to change the configuration display, as in #927, but it wasn't implemented in the new UI. So I decided to create this PR to take care of it.

- Added handler for new endpoint `/jobs/:jobid/config`
- The endpoint is hit when selecting a job in the UI
- Example of response:

```json
{
  "execution-config": {
    "execution-mode": "PIPELINED",
    "job-parallelism": -1,
    "max-execution-retries": -1,
    "object-reuse-mode": false,
    "user-config": {
      "example": true
    }
  },
  "jid": "dedc15369efa94eb1e3ad6482f1344b6",
  "name": "WordCount Example"
}
```

- The info is stored in the existing `$rootScope.job` object
- Added new tab in job display to show the config

Please, let me know if there is anything to change, improve, etc.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ebautistabar/flink add-job-config-new-ui

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/953.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #953

commit b51444262bdb8aba4b1f6a755a36f029af066a79
Author: Enrique Bautista <ebautista...@gmail.com>
Date:   2015-07-29T20:49:13Z

    [FLINK-2357] [web dashboard] Shows job configuration in new dashboard
[jira] [Commented] (FLINK-2357) New JobManager Runtime Web Frontend
[ https://issues.apache.org/jira/browse/FLINK-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646778#comment-14646778 ] ASF GitHub Bot commented on FLINK-2357: --- GitHub user ebautistabar opened a pull request: https://github.com/apache/flink/pull/953 [FLINK-2357] [web dashboard] Shows job configuration in new dashboard I wanted to change the configuration display, as in #927, but it wasn't implemented in the new UI. So I decided to create this PR to take care of it. - Added handler for new endpoint `/jobs/:jobid/config` - The endpoint is hit when selecting a job in the UI - Example of response: ```json { execution-config: { execution-mode: PIPELINED, job-parallelism: -1, max-execution-retries: -1, object-reuse-mode: false, user-config: { example: true } }, jid: dedc15369efa94eb1e3ad6482f1344b6, name: WordCount Example } ``` - The info is stored in the existing `$rootScope.job` object - Added new tab in job display to show the config Please, let me know if there is anything to change, improve, etc. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ebautistabar/flink add-job-config-new-ui Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/953.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #953 commit b51444262bdb8aba4b1f6a755a36f029af066a79 Author: Enrique Bautista ebautista...@gmail.com Date: 2015-07-29T20:49:13Z [FLINK-2357] [web dashboard] Shows job configuration in new dashboard New JobManager Runtime Web Frontend --- Key: FLINK-2357 URL: https://issues.apache.org/jira/browse/FLINK-2357 Project: Flink Issue Type: New Feature Components: Webfrontend Affects Versions: 0.10 Reporter: Stephan Ewen Attachments: Webfrontend Mockup.pdf We need to improve rework the Job Manager Web Frontend. 
The current web frontend is limited and has a lot of design issues - It does not display any progress while operators are running. This is especially problematic for streaming jobs - It has no graph representation of the data flows - It does not allow looking into execution attempts - It has no hook to deal with the upcoming live accumulators - The architecture is not very modular/extensible I propose to add a new JobManager web frontend: - Based on Netty HTTP (very lightweight) - Using REST-style URLs for jobs and vertices - Integrating the D3 graph renderer of the previews with the runtime monitor - With details on execution attempts - First-class visualization of records processed and bytes processed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2419) DataStream sinks lose key information
[ https://issues.apache.org/jira/browse/FLINK-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646592#comment-14646592 ] ASF GitHub Bot commented on FLINK-2419: --- Github user senorcarbone commented on the pull request: https://github.com/apache/flink/pull/947#issuecomment-126054868 :+1: from me. Looks like it does the job. DataStream sinks lose key information - Key: FLINK-2419 URL: https://issues.apache.org/jira/browse/FLINK-2419 Project: Flink Issue Type: Bug Components: Streaming Reporter: Gyula Fora Assignee: Gyula Fora Priority: Blocker Fix For: 0.10 If the user applies an addSink method after keyBy, the sink will not have the information of the key as the transform method is bypassed. This makes it impossible to use partitioned states with sinks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Framesize fix
Github user kl0u commented on the pull request: https://github.com/apache/flink/pull/934#issuecomment-126076384 Hi @mxm , Thanks a lot for the comments! I integrated most of them. Please have a look and let me know what you think. For the merging of the different types of snapshots and handling them uniformly, I do not have a current solution. If you have one, I am of course open to discussing it, because I agree that this would be nice. For the comment on getAccumulatorResultsStringified(): 1) this is to be presented by the web interface to the user, just for monitoring purposes 2) this is called at the JobManager. The problem is that the JobManager has only the blobKeys that point to the stored accumulators. The serialized data reside in the blobCache and have to be fetched in order to be inspected. Currently the JobManager just forwards the blobKeys to the client, which fetches the blobs and does the deserialization and the final merging. This is done for JobManager scalability reasons: given that we are talking about accumulators of arbitrary size, loading them from disk and deserializing them would be time- and resource-consuming. The same holds if we wanted to get the type of these large accumulators (it is needed by the method): we would have to load and deserialize them at the JobManager. The currently implemented solution is just the result of this design decision. If you have another strategy or solution that is worth implementing, let me know. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
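The division of labor described in this comment (the JobManager only forwards blobKeys; the client fetches the serialized accumulator blobs and does the deserialization and final merging itself) can be sketched roughly as follows. All class and method names here are illustrative stand-ins, not Flink's actual API:

```java
import java.io.*;
import java.util.*;

// Illustrative sketch of the client-side merge pattern: the JobManager never
// deserializes large accumulators; the client does it after fetching the blobs.
public class ClientSideMerge {

    // Stand-in for a serialized accumulator blob as it would sit in the blobCache.
    static byte[] serializeCount(long count) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(count);
        }
        return bos.toByteArray();
    }

    // The client deserializes each fetched blob and merges the partial results.
    static long mergeCounts(List<byte[]> blobs) throws IOException, ClassNotFoundException {
        long merged = 0;
        for (byte[] blob : blobs) {
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(blob))) {
                merged += (Long) ois.readObject();
            }
        }
        return merged;
    }

    public static void main(String[] args) throws Exception {
        // Two partial accumulator results, as if fetched via two blobKeys.
        List<byte[]> blobs = Arrays.asList(serializeCount(10), serializeCount(32));
        System.out.println(mergeCounts(blobs)); // prints 42
    }
}
```

The point of the design is visible in the sketch: the merge loop, the only part that scales with accumulator size, runs on the client rather than on the shared JobManager.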
[GitHub] flink pull request: [FLINK-2419] Add test for sinks after keyBy an...
Github user senorcarbone commented on the pull request: https://github.com/apache/flink/pull/947#issuecomment-126054868 :+1: from me. Looks like it does the job. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2324] [streaming] Partitioned state che...
Github user senorcarbone commented on the pull request: https://github.com/apache/flink/pull/937#issuecomment-126067262 That would be good ^^ then it's :+1: from me, at least for now. It's generally good performance-wise to have less serialised states. This means that we will have a constant number of issued writes to external storage (== #subtasks). On the other hand this also makes our life harder a bit when it comes to repartitioning, as you already mentioned we need to revisit this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2324) Rework partitioned state storage
[ https://issues.apache.org/jira/browse/FLINK-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646634#comment-14646634 ] ASF GitHub Bot commented on FLINK-2324: --- Github user senorcarbone commented on the pull request: https://github.com/apache/flink/pull/937#issuecomment-126067262 That would be good ^^ then it's :+1: from me, at least for now. It's generally good performance-wise to have less serialised states. This means that we will have a constant number of issued writes to external storage (== #subtasks). On the other hand this also makes our life harder a bit when it comes to repartitioning, as you already mentioned we need to revisit this. Rework partitioned state storage Key: FLINK-2324 URL: https://issues.apache.org/jira/browse/FLINK-2324 Project: Flink Issue Type: Improvement Reporter: Gyula Fora Assignee: Gyula Fora Partitioned states are currently stored per-key in statehandles. This is alright for in-memory storage but is very inefficient for HDFS. The logic behind the current mechanism is that this approach provides a way to repartition a state without fetching the data from the external storage and only manipulating handles. We should come up with a solution that can achieve both. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
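The "constant number of issued writes to external storage (== #subtasks)" idea discussed in FLINK-2324 can be sketched with plain Java: instead of one external write per key, each subtask serializes its entire key-to-state map into a single blob. Names below are hypothetical, not Flink's state classes:

```java
import java.io.*;
import java.util.*;

// Sketch of per-subtask state snapshotting: all partitioned states of one
// subtask go into a single serialized blob, so the number of external-storage
// writes equals the number of subtasks, not the number of keys.
public class PerSubtaskSnapshot {

    static byte[] snapshot(Map<String, Long> partitionedState) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new HashMap<>(partitionedState)); // one write per subtask
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static Map<String, Long> restore(byte[] blob) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (Map<String, Long>) ois.readObject();
        }
    }
}
```

The trade-off the comment mentions is also visible here: repartitioning the state across a different number of subtasks now requires reading the blob back and splitting the map, instead of just reshuffling per-key handles.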
[jira] [Closed] (FLINK-2419) DataStream sinks lose key information
[ https://issues.apache.org/jira/browse/FLINK-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora closed FLINK-2419. - Resolution: Fixed DataStream sinks lose key information - Key: FLINK-2419 URL: https://issues.apache.org/jira/browse/FLINK-2419 Project: Flink Issue Type: Bug Components: Streaming Reporter: Gyula Fora Assignee: Gyula Fora Priority: Blocker Fix For: 0.10 If the user applies an addSink method after keyBy, the sink will not have the information of the key as the transform method is bypassed. This makes it impossible to use partitioned states with sinks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2324) Rework partitioned state storage
[ https://issues.apache.org/jira/browse/FLINK-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646619#comment-14646619 ] ASF GitHub Bot commented on FLINK-2324: --- Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/937#issuecomment-126062430 I will actually add one more test to verify the correct behaviour of all the wrapper statehandle classes. Rework partitioned state storage Key: FLINK-2324 URL: https://issues.apache.org/jira/browse/FLINK-2324 Project: Flink Issue Type: Improvement Reporter: Gyula Fora Assignee: Gyula Fora Partitioned states are currently stored per-key in statehandles. This is alright for in-memory storage but is very inefficient for HDFS. The logic behind the current mechanism is that this approach provides a way to repartition a state without fetching the data from the external storage and only manipulating handles. We should come up with a solution that can achieve both. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2324] [streaming] Partitioned state che...
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/937#issuecomment-126062430 I will actually add one more test to verify the correct behaviour of all the wrapper statehandle classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (FLINK-2433) Jekyll on windows for building local documentation
Sachin Goel created FLINK-2433: -- Summary: Jekyll on windows for building local documentation Key: FLINK-2433 URL: https://issues.apache.org/jira/browse/FLINK-2433 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Sachin Goel We should add a script to allow building documentation on a Windows machine similar to the docs/build_docs.sh script. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1919) Add HCatOutputFormat for Tuple data types
[ https://issues.apache.org/jira/browse/FLINK-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647061#comment-14647061 ] James Cao commented on FLINK-1919: -- Hello, is anyone working on this issue now? If not, I would like to contribute to it. Thanks :) Add HCatOutputFormat for Tuple data types - Key: FLINK-1919 URL: https://issues.apache.org/jira/browse/FLINK-1919 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Fabian Hueske Priority: Minor Labels: starter It would be good to have an OutputFormat that can write data to HCatalog tables. The Hadoop `HCatOutputFormat` expects `HCatRecord` objects and writes these to HCatalog tables. We can do the same thing by creating these `HCatRecord` objects with a Map function that precedes a `HadoopOutputFormat` that wraps the Hadoop `HCatOutputFormat`. Better support for Flink Tuples can be added by implementing a custom `HCatOutputFormat` that also depends on the Hadoop `HCatOutputFormat` but internally converts Flink Tuples to `HCatRecords`. This would also include checking whether the schema of the HCatalog table and the Flink tuples match. For data types other than tuples, the OutputFormat could either require a preceding Map function that converts to `HCatRecords` or let users specify a MapFunction and invoke that internally. We already have a Flink `HCatInputFormat` which does this in the reverse direction, i.e., it emits Flink Tuples from HCatalog tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
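The conversion step the issue describes, a map-style function that turns a Flink tuple into a positional record after checking it against the table schema, can be sketched with plain Java stand-ins. The types and method names below are hypothetical; the real implementation would work with Flink `Tuple` and HCatalog `HCatRecord` classes:

```java
import java.util.*;

// Hypothetical sketch of the tuple-to-record conversion with a schema check,
// as a custom HCatOutputFormat would do internally. Not the actual APIs.
public class TupleToRecord {

    // Converts a tuple (modeled as Object[]) into a positional record,
    // verifying that its arity matches the table schema first.
    static List<Object> toRecord(Object[] tuple, List<String> schemaFields) {
        if (tuple.length != schemaFields.size()) {
            throw new IllegalArgumentException(
                "Tuple arity " + tuple.length + " does not match schema " + schemaFields);
        }
        return new ArrayList<>(Arrays.asList(tuple)); // positional copy, like an HCatRecord
    }
}
```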
[GitHub] flink pull request: [FLINK-2433][docs]Add script to build local do...
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/954 [FLINK-2433][docs]Add script to build local documentation on windows You can merge this pull request into a Git repository by running: $ git pull https://github.com/sachingoel0101/flink flink-2433 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/954.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #954 commit 1e13daffd34e84d6436221e31ae39736f76a9a72 Author: Sachin Goel sachingoel0...@gmail.com Date: 2015-07-30T04:18:09Z [FLINK-2433][docs]Add script to build local documentation on windows --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2433) Jekyll on windows for building local documentation
[ https://issues.apache.org/jira/browse/FLINK-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647149#comment-14647149 ] ASF GitHub Bot commented on FLINK-2433: --- GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/954 [FLINK-2433][docs]Add script to build local documentation on windows You can merge this pull request into a Git repository by running: $ git pull https://github.com/sachingoel0101/flink flink-2433 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/954.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #954 commit 1e13daffd34e84d6436221e31ae39736f76a9a72 Author: Sachin Goel sachingoel0...@gmail.com Date: 2015-07-30T04:18:09Z [FLINK-2433][docs]Add script to build local documentation on windows Jekyll on windows for building local documentation -- Key: FLINK-2433 URL: https://issues.apache.org/jira/browse/FLINK-2433 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Sachin Goel We should add a script to allow building documentation on a Windows machine similar to the docs/build_docs.sh script. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2324) Rework partitioned state storage
[ https://issues.apache.org/jira/browse/FLINK-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645645#comment-14645645 ] ASF GitHub Bot commented on FLINK-2324: --- Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/937#issuecomment-125872157 Merging your PR fixed the exactly-once guarantees @StephanEwen , great! I also added an extra test for Partitioned states. Do you think it is okay to leave the StreamCheckpointingITCase how I modified it? Now it tests all the different checkpointing mechanisms together. If you add a test for the Checkpointed interface itself, this should be good. Rework partitioned state storage Key: FLINK-2324 URL: https://issues.apache.org/jira/browse/FLINK-2324 Project: Flink Issue Type: Improvement Reporter: Gyula Fora Assignee: Gyula Fora Partitioned states are currently stored per-key in statehandles. This is alright for in-memory storage but is very inefficient for HDFS. The logic behind the current mechanism is that this approach provides a way to repartition a state without fetching the data from the external storage and only manipulating handles. We should come up with a solution that can achieve both. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2324] [streaming] Partitioned state che...
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/937#issuecomment-125872157 Merging your PR fixed the exactly-once guarantees @StephanEwen , great! I also added an extra test for Partitioned states. Do you think it is okay to leave the StreamCheckpointingITCase how I modified it? Now it tests all the different checkpointing mechanisms together. If you add a test for the Checkpointed interface itself, this should be good. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Closed] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2406. --- Abstract BarrierBuffer to an exchangeable BarrierHandler Key: FLINK-2406 URL: https://issues.apache.org/jira/browse/FLINK-2406 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 We need to make the Checkpoint handling pluggable, to allow us to use different implementations: - BarrierBuffer for exactly once processing. This inevitably introduces a bit of latency. - BarrierTracker for at least once processing, with no added latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2402) Add a non-blocking BarrierTracker
[ https://issues.apache.org/jira/browse/FLINK-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2402. - Resolution: Implemented Implemented in 8f87b7164b644ea8f1708f7eb76567e58341b224 Add a non-blocking BarrierTracker - Key: FLINK-2402 URL: https://issues.apache.org/jira/browse/FLINK-2402 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 This issue would add a new tracker for barriers that simply tracks what barriers have been observed from which inputs. It never blocks off buffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2402) Add a non-blocking BarrierTracker
[ https://issues.apache.org/jira/browse/FLINK-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2402. --- Add a non-blocking BarrierTracker - Key: FLINK-2402 URL: https://issues.apache.org/jira/browse/FLINK-2402 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 This issue would add a new tracker for barriers that simply tracks what barriers have been observed from which inputs. It never blocks off buffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2406. - Resolution: Fixed Fixed via 0579f90bab165a7df336163eb9d6337267020029 Abstract BarrierBuffer to an exchangeable BarrierHandler Key: FLINK-2406 URL: https://issues.apache.org/jira/browse/FLINK-2406 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 We need to make the Checkpoint handling pluggable, to allow us to use different implementations: - BarrierBuffer for exactly once processing. This inevitably introduces a bit of latency. - BarrierTracker for at least once processing, with no added latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645744#comment-14645744 ] ASF GitHub Bot commented on FLINK-2406: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/938#issuecomment-125887226 It failed when queued channel data was still present at the end of the job. This happens only with slow consumers. Apparently, the local machines are usually fast enough to never leave queued data, and Travis is the slow one again ;-) Abstract BarrierBuffer to an exchangeable BarrierHandler Key: FLINK-2406 URL: https://issues.apache.org/jira/browse/FLINK-2406 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 We need to make the Checkpoint handling pluggable, to allow us to use different implementations: - BarrierBuffer for exactly once processing. This inevitably introduces a bit of latency. - BarrierTracker for at least once processing, with no added latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/938#issuecomment-125889092 Manually merged in 8f87b7164b644ea8f1708f7eb76567e58341b224 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645761#comment-14645761 ] ASF GitHub Bot commented on FLINK-2406: --- Github user StephanEwen closed the pull request at: https://github.com/apache/flink/pull/938 Abstract BarrierBuffer to an exchangeable BarrierHandler Key: FLINK-2406 URL: https://issues.apache.org/jira/browse/FLINK-2406 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 We need to make the Checkpoint handling pluggable, to allow us to use different implementations: - BarrierBuffer for exactly once processing. This inevitably introduces a bit of latency. - BarrierTracker for at least once processing, with no added latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/935#issuecomment-125890350 Other than the one comment, this looks good. +1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645658#comment-14645658 ] ASF GitHub Bot commented on FLINK-1901: --- GitHub user ChengXiangLi opened a pull request: https://github.com/apache/flink/pull/949 [FLINK-1901] [core] Create sample operator for Dataset. This PR includes: 1. Four random sampler implementations for different sampling strategies. 2. A sample operator for the DataSet Java API. 3. Random sampler unit tests. 4. A sample operator Java API integration test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ChengXiangLi/flink FLINK-1901 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/949.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #949 commit f7ba8779b8d6a6d66ab5d4e2435a70e220b1e0fc Author: chengxiang li chengxiang...@intel.com Date: 2015-07-22T03:38:13Z [FLINK-1901] [core] Create sample operator for Dataset. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...
GitHub user ChengXiangLi opened a pull request: https://github.com/apache/flink/pull/949 [FLINK-1901] [core] Create sample operator for Dataset. This PR includes: 1. Four random sampler implementations for different sampling strategies. 2. A sample operator for the DataSet Java API. 3. Random sampler unit tests. 4. A sample operator Java API integration test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ChengXiangLi/flink FLINK-1901 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/949.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #949 commit f7ba8779b8d6a6d66ab5d4e2435a70e220b1e0fc Author: chengxiang li chengxiang...@intel.com Date: 2015-07-22T03:38:13Z [FLINK-1901] [core] Create sample operator for Dataset. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
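One of the sampling strategies the issue asks for, sampling without replacement at a given fraction with a seed for reproducibility, can be sketched as a simple Bernoulli sampler. This is an illustrative stand-in, not the samplers from the PR itself:

```java
import java.util.*;

// Minimal Bernoulli sampler: each element is kept independently with
// probability `fraction`; a fixed seed makes the sample reproducible.
public class BernoulliSampler {

    static <T> List<T> sample(List<T> input, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T element : input) {
            if (rnd.nextDouble() < fraction) { // independent coin flip per element
                out.add(element);
            }
        }
        return out;
    }
}
```

With the same seed and input, two calls return the same sample, which is exactly the reproducibility requirement from FLINK-1901. Sampling *with* replacement would instead draw a per-element count (e.g. from a Poisson distribution), which this sketch does not cover.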
[GitHub] flink pull request: [FLINK-2399] Version checks for Job Manager an...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/945#issuecomment-125886476 Using `getClass().getPackage().getImplementationVersion()` would be a decent first approach then, I guess. The critical part seems to be the Client-to-JobManager communication. How about the following as a first step: The client sends its version together with the `SubmitJob` message (just add a field there). The JobManager would check the version and respond with a failure, if it does not match. You can probably make the JobManager part very simple, no need to add extra constructor parameters, etc. That way, the change would be minimally invasive, and we could see how well it addresses the issues, and whether we should extend this to other messages as well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/938#issuecomment-125887588 I really love Travis :heart: --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...
Github user StephanEwen closed the pull request at: https://github.com/apache/flink/pull/938 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Resolved] (FLINK-2418) Add an end-to-end streaming fault tolerance test for the Checkpointed interface
[ https://issues.apache.org/jira/browse/FLINK-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2418. - Resolution: Fixed Fixed via a0556efb233f15c6985d17886372a8b4b00392b2 Add an end-to-end streaming fault tolerance test for the Checkpointed interface Key: FLINK-2418 URL: https://issues.apache.org/jira/browse/FLINK-2418 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The current test lacks the following: - Does not validate exactly-once counts after the partitioning step - Does not cover all cases with the {{Checkpointed}} interface, but uses a mix of by-key state and explicitly Checkpointed state - The test uses a deprecated, non-fault-tolerant infinite reducer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/938#issuecomment-125887226 It failed when queued channel data was still present at the end of the job. This happens only with slow consumers. Apparently, the local machines are usually fast enough to never leave queued data, and Travis is the slow one again ;-) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations
[ https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645773#comment-14645773 ] ASF GitHub Bot commented on FLINK-2231: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/935#issuecomment-125890350 Other than the one comment, this looks good. +1 Create a Serializer for Scala Enumerations -- Key: FLINK-2231 URL: https://issues.apache.org/jira/browse/FLINK-2231 Project: Flink Issue Type: Improvement Components: Scala API Reporter: Stephan Ewen Assignee: Alexander Alexandrov Scala Enumerations are currently serialized with Kryo, but should be efficiently serialized by just writing the {{initial}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2418) Add an end-to-end streaming fault tolerance test for the Checkpointed interface
[ https://issues.apache.org/jira/browse/FLINK-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2418. --- Add an end-to-end streaming fault tolerance test for the Checkpointed interface Key: FLINK-2418 URL: https://issues.apache.org/jira/browse/FLINK-2418 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The current test lacks the following: - Does not validate exactly-once counts after the partitioning step - Does not cover all cases with the {{Checkpointed}} interface, but uses a mix of by-key state and explicitly Checkpointed state - The test uses a deprecated, non-fault-tolerant infinite reducer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2399) Fail when actor versions don't match
[ https://issues.apache.org/jira/browse/FLINK-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645737#comment-14645737 ] ASF GitHub Bot commented on FLINK-2399: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/945#issuecomment-125886476 Using `getClass().getPackage().getImplementationVersion()` would be a decent first approach then, I guess. The critical part seems to be the Client-to-JobManager communication. How about the following as a first step: The client sends its version together with the `SubmitJob` message (just add a field there). The JobManager would check the version and respond with a failure, if it does not match. You can probably make the JobManager part very simple, no need to add extra constructor parameters, etc. That way, the change would be minimally invasive, and we could see how well it addresses the issues, and whether we should extend this to other messages as well. Fail when actor versions don't match Key: FLINK-2399 URL: https://issues.apache.org/jira/browse/FLINK-2399 Project: Flink Issue Type: Improvement Components: JobManager, TaskManager Affects Versions: 0.9, master Reporter: Ufuk Celebi Assignee: Sachin Goel Priority: Minor Fix For: 0.10 Problem: there can be subtle errors when actors from different Flink versions communicate with each other, for example when an old client (e.g. Flink 0.9) communicates with a new JobManager (e.g. Flink 0.10-SNAPSHOT). We can check that the versions match on first communication between the actors and fail if they don't match. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
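The minimally invasive handshake proposed in this thread, where the client ships its version string inside the `SubmitJob` message and the JobManager rejects on mismatch, can be sketched as follows. The class and method names are illustrative, not Flink's actual messages:

```java
import java.util.Objects;

// Sketch of the version check proposed for FLINK-2399: the client attaches
// its version to the job submission and the receiving side fails fast on
// mismatch instead of risking subtle cross-version actor errors.
public class VersionCheck {

    // On the client, the version would come from something like
    // getClass().getPackage().getImplementationVersion()
    // (note: this can be null when not running from a packaged jar).
    static void checkOnSubmit(String clientVersion, String jobManagerVersion) {
        if (!Objects.equals(clientVersion, jobManagerVersion)) {
            throw new IllegalStateException("Client version " + clientVersion
                + " does not match JobManager version " + jobManagerVersion);
        }
    }
}
```

Failing inside the submission path keeps the change small: no extra constructor parameters or handshake round-trips, just one additional field on the existing message and one comparison on receipt.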
[GitHub] flink pull request: [FLINK-2391]Fix Storm-compatibility FlinkTopol...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/940#issuecomment-125887557 Will merge this... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645749#comment-14645749 ] ASF GitHub Bot commented on FLINK-2406: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/938#issuecomment-125887588 I really love Travis :heart: Abstract BarrierBuffer to an exchangeable BarrierHandler Key: FLINK-2406 URL: https://issues.apache.org/jira/browse/FLINK-2406 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 We need to make the Checkpoint handling pluggable, to allow us to use different implementations: - BarrierBuffer for exactly once processing. This inevitably introduces a bit of latency. - BarrierTracker for at least once processing, with no added latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
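The at-least-once variant described in the issue can be sketched very compactly. This is an illustrative stand-in, not Flink's `BarrierTracker`: it only shows why tracking (as opposed to buffering) adds no latency, since no input channel is ever blocked.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the "BarrierTracker" idea from FLINK-2406 (hypothetical
// class, simplified API): count barriers per checkpoint id and report a
// checkpoint as complete once every input channel has delivered its barrier.
// Records are never held back, which gives at-least-once semantics with no
// added latency; the BarrierBuffer variant would block channels instead.
public class BarrierTrackerSketch {
    private final int numChannels;
    private final Map<Long, Integer> barrierCounts = new HashMap<>();

    public BarrierTrackerSketch(int numChannels) {
        this.numChannels = numChannels;
    }

    // Returns true when this barrier completes the checkpoint.
    public boolean onBarrier(long checkpointId) {
        int seen = barrierCounts.merge(checkpointId, 1, Integer::sum);
        if (seen == numChannels) {
            barrierCounts.remove(checkpointId); // checkpoint finished
            return true;
        }
        return false;
    }
}
```

Making both implementations satisfy one handler interface is what makes the checkpoint handling pluggable, as the issue asks.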
[jira] [Commented] (FLINK-2391) Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645748#comment-14645748 ] ASF GitHub Bot commented on FLINK-2391: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/940#issuecomment-125887557 Will merge this... Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException -- Key: FLINK-2391 URL: https://issues.apache.org/jira/browse/FLINK-2391 Project: Flink Issue Type: Bug Components: flink-contrib Affects Versions: 0.10 Environment: win7 32bit;linux Reporter: Huang Wei Labels: features Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h Core dump at FlinkOutputFieldsDeclarer.java:160 (package FlinkOutputFieldsDeclarer). Code: fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i)); in this line, the variable this.outputSchema may be null.
[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations
[ https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645766#comment-14645766 ] ASF GitHub Bot commented on FLINK-2231: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/935#discussion_r35741171 --- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/typeutils/EnumValueSerializer.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flink.api.scala.typeutils + +import org.apache.flink.api.common.typeutils.TypeSerializer +import org.apache.flink.api.common.typeutils.base.IntSerializer +import org.apache.flink.core.memory.{DataInputView, DataOutputView} + +/** + * Serializer for [[Enumeration]] values. 
+ */ +class EnumValueSerializer[E : Enumeration](val enum: E) extends TypeSerializer[E#Value] { + + type T = E#Value + + val intSerializer = new IntSerializer() + + override def duplicate: EnumValueSerializer[E] = this + + override def createInstance: T = enum(0) + + override def isImmutableType: Boolean = true + + override def getLength: Int = intSerializer.getLength + + override def copy(from: T): T = enum.apply(from.id) + + override def copy(from: T, reuse: T): T = copy(from) + + override def copy(src: DataInputView, tgt: DataOutputView): Unit = intSerializer.copy(src, tgt) + + override def serialize(v: T, tgt: DataOutputView): Unit = intSerializer.serialize(v.id, tgt) + + override def deserialize(source: DataInputView): T = enum(intSerializer.deserialize(source)) + + override def deserialize(reuse: T, source: DataInputView): T = deserialize(source) + + override def equals(obj: Any): Boolean = { --- End diff -- Can you also override `hashCode()` here? Keeps it consistent with overriding `equals()` Create a Serializer for Scala Enumerations -- Key: FLINK-2231 URL: https://issues.apache.org/jira/browse/FLINK-2231 Project: Flink Issue Type: Improvement Components: Scala API Reporter: Stephan Ewen Assignee: Alexander Alexandrov Scala Enumerations are currently serialized with Kryo, but should be efficiently serialized by just writing the {{initial}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/935#discussion_r35741171 --- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/typeutils/EnumValueSerializer.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flink.api.scala.typeutils + +import org.apache.flink.api.common.typeutils.TypeSerializer +import org.apache.flink.api.common.typeutils.base.IntSerializer +import org.apache.flink.core.memory.{DataInputView, DataOutputView} + +/** + * Serializer for [[Enumeration]] values. 
+ */ +class EnumValueSerializer[E : Enumeration](val enum: E) extends TypeSerializer[E#Value] { + + type T = E#Value + + val intSerializer = new IntSerializer() + + override def duplicate: EnumValueSerializer[E] = this + + override def createInstance: T = enum(0) + + override def isImmutableType: Boolean = true + + override def getLength: Int = intSerializer.getLength + + override def copy(from: T): T = enum.apply(from.id) + + override def copy(from: T, reuse: T): T = copy(from) + + override def copy(src: DataInputView, tgt: DataOutputView): Unit = intSerializer.copy(src, tgt) + + override def serialize(v: T, tgt: DataOutputView): Unit = intSerializer.serialize(v.id, tgt) + + override def deserialize(source: DataInputView): T = enum(intSerializer.deserialize(source)) + + override def deserialize(reuse: T, source: DataInputView): T = deserialize(source) + + override def equals(obj: Any): Boolean = { --- End diff -- Can you also override `hashCode()` here? Keeps it consistent with overriding `equals()` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
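The review comment is an instance of the general equals/hashCode contract: two objects that compare equal must produce the same hash code, or hash-based collections break. The PR's serializer is Scala; the following is a tiny Java stand-in (hypothetical class, not from the PR) showing the paired overrides the reviewer is asking for.

```java
// Illustrative only: a value wrapper, loosely analogous to an enum value
// carrying an int id, with equals() and hashCode() overridden together so
// that equal objects always hash alike.
public class EnumValueKey {
    private final int id;

    public EnumValueKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof EnumValueKey && ((EnumValueKey) obj).id == this.id;
    }

    // Overridden alongside equals(), as the review asks: without this,
    // two equal keys could land in different HashMap buckets.
    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}
```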
[jira] [Resolved] (FLINK-2420) OutputFlush thread in stream writers does not propagate exceptions
[ https://issues.apache.org/jira/browse/FLINK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2420. - Resolution: Fixed Fixed via 8ba321332b994579f387add8bd0855bd29cb33ec OutputFlush thread in stream writers does not propagate exceptions -- Key: FLINK-2420 URL: https://issues.apache.org/jira/browse/FLINK-2420 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The output flush thread only throws exceptions that it encountered. This simply lets the thread die, when the exceptions reach the root of the stack. The exceptions never reach the actual writer code. That way, exceptions that only happen on flush (or the last flush) would never be detected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
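The fix pattern for this class of bug can be sketched as follows. This is a hypothetical illustration of the hand-off, not the code in commit 8ba3213: the background flusher records its failure instead of dying silently, and the writer thread rethrows it on its next operation.

```java
import java.io.IOException;

// Sketch of propagating an exception from a background flush thread to the
// writer thread (illustrative names; not Flink's actual stream writer code).
public class ExceptionPropagatingFlusher {
    // Written by the flusher thread, read by the writer thread.
    private volatile IOException asyncError;

    // Called from the background flush thread when a flush fails. Storing the
    // exception instead of throwing it keeps it from vanishing with the thread.
    public void onFlushFailure(IOException e) {
        asyncError = e;
    }

    // Called from the writer thread before each write() and in close(), so a
    // failure that happened only during (or after) the last flush still
    // surfaces to the user code.
    public void checkError() throws IOException {
        IOException e = asyncError;
        if (e != null) {
            throw new IOException("Error in flusher thread", e);
        }
    }

    // Demonstrates the round trip: a stored async error is rethrown.
    public static boolean demo() {
        ExceptionPropagatingFlusher f = new ExceptionPropagatingFlusher();
        f.onFlushFailure(new IOException("disk full"));
        try {
            f.checkError();
            return false; // unreachable if propagation works
        } catch (IOException expected) {
            return true;
        }
    }
}
```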
[jira] [Resolved] (FLINK-2421) StreamRecordSerializer incorrectly duplicates and misses tests
[ https://issues.apache.org/jira/browse/FLINK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2421. - Resolution: Fixed Fixed via 2d237e18a2f7cf21721340933c505bb518c4fc66 StreamRecordSerializer incorrectly duplicates and misses tests -- Key: FLINK-2421 URL: https://issues.apache.org/jira/browse/FLINK-2421 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The duplication method of the {{StreamRecordSerializer}} does not respect the duplication of the internal type serializer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2421) StreamRecordSerializer incorrectly duplicates and misses tests
[ https://issues.apache.org/jira/browse/FLINK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2421. --- StreamRecordSerializer incorrectly duplicates and misses tests -- Key: FLINK-2421 URL: https://issues.apache.org/jira/browse/FLINK-2421 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The duplication method of the {{StreamRecordSerializer}} does not respect the duplication of the internal type serializer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2420) OutputFlush thread in stream writers does not propagate exceptions
[ https://issues.apache.org/jira/browse/FLINK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2420. --- OutputFlush thread in stream writers does not propagate exceptions -- Key: FLINK-2420 URL: https://issues.apache.org/jira/browse/FLINK-2420 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The output flush thread only throws exceptions that it encountered. This simply lets the thread die, when the exceptions reach the root of the stack. The exceptions never reach the actual writer code. That way, exceptions that only happen on flush (or the last flush) would never be detected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2424) InstantiationUtil.serializeObject(Object) does not close output stream
[ https://issues.apache.org/jira/browse/FLINK-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645740#comment-14645740 ] Stephan Ewen commented on FLINK-2424: - oha, good one! InstantiationUtil.serializeObject(Object) does not close output stream -- Key: FLINK-2424 URL: https://issues.apache.org/jira/browse/FLINK-2424 Project: Flink Issue Type: Bug Components: Core Affects Versions: 0.9, master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.10, 0.9.1 The utility method InstantiationUtil.serializeObject(Object) does not close the created output stream.
[GitHub] flink pull request: [FLINK-2419] Add test for sinks after keyBy an...
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/947#issuecomment-125894533 If you give a +1 @rmetzger I will merge this --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2419) DataStream sinks lose key information
[ https://issues.apache.org/jira/browse/FLINK-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645779#comment-14645779 ] ASF GitHub Bot commented on FLINK-2419: --- Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/947#issuecomment-125894533 If you give a +1 @rmetzger I will merge this DataStream sinks lose key information - Key: FLINK-2419 URL: https://issues.apache.org/jira/browse/FLINK-2419 Project: Flink Issue Type: Bug Components: Streaming Reporter: Gyula Fora Assignee: Gyula Fora Priority: Blocker Fix For: 0.10 If the user applies an addSink method after keyBy, the sink will not have the information of the key as the transform method is bypassed. This makes it impossible to use partitioned states with sinks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645803#comment-14645803 ] ASF GitHub Bot commented on FLINK-1901: --- Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-125898830 Previously, I planned to leave the sample Scala API to a separate PR, as I am not very familiar with Scala, but the failed test shows that Flink has a test to make sure the Scala and Java APIs match. I will try to add the Scala API and an integration test later. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility.
[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-125903682 Okay, it took me a while, but I actually walked through this and would like to merge it soon. To make this functionality configurable, I opened an issue to give runtime operators access to the TaskManager config, so that we can define a switch to turn this on and off. I would actually turn it on by default, it seems to work well as far as I have tried it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645822#comment-14645822 ] ASF GitHub Bot commented on FLINK-2240: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-125903682 Okay, it took me a while, but I actually walked through this and would like to merge it soon. To make this functionality configurable, I opened an issue to give runtime operators access to the TaskManager config, so that we can define a switch to turn this on and off. I would actually turn it on by default, it seems to work well as far as I have tried it. Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join Key: FLINK-2240 URL: https://issues.apache.org/jira/browse/FLINK-2240 Project: Flink Issue Type: Improvement Components: Core Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Minor In Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table data is spilled to disk, and the counterpart partition of the big table data is spilled to disk in the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big table records which would otherwise be spilled to disk, this may greatly reduce the spilled big table file size and save the disk I/O cost of writing and later reading it.
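The core idea from FLINK-2240 fits in a small sketch. This is a hypothetical, simplified Bloom filter (fixed size, two cheap hash functions), not the PR's implementation: it only demonstrates why a definite "not present" answer lets the join drop a probe-side record instead of spilling it.

```java
import java.util.BitSet;

// Illustrative sketch of probe-side filtering in a hybrid hash join:
// during the build phase, every key spilled to a partition is added to the
// filter; during the probe phase, a record whose key is definitely absent
// (mightContain == false) cannot find a match and need not be spilled.
// Bloom filters have false positives but never false negatives, so this
// only reduces spilled data and never drops a matching record.
public class SpillBloomFilter {
    private final BitSet bits;
    private final int size;

    public SpillBloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Two cheap multiplicative hashes; floorMod keeps indices non-negative.
    private int hash1(int keyHash) { return Math.floorMod(keyHash * 0x9E3779B9, size); }
    private int hash2(int keyHash) { return Math.floorMod(keyHash * 0x85EBCA6B + 1, size); }

    // Build phase: record a spilled build-side key.
    public void add(int keyHash) {
        bits.set(hash1(keyHash));
        bits.set(hash2(keyHash));
    }

    // Probe phase: false means "certainly not in the spilled partition".
    public boolean mightContain(int keyHash) {
        return bits.get(hash1(keyHash)) && bits.get(hash2(keyHash));
    }
}
```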
[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...
Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-125914138 Thanks for pointing that out, Stephan; I didn't notice the configuration issue before.
[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645842#comment-14645842 ] ASF GitHub Bot commented on FLINK-2240: --- Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-125914138 Thanks for pointing that out, Stephan; I didn't notice the configuration issue before. Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join Key: FLINK-2240 URL: https://issues.apache.org/jira/browse/FLINK-2240 Project: Flink Issue Type: Improvement Components: Core Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Minor In Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table data is spilled to disk, and the counterpart partition of the big table data is spilled to disk in the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big table records which would otherwise be spilled to disk, this may greatly reduce the spilled big table file size and save the disk I/O cost of writing and later reading it.
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645817#comment-14645817 ] Robert Metzger commented on FLINK-2425: --- +1 That is something that users have requested Give access to TaskManager config and hostname in the Runtime Environment - Key: FLINK-2425 URL: https://issues.apache.org/jira/browse/FLINK-2425 Project: Flink Issue Type: Improvement Components: TaskManager Affects Versions: 0.10 Reporter: Stephan Ewen Fix For: 0.10 The RuntimeEnvironment (that is used by the operators to access the context) should give access to the TaskManager's configuration, to allow to read config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2427) Allow the BarrierBuffer to maintain multiple queues of blocked inputs
Stephan Ewen created FLINK-2427: --- Summary: Allow the BarrierBuffer to maintain multiple queues of blocked inputs Key: FLINK-2427 URL: https://issues.apache.org/jira/browse/FLINK-2427 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 In corner cases (dropped barriers due to failures/startup races), this is required for proper operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2152) Provide zipWithIndex utility in flink-contrib
[ https://issues.apache.org/jira/browse/FLINK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645895#comment-14645895 ] Robert Metzger commented on FLINK-2152: --- Yes, we either have to use a concurrent list or create a copy of it. The elements in the broadcast set are shared among the parallel instances. Provide zipWithIndex utility in flink-contrib - Key: FLINK-2152 URL: https://issues.apache.org/jira/browse/FLINK-2152 Project: Flink Issue Type: Improvement Components: Java API Reporter: Robert Metzger Assignee: Andra Lungu Priority: Trivial Labels: starter Fix For: 0.10 We should provide a simple utility method for zipping elements in a data set with a dense index. It's up for discussion whether we want it directly in the API or if we should provide it only as a utility from {{flink-contrib}}. I would put it in {{flink-contrib}}. See my answer on SO: http://stackoverflow.com/questions/30596556/zipwithindex-on-apache-flink
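The usual two-pass scheme behind such a utility (and the one described in the linked SO answer) is easy to sketch. The following is a hypothetical standalone version of only the offset computation, not the flink-contrib code: first count the elements of each parallel partition, then let partition i start its dense index at the sum of the counts of partitions 0..i-1.

```java
// Sketch of the offset step of a two-pass zipWithIndex: given the element
// count of each parallel partition, compute the first index each partition
// should assign. Partition i then numbers its elements
// offsets[i], offsets[i]+1, ..., producing one dense, gap-free index space.
public class ZipWithIndexSketch {
    public static long[] startOffsets(int[] countsPerPartition) {
        long[] offsets = new long[countsPerPartition.length];
        long running = 0;
        for (int i = 0; i < countsPerPartition.length; i++) {
            offsets[i] = running;          // partition i starts here
            running += countsPerPartition[i];
        }
        return offsets;
    }
}
```

Robert's comment about the broadcast set fits here: the per-partition counts are typically shipped to all instances as a broadcast set, and since that collection is shared among the parallel instances, it must be copied or accessed through a concurrent structure before sorting or mutating it.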
[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...
Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-125898830 Previously, I planned to leave the sample Scala API to a separate PR, as I am not very familiar with Scala, but the failed test shows that Flink has a test to make sure the Scala and Java APIs match. I will try to add the Scala API and an integration test later.
[jira] [Commented] (FLINK-2387) Add test for live accumulators in Streaming
[ https://issues.apache.org/jira/browse/FLINK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645821#comment-14645821 ] ASF GitHub Bot commented on FLINK-2387: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/926#issuecomment-125902465 How does this PR relate to the recent improvements on the stability of the live accumulator tests? Add test for live accumulators in Streaming --- Key: FLINK-2387 URL: https://issues.apache.org/jira/browse/FLINK-2387 Project: Flink Issue Type: Test Components: Streaming Reporter: Maximilian Michels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2387] add streaming test case for live ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/926#issuecomment-125902465 How does this PR relate to the recent improvements on the stability of the live accumulator tests? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2152) Provide zipWithIndex utility in flink-contrib
[ https://issues.apache.org/jira/browse/FLINK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645904#comment-14645904 ] Andra Lungu commented on FLINK-2152: Hey, Since this is Johannes Günther's finding, I suggest he simply opens a PR with what worked for him :) Nice catch BTW! Cheers, Andra Provide zipWithIndex utility in flink-contrib - Key: FLINK-2152 URL: https://issues.apache.org/jira/browse/FLINK-2152 Project: Flink Issue Type: Improvement Components: Java API Reporter: Robert Metzger Assignee: Andra Lungu Priority: Trivial Labels: starter Fix For: 0.10 We should provide a simple utility method for zipping elements in a data set with a dense index. It's up for discussion whether we want it directly in the API or if we should provide it only as a utility from {{flink-contrib}}. I would put it in {{flink-contrib}}. See my answer on SO: http://stackoverflow.com/questions/30596556/zipwithindex-on-apache-flink
[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/935#issuecomment-125936680 Merged, thanks for your work. :smile: --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/935 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations
[ https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645974#comment-14645974 ] ASF GitHub Bot commented on FLINK-2231: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/935 Create a Serializer for Scala Enumerations -- Key: FLINK-2231 URL: https://issues.apache.org/jira/browse/FLINK-2231 Project: Flink Issue Type: Improvement Components: Scala API Reporter: Stephan Ewen Assignee: Alexander Alexandrov Scala Enumerations are currently serialized with Kryo, but should be efficiently serialized by just writing the {{initial}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2428) Clean up unused properties in StreamConfig
Stephan Ewen created FLINK-2428: --- Summary: Clean up unused properties in StreamConfig Key: FLINK-2428 URL: https://issues.apache.org/jira/browse/FLINK-2428 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Priority: Minor There is a multitude of unused properties in the {{StreamConfig}}, which should be removed, if no longer relevant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646003#comment-14646003 ] Sachin Goel commented on FLINK-2425: Yes. But this will certainly use the readonly configuration defined there. After that it's just a matter of passing it to the RuntimeEnvironment which can give access to the user by passing it through the DistributedContext. Or is there a better way? :') Give access to TaskManager config and hostname in the Runtime Environment - Key: FLINK-2425 URL: https://issues.apache.org/jira/browse/FLINK-2425 Project: Flink Issue Type: Improvement Components: TaskManager Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Sachin Goel Fix For: 0.10 The RuntimeEnvironment (that is used by the operators to access the context) should give access to the TaskManager's configuration, to allow to read config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646023#comment-14646023 ] Sachin Goel commented on FLINK-2425: Should I provide access to the configuration via the RuntimeContext as well? Since the user cannot modify it, there can't be any harm in doing so. Access to config parameters might help the user write the program better. Give access to TaskManager config and hostname in the Runtime Environment - Key: FLINK-2425 URL: https://issues.apache.org/jira/browse/FLINK-2425 Project: Flink Issue Type: Improvement Components: TaskManager Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Sachin Goel Fix For: 0.10 The RuntimeEnvironment (that is used by the operators to access the context) should give access to the TaskManager's configuration, to allow reading config values.
[jira] [Commented] (FLINK-2391) Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645954#comment-14645954 ] ASF GitHub Bot commented on FLINK-2391: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/940 Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException -- Key: FLINK-2391 URL: https://issues.apache.org/jira/browse/FLINK-2391 Project: Flink Issue Type: Bug Components: flink-contrib Affects Versions: 0.10 Environment: win7 32bit;linux Reporter: Huang Wei Labels: features Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h Core dump at FlinkOutputFieldsDeclarer.java:160 (package FlinkOutputFieldsDeclarer). Code: fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i)); in this line, the variable this.outputSchema may be null.
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645993#comment-14645993 ] Sachin Goel commented on FLINK-2425: The readonly configuration will take the actual configuration as an argument in its constructor and provide all access functions of Configuration, while keeping it private. Give access to TaskManager config and hostname in the Runtime Environment - Key: FLINK-2425 URL: https://issues.apache.org/jira/browse/FLINK-2425 Project: Flink Issue Type: Improvement Components: TaskManager Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Sachin Goel Fix For: 0.10 The RuntimeEnvironment (that is used by the operators to access the context) should give access to the TaskManager's configuration, to allow to read config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
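The read-only wrapper described in the comment (and in FLINK-2426) can be sketched as follows. This is a hypothetical illustration, not the PR's `UnmodifiableConfiguration`: a plain `Map` stands in for Flink's `Configuration` class, and the method names are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a read-only configuration: take the real configuration in the
// constructor, keep it private, and expose only accessors. No setter is
// offered, so code holding this object cannot mutate the TaskManager config.
public class UnmodifiableConfig {
    private final Map<String, String> delegate;

    public UnmodifiableConfig(Map<String, String> config) {
        // Defensive copy: later changes to the original map stay invisible
        // too. (An alternative design wraps without copying and reflects
        // later changes; both keep the wrapper itself unmodifiable.)
        this.delegate = new HashMap<>(config);
    }

    // One representative accessor; a real version would mirror all getters
    // of the underlying Configuration.
    public String getString(String key, String defaultValue) {
        return delegate.getOrDefault(key, defaultValue);
    }
}
```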
[GitHub] flink pull request: Cascading changes for compatibility
GitHub user mxm opened a pull request:

    https://github.com/apache/flink/pull/950

Cascading changes for compatibility

@fhueske and I are working on getting Cascading to run on top of Flink. These two commits introduce changes that were necessary to make the translation possible.

Up for debate is the second commit, which loads the user classloader before calling `configure` on an InputFormat. This is necessary because Cascading has its own interface that needs to be wrapped inside a Flink `InputFormat`. Usually, all dependencies are loaded upon instantiation of the class. However, Cascading loads its own input format while `configure(config)` is called (via the class name inside the passed config). Without the changes, this leads to a ClassNotFoundException. Let me know what you think about it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mxm/flink cascading-dev

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/950.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #950

commit 433fec2a90b5e868d0c3b7fa258b85c32345b954
Author: Sachin Goel was not the author; Author: Fabian Hueske fhue...@apache.org
Date: 2015-07-16T22:31:09Z

    [cascading] add getJobConf() to HadoopInputSplit

commit a81582c3cf59952381ce5fb9e15adeb775fcbff7
Author: Maximilian Michels m...@apache.org
Date: 2015-07-29T12:51:14Z

    [cascading] load user classloader when configuring InputFormat
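The classloader handling the PR describes can be sketched as follows. The names and the helper are illustrative, not Flink's actual code: the idea is to install the user-code classloader as the thread's context classloader before `configure()` runs, so that code inside `configure()` can resolve user classes by name (as Cascading does via the class name in the passed config), and to restore the previous classloader afterward.

```java
// Illustrative sketch (not Flink's actual API): swap in the user-code
// classloader around a configure() call and always restore the old one.
final class ConfigureWithUserClassLoader {
    static void configureInputFormat(Runnable configure, ClassLoader userCodeClassLoader) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        try {
            current.setContextClassLoader(userCodeClassLoader);
            configure.run(); // may call Class.forName(...) on user classes
        } finally {
            current.setContextClassLoader(previous); // restore even on exception
        }
    }
}
```

The try/finally is the important part of the pattern: without it, an exception in `configure()` would leave a foreign classloader installed on a pooled thread.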
[jira] [Updated] (FLINK-2429) Remove the enableCheckpointing() without interval variant
[ https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen updated FLINK-2429:

    Description: I think it is not very obvious what the default checkpointing interval is. Also, when somebody activates checkpointing, shouldn't they think about what they want in terms of frequency and recovery latency tradeoffs?
    (was: It is not very obvious what the default checkpointing interval. Also, when somebody activates)

Remove the enableCheckpointing() without interval variant
---------------------------------------------------------

    Key: FLINK-2429
    URL: https://issues.apache.org/jira/browse/FLINK-2429
    Project: Flink
    Issue Type: Wish
    Components: Streaming
    Reporter: Stephan Ewen
    Priority: Minor

I think it is not very obvious what the default checkpointing interval is. Also, when somebody activates checkpointing, shouldn't they think about what they want in terms of frequency and recovery latency tradeoffs?
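The API concern behind the wish can be illustrated with a minimal stand-in (names and the default value are made up for the example; this is not Flink's StreamExecutionEnvironment): the no-argument variant silently applies a default interval the caller never sees, while the variant with an explicit interval forces the caller to state the frequency/recovery-latency tradeoff.

```java
// Minimal sketch of the two API variants under discussion.
final class CheckpointConfigExample {
    static final long DEFAULT_INTERVAL_MS = 5000; // hidden from callers
    long intervalMs = -1; // -1 means checkpointing disabled

    // The variant FLINK-2429 proposes removing: which interval did I just get?
    void enableCheckpointing() {
        intervalMs = DEFAULT_INTERVAL_MS;
    }

    // The explicit variant: the caller states the tradeoff.
    void enableCheckpointing(long intervalMs) {
        this.intervalMs = intervalMs;
    }
}
```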
[jira] [Updated] (FLINK-2429) Remove the enableCheckpointing() without interval variant
[ https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen updated FLINK-2429:

    Issue Type: Wish (was: Bug)

Remove the enableCheckpointing() without interval variant
---------------------------------------------------------

    Key: FLINK-2429
    URL: https://issues.apache.org/jira/browse/FLINK-2429
    Project: Flink
    Issue Type: Wish
    Components: Streaming
    Reporter: Stephan Ewen

It is not very obvious what the default checkpointing interval. Also, when somebody activates