[GitHub] flink pull request: [FLINK-2324] [streaming] Partitioned state che...
Github user gyfora commented on the pull request:
https://github.com/apache/flink/pull/937#issuecomment-125872157

Merging your PR fixed the exactly-once guarantees @StephanEwen , great! I also added an extra test for Partitioned states. Do you think it is okay to leave the StreamCheckpointingITCase as I modified it? Now it tests all the different checkpointing mechanisms together. If you add a test for the Checkpointed interface itself, this should be good.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[jira] [Commented] (FLINK-2324) Rework partitioned state storage
[ https://issues.apache.org/jira/browse/FLINK-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645645#comment-14645645 ]

ASF GitHub Bot commented on FLINK-2324:
---
Github user gyfora commented on the pull request:
https://github.com/apache/flink/pull/937#issuecomment-125872157

Merging your PR fixed the exactly-once guarantees @StephanEwen , great! I also added an extra test for Partitioned states. Do you think it is okay to leave the StreamCheckpointingITCase as I modified it? Now it tests all the different checkpointing mechanisms together. If you add a test for the Checkpointed interface itself, this should be good.

> Rework partitioned state storage
>
> Key: FLINK-2324
> URL: https://issues.apache.org/jira/browse/FLINK-2324
> Project: Flink
> Issue Type: Improvement
> Reporter: Gyula Fora
> Assignee: Gyula Fora
>
> Partitioned states are currently stored per-key in statehandles. This is
> alright for in-memory storage but is very inefficient for HDFS.
> The logic behind the current mechanism is that this approach provides a way
> to repartition a state without fetching the data from the external storage
> and only manipulating handles.
> We should come up with a solution that can achieve both.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645658#comment-14645658 ]

ASF GitHub Bot commented on FLINK-1901:
---
GitHub user ChengXiangLi opened a pull request:
https://github.com/apache/flink/pull/949

[FLINK-1901] [core] Create sample operator for Dataset.

This PR includes:
1. Four random sampler implementations for different sampling strategies.
2. A sample operator for the DataSet Java API.
3. Random sampler unit tests.
4. A sample operator Java API integration test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ChengXiangLi/flink FLINK-1901

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/949.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #949

commit f7ba8779b8d6a6d66ab5d4e2435a70e220b1e0fc
Author: chengxiang li
Date: 2015-07-22T03:38:13Z

    [FLINK-1901] [core] Create sample operator for Dataset.

> Create sample operator for Dataset
>
> Key: FLINK-1901
> URL: https://issues.apache.org/jira/browse/FLINK-1901
> Project: Flink
> Issue Type: Improvement
> Components: Core
> Reporter: Theodore Vasiloudis
> Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of
> other machine learning algorithms we need to have a way to take a random
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset,
> choose the relative size of the sample, and set a seed for reproducibility.
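The two sampling requirements above (with and without replacement, plus a seed for reproducibility) can be sketched in plain Java. This is an illustrative stand-in, not the sampler code from the pull request; class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the sampling strategies described in FLINK-1901.
public class SampleSketch {

    // Sampling without replacement: keep each element independently with
    // probability `fraction` (Bernoulli sampling). A fixed seed makes the
    // sample reproducible.
    public static List<Integer> sampleWithoutReplacement(List<Integer> input,
                                                         double fraction, long seed) {
        Random rnd = new Random(seed);
        List<Integer> out = new ArrayList<>();
        for (int element : input) {
            if (rnd.nextDouble() < fraction) {
                out.add(element);
            }
        }
        return out;
    }

    // Sampling with replacement: draw `n` elements uniformly at random,
    // duplicates allowed.
    public static List<Integer> sampleWithReplacement(List<Integer> input,
                                                      int n, long seed) {
        Random rnd = new Random(seed);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(input.get(rnd.nextInt(input.size())));
        }
        return out;
    }
}
```

With the same seed, both methods return identical samples on repeated calls, which is what "set a seed for reproducibility" asks for.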
[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...
GitHub user ChengXiangLi opened a pull request:
https://github.com/apache/flink/pull/949

[FLINK-1901] [core] Create sample operator for Dataset.

This PR includes:
1. Four random sampler implementations for different sampling strategies.
2. A sample operator for the DataSet Java API.
3. Random sampler unit tests.
4. A sample operator Java API integration test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ChengXiangLi/flink FLINK-1901

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/949.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #949

commit f7ba8779b8d6a6d66ab5d4e2435a70e220b1e0fc
Author: chengxiang li
Date: 2015-07-22T03:38:13Z

    [FLINK-1901] [core] Create sample operator for Dataset.
[jira] [Closed] (FLINK-2418) Add an end-to-end streaming fault tolerance test for the Checkpointed interface
[ https://issues.apache.org/jira/browse/FLINK-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen closed FLINK-2418.
---

> Add an end-to-end streaming fault tolerance test for the Checkpointed interface
>
> Key: FLINK-2418
> URL: https://issues.apache.org/jira/browse/FLINK-2418
> Project: Flink
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> The current test lacks the following:
> - It does not validate exactly-once counts after the partitioning step.
> - It does not cover all cases with the {{Checkpointed}} interface, but uses a
> mix of by-key state and explicitly Checkpointed state.
> - The test uses a non-fault-tolerant, deprecated infinite reducer.
[jira] [Resolved] (FLINK-2418) Add an end-to-end streaming fault tolerance test for the Checkpointed interface
[ https://issues.apache.org/jira/browse/FLINK-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen resolved FLINK-2418.
---
Resolution: Fixed

Fixed via a0556efb233f15c6985d17886372a8b4b00392b2

> Add an end-to-end streaming fault tolerance test for the Checkpointed interface
>
> Key: FLINK-2418
> URL: https://issues.apache.org/jira/browse/FLINK-2418
> Project: Flink
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> The current test lacks the following:
> - It does not validate exactly-once counts after the partitioning step.
> - It does not cover all cases with the {{Checkpointed}} interface, but uses a
> mix of by-key state and explicitly Checkpointed state.
> - The test uses a non-fault-tolerant, deprecated infinite reducer.
[jira] [Closed] (FLINK-2421) StreamRecordSerializer incorrectly duplicates and misses tests
[ https://issues.apache.org/jira/browse/FLINK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen closed FLINK-2421.
---

> StreamRecordSerializer incorrectly duplicates and misses tests
>
> Key: FLINK-2421
> URL: https://issues.apache.org/jira/browse/FLINK-2421
> Project: Flink
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> The duplication method of the {{StreamRecordSerializer}} does not respect the
> duplication of the internal type serializer.
[jira] [Closed] (FLINK-2420) OutputFlush thread in stream writers does not propagate exceptions
[ https://issues.apache.org/jira/browse/FLINK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen closed FLINK-2420.
---

> OutputFlush thread in stream writers does not propagate exceptions
>
> Key: FLINK-2420
> URL: https://issues.apache.org/jira/browse/FLINK-2420
> Project: Flink
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> The output flush thread simply throws any exceptions it encounters. This
> just lets the thread die once the exception reaches the root of its stack;
> the exceptions never reach the actual writer code. That way, exceptions that
> happen only on a flush (or the last flush) would never be detected.
[jira] [Resolved] (FLINK-2420) OutputFlush thread in stream writers does not propagate exceptions
[ https://issues.apache.org/jira/browse/FLINK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen resolved FLINK-2420.
---
Resolution: Fixed

Fixed via 8ba321332b994579f387add8bd0855bd29cb33ec

> OutputFlush thread in stream writers does not propagate exceptions
>
> Key: FLINK-2420
> URL: https://issues.apache.org/jira/browse/FLINK-2420
> Project: Flink
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> The output flush thread simply throws any exceptions it encounters. This
> just lets the thread die once the exception reaches the root of its stack;
> the exceptions never reach the actual writer code. That way, exceptions that
> happen only on a flush (or the last flush) would never be detected.
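The fix pattern for the flusher-thread problem above is a common one: the background thread records any failure, and the writer re-throws it from its own call path. A minimal sketch, with hypothetical names rather than Flink's actual writer classes:

```java
import java.io.IOException;

// Illustrative sketch of propagating a background flusher's exception to the
// writer thread. Names are hypothetical, not Flink's actual classes.
public class FlusherSketch {

    private volatile Throwable flusherError;

    // Called from the background flush thread when a flush fails, instead of
    // letting the exception kill the thread silently.
    void reportError(Throwable t) {
        flusherError = t;
    }

    // Called from the writer before/after each write so the failure surfaces
    // on the task thread, where it can fail the job.
    public void checkErroneous() throws IOException {
        Throwable t = flusherError;
        if (t != null) {
            throw new IOException("An exception occurred in the flusher thread", t);
        }
    }

    public boolean hasError() {
        return flusherError != null;
    }

    // Simulate a failing flush for demonstration purposes.
    public void simulateFailingFlush() {
        reportError(new IOException("disk full"));
    }
}
```

Because the error field is `volatile`, the writer thread reliably observes a failure recorded by the flusher thread, including one from the very last flush.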
[jira] [Resolved] (FLINK-2421) StreamRecordSerializer incorrectly duplicates and misses tests
[ https://issues.apache.org/jira/browse/FLINK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen resolved FLINK-2421.
---
Resolution: Fixed

Fixed via 2d237e18a2f7cf21721340933c505bb518c4fc66

> StreamRecordSerializer incorrectly duplicates and misses tests
>
> Key: FLINK-2421
> URL: https://issues.apache.org/jira/browse/FLINK-2421
> Project: Flink
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> The duplication method of the {{StreamRecordSerializer}} does not respect the
> duplication of the internal type serializer.
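The bug described in FLINK-2421 can be illustrated generically: a wrapping serializer that returns `this` from `duplicate()` silently shares its inner serializer across threads. A sketch with hypothetical stand-in types (not the actual `StreamRecordSerializer` code):

```java
// Illustration of why a wrapping serializer must also duplicate its inner
// serializer. The types below are hypothetical stand-ins.
public class WrapperSerializer {

    interface Duplicable {
        Duplicable duplicate();
    }

    // A stateful inner serializer: duplicating it must create a new instance.
    static class StatefulInner implements Duplicable {
        public Duplicable duplicate() { return new StatefulInner(); }
    }

    static class Wrapper implements Duplicable {
        final Duplicable inner;

        Wrapper(Duplicable inner) { this.inner = inner; }

        // Buggy version: returns `this`, so the inner serializer is shared
        // between threads even though it is stateful.
        Wrapper duplicateBuggy() { return this; }

        // Fixed version: duplicates the inner serializer as well.
        public Wrapper duplicate() { return new Wrapper(inner.duplicate()); }
    }
}
```

The fixed `duplicate()` hands each thread its own copy of the inner serializer, which is what "respect the duplication of the internal type serializer" requires.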
[jira] [Closed] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen closed FLINK-2406.
---

> Abstract BarrierBuffer to an exchangeable BarrierHandler
>
> Key: FLINK-2406
> URL: https://issues.apache.org/jira/browse/FLINK-2406
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> We need to make the checkpoint handling pluggable, to allow us to use
> different implementations:
> - BarrierBuffer for "exactly once" processing. This inevitably introduces a
> bit of latency.
> - BarrierTracker for "at least once" processing, with no added latency.
[jira] [Resolved] (FLINK-2402) Add a non-blocking BarrierTracker
[ https://issues.apache.org/jira/browse/FLINK-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen resolved FLINK-2402.
---
Resolution: Implemented

Implemented in 8f87b7164b644ea8f1708f7eb76567e58341b224

> Add a non-blocking BarrierTracker
>
> Key: FLINK-2402
> URL: https://issues.apache.org/jira/browse/FLINK-2402
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> This issue would add a new tracker for barriers that simply tracks which
> barriers have been observed from which inputs. It never blocks off buffers.
[jira] [Closed] (FLINK-2402) Add a non-blocking BarrierTracker
[ https://issues.apache.org/jira/browse/FLINK-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen closed FLINK-2402.
---

> Add a non-blocking BarrierTracker
>
> Key: FLINK-2402
> URL: https://issues.apache.org/jira/browse/FLINK-2402
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> This issue would add a new tracker for barriers that simply tracks which
> barriers have been observed from which inputs. It never blocks off buffers.
[jira] [Resolved] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen resolved FLINK-2406.
---
Resolution: Fixed

Fixed via 0579f90bab165a7df336163eb9d6337267020029

> Abstract BarrierBuffer to an exchangeable BarrierHandler
>
> Key: FLINK-2406
> URL: https://issues.apache.org/jira/browse/FLINK-2406
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> We need to make the checkpoint handling pluggable, to allow us to use
> different implementations:
> - BarrierBuffer for "exactly once" processing. This inevitably introduces a
> bit of latency.
> - BarrierTracker for "at least once" processing, with no added latency.
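The pluggable design discussed in FLINK-2406/FLINK-2402 can be sketched as a common handler interface with an at-least-once tracker behind it. This is a simplified, single-checkpoint illustration with hypothetical names, not Flink's actual `BarrierBuffer`/`BarrierTracker` code:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified sketch of an exchangeable barrier handler with a non-blocking
// tracker implementation. Names and signatures are illustrative.
public class BarrierSketch {

    interface CheckpointBarrierHandler {
        // Returns true once barriers from all input channels have arrived
        // for the given checkpoint.
        boolean onBarrier(int channel, long checkpointId);
    }

    // "At least once" strategy: only records which channels delivered the
    // barrier and never blocks a channel, so it adds no latency. An
    // exactly-once BarrierBuffer would additionally buffer data from
    // channels whose barrier already arrived.
    static class BarrierTracker implements CheckpointBarrierHandler {
        private final int numChannels;
        private final Set<Integer> seen = new HashSet<>();
        private long currentCheckpoint = -1;

        BarrierTracker(int numChannels) { this.numChannels = numChannels; }

        public boolean onBarrier(int channel, long checkpointId) {
            if (checkpointId != currentCheckpoint) {
                // A newer checkpoint started; drop tracking of the old one.
                currentCheckpoint = checkpointId;
                seen.clear();
            }
            seen.add(channel);
            return seen.size() == numChannels;
        }
    }
}
```

Because the checkpoint coordinator only needs the "all barriers arrived" signal, both strategies can sit behind the same interface and be swapped per job.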
[GitHub] flink pull request: [FLINK-2399] Version checks for Job Manager an...
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/945#issuecomment-125886476

Using `getClass().getPackage().getImplementationVersion()` would be a decent first approach then, I guess. The critical part seems to be the client-to-JobManager communication.

How about the following as a first step: the client sends its version together with the `SubmitJob` message (just add a field there). The JobManager would check the version and respond with a failure if it does not match. You can probably keep the JobManager part very simple, with no need to add extra constructor parameters, etc.

That way, the change would be minimally invasive, and we could see how well it addresses the issue and whether we should extend this to other messages as well.
[jira] [Commented] (FLINK-2399) Fail when actor versions don't match
[ https://issues.apache.org/jira/browse/FLINK-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645737#comment-14645737 ]

ASF GitHub Bot commented on FLINK-2399:
---
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/945#issuecomment-125886476

Using `getClass().getPackage().getImplementationVersion()` would be a decent first approach then, I guess. The critical part seems to be the client-to-JobManager communication.

How about the following as a first step: the client sends its version together with the `SubmitJob` message (just add a field there). The JobManager would check the version and respond with a failure if it does not match. You can probably keep the JobManager part very simple, with no need to add extra constructor parameters, etc.

That way, the change would be minimally invasive, and we could see how well it addresses the issue and whether we should extend this to other messages as well.

> Fail when actor versions don't match
>
> Key: FLINK-2399
> URL: https://issues.apache.org/jira/browse/FLINK-2399
> Project: Flink
> Issue Type: Improvement
> Components: JobManager, TaskManager
> Affects Versions: 0.9, master
> Reporter: Ufuk Celebi
> Assignee: Sachin Goel
> Priority: Minor
> Fix For: 0.10
>
> Problem: there can be subtle errors when actors from different Flink versions
> communicate with each other, for example when an old client (e.g. Flink 0.9)
> communicates with a new JobManager (e.g. Flink 0.10-SNAPSHOT).
> We can check that the versions match on first communication between the
> actors and fail if they don't match.
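The proposed first step (client sends its version with the submission, JobManager rejects on mismatch) can be sketched like this. Everything here is a hypothetical stand-in; the real check would live in the `SubmitJob` message handling, with the version coming from `getClass().getPackage().getImplementationVersion()`:

```java
// Hypothetical sketch of the version check discussed in FLINK-2399.
// Class and method names are illustrative, not Flink's actual API.
public class VersionCheck {

    // In the real system, both sides would read their own version from
    // getClass().getPackage().getImplementationVersion().
    public static boolean versionsMatch(String clientVersion, String serverVersion) {
        return clientVersion != null && clientVersion.equals(serverVersion);
    }

    // Stand-in for the JobManager's SubmitJob handling: fail fast on a
    // mismatch instead of letting subtle protocol errors surface later.
    public static String submitJob(String clientVersion, String serverVersion) {
        if (!versionsMatch(clientVersion, serverVersion)) {
            return "FAILURE: version mismatch (client " + clientVersion
                    + ", jobmanager " + serverVersion + ")";
        }
        return "SUCCESS";
    }
}
```

Checking only on job submission keeps the change minimally invasive, matching the comment's suggestion to extend the check to other messages later if needed.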
[jira] [Commented] (FLINK-2424) InstantiationUtil.serializeObject(Object) does not close output stream
[ https://issues.apache.org/jira/browse/FLINK-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645740#comment-14645740 ]

Stephan Ewen commented on FLINK-2424:
---
oha, good one!

> InstantiationUtil.serializeObject(Object) does not close output stream
>
> Key: FLINK-2424
> URL: https://issues.apache.org/jira/browse/FLINK-2424
> Project: Flink
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.9, master
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
> Fix For: 0.10, 0.9.1
>
> The utility method InstantiationUtil.serializeObject(Object) does not close
> the created output stream.
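The usual fix for a leak like the one in FLINK-2424 is try-with-resources, which closes (and thereby flushes) the stream even if serialization throws. A simplified stand-in for the actual `InstantiationUtil` method:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Sketch of the fix: close the ObjectOutputStream via try-with-resources.
// This is an illustrative stand-in, not Flink's actual InstantiationUtil.
public class SerializeSketch {

    public static byte[] serializeObject(Object o) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // try-with-resources guarantees close() runs even if writeObject throws
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(o);
        }
        return baos.toByteArray();
    }
}
```

Closing before `toByteArray()` also matters for correctness: `ObjectOutputStream` buffers internally, so an unflushed stream can yield truncated bytes.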
[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/938#issuecomment-125887226

It failed when queued channel data was still present at the end of the job. This happens only with slow consumers. Apparently, the local machines are usually fast enough to never leave queued data, and Travis is the slow one again ;-)
[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645744#comment-14645744 ]

ASF GitHub Bot commented on FLINK-2406:
---
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/938#issuecomment-125887226

It failed when queued channel data was still present at the end of the job. This happens only with slow consumers. Apparently, the local machines are usually fast enough to never leave queued data, and Travis is the slow one again ;-)

> Abstract BarrierBuffer to an exchangeable BarrierHandler
>
> Key: FLINK-2406
> URL: https://issues.apache.org/jira/browse/FLINK-2406
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> We need to make the checkpoint handling pluggable, to allow us to use
> different implementations:
> - BarrierBuffer for "exactly once" processing. This inevitably introduces a
> bit of latency.
> - BarrierTracker for "at least once" processing, with no added latency.
[jira] [Commented] (FLINK-2391) Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645748#comment-14645748 ]

ASF GitHub Bot commented on FLINK-2391:
---
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/940#issuecomment-125887557

Will merge this...

> Storm-compatibility: method FlinkTopologyBuilder.createTopology() throws
> java.lang.NullPointerException
>
> Key: FLINK-2391
> URL: https://issues.apache.org/jira/browse/FLINK-2391
> Project: Flink
> Issue Type: Bug
> Components: flink-contrib
> Affects Versions: 0.10
> Environment: win7 32bit; linux
> Reporter: Huang Wei
> Labels: features
> Fix For: 0.10
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Core dumped at FlinkOutputFieldsDeclarer.java:160 (package
> FlinkOutputFieldsDeclarer).
> Code: fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i));
> In this line, the variable this.outputSchema may be null.
[GitHub] flink pull request: [FLINK-2391]Fix Storm-compatibility FlinkTopol...
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/940#issuecomment-125887557

Will merge this...
[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645749#comment-14645749 ]

ASF GitHub Bot commented on FLINK-2406:
---
Github user uce commented on the pull request:
https://github.com/apache/flink/pull/938#issuecomment-125887588

I really love Travis :heart:

> Abstract BarrierBuffer to an exchangeable BarrierHandler
>
> Key: FLINK-2406
> URL: https://issues.apache.org/jira/browse/FLINK-2406
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> We need to make the checkpoint handling pluggable, to allow us to use
> different implementations:
> - BarrierBuffer for "exactly once" processing. This inevitably introduces a
> bit of latency.
> - BarrierTracker for "at least once" processing, with no added latency.
[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...
Github user uce commented on the pull request:
https://github.com/apache/flink/pull/938#issuecomment-125887588

I really love Travis :heart:
[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...
Github user StephanEwen closed the pull request at:
https://github.com/apache/flink/pull/938
[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645761#comment-14645761 ]

ASF GitHub Bot commented on FLINK-2406:
---
Github user StephanEwen closed the pull request at:
https://github.com/apache/flink/pull/938

> Abstract BarrierBuffer to an exchangeable BarrierHandler
>
> Key: FLINK-2406
> URL: https://issues.apache.org/jira/browse/FLINK-2406
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> We need to make the checkpoint handling pluggable, to allow us to use
> different implementations:
> - BarrierBuffer for "exactly once" processing. This inevitably introduces a
> bit of latency.
> - BarrierTracker for "at least once" processing, with no added latency.
[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645760#comment-14645760 ]

ASF GitHub Bot commented on FLINK-2406:
---
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/938#issuecomment-125889092

Manually merged in 8f87b7164b644ea8f1708f7eb76567e58341b224

> Abstract BarrierBuffer to an exchangeable BarrierHandler
>
> Key: FLINK-2406
> URL: https://issues.apache.org/jira/browse/FLINK-2406
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> We need to make the checkpoint handling pluggable, to allow us to use
> different implementations:
> - BarrierBuffer for "exactly once" processing. This inevitably introduces a
> bit of latency.
> - BarrierTracker for "at least once" processing, with no added latency.
[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/938#issuecomment-125889092

Manually merged in 8f87b7164b644ea8f1708f7eb76567e58341b224
[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...
Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/935#discussion_r35741171

--- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/typeutils/EnumValueSerializer.scala ---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.scala.typeutils
+
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.common.typeutils.base.IntSerializer
+import org.apache.flink.core.memory.{DataInputView, DataOutputView}
+
+/**
+ * Serializer for [[Enumeration]] values.
+ */
+class EnumValueSerializer[E <: Enumeration](val enum: E) extends TypeSerializer[E#Value] {
+
+  type T = E#Value
+
+  val intSerializer = new IntSerializer()
+
+  override def duplicate: EnumValueSerializer[E] = this
+
+  override def createInstance: T = enum(0)
+
+  override def isImmutableType: Boolean = true
+
+  override def getLength: Int = intSerializer.getLength
+
+  override def copy(from: T): T = enum.apply(from.id)
+
+  override def copy(from: T, reuse: T): T = copy(from)
+
+  override def copy(src: DataInputView, tgt: DataOutputView): Unit = intSerializer.copy(src, tgt)
+
+  override def serialize(v: T, tgt: DataOutputView): Unit = intSerializer.serialize(v.id, tgt)
+
+  override def deserialize(source: DataInputView): T = enum(intSerializer.deserialize(source))
+
+  override def deserialize(reuse: T, source: DataInputView): T = deserialize(source)
+
+  override def equals(obj: Any): Boolean = {
--- End diff --

Can you also override `hashCode()` here? Keeps it consistent with overriding `equals()`
[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations
[ https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645766#comment-14645766 ]

ASF GitHub Bot commented on FLINK-2231:
---
Github user StephanEwen commented on a diff in the pull request:
https://github.com/apache/flink/pull/935#discussion_r35741171

--- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/typeutils/EnumValueSerializer.scala ---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.scala.typeutils
+
+import org.apache.flink.api.common.typeutils.TypeSerializer
+import org.apache.flink.api.common.typeutils.base.IntSerializer
+import org.apache.flink.core.memory.{DataInputView, DataOutputView}
+
+/**
+ * Serializer for [[Enumeration]] values.
+ */
+class EnumValueSerializer[E <: Enumeration](val enum: E) extends TypeSerializer[E#Value] {
+
+  type T = E#Value
+
+  val intSerializer = new IntSerializer()
+
+  override def duplicate: EnumValueSerializer[E] = this
+
+  override def createInstance: T = enum(0)
+
+  override def isImmutableType: Boolean = true
+
+  override def getLength: Int = intSerializer.getLength
+
+  override def copy(from: T): T = enum.apply(from.id)
+
+  override def copy(from: T, reuse: T): T = copy(from)
+
+  override def copy(src: DataInputView, tgt: DataOutputView): Unit = intSerializer.copy(src, tgt)
+
+  override def serialize(v: T, tgt: DataOutputView): Unit = intSerializer.serialize(v.id, tgt)
+
+  override def deserialize(source: DataInputView): T = enum(intSerializer.deserialize(source))
+
+  override def deserialize(reuse: T, source: DataInputView): T = deserialize(source)
+
+  override def equals(obj: Any): Boolean = {
--- End diff --

Can you also override `hashCode()` here? Keeps it consistent with overriding `equals()`

> Create a Serializer for Scala Enumerations
>
> Key: FLINK-2231
> URL: https://issues.apache.org/jira/browse/FLINK-2231
> Project: Flink
> Issue Type: Improvement
> Components: Scala API
> Reporter: Stephan Ewen
> Assignee: Alexander Alexandrov
>
> Scala Enumerations are currently serialized with Kryo, but should be
> efficiently serialized by just writing the {{initial}}.
[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/935#issuecomment-125890350 Other than the one comment, this looks good. +1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations
[ https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645773#comment-14645773 ] ASF GitHub Bot commented on FLINK-2231: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/935#issuecomment-125890350 Other than the one comment, this looks good. +1 > Create a Serializer for Scala Enumerations > -- > > Key: FLINK-2231 > URL: https://issues.apache.org/jira/browse/FLINK-2231 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Stephan Ewen >Assignee: Alexander Alexandrov > > Scala Enumerations are currently serialized with Kryo, but should be > efficiently serialized by just writing the {{initial}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2419] Add test for sinks after keyBy an...
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/947#issuecomment-125894533 If you give a +1 @rmetzger I will merge this
[jira] [Commented] (FLINK-2419) DataStream sinks lose key information
[ https://issues.apache.org/jira/browse/FLINK-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645779#comment-14645779 ] ASF GitHub Bot commented on FLINK-2419: --- Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/947#issuecomment-125894533 If you give a +1 @rmetzger I will merge this > DataStream sinks lose key information > - > > Key: FLINK-2419 > URL: https://issues.apache.org/jira/browse/FLINK-2419 > Project: Flink > Issue Type: Bug > Components: Streaming >Reporter: Gyula Fora >Assignee: Gyula Fora >Priority: Blocker > Fix For: 0.10 > > > If the user applies an addSink method after keyBy, the sink will not have the > information of the key as the transform method is bypassed. > This makes it impossible to use partitioned states with sinks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...
Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-125898830 Previously, I planned to leave the Scala sample API to a separate PR, as I am not very familiar with Scala, but the failed test shows that Flink has a test ensuring the Scala and Java APIs match. I will add the Scala API and an integration test later.
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645803#comment-14645803 ] ASF GitHub Bot commented on FLINK-1901: --- Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-125898830 Previously, I planned to leave the Scala sample API to a separate PR, as I am not very familiar with Scala, but the failed test shows that Flink has a test ensuring the Scala and Java APIs match. I will add the Scala API and an integration test later. > Create sample operator for Dataset > -- > > Key: FLINK-1901 > URL: https://issues.apache.org/jira/browse/FLINK-1901 > Project: Flink > Issue Type: Improvement > Components: Core >Reporter: Theodore Vasiloudis >Assignee: Chengxiang Li > > In order to be able to implement Stochastic Gradient Descent and a number of > other machine learning algorithms we need to have a way to take a random > sample from a Dataset. > We need to be able to sample with or without replacement from the Dataset, > choose the relative size of the sample, and set a seed for reproducibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
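The issue's requirements (a chosen sample fraction and a seed for reproducibility) can be sketched with a seeded Bernoulli sampler, which is one common way to implement sample-without-replacement over a dataset. This is illustrative only; the class and method names are hypothetical, not Flink's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Toy seeded Bernoulli sampler: each element is kept independently with the
// given probability, and a fixed seed makes the sample reproducible.
class BernoulliSampler {
    static <T> List<T> sample(List<T> input, double fraction, long seed) {
        Random rnd = new Random(seed);          // fixed seed => same sample every run
        List<T> out = new ArrayList<>();
        for (T element : input) {
            if (rnd.nextDouble() < fraction) {  // keep with probability `fraction`
                out.add(element);
            }
        }
        return out;
    }
}
```

Sampling *with* replacement would instead draw a per-element multiplicity (e.g. from a Poisson distribution), but the seed-for-reproducibility idea is the same.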
[jira] [Created] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
Stephan Ewen created FLINK-2425: --- Summary: Give access to TaskManager config and hostname in the Runtime Environment Key: FLINK-2425 URL: https://issues.apache.org/jira/browse/FLINK-2425 Project: Flink Issue Type: Improvement Components: TaskManager Affects Versions: 0.10 Reporter: Stephan Ewen Fix For: 0.10 The RuntimeEnvironment (that is used by the operators to access the context) should give access to the TaskManager's configuration, to allow to read config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2426) Create a read-only variant of the Configuration
Stephan Ewen created FLINK-2426: --- Summary: Create a read-only variant of the Configuration Key: FLINK-2426 URL: https://issues.apache.org/jira/browse/FLINK-2426 Project: Flink Issue Type: Sub-task Components: Core Affects Versions: 0.10 Reporter: Stephan Ewen Fix For: 0.10 In order to give the TaskManager configuration to runtime components (or even user code), it should be wrapped in an {{UnmodifiableConfiguration}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
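The wrapping idea in FLINK-2426 can be sketched as follows. This is a simplified stand-in (plain `Map`-backed types, hypothetical names), not Flink's actual `Configuration` or `UnmodifiableConfiguration` classes: reads delegate to the backing configuration, while every mutator throws.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a read-only configuration view: getters work, setters are rejected.
class UnmodifiableConfigSketch {
    private final Map<String, String> backing;

    UnmodifiableConfigSketch(Map<String, String> backing) {
        this.backing = new HashMap<>(backing);  // defensive copy of the real config
    }

    String getString(String key, String defaultValue) {
        return backing.getOrDefault(key, defaultValue);
    }

    // All mutators throw, so code receiving this view cannot alter the
    // TaskManager's configuration.
    void setString(String key, String value) {
        throw new UnsupportedOperationException("configuration is read-only");
    }
}
```

Handing such a view to runtime components (or user code) gives them config access without any risk of mutation, which is exactly the motivation stated in the issue.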
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645817#comment-14645817 ] Robert Metzger commented on FLINK-2425: --- +1 That is something that users have requested > Give access to TaskManager config and hostname in the Runtime Environment > - > > Key: FLINK-2425 > URL: https://issues.apache.org/jira/browse/FLINK-2425 > Project: Flink > Issue Type: Improvement > Components: TaskManager >Affects Versions: 0.10 >Reporter: Stephan Ewen > Fix For: 0.10 > > > The RuntimeEnvironment (that is used by the operators to access the context) > should give access to the TaskManager's configuration, to allow to read > config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2387] add streaming test case for live ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/926#issuecomment-125902465 How does this PR relate to the recent improvements on the stability of the live accumulator tests?
[jira] [Commented] (FLINK-2387) Add test for live accumulators in Streaming
[ https://issues.apache.org/jira/browse/FLINK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645821#comment-14645821 ] ASF GitHub Bot commented on FLINK-2387: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/926#issuecomment-125902465 How does this PR relate to the recent improvements on the stability of the live accumulator tests? > Add test for live accumulators in Streaming > --- > > Key: FLINK-2387 > URL: https://issues.apache.org/jira/browse/FLINK-2387 > Project: Flink > Issue Type: Test > Components: Streaming >Reporter: Maximilian Michels > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645822#comment-14645822 ] ASF GitHub Bot commented on FLINK-2240: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-125903682 Okay, it took me a while, but I actually walked through this and would like to merge it soon. To make this functionality configurable, I opened an issue to give runtime operators access to the TaskManager config, so that we can define a switch to turn this on and off. I would actually turn it on by default, it seems to work well as far as I have tried it. > Use BloomFilter to minimize probe side records which are spilled to disk in > Hybrid-Hash-Join > > > Key: FLINK-2240 > URL: https://issues.apache.org/jira/browse/FLINK-2240 > Project: Flink > Issue Type: Improvement > Components: Core >Reporter: Chengxiang Li >Assignee: Chengxiang Li >Priority: Minor > > In Hybrid-Hash-Join, while small table does not fit into memory, part of the > small table data would be spilled to disk, and the counterpart partition of > big table data would be spilled to disk in probe phase as well. If we build a > BloomFilter while spill small table to disk during build phase, and use it to > filter the big table records which tend to be spilled to disk, this may > greatly reduce the spilled big table file size, and saved the disk IO cost > for writing and further reading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-125903682 Okay, it took me a while, but I actually walked through this and would like to merge it soon. To make this functionality configurable, I opened an issue to give runtime operators access to the TaskManager config, so that we can define a switch to turn this on and off. I would actually turn it on by default, it seems to work well as far as I have tried it.
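The filtering idea behind FLINK-2240 can be sketched with a toy Bloom filter: while spilling build-side partitions, the keys are recorded in the filter, and during the probe phase any record whose key is definitely absent can be dropped instead of spilled. This is a minimal illustration, not the PR's actual implementation.

```java
import java.util.BitSet;

// Toy Bloom filter over int keys: no false negatives, tunable false positives.
class BloomFilterSketch {
    private final BitSet bits;
    private final int size;

    BloomFilterSketch(int size) {
        this.bits = new BitSet(size);
        this.size = size;
    }

    // Two cheap derived hash positions per key (simplified; real filters
    // pick the hash count from the target false-positive rate).
    private int position(int key, int i) {
        int h = key * 0x9E3779B9 + i * 0x85EBCA6B;
        return Math.floorMod(h, size);
    }

    void add(int key) {                 // called while spilling build-side records
        bits.set(position(key, 1));
        bits.set(position(key, 2));
    }

    // false => the key was certainly never added, so the probe-side record
    // cannot join and need not be written to disk at all.
    boolean mightContain(int key) {
        return bits.get(position(key, 1)) && bits.get(position(key, 2));
    }
}
```

Because the filter never reports a false negative, skipping records it rejects is safe; false positives merely cost an unnecessary spill, which is the trade-off the issue describes.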
[jira] [Created] (FLINK-2427) Allow the BarrierBuffer to maintain multiple queues of blocked inputs
Stephan Ewen created FLINK-2427: --- Summary: Allow the BarrierBuffer to maintain multiple queues of blocked inputs Key: FLINK-2427 URL: https://issues.apache.org/jira/browse/FLINK-2427 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 In corner cases (dropped barriers due to failures/startup races), this is required for proper operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...
Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-125914138 Thanks for pointing that out, Stephan; I didn't notice the configuration issue before.
[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645842#comment-14645842 ] ASF GitHub Bot commented on FLINK-2240: --- Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-125914138 Thanks for pointing that out, Stephan; I didn't notice the configuration issue before. > Use BloomFilter to minimize probe side records which are spilled to disk in > Hybrid-Hash-Join > > > Key: FLINK-2240 > URL: https://issues.apache.org/jira/browse/FLINK-2240 > Project: Flink > Issue Type: Improvement > Components: Core >Reporter: Chengxiang Li >Assignee: Chengxiang Li >Priority: Minor > > In Hybrid-Hash-Join, while small table does not fit into memory, part of the > small table data would be spilled to disk, and the counterpart partition of > big table data would be spilled to disk in probe phase as well. If we build a > BloomFilter while spill small table to disk during build phase, and use it to > filter the big table records which tend to be spilled to disk, this may > greatly reduce the spilled big table file size, and saved the disk IO cost > for writing and further reading. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2152) Provide zipWithIndex utility in flink-contrib
[ https://issues.apache.org/jira/browse/FLINK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645895#comment-14645895 ] Robert Metzger commented on FLINK-2152: --- Yes, we either have to use a concurrent list or create a copy of it. The elements in the broadcast set are shared among the parallel instances. > Provide zipWithIndex utility in flink-contrib > - > > Key: FLINK-2152 > URL: https://issues.apache.org/jira/browse/FLINK-2152 > Project: Flink > Issue Type: Improvement > Components: Java API >Reporter: Robert Metzger >Assignee: Andra Lungu >Priority: Trivial > Labels: starter > Fix For: 0.10 > > > We should provide a simple utility method for zipping elements in a data set > with a dense index. > its up for discussion whether we want it directly in the API or if we should > provide it only as a utility from {{flink-contrib}}. > I would put it in {{flink-contrib}}. > See my answer on SO: > http://stackoverflow.com/questions/30596556/zipwithindex-on-apache-flink -- This message was sent by Atlassian JIRA (v6.3.4#6332)
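The zipWithIndex scheme referenced in this issue (see the linked Stack Overflow answer) works in two passes: first count the elements of each parallel partition, then give partition p a starting offset equal to the sum of the counts of partitions 0..p-1, so indices are dense and unique without a global sort. The offset computation can be sketched like this (illustrative helper, not Flink's actual utility):

```java
// Given per-partition element counts, compute each partition's starting index.
// Partition p then numbers its elements offsets[p], offsets[p]+1, ...
class ZipWithIndexSketch {
    static long[] offsets(int[] countsPerPartition) {
        long[] offsets = new long[countsPerPartition.length];
        long running = 0;
        for (int p = 0; p < countsPerPartition.length; p++) {
            offsets[p] = running;             // partition p starts here
            running += countsPerPartition[p]; // next partition starts after us
        }
        return offsets;
    }
}
```

The concurrency point raised in the comment applies to the first pass: the broadcast collection of counts is shared among the parallel instances, so each instance must copy it (or use a concurrent structure) before sorting or mutating it.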
[jira] [Commented] (FLINK-2152) Provide zipWithIndex utility in flink-contrib
[ https://issues.apache.org/jira/browse/FLINK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645904#comment-14645904 ] Andra Lungu commented on FLINK-2152: Hey, Since this is Johannes Günther's finding, I suggest he simply opens a PR with what worked for him :) Nice catch BTW! Cheers, Andra > Provide zipWithIndex utility in flink-contrib > - > > Key: FLINK-2152 > URL: https://issues.apache.org/jira/browse/FLINK-2152 > Project: Flink > Issue Type: Improvement > Components: Java API >Reporter: Robert Metzger >Assignee: Andra Lungu >Priority: Trivial > Labels: starter > Fix For: 0.10 > > > We should provide a simple utility method for zipping elements in a data set > with a dense index. > its up for discussion whether we want it directly in the API or if we should > provide it only as a utility from {{flink-contrib}}. > I would put it in {{flink-contrib}}. > See my answer on SO: > http://stackoverflow.com/questions/30596556/zipwithindex-on-apache-flink -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-2407) Add an API switch to select between "exactly once" and "at least once" fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen reassigned FLINK-2407: --- Assignee: Stephan Ewen > Add an API switch to select between "exactly once" and "at least once" fault > tolerance > -- > > Key: FLINK-2407 > URL: https://issues.apache.org/jira/browse/FLINK-2407 > Project: Flink > Issue Type: Sub-task > Components: Streaming >Affects Versions: 0.10 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 0.10 > > > Based on the addition of the BarrierTracker, we can add a switch to choose > between the two modes "exactly once" and "at least once". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
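The switch proposed in FLINK-2407 boils down to an enum-valued setting that selects the barrier-handling strategy. A minimal sketch, assuming the two handler names from the surrounding issues (the aligning BarrierBuffer for exactly-once, the non-blocking BarrierTracker for at-least-once); the selection code itself is hypothetical, not Flink's actual API:

```java
// Illustrative checkpointing-mode switch: the chosen mode decides which
// barrier handler the streaming runtime instantiates.
class CheckpointModeSketch {
    enum Mode { EXACTLY_ONCE, AT_LEAST_ONCE }

    static String barrierHandlerFor(Mode mode) {
        // Exactly-once aligns inputs on barriers (BarrierBuffer); at-least-once
        // only tracks barrier counts and never blocks inputs (BarrierTracker).
        return mode == Mode.EXACTLY_ONCE ? "BarrierBuffer" : "BarrierTracker";
    }
}
```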
[GitHub] flink pull request: [FLINK-2391]Fix Storm-compatibility FlinkTopol...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/940
[jira] [Commented] (FLINK-2391) Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645954#comment-14645954 ] ASF GitHub Bot commented on FLINK-2391: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/940 > Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws > java.lang.NullPointerException > -- > > Key: FLINK-2391 > URL: https://issues.apache.org/jira/browse/FLINK-2391 > Project: Flink > Issue Type: Bug > Components: flink-contrib >Affects Versions: 0.10 > Environment: win7 32bit;linux >Reporter: Huang Wei > Labels: features > Fix For: 0.10 > > Original Estimate: 168h > Remaining Estimate: 168h > > core dumped at FlinkOutputFieldsDeclarer.java : 160(package > FlinkOutputFieldsDeclarer). > code : fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i)); > in this line, the var this.outputSchema may be null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Updated method documentation in joinDataSet.sc...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/909#issuecomment-125933965 Merging this...
[jira] [Closed] (FLINK-2391) Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2391. --- > Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws > java.lang.NullPointerException > -- > > Key: FLINK-2391 > URL: https://issues.apache.org/jira/browse/FLINK-2391 > Project: Flink > Issue Type: Bug > Components: flink-contrib >Affects Versions: 0.10 > Environment: win7 32bit;linux >Reporter: Huang Wei > Labels: features > Fix For: 0.10 > > Original Estimate: 168h > Remaining Estimate: 168h > > core dumped at FlinkOutputFieldsDeclarer.java : 160(package > FlinkOutputFieldsDeclarer). > code : fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i)); > in this line, the var this.outputSchema may be null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2391) Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2391. - Resolution: Fixed Fixed in ada9037bef760d46a4c3be2177e04bd72e620dad Thank you for the patch! > Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws > java.lang.NullPointerException > -- > > Key: FLINK-2391 > URL: https://issues.apache.org/jira/browse/FLINK-2391 > Project: Flink > Issue Type: Bug > Components: flink-contrib >Affects Versions: 0.10 > Environment: win7 32bit;linux >Reporter: Huang Wei > Labels: features > Fix For: 0.10 > > Original Estimate: 168h > Remaining Estimate: 168h > > core dumped at FlinkOutputFieldsDeclarer.java : 160(package > FlinkOutputFieldsDeclarer). > code : fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i)); > in this line, the var this.outputSchema may be null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Updated method documentation in joinDataSet.sc...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/909
[GitHub] flink pull request: Updated method documentation in joinDataSet.sc...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/909#issuecomment-125934297 Manually merged. Thank you for the patch!
[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/935
[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations
[ https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645974#comment-14645974 ] ASF GitHub Bot commented on FLINK-2231: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/935 > Create a Serializer for Scala Enumerations > -- > > Key: FLINK-2231 > URL: https://issues.apache.org/jira/browse/FLINK-2231 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Stephan Ewen >Assignee: Alexander Alexandrov > > Scala Enumerations are currently serialized with Kryo, but should be > efficiently serialized by just writing the {{initial}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/935#issuecomment-125936680 Merged, thanks for your work. :smile:
[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations
[ https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645977#comment-14645977 ] ASF GitHub Bot commented on FLINK-2231: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/935#issuecomment-125936680 Merged, thanks for your work. :smile: > Create a Serializer for Scala Enumerations > -- > > Key: FLINK-2231 > URL: https://issues.apache.org/jira/browse/FLINK-2231 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Stephan Ewen >Assignee: Alexander Alexandrov > Fix For: 0.10 > > > Scala Enumerations are currently serialized with Kryo, but should be > efficiently serialized by just writing the {{initial}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2231) Create a Serializer for Scala Enumerations
[ https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek closed FLINK-2231. --- Resolution: Fixed Fix Version/s: 0.10 Implemented in https://github.com/apache/flink/commit/a34c9416790e0cddedb3f2518fd0bea2331cbcc0 > Create a Serializer for Scala Enumerations > -- > > Key: FLINK-2231 > URL: https://issues.apache.org/jira/browse/FLINK-2231 > Project: Flink > Issue Type: Improvement > Components: Scala API >Reporter: Stephan Ewen >Assignee: Alexander Alexandrov > Fix For: 0.10 > > > Scala Enumerations are currently serialized with Kryo, but should be > efficiently serialized by just writing the {{initial}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sachin Goel reassigned FLINK-2425: -- Assignee: Sachin Goel > Give access to TaskManager config and hostname in the Runtime Environment > - > > Key: FLINK-2425 > URL: https://issues.apache.org/jira/browse/FLINK-2425 > Project: Flink > Issue Type: Improvement > Components: TaskManager >Affects Versions: 0.10 >Reporter: Stephan Ewen >Assignee: Sachin Goel > Fix For: 0.10 > > > The RuntimeEnvironment (that is used by the operators to access the context) > should give access to the TaskManager's configuration, to allow to read > config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645984#comment-14645984 ] Stephan Ewen commented on FLINK-2425: - Could you describe in a few lines how you mean to implement this? > Give access to TaskManager config and hostname in the Runtime Environment > - > > Key: FLINK-2425 > URL: https://issues.apache.org/jira/browse/FLINK-2425 > Project: Flink > Issue Type: Improvement > Components: TaskManager >Affects Versions: 0.10 >Reporter: Stephan Ewen >Assignee: Sachin Goel > Fix For: 0.10 > > > The RuntimeEnvironment (that is used by the operators to access the context) > should give access to the TaskManager's configuration, to allow to read > config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2428) Clean up unused properties in StreamConfig
Stephan Ewen created FLINK-2428: --- Summary: Clean up unused properties in StreamConfig Key: FLINK-2428 URL: https://issues.apache.org/jira/browse/FLINK-2428 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Priority: Minor There is a multitude of unused properties in the {{StreamConfig}}, which should be removed, if no longer relevant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645993#comment-14645993 ] Sachin Goel commented on FLINK-2425: The readonly configuration will take the actual configuration as an argument in its constructor and provide all access functions of Configuration, while keeping it private. > Give access to TaskManager config and hostname in the Runtime Environment > - > > Key: FLINK-2425 > URL: https://issues.apache.org/jira/browse/FLINK-2425 > Project: Flink > Issue Type: Improvement > Components: TaskManager >Affects Versions: 0.10 >Reporter: Stephan Ewen >Assignee: Sachin Goel > Fix For: 0.10 > > > The RuntimeEnvironment (that is used by the operators to access the context) > should give access to the TaskManager's configuration, to allow to read > config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645997#comment-14645997 ] Stephan Ewen commented on FLINK-2425: - Ah, I think you are in the wrong issue ;-) That would be here [FLINK-2426] > Give access to TaskManager config and hostname in the Runtime Environment > - > > Key: FLINK-2425 > URL: https://issues.apache.org/jira/browse/FLINK-2425 > Project: Flink > Issue Type: Improvement > Components: TaskManager >Affects Versions: 0.10 >Reporter: Stephan Ewen >Assignee: Sachin Goel > Fix For: 0.10 > > > The RuntimeEnvironment (that is used by the operators to access the context) > should give access to the TaskManager's configuration, to allow to read > config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646003#comment-14646003 ] Sachin Goel commented on FLINK-2425: Yes. But this will certainly use the readonly configuration defined there. After that it's just a matter of passing it to the RuntimeEnvironment which can give access to the user by passing it through the DistributedContext. Or is there a better way? :') > Give access to TaskManager config and hostname in the Runtime Environment > - > > Key: FLINK-2425 > URL: https://issues.apache.org/jira/browse/FLINK-2425 > Project: Flink > Issue Type: Improvement > Components: TaskManager >Affects Versions: 0.10 >Reporter: Stephan Ewen >Assignee: Sachin Goel > Fix For: 0.10 > > > The RuntimeEnvironment (that is used by the operators to access the context) > should give access to the TaskManager's configuration, to allow to read > config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646019#comment-14646019 ] Stephan Ewen commented on FLINK-2425: - The TaskManager configuration object needs only go into the RuntimeEnviromnent, we only need it at the internal operators. Exposing it to the user functions would be another thing. Probably useful as well. > Give access to TaskManager config and hostname in the Runtime Environment > - > > Key: FLINK-2425 > URL: https://issues.apache.org/jira/browse/FLINK-2425 > Project: Flink > Issue Type: Improvement > Components: TaskManager >Affects Versions: 0.10 >Reporter: Stephan Ewen >Assignee: Sachin Goel > Fix For: 0.10 > > > The RuntimeEnvironment (that is used by the operators to access the context) > should give access to the TaskManager's configuration, to allow to read > config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Cascading changes for compatibility
GitHub user mxm opened a pull request: https://github.com/apache/flink/pull/950 Cascading changes for compatibility @fhueske and I are working on getting Cascading to run on top of Flink. These two commits introduce changes that were necessary to make the translation possible. Up for debate is the second commit, which loads the user classloader before calling `configure` on an InputFormat. This is necessary because Cascading has its own interface that needs to be wrapped inside a Flink `InputFormat`. Usually, all dependencies are loaded upon instantiation of the class. However, Cascading loads its own input format when `configure(config)` is called (via the class name inside the passed config). Without the changes, this leads to a ClassNotFoundException. Let me know what you think about it. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mxm/flink cascading-dev Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/950.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #950 commit 433fec2a90b5e868d0c3b7fa258b85c32345b954 Author: Fabian Hueske Date: 2015-07-16T22:31:09Z [cascading] add getJobConf() to HadoopInputSplit commit a81582c3cf59952381ce5fb9e15adeb775fcbff7 Author: Maximilian Michels Date: 2015-07-29T12:51:14Z [cascading] load user classloader when configuring InputFormat --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646023#comment-14646023 ] Sachin Goel commented on FLINK-2425: Should I provide access to the configuration via RuntimeContext as well? Since the user cannot modify it, there can't be any harm in doing it. Access to config parameters might help users write better programs. > Give access to TaskManager config and hostname in the Runtime Environment > - > > Key: FLINK-2425 > URL: https://issues.apache.org/jira/browse/FLINK-2425 > Project: Flink > Issue Type: Improvement > Components: TaskManager >Affects Versions: 0.10 >Reporter: Stephan Ewen >Assignee: Sachin Goel > Fix For: 0.10 > > > The RuntimeEnvironment (that is used by the operators to access the context) > should give access to the TaskManager's configuration, to allow to read > config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2429) Remove the "enableCheckpointing()" without interval variant
Stephan Ewen created FLINK-2429: --- Summary: Remove the "enableCheckpointing()" without interval variant Key: FLINK-2429 URL: https://issues.apache.org/jira/browse/FLINK-2429 Project: Flink Issue Type: Bug Components: Streaming Reporter: Stephan Ewen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2429) Remove the "enableCheckpointing()" without interval variant
[ https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-2429: Issue Type: Wish (was: Bug) > Remove the "enableCheckpointing()" without interval variant > --- > > Key: FLINK-2429 > URL: https://issues.apache.org/jira/browse/FLINK-2429 > Project: Flink > Issue Type: Wish > Components: Streaming >Reporter: Stephan Ewen > > It is not very obvious what the default checkpointing interval. Also, when > somebody activates -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2429) Remove the "enableCheckpointing()" without interval variant
[ https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-2429: Description: I think it is not very obvious what the default checkpointing interval is. Also, when somebody activates checkpointing, shouldn't they think about what they want in terms of frequency and recovery latency tradeoffs? was:It is not very obvious what the default checkpointing interval. Also, when somebody activates > Remove the "enableCheckpointing()" without interval variant > --- > > Key: FLINK-2429 > URL: https://issues.apache.org/jira/browse/FLINK-2429 > Project: Flink > Issue Type: Wish > Components: Streaming >Reporter: Stephan Ewen >Priority: Minor > > I think it is not very obvious what the default checkpointing interval is. > Also, when somebody activates checkpointing, shouldn't they think about what > they want in terms of frequency and recovery latency tradeoffs? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2429) Remove the "enableCheckpointing()" without interval variant
[ https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-2429: Priority: Minor (was: Major) > Remove the "enableCheckpointing()" without interval variant > --- > > Key: FLINK-2429 > URL: https://issues.apache.org/jira/browse/FLINK-2429 > Project: Flink > Issue Type: Wish > Components: Streaming >Reporter: Stephan Ewen >Priority: Minor > > It is not very obvious what the default checkpointing interval. Also, when > somebody activates -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2429) Remove the "enableCheckpointing()" without interval variant
[ https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-2429: Description: It is not very obvious what the default checkpointing interval. Also, when somebody activates > Remove the "enableCheckpointing()" without interval variant > --- > > Key: FLINK-2429 > URL: https://issues.apache.org/jira/browse/FLINK-2429 > Project: Flink > Issue Type: Bug > Components: Streaming >Reporter: Stephan Ewen > > It is not very obvious what the default checkpointing interval. Also, when > somebody activates -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646029#comment-14646029 ] Stephan Ewen commented on FLINK-2425: - How about opening a pull request for the RuntimeEnvironment first? This is needed soon, the RuntimeContext is not urgent. > Give access to TaskManager config and hostname in the Runtime Environment > - > > Key: FLINK-2425 > URL: https://issues.apache.org/jira/browse/FLINK-2425 > Project: Flink > Issue Type: Improvement > Components: TaskManager >Affects Versions: 0.10 >Reporter: Stephan Ewen >Assignee: Sachin Goel > Fix For: 0.10 > > > The RuntimeEnvironment (that is used by the operators to access the context) > should give access to the TaskManager's configuration, to allow to read > config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2387) Add test for live accumulators in Streaming
[ https://issues.apache.org/jira/browse/FLINK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646039#comment-14646039 ] ASF GitHub Bot commented on FLINK-2387: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/926#issuecomment-125955678 Sorry for the plain description. This pull request adds a test for the streaming part of the live accumulators, i.e. it makes sure that user-defined and Flink internal accumulators work also in streaming programs. The current test only tests the batch side. The stability of the live accumulator tests should not be affected by this pull request. It uses the same technique as the current (improved) live accumulator tests. > Add test for live accumulators in Streaming > --- > > Key: FLINK-2387 > URL: https://issues.apache.org/jira/browse/FLINK-2387 > Project: Flink > Issue Type: Test > Components: Streaming >Reporter: Maximilian Michels > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2387] add streaming test case for live ...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/926#issuecomment-125955678 Sorry for the plain description. This pull request adds a test for the streaming part of the live accumulators, i.e. it makes sure that user-defined and Flink internal accumulators work also in streaming programs. The current test only tests the batch side. The stability of the live accumulator tests should not be affected by this pull request. It uses the same technique as the current (improved) live accumulator tests. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/951 [FLINK-2407] [streaming] Add an API switch to choose between "exactly once" and "at least once". Adds a switch to choose between **exactly once** and **at least once** checkpointing mode. Exactly Once == Sets the checkpointing mode to "exactly once". This mode means that the system will checkpoint the operator and user function state in such a way that, upon recovery, every record will be reflected exactly once in the operator state. For example, if a user function counts the number of elements in a stream, this number will consistently be equal to the number of actual elements in the stream, regardless of failures and recovery. Note that this does not mean that each record flows through the streaming data flow only once. It means that upon recovery, the state of operators/functions is restored such that the resumed data streams pick up exactly after the last modification to the state. Note that this mode does not guarantee exactly-once behavior in the interaction with external systems (only state in Flink's operators and user functions). The reason for that is that a certain level of "collaboration" is required between two systems to achieve exactly-once guarantees. However, for certain systems, connectors can be written that facilitate this collaboration. This mode sustains high throughput. Depending on the data flow graph and operations, this mode may increase the record latency, because operators need to align their input streams, in order to create a consistent snapshot point. The latency increase for simple dataflows (no repartitioning) is negligible. For simple dataflows with repartitioning, the average latency remains small, but the slowest records typically have an increased latency. At Least Once === Sets the checkpointing mode to "at least once". This mode means that the system will checkpoint the operator and user function state in a simpler way. 
Upon failure and recovery, some records may be reflected multiple times in the operator state. For example, if a user function counts the number of elements in a stream, this number will be equal to, or larger than, the actual number of elements in the stream in the presence of failure and recovery. This mode has minimal impact on latency and may be preferable in very-low-latency scenarios, where a sustained very low latency (such as a few milliseconds) is needed, and where occasional duplicate messages (on recovery) do not matter. You can merge this pull request into a Git repository by running: $ git pull https://github.com/StephanEwen/incubator-flink at_least_once_switch Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/951.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #951 commit b089efa6ef688d61f41463d644768845393a913b Author: Stephan Ewen Date: 2015-07-29T12:12:42Z [FLINK-2407] [streaming] Add an API switch to choose between "exactly once" and "at least once". commit 71bd9c01aa24c9420766c9c79ff80618341b8e69 Author: Stephan Ewen Date: 2015-07-29T12:49:23Z [hotfix] Code cleanups in the StreamConfig --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
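The counting example in the PR description can be illustrated with a toy simulation. This is plain Java, not the Flink API; the class and method names are made up for illustration. Exactly-once restores the checkpointed state and replays the stream from exactly the checkpointed position, while a simpler at-least-once scheme may also replay records from before the checkpoint, so the counter can only over-count, never under-count.

```java
import java.util.List;

// Toy simulation of the counting example from the PR description (not Flink code).
// A checkpoint snapshots the counter; on recovery the source replays from some
// offset. Exactly-once replays from exactly the checkpointed position; a simpler
// at-least-once scheme may replay a few records from before it (duplicates).
public class CheckpointModes {

    static long countWithRecovery(List<String> stream, int checkpointAt, int replayFrom) {
        long count = 0;
        // Process records until a simulated failure just after the checkpoint.
        for (int i = 0; i < checkpointAt; i++) count++;
        long checkpointedCount = count;          // snapshot taken at position `checkpointAt`
        // Failure: restore the snapshot, then replay from `replayFrom`.
        count = checkpointedCount;
        for (int i = replayFrom; i < stream.size(); i++) count++;
        return count;
    }

    public static void main(String[] args) {
        List<String> records = List.of("a", "b", "c", "d", "e");
        // Exactly-once: replay starts exactly at the checkpointed position.
        long exact = countWithRecovery(records, 3, 3);
        // At-least-once: two records before the checkpoint are replayed again.
        long atLeast = countWithRecovery(records, 3, 1);
        System.out.println(exact);    // equals the real element count: 5
        System.out.println(atLeast);  // over-counts: 7
    }
}
```

The simulation only models operator state, matching the description's caveat that neither mode by itself covers interaction with external systems.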
[jira] [Commented] (FLINK-2407) Add an API switch to select between "exactly once" and "at least once" fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646072#comment-14646072 ] ASF GitHub Bot commented on FLINK-2407: --- GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/951 [FLINK-2407] [streaming] Add an API switch to choose between "exactly once" and "at least once". Adds a switch to choose between **exactly once** and **at least once** checkpointing mode. Exactly Once == Sets the checkpointing mode to "exactly once". This mode means that the system will checkpoint the operator and user function state in such a way that, upon recovery, every record will be reflected exactly once in the operator state. For example, if a user function counts the number of elements in a stream, this number will consistently be equal to the number of actual elements in the stream, regardless of failures and recovery. Note that this does not mean that each record flows through the streaming data flow only once. It means that upon recovery, the state of operators/functions is restored such that the resumed data streams pick up exactly after the last modification to the state. Note that this mode does not guarantee exactly-once behavior in the interaction with external systems (only state in Flink's operators and user functions). The reason for that is that a certain level of "collaboration" is required between two systems to achieve exactly-once guarantees. However, for certain systems, connectors can be written that facilitate this collaboration. This mode sustains high throughput. Depending on the data flow graph and operations, this mode may increase the record latency, because operators need to align their input streams, in order to create a consistent snapshot point. The latency increase for simple dataflows (no repartitioning) is negligible. 
For simple dataflows with repartitioning, the average latency remains small, but the slowest records typically have an increased latency. At Least Once === Sets the checkpointing mode to "at least once". This mode means that the system will checkpoint the operator and user function state in a simpler way. Upon failure and recovery, some records may be reflected multiple times in the operator state. For example, if a user function counts the number of elements in a stream, this number will be equal to, or larger than, the actual number of elements in the stream in the presence of failure and recovery. This mode has minimal impact on latency and may be preferable in very-low-latency scenarios, where a sustained very low latency (such as a few milliseconds) is needed, and where occasional duplicate messages (on recovery) do not matter. You can merge this pull request into a Git repository by running: $ git pull https://github.com/StephanEwen/incubator-flink at_least_once_switch Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/951.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #951 commit b089efa6ef688d61f41463d644768845393a913b Author: Stephan Ewen Date: 2015-07-29T12:12:42Z [FLINK-2407] [streaming] Add an API switch to choose between "exactly once" and "at least once". 
commit 71bd9c01aa24c9420766c9c79ff80618341b8e69 Author: Stephan Ewen Date: 2015-07-29T12:49:23Z [hotfix] Code cleanups in the StreamConfig > Add an API switch to select between "exactly once" and "at least once" fault > tolerance > -- > > Key: FLINK-2407 > URL: https://issues.apache.org/jira/browse/FLINK-2407 > Project: Flink > Issue Type: Sub-task > Components: Streaming >Affects Versions: 0.10 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 0.10 > > > Based on the addition of the BarrierTracker, we can add a switch to choose > between the two modes "exactly once" and "at least once". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Cascading changes for compatibility
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/950#issuecomment-125962637 For robustness, can you restore the thread context classloader to the original one in a finally clause? Otherwise +1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
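The restore-in-finally pattern Stephan asks for might look like the following sketch. The names (`runWithUserClassLoader`, `userCodeClassLoader`, the `Runnable` standing in for the `configure` call) are illustrative, not the actual PR code.

```java
// Sketch of restoring the thread context classloader in a finally block.
// `userCodeClassLoader` and `runWithUserClassLoader` are illustrative names,
// not identifiers from the actual pull request.
public class ClassLoaderRestore {

    public static ClassLoader runWithUserClassLoader(ClassLoader userCodeClassLoader,
                                                     Runnable configureCall) {
        ClassLoader original = Thread.currentThread().getContextClassLoader();
        try {
            // Make user-code classes visible while configure() runs, so class
            // names read from the config can be resolved.
            Thread.currentThread().setContextClassLoader(userCodeClassLoader);
            configureCall.run();
            return Thread.currentThread().getContextClassLoader();
        } finally {
            // Restore the original loader even if configure() throws.
            Thread.currentThread().setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        ClassLoader fake = new ClassLoader(before) {};
        ClassLoader inside = runWithUserClassLoader(fake, () -> {});
        if (inside != fake) throw new AssertionError("loader not swapped inside call");
        if (Thread.currentThread().getContextClassLoader() != before)
            throw new AssertionError("loader not restored");
        System.out.println("context classloader restored");
    }
}
```

The finally clause is what makes the change robust: without it, an exception thrown from `configure` would leave the thread running with the user-code classloader for the rest of its lifetime.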
[jira] [Commented] (FLINK-2387) Add test for live accumulators in Streaming
[ https://issues.apache.org/jira/browse/FLINK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646081#comment-14646081 ] ASF GitHub Bot commented on FLINK-2387: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/926#issuecomment-125963497 Looks good > Add test for live accumulators in Streaming > --- > > Key: FLINK-2387 > URL: https://issues.apache.org/jira/browse/FLINK-2387 > Project: Flink > Issue Type: Test > Components: Streaming >Reporter: Maximilian Michels > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2387] add streaming test case for live ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/926#issuecomment-125963497 Looks good --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1927][py] Operator distribution rework
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/931#discussion_r35763245 --- Diff: flink-staging/flink-language-binding/flink-python/src/main/java/org/apache/flink/languagebinding/api/java/python/streaming/PythonStreamer.java --- @@ -114,31 +97,12 @@ public void run() { Runtime.getRuntime().addShutdownHook(shutdownThread); - socket = server.accept(); - in = socket.getInputStream(); - out = socket.getOutputStream(); - - byte[] opSize = new byte[4]; - putInt(opSize, 0, operator.length); - out.write(opSize, 0, 4); - out.write(operator, 0, operator.length); - - byte[] meta = importString.toString().getBytes("utf-8"); - putInt(opSize, 0, meta.length); - out.write(opSize, 0, 4); - out.write(meta, 0, meta.length); - - byte[] input = inputFilePath.getBytes("utf-8"); - putInt(opSize, 0, input.length); - out.write(opSize, 0, 4); - out.write(input, 0, input.length); - - byte[] output = outputFilePath.getBytes("utf-8"); - putInt(opSize, 0, output.length); - out.write(opSize, 0, 4); - out.write(output, 0, output.length); - - out.flush(); + process.getOutputStream().write("operator\n".getBytes()); + process.getOutputStream().write(("" + server.getLocalPort() + "\n").getBytes()); + process.getOutputStream().write((id + "\n").getBytes()); + process.getOutputStream().write((inputFilePath + "\n").getBytes()); + process.getOutputStream().write((outputFilePath + "\n").getBytes()); + process.getOutputStream().flush(); --- End diff -- We could reuse `process.getOutputStream()` here by saving it to a variable. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
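The suggested refactor — fetching the stream once into a local variable — could look like the sketch below. A `ByteArrayOutputStream` stands in for `process.getOutputStream()`, and the method and parameter names are illustrative, not the actual PythonStreamer code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Sketch of the cosmetic refactor mxm suggests: fetch the process output
// stream once and reuse the local variable for each handshake line.
public class StreamerHandshake {

    static void writeHandshake(OutputStream processStdin, int port, int id,
                               String inputPath, String outputPath) throws IOException {
        OutputStream out = processStdin;  // fetched once, reused below
        out.write("operator\n".getBytes(StandardCharsets.UTF_8));
        out.write((port + "\n").getBytes(StandardCharsets.UTF_8));
        out.write((id + "\n").getBytes(StandardCharsets.UTF_8));
        out.write((inputPath + "\n").getBytes(StandardCharsets.UTF_8));
        out.write((outputPath + "\n").getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // A byte buffer stands in for the Python subprocess's stdin.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeHandshake(sink, 6000, 42, "/tmp/in", "/tmp/out");
        if (!sink.toString("UTF-8").equals("operator\n6000\n42\n/tmp/in\n/tmp/out\n"))
            throw new AssertionError(sink.toString("UTF-8"));
        System.out.println("handshake written through a single reused stream");
    }
}
```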
[jira] [Commented] (FLINK-1927) [Py] Rework operator distribution
[ https://issues.apache.org/jira/browse/FLINK-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646088#comment-14646088 ] ASF GitHub Bot commented on FLINK-1927: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/931#discussion_r35763245 --- Diff: flink-staging/flink-language-binding/flink-python/src/main/java/org/apache/flink/languagebinding/api/java/python/streaming/PythonStreamer.java --- @@ -114,31 +97,12 @@ public void run() { Runtime.getRuntime().addShutdownHook(shutdownThread); - socket = server.accept(); - in = socket.getInputStream(); - out = socket.getOutputStream(); - - byte[] opSize = new byte[4]; - putInt(opSize, 0, operator.length); - out.write(opSize, 0, 4); - out.write(operator, 0, operator.length); - - byte[] meta = importString.toString().getBytes("utf-8"); - putInt(opSize, 0, meta.length); - out.write(opSize, 0, 4); - out.write(meta, 0, meta.length); - - byte[] input = inputFilePath.getBytes("utf-8"); - putInt(opSize, 0, input.length); - out.write(opSize, 0, 4); - out.write(input, 0, input.length); - - byte[] output = outputFilePath.getBytes("utf-8"); - putInt(opSize, 0, output.length); - out.write(opSize, 0, 4); - out.write(output, 0, output.length); - - out.flush(); + process.getOutputStream().write("operator\n".getBytes()); + process.getOutputStream().write(("" + server.getLocalPort() + "\n").getBytes()); + process.getOutputStream().write((id + "\n").getBytes()); + process.getOutputStream().write((inputFilePath + "\n").getBytes()); + process.getOutputStream().write((outputFilePath + "\n").getBytes()); + process.getOutputStream().flush(); --- End diff -- We could reuse `process.getOutputStream()` here by saving it to a variable. 
> [Py] Rework operator distribution > - > > Key: FLINK-1927 > URL: https://issues.apache.org/jira/browse/FLINK-1927 > Project: Flink > Issue Type: Improvement > Components: Python API >Affects Versions: 0.9 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Minor > Fix For: 0.9 > > > Currently, the python operator is created when execution the python plan > file, serialized using dill and saved as a byte[] in the java function. It is > then deserialized at runtime on each node. > The current implementation is fairly hacky, and imposes certain limitations > that make it hard to work with. Chaining, or generally saving other > user-code, always requires a separate deserialization step after > deserializing the operator. > These issues can be easily circumvented by rebuilding the (python) plan on > each node, instead of serializing the operator. The plan creation is > deterministic, and every operator is uniquely identified by an ID that is > already known to the java function. > This change will allow us to easily support custom serializers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1927) [Py] Rework operator distribution
[ https://issues.apache.org/jira/browse/FLINK-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646216#comment-14646216 ] ASF GitHub Bot commented on FLINK-1927: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/931#issuecomment-125979125 Thanks for the pull request @zentol! +1 for removing the dill library. As far as I can see, we handle all the serialization ourselves now. We only used the Dill library to serialize the user-defined function alongside the operator. Now, the operator is extracted from the plan which has been distributed in the Python files to the nodes beforehand. The plan is only sent once to generate the Java execution plan. The old behavior was to serialize the operator, pass it to Java, and send it back again during execution. Performance-wise the new implementation could even be a bit faster. +1 for explicitly passing the file paths. Java and Python have different ways to determine temporary file paths and this has been a problem in the past on some platforms. Your changes look sensible and good to merge. > [Py] Rework operator distribution > - > > Key: FLINK-1927 > URL: https://issues.apache.org/jira/browse/FLINK-1927 > Project: Flink > Issue Type: Improvement > Components: Python API >Affects Versions: 0.9 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Minor > Fix For: 0.9 > > > Currently, the python operator is created when execution the python plan > file, serialized using dill and saved as a byte[] in the java function. It is > then deserialized at runtime on each node. > The current implementation is fairly hacky, and imposes certain limitations > that make it hard to work with. Chaining, or generally saving other > user-code, always requires a separate deserialization step after > deserializing the operator. 
> These issues can be easily circumvented by rebuilding the (python) plan on > each node, instead of serializing the operator. The plan creation is > deterministic, and every operator is uniquely identified by an ID that is > already known to the java function. > This change will allow us to easily support custom serializers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1927][py] Operator distribution rework
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/931#issuecomment-125979125 Thanks for the pull request @zentol! +1 for removing the dill library. As far as I can see, we handle all the serialization ourselves now. We only used the Dill library to serialize the user-defined function alongside the operator. Now, the operator is extracted from the plan which has been distributed in the Python files to the nodes beforehand. The plan is only sent once to generate the Java execution plan. The old behavior was to serialize the operator, pass it to Java, and send it back again during execution. Performance-wise the new implementation could even be a bit faster. +1 for explicitly passing the file paths. Java and Python have different ways to determine temporary file paths and this has been a problem in the past on some platforms. Your changes look sensible and good to merge. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1927][py] Operator distribution rework
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/931#issuecomment-125983186 Thanks for the review @mxm . I've addressed the cosmetic issue you mentioned, and added a small fix for a separate issue as well (error reporting was partially broken). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1927) [Py] Rework operator distribution
[ https://issues.apache.org/jira/browse/FLINK-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646239#comment-14646239 ] ASF GitHub Bot commented on FLINK-1927: --- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/931#issuecomment-125983186 Thanks for the review @mxm . I've addressed the cosmetic issue you mentioned, and added a small fix for a separate issue as well (error reporting was partially broken). > [Py] Rework operator distribution > - > > Key: FLINK-1927 > URL: https://issues.apache.org/jira/browse/FLINK-1927 > Project: Flink > Issue Type: Improvement > Components: Python API >Affects Versions: 0.9 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Minor > Fix For: 0.9 > > > Currently, the python operator is created when execution the python plan > file, serialized using dill and saved as a byte[] in the java function. It is > then deserialized at runtime on each node. > The current implementation is fairly hacky, and imposes certain limitations > that make it hard to work with. Chaining, or generally saving other > user-code, always requires a separate deserialization step after > deserializing the operator. > These issues can be easily circumvented by rebuilding the (python) plan on > each node, instead of serializing the operator. The plan creation is > deterministic, and every operator is uniquely identified by an ID that is > already known to the java function. > This change will allow us to easily support custom serializers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646251#comment-14646251 ] ASF GitHub Bot commented on FLINK-2425: --- GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/952 [FLINK-2425]Provide access to task manager configuration from RuntimeEnvironment Also fixes [FLINK-2426]: Define an UnmodifiableConfiguration class which doesn't allow modifications to the underlying configuration object. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sachingoel0101/flink flink-2426 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/952.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #952 commit a8b0debbd60dd4eeb01e007e0b736f32f3724c71 Author: Sachin Goel Date: 2015-07-29T13:21:58Z [FLINK-2425]Provide access to task manager configuration inside RuntimeEnvironment > Give access to TaskManager config and hostname in the Runtime Environment > - > > Key: FLINK-2425 > URL: https://issues.apache.org/jira/browse/FLINK-2425 > Project: Flink > Issue Type: Improvement > Components: TaskManager >Affects Versions: 0.10 >Reporter: Stephan Ewen >Assignee: Sachin Goel > Fix For: 0.10 > > > The RuntimeEnvironment (that is used by the operators to access the context) > should give access to the TaskManager's configuration, to allow to read > config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2425]Provide access to task manager con...
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/952 [FLINK-2425]Provide access to task manager configuration from RuntimeEnvironment Also fixes [FLINK-2426]: Define an UnmodifiableConfiguration class which doesn't allow modifications to the underlying configuration object. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sachingoel0101/flink flink-2426 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/952.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #952 commit a8b0debbd60dd4eeb01e007e0b736f32f3724c71 Author: Sachin Goel Date: 2015-07-29T13:21:58Z [FLINK-2425]Provide access to task manager configuration inside RuntimeEnvironment --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
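The UnmodifiableConfiguration idea from FLINK-2426 can be sketched as a read-only wrapper whose mutators throw. `Config` below is a toy stand-in for Flink's `Configuration` class, not the actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the idea behind FLINK-2426: a read-only view over a
// configuration that rejects writes. `Config` is a toy stand-in for Flink's
// Configuration class, not the actual class.
public class UnmodifiableConfigSketch {

    static class Config {
        protected final Map<String, String> entries = new HashMap<>();
        public String getString(String key, String def) {
            return entries.getOrDefault(key, def);
        }
        public void setString(String key, String value) {
            entries.put(key, value);
        }
    }

    // Reads still work against a copied snapshot; every mutator throws.
    static class UnmodifiableConfig extends Config {
        UnmodifiableConfig(Config source) {
            this.entries.putAll(source.entries);
        }
        @Override
        public void setString(String key, String value) {
            throw new UnsupportedOperationException("configuration is read-only");
        }
    }

    public static void main(String[] args) {
        Config cfg = new Config();
        cfg.setString("taskmanager.hostname", "worker-1");
        Config readOnly = new UnmodifiableConfig(cfg);
        System.out.println(readOnly.getString("taskmanager.hostname", "?"));
        try {
            readOnly.setString("taskmanager.hostname", "evil");
            throw new AssertionError("write should have failed");
        } catch (UnsupportedOperationException expected) {
            System.out.println("writes rejected");
        }
    }
}
```

Handing user code such a view gives the "readonly configuration" Sachin mentions above: config values are readable, but the TaskManager's actual configuration cannot be mutated through it.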
[GitHub] flink pull request: Cascading changes for compatibility
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/950#issuecomment-125988920

Yes, I've updated the pull request.
[GitHub] flink pull request: [FLINK-2399] Version checks for Job Manager an...
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/945#issuecomment-125991194

1. Added version checks between JobClient and JobManager.
2. Versions are now read from the Configuration class: since flink-core is built first, the version is readily available, which makes the verification exact.
3. Added a dummy version string so the tests also pass when run from an IDE.
[jira] [Commented] (FLINK-2399) Fail when actor versions don't match
[ https://issues.apache.org/jira/browse/FLINK-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646279#comment-14646279 ]

ASF GitHub Bot commented on FLINK-2399:
---------------------------------------

Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/945#issuecomment-125991194

1. Added version checks between JobClient and JobManager.
2. Versions are now read from the Configuration class: since flink-core is built first, the version is readily available, which makes the verification exact.
3. Added a dummy version string so the tests also pass when run from an IDE.

> Fail when actor versions don't match
> ------------------------------------
>
>                 Key: FLINK-2399
>                 URL: https://issues.apache.org/jira/browse/FLINK-2399
>             Project: Flink
>          Issue Type: Improvement
>          Components: JobManager, TaskManager
>    Affects Versions: 0.9, master
>            Reporter: Ufuk Celebi
>            Assignee: Sachin Goel
>            Priority: Minor
>             Fix For: 0.10
>
> Problem: there can be subtle errors when actors from different Flink versions
> communicate with each other, for example when an old client (e.g. Flink 0.9)
> communicates with a new JobManager (e.g. Flink 0.10-SNAPSHOT).
> We can check that the versions match on first communication between the
> actors and fail if they don't match.
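The check discussed above amounts to a handshake: the first message carries the sender's version string, and the receiver fails fast on a mismatch. A minimal sketch of the idea, with hypothetical names and version strings rather than Flink's actual actor code:

```java
// Illustrative version handshake: the JobManager side rejects a client
// whose version string does not match its own.
public class VersionCheckDemo {
    // Hypothetical stand-in for the version read from the build configuration.
    static final String JOB_MANAGER_VERSION = "0.10-SNAPSHOT";

    // Called on first communication; throws on mismatch so the
    // connection fails immediately instead of with subtle errors later.
    static void checkVersion(String clientVersion) {
        if (!JOB_MANAGER_VERSION.equals(clientVersion)) {
            throw new IllegalStateException(
                "Version mismatch: JobManager is " + JOB_MANAGER_VERSION
                + " but client is " + clientVersion);
        }
    }

    public static void main(String[] args) {
        checkVersion("0.10-SNAPSHOT");      // matching versions pass silently
        System.out.println("versions match");

        try {
            checkVersion("0.9");            // an old client is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```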
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user gyfora commented on a diff in the pull request: https://github.com/apache/flink/pull/951#discussion_r35778711

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java ---

    @@ -269,10 +270,19 @@ private void setVertexConfig(Integer vertexID, StreamConfig config,
     	config.setNumberOfOutputs(nonChainableOutputs.size());
     	config.setNonChainedOutputs(nonChainableOutputs);
     	config.setChainedOutputs(chainableOutputs);
    -	config.setStateMonitoring(streamGraph.isCheckpointingEnabled());
    -	config.setStateHandleProvider(streamGraph.getStateHandleProvider());
    +
    +	config.setCheckpointingEnabled(streamGraph.isCheckpointingEnabled());
    +	if (streamGraph.isCheckpointingEnabled()) {
    +		config.setCheckpointMode(streamGraph.getCheckpointingMode());
    +		config.setStateHandleProvider(streamGraph.getStateHandleProvider());
    +	} else {
    +		// the at least once input handler is slightly cheaper (in the absence of checkpoints),
    +		// so we use that one if checkpointing is not enabled
    +		config.setCheckpointMode(CheckpointingMode.AT_LEAST_ONCE);

--- End diff --

So we set an at_least_once handler even though we won't receive any barriers? Why not add an OFF mode which will use a "no-op barrier handler", or skip the barrier handler altogether?
[jira] [Commented] (FLINK-2407) Add an API switch to select between "exactly once" and "at least once" fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646336#comment-14646336 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user gyfora commented on a diff in the pull request: https://github.com/apache/flink/pull/951#discussion_r35778711

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java ---

    @@ -269,10 +270,19 @@ private void setVertexConfig(Integer vertexID, StreamConfig config,
     	config.setNumberOfOutputs(nonChainableOutputs.size());
     	config.setNonChainedOutputs(nonChainableOutputs);
     	config.setChainedOutputs(chainableOutputs);
    -	config.setStateMonitoring(streamGraph.isCheckpointingEnabled());
    -	config.setStateHandleProvider(streamGraph.getStateHandleProvider());
    +
    +	config.setCheckpointingEnabled(streamGraph.isCheckpointingEnabled());
    +	if (streamGraph.isCheckpointingEnabled()) {
    +		config.setCheckpointMode(streamGraph.getCheckpointingMode());
    +		config.setStateHandleProvider(streamGraph.getStateHandleProvider());
    +	} else {
    +		// the at least once input handler is slightly cheaper (in the absence of checkpoints),
    +		// so we use that one if checkpointing is not enabled
    +		config.setCheckpointMode(CheckpointingMode.AT_LEAST_ONCE);

--- End diff --

So we set an at_least_once handler even though we won't receive any barriers? Why not add an OFF mode which will use a "no-op barrier handler", or skip the barrier handler altogether?

> Add an API switch to select between "exactly once" and "at least once" fault
> tolerance
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-2407
>                 URL: https://issues.apache.org/jira/browse/FLINK-2407
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Streaming
>    Affects Versions: 0.10
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 0.10
>
> Based on the addition of the BarrierTracker, we can add a switch to choose
> between the two modes "exactly once" and "at least once".
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/951#issuecomment-126005919

Aside from my minor comment, this looks very good :+1:
[jira] [Commented] (FLINK-2407) Add an API switch to select between "exactly once" and "at least once" fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646341#comment-14646341 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/951#issuecomment-126005919

Aside from my minor comment, this looks very good :+1:

> Add an API switch to select between "exactly once" and "at least once" fault
> tolerance
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-2407
>                 URL: https://issues.apache.org/jira/browse/FLINK-2407
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Streaming
>    Affects Versions: 0.10
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 0.10
>
> Based on the addition of the BarrierTracker, we can add a switch to choose
> between the two modes "exactly once" and "at least once".
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/951#issuecomment-126007290

Yeah, I thought about an "off handler". Turns out that the BarrierTracker is almost like an "off handler" when no barriers arrive.
[jira] [Commented] (FLINK-2407) Add an API switch to select between "exactly once" and "at least once" fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646350#comment-14646350 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/951#issuecomment-126007290

Yeah, I thought about an "off handler". Turns out that the BarrierTracker is almost like an "off handler" when no barriers arrive.

> Add an API switch to select between "exactly once" and "at least once" fault
> tolerance
> ----------------------------------------------------------------------------
>
>                 Key: FLINK-2407
>                 URL: https://issues.apache.org/jira/browse/FLINK-2407
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Streaming
>    Affects Versions: 0.10
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 0.10
>
> Based on the addition of the BarrierTracker, we can add a switch to choose
> between the two modes "exactly once" and "at least once".
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/951#issuecomment-126008434

Yes, I double checked and you are right. This is practically as lightweight as it gets. :) +1 to merge
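To see why a tracker-style handler is "practically as lightweight as it gets" when checkpointing is off, consider that with no barriers in the stream, every element takes the pass-through path: records are forwarded immediately and nothing is ever buffered (unlike the exactly-once aligner, which blocks channels while waiting for barriers). The following is an illustrative sketch only; the types and the real BarrierTracker logic in Flink differ:

```java
import java.util.ArrayList;
import java.util.List;

public class BarrierTrackerDemo {
    interface StreamElement {}

    static class Record implements StreamElement {
        final String value;
        Record(String value) { this.value = value; }
    }

    static class Barrier implements StreamElement {
        final long checkpointId;
        Barrier(long checkpointId) { this.checkpointId = checkpointId; }
    }

    // Tracker-style handling: records pass straight through; barriers are
    // only counted, never buffered. With checkpointing disabled, no barriers
    // arrive, so only the cheap record branch is ever taken.
    static List<String> process(List<StreamElement> input) {
        List<String> forwarded = new ArrayList<>();
        for (StreamElement e : input) {
            if (e instanceof Record) {
                forwarded.add(((Record) e).value);  // fast path, no blocking
            }
            // Barrier case: just note the checkpoint id; omitted here since
            // this sketch models the checkpointing-off scenario.
        }
        return forwarded;
    }

    public static void main(String[] args) {
        List<StreamElement> input = new ArrayList<>();
        input.add(new Record("a"));
        input.add(new Record("b"));
        System.out.println(process(input));  // every record is forwarded
    }
}
```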