[GitHub] flink pull request: [FLINK-2425]Provide access to task manager con...
GitHub user sachingoel0101 opened a pull request:

    https://github.com/apache/flink/pull/952

    [FLINK-2425] Provide access to task manager configuration from RuntimeEnvironment

Also fixes [FLINK-2426]: Define an UnmodifiableConfiguration class which doesn't allow modifications to the underlying configuration object.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sachingoel0101/flink flink-2426

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/952.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #952

commit a8b0debbd60dd4eeb01e007e0b736f32f3724c71
Author: Sachin Goel <sachingoel0...@gmail.com>
Date:   2015-07-29T13:21:58Z

    [FLINK-2425] Provide access to task manager configuration inside RuntimeEnvironment

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
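For readers unfamiliar with FLINK-2426, the idea of an unmodifiable configuration can be sketched roughly as below. This is a hypothetical illustration, not Flink's actual class: a plain `Map` stands in for Flink's `Configuration`, and all names here (`UnmodifiableConfigurationSketch`, `getString`, `setString`) are stand-ins for illustration only.

```java
// Hypothetical sketch of an unmodifiable configuration wrapper (FLINK-2426).
// A Map stands in for Flink's Configuration; reads pass through, writes fail.
import java.util.HashMap;
import java.util.Map;

class UnmodifiableConfigurationSketch {

    /** Read-only view over a backing key/value configuration. */
    static final class UnmodifiableConfiguration {
        private final Map<String, String> backing;

        UnmodifiableConfiguration(Map<String, String> backing) {
            this.backing = backing;
        }

        String getString(String key, String defaultValue) {
            return backing.getOrDefault(key, defaultValue);
        }

        // Every mutator throws instead of touching the underlying object.
        void setString(String key, String value) {
            throw new UnsupportedOperationException("The configuration is unmodifiable.");
        }
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("taskmanager.heap.mb", "512");
        UnmodifiableConfiguration conf = new UnmodifiableConfiguration(raw);
        System.out.println(conf.getString("taskmanager.heap.mb", "0")); // prints 512
        try {
            conf.setString("taskmanager.heap.mb", "1024");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected"); // writes are refused
        }
    }
}
```

The benefit of handing such a wrapper to operators is that the TaskManager's configuration can be shared without defensive copying.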
[jira] [Resolved] (FLINK-2424) InstantiationUtil.serializeObject(Object) does not close output stream
[ https://issues.apache.org/jira/browse/FLINK-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi resolved FLINK-2424.
--------------------------------
    Resolution: Fixed

Fixed via 7bd57d7 (0.10), c756fe8 (0.9.1).

> InstantiationUtil.serializeObject(Object) does not close output stream
> ----------------------------------------------------------------------
> Key: FLINK-2424
> URL: https://issues.apache.org/jira/browse/FLINK-2424
> Project: Flink
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.9, master
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
> Fix For: 0.10, 0.9.1
>
> The utility method InstantiationUtil.serializeObject(Object) does not close the created output stream.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
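The kind of fix FLINK-2424 describes can be sketched with try-with-resources, which guarantees the stream is closed even if serialization throws. This is a stand-in for `InstantiationUtil.serializeObject(Object)`, not the actual commit; the class name `SerializeUtil` is hypothetical.

```java
// Sketch of closing the stream created during object serialization (FLINK-2424).
// try-with-resources closes (and thereby flushes) the ObjectOutputStream on
// every exit path, including exceptions thrown by writeObject.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

class SerializeUtil {

    public static byte[] serializeObject(Object o) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(o);
        } // oos.close() happens here, flushing all buffered bytes into baos
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serializeObject("hello");
        System.out.println(bytes.length > 0); // prints true
    }
}
```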
[jira] [Commented] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646383#comment-14646383 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126014037

    Manually merged in b211a62111aa3c558586874d0ec5b168e6bb31f1

> Add an API switch to select between exactly once and at least once fault tolerance
> ----------------------------------------------------------------------------------
> Key: FLINK-2407
> URL: https://issues.apache.org/jira/browse/FLINK-2407
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Stephan Ewen
> Fix For: 0.10
>
> Based on the addition of the BarrierTracker, we can add a switch to choose between the two modes exactly once and at least once.
[jira] [Closed] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen closed FLINK-2407.
-------------------------------
[jira] [Resolved] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen resolved FLINK-2407.
---------------------------------
    Resolution: Fixed

Fixed in b211a62111aa3c558586874d0ec5b168e6bb31f1
[jira] [Commented] (FLINK-2399) Fail when actor versions don't match
[ https://issues.apache.org/jira/browse/FLINK-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646279#comment-14646279 ]

ASF GitHub Bot commented on FLINK-2399:
---------------------------------------

Github user sachingoel0101 commented on the pull request:

    https://github.com/apache/flink/pull/945#issuecomment-125991194

    1. Added version checks between JobClient and JobManager.
    2. Versions are accessed from the Configuration class now, since flink-core gets built first and the version is readily available, leading to exact verification.
    3. Added a dummy version string to allow passing of tests in the IDE.

> Fail when actor versions don't match
> ------------------------------------
> Key: FLINK-2399
> URL: https://issues.apache.org/jira/browse/FLINK-2399
> Project: Flink
> Issue Type: Improvement
> Components: JobManager, TaskManager
> Affects Versions: 0.9, master
> Reporter: Ufuk Celebi
> Assignee: Sachin Goel
> Priority: Minor
> Fix For: 0.10
>
> Problem: there can be subtle errors when actors from different Flink versions communicate with each other, for example when an old client (e.g. Flink 0.9) communicates with a new JobManager (e.g. Flink 0.10-SNAPSHOT). We can check that the versions match on first communication between the actors and fail if they don't match.
[jira] [Commented] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646350#comment-14646350 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126007290

    Yeah, I thought about an off handler. Turns out that the BarrierTracker is almost like an off handler when no barriers arrive.
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126007290

    Yeah, I thought about an off handler. Turns out that the BarrierTracker is almost like an off handler when no barriers arrive.
[jira] [Commented] (FLINK-2324) Rework partitioned state storage
[ https://issues.apache.org/jira/browse/FLINK-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646391#comment-14646391 ]

ASF GitHub Bot commented on FLINK-2324:
---------------------------------------

Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/937#issuecomment-126015036

    Any concerns against merging this? @senorcarbone?

> Rework partitioned state storage
> --------------------------------
> Key: FLINK-2324
> URL: https://issues.apache.org/jira/browse/FLINK-2324
> Project: Flink
> Issue Type: Improvement
> Reporter: Gyula Fora
> Assignee: Gyula Fora
>
> Partitioned states are currently stored per-key in statehandles. This is alright for in-memory storage but is very inefficient for HDFS. The logic behind the current mechanism is that this approach provides a way to repartition a state without fetching the data from the external storage and only manipulating handles. We should come up with a solution that can achieve both.
[jira] [Deleted] (FLINK-2430) Potential race condition when restart all is called for a Twill runnable
[ https://issues.apache.org/jira/browse/FLINK-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Saputra deleted FLINK-2430:
---------------------------------

> Potential race condition when restart all is called for a Twill runnable
> ------------------------------------------------------------------------
> Key: FLINK-2430
> URL: https://issues.apache.org/jira/browse/FLINK-2430
> Project: Flink
> Issue Type: Bug
> Reporter: Henry Saputra
>
> When sending restart instance to all for a particular TwillRunnable, it could hit a race condition where the heartbeat thread runs right after all containers have been released, which trips this check:
>
>     // Looks for containers requests.
>     if (provisioning.isEmpty() && runnableContainerRequests.isEmpty() && runningContainers.isEmpty()) {
>         LOG.info("All containers completed. Shutting down application master.");
>         break;
>     }
>
> This could happen when all running containers are empty and the new runnableContainerRequests has not been added yet.
[GitHub] flink pull request: [FLINK-2324] [streaming] Partitioned state che...
Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/937#issuecomment-126015036

    Any concerns against merging this? @senorcarbone?
[GitHub] flink pull request: FLINK-2322 Unclosed stream may leak resource
Github user tedyu commented on the pull request:

    https://github.com/apache/flink/pull/928#issuecomment-126021923

    Anything I can do to move this forward?
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126005919

    Aside from my minor comment, this looks very good :+1:
[jira] [Commented] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646341#comment-14646341 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126005919

    Aside from my minor comment, this looks very good :+1:
[jira] [Closed] (FLINK-2430) Potential race condition when restart all is called for a Twill runnable
[ https://issues.apache.org/jira/browse/FLINK-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Saputra closed FLINK-2430.
--------------------------------
    Resolution: Invalid

Pop sorry wrong project =(
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126014037

    Manually merged in b211a62111aa3c558586874d0ec5b168e6bb31f1
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646251#comment-14646251 ]

ASF GitHub Bot commented on FLINK-2425:
---------------------------------------

GitHub user sachingoel0101 opened a pull request:

    https://github.com/apache/flink/pull/952

    [FLINK-2425] Provide access to task manager configuration from RuntimeEnvironment

> Give access to TaskManager config and hostname in the Runtime Environment
> -------------------------------------------------------------------------
> Key: FLINK-2425
> URL: https://issues.apache.org/jira/browse/FLINK-2425
> Project: Flink
> Issue Type: Improvement
> Components: TaskManager
> Affects Versions: 0.10
> Reporter: Stephan Ewen
> Assignee: Sachin Goel
> Fix For: 0.10
>
> The RuntimeEnvironment (that is used by the operators to access the context) should give access to the TaskManager's configuration, to allow to read config values.
[jira] [Created] (FLINK-2430) Potential race condition when restart all is called for a Twill runnable
Henry Saputra created FLINK-2430:
---------------------------------

    Summary: Potential race condition when restart all is called for a Twill runnable
    Key: FLINK-2430
    URL: https://issues.apache.org/jira/browse/FLINK-2430
    Project: Flink
    Issue Type: Bug
    Affects Versions: 0.6-incubating
    Reporter: Henry Saputra
[GitHub] flink pull request: Cascading changes for compatibility
Github user mxm commented on the pull request:

    https://github.com/apache/flink/pull/950#issuecomment-125988920

    Yes, I've updated the pull request.
[jira] [Commented] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646336#comment-14646336 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user gyfora commented on a diff in the pull request:

    https://github.com/apache/flink/pull/951#discussion_r35778711

    --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java ---
    @@ -269,10 +270,19 @@ private void setVertexConfig(Integer vertexID, StreamConfig config,
         config.setNumberOfOutputs(nonChainableOutputs.size());
         config.setNonChainedOutputs(nonChainableOutputs);
         config.setChainedOutputs(chainableOutputs);
    -    config.setStateMonitoring(streamGraph.isCheckpointingEnabled());
    -    config.setStateHandleProvider(streamGraph.getStateHandleProvider());
    +
    +    config.setCheckpointingEnabled(streamGraph.isCheckpointingEnabled());
    +    if (streamGraph.isCheckpointingEnabled()) {
    +        config.setCheckpointMode(streamGraph.getCheckpointingMode());
    +        config.setStateHandleProvider(streamGraph.getStateHandleProvider());
    +    } else {
    +        // the at least once input handler is slightly cheaper (in the absence of checkpoints),
    +        // so we use that one if checkpointing is not enabled
    +        config.setCheckpointMode(CheckpointingMode.AT_LEAST_ONCE);
    --- End diff --

    So we set an at_least_once handler even though we won't receive any barriers? Why not add an OFF mode which will use a no-op barrier handler or skip the barrier handler altogether?
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user gyfora commented on a diff in the pull request:

    https://github.com/apache/flink/pull/951#discussion_r35778711

    So we set an at_least_once handler even though we won't receive any barriers? Why not add an OFF mode which will use a no-op barrier handler or skip the barrier handler altogether?
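The point made in this thread, that an at-least-once handler with no barriers degenerates into a pure pass-through, can be illustrated with a small sketch. The types and method below (`Element`, `Record`, `Barrier`, `atLeastOnceHandle`) are hypothetical stand-ins, not Flink's actual barrier-handler API.

```java
// Illustrative sketch: an at-least-once handler forwards every record
// immediately and never buffers waiting for barrier alignment. When
// checkpointing is disabled and no barriers arrive, the barrier branch is
// dead code, so the handler behaves exactly like a no-op pass-through.
import java.util.ArrayList;
import java.util.List;

class BarrierHandlerSketch {

    interface Element { }

    static final class Record implements Element {
        final String payload;
        Record(String payload) { this.payload = payload; }
    }

    static final class Barrier implements Element { }

    /** Forwards records in order; barriers are only observed, never blocked on. */
    static List<String> atLeastOnceHandle(List<Element> input) {
        List<String> forwarded = new ArrayList<>();
        for (Element e : input) {
            if (e instanceof Record) {
                forwarded.add(((Record) e).payload); // no blocking, no buffering
            }
            // A Barrier would only be counted here; with checkpointing off,
            // this case simply never occurs.
        }
        return forwarded;
    }

    public static void main(String[] args) {
        List<Element> stream = new ArrayList<>();
        stream.add(new Record("a"));
        stream.add(new Record("b"));
        // With no barriers in the stream, the handler is a pure pass-through.
        System.out.println(atLeastOnceHandle(stream)); // prints [a, b]
    }
}
```

This is why reusing the at-least-once mode when checkpointing is disabled is practically as cheap as a dedicated OFF mode would be.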
[jira] [Commented] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646384#comment-14646384 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user StephanEwen closed the pull request at:

    https://github.com/apache/flink/pull/951
[GitHub] flink pull request: [FLINK-2407] [streaming] Add an API switch to ...
Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126008434

    Yes, I double checked and you are right. This is practically as lightweight as it gets. :) +1 to merge
[jira] [Commented] (FLINK-2407) Add an API switch to select between exactly once and at least once fault tolerance
[ https://issues.apache.org/jira/browse/FLINK-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646357#comment-14646357 ]

ASF GitHub Bot commented on FLINK-2407:
---------------------------------------

Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/951#issuecomment-126008434

    Yes, I double checked and you are right. This is practically as lightweight as it gets. :) +1 to merge
[jira] [Commented] (FLINK-2322) Unclosed stream may leak resource
[ https://issues.apache.org/jira/browse/FLINK-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646439#comment-14646439 ]

ASF GitHub Bot commented on FLINK-2322:
---------------------------------------

Github user tedyu commented on the pull request:

    https://github.com/apache/flink/pull/928#issuecomment-126021923

    Anything I can do to move this forward?

> Unclosed stream may leak resource
> ---------------------------------
> Key: FLINK-2322
> URL: https://issues.apache.org/jira/browse/FLINK-2322
> Project: Flink
> Issue Type: Bug
> Reporter: Ted Yu
> Labels: starter
>
> In UdfAnalyzerUtils.java:
> {code}
> ClassReader cr = new ClassReader(Thread.currentThread().getContextClassLoader()
>     .getResourceAsStream(internalClassName.replace('.', '/') + ".class"));
> {code}
> The stream returned by getResourceAsStream() should be closed upon exit of findMethodNode().
>
> In ParameterTool#fromPropertiesFile():
> {code}
> props.load(new FileInputStream(propertiesFile));
> {code}
> The FileInputStream should be closed before returning.
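Both leaks described in FLINK-2322 have the same shape of fix: wrap the stream in try-with-resources so it is closed on every exit path. The sketch below mirrors the report's method names but is a self-contained stand-in, not Flink's actual code; the class name `StreamCloseFix` is hypothetical.

```java
// Sketch of the fixes FLINK-2322 asks for: streams opened inside a method
// are wrapped in try-with-resources so they are closed even on exceptions.
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Properties;

class StreamCloseFix {

    // Mirrors ParameterTool#fromPropertiesFile: close the FileInputStream
    // before returning instead of leaking it.
    public static Properties fromPropertiesFile(String propertiesFile) throws IOException {
        Properties props = new Properties();
        try (FileInputStream fis = new FileInputStream(propertiesFile)) {
            props.load(fis);
        } // fis.close() runs here on both the normal and exceptional path
        return props;
    }

    // Mirrors the UdfAnalyzerUtils case: close the class-file resource stream
    // after reading rather than leaving it open.
    public static byte[] readClassBytes(String internalClassName) throws IOException {
        String resource = internalClassName.replace('.', '/') + ".class";
        try (InputStream in = Thread.currentThread()
                .getContextClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Class resource not found: " + resource);
            }
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("flink-2322-demo", ".properties");
        Files.write(f, Collections.singletonList("example=true"));
        Properties props = fromPropertiesFile(f.toString());
        System.out.println(props.getProperty("example")); // prints true
    }
}
```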
[GitHub] flink pull request: [FLINK-2419] Add test for sinks after keyBy an...
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/947
[jira] [Commented] (FLINK-2324) Rework partitioned state storage
[ https://issues.apache.org/jira/browse/FLINK-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646641#comment-14646641 ]

ASF GitHub Bot commented on FLINK-2324:
---------------------------------------

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/937#issuecomment-126070524

    I was too busy but I would like to take a quick look. I will look at it tomorrow morning.
[GitHub] flink pull request: [FLINK-2324] [streaming] Partitioned state che...
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/937#issuecomment-126070524

    I was too busy but I would like to take a quick look. I will look at it tomorrow morning.
[GitHub] flink pull request: [FLINK-2205] Fix confusing entries in JM UI Jo...
Github user ebautistabar commented on the pull request:

    https://github.com/apache/flink/pull/927#issuecomment-126099697

    As the new UI didn't show job configurations yet, I have created another PR to deal with it, which also includes the display changes requested in FLINK-2205.
[jira] [Commented] (FLINK-2205) Confusing entries in JM Webfrontend Job Configuration section
[ https://issues.apache.org/jira/browse/FLINK-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646789#comment-14646789 ]

ASF GitHub Bot commented on FLINK-2205:
---------------------------------------

Github user ebautistabar commented on the pull request:

    https://github.com/apache/flink/pull/927#issuecomment-126099697

    As the new UI didn't show job configurations yet, I have created another PR to deal with it, which also includes the display changes requested in FLINK-2205.

> Confusing entries in JM Webfrontend Job Configuration section
> -------------------------------------------------------------
> Key: FLINK-2205
> URL: https://issues.apache.org/jira/browse/FLINK-2205
> Project: Flink
> Issue Type: Bug
> Components: Webfrontend
> Affects Versions: 0.9
> Reporter: Fabian Hueske
> Priority: Minor
> Labels: starter
>
> The Job Configuration section of the job history / analyze page of the JobManager webinterface contains two confusing entries:
> - {{Number of execution retries}} is actually the maximum number of retries and should be renamed accordingly. The default value is -1 and should be changed to deactivated (or 0).
> - {{Job parallelism}} which is -1 by default. A parallelism of -1 is not very meaningful. It would be better to show something like auto.
[jira] [Created] (FLINK-2431) [py] refactor PlanBinder/OperationInfo
Chesnay Schepler created FLINK-2431:
------------------------------------

    Summary: [py] refactor PlanBinder/OperationInfo
    Key: FLINK-2431
    URL: https://issues.apache.org/jira/browse/FLINK-2431
    Project: Flink
    Issue Type: Improvement
    Components: Python API
    Reporter: Chesnay Schepler
    Assignee: Chesnay Schepler
    Priority: Minor

These two classes deserve a restructuring to become more readable and consistent with PythonPlanBinder/PythonOperationInfo.
[GitHub] flink pull request: [FLINK-2357] [web dashboard] Shows job configu...
GitHub user ebautistabar opened a pull request:

    https://github.com/apache/flink/pull/953

    [FLINK-2357] [web dashboard] Shows job configuration in new dashboard

I wanted to change the configuration display, as in #927, but it wasn't implemented in the new UI. So I decided to create this PR to take care of it.

- Added handler for new endpoint `/jobs/:jobid/config`
- The endpoint is hit when selecting a job in the UI
- Example of response:

```json
{
  "execution-config": {
    "execution-mode": "PIPELINED",
    "job-parallelism": -1,
    "max-execution-retries": -1,
    "object-reuse-mode": false,
    "user-config": {
      "example": true
    }
  },
  "jid": "dedc15369efa94eb1e3ad6482f1344b6",
  "name": "WordCount Example"
}
```

- The info is stored in the existing `$rootScope.job` object
- Added new tab in job display to show the config

Please, let me know if there is anything to change, improve, etc.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ebautistabar/flink add-job-config-new-ui

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/953.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #953

commit b51444262bdb8aba4b1f6a755a36f029af066a79
Author: Enrique Bautista <ebautista...@gmail.com>
Date:   2015-07-29T20:49:13Z

    [FLINK-2357] [web dashboard] Shows job configuration in new dashboard
[jira] [Commented] (FLINK-2357) New JobManager Runtime Web Frontend
[ https://issues.apache.org/jira/browse/FLINK-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646778#comment-14646778 ] ASF GitHub Bot commented on FLINK-2357: --- GitHub user ebautistabar opened a pull request: https://github.com/apache/flink/pull/953 [FLINK-2357] [web dashboard] Shows job configuration in new dashboard I wanted to change the configuration display, as in #927, but it wasn't implemented in the new UI. So I decided to create this PR to take care of it. - Added handler for new endpoint `/jobs/:jobid/config` - The endpoint is hit when selecting a job in the UI - Example of response: ```json { execution-config: { execution-mode: PIPELINED, job-parallelism: -1, max-execution-retries: -1, object-reuse-mode: false, user-config: { example: true } }, jid: dedc15369efa94eb1e3ad6482f1344b6, name: WordCount Example } ``` - The info is stored in the existing `$rootScope.job` object - Added new tab in job display to show the config Please, let me know if there is anything to change, improve, etc. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ebautistabar/flink add-job-config-new-ui Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/953.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #953 commit b51444262bdb8aba4b1f6a755a36f029af066a79 Author: Enrique Bautista ebautista...@gmail.com Date: 2015-07-29T20:49:13Z [FLINK-2357] [web dashboard] Shows job configuration in new dashboard New JobManager Runtime Web Frontend --- Key: FLINK-2357 URL: https://issues.apache.org/jira/browse/FLINK-2357 Project: Flink Issue Type: New Feature Components: Webfrontend Affects Versions: 0.10 Reporter: Stephan Ewen Attachments: Webfrontend Mockup.pdf We need to improve rework the Job Manager Web Frontend. 
The current web frontend is limited and has a lot of design issues - It does not display any progress while operators are running. This is especially problematic for streaming jobs - It has no graph representation of the data flows - It does not allow looking into execution attempts - It has no hook to deal with the upcoming live accumulators - The architecture is not very modular/extensible I propose to add a new JobManager web frontend: - Based on Netty HTTP (very lightweight) - Using REST-style URLs for jobs and vertices - Integrating the D3 graph renderer of the previews with the runtime monitor - With details on execution attempts - First-class visualization of records processed and bytes processed -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2419) DataStream sinks lose key information
[ https://issues.apache.org/jira/browse/FLINK-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646592#comment-14646592 ] ASF GitHub Bot commented on FLINK-2419: --- Github user senorcarbone commented on the pull request: https://github.com/apache/flink/pull/947#issuecomment-126054868 :+1: from me. Looks like it does the job. DataStream sinks lose key information - Key: FLINK-2419 URL: https://issues.apache.org/jira/browse/FLINK-2419 Project: Flink Issue Type: Bug Components: Streaming Reporter: Gyula Fora Assignee: Gyula Fora Priority: Blocker Fix For: 0.10 If the user applies an addSink method after keyBy, the sink will not have the information of the key as the transform method is bypassed. This makes it impossible to use partitioned states with sinks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Framesize fix
Github user kl0u commented on the pull request: https://github.com/apache/flink/pull/934#issuecomment-126076384 Hi @mxm , Thanks a lot for the comments! I integrated most of them. Please have a look and let me know what you think. For the merging of the different types of snapshots and handling them uniformly, I do not have a current solution. If you have one, I am of course open to discussing it, because I agree that this would be nice. For the comment on getAccumulatorResultsStringified(): 1) this is to be presented by the web interface to the user, just for monitoring purposes 2) this is called at the JobManager. The problem is that the JobManager has only the blobKeys that point to the stored accumulators. The serialized data reside in the blobCache and have to be fetched in order to be inspected. Currently the JobManager just forwards the blobKeys to the client, which fetches the blobs and does the deserialization and the final merging. This is done for JobManager scalability reasons: given that we are talking about accumulators of arbitrary size, loading them from disk and deserializing them would be time- and resource-consuming. The same holds if we wanted to get the type of these large accumulators (it is needed by the method): we would have to load and deserialize them at the JobManager. The currently implemented solution is just the result of this design decision. If you have another strategy or solution that is worth implementing, let me know. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
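The division of labor described in this comment (the JobManager only forwards blobKeys; the client fetches the serialized accumulator blobs and does the deserialization and final merging itself) can be sketched roughly as follows. All class and method names here are illustrative stand-ins, not Flink's actual API:

```java
import java.io.*;
import java.util.*;

// Illustrative sketch of the client-side merge pattern: the JobManager never
// deserializes large accumulators; the client does it after fetching the blobs.
public class ClientSideMerge {

    // Stand-in for a serialized accumulator blob as it would sit in the blobCache.
    static byte[] serializeCount(long count) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(count);
        }
        return bos.toByteArray();
    }

    // The client deserializes each fetched blob and merges the partial results.
    static long mergeCounts(List<byte[]> blobs) throws IOException, ClassNotFoundException {
        long merged = 0;
        for (byte[] blob : blobs) {
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(blob))) {
                merged += (Long) ois.readObject();
            }
        }
        return merged;
    }

    public static void main(String[] args) throws Exception {
        // Two partial accumulator results, as if fetched via two blobKeys.
        List<byte[]> blobs = Arrays.asList(serializeCount(10), serializeCount(32));
        System.out.println(mergeCounts(blobs)); // prints 42
    }
}
```

The point of the design is visible in the sketch: the merge loop, the only part that scales with accumulator size, runs on the client rather than on the shared JobManager.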
[GitHub] flink pull request: [FLINK-2419] Add test for sinks after keyBy an...
Github user senorcarbone commented on the pull request: https://github.com/apache/flink/pull/947#issuecomment-126054868 :+1: from me. Looks like it does the job. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2324] [streaming] Partitioned state che...
Github user senorcarbone commented on the pull request: https://github.com/apache/flink/pull/937#issuecomment-126067262 That would be good ^^ then it's :+1: from me, at least for now. It's generally good performance-wise to have less serialised states. This means that we will have a constant number of issued writes to external storage (== #subtasks). On the other hand this also makes our life harder a bit when it comes to repartitioning, as you already mentioned we need to revisit this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2324) Rework partitioned state storage
[ https://issues.apache.org/jira/browse/FLINK-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646634#comment-14646634 ] ASF GitHub Bot commented on FLINK-2324: --- Github user senorcarbone commented on the pull request: https://github.com/apache/flink/pull/937#issuecomment-126067262 That would be good ^^ then it's :+1: from me, at least for now. It's generally good performance-wise to have less serialised states. This means that we will have a constant number of issued writes to external storage (== #subtasks). On the other hand this also makes our life harder a bit when it comes to repartitioning, as you already mentioned we need to revisit this. Rework partitioned state storage Key: FLINK-2324 URL: https://issues.apache.org/jira/browse/FLINK-2324 Project: Flink Issue Type: Improvement Reporter: Gyula Fora Assignee: Gyula Fora Partitioned states are currently stored per-key in statehandles. This is alright for in-memory storage but is very inefficient for HDFS. The logic behind the current mechanism is that this approach provides a way to repartition a state without fetching the data from the external storage and only manipulating handles. We should come up with a solution that can achieve both. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
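The "constant number of issued writes to external storage (== #subtasks)" idea discussed in FLINK-2324 can be sketched with plain Java: instead of one external write per key, each subtask serializes its entire key-to-state map into a single blob. Names below are hypothetical, not Flink's state classes:

```java
import java.io.*;
import java.util.*;

// Sketch of per-subtask state snapshotting: all partitioned states of one
// subtask go into a single serialized blob, so the number of external-storage
// writes equals the number of subtasks, not the number of keys.
public class PerSubtaskSnapshot {

    static byte[] snapshot(Map<String, Long> partitionedState) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new HashMap<>(partitionedState)); // one write per subtask
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static Map<String, Long> restore(byte[] blob) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (Map<String, Long>) ois.readObject();
        }
    }
}
```

The trade-off the comment mentions is also visible here: repartitioning the state across a different number of subtasks now requires reading the blob back and splitting the map, instead of just reshuffling per-key handles.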
[jira] [Closed] (FLINK-2419) DataStream sinks lose key information
[ https://issues.apache.org/jira/browse/FLINK-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora closed FLINK-2419. - Resolution: Fixed DataStream sinks lose key information - Key: FLINK-2419 URL: https://issues.apache.org/jira/browse/FLINK-2419 Project: Flink Issue Type: Bug Components: Streaming Reporter: Gyula Fora Assignee: Gyula Fora Priority: Blocker Fix For: 0.10 If the user applies an addSink method after keyBy, the sink will not have the information of the key as the transform method is bypassed. This makes it impossible to use partitioned states with sinks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2324) Rework partitioned state storage
[ https://issues.apache.org/jira/browse/FLINK-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646619#comment-14646619 ] ASF GitHub Bot commented on FLINK-2324: --- Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/937#issuecomment-126062430 I will actually add one more test to verify the correct behaviour of all the wrapper statehandle classes. Rework partitioned state storage Key: FLINK-2324 URL: https://issues.apache.org/jira/browse/FLINK-2324 Project: Flink Issue Type: Improvement Reporter: Gyula Fora Assignee: Gyula Fora Partitioned states are currently stored per-key in statehandles. This is alright for in-memory storage but is very inefficient for HDFS. The logic behind the current mechanism is that this approach provides a way to repartition a state without fetching the data from the external storage and only manipulating handles. We should come up with a solution that can achieve both. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2324] [streaming] Partitioned state che...
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/937#issuecomment-126062430 I will actually add one more test to verify the correct behaviour of all the wrapper statehandle classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (FLINK-2433) Jekyll on windows for building local documentation
Sachin Goel created FLINK-2433: -- Summary: Jekyll on windows for building local documentation Key: FLINK-2433 URL: https://issues.apache.org/jira/browse/FLINK-2433 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Sachin Goel We should add a script to allow building documentation on a Windows machine similar to the docs/build_docs.sh script. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1919) Add HCatOutputFormat for Tuple data types
[ https://issues.apache.org/jira/browse/FLINK-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647061#comment-14647061 ] James Cao commented on FLINK-1919: -- Hello, is anyone working on this issue now? If not, I would like to contribute to it. Thanks :) Add HCatOutputFormat for Tuple data types - Key: FLINK-1919 URL: https://issues.apache.org/jira/browse/FLINK-1919 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Fabian Hueske Priority: Minor Labels: starter It would be good to have an OutputFormat that can write data to HCatalog tables. The Hadoop `HCatOutputFormat` expects `HCatRecord` objects and writes these to HCatalog tables. We can do the same thing by creating these `HCatRecord` objects with a Map function that precedes a `HadoopOutputFormat` that wraps the Hadoop `HCatOutputFormat`. Better support for Flink Tuples can be added by implementing a custom `HCatOutputFormat` that also depends on the Hadoop `HCatOutputFormat` but internally converts Flink Tuples to `HCatRecords`. This would also include checking whether the schema of the HCatalog table and the Flink tuples match. For data types other than tuples, the OutputFormat could either require a preceding Map function that converts to `HCatRecords` or let users specify a MapFunction and invoke that internally. We already have a Flink `HCatInputFormat` which does this in the reverse direction, i.e., it emits Flink Tuples from HCatalog tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
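The conversion step the issue describes, a map-style function that turns a Flink tuple into a positional record after checking it against the table schema, can be sketched with plain Java stand-ins. The types and method names below are hypothetical; the real implementation would work with Flink `Tuple` and HCatalog `HCatRecord` classes:

```java
import java.util.*;

// Hypothetical sketch of the tuple-to-record conversion with a schema check,
// as a custom HCatOutputFormat would do internally. Not the actual APIs.
public class TupleToRecord {

    // Converts a tuple (modeled as Object[]) into a positional record,
    // verifying that its arity matches the table schema first.
    static List<Object> toRecord(Object[] tuple, List<String> schemaFields) {
        if (tuple.length != schemaFields.size()) {
            throw new IllegalArgumentException(
                "Tuple arity " + tuple.length + " does not match schema " + schemaFields);
        }
        return new ArrayList<>(Arrays.asList(tuple)); // positional copy, like an HCatRecord
    }
}
```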
[GitHub] flink pull request: [FLINK-2433][docs]Add script to build local do...
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/954 [FLINK-2433][docs]Add script to build local documentation on windows You can merge this pull request into a Git repository by running: $ git pull https://github.com/sachingoel0101/flink flink-2433 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/954.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #954 commit 1e13daffd34e84d6436221e31ae39736f76a9a72 Author: Sachin Goel sachingoel0...@gmail.com Date: 2015-07-30T04:18:09Z [FLINK-2433][docs]Add script to build local documentation on windows --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2433) Jekyll on windows for building local documentation
[ https://issues.apache.org/jira/browse/FLINK-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647149#comment-14647149 ] ASF GitHub Bot commented on FLINK-2433: --- GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/954 [FLINK-2433][docs]Add script to build local documentation on windows You can merge this pull request into a Git repository by running: $ git pull https://github.com/sachingoel0101/flink flink-2433 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/954.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #954 commit 1e13daffd34e84d6436221e31ae39736f76a9a72 Author: Sachin Goel sachingoel0...@gmail.com Date: 2015-07-30T04:18:09Z [FLINK-2433][docs]Add script to build local documentation on windows Jekyll on windows for building local documentation -- Key: FLINK-2433 URL: https://issues.apache.org/jira/browse/FLINK-2433 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Sachin Goel We should add a script to allow building documentation on a Windows machine similar to the docs/build_docs.sh script. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2324) Rework partitioned state storage
[ https://issues.apache.org/jira/browse/FLINK-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645645#comment-14645645 ] ASF GitHub Bot commented on FLINK-2324: --- Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/937#issuecomment-125872157 Merging your PR fixed the exactly-once guarantees @StephanEwen , great! I also added an extra test for Partitioned states. Do you think it is okay to leave the StreamCheckpointingITCase how I modified it? Now it tests all the different checkpointing mechanisms together. If you add a test for the Checkpointed interface itself, this should be good. Rework partitioned state storage Key: FLINK-2324 URL: https://issues.apache.org/jira/browse/FLINK-2324 Project: Flink Issue Type: Improvement Reporter: Gyula Fora Assignee: Gyula Fora Partitioned states are currently stored per-key in statehandles. This is alright for in-memory storage but is very inefficient for HDFS. The logic behind the current mechanism is that this approach provides a way to repartition a state without fetching the data from the external storage and only manipulating handles. We should come up with a solution that can achieve both. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2324] [streaming] Partitioned state che...
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/937#issuecomment-125872157 Merging your PR fixed the exactly-once guarantees @StephanEwen , great! I also added an extra test for Partitioned states. Do you think it is okay to leave the StreamCheckpointingITCase how I modified it? Now it tests all the different checkpointing mechanisms together. If you add a test for the Checkpointed interface itself, this should be good. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Closed] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2406. --- Abstract BarrierBuffer to an exchangeable BarrierHandler Key: FLINK-2406 URL: https://issues.apache.org/jira/browse/FLINK-2406 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 We need to make the Checkpoint handling pluggable, to allow us to use different implementations: - BarrierBuffer for exactly once processing. This inevitably introduces a bit of latency. - BarrierTracker for at least once processing, with no added latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2402) Add a non-blocking BarrierTracker
[ https://issues.apache.org/jira/browse/FLINK-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2402. - Resolution: Implemented Implemented in 8f87b7164b644ea8f1708f7eb76567e58341b224 Add a non-blocking BarrierTracker - Key: FLINK-2402 URL: https://issues.apache.org/jira/browse/FLINK-2402 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 This issue would add a new tracker for barriers that simply tracks what barriers have been observed from which inputs. It never blocks off buffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2402) Add a non-blocking BarrierTracker
[ https://issues.apache.org/jira/browse/FLINK-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2402. --- Add a non-blocking BarrierTracker - Key: FLINK-2402 URL: https://issues.apache.org/jira/browse/FLINK-2402 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 This issue would add a new tracker for barriers that simply tracks what barriers have been observed from which inputs. It never blocks off buffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2406. - Resolution: Fixed Fixed via 0579f90bab165a7df336163eb9d6337267020029 Abstract BarrierBuffer to an exchangeable BarrierHandler Key: FLINK-2406 URL: https://issues.apache.org/jira/browse/FLINK-2406 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 We need to make the Checkpoint handling pluggable, to allow us to use different implementations: - BarrierBuffer for exactly once processing. This inevitably introduces a bit of latency. - BarrierTracker for at least once processing, with no added latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645744#comment-14645744 ] ASF GitHub Bot commented on FLINK-2406: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/938#issuecomment-125887226 It failed when queued channel data was still present at the end of the job. This happens only with slow consumers. Apparently, the local machines are usually fast enough to never leave queued data, and Travis is the slow one again ;-) Abstract BarrierBuffer to an exchangeable BarrierHandler Key: FLINK-2406 URL: https://issues.apache.org/jira/browse/FLINK-2406 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 We need to make the Checkpoint handling pluggable, to allow us to use different implementations: - BarrierBuffer for exactly once processing. This inevitably introduces a bit of latency. - BarrierTracker for at least once processing, with no added latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/938#issuecomment-125889092 Manually merged in 8f87b7164b644ea8f1708f7eb76567e58341b224 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645761#comment-14645761 ] ASF GitHub Bot commented on FLINK-2406: --- Github user StephanEwen closed the pull request at: https://github.com/apache/flink/pull/938 Abstract BarrierBuffer to an exchangeable BarrierHandler Key: FLINK-2406 URL: https://issues.apache.org/jira/browse/FLINK-2406 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 We need to make the Checkpoint handling pluggable, to allow us to use different implementations: - BarrierBuffer for exactly once processing. This inevitably introduces a bit of latency. - BarrierTracker for at least once processing, with no added latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/935#issuecomment-125890350 Other than the one comment, this looks good. +1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645658#comment-14645658 ] ASF GitHub Bot commented on FLINK-1901: --- GitHub user ChengXiangLi opened a pull request: https://github.com/apache/flink/pull/949 [FLINK-1901] [core] Create sample operator for Dataset. This PR includes: 1. Four random sampler implementations for different sampling strategies. 2. A sample operator for the DataSet Java API. 3. Random sampler unit tests. 4. A sample operator Java API integration test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ChengXiangLi/flink FLINK-1901 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/949.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #949 commit f7ba8779b8d6a6d66ab5d4e2435a70e220b1e0fc Author: chengxiang li chengxiang...@intel.com Date: 2015-07-22T03:38:13Z [FLINK-1901] [core] Create sample operator for Dataset. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...
GitHub user ChengXiangLi opened a pull request: https://github.com/apache/flink/pull/949 [FLINK-1901] [core] Create sample operator for Dataset. This PR includes: 1. Four random sampler implementations for different sampling strategies. 2. A sample operator for the DataSet Java API. 3. Random sampler unit tests. 4. A sample operator Java API integration test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ChengXiangLi/flink FLINK-1901 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/949.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #949 commit f7ba8779b8d6a6d66ab5d4e2435a70e220b1e0fc Author: chengxiang li chengxiang...@intel.com Date: 2015-07-22T03:38:13Z [FLINK-1901] [core] Create sample operator for Dataset. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
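One of the sampling strategies the issue asks for, sampling without replacement at a given fraction with a seed for reproducibility, can be sketched as a simple Bernoulli sampler. This is an illustrative stand-in, not the samplers from the PR itself:

```java
import java.util.*;

// Minimal Bernoulli sampler: each element is kept independently with
// probability `fraction`; a fixed seed makes the sample reproducible.
public class BernoulliSampler {

    static <T> List<T> sample(List<T> input, double fraction, long seed) {
        Random rnd = new Random(seed);
        List<T> out = new ArrayList<>();
        for (T element : input) {
            if (rnd.nextDouble() < fraction) { // independent coin flip per element
                out.add(element);
            }
        }
        return out;
    }
}
```

With the same seed and input, two calls return the same sample, which is exactly the reproducibility requirement from FLINK-1901. Sampling *with* replacement would instead draw a per-element count (e.g. from a Poisson distribution), which this sketch does not cover.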
[GitHub] flink pull request: [FLINK-2399] Version checks for Job Manager an...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/945#issuecomment-125886476 Using `getClass().getPackage().getImplementationVersion()` would be a decent first approach then, I guess. The critical part seems to be the Client-to-JobManager communication. How about the following as a first step: The client sends its version together with the `SubmitJob` message (just add a field there). The JobManager would check the version and respond with a failure, if it does not match. You can probably make the JobManager part very simple, no need to add extra constructor parameters, etc. That way, the change would be minimally invasive, and we could see how well it addresses the issues, and whether we should extend this to other messages as well. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/938#issuecomment-125887588 I really love Travis :heart: --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...
Github user StephanEwen closed the pull request at: https://github.com/apache/flink/pull/938 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Resolved] (FLINK-2418) Add an end-to-end streaming fault tolerance test for the Checkpointed interface
[ https://issues.apache.org/jira/browse/FLINK-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2418. - Resolution: Fixed Fixed via a0556efb233f15c6985d17886372a8b4b00392b2 Add an end-to-end streaming fault tolerance test for the Checkpointed interface Key: FLINK-2418 URL: https://issues.apache.org/jira/browse/FLINK-2418 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The current test lacks the following: - Does not validate exactly-once counts after the partitioning step - Does not cover all cases with the {{Checkpointed}} interface, but uses a mix of by-key state and explicitly Checkpointed state - The test uses a deprecated, non-fault-tolerant infinite reducer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2406] [FLINK-2402] Abstract the Barrier...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/938#issuecomment-125887226 It failed when queued channel data was still present at the end of the job. This happens only with slow consumers. Apparently, the local machines are usually fast enough to never leave queued data, and Travis is the slow one again ;-) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations
[ https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645773#comment-14645773 ] ASF GitHub Bot commented on FLINK-2231: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/935#issuecomment-125890350 Other than the one comment, this looks good. +1 Create a Serializer for Scala Enumerations -- Key: FLINK-2231 URL: https://issues.apache.org/jira/browse/FLINK-2231 Project: Flink Issue Type: Improvement Components: Scala API Reporter: Stephan Ewen Assignee: Alexander Alexandrov Scala Enumerations are currently serialized with Kryo, but should be efficiently serialized by just writing the {{initial}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2418) Add an end-to-end streaming fault tolerance test for the Checkpointed interface
[ https://issues.apache.org/jira/browse/FLINK-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2418. --- Add an end-to-end streaming fault tolerance test for the Checkpointed interface Key: FLINK-2418 URL: https://issues.apache.org/jira/browse/FLINK-2418 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The current test lacks the following: - Does not validate exactly-once counts after the partitioning step - Does not cover all cases with the {{Checkpointed}} interface, but uses a mix of by-key state and explicitly Checkpointed state - The test uses a deprecated, non-fault-tolerant infinite reducer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2399) Fail when actor versions don't match
[ https://issues.apache.org/jira/browse/FLINK-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645737#comment-14645737 ] ASF GitHub Bot commented on FLINK-2399: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/945#issuecomment-125886476 Using `getClass().getPackage().getImplementationVersion()` would be a decent first approach then, I guess. The critical part seems to be the Client-to-JobManager communication. How about the following as a first step: The client sends its version together with the `SubmitJob` message (just add a field there). The JobManager would check the version and respond with a failure, if it does not match. You can probably make the JobManager part very simple, no need to add extra constructor parameters, etc. That way, the change would be minimally invasive, and we could see how well it addresses the issues, and whether we should extend this to other messages as well. Fail when actor versions don't match Key: FLINK-2399 URL: https://issues.apache.org/jira/browse/FLINK-2399 Project: Flink Issue Type: Improvement Components: JobManager, TaskManager Affects Versions: 0.9, master Reporter: Ufuk Celebi Assignee: Sachin Goel Priority: Minor Fix For: 0.10 Problem: there can be subtle errors when actors from different Flink versions communicate with each other, for example when an old client (e.g. Flink 0.9) communicates with a new JobManager (e.g. Flink 0.10-SNAPSHOT). We can check that the versions match on first communication between the actors and fail if they don't match. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
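The minimally invasive handshake proposed in this thread, where the client ships its version string inside the `SubmitJob` message and the JobManager rejects on mismatch, can be sketched as follows. The class and method names are illustrative, not Flink's actual messages:

```java
import java.util.Objects;

// Sketch of the version check proposed for FLINK-2399: the client attaches
// its version to the job submission and the receiving side fails fast on
// mismatch instead of risking subtle cross-version actor errors.
public class VersionCheck {

    // On the client, the version would come from something like
    // getClass().getPackage().getImplementationVersion()
    // (note: this can be null when not running from a packaged jar).
    static void checkOnSubmit(String clientVersion, String jobManagerVersion) {
        if (!Objects.equals(clientVersion, jobManagerVersion)) {
            throw new IllegalStateException("Client version " + clientVersion
                + " does not match JobManager version " + jobManagerVersion);
        }
    }
}
```

Failing inside the submission path keeps the change small: no extra constructor parameters or handshake round-trips, just one additional field on the existing message and one comparison on receipt.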
[GitHub] flink pull request: [FLINK-2391]Fix Storm-compatibility FlinkTopol...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/940#issuecomment-125887557 Will merge this... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2406) Abstract BarrierBuffer to an exchangeable BarrierHandler
[ https://issues.apache.org/jira/browse/FLINK-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645749#comment-14645749 ] ASF GitHub Bot commented on FLINK-2406: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/938#issuecomment-125887588 I really love Travis :heart: Abstract BarrierBuffer to an exchangeable BarrierHandler Key: FLINK-2406 URL: https://issues.apache.org/jira/browse/FLINK-2406 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 We need to make the Checkpoint handling pluggable, to allow us to use different implementations: - BarrierBuffer for exactly once processing. This inevitably introduces a bit of latency. - BarrierTracker for at least once processing, with no added latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
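The at-least-once variant described in the issue can be sketched very compactly. This is an illustrative stand-in, not Flink's `BarrierTracker`: it only shows why tracking (as opposed to buffering) adds no latency, since no input channel is ever blocked.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the "BarrierTracker" idea from FLINK-2406 (hypothetical
// class, simplified API): count barriers per checkpoint id and report a
// checkpoint as complete once every input channel has delivered its barrier.
// Records are never held back, which gives at-least-once semantics with no
// added latency; the BarrierBuffer variant would block channels instead.
public class BarrierTrackerSketch {
    private final int numChannels;
    private final Map<Long, Integer> barrierCounts = new HashMap<>();

    public BarrierTrackerSketch(int numChannels) {
        this.numChannels = numChannels;
    }

    // Returns true when this barrier completes the checkpoint.
    public boolean onBarrier(long checkpointId) {
        int seen = barrierCounts.merge(checkpointId, 1, Integer::sum);
        if (seen == numChannels) {
            barrierCounts.remove(checkpointId); // checkpoint finished
            return true;
        }
        return false;
    }
}
```

Making both implementations satisfy one handler interface is what makes the checkpoint handling pluggable, as the issue asks.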
[jira] [Commented] (FLINK-2391) Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645748#comment-14645748 ] ASF GitHub Bot commented on FLINK-2391: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/940#issuecomment-125887557 Will merge this... Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException -- Key: FLINK-2391 URL: https://issues.apache.org/jira/browse/FLINK-2391 Project: Flink Issue Type: Bug Components: flink-contrib Affects Versions: 0.10 Environment: win7 32bit;linux Reporter: Huang Wei Labels: features Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h Core dump at FlinkOutputFieldsDeclarer.java:160 (package FlinkOutputFieldsDeclarer). Code: fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i)); in this line, the variable this.outputSchema may be null.
[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations
[ https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645766#comment-14645766 ] ASF GitHub Bot commented on FLINK-2231: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/935#discussion_r35741171 --- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/typeutils/EnumValueSerializer.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flink.api.scala.typeutils + +import org.apache.flink.api.common.typeutils.TypeSerializer +import org.apache.flink.api.common.typeutils.base.IntSerializer +import org.apache.flink.core.memory.{DataInputView, DataOutputView} + +/** + * Serializer for [[Enumeration]] values. 
+ */ +class EnumValueSerializer[E : Enumeration](val enum: E) extends TypeSerializer[E#Value] { + + type T = E#Value + + val intSerializer = new IntSerializer() + + override def duplicate: EnumValueSerializer[E] = this + + override def createInstance: T = enum(0) + + override def isImmutableType: Boolean = true + + override def getLength: Int = intSerializer.getLength + + override def copy(from: T): T = enum.apply(from.id) + + override def copy(from: T, reuse: T): T = copy(from) + + override def copy(src: DataInputView, tgt: DataOutputView): Unit = intSerializer.copy(src, tgt) + + override def serialize(v: T, tgt: DataOutputView): Unit = intSerializer.serialize(v.id, tgt) + + override def deserialize(source: DataInputView): T = enum(intSerializer.deserialize(source)) + + override def deserialize(reuse: T, source: DataInputView): T = deserialize(source) + + override def equals(obj: Any): Boolean = { --- End diff -- Can you also override `hashCode()` here? Keeps it consistent with overriding `equals()` Create a Serializer for Scala Enumerations -- Key: FLINK-2231 URL: https://issues.apache.org/jira/browse/FLINK-2231 Project: Flink Issue Type: Improvement Components: Scala API Reporter: Stephan Ewen Assignee: Alexander Alexandrov Scala Enumerations are currently serialized with Kryo, but should be efficiently serialized by just writing the {{initial}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/935#discussion_r35741171 --- Diff: flink-scala/src/main/scala/org/apache/flink/api/scala/typeutils/EnumValueSerializer.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flink.api.scala.typeutils + +import org.apache.flink.api.common.typeutils.TypeSerializer +import org.apache.flink.api.common.typeutils.base.IntSerializer +import org.apache.flink.core.memory.{DataInputView, DataOutputView} + +/** + * Serializer for [[Enumeration]] values. 
+ */ +class EnumValueSerializer[E : Enumeration](val enum: E) extends TypeSerializer[E#Value] { + + type T = E#Value + + val intSerializer = new IntSerializer() + + override def duplicate: EnumValueSerializer[E] = this + + override def createInstance: T = enum(0) + + override def isImmutableType: Boolean = true + + override def getLength: Int = intSerializer.getLength + + override def copy(from: T): T = enum.apply(from.id) + + override def copy(from: T, reuse: T): T = copy(from) + + override def copy(src: DataInputView, tgt: DataOutputView): Unit = intSerializer.copy(src, tgt) + + override def serialize(v: T, tgt: DataOutputView): Unit = intSerializer.serialize(v.id, tgt) + + override def deserialize(source: DataInputView): T = enum(intSerializer.deserialize(source)) + + override def deserialize(reuse: T, source: DataInputView): T = deserialize(source) + + override def equals(obj: Any): Boolean = { --- End diff -- Can you also override `hashCode()` here? Keeps it consistent with overriding `equals()` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
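The review comment is an instance of the general equals/hashCode contract: two objects that compare equal must produce the same hash code, or hash-based collections break. The PR's serializer is Scala; the following is a tiny Java stand-in (hypothetical class, not from the PR) showing the paired overrides the reviewer is asking for.

```java
// Illustrative only: a value wrapper, loosely analogous to an enum value
// carrying an int id, with equals() and hashCode() overridden together so
// that equal objects always hash alike.
public class EnumValueKey {
    private final int id;

    public EnumValueKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof EnumValueKey && ((EnumValueKey) obj).id == this.id;
    }

    // Overridden alongside equals(), as the review asks: without this,
    // two equal keys could land in different HashMap buckets.
    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}
```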
[jira] [Resolved] (FLINK-2420) OutputFlush thread in stream writers does not propagate exceptions
[ https://issues.apache.org/jira/browse/FLINK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2420. - Resolution: Fixed Fixed via 8ba321332b994579f387add8bd0855bd29cb33ec OutputFlush thread in stream writers does not propagate exceptions -- Key: FLINK-2420 URL: https://issues.apache.org/jira/browse/FLINK-2420 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The output flush thread only throws exceptions that it encountered. This simply lets the thread die, when the exceptions reach the root of the stack. The exceptions never reach the actual writer code. That way, exceptions that only happen on flush (or the last flush) would never be detected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
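The fix pattern for this class of bug can be sketched as follows. This is a hypothetical illustration of the hand-off, not the code in commit 8ba3213: the background flusher records its failure instead of dying silently, and the writer thread rethrows it on its next operation.

```java
import java.io.IOException;

// Sketch of propagating an exception from a background flush thread to the
// writer thread (illustrative names; not Flink's actual stream writer code).
public class ExceptionPropagatingFlusher {
    // Written by the flusher thread, read by the writer thread.
    private volatile IOException asyncError;

    // Called from the background flush thread when a flush fails. Storing the
    // exception instead of throwing it keeps it from vanishing with the thread.
    public void onFlushFailure(IOException e) {
        asyncError = e;
    }

    // Called from the writer thread before each write() and in close(), so a
    // failure that happened only during (or after) the last flush still
    // surfaces to the user code.
    public void checkError() throws IOException {
        IOException e = asyncError;
        if (e != null) {
            throw new IOException("Error in flusher thread", e);
        }
    }

    // Demonstrates the round trip: a stored async error is rethrown.
    public static boolean demo() {
        ExceptionPropagatingFlusher f = new ExceptionPropagatingFlusher();
        f.onFlushFailure(new IOException("disk full"));
        try {
            f.checkError();
            return false; // unreachable if propagation works
        } catch (IOException expected) {
            return true;
        }
    }
}
```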
[jira] [Resolved] (FLINK-2421) StreamRecordSerializer incorrectly duplicates and misses tests
[ https://issues.apache.org/jira/browse/FLINK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2421. - Resolution: Fixed Fixed via 2d237e18a2f7cf21721340933c505bb518c4fc66 StreamRecordSerializer incorrectly duplicates and misses tests -- Key: FLINK-2421 URL: https://issues.apache.org/jira/browse/FLINK-2421 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The duplication method of the {{StreamRecordSerializer}} does not respect the duplication of the internal type serializer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2421) StreamRecordSerializer incorrectly duplicates and misses tests
[ https://issues.apache.org/jira/browse/FLINK-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2421. --- StreamRecordSerializer incorrectly duplicates and misses tests -- Key: FLINK-2421 URL: https://issues.apache.org/jira/browse/FLINK-2421 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The duplication method of the {{StreamRecordSerializer}} does not respect the duplication of the internal type serializer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2420) OutputFlush thread in stream writers does not propagate exceptions
[ https://issues.apache.org/jira/browse/FLINK-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2420. --- OutputFlush thread in stream writers does not propagate exceptions -- Key: FLINK-2420 URL: https://issues.apache.org/jira/browse/FLINK-2420 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 The output flush thread only throws exceptions that it encountered. This simply lets the thread die, when the exceptions reach the root of the stack. The exceptions never reach the actual writer code. That way, exceptions that only happen on flush (or the last flush) would never be detected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2424) InstantiationUtil.serializeObject(Object) does not close output stream
[ https://issues.apache.org/jira/browse/FLINK-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645740#comment-14645740 ] Stephan Ewen commented on FLINK-2424: - oha, good one! InstantiationUtil.serializeObject(Object) does not close output stream -- Key: FLINK-2424 URL: https://issues.apache.org/jira/browse/FLINK-2424 Project: Flink Issue Type: Bug Components: Core Affects Versions: 0.9, master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.10, 0.9.1 The utility method InstantiationUtil.serializeObject(Object) does not close the created output stream.
[GitHub] flink pull request: [FLINK-2419] Add test for sinks after keyBy an...
Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/947#issuecomment-125894533 If you give a +1 @rmetzger I will merge this --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2419) DataStream sinks lose key information
[ https://issues.apache.org/jira/browse/FLINK-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645779#comment-14645779 ] ASF GitHub Bot commented on FLINK-2419: --- Github user gyfora commented on the pull request: https://github.com/apache/flink/pull/947#issuecomment-125894533 If you give a +1 @rmetzger I will merge this DataStream sinks lose key information - Key: FLINK-2419 URL: https://issues.apache.org/jira/browse/FLINK-2419 Project: Flink Issue Type: Bug Components: Streaming Reporter: Gyula Fora Assignee: Gyula Fora Priority: Blocker Fix For: 0.10 If the user applies an addSink method after keyBy, the sink will not have the information of the key as the transform method is bypassed. This makes it impossible to use partitioned states with sinks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1901) Create sample operator for Dataset
[ https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645803#comment-14645803 ] ASF GitHub Bot commented on FLINK-1901: --- Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-125898830 Previously, I planned to leave the sample Scala API to a separate PR, as I am not very familiar with Scala, but the failed test shows that Flink has a test to make sure the Scala and Java APIs match. I will try to add the Scala API and an integration test later. Create sample operator for Dataset -- Key: FLINK-1901 URL: https://issues.apache.org/jira/browse/FLINK-1901 Project: Flink Issue Type: Improvement Components: Core Reporter: Theodore Vasiloudis Assignee: Chengxiang Li In order to be able to implement Stochastic Gradient Descent and a number of other machine learning algorithms we need to have a way to take a random sample from a Dataset. We need to be able to sample with or without replacement from the Dataset, choose the relative size of the sample, and set a seed for reproducibility.
[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-125903682 Okay, it took me a while, but I actually walked through this and would like to merge it soon. To make this functionality configurable, I opened an issue to give runtime operators access to the TaskManager config, so that we can define a switch to turn this on and off. I would actually turn it on by default, it seems to work well as far as I have tried it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645822#comment-14645822 ] ASF GitHub Bot commented on FLINK-2240: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-125903682 Okay, it took me a while, but I actually walked through this and would like to merge it soon. To make this functionality configurable, I opened an issue to give runtime operators access to the TaskManager config, so that we can define a switch to turn this on and off. I would actually turn it on by default, it seems to work well as far as I have tried it. Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join Key: FLINK-2240 URL: https://issues.apache.org/jira/browse/FLINK-2240 Project: Flink Issue Type: Improvement Components: Core Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Minor In Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table data is spilled to disk, and the counterpart partition of the big table data is spilled to disk in the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big table records which would otherwise be spilled to disk, this may greatly reduce the spilled big table file size and save the disk I/O cost of writing and later reading it.
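The core idea from FLINK-2240 fits in a small sketch. This is a hypothetical, simplified Bloom filter (fixed size, two cheap hash functions), not the PR's implementation: it only demonstrates why a definite "not present" answer lets the join drop a probe-side record instead of spilling it.

```java
import java.util.BitSet;

// Illustrative sketch of probe-side filtering in a hybrid hash join:
// during the build phase, every key spilled to a partition is added to the
// filter; during the probe phase, a record whose key is definitely absent
// (mightContain == false) cannot find a match and need not be spilled.
// Bloom filters have false positives but never false negatives, so this
// only reduces spilled data and never drops a matching record.
public class SpillBloomFilter {
    private final BitSet bits;
    private final int size;

    public SpillBloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Two cheap multiplicative hashes; floorMod keeps indices non-negative.
    private int hash1(int keyHash) { return Math.floorMod(keyHash * 0x9E3779B9, size); }
    private int hash2(int keyHash) { return Math.floorMod(keyHash * 0x85EBCA6B + 1, size); }

    // Build phase: record a spilled build-side key.
    public void add(int keyHash) {
        bits.set(hash1(keyHash));
        bits.set(hash2(keyHash));
    }

    // Probe phase: false means "certainly not in the spilled partition".
    public boolean mightContain(int keyHash) {
        return bits.get(hash1(keyHash)) && bits.get(hash2(keyHash));
    }
}
```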
[GitHub] flink pull request: [FLINK-2240] Use BloomFilter to filter probe r...
Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-125914138 Thanks for pointing that out, Stephan; I didn't notice the configuration issue before.
[jira] [Commented] (FLINK-2240) Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join
[ https://issues.apache.org/jira/browse/FLINK-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645842#comment-14645842 ] ASF GitHub Bot commented on FLINK-2240: --- Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/888#issuecomment-125914138 Thanks for pointing that out, Stephan; I didn't notice the configuration issue before. Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join Key: FLINK-2240 URL: https://issues.apache.org/jira/browse/FLINK-2240 Project: Flink Issue Type: Improvement Components: Core Reporter: Chengxiang Li Assignee: Chengxiang Li Priority: Minor In Hybrid-Hash-Join, when the small table does not fit into memory, part of the small table data is spilled to disk, and the counterpart partition of the big table data is spilled to disk in the probe phase as well. If we build a BloomFilter while spilling the small table to disk during the build phase, and use it to filter the big table records which would otherwise be spilled to disk, this may greatly reduce the spilled big table file size and save the disk I/O cost of writing and later reading it.
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645817#comment-14645817 ] Robert Metzger commented on FLINK-2425: --- +1 That is something that users have requested Give access to TaskManager config and hostname in the Runtime Environment - Key: FLINK-2425 URL: https://issues.apache.org/jira/browse/FLINK-2425 Project: Flink Issue Type: Improvement Components: TaskManager Affects Versions: 0.10 Reporter: Stephan Ewen Fix For: 0.10 The RuntimeEnvironment (that is used by the operators to access the context) should give access to the TaskManager's configuration, to allow to read config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2427) Allow the BarrierBuffer to maintain multiple queues of blocked inputs
Stephan Ewen created FLINK-2427: --- Summary: Allow the BarrierBuffer to maintain multiple queues of blocked inputs Key: FLINK-2427 URL: https://issues.apache.org/jira/browse/FLINK-2427 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 In corner cases (dropped barriers due to failures/startup races), this is required for proper operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2152) Provide zipWithIndex utility in flink-contrib
[ https://issues.apache.org/jira/browse/FLINK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645895#comment-14645895 ] Robert Metzger commented on FLINK-2152: --- Yes, we either have to use a concurrent list or create a copy of it. The elements in the broadcast set are shared among the parallel instances. Provide zipWithIndex utility in flink-contrib - Key: FLINK-2152 URL: https://issues.apache.org/jira/browse/FLINK-2152 Project: Flink Issue Type: Improvement Components: Java API Reporter: Robert Metzger Assignee: Andra Lungu Priority: Trivial Labels: starter Fix For: 0.10 We should provide a simple utility method for zipping elements in a data set with a dense index. It's up for discussion whether we want it directly in the API or if we should provide it only as a utility from {{flink-contrib}}. I would put it in {{flink-contrib}}. See my answer on SO: http://stackoverflow.com/questions/30596556/zipwithindex-on-apache-flink
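The usual two-pass scheme behind such a utility (and the one described in the linked SO answer) is easy to sketch. The following is a hypothetical standalone version of only the offset computation, not the flink-contrib code: first count the elements of each parallel partition, then let partition i start its dense index at the sum of the counts of partitions 0..i-1.

```java
// Sketch of the offset step of a two-pass zipWithIndex: given the element
// count of each parallel partition, compute the first index each partition
// should assign. Partition i then numbers its elements
// offsets[i], offsets[i]+1, ..., producing one dense, gap-free index space.
public class ZipWithIndexSketch {
    public static long[] startOffsets(int[] countsPerPartition) {
        long[] offsets = new long[countsPerPartition.length];
        long running = 0;
        for (int i = 0; i < countsPerPartition.length; i++) {
            offsets[i] = running;          // partition i starts here
            running += countsPerPartition[i];
        }
        return offsets;
    }
}
```

Robert's comment about the broadcast set fits here: the per-partition counts are typically shipped to all instances as a broadcast set, and since that collection is shared among the parallel instances, it must be copied or accessed through a concurrent structure before sorting or mutating it.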
[GitHub] flink pull request: [FLINK-1901] [core] Create sample operator for...
Github user ChengXiangLi commented on the pull request: https://github.com/apache/flink/pull/949#issuecomment-125898830 Previously, I planned to leave the sample Scala API to a separate PR, as I am not very familiar with Scala, but the failed test shows that Flink has a test to make sure the Scala and Java APIs match. I will try to add the Scala API and an integration test later.
[jira] [Commented] (FLINK-2387) Add test for live accumulators in Streaming
[ https://issues.apache.org/jira/browse/FLINK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645821#comment-14645821 ] ASF GitHub Bot commented on FLINK-2387: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/926#issuecomment-125902465 How does this PR relate to the recent improvements on the stability of the live accumulator tests? Add test for live accumulators in Streaming --- Key: FLINK-2387 URL: https://issues.apache.org/jira/browse/FLINK-2387 Project: Flink Issue Type: Test Components: Streaming Reporter: Maximilian Michels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2387] add streaming test case for live ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/926#issuecomment-125902465 How does this PR relate to the recent improvements on the stability of the live accumulator tests? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2152) Provide zipWithIndex utility in flink-contrib
[ https://issues.apache.org/jira/browse/FLINK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645904#comment-14645904 ] Andra Lungu commented on FLINK-2152: Hey, Since this is Johannes Günther's finding, I suggest he simply opens a PR with what worked for him :) Nice catch BTW! Cheers, Andra Provide zipWithIndex utility in flink-contrib - Key: FLINK-2152 URL: https://issues.apache.org/jira/browse/FLINK-2152 Project: Flink Issue Type: Improvement Components: Java API Reporter: Robert Metzger Assignee: Andra Lungu Priority: Trivial Labels: starter Fix For: 0.10 We should provide a simple utility method for zipping elements in a data set with a dense index. It's up for discussion whether we want it directly in the API or if we should provide it only as a utility from {{flink-contrib}}. I would put it in {{flink-contrib}}. See my answer on SO: http://stackoverflow.com/questions/30596556/zipwithindex-on-apache-flink
[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/935#issuecomment-125936680 Merged, thanks for your work. :smile: --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/935 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations
[ https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645974#comment-14645974 ] ASF GitHub Bot commented on FLINK-2231: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/935 Create a Serializer for Scala Enumerations -- Key: FLINK-2231 URL: https://issues.apache.org/jira/browse/FLINK-2231 Project: Flink Issue Type: Improvement Components: Scala API Reporter: Stephan Ewen Assignee: Alexander Alexandrov Scala Enumerations are currently serialized with Kryo, but should be efficiently serialized by just writing the {{initial}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2428) Clean up unused properties in StreamConfig
Stephan Ewen created FLINK-2428: --- Summary: Clean up unused properties in StreamConfig Key: FLINK-2428 URL: https://issues.apache.org/jira/browse/FLINK-2428 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Priority: Minor There is a multitude of unused properties in the {{StreamConfig}}, which should be removed, if no longer relevant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646003#comment-14646003 ] Sachin Goel commented on FLINK-2425: Yes. But this will certainly use the readonly configuration defined there. After that it's just a matter of passing it to the RuntimeEnvironment which can give access to the user by passing it through the DistributedContext. Or is there a better way? :') Give access to TaskManager config and hostname in the Runtime Environment - Key: FLINK-2425 URL: https://issues.apache.org/jira/browse/FLINK-2425 Project: Flink Issue Type: Improvement Components: TaskManager Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Sachin Goel Fix For: 0.10 The RuntimeEnvironment (that is used by the operators to access the context) should give access to the TaskManager's configuration, to allow to read config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646023#comment-14646023 ] Sachin Goel commented on FLINK-2425: Should I provide access to the configuration via the RuntimeContext as well? Since the user cannot modify it, there can't be any harm in doing so. Access to config parameters might help the user write the program better. Give access to TaskManager config and hostname in the Runtime Environment - Key: FLINK-2425 URL: https://issues.apache.org/jira/browse/FLINK-2425 Project: Flink Issue Type: Improvement Components: TaskManager Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Sachin Goel Fix For: 0.10 The RuntimeEnvironment (that is used by the operators to access the context) should give access to the TaskManager's configuration, to allow reading config values.
[jira] [Commented] (FLINK-2391) Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645954#comment-14645954 ] ASF GitHub Bot commented on FLINK-2391: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/940 Storm-compatibility:method FlinkTopologyBuilder.createTopology() throws java.lang.NullPointerException -- Key: FLINK-2391 URL: https://issues.apache.org/jira/browse/FLINK-2391 Project: Flink Issue Type: Bug Components: flink-contrib Affects Versions: 0.10 Environment: win7 32bit;linux Reporter: Huang Wei Labels: features Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h Core dump at FlinkOutputFieldsDeclarer.java:160 (package FlinkOutputFieldsDeclarer). Code: fieldIndexes[i] = this.outputSchema.fieldIndex(groupingFields.get(i)); in this line, the variable this.outputSchema may be null.
[jira] [Commented] (FLINK-2425) Give access to TaskManager config and hostname in the Runtime Environment
[ https://issues.apache.org/jira/browse/FLINK-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645993#comment-14645993 ] Sachin Goel commented on FLINK-2425: The readonly configuration will take the actual configuration as an argument in its constructor and provide all access functions of Configuration, while keeping it private. Give access to TaskManager config and hostname in the Runtime Environment - Key: FLINK-2425 URL: https://issues.apache.org/jira/browse/FLINK-2425 Project: Flink Issue Type: Improvement Components: TaskManager Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Sachin Goel Fix For: 0.10 The RuntimeEnvironment (that is used by the operators to access the context) should give access to the TaskManager's configuration, to allow to read config values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
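The read-only wrapper described in the comment (and in FLINK-2426) can be sketched as follows. This is a hypothetical illustration, not the PR's `UnmodifiableConfiguration`: a plain `Map` stands in for Flink's `Configuration` class, and the method names are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a read-only configuration: take the real configuration in the
// constructor, keep it private, and expose only accessors. No setter is
// offered, so code holding this object cannot mutate the TaskManager config.
public class UnmodifiableConfig {
    private final Map<String, String> delegate;

    public UnmodifiableConfig(Map<String, String> config) {
        // Defensive copy: later changes to the original map stay invisible
        // too. (An alternative design wraps without copying and reflects
        // later changes; both keep the wrapper itself unmodifiable.)
        this.delegate = new HashMap<>(config);
    }

    // One representative accessor; a real version would mirror all getters
    // of the underlying Configuration.
    public String getString(String key, String defaultValue) {
        return delegate.getOrDefault(key, defaultValue);
    }
}
```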
[GitHub] flink pull request: Cascading changes for compatibility
GitHub user mxm opened a pull request:

    https://github.com/apache/flink/pull/950

Cascading changes for compatibility

@fhueske and I are working on getting Cascading to run on top of Flink. These two commits introduce changes that were necessary to make the translation possible.

Up for debate is the second commit, which loads the user classloader before calling `configure` on an InputFormat. This is necessary because Cascading has its own interface that needs to be wrapped inside a Flink `InputFormat`. Usually, all dependencies are loaded upon instantiation of the class. However, Cascading loads its own input format while `configure(config)` is called (via the class name inside the passed config). Without the changes, this leads to a ClassNotFoundException. Let me know what you think about it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mxm/flink cascading-dev

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/950.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #950

commit 433fec2a90b5e868d0c3b7fa258b85c32345b954
Author: Sachin Goel was not the author; Author: Fabian Hueske fhue...@apache.org
Date: 2015-07-16T22:31:09Z

    [cascading] add getJobConf() to HadoopInputSplit

commit a81582c3cf59952381ce5fb9e15adeb775fcbff7
Author: Maximilian Michels m...@apache.org
Date: 2015-07-29T12:51:14Z

    [cascading] load user classloader when configuring InputFormat
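The classloader handling the PR describes can be sketched as follows. The names and the helper are illustrative, not Flink's actual code: the idea is to install the user-code classloader as the thread's context classloader before `configure()` runs, so that code inside `configure()` can resolve user classes by name (as Cascading does via the class name in the passed config), and to restore the previous classloader afterward.

```java
// Illustrative sketch (not Flink's actual API): swap in the user-code
// classloader around a configure() call and always restore the old one.
final class ConfigureWithUserClassLoader {
    static void configureInputFormat(Runnable configure, ClassLoader userCodeClassLoader) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        try {
            current.setContextClassLoader(userCodeClassLoader);
            configure.run(); // may call Class.forName(...) on user classes
        } finally {
            current.setContextClassLoader(previous); // restore even on exception
        }
    }
}
```

The try/finally is the important part of the pattern: without it, an exception in `configure()` would leave a foreign classloader installed on a pooled thread.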
[jira] [Updated] (FLINK-2429) Remove the enableCheckpointing() without interval variant
[ https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen updated FLINK-2429:

    Description: I think it is not very obvious what the default checkpointing interval is. Also, when somebody activates checkpointing, shouldn't they think about what they want in terms of frequency and recovery latency tradeoffs?
    (was: It is not very obvious what the default checkpointing interval. Also, when somebody activates)

Remove the enableCheckpointing() without interval variant
---------------------------------------------------------

    Key: FLINK-2429
    URL: https://issues.apache.org/jira/browse/FLINK-2429
    Project: Flink
    Issue Type: Wish
    Components: Streaming
    Reporter: Stephan Ewen
    Priority: Minor

I think it is not very obvious what the default checkpointing interval is. Also, when somebody activates checkpointing, shouldn't they think about what they want in terms of frequency and recovery latency tradeoffs?
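The API concern behind the wish can be illustrated with a minimal stand-in (names and the default value are made up for the example; this is not Flink's StreamExecutionEnvironment): the no-argument variant silently applies a default interval the caller never sees, while the variant with an explicit interval forces the caller to state the frequency/recovery-latency tradeoff.

```java
// Minimal sketch of the two API variants under discussion.
final class CheckpointConfigExample {
    static final long DEFAULT_INTERVAL_MS = 5000; // hidden from callers
    long intervalMs = -1; // -1 means checkpointing disabled

    // The variant FLINK-2429 proposes removing: which interval did I just get?
    void enableCheckpointing() {
        intervalMs = DEFAULT_INTERVAL_MS;
    }

    // The explicit variant: the caller states the tradeoff.
    void enableCheckpointing(long intervalMs) {
        this.intervalMs = intervalMs;
    }
}
```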
[jira] [Updated] (FLINK-2429) Remove the enableCheckpointing() without interval variant
[ https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen updated FLINK-2429:

    Issue Type: Wish (was: Bug)

Remove the enableCheckpointing() without interval variant
---------------------------------------------------------

    Key: FLINK-2429
    URL: https://issues.apache.org/jira/browse/FLINK-2429
    Project: Flink
    Issue Type: Wish
    Components: Streaming
    Reporter: Stephan Ewen

It is not very obvious what the default checkpointing interval. Also, when somebody activates