[GitHub] flink pull request: Fix handling of receiver and sender task failu...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/746#issuecomment-107337457 Merging now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (FLINK-2125) String delimiter for SocketTextStream
Márton Balassi created FLINK-2125: - Summary: String delimiter for SocketTextStream Key: FLINK-2125 URL: https://issues.apache.org/jira/browse/FLINK-2125 Project: Flink Issue Type: Improvement Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Priority: Minor The SocketTextStreamFunction uses a character delimiter, despite other parts of the API using String delimiter. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
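The improvement is easy to sketch: instead of scanning for a single char, buffer the socket input and split on a multi-character String. The following helper is illustrative only; the class name and API are hypothetical and are not the actual SocketTextStreamFunction:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of String-delimiter record splitting for a socket
// stream. Incoming chunks are buffered; complete records are emitted as
// soon as the delimiter is seen, and a trailing partial record is kept
// until the rest of it (or of the delimiter) arrives.
public class DelimiterSplitter {
    private final StringBuilder buffer = new StringBuilder();
    private final String delimiter;

    public DelimiterSplitter(String delimiter) {
        this.delimiter = delimiter;
    }

    /** Feed one chunk read from the socket; returns all completed records. */
    public List<String> feed(String chunk) {
        buffer.append(chunk);
        List<String> records = new ArrayList<>();
        int pos;
        while ((pos = buffer.indexOf(delimiter)) >= 0) {
            records.add(buffer.substring(0, pos));
            buffer.delete(0, pos + delimiter.length());
        }
        return records;
    }
}
```

With delimiter `"||"`, feeding `"a||b|"` yields only `"a"` and keeps `"b|"` buffered, since the second record may be completed by a later chunk.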
[jira] [Created] (FLINK-2126) Scala shell tests sporadically fail on travis
Robert Metzger created FLINK-2126: - Summary: Scala shell tests sporadically fail on travis Key: FLINK-2126 URL: https://issues.apache.org/jira/browse/FLINK-2126 Project: Flink Issue Type: Bug Components: Scala Shell Affects Versions: 0.9 Reporter: Robert Metzger See https://travis-ci.org/rmetzger/flink/jobs/64893149
[jira] [Commented] (FLINK-2120) Rename AbstractJobVertex to JobVertex
[ https://issues.apache.org/jira/browse/FLINK-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567209#comment-14567209 ] Maximilian Michels commented on FLINK-2120: --- +1 Rename AbstractJobVertex to JobVertex - Key: FLINK-2120 URL: https://issues.apache.org/jira/browse/FLINK-2120 Project: Flink Issue Type: Wish Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Priority: Trivial I would like to rename AbstractJobVertex to JobVertex. It is not abstract, and we have a lot of references to it in tests, where we create instances. This is trivial, but I think it is a bad name.
[GitHub] flink pull request: [FLINK-2037] Provide flink-python.jar in lib/
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/691
[jira] [Commented] (FLINK-2037) Unable to start Python API using ./bin/pyflink*.sh
[ https://issues.apache.org/jira/browse/FLINK-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567216#comment-14567216 ] ASF GitHub Bot commented on FLINK-2037: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/691 Unable to start Python API using ./bin/pyflink*.sh -- Key: FLINK-2037 URL: https://issues.apache.org/jira/browse/FLINK-2037 Project: Flink Issue Type: Bug Components: Python API Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger Priority: Critical Calling {{./bin/pyflink3.sh}} will lead to
{code}
./bin/pyflink3.sh
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
JAR file does not exist: /home/robert/incubator-flink/build-target/bin/../lib/flink-python-0.9-SNAPSHOT.jar
Use the help option (-h or --help) to get help on the command.
{code}
This is due to the script expecting a {{flink-python-0.9-SNAPSHOT.jar}} file to exist in {{lib}} (it's wrong anyway that the version name is included here; it should be replaced by a {{*}}). I'll look into the issue ...
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567225#comment-14567225 ] ASF GitHub Bot commented on FLINK-1319: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107419997 OK, I will review this. I vote to stick to Stephan's suggested approach instead of package based exclusions: analyze everything and allow exclusions with a `@SkipCodeAnalysis` annotation. Any further opinions on the output of the analysis (stdout vs. logging question)? Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0-3, 1, 2-1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107422340 +1 for the annotation. I'm against using stdout. Logging frameworks are much better at controlling the output. The quickstart mvn archetype provides a log4j.properties file, so we can configure it the way we want it to.
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567244#comment-14567244 ] ASF GitHub Bot commented on FLINK-1319: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107422340 +1 for the annotation. I'm against using stdout. Logging frameworks are much better at controlling the output. The quickstart mvn archetype provides a log4j.properties file, so we can configure it the way we want it to. Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor
[GitHub] flink pull request: Implemented TwitterSourceFilter and adapted Tw...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/695#issuecomment-107422496 Is this good to merge now? I'm rewriting all the sources (again) and this should probably go in.
[jira] [Assigned] (FLINK-1430) Add test for streaming scala api completeness
[ https://issues.apache.org/jira/browse/FLINK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Márton Balassi reassigned FLINK-1430: - Assignee: Márton Balassi (was: Mingliang Qi) Add test for streaming scala api completeness - Key: FLINK-1430 URL: https://issues.apache.org/jira/browse/FLINK-1430 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Márton Balassi Currently the completeness of the streaming scala api is not tested.
[GitHub] flink pull request: New operator state interfaces
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/747#discussion_r31420656
--- Diff: flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/api/persistent/PersistentKafkaSource.java ---
@@ -140,11 +144,14 @@ public void open(Configuration parameters) throws Exception {
     // most likely the number of offsets we're going to store here will be lower than the number of partitions.
     int numPartitions = getNumberOfPartitions();
     LOG.debug("The topic {} has {} partitions", topicName, numPartitions);
-    this.lastOffsets = new long[numPartitions];
-    this.commitedOffsets = new long[numPartitions];
-    Arrays.fill(this.lastOffsets, -1);
-    Arrays.fill(this.commitedOffsets, 0); // just to make it clear
+    long[] defaultOffset = new long[numPartitions];
+    Arrays.fill(defaultOffset, -1);
+    this.lastOffsets = getRuntimeContext().getOperatorState("offset", defaultOffset);
+    //TODO: commit fetched offset to ZK if not default
--- End diff --
We cannot merge the PR with this TODO open.
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567146#comment-14567146 ] ASF GitHub Bot commented on FLINK-1319: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107393190 Let us merge this for 0.9 and have it deactivated by default. Let's gradually activate it in the next releases as it gets exposure Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor
[jira] [Resolved] (FLINK-2037) Unable to start Python API using ./bin/pyflink*.sh
[ https://issues.apache.org/jira/browse/FLINK-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-2037. --- Resolution: Fixed Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/2f347a03 Unable to start Python API using ./bin/pyflink*.sh -- Key: FLINK-2037 URL: https://issues.apache.org/jira/browse/FLINK-2037 Project: Flink Issue Type: Bug Components: Python API Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger Priority: Critical
[jira] [Resolved] (FLINK-1954) Task Failures and Error Handling
[ https://issues.apache.org/jira/browse/FLINK-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1954. Resolution: Fixed Fix Version/s: 0.9 Fixed in 1aad5b759432f0b59a9dcc366a4b66c2681626f1, 2a65b62216e8fb73fce65209bf646ca67e5f96b0, dce1be18593539ff29c3d55c5f2c1208a2e54c10, f75c16b0540c839079188bb58b5acf2ede108767. Task Failures and Error Handling Key: FLINK-1954 URL: https://issues.apache.org/jira/browse/FLINK-1954 Project: Flink Issue Type: Improvement Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Priority: Blocker Fix For: 0.9 This is an issue to keep track of subtasks for error handling of task failures. The design doc for this can be found here: https://cwiki.apache.org/confluence/display/FLINK/Task+Failures+and+Error+Handling
[jira] [Updated] (FLINK-1955) Improve error handling of sender task failures
[ https://issues.apache.org/jira/browse/FLINK-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1955: --- Fix Version/s: 0.9 Improve error handling of sender task failures -- Key: FLINK-1955 URL: https://issues.apache.org/jira/browse/FLINK-1955 Project: Flink Issue Type: Improvement Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Fix For: 0.9 Currently, if a sender task fails, the produced partition is silently released. We want:
* Produced result partition becomes erroneous with a SenderFailedException
* Receiver cancels itself when encountering the SenderFailedException
* May also be cancelled by the JobManager (if that call is faster than the detection of the failed sender)
* This closes the Netty channel
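The wanted behaviour can be modeled with a toy sketch. Only SenderFailedException is named in the ticket; the class and method names below are invented for illustration and do not reflect the actual runtime classes:

```java
// Toy model of the desired failure propagation: a failed sender marks its
// result partition erroneous, and a receiver polling the partition sees a
// SenderFailedException (and would then cancel itself) instead of blocking
// forever on a partition that will never produce data.
public class ResultPartitionSketch {
    public static class SenderFailedException extends RuntimeException {
        public SenderFailedException(Throwable cause) {
            super(cause);
        }
    }

    private volatile Throwable error;

    /** Sender side: called when the producing task fails. */
    public void markErroneous(Throwable cause) {
        this.error = cause;
    }

    /** Receiver side: surfaces the sender failure instead of returning data. */
    public byte[] poll() {
        if (error != null) {
            throw new SenderFailedException(error);
        }
        return new byte[0]; // placeholder for a real network buffer
    }
}
```

In the real runtime the exception would additionally trigger closing the Netty channel, per the last bullet above.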
[jira] [Updated] (FLINK-1941) Add documentation for Gelly-GSA
[ https://issues.apache.org/jira/browse/FLINK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1941: --- Fix Version/s: 0.9 Add documentation for Gelly-GSA --- Key: FLINK-1941 URL: https://issues.apache.org/jira/browse/FLINK-1941 Project: Flink Issue Type: Task Components: Gelly Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Vasia Kalavri Labels: docs, gelly Fix For: 0.9 Add a section in the Gelly guide to describe the newly introduced Gather-Sum-Apply iteration method. Show how GSA uses delta iterations internally and explain the differences of this model as compared to vertex-centric.
[jira] [Created] (FLINK-2123) Fix CLI client logging
Ufuk Celebi created FLINK-2123: -- Summary: Fix CLI client logging Key: FLINK-2123 URL: https://issues.apache.org/jira/browse/FLINK-2123 Project: Flink Issue Type: Bug Components: Core Affects Versions: master Reporter: Ufuk Celebi The CLI client complains about missing log4j configuration and prints too much information.
[jira] [Commented] (FLINK-2076) Bug in re-openable hash join
[ https://issues.apache.org/jira/browse/FLINK-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567143#comment-14567143 ] ASF GitHub Bot commented on FLINK-2076: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/751#issuecomment-107392568 Awesome, thanks a lot @chiwanpark Bug in re-openable hash join Key: FLINK-2076 URL: https://issues.apache.org/jira/browse/FLINK-2076 Project: Flink Issue Type: Bug Components: Local Runtime Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Chiwan Park It happens deterministically on my machine with the following setup:
TaskManager:
- heap size: 512m
- network buffers: 4096
- slots: 32
Job:
- ConnectedComponents
- 100k vertices
- 1.2m edges
This gives around 260 MB of Flink managed memory; across 32 slots that is about 8 MB per slot, which, with several memory consumers in the job, makes the iterative hash join go out-of-core.
{code}
java.lang.RuntimeException: Hash Join bug in memory management: Memory buffers leaked.
	at org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:733)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508)
	at org.apache.flink.runtime.operators.hash.ReOpenableMutableHashTable.prepareNextPartition(ReOpenableMutableHashTable.java:167)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:541)
	at org.apache.flink.runtime.operators.hash.NonReusingBuildSecondHashMatchIterator.callWithNextKey(NonReusingBuildSecondHashMatchIterator.java:102)
	at org.apache.flink.runtime.operators.AbstractCachedBuildSideMatchDriver.run(AbstractCachedBuildSideMatchDriver.java:155)
	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
	at org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
	at org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:560)
	at java.lang.Thread.run(Thread.java:745)
{code}
[jira] [Commented] (FLINK-2004) Memory leak in presence of failed checkpoints in KafkaSource
[ https://issues.apache.org/jira/browse/FLINK-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567144#comment-14567144 ] ASF GitHub Bot commented on FLINK-2004: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/674#issuecomment-107392647 Looks good! Will merge this.. Memory leak in presence of failed checkpoints in KafkaSource Key: FLINK-2004 URL: https://issues.apache.org/jira/browse/FLINK-2004 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Robert Metzger Priority: Critical Fix For: 0.9 Checkpoints that fail never send a commit message to the tasks. Maintaining a map of all pending checkpoints introduces a memory leak, as entries for failed checkpoints will never be removed. Approaches to fix this:
- The source cleans up entries from older checkpoints once a checkpoint is committed (simple implementation in a linked hash map)
- The commit message could include the optional state handle (source needs not maintain the map)
- The checkpoint coordinator could send messages for failed checkpoints?
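The first proposed fix can be sketched with a plain LinkedHashMap: since checkpoint ids increase monotonically, a commit for checkpoint n can also release every pending entry with id at most n, including entries left behind by failed checkpoints. This is an illustrative sketch, not the actual KafkaSource code:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the "clean up on commit" approach: pending checkpoint state is
// kept in an insertion-ordered map; because checkpoint ids only grow,
// insertion order equals id order, and a commit can drop everything at or
// below the committed id in one forward scan.
public class PendingCheckpoints<S> {
    private final LinkedHashMap<Long, S> pending = new LinkedHashMap<>();

    public void onSnapshot(long checkpointId, S state) {
        pending.put(checkpointId, state);
    }

    /** On a commit message, release the committed state and everything older. */
    public S onCommit(long checkpointId) {
        S committed = pending.get(checkpointId);
        Iterator<Map.Entry<Long, S>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey() <= checkpointId) {
                it.remove(); // entries of older, failed checkpoints no longer leak
            } else {
                break;       // later checkpoints are still pending
            }
        }
        return committed;
    }

    public int size() {
        return pending.size();
    }
}
```

If checkpoint 1 fails silently and checkpoint 2 later commits, the commit for 2 also removes the stale entry for 1, which is exactly the leak described in the ticket.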
[GitHub] flink pull request: [FLINK-2076] [runtime] Fix memory leakage in M...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/751#issuecomment-107392568 Awesome, thanks a lot @chiwanpark
[GitHub] flink pull request: [FLINK-2004] Fix memory leak in presence of fa...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/674#issuecomment-107392647 Looks good! Will merge this..
[GitHub] flink pull request: [FLINK-2037] Provide flink-python.jar in lib/
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/691#issuecomment-107391906 I'm going to merge this ...
[jira] [Created] (FLINK-2122) Make all internal streaming operators Checkpointable
Aljoscha Krettek created FLINK-2122: --- Summary: Make all internal streaming operators Checkpointable Key: FLINK-2122 URL: https://issues.apache.org/jira/browse/FLINK-2122 Project: Flink Issue Type: Improvement Components: Streaming Reporter: Aljoscha Krettek Right now, only user state is checkpointed and restored. This should be extended to flink-internal operators such as reducers and windows.
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567126#comment-14567126 ] ASF GitHub Bot commented on FLINK-2098: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-107388477 As per the discussion on the mailing list I'm rewriting the Source interface to only have the run()/cancel() variant. Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[jira] [Created] (FLINK-2124) FromElementsFunction is not really Serializable
Aljoscha Krettek created FLINK-2124: --- Summary: FromElementsFunction is not really Serializable Key: FLINK-2124 URL: https://issues.apache.org/jira/browse/FLINK-2124 Project: Flink Issue Type: Bug Components: Streaming Reporter: Aljoscha Krettek The function stores an Iterable of T. T is not necessarily Serializable, and an Iterable is also not necessarily Serializable.
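A minimal illustration of the problem and of one possible fix (eagerly copying the elements into a serializable collection). This sketch is hypothetical and does not claim to be Flink's actual patch; isJavaSerializable is a helper invented for the example:

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

// Sketch of one way to make a from-elements function serializable: instead
// of holding an arbitrary Iterable<T> (which generally is not Serializable),
// copy the elements into an ArrayList up front and require the element type
// itself to be Serializable.
public class FromElementsSketch<T extends Serializable> implements Serializable {
    private final ArrayList<T> elements = new ArrayList<>();

    @SafeVarargs
    public FromElementsSketch(T... data) {
        for (T t : data) {
            elements.add(t);
        }
    }

    /** Helper for the example: true iff o survives Java serialization. */
    public static boolean isJavaSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (Exception e) {
            return false; // e.g. NotSerializableException for a plain Iterable
        }
    }
}
```

An alternative, which avoids the Serializable bound on T entirely, would be to serialize the elements to a byte[] with the type's serializer at construction time.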
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-107388477 As per the discussion on the mailing list I'm rewriting the Source interface to only have the run()/cancel() variant.
[jira] [Resolved] (FLINK-1958) Improve error handling of receiver task failures
[ https://issues.apache.org/jira/browse/FLINK-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1958. --- Resolution: Fixed Fixed in dce1be1. Improve error handling of receiver task failures Key: FLINK-1958 URL: https://issues.apache.org/jira/browse/FLINK-1958 Project: Flink Issue Type: Improvement Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Currently, receiver task failures silently fail. We need the following behaviour:
* Sender keeps going. May be back-pressured when no receiver pulls the data any more.
* Sender may be cancelled by the JobManager
* Partition stays sane
* Netty channel needs to be closed
* Transfer needs to be cancelled by a cancel message (receiver to sender)
[jira] [Updated] (FLINK-1958) Improve error handling of receiver task failures
[ https://issues.apache.org/jira/browse/FLINK-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1958: --- Fix Version/s: 0.9 Improve error handling of receiver task failures Key: FLINK-1958 URL: https://issues.apache.org/jira/browse/FLINK-1958 Project: Flink Issue Type: Improvement Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Fix For: 0.9
[jira] [Resolved] (FLINK-1955) Improve error handling of sender task failures
[ https://issues.apache.org/jira/browse/FLINK-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1955. --- Resolution: Fixed Fixed in f75c16b. Improve error handling of sender task failures -- Key: FLINK-1955 URL: https://issues.apache.org/jira/browse/FLINK-1955 Project: Flink Issue Type: Improvement Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107393190 Let us merge this for 0.9 and have it deactivated by default. Let's gradually activate it in the next releases as it gets exposure
[jira] [Commented] (FLINK-2037) Unable to start Python API using ./bin/pyflink*.sh
[ https://issues.apache.org/jira/browse/FLINK-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567139#comment-14567139 ] ASF GitHub Bot commented on FLINK-2037: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/691#issuecomment-107391906 I'm going to merge this ... Unable to start Python API using ./bin/pyflink*.sh -- Key: FLINK-2037 URL: https://issues.apache.org/jira/browse/FLINK-2037 Project: Flink Issue Type: Bug Components: Python API Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger Priority: Critical Calling {{./bin/pyflink3.sh}} will lead to
{code}
./bin/pyflink3.sh
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
JAR file does not exist: /home/robert/incubator-flink/build-target/bin/../lib/flink-python-0.9-SNAPSHOT.jar
Use the help option (-h or --help) to get help on the command.
{code}
This is due to the script expecting a {{flink-python-0.9-SNAPSHOT.jar}} file to exist in {{lib}} (it's wrong anyway that the version name is included here; that should be replaced by a {{*}}). I'll look into the issue ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1993) Replace MultipleLinearRegression's custom SGD with optimization framework's SGD
[ https://issues.apache.org/jira/browse/FLINK-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567399#comment-14567399 ] ASF GitHub Bot commented on FLINK-1993: --- Github user thvasilo closed the pull request at: https://github.com/apache/flink/pull/725 Replace MultipleLinearRegression's custom SGD with optimization framework's SGD --- Key: FLINK-1993 URL: https://issues.apache.org/jira/browse/FLINK-1993 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Theodore Vasiloudis Priority: Minor Labels: ML Fix For: 0.9 The current implementation of MultipleLinearRegression uses a custom SGD implementation. Flink's optimization framework also contains a SGD optimizer which should replace the custom implementation once the framework is merged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2101) Scheme Inference doesn't work for Tuple5
[ https://issues.apache.org/jira/browse/FLINK-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567536#comment-14567536 ] Ufuk Celebi commented on FLINK-2101: I think there is no problem, but Robert wanted to improve the Exception message. Robert, can you do this and then close the issue? Scheme Inference doesn't work for Tuple5 Key: FLINK-2101 URL: https://issues.apache.org/jira/browse/FLINK-2101 Project: Flink Issue Type: Bug Components: Kafka Connector Affects Versions: master Reporter: Rico Bergmann Assignee: Robert Metzger Calling
{code}
addSink(new KafkaSink<Tuple5<String, String, String, Long, Double>>(
    "localhost:9092", "webtrends.ec1601",
    new Utils.TypeInformationSerializationSchema<Tuple5<String, String, String, Long, Double>>(
        new Tuple5<String, String, String, Long, Double>(), env.getConfig())));
{code}
gives me an Exception stating that the generic type infos are not given.
{code}
Exception in thread "main" org.apache.flink.api.common.functions.InvalidTypesException: Tuple needs to be parameterized by using generics.
	at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:396)
	at org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:339)
	at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfo(TypeExtractor.java:318)
	at org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.<init>(ObjectArrayTypeInfo.java:45)
	at org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.getInfoFor(ObjectArrayTypeInfo.java:167)
	at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1150)
	at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1122)
	at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForObject(TypeExtractor.java:1476)
	at org.apache.flink.api.java.typeutils.TypeExtractor.getForObject(TypeExtractor.java:1446)
	at org.apache.flink.streaming.connectors.kafka.Utils$TypeInformationSerializationSchema.<init>(Utils.java:37)
	at de.otto.streamexample.WCExample.main(WCExample.java:132)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
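The failure above comes from Java's type erasure: a plain `new Tuple5<...>()` carries no actual type arguments at runtime, so reflective extraction can only see the unresolved type variables. The sketch below illustrates the mechanism outside of Flink, using a stand-in generic class (not Flink's actual `Tuple5`); the anonymous-subclass trick at the end is one common way such frameworks recover the arguments.

```java
import java.util.Arrays;

public class ErasureDemo {
    // Stand-in generic class (not Flink's Tuple5) to show the mechanism.
    static class Tuple5<A, B, C, D, E> {}

    public static void main(String[] args) {
        // A plain instance carries no actual type arguments at runtime:
        Tuple5<String, String, String, Long, Double> t = new Tuple5<>();
        // Prints the formal variables [A, B, C, D, E], not the actual types.
        System.out.println(Arrays.toString(t.getClass().getTypeParameters()));

        // An anonymous subclass pins the arguments in its generic superclass,
        // where reflection can recover them:
        Tuple5<String, String, String, Long, Double> u =
                new Tuple5<String, String, String, Long, Double>() {};
        System.out.println(
                u.getClass().getGenericSuperclass().toString().contains("java.lang.Long"));
    }
}
```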
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567445#comment-14567445 ] ASF GitHub Bot commented on FLINK-2098: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-107589378 I reworked the sources now. Could someone please have another pass over this. I think this is very critical code. Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2121] Fix the summation in FileInputFor...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/752
[jira] [Commented] (FLINK-2101) Scheme Inference doesn't work for Tuple5
[ https://issues.apache.org/jira/browse/FLINK-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567489#comment-14567489 ] Stephan Ewen commented on FLINK-2101: - So, is there any problem remaining, or can this issue be closed? Scheme Inference doesn't work for Tuple5 Key: FLINK-2101 URL: https://issues.apache.org/jira/browse/FLINK-2101 Project: Flink Issue Type: Bug Components: Kafka Connector Affects Versions: master Reporter: Rico Bergmann Assignee: Robert Metzger Calling
{code}
addSink(new KafkaSink<Tuple5<String, String, String, Long, Double>>(
    "localhost:9092", "webtrends.ec1601",
    new Utils.TypeInformationSerializationSchema<Tuple5<String, String, String, Long, Double>>(
        new Tuple5<String, String, String, Long, Double>(), env.getConfig())));
{code}
gives me an Exception stating that the generic type infos are not given.
{code}
Exception in thread "main" org.apache.flink.api.common.functions.InvalidTypesException: Tuple needs to be parameterized by using generics.
	at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:396)
	at org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:339)
	at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfo(TypeExtractor.java:318)
	at org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.<init>(ObjectArrayTypeInfo.java:45)
	at org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.getInfoFor(ObjectArrayTypeInfo.java:167)
	at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1150)
	at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1122)
	at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForObject(TypeExtractor.java:1476)
	at org.apache.flink.api.java.typeutils.TypeExtractor.getForObject(TypeExtractor.java:1446)
	at org.apache.flink.streaming.connectors.kafka.Utils$TypeInformationSerializationSchema.<init>(Utils.java:37)
	at de.otto.streamexample.WCExample.main(WCExample.java:132)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-107613541 I'll make a pass
[jira] [Commented] (FLINK-2127) The GSA Documentation has trailing </p>s
[ https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567396#comment-14567396 ] Ufuk Celebi commented on FLINK-2127: Building it locally works fine. I guess it's a problem with the CI server, which builds the docs. I'm not familiar with the setup there, but it probably has a different version of Jekyll/kramdown installed. Can you build it locally with ./docs/build_docs.sh -p and check http://localhost:4000/libs/gelly_guide.html#vertex-centric-iterations? The GSA Documentation has trailing </p>s - Key: FLINK-2127 URL: https://issues.apache.org/jira/browse/FLINK-2127 Project: Flink Issue Type: Bug Components: Documentation, Gelly Affects Versions: 0.9 Reporter: Andra Lungu Priority: Minor Within the GSA Section of the documentation, there are trailing {{<p class="text-center"> image </p>}} tags. It would be nice to remove them :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1430) Add test for streaming scala api completeness
[ https://issues.apache.org/jira/browse/FLINK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567410#comment-14567410 ] ASF GitHub Bot commented on FLINK-1430: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/753#issuecomment-107569766 +1 looks good to merge :smile: Add test for streaming scala api completeness - Key: FLINK-1430 URL: https://issues.apache.org/jira/browse/FLINK-1430 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Márton Balassi Currently the completeness of the streaming scala api is not tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-107589378 I reworked the sources now. Could someone please have another pass over this. I think this is very critical code.
[jira] [Closed] (FLINK-2121) FileInputFormat.addFilesInDir miscalculates total size
[ https://issues.apache.org/jira/browse/FLINK-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2121. --- FileInputFormat.addFilesInDir miscalculates total size -- Key: FLINK-2121 URL: https://issues.apache.org/jira/browse/FLINK-2121 Project: Flink Issue Type: Bug Components: Core Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Fix For: 0.9 In FileInputFormat.addFilesInDir, the length variable should start from 0, because the return value is always used by adding it to the length (instead of just assigning). So with the current version, the length before the call will be seen twice in the result. mvn verify caught this for me now. The reason why this hasn't been seen yet, is because testGetStatisticsMultipleNestedFiles catches this only if it gets the listings of the outer directory in a certain order. Concretely, if the inner directory is seen before the other file in the outer directory, then length is 0 at that point, so the bug doesn't show. But if the other file is seen first, then its size is added twice to the total result. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2121) FileInputFormat.addFilesInDir miscalculates total size
[ https://issues.apache.org/jira/browse/FLINK-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2121. - Resolution: Fixed Fix Version/s: 0.9 Fixed via 033409190235f93ed6d4e652214e7f35a34c3fe3 Thank you for the patch! FileInputFormat.addFilesInDir miscalculates total size -- Key: FLINK-2121 URL: https://issues.apache.org/jira/browse/FLINK-2121 Project: Flink Issue Type: Bug Components: Core Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Fix For: 0.9 In FileInputFormat.addFilesInDir, the length variable should start from 0, because the return value is always used by adding it to the length (instead of just assigning). So with the current version, the length before the call will be seen twice in the result. mvn verify caught this for me now. The reason why this hasn't been seen yet, is because testGetStatisticsMultipleNestedFiles catches this only if it gets the listings of the outer directory in a certain order. Concretely, if the inner directory is seen before the other file in the outer directory, then length is 0 at that point, so the bug doesn't show. But if the other file is seen first, then its size is added twice to the total result. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
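The double-counting described in FLINK-2121 can be reduced to a few lines. The sketch below is illustrative only, with simplified signatures and invented names (it is not Flink's actual FileInputFormat code): when a helper seeds its sum with the caller's running total while the caller also adds the return value, every previously seen length is counted twice.

```java
public class LengthBug {
    // Buggy shape described in the ticket: the sum is seeded with the
    // caller's running total, but the caller ADDS the return value again.
    static long addFilesInDirBuggy(long length, long[] fileSizes) {
        for (long size : fileSizes) {
            length += size;
        }
        return length;
    }

    // Fixed shape: start from 0 so that "total += addFilesInDir(...)" is correct.
    static long addFilesInDirFixed(long[] fileSizes) {
        long length = 0;
        for (long size : fileSizes) {
            length += size;
        }
        return length;
    }

    public static void main(String[] args) {
        long total = 100;           // a file in the outer directory, already counted
        long[] nestedDir = {10, 20}; // files found inside a nested directory

        long buggy = total + addFilesInDirBuggy(total, nestedDir);
        long fixed = total + addFilesInDirFixed(nestedDir);

        System.out.println(buggy); // 230: the prior 100 is counted twice
        System.out.println(fixed); // 130
    }
}
```

This also matches the ticket's note about listing order: if the nested directory is visited first, the running total is still 0 and the bug is invisible.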
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567498#comment-14567498 ] ASF GitHub Bot commented on FLINK-2098: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-107613541 I'll make a pass Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [WIP] [FLINK-1993] [ml] - Replace MultipleLine...
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/725#issuecomment-107563713 Closing as it gets superseded by the optimization framework [refactoring](https://github.com/apache/flink/pull/740)
[GitHub] flink pull request: [WIP] [FLINK-1993] [ml] - Replace MultipleLine...
Github user thvasilo closed the pull request at: https://github.com/apache/flink/pull/725
[GitHub] flink pull request: [FLINK-1430] [streaming] Scala API completenes...
GitHub user mbalassi opened a pull request: https://github.com/apache/flink/pull/753 [FLINK-1430] [streaming] Scala API completeness for streaming Added the necessary checks and missing methods for streaming. Created an abstract base class for these completeness tests to have the base functionality in a central place. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mbalassi/flink flink-1430 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/753.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #753 commit 50c818d6d28c33546546812c78db7f47c1a32447 Author: mbalassi mbala...@apache.org Date: 2015-06-01T14:56:15Z [FLINK-1430] [streaming] Scala API completeness for streaming
[jira] [Closed] (FLINK-2004) Memory leak in presence of failed checkpoints in KafkaSource
[ https://issues.apache.org/jira/browse/FLINK-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2004. --- Memory leak in presence of failed checkpoints in KafkaSource Key: FLINK-2004 URL: https://issues.apache.org/jira/browse/FLINK-2004 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Robert Metzger Priority: Critical Fix For: 0.9 Checkpoints that fail never send a commit message to the tasks. Maintaining a map of all pending checkpoints introduces a memory leak, as entries for failed checkpoints will never be removed. Approaches to fix this: - The source cleans up entries from older checkpoints once a checkpoint is committed (simple implementation in a linked hash map) - The commit message could include the optional state handle (source needs not maintain the map) - The checkpoint coordinator could send messages for failed checkpoints? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
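The first fix idea listed in FLINK-2004 (clean up older entries once a checkpoint is committed, using a linked hash map) can be sketched as follows. Names and the commit protocol are illustrative assumptions, not Flink's actual KafkaSource code: because a LinkedHashMap preserves insertion order and checkpoints are created in order, a commit of checkpoint N can evict every entry with id <= N, so failed (never-committed) checkpoints cannot accumulate.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class PendingCheckpoints {
    // Insertion order == checkpoint order, which makes eviction a prefix scan.
    private final LinkedHashMap<Long, String> pending = new LinkedHashMap<>();

    void onSnapshot(long checkpointId, String state) {
        pending.put(checkpointId, state);
    }

    void onCommit(long checkpointId) {
        // Drop the committed checkpoint AND everything older than it;
        // older entries can only belong to failed checkpoints.
        Iterator<Map.Entry<Long, String>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getKey() <= checkpointId) {
                it.remove();
            } else {
                break; // first younger entry: nothing further to evict
            }
        }
    }

    int size() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingCheckpoints p = new PendingCheckpoints();
        p.onSnapshot(1, "s1"); // this checkpoint fails: no commit ever arrives
        p.onSnapshot(2, "s2");
        p.onSnapshot(3, "s3");
        p.onCommit(2);         // committing 2 also evicts the failed 1
        System.out.println(p.size()); // 1 (only checkpoint 3 remains pending)
    }
}
```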
[jira] [Commented] (FLINK-2004) Memory leak in presence of failed checkpoints in KafkaSource
[ https://issues.apache.org/jira/browse/FLINK-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567475#comment-14567475 ] ASF GitHub Bot commented on FLINK-2004: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/674 Memory leak in presence of failed checkpoints in KafkaSource Key: FLINK-2004 URL: https://issues.apache.org/jira/browse/FLINK-2004 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Robert Metzger Priority: Critical Fix For: 0.9 Checkpoints that fail never send a commit message to the tasks. Maintaining a map of all pending checkpoints introduces a memory leak, as entries for failed checkpoints will never be removed. Approaches to fix this: - The source cleans up entries from older checkpoints once a checkpoint is committed (simple implementation in a linked hash map) - The commit message could include the optional state handle (source needs not maintain the map) - The checkpoint coordinator could send messages for failed checkpoints? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2004] Fix memory leak in presence of fa...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/674
[jira] [Commented] (FLINK-2127) The GSA Documentation has trailing </p>s
[ https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567378#comment-14567378 ] Andra Lungu commented on FLINK-2127: Hi Ufuk, Nope, that's not what I meant :) If you look at http://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#vertex-centric-iterations , the image for the second superstep speaks for itself! The GSA Documentation has trailing </p>s - Key: FLINK-2127 URL: https://issues.apache.org/jira/browse/FLINK-2127 Project: Flink Issue Type: Bug Components: Documentation, Gelly Affects Versions: 0.9 Reporter: Andra Lungu Priority: Minor Within the GSA Section of the documentation, there are trailing {{<p class="text-center"> image </p>}} tags. It would be nice to remove them :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567304#comment-14567304 ] ASF GitHub Bot commented on FLINK-1319: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107482998 The concern about logging is that, when using the local mode inside the IDE, the system logs a lot and the hints get lost. If you don't want sysoutput, you could always deactivate the analysis. Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0-3, 1, 2-1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder.
This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-107482998 The concern about logging is that, when using the local mode inside the IDE, the system logs a lot and the hints get lost. If you don't want sysoutput, you could always deactivate the analysis.
[jira] [Resolved] (FLINK-2114) PunctuationPolicy.toString() throws NullPointerException if extractor is null
[ https://issues.apache.org/jira/browse/FLINK-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Gevay resolved FLINK-2114. Resolution: Fixed PunctuationPolicy.toString() throws NullPointerException if extractor is null - Key: FLINK-2114 URL: https://issues.apache.org/jira/browse/FLINK-2114 Project: Flink Issue Type: Bug Components: Streaming Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor Parentheses are missing in PunctuationPolicy.toString() around the conditional operator that checks for null, which makes the condition always true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
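The precedence mistake described in FLINK-2114 is a classic Java pitfall: string concatenation (`+`) binds tighter than both `!=` and the ternary operator. The sketch below is an illustrative reconstruction with invented names (not the actual PunctuationPolicy code): the concatenated string is never null, so the null check always passes and the null extractor is dereferenced.

```java
public class TernaryPrecedence {
    static String buggy(Object extractor) {
        // Parses as ("Policy(" + extractor) != null ? ... : ...
        // The concatenated STRING is never null, so the branch calling
        // extractor.toString() is always taken -> NPE when extractor is null.
        return "Policy(" + extractor != null ? extractor.toString() : "none";
    }

    static String fixed(Object extractor) {
        // Parentheses restore the intended grouping around the ternary.
        return "Policy(" + (extractor != null ? extractor.toString() : "none");
    }

    public static void main(String[] args) {
        System.out.println(fixed(null)); // Policy(none
        try {
            buggy(null);
        } catch (NullPointerException e) {
            System.out.println("NPE, as described in the ticket");
        }
    }
}
```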
[GitHub] flink pull request: [FLINK-2076] [runtime] Fix memory leakage in M...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/751#issuecomment-107478713 I fixed the bug related to the non-null memory segment. The bug was caused by the `writeBehindBuffersAvailable` variable not being cleared in the `close()` method of the `MutableHashTable` class. I added a test case for this bug. I still have a problem creating a test case for the bug related to the null memory segment. I think it is also related to parallelism: if I run the ConnectedComponents example with parallelism 1, the bug does not occur, so I cannot reproduce the state without the ConnectedComponents example.
[jira] [Commented] (FLINK-2111) Add terminate signal to cleanly stop streaming jobs
[ https://issues.apache.org/jira/browse/FLINK-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567386#comment-14567386 ] ASF GitHub Bot commented on FLINK-2111: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-107549742 Good idea! Given the fact that we are stabilizing at the moment (and this is introducing new functionality), I vote to postpone that to after the 0.9 release. That would be two weeks or so, if things go well. Until then, here are a few thoughts: - "Terminate" sounds like a very non-graceful killing. What this does is the opposite, so how about calling it a "stop" signal? - We will need a way to stop streaming jobs and perform a checkpoint during the stopping. Would be good to bear this in mind. Can we add this as part of this pull request? Add terminate signal to cleanly stop streaming jobs - Key: FLINK-2111 URL: https://issues.apache.org/jira/browse/FLINK-2111 Project: Flink Issue Type: Improvement Components: Distributed Runtime, JobManager, Local Runtime, Streaming, TaskManager, Webfrontend Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor Currently, streaming jobs can only be stopped using the cancel command, which is a hard stop with no clean shutdown. The newly introduced terminate signal will only affect streaming source tasks, such that the sources can stop emitting data and terminate cleanly, resulting in a clean termination of the whole streaming job. This feature is a prerequisite for https://issues.apache.org/jira/browse/FLINK-1929 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2111] Add terminate signal to cleanly...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-107549742 Good idea! Given the fact that we are stabilizing at the moment (and this is introducing new functionality), I vote to postpone that to after the 0.9 release. That would be two weeks or so, if things go well. Until then, here are a few thoughts: - "Terminate" sounds like a very non-graceful killing. What this does is the opposite, so how about calling it a "stop" signal? - We will need a way to stop streaming jobs and perform a checkpoint during the stopping. Would be good to bear this in mind. Can we add this as part of this pull request?
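The graceful-stop semantics discussed in FLINK-2111 boil down to a source whose run loop observes a volatile flag: a stop signal lets the loop finish its current work and fall out normally, in contrast to cancel, which interrupts the task. This is a hedged, self-contained sketch with invented names (not Flink's actual source interfaces).

```java
public class StoppableSource {
    private volatile boolean running = true;
    int emitted = 0;

    // The run loop checks the flag on every iteration, so after stop()
    // it finishes the in-flight element and terminates cleanly.
    void run() {
        while (running && emitted < 1_000_000) {
            emitted++;       // stands in for emitting one record downstream
            if (emitted == 5) {
                stop();      // simulate a stop signal arriving from outside
            }
        }
        // Falling out of the loop is the clean shutdown path; a real source
        // would flush/close resources here (and could take a final checkpoint).
    }

    void stop() {
        running = false;
    }

    public static void main(String[] args) {
        StoppableSource s = new StoppableSource();
        s.run();
        System.out.println(s.emitted); // 5: stopped right after the in-flight record
    }
}
```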
[jira] [Created] (FLINK-2127) The GSA Documentation has trailing </p>s
Andra Lungu created FLINK-2127: -- Summary: The GSA Documentation has trailing </p>s Key: FLINK-2127 URL: https://issues.apache.org/jira/browse/FLINK-2127 Project: Flink Issue Type: Bug Components: Gelly Affects Versions: 0.9 Reporter: Andra Lungu Priority: Minor Within the GSA Section of the documentation, there are trailing {{<p class="text-center"> image </p>}} tags. It would be nice to remove them :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2127) The GSA Documentation has trailing </p>s
[ https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-2127: --- Component/s: Documentation The GSA Documentation has trailing </p>s - Key: FLINK-2127 URL: https://issues.apache.org/jira/browse/FLINK-2127 Project: Flink Issue Type: Bug Components: Documentation, Gelly Affects Versions: 0.9 Reporter: Andra Lungu Priority: Minor Within the GSA Section of the documentation, there are trailing {{<p class="text-center"> image </p>}} tags. It would be nice to remove them :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2076) Bug in re-openable hash join
[ https://issues.apache.org/jira/browse/FLINK-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14567296#comment-14567296 ] ASF GitHub Bot commented on FLINK-2076: --- Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/751#issuecomment-107478713 I fixed the bug related to non-null memory segment. The bug was caused by non-clearing `writeBehindBuffersAvailable` variable in `close()` method of `MutableHashTable` class. I added a test case for this bug. I have a problem for creating a test case to test the bug related to null memory segment. I think that it is related to parallelism also. If I run ConnectedComponents example with parallelism 1, the bug not occurs. So I cannot reproduce the state without ConnectedComponents example. Bug in re-openable hash join Key: FLINK-2076 URL: https://issues.apache.org/jira/browse/FLINK-2076 Project: Flink Issue Type: Bug Components: Local Runtime Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Chiwan Park It happens deterministically in my machine with the following setup: TaskManager: - heap size: 512m - network buffers: 4096 - slots: 32 Job: - ConnectedComponents - 100k vertices - 1.2m edges -- this gives around 260 m Flink managed memory, across 32 slots is 8MB per slot, with several mem consumers in the job, makes the iterative hash join out-of-core {code} java.lang.RuntimeException: Hash Join bug in memory management: Memory buffers leaked. 
at org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:733) at org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508) at org.apache.flink.runtime.operators.hash.ReOpenableMutableHashTable.prepareNextPartition(ReOpenableMutableHashTable.java:167) at org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:541) at org.apache.flink.runtime.operators.hash.NonReusingBuildSecondHashMatchIterator.callWithNextKey(NonReusingBuildSecondHashMatchIterator.java:102) at org.apache.flink.runtime.operators.AbstractCachedBuildSideMatchDriver.run(AbstractCachedBuildSideMatchDriver.java:155) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139) at org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:560) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
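The stale-state pattern behind the fix described above can be illustrated with a minimal, hypothetical sketch. This is not Flink's actual `MutableHashTable`; the class and method names below are invented for illustration only.

```java
// Hypothetical minimal model of the FLINK-2076 fix, NOT Flink's real class:
// a reusable table tracks how many write-behind buffers are available. If
// close() forgets to clear the counter, a re-opened table inherits stale
// state and the final memory accounting no longer adds up.
public class Main {

    static class ReusableTable {
        private int writeBehindBuffersAvailable = 0;

        void open(int buffers) {
            writeBehindBuffersAvailable += buffers;
        }

        void consumeBuffer() {
            writeBehindBuffersAvailable--;
        }

        // The fix: clear the counter on close() so the table is safely reusable.
        void close() {
            writeBehindBuffersAvailable = 0;
        }

        int available() {
            return writeBehindBuffersAvailable;
        }
    }

    public static void main(String[] args) {
        ReusableTable table = new ReusableTable();
        table.open(4);
        table.consumeBuffer();
        table.close();   // without the reset, the leftover count would leak into the next run
        table.open(4);
        System.out.println(table.available()); // 4
    }
}
```

A re-opened table now starts from a clean slate regardless of how many buffers the previous run consumed.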
[jira] [Commented] (FLINK-2120) Rename AbstractJobVertex to JobVertex
[ https://issues.apache.org/jira/browse/FLINK-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567329#comment-14567329 ] Stephan Ewen commented on FLINK-2120: - +1 Rename AbstractJobVertex to JobVertex - Key: FLINK-2120 URL: https://issues.apache.org/jira/browse/FLINK-2120 Project: Flink Issue Type: Wish Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Priority: Trivial I would like to rename AbstractJobVertex to JobVertex. It is not abstract and we have a lot of references to it in tests, where we create instances. This is trivial, but I think it is a bad name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2121) FileInputFormat.addFilesInDir miscalculates total size
[ https://issues.apache.org/jira/browse/FLINK-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567328#comment-14567328 ] ASF GitHub Bot commented on FLINK-2121: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/752#issuecomment-107498384 +1, should go into the release candidate. Will merge this now... FileInputFormat.addFilesInDir miscalculates total size -- Key: FLINK-2121 URL: https://issues.apache.org/jira/browse/FLINK-2121 Project: Flink Issue Type: Bug Components: Core Reporter: Gabor Gevay Assignee: Gabor Gevay Priority: Minor In FileInputFormat.addFilesInDir, the length variable should start from 0, because the return value is always added to the caller's length (instead of just being assigned). With the current version, the length accumulated before the call is counted twice in the result. mvn verify caught this for me just now. The reason this hasn't been seen yet is that testGetStatisticsMultipleNestedFiles only catches it if the listing of the outer directory comes back in a certain order. Concretely, if the inner directory is seen before the other file in the outer directory, then length is 0 at that point, so the bug doesn't show. But if the other file is seen first, its size is added twice to the total result. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
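The double-counting described in the ticket can be sketched with a hypothetical, simplified version of the accumulation pattern (the method names mirror the report, but the bodies are illustrative, not the real `FileInputFormat` code):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the FLINK-2121 pattern, simplified from the real
// FileInputFormat: the caller adds the helper's return value to its running
// total, so the helper must start its own sum at 0.
public class Main {

    // Buggy variant: seeding the sum with the caller's running length means
    // everything accumulated so far is counted twice after "total += ...".
    static long addFilesInDirBuggy(List<Long> fileSizes, long runningLength) {
        long length = runningLength;
        for (long size : fileSizes) {
            length += size;
        }
        return length;
    }

    // Fixed variant: start from 0; the caller does the accumulation.
    static long addFilesInDirFixed(List<Long> fileSizes) {
        long length = 0;
        for (long size : fileSizes) {
            length += size;
        }
        return length;
    }

    public static void main(String[] args) {
        List<Long> nestedDir = Arrays.asList(10L, 20L);
        long fileSeenFirst = 5;  // a plain file listed before the nested directory

        long buggy = fileSeenFirst + addFilesInDirBuggy(nestedDir, fileSeenFirst);
        long fixed = fileSeenFirst + addFilesInDirFixed(nestedDir);

        System.out.println(buggy + " vs " + fixed); // 40 vs 35
    }
}
```

This also shows why the test was order-dependent: if the directory is listed first, the running length seeded into the buggy helper is still 0 and the two variants coincide.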
[GitHub] flink pull request: [FLINK-2121] Fix the summation in FileInputFor...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/752#issuecomment-107498384 +1, should go into the release candidate. Will merge this now... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Implemented TwitterSourceFilter and adapted Tw...
Github user HilmiYildirim commented on the pull request: https://github.com/apache/flink/pull/695#issuecomment-107551663 I think it is good.
[jira] [Commented] (FLINK-2127) The GSA Documentation has trailing </p>s
[ https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567661#comment-14567661 ] Andra Lungu commented on FLINK-2127: Yep, building it locally works :). Nevertheless, users see the online documentation, so I would still like to keep this open as a bug. The problem seems to be the CI server, yes. Can someone help me with more information? I'm sure we can find a way to fix this. The GSA Documentation has trailing </p>s - Key: FLINK-2127 URL: https://issues.apache.org/jira/browse/FLINK-2127 Project: Flink Issue Type: Bug Components: Documentation, Gelly Affects Versions: 0.9 Reporter: Andra Lungu Priority: Minor Within the GSA section of the documentation, there are trailing `<p class="text-center"> image </p>` tags. It would be nice to remove them :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1727) Add decision tree to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567676#comment-14567676 ] Sachin Goel commented on FLINK-1727: A complete implementation, with both Gini gain and entropy gain, and with both categorical and continuous attributes, is available here: https://github.com/apache/flink/pull/710 Two example data sets are also included in the test suite. Add decision tree to machine learning library - Key: FLINK-1727 URL: https://issues.apache.org/jira/browse/FLINK-1727 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Sachin Goel Labels: ML Decision trees are widely used for classification and regression tasks. Thus, it would be worthwhile to add support for them to Flink's machine learning library. A streaming parallel decision tree learning algorithm has been proposed by Ben-Haim and Tom-Tov [1]. This can maybe be adapted to a batch use case as well. [2] contains an overview of different techniques for scaling inductive learning algorithms up. A presentation of Spark's MLlib decision tree implementation can be found in [3]. Resources: [1] [http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf] [2] [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.46.8226&rep=rep1&type=pdf] [3] [http://spark-summit.org/wp-content/uploads/2014/07/Scalable-Distributed-Decision-Trees-in-Spark-Made-Das-Sparks-Talwalkar.pdf] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user mbalassi commented on a diff in the pull request: https://github.com/apache/flink/pull/742#discussion_r31461903 --- Diff: flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-twitter/src/main/java/org/apache/flink/streaming/connectors/twitter/TwitterSource.java --- @@ -1,322 +1,322 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package org.apache.flink.streaming.connectors.twitter; - -import java.io.FileInputStream; -import java.io.InputStream; -import java.util.Properties; -import java.util.concurrent.BlockingQueue; -import java.util.concurrent.LinkedBlockingQueue; -import java.util.concurrent.TimeUnit; - -import org.apache.flink.configuration.Configuration; -import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.util.Collector; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import com.twitter.hbc.ClientBuilder; -import com.twitter.hbc.core.Constants; -import com.twitter.hbc.core.endpoint.StatusesSampleEndpoint; -import com.twitter.hbc.core.processor.StringDelimitedProcessor; -import com.twitter.hbc.httpclient.BasicClient; -import com.twitter.hbc.httpclient.auth.Authentication; -import com.twitter.hbc.httpclient.auth.OAuth1; - -/** - * Implementation of {@link SourceFunction} specialized to emit tweets from - * Twitter. It can connect to Twitter Streaming API, collect tweets and - */ -public class TwitterSource extends RichParallelSourceFunction<String> { - - private static final Logger LOG = LoggerFactory.getLogger(TwitterSource.class); - - private static final long serialVersionUID = 1L; - private String authPath; - private transient BlockingQueue<String> queue; - private int queueSize = 1; - private transient BasicClient client; - private int waitSec = 5; - - private int maxNumberOfTweets; - private int currentNumberOfTweets; - - private String nextElement = null; - - private volatile boolean isRunning = false; - - /** - * Create {@link TwitterSource} for streaming - * - * @param authPath - * Location of the properties file containing the required - * authentication information. 
- */ - public TwitterSource(String authPath) { - this.authPath = authPath; - maxNumberOfTweets = -1; - } - - /** - * Create {@link TwitterSource} to collect finite number of tweets - * - * @param authPath - * Location of the properties file containing the required - * authentication information. - * @param numberOfTweets - * - */ - public TwitterSource(String authPath, int numberOfTweets) { - this.authPath = authPath; - this.maxNumberOfTweets = numberOfTweets; - } - - @Override - public void open(Configuration parameters) throws Exception { - initializeConnection(); - currentNumberOfTweets = 0; - } - - /** - * Initialize Hosebird Client to be able to consume Twitter's Streaming API - */ - private void initializeConnection() { - - if (LOG.isInfoEnabled()) { - LOG.info("Initializing Twitter Streaming API connection"); - } - - queue = new LinkedBlockingQueue<String>(queueSize); - - StatusesSampleEndpoint endpoint = new StatusesSampleEndpoint(); - endpoint.stallWarnings(false); - - Authentication auth = authenticate(); - - initializeClient(endpoint, auth); - - if (LOG.isInfoEnabled()) { - LOG.info("Twitter Streaming API connection established successfully"); - } - } - - private OAuth1 authenticate()
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567853#comment-14567853 ] ASF GitHub Bot commented on FLINK-2098: --- Github user mbalassi commented on a diff in the pull request: https://github.com/apache/flink/pull/742#discussion_r31460859 --- Diff: flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/api/persistent/PersistentKafkaSource.java --- @@ -61,10 +63,14 @@ * * Note that the autocommit feature of Kafka needs to be disabled for using this source. */ -public class PersistentKafkaSource<OUT> extends RichParallelSourceFunction<OUT> implements +public class PersistentKafkaSource<OUT> extends RichSourceFunction<OUT> implements ResultTypeQueryable<OUT>, CheckpointCommitter, + ParallelSourceFunction<OUT>, CheckpointedAsynchronously<long[]> { --- End diff -- Please have a RichParallelSourceFunction instead. Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user mbalassi commented on a diff in the pull request: https://github.com/apache/flink/pull/742#discussion_r31460859 --- Diff: flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-kafka/src/main/java/org/apache/flink/streaming/connectors/kafka/api/persistent/PersistentKafkaSource.java --- @@ -61,10 +63,14 @@ * * Note that the autocommit feature of Kafka needs to be disabled for using this source. */ -public class PersistentKafkaSource<OUT> extends RichParallelSourceFunction<OUT> implements +public class PersistentKafkaSource<OUT> extends RichSourceFunction<OUT> implements ResultTypeQueryable<OUT>, CheckpointCommitter, + ParallelSourceFunction<OUT>, CheckpointedAsynchronously<long[]> { --- End diff -- Please have a RichParallelSourceFunction instead.
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567868#comment-14567868 ] ASF GitHub Bot commented on FLINK-2098: --- Github user mbalassi commented on a diff in the pull request: https://github.com/apache/flink/pull/742#discussion_r31461903 --- Diff: flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-twitter/src/main/java/org/apache/flink/streaming/connectors/twitter/TwitterSource.java --- @@ -1,322 +1,322 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package org.apache.flink.streaming.connectors.twitter; - -import java.io.FileInputStream; -import java.io.InputStream; -import java.util.Properties; -import java.util.concurrent.BlockingQueue; -import java.util.concurrent.LinkedBlockingQueue; -import java.util.concurrent.TimeUnit; - -import org.apache.flink.configuration.Configuration; -import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.util.Collector; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import com.twitter.hbc.ClientBuilder; -import com.twitter.hbc.core.Constants; -import com.twitter.hbc.core.endpoint.StatusesSampleEndpoint; -import com.twitter.hbc.core.processor.StringDelimitedProcessor; -import com.twitter.hbc.httpclient.BasicClient; -import com.twitter.hbc.httpclient.auth.Authentication; -import com.twitter.hbc.httpclient.auth.OAuth1; - -/** - * Implementation of {@link SourceFunction} specialized to emit tweets from - * Twitter. It can connect to Twitter Streaming API, collect tweets and - */ -public class TwitterSource extends RichParallelSourceFunction<String> { - - private static final Logger LOG = LoggerFactory.getLogger(TwitterSource.class); - - private static final long serialVersionUID = 1L; - private String authPath; - private transient BlockingQueue<String> queue; - private int queueSize = 1; - private transient BasicClient client; - private int waitSec = 5; - - private int maxNumberOfTweets; - private int currentNumberOfTweets; - - private String nextElement = null; - - private volatile boolean isRunning = false; - - /** - * Create {@link TwitterSource} for streaming - * - * @param authPath - * Location of the properties file containing the required - * authentication information. 
- */ - public TwitterSource(String authPath) { - this.authPath = authPath; - maxNumberOfTweets = -1; - } - - /** - * Create {@link TwitterSource} to collect finite number of tweets - * - * @param authPath - * Location of the properties file containing the required - * authentication information. - * @param numberOfTweets - * - */ - public TwitterSource(String authPath, int numberOfTweets) { - this.authPath = authPath; - this.maxNumberOfTweets = numberOfTweets; - } - - @Override - public void open(Configuration parameters) throws Exception { - initializeConnection(); - currentNumberOfTweets = 0; - } - - /** - * Initialize Hosebird Client to be able to consume Twitter's Streaming API - */ - private void initializeConnection() { - - if (LOG.isInfoEnabled()) { - LOG.info("Initializing Twitter Streaming API connection"); - } - - queue = new LinkedBlockingQueue<String>(queueSize); - - StatusesSampleEndpoint endpoint = new StatusesSampleEndpoint(); - endpoint.stallWarnings(false); - - Authentication auth = authenticate(); -
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567903#comment-14567903 ] ASF GitHub Bot commented on FLINK-2098: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/742#discussion_r31463699 --- Diff: flink-staging/flink-streaming/flink-streaming-connectors/flink-connector-twitter/src/main/java/org/apache/flink/streaming/connectors/twitter/TwitterSource.java --- @@ -1,322 +1,322 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package org.apache.flink.streaming.connectors.twitter; - -import java.io.FileInputStream; -import java.io.InputStream; -import java.util.Properties; -import java.util.concurrent.BlockingQueue; -import java.util.concurrent.LinkedBlockingQueue; -import java.util.concurrent.TimeUnit; - -import org.apache.flink.configuration.Configuration; -import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction; -import org.apache.flink.streaming.api.functions.source.SourceFunction; -import org.apache.flink.util.Collector; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import com.twitter.hbc.ClientBuilder; -import com.twitter.hbc.core.Constants; -import com.twitter.hbc.core.endpoint.StatusesSampleEndpoint; -import com.twitter.hbc.core.processor.StringDelimitedProcessor; -import com.twitter.hbc.httpclient.BasicClient; -import com.twitter.hbc.httpclient.auth.Authentication; -import com.twitter.hbc.httpclient.auth.OAuth1; - -/** - * Implementation of {@link SourceFunction} specialized to emit tweets from - * Twitter. It can connect to Twitter Streaming API, collect tweets and - */ -public class TwitterSource extends RichParallelSourceFunction<String> { - - private static final Logger LOG = LoggerFactory.getLogger(TwitterSource.class); - - private static final long serialVersionUID = 1L; - private String authPath; - private transient BlockingQueue<String> queue; - private int queueSize = 1; - private transient BasicClient client; - private int waitSec = 5; - - private int maxNumberOfTweets; - private int currentNumberOfTweets; - - private String nextElement = null; - - private volatile boolean isRunning = false; - - /** - * Create {@link TwitterSource} for streaming - * - * @param authPath - * Location of the properties file containing the required - * authentication information. 
- */ - public TwitterSource(String authPath) { - this.authPath = authPath; - maxNumberOfTweets = -1; - } - - /** - * Create {@link TwitterSource} to collect finite number of tweets - * - * @param authPath - * Location of the properties file containing the required - * authentication information. - * @param numberOfTweets - * - */ - public TwitterSource(String authPath, int numberOfTweets) { - this.authPath = authPath; - this.maxNumberOfTweets = numberOfTweets; - } - - @Override - public void open(Configuration parameters) throws Exception { - initializeConnection(); - currentNumberOfTweets = 0; - } - - /** - * Initialize Hosebird Client to be able to consume Twitter's Streaming API - */ - private void initializeConnection() { - - if (LOG.isInfoEnabled()) { - LOG.info("Initializing Twitter Streaming API connection"); - } - - queue = new LinkedBlockingQueue<String>(queueSize); - - StatusesSampleEndpoint endpoint = new StatusesSampleEndpoint(); - endpoint.stallWarnings(false); - - Authentication auth = authenticate(); -
[jira] [Commented] (FLINK-2101) Scheme Inference doesn't work for Tuple5
[ https://issues.apache.org/jira/browse/FLINK-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567813#comment-14567813 ] Robert Metzger commented on FLINK-2101: --- Yes, will do. Scheme Inference doesn't work for Tuple5 Key: FLINK-2101 URL: https://issues.apache.org/jira/browse/FLINK-2101 Project: Flink Issue Type: Bug Components: Kafka Connector Affects Versions: master Reporter: Rico Bergmann Assignee: Robert Metzger Calling addSink(new KafkaSink<Tuple5<String, String, String, Long, Double>>("localhost:9092", "webtrends.ec1601", new Utils.TypeInformationSerializationSchema<Tuple5<String, String, String, Long, Double>>(new Tuple5<String, String, String, Long, Double>(), env.getConfig()))); gives me an Exception stating that the generic type infos are not given. Exception in thread main org.apache.flink.api.common.functions.InvalidTypesException: Tuple needs to be parameterized by using generics. at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfoWithTypeHierarchy(TypeExtractor.java:396) at org.apache.flink.api.java.typeutils.TypeExtractor.privateCreateTypeInfo(TypeExtractor.java:339) at org.apache.flink.api.java.typeutils.TypeExtractor.createTypeInfo(TypeExtractor.java:318) at org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.<init>(ObjectArrayTypeInfo.java:45) at org.apache.flink.api.java.typeutils.ObjectArrayTypeInfo.getInfoFor(ObjectArrayTypeInfo.java:167) at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1150) at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForClass(TypeExtractor.java:1122) at org.apache.flink.api.java.typeutils.TypeExtractor.privateGetForObject(TypeExtractor.java:1476) at org.apache.flink.api.java.typeutils.TypeExtractor.getForObject(TypeExtractor.java:1446) at org.apache.flink.streaming.connectors.kafka.Utils$TypeInformationSerializationSchema.<init>(Utils.java:37) at de.otto.streamexample.WCExample.main(WCExample.java:132) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
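The root cause of such "Tuple needs to be parameterized" errors is Java type erasure: inspecting a plain generic instance at runtime (as `TypeExtractor.getForObject` does) cannot recover its type arguments. The sketch below illustrates this with `java.util.AbstractMap.SimpleEntry` standing in for Flink's `Tuple5`; it is an illustration of the JVM behavior, not Flink code.

```java
import java.lang.reflect.ParameterizedType;
import java.util.AbstractMap.SimpleEntry;

// Illustration (using SimpleEntry, not Flink's Tuple5) of why inspecting a
// plain generic instance cannot recover its type arguments: erasure removes
// them at runtime. An anonymous subclass records the concrete arguments in
// its generic superclass signature, which reflection can read back.
public class Main {

    static String firstTypeArgument(Object anonymousSubclassInstance) {
        ParameterizedType superType =
                (ParameterizedType) anonymousSubclassInstance.getClass().getGenericSuperclass();
        return superType.getActualTypeArguments()[0].getTypeName();
    }

    public static void main(String[] args) {
        // Plain instance: its class is SimpleEntry itself; the String/Long
        // arguments are erased and its generic superclass is plain Object.
        SimpleEntry<String, Long> plain = new SimpleEntry<>("key", 1L);
        System.out.println(plain.getClass().getGenericSuperclass() instanceof ParameterizedType); // false

        // Anonymous subclass ("{}"): the arguments survive in the class file.
        SimpleEntry<String, Long> captured = new SimpleEntry<>("key", 1L) {};
        System.out.println(firstTypeArgument(captured)); // java.lang.String
    }
}
```

This is why type-extraction APIs typically ask the caller to pass explicit type information or an anonymous subclass instead of a bare instance.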
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567848#comment-14567848 ] ASF GitHub Bot commented on FLINK-2098: --- Github user mbalassi commented on a diff in the pull request: https://github.com/apache/flink/pull/742#discussion_r31460503 --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/SourceFunction.java --- @@ -19,66 +19,81 @@ package org.apache.flink.streaming.api.functions.source; import org.apache.flink.api.common.functions.Function; +import org.apache.flink.util.Collector; import java.io.Serializable; /** * Base interface for all stream data sources in Flink. The contract of a stream source - * is similar to an iterator - it is consumed as in the following pseudo code: --- End diff -- Maybe "streaming data sources" instead of "stream data sources", not to be confused with file streams or intermediate results. Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user mbalassi commented on a diff in the pull request: https://github.com/apache/flink/pull/742#discussion_r31460503 --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/SourceFunction.java --- @@ -19,66 +19,81 @@ package org.apache.flink.streaming.api.functions.source; import org.apache.flink.api.common.functions.Function; +import org.apache.flink.util.Collector; import java.io.Serializable; /** * Base interface for all stream data sources in Flink. The contract of a stream source - * is similar to an iterator - it is consumed as in the following pseudo code: --- End diff -- Maybe "streaming data sources" instead of "stream data sources", not to be confused with file streams or intermediate results.
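The javadoc excerpt quoted in the diff compares a stream source to an iterator that is consumed by a loop. A hedged sketch of that contract follows; the `reachedEnd()`/`next()` names are illustrative assumptions, not necessarily the exact Flink API of that version.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hedged sketch of the iterator-like contract the SourceFunction javadoc
// alludes to: a source is consumed by polling it until it reports exhaustion.
public class Main {

    interface SimpleSource<T> {
        boolean reachedEnd();
        T next();
    }

    // Adapt a java.util.Iterator to the sketched source contract.
    static <T> SimpleSource<T> fromIterator(final Iterator<T> it) {
        return new SimpleSource<T>() {
            @Override public boolean reachedEnd() { return !it.hasNext(); }
            @Override public T next() { return it.next(); }
        };
    }

    // The consuming loop from the pseudo code: emit elements until exhausted.
    static String consumeAll(SimpleSource<String> source) {
        StringBuilder emitted = new StringBuilder();
        while (!source.reachedEnd()) {
            emitted.append(source.next());
        }
        return emitted.toString();
    }

    public static void main(String[] args) {
        SimpleSource<String> source = fromIterator(Arrays.asList("a", "b", "c").iterator());
        System.out.println(consumeAll(source)); // abc
    }
}
```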
[jira] [Created] (FLINK-2128) ScalaShellITSuite failing
Ufuk Celebi created FLINK-2128: -- Summary: ScalaShellITSuite failing Key: FLINK-2128 URL: https://issues.apache.org/jira/browse/FLINK-2128 Project: Flink Issue Type: Bug Components: Scala Shell Affects Versions: master Reporter: Ufuk Celebi https://s3.amazonaws.com/archive.travis-ci.org/jobs/64947781/log.txt {code} ScalaShellITSuite: log4j:ERROR setFile(null,true) call failed. java.io.FileNotFoundException: /.log (Permission denied) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.<init>(FileOutputStream.java:221) at java.io.FileOutputStream.<init>(FileOutputStream.java:142) at org.apache.log4j.FileAppender.setFile(FileAppender.java:294) at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165) at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172) at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104) at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842) at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768) at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514) at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580) at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526) at org.apache.log4j.LogManager.<clinit>(LogManager.java:127) at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:66) at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277) at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:288) at org.apache.flink.test.util.TestBaseUtils.<clinit>(TestBaseUtils.java:69) at org.apache.flink.api.scala.ScalaShellITSuite.beforeAll(ScalaShellITSuite.scala:198) at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187) 
at org.apache.flink.api.scala.ScalaShellITSuite.beforeAll(ScalaShellITSuite.scala:33) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253) at org.apache.flink.api.scala.ScalaShellITSuite.run(ScalaShellITSuite.scala:33) at org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492) at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528) at org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1526) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) at org.scalatest.Suite$class.runNestedSuites(Suite.scala:1526) at org.scalatest.tools.DiscoverySuite.runNestedSuites(DiscoverySuite.scala:29) at org.scalatest.Suite$class.run(Suite.scala:1421) at org.scalatest.tools.DiscoverySuite.run(DiscoverySuite.scala:29) at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55) at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563) at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557) at scala.collection.immutable.List.foreach(List.scala:318) at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557) at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044) at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043) at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722) at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043) at org.scalatest.tools.Runner$.main(Runner.scala:860) at org.scalatest.tools.Runner.main(Runner.scala) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2127) The GSA Documentation has trailing </p>s
[ https://issues.apache.org/jira/browse/FLINK-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567976#comment-14567976 ] Ufuk Celebi commented on FLINK-2127: This is definitely a bug. I didn't say that we should close it. ;-) [~mxm], can you provide some info on the CI setup? Which version of Jekyll and kramdown is installed? Locally, I'm using kramdown 1.7.0 and jekyll 2.5.3. The GSA Documentation has trailing </p>s - Key: FLINK-2127 URL: https://issues.apache.org/jira/browse/FLINK-2127 Project: Flink Issue Type: Bug Components: Documentation, Gelly Affects Versions: 0.9 Reporter: Andra Lungu Priority: Minor Within the GSA section of the documentation, there are trailing `<p class="text-center"> image </p>` tags. It would be nice to remove them :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-2089. Resolution: Fixed Fix Version/s: 0.9 Fixed in 28eb274. Buffer recycled IllegalStateException during cancelling - Key: FLINK-2089 URL: https://issues.apache.org/jira/browse/FLINK-2089 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: master Reporter: Ufuk Celebi Assignee: Ufuk Celebi Fix For: 0.9 [~rmetzger] reported the following stack trace during cancelling of high parallelism jobs: {code} Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) {code} This looks like a concurrent buffer pool release/buffer usage error. I'm investigating this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
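[Editor's note] The bug class behind the stack trace, one thread reading a buffer while another recycles it back to the pool, can be sketched with a minimal reference-counted buffer. The class below is a simplified, hypothetical stand-in for illustration, not Flink's actual `Buffer`/`BufferRecycler` code:

```java
// Minimal sketch of a reference-counted buffer (hypothetical, simplified).
// ensureNotRecycled() mirrors the kind of check that produced the
// "Buffer has already been recycled" IllegalStateException above.
final class RefCountedBuffer {
    private int refCount = 1; // guarded by 'this'
    private final byte[] memory = new byte[32 * 1024];

    synchronized byte[] getMemorySegment() {
        ensureNotRecycled();
        return memory;
    }

    synchronized void retain() {
        ensureNotRecycled();
        refCount++;
    }

    synchronized void recycle() {
        ensureNotRecycled();
        refCount--; // at zero, the segment would go back to the pool
    }

    synchronized boolean isRecycled() {
        return refCount == 0;
    }

    private void ensureNotRecycled() {
        if (refCount == 0) {
            throw new IllegalStateException("Buffer has already been recycled.");
        }
    }
}
```

A reader that calls `getMemorySegment()` after the last `recycle()` hits exactly this exception; the fix is making release and use agree on ownership, not weakening the check.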
[jira] [Resolved] (FLINK-1724) TestingCluster uses local communication with multiple task managers
[ https://issues.apache.org/jira/browse/FLINK-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1724. Resolution: Fixed Fix Version/s: 0.9 This has been fixed as I've said. The point you've raised is valid and needs a separate discussion on the mailing list. TestingCluster uses local communication with multiple task managers --- Key: FLINK-1724 URL: https://issues.apache.org/jira/browse/FLINK-1724 Project: Flink Issue Type: Bug Affects Versions: master Reporter: Ufuk Celebi Priority: Minor Fix For: 0.9 Starting a task manager via TestingUtils does not respect the number of configured task managers and mis-configures the task managers to use local network communication (LocalConnectionManager instead of NettyConnectionManager). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1747) Remove deadlock detection and pipeline breaker placement in optimizer
[ https://issues.apache.org/jira/browse/FLINK-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568078#comment-14568078 ] Ufuk Celebi commented on FLINK-1747: [~StephanEwen], was this a duplicate of FLINK-2041? Remove deadlock detection and pipeline breaker placement in optimizer - Key: FLINK-1747 URL: https://issues.apache.org/jira/browse/FLINK-1747 Project: Flink Issue Type: Improvement Components: Optimizer Affects Versions: master Reporter: Ufuk Celebi Priority: Minor The deadlock detection in the optimizer, which places pipeline breaking caches has become redundant with recently added changes. We now use blocking data exchanges for branching programs, which are merged again at a later point. Therefore, we can start removing the respective code in the optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2119] Add ExecutionGraph support for ba...
GitHub user uce opened a pull request: https://github.com/apache/flink/pull/754 [FLINK-2119] Add ExecutionGraph support for batch scheduling This PR adds support for a newly introduced scheduling mode `BATCH_FROM_SOURCES`. The goal for me was to make this change *minimally invasive* in order to not touch too much core code shortly before the release. Essentially, this only touches two parts of the codebase: the scheduling action for blocking results and the job vertices. If you set the scheduling mode to `BATCH_FROM_SOURCES`, you can manually configure which input vertices are used as the sources when scheduling (`setAsBatchSource`). You can then manually specify the successor vertices (`addBatchSuccessors`), which are scheduled after the blocking results are finished. When there are no successors specified manually, the result consumers are scheduled as before. Mixing pipelined and blocking results leads to unspecified behaviour currently (aka it's not a good idea to do this at the moment). When you have something like this:
```
      O   sink
      |           . - denotes a pipelined result
      O   union
    +´|`+
    | | |
    ▲ ▲ ▲         ▲ - denotes a blocking result
    O O O
 src0 src1 src2
```
You can first schedule `src0`, `src1`, `src2`, and then continue with the `union-sink` pipeline.
```java
src[0].setAsBatchSource();         // src0 is the first to go...
src[0].addBatchSuccessors(src[1]); // src0 => src1
src[1].addBatchSuccessors(src[2]); // src1 => src2
src[2].addBatchSuccessors(union);  // src2 => [union => sink]
```
@StephanEwen or @tillrohrmann will work on the Optimizer/JobGraph counterpart of this and will build the `JobGraph` for programs in batch mode using the methods introduced in this PR. Do you guys think that this minimal support is sufficient for the first version? (Going over the result partition notification code, I really think it's pressing to refactor it. It is very very hard to understand.
The corresponding issue [FLINK-1833](https://issues.apache.org/jira/browse/FLINK-1833) has been created a while back. I want to do this after the release.) You can merge this pull request into a Git repository by running: $ git pull https://github.com/uce/incubator-flink legs-2119 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/754.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #754 commit 4ac15982700257d3deb2d55a389afd0531f7f8be Author: Ufuk Celebi u...@apache.org Date: 2015-06-01T21:12:47Z [FLINK-2119] Add ExecutionGraph support for batch scheduling --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
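[Editor's note] The scheduling order the PR describes, batch sources first and each declared successor only once the blocking result before it has finished, can be sketched as a plain graph traversal. The method names below mirror the PR's `setAsBatchSource`/`addBatchSuccessors`, but the class is a hypothetical stand-in, not the actual ExecutionGraph code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the scheduling order implied by
// setAsBatchSource / addBatchSuccessors. Vertices are identified by name,
// and "blocking result finished" is modeled as simply reaching the vertex
// in the queue.
final class BatchSchedulingSketch {
    private final Map<String, List<String>> successors = new HashMap<>();
    private final List<String> sources = new ArrayList<>();

    void setAsBatchSource(String vertex) {
        sources.add(vertex);
    }

    void addBatchSuccessors(String vertex, String successor) {
        successors.computeIfAbsent(vertex, k -> new ArrayList<>()).add(successor);
    }

    // Order in which vertices would be deployed.
    List<String> scheduleOrder() {
        List<String> order = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>(sources);
        while (!queue.isEmpty()) {
            String vertex = queue.poll();
            order.add(vertex);
            // in the real graph this would be triggered by the
            // "blocking result finished" notification
            queue.addAll(successors.getOrDefault(vertex, Collections.<String>emptyList()));
        }
        return order;
    }
}
```

With the PR's union/sink example, the sketch yields src0, src1, src2, then union, matching the description above.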
[jira] [Closed] (FLINK-1747) Remove deadlock detection and pipeline breaker placement in optimizer
[ https://issues.apache.org/jira/browse/FLINK-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-1747. --- Remove deadlock detection and pipeline breaker placement in optimizer - Key: FLINK-1747 URL: https://issues.apache.org/jira/browse/FLINK-1747 Project: Flink Issue Type: Improvement Components: Optimizer Affects Versions: master Reporter: Ufuk Celebi Priority: Minor The deadlock detection in the optimizer, which places pipeline breaking caches has become redundant with recently added changes. We now use blocking data exchanges for branching programs, which are merged again at a later point. Therefore, we can start removing the respective code in the optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-1747) Remove deadlock detection and pipeline breaker placement in optimizer
[ https://issues.apache.org/jira/browse/FLINK-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1747. - Resolution: Duplicate Duplicate of FLINK-2041 Remove deadlock detection and pipeline breaker placement in optimizer - Key: FLINK-1747 URL: https://issues.apache.org/jira/browse/FLINK-1747 Project: Flink Issue Type: Improvement Components: Optimizer Affects Versions: master Reporter: Ufuk Celebi Priority: Minor The deadlock detection in the optimizer, which places pipeline breaking caches has become redundant with recently added changes. We now use blocking data exchanges for branching programs, which are merged again at a later point. Therefore, we can start removing the respective code in the optimizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-107751723 Looks good all in all. I am preparing a followup pull request that cleans up a few things, adds comments, and addresses Marton's comments. One thing I noticed is that all the non-checkpointed sources have a checkpoint lock in the signature as well. Should we offer two source interfaces: `SourceFunction` and `CheckpointedSourceFunction` ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
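[Editor's note] One way the proposed split could look, sketched with made-up names (`SourceFunction2`, `CheckpointedSourceFunction2`, and the `Collector2` stand-in are hypothetical, not an API Flink ships): the plain interface stays lock-free, and only the checkpointed variant sees the lock.

```java
// Hypothetical sketch of the two-interface split discussed above.
// A plain source never touches the checkpoint lock...
interface SourceFunction2<T> {
    void run(Collector2<T> out) throws Exception;
    void cancel();
}

// ...while a checkpointed source receives the lock and must synchronize
// state updates and emission on it.
interface CheckpointedSourceFunction2<T> {
    void run(Object checkpointLock, Collector2<T> out) throws Exception;
    void cancel();
}

// Minimal stand-in for Flink's Collector.
interface Collector2<T> {
    void collect(T record);
}
```

The upside is that non-checkpointed sources keep a clean signature; the downside, as discussed further below, is interface proliferation once the Rich/Parallel variants are added.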
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568279#comment-14568279 ] ASF GitHub Bot commented on FLINK-2098: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-107751723 Looks good all in all. I am preparing a followup pull request that cleans up a few things, adds comments, and addresses Marton's comments. One thing I noticed is that all the non-checkpointed sources have a checkpoint lock in the signature as well. Should we offer two source interfaces: `SourceFunction` and `CheckpointedSourceFunction` ? Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2098] Improvements on checkpoint-aligne...
GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/755 [FLINK-2098] Improvements on checkpoint-aligned sources Based on #742 , include multiple cleanups and fixes on top. You can merge this pull request into a Git repository by running: $ git pull https://github.com/StephanEwen/incubator-flink stream_sources Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/755.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #755 commit 152f4af6966e8a2e770f20a1f516d1de29c5b13e Author: twalthr twal...@apache.org Date: 2015-05-27T13:32:11Z [hotfix] Remove execute() after print() in Table API examples This closes #735 commit d82942a396777ea6debbf55a91916d4d5c3ecdaa Author: Aljoscha Krettek aljoscha.kret...@gmail.com Date: 2015-05-28T08:24:37Z [FLINK-2098] Ensure checkpoints and element emission are in order Before, it could happen that a streaming source would keep emitting elements while a checkpoint is being performed. This can lead to inconsistencies in the checkpoints with other operators. This also adds a test that checks whether only one checkpoint is executed at a time and the serial behaviour of checkpointing and emission. This changes the SourceFunction interface to have run()/cancel() methods where the run() method takes a lock object on which it needs to synchronize updates to state and emission of elements. commit dad6a0092489fe7a9ef4508dc006d5a6cc42a2ba Author: Stephan Ewen se...@apache.org Date: 2015-06-02T01:31:43Z [FLINK-2098] Improvements on checkpoint-aligned sources --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
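[Editor's note] The pattern described in the second commit message, where run() receives a lock object and must hold it while updating state and emitting so that a snapshot taken under the same lock never observes a half-updated source, might look roughly like this (simplified stand-ins, not Flink's actual classes):

```java
import java.util.List;

// Simplified stand-in for the checkpoint-lock pattern: emission and the
// state update it implies happen atomically under the lock, and a
// checkpoint reads the state under the same lock.
final class CountingSource {
    private volatile boolean running = true;
    private long emitted; // source state; must stay consistent with the output

    void run(Object checkpointLock, List<Long> out, long limit) {
        while (running && emitted < limit) {
            synchronized (checkpointLock) {
                out.add(emitted); // emission and ...
                emitted++;        // ... state update happen atomically
            }
        }
    }

    // What a checkpoint would read while holding the same lock.
    long snapshot(Object checkpointLock) {
        synchronized (checkpointLock) {
            return emitted;
        }
    }

    void cancel() {
        running = false;
    }
}
```

Without the lock, a checkpoint could snapshot `emitted` between the element being emitted and the counter being bumped, which is exactly the inconsistency FLINK-2098 describes.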
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568362#comment-14568362 ] ASF GitHub Bot commented on FLINK-2098: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-107768981 Some important comments: - Exceptions should always propagate, and exceptions during cancelling can be thrown. The `Task` class filters out exceptions that come after the transition to the canceling state. Don't try to be super smart there, propagate your exceptions, and the context will decide whether they should be logged. - The RabbitMQ source is not doing any proper failure handling (critical!). I opened a separate JIRA issue for that. - Just commenting out the twitter sources is bad style, in my opinion. There is actually no chance of getting this pull request in, and this one here is a release blocker, while the Twitter Source one is not a release blocker. - All functions set their `running` flag to true at the beginning of the `run()` method. In the case of races between invoking and canceling the source, it can mean that the flag is set to false in the cancel() method and then to true in the run() method, resulting in a non-canceled source. I fixed some cases in my follow up pull-request, but many are still in. Please make another careful pass with respect to that. - In general, the streaming sources are extremely badly commented. This needs big improvements! Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-107768981 Some important comments: - Exceptions should always propagate, and exceptions during cancelling can be thrown. The `Task` class filters out exceptions that come after the transition to the canceling state. Don't try to be super smart there, propagate your exceptions, and the context will decide whether they should be logged. - The RabbitMQ source is not doing any proper failure handling (critical!). I opened a separate JIRA issue for that. - Just commenting out the twitter sources is bad style, in my opinion. There is actually no chance of getting this pull request in, and this one here is a release blocker, while the Twitter Source one is not a release blocker. - All functions set their `running` flag to true at the beginning of the `run()` method. In the case of races between invoking and canceling the source, it can mean that the flag is set to false in the cancel() method and then to true in the run() method, resulting in a non-canceled source. I fixed some cases in my follow up pull-request, but many are still in. Please make another careful pass with respect to that. - In general, the streaming sources are extremely badly commented. This needs big improvements! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
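[Editor's note] The run()/cancel() race in the fourth bullet can be shown with a small sketch: if `run()` started by re-setting the flag to true, a `cancel()` that arrived first would be overwritten and the source would never stop. Setting the flag once (here: at construction) and only clearing it in `cancel()` keeps the cancellation visible. Illustrative stand-in, not Flink code:

```java
// Sketch of the run()/cancel() race described above and one way around it.
final class CancellableSource {
    private volatile boolean running = true; // set once, cleared by cancel()

    int run(int maxElements) {
        int emitted = 0;
        // note: deliberately no 'running = true;' here
        while (running && emitted < maxElements) {
            emitted++;
        }
        return emitted;
    }

    void cancel() {
        running = false;
    }
}
```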
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568363#comment-14568363 ] ASF GitHub Bot commented on FLINK-2098: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-107769045 Here is my updated code: https://github.com/StephanEwen/incubator-flink/tree/stream_sources Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14568392#comment-14568392 ] Sachin Goel commented on FLINK-1731: I'm creating a separate issue for Initialization schemes. This would address the Random, kmeans++ and kmeans|| initialization methods. Since any initialization is itself a solution to the kmeans problem, they would all be instances of Predictor as well. Users can access the centroids learned via instance.centroids and pass them to the KMeans algorithm which has been implemented. There is another way, which takes the burden off the user of figuring out how to pass the initial centroids to KMeans: we can have a parameter which signifies which initialization scheme to use. The KMeans algorithm would then call the appropriate initialization scheme in its fit function and use the centroids found by the initialization scheme as its initial centroids. Add kMeans clustering algorithm to machine learning library --- Key: FLINK-1731 URL: https://issues.apache.org/jira/browse/FLINK-1731 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Peter Schrott Labels: ML The Flink repository already contains a kMeans implementation but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near optimal clustering. It might be worthwhile to implement kMeans||. Resources: [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2130) RabbitMQ source does not fail when failing to retrieve elements
Stephan Ewen created FLINK-2130: --- Summary: RabbitMQ source does not fail when failing to retrieve elements Key: FLINK-2130 URL: https://issues.apache.org/jira/browse/FLINK-2130 Project: Flink Issue Type: Bug Reporter: Stephan Ewen The RMQ source only logs when elements cannot be retrieved. Failures are not propagated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2131) Add Initialization schemes for K-means clustering
Sachin Goel created FLINK-2131: -- Summary: Add Initialization schemes for K-means clustering Key: FLINK-2131 URL: https://issues.apache.org/jira/browse/FLINK-2131 Project: Flink Issue Type: Task Components: Machine Learning Library Reporter: Sachin Goel Assignee: Sachin Goel Lloyd's [KMeans] algorithm takes initial centroids as its input. However, in case the user doesn't provide the initial centers, they may ask for a particular initialization scheme to be followed. The most commonly used are these: 1. Random initialization: Self-explanatory 2. kmeans++ initialization: http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf 3. kmeans||: http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf For very large data sets, or for large values of k, the kmeans|| method is preferred as it provides the same approximation guarantees as kmeans++ while requiring fewer passes over the input data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
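[Editor's note] The kmeans++ seeding (scheme 2 above) draws the first center uniformly and each further center with probability proportional to the squared distance from the nearest center chosen so far. A sketch on 1-D points, illustrative only and not the flink-ml implementation:

```java
import java.util.Random;

// k-means++ seeding sketch on 1-D points (illustrative, not flink-ml code).
final class KMeansPlusPlus {
    static double[] chooseCenters(double[] points, int k, Random rnd) {
        double[] centers = new double[k];
        centers[0] = points[rnd.nextInt(points.length)]; // first center: uniform
        for (int c = 1; c < k; c++) {
            // squared distance of every point to its nearest chosen center
            double[] d2 = new double[points.length];
            double sum = 0;
            for (int i = 0; i < points.length; i++) {
                double best = Double.POSITIVE_INFINITY;
                for (int j = 0; j < c; j++) {
                    double d = points[i] - centers[j];
                    best = Math.min(best, d * d);
                }
                d2[i] = best;
                sum += best;
            }
            // sample the next center proportionally to d2
            double r = rnd.nextDouble() * sum;
            int idx = 0;
            double acc = d2[0];
            while (acc < r && idx < points.length - 1) {
                idx++;
                acc += d2[idx];
            }
            centers[c] = points[idx];
        }
        return centers;
    }
}
```

On two well-separated 1-D clusters this almost surely picks one center from each cluster, which is the property that gives kmeans++ its approximation guarantee.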
[GitHub] flink pull request: [FLINK-2098] Ensure checkpoints and element em...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/742#issuecomment-107811407 I will fix the remaining cancel() races. The twitter stuff I just commented out because I assumed that the new TwitterSource could be merged right away and I was waiting for that. As for two Source interfaces: we can certainly do that. The reason I didn't do it is that there would be a lot of duplication, because we have SourceFunction, ParallelSourceFunction, RichSourceFunction and RichParallelSourceFunction. With the new Source interface this would go up to 8 interfaces for the sources. (That's also the reason why I didn't have Kafka derive from the RichParallelSourceFunction; I thought that maybe we could get rid of the special interfaces for parallel sources.) Also, I realize there are many more problems. I just can't address them all in a single PR. :sweat_smile: --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---